Theme: smartblue
Nodejs Chapter 37: MySQL Subqueries and Table Joins
- In all previous examples, we performed create, read, update, delete operations on a single table, but in normal development, we will never define all fields in a single table.
- What difference is there between putting all fields in one table versus putting all your code in one file?
- We learned the concept of modularity back in Chapter 9, and this idea can also be applied to database design. We therefore categorize tables by the different properties and functions of their fields, resulting in a clearer structure
- However, in frontend code, we can connect individual files into a complete project through the two import/export specifications of ESM or CommonJS.
- So how do we connect multiple MySQL tables together?
- And that is exactly the topic we will cover today: Subqueries and Table Joins
Subqueries
What is a Subquery
In MySQL, a subquery (Subquery) is a query nested inside another SQL query. Subqueries can be used in a variety of SQL statements, including SELECT, INSERT, UPDATE, and DELETE statements, as well as in FROM, WHERE, and HAVING clauses. Subqueries allow us to execute one query inside another, giving SQL queries greater flexibility and the ability to handle more complex logic.
- Simply put, as we mentioned earlier, subqueries establish
one-wayrelationships between tables.- This is just like import/export in frontend modularity: a child table nested in the main table cannot reverse query the content of the main table, because our goal is to have the child table serve the main table (PS: the concepts of main table and child table are metaphors I created for easier explanation, MySQL itself does not have these concepts), and the subquery is used by the main query
- A subquery is like an imported module: it provides data or functionality for the main query (the module that uses it), but the main query cannot directly manipulate the internals of the subquery, and the data processing inside the subquery is invisible to the main query – it is only responsible for providing data.
- And as for why use subqueries, we explained this in the introduction at the beginning: it implements the idea of modularity, and also brings benefits such as better readability, easier understanding, and more convenient maintenance.
Subquery Example
- Let's first clarify our approach (two steps):
- First, we need two tables, corresponding to the main query (main table) and the subquery (child table)
- Establish a subquery association between these two tables
Creating the Tables
- First, let's start with step one: we will create a user table as the main table, and a child table that records users' login activities.
- The
userstable is the main table, which stores the user's basic information; thelogin_activitiestable is the child table, which is associated with theidof theuserstable via a foreign key, and records the user's login activities
- The
Main Query Table
-- Create main table `users` CREATE TABLE `users` ( `id` int NOT NULL AUTO_INCREMENT, `name` varchar(255) NOT NULL, `email` varchar(255), PRIMARY KEY (`id`) );
– Insert data intousers
INSERT INTOusers(name,
VALUES
(‘迷你余’, ‘mini@gmail.com’),
(‘小余’, ‘small@gmail.com’),
(‘中余’, ‘medium@gmail.com’),
(‘大余’, ‘large@gmail.com’),
(‘超大余’, ‘xlarge@gmail.com’);
Subquery Table
- When initializing the data here, we query the
primary key idfrom the user table (main query table), and fill it into thelogin_activitiestable as theforeign key id.- The purpose of this is to establish the association between the two tables for the next steps
- This bulk operation approach also improves insertion efficiency
-- Create child table `login_activities` CREATE TABLE `login_activities` ( `activity_id` int NOT NULL AUTO_INCREMENT, `user_id` int NOT NULL, `login_time` timestamp DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (`activity_id`), FOREIGN KEY (`user_id`) REFERENCES `users`(`id`) );
– Insert data intologin_activities
– Assume that every user has logged in at least once
INSERT INTOlogin_activities(user_id)
SELECTidFROMusers;
Establishing the Query Association
- Here we need to master one concept:
- Parentheses in Subqueries: In SQL, subqueries usually need to be enclosed in parentheses to distinguish the boundary between the subquery and the main query
- Suppose we want to query the names of all users who have login activity (this is a good approach for filtering active users or removing inactive followers).
- In the syntax below, we use the
IN keyword,INcan replace multipleORconditions. This means that only one condition inside the parentheses ofIN(xxx,xxx,...)needs to be satisfied. It is usually used together withWHEREfor filtering.
- In the syntax below, we use the
SELECT `name` FROM `users`
WHERE `id` IN (
SELECT `user_id` FROM `login_activities`
);
-
We need to look at this in two parts:
- The first part is the main query, which retrieves the name data
SELECT `name` FROM `users`-
The second part is the core functionality of this example: we want to query users who have logged in at least once after May 1st this year
- At this point, the first part gets us all usernames
- Through the subquery (child table), we get the list of usernames that logged in on or after this year's May Day holiday
SELECT `name` FROM `users` WHERE `id` IN ( SELECT `user_id` FROM `login_activities` WHERE `login_time` >= '2024-05-01' );
-
From the syntax above, I think everyone should understand: a subquery is not actually a specific syntactic rule, but rather an expression of an idea.
- We split the subquery into two steps, but when viewed individually, each step is a normal, ordinary operation (querying names, querying login times)
- If we had put everything in a single table in the first place, we would not need this concept at all. It is the many benefits brought by the clear modular structure that promote the functional classification of tables. The concept of subqueries was developed based on this ideological foundation, which is why we say it is an expression of an idea.
- In terms of operation, it is indeed a very ordinary approach; what is valuable is the idea behind it – when learning, you must understand the design philosophy behind it.
-
We cannot say: "I use subqueries because that is the only way to get the functionality I want." This understanding is incorrect, because the fact that you need this approach is already based on having separated the tables, which means you have already accepted the benefits of structured thinking (improving the flexibility and efficiency of database queries), and subqueries are just a natural solution and extension of this way of thinking.
Join Queries
- There are many application scenarios in daily development. If you want to combine content from two tables, simply using
subqueriesalone is not enough.- Let's take the previous subquery example: we got the names of active users after May Day through a subquery, but we cannot see their specific most recent login time, right?
- If we want to see both pieces of information (
username,login time) in a single table, that is, merge the content of two tables, subqueries alone cannot do this, because theusernameandlogin timedata are not in the same table – they come from the main query table and the child table respectively.
- This is when
join queriescome into play, they can solve this problem. Join queries are divided intoinner joins,outer joins, andcross joins. Let's take a look.
Inner Joins
What is an Inner Join
An inner join returns matching rows from both tables. Only rows that have matching values in the joined columns will appear in the query result.
How to Use Inner Joins
- The simple introduction above may still be a bit abstract, so let's see how it works.
SELECT * FROM `users`,`login_activities`
- By initially querying both tables at the same time, we can see that all the content from both tables is combined.
- But something seems wrong: there are suddenly 25 total rows of data, but we should only have 5 rows.
- This is because we did not add a WHERE restriction, so the data from the two tables are not matched one-to-one, instead they are combined in a 5×5 Cartesian product, generating 25 different rows of data.
- An inner join is based on this foundation, and uses
primary and foreign keysin the WHERE clause to achieve a one-to-one matching of data.- In terms of operation, it is still very simple: you just compare the primary and foreign keys. But as we said before, the idea is what matters.
# Match the primary key id of users to the foreign key id of login_activities
SELECT * FROM `users`,`login_activities` WHERE `users`.`id` = `login_activities`.`user_id`
- Everything looks good so far. Even if there is a lot of content, we just need to replace the
* wildcardin the command with the specific content we want.
SELECT `name`,`login_time` FROM `users`,`login_activities` WHERE `users`.`id` = `login_activities`.`user_id`
- But inner joins actually have a problem: our data cannot always be perfectly matched. For example, the user table has 12 rows of data, but the login table may only have 8 rows of data.
- If you do an inner join in this case, only the 8 matching rows will be displayed, and the remaining 4 rows will not be shown. This means the result may not be comprehensive.
- If I want to achieve: show all usernames, show the login time for users who have logged in recently, and show NULL for those who have not. Inner joins cannot do this – inner joins are like an intersection.
Outer Joins
What is an Outer Join
An outer join returns all rows from at least one of the tables, even if there is no match in the other table. The behavior differs slightly depending on whether it is a left outer join, right outer join, or full outer join.
- Outer joins are divided into left and right types, and each has its own specific syntax.
Left Outer Join
Returns all rows from the left table (table1), plus matching rows from the right table (table2). If there is no match in the right table, those rows will be filled with null values.
- Syntax:
LEFT JOIN [table name] ON [join condition] - The order of tables referenced after
FROMalso has different effects.- The first table is called the driving table, and the join is based on this table.
- Why is the first table the driving table? Because it is on the left side of
LEFT JOIN– a left join is dominated by the left side. For a right outer join, the right side is the driving table.
- For easier demonstration, I will delete one record from the login table.
DELETE FROM `login_activities` WHERE `user_id` = 5
-
Now the login table has 4 records, and the user table has 5. We will use the user table as the driving table to see if the problem of missing unmatched data from the other table that occurred with inner joins still exists.
-
Previously we used the
WHEREkeyword for filter conditions, but for left and right outer joins, we use theONkeyword. What is the difference between the two?
The ON Keyword
- Purpose: The
ONkeyword is mainly used in join queries to specify the join condition, that is, to define how to match rows from two tables. - Context:
ONis used inJOINstatements, used to specify how to join rows by comparing columns from two tables.
The WHERE Keyword
- Purpose: The
WHEREkeyword is used to specify filter conditions, that is, to determine which rows should be included in the query result. - Context:
WHEREcan be used in any SELECT query, not just join queries. It filters after the data is joined, so you can useWHEREafterJOINto further restrict the result.
# When joining tables, especially for outer joins, it is very important to use ON to define the join condition, because this directly affects which rows are included in the final join result. The ON condition determines how to match data from the two tables, which is the foundation for building the joined result set.
SELECT `name`,`login_time` FROM `users` LEFT JOIN `login_activities` ON `users`.`id` = `login_activities`.`user_id`
- You can clearly see that the fifth row is also displayed. Data that does not exist in the login table is represented as
NULL.
Right Outer Join
Returns all rows from the right table (
table2), plus matching rows from the left table (table1). If there is no match in the left table, those rows will be filled with null values.
- As mentioned earlier, for a right outer join, the right side is the driving table.
- If we simply change
LEFTtoRIGHTin theleft outer joinexample above, leaving everything else unchanged, what result do we get?
- If we simply change
SELECT `name`,`login_time` FROM `users` RIGHT JOIN `login_activities` ON `users`.`id` = `login_activities`.`user_id`
- Now we only get 4 rows of data, not 5. This shows that the join is indeed dominated by the table on the right side of
RIGHT JOIN.
Summary
- These two outer join methods give us two options, depending on which table we want to prioritize.
- Theoretically, these two methods cover the vast majority of usage scenarios, but as we know, there are actually three possible options when joining two tables.
- As shown in the image above, a left outer join gets A+B, and a right outer join gets B+C. From the diagram, we can easily see that there is actually another option: A+B+C.
- There is a method for this, called Full Outer Join, which returns all rows from both tables. If there is no match on one side, that side's rows will be filled with null values.
- However, not all database systems support full outer joins, so we do not cover it here.
This is an independent discussion topic split from the original thread at https://juejin.cn/post/7368367426210578471








