Review Old Knowledge to Gain New Insights - MySQL Edition

MySQL is one of the most important topics for every backend developer to learn. However, as your career progresses, knowledge you learned before but rarely use will gradually fade from memory.

This article aims to connect as much of MySQL's basic knowledge as possible in a single post, and provide concise, vivid explanations for each specific topic. Whether you are a working professional or a new backend developer, I hope you can gain something from reading this article.

This article may not go very deep into specific topics. If you find any omissions or errors, you are welcome to discuss them in the comment section to help improve this content together.

1. What is MySQL

MySQL is currently the most popular relational database in the industry. Simply put, a relational database is a database that organizes data in table format.

2. Indexes: Empowering Queries

2.1. B+ Tree

Data is stored for the purpose of querying, and when it comes to queries, efficiency is inevitably a core concern. MySQL improves search efficiency by adding indexes. So why can indexes improve search efficiency? To answer this question, we need to understand B+ Tree, the data structure MySQL uses for indexes.

A B+ Tree consists of a root node, internal index nodes, and leaf nodes. The root and index nodes maintain sorted keys with child pointers, while the leaf nodes hold the actual data. When querying, the search starts at the root node, descends level by level by locating the sorted key interval that contains the target, and finally reaches the leaf node that stores the target data.

The B+ Tree is a variant of the classic B Tree, and the difference between the two is a common interview question:

  • The fundamental difference between B+ Tree and B Tree is that B+ Tree only stores data in leaf nodes, while B Tree also stores data in internal nodes.
  • Since index nodes do not store row data, each node can hold far more index entries, which makes the tree shallower. Reading index pages in MySQL incurs disk I/O, a very time-consuming operation, so this B+ Tree design reduces the number of disk I/O operations and yields higher query efficiency.

We mentioned an important point earlier: disk I/O. For this reason, the B+ Tree node (page) size in MySQL (InnoDB's default page size is 16KB) is best set to a multiple of the disk page size, so that reading one node maps cleanly onto whole-page disk I/O.
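To make the descent concrete, here is a minimal Python sketch (not InnoDB's actual implementation, just an analogy): internal nodes hold sorted separator keys, and `bisect` picks the child whose interval contains the key, so each level costs one node read.

```python
import bisect

def bptree_search(node, key):
    """Descend from the root until a leaf is reached, then look up the key."""
    while node["type"] == "internal":
        # Pick the child whose key interval contains `key`.
        i = bisect.bisect_right(node["keys"], key)
        node = node["children"][i]
    return node["records"].get(key)

# A tiny two-level tree: a root with separators 10 and 20, and three leaves.
# A wide node (many keys per 16KB page) keeps the tree shallow and the
# disk I/O count low.
tree = {
    "type": "internal",
    "keys": [10, 20],
    "children": [
        {"type": "leaf", "records": {1: "a", 5: "b"}},
        {"type": "leaf", "records": {10: "c", 15: "d"}},
        {"type": "leaf", "records": {20: "e", 30: "f"}},
    ],
}

print(bptree_search(tree, 15))  # found in the middle leaf
print(bptree_search(tree, 99))  # not present: None
```

Each lookup touches exactly one node per level, which is why a shallow, wide tree means fewer disk reads.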

2.2. Composite Index

A composite index is a single B+ Tree built over multiple columns together. Its entries are sorted by the first column, then by the second within equal values of the first, and so on.

2.3. Clustered Index

In the InnoDB engine, the table data itself is stored in a B+ Tree index, and this index is the clustered index. The leaf nodes of the clustered index store the full row data of the table; all other indexes are non-clustered (secondary) indexes, whose leaf nodes store the primary key value used to locate the corresponding row in the clustered index.

Leaf nodes of non-clustered indexes store only the indexed columns (plus the primary key). For example, an index built on class + gender contains only class, gender, and the primary key. When we need to query a student's name by class and gender, we must use the primary key stored in the secondary index leaf to look the row up in the clustered index and fetch the name column. This extra lookup is commonly referred to as a table return query.

Although a table return query still uses an index, the extra lookup reduces query efficiency. Therefore, when creating indexes, try to include the columns the query needs, so that the query can be answered from the secondary index alone. When this condition is met, the index is called a covering index.
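A small Python sketch of the difference (the `clustered`/`secondary` dicts and the student data are illustrative stand-ins, not real InnoDB structures): one query must return to the clustered index for `name`, while a count can be answered by the secondary index alone.

```python
# Clustered index: primary key -> full row.
clustered = {
    1: {"name": "Ann", "class": "A", "gender": "F"},
    2: {"name": "Bob", "class": "A", "gender": "M"},
    3: {"name": "Cat", "class": "B", "gender": "F"},
}
# Secondary index on (class, gender): indexed columns -> primary keys.
secondary = {
    ("A", "F"): [1],
    ("A", "M"): [2],
    ("B", "F"): [3],
}

def names_by_class_gender(cls, gender):
    """Needs `name`, which the secondary index does not contain, so each
    hit goes back to the clustered index (a table return query)."""
    pks = secondary.get((cls, gender), [])
    return [clustered[pk]["name"] for pk in pks]

def count_by_class_gender(cls, gender):
    """Answered entirely from the secondary index (a covering index for
    this query): no trip back to the clustered index is needed."""
    return len(secondary.get((cls, gender), []))
```

The second function is cheaper precisely because everything it needs already lives in the secondary index.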

2.4. Index Invalidity

A query does not use an index when the index is "invalid" for that query. Most index invalidity cases come down to breaking the leftmost prefix matching principle, though not all of them do.

Take the composite index mentioned above as an example:

If we query using only gender as the condition, the matching entries are scattered throughout the index's sort order (which is ordered by the leading column first), so the only option is to traverse all nodes in a full scan; the index is therefore not used.

Several common situations that can break leftmost prefix matching:

  • The query condition for a composite index does not include the leading column(s).
  • LIKE patterns with a leading wildcard ('%xxx'), and negative conditions such as NOT and <>, usually prevent index use.
  • The query applies functions, calculations, or implicit type conversion to the indexed column.
  • In cases where the index does not cover all required columns or the index has very low selectivity, the query optimizer determines that a full scan is more efficient than using the index.

Low selectivity means that many rows in the table share the same value for the indexed column, for example, gender.
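The leftmost prefix rule can be illustrated with plain sorted tuples in Python (an analogy, not MySQL internals): with a composite index on (class, gender), an equality condition on the leading column maps to one contiguous range, while a condition on gender alone forces a scan of every entry.

```python
import bisect

# Entries of a composite index on (class, gender), sorted lexicographically:
# by class first, then by gender. The third element stands in for the row id.
rows = sorted([
    ("A", "F", 1), ("A", "M", 2), ("B", "F", 3), ("B", "M", 4), ("C", "F", 5),
])

def by_class(cls):
    """Leading column given: matching rows form one contiguous range,
    found with two binary searches, so the index is usable."""
    lo = bisect.bisect_left(rows, (cls,))
    # chr(0x10FFFF) is a sentinel that sorts above any real gender value.
    hi = bisect.bisect_right(rows, (cls, chr(0x10FFFF)))
    return rows[lo:hi]

def by_gender(gender):
    """Leading column missing: matching rows are scattered through the
    sort order, so every entry must be scanned (the index is "invalid")."""
    return [r for r in rows if r[1] == gender]
```

`by_class` touches only the slice it needs; `by_gender` has no choice but a full pass, which is exactly why skipping the leading column defeats the index.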

3. Transactions

Transactions in MySQL are mainly used to solve data consistency problems, ensuring data integrity and reliability across multiple database operations.

3.1. The Four ACID Properties of Transactions

  1. Atomicity: All operations in a transaction are either fully completed or not completed at all. If any step in the transaction fails, the transaction will be rolled back to the state before it started, as if the transaction never happened. For example, in a bank transfer scenario, deducting the amount from account A and adding the same amount to account B must succeed or fail at the same time to avoid lost funds or incorrect accounting.
  2. Consistency: The database remains in a consistent state before and after transaction execution. Even if an error occurs during transaction processing, the database can be restored to a consistent state.
  3. Isolation: Concurrently executed transactions do not interfere with each other, and each transaction behaves as if it is operating the database alone. This means that before a transaction is completed and committed, other transactions cannot see its intermediate state, preventing problems such as "dirty reads", "non-repeatable reads", and "phantom reads".
  4. Durability: Once a transaction is committed, its changes are permanently stored in the database, and will not be lost even if a system failure occurs.

MySQL guarantees the four ACID properties through the following mechanisms:

  1. Atomicity: InnoDB uses undo logs to implement transaction atomicity. Before a transaction executes any modification, it first records the reverse operations of these changes into the undo log. If the transaction fails and needs to be rolled back, InnoDB executes these reverse operations based on the undo log, undoing the modifications already made so that the transaction appears never to have happened.
  2. Consistency: The consistent state of the database is maintained indirectly by ensuring the atomicity, isolation, and durability of transactions.
  3. Isolation: Isolation is guaranteed through transaction isolation level settings and the MVCC mechanism.
  4. Durability: InnoDB uses redo logs to guarantee transaction durability. Changes are first applied to pages in the in-memory buffer pool and recorded in the redo log; by default the redo log is flushed to disk when the transaction commits, while the modified data pages are written back to disk asynchronously later. This way, even if a system crash occurs after the transaction is committed, data that has not yet been persisted to disk can be recovered via the redo log.
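As a toy illustration of undo-log-based atomicity (real InnoDB undo records are far richer than this), the bank-transfer example from the ACID list above can be sketched like so: each change first records its reverse operation, and a failure replays the log backwards.

```python
def transfer(accounts, src, dst, amount):
    """All-or-nothing transfer: on failure, replay the undo log backwards."""
    undo_log = []
    try:
        undo_log.append((src, accounts[src]))   # old value = reverse operation
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        undo_log.append((dst, accounts[dst]))
        accounts[dst] += amount
        return True                             # commit
    except ValueError:
        for key, old in reversed(undo_log):     # roll back, newest change first
            accounts[key] = old
        return False                            # as if nothing happened

accounts = {"A": 100, "B": 0}
transfer(accounts, "A", "B", 30)    # succeeds: A=70, B=30
transfer(accounts, "A", "B", 1000)  # fails mid-way and rolls back: A=70, B=30
```

The second call mutates account A before detecting the error, yet the observable end state is unchanged, which is exactly what atomicity promises.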

3.2. The Four Transaction Isolation Levels

  1. Read Uncommitted (READ UNCOMMITTED, RU): The lowest isolation level; allows reading uncommitted data changes, which can lead to dirty reads, non-repeatable reads, or phantom reads.
  2. Read Committed (READ COMMITTED, RC): Allows reading data that has been committed by concurrent transactions; prevents dirty reads, but non-repeatable reads and phantom reads can still occur.
  3. Repeatable Read (REPEATABLE READ, RR, the default): Multiple reads of the same field within a transaction return consistent results, unless the data is modified by the transaction itself. This level prevents dirty reads and non-repeatable reads, but phantom reads can still occur.
  4. Serializable (SERIALIZABLE): The highest isolation level, fully compliant with ACID requirements. All transactions are executed one after another in sequence, so transactions cannot interfere with each other at all.

In database transaction processing, dirty reads, non-repeatable reads, and phantom reads are phenomena that may occur under different transaction isolation levels. They describe different ways in which one transaction can observe data written by another transaction, detailed as follows:

  1. Dirty Read: Transaction A reads data that transaction B has modified but not yet committed. If transaction B later rolls back, the data transaction A read never validly existed, which is the so-called "dirty data".
  2. Non-Repeatable Read: Within the same transaction, reading the same row two or more times returns different results, because another already-committed transaction modified or deleted the row in between.
  3. Phantom Read: Within the same transaction, re-running the same range query returns additional rows that were not there the first time, because another transaction inserted new rows into the range in the meantime.

3.3. Locks

When a transaction starts executing an operation that requires locking, such as UPDATE, DELETE or SELECT ... FOR UPDATE/SHARE statements, InnoDB determines the type and scope of the lock based on the specific statement type and the set transaction isolation level.

What type of lock is added in what scenario:

  • For an equality query, if the query condition matches a unique index, InnoDB usually adds a record lock (row lock) to lock the matching row.
  • For an equality query on a non-unique index, InnoDB locks the matching records with row locks and also gap-locks the gaps around them (up to the next record) to prevent insertions. The combination of a row lock and the gap before it is called a Next-Key Lock, which prevents phantom reads.
  • For range queries, InnoDB adds gap locks at both ends of the query range to prevent insertion of new records within the range, which would cause phantom reads. A row lock is also added to every matching row within the range.

Where are locks added on indexes:

  • When an operation involves an index, whether it is a primary key index or a non-primary key (secondary) index, InnoDB implements row locking by locking the index entries. If the query uses a secondary index, InnoDB will not only add a lock on the secondary index entry, but also add a lock on the corresponding clustered index entry to ensure data consistency.

All the locks mentioned above are pessimistic locks, which restrict concurrent operations directly. Optimistic locking can also be used with MySQL, typically at the application level: add a version column to the table, read it together with the data, and make each UPDATE conditional on the version still being unchanged (bumping it on success). If another transaction modified the row in the meantime, the conditional UPDATE matches zero rows, telling the caller that the update failed and should be retried. (InnoDB does stamp every row with the ID of the last transaction that modified it, but that hidden column serves MVCC rather than application-level optimistic locking.)
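A minimal sketch of the version-check idea (the `row` dict stands in for a table row; in SQL this would be an UPDATE whose WHERE clause also tests the version column):

```python
def optimistic_update(row, expected_version, new_value):
    """Apply the update only if nobody changed the row since we read it."""
    if row["version"] != expected_version:
        return False          # someone else updated in between; caller retries
    row["value"] = new_value
    row["version"] += 1       # bump the version so later writers notice
    return True

row = {"value": "old", "version": 7}
v = row["version"]                   # read phase: remember the version we saw
optimistic_update(row, v, "new")     # succeeds, version becomes 8
optimistic_update(row, v, "newer")   # fails: our remembered version is stale
```

No lock is held between the read and the write; conflicts are detected at write time instead of being prevented up front, which is the defining trade-off of optimistic locking.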

3.4. MVCC (Multi-Version Concurrency Control) Mechanism

MVCC is a multi-version concurrency control mechanism. Under the RC and RR isolation levels, it lets transactions read the data version they are entitled to see according to visibility rules without blocking, which reduces system overhead. Plain SELECT queries (snapshot reads) acquire no locks, while operations that require a current read, such as DELETE, UPDATE, and SELECT ... FOR UPDATE, do acquire locks.

The essence of MVCC is that each row keeps a chain of old versions, stamped with transaction IDs, in the undo log (the same log used as a rollback log to guarantee atomicity). When reading, a transaction builds a read view ("snapshot") and walks the version chain to find the newest version visible to it, avoiding interference from concurrently running transactions and solving the problems of dirty reads and non-repeatable reads.

4. From Single Node to Cluster

4.1. Database and Table Sharding

As business volume grows, the performance of a single-node database will eventually hit a bottleneck. When a single node can no longer handle the load, we have to add more machines. However, database services are different from application services: application services provide logical processing and are data-light, while databases are data-heavy query services. When adding machines, how to split data is the long-discussed problem of database and table sharding.

Database splitting is usually done along business boundaries: tables belonging to the same business domain are grouped into the same database. Because the split follows business lines, it is usually invisible to most developers.

Table splitting refers to splitting the data of a single large table into multiple smaller tables, which are stored on different machines. There are two approaches to table splitting: vertical splitting and horizontal splitting:

  • Vertical Splitting: Vertical table splitting splits a single table into multiple independent small tables based on the characteristics of columns, each containing a subset of the original table's columns. For example, a table that contains both basic user information and detailed user behavior records can be split into a basic user information table and a user behavior table.
  • Horizontal Splitting: Horizontal table splitting splits a large table into multiple small tables according to certain rules (such as ID range, modulo algorithm, etc.). These small tables have the same schema with the same columns, but each table stores a different subset of the data rows from the original table, and queries are routed to the corresponding table.
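Horizontal routing is often as simple as a modulo on the sharding key; a minimal sketch (the `orders_N` table names and shard count are illustrative assumptions):

```python
NUM_SHARDS = 4

def shard_for(user_id):
    """Route a row to one of NUM_SHARDS physical tables, orders_0..orders_3,
    based on the sharding key (here, a numeric user ID)."""
    return f"orders_{user_id % NUM_SHARDS}"

shard_for(7)    # user 7 lands in orders_3
shard_for(12)   # user 12 lands in orders_0
```

All tables share the same schema; only the routing function decides which one a given row, and a given query, goes to.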

4.2. Hash Skew in Horizontal Sharding

Horizontal splitting usually uses a hash algorithm for query routing, often based on information such as user ID. This approach can encounter the problem of hash skew. For example, a large super merchant has a huge number of orders, so the table storing this merchant's data will carry much higher traffic than other tables. If multiple super merchants are hashed to the same table, the traffic imbalance will be even more severe.

A commonly suggested remedy for uneven hash distribution is consistent hashing, but it is not necessarily the optimal solution for database sharding scenarios. For example, when we perform horizontal table splitting, the sharding key is often the key query condition. If we query all orders for a company, and we shard by company ID, we only need to search one table to get all the data. If we use consistent hashing, the data may be scattered across multiple tables, and the query efficiency becomes unpredictable. (This is a personal opinion, may not be correct; discussion is welcome.)

There are two approaches to solve the problem mentioned above, where multiple super merchants are hashed to the same table:

  • Combine business scenarios to use a composite sharding key. If a single field cannot distribute data evenly, you can use a composite sharding key that combines two or more fields to achieve better load balancing. For example, in an order scenario, queries are usually made based on merchant ID + order type, so you can use merchant ID + order type as the composite sharding key to distribute the large merchant's data across multiple tables and spread the load.
  • In special cases, you can perform hash fine-tuning. In most scenarios, the number of super users/merchants is limited, so you can use a separate hashing strategy for super users to ensure their data is distributed as evenly as possible.
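A sketch of the composite-sharding-key idea (the field names and shard count are assumptions for illustration): hashing merchant ID together with order type lets one hot merchant's rows land on several tables instead of one.

```python
import hashlib

NUM_SHARDS = 8

def shard_for(merchant_id, order_type):
    """Composite sharding key: merchant ID + order type together decide
    the shard, so a single large merchant's traffic is spread out."""
    key = f"{merchant_id}:{order_type}".encode()
    # A stable hash (unlike Python's per-process randomized hash()) keeps
    # routing deterministic across processes and restarts.
    h = int.from_bytes(hashlib.md5(key).digest()[:8], "big")
    return h % NUM_SHARDS

# One hot merchant's orders of different types typically fan out
# across several shards rather than piling onto a single table.
shards = {shard_for("big-merchant", t) for t in ("pay", "refund", "sub", "gift")}
```

The trade-off: queries that filter only by merchant ID now have to touch every shard the merchant's order types map to, so the composite key should match the dominant query pattern.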

4.3. Data Duplication – Master-Slave Replication

The core purpose of redundant data storage is to ensure high availability: on one hand, it avoids single points of failure, and on the other hand, it enables read-write separation to improve efficiency.

MySQL master-slave replication replicates the binlog from the master server to the slave server, and replays it on the slave to achieve consistent data between master and slaves.

  1. Read-write separation

MySQL read-write separation means all data modification operations are performed on the master, then synchronized to slaves via binlog, and data reads are handled by slaves. Read-write separation improves read efficiency, but since there is a data synchronization process, there is a data consistency problem (whether the data on the master and slaves are consistent). MySQL provides three levels of consistency solutions:

  • Asynchronous Replication (lowest consistency): After the master finishes executing the transaction and writes it to the binlog, the transaction is considered successful regardless of whether the slaves have applied the update.
  • Semi-synchronous Replication (medium consistency): After the master finishes executing the transaction, it does not return a response immediately. It waits until at least one slave has received the binlog and successfully written it to its relay log before considering the transaction successful.
  • Fully Synchronous Replication (highest consistency): After the master finishes executing the transaction, the transaction is only considered successful after all slaves have also executed it.

Higher data consistency comes with lower execution efficiency, and the default strategy for MySQL is asynchronous replication.

In addition, due to synchronization delay, when using the asynchronous replication strategy, updates on the master cannot be synchronized to the slaves immediately (especially when the master and slaves are physically far apart, for example cross-border), therefore you should avoid performing a read immediately after inserting data.

4.4. Disaster Recovery Solution

As mentioned earlier, redundant data storage also avoids the single point of failure problem. When the master fails, an election algorithm is used to select one of the slaves to promote to the new master:

  1. Failure Detection:
    • Each node continuously monitors the heartbeat of other nodes. If a node does not receive a heartbeat from the master for longer than the preset timeout, it will consider the master to be failed and start the election process.
  2. Initiate Election:
    • Any node that considers the master to be failed can initiate an election. This node will send a request to other nodes, declaring that it wants to become the new master.
  3. Candidate Qualification Confirmation:
    • Nodes that receive the election request will evaluate their own status (degree of data synchronization, priority, etc.), and may respond to support the request or put forward their own candidacy. This step ensures that only the most up-to-date node participates in the election, avoiding data inconsistency.
  4. Voting and Consensus:
    • All nodes participating in the election reach consensus through a series of message exchanges. Each node votes for the candidate it considers appropriate (usually based on version number: the node with the most up-to-date data is the most suitable to be the new master), and collects votes. Once a node gets the support of most nodes (i.e., more than half), it is elected as the new master.
  5. Master Confirmation and Status Update:
    • After a node gets enough votes to become the new master, it broadcasts this message to all other nodes. Other nodes confirm and adjust their own status to become slaves, and start receiving updates from the new master.
  6. Data Consistency Check:
    • Before the new master starts accepting write operations, some consistency checks may be performed to ensure data consistency.
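The voting step above can be sketched as follows (a toy model, not a real election protocol such as Raft): each reachable node votes for a candidate whose data is at least as fresh as its own, and a winner needs support from more than half of the whole cluster.

```python
def elect(reachable, cluster_size):
    """reachable: node_id -> replicated log position, for nodes that answer.
    A node only votes for a candidate whose data is at least as fresh as
    its own; the winner needs votes from more than half the WHOLE cluster."""
    results = {
        cand: sum(1 for pos in reachable.values() if reachable[cand] >= pos)
        for cand in reachable
    }
    winner, votes = max(results.items(), key=lambda kv: kv[1])
    return winner if votes > cluster_size // 2 else None

# The node with the freshest data (highest log position) collects all votes.
elect({"s1": 120, "s2": 118, "s3": 119}, cluster_size=3)   # "s1" wins
# With only one of three nodes reachable there is no quorum: no new master.
elect({"s1": 120}, cluster_size=3)                          # None
```

Requiring a strict majority of the full cluster, not just of the reachable nodes, is what prevents a partitioned minority from electing a second master (split brain).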

This is a separate discussion topic extracted from the original post at https://juejin.cn/post/7368864814065958962