Just yesterday, I encountered a textbook-level database problem. I needed to delete an index on a large table containing 60 million rows of data. I thought a simple ALTER TABLE command would do the trick, but it turned out to be a harrowing journey of troubleshooting and repair lasting several hours.
The whole process involved a series of issues, including DDL deadlocks, fatal flaws in the table engine, table corruption and repair, and abnormal index statistics. I've recorded the entire process in this retrospective, hoping to remind myself in the future and to help anyone else who might encounter similar difficulties.
The protagonist of the story: a table named wa_xinghao, with over 60 million rows of data.
Act 1: Undercurrents Beneath a Calm Surface – Why Did ALTER TABLE Freeze?
It all started with this command:
ALTER TABLE `wa_xinghao` DROP INDEX `product_id-s1`;
After execution, the terminal froze without any response. At the same time, I noticed that all queries related to this table in the application became extremely slow, even timing out. My first reaction was: Locked!
I immediately opened a new database connection and executed SHOW FULL PROCESSLIST; for diagnosis.
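On a busy server the full process list can be noisy, so it helps to filter it down to just this table. A minimal sketch against information_schema (assuming the account has the PROCESS privilege):
-- Show only the sessions currently touching wa_xinghao
SELECT id, user, time, state, info
FROM information_schema.processlist
WHERE info LIKE '%wa_xinghao%'
ORDER BY time DESC;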
Usually, the most common reason for an ALTER to freeze is Waiting for table metadata lock, meaning that some other long transaction or query is holding the table's metadata lock. But this time, the State I saw was:
copy to tmp table
This status sent a chill down my spine. It meant that MySQL was performing the most primitive and time-consuming table reconstruction operation: creating a temporary table with the new structure (without the index) and then copying the 60 million rows of data into it, one by one. During this process, the original table would be locked for a long time. For a data volume of 60 million, this was undoubtedly a disaster.
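As an aside: had it really been a metadata-lock wait, MySQL 5.7 can reveal who holds the lock through performance_schema, provided the mdl instrument is enabled (it is off by default in 5.7). A minimal sketch:
-- Enable metadata-lock instrumentation (disabled by default in MySQL 5.7)
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME = 'wait/lock/metadata/sql/mdl';
-- Then list the sessions holding or waiting for MDL on the table
SELECT object_schema, object_name, lock_type, lock_status, owner_thread_id
FROM performance_schema.metadata_locks
WHERE object_name = 'wa_xinghao';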
Act 2: Digging Deeper – Why COPY Instead of INPLACE?
I'm using MySQL 5.7, which has long supported Online DDL (the INPLACE algorithm) for DROP INDEX, so it shouldn't have fallen back to COPY. To verify this, I tried to force the algorithm:
ALTER TABLE `wa_xinghao` DROP INDEX `product_id-s1`, ALGORITHM=INPLACE, LOCK=NONE;
As a result, MySQL ruthlessly gave me an error: ERROR 1845 (0A000): ALGORITHM=INPLACE is not supported for this operation. Try ALGORITHM=COPY.
Now the problem was clear: for some reason, MySQL believed that an INPLACE operation could not be performed on this table. I immediately executed SHOW CREATE TABLE wa_xinghao; to view the table structure, and the truth came out:
) ENGINE=MyISAM ...
The root cause was found: the MyISAM storage engine. I had assumed the table was InnoDB, since at first glance the database seemed to be full of InnoDB tables!
MyISAM is an old, non-transactional engine with many fatal flaws, one of which is very weak support for DDL. It has almost no concept of "online" operations, and most ALTER statements lead to table locking and a full table rebuild.
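To avoid being ambushed like this again, it is worth auditing how many MyISAM tables are still hiding in the instance. A quick sketch using information_schema:
-- List remaining MyISAM tables outside the system schemas, largest first
SELECT table_schema, table_name, engine, table_rows
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY table_rows DESC;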
Act 3: One Wave After Another – Table Corruption and Lengthy Repair
After locating the problem, I couldn't tolerate the long table lock, so I made a decision that now seems very dangerous: I KILLed the ALTER process that was stuck in copy to tmp table.
After a brief period of calm, a new nightmare arrived. When I tried to do anything with this table (even a SELECT), I received a new error:
ERROR 1194 (HY000): Table 'wa_xinghao' is marked as crashed and should be repaired
Yes, due to forcibly interrupting the modification of the underlying files, the MyISAM table "lived up to expectations" and became corrupted. This is a typical manifestation of the non-crash-safe nature of the MyISAM engine.
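Before repairing, a quick integrity check confirms what MySQL thinks is wrong (a sketch; the QUICK option avoids scanning every row):
-- Confirm the corruption before attempting the repair
CHECK TABLE `wa_xinghao` QUICK;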
The only way out was to repair it:
REPAIR TABLE `wa_xinghao`;
Then I entered a new round of lengthy waiting. Through SHOW PROCESSLIST; I saw the status change to:
Repair by sorting
This status indicates that MySQL is rebuilding the index file (.MYI) by sorting. For 60 million rows of data and multiple indexes, this is an extremely CPU- and I/O-intensive process. All I could do was wait patiently and pray that the server wouldn't be overwhelmed. This time, I absolutely didn't dare to KILL it again.
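In hindsight, the Repair by sorting phase can be eased somewhat by giving the session a larger MyISAM sort buffer before running the repair. A sketch, with a purely illustrative value that should be sized to the available memory:
-- More sort memory for this session only (value is illustrative)
SET SESSION myisam_sort_buffer_size = 256 * 1024 * 1024;
REPAIR TABLE `wa_xinghao`;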
Act 4: A "Little Bonus" After the Storm – Strange Index Cardinality
After hours of torment, the table was finally repaired successfully!
While checking the table afterwards, I inadvertently discovered an interesting phenomenon: the Cardinality of the sn and sn2 indexes was actually larger than that of the primary key (PRIMARY)! This is logically impossible, because the primary key is strictly unique, so its cardinality should equal the total number of rows.
Reasoning:
- Cardinality is an estimate: For performance reasons, MySQL estimates cardinality by random sampling, which inherently has errors.
- Outdated statistics: the main reason is that after the chaotic crash-and-repair sequence, the table's statistics had not been refreshed, leaving them stale and inaccurate.
The solution is simple: manually force an update of the statistics:
ANALYZE TABLE `wa_xinghao`;
After execution, the cardinality returned to normal.
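Verifying it is just a matter of re-reading the index metadata (a sketch):
-- After ANALYZE, no secondary index's Cardinality should exceed PRIMARY's (≈ total rows)
SHOW INDEX FROM `wa_xinghao`;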
Final Chapter: Rebirth from the Ashes – Embracing InnoDB
This harrowing experience made me determined to completely abandon MyISAM. My ultimate goal was to convert the table engine to InnoDB while deleting the redundant indexes.
Directly executing ALTER TABLE ... ENGINE=InnoDB would also trigger a locking table rebuild, which is undesirable. There are two best practices:
Option 1: The King of Online Operations – pt-online-schema-change
This tool from Percona Toolkit is the industry standard for handling DDL on large tables. By creating a "ghost table" and using triggers to synchronize incremental data, it can complete table structure changes without locking the table and with minimal impact on the business.
# One command completes both major tasks at once: dropping the index and converting the engine
# (single quotes keep the shell from interpreting the backticks)
pt-online-schema-change \
  --alter 'DROP INDEX `product_id-s1`, ENGINE=InnoDB' \
  h=your_host,D=your_database,t=wa_xinghao,u=your_user,p=your_password \
  --execute
Option 2: Manual Offline Operation (Requires a Downtime Window)
If the business allows a short maintenance downtime, you can adopt the approach of "migrate the data first, then create the indexes", which is extremely fast (a consolidated sketch follows the list):
- Create a new InnoDB table wa_xinghao_new without secondary indexes.
- Quickly export the data to a file:
SELECT ... INTO OUTFILE ... FROM wa_xinghao;
- Quickly load the data into the new table:
LOAD DATA INFILE ... INTO TABLE wa_xinghao_new;
- Create the required indexes online on the new table:
ALTER TABLE wa_xinghao_new ADD INDEX ...;
- Within the downtime window, atomically rename:
RENAME TABLE wa_xinghao TO wa_xinghao_old, wa_xinghao_new TO wa_xinghao;
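Putting the steps together, a minimal sketch. The index names sn and sn2 (taken from Act 4), the columns they cover, and the export path are all assumptions to adapt to the real table definition, and secure_file_priv must permit the chosen directory:
-- 1. Clone the structure, switch the empty copy to InnoDB, and drop the secondary
--    indexes so the bulk load runs without index maintenance (names are assumptions)
CREATE TABLE wa_xinghao_new LIKE wa_xinghao;
ALTER TABLE wa_xinghao_new ENGINE=InnoDB,
  DROP INDEX `product_id-s1`, DROP INDEX `sn`, DROP INDEX `sn2`;
-- 2. Export the data (path is illustrative and must be allowed by secure_file_priv)
SELECT * INTO OUTFILE '/var/lib/mysql-files/wa_xinghao.txt' FROM wa_xinghao;
-- 3. Bulk-load into the new table
LOAD DATA INFILE '/var/lib/mysql-files/wa_xinghao.txt' INTO TABLE wa_xinghao_new;
-- 4. Re-create only the indexes that are actually needed, online
ALTER TABLE wa_xinghao_new ADD INDEX `sn` (`sn`), ADD INDEX `sn2` (`sn2`),
  ALGORITHM=INPLACE, LOCK=NONE;
-- 5. Inside the downtime window, swap the tables atomically
RENAME TABLE wa_xinghao TO wa_xinghao_old, wa_xinghao_new TO wa_xinghao;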
Reflections
This experience, although bumpy, was very rewarding. I have summarized the following key lessons:
- Engine is fundamental: before doing anything with a large table, check its ENGINE first! Different engines have vastly different behaviors.
- Say goodbye to MyISAM: for any business with write, update, and high-availability requirements, migrate MyISAM to InnoDB immediately. MyISAM's non-crash-safe nature and table-level locks are time bombs buried in the system.
- Respect the KILL command: do not casually KILL a DDL process that is writing to a MyISAM table, otherwise you will most likely end up with a corrupted table.
- Make good use of professional tools: for large-table DDL, pt-online-schema-change or gh-ost are not "optional" but "essential".
- Understand status information: the State in SHOW PROCESSLIST is the golden key to troubleshooting. copy to tmp table and Repair by sorting both tell us what is happening under the hood.