Just yesterday I ran into a textbook database problem: I needed to drop an index on a large table containing 60 million rows. I assumed a simple ALTER TABLE would do the trick, but it turned into a harrowing, hours-long journey of troubleshooting and repair.

The whole episode involved DDL locking, fatal flaws in the table's storage engine, table corruption and repair, and skewed index statistics. I've written it all up in this retrospective, both as a reminder to myself and in the hope that it helps anyone who runs into similar trouble.

The protagonist of the story: A table named wa_xinghao, with over 60 million rows of data.

Act 1: Undercurrents Beneath a Calm Surface – Why Did ALTER TABLE Freeze?

It all started with this command:

sql
ALTER TABLE `wa_xinghao` DROP INDEX `product_id-s1`;

After execution, the terminal froze without any response. At the same time, I noticed that all queries related to this table in the application became extremely slow, even timing out. My first reaction was: Locked!

I immediately opened a new database connection and executed SHOW FULL PROCESSLIST; for diagnosis.

Usually, the most common reason for an ALTER to freeze is Waiting for table metadata lock, meaning that some other long transaction or query is holding the table's metadata lock. But this time, the State I saw was:

copy to tmp table

This status sent a chill down my spine. It meant MySQL was performing the most primitive and time-consuming form of table rebuild: creating a temporary table with the new structure (minus the index) and copying the 60 million rows into it, one by one, while holding a lock on the original table the whole time. At 60 million rows, this was a disaster.
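
On a busy server it helps to filter the process list down to just the statements touching the table in question; a minimal sketch, assuming the PROCESS privilege:

sql
-- Show only threads whose current statement mentions wa_xinghao.
-- TIME is the number of seconds spent in the current state.
SELECT ID, USER, TIME, STATE, LEFT(INFO, 100) AS query_text
FROM information_schema.PROCESSLIST
WHERE INFO LIKE '%wa_xinghao%'
ORDER BY TIME DESC;
-- If the state had been "Waiting for table metadata lock", the MySQL 5.7+
-- sys schema view sys.schema_table_lock_waits shows who is blocking whom.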

Act 2: Digging Deeper – Why COPY Instead of INPLACE?

I'm using MySQL 5.7, which has long supported Online DDL (INPLACE algorithm) for DROP INDEX. It shouldn't have regressed to COPY. To verify this, I tried to force the algorithm:

sql
ALTER TABLE `wa_xinghao` DROP INDEX `product_id-s1`, ALGORITHM=INPLACE, LOCK=NONE;

As a result, MySQL ruthlessly gave me an error: ERROR 1845 (0A000): ALGORITHM=INPLACE is not supported for this operation. Try ALGORITHM=COPY.

Now the problem was clear: For some reason, MySQL believed that an INPLACE operation could not be performed on this table. I immediately executed SHOW CREATE TABLE wa_xinghao; to view the table structure, and the truth came out:

sql
) ENGINE=MyISAM ...

The root cause was found: the MyISAM storage engine. I had simply assumed the table was InnoDB, because at first glance the database seemed to be full of InnoDB tables!

MyISAM is an old, non-transactional engine with many fatal flaws, one of which is very weak support for DDL operations. It has almost no concept of "online" operations, and most ALTER operations will lead to table locking and table reconstruction.
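
Had I taken an engine inventory up front, the surprise would never have happened. A quick sketch against information_schema (the schema name is a placeholder):

sql
-- List every MyISAM table in the schema; TABLE_ROWS is an approximation.
SELECT TABLE_NAME, TABLE_ROWS,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND ENGINE = 'MyISAM'
ORDER BY TABLE_ROWS DESC;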

Act 3: One Wave After Another – Table Corruption and Lengthy Repair

Having located the problem, I couldn't tolerate the long table lock any further, so I made a decision that in hindsight was very dangerous: I KILLed the ALTER thread that was stuck in copy to tmp table.
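
For the record, the fateful step was nothing more exotic than the standard KILL statement (the thread id is purely illustrative, taken from the Id column of SHOW PROCESSLIST):

sql
-- 123456 stands for the Id of the ALTER thread in SHOW PROCESSLIST.
-- KILL QUERY would abort only the statement; plain KILL drops the whole
-- connection. Either way, interrupting a MyISAM rebuild is exactly what
-- caused the corruption described next.
KILL 123456;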

After a brief period of calm, a new nightmare arrived. When I tried to do anything to this table (even a SELECT), I received a new error:

ERROR 1194 (HY000): Table 'wa_xinghao' is marked as crashed and should be repaired

Yes: because the rewrite of the underlying files had been forcibly interrupted, the MyISAM table promptly became corrupted. This is the textbook symptom of MyISAM not being crash-safe.

The only way out was to repair it:

sql
REPAIR TABLE `wa_xinghao`;

Then, I entered a new round of lengthy waiting. Through SHOW PROCESSLIST;, I saw the status change to:

Repair by sorting

This status indicates that MySQL is rebuilding the index file (.MYI) by sorting. For 60 million rows of data and multiple indexes, this is an extremely CPU- and I/O-intensive process. All I could do was wait patiently and pray that the server wouldn't be overwhelmed. This time, I absolutely didn't dare to KILL it again.
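
One knob I only learned about afterwards: Repair by sorting is bounded by myisam_sort_buffer_size, so giving the session that runs the repair a larger sort buffer can shorten the wait. A sketch, with an arbitrary example value:

sql
-- myisam_sort_buffer_size has session scope and is used by REPAIR TABLE
-- when it rebuilds indexes by sorting; 256 MB here is only an example.
SET SESSION myisam_sort_buffer_size = 256 * 1024 * 1024;
REPAIR TABLE `wa_xinghao`;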

Act 4: A "Little Bonus" After the Storm – Strange Index Cardinality

After hours of torment, the table was finally repaired successfully!
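
Before celebrating, it was worth confirming that the repair really held; a standard check:

sql
-- For MyISAM, MEDIUM (the usual default) scans rows, verifies deleted links,
-- and compares a key checksum; EXTENDED is far slower but more thorough.
CHECK TABLE `wa_xinghao` MEDIUM;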

While checking the table afterwards, I noticed an interesting phenomenon:

The Cardinality of the sn and sn2 indexes was larger than the cardinality of the primary key PRIMARY (see the SHOW INDEX sketch below)! This is logically impossible: the primary key is strictly unique, so its cardinality should equal the total number of rows.
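
Index cardinality can be inspected at any time with SHOW INDEX; the Cardinality column is the estimate the optimizer relies on when choosing between indexes:

sql
-- Cardinality is an estimated count of distinct values per index,
-- refreshed by ANALYZE TABLE (or, for MyISAM, by myisamchk -a offline).
SHOW INDEX FROM `wa_xinghao`;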

Reasoning:

  1. Cardinality is an estimate: for performance reasons, MySQL estimates cardinality by random sampling, so the figures are inherently approximate.
  2. Stale statistics: more importantly, after the chaotic crash-and-repair sequence, the table's statistics had not been refreshed, leaving them outdated and misleading.

The solution is simple: manually force an update of the statistics:

sql
ANALYZE TABLE `wa_xinghao`;

After execution, the cardinality returned to normal.

Final Chapter: Rebirth from the Ashes – Embracing InnoDB

This harrowing experience made me determined to abandon MyISAM for good. The end goal: convert the table to InnoDB and drop the redundant indexes in the same pass.

Directly executing ALTER TABLE ... ENGINE=InnoDB will also trigger table reconstruction with locking, which is undesirable. There are two best practices:

Option 1: The King of Online Operations – pt-online-schema-change

This tool from Percona Toolkit is the industry standard for handling DDL on large tables. By creating a "ghost table" and using triggers to synchronize incremental data, it can complete table structure changes without locking the table and with minimal impact on the business.

bash
# One command handles both jobs at once: drop the index and convert the engine.
# Single quotes around --alter keep the shell from treating the backticks
# as command substitution.
pt-online-schema-change \
  --alter 'DROP INDEX `product_id-s1`, ENGINE=InnoDB' \
  h=your_host,D=your_database,t=wa_xinghao,u=your_user,p=your_password \
  --execute
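
Incidentally, pt-online-schema-change also offers a --dry-run mode that creates and alters the shadow table without copying data or installing triggers, which makes for a cheap rehearsal before switching to --execute.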

Option 2: Manual Offline Operation (Requires a Downtime Window)

If the business can tolerate a short maintenance window, you can take the approach of "migrate the data first, then create the indexes", which is extremely fast; a SQL sketch of the whole sequence follows the list.

  1. Create a new InnoDB table wa_xinghao_new without indexes.
  2. Quickly export data to a file: SELECT ... INTO OUTFILE ... FROM wa_xinghao;
  3. Quickly load data into the new table: LOAD DATA INFILE ... INTO TABLE wa_xinghao_new;
  4. Create the required indexes online on the new table: ALTER TABLE wa_xinghao_new ADD INDEX ...;
  5. Within the downtime window, atomically rename: RENAME TABLE wa_xinghao TO wa_xinghao_old, wa_xinghao_new TO wa_xinghao;
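
Below is a heavily simplified SQL sketch of that sequence. Every column, index name, and file path is a placeholder that must be adapted to the real table; SELECT ... INTO OUTFILE also requires the FILE privilege and a secure_file_priv-compatible directory:

sql
-- 1. New, index-free InnoDB table (column list is a placeholder; copy the
--    real definitions from SHOW CREATE TABLE `wa_xinghao`).
CREATE TABLE `wa_xinghao_new` (
  `id` BIGINT UNSIGNED NOT NULL,
  -- ... remaining columns go here ...
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
-- 2. Dump the data to a flat file on the server.
SELECT * INTO OUTFILE '/tmp/wa_xinghao.tsv' FROM `wa_xinghao`;
-- 3. Bulk-load it into the new table.
LOAD DATA INFILE '/tmp/wa_xinghao.tsv' INTO TABLE `wa_xinghao_new`;
-- 4. Add only the indexes that are still needed (name and columns are examples).
ALTER TABLE `wa_xinghao_new` ADD INDEX `idx_sn` (`sn`);
-- 5. Atomic swap inside the downtime window.
RENAME TABLE `wa_xinghao` TO `wa_xinghao_old`, `wa_xinghao_new` TO `wa_xinghao`;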

Reflections

This experience, although bumpy, was very rewarding. I have summarized the following key lessons:

  1. Engine is fundamental: Before doing anything with a large table, look at ENGINE first! Different engines have vastly different behaviors.
  2. Say goodbye to MyISAM: For any business with write, update, and high availability requirements, please immediately migrate MyISAM to InnoDB. MyISAM's non-crash-safe nature and table-level locks are time bombs buried in the system.
  3. Respect the KILL command: Do not easily KILL a DDL process that is writing to a MyISAM table, otherwise you will most likely get a corrupted table.
  4. Make good use of professional tools: For large table DDL, pt-online-schema-change or gh-ost are not "optional", but "essential".
  5. Understand status information: The State in SHOW PROCESSLIST is the golden key to troubleshooting. copy to tmp table and Repair by sorting both tell us what is happening under the hood.