Resolving gh-ost Lock Contention in Sharded MySQL and Vitess Topologies

A gh-ost migration that sails through the row-copy phase can still hang at the final swap, timing out because it cannot acquire the exclusive metadata lock (MDL) the atomic RENAME needs — and in a sharded keyspace that stall multiplies across every shard running the same change.

Where this fits

This page resolves one specific failure inside Coordinating Multi-Shard Schema Migrations: the cutover-time lock stall on a shard whose copy is otherwise healthy. The broader control surface — the migration state machine, the concurrency budget, and the global cutover barrier — is defined by Online DDL orchestration and migration coordination. Here we stay narrow: why gh-ost blocks on metadata locks, how to find the session that holds the lock, and which flags let a controller clear the stall without abandoning a copy that may have run for hours. Because the migration is submitted through the VTGate routing layer and terminated on each shard’s primary, a single stuck cutover also back-pressures the proxy’s connection pools, which is why the problem is worth solving precisely rather than by blindly raising timeouts.

Why gh-ost stalls at the cutover

For almost its entire runtime gh-ost is polite: it streams the binary log, copies rows in chunks, and applies ongoing changes to a shadow table (_<table>_gho) while writing a changelog (_<table>_ghc). None of that needs an exclusive lock. The contention appears only at the very end. To make the swap atomic, gh-ost’s default cutover takes a write lock on the original table from one connection and then, from a second connection, issues a single RENAME that moves the original out of the way and the shadow into place. That RENAME requires an exclusive MDL on the table, and the write lock that precedes it is just as incompatible with other sessions’ locks, so MySQL grants neither while any other session holds even a shared MDL on the table. Every open transaction that has merely touched the table holds one until it commits or rolls back.

So the stall is rarely caused by gh-ost itself. It is caused by an ordinary application transaction — a long-running report, an idle-in-transaction connection, an uncommitted batch job — that has read or written the table and not yet committed. gh-ost queues behind it in Waiting for table metadata lock, and once --cut-over-lock-timeout-seconds elapses it aborts the cutover attempt and retries, up to --default-retries times, before giving up. In a sharded fleet the same benign query pattern can hold the lock on several shards at once, so a change that cut over cleanly on twenty shards hangs on three, leaving the keyspace split across two schema versions.
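As a rough way to reason about those two flags together, the sketch below computes an upper bound on the time a shard can spend waiting for the lock before gh-ost gives up. It assumes every attempt waits the full timeout and ignores any pause between attempts; the function name is illustrative and not part of gh-ost.

```python
def cutover_wait_budget(lock_timeout_s: int, retries: int) -> int:
    """Upper bound, in seconds, on lock-wait time across all cutover attempts.

    Assumption: each attempt blocks for the full timeout and pauses
    between attempts are not counted.
    """
    if lock_timeout_s < 1 or retries < 1:
        raise ValueError("timeout and retries must both be at least 1")
    return lock_timeout_s * retries


# Defaults quoted above: 3 second timeout, 60 retries.
print(cutover_wait_budget(3, 60))  # 180
```

A controller can compare that budget with how long its blocking transactions typically run to judge whether a shard is likely to find a gap before the retries run out.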

Diagnosing the blocking session

Before touching any flag, identify which session holds the lock. MySQL’s performance_schema.metadata_locks shows every granted and pending MDL; joining it to performance_schema.threads gives you the process list ID you can KILL, plus the SQL text of the offender.

SELECT
  ml.OBJECT_SCHEMA,
  ml.OBJECT_NAME,
  ml.LOCK_TYPE,
  ml.LOCK_STATUS,
  t.PROCESSLIST_ID,
  t.PROCESSLIST_TIME AS held_seconds,
  t.PROCESSLIST_STATE,
  t.PROCESSLIST_INFO AS current_sql
FROM performance_schema.metadata_locks ml
JOIN performance_schema.threads t
  ON ml.OWNER_THREAD_ID = t.THREAD_ID
WHERE ml.OBJECT_TYPE = 'TABLE'
  AND ml.OBJECT_SCHEMA NOT IN ('performance_schema', 'mysql', 'sys')
ORDER BY ml.LOCK_STATUS DESC, held_seconds DESC;

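The row with LOCK_STATUS = 'PENDING' is gh-ost waiting for the swap: LOCK_TYPE is 'EXCLUSIVE' when the RENAME is the statement waiting, and may appear as 'SHARED_NO_READ_WRITE' while gh-ost is still waiting on the table write lock that precedes it. The rows with LOCK_STATUS = 'GRANTED' and a SHARED_* lock on the same OBJECT_NAME are the blockers; the one with the largest held_seconds is almost always the culprit. Note that held_seconds is PROCESSLIST_TIME, the time the session has spent in its current state, so treat it as a proxy for how long the lock has been held rather than an exact figure. A control-plane daemon can run this query against each shard’s primary on a sub-second cadence during an active migration (a small Python poller using any MySQL DB-API driver is enough) and correlate a pending exclusive lock older than a threshold with the migration’s own progress before it turns into a proxy-visible timeout.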
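The classification step of such a poller might look like the sketch below. It assumes the rows arrive as dictionaries keyed by the column names and aliases of the query above (for example from a dict-style cursor); `Blocker`, `find_blocker`, `WAITER_TYPES`, and `min_wait_s` are hypothetical names, and running the query itself is left to whichever DB-API driver is in use.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional

# Lock types treated as "the migration is waiting". EXCLUSIVE follows the
# text above; SHARED_NO_READ_WRITE is an assumption that also covers a
# pending table write lock.
WAITER_TYPES = {"EXCLUSIVE", "SHARED_NO_READ_WRITE"}


@dataclass(frozen=True)
class Blocker:
    processlist_id: int
    held_seconds: int
    lock_type: str


def find_blocker(rows: Iterable[Mapping], table: str,
                 min_wait_s: int = 0) -> Optional[Blocker]:
    """Return the longest-held GRANTED shared lock on `table`, but only if a
    waiter has been pending on that table for at least `min_wait_s`."""
    on_table = [r for r in rows if r["OBJECT_NAME"] == table]
    waiters = [r for r in on_table
               if r["LOCK_STATUS"] == "PENDING"
               and r["LOCK_TYPE"] in WAITER_TYPES]
    if not waiters:
        return None  # nothing is waiting, so nothing is blocking
    if max((w["held_seconds"] or 0) for w in waiters) < min_wait_s:
        return None  # waiting, but not yet long enough to act on
    waiter_ids = {w["PROCESSLIST_ID"] for w in waiters}
    holders = [r for r in on_table
               if r["LOCK_STATUS"] == "GRANTED"
               and r["LOCK_TYPE"].startswith("SHARED")
               and r["PROCESSLIST_ID"] is not None
               and r["PROCESSLIST_ID"] not in waiter_ids]
    if not holders:
        return None
    worst = max(holders, key=lambda r: r["held_seconds"] or 0)
    return Blocker(worst["PROCESSLIST_ID"], worst["held_seconds"] or 0,
                   worst["LOCK_TYPE"])
```

Returning `None` whenever no waiter is present keeps the poller from flagging ordinary long readers on shards where the cutover is not actually stuck.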

gh-ost progress itself is readable two ways: from the changelog table (SELECT * FROM _<table>_ghc ORDER BY id DESC LIMIT 5) and from its interactive command socket, /tmp/gh-ost.<schema>.<table>.sock by default and overridable with --serve-socket-file. Never hard-code that socket path in an orchestrator without confirming the --serve-socket-file value the migration was actually launched with, or the controller will send throttle/unthrottle commands into the void.
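A minimal sketch of talking to that socket from a controller follows. It assumes a Unix host and that the server closes the connection after replying to a single newline-terminated command; `default_socket_path` and `send_command` are hypothetical helper names, and the path passed in should be whatever --serve-socket-file the migration was launched with.

```python
import socket


def default_socket_path(schema: str, table: str) -> str:
    """Default socket location quoted above; only valid when the migration
    was launched without --serve-socket-file."""
    return f"/tmp/gh-ost.{schema}.{table}.sock"


def send_command(socket_path: str, command: str, timeout_s: float = 2.0) -> str:
    """Send one interactive command (for example "status") and return the reply.

    Raises OSError if the socket is missing or unreachable, so a wrong path
    fails loudly instead of silently dropping the command.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.settimeout(timeout_s)
        conn.connect(socket_path)
        conn.sendall(command.encode("utf-8") + b"\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `send_command(default_socket_path("shop", "orders"), "status")` would be the programmatic equivalent of piping `status` into the socket by hand; the schema and table names here are placeholders.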

Clearing and preventing the stall

Once you know the blocker, you have two levers: get the lock granted, or stop gh-ost from asking for it at the wrong moment.

Clear the immediate stall. If a single long transaction is holding the shared MDL, KILL <PROCESSLIST_ID> releases it and gh-ost’s next retry (within --default-retries) completes the swap. This is safe precisely because the cutover is designed to be retried — killing the blocker does not corrupt the copy.

Prevent the stall recurring. The controller should gate when the cutover fires and give it enough headroom to succeed. The flags below are the ones that matter for lock contention specifically:

Flag	Type	Default	Recommended (production, sharded)
`--cut-over-lock-timeout-seconds`	int (s)	`3`	`3`–`6` — long enough to win the lock during a quiet moment, short enough to fail fast and retry
`--default-retries`	int	`60`	`60`–`120` — more attempts to catch a gap between blocking transactions
`--max-lag-millis`	int (ms)	`1500`	`1500` — pause copy when replica lag exceeds this, so cutover starts from a caught-up state
`--throttle-query`	string	`""`	a query returning a single integer: `0` to proceed, greater than `0` to throttle during known heavy windows
`--throttle-control-replicas`	csv	`""`	the shard’s replica set, so lag on any replica throttles the copy
`--throttle-flag-file`	path	`""`	a file the orchestrator creates to pause this migration; flag files are host-local, so a fleet-wide pause means creating one on every gh-ost host
`--panic-flag-file`	path	`""`	a file whose creation aborts this migration immediately
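One way a controller might assemble these flags per shard is sketched below. The flag names come from the table; the function name, its parameters, and the example host names and paths are hypothetical, and the upper bound on the timeout reflects this page's recommendation, not a documented gh-ost limit. The connection, table, and `--alter` arguments a real invocation also needs are omitted.

```python
from typing import List, Sequence


def lock_contention_flags(
    replicas: Sequence[str],
    throttle_flag_file: str,
    panic_flag_file: str,
    lock_timeout_s: int = 3,
    retries: int = 120,
    max_lag_ms: int = 1500,
) -> List[str]:
    """Build the lock-related gh-ost arguments for one shard's migration."""
    # 6 is the top of the range recommended above: short timeout, more retries.
    if not 1 <= lock_timeout_s <= 6:
        raise ValueError("keep the cutover lock timeout between 1 and 6 seconds")
    if retries < 1:
        raise ValueError("retries must be at least 1")
    flags = [
        f"--cut-over-lock-timeout-seconds={lock_timeout_s}",
        f"--default-retries={retries}",
        f"--max-lag-millis={max_lag_ms}",
        f"--throttle-flag-file={throttle_flag_file}",
        f"--panic-flag-file={panic_flag_file}",
    ]
    if replicas:
        # Comma-separated replica list, as the csv type in the table indicates.
        flags.append("--throttle-control-replicas=" + ",".join(replicas))
    return flags


print(lock_contention_flags(
    ["replica-1:3306", "replica-2:3306"],  # placeholder hosts
    "/var/run/ghost/orders.throttle",       # placeholder path
    "/var/run/ghost/orders.panic",          # placeholder path
))
```

Keeping the flag assembly in one function makes it easy to verify that every shard of a keyspace is launched with the same lock settings.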

The orchestration rule that keeps a sharded fleet out of trouble is: serialize the cutover even when you parallelize the copy. Let up to N shards run their row-copy phase concurrently within a concurrency budget, but issue the cutover one shard (or one small batch) at a time. Simultaneous swaps across shards that share a primary host or a replication path pile exclusive-lock requests onto the same lock manager and lengthen every wait. Serializing the swap keeps each cutover’s contention window isolated and short.
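The rule can be sketched with two gates: a semaphore for the copy budget and a lock for the cutover. This is an illustration under stated assumptions, not a description of any existing controller: `copy_rows` and `cut_over` are hypothetical coroutines supplied by the caller, and how a finished copy is held at the gate is left open (with gh-ost, one option is its --postpone-cut-over-flag-file flag, which the orchestrator would release one shard at a time).

```python
import asyncio
from typing import Awaitable, Callable, Iterable

ShardStep = Callable[[str], Awaitable[None]]


async def migrate_fleet(
    shards: Iterable[str],
    copy_rows: ShardStep,   # hypothetical: runs or awaits one shard's row copy
    cut_over: ShardStep,    # hypothetical: triggers one shard's swap
    copy_budget: int = 4,
) -> None:
    """Copy up to `copy_budget` shards at once, but cut over one at a time."""
    copy_slots = asyncio.Semaphore(copy_budget)
    cutover_gate = asyncio.Lock()

    async def one_shard(shard: str) -> None:
        async with copy_slots:      # parallel, bounded by the budget
            await copy_rows(shard)
        async with cutover_gate:    # strictly serialized
            await cut_over(shard)

    await asyncio.gather(*(one_shard(s) for s in shards))
```

Releasing the copy slot before waiting on the cutover gate lets the next shard begin copying while earlier shards queue for their swap. Replacing the lock with a small semaphore would give the "one small batch" variant.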

Two supporting practices reduce the blast radius further. Drive the heavy copy phase into per-region traffic troughs — the timing logic belongs in Scheduling DDL Windows Across Multiple Timezones — so there are simply fewer long transactions competing at swap time. And before cutting over, warm the proxy: issue a lightweight SELECT 1 against the target so VTGate has a live route and the swap does not coincide with a cold connection pool re-establishing itself.

Edge cases and gotchas

Idle-in-transaction connections are invisible in the process list’s SQL. A session showing Sleep with PROCESSLIST_INFO = NULL can still hold a shared MDL from an earlier statement in an uncommitted transaction. Trust metadata_locks, not the current SQL text.
Autocommit-off application frameworks are a common hidden blocker. ORMs that open a transaction per request and keep the connection pooled can hold MDLs far longer than the query that acquired them. Cap connection idle time rather than lengthening gh-ost timeouts.
Raising --cut-over-lock-timeout-seconds too high backfires. While gh-ost’s exclusive request is pending, new statements against the table queue behind it, so a long timeout turns each failed attempt into a longer application stall and worsens pile-up on a busy primary. Prefer a short timeout with more retries.
Orphaned shadow tables after an aborted cutover. A migration that exhausts its retries leaves _<table>_gho and _<table>_ghc behind on the shards it reached. Cleanup must be idempotent — check for and drop these before a retry so residual artifacts don’t collide with the new run.
Killing the wrong session. On a shard with several long readers, kill only the one holding a GRANTED shared MDL on the migrating table. Killing an unrelated long query buys nothing and disrupts application traffic.
A partial cutover splits the keyspace schema. If some shards swap and others time out, in-flight queries fanned through the proxy can observe two column shapes at once. Roll the whole fleet forward or entirely back before declaring the migration terminal — never leave it half-applied.
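For the orphaned-table case, an idempotent cleanup can be as small as generating `DROP TABLE IF EXISTS` statements for the two artifact names given above. The helper names below are hypothetical, and executing the statements on each shard's primary is left to the caller's driver.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def cleanup_statements(schema: str, table: str) -> List[str]:
    """Statements that remove a failed run's shadow and changelog tables.

    IF EXISTS makes them safe to run on shards the migration never reached
    and safe to run more than once.
    """
    artifacts = [f"_{table}_gho", f"_{table}_ghc"]
    return [
        f"DROP TABLE IF EXISTS {quote_ident(schema)}.{quote_ident(name)}"
        for name in artifacts
    ]


for stmt in cleanup_statements("shop", "orders"):  # placeholder names
    print(stmt)
```

Run these only after confirming no gh-ost process is still attached to the shard, since dropping the shadow table under a live migration would discard its copy.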

Verifying the cutover cleared

After clearing the blocker, confirm on the affected shard’s primary that the exclusive wait is gone and the swap actually happened:

-- Should return ZERO rows once the RENAME has completed
SELECT OBJECT_NAME, LOCK_STATUS
FROM performance_schema.metadata_locks
WHERE OBJECT_SCHEMA = '<keyspace>'
  AND OBJECT_NAME = '<table>'
  AND LOCK_STATUS = 'PENDING';

Zero pending rows means no session is still waiting on the table’s MDL. Cross-check that the shadow tables are gone (SHOW TABLES LIKE '\_<table>\_gh%' returns nothing) and, at the keyspace level, that every shard reports the migration complete with an identical target schema via SHOW VITESS_MIGRATIONS LIKE '<migration_uuid>'. Only when no shard is left in running or cutover, and the pending-lock query is empty on each primary, is the coordinated cutover genuinely done.
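The completion rule above reduces to three checks per keyspace, sketched here. The `ShardCheck` fields are hypothetical: the sketch assumes the controller has already collected, per shard, the migration status string, the row count of the pending-lock query, and some fingerprint of the resulting table definition, and that the finished status is reported as `complete`.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ShardCheck:
    migration_status: str     # as reported for this shard's migration
    pending_locks: int        # row count of the pending-lock query
    schema_fingerprint: str   # e.g. a hash of the table definition (assumption)


def cutover_done(shards: Mapping[str, ShardCheck]) -> bool:
    """True only when every shard is complete, no primary still has a pending
    MDL on the table, and all shards agree on the target schema."""
    if not shards:
        return False  # no evidence is not the same as success
    checks = list(shards.values())
    all_complete = all(c.migration_status == "complete" for c in checks)
    no_waiters = all(c.pending_locks == 0 for c in checks)
    one_schema = len({c.schema_fingerprint for c in checks}) == 1
    return all_complete and no_waiters and one_schema
```

Treating an empty result as not done guards against declaring success when the status collection itself failed.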

Frequently asked

Does raising --cut-over-lock-timeout-seconds fix the stall? Rarely on its own. It only helps if the blocking transactions are shorter than the new timeout; against a persistently held MDL it just makes each failure slower. Find and clear the blocker instead.

Is native Vitess Online DDL immune to this? No — any engine that ends in an atomic RENAME needs the same exclusive MDL and can be blocked by the same long transaction. The trade-offs between engines are covered in Vitess Native Online DDL vs External Tools; the lock discipline here applies to both.

How does the controller know a shard is stuck rather than just slow? By modeling each shard as a state machine and watching for a pending exclusive MDL that outlives the cutover threshold — the per-shard state model is defined in Tracking Migration Progress and State Machines.

Related

Coordinating Multi-Shard Schema Migrations — the fan-out, concurrency budget, and global cutover barrier this stall interrupts.
Tracking Migration Progress and State Machines — the per-shard state model a controller uses to detect a stuck cutover and recover idempotently.
Vitess Native Online DDL vs External Tools — choosing the engine that drives each shard’s copy and swap, and where its lock behaviour differs.
