Optimizing Vindex Performance for High QPS

At sustained high queries-per-second, a single mis-shaped vindex turns routine reads into cluster-wide scatters, and the resulting fan-out — not the underlying MySQL — becomes the thing that saturates connection pools and blows out P99 latency.

Where This Fits

Vindex performance is a read-path routing problem that lives one level below the mechanics of configuring lookup vindexes for cross-shard joins: that page shows how to define an owned consistent_lookup mapping so a secondary-column query resolves to one shard; this page is about keeping that resolution cheap when the same query runs a few hundred thousand times a second. The routing decisions themselves are made by the VTGate routing architecture, and the vindex, keyspace, and routing-rule abstractions they operate on are grounded in VSchema configuration and routing rule management. If any of those three concepts are unfamiliar, read them first — everything below assumes you already have a working vindex and are trying to make it fast under load.

The optimisation target is blunt: keep the overwhelming majority of queries on a single shard. A hash lookup that routes to one shard costs one round trip; the same query forced to scatter costs one round trip per shard plus a merge step in VTGate, so its cost scales with shard count at the same time as traffic is growing. High-QPS tuning is almost entirely the discipline of preventing accidental scatter and making the unavoidable lookups hit cache.

Why Routing Cost Explodes Under Load

VTGate evaluates every statement against the VSchema to decide whether it can compute a target shard from the query’s predicates. A functional vindex such as hash or xxhash computes the keyspace ID arithmetically — an O(1), allocation-free step. A WHERE clause that matches no vindex column, or an ambiguous VSchema that offers the planner no deterministic path, leaves VTGate no choice but to dispatch the query to every shard and merge the results.

The trap is that a scatter is not an error — it returns correct rows, just expensively. At low QPS nobody notices. Under load the arithmetic turns hostile: at 200k QPS across 16 shards, a 1% scatter rate is 2,000 queries/sec each opening a connection on all 16 shards, i.e. 32,000 shard-level requests/sec generated by traffic the VSchema could have routed to one shard. That is how a schema-level oversight surfaces as connection-pool exhaustion and tail-latency cliffs rather than as a query that looks slow in isolation.
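The same arithmetic can be reused for any fleet size. The sketch below is a simplified model, not anything Vitess provides: the function and parameter names are illustrative, and it assumes every scattered query touches every shard while every routed query touches exactly one.

```python
def scatter_amplification(total_qps: float, shards: int, scatter_rate: float) -> dict:
    """Estimate the shard-level load created by queries that scatter.

    Simplified model: a scattered query hits all shards, a routed query hits one.
    """
    scattered_qps = total_qps * scatter_rate
    routed_qps = total_qps - scattered_qps
    scatter_requests = scattered_qps * shards
    return {
        "scattered_qps": scattered_qps,
        "scatter_shard_requests": scatter_requests,
        "routed_shard_requests": routed_qps,
        # Fraction of all shard-level requests caused by scattered traffic.
        "scatter_share_of_shard_load": scatter_requests
        / (scatter_requests + routed_qps),
    }


if __name__ == "__main__":
    print(scatter_amplification(total_qps=200_000, shards=16, scatter_rate=0.01))
```

With the figures from the paragraph above, the 1% of statements that scatter account for roughly 14% of all shard-level requests (32,000 out of 230,000).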

Keep Routing Deterministic

The cheapest optimisation is refusing to let ambiguity into the VSchema in the first place. Three rules do most of the work.

Match the vindex type to the column’s data type. hash is for 64-bit integer keys; xxhash hashes arbitrary bytes, so it also suits UUIDs and other fixed-width binary keys. Both give O(1) routing. unicode_loose_md5 exists for case-insensitive string keys; it is not a drop-in for hash on an integer or UUID column: using it there pays a string-hashing CPU cost on every route for no benefit. Pick the type that fits the real column, not the one that happens to compile.

Enforce uniqueness where the data is unique. A _unique lookup returns exactly one keyspace ID and routes to exactly one shard. A non-unique lookup on a column that is actually 1:1 stores redundant mapping rows and forces the planner to treat a single-row read as a potential multi-shard read. Declare consistent_lookup_unique whenever the column is unique per row.
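The two rules so far lend themselves to a mechanical check. The sketch below walks a keyspace VSchema held as a dict in the usual JSON layout (vindexes, tables, column_vindexes). The column_types and unique_columns inputs are hypothetical: Vitess does not supply them, so they would have to come from your own schema metadata.

```python
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint"}


def lint_vindexes(vschema: dict, column_types: dict, unique_columns: set) -> list:
    """Return warnings for vindex choices that contradict the column's shape.

    column_types   -- hypothetical map of "table.column" -> SQL type name
    unique_columns -- hypothetical set of "table.column" known to be unique
    """
    warnings = []
    vindexes = vschema.get("vindexes", {})
    for table, spec in vschema.get("tables", {}).items():
        for cv in spec.get("column_vindexes", []):
            if "column" not in cv:
                continue  # multi-column entries are out of scope for this sketch
            col = f"{table}.{cv['column']}"
            vtype = vindexes.get(cv["name"], {}).get("type", "")
            if vtype == "unicode_loose_md5" and column_types.get(col) in NUMERIC_TYPES:
                warnings.append(f"{col}: unicode_loose_md5 on a numeric column")
            is_lookup = "lookup" in vtype
            if is_lookup and not vtype.endswith("_unique") and col in unique_columns:
                warnings.append(f"{col}: unique column uses non-unique {vtype}")
    return warnings


if __name__ == "__main__":
    vschema = {
        "sharded": True,
        "vindexes": {
            "md5": {"type": "unicode_loose_md5"},
            "email_lookup": {"type": "consistent_lookup"},
        },
        "tables": {
            "customer": {
                "column_vindexes": [
                    {"column": "customer_id", "name": "md5"},
                    {"column": "email", "name": "email_lookup"},
                ]
            }
        },
    }
    types = {"customer.customer_id": "bigint", "customer.email": "varchar"}
    for warning in lint_vindexes(vschema, types, {"customer.email"}):
        print(warning)
```

Run against a VSchema file in CI, a non-empty result is a reason to fail the build before the definition reaches a VTGate.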

Prove the plan before it ships. Every routing change should be gated on VEXPLAIN showing a single-shard operator, run through an async VSchema validation workflow so an ambiguous definition fails CI instead of scattering in production:

VEXPLAIN PLAN SELECT * FROM customer WHERE email = 'a@example.com';

A healthy result is an OperatorType of Route with an EqualUnique variant naming your vindex. Scatter here means the predicate matched no usable vindex and the query will fan out under load.
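A CI gate needs that check in machine-readable form. The sketch below assumes the plan has already been fetched as JSON text and that nested operators appear under an Inputs key; the sample plans are trimmed, illustrative shapes rather than captured output.

```python
import json


def scatter_routes(plan_json: str) -> list:
    """Return every Route node in a plan tree whose variant is Scatter."""

    def walk(node: dict):
        if node.get("OperatorType") == "Route" and node.get("Variant") == "Scatter":
            yield node
        for child in node.get("Inputs", []):
            yield from walk(child)

    return list(walk(json.loads(plan_json)))


if __name__ == "__main__":
    # Illustrative plan shapes, trimmed to the fields the check reads.
    healthy = '{"OperatorType": "Route", "Variant": "EqualUnique", "Vindex": "customer_email_lookup"}'
    fan_out = '{"OperatorType": "Limit", "Inputs": [{"OperatorType": "Route", "Variant": "Scatter"}]}'
    assert scatter_routes(healthy) == []
    if scatter_routes(fan_out):
        print("plan contains a scatter route; fail the build")
```

Walking the whole tree matters because a scatter can sit underneath a sort, limit, or aggregation operator rather than at the root of the plan.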

Make the Lookup Hop Cheap

An owned lookup vindex adds a first hop — SELECT keyspace_id FROM <lookup_table> WHERE <from_col> = ? — before the targeted read. Under high QPS that hop must be as close to free as a functional vindex, which comes down to placement and caching.

Keep the lookup hop itself single-shard. The single most damaging misconfiguration is a mapping table that VTGate cannot route into by the lookup column, for example one sharded on a key other than its from column: the routing hop itself becomes a scatter, so you pay a scatter to avoid a scatter. Shard the mapping table on its from column, or keep it in an unsharded keyspace, so every lookup resolves to exactly one shard (the full VSchema for this is in the lookup vindex configuration page).

Let the buffer pool absorb the reads. The mapping table is a narrow, read-heavy, hot index. Size each shard’s InnoDB buffer pool so the entire lookup table and its primary index stay resident — a mapping read that touches disk under load is a latency spike waiting to happen. This is a MySQL-level tuning knob, applied per tablet, not a Vitess flag.
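One way to turn that sizing advice into a check is to compare the table's reported size with the configured buffer pool on each tablet's MySQL. This is a minimal sketch: the two SQL strings read standard MySQL metadata, while the function name and the max_share budget are assumptions to adjust for whatever else must stay resident.

```python
# Queries to collect the inputs on a tablet's MySQL. The sizes reported by
# information_schema are estimates for InnoDB tables.
TABLE_SIZE_SQL = (
    "SELECT data_length, index_length FROM information_schema.tables "
    "WHERE table_schema = %s AND table_name = %s"
)
BUFFER_POOL_SQL = "SELECT @@innodb_buffer_pool_size"


def lookup_fits_in_buffer_pool(
    data_length: int,
    index_length: int,
    buffer_pool_size: int,
    max_share: float = 0.25,
) -> bool:
    """True if the lookup table plus its indexes fit in their share of the pool.

    max_share is a hypothetical budget: the fraction of the buffer pool you are
    willing to dedicate to keeping the mapping table resident.
    """
    if buffer_pool_size <= 0:
        raise ValueError("buffer_pool_size must be positive")
    return data_length + index_length <= buffer_pool_size * max_share


if __name__ == "__main__":
    gib = 2**30
    # 3 GiB of mapping data and index against a 16 GiB pool with a 25% budget.
    print(lookup_fits_in_buffer_pool(2 * gib, 1 * gib, 16 * gib))
```

A False result on any shard means the mapping table is likely to be evicted under pressure, which is when lookup reads start touching disk.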

Cache the parsed plan, not the row. VTGate caches the parsed and planned form of a query so repeat statements skip re-planning; it does not cache the row-level mapping result. Keep queries parameterised (bind variables, not inlined literals) so they share one cached plan entry. VTGate's literal normalisation (--normalize_queries, enabled by default) rewrites constants into bind variables before the cache lookup; if it is disabled, inlined literals defeat the cache and drive VTGate CPU up linearly with QPS.

Avoid composite lookups on the hot path. A multi-column lookup vindex compounds mapping-table width and, for consistent_lookup, widens the transactional write that every base-row mutation must perform. Reserve composite lookups for cases where business logic genuinely requires them; on a high-QPS path a single-column _unique lookup is dramatically cheaper.

Bound Scatter at the Router

Some scatter is unavoidable: an analytics query, a lookup miss during backfill, an intentionally unrouted DELETE. The job is to cap its blast radius so one adversarial query cannot take VTGate down. The VTGate flags belong on every VTGate in the fleet, and the result-size cap, which is a vttablet flag, belongs on every tablet; a mixed fleet gives you inconsistent protection.

# --query_timeout takes an integer number of milliseconds
vtgate \
  --query_timeout 30000 \
  --max_memory_rows 100000 \
  # ...other flags

vttablet \
  --queryserver-config-max-result-size 100000 \
  # ...other flags

Flag	Component	Type	Default	Recommended (production)
`--query_timeout`	`VTGate`	int (ms)	`0` (unbounded)	`30000` (30 s): cap a lingering scatter before it drains connection pools
`--max_memory_rows`	`VTGate`	int	`300000`	`100000`: bound the rows buffered in `VTGate` heap for scatter aggregation
`--queryserver-config-max-result-size`	`vttablet`	int	`10000`	size to your largest legitimate result so a runaway scatter aborts, not the process
`--transaction_mode`	`VTGate`	enum	`MULTI`	`TWOPC` only for cross-shard writes that need an atomic commit; `consistent_lookup` is designed to stay consistent without it
`--grpc_max_message_size`	`VTGate`	int	`16777216`	raise to your largest aggregated result set so large responses are not rejected as oversized messages

Set --max_memory_rows conservatively: it is the guardrail that turns “one bad scatter query” into a single failed statement instead of an out-of-memory VTGate. Align --query_timeout with max_connections on the underlying MySQL and with the pod resource limits from your high-availability tablet configuration so a scatter storm cannot cascade into a connection-exhaustion outage during shard rebalancing.
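Little's law (in-flight work equals arrival rate times time in system) gives a rough way to relate the timeout to a connection budget. The sketch below is a worst-case model under stated assumptions: every scattered query reaches every shard, every one of them runs until the timeout, and tablet-side connection pooling and queueing are ignored. The function names and the scatter_budget parameter are illustrative.

```python
def worst_case_inflight_per_shard(scatter_qps: float, query_timeout_s: float) -> float:
    """Little's law: in-flight queries = arrival rate x time in system.

    Each scattered query reaches every shard, so the per-shard arrival rate of
    scatter traffic equals the fleet-wide scatter rate.
    """
    return scatter_qps * query_timeout_s


def max_safe_timeout_s(
    scatter_qps: float, connections_per_shard: int, scatter_budget: float = 0.5
) -> float:
    """Largest timeout that keeps worst-case scatter within its connection budget.

    scatter_budget is a hypothetical fraction of each shard's connections that
    scatter traffic is allowed to occupy.
    """
    if scatter_qps <= 0:
        raise ValueError("scatter_qps must be positive")
    return connections_per_shard * scatter_budget / scatter_qps


if __name__ == "__main__":
    print(worst_case_inflight_per_shard(scatter_qps=2_000, query_timeout_s=30))
    print(max_safe_timeout_s(scatter_qps=2_000, connections_per_shard=1_000))
```

At 2,000 scattered queries per second, a 30-second timeout permits up to 60,000 in-flight scatter queries per shard in the worst case, far beyond a typical max_connections. Under the same model, a 1,000-connection shard with half its connections budgeted for scatter tolerates a worst-case scatter duration of 0.25 seconds. The point of the exercise is that the timeout is only one of the two levers; the scatter rate is the other.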

Coordinate Tuning Changes with Live Traffic

A routing or schema change applied carelessly at high QPS is its own outage. Propagation of a new VSchema is eventually consistent — each VTGate reloads asynchronously — so treat an apply as the start of a rollout and drive it from an idempotent script that polls for convergence rather than fire-and-forget. Schema modifications behind the vindex must move through Online DDL orchestration so cutovers do not collide with the routing change; the windowing pattern for that is in coordinating multi-shard schema migrations. The rule that protects QPS is to decouple schema application from routing-rule activation: land the schema everywhere, verify it, and only then flip the routing so no window exists where half the fleet scatters.
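The convergence poll can be sketched as follows. How a gate's currently served VSchema is read is deliberately left to a hypothetical fetch callable supplied by the caller; the function names, the fingerprinting scheme, and the timing defaults are assumptions rather than a Vitess API.

```python
import hashlib
import json
import time
from typing import Callable, Iterable


def fingerprint(vschema: dict) -> str:
    """Stable hash of a VSchema dict, independent of key order."""
    canonical = json.dumps(vschema, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def wait_for_convergence(
    gates: Iterable[str],
    fetch: Callable[[str], dict],  # hypothetical: returns the VSchema a gate serves
    expected: dict,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every gate serves the expected VSchema, or the deadline passes."""
    gates = list(gates)
    want = fingerprint(expected)
    deadline = clock() + timeout_s
    while True:
        lagging = [g for g in gates if fingerprint(fetch(g)) != want]
        if not lagging:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Because the function only reads state and compares it with the desired VSchema, re-running it after a partial rollout is safe. Routing-rule activation would then be gated on it returning True.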

Edge Cases and Gotchas

Literal-inlined queries can defeat the plan cache. With normalisation disabled (--normalize_queries=false), two queries that differ only by a WHERE constant become two cache entries. Under load this floods the cache and pushes VTGate CPU up with QPS. Leave normalisation on and parameterise at the client.
Backfill exposed early causes selective scatter. A newly declared owned lookup only maps rows written after it exists; activating the read path before backfill completes makes older keys scatter while new keys route. Bounded --query_timeout keeps that residual scatter from exhausting pools while the backfill catches up.
Lookup drift silently misroutes. Writes that bypass Vitess leave an unowned mapping stale, so reads route to the wrong shard and return “not found” for rows that exist. Prefer owned consistent_lookup and reconcile periodically.
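A hot consistent_lookup throttles writes. Every base-row mutation that touches the lookup column also writes the mapping row, usually on a different shard, so contended keys show up as row-lock waits on the lookup table. consistent_lookup is designed to keep the mapping consistent without two-phase commit, so TWOPC lock waits on this path point at --transaction_mode rather than at the vindex; see handling cross-shard transactions in Vitess. If the column is append-mostly, a non-transactional owned lookup may be enough.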
ORDER BY/LIMIT on a scatter buffers everything. Aggregating and sorting a scattered result set holds rows in VTGate heap up to --max_memory_rows; a scatter that also sorts is the classic heap-pressure query.
Wrong vindex type wastes CPU quietly. unicode_loose_md5 on a numeric column routes correctly but burns string-hash CPU on every request — invisible at low QPS, measurable at high QPS.
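The periodic reconciliation mentioned under lookup drift reduces to a comparison of two maps. The sketch below covers a unique lookup and assumes both sides have already been exported as dicts from lookup value to keyspace ID; how those exports are produced, and the function name, are assumptions.

```python
def find_lookup_drift(base: dict, mapping: dict) -> dict:
    """Compare where rows actually live with what the lookup table claims.

    base    -- lookup-column value -> keyspace ID of the shard holding the row
    mapping -- lookup-column value -> keyspace ID recorded in the lookup table
    """
    return {
        # Base rows with no mapping entry: reads on these values cannot route.
        "missing": sorted(k for k in base if k not in mapping),
        # Mapping entries whose base row no longer exists.
        "orphaned": sorted(k for k in mapping if k not in base),
        # Mapping entries pointing at the wrong keyspace ID: reads misroute.
        "stale": sorted(k for k in base if k in mapping and base[k] != mapping[k]),
    }


if __name__ == "__main__":
    base = {"a@example.com": "16", "b@example.com": "a0", "c@example.com": "4c"}
    mapping = {"a@example.com": "16", "b@example.com": "ff", "d@example.com": "22"}
    print(find_lookup_drift(base, mapping))
```

Stale entries are the dangerous class: they are the ones that make reads return "not found" for rows that exist.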

Verification

Confirm the tuning holds where it matters: the ratio of single-shard to scattered queries, watched continuously, not sampled once. VTGate exports per-query-plan execution counts; a rising scatter share for a query that should route single-shard is the earliest signal that a vindex regressed or a mapping is drifting. Platform teams should never eyeball this — poll it and alert. The idempotent pattern below reads VTGate’s exported metrics and fails when the single-shard hit-rate for lookup-routed queries falls below target:

import json
import urllib.request


def single_shard_hit_rate(vtgate_host: str, min_rate: float = 0.99) -> float:
    """Fail if too many queries are scattering instead of routing single-shard."""
    url = f"http://{vtgate_host}:15001/debug/vars"
    with urllib.request.urlopen(url, timeout=5) as resp:
        stats = json.load(resp)

    # QueriesProcessed counts each statement once, keyed by plan type.
    # QueriesRouted counts shard-level queries, so a scatter adds one per shard
    # there and it cannot be used as the single-shard numerator.
    # Assumption: scatter plans are exported under the "Scatter" key; confirm
    # against the keys your Vitess version actually reports.
    processed = stats.get("QueriesProcessed", {})
    total = sum(processed.values())
    scattered = processed.get("Scatter", 0)
    rate = 1.0 - scattered / total if total else 1.0

    if rate < min_rate:
        raise SystemExit(
            f"single-shard hit-rate {rate:.3%} below target {min_rate:.1%}; "
            f"check for lookup drift or an ambiguous VSchema route")
    return rate

Wire this into the same observability that tracks vindex hit rates, scatter frequency, and lookup-table read latency, and alert when the single-shard hit-rate for lookup-routed queries drops or scatter frequency climbs. Combined with deterministic vindex types, a colocated and cached mapping table, and bounded scatter at the router, that hit-rate is the one number that tells you the routing layer is holding up under sustained high QPS.
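Exported counters are cumulative since the process started, so a continuous watch has to compare two snapshots rather than read one. The sketch below takes two snapshots of a counter keyed by plan type and returns the scatter share for the window between them. The "Scatter" key and the function name are assumptions to check against what your VTGate actually exports.

```python
def windowed_scatter_share(prev: dict, curr: dict, scatter_key: str = "Scatter") -> float:
    """Share of statements in the window between two snapshots that scattered.

    prev, curr -- snapshots of a cumulative counter keyed by plan type
    """
    deltas = {key: curr.get(key, 0) - prev.get(key, 0) for key in curr}
    if any(delta < 0 for delta in deltas.values()):
        # A counter went backwards, which suggests the process restarted:
        # treat the current snapshot as the whole window.
        deltas = dict(curr)
    total = sum(deltas.values())
    return deltas.get(scatter_key, 0) / total if total else 0.0


if __name__ == "__main__":
    earlier = {"Scatter": 10, "EqualUnique": 990}
    later = {"Scatter": 20, "EqualUnique": 1980}
    print(f"{windowed_scatter_share(earlier, later):.2%}")
```

Alerting on the windowed value rather than the lifetime ratio keeps a regression from being averaged away by hours of healthy traffic.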

Related

Configuring Lookup Vindexes for Cross-Shard Joins — how to define, back-fill, and own the mapping this page tunes for throughput.
Async VSchema Validation Workflows — gate every routing change on a VEXPLAIN single-shard plan before it reaches production.
Dynamic Routing Rules and Query Rewriting — shift and rewrite traffic on top of a working vindex without a redeploy.
