Securing Multi-Tenant Sharded Databases

In a multi-tenant SaaS platform, a single Vitess keyspace holds the data of thousands of independent customers, and the entire security model rests on one guarantee: no query issued on behalf of tenant A may ever read or write a row that belongs to tenant B. Sharding makes that guarantee harder, not easier — the same logical table is spread across many vttablet primaries, a query can fan out to every shard at once, and the enforcement point moves out of the application and into the routing layer. This page resolves one specific operational challenge: how to compose Vitess’s isolation primitives — tenant-scoped vindexes, VTGate authentication, MySQL-level table ACLs, and mutual TLS between control-plane components — into a defense-in-depth posture where a compromised application credential, a misrouted scatter query, or a partially applied schema change cannot cross a tenant boundary. It sits under Vitess Sharding Architecture & Topology Design, which defines the keyspace, shard, and tablet primitives this page hardens.

Prerequisites

Before hardening a multi-tenant keyspace, confirm the following are in place:

Vitess 15+ (table-ACL enforcement via --queryserver-config-strict-table-acl, static and gRPC auth plugins for VTGate, and per-tablet TLS flags; 17+ recommended for the current gRPC transport security defaults).
A tenant-bearing schema. Every tenant-scoped table must carry an explicit tenant_id column that participates in the primary key, so it can back a deterministic vindex. Retrofitting this column onto an existing table is itself an Online DDL orchestration exercise — plan it as a coordinated migration, not an ad-hoc ALTER.
A chosen partitioning model. You should have already decided how tenants map to shards; the trade-offs between hash, range, and lookup distribution are covered in understanding Vitess keyspace partitioning models. This page assumes tenant_id is the primary vindex column.
Routing-layer familiarity. You need to understand how the stateless VTGate routing layer resolves a query to one shard or scatters it across all of them, because tenant isolation is enforced at exactly that resolution step.
vtctldclient and vtadmin access, plus a Python DB-API driver (PyMySQL or mysqlclient) pointed at VTGate for validating isolation over ordinary SQL.
A secrets store and a PKI (Vault, cert-manager, or an internal CA) able to issue short-lived certificates for VTGate, VTTablet, and the topology server.

How tenant isolation is enforced across the topology

Tenant isolation in Vitess is not a single feature — it is four independent boundaries stacked so that a breach at one layer is still contained by the next. Understanding where each boundary lives is what lets you reason about what a given attacker or bug can actually reach.

The routing boundary is the one sharding introduces. When a query carries a WHERE tenant_id = ? predicate and tenant_id is the primary vindex, VTGate computes the keyspace ID, maps it to exactly one shard, and sends the query only there. The other shards — holding every other tenant’s data — never see the statement. The danger is the absence of that predicate: a query with no vindex-resolvable filter scatters to all shards, and now one tenant’s session is reading across the entire fleet. Isolation therefore depends on every tenant-scoped query resolving to a single shard, which is a property of the VSchema routing contract, not of the SQL text alone.

The authentication boundary lives at VTGate’s front door. VTGate speaks the MySQL wire protocol and can authenticate connecting applications against a static credential file, LDAP, client certificates, or Vault. Each authenticated identity carries a UserData string that Vitess propagates downward as the caller identity; this is the principal that the next boundary authorizes.

The authorization boundary lives at each VTTablet, enforced by table ACLs. A table ACL binds table groups to three role sets — readers, writers, and admins — keyed by the caller identity VTGate forwarded. This is a MySQL-adjacent grant system evaluated inside the tablet, so even a query that reaches a shard is rejected if its principal lacks the role for that table. Table ACLs do not filter by tenant_id — they gate which principals may touch which tables — so they complement, never replace, the routing boundary.

The transport boundary wraps every hop. VTGate-to-VTTablet gRPC and VTTablet-to-topology-server traffic all carry tenant data or control decisions in flight; mutual TLS ensures a rogue pod cannot impersonate a tablet or sniff cross-shard results. Without it, the strongest routing and ACL rules protect data at rest while leaking it on the wire.

The critical design consequence: Vitess does not natively inject a tenant_id predicate for you. There is no built-in row-level security that rewrites SELECT * FROM orders into SELECT * FROM orders WHERE tenant_id = <session tenant>. Isolation is enforced by making tenant_id the vindex and by refusing to expose credentials that can issue unscoped scatter queries — not by silently filtering rows. Treat any query that scatters as a query that sees every tenant.
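To make the scatter risk concrete, the following is a toy model of the routing decision, not Vitess's planner or its real hash vindex. The names `SHARDS`, `keyspace_id`, and `route` are hypothetical, SHA-256 merely stands in for the vindex hash function, and the two-shard layout is an assumption.

```python
import hashlib

# Assumed two-shard layout: each shard owns a half-open range of the
# first keyspace-ID byte.
SHARDS = {"-80": (0x00, 0x80), "80-": (0x80, 0x100)}


def keyspace_id(tenant_id: int) -> bytes:
    """Stand-in for a hash vindex: deterministic bytes derived from tenant_id."""
    return hashlib.sha256(str(tenant_id).encode()).digest()[:8]


def route(tenant_id=None):
    """Return the shards a query would reach in this simplified model."""
    if tenant_id is None:
        # No vindex-resolvable predicate: the query fans out to every shard.
        return sorted(SHARDS)
    first_byte = keyspace_id(tenant_id)[0]
    return [name for name, (lo, hi) in SHARDS.items() if lo <= first_byte < hi]


print(route(42))  # a scoped query reaches exactly one shard
print(route())    # an unscoped query reaches every shard
```

The point of the sketch is the asymmetry: the scoped call can only ever name one shard, while the unscoped call names all of them regardless of which tenant issued it.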

Step-by-step: hardening a multi-tenant keyspace

Each step is independently verifiable — confirm its effect before moving on.

1. Make tenant_id the primary vindex. Bind every tenant-scoped table to a deterministic vindex on tenant_id so a scoped query resolves to one shard. This VSchema is the load-bearing isolation artifact:

{
  "sharded": true,
  "vindexes": {
    "tenant_hash": {
      "type": "hash"
    }
  },
  "tables": {
    "orders": {
      "column_vindexes": [
        { "column": "tenant_id", "name": "tenant_hash" }
      ]
    },
    "invoices": {
      "column_vindexes": [
        { "column": "tenant_id", "name": "tenant_hash" }
      ]
    }
  }
}

Verify: vtexplain or VEXPLAIN PLAN SELECT * FROM orders WHERE tenant_id = 42 shows a single-shard route (for example, a Route operator with the EqualUnique variant), not a Scatter.
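A static check over the VSchema can catch a table that was added without the tenant vindex before any query runs. The sketch below assumes the VSchema JSON has the shape shown in this step, treats the first entry in `column_vindexes` as the primary vindex, and ignores the multi-column `columns` form. The function name and the `audit_log` table are hypothetical.

```python
import json


def tables_missing_tenant_vindex(vschema: dict, column: str = "tenant_id") -> list:
    """Return tables whose primary (first) column vindex is not on `column`."""
    offenders = []
    for name, spec in vschema.get("tables", {}).items():
        vindexes = spec.get("column_vindexes", [])
        # The first entry in column_vindexes is the table's primary vindex.
        if not vindexes or vindexes[0].get("column") != column:
            offenders.append(name)
    return sorted(offenders)


vschema = json.loads("""
{
  "sharded": true,
  "vindexes": {"tenant_hash": {"type": "hash"}},
  "tables": {
    "orders":    {"column_vindexes": [{"column": "tenant_id", "name": "tenant_hash"}]},
    "audit_log": {"column_vindexes": [{"column": "event_id",  "name": "tenant_hash"}]}
  }
}
""")

print(tables_missing_tenant_vindex(vschema))  # ['audit_log']
```

In practice the input could come from the output of vtctldclient GetVSchema, and a non-empty result would block the deploy.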

2. Front VTGate with authenticated, tenant-scoped credentials. Enable the static auth plugin so every connecting application presents a distinct identity that maps to a caller principal. Never share one superuser credential across tenants or services.

vtgate \
  --mysql_auth_server_impl static \
  --mysql_auth_server_static_file /etc/vitess/vtgate_users.json \
  --mysql_server_ssl_cert /etc/vitess/certs/vtgate-server.crt \
  --mysql_server_ssl_key  /etc/vitess/certs/vtgate-server.key

{
  "app_orders_writer": [
    { "UserData": "app_orders_writer", "MysqlNativePassword": "<mysql-native-password-hash>" }
  ],
  "app_orders_reader": [
    { "UserData": "app_orders_reader", "MysqlNativePassword": "<mysql-native-password-hash>" }
  ]
}

Verify: a connection with a wrong password or an unknown user is refused at VTGate before any shard is touched; a connection with valid credentials succeeds and can run SELECT database().
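Storing a hash rather than a plaintext password keeps the credential file less sensitive at rest. The sketch below computes the mysql_native_password format (an asterisk followed by the uppercase hex of SHA1(SHA1(password))) and builds one user entry. It assumes the static auth file accepts that hash in a `MysqlNativePassword` field; the helper names are hypothetical.

```python
import hashlib


def mysql_native_password_hash(password: str) -> str:
    """Return '*' + uppercase hex of SHA1(SHA1(password))."""
    inner = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(inner).hexdigest().upper()


def static_auth_entry(user: str, password: str) -> dict:
    """Build one user entry in the shape of the static auth file (assumed)."""
    return {
        user: [
            {"UserData": user, "MysqlNativePassword": mysql_native_password_hash(password)}
        ]
    }


print(mysql_native_password_hash("password"))
# *2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19
```

A provisioning pipeline would generate the password in the secrets store, write only the hash into the file mounted at VTGate, and hand the plaintext to the application through the same store.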

3. Enforce table ACLs at every tablet. Turn on ACL enforcement so the forwarded principal is authorized per table group. A read-only reporting service must not hold writers on any tenant table.

vttablet \
  --table-acl-config /etc/vitess/table_acl.json \
  --queryserver-config-strict-table-acl \
  --enforce-tableacl-config

{
  "table_groups": [
    {
      "name": "tenant_data",
      "table_names_or_prefixes": ["orders", "invoices"],
      "readers": ["app_orders_reader", "app_orders_writer"],
      "writers": ["app_orders_writer"],
      "admins": ["dba"]
    }
  ]
}

Verify: a DELETE issued by app_orders_reader is rejected at the tablet with an ACL error, while the same statement from app_orders_writer is admitted.
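The authorization decision can be modelled in a few lines, which is handy for unit-testing an ACL file before it ships. This is a simplified sketch, not the tablet's implementation: it matches table names exactly (no prefix handling), checks explicit role membership only, and the verb-to-role mapping in `ROLE_FOR_VERB` is an assumption.

```python
ACL = {
    "table_groups": [{
        "name": "tenant_data",
        "table_names_or_prefixes": ["orders", "invoices"],
        "readers": ["app_orders_reader", "app_orders_writer"],
        "writers": ["app_orders_writer"],
        "admins": ["dba"],
    }]
}

# Assumed mapping from statement verb to the role set that must contain the caller.
ROLE_FOR_VERB = {
    "SELECT": "readers",
    "INSERT": "writers",
    "UPDATE": "writers",
    "DELETE": "writers",
    "ALTER": "admins",
}


def is_allowed(acl: dict, principal: str, verb: str, table: str) -> bool:
    """Simplified check: exact table-name match, explicit role membership only."""
    role = ROLE_FOR_VERB[verb.upper()]
    for group in acl["table_groups"]:
        if table in group["table_names_or_prefixes"]:
            return principal in group.get(role, [])
    return False  # tables outside every group are denied in this model


print(is_allowed(ACL, "app_orders_reader", "DELETE", "orders"))  # False
print(is_allowed(ACL, "app_orders_writer", "DELETE", "orders"))  # True
```

Note that nothing in the check looks at tenant_id, which is why the ACL complements the routing boundary rather than replacing it.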

4. Establish mutual TLS between components. Require certificates on the VTGate-to-VTTablet gRPC channel so only trusted tablets participate in the topology.

# VTTablet: present a server cert and require client certs
vttablet \
  --grpc_cert /etc/vitess/certs/vttablet.crt \
  --grpc_key  /etc/vitess/certs/vttablet.key \
  --grpc_ca   /etc/vitess/certs/ca.crt

# VTGate: present a client cert when dialing tablets
vtgate \
  --tablet_grpc_ca   /etc/vitess/certs/ca.crt \
  --tablet_grpc_cert /etc/vitess/certs/vtgate-client.crt \
  --tablet_grpc_key  /etc/vitess/certs/vtgate-client.key

Verify: a tablet started without a CA-signed certificate fails the gRPC handshake with VTGate, so its health check never succeeds and VTGate routes no queries to it.

5. Deny direct MySQL access to the tablets. The underlying MySQL instances must accept connections only from their co-located VTTablet, forcing all tenant traffic through the authenticated, ACL-checked proxy path. Bind mysqld to the tablet’s local socket or a private address and grant application-facing MySQL users no remote host. This closes the bypass where an attacker with a MySQL credential skips VTGate — and its vindex routing — entirely.

Verify: a direct mysql -h <tablet-host> connection from an application subnet is refused; only the local VTTablet can connect.

6. Bind credential issuance to tenant onboarding. Automate credential rotation and ACL updates as part of the tenant provisioning workflow rather than as manual DBA steps. A Python orchestration script can drive this over vtctldclient and the VTGate SQL interface:

import subprocess

import pymysql

def provision_tenant(tenant_id: int, user: str, password: str, keyspace: str = "commerce") -> None:
    """Confirm a new tenant resolves to a single shard before enabling its credential."""
    # VTGate requires credentials once an auth plugin is enabled.
    conn = pymysql.connect(host="vtgate.internal", port=15306,
                           user=user, password=password, database=keyspace)
    try:
        with conn.cursor() as cur:
            # A scoped probe must not scatter; if it does, isolation is not yet safe.
            cur.execute("VEXPLAIN PLAN SELECT id FROM orders WHERE tenant_id = %s", (tenant_id,))
            plan = "\n".join(str(r) for r in cur.fetchall())
    finally:
        conn.close()
    if "Scatter" in plan:
        raise RuntimeError(f"tenant {tenant_id} would scatter: refusing to issue credential")
    # Only now roll out the scoped credential + ACL entry via your secrets pipeline.
    # The vtctld address below is an assumption; substitute your deployment's endpoint.
    subprocess.run(["vtctldclient", "--server", "vtctld.internal:15999",
                    "GetVSchema", keyspace], check=True)

Verify: the probe raises for any tenant whose query would scatter, blocking credential issuance until routing is corrected.

Configuration reference

Flag / setting	Component	Type	Default	Recommended (production)
`--mysql_auth_server_impl`	VTGate	string	`static`	`static`, `ldap`, `clientcert`, or `vault`; never `none` in multi-tenant
`--mysql_auth_server_static_file`	VTGate	path	—	mount read-only; reload on rotation
`--mysql_server_ssl_cert` / `--mysql_server_ssl_key`	VTGate	path	—	required so client credentials are not sent in clear
`--table-acl-config`	VTTablet	path	—	one ACL file per keyspace, version-controlled
`--queryserver-config-strict-table-acl`	VTTablet	flag	off	on (reject unauthorized principals, do not just log)
`--enforce-tableacl-config`	VTTablet	flag	off	on — fail startup if the ACL file is missing or invalid
`--grpc_cert` / `--grpc_key` / `--grpc_ca`	VTTablet	path	—	set all three for mutual TLS on the serving gRPC port
`--tablet_grpc_cert` / `--tablet_grpc_key` / `--tablet_grpc_ca`	VTGate	path	—	set so VTGate presents a client cert to tablets
`--grpc_auth_mode`	control plane	string	none	enable per-RPC auth on `vtctld`/`vtctldclient` links
`--no_scatter`	VTGate	flag	off	on for tenant-facing gateways — refuse un-vindexed scatter queries outright

The misconfigurations that leak tenants are predictable: setting --mysql_auth_server_impl none turns VTGate into an open door; running table ACLs without strict enforcement, or in dry-run mode, records violations without blocking them; and issuing an application credential that maps to an admins role lets a compromised app reach every tenant’s data in the tables that role covers. The single highest-impact control is --no_scatter on tenant-facing gateways: it converts an accidental unscoped query from a silent cross-tenant read into an immediate, visible error.
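These predictable mistakes lend themselves to a pre-deploy lint over the rendered VTGate flags. The sketch below is one possible shape for such a check: `lint_vtgate_flags` is a hypothetical helper, and it assumes the flags have already been parsed into a dict keyed by flag name.

```python
def lint_vtgate_flags(flags: dict) -> list:
    """Return human-readable findings for a tenant-facing VTGate flag set."""
    findings = []
    if flags.get("--mysql_auth_server_impl") == "none":
        findings.append("authentication disabled: --mysql_auth_server_impl is none")
    if not flags.get("--no_scatter", False):
        findings.append("scatter queries allowed: --no_scatter is off")
    if not (flags.get("--mysql_server_ssl_cert") and flags.get("--mysql_server_ssl_key")):
        findings.append("client TLS not configured: credentials may travel in clear")
    return findings


hardened = {
    "--mysql_auth_server_impl": "static",
    "--no_scatter": True,
    "--mysql_server_ssl_cert": "/etc/vitess/certs/vtgate-server.crt",
    "--mysql_server_ssl_key": "/etc/vitess/certs/vtgate-server.key",
}
print(lint_vtgate_flags(hardened))                              # []
print(lint_vtgate_flags({"--mysql_auth_server_impl": "none"}))  # three findings
```

Failing the deploy on any finding keeps the check consistent with the fail-closed stance taken elsewhere on this page.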

Failure modes specific to multi-tenant isolation

Unscoped scatter leak. Symptom: a tenant-facing endpoint returns rows belonging to other tenants; VTGate metric vtgate_queries_processed shows a spike in Scatter plan types. Root cause: a query reached VTGate without a vindex-resolvable tenant_id predicate — often an ORM generating SELECT ... WHERE email = ? with no tenant column, or a reporting query written against a non-vindex column. Mitigation: enable --no_scatter on the tenant gateway so the query errors instead of scattering; add the missing tenant_id predicate; where a secondary access pattern is legitimate, back it with a lookup vindex that still resolves to a single shard rather than allowing a fan-out.

Credential over-privilege. Symptom: an audit shows a read-only service successfully executing writes, or an app principal appearing in an admins group. Root cause: the table ACL granted a broader role than the workload needs, or a shared credential is reused across services. Mitigation: split credentials per workload and per role; keep the ACL file in version control with review gates; run tablets with --queryserver-config-strict-table-acl so the grant is actually checked, not logged.

Routing drift after a schema change. Symptom: immediately after a migration, some queries that used to route to one shard begin to scatter. Root cause: a column change altered the vindex-eligible key, or the VSchema was updated out of step with the DDL, so tenant_id is no longer resolved for the new table shape. Mitigation: couple every tenant-table DDL with its VSchema update as one coordinated change — the pattern in coordinating multi-shard schema migrations — and re-run the VEXPLAIN single-shard probe as a post-cutover gate.

Transport downgrade. Symptom: VTGate serves queries from a tablet that presented no certificate, or a serving tablet presents a certificate whose CN is unexpected. Root cause: a component was deployed without its TLS flags, so its gRPC channel runs in plaintext. Mitigation: make --grpc_ca mandatory on every tablet and the --tablet_grpc_* flags mandatory on every VTGate so an uncertified peer fails the handshake rather than degrading; alert on any serving tablet whose certificate does not chain to the internal CA.

Bypass via direct MySQL. Symptom: an application connects straight to a MySQL primary and reads across tenants, bypassing VTGate routing and ACLs entirely. Root cause: mysqld accepts remote connections and an app-facing MySQL user exists with a permissive host mask. Mitigation: bind mysqld to localhost/private only, grant application users no external host, and treat any non-VTTablet connection to a primary as an alert. Isolation only holds if VTGate is the only path to the data.

Every recovery path must fail closed: when isolation state is uncertain — a scatter observed, an uncertified tablet seen, an ACL file that failed to load — the safe action is to reject queries and page, not to continue serving and hope the predicate was present.

Verifying tenant isolation

Confirm all four boundaries hold, not just the one you last changed:

-- Routing boundary: a scoped query must resolve to exactly one shard.
VEXPLAIN PLAN SELECT * FROM orders WHERE tenant_id = 42;

-- Isolation probe: an unscoped query must ERROR (with --no_scatter), not return rows.
SELECT count(*) FROM orders;   -- expect: "plan includes scatter" rejection

A clean posture shows four independent green signals. The routing boundary: every tenant-scoped query in VEXPLAIN output routes to a single shard and the unscoped probe is refused. The authentication boundary: connecting with an unknown or wrong-password user is rejected at VTGate before any shard is contacted. The authorization boundary: a writers-only statement issued by a readers principal returns an ACL error from the tablet. The transport boundary: a tablet started without a certificate that chains to the internal CA fails VTGate’s gRPC handshake and receives no queries. For a scripted gate, assert that the single-shard probe contains no Scatter, that the unscoped probe raises, and that a cross-role write is denied; run all three on every deploy that touches the VSchema, the ACL file, or the auth config. When degradation occurs on a shard, confirm that fallback routing for shard outages still preserves tenant scoping rather than widening a query to healthy shards a tenant should never reach.
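The scripted gate can be sketched as a function that takes an injected executor, which keeps it testable without a live cluster. Everything here is illustrative: `run_as(user, sql)` is a hypothetical adapter that executes `sql` through VTGate as `user`, returns the result as text, and raises on any error; `fake_run_as` is an in-memory stand-in; and tenant -1 is assumed to be unused so the write probe is harmless if it is ever admitted.

```python
from typing import Callable


def isolation_gate(run_as: Callable[[str, str], str]) -> list:
    """Run the three probes; return failures (an empty list means the gate passes)."""
    failures = []
    reader = "app_orders_reader"

    plan = run_as(reader, "VEXPLAIN PLAN SELECT * FROM orders WHERE tenant_id = 42")
    if "Scatter" in plan:
        failures.append("scoped query scatters")

    try:
        run_as(reader, "SELECT count(*) FROM orders")
        failures.append("unscoped query was not refused")
    except Exception:
        pass  # expected: refused because the plan includes a scatter

    try:
        # Tenant -1 is assumed not to exist, so an admitted write deletes nothing.
        run_as(reader, "DELETE FROM orders WHERE tenant_id = -1")
        failures.append("reader principal was allowed to write")
    except Exception:
        pass  # expected: rejected by the table ACL

    return failures


def fake_run_as(user: str, sql: str) -> str:
    """In-memory stand-in used only to exercise the gate."""
    if sql.startswith("VEXPLAIN"):
        return '{"OperatorType": "Route", "Variant": "EqualUnique"}'
    raise RuntimeError("rejected")


print(isolation_gate(fake_run_as))  # []
```

A real adapter should distinguish a deliberate refusal from a connection failure, since this sketch treats any exception as the expected rejection.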

Related

Understanding Vitess Keyspace Partitioning Models: how tenants map to shards, the decision that sets the routing boundary.
Designing Horizontal Shard Topologies: replication, failover domains, and resharding while isolation controls stay intact.
VTGate Routing Architecture Deep Dive: how the proxy resolves single-shard versus scatter, the engine behind tenant scoping.
Configuring Lookup Vindexes for Cross-Shard Joins: routing secondary access patterns to one shard instead of a fan-out.
Implementing Fallback Routing for Shard Outages: preserving tenant context when a shard degrades.
