Multi-tenancy is not a single architectural decision
When developers talk about multi-tenancy, they usually mean one of three things: shared database, separate schemas, or separate databases. Most articles on the subject treat that choice as the central decision. In practice, multi-tenancy is a collection of architectural decisions that compound over time, and the database strategy is only one of them.
We've been building multi-tenant SaaS platforms since 2012. The earliest ones are still in production. Here's what a decade of operating them has taught us about building for scale.
Define "tenant" before anything else
The most common mistake in multi-tenant architecture is assuming everyone shares a definition of "tenant." In a B2B SaaS platform, is a tenant a company? A department within a company? A user with a subscription? The answer shapes every subsequent decision.
We've seen platforms where a tenant is a company, but company employees belong to teams that have their own permission boundaries and data isolation requirements. Is the team a sub-tenant? If so, does the multi-tenancy implementation need to be recursive?
Answer the question explicitly in writing before architecture begins. The answer changes the schema, the authentication system, the billing model, and the access control implementation. Starting without a clear answer means refactoring all four later.
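One way to make the written answer concrete is to encode it in the data model. The sketch below assumes a hypothetical model in which teams are sub-tenants of companies, linked by a parent_id field; the Tenant class, its field names, and the isolation_path helper are illustrative and not taken from any particular platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tenant:
    id: str
    kind: str                        # illustrative values: "company" or "team"
    parent_id: Optional[str] = None  # None marks a top-level tenant


def isolation_path(tenant_id: str, tenants: dict[str, Tenant]) -> list[str]:
    """Return tenant ids from the top-level tenant down to tenant_id."""
    path: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = tenant_id
    while current is not None:
        if current in seen:
            # A cycle would make the isolation boundary undefined.
            raise ValueError(f"cycle in tenant hierarchy at {current!r}")
        seen.add(current)
        path.append(current)
        current = tenants[current].parent_id
    return list(reversed(path))
```

If the answer to "is the team a sub-tenant?" is no, the parent_id field disappears and so does the recursion, which is exactly the kind of consequence worth settling before the schema exists.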
Database strategy: the real tradeoffs
The standard comparison of shared database versus separate schemas versus separate databases focuses on isolation and cost. The less-discussed dimension is operational complexity.
A separate database per tenant offers the strongest isolation and the easiest tenant-level backup and restore. It also means running migrations across hundreds or thousands of databases, managing connection pools that grow with tenant count, and building tooling to operate a fleet of databases rather than a single one. At ten tenants, a database per tenant is manageable. At ten thousand, it's a significant engineering investment.
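A shared database with a tenant_id column on every table is the opposite extreme: operationally simple and cheapest at scale, but it requires extreme care to prevent data leakage. If isolation depends on application queries, a single missing WHERE tenant_id = ? clause is a serious incident. Database-enforced row-level security policies, where the database supports them, narrow that risk by applying the tenant filter even when a query forgets it.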
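One way to reduce the chance of a forgotten tenant filter is to put it in a single place that callers cannot bypass. The following is a minimal sketch using SQLite from the Python standard library; the invoices table, the TenantScopedInvoices class, and make_demo_db are hypothetical names chosen for illustration.

```python
import sqlite3


class TenantScopedInvoices:
    """Data access for a hypothetical `invoices` table, bound to one tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def list_invoices(self) -> list[tuple]:
        # The tenant filter is written here once, not in every caller.
        return self._conn.execute(
            "SELECT id, amount FROM invoices WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        ).fetchall()

    def add_invoice(self, amount: int) -> None:
        self._conn.execute(
            "INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
            (self._tenant_id, amount),
        )


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the demo table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    return conn
```

This only helps if raw connections are not handed out elsewhere in the codebase. A database-level policy, where available, is a stronger backstop than an application-level convention.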
Schema-per-tenant sits in the middle: better isolation than row-level security, lower operational overhead than separate databases. It's our default choice for platforms expecting hundreds to low thousands of tenants, with migration tooling that understands schema namespacing.
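Schema namespacing usually comes down to deriving a schema name from a tenant identifier and selecting it before queries run. The sketch below assumes PostgreSQL-style schemas and a "tenant_" prefix, neither of which is specified above; schema_for and search_path_sql are hypothetical helpers. Because a schema name cannot be passed as a bound parameter, the tenant slug is validated strictly before it is interpolated.

```python
import re

# Assumed convention: lowercase slug, at most 40 characters, so the prefixed
# name stays well under PostgreSQL's 63-byte identifier limit.
_TENANT_SLUG = re.compile(r"[a-z][a-z0-9_]{0,39}")


def schema_for(tenant_slug: str) -> str:
    """Map a tenant slug to its schema name, rejecting anything unexpected."""
    if not _TENANT_SLUG.fullmatch(tenant_slug):
        raise ValueError(f"invalid tenant slug: {tenant_slug!r}")
    return f"tenant_{tenant_slug}"


def search_path_sql(tenant_slug: str) -> str:
    """Build the statement that points a session at one tenant's schema."""
    return f'SET search_path TO "{schema_for(tenant_slug)}"'
```

Migration tooling can reuse the same schema_for function so that application code and migrations never disagree about where a tenant's tables live.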
Tenant context propagation is the hardest part
Whatever database strategy you choose, you need to propagate tenant context through every layer of the application: HTTP requests, queued jobs, scheduled tasks, event handlers, and third-party webhooks. Missing any of these is a data isolation bug.
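The insidious part is that missing context propagation in queued jobs is common and invisible in testing. Test suites typically run jobs synchronously, in the same process that dispatched them, so the tenant context is still present when the job executes. Jobs that lose tenant context only fail in production, processing data for the wrong tenant or no tenant at all.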
We enforce tenant context as a required parameter for every job dispatch. If you can't provide tenant context at dispatch time, the job isn't being called from the right place. This constraint has caught architectural mistakes that would have been production incidents.
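A rough sketch of that constraint, assuming a simple in-process queue: dispatch takes tenant_id as a required keyword argument, and the worker restores the context before the handler runs. The names Job, dispatch, run_next, and current_tenant are hypothetical, and a real queue would serialize the tenant id alongside the payload.

```python
import contextvars
from dataclasses import dataclass
from typing import Callable

_current_tenant = contextvars.ContextVar("current_tenant")


def current_tenant() -> str:
    """Return the active tenant id, failing loudly if none is set."""
    try:
        return _current_tenant.get()
    except LookupError:
        raise RuntimeError("no tenant context set") from None


@dataclass(frozen=True)
class Job:
    tenant_id: str
    handler: Callable[[dict], None]
    payload: dict


def dispatch(queue: list, handler: Callable[[dict], None], payload: dict,
             *, tenant_id: str) -> None:
    # tenant_id has no default, so omitting it is a TypeError at the call site.
    if not tenant_id:
        raise ValueError("tenant_id is required to dispatch a job")
    queue.append(Job(tenant_id, handler, payload))


def run_next(queue: list) -> None:
    """Worker side: restore tenant context, run the handler, then clear it."""
    job = queue.pop(0)
    token = _current_tenant.set(job.tenant_id)
    try:
        job.handler(job.payload)
    finally:
        _current_tenant.reset(token)
```

Because current_tenant raises instead of returning a default, a job that somehow runs without context fails immediately rather than silently touching another tenant's data.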
Performance at tenant scale
Single-tenant applications have straightforward performance characteristics: query performance degrades as data volume grows, and you scale the database. Multi-tenant applications have a more complex problem: a single large tenant can degrade performance for all tenants.
This "noisy neighbour" problem requires architectural responses at multiple levels. At the database level, indexes must be designed with tenant_id as the leading column for any query that filters by tenant. Queries that look fast in development on a single-tenant database can be catastrophically slow in production on a shared database where one tenant has ten times the data volume of the others.
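The effect of a tenant-leading index can be checked with the database's query planner. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for whatever database is actually in use; the orders table, its columns, and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, "
    "created_at TEXT NOT NULL, total INTEGER NOT NULL)"
)

QUERY = (
    "SELECT id, total FROM orders "
    "WHERE tenant_id = ? ORDER BY created_at DESC LIMIT 20"
)

# Without a suitable index the planner has to read rows from every tenant.
plan_before = query_plan(conn, QUERY, ("acme",))

# tenant_id leads, so the planner can seek straight to one tenant's rows;
# created_at follows so the ORDER BY can be served from the same index.
conn.execute(
    "CREATE INDEX idx_orders_tenant_created ON orders (tenant_id, created_at)"
)
plan_after = query_plan(conn, QUERY, ("acme",))
```

Comparing plan_before with plan_after shows whether the planner picked up the index. The exact plan wording differs between SQLite versions and between database engines, so a check like this is best run against the production engine with production-like data volumes.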
At the application level, rate limiting per tenant prevents any single tenant from exhausting shared resources. At the infrastructure level, large tenants may eventually need their own database or compute tier — your architecture should make this migration possible without application changes.
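Per-tenant rate limiting is commonly implemented as a token bucket keyed by tenant. The following is a minimal in-memory sketch, not a production limiter: TenantRateLimiter and its parameters are hypothetical, and a multi-process deployment would need the bucket state in a shared store.

```python
import time
from typing import Callable


class TenantRateLimiter:
    """Token bucket per tenant: `burst` capacity, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_sec
        self._burst = burst
        self._clock = clock  # injectable so tests can control time
        self._buckets: dict = {}  # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (float(self._burst), now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(float(self._burst), tokens + (now - last) * self._rate)
        if tokens >= 1.0:
            self._buckets[tenant_id] = (tokens - 1.0, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False
```

Each tenant draws from its own bucket, so a tenant that exhausts its allowance is throttled without affecting the requests of any other tenant.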
Tenant onboarding and offboarding
Tenant onboarding should be fully automated. Manual steps in the onboarding process don't scale and introduce inconsistencies. Every resource a new tenant needs (database schema, default configuration, seed data, storage buckets) should be provisioned by code that runs as reliably on the hundredth tenant as it did on the first.
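Reliable provisioning generally means each step can be retried safely after a partial failure. The sketch below assumes onboarding is modelled as an ordered list of named steps with a record of which steps have completed per tenant; provision_tenant and the in-memory `completed` set are hypothetical, and a real system would persist that record.

```python
from typing import Callable

ProvisionStep = Callable[[str], None]


def provision_tenant(tenant_id: str,
                     steps: list,
                     completed: set) -> list:
    """Run each (name, step) not yet recorded for this tenant.

    Returns the names of the steps that ran in this call. A step that raises
    stops the run; it and all later steps stay unrecorded, so calling the
    function again resumes from the failed step.
    """
    ran = []
    for name, step in steps:
        key = (tenant_id, name)
        if key in completed:
            continue  # already provisioned on an earlier attempt
        step(tenant_id)
        completed.add(key)
        ran.append(name)
    return ran
```

Recording completion per step is what lets a retry skip the schema that already exists and go straight to the seed data that failed.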
Tenant offboarding deserves equal attention and rarely gets it. What happens when a tenant cancels? Data retention requirements vary by jurisdiction. GDPR gives individuals the right to have their personal data erased on request, subject to exceptions such as legal retention obligations. Some regulated industries require data to be retained for years after contract termination. Your offboarding process must handle both deletion and retention correctly without manual intervention.
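Handling both cases without manual intervention suggests making the retention rule data rather than a runbook. The sketch below is deliberately simplified: RetentionPolicy, purge_date, and offboarding_action are hypothetical, the retention periods in any real system come from legal review rather than code, and nothing here is legal guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    retain_days: int  # 0 means purge as soon as the contract ends


def purge_date(cancelled_on: date, policy: RetentionPolicy) -> date:
    """First day on which the tenant's data may be deleted."""
    return cancelled_on + timedelta(days=policy.retain_days)


def offboarding_action(today: date, cancelled_on: date,
                       policy: RetentionPolicy) -> str:
    """Decide what a scheduled offboarding job should do for this tenant."""
    if today >= purge_date(cancelled_on, policy):
        return "purge"
    return "retain"
```

A scheduled job can evaluate offboarding_action for every cancelled tenant each day, so the same code path serves tenants that must be purged promptly and tenants whose data must be kept for years.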
The upgrade problem
Running database migrations across a large tenant fleet safely is a problem that most teams underestimate until they're doing it. A migration that adds a column to a single-tenant database takes seconds. The same migration across a thousand tenant schemas, deployed during business hours with zero downtime, is a different engineering challenge.
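Part of what makes the fleet case different is that some schemas will fail while others succeed, and the run has to be resumable. The following is a sketch of a fleet runner under those assumptions; migrate_fleet, FleetResult, the `already_applied` record, and the max_failures threshold are all hypothetical names and choices.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class FleetResult:
    applied: list = field(default_factory=list)
    skipped: list = field(default_factory=list)
    failed: dict = field(default_factory=dict)  # schema -> error message


def migrate_fleet(schemas: Iterable[str], version: str,
                  apply: Callable[[str], None],
                  already_applied: set,
                  max_failures: int = 3) -> FleetResult:
    """Apply one migration version to each schema, tolerating a few failures."""
    result = FleetResult()
    for schema in schemas:
        if (schema, version) in already_applied:
            result.skipped.append(schema)  # makes a rerun safe
            continue
        try:
            apply(schema)
        except Exception as exc:
            result.failed[schema] = str(exc)
            if len(result.failed) >= max_failures:
                break  # stop early instead of failing across the whole fleet
            continue
        already_applied.add((schema, version))
        result.applied.append(schema)
    return result
```

Rerunning with the same `already_applied` record skips the schemas that succeeded and retries only the ones that failed, which is the property that makes a thousand-schema rollout tolerable.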
We've moved to a pattern of separating schema changes from code changes with a compatibility window between them. New code must work with both old and new schema. Old code must work with new schema. This allows rolling deployments where the migration runs independently of the application deployment, dramatically reducing the risk of downtime.
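The compatibility window can be illustrated with an additive migration. The sketch below uses SQLite and a hypothetical users table gaining a nullable locale column; old_code_insert, new_code_locale, and expand_migration are stand-ins for an old release, a new release, and the migration, not a description of any specific tooling.

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name is interpolated, so pass trusted constants only.
    return any(row[1] == column
               for row in conn.execute(f"PRAGMA table_info({table})"))


def old_code_insert(conn: sqlite3.Connection, email: str) -> None:
    # Old release: names its columns explicitly, so an added nullable column
    # does not break it.
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def new_code_locale(conn: sqlite3.Connection, email: str) -> str:
    # New release: works before and after the migration has run.
    if not has_column(conn, "users", "locale"):
        return "en"  # assumed default for illustration
    row = conn.execute(
        "SELECT locale FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row and row[0] is not None else "en"


def expand_migration(conn: sqlite3.Connection) -> None:
    # Additive and nullable: safe to run while old code still serves traffic.
    conn.execute("ALTER TABLE users ADD COLUMN locale TEXT")
```

Because neither release depends on the migration having run or not run, the migration can roll through tenant schemas on its own schedule while application instances are replaced independently.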