I've watched this happen enough times that I can predict it.
You launch. The product works. Users sign up. Then, somewhere between 5,000 and 15,000 active users—usually around month 14 or 18 of operating—the system starts behaving differently. Queries that used to take 50ms now take 2 seconds. Your dashboard shows a sawtooth pattern of CPU spikes. On-call pages become a weekly event. Your engineers stop shipping features because they're too busy firefighting.
You haven't hit any particular scaling limit. You've hit several of them at once, which is worse, because each one looks like a different problem.
Failure Mode 1: The Database
Most SaaS applications at this stage are running a single PostgreSQL instance. During development and early growth, a query plan that does a sequential table scan on a table with 200,000 rows is fast enough that nobody notices. At 2M rows with 30 concurrent queries, it's not.
The first thing I do in any scaling engagement: run `pg_stat_statements` and sort by total execution time. Without exception, the top 10 queries tell the story. There will be N+1 queries—queries that fire once per item in a list, so a page showing 50 users triggers 51 database calls instead of one. There will be missing indexes on columns used in WHERE clauses and JOINs. There will be queries that load entire tables when they need one column.
```sql
-- Classic N+1: fetching user count per organisation separately
SELECT * FROM organisations WHERE active = true;
-- Then for each organisation:
SELECT COUNT(*) FROM users WHERE organisation_id = $1;

-- Fix: aggregate in one query
SELECT o.*, COUNT(u.id) AS user_count
FROM organisations o
LEFT JOIN users u ON u.organisation_id = o.id
WHERE o.active = true
GROUP BY o.id;
```
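The missing-index cases are usually foreign keys: PostgreSQL indexes primary keys and unique constraints automatically, but not foreign key columns, so a JOIN or WHERE on one falls back to a sequential scan. A sketch using the illustrative schema above:

```sql
-- Illustrative: index the foreign key the JOIN above relies on.
-- CONCURRENTLY builds the index without blocking writes, but cannot
-- run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_users_organisation_id
  ON users (organisation_id);
```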
Add a read replica. Route all read-path queries—reports, list views, search, analytics—to the replica. Your primary handles only writes. This cuts primary load by 60-80% on most SaaS workloads.
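The routing logic does not need to be clever. A minimal sketch of the decision, with a hypothetical `routeQuery` helper that your driver or ORM layer would call when choosing a connection pool:

```typescript
// Hypothetical routing helper: pure reads go to the replica; writes and
// locking reads (SELECT ... FOR UPDATE/SHARE) stay on the primary.
type Target = 'primary' | 'replica';

function routeQuery(sql: string): Target {
  const firstWord = sql.trimStart().split(/\s+/)[0].toUpperCase();
  // Only plain SELECTs are safe to serve from a replica.
  if (firstWord === 'SELECT' && !/FOR\s+(UPDATE|SHARE)/i.test(sql)) {
    return 'replica';
  }
  return 'primary';
}
```

One caveat: replicas lag. Reads that must immediately reflect a write the same user just made—redirect-after-create, for example—should stay on the primary.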
Add PgBouncer in transaction pooling mode in front of both instances. At 500 concurrent users all hitting your API, you will exhaust PostgreSQL's connection limit without a connection pool.
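A sketch of the relevant `pgbouncer.ini` settings; hosts and pool sizes here are illustrative, tune them to your workload:

```ini
; Sketch of pgbouncer.ini (illustrative values)
[databases]
app    = host=10.0.0.5 port=5432 dbname=app
app_ro = host=10.0.0.6 port=5432 dbname=app

[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 2000    ; client connections PgBouncer will accept
default_pool_size = 20    ; actual Postgres connections per db/user pair
```

Note that transaction pooling means session state—prepared statements, `SET` commands, advisory locks—does not survive across transactions; check your driver's compatibility before switching.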
Failure Mode 2: Synchronous Processing in the Request Path
When your SaaS was small, it was acceptable to send the welcome email synchronously during user registration. The email library returned in 200ms and nobody noticed the latency.
At scale, your email provider has occasional latency spikes. Your request handler blocks waiting for the email to send. Users experience slow registration. Requests queue up until even your load balancer's health checks run close to their timeout. One bad minute at your email provider cascades into degraded service for everyone registering at that moment.
Move everything out of the request path that doesn't need to be there. Email sending. Webhook delivery. PDF generation. Analytics events. Third-party API calls that don't affect the response.
```typescript
// Before: synchronous, fragile
async function registerUser(email: string, password: string) {
  const user = await db.users.create({ email, password });
  await sendWelcomeEmail(user); // blocks on external API
  await analytics.track('user.registered', { userId: user.id }); // blocks
  return user;
}

// After: resilient
async function registerUser(email: string, password: string) {
  const user = await db.users.create({ email, password });
  await queue.publish('user.registered', { userId: user.id });
  return user; // returns immediately
}

// Queue consumer processes async, retries on failure
queue.subscribe('user.registered', async ({ userId }) => {
  const user = await db.users.findById(userId);
  await sendWelcomeEmail(user);
  await analytics.track('user.registered', { userId });
});
```
BullMQ with Redis is the stack I use for most TypeScript SaaS applications. It handles retries, delayed jobs, priority queues, and job concurrency. The operational overhead is low.
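The retry behaviour that matters is exponential backoff with a cap. As an illustration of the policy shape—a hypothetical helper, not BullMQ's API:

```typescript
// Illustrative retry policy: wait time doubles per attempt, capped so a
// long-dead dependency doesn't push retries out indefinitely.
function retryDelay(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}
```

Attempt 1 waits 1s, attempt 4 waits 8s, and attempt 7 onward is pinned at the 60s cap. BullMQ expresses the same idea through its `backoff` job option.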
Failure Mode 3: Multi-Tenancy Isolation
Schema-per-tenant, row-level security (RLS), or separate databases: the right choice depends on your tenant count and growth trajectory.
Shared schema with row-level security (tenant_id on every table, Postgres RLS policies) works well up to mid-scale. It's operationally simple: one schema, one migration to apply, one backup to manage. The risk is a missing WHERE clause in a query exposing one tenant's data to another. RLS policies enforced at the database level mitigate this but require discipline to configure correctly.
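A sketch of what an RLS policy looks like, with illustrative table and setting names; the application sets `app.tenant_id` once per connection or transaction:

```sql
-- Illustrative RLS setup: every query on users is filtered to the
-- tenant the application declared for this transaction.
ALTER TABLE users ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON users
  USING (organisation_id = current_setting('app.tenant_id')::int);

-- Application side, per transaction:
-- SET LOCAL app.tenant_id = '42';
```

One of the discipline traps: table owners bypass RLS by default, so connect as a non-owner role or add `ALTER TABLE users FORCE ROW LEVEL SECURITY`.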
Schema-per-tenant adds isolation at the PostgreSQL schema level. Migrations become more complex—you're running the migration against every tenant schema, which takes time at scale. But the blast radius of a data leak is smaller.
Separate databases per tenant is the right answer for enterprise clients with specific data residency requirements or very high query volumes, but the operational overhead is substantial. Start here only if regulatory requirements demand it.
I consistently see teams choose separate databases for early-stage SaaS because it sounds more professional. Then they have 50 tenants and 50 databases to upgrade, back up, and monitor. Schema-per-tenant is the right default for most SaaS architectures under 10,000 tenants.
Failure Mode 4: No Rate Limiting
At 10K users, someone will write a script that hits your API in a loop. Intentionally or not—data exports, sync jobs, integration bugs—this will happen. Without rate limiting, that script degrades service for everyone.
Rate limiting at the API gateway or load balancer level: 100 requests per minute per authenticated user, 10 per minute for unauthenticated endpoints. Use Redis for the counter with a sliding window. Return 429 with a Retry-After header.
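A minimal in-memory sketch of the sliding-window logic (a hypothetical class; in production the timestamps live in Redis, for example one sorted set per user, so every API node enforces the same counters):

```typescript
// In-memory sliding-window limiter: keep the timestamps of recent hits
// per key, drop the ones older than the window, count what remains.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  // true = allow the request; false = respond 429 with Retry-After.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

The sliding window avoids the burst-at-the-boundary problem of fixed windows: a user can never exceed the limit within any window-sized span, not just within a calendar minute.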
This is a 90-minute implementation that prevents a category of incidents that will otherwise cost you hours of on-call.
The Fix Sequence
When I inherit a SaaS that's struggling at scale, I apply fixes in this order: N+1 queries first (instant wins, no architectural change required), then read replica, then connection pooling, then queue-based processing, then rate limiting. By the time I've done those five things, the system is usually handling the load comfortably.
The architectural changes—multi-tenancy strategy, horizontal scaling, event-driven redesign—come later, when the immediate pressure is relieved and there's space to think.
The goal is not to build for 10M users when you have 10K. The goal is to not let 10K break you while you figure out how to get to 100K.