How to scale a SaaS backend without rewriting it
Scaling problems are diagnosis problems. Here's how we think about caching, queues, sharding, and Postgres tuning — measured against real workload, not hype.
Scaling problems are diagnosis problems
Most teams don't have scaling problems. They have measurement problems wearing a scaling costume. Before you change a line of code, you should know — exactly — which queries, endpoints, and workloads are eating your budget.
The first 30% of any scaling engagement is profiling. The remaining 70% is finding the smallest change that buys you 18 months of headroom.
The order we attack bottlenecks
1. Database first
For 8 out of 10 SaaS backends, the bottleneck is Postgres. Specifically: missing indexes on hot queries, N+1 patterns from the ORM, and connection pool exhaustion. Fix these before touching anything else.
Tools that earn their keep: pg_stat_statements, pgBadger, and an honest look at your slow query log.
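Of the three Postgres problems above, the N+1 pattern is the easiest to show concretely. The sketch below uses an in-memory Map as a stand-in for Postgres (the table, names, and query counter are all illustrative); the point is the shape of the access pattern, not the driver — with a real ORM the fix is usually an eager-load or a single `WHERE id = ANY($1)` query.

```typescript
// Hypothetical in-memory "table" standing in for Postgres.
type User = { id: number; name: string };
const usersTable = new Map<number, User>([
  [1, { id: 1, name: "Ada" }],
  [2, { id: 2, name: "Grace" }],
  [3, { id: 3, name: "Edsger" }],
]);

let queryCount = 0; // counts round trips to the "database"

// N+1 shape: one lookup per id — N separate round trips.
function fetchUsersOneByOne(ids: number[]): User[] {
  return ids.map((id) => {
    queryCount++; // each call would be its own SELECT ... WHERE id = $1
    return usersTable.get(id)!;
  });
}

// Batched shape: one round trip, e.g. SELECT ... WHERE id = ANY($1).
function fetchUsersBatched(ids: number[]): User[] {
  queryCount++;
  const wanted = new Set(ids);
  return [...usersTable.values()].filter((u) => wanted.has(u.id));
}
```

Rendering a 50-row list with the first shape costs 50 round trips; with the second it costs one. That difference is invisible in development and dominant in production.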
2. Caching second
A Redis layer in front of expensive read paths often buys you 10x headroom for one week of work. Cache aggressively where data is read 100x more than it's written — dashboards, lists, settings.
What we don't cache: anything where stale data could cause a billing error or compliance issue. The savings aren't worth the support load.
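The read path above is the classic cache-aside pattern. Here is a minimal sketch of it — the Map stands in for Redis (a real setup would use a client like ioredis with `SET ... EX` for the TTL), and all names are illustrative:

```typescript
// Cache-aside sketch: check the cache, fall back to the expensive read,
// populate on the way out. The Map is a stand-in for Redis.
type Entry = { value: string; expiresAt: number };
const cache = new Map<string, Entry>();

let dbLoads = 0; // counts trips to the expensive read path

function loadFromDb(key: string): string {
  dbLoads++;
  return `value-for-${key}`; // stand-in for a costly dashboard/list query
}

function getCached(key: string, ttlMs: number): string {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
  const value = loadFromDb(key); // miss: load, then populate
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

The TTL is the knob that encodes the 100:1 read/write ratio: dashboards can tolerate a minute of staleness; billing data gets no TTL at all because it never enters this layer.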
3. Queues for anything that can wait 200ms
Email, webhooks, image processing, report generation, anything that integrates with a slow third party — push it to a queue. The HTTP request returns in 50ms; the work happens in the background. We typically reach for Inngest, BullMQ, or SQS depending on the team.
4. Sharding only when you've earned it
Sharding is the most expensive change you can make to a backend. Don't do it until you've exhausted vertical scaling, read replicas, and a serious caching layer. When the time comes, shard by tenant — never by something arbitrary like row id.
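Sharding by tenant means a stable function from tenant id to shard, so every row for a tenant lives in one place and cross-shard queries stay rare. A minimal sketch (FNV-1a is an illustrative choice; any stable hash works, but it must never change once data has been placed):

```typescript
// Stable 32-bit FNV-1a hash of a string.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Same tenant -> same shard, every time, on every app server.
function shardFor(tenantId: string, shardCount: number): number {
  return fnv1a(tenantId) % shardCount;
}
```

This is exactly why row id is the wrong key: a tenant's rows would scatter across every shard, and every per-tenant query would become a fan-out. Note also that changing `shardCount` with plain modulo remaps most tenants — growing a sharded fleet needs consistent hashing or a directory table, which is part of why sharding is so expensive.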
5. Edge runtimes for the right shape of work
If your bottleneck is geographic latency to read-heavy endpoints, Cloudflare Workers or Vercel Edge can cut p95 latency by 60% — but only for stateless reads. Don't put writes at the edge unless you understand the consistency model you're signing up for.
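The "stateless reads only" rule can be written down as a routing predicate. This is a hypothetical helper, not any platform's API — a sketch of the decision an edge layer should make before serving from cache:

```typescript
// Decide whether a request is safe to serve from the edge:
// idempotent method, no per-user session, and a path we know is read-only.
// The "/api/account" rule is illustrative — substitute your own write paths.
function servableAtEdge(
  method: string,
  path: string,
  hasSession: boolean
): boolean {
  const readOnly = method === "GET" || method === "HEAD";
  const statelessPath = !path.startsWith("/api/account");
  return readOnly && statelessPath && !hasSession;
}
```

Anything that fails the predicate goes to the origin, where Postgres remains the single source of truth — which is the consistency model you already understand.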
Cost is a scaling dimension
Most "scaling" engagements pay for themselves through cloud cost cuts before the perf wins land. Idle compute, oversized RDS instances, runaway log volume, and unused observability seats add up fast. We talk through this as part of our Cloud & DevOps service.
What we don't do
- Rewrite in Go because someone read a blog post
- Move to microservices to "fix" what is actually a query problem
- Add Kafka before there is a multi-team coordination problem
- Skip the rollback plan
How we'd help
Our Scale your backend engagement is exactly this playbook — profiling first, smallest fix second, observability and SLOs at the end so the team owns the new shape. Tell us what's hurting and we'll come back with a diagnosis and a plan inside a week.
