The architecture decisions that work at 100K transactions per day become the bottlenecks at 1M. Here is what needs to change and in what order.

A payment gateway that processes 100,000 transactions per day has different bottlenecks than one that processes 1 million. The architecture that got you to 100K will not get you to 1M without changes. The architecture that gets you to 1M will not get you to 10M without further changes. Understanding which bottlenecks matter at which scale saves you from solving the wrong problems at the wrong time.

The 100K to 1M transition

At 100K transactions per day (~1.2 TPS average, ~5–10 TPS peak), most architectures work. A well-designed monolith or a small number of services, a single primary database with one or two read replicas, and a single region can handle this load comfortably.

The transition to 1M per day (~12 TPS average, ~50–100 TPS peak) is where specific components start to break.
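The TPS figures above follow from simple arithmetic, sketched below so the same estimate can be repeated for other volumes. The 4x to 8x peak-to-average factors are an assumption read off the ranges quoted in this section, not a measured property of any particular gateway.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_tps(transactions_per_day: int) -> float:
    """Average transactions per second if traffic were spread evenly over the day."""
    return transactions_per_day / SECONDS_PER_DAY


def peak_tps_range(transactions_per_day: int,
                   low_factor: float = 4.0,
                   high_factor: float = 8.0) -> tuple[float, float]:
    """Rough peak range. The factors are assumptions; measure your own traffic."""
    avg = average_tps(transactions_per_day)
    return (avg * low_factor, avg * high_factor)


if __name__ == "__main__":
    for volume in (100_000, 1_000_000):
        low, high = peak_tps_range(volume)
        print(f"{volume:>9,}/day: avg {average_tps(volume):.1f} TPS, "
              f"peak roughly {low:.0f} to {high:.0f} TPS")
```

Real traffic is never evenly spread, so the peak figure is the one to size against.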

Database writes are the most common first bottleneck. Every authorization creates at least one database write - the transaction record - and often several more (audit log, balance update, idempotency key). At 100 TPS, that is at least 100 writes per second, and often several hundred once the additional writes are counted, spread across multiple tables. Primary database write throughput becomes a constraint.

Mitigation before you need sharding: identify which writes are in the critical path (must complete before the response is sent) and which are not (audit logs, analytics events). Move non-critical writes to an asynchronous queue. This substantially reduces write pressure on the critical path.
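The split between critical and non-critical writes might look like the following minimal sketch. The class and method names are hypothetical, and in-memory lists stand in for database tables. An in-process queue is used only to keep the example self-contained; it loses queued items if the process dies, so a real gateway would more likely use a durable queue or broker.

```python
import queue
import threading


class AuthorizationWriter:
    """Hypothetical sketch: synchronous critical write, deferred audit write."""

    def __init__(self) -> None:
        self.transactions: list[tuple[str, int]] = []  # stands in for the critical table
        self.audit_log: list[tuple[str, str]] = []     # stands in for the audit table
        self._deferred: queue.Queue = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def record_authorization(self, txn_id: str, amount_minor: int) -> str:
        # Critical path: must complete before the response is sent.
        self.transactions.append((txn_id, amount_minor))
        # Non-critical: enqueue and return without waiting for the write.
        self._deferred.put(("audit", txn_id))
        return txn_id

    def _drain(self) -> None:
        # Background worker that performs the deferred writes in order.
        while True:
            kind, txn_id = self._deferred.get()
            try:
                self.audit_log.append((kind, txn_id))
            finally:
                self._deferred.task_done()

    def flush(self) -> None:
        """Block until every queued write has been applied (useful in tests)."""
        self._deferred.join()
```

The response time now depends only on the transaction-record write; the audit write happens shortly afterwards, off the critical path.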

Connection pool exhaustion is the second common bottleneck. At higher concurrency, application instances exhaust their database connection pools. The typical symptom is latency spikes while database CPU sits at around 30% - the database is not overloaded, but application threads are waiting for connections.

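PgBouncer or an equivalent connection pooler in transaction mode usually removes this bottleneck. Application instances connect to the pooler; the pooler maintains a smaller set of actual database connections and lends one out only for the duration of each transaction. The trade-off is that session-level state (for example SET commands or session advisory locks) cannot be relied on across transactions. This is one of the highest-leverage optimizations available.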
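Why a pool can run dry while the database is mostly idle can be estimated with Little's law: average connections in use equals request rate multiplied by the time each request holds a connection. The sketch below is a back-of-the-envelope calculation; the function name and the hold times in the usage note are illustrative assumptions.

```python
def connections_in_use(requests_per_second: float, hold_time_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x time a connection is held."""
    return requests_per_second * hold_time_seconds


if __name__ == "__main__":
    # Assumed figures: a request lasts 500 ms end to end (including the PSP call),
    # but only 20 ms of that is actual database work.
    held_for_whole_request = connections_in_use(100, 0.5)
    held_for_transaction_only = connections_in_use(100, 0.02)
    print(f"connection held for the whole request: {held_for_whole_request:.0f}")
    print(f"connection held per transaction only:  {held_for_transaction_only:.0f}")
```

With these assumed numbers, holding a connection for the whole request needs about 50 connections at 100 TPS, while holding it only for the database work needs about 2. That gap is what transaction-mode pooling exploits.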

The 1M to 10M transition

At 10M transactions per day (~120 TPS average, ~500–1000 TPS peak), the single-primary database model reaches its limits for write-heavy workloads. The solutions depend on your data model.

Read scaling with replicas: most payment workloads are read-heavy. Authorization status checks, balance queries, and reporting read far more than they write. Routing read queries to replicas reduces primary load significantly. The engineering requirement is ensuring read replicas are used for queries that can tolerate replication lag, and the primary is used for queries that cannot.
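The routing rule described above can be sketched as a small decision function. Everything here is hypothetical: the `Query` type, the `max_staleness_ms` field, and the assumption that the application can obtain a current replica lag figure from its monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    name: str
    is_write: bool = False
    max_staleness_ms: int = 0  # 0 means the query must see the latest committed data


def choose_target(query: Query, replica_lag_ms: int) -> str:
    """Return "primary" or "replica" for a query, given the measured replica lag."""
    if query.is_write or query.max_staleness_ms == 0:
        return "primary"
    if replica_lag_ms <= query.max_staleness_ms:
        return "replica"
    # The replica is too far behind for this query, so fall back to the primary.
    return "primary"
```

A reporting query that tolerates several seconds of staleness goes to a replica; a balance check immediately before an authorization declares zero tolerance and always goes to the primary.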

Write scaling with sharding: for platforms where write volume is the constraint, sharding distributes the write load across multiple database primaries. Shard by customer ID or merchant ID - the identifier that determines which database partition handles a given transaction. The complexity of sharding is in cross-shard operations (a payment from customer A to customer B in different shards) and in ensuring the shard key distribution is even enough to avoid hot shards.
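A minimal sketch of shard selection and a hot-shard check follows, assuming merchant ID as the shard key and a fixed shard count. Hash-modulo placement is used only because it is short to show; it makes later resharding expensive, and directory-based or consistent-hashing schemes are common alternatives. Function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable


def shard_for(merchant_id: str, shard_count: int) -> int:
    """Map a merchant ID to a shard index using a stable hash."""
    # hashlib is used instead of the built-in hash(), which is salted per process
    # for strings and would route the same merchant differently after a restart.
    digest = hashlib.sha256(merchant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def hottest_shard_share(merchant_ids: Iterable[str], shard_count: int) -> float:
    """Fraction of the given transactions that land on the busiest shard."""
    ids = list(merchant_ids)
    if not ids:
        return 0.0
    counts = Counter(shard_for(m, shard_count) for m in ids)
    return max(counts.values()) / len(ids)
```

Feeding `hottest_shard_share` one merchant ID per transaction from a sample of production traffic shows whether a single large merchant would dominate one shard, even when the hash itself distributes merchants evenly.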

Asynchronous settlement: at high volume, the latency budget for synchronous operations becomes very tight. Separate the authorization decision (low latency, synchronous) from the settlement operation (higher latency, can be asynchronous). Authorization succeeds or fails in milliseconds. Settlement processes in the background. This decoupling allows each to be optimised independently.
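The separation of authorization from settlement can be sketched as two entry points sharing a pending list. The `Gateway` class, the flat amount limit standing in for a real authorization decision, and the in-memory deque are all illustrative assumptions rather than a production design.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    AUTHORIZED = "authorized"
    DECLINED = "declined"
    SETTLED = "settled"


@dataclass
class Payment:
    payment_id: str
    amount_minor: int
    status: Status


class Gateway:
    def __init__(self, limit_minor: int) -> None:
        self.limit_minor = limit_minor
        self.payments: dict[str, Payment] = {}
        self.pending_settlement: deque[str] = deque()

    def authorize(self, payment_id: str, amount_minor: int) -> Status:
        """Synchronous path: decide, record, and return immediately."""
        approved = amount_minor <= self.limit_minor
        status = Status.AUTHORIZED if approved else Status.DECLINED
        self.payments[payment_id] = Payment(payment_id, amount_minor, status)
        if approved:
            self.pending_settlement.append(payment_id)  # settle later
        return status

    def settle_batch(self, max_items: int) -> int:
        """Background path: settle up to max_items authorized payments."""
        settled = 0
        while self.pending_settlement and settled < max_items:
            payment_id = self.pending_settlement.popleft()
            self.payments[payment_id].status = Status.SETTLED
            settled += 1
        return settled
```

Because `settle_batch` runs on its own schedule, its batch size and frequency can be tuned without touching the latency of `authorize`.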

PSP integration at scale

At 1M+ transactions per day, PSP API limits become a real constraint. PSPs rate-limit by API key, by merchant, and sometimes by card scheme. Understanding your PSP's limits and how close you are to them is operational work that needs to happen before you hit them.

Mitigation strategies:

Routing diversification: spread transaction volume across multiple PSPs. This reduces exposure to individual PSP rate limits and to individual PSP degradations.
Request batching: where the PSP supports it, batch authorization checks or capture requests. Not all PSPs support this for authorization.
Async webhooks: if your authorization flow triggers downstream webhooks, process them asynchronously with a queue rather than inline with the authorization response.
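Routing diversification with rate limits in mind can be sketched as picking the PSP with the most headroom left in the current window. The PSP names, the per-second limits, and the fixed one-second window are assumptions for illustration; real PSP limits vary in shape and are often not published, and a production router would also weigh cost and approval rates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PspBudget:
    name: str
    limit_per_second: int      # assumed limit; confirm with each PSP
    used_this_second: int = 0

    @property
    def remaining(self) -> int:
        return self.limit_per_second - self.used_this_second


def route(psps: list[PspBudget]) -> Optional[str]:
    """Pick the PSP with the largest share of its limit still unused."""
    candidates = [p for p in psps if p.remaining > 0]
    if not candidates:
        return None  # every PSP is at its limit: queue or shed the request
    chosen = max(candidates, key=lambda p: p.remaining / p.limit_per_second)
    chosen.used_this_second += 1
    return chosen.name


def reset_window(psps: list[PspBudget]) -> None:
    """Call once per second to start a new rate-limit window."""
    for p in psps:
        p.used_this_second = 0
```

Tracking `remaining` per PSP also gives the operational signal the section calls for: how close current traffic is to each limit.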

Load testing as an engineering practice

At scale, the only way to find bottlenecks before users find them is load testing. Load testing must be:

Realistic: use transaction patterns that match production. A load test with a single merchant and a single card scheme will not find the bottlenecks that appear with multi-merchant, multi-scheme traffic.
Sustained: peak-load tests matter, but sustained load at 80% of peak for 30 minutes reveals bottlenecks that one-minute spikes do not.
Instrumented: run load tests with full observability active. The database query that's slow at 100 TPS but acceptable at 10 TPS will show up clearly in traces.
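The sustained-load criterion can be expressed as a per-second target schedule that a load generator then follows. This is a sketch only: the five-minute ramp is an assumption, and the 80% and 30-minute defaults come from the text above.

```python
def load_profile(peak_tps: float,
                 ramp_seconds: int = 300,
                 sustain_seconds: int = 1800,
                 sustain_fraction: float = 0.8) -> list[float]:
    """Target TPS for each second: a linear ramp, then a sustained plateau."""
    target = peak_tps * sustain_fraction
    # Ramp up linearly so the system is not hit with full load from a cold start.
    ramp = [target * (second + 1) / ramp_seconds for second in range(ramp_seconds)]
    plateau = [target] * sustain_seconds
    return ramp + plateau


if __name__ == "__main__":
    profile = load_profile(peak_tps=100)
    print(f"{len(profile)} seconds, plateau at {profile[-1]:.0f} TPS")
```

The schedule says nothing about realism; the mix of merchants and card schemes in each second's requests still has to mirror production.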

If your gateway is approaching a scale threshold and you're not sure which bottlenecks will appear first, a scalability assessment will identify them before your users do.

For MENA-specific scaling considerations - including Ramadan peak traffic patterns, multi-currency wallet complexity, and regional PSP rate limits - see our pillar guide: Scaling Payment Gateways in MENA.

Our Scalability & Performance Engineering service covers the full stack from database write throughput through to PSP routing and multi-region failover.

Scaling Payment Gateways Beyond 1 Million Transactions Per Day
