Performance

Your Database Is the Bottleneck (Here's How to Fix It)

Binadit Engineering · Mar 28, 2026 · 10 min read

Your application is not slow. Your database is. This is true in roughly 90% of the performance incidents we investigate. The application code executes in microseconds. The network round trip takes a few milliseconds. But the database query? That takes 800 milliseconds because it is scanning 4 million rows to return 12 results, and it does this 200 times per page load because the ORM is generating N+1 queries that nobody noticed during development.

The insidious thing about database performance problems is that they are invisible at small scale. With 1,000 rows and 10 concurrent users, everything is fast. With 2 million rows and 500 concurrent users, the same queries that worked fine in staging bring production to its knees. Every slow query multiplies under load — and once the connection pool is exhausted, the entire application stops responding, not just the slow pages.

This article covers why databases become bottlenecks, the mistakes that make things worse, and the specific optimizations that actually fix the problem.

Why Databases Become Bottlenecks

N+1 queries from ORMs. This is the single most common database performance problem in modern applications. An ORM like Eloquent, Django ORM, or ActiveRecord makes it easy to write code like foreach ($orders as $order) { echo $order->customer->name; }. It reads like a simple loop, but it executes one SELECT for the orders list, then one SELECT per order to fetch the customer — turning a single page load into 201 queries for 200 orders. Under load, this is catastrophic.
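The fix is eager loading (Order::with('customer')->get() in Eloquent, select_related or prefetch_related in Django), which collapses the per-row lookups into a single IN query. At the SQL level — table names and id values illustrative — the difference looks like:

```sql
-- N+1: one query for the list, then one per order (200 more)
SELECT * FROM orders WHERE status = 'open';
SELECT * FROM customers WHERE id = 17;   -- repeated for every order
-- ... 199 more single-row lookups

-- Eager loading: two queries total, regardless of how many orders match
SELECT * FROM orders WHERE status = 'open';
SELECT * FROM customers WHERE id IN (17, 42, 98);  -- all customer_ids at once
```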

Missing composite indexes. Most developers know to index primary keys and foreign keys. But queries often filter and sort by multiple columns: WHERE status = 'active' AND created_at > '2026-01-01' ORDER BY priority DESC. Without a composite index on (status, created_at, priority), MySQL or PostgreSQL will use a single-column index and then sort the results in a temporary table — or worse, do a full table scan. On a table with 5 million rows, this is the difference between 2ms and 3,000ms.

Default configuration not tuned for workload. MySQL ships with innodb_buffer_pool_size = 128M. If your database is 20GB, only 0.6% of your data fits in memory. Every read hits disk. PostgreSQL defaults to shared_buffers = 128MB and work_mem = 4MB. These defaults are designed to run on a Raspberry Pi, not a production server with 64GB of RAM. Yet we regularly find production databases running with default settings.

No query analysis. If the slow query log is not enabled, you are blind. You do not know which queries are slow, how often they run, or how much total time they consume. Without this data, optimization is guesswork.

Table scans on large tables. A SELECT COUNT(*) FROM orders WHERE YEAR(created_at) = 2025 cannot use an index on created_at because the function wraps the column. MySQL must scan every row, evaluate the function, and count matches. Rewriting as WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01' uses the index and completes in milliseconds instead of seconds.
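Side by side (table name illustrative), the rewrite is purely mechanical — same result set, completely different execution plan:

```sql
-- Full scan: the function hides created_at from the optimizer
SELECT COUNT(*) FROM orders WHERE YEAR(created_at) = 2025;

-- Sargable rewrite: a plain range the index on created_at can satisfy
SELECT COUNT(*) FROM orders
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';
```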

No read replicas. A single database server handles both writes (INSERT, UPDATE, DELETE) and reads (SELECT). Under heavy read load, the read queries compete with write queries for CPU, memory, and I/O. For read-heavy workloads — which describes most web applications — routing read queries to one or more replicas is a straightforward way to double or triple capacity. This is a critical step when you need to scale your web application beyond a single-server architecture.

Connection exhaustion. Every database connection consumes memory — roughly 5-10MB per connection in MySQL with default settings. An application server running 50 PHP-FPM workers, each holding a database connection, uses 50 connections. Add three application servers and you are at 150 connections. Add a spike in traffic and workers start queuing — but they hold their connections while waiting, so the pool fills up. New requests get SQLSTATE[HY000] [1040] Too many connections and the application crashes.
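You can check how close you are to the ceiling before it bites. In MySQL:

```sql
SHOW VARIABLES LIKE 'max_connections';      -- the configured ceiling
SHOW STATUS LIKE 'Threads_connected';       -- connections open right now
SHOW STATUS LIKE 'Max_used_connections';    -- high-water mark since last restart
```

If Max_used_connections is near max_connections, you are one traffic spike away from the 1040 error.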

Common Mistakes

Adding more RAM instead of fixing queries. A 3-second query on a 16GB server will still be a 2.8-second query on a 64GB server. More RAM helps with buffer pool hit rates, but it does not fix a query that scans 4 million rows because of a missing index. We have seen teams spend $2,000/month on a larger database instance when a $0 index would have solved the problem.

No EXPLAIN analysis. The EXPLAIN statement (MySQL) or EXPLAIN ANALYZE (PostgreSQL) shows exactly how the database executes a query: which indexes it uses, how many rows it examines, whether it creates temporary tables or filesorts. Running EXPLAIN on your slow queries is the single most valuable diagnostic step, yet most developers have never used it.

-- MySQL: identify the problem
EXPLAIN SELECT * FROM orders 
WHERE customer_id = 4821 
AND status = 'shipped' 
ORDER BY created_at DESC 
LIMIT 20;

-- Look for: type=ALL (full table scan), rows (high number),
-- Extra containing "Using filesort" or "Using temporary"

Indexing everything. The opposite extreme is also harmful. Every index speeds up reads but slows down writes, because the database must update every relevant index on INSERT, UPDATE, and DELETE. A table with 15 indexes will have noticeably slower write performance. Index strategically based on your actual query patterns — not defensively on every column.

Not using connection pooling. Without a connection pooler, each application process opens its own connection to the database, uses it for a few milliseconds, then holds it idle until the next query. Connection poolers like ProxySQL (MySQL) or PgBouncer (PostgreSQL) maintain a pool of connections and multiplex application requests across them. A system that needed 300 direct connections can often run on 30 pooled connections.

Running analytics on the production database. A reporting query that aggregates 6 months of order data will hold a long-running transaction open, consume CPU and I/O, and compete with real-time traffic. Run analytics on a read replica, or better yet, replicate to a dedicated analytics database. Production should serve production traffic only.

What Actually Works

Enable and analyze the slow query log. For MySQL:

# my.cnf
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.5
log_queries_not_using_indexes = 1

For PostgreSQL, use pg_stat_statements:

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all

-- After a restart, enable the extension, then query the stats:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

Use pt-query-digest (Percona Toolkit) to aggregate the MySQL slow query log into a ranked list of the most expensive queries by total execution time: pt-query-digest /var/log/mysql/slow.log. Fix the top 5 and you will often resolve 80% of the load.

Build a proper indexing strategy. Analyze your slow queries, look at the WHERE, JOIN, and ORDER BY clauses, and create composite indexes that cover the most common patterns. The column order matters: put equality conditions first, range conditions last.

-- Bad: two single-column indexes
CREATE INDEX idx_status ON orders(status);
CREATE INDEX idx_created ON orders(created_at);

-- Good: one composite index covering the common query pattern
CREATE INDEX idx_status_created ON orders(status, created_at);

-- For: WHERE status = 'active' ORDER BY created_at DESC
-- This composite index covers both the filter and the sort

Tune buffer pool and memory settings. For MySQL with InnoDB:

# my.cnf - for a server with 64GB RAM dedicated to MySQL
innodb_buffer_pool_size = 48G        # 75% of RAM
innodb_buffer_pool_instances = 16     # reduce contention
innodb_log_file_size = 2G             # larger for write-heavy
innodb_flush_log_at_trx_commit = 2    # trade durability for speed
innodb_io_capacity = 2000             # for SSD storage
innodb_read_io_threads = 16
innodb_write_io_threads = 16
max_connections = 500
thread_cache_size = 64

For PostgreSQL:

# postgresql.conf - for 64GB RAM
shared_buffers = 16GB                 # 25% of RAM
effective_cache_size = 48GB           # 75% of RAM
work_mem = 256MB                      # per-sort/hash operation
maintenance_work_mem = 2GB            # for VACUUM, CREATE INDEX
random_page_cost = 1.1                # for SSD storage
effective_io_concurrency = 200        # for SSD
max_connections = 200
wal_buffers = 64MB

Implement connection pooling. For MySQL, ProxySQL sits between the application and the database:

# /etc/proxysql.cnf - ProxySQL's config file uses libconfig syntax, not YAML
mysql_servers =
(
    { address="10.0.1.10", port=3306, hostgroup=0, max_connections=100 },  # primary
    { address="10.0.1.11", port=3306, hostgroup=1, max_connections=100 }   # replica
)

mysql_query_rules =
(
    # SELECT ... FOR UPDATE takes locks and must run on the primary
    { rule_id=1, active=1, match_digest="^SELECT.*FOR UPDATE", destination_hostgroup=0, apply=1 },
    # all other SELECTs are routed to the replica
    { rule_id=2, active=1, match_digest="^SELECT", destination_hostgroup=1, apply=1 }
)
# unmatched queries fall through to the default hostgroup (0, the primary)

For PostgreSQL, PgBouncer in transaction-mode pooling typically reduces the required connection count by 80-90%.
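A minimal pgbouncer.ini sketch for transaction-mode pooling — hostnames and pool sizes are illustrative, not recommendations:

```ini
[databases]
app = host=10.0.1.10 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction       ; return the server connection at transaction end
max_client_conn = 500         ; clients PgBouncer will accept
default_pool_size = 30        ; real connections opened to PostgreSQL
```

The application connects to port 6432 instead of 5432; nothing else changes.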

Read replicas for read-heavy workloads. Most web applications are 90% reads. Setting up one or two read replicas and routing SELECT queries to them effectively doubles or triples your read capacity without changing application code (if you use ProxySQL or a framework-level read/write split).
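Laravel, for example, supports a read/write split directly in the connection config. A sketch with illustrative hosts:

```php
<?php
// config/database.php (Laravel) - SELECTs go to 'read' hosts, writes to 'write'
return [
    'mysql' => [
        'driver'   => 'mysql',
        'read'     => ['host' => ['10.0.1.11', '10.0.1.12']],  // replicas
        'write'    => ['host' => ['10.0.1.10']],               // primary
        'sticky'   => true,  // reads after a write in the same request hit the primary
        'database' => 'app',
        'username' => 'app',
        'password' => env('DB_PASSWORD'),
    ],
];
```

The sticky option matters: without it, a user can write a record and immediately fail to read it back because the replica has not caught up.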

Query result caching with Redis. For queries that are expensive but do not change frequently — top products, category counts, dashboard aggregations — cache the results in Redis with an appropriate TTL. This complements database optimization; it does not replace it. A slow query cached is still a slow query when the cache misses.
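The cache-aside pattern behind this is simple. A minimal Python sketch, using a dict with expiry timestamps as a stand-in for Redis (in production the get/store calls would be redis-py's get and setex); fetch_top_products is a hypothetical expensive query:

```python
import time

# Stand-in for Redis: key -> (expiry timestamp, value)
_cache: dict = {}

def cached(key, ttl, compute):
    """Return the cached value for key, recomputing on a miss or after expiry."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]                     # cache hit: skip the database entirely
    value = compute()                     # cache miss: the expensive query runs here
    _cache[key] = (now + ttl, value)      # store with an expiry timestamp
    return value

def fetch_top_products():
    # Stand-in for: SELECT ... FROM products ORDER BY sales DESC LIMIT 10
    return ["widget", "gadget"]

top = cached("top_products", ttl=300, compute=fetch_top_products)
```

Within the TTL, repeated calls never touch the database; after it expires, the next call recomputes and re-caches.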

Real-World Scenario: The Missing Composite Index

A SaaS platform with approximately 3 million rows in their main events table was experiencing intermittent timeouts during business hours. Application monitoring showed p99 latency spiking to 3,200ms between 9 AM and 5 PM, with the database server at 98% CPU.

The slow query log revealed the culprit:

SELECT * FROM events 
WHERE account_id = ? 
AND event_type = 'pageview' 
AND created_at >= '2026-03-01' 
ORDER BY created_at DESC 
LIMIT 50;

# Time: 3.1s  Rows_examined: 2,847,391  Rows_sent: 50

The table had an index on account_id and a separate index on created_at, but no composite index. MySQL chose the account_id index, which narrowed results to about 50,000 rows for this account, then scanned all of them to filter by event_type and sort by created_at. During peak hours, this query ran approximately 400 times per minute across all accounts.

The fix was straightforward:

CREATE INDEX idx_account_type_created 
ON events(account_id, event_type, created_at);

This single composite index allowed MySQL to seek directly to the matching account_id + event_type combination and walk the created_at index in descending order, returning 50 rows without scanning or sorting.

In addition, we implemented ProxySQL for connection pooling (replaced by PgBouncer when they later migrated to PostgreSQL), reducing active connections from 280 to 35 while handling the same traffic.

Results:

  • p99 latency: 3,200ms → 45ms
  • Database CPU during peak: 98% → 22%
  • Rows examined per query: 2,847,391 → 50
  • Active database connections: 280 → 35
  • Application timeouts: ~120/day → 0

This is a pattern we see frequently on platforms that started small and grew into millions of rows. The queries that worked at 10,000 rows become catastrophic at 3 million. If your application has similar symptoms — performance degradation correlated with table growth and concurrent users — it is almost certainly an indexing or query problem. The same principle applies when diagnosing slow WooCommerce stores that have accumulated thousands of products and orders.

The Optimization Sequence

Database optimization should follow a specific order. Each step has diminishing returns, so start with the highest-impact changes:

  1. Enable slow query logging. You cannot fix what you cannot see. Enable the log, wait 24 hours, then analyze.
  2. Fix the top 5 slow queries. Use EXPLAIN to understand why they are slow. Add composite indexes, rewrite function calls on columns, eliminate N+1 patterns. This alone typically resolves 70-80% of load.
  3. Tune memory settings. Set innodb_buffer_pool_size or shared_buffers appropriately for your available RAM. Verify the buffer pool hit rate is above 99% (SHOW ENGINE INNODB STATUS in MySQL).
  4. Implement connection pooling. Install ProxySQL or PgBouncer. Configure connection limits that match your actual concurrency needs, not your worst-case fear.
  5. Add read replicas if needed. If after steps 1-4 the primary is still CPU-bound on reads, route SELECT queries to a replica. Most cloud providers make this a one-click operation.
  6. Partition large tables. Tables above 50 million rows benefit from range partitioning — typically by date. This keeps each partition at a manageable size for indexes and maintenance operations like OPTIMIZE TABLE.

Most applications only need steps 1 through 3. Connection pooling and read replicas are for applications handling hundreds of concurrent connections or thousands of queries per second.
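For step 3's hit-rate check, the ratio can be computed directly from the InnoDB counters, which MySQL 5.7+ exposes in performance_schema:

```sql
-- hit rate = 1 - (reads that went to disk / total read requests); target > 0.99
SELECT 1 - (
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
) AS buffer_pool_hit_rate;
```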

If your queries take more than 100ms, something is wrong. Let us look at your database and identify what is holding your application back.