Debugging Slow Queries
Problem / Context
You’re on-call and a critical endpoint slowed down after a feature
launch. Product dashboards time out and customers see spinners. You need
a repeatable way to find the slowest queries, understand why the planner
chose a bad path, and apply the smallest fix (index, rewrite, fresh
stats) to restore performance.
PostgreSQL ships the tooling for this: query plans, cumulative statement
statistics, and slow-query logging.
Core Concept
- Always start with a plan (EXPLAIN ANALYZE, BUFFERS) to see reality
vs estimates.
- Cheap fixes first: fresh stats, missing index, narrower SELECT,
correct data types.
- Verify each change with before/after metrics and drop redundant indexes
to keep writes fast.
Slow query triage (make it fast)
Goal: Find why a query is slow, fix cheap things first, verify
improvement.
Implementation (step by step SQL)
1. Get the plan
EXPLAIN SELECT * FROM orders WHERE user_id = 123;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 123;
Look for:
- Seq Scan on huge table with selective predicate → missing index (illustrated after this list)
- Nested Loop with large inner row estimate → bad stats / wrong join
order
- Hash Join with many rows removed by a filter afterwards → filter pushdown
opportunity
- Lots of time in sorting / hashing → maybe reduce columns or add
index matching order
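For orientation, an illustrative plan fragment (numbers invented) for the
first symptom, a Seq Scan with a selective predicate:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 123;
-- Seq Scan on orders  (cost=0.00..25840.00 rows=480 width=72)
--                     (actual time=0.41..312.77 rows=4 loops=1)
--   Filter: (user_id = 123)
--   Rows Removed by Filter: 999996
--   Buffers: shared read=13340
-- Execution Time: 312.9 ms
Scanning nearly a million rows to return four is the textbook case for an
index on user_id.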
2. Cheap fix checklist
| Symptom | Likely Fix |
| --- | --- |
| Seq Scan on selective column | CREATE INDEX (col) |
| Function on column in WHERE | Expression index or precomputed column |
| Repeated large sort | Index on ORDER BY columns |
| Low correlation after update | VACUUM (ANALYZE) or just ANALYZE |
| Underestimated rows | Increase statistics target (ALTER TABLE … ALTER COLUMN … SET STATISTICS n) |
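In SQL, those fixes look like this (table, column, and index names are
placeholders):
CREATE INDEX idx_orders_user_id ON orders (user_id);         -- selective column
CREATE INDEX idx_users_email_lower ON users (lower(email));  -- expression index
CREATE INDEX idx_orders_created_at ON orders (created_at);   -- matches ORDER BY
ANALYZE orders;                                              -- refresh stats
ALTER TABLE orders ALTER COLUMN user_id SET STATISTICS 500;
ANALYZE orders; -- a raised statistics target takes effect at the next ANALYZE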
3. pg_stat_statements (top offenders)
Prerequisite: the extension must be preloaded at server start.
- Set shared_preload_libraries and restart the server, then run CREATE
EXTENSION in the database.
Example (docker-compose):
services:
  db:
    image: postgres:17.4
    ports:
      - ${DB_HOST}:${DB_PORT}:5432
    command: >-
      -c shared_preload_libraries=pg_stat_statements
      -c pg_stat_statements.track=all
      -c pg_stat_statements.max=10000
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASS}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./internal/:/tmp/postgres/
volumes:
  postgres_data:
Then enable in your database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
PostgreSQL 14+ (modern columns):
SELECT
toplevel,
queryid,
calls,
total_plan_time AS plan_ms,
total_exec_time AS exec_ms,
ROUND((total_exec_time / NULLIF(calls,0))::numeric, 2) AS mean_exec_ms,
rows,
query
FROM pg_stat_statements
WHERE toplevel IS TRUE
ORDER BY total_exec_time DESC
LIMIT 10;
Note: total_plan_time and total_exec_time exist since PG 13 (the toplevel
column and filter arrived in PG 14; drop them on PG 13). total_time is the
legacy name on PG 12 and earlier. All are already in milliseconds, so there
is no need to multiply by 1000. The mean above is ms per call.
PostgreSQL 12 and earlier (legacy column names):
SELECT
calls,
total_time AS total_ms,
mean_time AS mean_ms,
rows,
query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
Tips
- Reset stats when starting a local investigation:
SELECT pg_stat_statements_reset();
- Filter by database if needed:
WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database()).
- Focus on high total cost first (total_exec_time/total_time), then
high mean per call.
4. Lock wait vs compute
If runtime is long but buffers and rows are low, it may be waiting on
a lock. See Locks for diagnosing blockers and
deadlocks.
SELECT pid, now()-query_start AS run, wait_event_type, wait_event, query
FROM pg_stat_activity WHERE state <> 'idle' ORDER BY run DESC;
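To identify blockers directly, pg_blocking_pids (built in since PostgreSQL
9.6) returns the PIDs holding the locks a session is waiting on:
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       now() - query_start AS waiting_for,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;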
5. Refresh statistics
Outdated stats cause bad plans.
ANALYZE orders; -- or plain ANALYZE; for the whole database
After large changes (bulk loads, mass updates or deletes), autovacuum's
analyze can lag behind; run ANALYZE manually.
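To check whether stats are actually stale before guessing:
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';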
6. Avoid SELECT *
Pulling unused large columns increases I/O:
SELECT id, user_id, status FROM orders WHERE user_id=123;
This avoids visiting TOAST for large JSON/text.
7. Batch backfills and updates
A long UPDATE that touches many rows can block. Process in
chunks:
PostgreSQL's UPDATE has no LIMIT clause, so select the batch in a subquery:
UPDATE orders SET flag=true
WHERE id IN (SELECT id FROM orders
             WHERE ... AND flag IS NOT TRUE -- predicate must exclude done rows
             LIMIT 1000); -- repeat in app/job until 0 rows updated
(Or keyset-paginate on the primary key, as sketched below.)
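A minimal keyset sketch, assuming an integer primary key id; :last_id is a
hypothetical application-side variable that starts at 0 and is advanced to
the largest id returned after each batch:
WITH batch AS (
  SELECT id FROM orders
  WHERE id > :last_id -- hypothetical app-side cursor value
  ORDER BY id
  LIMIT 1000
)
UPDATE orders SET flag = true
WHERE id IN (SELECT id FROM batch)
RETURNING id; -- app sets :last_id to max(id); stop when no rows come back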
8. Log slow queries
Set log_min_duration_statement = 500 (ms). Keep it
practical; too low is noisy. Rotate and parse logs.
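One way to set it cluster-wide without hand-editing postgresql.conf (this
setting reloads without a restart):
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();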
9. EXPLAIN ANALYZE caution
It runs the query. For destructive queries, wrap in a transaction and
ROLLBACK, or test on staging.
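A sketch of that pattern (table and values are placeholders):
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
  UPDATE orders SET status = 'cancelled' WHERE id = 42; -- really executes
ROLLBACK; -- undoes the write; the timing output was already printed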
10. Index creation safety
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
This builds the index without blocking writes (a plain CREATE INDEX holds a
SHARE lock that blocks INSERT/UPDATE/DELETE for the duration). CONCURRENTLY
cannot run inside a transaction block, takes longer overall, and a failed
build leaves an INVALID index behind that must be dropped and retried.
11. Common pitfalls
- Function on column prevents index use (WHERE lower(email) = ...) → add an
expression index.
- Implicit cast mismatch (text vs uuid) → explicit cast or correct
type.
- Leading wildcard (LIKE '%abc') → a btree cannot help; consider a pg_trgm
trigram index (GIN/GiST), as sketched after this list.
- Over-indexing: too many similar indexes slow writes.
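A sketch of the trigram fix (index and column names are placeholders; it
requires the pg_trgm extension):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_orders_note_trgm ON orders USING gin (note gin_trgm_ops);
-- LIKE '%abc' / '%abc%' predicates on note can now use this index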
12. Verification loop
- Capture baseline (mean_time, buffers, runtime)
- Apply change (index / rewrite)
- Re-run EXPLAIN (ANALYZE, BUFFERS)
- Confirm timing + buffer reductions
- Drop any now-redundant index
13. Summary checklist
- Plan inspected?
- Stats fresh?
- Missing / wrong index fixed?
- Data fetched trimmed?
- Lock waits ruled out?
- Improvement measured?
Fix the highest total cost query, then repeat. Small consistent wins
beat big refactors.
Variations and Trade‑Offs
- Rewrite vs index: sometimes reformulating the query (join order,
predicate pushdown) beats adding yet another index.
- Sampling: on giant tables, test on a realistic subset to iterate
quickly, then verify on staging with size parity.
- Planner hints aren’t native; use enable_* GUC toggles only for diagnosis,
not in production (sketch below).
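A session-local toggle sketch for diagnosis (never bake these into
production config):
SET enable_seqscan = off; -- discourage seq scans for this session only
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 123;
RESET enable_seqscan;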
Pitfalls
- EXPLAIN without ANALYZE shows estimates only; you may miss the real
cost.
- Over-indexing speeds reads but hurts writes; balance and prune
duplicates.
- Function-wrapped predicates block index use unless you add
expression indexes.
Recap (Short Summary)
Measure with EXPLAIN (ANALYZE, BUFFERS), fix the smallest cause
first, and verify with before/after timing and buffers. Keep indexes
purposeful and trim SELECT lists.
Optional Exercises
- Capture the top 10 queries by total_exec_time (total_time before PG 13)
via pg_stat_statements and fix one with the cheapest index.
- Rewrite a query to push filters before joins and compare plans.
- Add a functional index for a lower(email) lookup and confirm index
usage in the plan.
Summary (Cheat Sheet)
- Plan: Use EXPLAIN (ANALYZE, BUFFERS). Signal: seq
scan on selective filter. Fix: add/adjust index. Verify: buffers/time
drop.
- Statistics: ANALYZE table. Signal: row estimate off
vs actual. Fix: raise STATISTICS target. Verify: estimates align.
- Lock Wait: Inspect pg_stat_activity wait_event.
Signal: wait_event_type = 'Lock'. Fix: resolve blocker or reschedule.
Verify: runtime shrinks.
- Column Trim: Select only needed columns. Signal:
high buffer hits, TOAST reads. Fix: avoid SELECT *. Verify: lower
I/O.
- Functional Predicate: WHERE function(col) = value.
Signal: no index usage. Fix: expression index. Verify: index scan
appears.
- Sorting: Large Sort node with high time/mem. Fix:
index matching ORDER BY. Verify: Sort removed.
- Join Order: Nested Loop with huge inner scans.
Signal: bad estimates. Fix: analyze + rewrite join. Verify: plan uses
better join.
- Regressions: New index overhead. Signal: slower
writes. Fix: drop redundant index. Verify: write latency recovers.
- Logging: Set log_min_duration_statement. Signal:
too noisy or missing. Fix: tune threshold. Verify: actionable
entries.
- Verify: Re-run baseline metrics after change. If
unclear, capture after numbers and document delta.
- Pitfalls: EXPLAIN without ANALYZE; over-indexing;
ignoring VACUUM.
Loop: Measure → Hypothesize → Change → Verify → Document.
References
- EXPLAIN:
https://www.postgresql.org/docs/current/using-explain.html
- Buffer usage (BUFFERS):
https://www.postgresql.org/docs/current/using-explain.html#USING-EXPLAIN-BUFFERS
- Query planning (planner):
https://www.postgresql.org/docs/current/runtime-config-query.html
- Statistics target:
https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET
- pg_stat_statements:
https://www.postgresql.org/docs/current/pgstatstatements.html
- Logging slow statements (log_min_duration_statement):
https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-MIN-DURATION-STATEMENT