Database · Intermediate

Denormalization Strategies

Intentionally introducing redundant data into a normalized schema to optimize read performance by reducing expensive JOIN operations.

Also known as: strategic denormalization, data duplication, read optimization

Description

Denormalization is the deliberate introduction of data redundancy to improve query performance. While normalization minimizes redundancy and update anomalies, it can result in queries that require multiple JOINs across many tables, which becomes expensive at scale. Denormalization strategies include adding computed columns (e.g., storing order_total instead of computing SUM(line_items.price) on every read), duplicating frequently accessed fields (storing customer_name on orders), or pre-aggregating data into summary tables.
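The computed-column pattern described above can be sketched with SQLite standing in for a real database; the `orders`/`line_items` schema and `add_line_item` helper are illustrative, and prices are stored as integer cents to keep arithmetic exact:

```python
import sqlite3

# Sketch: orders.order_total duplicates SUM(line_items.price) so reads
# avoid the JOIN + aggregate. Schema and names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_name TEXT,
                         order_total INTEGER NOT NULL DEFAULT 0);  -- cents
    CREATE TABLE line_items (order_id INTEGER REFERENCES orders(id),
                             price INTEGER NOT NULL);              -- cents
""")

def add_line_item(order_id, price):
    # The write path updates the denormalized total in the same
    # transaction as the source row, so readers never see them disagree.
    with conn:
        conn.execute("INSERT INTO line_items VALUES (?, ?)", (order_id, price))
        conn.execute("UPDATE orders SET order_total = order_total + ? "
                     "WHERE id = ?", (price, order_id))

conn.execute("INSERT INTO orders (id, customer_name) VALUES (1, 'Ada')")
add_line_item(1, 1999)
add_line_item(1, 500)

# Read path: a single-row lookup, no JOIN, no aggregate scan.
total = conn.execute("SELECT order_total FROM orders WHERE id = 1").fetchone()[0]
print(total)  # 2499
```

The cost moves from read time to write time: every line-item insert now carries an extra UPDATE, which is the trade the pattern makes.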

The most common denormalization patterns are: materialized aggregates (storing counts, sums, or averages alongside the source data), pre-joined tables (combining data from multiple normalized tables into a read-optimized table), and cache tables (periodically refreshed snapshots of complex query results). Each pattern has different consistency trade-offs. Materialized aggregates must be updated in the same transaction as the source data to maintain consistency, while cache tables can tolerate some staleness.
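A minimal sketch of the materialized-aggregate pattern, using SQLite triggers as a stand-in for the database-side update path (the `authors`/`posts` schema is invented for illustration):

```python
import sqlite3

# Sketch: authors.post_count is a materialized aggregate kept in sync by
# triggers, so reads skip COUNT(*). Names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT,
                          post_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER);

    -- Trigger bodies run in the same transaction as the triggering write,
    -- giving the synchronous consistency a materialized aggregate needs.
    CREATE TRIGGER posts_ins AFTER INSERT ON posts BEGIN
        UPDATE authors SET post_count = post_count + 1
        WHERE id = NEW.author_id;
    END;
    CREATE TRIGGER posts_del AFTER DELETE ON posts BEGIN
        UPDATE authors SET post_count = post_count - 1
        WHERE id = OLD.author_id;
    END;
""")

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Grace')")
conn.executemany("INSERT INTO posts (id, author_id) VALUES (?, 1)",
                 [(1,), (2,), (3,)])
conn.execute("DELETE FROM posts WHERE id = 2")

count = conn.execute("SELECT post_count FROM authors WHERE id = 1").fetchone()[0]
print(count)  # 2
```

Because the trigger and the triggering write commit or roll back together, the counter cannot drift from the source rows under normal operation.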

Denormalization must be driven by measured performance needs, never by speculation. Profile your queries first, confirm that JOINs are the bottleneck (not missing indexes or poor query structure), and then denormalize the specific access pattern that needs optimization. Always maintain the normalized source of truth and treat denormalized data as a derived cache. Document the update paths that keep denormalized data in sync, and add monitoring to detect drift.
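The drift monitoring mentioned above reduces to periodically recomputing the truth from the normalized tables and diffing it against the denormalized copy. A sketch, reusing the hypothetical orders schema (the second order is seeded with a deliberately wrong total):

```python
import sqlite3

# Sketch: reconcile a denormalized column against the normalized source
# of truth. Schema is illustrative; order 2 is seeded with drift.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_total INTEGER NOT NULL);
    CREATE TABLE line_items (order_id INTEGER, price INTEGER NOT NULL);
    INSERT INTO orders VALUES (1, 2499), (2, 999);
    INSERT INTO line_items VALUES (1, 1999), (1, 500), (2, 500);
""")

def find_drift(conn):
    # Rows where the stored total disagrees with the recomputed one.
    return conn.execute("""
        SELECT o.id, o.order_total, COALESCE(SUM(li.price), 0) AS true_total
        FROM orders o LEFT JOIN line_items li ON li.order_id = o.id
        GROUP BY o.id
        HAVING o.order_total <> true_total
    """).fetchall()

drift = find_drift(conn)
print(drift)  # [(2, 999, 500)] -> order 2's cached total has drifted
```

Running a check like this on a schedule and alerting on non-empty results is one way to implement the "monitoring to detect drift" the text calls for.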

Prompt Snippet

Denormalize only after confirming JOINs are the bottleneck via EXPLAIN ANALYZE and pg_stat_statements. For high-read aggregates like dashboard counters, maintain a summary table updated within the same transaction using a trigger or application-level write-through. For less time-sensitive aggregates, use PostgreSQL materialized views with REFRESH MATERIALIZED VIEW CONCURRENTLY on a cron schedule. Document each denormalization in the schema's ADR with the specific query it optimizes, the consistency guarantee (synchronous vs eventual), and the refresh/invalidation mechanism.
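The "less time-sensitive" path in the snippet, a periodically refreshed snapshot of a query result, can be sketched as a cache table rebuilt atomically on a schedule. SQLite stands in for PostgreSQL here (which would use a materialized view instead), and the `events`/`daily_totals` names are invented:

```python
import sqlite3

# Sketch: a cache table holding pre-aggregated results, rebuilt on a
# schedule, analogous to REFRESH MATERIALIZED VIEW. Names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (day TEXT, amount INTEGER);
    INSERT INTO events VALUES ('2024-01-01', 5), ('2024-01-01', 7),
                              ('2024-01-02', 3);
    CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total INTEGER);
""")

def refresh_daily_totals(conn):
    # Rebuild the snapshot in one transaction; a cron job would call
    # this, accepting staleness between refreshes.
    with conn:
        conn.execute("DELETE FROM daily_totals")
        conn.execute("""
            INSERT INTO daily_totals
            SELECT day, SUM(amount) FROM events GROUP BY day
        """)

refresh_daily_totals(conn)
rows = conn.execute("SELECT * FROM daily_totals ORDER BY day").fetchall()
print(rows)  # [('2024-01-01', 12), ('2024-01-02', 3)]
```

Readers query `daily_totals` directly and tolerate data as stale as the refresh interval, which is the eventual-consistency trade-off the snippet asks to be documented.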

Tags

performance, data-modeling, architecture, query-optimization