Database Normalization (1NF-3NF)
The process of organizing database columns and tables to reduce data redundancy and improve data integrity through progressive normal forms.
Description
Normalization is a systematic approach to decomposing tables to eliminate redundancy and dependency anomalies. First Normal Form (1NF) requires that each column contains atomic (indivisible) values and each row is unique -- no repeating groups or arrays. Second Normal Form (2NF) builds on 1NF by requiring that every non-key column depends on the entire primary key, not just part of it -- this matters for tables with composite primary keys. Third Normal Form (3NF) requires that non-key columns depend only on the primary key, not on other non-key columns (no transitive dependencies).
For example, a denormalized orders table might store customer_name, customer_email, product_name, and product_price alongside order data. This creates update anomalies: changing a customer's email requires updating every order row for that customer. Normalizing to 3NF splits this into customers, products, and orders tables linked by foreign keys, ensuring each fact is stored exactly once.
Most production schemas should aim for 3NF as a starting point. Beyond 3NF, there are Boyce-Codd Normal Form (BCNF), 4NF, and 5NF, but they are rarely needed in practice. The discipline of normalization forces you to think carefully about data dependencies and entity boundaries, which leads to cleaner schemas even if you later selectively denormalize for performance. Always normalize first, then denormalize with clear justification.
Prompt Snippet
Design the initial schema in Third Normal Form: ensure every non-key attribute depends on the key, the whole key, and nothing but the key. Identify transitive dependencies (e.g., order -> customer_id -> customer_email) and extract them into separate tables with foreign key constraints. Use an ER diagram tool (dbdiagram.io or pgModeler) to visualize relationships during design review. Document any intentional violations of 3NF in the schema's ADR with the performance justification and the denormalization strategy used.
Tags
Related Terms
Denormalization Strategies
Intentionally introducing redundant data into a normalized schema to optimize read performance by reducing expensive JOIN operations.
Foreign Key Constraints
Database-enforced rules that guarantee a value in one table's column must reference an existing row in another table, preventing orphaned records.
Database Migrations
Version-controlled scripts that incrementally modify the database schema, enabling reproducible and reversible schema changes across environments.