01 - Learning Database Normalization
Learn normalization forms for database tables using a Video Game Store example.
“Normalize until it hurts…”
Hello everyone,
Today I wanted to talk about something that I had to learn for my database design and implementation class: database normalization.
Avoiding abstract theory, I’ll walk you through a “real-world” example of a video game store database that I created based on the Steam marketplace.
Why it matters?
Database normalization is the process of organizing your data efficiently by minimizing redundancy, eliminating data anomalies, and improving data integrity.
You usually do this in the logical design step, which converts the conceptual model into a more detailed structure that can be implemented into a relational database.
Without organization, the database will become a nightmare to maintain:
We might end up storing the same information in multiple places
Updates in one piece of data might require changes in dozens of tables
Deleting records could accidentally remove other import information
Queries might become slower and more complex
Whether you’re building a personal project or working on enterprise software, proper database design saves you time and headaches:
Easier updates: When information changes, you only need to update it in one place.
Better data integrity: Constraints and relationships ensure your data stays consistent.
More flexible queries: Well-structured data can be queries in countless ways.
Future-proofing: As requirements change, a normalized database adapts more easily
The Five Normalization Forms
Normalization is typically described in terms of “normal forms”, with each form building on the previous one. Let’s break it down using the game store example.
First Normal Form (1NF) : No Repeating Groups
The rule: Each table cell should contain only a single value, and each record needs to be unique.
The problem: Look at this Game table that tries to store multiple genres for each game:
Game
- GameID: 1
- Title: "The Witcher 3"
- Genres: "Action, RPG, Adventure" <- This is the problem!
- Developer: "CD Projekt RED"
- Price: 39.99This approach makes it impossible to:
Search for all games in a specific genre
Add or remove a single genre from a game
Maintain a consistent list of genre names
The solution: Create a separate table for genres (Genre) and a junction table (GameGenre) to connect games with genres:
Game
- GameID: 1
- Title: "The Witcher 3"
- DeveloperID: 2
- Price: 39.99
Genre
- GenreID: 1
- Name: "Action"
- GenreID: 2
- Name: "Adventure"
- GenreID: 3
- Name: "RPG"
GameGenre
- GameGenreID: 1
- GameID: 1
- GenreID: 1 (Action)
- GameGenreID: 2
- GameID: 1
- GenreID: 2 (Adventure)
- GameGenreID: 3
- GameID: 1
- GenreID: 3 (RPG)Second Normal Form (2NF): No Partial Dependencies
The rule: A table is in 2NF if it’s in 1NF and all non-key attributes depend on the entire primary key, not just part of it.
This is important when you have composite primary keys (keys made up of many multiple columns).
The problem: Consider a poorly designed GameGenre table:
GameGenre
- GameID: 1 (Part of composite primary key)
- GenreID: 3 (Part of composite primary key)
- GameTitle: "The Witcher 3" <- Depends only on GameID
- GenreName: "Role-Playing" <- Depends only on GenreIDThe issue here is that GameTitle depends only on GameID, not on the complete key (GameID + GenreID). Similarily, GenreName depends only on GenreID.
The solution: Move these attributes to their respective tables:
Game
- GameID: 1
- Title: "The Witcher 3"
- Price: 39.99
Genre
- GenreID: 3
- Name: "Role-Playing"
- Description: "Games where players assume roles..."
GameGenre
- GameGenreID: 1
- GameID: 1
- GenreID: 3Third Normal Form (3NF): No Transitive Dependencies
The rule: A table is in 3NF if it’s in 2NF and non-key attributes don’t depend on other non-key attributes.
The problem: Look at the transaction table that includes game information:
Transaction
- TransactionID: 1
- UserID: 2
- GameID: 3
- GameTitle: "Skyrim" <- Depends on GameID, not TransactionID
- DeveloperName: "Bethesda" <- Depends on GameID, not TransactionID
- Price: 19.99
- TransactionDate: 2023-05-22Here, GameTitle and DeveloperName depend on GameID, which isn’t the primary key of this table.
The solution: Separate the data appropriately:
Transaction
- TransactionID: 1
- UserID: 2
- TransactionDate: 2023-05-22
- TotalAmount: 19.99
TransactionGame
- TransactionGameID: 1
- TransactionID: 1
- GameID: 3
- PriceAtPurchase: 19.99
Game
- GameID: 3
- Title: "Skyrim"
- DeveloperID: 3
- CurrentPrice: 19.99
Developer
- DeveloperID: 3
- Name: "Bethesda"Boyce-Codd Normal Form (BCNF): All Determinants Must Be Candidate Keys
The rule: For every non-trivial dependency X → Y in the table, X must be a superkey (candidate key or contain a candidate key).
This is a stricter version of 3NF that addresses some edge cases.
The problem: Move user details to a separate table:
UserLibrary
- LibraryID: 1 (Primary Key)
- UserID: 101 (Unique)
- Username: "gamer123" <- Determined by UserID
- CreatedDate: 2023-01-15UserID determines Username, but UserID isn’t the primary key.
The solution: Move user details to a separate table:
UserLibrary
- LibraryID: 1
- UserID: 101
- CreatedDate: 2023-01-15
User
- UserID: 101
- Username: "gamer123"
- Email: "gamer123@email.com"Fourth Normal Forms (4NF)
These deal with multi-values dependencies and join dependencies.
The rule: A table is 4NF if it’s in BCNF and has no multi-valued dependencies, except in cases where every multi-valued dependency is a functional dependency.
The problem: Multi-valued dependencies occur when two attributes in a table independently depend on a third attribute, creating redundant combinations of data. For example, in our game store database, if you tried to track both game genres and game achievements in a single table:
GameDetails
- GameID: 1
- GenreID: 3 (Multi-valued dependency on GameID)
- AchievementID: 2 (Multi-valued dependency on GameID)This creates unnecessary data redundancy because every possible combination of genre and achievement must be stored for each game, even though genres and achievements are independent of each other.
The solution: Split the table into separate tables that each represent one multi-values dependency:
GameGenre
- GameID: 1
- GenreID: 3
GameAchievement
- GameID: 1
- AchievementID: 2Fifth Normal Form (5NF)
The rule: A table in 5NF if every join dependency in the table is implied by the candidate keys.
The problem: Join dependencies occur when a table can be decomposed into multiple tables and then reconstructed via joins without losing or gaining information. For example, if you tried to represent games, developers, and publishers in a single table:
GameProduction
- GameID
- DeveloperID
- PublisherIDThis structure might imply incorrect relationships if not every developer works with every publisher on every game. The table cannot accurately represent complex relationships between these three entities
The solution: Create a structure that explicitly defines the relationships:
Game
- GameID
- Title
- DeveloperID (FK)
- PublisherID (FK)
Developer
- DeveloperID
- Name
Publisher
- PublisherID
- NameWhen to break the rules
“…denormalize until it works”
When normalization is generally a good practice, there are legitimate reasons to “denormalize” in certain scenarios:
Performance optimization: Sometimes, controlled redundancy can improve read performance for frequently accessed data.
Reporting databases: Analytical systems often use denormalized structures like star schemas to optimize for complex queries.
Caching layers: Duplicating some data can reduce the need for complex joins in high-traffic systems.
The key is to denormalize intentionally, not accidentally. When you choose to break normalization rules, document your decision and understand the trade-offs.
Database normalization is a practical approach to organizing information that will make your applications more robust, maintainable, and scalable.
Next time you’re designing a database, consider normalization.
Thank yourself later.

