Data Modeling 101: Concepts, Examples, and Why It Matters
You can only become data-driven if you put the data model first
Everybody talks about data and everybody wants data. But how do you design robust data models that actually meet your users’ data needs? This article explains the basics of data modeling for beginners and will help you design data structures that scale and make sense — to everybody.
What Is Data Modeling, and Why Should You Care?
Data modeling is the process of defining and structuring how data is stored, organized, and used in your systems. A data model is the blueprint for your data infrastructure. It helps to understand your business domain and improves communication with your stakeholders. It supports making decisions about your data architecture.
A data model contains the following elements:
Entities to represent real-world objects, such as a Customer or a Product.
Keys to uniquely identify entities, such as a customer’s email address or a product’s SKU.
Attributes to represent the properties of entities, such as a customer’s name or a product’s price.
Types to define the nature of data attributes, such as strings, dates, or numbers.
Relations to represent interactions or associations, for example, a Customer buys a Product.
(Optional) Boundaries to define the scope of the model, specifying which entities and relationships are included and which are outside its viewpoint.
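To make these elements more tangible, here is a minimal sketch in Python. The class and field names (Customer, Product, Purchase, email, sku) are assumptions picked to match the examples in the list above; they are not taken from any specific system.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Customer:              # entity: a real-world object
    email: str               # key: uniquely identifies a customer
    name: str                # attribute with a string type


@dataclass(frozen=True)
class Product:               # entity
    sku: str                 # key: uniquely identifies a product
    price: Decimal           # attribute with a numeric type


@dataclass(frozen=True)
class Purchase:              # relation: a Customer buys a Product
    customer_email: str      # refers to Customer.email
    product_sku: str         # refers to Product.sku
    purchase_date: date      # attribute with a date type


customer = Customer(email="ada@example.com", name="Ada")
product = Product(sku="SKU-001", price=Decimal("19.99"))
purchase = Purchase(customer.email, product.sku, date(2025, 12, 12))
```

The boundary of this tiny model is simply what is left out: there is no Supplier or Warehouse here, because they are outside its viewpoint.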
Without a data model, your data systems can quickly turn into a spaghetti of tables, schemas, and APIs, leaving your data engineers tangled up in a data mess and your business analysts wondering how a data-driven culture can be achieved with a data foundation so shaky.
What do good data models aim to achieve?
Efficiency
Good data models are easy to read, quick to comprehend, and simple to communicate. Teams save time because they understand the business domain better and have a clearer vision of what needs to be implemented.
Standardization
Good standards and guidelines are important. They ensure consistency and an easy-to-follow connection from the deeper technological layers up to the high-level concepts.
Reusability
Well-structured data models provide reusable components, because writing the same stuff ten times is not fun. For example, a Customer model can be reused across marketing, sales, and support, ensuring consistency and saving time.
Collaboration
When your team, and potentially other teams, follows the same set of rules and concepts in data modeling, collaboration becomes much easier. Data can be connected, exchanged, and compared without extensive transformation and rework. This accelerates development in general and leads to more robust interaction between systems.
Data quality
Good data models inherently promote and build in key data qualities. What does this mean? For example, a good data model ensures that attributes requiring only dates are defined as date fields — not as datetime or, data gods forbid, string or varchar types — which are too broad and prone to errors.
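A small sketch of the difference, assuming Python's standard date type stands in for a date field. The helper name parse_booking_date is hypothetical.

```python
from datetime import date


def parse_booking_date(raw: str) -> date:
    """Convert an ISO 8601 date string (YYYY-MM-DD) into a date value.

    Raises ValueError for anything that is not a valid calendar date.
    """
    return date.fromisoformat(raw)


# A string field happily stores a date that does not exist.
booking_date_as_string = "2025-02-30"

# A date field rejects the same value at the boundary.
try:
    parse_booking_date(booking_date_as_string)
    rejected = False
except ValueError:
    rejected = True

# A real calendar date passes and becomes a proper date value.
valid = parse_booking_date("2025-12-12")
```

The narrower the type, the earlier a bad value is caught, instead of surfacing later in a report or an interface.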
The Three Levels of Data Modeling
Conceptual, logical, and physical models represent the three levels of data modeling. As you move deeper, more details and complexity are added to the model. As you move higher, the model becomes more abstract and simplified for better understanding. To illustrate these levels, the following sections use a fictional airline as an example.
Conceptual Data Model: The Big Picture
A conceptual data model (CDM) focuses on broad concepts and is used to represent and understand the business. At this stage, key entities and the important relations between them are identified. Non-technical users should be able to understand it.
In our airline example, a CDM might look like this:
The model consists of four entities: Plane, Flight, Passenger, and Ticket. They have relations between them, allowing us to read this model like simple prose:
A passenger books a flight and holds a ticket.
A flight is operated by a plane.
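The same conceptual model can also be written down as plain data. This is only a sketch: the Relation type and the helper functions are my own illustration, not a standard notation for conceptual models.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str   # entity on the left side of the relation
    verb: str      # meaningful relationship name
    target: str    # entity on the right side of the relation


# The relations read from the airline example.
conceptual_model = [
    Relation("Passenger", "books", "Flight"),
    Relation("Passenger", "holds", "Ticket"),
    Relation("Flight", "is operated by", "Plane"),
]


def entities(model: list[Relation]) -> set[str]:
    """Collect every entity mentioned in the model."""
    return {name for rel in model for name in (rel.subject, rel.target)}


def as_prose(model: list[Relation]) -> list[str]:
    """Render each relation as a simple sentence."""
    return [f"A {r.subject.lower()} {r.verb} a {r.target.lower()}." for r in model]


sentences = as_prose(conceptual_model)
```

If a relation cannot be rendered as a sentence that a business stakeholder would agree with, the relationship name probably needs more work.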
There are different schools of thought when it comes to CDMs.
One approach advocates always including a business key for each entity in a CDM. This has the advantage of forcing you to be very precise in your conceptual modeling process, requiring you to ask business stakeholders detailed questions about the real-world objects you are trying to represent. However, this approach can make conceptual modeling more challenging, as it limits your ability to group different real-world objects under a single entity if they do not share a natural business key.
For example, consider a Payment entity. In the real world, cash payments and card payments may not share a single unique identifier, making it difficult to represent them under the same entity in a conceptual model. This might require you to create separate entities, such as Cash Payment and Card Payment, increasing complexity at the conceptual level.
Another approach to conceptual modeling involves adding attributes to conceptual entities. This can help you get more detailed information from business stakeholders about the real-world objects you’re modeling. However, it risks losing sight of the big picture and the interactions between entities, focusing instead on details that could be worked out at a later stage.
For example, adding attributes like Passenger Age or Ticket Price at the conceptual level might start discussions with specifics, distracting from general questions like how passengers interact with tickets and flights.
As with any software development artifact, it’s good practice to review your work. An effective review is easier with a checklist to go through step by step, ensuring nothing is overlooked. For a conceptual model, a checklist could look something like this:
Is it understandable for non-technical users?
Does it help technical users understand the business?
Do the entities represent real-world business objects?
Are these business objects part of a business process?
Are the relationships named in a meaningful way?
(Optional) Does every entity have a business key?
Logical Data Model: Adding Details, Not Technology
The logical level adds more structure.
Within the logical data model (LDM), you define attributes, keys, and relationships in detail while remaining independent of specific technologies or systems. This means no system-specific data types, no mapping tables, and no other implementation details. It also means no uppercase, Hungarian, or other technical notation: names are still written in natural language.
If you haven’t defined the primary keys at the conceptual level, you must define them here at the logical level at the latest. Primary keys are unique identifiers for a row within a dataset. Without them, it becomes difficult to link data, clean duplicate data, or work with data across interfaces.
The primary key is not necessarily the business key. It is recommended that your data model generates its own so-called surrogate keys (as primary keys) and treats the business key as a separate attribute. However, this distinction is typically made at the physical level, as it is an implementation detail. At the logical level, the business key is often used as the primary key, to keep the model independent from the actual implementation.
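Here is a minimal sketch of the surrogate key idea at the physical level, using Python's built-in sqlite3 module as a stand-in database. The passenger table and the choice of passport_number as business key are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE passenger (
        passenger_id    INTEGER PRIMARY KEY,   -- surrogate key, generated by the database
        passport_number TEXT NOT NULL UNIQUE,  -- business key, kept as a separate attribute
        full_name       TEXT NOT NULL
    )
    """
)

# The caller supplies only business data; the surrogate key is assigned on insert.
cursor = conn.execute(
    "INSERT INTO passenger (passport_number, full_name) VALUES (?, ?)",
    ("X1234567", "Ada Lovelace"),
)
surrogate_key = cursor.lastrowid

# The business key stays unique even though it is not the primary key.
try:
    conn.execute(
        "INSERT INTO passenger (passport_number, full_name) VALUES (?, ?)",
        ("X1234567", "Someone Else"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Other tables would reference passenger_id, so a corrected or reissued passport number changes one attribute instead of every reference to the passenger.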
The logical data model describes relationships in more detail.
Relationships connect entities. The so-called cardinality describes how many entities can be related to the connected entity. For example, a plane ticket belongs to one and only one passenger, but a passenger can have several tickets. You should specify the cardinality for each relationship on the logical level, including the minimum and maximum possible number of connected entities.
Foreign keys are used to establish relationships between different entities in a data model. They are attributes in one table that reference the primary key of another table, ensuring referential integrity and enabling the linking of related data across tables. The cardinality of the relationship determines how foreign keys are applied, defining whether it is a one-to-one, one-to-many, or many-to-many relationship.
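Cardinality and referential integrity can be sketched without any database. In the example below, the Cardinality class, the sample keys, and the minimum of zero tickets per passenger are assumptions; only the rule that a ticket belongs to exactly one passenger comes from the text above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    minimum: int
    maximum: Optional[int]  # None means "many" (no upper bound)

    def allows(self, count: int) -> bool:
        """Check whether a number of connected entities is within bounds."""
        upper_ok = self.maximum is None or count <= self.maximum
        return count >= self.minimum and upper_ok


# A ticket belongs to one and only one passenger;
# a passenger can have zero, one, or several tickets (the zero is an assumption).
PASSENGERS_PER_TICKET = Cardinality(minimum=1, maximum=1)
TICKETS_PER_PASSENGER = Cardinality(minimum=0, maximum=None)

passengers = {"P1", "P2"}                       # primary keys
tickets = {"T1": "P1", "T2": "P1", "T3": "P9"}  # ticket key -> foreign key

# Referential integrity: every foreign key must match an existing primary key.
dangling = sorted(t for t, p in tickets.items() if p not in passengers)

# Cardinality: count tickets per passenger and compare with the declared bounds.
tickets_per_passenger = Counter(p for p in tickets.values() if p in passengers)
within_bounds = all(
    TICKETS_PER_PASSENGER.allows(tickets_per_passenger[p]) for p in passengers
)
```

Ticket T3 points to a passenger that does not exist, so it ends up in the dangling list. This is exactly the kind of row a foreign key constraint would refuse.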
The logical data model adds attributes.
The attributes should be as detailed as possible. They don’t need a data type yet; those are assigned in the physical data model. However, the name can already indicate the data type, for example:
Booking Date for a date type like 2025-12-12
Arrival Datetime for a date and time type like 2025-12-12T12:25:10Z
If the attribute holds a measurement, its name should already state the unit to prevent confusion, for example:
Height in Feet instead of just Height
Weight in Kilograms instead of just Weight
Don’t worry about the length of the attributes’ names. A readable and unambiguous data model is worth more than short names, which invite misunderstandings through abbreviations or missing information. The era of limiting attribute field names in databases to a maximum of eight characters is long over (phew).
The logical data model needs a different notation than the conceptual data model because it is much more detailed. The entity-relationship diagram is the recommended type for a logical data model. I prefer the so-called crow’s foot notation (also called Martin Notation) in the data model to describe relationships with their cardinality.
Building on our airline example, the logical data model looks like this:
So, how does this differ from the conceptual level?
Additional entity Flight Operation: There is a new data entity called Flight Operation. It connects the two data entities Flight and Aircraft. Each flight operation represents the actual flight that took place on a specific Operating Date.
Primary keys: Each data entity now has a primary key that identifies individual rows in the dataset. They are marked with “PK” and underlined. This is optional, but helps a lot with readability.
Foreign keys and relationships: Relationships with their cardinality are added. Foreign keys are marked as “FK,” following the same notation as primary keys before (again optional but improves readability). The names of foreign keys are identical to the primary key they reference.
Attributes: Attributes are added to the entities. I am using a capitalized notation, but that’s not necessary. It just needs to be consistent across different data models.
Physical Data Model: The Tech Blueprint
The physical data model (PDM) adds the technical implementation details. It realizes the business solution that was modeled on the logical level and ties it to specific software or hardware. For a relational database system like PostgreSQL, a PDM’s data entities can be implemented directly as tables and their fields as columns. The PDM comes with the following additions and changes in the model:
Naming: The names of tables, columns, and any other system entities are exactly the names used in the system. They include underscores, may be lowercase or uppercase (although this might not matter in most databases), and are generally less readable but machine-conformant for use in source code.
Data types: System-specific data types define how the data’s fields are stored in the storage system, for example an identifier with int64 for a 64-bit long integer field.
Normalization or Denormalization: The PDM is optimized for specific non-functional requirements, for example write-performance or data consistency. This is achieved by normalization or denormalization of the data model, to reduce redundancy or speed up data access.
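The step from logical names to physical names and types can be sketched like this. The helper to_physical_name, the sample attributes, and the chosen PostgreSQL types are assumptions for illustration; your naming rules and type choices may differ.

```python
import re


def to_physical_name(logical_name: str) -> str:
    """Turn a natural-language name into a lowercase name with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", logical_name)
    return "_".join(word.lower() for word in words)


# Hypothetical mapping from logical attributes to PostgreSQL column types.
logical_attributes = {
    "Flight Operation Id": "bigint",      # 64-bit integer identifier
    "Operating Date": "date",
    "Arrival Datetime": "timestamptz",    # timestamp with time zone
}

# Physical column names paired with their system-specific data types.
columns = {
    to_physical_name(name): sql_type
    for name, sql_type in logical_attributes.items()
}
```

Deriving physical names by rule rather than by hand keeps the link between the logical and the physical model easy to follow.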
The following diagram shows the resulting PDM in our Airline example:
We normalized the data model and added tables for airports and aircraft types. Data types were introduced, for example int64 for primary and foreign keys. The database system can enforce the relationships between the tables with referential constraints such as foreign key constraints. A foreign key constraint in our example prevents a flight from referencing an airport that does not exist in the airports table. It does not require the reverse: an airport can be added without any flight connection, so the airports table can be filled from a different source before the flight connections exist.
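To see how such a constraint behaves, here is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL. The table layout, column names, and airport codes are assumptions and are not taken from the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE airports (
        airport_id INTEGER PRIMARY KEY,
        iata_code  TEXT NOT NULL UNIQUE
    );
    CREATE TABLE flights (
        flight_id              INTEGER PRIMARY KEY,
        departure_airport_id   INTEGER NOT NULL REFERENCES airports (airport_id),
        destination_airport_id INTEGER NOT NULL REFERENCES airports (airport_id)
    );
    """
)

# Airports can be loaded first, without any flights.
conn.executemany(
    "INSERT INTO airports (airport_id, iata_code) VALUES (?, ?)",
    [(1, "FRA"), (2, "JFK")],
)

# A flight between known airports is accepted.
conn.execute(
    "INSERT INTO flights (flight_id, departure_airport_id, destination_airport_id) "
    "VALUES (1, 1, 2)"
)

# A flight that references an unknown airport is rejected.
try:
    conn.execute(
        "INSERT INTO flights (flight_id, departure_airport_id, destination_airport_id) "
        "VALUES (2, 1, 99)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

airport_count = conn.execute("SELECT COUNT(*) FROM airports").fetchone()[0]
flight_count = conn.execute("SELECT COUNT(*) FROM flights").fetchone()[0]
```

In this sketch the constraint lives on the flights table, which is why the airports table can be filled on its own.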
In this example, we reused an entity-relationship model for drawing the PDM. This fits the implementation in a relational database system well. However, if you’re planning on implementing in a NoSQL database like MongoDB, you would need a different type of diagram. The Luna Modeller is recommended for data modeling in the NoSQL world. For drawing the PDM, always choose the diagram that makes reading and implementing the technical solution in the chosen technology easy.
Frequently Asked Questions About Data Modeling
This section gives you a short overview of frequently asked questions about data modeling and its real-world application.
What is the difference between conceptual, logical, and physical data models?
Conceptual data models describe the business domain at a high level, focusing on entities and their relationships without technical or solution details. Logical data models add more structure by including attributes, keys, and relationship details while still being independent of technology. Physical data models turn the design into real database structures, specifying tables, columns, data types, and other system-specific attributes needed for implementation.
Why is data modeling important?
Data modeling helps you understand the business, design solutions, and navigate your data systems. The conceptual data model lets you communicate with business departments, understand the domain, and speak the same language. The logical model helps you craft a solution for business problems while staying connected to the business context. The physical model turns that into a working database structure to implement a sustainable data solution for your users.
Who uses data modeling?
Data modeling is used by data engineers, data architects, business analysts, data scientists — basically anyone working with data and data structures. It helps align business and technical teams by creating a shared understanding of the data. Even if you’re not a full-time data modeler, having basic modeling skills can make your work much more effective.
Do I need special tools for data modeling?
You don’t need a special tool in the beginning — especially for simple conceptual modeling. But I highly recommend switching to a proper tool early on. Otherwise, you’ll likely end up with a graveyard of outdated models. It’s very helpful if your tool supports a repository of your modeled data objects. That way, you can reuse entities, connect models, and collaborate with others — similar to version control for code.
Ideally, choose a tool that supports open formats, so you can export models, build on them, and support model-driven development connected to your real data.
Is data modeling only for large companies?
No — data modeling is important for any company that takes its data seriously. It helps you understand your data and gives structure as your systems grow. Trying to add data modeling as an afterthought, for example by reverse-engineering databases, is usually more expensive than including it from the start.
Final Thoughts
I hope this article helped you understand data modeling and will support your journey into the world of data. Data modeling is definitely worth the effort and will pay off as your data world grows. And as always: keep on coding and keep on creating!