Data Modeling
Introduction
The process of creating a visual representation or "blueprint" of how data is structured, stored, and related within an organization.
Acts as a bridge between complex business requirements and the technical database design
Levels
Conceptual Data Model (The "What")
This is a high-level view created for business stakeholders. It focuses on Entities (things the business cares about) and their Relationships. It doesn’t involve any technical details like data types.
Example: A "Customer" places an "Order."
Logical Data Model (The "How")
This adds more detail to the conceptual model. It defines the Attributes (columns) for each entity and the specific nature of the relationships, but it is still independent of the specific database technology (e.g., whether you use PostgreSQL or Databricks).
Example: "Customer" now has a
Customer_ID,Email, andSignup_Date.
Physical Data Model (The "Implementation")
This is the actual technical blueprint used to build the database. It includes database-specific details like Data Types (Integer, String, Timestamp), Primary Keys, Foreign Keys, and Indexes.
Example:
Customer_IDis defined as aBIGINT PRIMARY KEY.
Pattern
Relational Modeling
A method of structuring data in a database by organizing it into tables (also called "relations") consisting of rows and columns. Each row represents a unique record (a "tuple"), and each column represents a specific attribute of that data.
The "relational" part comes from the fact that these tables are linked to one another using common data points, typically Primary Keys and Foreign Keys.
In a relational model, you split data into many small, specialized tables.
CustomersTable:CustomerID(PK),Name,Email,AddressID(FK)AddressesTable:AddressID(PK),Street,City,ZipCodeProductsTable:ProductID(PK),DrugName,ManufacturerID(FK),PriceManufacturersTable:ManufacturerID(PK),CompanyName,ContactPrescriptionsTable:PrescriptionID(PK),CustomerID(FK),DoctorNameSalesTable:SaleID(PK),PrescriptionID(FK),ProductID(FK),Date,Quantity
Dimensional Modeling
A specialized data design technique used primarily for data warehousing and business intelligence.
Designed with STAR Schema
Fact Tables (The "Quantitative" Data)
The Fact table sits at the center of your model. It contains the "metrics" or "measurements" of a business process.
Content: Mostly numbers (integers, decimals) and foreign keys.
Example: In a retail store, a Fact table record would store
Quantity_Sold,Unit_Price, andDiscount_Amount.
Dimension Tables (The "Qualitative" Data)
Dimension tables surround the Fact table. They provide the context (the "Who, What, Where, When, and Why") for the numbers in the Fact table.
Content: Descriptive text and attributes used for filtering and grouping.
Example: A
Dim_Producttable would containProduct_Name,Brand,Category, andColor.
Example
Fact
Fact_Sales
Sale_ID, Date_FK, Product_FK, Store_FK, Total_Amount, Tax
Dimension
Dim_Date
Date, Month, Quarter, Year, Is_Weekend
Dimension
Dim_Product
Product_Name, Sub_Category, Category, Supplier
Dimension
Dim_Store
Store_Name, City, State, Region, Manager
Difference
Primary Use
OLTP (Apps, Websites, Point-of-Sale)
OLAP (Dashboards, Power BI, AI)
Structure
Many small tables (Normalized)
Few large tables (De-normalized)
Data Redundancy
Low (Data is stored once)
High (Data is repeated for speed)
Query Speed
Slower for large reports
Extremely fast for large reports
Complexity
Hard for non-tech users to navigate
Easy for business users to understand
Last updated