Data Modeling

Introduction

  • The process of creating a visual representation or "blueprint" of how data is structured, stored, and related within an organization.

  • Acts as a bridge between complex business requirements and the technical database design

Levels

Conceptual Data Model (The "What")

This is a high-level view created for business stakeholders. It focuses on Entities (things the business cares about) and their Relationships. It doesn’t involve any technical details like data types.

  • Example: A "Customer" places an "Order."

Logical Data Model (The "How")

This adds more detail to the conceptual model. It defines the Attributes (columns) for each entity and the specific nature of the relationships, but it is still independent of the specific database technology (e.g., whether you use PostgreSQL or Databricks).

  • Example: "Customer" now has a Customer_ID, Email, and Signup_Date.

Physical Data Model (The "Implementation")

This is the actual technical blueprint used to build the database. It includes database-specific details like Data Types (Integer, String, Timestamp), Primary Keys, Foreign Keys, and Indexes.

  • Example: Customer_ID is defined as a BIGINT PRIMARY KEY.

Pattern

Relational Modeling

  • A method of structuring data in a database by organizing it into tables (also called "relations") consisting of rows and columns. Each row represents a unique record (a "tuple"), and each column represents a specific attribute of that data.

  • The "relational" part comes from the fact that these tables are linked to one another using common data points, typically Primary Keys and Foreign Keys.

In a relational model, you split data into many small, specialized tables.

  • Customers Table: CustomerID (PK), Name, Email, AddressID (FK)

  • Addresses Table: AddressID (PK), Street, City, ZipCode

  • Products Table: ProductID (PK), DrugName, ManufacturerID (FK), Price

  • Manufacturers Table: ManufacturerID (PK), CompanyName, Contact

  • Prescriptions Table: PrescriptionID (PK), CustomerID (FK), DoctorName

  • Sales Table: SaleID (PK), PrescriptionID (FK), ProductID (FK), Date, Quantity

Dimensional Modeling

  • A specialized data design technique used primarily for data warehousing and business intelligence.

  • Designed with STAR Schema

Fact Tables (The "Quantitative" Data)

The Fact table sits at the center of your model. It contains the "metrics" or "measurements" of a business process.

  • Content: Mostly numbers (integers, decimals) and foreign keys.

  • Example: In a retail store, a Fact table record would store Quantity_Sold, Unit_Price, and Discount_Amount.

Dimension Tables (The "Qualitative" Data)

Dimension tables surround the Fact table. They provide the context (the "Who, What, Where, When, and Why") for the numbers in the Fact table.

  • Content: Descriptive text and attributes used for filtering and grouping.

  • Example: A Dim_Product table would contain Product_Name, Brand, Category, and Color.

Example

Fact

Fact_Sales

Sale_ID, Date_FK, Product_FK, Store_FK, Total_Amount, Tax

Dimension

Dim_Date

Date, Month, Quarter, Year, Is_Weekend

Dimension

Dim_Product

Product_Name, Sub_Category, Category, Supplier

Dimension

Dim_Store

Store_Name, City, State, Region, Manager

Difference

Primary Use

OLTP (Apps, Websites, Point-of-Sale)

OLAP (Dashboards, Power BI, AI)

Structure

Many small tables (Normalized)

Few large tables (De-normalized)

Data Redundancy

Low (Data is stored once)

High (Data is repeated for speed)

Query Speed

Slower for large reports

Extremely fast for large reports

Complexity

Hard for non-tech users to navigate

Easy for business users to understand

Last updated