PostgreSQL Data Warehouse (Bronze–Silver–Gold)

End-to-end SQL pipeline built in PostgreSQL with DBeaver; Gold layer consumed in Looker Studio.

Details

Role: Data Analyst / Analytics Engineer

Stack: PostgreSQL · DBeaver · SQL · Looker Studio

Deliverables: Layered warehouse (Bronze/Silver/Gold) · Star schema marts · BI dashboard

🔗 [GitHub repo / SQL scripts]

🔗 [Database schema documentation]

🔗 [Looker Studio dashboard]

🔗 [Presentation deck]

🔗 [PDF one-pager]

Summary

Executive Summary

This project implements a structured data warehouse in PostgreSQL using a three-layer approach (Bronze/Silver/Gold). Raw source extracts are landed in Bronze, standardized and cleaned in Silver, and reshaped into analytics-ready marts in Gold. The Gold layer is then used to power a Looker Studio dashboard for customer and product reporting.


The outcome is a transparent, scalable SQL pipeline that separates raw ingestion from business logic and supports both BI and ad-hoc analysis.

Business Goal

Create a reliable warehouse foundation for CRM + ERP data that enables:

  • Consistent, reusable business metrics
  • Fast BI reporting with a curated Gold layer
  • Clear lineage from raw data to reporting outputs
  • Extensibility for future sources and new marts

Architecture

Layered Warehouse Design

Bronze (Raw)

  • Minimal transformation
  • Source-aligned tables (raw extracts)
  • Goal: preserve the original payload and make ingestion repeatable

Silver (Clean & Conformed)

  • Cleaning (null handling, deduplication, type casting)
  • Standardization (naming conventions, consistent keys)
  • Derived fields and canonical attributes

Gold (Business-Ready)

  • Analytics-ready tables/views
  • Star schema marts optimized for reporting
  • A single place for metric definitions used by BI

🖼️ [Architecture diagram image]

Data Flow

Sources → Bronze → Silver → Gold → Consumption

  • Sources: CRM + ERP extracts
  • Bronze: source-specific raw tables
  • Silver: standardized, validated entities
  • Gold: fact + dimension model
  • Consumption: Looker Studio reporting, ad-hoc SQL, downstream analytics

🖼️ [Data flow diagram image]

Gold Layer: Data Modeling

Star Schema (Reporting-First)

The Gold layer is designed for semantic clarity and BI performance:

  • Fact table: fact_sales (event-level measures at a defined grain)
  • Dimensions: dim_customers, dim_products (conformed attributes for slicing)

This structure supports stable metrics, consistent joins, and scalable reporting.

🔗 [Link to ERD / model explanation]

Transformations (Silver Layer)

Standardization Pattern

Key transformation categories:

  • Cleaning: null handling, deduplication, type normalization
  • Conforming: harmonized keys across CRM/ERP, standardized naming
  • Derivation: reusable business fields and metric-ready attributes

🔗 [Link to transformation SQL / examples]

BI Output (Looker Studio)

Dashboard: Customer & Product Report

The Looker Studio dashboard consumes the Gold layer to provide:

  • Customer segmentation summary
  • Sales by segment
  • Product/customer breakdown tables
  • Filterable exploration (segment, age group, etc.)

🖼️ [Dashboard screenshot]

🔗 [Live dashboard link]

Key Insights

  1. Layer separation prevents “business logic drift”
    Raw ingestion stays stable while transformations evolve safely in Silver and Gold.
  2. Gold-layer star schema reduces downstream complexity
    BI and analysts share one consistent definition of metrics and dimensions.
  3. Standardization in Silver is the scaling lever
    Conformed keys and canonical attributes enable new marts with minimal rework.

Limitations & Next Steps

Planned upgrades

  • Automated data quality checks (freshness, row counts, uniqueness, referential integrity)
  • Incremental loading patterns and change tracking
  • Orchestration/scheduling and version-controlled SQL migrations
  • Optional semantic layer standardization for BI metric governance