PostgreSQL Data Warehouse (Bronze–Silver–Gold)

End-to-end SQL pipeline built in PostgreSQL with DBeaver; Gold layer consumed in Looker Studio.

Details

Role: Data Analyst / Analytics Engineer

Stack: PostgreSQL · DBeaver · SQL · Looker Studio

Deliverables: Layered warehouse (Bronze/Silver/Gold) · Star schema marts · BI dashboard

🔗 [GitHub repo / SQL scripts]

🔗 [Database schema documentation]

🔗 [Looker Studio dashboard]

🔗 [Presentation deck]

🔗 [PDF one-pager]

Summary

Executive Summary

This project implements a structured data warehouse in PostgreSQL using a three-layer approach (Bronze/Silver/Gold). Raw source extracts are landed in Bronze, standardized and cleaned in Silver, and reshaped into analytics-ready marts in Gold. The Gold layer is then used to power a Looker Studio dashboard for customer and product reporting.

The outcome is a transparent, scalable SQL pipeline that separates raw ingestion from business logic and supports both BI and ad-hoc analysis.

Business Goal

Create a reliable warehouse foundation for CRM + ERP data that enables:

Consistent, reusable business metrics
Fast BI reporting with a curated Gold layer
Clear lineage from raw data to reporting outputs
Extensibility for future sources and new marts

Architecture

Layered Warehouse Design

Bronze (Raw)

Minimal transformation
Source-aligned tables (raw extracts)
Goal: preserve the original payload and make ingestion repeatable

Silver (Clean & Conformed)

Cleaning (null handling, deduplication, type casting)
Standardization (naming conventions, consistent keys)
Derived fields and canonical attributes

Gold (Business-Ready)

Analytics-ready tables/views
Star schema marts optimized for reporting
A single place for metric definitions used by BI

🖼️ [Architecture diagram image]

Data Flow

Sources → Bronze → Silver → Gold → Consumption

Sources: CRM + ERP extracts
Bronze: source-specific raw tables
Silver: standardized, validated entities
Gold: fact + dimension model
Consumption: Looker Studio reporting, ad-hoc SQL, downstream analytics

🖼️ [Data flow diagram image]

Gold Layer: Data Modeling

Star Schema (Reporting-First)

The Gold layer is designed for semantic clarity and BI performance:

Fact table: fact_sales (event-level measures at a defined grain)
Dimensions: dim_customers, dim_products (conformed attributes for slicing)

This structure supports stable metrics, consistent joins, and scalable reporting.

🔗 [Link to ERD / model explanation]

Transformations (Silver Layer)

Standardization Pattern

Key transformation categories:

Cleaning: null handling, deduplication, type normalization
Conforming: harmonized keys across CRM/ERP, standardized naming
Derivation: reusable business fields and metric-ready attributes

🔗 [Link to transformation SQL / examples]

BI Output (Looker Studio)

Dashboard: Customer & Product Report

The Looker Studio dashboard consumes the Gold layer to provide:

Customer segmentation summary
Sales by segment
Product/customer breakdown tables
Filterable exploration (segment, age group, etc.)

🖼️ [Dashboard screenshot]

🔗 [Live dashboard link]

Key Insights

Layer separation prevents “business logic drift”
Raw ingestion stays stable while transformations evolve safely in Silver and Gold.
Gold-layer star schema reduces downstream complexity
BI and analysts share one consistent definition of metrics and dimensions.
Standardization in Silver is the scaling lever
Conformed keys and canonical attributes enable new marts with minimal rework.

Limitations & Next Steps

Planned upgrades

Automated data quality checks (freshness, row counts, uniqueness, referential integrity)
Incremental loading patterns and change tracking
Orchestration/scheduling and version-controlled SQL migrations
Optional semantic layer standardization for BI metric governance