Python Scraping Pipeline + PostgreSQL Dimensional Model
Role: Data Analyst / Analytics Engineer / Data Engineer
Domain: Sports Analytics (NHL)
Stack: Python · PostgreSQL · DBeaver · SQL · NHL API
This project implements a full NHL analytics data warehouse: official NHL data is scraped with Python and modeled into a PostgreSQL dimensional schema.
The solution combines API ingestion, data normalization, surrogate key modeling, and multi-grain fact tables to enable advanced hockey analytics at player, team, game, and event levels.
The warehouse is designed to support both BI reporting and advanced analytical modeling.
Public NHL data is available, but it is fragmented across multiple API endpoints and is not structured for analytical querying.
This project solves those problems by ingesting the raw API responses with Python, normalizing them, and modeling them into a query-ready dimensional warehouse in PostgreSQL.
Data Pipeline
NHL API
↓
Python Scrapers (nhlpy + psycopg2)
↓
PostgreSQL Data Warehouse (nhl_dw schema)
↓
SQL Analytics / BI / Python Analysis
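The flow can be sketched end to end in a few lines. The snippet below is only an illustrative sketch: the endpoint path, the JSON field names, the nhl_dw.stg_game landing table, and the connection string are all assumptions, not the project's actual code.

```python
# Illustrative extract-and-load sketch (not the project's actual code).
# Assumptions: the schedule endpoint path, the JSON field names, and an
# nhl_dw.stg_game landing table with (game_id, season_id, game_type, raw_json).
import requests
import psycopg2
from psycopg2.extras import Json

API_BASE = "https://api-web.nhle.com/v1"  # assumed NHL API base URL


def fetch_schedule(date: str) -> list[dict]:
    """Pull the schedule starting at `date` from the NHL API."""
    resp = requests.get(f"{API_BASE}/schedule/{date}", timeout=30)
    resp.raise_for_status()
    games = []
    for day in resp.json().get("gameWeek", []):
        games.extend(day.get("games", []))
    return games


def load_games(conn, games: list[dict]) -> None:
    """Land raw game rows in the warehouse schema before dimensional modeling."""
    with conn.cursor() as cur:
        for g in games:
            cur.execute(
                """
                INSERT INTO nhl_dw.stg_game (game_id, season_id, game_type, raw_json)
                VALUES (%s, %s, %s, %s)
                ON CONFLICT (game_id) DO NOTHING
                """,
                (g["id"], g["season"], g["gameType"], Json(g)),
            )
    conn.commit()


if __name__ == "__main__":
    conn = psycopg2.connect("dbname=nhl user=etl")  # assumed connection string
    load_games(conn, fetch_schedule("2024-01-01"))
```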
The warehouse follows dimensional modeling best practices.
Dimensions: dim_date, dim_player, dim_team, dim_season, dim_venue
Facts / events: fact_game, fact_team_game, fact_skater_game, fact_goalie_game, event_play

Modeling Principles
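The surrogate-key and grain conventions are easiest to see in DDL. The sketch below is illustrative only; the column names, types, and constraints are assumptions rather than the project's actual schema.

```python
# Illustrative DDL for one dimension and one fact table (assumed columns, not the real schema).
# Surrogate keys live on the dimension; the fact table references them and declares its grain.
import psycopg2

DDL = """
CREATE SCHEMA IF NOT EXISTS nhl_dw;

CREATE TABLE IF NOT EXISTS nhl_dw.dim_player (
    player_key     BIGSERIAL PRIMARY KEY,       -- surrogate key
    nhl_player_id  INTEGER NOT NULL UNIQUE,     -- natural key from the NHL API
    full_name      TEXT NOT NULL,
    position_code  TEXT
);

-- Grain: one row per skater per game.
CREATE TABLE IF NOT EXISTS nhl_dw.fact_skater_game (
    skater_game_key  BIGSERIAL PRIMARY KEY,
    player_key       BIGINT NOT NULL REFERENCES nhl_dw.dim_player (player_key),
    team_key         BIGINT NOT NULL,            -- FK to dim_team (omitted here)
    game_key         BIGINT NOT NULL,            -- FK to fact_game / dim_date (omitted here)
    goals            SMALLINT NOT NULL DEFAULT 0,
    assists          SMALLINT NOT NULL DEFAULT 0,
    shots            SMALLINT NOT NULL DEFAULT 0,
    toi_seconds      INTEGER,
    UNIQUE (player_key, game_key)                -- enforces the declared grain
);
"""

if __name__ == "__main__":
    with psycopg2.connect("dbname=nhl user=etl") as conn:  # assumed connection string
        with conn.cursor() as cur:
            cur.execute(DDL)
```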
The ingestion layer is built in Python using nhlpy and psycopg2.
Key design choices
Example logic
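A minimal sketch of that logic is shown below; it assumes the scrapers have already produced roster and boxscore player lists as dicts, and that the dim_player columns match the assumed DDL above.

```python
# Sketch of merging player records from multiple API sources before upserting dim_player.
# roster_players / boxscore_players are assumed outputs of the nhlpy-based scrapers.
import psycopg2
from psycopg2.extras import execute_values


def collect_players(roster_players: list[dict], boxscore_players: list[dict]) -> list[tuple]:
    """Union player records from both sources, keyed by NHL player id,
    so a player who appears in only one endpoint is still captured."""
    merged: dict[int, tuple] = {}
    for p in roster_players + boxscore_players:
        merged[p["player_id"]] = (p["player_id"], p["full_name"], p.get("position"))
    return list(merged.values())


def upsert_players(conn, players: list[tuple]) -> None:
    """Insert new players and refresh attributes for players already in dim_player."""
    sql = """
        INSERT INTO nhl_dw.dim_player (nhl_player_id, full_name, position_code)
        VALUES %s
        ON CONFLICT (nhl_player_id) DO UPDATE
        SET full_name = EXCLUDED.full_name,
            position_code = EXCLUDED.position_code
    """
    with conn.cursor() as cur:
        execute_values(cur, sql, players)
    conn.commit()
```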
This ensures no players are lost due to API fragmentation.
This warehouse enables BI reporting, ad-hoc SQL analysis, and advanced analytical modeling at the player, team, game, and event levels.
Typical analytical questions span per-game skater and goalie performance, team-level results, and event-level play-by-play trends across seasons; one example query is sketched below.
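For instance, a leaderboard-style question can be answered with one join between the skater fact and the player dimension. The query below is an illustrative sketch using the assumed column names from the DDL sketch above, not the project's actual tables.

```python
# Illustrative analytical query: top scorers across the loaded games (assumed column names).
import psycopg2

TOP_SCORERS_SQL = """
    SELECT p.full_name,
           SUM(f.goals)   AS goals,
           SUM(f.assists) AS assists
    FROM nhl_dw.fact_skater_game AS f
    JOIN nhl_dw.dim_player       AS p ON p.player_key = f.player_key
    GROUP BY p.full_name
    ORDER BY goals DESC
    LIMIT 10;
"""

with psycopg2.connect("dbname=nhl user=etl") as conn:  # assumed connection string
    with conn.cursor() as cur:
        cur.execute(TOP_SCORERS_SQL)
        for name, goals, assists in cur.fetchall():
            print(f"{name}: {goals} G, {assists} A")
```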
Planned upgrades