Portfolio
Projects
Production systems, dashboards, and pipelines built across data engineering, business intelligence, and automation.
Public Procurement Intelligence Pipeline
End-to-end data pipeline that extracts and analyzes government procurement data from a national SPA portal. It authenticates through a Selenium session, calls the portal's internal REST API, runs extraction in parallel across 25 workers with exponential backoff on retries, and produces a master dataset for market intelligence analysis.
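The parallel extraction with exponential backoff described above could look roughly like the sketch below. It is an illustration under assumptions, not the project's actual code: `fetch_with_backoff`, `extract_all`, the retry count, and the base delay are hypothetical, and `fetch` stands in for whatever authenticated API call the pipeline makes.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_backoff(fetch, item_id, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch(item_id), retrying failures with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fetch(item_id)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # Delay doubles on each attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries so 25 workers do not hit the API in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


def extract_all(fetch, item_ids, workers=25, **backoff_kwargs):
    """Fetch every id in parallel; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda i: fetch_with_backoff(fetch, i, **backoff_kwargs), item_ids)
        )
```

Threads suit this workload because the time is spent waiting on HTTP responses rather than on CPU. The `sleep` parameter is injectable only so the retry logic can be exercised without real delays.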
Credit Risk Classifier — Random Forest
Binary classification model that predicts loan default probability on 32K credit records. Implements feature engineering (DTI ratio, income buckets, age groups), trains Logistic Regression and Random Forest models with class balancing, and evaluates them using AUC-ROC, the KS statistic, and a confusion matrix. AUC-ROC: 0.93.
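As a rough illustration of two of the steps named above, the sketch below derives the engineered features and computes the KS statistic in plain Python. The field names (`income`, `debt`, `age`), the bucket edges, and the labels are all assumptions made for the example; the project's real cut-offs are not stated here.

```python
from bisect import bisect_right

# Hypothetical bucket edges and labels, chosen only for illustration.
INCOME_EDGES = [30_000, 60_000, 100_000]
INCOME_LABELS = ["low", "mid", "high", "very_high"]
AGE_EDGES = [25, 35, 50]
AGE_LABELS = ["<25", "25-34", "35-49", "50+"]


def bucket(value, edges, labels):
    """Map a number to the label of the interval it falls in."""
    return labels[bisect_right(edges, value)]


def engineer_features(record):
    """Add a DTI ratio, an income bucket, and an age group to one record."""
    return {
        **record,
        "dti": record["debt"] / record["income"],  # debt-to-income ratio
        "income_bucket": bucket(record["income"], INCOME_EDGES, INCOME_LABELS),
        "age_group": bucket(record["age"], AGE_EDGES, AGE_LABELS),
    }


def ks_statistic(y_true, y_score):
    """Largest gap between the cumulative score distributions of the two classes."""
    pairs = sorted(zip(y_score, y_true))
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    cum_pos = cum_neg = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume all records tied at this score before comparing the CDFs.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        ks = max(ks, abs(cum_pos / n_pos - cum_neg / n_neg))
    return ks
```

A KS value near 1 means the model's scores separate defaulters from non-defaulters almost completely, while a value near 0 means the two score distributions overlap.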
ELT Reconciliation Pipeline — Airflow
ELT pipeline consolidating data from an ERP, a payments portal, and a compliance registry. Runs daily via Airflow, normalizes entity names across three sources using fuzzy matching (pg_trgm + recordlinkage), and loads results into a conformed dimension table in the DWH.
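The idea behind trigram-based name matching can be sketched in plain Python. This is an approximation of how pg_trgm scores similarity (shared trigrams divided by total distinct trigrams), written for illustration only; it is not the pipeline's SQL, it does not use the recordlinkage API, and `best_match` with its 0.5 threshold is a hypothetical helper.

```python
import re


def trigrams(text):
    """Build a trigram set: lowercase, split on non-alphanumerics, pad each word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(name, canonical_names, threshold=0.5):
    """Return the closest canonical name, or None if nothing clears the threshold."""
    scored = [(similarity(name, c), c) for c in canonical_names]
    score, match = max(scored, default=(0.0, None))
    return match if score >= threshold else None
```

For example, `best_match("ACME Corporation S.A.", ["Acme Corporation", "Globex Industries"])` resolves to `"Acme Corporation"`, because case and punctuation differences do not change the trigram set much. The threshold trades false merges against missed matches and would need tuning on real entity names.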
ERP Analytics Views — SQL Server
Set of analytical SQL views over ERP transactional tables for year-over-year sales comparisons, client retention analysis, and commission reconciliation.
Local Audio Transcription Pipeline — Whisper + CUDA
Local pipeline that transcribes legal-context audio with openai-whisper on CUDA (RTX 3000). Includes a faster-whisper alternative for speed.
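A minimal sketch of switching between the two backends is shown below. The `transcribe` wrapper, the `"small"` model size, and the `float16` compute type are assumptions for illustration; the library calls are the public `whisper.load_model` / `model.transcribe` and `faster_whisper.WhisperModel` entry points, and both require the respective package and a CUDA-capable GPU to be installed.

```python
def transcribe(audio_path, backend="whisper", model_size="small", language=None):
    """Transcribe one audio file on the GPU and return plain text."""
    if backend == "whisper":
        import whisper  # openai-whisper package, imported lazily

        model = whisper.load_model(model_size, device="cuda")
        result = model.transcribe(audio_path, language=language)
        return result["text"].strip()

    if backend == "faster-whisper":
        from faster_whisper import WhisperModel  # imported lazily

        model = WhisperModel(model_size, device="cuda", compute_type="float16")
        # transcribe() yields segments lazily; joining them runs the decoding.
        segments, _info = model.transcribe(audio_path, language=language)
        return " ".join(segment.text.strip() for segment in segments)

    raise ValueError(f"unknown backend: {backend!r}")
```

Importing each library inside its branch keeps the wrapper usable when only one of the two backends is installed.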