Portfolio

Projects

Production systems, dashboards, and pipelines built across data engineering, business intelligence, and automation.

Pipeline
production

Public Procurement Intelligence Pipeline

End-to-end data pipeline that extracts and analyzes government procurement data from a national SPA portal. It combines Selenium session authentication, internal REST API calls, parallel extraction across 25 workers, and retries with exponential backoff to produce a master dataset for market intelligence analysis.

Python, Selenium, REST API, pandas, +3
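
The retry and parallelism pattern described for this pipeline can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: `retry_with_backoff`, `extract_all`, and the `fetch_one` callable are hypothetical names, and the delay values are assumptions. Only the worker count of 25 comes from the description.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def retry_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt (capped) and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


def extract_all(record_ids, fetch_one, workers=25, **retry_kwargs):
    """Fetch every record in parallel; results keep the order of record_ids."""
    def task(record_id):
        return retry_with_backoff(lambda: fetch_one(record_id), **retry_kwargs)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, record_ids))
```

In a real run, `fetch_one` would wrap the authenticated API call for a single record. Threads suit this kind of workload because the time is spent waiting on the network rather than on the CPU.
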
ML / AI
production

Credit Risk Classifier — Random Forest

Binary classification model that predicts loan default probability from a dataset of 32K credit records. It engineers features (DTI ratio, income buckets, age groups), trains Logistic Regression and Random Forest models with class balancing, and evaluates them with AUC-ROC, the KS statistic, and a confusion matrix. AUC: 0.93.

Python, scikit-learn, Random Forest, pandas, +1
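
Of the metrics listed for this model, the KS statistic is the least familiar outside credit scoring. It is the largest gap between the cumulative score distributions of defaulters and non-defaulters. The sketch below is a plain-Python illustration under that standard definition, not the project's code. The function name `ks_statistic` and the label convention (1 = default, 0 = no default) are assumptions.

```python
def ks_statistic(y_true, scores):
    """Max gap between the empirical score CDFs of the two classes."""
    pos = sorted(s for y, s in zip(y_true, scores) if y == 1)
    neg = sorted(s for y, s in zip(y_true, scores) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    best = 0.0
    i = j = 0
    # Walk the distinct score values in ascending order and track how much
    # of each class lies at or below the current threshold.
    for threshold in sorted(set(pos + neg)):
        while i < len(pos) and pos[i] <= threshold:
            i += 1
        while j < len(neg) and neg[j] <= threshold:
            j += 1
        best = max(best, abs(i / len(pos) - j / len(neg)))
    return best
```

A value of 1.0 means the scores separate the two classes perfectly. A value near 0 means the score distributions are almost indistinguishable.
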
Pipeline
production

ELT Reconciliation Pipeline — Airflow

ELT pipeline that consolidates data from an ERP, a payments portal, and a compliance registry. It runs daily via Airflow, normalizes entity names across the three sources using fuzzy matching (pg_trgm + recordlinkage), and loads the results into a conformed dimension table in the data warehouse (DWH).

Apache Airflow, Python, PostgreSQL, pg_trgm, +2
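
The trigram matching behind this kind of entity-name normalization can be approximated in plain Python. The sketch below imitates how pg_trgm scores two strings (lowercase, split into alphanumeric words, pad each word, compare trigram sets), but it is an approximation for illustration, not the extension itself or the project's code. The names `trigrams`, `similarity`, and `best_match` are hypothetical. The 0.3 threshold mirrors pg_trgm's default similarity threshold.

```python
import re


def trigrams(text):
    """Set of trigrams per word, padded with two leading and one trailing space."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(name, candidates, threshold=0.3):
    """Return (candidate, score) for the closest name, or (None, score) if too far."""
    scored = [(similarity(name, c), c) for c in candidates]
    score, match = max(scored, default=(0.0, None))
    return (match, score) if score >= threshold else (None, score)
```

For example, `best_match("Acme Corporation S.A.", ["ACME Corporation SA", "Apex Holdings"])` picks the first candidate, because the two spellings share most of their trigrams despite the punctuation and casing differences.
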
SQL / ERP
production

ERP Analytics Views — SQL Server

Set of analytical SQL views over ERP transactional tables for year-over-year sales comparisons, client retention analysis, and commission reconciliation.

SQL Server, Sage 300, T-SQL, ERP
NLP / AI
active

Local Audio Transcription Pipeline — Whisper + CUDA

Local pipeline that uses openai-whisper with CUDA (RTX 3000) to transcribe audio in a legal context. Includes a faster-whisper alternative for speed.

Python, Whisper, CUDA, NLP