┌─────────────────────────────────────────────────────────────────────┐
│ Data Ingestion Layer                                                │
│   loan_data.csv → PostgreSQL (Feature Store) → Data Quality Checks  │
└────────────────────────────┬────────────────────────────────────────┘
┌────────────────────────────▼────────────────────────────────────────┐
│ Training & Validation                                               │
│   XGBoost Training (Gini, KS, IV) → MLflow Experiment (All Metrics) │
│     → Model Registry (Production)                                   │
└────────────────────────────┬────────────────────────────────────────┘
┌────────────────────────────▼────────────────────────────────────────┐
│ Deployment & Serving                                                │
│   FastAPI Service (Docker Container)                                │
│    ├─ /predict       → Single Prediction                            │
│    ├─ /batch-predict → Batch Inference (10K records)                │
│    ├─ /health        → Service Health                               │
│    └─ /metrics       → Prometheus Metrics                           │
│   Cloud Deployment: GCP Cloud Run                                   │
│    ├─ Auto-scaling (0-100 instances)                                │
│    ├─ Load Balancing                                                │
│    └─ SSL/TLS + API Keys                                            │
└────────────────────────────┬────────────────────────────────────────┘
┌────────────────────────────▼────────────────────────────────────────┐
│ Monitoring & Drift Detection                                        │
│   Evidently AI Reports:                                             │
│    ├─ Data Drift        → PSI (Population Stability Index)          │
│    ├─ Prediction Drift  → Target Distribution Shift                 │
│    ├─ Data Quality      → Missing Values, Duplicates                │
│    └─ Model Performance → Gini, KS Degradation                      │
│   Prometheus + Grafana:                                             │
│    ├─ Predictions per minute                                        │
│    ├─ Latency (p50, p95, p99)                                       │
│    ├─ Error rates                                                   │
│    └─ Model version in production                                   │
└────────────────────────────┬────────────────────────────────────────┘
┌────────────────────────────▼────────────────────────────────────────┐
│ Orchestration & Automated Retraining                                │
│   Apache Airflow DAG (Weekly Schedule)                              │
│    ├─ Extract new data from production logs                         │
│    ├─ Run drift detection (Evidently)                               │
│    ├─ IF drift detected OR degradation → Retrain                    │
│    ├─ Validate new model (Gini > 0.40)                              │
│    ├─ Promote to Production if valid                                │
│    └─ Rollback if validation fails                                  │
└─────────────────────────────────────────────────────────────────────┘

## Key Features

### 1. **Financial Domain Metrics** (Not Just Accuracy)
- **Gini Coefficient** (0.35+ = Good)  
- **KS Statistic** (Kolmogorov-Smirnov)  
- **Information Value** (IV per feature)  
- **Population Stability Index** (PSI for drift)
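
A sketch of how the two discrimination metrics can be computed from model scores (the function name and implementation are illustrative, not the project's actual `training/evaluate.py`):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def credit_metrics(y_true, y_score):
    """Gini (2 * AUC - 1) and KS statistic for a probability-of-default model."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    gini = 2 * roc_auc_score(y_true, y_score) - 1
    # KS: maximum gap between the score CDFs of defaulters and non-defaulters
    thresholds = np.unique(y_score)
    cdf_bad = np.array([(y_score[y_true == 1] <= t).mean() for t in thresholds])
    cdf_good = np.array([(y_score[y_true == 0] <= t).mean() for t in thresholds])
    ks = float(np.max(np.abs(cdf_bad - cdf_good)))
    return float(gini), ks
```

Both metrics reach 1.0 under perfect separation and 0.0 for an uninformative model, which is why credit teams report them alongside (rather than instead of) accuracy-style metrics.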

### 2. **Production-Ready Serving**
- FastAPI with async/await
- Batch predictions (up to 10K records)
- Request validation with Pydantic
- Prometheus metrics export
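
Request validation can be sketched with Pydantic. The field names below mirror the `/predict` payload shown in the Quick Start; the numeric bounds are illustrative assumptions, not the project's real `app/schema.py`:

```python
from pydantic import BaseModel, Field

class LoanApplication(BaseModel):
    """Illustrative subset of the /predict request schema (bounds are assumed)."""
    age: int = Field(..., ge=18, le=100)
    income_monthly: float = Field(..., gt=0)
    loan_amount: float = Field(..., gt=0)
    loan_tenure_m: int = Field(..., ge=6, le=360)
    interest_rate: float = Field(..., gt=0, le=50)
    credit_score: int = Field(..., ge=300, le=900)
    employment_type: str
```

With a model like this wired into a FastAPI route, out-of-range payloads are rejected with a 422 before any inference runs.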

### 3. **Experiment Tracking**
- MLflow: All hyperparameters, metrics, artifacts
- Model Registry: Version control + staging (Dev → Staging → Production)
- Automatic promotion based on Gini threshold
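
The promotion gate reduces to a threshold check. A minimal sketch, assuming the validation thresholds quoted later in this README (Gini > 0.40, KS > 0.25); the real `training/register.py` would follow a passing check with an MLflow registry stage transition:

```python
def should_promote(metrics: dict, gini_min: float = 0.40, ks_min: float = 0.25) -> bool:
    """Gate a candidate model before it is moved to the Production stage.

    This sketch captures only the decision; a real registry script would
    follow a True result with an MLflow model-registry stage transition.
    """
    return metrics.get("gini", 0.0) > gini_min and metrics.get("ks", 0.0) > ks_min
```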

### 4. **Drift Monitoring**
- Evidently AI: Data drift, prediction drift, data quality
- Alert system: PSI-based thresholds
- Automated HTML reports
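
PSI itself is cheap to compute. A hedged sketch (decile binning and the epsilon guard are conventional choices, not values read from the project's `monitoring/alerts.py`):

```python
import numpy as np

def population_stability_index(expected, actual, bins: int = 10) -> float:
    """PSI between a reference score sample and a production sample."""
    expected = np.asarray(expected)
    actual = np.asarray(actual)
    # Bin edges from the reference distribution (deciles), open-ended at both tails
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A common reading: PSI below 0.1 is stable, 0.1–0.25 a moderate shift, and above 0.25 significant drift, which matches the retraining trigger used later in this README.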

### 5. **Orchestration**
- Apache Airflow DAG: Weekly retraining pipeline
- Conditional logic: Retrain only if drift/degradation detected
- Automatic model promotion

### 6. **CI/CD**
- GitHub Actions: Test → Lint → Build Docker → Push to GCR → Deploy to Cloud Run
- Automated testing: Unit tests + integration tests
- Code coverage tracking

## Directory Structure
```
finrisk-mlops-pipeline/
├── app/                      # FastAPI application
│   ├── main.py               # API routes
│   ├── schema.py             # Pydantic models
│   └── model_loader.py       # Model loading from MLflow
├── training/                 # Training pipeline
│   ├── train.py              # XGBoost training with MLflow
│   ├── evaluate.py           # Financial metrics (Gini, KS, IV)
│   └── register.py           # Model promotion logic
├── monitoring/               # Monitoring & drift detection
│   ├── drift_report.py       # Evidently AI reports
│   ├── alerts.py             # Alert system (PSI, degradation)
│   └── prometheus.yml        # Prometheus config
├── pipeline/                 # Orchestration
│   └── retrain_dag.py        # Airflow DAG
├── infrastructure/           # Deployment configs
│   ├── Dockerfile            # Production image
│   ├── docker-compose.yml    # Local dev stack
│   └── cloudbuild.yaml       # GCP Cloud Build
├── tests/                    # Test suite
│   ├── test_api.py           # API endpoint tests
│   ├── test_model.py         # Model performance tests
│   └── test_schema.py        # Pydantic validation tests
├── .github/
│   └── workflows/
│       └── ci_cd.yml         # GitHub Actions CI/CD
├── data/
│   └── loan_data.csv         # Your existing P5 dataset
├── models/                   # Model artifacts (gitignored)
│   ├── xgboost_model.joblib
│   └── preprocessor.joblib
├── requirements.txt
├── .env.example
└── README.md
```


## Quick Start

### **1. Local Setup (Docker Compose)**

```bash
# Clone repo
git clone https://github.com/ukishore33/FinRisk-MLOps-Pipeline-End-to-End-Model-Lifecycle-for-Credit-Risk.git
cd finrisk-mlops-pipeline

# Create .env
cat > .env << EOF
MLFLOW_TRACKING_URI=http://mlflow:5000
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
EOF

# Start full stack (API + MLflow + PostgreSQL + Prometheus + Grafana)
docker-compose up -d

# Wait for services (~30s)
sleep 30

# Check health
curl http://localhost:8000/health
curl http://localhost:5000  # MLflow UI

# Check API
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "age": 35,
    "income_monthly": 50000,
    "loan_amount": 500000,
    "loan_tenure_m": 60,
    "interest_rate": 10.5,
    "existing_loans": 1,
    "credit_score": 750,
    "employment_type": "Salaried",
    "education": "Graduate",
    "residence_type": "Own",
    "delinquency_30d": 0,
    "utilisation_pct": 30,
    "months_employed": 24,
    "loan_purpose": "Home Improvement"
  }'

```

### **2. Run Training with MLflow Logging**

```bash
python training/train.py

# Check MLflow UI
# http://localhost:5000 → Experiments → loan-default-xgboost

# Promote to Production
python training/register.py
```

### **3. Run Tests**

```bash
pytest tests/ -v --cov
```

### **4. Deploy to GCP**

```bash
# Set up credentials
gcloud auth login
export GCP_PROJECT_ID=your-project-id

# Trigger deployment (GitHub Actions runs on push to main;
# set the GCP_SA_KEY and GCP_PROJECT_ID repository secrets first)
git push origin main

# Or manual deployment
gcloud run deploy finrisk-api \
  --source . \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
```

## Production Monitoring

**Prometheus** (http://localhost:9090)
- Predictions per minute
- Latency distributions
- Error rates by endpoint

**Grafana** (http://localhost:3000, admin/admin)
- Dashboard: Model Performance
- Dashboard: Inference Latency
- Dashboard: Alert Status

**MLflow** (http://localhost:5000)
- Experiment tracking
- Model registry
- Artifact management

**Drift Reports**
- Evidently AI HTML reports in `./monitoring_reports/`
- Automated weekly generation
- PSI-based alerts
## Key Metrics (Finance Domain)

| Metric    | Target  | Definition                                               |
|-----------|---------|----------------------------------------------------------|
| Gini      | > 0.40  | 2 × AUC − 1 (discrimination ability)                     |
| KS        | > 0.25  | Max separation between CDF(defaults) − CDF(non-defaults) |
| IV        | > 0.1   | Information Value per feature (predictive power)         |
| PSI       | < 0.15  | Population Stability Index (data drift)                  |
| Latency   | < 500ms | P99 prediction latency                                   |
| Precision | > 0.70  | True Positives / (TP + FP)                               |
| Recall    | > 0.65  | True Positives / (TP + FN)                               |
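
Information Value, the one tabled metric not sketched earlier, comes from binned Weight of Evidence. The sketch below uses quantile bins and a 0.5-count smoothing term (both illustrative choices, not the project's `training/evaluate.py`):

```python
import numpy as np

def information_value(feature, target, bins: int = 5) -> float:
    """IV of a numeric feature against a binary default flag."""
    feature = np.asarray(feature, dtype=float)
    target = np.asarray(target)
    # Interior quantile edges; np.digitize then yields bin ids 0 .. bins-1
    edges = np.quantile(feature, np.linspace(0, 1, bins + 1))[1:-1]
    bin_idx = np.digitize(feature, edges)
    iv = 0.0
    for b in range(bins):
        in_bin = bin_idx == b
        # 0.5 smoothing keeps empty bins from producing log(0)
        bad = (target[in_bin].sum() + 0.5) / (target.sum() + 0.5)
        good = ((1 - target[in_bin]).sum() + 0.5) / ((1 - target).sum() + 0.5)
        iv += (good - bad) * np.log(good / bad)
    return float(iv)
```

By the usual rule of thumb, IV below 0.02 marks a useless feature and above 0.3 a strong one, consistent with the > 0.1 target above.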
## Retraining Trigger Logic

The Airflow DAG automatically retrains when:

1. **Data Drift Detected** (PSI > 0.25 on credit_score, income, loan_amount)
2. **Model Degradation** (Gini < 0.38 on production data)
3. **Weekly Schedule** (every Monday 2 AM UTC)

If the new model passes validation (Gini > 0.40 & KS > 0.25):

- It is promoted to Production
- The old version is archived

If validation fails:

- The previous Production version is restored
- An alert is sent to the data science team
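
The branch condition the DAG evaluates can be written as a pure function (names and signature are illustrative; the real logic lives in `pipeline/retrain_dag.py`):

```python
def should_retrain(psi_by_feature: dict, production_gini: float,
                   psi_threshold: float = 0.25, gini_floor: float = 0.38) -> bool:
    """Decide whether a scheduled DAG run should kick off retraining."""
    drift_detected = any(v > psi_threshold for v in psi_by_feature.values())
    degraded = production_gini < gini_floor
    return drift_detected or degraded
```

For example, `should_retrain({"credit_score": 0.31, "income": 0.05}, 0.42)` returns True: one monitored feature drifted past the PSI threshold even though the production Gini is healthy.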
## Team Responsibilities

| Role           | Task                                           |
|----------------|------------------------------------------------|
| Data Scientist | Update hyperparameters, feature engineering    |
| MLOps Engineer | Monitor infrastructure, handle deployments     |
| ML Engineer    | Maintain data pipeline, write training scripts |
| DevOps         | Cloud infrastructure, CI/CD maintenance        |
## Hiring Manager Talking Points

- **Full Lifecycle**: you manage the model from training → production → monitoring → retraining
- **Financial Domain**: Gini, KS, and PSI are the right metrics (not just accuracy)
- **Scalability**: auto-scaling on Cloud Run, batch inference for 10K+ records
- **Monitoring**: drift detection + alerting + automated retraining (no manual intervention)
- **DevOps Ready**: Docker, GitHub Actions, Cloud Run, Prometheus, Grafana
- **Production Patterns**: API versioning, model registry, feature validation, health checks
- **Testing**: unit tests, integration tests, performance benchmarks

This is what ₹10–14 LPA MLOps roles expect at HDFC Bank, ICICI, Razorpay, CRISIL, or fintech startups.

## Cost Breakdown (GCP)

| Component                  | Cost/Month  |
|----------------------------|-------------|
| Cloud Run (1M requests)    | $5–10       |
| Cloud SQL (MLflow backend) | $20–30      |
| Cloud Storage (artifacts)  | $5–10       |
| Compute Engine (Airflow)   | $30–50      |
| **Total**                  | **$60–100** |

(Cheap for a production ML system!)

## Contributing

1. Branch off `mlops-core-setup`
2. Write tests
3. Format with `black`
4. Submit a PR

## License

MIT

---

**Built with:** FastAPI, XGBoost, MLflow, Evidently AI, Apache Airflow, Docker, GCP Cloud Run


---

This is a **complete, production-grade MLOps portfolio project**. What it demonstrates:

✅ **Full lifecycle** (data→training→deployment→monitoring→retraining)  
✅ **Financial metrics** (Gini, KS, PSI - not just accuracy)  
✅ **Scalable serving** (FastAPI + async + batch)  
✅ **MLflow integration** (experiment tracking + model registry)  
✅ **Drift monitoring** (Evidently AI)  
✅ **Orchestration** (Airflow DAG)  
✅ **CI/CD** (GitHub Actions → Docker → Cloud Run)  
✅ **Testing** (unit + integration + performance)  
✅ **Monitoring** (Prometheus + Grafana)  
✅ **Production patterns** (health checks, versioning, rollback)  

This is **exactly** what HDFC Bank, ICICI, Razorpay, and CRISIL hiring managers look for. You're not just a data scientist—you're a **full-stack ML engineer**.

**Timeline:** 3–4 weeks part-time (2 hrs/day). Start with FastAPI (2 days) → Docker (1 day) → MLflow (2 days) → GitHub Actions (1 day) → Evidently (2 days) → Airflow (2 days) → Cloud Run (2 days).