Overview
AI Product Photo Detector is a complete MLOps system -- not just a model, but a full production pipeline for binary classification of real vs AI-generated product images. It is built around EfficientNet-B0 with Grad-CAM explainability and trained on the CIFAKE dataset (CIFAR-10 real images vs Stable Diffusion AI-generated counterparts). The project covers every stage of the ML lifecycle: data versioning, multi-environment training, experiment tracking, model registry, containerized serving, infrastructure-as-code, CI/CD automation, and production monitoring.
Results
| Metric | Value |
|---|---|
| Accuracy | 92.8% |
| F1-Score | 93.1% |
| Precision | 92.5% |
| Recall | 93.7% |
| Inference Latency | 131ms |
Key Features
| Feature | Description |
|---|---|
| ML Model | EfficientNet-B0 with Grad-CAM explainability for interpretable predictions |
| API | FastAPI with JWT authentication, rate limiting, batch and explain endpoints |
| Training | 3 modes: Local/Docker, Google Colab, and Vertex AI |
| Monitoring | Prometheus + Grafana with 20+ metrics, 13 alert rules, and drift detection |
| Infrastructure | Terraform (5 modules, 2 environments) + Docker (5 services) + Cloud Run |
| CI/CD | GitHub Actions with 5 workflows and automated quality gates |
| Testing | 316 tests, 70%+ coverage across 3 levels (unit, integration, E2E) |
System Architecture
The pipeline follows a linear flow from data to production monitoring:
Data (DVC + GCS) → Training (PyTorch) → Registry (MLflow) → Package (Docker) → Infra (Terraform) → CI/CD (GitHub Actions) → Deploy (Cloud Run) → Serve (FastAPI) → Monitor (Prometheus + Grafana)
Each stage is automated and version-controlled. DVC tracks dataset versions in Google Cloud Storage, MLflow records every training run, Docker packages the model and API, Terraform provisions cloud infrastructure, GitHub Actions orchestrates testing and deployment, and Prometheus collects production metrics.
Model Architecture
The classification pipeline is built on EfficientNet-B0 with transfer learning:
- Base Model -- EfficientNet-B0 pretrained on ImageNet (5.3M parameters)
- Custom Head -- Dropout (0.3) + Linear layer for binary classification
- Explainability -- Grad-CAM generates visual heatmaps highlighting the regions driving each prediction
Training Configuration
| Parameter | Value |
|---|---|
| Dataset | CIFAKE (CIFAR-10 + Stable Diffusion) |
| Architecture | EfficientNet-B0 |
| Optimizer | AdamW (weight decay) |
| Scheduler | CosineAnnealingLR |
| Best Learning Rate | 3e-4 (Run 3) |
| Parameters | 5.3M (transfer learning) |
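The optimizer and scheduler rows above can be sketched as follows. The weight-decay value and epoch count are illustrative assumptions (the table does not state them), and a plain linear layer stands in for the detector:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 20  # assumed epoch count for the sketch

model = torch.nn.Linear(8, 2)  # stand-in for the EfficientNet-B0 detector
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

lrs = []
for _ in range(EPOCHS):
    # ... one training epoch over the CIFAKE loader would run here ...
    optimizer.step()   # placeholder step so the scheduler order is valid
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
# The learning rate anneals from 3e-4 toward 0 over the cosine cycle.
```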
MLOps Pipeline
Data Versioning
DVC (Data Version Control) manages dataset versions with remote storage on Google Cloud Storage. Dataset changes are tracked alongside code in Git, ensuring full reproducibility.
Experiment Tracking
MLflow manages the complete experiment lifecycle:
- Metrics Logging -- Loss, accuracy, F1-score, precision, recall per epoch
- Model Registry -- Version control with staging/production promotion
- Artifact Storage -- Model weights, confusion matrices, training curves, Grad-CAM samples
- Hyperparameter Tracking -- Learning rate, batch size, augmentation config, optimizer settings
Training Modes
| Mode | Environment | Use Case |
|---|---|---|
| Local / Docker | Local machine or Docker container | Development and quick iterations |
| Google Colab | Cloud notebook with free GPU | Prototyping with GPU acceleration |
| Vertex AI | Google Cloud managed training | Production training at scale |
Grad-CAM Explainability
Every prediction can include a Grad-CAM heatmap overlay via the /predict/explain endpoint, showing which image regions contributed most to the classification decision.
API and Serving
FastAPI serves the model through a production-grade endpoint architecture, with a JWT authentication pipeline and configurable rate limiting to prevent abuse in production:
| Endpoint | Method | Description |
|---|---|---|
| /predict | POST | Single image classification with confidence score |
| /predict/batch | POST | Batch prediction for multiple images |
| /predict/explain | POST | Prediction with Grad-CAM heatmap overlay |
| /health | GET | Health check and model status |
| /metrics | GET | Prometheus-compatible metrics |
Infrastructure
Terraform
Infrastructure is defined as code across 5 modules managing 2 environments (dev and prod):
| Module | Purpose |
|---|---|
| Cloud Run | Serverless container deployment |
| IAM | Service accounts and permissions |
| Monitoring | Alert policies and notification channels |
| Networking | VPC and firewall rules |
| Storage | GCS buckets for data and artifacts |
Docker Compose
The local development stack runs 5 services:
| Service | Port | Purpose |
|---|---|---|
| FastAPI | 8000 | Inference API |
| MLflow | 5000 | Experiment tracking UI |
| Streamlit | 8501 | Web prediction interface |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Monitoring dashboards |
Cloud Run Deployment
Production deployment targets Google Cloud Run with auto-scaling, HTTPS, and custom domain configuration. Total infrastructure cost is kept under $0.50/month through Cloud Run's scale-to-zero capability and efficient resource allocation.
CI/CD
GitHub Actions automates the full build-test-deploy cycle with 5 workflows:
CI Pipeline
- Ruff -- Python linting and formatting checks
- mypy -- Static type checking
- pytest -- 316 tests across unit, integration, and E2E levels
- CodeQL -- Security vulnerability scanning
- Docker -- Container build and validation
CD Pipeline
- Auto-deploy -- Triggered on merge to main after CI passes
- Smoke Test -- Post-deployment health and prediction validation
- Rollback -- Automatic rollback on failed smoke tests
- Quality Gate -- Model accuracy ≥ 0.85 and F1 ≥ 0.80 enforced before deployment
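The quality gate reduces to a simple threshold check; in CI it would read the metrics exported by the training run (the function name and dict keys here are hypothetical):

```python
# Thresholds from the quality gate above.
MIN_ACCURACY = 0.85
MIN_F1 = 0.80

def passes_quality_gate(metrics: dict) -> bool:
    """Return True only if both deployment thresholds are met."""
    return (
        metrics["accuracy"] >= MIN_ACCURACY
        and metrics["f1"] >= MIN_F1
    )

# The reported run (92.8% accuracy, 93.1% F1) clears both gates.
```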
Monitoring
Prometheus Metrics
20+ custom metrics tracked in production:
- Request metrics -- Latency histograms, throughput counters, error rates
- Model metrics -- Prediction confidence distribution, class balance
- System metrics -- Memory usage, active requests, queue depth
- Drift metrics -- Feature distribution shifts detected via statistical tests
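A sketch of how a few such metrics might be registered with `prometheus_client`; the metric names and label sets are illustrative, not the project's actual keys:

```python
from prometheus_client import Counter, Gauge, Histogram, generate_latest

# Request metrics: throughput counter with a status label, latency histogram.
REQUESTS = Counter("predict_requests_total", "Prediction requests", ["status"])
LATENCY = Histogram("predict_latency_seconds", "Prediction latency")
# Model metric: confidence distribution via histogram buckets.
CONFIDENCE = Histogram(
    "prediction_confidence", "Model confidence",
    buckets=[0.5, 0.7, 0.9, 0.99, 1.0],
)
# System metric: in-flight request gauge.
ACTIVE = Gauge("active_requests", "In-flight requests")

def record_prediction(latency_s: float, confidence: float, ok: bool = True):
    REQUESTS.labels(status="ok" if ok else "error").inc()
    LATENCY.observe(latency_s)
    CONFIDENCE.observe(confidence)

record_prediction(0.131, 0.97)
exposition = generate_latest().decode()  # what the /metrics endpoint serves
```

Drift detection would layer statistical tests on top of these series rather than live in the exposition itself.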
Alert Rules
13 alert rules configured across categories:
- Availability -- Service down, health check failures
- Performance -- Latency spikes, error rate thresholds
- Model -- Prediction drift, confidence anomalies
- Infrastructure -- Resource saturation, container restarts
Grafana Dashboards
Pre-configured dashboards for API performance, model behavior, and infrastructure health with automatic alerting via notification channels.
Cost Management
The entire production deployment runs at under $0.50/month:
| Resource | Cost |
|---|---|
| Cloud Run | ~$0.00 (scale-to-zero, free tier) |
| GCS Storage | ~$0.01 (small dataset and artifacts) |
| Container Registry | ~$0.10 (image storage) |
| Monitoring | ~$0.00 (free tier) |
| Total | < $0.50/month |
Tech Stack
| Category | Technologies |
|---|---|
| ML / DL | PyTorch, EfficientNet-B0, torchvision, Grad-CAM |
| API | FastAPI, Uvicorn, Pydantic, JWT auth |
| MLOps | MLflow, DVC, Model Registry, Vertex AI |
| Monitoring | Prometheus, Grafana, structlog, drift detection |
| Frontend | Streamlit |
| Infrastructure | Terraform, Docker, Docker Compose, Google Cloud Run |
| CI/CD | GitHub Actions (5 workflows), CodeQL, quality gates |
| Testing | pytest (316 tests), Ruff, mypy, 70%+ coverage |
| Cloud | Google Cloud Platform (GCS, Cloud Run, Vertex AI, IAM) |
| Dataset | CIFAKE (CIFAR-10 + Stable Diffusion) |
