# Project Scope

The **EBS-TECH 5 Deduplication Solution** identifies and eliminates duplicate customer records in MTN Nigeria's database. Its core is the **Deduplication Engine**, which combines **biometric matching (T5-ABIS)** with **demographic analysis (Elasticsearch)**. The solution aims to improve data integrity and reduce fraud, with a target processing throughput of **70-100 records per second (RPS)**.

This project is executed in two phases.

### <mark style="color:blue;">**Phase 1: Deduplication Engine Logic (Completed)**</mark>

This phase focused on the core processing logic and the validation of the biometric/demographic pipelines.

#### In-Scope Items

* **Biometric & Demographic Engine:**
  * Integration of **T5-ABIS** for 1:N facial/fingerprint matching (Tech5).
  * Implementation of **Elasticsearch** for fuzzy demographic matching (EBS).
* **Processing of Test Records:**
  * Ingestion and processing of **100,000 test records** from MTN’s database to validate the sequential 5-step pipeline (Pre-check, Liveness, Demographic, Biometric, Match).
* **Identification & Flagging:**
  * Automated routing of duplicate entries to the **Adjudication Table**.
  * Implementation of "Fail-Fast" logic for **Liveness Detection (T5-LDS)**.
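
The sequential pipeline with "Fail-Fast" behaviour can be illustrated with a minimal sketch. This is not the engine's actual code: the step functions, the `record_id` and `liveness_score` fields, and the `0.5` threshold are all hypothetical stand-ins, and the real Liveness, Demographic, and Biometric steps would call T5-LDS, Elasticsearch, and T5-ABIS respectively.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Each step returns (ok, note); ok=False halts the pipeline immediately.
Step = Callable[[Record], Tuple[bool, str]]


def pre_check(rec: Record) -> Tuple[bool, str]:
    # Hypothetical validation: the record must carry an identifier.
    if rec.get("record_id"):
        return True, ""
    return False, "missing record_id"


def liveness(rec: Record) -> Tuple[bool, str]:
    # Hypothetical field and threshold; a real check would call T5-LDS.
    if float(rec.get("liveness_score", 0.0)) >= 0.5:
        return True, ""
    return False, "liveness check failed"


def passthrough(rec: Record) -> Tuple[bool, str]:
    # Placeholder for the Demographic, Biometric, and Match steps.
    return True, ""


STEPS: List[Tuple[str, Step]] = [
    ("pre_check", pre_check),
    ("liveness", liveness),
    ("demographic", passthrough),
    ("biometric", passthrough),
    ("match", passthrough),
]


def run_pipeline(rec: Record, steps: List[Tuple[str, Step]]) -> Tuple[str, List[str]]:
    """Run steps in order and stop at the first failure (fail-fast)."""
    executed: List[str] = []
    for name, step in steps:
        executed.append(name)
        ok, _note = step(rec)
        if not ok:
            # Later, more expensive steps are never invoked.
            return f"halted_at:{name}", executed
    return "completed", executed
```

With these stubs, a record that fails the liveness check never reaches the demographic or biometric steps, which is the point of failing fast: the costlier matching stages are skipped for records that are already disqualified.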

#### Out-of-Scope Items

* Full-featured dashboards or monitoring tools for MTN staff (frontend UI).
* Hardware upgrades for field agents.

### <mark style="color:orange;">**Phase 2: Middleware Development (The Deduplication Engine - Completed)**</mark>

This phase focused on operationalizing the Middleware, which is the fully integrated Deduplication Engine.

#### In-Scope Items

* **Middleware (Deduplication Engine) Development:**
  * Development of the **Middleware**, which acts as the comprehensive Deduplication Engine containing both the **Biometrics (Tech5)** and **Demographics (EBS)** layers.
  * Exposure of the engine via the **API Service** (`/api/v1/ingest/` and `/records`) to serve as the integration point for MTN's legacy systems.
  * Implementation of **Kafka** producers for reliable data buffering within the middleware.
* **Scaling of the Middleware:**
  * Deployment of **KEDA (Kubernetes Event-driven Autoscaling)** to allow the middleware to process MTN's full customer database (100M+ records) and handle traffic spikes.
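
To give a sense of scale, the following sketch estimates how long a full pass over the customer database would take at the stated throughput target. It assumes that 70-100 RPS is a sustained aggregate rate and that the database holds exactly 100 million records; the scope does not state either, so treat the figures as rough illustrations only.

```python
SECONDS_PER_DAY = 86_400


def days_to_process(total_records: int, records_per_second: float) -> float:
    """Estimate wall-clock days to process a backlog at a constant rate."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    return total_records / records_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for rps in (70, 100):
        print(f"{rps} RPS: {days_to_process(100_000_000, rps):.1f} days")
    # 70 RPS: 16.5 days
    # 100 RPS: 11.6 days
```

Under these assumptions a single pass takes on the order of two weeks, which suggests why event-driven autoscaling is in scope for the full-database run and for traffic spikes.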

#### Essential Integrations

* **SIM Registration Platform Connection:**
  * Real-time integration via the "High-Speed Stream" path to prevent duplicate registrations at the point of entry.
* **Fraud Management System Alerts:**
  * Publishing fraud/duplicate events to the `dedup_results` Kafka topic for consumption by MTN's Fraud Management systems.
* **Basic Dashboard:**
  * Provisioning of the **Admin API** to support basic monitoring of deduplication status and queue depths.
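
As a rough illustration of the Fraud Management integration, the sketch below builds a duplicate event and hands it to a sender for the `dedup_results` topic. Only the topic name comes from this scope; the payload fields (`record_id`, `matched_record_ids`, `event_type`, `emitted_at`) and the `send` callable are hypothetical, and the real event schema may differ.

```python
import json
from datetime import datetime, timezone
from typing import Callable, List

TOPIC = "dedup_results"  # topic name taken from the scope above


def build_event(record_id: str, matched_ids: List[str]) -> bytes:
    """Serialize a duplicate event as UTF-8 JSON (hypothetical schema)."""
    payload = {
        "record_id": record_id,
        "matched_record_ids": matched_ids,
        "event_type": "duplicate_detected",
        "emitted_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload).encode("utf-8")


def publish(send: Callable[[str, bytes], None], record_id: str,
            matched_ids: List[str]) -> None:
    """Publish one event; `send` stands in for a Kafka producer's send call."""
    send(TOPIC, build_event(record_id, matched_ids))


if __name__ == "__main__":
    # In-memory sender used here so the sketch runs without a broker.
    sent = []
    publish(lambda topic, value: sent.append((topic, value)), "rec-1", ["rec-9"])
    print(sent[0][0], json.loads(sent[0][1])["matched_record_ids"])
```

Passing the sender in as a callable keeps the event construction testable without a running Kafka cluster; in a deployment, a producer client's send method would take its place.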

#### Out-of-Scope Items

* **Core Engine Modifications:** Any structural modifications to the existing deduplication engine logic (completed in Phase 1).
* **Field Hardware:** Field agent hardware upgrades or replacements.
* **Advanced Analytics:** Business intelligence or reporting beyond duplicate tracking and basic system health (e.g., the Data Warehouse ETL process).
