# System Features

### Key Features

* **Sequential Processing Pipeline:** A single orchestrated workflow sends every record through the same five steps in a fixed order (Customer, Liveness, Demo, Bio, Match).
* **External Liveness Verification:** Integrates with the **T5-LDS** API to stop spoofing attempts (e.g., using a photo of a photo) at the start of the pipeline.
* **Dual-Layer Matching:**
  1. **Demographic:** Uses Elasticsearch and `rapidfuzz` for high-speed fuzzy matching on demographic fields such as name and date of birth.
  2. **Biometric:** Integrates with **T5-ABIS** for high-accuracy 1:N facial identification.
* **Centralized Adjudication Queue:** All `DUPLICATE` and `FRAUD` records are automatically routed to the `adjudication` table, with all evidence (scores, match IDs, reasons) attached.
* **Manual Resolution Fast-Path:** If an admin manually resolves a record in the `adjudication` table (`resolution_status = 'NEW'`), the system fast-paths it into the `customers` table on re-ingestion, bypassing all checks.
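
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the `Record` class, the `Verdict` enum, `run_pipeline`, and the shape of each check function are all hypothetical, and only the table names (`customers`, `adjudication`), the verdict labels, and the `resolution_status = 'NEW'` condition come from the feature list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional, Sequence, Tuple


class Verdict(Enum):
    DUPLICATE = "DUPLICATE"
    FRAUD = "FRAUD"


@dataclass
class Record:
    record_id: str
    # Assumed to be copied from the `adjudication` table on re-ingestion.
    resolution_status: Optional[str] = None
    evidence: Dict[str, str] = field(default_factory=dict)


# Hypothetical check signature: return a Verdict to flag the record,
# or None to let it continue to the next step.
Check = Callable[[Record], Optional[Verdict]]


def run_pipeline(record: Record, checks: Sequence[Tuple[str, Check]]) -> str:
    """Return the name of the table the record should be written to."""
    # Fast-path: a record an admin already resolved as NEW skips every check.
    if record.resolution_status == "NEW":
        return "customers"

    # Run the steps strictly in order and stop at the first flagged result.
    for step_name, check in checks:
        verdict = check(record)
        if verdict is not None:
            # Attach evidence so the reviewer can see why it was flagged.
            record.evidence["failed_step"] = step_name
            record.evidence["verdict"] = verdict.value
            return "adjudication"

    return "customers"
```

Keeping the steps in an ordered sequence means the cheapest or most decisive check (liveness, per the list above) can run before the more expensive matching stages.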

### Key Benefits

* **Data Integrity:** Removes duplicates and corrects errors, creating a single source of truth.
* **Operational Efficiency:** Reduces manual effort by automating duplicate and fraud checks.
* **Fraud Reduction:** Proactively identifies and mitigates identity fraud risks.
* **Enhanced Data Quality:** Improves data accuracy and strengthens security.

### Technical Goals & Performance Targets

To achieve the business objectives, the system is built to meet the following technical goals:

1. **High Throughput (Batch):** The system must be able to process the existing backlog of **100 million records** from the data warehouse in a timely manner.
2. **High Throughput (Real-time):** The architecture must support a real-time ingestion rate of **70-100 records per second (RPS)**.
3. **Accuracy:** Implement advanced fuzzy demographic and 1:N biometric matching to accurately identify both hard and soft duplicates.
4. **Resilience:** Kafka and Celery's retry mechanisms ensure that when an external API (such as T5-LDS) fails, records are not lost and processing is retried.
5. **Scalability:** The system scales horizontally. KEDA manages the Celery workers and automatically adds pods as lag on the `raw_records` Kafka topic increases.
6. **Auditability & Manual Review:** The system provides a clear audit trail and routes all flagged records to the `adjudication` queue for a final human decision.
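
To put the throughput and scaling targets in perspective, the sketch below works through two rough calculations. Both rest on assumptions not stated in this document: that the backlog is drained at the real-time target rate, and that lag-based scaling follows the simple rule "replicas = ceil(total lag / lag threshold), capped at a maximum". The function names, the lag threshold, and the replica cap are hypothetical values chosen for illustration.

```python
import math

BACKLOG_RECORDS = 100_000_000  # backlog size stated in the goals above
SECONDS_PER_DAY = 86_400


def drain_days(records: int, rps: float) -> float:
    """Days needed to process `records` at a sustained rate of `rps`."""
    return records / rps / SECONDS_PER_DAY


def desired_replicas(total_lag: int, lag_threshold: int, max_replicas: int) -> int:
    """Simplified lag-based scaling rule (an approximation, not KEDA's code).

    One replica per `lag_threshold` messages of lag, capped at `max_replicas`.
    """
    if total_lag <= 0:
        return 0
    return min(max_replicas, math.ceil(total_lag / lag_threshold))


if __name__ == "__main__":
    for rps in (70, 100):
        print(f"{rps} RPS -> {drain_days(BACKLOG_RECORDS, rps):.1f} days")
    # Hypothetical scaler settings: threshold of 1,000 messages, cap of 20 pods.
    print(desired_replicas(total_lag=5_000, lag_threshold=1_000, max_replicas=20))
```

Under these assumptions, the 100 million record backlog would take roughly 11.6 days at 100 RPS and about 16.5 days at 70 RPS, which suggests the batch path needs a higher sustained rate than the real-time target if the backlog is to clear faster than that.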
