💼System Features

This page lists the key capabilities of the system and the value they deliver to the business.

Sequential Processing Pipeline: A single, orchestrated workflow ensures every record goes through a predictable 5-step check (Customer, Liveness, Demo, Bio, Match).
External Liveness Verification: Integrates with the T5-LDS API to stop spoofing attempts (e.g., using a photo of a photo) at the start of the pipeline.
Dual-Layer Matching:
1. Demographic: Uses Elasticsearch and rapidfuzz for high-speed fuzzy matching on names, DOB, etc.
2. Biometric: Integrates with T5-ABIS for high-accuracy 1:N facial identification.
Centralized Adjudication Queue: All DUPLICATE and FRAUD records are automatically routed to the adjudication table, with all evidence (scores, match IDs, reasons) attached.
Manual Resolution Fast-Path: If an admin manually resolves a record in the adjudication table (resolution_status = 'NEW'), the system will automatically "fast-path" it into the customers table upon re-ingestion, bypassing all checks.
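
The control flow described in the features above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual code: the `Finding` and `Result` shapes, the `process` function, and the stub steps are all hypothetical, and the sketch assumes that the pipeline stops at the first step that flags a record and that a failed liveness check is flagged as FRAUD. Only the step order, the DUPLICATE/FRAUD flags, the `customers` and `adjudication` destinations, and the `resolution_status = 'NEW'` fast-path come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Step order as listed in the feature description.
STEP_ORDER = ("Customer", "Liveness", "Demo", "Bio", "Match")


@dataclass
class Finding:
    """Evidence produced by a step that flags a record (hypothetical shape)."""
    flag: str                       # "DUPLICATE" or "FRAUD"
    reason: str
    score: Optional[float] = None
    match_id: Optional[str] = None


@dataclass
class Result:
    destination: str                # "customers" or "adjudication"
    steps_run: List[str] = field(default_factory=list)
    finding: Optional[Finding] = None


# A step returns None to pass the record on, or a Finding to stop the pipeline.
Step = Callable[[dict], Optional[Finding]]


def process(record: dict, steps: Dict[str, Step],
            resolution_status: Dict[str, str]) -> Result:
    # Fast-path: a record already resolved by an admin skips every check.
    if resolution_status.get(record["id"]) == "NEW":
        return Result(destination="customers")

    result = Result(destination="customers")
    for name in STEP_ORDER:
        result.steps_run.append(name)
        finding = steps[name](record)
        if finding is not None:
            # Flagged records go to adjudication with their evidence attached.
            result.destination = "adjudication"
            result.finding = finding
            break
    return result


def passes(_record: dict) -> None:
    """Stub step that never flags anything."""
    return None


def liveness_stub(record: dict) -> Optional[Finding]:
    """Stub standing in for the external liveness call (assumption)."""
    if record.get("spoofed"):
        return Finding(flag="FRAUD", reason="liveness check failed")
    return None


steps = {name: passes for name in STEP_ORDER}
steps["Liveness"] = liveness_stub

clean = process({"id": "r1"}, steps, {})
spoof = process({"id": "r2", "spoofed": True}, steps, {})
resolved = process({"id": "r2", "spoofed": True}, steps, {"r2": "NEW"})
```

In this sketch `clean` runs all five steps and lands in `customers`, `spoof` stops at the Liveness step and is routed to `adjudication`, and `resolved` reaches `customers` without running any step because its adjudication entry is marked 'NEW'.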

These capabilities deliver the following business value:

Data Integrity: Removes duplicates and corrects errors, creating a single source of truth.
Operational Efficiency: Automation drastically reduces the manual effort spent on checking records.
Fraud Reduction: Proactively identifies and mitigates identity fraud risks.
Enhanced Data Quality: Improves data accuracy and strengthens security.

To achieve the business objectives, the system is built to meet the following technical goals:

High Throughput (Batch): The system must be capable of processing the existing backlog of 100 million records from the data warehouse in an efficient, timely manner.
High Throughput (Real-time): The architecture must support a real-time ingestion rate of 70-100 Records Per Second (RPS).
Accuracy: Implement advanced fuzzy demographic and 1:N biometric matching to accurately identify both hard and soft duplicates.
Resilience: The use of Kafka and Celery's retry mechanisms ensures that if an external API (like T5-LDS) fails, records are not lost and processing is retried.
Scalability: The system scales horizontally. The Celery workers are managed by KEDA to automatically add more pods as the raw_records Kafka lag increases.
Auditability & Manual Review: The system provides a clear audit trail and routes all flagged records to the adjudication queue for a final human decision.
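
To put the two throughput goals side by side, the arithmetic below estimates how long the 100 million record backlog would take to drain at a sustained rate. It is a back-of-the-envelope sketch: the function names are hypothetical, the 7-day window is an illustrative target rather than a stated requirement, and it assumes a constant rate with no retries or downtime. The batch path may well run at a different rate than the real-time figure of 70-100 RPS.

```python
BACKLOG_RECORDS = 100_000_000   # backlog size from the batch goal
SECONDS_PER_DAY = 86_400


def drain_days(records: int, rps: float) -> float:
    """Days needed to process `records` at a sustained `rps`."""
    return records / rps / SECONDS_PER_DAY


def required_rps(records: int, days: float) -> float:
    """Sustained rate needed to process `records` within `days`."""
    return records / (days * SECONDS_PER_DAY)


for rps in (70, 100):
    print(f"{rps} RPS -> {drain_days(BACKLOG_RECORDS, rps):.1f} days")
# 70 RPS -> 16.5 days
# 100 RPS -> 11.6 days

# Hypothetical target: clear the backlog within one week.
print(f"{required_rps(BACKLOG_RECORDS, 7):.0f} RPS needed for a 7-day window")
# 165 RPS needed for a 7-day window
```

At the real-time rate alone, the backlog would take roughly 12 to 17 days of uninterrupted processing, which gives a sense of how much extra capacity a shorter batch window would require.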

Last updated 1 month ago