System Features
Lists the key capabilities of the system and the positive value they deliver to the business.
Key Features
Sequential Processing Pipeline: A single, orchestrated workflow ensures every record goes through a predictable 5-step check (Customer, Liveness, Demo, Bio, Match).
External Liveness Verification: Integrates with the T5-LDS API to stop spoofing attempts (e.g., using a photo of a photo) at the start of the pipeline.
Dual-Layer Matching:
Demographic: Uses Elasticsearch and rapidfuzz for high-speed fuzzy matching on names, DOB, etc.
Biometric: Integrates with T5-ABIS for high-accuracy 1:N facial identification.
Centralized Adjudication Queue: All DUPLICATE and FRAUD records are automatically routed to the adjudication table, with all evidence (scores, match IDs, reasons) attached.
Manual Resolution Fast-Path: If an admin manually resolves a record in the adjudication table (resolution_status = 'NEW'), the system automatically "fast-paths" it into the customers table upon re-ingestion, bypassing all checks.
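The flow described above (ordered checks, routing of flagged records with their evidence, and the fast-path for manually resolved records) can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the function and field names are hypothetical, the liveness check is a stub rather than a T5-LDS call, and the demographic check uses the standard library's difflib as a stand-in for Elasticsearch and rapidfuzz. Stopping at the first failed check is also an assumption.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# DUPLICATE and FRAUD come from the text above; PASS is a hypothetical label.
PASS, DUPLICATE, FRAUD = "PASS", "DUPLICATE", "FRAUD"


@dataclass
class CheckResult:
    status: str
    evidence: dict = field(default_factory=dict)


def liveness_check(record):
    # Stand-in for the T5-LDS call: trusts a precomputed boolean flag.
    if record.get("live"):
        return CheckResult(PASS)
    return CheckResult(FRAUD, {"reason": "liveness failed"})


def make_demographic_check(known_names, threshold=0.9):
    # Stand-in for Elasticsearch + rapidfuzz: stdlib similarity on names only.
    def check(record):
        name = record["name"].lower()
        for known in known_names:
            score = SequenceMatcher(None, name, known.lower()).ratio()
            if score >= threshold:
                return CheckResult(DUPLICATE, {"match": known, "score": round(score, 3)})
        return CheckResult(PASS)
    return check


def run_pipeline(record, checks, resolved_ids, customers, adjudication):
    """Run the checks in order and return the table the record landed in."""
    # Fast-path: a record an admin already resolved bypasses every check.
    if record["id"] in resolved_ids:
        customers.append(record)
        return "customers"

    for step, check in checks:
        result = check(record)
        if result.status != PASS:
            # Assumption: stop at the first failure and attach the evidence.
            adjudication.append({
                "record": record,
                "step": step,
                "status": result.status,
                "evidence": result.evidence,
            })
            return "adjudication"

    customers.append(record)
    return "customers"
```

Passing the checks in as an ordered list of (name, function) pairs keeps the sequence explicit, so the remaining steps (for example the T5-ABIS biometric match) could be appended without changing run_pipeline.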
Key Benefits
Data Integrity: Removes duplicates and corrects errors, creating a single source of truth.
Operational Efficiency: Automated screening drastically reduces manual effort, leaving only flagged records for human review.
Fraud Reduction: Proactively identifies and mitigates identity fraud risks.
Enhanced Data Quality: Improves data accuracy and enhances security measures.
Technical Goals & Performance Targets
To achieve the business objectives, the system is built to meet the following technical goals:
High Throughput (Batch): The system must be capable of processing the existing backlog of 100 million records from the data warehouse in an efficient, timely manner.
High Throughput (Real-time): The architecture must support a real-time ingestion rate of 70-100 Records Per Second (RPS).
Accuracy: Implement advanced fuzzy demographic and 1:N biometric matching to accurately identify both hard and soft duplicates.
Resilience: The use of Kafka and Celery's retry mechanisms ensures that if an external API (like T5-LDS) fails, records are not lost and processing is retried.
Scalability: The system scales horizontally. The Celery workers are managed by KEDA to automatically add more pods as the raw_records Kafka lag increases.
Auditability & Manual Review: The system provides a clear audit trail and routes all flagged records to the adjudication queue for a final human decision.
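The resilience goal relies on retrying failed external calls rather than dropping the record. The sketch below shows the general idea with exponential backoff in plain Python. It is illustrative only: TransientAPIError and call_with_retry are hypothetical names, and the retry count and delays are assumptions, since the actual policy is not stated here. In Celery the same behaviour is normally configured on the task itself through options such as autoretry_for, retry_backoff and max_retries.

```python
import time


class TransientAPIError(Exception):
    """Hypothetical error for a temporary failure of an external API."""


def call_with_retry(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on TransientAPIError with exponential backoff.

    The delay doubles after every failed attempt (base_delay, 2x, 4x, ...).
    Once max_retries retries are exhausted, the last error is re-raised so
    the caller can decide what to do with the record.
    """
    attempt = 0
    while True:
        try:
            return func()
        except TransientAPIError:
            if attempt >= max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

The sleep parameter is injectable so the backoff schedule can be checked without actually waiting.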