💼 System Features

This section lists the system's key capabilities and the business value they deliver.

Key Features

  • Sequential Processing Pipeline: A single, orchestrated workflow ensures every record goes through a predictable 5-step check (Customer, Liveness, Demo, Bio, Match).

  • External Liveness Verification: Integrates with the T5-LDS API to stop spoofing attempts (e.g., using a photo of a photo) at the start of the pipeline.

  • Dual-Layer Matching:

    1. Demographic: Uses Elasticsearch and rapidfuzz for high-speed fuzzy matching on names, DOB, etc.

    2. Biometric: Integrates with T5-ABIS for high-accuracy 1:N facial identification.

  • Centralized Adjudication Queue: All DUPLICATE and FRAUD records are automatically routed to the adjudication table, with all evidence (scores, match IDs, reasons) attached.

  • Manual Resolution Fast-Path: If an admin manually resolves a record in the adjudication table (resolution_status = 'NEW'), the system automatically "fast-paths" it into the customers table upon re-ingestion, bypassing all checks.
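The features above can be sketched as a single function: check for a prior manual resolution first, then run the ordered checks and route the first failure to adjudication with its evidence. This is a minimal illustration, not the system's actual code. The names `process_record`, `CheckResult`, `liveness_stub`, `ok_stub`, and the `spoofed` flag are hypothetical, in-memory dicts stand in for the customers and adjudication tables, and the stubs stand in for the real T5-LDS, Elasticsearch/rapidfuzz, and T5-ABIS integrations.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckResult:
    passed: bool
    status: str = "OK"  # "DUPLICATE" or "FRAUD" when a check fails
    evidence: dict = field(default_factory=dict)


# A check takes a record and returns a CheckResult.
Check = Callable[[dict], CheckResult]


def process_record(record: dict,
                   checks: list[tuple[str, Check]],
                   customers: dict,
                   adjudication: dict) -> str:
    """Run one record through the ordered checks (hypothetical sketch)."""
    rid = record["id"]

    # Fast-path: an admin already resolved this record as 'NEW',
    # so it goes straight to customers and every check is skipped.
    prior = adjudication.get(rid)
    if prior is not None and prior.get("resolution_status") == "NEW":
        customers[rid] = record
        return "FAST_PATH"

    # Sequential pipeline: stop at the first failing step and
    # route the record to adjudication with its evidence attached.
    for step, check in checks:
        result = check(record)
        if not result.passed:
            adjudication[rid] = {
                "status": result.status,
                "failed_step": step,
                "evidence": result.evidence,
                "resolution_status": None,
            }
            return result.status

    customers[rid] = record
    return "ACCEPTED"


def liveness_stub(record: dict) -> CheckResult:
    # Stand-in for the external liveness call; a flag on the record decides.
    if record.get("spoofed"):
        return CheckResult(False, "FRAUD", {"reason": "liveness failed"})
    return CheckResult(True)


def ok_stub(record: dict) -> CheckResult:
    # Placeholder for steps not modeled in this sketch.
    return CheckResult(True)


# Step order taken from the pipeline description; the bodies are stubs.
CHECKS = [
    ("customer", ok_stub),
    ("liveness", liveness_stub),
    ("demo", ok_stub),
    ("bio", ok_stub),
    ("match", ok_stub),
]
```

In this sketch, a record that fails liveness lands in `adjudication` with `status` set to `"FRAUD"`. If its `resolution_status` is later set to `'NEW'` and the same record is processed again, the function returns `"FAST_PATH"` without calling any check.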

Key Benefits

  • Data Integrity: Removes duplicates and corrects errors, creating a single source of truth.

  • Operational Efficiency: Automates duplicate and fraud screening, so manual effort is reserved for records routed to the adjudication queue.

  • Fraud Reduction: Proactively identifies and mitigates identity fraud risks.

  • Enhanced Data Quality: Improves data accuracy and enhances security measures.

Technical Goals & Performance Targets

To achieve the business objectives, the system is built to meet the following technical goals:

  1. High Throughput (Batch): The system must be capable of processing the existing backlog of 100 million records from the data warehouse in an efficient, timely manner.

  2. High Throughput (Real-time): The architecture must support a real-time ingestion rate of 70 to 100 records per second (RPS).

  3. Accuracy: Implement advanced fuzzy demographic and 1:N biometric matching to accurately identify both hard and soft duplicates.

  4. Resilience: Kafka and Celery's retry mechanisms ensure that if an external API (such as T5-LDS) fails, records are not lost and processing is retried.

  5. Scalability: The system scales horizontally. The Celery workers are managed by KEDA to automatically add more pods as the raw_records Kafka lag increases.

  6. Auditability & Manual Review: The system provides a clear audit trail and routes all flagged records to the adjudication queue for a final human decision.
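To put the two throughput goals side by side, the short calculation below estimates how long the 100 million record backlog would take to drain at a sustained rate. It assumes, purely for illustration, that the batch path runs at the same 70 to 100 RPS as the real-time target. The goals above do not state a batch rate, and `days_to_drain` is a hypothetical helper name.

```python
BACKLOG_RECORDS = 100_000_000  # backlog size from the batch goal
SECONDS_PER_DAY = 86_400


def days_to_drain(records: int, rps: float) -> float:
    """Days needed to process `records` at a sustained rate of `rps`."""
    if rps <= 0:
        raise ValueError("rps must be positive")
    return records / rps / SECONDS_PER_DAY


# Assumption: the batch rate equals the real-time target range.
for rps in (70, 100):
    print(f"{rps} RPS -> {days_to_drain(BACKLOG_RECORDS, rps):.1f} days")
# 70 RPS -> 16.5 days
# 100 RPS -> 11.6 days
```

Under that assumption the backlog takes roughly 12 to 17 days of continuous processing. This is one way to make "timely" concrete when sizing worker capacity.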
