Overview
Before CV scoring, the client relied on a heavy manual anonymization step: recruiters had to edit the protected fields out of every candidate profile by hand.
The challenge
Incoming CVs had mixed formats and languages. The system needed robust OCR, accurate entity detection, and immutable redaction logs while preserving source formatting.

Our approach
We implemented OCR and private NER inference on EU-hosted infrastructure. For each processed document, the pipeline outputs a redacted CV and a machine-readable compliance log.
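The flow described above can be sketched roughly as follows. This is a minimal illustration, not the delivered implementation: it assumes OCR has already produced plain text, it uses two regular expressions as a stand-in for the private NER model, and every name in it (`PATTERNS`, `Entity`, `detect_entities`, `redact`, `process`) is hypothetical. The bracketed placeholders also do not attempt the formatting-preserving redaction mentioned in the case study.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical stand-in for the private NER model: two regexes, illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "DATE_OF_BIRTH": re.compile(r"\b\d{2}[./-]\d{2}[./-]\d{4}\b"),
}


@dataclass(frozen=True)
class Entity:
    label: str  # protected-attribute category
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


def detect_entities(text: str) -> list[Entity]:
    """Return every pattern match, ordered by position in the text."""
    found = [
        Entity(label, m.start(), m.end())
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


def redact(text: str, entities: list[Entity]) -> str:
    """Replace each span with its label, working backwards so offsets stay valid."""
    out = text
    for e in sorted(entities, key=lambda e: e.start, reverse=True):
        out = out[:e.start] + f"[{e.label}]" + out[e.end:]
    return out


def process(document_id: str, text: str) -> tuple[str, str]:
    """Produce the two outputs: redacted text and a JSON log of what was masked."""
    entities = detect_entities(text)
    log = {
        "document_id": document_id,
        "redactions": [
            {"label": e.label, "start": e.start, "end": e.end} for e in entities
        ],
    }
    return redact(text, entities), json.dumps(log, sort_keys=True)


if __name__ == "__main__":
    redacted, log_json = process(
        "cv-001", "Email: jane.doe@example.com\nBorn: 04.11.1990\n"
    )
    print(redacted)
    print(log_json)
```

Note that the log records only labels and offsets into the original text, never the masked values themselves, so the log can be stored separately from the personal data.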
What we delivered
- OCR ingestion for scanned and text-native CVs
- Private NER model for protected attributes
- Structured JSON redaction log for each document
- Containerized module integrated into existing HR platform
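As an illustration of what a structured JSON redaction log might contain, the sketch below defines one possible record shape. The actual schema is not published in this summary, so every field name here (`document_id`, `source_sha256`, `processed_at`, `redactions`, `page`, and so on) is an assumption chosen for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Redaction:
    label: str  # protected-attribute category, e.g. "EMAIL"
    page: int   # 1-based page number in the source document
    start: int  # character offset on that page
    end: int    # offset just past the redacted span


@dataclass
class RedactionLog:
    document_id: str
    source_sha256: str  # fingerprint of the unredacted input file
    processed_at: str   # ISO 8601 timestamp in UTC
    redactions: list[Redaction] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts the nested Redaction dataclasses as well.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


def build_log(
    document_id: str,
    source_bytes: bytes,
    redactions: list[Redaction],
    now: Optional[datetime] = None,
) -> RedactionLog:
    """Assemble a log record; `now` is injectable so runs are reproducible."""
    now = now or datetime.now(timezone.utc)
    return RedactionLog(
        document_id=document_id,
        source_sha256=hashlib.sha256(source_bytes).hexdigest(),
        processed_at=now.isoformat(),
        redactions=list(redactions),
    )


if __name__ == "__main__":
    log = build_log("cv-001", b"raw file bytes", [Redaction("EMAIL", 1, 7, 27)])
    print(log.to_json())
```

Hashing the source file ties each log record to one exact input without copying any personal data into the log.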
Architecture and implementation
- No external API dependency for personal data processing
- Redaction and logging stages isolated for auditability
- EU-hosted deployment with predictable scaling
Delivery timeline
- Week 1: Input and policy mapping. Defined protected fields and redaction rules.
- Weeks 2-5: Pipeline implementation. Built the OCR, NER, and formatting-preserving redaction flow.
- Weeks 6-7: Integration and QA. Connected to the client workflow and validated multilingual accuracy.
- Week 8: Production handoff. Stabilized operations and delivered a runbook.
Compliance and controls
- Data residency enforced on EU infrastructure
- Audit-ready redaction logs
- Protected-attribute masking before downstream AI scoring
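One common way to make redaction logs tamper-evident is to chain entries by hash, so that changing any earlier record invalidates every later one. The summary does not say how immutability is achieved in this project, so the sketch below is only an assumption about one possible technique; `GENESIS`, `append_entry`, and `verify_chain` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], payload: dict) -> list[dict]:
    """Return a new chain with `payload` appended and linked to the last entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    return chain + [entry]


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edited payload or broken link fails the check."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    chain: list[dict] = []
    chain = append_entry(chain, {"document_id": "cv-001", "redactions": 2})
    chain = append_entry(chain, {"document_id": "cv-002", "redactions": 5})
    print(verify_chain(chain))
```

A hash chain only detects tampering; preventing it would still depend on storage controls such as write-once media or restricted permissions, which are outside this sketch.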
Key outcomes
- Massive reduction in recruiter manual effort
- Standardized anonymization across languages and formats
- Compliance traceability built into every processed record
This case study is presented at a summary level under NDA: the client name and some internal policy thresholds are omitted. Full technical detail and references are available on request. Contact us.
