NLP Data Pipeline Developer for Clinical Journal Structuring
Architected a safety-focused pipeline that extracts and structures clinical data from code-mixed Hindi-English (Hinglish) journals. Developed transformer-based tools for cross-lingual synonym and term matching, and implemented strict guardrails and verification steps to keep outputs grounded in the source text and minimize errors.
• Built a semantic scorer using multilingual BERT (mBERT) for synonym identification.
• Enforced grounding and reduced redundant extractions using Jaccard (intersection-over-union) overlap logic.
• Designed for high accuracy in extracting relevant clinical information from noisy text.
• Used Pydantic-based validation to reject malformed outputs and guard against hallucinated fields.
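
The Jaccard-based grounding and redundancy step mentioned above can be illustrated with a minimal sketch. This is not the project's actual code: the function names (tokens, jaccard, is_grounded, dedupe, filter_extractions), the 0.6 threshold, the regex tokenizer, and the sample journal line are all assumptions chosen for illustration, and the example assumes romanized Hinglish text. It keeps a candidate term only if all of its tokens occur in the source, then drops candidates whose token sets overlap too heavily with one already kept.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately simple tokenizer (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_grounded(candidate: str, source: str) -> bool:
    """A candidate counts as grounded only if every one of its tokens is in the source."""
    cand = tokens(candidate)
    return bool(cand) and cand <= tokens(source)


def dedupe(candidates: list[str], threshold: float = 0.6) -> list[str]:
    """Keep a candidate only if its Jaccard overlap with every kept one is below threshold."""
    kept: list[str] = []
    for cand in candidates:
        if all(jaccard(tokens(cand), tokens(k)) < threshold for k in kept):
            kept.append(cand)
    return kept


def filter_extractions(
    candidates: list[str], source: str, threshold: float = 0.6
) -> list[str]:
    """Grounding check first, then near-duplicate removal, preserving input order."""
    grounded = [c for c in candidates if is_grounded(c, source)]
    return dedupe(grounded, threshold)


# Hypothetical journal line and model-proposed terms, for illustration only.
source = "Patient ko kal raat se tez bukhar hai, sar dard bhi hai."
candidates = ["tez bukhar", "bukhar tez", "sar dard", "chest pain"]
print(filter_extractions(candidates, source))
```

In this sketch "chest pain" is rejected because neither token appears in the source, and "bukhar tez" is dropped as a reordering of "tez bukhar". Token-set overlap is a weaker test than exact span matching, so a stricter pipeline might additionally require the candidate to appear as a contiguous substring.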