AI Data Engineer – LLM Evaluation & Financial Model QA
Reviewed and scored large language model (LLM) outputs across English, Spanish, and French datasets. Applied structured rubrics to measure helpfulness, factuality, coherence, and safety compliance. Authored and refined prompts for red-teaming exercises, identifying potential policy violations and recommending safer rewrites. Collaborated with cross-functional teams to keep scoring consistent and mitigate bias. Achieved a QA pass rate above 99% and contributed to improved LLM alignment and dataset integrity.