AI Language Evaluator (Contract / Project-Based)
As an AI Language Evaluator at Welocalize, contributed to AI training and model reliability through systematic prompt authoring, response evaluation, and fact verification. Consistently reviewed and rated AI-generated English and multilingual outputs for clarity, correctness, and safety alignment. Applied evolving annotation and evaluation guidelines to ensure high data quality and inter-reviewer consistency.
• Conducted adversarial testing by stress-testing models with ambiguous and edge-case prompts.
• Delivered detailed written justifications for evaluation decisions, supporting iterative quality improvements.
• Flagged and documented systematic weaknesses in LLM behaviour, such as overgeneralisation and cultural misalignment.
• Performed side-by-side ranking of model outputs to assess tone drift, logical consistency, and safety risks.