AI Model Evaluator (Freelance / Contract)
Evaluated and annotated large language model outputs for tasks such as summarisation, question answering, code generation, and creative writing. Assessed model responses for factual accuracy, harmlessness, helpfulness, and instruction compliance in support of RLHF training pipelines. Documented failure modes, hallucinations, and reasoning errors, providing a structured rationale for each annotation decision.
• Maintained a quality score above 95% across annotation tasks.
• Crafted adversarial prompts to probe model robustness in red-teaming exercises.
• Tracked inter-annotator agreement metrics to verify annotation reliability.
• Produced detailed written evaluations to inform model improvements.