AI Training & Evaluation Specialist (Contract)
As an AI Training & Evaluation Specialist at Scale AI, I evaluated and refined model outputs across multiple knowledge domains, ensuring strict adherence to guidelines and benchmarks. I designed domain-specific prompts to challenge the reasoning abilities of large language models as part of high-impact research projects, and applied multi-dimensional rubrics for granular response assessment to enhance model reliability.
• Evaluated language model outputs for accuracy, factual consistency, logical flow, and ethical alignment.
• Crafted and applied multi-step prompts to comprehensively test AI instruction-following and reasoning.
• Documented evaluation justifications in structured, audit-ready language for research traceability.
• Collaborated asynchronously, adapting to evolving protocols and maintaining data confidentiality.