AI Evaluation Analyst
As an AI Evaluation Analyst at RWS, I performed structured evaluation of large language model outputs for accuracy, relevance, and alignment. My work focused on designing and refining evaluation rubrics to guide objective model assessment, with particular attention to reasoning failures and policy compliance. I supported model training teams by generating analytical reports and collaborating on domain-specific evaluation criteria.

• Assessed generative AI text outputs for logical consistency and behavioral standards.
• Designed and customized evaluation rubrics for varied reasoning tasks.
• Identified ambiguities, hallucinations, and safety issues in model responses.
• Informed benchmarking and continuous model improvement through detailed documentation.