AI Systems Evaluation Analyst (RLHF) — Contractor
As an AI Systems Evaluation Analyst (RLHF), I evaluated browser-based AI agents for reasoning quality, factual accuracy, and success on real-world user tasks. My work centered on structured side-by-side model comparisons, fact-checking, and detailed analysis to identify errors and performance gaps. I provided analytical feedback and written justifications to support model alignment.
• Assessed and rated LLM outputs for reliability and logical consistency.
• Conducted structured evaluations, including side-by-side (SxS) comparisons and hallucination detection.
• Produced analytical reports that informed benchmark and alignment improvements.
• Worked remotely using the client's proprietary internal evaluation tools.