LLM Trainer
Worked as an LLM Trainer on the Computer Use Evals project, contributing to the training and evaluation of AI agents performing real-world browser and system tasks. Executed 200+ high-quality demonstrations of multi-step workflows such as file management, Docker operations, Git workflows, and cross-platform system configuration. Evaluated agent responses against structured rubrics covering task completion, factual accuracy, and overall agent performance, with strict adherence to SOPs and justification templates. Identified objective claims, validated outputs, and upheld quality standards with zero tolerance for deviations in execution paths, naming conventions, and step sequencing. Improved model reliability by delivering precise, reproducible, and well-structured training data aligned with production-level requirements.