LLM Trainer (Freelance)
As an LLM Trainer at Snorkel AI, I reviewed and validated coding tasks generated from open-source pull requests, ensuring sound task structure, solution correctness, test-case sufficiency, and metadata accuracy for high-quality benchmarks used in AI agent evaluation. I evaluated coding agents using Docker-based oracle and NOP test pipelines and submitted comprehensive review reports to improve dataset reliability.
• Assessed task validity, instruction clarity, and alignment with test requirements.
• Analyzed execution logs to determine agent outcomes across models such as GPT-5 and Claude Sonnet.
• Troubleshot failure modes and timeouts to maintain benchmark standards.
• Leveraged Docker and internal validation workflows for end-to-end pipeline execution.