Software Engineer, AI Trainer | Contract
As an AI Trainer and Software Engineer, I authored evaluation problems for code-migration benchmarks used to assess AI agents. I designed grading pipelines and built testing environments for code-related benchmarks running in a restricted Docker runtime, validated whether AI-generated code changes matched golden responses, and analyzed failure modes for interoperability and precision errors.
• Developed and maintained Docker test environments across Python, Go, JavaScript, and TypeScript.
• Authored grading pipelines and functional pytest suites with significant test coverage.
• Conducted rubric-based scoring across multiple programming languages as part of the RLHF pipeline.
• Performed detailed failure analysis of cross-language and algorithmic errors.