AI Code Evaluator & Data Annotator
I evaluated AI-generated JavaScript and TypeScript code samples for correctness, logical soundness, and style quality. I applied the CARE framework to rate LLM outputs, focusing on reasoning accuracy, relevance, and explanation quality, and provided ranked comparative feedback to support RLHF training pipelines, documenting subtle bugs and edge-case failures along the way (both illustrated below).

• Evaluated large batches of LLM-generated code against correctness, reasoning, and style criteria
• Applied structured quality assessment via the CARE framework
• Provided comparative feedback and rankings for RLHF model training
• Maintained annotation consistency and high inter-annotator reliability
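A minimal, hypothetical sketch of the kind of subtle bug documented in this role; the `median` function, its buggy variant, and the test values are invented for illustration and not drawn from actual annotation data:

```typescript
// Hypothetical example of a subtle bug an evaluator might flag in
// LLM-generated code: Array.prototype.sort() compares elements as
// strings by default, so numeric input is ordered lexicographically,
// and it also mutates the caller's array in place.

// Flagged pattern (buggy):
//   const median = (xs: number[]) => {
//     xs.sort();                      // "10" < "2" lexicographically
//     return xs[Math.floor(xs.length / 2)];
//   };

// Correction suggested in the written feedback:
function median(xs: number[]): number {
  if (xs.length === 0) {
    throw new RangeError("median of empty array");
  }
  // Copy before sorting to avoid mutating the input,
  // and pass a numeric comparator.
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Edge cases exercised during evaluation:
console.log(median([10, 2, 1]));   // 2 (buggy version returns 10)
console.log(median([4, 1, 3, 2])); // 2.5 (even-length input)
```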
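Likewise, a sketch of what a ranked-comparison record for RLHF training might look like; the `ComparisonAnnotation` interface and every field name here are assumptions for illustration, not an actual pipeline schema:

```typescript
// Hypothetical shape of one comparative-feedback annotation.
interface ComparisonAnnotation {
  promptId: string;
  // Model responses ordered best-to-worst by the annotator.
  ranking: string[];
  // Per-response rubric scores, parallel to the ranking order.
  criteria: {
    correctness: number; // 1-5 rubric score
    reasoning: number;
    style: number;
  }[];
  // Written justification supporting the ranking.
  rationale: string;
}

const example: ComparisonAnnotation = {
  promptId: "ts-array-utils-017",
  ranking: ["response_b", "response_a"],
  criteria: [
    { correctness: 5, reasoning: 4, style: 4 },
    { correctness: 3, reasoning: 3, style: 4 },
  ],
  rationale:
    "response_a sorts numbers lexicographically and mutates its input; " +
    "response_b handles both edge cases correctly.",
};

console.log(JSON.stringify(example, null, 2));
```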