Afterquery
Worked as a Data Annotator and AI Model Evaluator on a full-stack web development benchmarking project focused on zero-to-one application builds. Designed complex, realistic prompts that simulated real-world developer workflows (e.g., React, Node.js, authentication, database integration) to test AI coding agents. Evaluated model outputs against structured rubrics covering planning, instruction following, code quality, debugging ability, tool usage, and context awareness. Conducted side-by-side model comparisons, tagged error types (e.g., ignored context, incomplete debugging, security risks), and wrote detailed scoring rationales to keep evaluations consistent and high quality.