AI Code Evaluation and Model Training
AI code evaluation project on Anthropic's feedback platform, evaluating two AI models side by side on real open-source GitHub repositories. Tasks include writing coding prompts, reviewing model-generated code turn by turn, rating responses across eight quality dimensions (logic, correctness, naming, organization, interface design, error handling, documentation, and production readiness), writing detailed technical feedback, and selecting the better model with justification. I also review model attempts at fixing real GitHub issues, checking for correctness, regressions, and code quality. The project involves Python, JavaScript, and TypeScript codebases of varying size and complexity. Quality measures include structured evaluation rubrics, actionable improvement suggestions with specific file and function references, and multi-turn conversations guiding models toward merge-ready code.
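The rubric-based comparison described above could be modeled as a simple data structure. The sketch below is illustrative only: the field names, the 1-to-5 scale, and the sum-based tie-break are assumptions, not the platform's actual schema or scoring rules.

```python
# Hypothetical sketch of a per-response rubric rating and a side-by-side
# comparison of two models. Dimension names mirror the description above;
# the 1-5 scale and total-score comparison are illustrative assumptions.
from dataclasses import dataclass, fields


@dataclass
class RubricRating:
    # Each dimension scored 1 (poor) to 5 (excellent) -- assumed scale.
    logic: int
    correctness: int
    naming: int
    organization: int
    interface_design: int
    error_handling: int
    documentation: int
    production_readiness: int

    def total(self) -> int:
        # Sum all dimension scores for a simple aggregate.
        return sum(getattr(self, f.name) for f in fields(self))


def pick_better(model_a: RubricRating, model_b: RubricRating) -> str:
    """Select the higher-scoring model (ties go to 'A'; assumed rule)."""
    return "A" if model_a.total() >= model_b.total() else "B"


a = RubricRating(4, 5, 4, 4, 3, 4, 4, 3)
b = RubricRating(4, 4, 3, 4, 3, 3, 4, 3)
print(pick_better(a, b))  # prints "A": model A's total (31) beats B's (28)
```

In practice the per-dimension scores, not just an aggregate, would drive the written feedback, since each low-scoring dimension maps to a concrete improvement suggestion.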