Code Extensions
In this project, I evaluated and annotated code outputs generated by two AI "personal assistant" models.

- Scope: Rated the correctness of tool calls (e.g., browsing, search), verified parameter usage, and assessed each code snippet's functionality.
- Tasks: Labeled ~500 code segments for accuracy, style, and adherence to internal guidelines, then provided feedback on improvements.
- Quality Measures: Followed a detailed rubric covering instruction following, truthfulness, and harmlessness, and maintained a 95%+ quality threshold through regular spot checks and QA reviews.
- Purpose: Results were fed back, using the Delphi technique to converge on consensus ratings, to refine the models' coding capabilities and improve overall response quality in future development.
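The labeling workflow above can be sketched in code. This is a hypothetical illustration, not the project's actual tooling: the rubric dimensions (instruction following, truthfulness, harmlessness) and the 95% threshold come from the description, while the `Annotation` structure, segment IDs, and scores are invented for the example.

```python
# Hypothetical sketch of rubric-based labeling with a quality threshold.
# Rubric dimensions and the 95% bar come from the project description;
# data structures and example scores are illustrative assumptions.
from dataclasses import dataclass

RUBRIC = ("instruction_following", "truthfulness", "harmlessness")

@dataclass
class Annotation:
    segment_id: str
    scores: dict   # rubric dimension -> 1 (pass) or 0 (fail)
    notes: str = ""

    def passes(self) -> bool:
        # A segment passes only if every rubric dimension passes.
        return all(self.scores.get(dim, 0) for dim in RUBRIC)

def quality_rate(annotations) -> float:
    """Fraction of annotated segments meeting the full rubric."""
    if not annotations:
        return 0.0
    return sum(a.passes() for a in annotations) / len(annotations)

# Example spot check against the 95% quality threshold.
batch = [
    Annotation("seg-001", {"instruction_following": 1,
                           "truthfulness": 1, "harmlessness": 1}),
    Annotation("seg-002", {"instruction_following": 1,
                           "truthfulness": 0, "harmlessness": 1},
               notes="claimed API parameter does not exist"),
]
rate = quality_rate(batch)
print(f"batch quality: {rate:.0%}, threshold met: {rate >= 0.95}")
# prints "batch quality: 50%, threshold met: False"
```

In practice a QA review would sample batches like this and flag any batch whose pass rate falls below the threshold for re-annotation.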