AI Model Output Evaluator & Prompt Engineer
This role involved evaluating AI-generated outputs from code review assistant models. I systematically assessed responses for correctness and logical coherence, and flagged hallucinations and other model failures. I also designed detailed prompts and test cases to probe model behavior on tricky edge cases.
• Evaluated AI model responses on programming error identification.
• Designed prompts and test cases for systematic assessment.
• Diagnosed hallucinations, reasoning errors, and incorrect generalizations.
• Provided precise technical feedback to improve model training.