AI Output Evaluator and Tester
I contributed to AI implementation projects by testing custom chat interfaces integrated with Gemini and DeepSeek models. My focus was identifying and labeling hallucinations and conducting detailed evaluations of model outputs, ensuring the quality and reliability of model-generated responses.

• Tested and rated numerous AI outputs for factual accuracy
• Labeled instances of hallucinations and errors to support further model tuning
• Documented evaluation procedures to establish future annotation standards
• Supported iterative model improvement cycles