AI Model Output Evaluation & Response Classification (RAG Systems)
Worked on AI-driven RAG and non-RAG workflows involving structured evaluation and classification of model-generated responses. Assessed outputs for accuracy, relevance, factual consistency, and adherence to annotation guidelines. Performed comparative evaluation and quality rating of responses to support dataset refinement and model improvement. Maintained 96%+ quality reliability while exceeding productivity targets, ensuring consistent labeling standards and reducing rework. Identified edge cases and helped refine guideline clarity, improving annotation consistency across the team.