Conversational AI Evaluation & Multi-Modal Data Annotation for LLM Training
This project involved large-scale conversational AI evaluation and multi-modal data annotation to support the training and fine-tuning of large language models (LLMs). I performed structured tasks such as rubric-based response evaluation and rating, pairwise comparison of AI outputs, preference feedback for reinforcement learning from human feedback (RLHF), named entity recognition (NER), text classification, sentiment labeling, and contextual accuracy assessment. I also annotated image and audio datasets for classification and validation tasks against defined taxonomies.

The project covered thousands of data samples across multiple batches, contributing to iterative model-improvement cycles. I adhered to strict quality standards: applying annotation guidelines consistently, participating in calibration sessions to improve inter-annotator agreement, supporting internal quality reviews, and maintaining 95%+ accuracy while meeting productivity and turnaround targets in a secure remote environment.
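To make the pairwise-comparison work concrete, here is a minimal sketch of what a single preference record could look like, assuming a simple prompt/response/verdict layout; the field names are hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical schema for one pairwise-comparison record; the field
# names are illustrative, not the project's actual data format.
@dataclass
class PreferencePair:
    prompt: str       # the instruction both model outputs respond to
    response_a: str   # candidate output A
    response_b: str   # candidate output B
    preferred: str    # "a", "b", or "tie", judged against the rubric
    rationale: str    # short justification required by the guidelines

pair = PreferencePair(
    prompt="Summarize the passage in two sentences.",
    response_a="The passage argues that ...",
    response_b="In short, ...",
    preferred="a",
    rationale="A stays faithful to the source; B drops the main claim.",
)
print(json.dumps(asdict(pair), indent=2))  # serialized for delivery/QA
```

Records of this shape are the raw material from which RLHF reward models are trained, which is why the rubric-grounded rationale matters as much as the verdict itself.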
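For the NER tasks, token-level labels are commonly expressed in the BIO scheme (Begin/Inside/Outside); the entity types below are illustrative placeholders for the project's defined taxonomy:

```python
# Hypothetical NER annotation in the common BIO scheme; the entity
# types (PER, ORG, LOC) stand in for the project's defined taxonomy.
tokens = ["Ada", "Lovelace", "visited", "Trinity", "College", "in", "Cambridge"]
labels = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC"]

for token, label in zip(tokens, labels):
    print(f"{token:<10} {label}")
```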
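Inter-annotator agreement of the kind tracked in calibration sessions is often quantified with Cohen's kappa, which corrects raw agreement for chance; a self-contained sketch with made-up sentiment labels (the project may have used a different metric):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label marginals.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Made-up sentiment labels from two annotators on the same ten items.
annotator_1 = ["pos", "neg", "neg", "neu", "pos", "pos", "neg", "neu", "pos", "neg"]
annotator_2 = ["pos", "neg", "neu", "neu", "pos", "pos", "neg", "neg", "pos", "neg"]

print(f"kappa = {cohen_kappa(annotator_1, annotator_2):.2f}")  # ~0.69 here
```

A kappa near 1.0 indicates near-perfect agreement, while 0 means agreement no better than chance; calibration sessions aim to push this value upward before production batches.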