LLM Response Evaluation & Text Classification for AI Model Training
Evaluated AI-generated responses for relevance, factual accuracy, coherence, safety, and instruction-following quality. Performed pairwise comparisons to rank model outputs and wrote detailed rationales supporting each evaluation decision as part of RLHF (Reinforcement Learning from Human Feedback) workflows. Additionally, completed large-scale text classification tasks, including sentiment analysis, topic categorization, and intent labeling, across diverse datasets, adhering strictly to annotation guidelines and maintaining labeling consistency across thousands of data points.
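The pairwise-comparison workflow above can be sketched in a few lines: a judgment record pairs two model responses with a preference and a rationale, and a simple agreement rate between two annotators gives one crude measure of labeling consistency. All field names here are illustrative assumptions, not any particular platform's schema.

```python
from dataclasses import dataclass

@dataclass
class PairwiseJudgment:
    prompt: str
    response_a: str
    response_b: str
    preferred: str   # "a" or "b" — which response the annotator ranked higher
    rationale: str   # free-text reasoning supporting the preference

def agreement_rate(judgments_1, judgments_2):
    """Fraction of items on which two annotators picked the same response."""
    matches = sum(j1.preferred == j2.preferred
                  for j1, j2 in zip(judgments_1, judgments_2))
    return matches / len(judgments_1)

# Two hypothetical annotators labeling the same two comparison items.
ann1 = [PairwiseJudgment("Q1", "A", "B", "a", "more factually accurate"),
        PairwiseJudgment("Q2", "A", "B", "b", "clearer instructions")]
ann2 = [PairwiseJudgment("Q1", "A", "B", "a", "fewer errors"),
        PairwiseJudgment("Q2", "A", "B", "a", "more concise")]

print(agreement_rate(ann1, ann2))  # → 0.5
```

In practice, chance-corrected statistics such as Cohen's kappa are preferred over raw agreement for monitoring annotator consistency at scale.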