LLM Response Evaluation & Code Quality Annotation (RLHF)
Performed Reinforcement Learning from Human Feedback (RLHF) data annotation on the Outlier (Scale AI) platform, evaluating and ranking Large Language Model (LLM) outputs. The project involved grading and labeling AI-generated responses across multiple quality dimensions, including coding correctness, logical reasoning, factual accuracy, and instruction adherence. Key tasks:

- Comparing responses side by side and recording preference rankings (a sketch of one possible record format follows this list).
- Identifying hallucinations and factual errors in model outputs.
- Annotating AI-generated Python and JavaScript code for syntax errors, unhandled edge cases, and performance issues (an illustrative annotation appears below).
- Classifying responses by helpfulness, harmlessness, and overall quality.

Followed strict rubric-based annotation guidelines and rigorous quality-control standards to ensure data consistency and reliability for model fine-tuning pipelines.
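The platform's actual record schema is internal, so the following is a minimal sketch, assuming a hypothetical JSON-style format, of what a side-by-side preference annotation might capture. All field names here are illustrative assumptions, not the platform's real API.

```python
from dataclasses import dataclass, field, asdict
import json

# Hypothetical schema for a pairwise preference annotation.
# Field names are illustrative; the real platform format is not public.
@dataclass
class PreferenceAnnotation:
    prompt: str
    response_a: str
    response_b: str
    preferred: str                                        # "A", "B", or "tie"
    dimension_scores: dict = field(default_factory=dict)  # rubric scores, e.g. 1-5
    notes: str = ""                                       # free-text justification

record = PreferenceAnnotation(
    prompt="Write a Python function that reverses a string.",
    response_a="def rev(s): return s[::-1]",
    response_b="def rev(s): return ''.join(reversed(s))",
    preferred="A",
    dimension_scores={"correctness": 5, "readability": 4, "instruction_adherence": 5},
    notes="Both are correct; A is more idiomatic and concise.",
)
print(json.dumps(asdict(record), indent=2))
```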
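To illustrate the code-annotation task, the snippet below shows the kind of edge-case defect flagged in AI-generated Python: a function that is syntactically valid but crashes on empty input. The example is constructed for this write-up, not taken from project data.

```python
# AI-generated function under review (constructed example):
def average(xs):
    return sum(xs) / len(xs)   # FLAG: ZeroDivisionError when xs is empty

# Corrected version proposed in the annotation:
def average_fixed(xs):
    if not xs:                 # guard the empty-sequence edge case
        raise ValueError("average() of an empty sequence is undefined")
    return sum(xs) / len(xs)

assert average_fixed([1, 2, 3]) == 2.0
```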
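Quality control on annotation projects of this kind commonly includes measuring inter-annotator agreement; Cohen's kappa is one standard metric for that. The sketch below shows the general technique and is not specific to the platform's internal QC tooling.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Two annotators' preference labels over the same six comparisons (made-up data):
ann1 = ["A", "B", "A", "tie", "B", "A"]
ann2 = ["A", "B", "B", "tie", "B", "A"]
print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")  # ~0.74: substantial agreement
```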