Advanced AI Trainer
Project Scope: Participated in an LLM evaluation project assessing and comparing the performance of different language models at generating human-like text responses. The primary goal was to evaluate model-generated responses to a diverse set of prompts, focusing on linguistic naturalness and error categorization.
Data Labeling:
- Assessed the naturalness and conversational quality of AI-generated responses for each prompt.
- Ranked the responses from different models for each prompt, providing detailed justifications and qualitative feedback for the assigned ratings.
Project Size:
- Evaluated responses to 20,000 prompts across different LLMs.
Quality Measures:
- Utilized human review to maintain high annotation accuracy and ensure nuanced evaluation beyond automated metrics.