AI Model Evaluator & Data Reviewer (Independent Contractor)
This role involved evaluating AI-generated responses for accuracy, reasoning, and guideline adherence, and writing and refining prompts for large language model performance testing. The work included ranking model outputs and analyzing them for issues such as hallucinations and errors, reviewing and labeling text-to-speech (TTS) audio datasets for script accuracy, and verifying emotional cues and audio clarity. Multi-turn AI conversations were also assessed, and evaluation rubrics were revised for greater clarity and specificity.