AI Trainer / LLM Evaluator
This ongoing role focuses on evaluating and training large language models (LLMs), with an emphasis on prompt/response quality and factual accuracy. Responsibilities include rating LLM outputs against standardized rubrics and spotting common error patterns such as hallucinations and irrelevant responses. The role also involves creating training examples and supporting dataset quality through QA spot-checking.

• Assessed LLM responses for instruction-following and content clarity based on defined grading criteria.
• Generated prompt/response training examples under strict formatting and constraint adherence.
• Identified and documented common LLM failure modes, providing targeted feedback for improvement.
• Performed QA checks on labeled datasets, identifying and addressing issues such as duplicates and inconsistencies.