AI Content Evaluator & Data Labeler
Evaluated large language model (LLM) responses against four dimensions: Truthfulness, Instruction Following, Harmlessness, and Localization. Wrote a detailed logical justification for each rating and flagged model hallucinations to improve response accuracy.