AI Data Labeling & LLM Evaluation Specialist
I specialized in evaluating and labeling AI-generated text outputs to train and enhance language models. My responsibilities included identifying common LLM failure modes, detecting hallucinations, and assessing response quality against specified rubrics. I applied structured human judgment in RLHF workflows, provided precise feedback, and ensured clarity, factual accuracy, and alignment with user intent.

• Conducted preference ranking and evaluation of AI responses
• Annotated instruction-following, clarity, and hallucination issues
• Compared model outputs for quality and user alignment
• Delivered written justifications to support model improvement