Search Relevance & LLM Response Evaluation Specialist
Worked on large-scale AI training and evaluation projects focused on search relevance, language quality, and LLM response assessment. Responsibilities included rating search results and model-generated responses against detailed guideline frameworks covering relevance, accuracy, helpfulness, safety, and linguistic quality. Performed fine-grained evaluation of model outputs for fluency, tone, coherence, and instruction-following, identifying issues related to ambiguity, hallucinations, bias, and factual inconsistency. Contributed to improving model performance by providing structured feedback on error patterns and edge cases. Handled high-volume annotation tasks while maintaining strict quality standards and consistency across evolving guideline updates. Regularly worked with nuanced judgment scenarios requiring deep reading comprehension, contextual reasoning, and linguistic sensitivity. Maintained strong quality metrics through adherence to calibration feedback and consensus alignment.