LLM Response Evaluation & Preference Ranking
- Evaluated large language model responses against detailed project guidelines, focusing on accuracy, relevance, helpfulness, and instruction-following.
- Performed preference ranking across multiple candidate responses and provided structured, guideline-aligned justifications for each decision.
- Reviewed edge cases involving partial correctness, ambiguity, hallucinations, and policy-sensitive content.
- Ensured consistent, objective evaluations and maintained high quality standards across large task volumes.
- Collaborated within a quality-controlled evaluation environment emphasizing reliability and precision.