LLM Response Evaluation Project Contributor
In the Large Language Model (LLM) Response Evaluation Project, I performed RLHF-based assessments of AI-generated responses. My tasks included ranking outputs, evaluating reasoning quality, and providing feedback to support model improvement, following detailed guidelines to maintain consistent, high-quality evaluations.
• Compared and ranked multiple LLM outputs on reasoning, accuracy, and usefulness.
• Detected hallucinations, inconsistencies, and factual errors in AI-generated text.
• Documented structured feedback with thorough justifications for each evaluation.
• Used internal evaluation platforms to ensure standardized annotation processes.