AI Training / LLM Evaluation (Freelance)
Evaluated large language model (LLM) responses against established rubrics covering accuracy, relevance, clarity, and safety. Ranked and compared LLM-generated answers to ensure quality, and selected top-performing outputs for further processing. Identified and reported policy violations, inconsistencies, and substandard content per detailed guidelines.
• Reviewed LLM outputs for rubric compliance and content quality.
• Performed response ranking and comparison tasks to assess usefulness.
• Documented concise justifications and flagged problematic content.
• Maintained consistency and reliability through strict guideline adherence.