LLM Text Evaluation & Instruction-Following Review (Outlier/Scale AI)
Evaluated AI-generated text responses for accuracy, reasoning quality, safety, and instruction-following. Performed ranking, classification, rewriting, and chain-of-thought verification tasks. Flagged inconsistencies, hallucinations, and safety violations in responses across diverse domains to help improve LLM performance.