AI Reasoning, Safety & Logic Evaluation
Evaluated model‑generated reasoning across complex prompts, focusing on logical coherence, factual accuracy, and safety compliance. Assessed multi‑step reasoning chains, identified hallucinations, and flagged unsafe or misleading outputs. Provided structured ratings and detailed written feedback to improve model alignment, truthfulness, and reliability. This work required strong analytical judgment, attention to subtle errors, and consistent application of nuanced evaluation guidelines across diverse reasoning tasks.