LLM Output Evaluation & Text Annotation Specialist
Worked on large-scale language model training and evaluation datasets aimed at improving reasoning accuracy and instruction-following performance. Rated and annotated model-generated responses against structured evaluation rubrics covering correctness, logical consistency, hallucination detection, safety compliance, and relevance to the prompt. Categorized outputs, identified systematic failure modes, and produced corrective feedback signals used for reinforcement learning from human feedback (RLHF) fine-tuning. Maintained high annotation consistency across large datasets, met strict quality benchmarks, and participated in periodic quality audits and calibration tasks to ensure inter-annotator agreement and dataset reliability.
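A minimal sketch of the kind of rubric-scoring record this work involved; the field names, 1-5 score scale, and failure-mode labels are illustrative assumptions, not the actual annotation tooling:

```python
# Minimal sketch of a rubric-based annotation record; all field names,
# score scales, and failure-mode labels here are illustrative assumptions.
from dataclasses import dataclass, field, asdict
import json

RUBRIC_DIMENSIONS = (
    "correctness",
    "logical_consistency",
    "hallucination",       # 1 = severe hallucination, 5 = fully grounded
    "safety_compliance",
    "relevance",
)

@dataclass
class Annotation:
    response_id: str
    scores: dict                      # dimension -> integer score, 1-5
    failure_modes: list = field(default_factory=list)
    notes: str = ""

    def validate(self) -> None:
        # Every rubric dimension must be scored, and scores must be in range.
        for dim in RUBRIC_DIMENSIONS:
            score = self.scores.get(dim)
            if score is None:
                raise ValueError(f"missing score for {dim!r}")
            if not 1 <= score <= 5:
                raise ValueError(f"score for {dim!r} out of 1-5 range: {score}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

# Usage: score one model response and serialize it for the dataset.
record = Annotation(
    response_id="resp-0042",
    scores={
        "correctness": 4,
        "logical_consistency": 5,
        "hallucination": 2,
        "safety_compliance": 5,
        "relevance": 4,
    },
    failure_modes=["fabricated_citation"],
)
record.validate()
print(record.to_json())
```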
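Calibration audits of the kind described above typically quantify inter-annotator agreement; one standard statistic for two annotators is Cohen's kappa. A self-contained sketch, with hypothetical pass/fail labels standing in for real annotation data:

```python
# Cohen's kappa for two annotators rating the same items: observed agreement
# corrected for the agreement expected by chance. Label data is hypothetical.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b), "annotators must rate the same items"
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used a single label throughout
    return (p_o - p_e) / (1 - p_e)

annotator_1 = ["pass", "fail", "pass", "pass", "fail", "pass"]
annotator_2 = ["pass", "fail", "pass", "fail", "fail", "pass"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.3f}")  # kappa = 0.667
```

A kappa near 1.0 indicates strong agreement beyond chance; calibration tasks aim to keep the statistic above an agreed threshold before annotations enter the dataset.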