LLM Output Evaluation and Annotation
Evaluated and annotated Large Language Model (LLM) outputs for response quality, reasoning accuracy, factual correctness, and alignment with user intent. Classified responses; flagged reasoning errors, bias, hallucinations, and policy-compliance issues; and provided structured feedback to improve model performance. Prioritized consistency, quality standards, and careful judgment over high-volume mechanical labeling, maintaining quality through guideline adherence and iterative review.