LLM Response Evaluation & Text Annotation for AI Model Improvement
Performed large-scale text annotation and evaluation to improve LLM performance and alignment. Rated AI-generated responses for relevance, accuracy, coherence, safety, and instruction following, and performed comparative ranking of multiple model outputs, providing structured feedback to support reinforcement learning from human feedback (RLHF) pipelines. Contributed to supervised fine-tuning (SFT) by writing high-quality prompt-response pairs, correcting model hallucinations, identifying edge cases, and flagging biased or unsafe outputs. Maintained strict adherence to annotation guidelines, ensured consistency across tasks, and achieved high inter-annotator agreement scores in quality audits. Collaborated regularly with QA teams to refine labeling standards and improve dataset reliability.