AI Data Annotation & LLM Evaluation Specialist (Outlier AI Projects)
I contributed to large-scale AI training and evaluation projects focused on improving the performance, reasoning, and safety of large language models (LLMs). My responsibilities included:

- Performing Reinforcement Learning from Human Feedback (RLHF) evaluations
- Rating AI-generated responses based on accuracy, coherence, helpfulness, and safety
- Writing high-quality prompts and ideal responses for supervised fine-tuning (SFT)
- Identifying hallucinations, bias, and unsafe outputs during red-teaming exercises
- Comparing multiple model outputs and selecting the most aligned response
- Applying structured rubrics and scoring guidelines to ensure consistency
- Providing detailed justification notes to support quality audits