AI Evaluation & Annotation Specialist
As an AI Evaluation & Annotation Specialist, I performed Reinforcement Learning from Human Feedback (RLHF) annotation and multimodal data labeling across text, image, and video tasks. I consistently maintained above-benchmark inter-annotator agreement and robust dataset quality assurance in high-throughput remote workflows. I also ensured rigorous calibration, documented prompt failures, and carried out comparative analysis across multiple large language models.
• Annotated and validated 70,000+ data points spanning text, images, and videos using RLHF frameworks
• Conducted comparative A/B evaluations of outputs from Claude, GPT-4o, Gemini, Grok, and Perplexity
• Performed in-depth quality audits, flagging safety violations, unsupported claims, and instruction-following errors
• Used tools and platforms including Outlier AI, Remotasks, Labelbox, CVAT, Supervisely, and V7 Darwin