Large-Scale Evaluation and Annotation of AI-Generated Text for Quality and Safety
Conducted large-scale text-data labeling and evaluation projects focused on the quality, safety, and alignment of AI-generated content. Annotated and rated model outputs for coherence, factual accuracy, contextual relevance, bias, and harmful content across multiple NLP tasks, including question answering, summarization, and prompt–response generation. Designed and applied detailed annotation guidelines, performed multi-pass quality checks, and contributed to RLHF pipelines by writing high-quality prompts and reference responses. Projects spanned millions of text samples, with strict inter-annotator agreement thresholds and automated validation workflows to ensure labeling consistency and reliability.
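The inter-annotator agreement gating mentioned above can be sketched as a Cohen's kappa check between two annotators. This is a hypothetical illustration: the actual metrics, tooling, labels, and threshold values used in the project are not specified here, and the 0.7 cutoff below is an assumed placeholder.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(labels_a) | set(labels_b)
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical batch gate: release a labeled batch only if agreement
# clears a threshold (0.7 is an assumed value, not the project's).
KAPPA_THRESHOLD = 0.7
a = ["safe", "safe", "harmful", "safe", "harmful", "safe"]
b = ["safe", "safe", "harmful", "harmful", "harmful", "safe"]
kappa = cohens_kappa(a, b)
print(f"kappa = {kappa:.2f}, batch passes = {kappa >= KAPPA_THRESHOLD}")
```

Kappa corrects raw percent agreement for agreement expected by chance, which matters when label distributions are skewed (e.g. most samples are "safe").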