RLHF and Prompt Evaluation for Code and Text Models
Contributed to large-scale data labeling and evaluation projects aimed at improving model reasoning and alignment. Evaluated and ranked AI model responses using RLHF methodologies, curated golden responses for supervised fine-tuning, and annotated code-based prompts to improve multi-language routing accuracy. Ensured data quality through double-blind reviews, consistency checks, and adherence to platform-specific annotation guidelines.