LLM Evaluation and Text Classification Project
Worked as a freelance data annotator on a project to train and evaluate large language models. Classified model responses for quality, accuracy, and coherence, and rated prompt-response pairs for instruction following and helpfulness. Supported quality assurance by flagging inconsistent or low-quality outputs. Wrote and reviewed user prompts and suggested model completions. Adhered to strict annotation guidelines and feedback cycles to maintain data quality, labeling hundreds of examples with >95% accuracy based on reviewer feedback.