AI model evaluation contractor at DataAnnotation
I worked as an AI model evaluation contractor, performing structured assessments of large language model (LLM) outputs. My responsibilities included evaluating prompts and responses, ranking them, and analyzing errors to assess performance against varied sets of criteria. I consistently provided high-quality feedback to improve the models' accuracy and alignment with specifications.
• Evaluated and rated LLM prompt–response pairs.
• Conducted error analysis and response ranking for quality assessment.
• Tailored prompts to analyze the effect of wording on model outputs.
• Delivered over 600 hours of consistent, high-quality evaluation work.