AI Evaluation & Annotation Specialist
As an AI Evaluation & Annotation Specialist, I was responsible for data annotation tailored to LLM fine-tuning and generative AI dataset creation. My work included benchmarking large language models, evaluating AI outputs for quality, and conducting prompt engineering and response ranking. I tracked model accuracy, bias, and relevance metrics while ensuring high-quality annotations for instruction-tuning datasets.
• Labeled and rated text and code outputs for coherence, safety, and correctness.
• Evaluated and benchmarked responses from conversational and generative AI systems.
• Designed and executed quality assessment methodologies for LLM results (see the sketch below).
• Applied prompt engineering to assess and improve LLM task performance.
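A minimal sketch of the rating and response-ranking workflow described above, for illustration only: the record fields, 1–5 rubric scale, and helper names (AnnotationRecord, rank_responses, exact_agreement) are assumptions, not tooling from the role itself.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class AnnotationRecord:
    """One annotator's ratings for a single model response (assumed 1-5 scale)."""
    response_id: str
    annotator_id: str
    coherence: int
    safety: int
    correctness: int

    def overall(self) -> float:
        # Unweighted mean of the three rubric dimensions.
        return mean([self.coherence, self.safety, self.correctness])


def rank_responses(records: List[AnnotationRecord]) -> List[str]:
    """Rank candidate responses by their mean overall score across annotators."""
    scores: Dict[str, List[float]] = {}
    for rec in records:
        scores.setdefault(rec.response_id, []).append(rec.overall())
    return sorted(scores, key=lambda rid: mean(scores[rid]), reverse=True)


def exact_agreement(a: List[int], b: List[int]) -> float:
    """Fraction of items where two annotators gave identical ratings;
    a coarse proxy for annotation consistency."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a) if a else 0.0


if __name__ == "__main__":
    records = [
        AnnotationRecord("resp_a", "ann_1", coherence=5, safety=5, correctness=4),
        AnnotationRecord("resp_a", "ann_2", coherence=4, safety=5, correctness=4),
        AnnotationRecord("resp_b", "ann_1", coherence=3, safety=4, correctness=2),
        AnnotationRecord("resp_b", "ann_2", coherence=3, safety=3, correctness=3),
    ]
    print(rank_responses(records))          # ['resp_a', 'resp_b']
    print(exact_agreement([5, 3], [4, 3]))  # 0.5
```

In practice, per-dimension ratings of this kind can feed both response ranking for preference data and simple annotator-agreement checks used to monitor annotation quality.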