AI Model Evaluator & Prompt Rater | Outlier AI
I evaluate LLM and chatbot responses for factuality, coherence, safety, and clarity in large-scale language model projects. I provide detailed feedback to improve AI outputs and reduce model errors across multiple iterations. My work supports the creation of scalable, high-quality datasets for advanced language model training.

• Labeled and rated thousands of text samples across diverse evaluation tasks.
• Assessed accuracy, hallucinations, relevance, and other key output parameters.
• Provided in-depth prompt/output analysis to support model refinement.
• Consistently achieved top evaluator performance scores in ongoing projects.