AI Model Evaluator – Outlier
As an LLM Systems Engineer & Prompt Evaluator at Outlier, I evaluated and refined prompts and outputs produced by large language models. This included developing and executing evaluation criteria to assess output quality, consistency, and relevance across content generation and communication workflows, and iterating on prompts to improve output quality and task accuracy through systematic testing and documentation.

• Evaluated LLM outputs for quality and alignment with workflow requirements.
• Developed structured evaluation criteria and feedback loops for prompt improvement (a minimal scoring-loop sketch follows this list).
• Applied RLHF concepts and reviewed generative outputs for reliability.
• Documented findings and best practices to ensure reproducibility.
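The sketch below illustrates the general shape of a rubric-based prompt-evaluation loop of the kind described above: score each prompt variant's output against weighted criteria, then rank the variants. It is a minimal, hypothetical example; the names (Rubric, score_output, evaluate_prompts) and the placeholder heuristics are assumptions, not the actual Outlier tooling or workflow.

```python
"""Illustrative sketch of a rubric-based prompt-evaluation loop.

All names and heuristics here are hypothetical; real criteria would be
rated by human evaluators or an automated judge."""
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rubric:
    """Weighted evaluation criteria applied to each model output."""
    criteria: Dict[str, float]  # criterion name -> weight (weights sum to 1.0)


def score_output(output: str, reference: str, rubric: Rubric) -> float:
    """Score one output against the rubric using simple placeholder checks."""
    checks: Dict[str, Callable[[str, str], float]] = {
        "quality": lambda out, ref: 1.0 if out.strip() else 0.0,
        "consistency": lambda out, ref: 1.0 if out == out.strip() else 0.5,
        "relevance": lambda out, ref: 1.0 if ref.lower() in out.lower() else 0.0,
    }
    return sum(
        weight * checks[name](output, reference)
        for name, weight in rubric.criteria.items()
    )


def evaluate_prompts(
    prompts: List[str],
    generate: Callable[[str], str],
    reference: str,
    rubric: Rubric,
) -> List[Tuple[str, float]]:
    """Run each prompt variant, score its output, and rank the variants."""
    results = [(p, score_output(generate(p), reference, rubric)) for p in prompts]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    rubric = Rubric({"quality": 0.4, "consistency": 0.2, "relevance": 0.4})
    # Stand-in for a real model call; returns a canned answer for the demo.
    fake_model = lambda prompt: f"Answer covering the shipping policy for: {prompt}"
    ranked = evaluate_prompts(
        ["Summarize the shipping policy.", "Explain the shipping policy briefly."],
        fake_model,
        reference="shipping policy",
        rubric=rubric,
    )
    for prompt, score in ranked:
        print(f"{score:.2f}  {prompt}")
```

In a real feedback loop, the ranked results and evaluator notes would feed the next round of prompt revisions, and the rubric and findings would be documented so the evaluation is reproducible.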