LLM Evaluation Specialist
As an LLM Evaluation Specialist at Outlier, I designed evaluation prompts across STEM subdomains to assess AI language model performance. I evaluated large language model (LLM) outputs for accuracy, reasoning quality, safety, and instruction adherence, and provided high-quality human feedback to guide successive model iterations.