Top 10 AI Contributor (2nd Year) – LLM Evaluation & Prompt Engineering
This role involved evaluating and benchmarking large language model (LLM) outputs for quality, accuracy, and alignment. I authored and applied complex prompts, annotated LLM responses for reinforcement learning from human feedback (RLHF), and evaluated responses against structured safety and instruction-following criteria. I also contributed actionable feedback to improve frontier AI systems at Alignerr.

• Labeled and scored AI assistant responses using detailed rubrics.
• Authored multi-step prompts for robust LLM evaluation.
• Identified edge cases and systematic failure modes in AI outputs.
• Developed quality reports and contributed to large-scale model alignment projects.