LLM Evaluation Specialist
I evaluated and rated large language model (LLM) outputs across a variety of coding scenarios, focusing on correctness, instruction following, and safety assessments for reinforcement learning from human feedback (RLHF) and model alignment tasks. I engineered prompts to review, score, and provide evaluative feedback for model improvement cycles.

• Evaluated LLM performance on code generation and completion tasks.
• Provided multi-axis ratings of LLM responses against established criteria.
• Designed task-specific prompts tailored to different programming problems.
• Supported RLHF and reinforcement learning with verifiable rewards (RLVR) pipelines by contributing human evaluative judgments.