AI Training & Evaluation Specialist – LLM Code Review, Red Teaming, RLHF
Reviewed and evaluated AI-generated Python code for technical correctness, security, and efficiency as part of LLM evaluation work. Performed adversarial prompting and red teaming to surface failure modes and unsafe outputs in large language models, and ranked model responses following Reinforcement Learning from Human Feedback (RLHF) methodologies.
• Evaluated code-generation output for correctness and adherence to best practices.
• Conducted red teaming with adversarial prompts to test model robustness and uncover vulnerabilities.
• Assessed LLM responses in cybersecurity and software engineering domains for logical and factual accuracy.
• Ranked model completions by preference using nuanced domain judgment.