AI Model Evaluator (RLHF & LLM Safety)
In this role, I evaluated and ranked large language model outputs for correctness, coherence, factual accuracy, and safety within RLHF frameworks. I designed prompt engineering strategies to test LLM reasoning, robustness, and instruction adherence, conducted adversarial testing and red teaming to surface vulnerabilities, and provided structured feedback to alignment and RL pipelines.

• Evaluated LLM outputs across tasks including code generation and instruction following
• Developed methodologies for adversarial testing and safety assessment
• Applied prompt engineering techniques to probe model behavior during evaluation
• Delivered structured evaluations that improved model alignment and safety