LLM Post-Training Intern
I evaluated and refined language model outputs for instruction-following and tool-use accuracy. I annotated and ranked large language model responses to support RLHF pipelines, with a focus on reasoning, factual grounding, and safety, and designed adversarial test cases to surface potential model misalignment.
• Authored structured rubrics for quality assessment
• Performed direct evaluation and annotation of LLM outputs
• Executed prompt-injection and jailbreak attacks to probe model robustness
• Supported continuous post-training workflow improvements