Prompt Engineer & Evaluator - Red Teaming/Annotation (Independent)
Conducted red teaming and adversarial prompt engineering to stress-test large language models and surface boundary cases in AI reasoning. Iteratively refined prompts, categorized model outputs, and systematically evaluated responses for hallucinations, logical breakdowns, and misalignment. Maintained documentation to ensure consistent, reproducible results and inform model improvement strategies.
• Designed and tested challenging evaluation prompts
• Annotated and categorized evaluation findings
• Implemented structured multi-step output analysis
• Produced comprehensive documentation to support ongoing improvement