AI-Focused Evaluation & Prompt Design
I designed prompts for generating and evaluating code and text outputs, with a focus on both API logic and structured response evaluation. In this role, I reviewed human- and AI-generated outputs for correctness, clarity, and maintainability, mirroring industry LLM evaluation workflows. I created test cases, ranked responses, and identified edge cases to improve output robustness and quality.

• Crafted and iterated on prompt designs for AI code and text generation tasks
• Systematically evaluated and ranked AI and human responses against rigorous criteria
• Designed and applied test cases and scenarios to validate generated outputs
• Identified hallucinations, missing logic, and ambiguous instructions to refine AI results
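The ranking workflow described above can be sketched in a few lines: candidate outputs are scored against explicit pass/fail criteria, then ordered best-first. This is a minimal illustration, not the actual evaluation harness; every function name, check, and candidate string here is a hypothetical example.

```python
# Illustrative sketch of scoring and ranking candidate outputs
# against explicit evaluation criteria. All names are hypothetical.

def score(candidate: str, checks: list) -> int:
    """Count how many evaluation checks a candidate output passes."""
    return sum(1 for check in checks if check(candidate))

def rank(candidates: list, checks: list) -> list:
    """Return candidates ordered best-first by checks passed."""
    return sorted(candidates, key=lambda c: score(c, checks), reverse=True)

# Example criteria for a "write an add function" task:
checks = [
    lambda out: "def add" in out,   # required API is present
    lambda out: "return" in out,    # no missing logic
    lambda out: "TODO" not in out,  # no placeholder left behind
]

candidates = [
    "def add(a, b):\n    return a + b",   # complete answer
    "def add(a, b):\n    pass  # TODO",   # missing logic
    "print('hello')",                     # off-task output
]

ranked = rank(candidates, checks)
```

In practice each check maps to one rubric criterion (correctness, completeness, instruction-following), which keeps rankings reproducible across evaluators.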