LLM Prompt Evaluation and Code Quality Annotation
Evaluated and annotated large language model (LLM) outputs across technical and coding prompts. Reviewed model-generated responses for correctness, factual accuracy, reasoning quality, and adherence to instructions; performed binary and rubric-based evaluations, classified outputs by error type, and provided structured feedback to improve model reliability. Leveraged a strong software engineering background in Java, Kotlin, and full-stack development to assess code-related responses, detect logical flaws, and verify edge-case handling. Ensured consistency with evaluation guidelines, maintained high accuracy standards, and contributed to improving AI performance on real-world programming and system design tasks.
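The following is a minimal Kotlin sketch of the kind of edge-case check used when reviewing model-generated code; the function name candidateParseAge and the specific inputs are hypothetical and chosen only to illustrate how a missing null/format/overflow guard would be detected and the failure classified by error type.

```kotlin
// Hypothetical example for illustration only; names and inputs are invented.
// candidateParseAge mimics a model-generated snippet that forgets to handle
// null, blank, non-numeric, and overflow inputs.
fun candidateParseAge(input: String?): Int = input!!.trim().toInt()

// Classify one edge case the way a rubric check would: PASS, WRONG_VALUE, or CRASH.
fun evaluateCase(input: String?, expected: Int?): String =
    try {
        val actual = candidateParseAge(input)
        if (actual == expected) "PASS" else "WRONG_VALUE (expected=$expected, actual=$actual)"
    } catch (e: Exception) {
        "CRASH (${e::class.simpleName})"   // unhandled null / format / overflow errors land here
    }

fun main() {
    // Edge cases a reviewer probes: valid input, whitespace, null, blank, non-numeric, Int overflow.
    val cases = mapOf<String?, Int?>(
        "42" to 42,
        " 7 " to 7,
        null to null,
        "" to null,
        "abc" to null,
        "2147483648" to null
    )
    for ((input, expected) in cases) {
        println("input=$input -> ${evaluateCase(input, expected)}")
    }
}
```

Failures surfaced this way were mapped to the relevant error category (e.g., unhandled edge case vs. incorrect result) and reported as structured feedback.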