Independent AI Research & Evaluation Contributor
Independently tested and evaluated large language models (LLMs) such as GPT-4, Claude, and Gemini across a range of task types, documenting errors and model behaviors in detail. Developed hands-on experience in prompt engineering, human feedback cycles, and annotation best practices for AI model improvement. Prepared for professional data annotation and RLHF workflows through platform self-training and study of AI ethics guidelines.
• Conducted systematic comparative assessments of model outputs for factuality and safety.
• Documented hallucinations, refusals, and error patterns in LLM responses.
• Self-trained on annotation guidelines used by major data labeling platforms.
• Practiced applying industry-standard quality assurance and RLHF processes.