Labeller
I have contributed to multiple AI training and evaluation projects focused on improving large language model (LLM) performance in coding, reasoning, and transcription. My work centered on evaluating AI-generated responses for correctness, logical consistency, and adherence to instructions, notably on the Claude Code CLI CHP Transcript Evaluation project and on agentic coding response evaluation. In these projects, I analyzed multi-step coding solutions, identified errors in reasoning or implementation, and ensured outputs met high quality standards. I also worked on transcription tasks (the ATC project on Alignerr), where I maintained accuracy, consistent formatting, and compliance with detailed annotation guidelines. Through this experience, I developed strong skills in LLM evaluation, quality assurance, annotation workflows, and analysis of model behavior, enabling me to contribute effectively to AI training pipelines.