LLM Code Trainer / Dataset Quality Reviewer
Reviewed and refined coding prompts, unit tests, and reference solutions in Python, C, JavaScript, and TypeScript for code-generation datasets used in LLM training. Ensured dataset quality by clarifying requirements, aligning prompts with tests, and addressing failure modes; conducted edge-case analyses and validated data through reproducible Docker-based workflows.
• Evaluated the technical accuracy and clarity of generated code and unit tests.
• Identified and resolved ambiguities between prompts and test cases.
• Documented requirements, edge cases, and dataset quality concerns.
• Contributed directly to the training and quality of language models through code data curation.