RLHF-based Python Code Optimization Data Annotator
I developed and curated datasets for Reinforcement Learning from Human Feedback (RLHF) to improve the accuracy of Python code generation. My responsibilities included reviewing, correcting, and evaluating auto-generated Python code and providing structured feedback to guide model improvements. The work also involved annotating coding practices and errors to align model learning with real-world programming standards.

• Curated Python code datasets for RLHF training
• Annotated code quality, correctness, and performance
• Provided structured feedback and correction annotations
• Improved LLM performance on code-generation tasks
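The structured feedback described above could be captured in a record like the following sketch. The `CodeAnnotation` schema, its field names, and the rating scale are illustrative assumptions for this example, not the actual dataset format used in the project.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class CodeAnnotation:
    """One RLHF annotation record for a generated Python snippet (hypothetical schema)."""
    prompt: str            # task the model was asked to solve
    generated_code: str    # model's raw output
    corrected_code: str    # annotator's fixed version
    correctness: int       # 1-5 rating of functional correctness (assumed scale)
    quality: int           # 1-5 rating of style and readability (assumed scale)
    error_tags: list = field(default_factory=list)  # e.g. ["off-by-one", "missing-import"]
    feedback: str = ""     # free-text guidance for model improvement

record = CodeAnnotation(
    prompt="Return the sum of squares of a list",
    generated_code="def f(xs): return sum(x*x for x in xs)",
    corrected_code="def sum_of_squares(xs): return sum(x * x for x in xs)",
    correctness=5,
    quality=3,
    error_tags=["non-descriptive-name"],
    feedback="Logic is correct; use a descriptive function name.",
)

# asdict() serializes the record for export into an RLHF training dataset
exported = asdict(record)
```

A dataclass keeps each annotation self-describing and easy to serialize, so curated records can be exported consistently for downstream RLHF training.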