Remote Code Evaluation and Prompt Labeling for LLM Training
Evaluated AI-generated code completions and technical responses for correctness, logic, and clarity. Labeled prompt/response pairs, categorized programming errors, and ranked output quality. The work required fluency in Python and general programming logic, along with critical thinking and consistent annotation standards across a high volume of data. Contributed to improving large language models for software engineering use cases.
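
Below is a hypothetical sketch of the kind of evaluation task involved: an AI-generated Python completion containing a logic error, the sort of annotation recorded for it (error category, severity, ratings, reviewer notes), and a corrected version. The function names, rating scale, and category labels are illustrative assumptions, not the actual labeling schema used.

# Illustrative example of a code-evaluation task (names and categories
# are assumptions for demonstration, not the real annotation schema).

# Prompt given to the model: "Return the indices of the two numbers in
# `nums` that add up to `target`."

# AI-generated completion under review -- contains a logic error:
def two_sum_generated(nums, target):
    for i in range(len(nums)):
        for j in range(len(nums)):           # bug: j can equal i, so an
            if nums[i] + nums[j] == target:  # element may be paired with itself
                return [i, j]
    return []

# Annotation recorded for this response (illustrative fields and scale):
annotation = {
    "error_category": "logic error",   # e.g. logic / syntax / runtime / style
    "severity": "major",               # fails on inputs like nums=[3, 2, 4], target=6
    "correctness": 2,                  # rating on a hypothetical 1-5 scale
    "clarity": 4,
    "notes": "Inner loop should start at i + 1 to avoid reusing the same element.",
}

# Corrected version suggested during review:
def two_sum_corrected(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):    # pair each element only with later ones
            if nums[i] + nums[j] == target:
                return [i, j]
    return []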