LLM Code & Text Evaluation for AI Training
I contributed to multiple AI training-data projects focused on improving large language models. Tasks included evaluating model-generated code for correctness, structure, and reliability; reviewing AI responses for reasoning quality, factual accuracy, and adherence to instructions; and performing A/B model comparisons to determine preferred outputs. I also completed text classification, error identification, explanation rewriting, and prompt-response generation for supervised fine-tuning (SFT). Quality standards required strict rubric adherence, detailed error labeling, and consistent application of evaluation guidelines. Projects ranged from medium to large scale, involving hundreds to thousands of annotations across coding, reasoning, and natural-language tasks. This work contributed to improving model performance for real-world software development and general-purpose AI capabilities.
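To illustrate the kind of structured output a rubric-based A/B evaluation produces, here is a minimal Python sketch; the field names, rubric dimensions, and example data are hypothetical, not taken from any specific project:

```python
from dataclasses import dataclass, field

# Hypothetical rubric dimensions; real projects define their own guidelines.
RUBRIC = ("correctness", "reasoning_quality", "instruction_adherence")

@dataclass
class ABComparison:
    """One A/B annotation: score both responses on each rubric dimension,
    label concrete errors, and record which response is preferred."""
    prompt: str
    scores_a: dict = field(default_factory=dict)  # dimension -> 1..5 score
    scores_b: dict = field(default_factory=dict)
    error_labels: list = field(default_factory=list)  # e.g. ["B: off-by-one in loop"]
    preferred: str = ""  # "A", "B", or "tie"

    def decide(self) -> str:
        """Prefer the response with the higher rubric total; tie otherwise."""
        total_a = sum(self.scores_a.get(d, 0) for d in RUBRIC)
        total_b = sum(self.scores_b.get(d, 0) for d in RUBRIC)
        self.preferred = "A" if total_a > total_b else "B" if total_b > total_a else "tie"
        return self.preferred

# Example annotation (illustrative values only).
item = ABComparison(
    prompt="Write a function that reverses a linked list.",
    scores_a={"correctness": 5, "reasoning_quality": 4, "instruction_adherence": 5},
    scores_b={"correctness": 3, "reasoning_quality": 4, "instruction_adherence": 4},
    error_labels=["B: fails on single-node list"],
)
print(item.decide())  # -> "A"
```

In practice, preference decisions also weigh qualitative factors a numeric rubric cannot fully capture, which is why detailed error labels accompany the scores.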