Generative AI Response Evaluation and Reasoning Annotation
Worked on a generative AI data-labeling project focused on assessing, grading, and refining AI text responses. Tasks included checking model outputs for factual accuracy, logical coherence, depth of reasoning, and compliance with detailed instructions. Conducted RLHF-style assessments in which several model responses to the same prompt are compared and ranked by quality, accuracy, and clarity. Identified hallucinations, weak reasoning chains, and vague responses, and provided structured feedback to support model refinement. For computational and STEM-related prompts, verified correctness through Python-based reasoning and numerical validation where possible. Adhered to stringent quality-control measures, including consistency checks, guideline compliance, and reviewer feedback loops.
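A minimal sketch of the Python-based numerical validation step described above. The prompt, claimed answers, and helper function here are hypothetical illustrations, not actual project artifacts: the idea is simply to recompute a model's claimed values independently and flag any mismatch.

```python
import math

# Hypothetical prompt: "What is the sum of the first 100 positive integers,
# and what is sqrt(2) to 4 decimal places?"
# Values a model might claim in its response (assumed for illustration):
model_claims = {
    "sum_1_to_100": 5050,
    "sqrt_2": 1.4142,
}

def verify(claims, tol=1e-4):
    """Recompute each claimed value independently and flag mismatches."""
    recomputed = {
        "sum_1_to_100": sum(range(1, 101)),      # closed form: n(n+1)/2
        "sqrt_2": round(math.sqrt(2), 4),        # independent recomputation
    }
    # A claim passes if it matches the recomputed value within tolerance.
    return {key: abs(claimed - recomputed[key]) <= tol
            for key, claimed in claims.items()}

print(verify(model_claims))
```

A failing check (e.g. a model claiming 5000 for the sum) would surface as `False` and be written up as a factual-accuracy error in the structured feedback.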