LLM Prompt Evaluation and Response Rating for AI Text Models
Evaluated and improved large language model outputs by rating AI-generated text responses for accuracy, relevance, and contextual understanding. Tasks included classifying answers, providing qualitative feedback, and annotating prompt-response pairs for supervised fine-tuning. Ensured rating consistency by following strict annotation guidelines and reviewing outputs against benchmark standards. The project supported improved model performance in natural language understanding, question answering, and content generation applications.
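For illustration, one annotated prompt-response record in such a pipeline might look like the minimal Python sketch below; the field names, rating scale, and file name are assumptions chosen for clarity, not the project's actual schema.

    import json

    # A minimal sketch of one annotated prompt-response record.
    # Schema and 1-5 rating scale are hypothetical, for illustration only.
    record = {
        "prompt": "Explain photosynthesis in one sentence.",
        "response": "Photosynthesis is the process by which plants "
                    "convert light into chemical energy.",
        "ratings": {                      # assumed rubric dimensions
            "accuracy": 5,
            "relevance": 5,
            "contextual_understanding": 4,
        },
        "label": "acceptable",            # classification outcome
        "feedback": "Accurate and concise; could mention chlorophyll.",
    }

    # Append the record to a JSONL file of the kind commonly used
    # as supervised fine-tuning data (file name is illustrative).
    with open("annotations.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

Storing one JSON object per line keeps each annotation independent, which makes it easy to merge work from multiple raters and to spot-check individual records against benchmark standards.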