AI Model Training & Evaluation
I worked on a project focused on evaluating large language models (LLMs) to ensure their responses met quality, accuracy, and alignment standards. The project involved crafting structured prompts with specific constraints to guide the model's behavior and assessing its outputs against predefined client objectives. My data labeling tasks included:

Prompt Engineering – Creating prompts that tested the model's ability to follow constraints and generate high-quality responses.
Response Evaluation – Reviewing AI-generated text for accuracy, coherence, relevance, and adherence to guidelines.
Correction & Refinement – Rewriting responses that failed to meet quality standards so the final output complied with client expectations.

To measure quality, we followed client-defined evaluation metrics, including accuracy, clarity, and truthfulness.
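To make the evaluation workflow concrete, here is a minimal sketch of how a rubric-based score record and a refinement trigger might look in Python. The `Evaluation` class, the 1–5 scale, and the `needs_refinement` threshold are hypothetical illustrations of the process described above, not the client's actual schema or tooling.

```python
from dataclasses import dataclass

# Hypothetical rubric dimensions mirroring the client-defined metrics
# named above (accuracy, clarity, truthfulness); the names and the
# 1-5 scale are illustrative assumptions, not the client's schema.
RUBRIC = ("accuracy", "clarity", "truthfulness")

@dataclass
class Evaluation:
    prompt_id: str
    response: str
    scores: dict  # metric name -> integer score on a 1-5 scale

    def needs_refinement(self, threshold: int = 4) -> bool:
        """Flag a response for correction if any metric falls below threshold."""
        return any(self.scores[m] < threshold for m in RUBRIC)

# Example: a response that is clearly written but factually wrong
# gets routed to the Correction & Refinement step.
ev = Evaluation(
    prompt_id="p-001",
    response="The Eiffel Tower was completed in 1899.",  # off by a decade
    scores={"accuracy": 2, "clarity": 5, "truthfulness": 2},
)
print(ev.needs_refinement())  # True -> route to human rewrite
```

In practice, a record like this lets each labeled response carry its per-metric scores alongside the text, so failing items can be filtered out for rewriting and aggregate quality can be tracked against the client's thresholds.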