Prompt Generation and Evaluation
I was responsible for designing, testing, and refining prompts for a large language model to improve response accuracy, contextual relevance, and task compliance across diverse use cases. The project covered natural language understanding and generation tasks, including summarization, classification, question answering, reasoning, and instruction following. I collaborated with data scientists and model trainers to iteratively improve prompt performance, reduce ambiguity, and align outputs with defined evaluation benchmarks.

I also performed structured data labeling that supported supervised fine-tuning and reinforcement learning workflows. This included annotating model outputs for factual accuracy, coherence, tone alignment, safety compliance, and instruction adherence. In addition, I categorized error types (e.g., hallucinations, logical inconsistencies, formatting failures), ranked multiple model responses against quality criteria, and created g
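To make the prompt-iteration loop concrete, below is a minimal sketch of the kind of harness used to compare prompt variants against a labeled test set. The `generate` stub, the `Case` fields, and the exact-match scoring are illustrative assumptions for this sketch, not the project's actual tooling.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the model under test; in practice this would
# call the LLM's API. Name and signature are illustrative only.
def generate(prompt: str) -> str:
    return "PLACEHOLDER"

@dataclass
class Case:
    input_text: str
    expected: str  # reference answer used for exact-match scoring

def evaluate(template: str, cases: list[Case]) -> float:
    """Score one prompt template by exact match over a labeled test set."""
    hits = 0
    for case in cases:
        output = generate(template.format(text=case.input_text))
        hits += output.strip().lower() == case.expected.strip().lower()
    return hits / len(cases)

# Compare candidate templates; the wording difference is the variable
# under test, while the test set and scoring stay fixed.
templates = [
    "Classify the sentiment of this review as positive or negative: {text}",
    "Review: {text}\nAnswer with exactly one word, positive or negative:",
]
cases = [
    Case("Great product, works perfectly.", "positive"),
    Case("Broke after two days.", "negative"),
]

for t in templates:
    print(f"{evaluate(t, cases):.2f}  {t[:50]!r}")
```

Keeping the scoring function separate from the templates makes it easy to swap in other metrics (e.g., rubric-based or benchmark-aligned scores) without touching the comparison loop.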
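The annotation and ranking work can likewise be summarized with a simple data model. The field names mirror the rubric above, but the 1-5 scales, the `ErrorType` values, and the `PreferencePair` record are assumed structures for illustration, not the production schema.

```python
from dataclasses import dataclass, field
from enum import Enum

class ErrorType(Enum):
    # Error taxonomy mirroring the categories described above.
    HALLUCINATION = "hallucination"
    LOGICAL_INCONSISTENCY = "logical_inconsistency"
    FORMATTING_FAILURE = "formatting_failure"

@dataclass
class Annotation:
    """One labeled model output; the 1-5 scales are an assumed rubric."""
    response_id: str
    factual_accuracy: int
    coherence: int
    tone_alignment: int
    safety_compliant: bool
    instruction_adherence: int
    errors: list[ErrorType] = field(default_factory=list)

@dataclass
class PreferencePair:
    """A ranked response pair of the kind that feeds preference-based
    reinforcement learning (chosen preferred over rejected)."""
    prompt_id: str
    chosen: Annotation
    rejected: Annotation

ann_a = Annotation("r1", 5, 4, 5, True, 5)
ann_b = Annotation("r2", 2, 4, 3, True, 4, errors=[ErrorType.HALLUCINATION])
pair = PreferencePair("p42", chosen=ann_a, rejected=ann_b)
```

Recording per-dimension scores alongside the pairwise preference keeps the same annotations usable for both supervised fine-tuning targets and preference-based reward modeling.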