Email evaluation
Led a high-precision prompt evaluation and refinement initiative during the backend migration from GPT-3 to GPT-3.5 for a B2B SaaS email personalization platform. Using Labelbox, I systematically reviewed AI-generated emails for tone accuracy, factual alignment, personalization depth, and coherence. I annotated each sample with structured qualitative feedback and ranked it against other samples to produce training data for preference models (RLHF). I also iterated on prompt variations and wrote human completions for supervised fine-tuning (SFT), focusing on professional tone and context-sensitive personalization. The dataset covered 1,000+ email samples across diverse sales personas, industries, and use cases. Quality benchmarks were defined by internal bug triage metrics and end-user Net Promoter Score (NPS). My contributions led to an ~80% reduction in buggy outputs and improved downstream model quality, as reflected in customer feedback (5-star G2 reviews). The project adhered to internal QA loops.
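As a rough illustration of how ranked annotations can feed preference-model training, the sketch below expands one best-to-worst ranking into (chosen, rejected) pairs. It is a minimal example under stated assumptions, not the project's actual pipeline: the function name ranking_to_pairs and the sample IDs are hypothetical, and the real Labelbox export format is not shown here.

```python
from itertools import combinations


def ranking_to_pairs(ranked_ids):
    """Expand a best-to-worst ranking into (chosen, rejected) pairs.

    ranked_ids is a list of sample IDs ordered from most to least preferred
    (hypothetical format; an assumption for illustration only).
    """
    # combinations() preserves input order, so in every pair the first
    # element was ranked above the second.
    return list(combinations(ranked_ids, 2))


# Hypothetical IDs for three generated emails, ranked best to worst.
pairs = ranking_to_pairs(["email_a", "email_b", "email_c"])
for chosen, rejected in pairs:
    print(f"chosen={chosen} rejected={rejected}")
```

A ranking of n samples yields n(n-1)/2 pairs, so even small ranked groups produce a useful number of comparisons for a preference model.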