LLM Fine-Tuning and Safety Evaluation for Dialogue Systems
Contributed to a large-scale project to fine-tune a large language model (LLM) and improve its safety and performance. Analyzed and labeled complex prompts and model responses across diverse domains to identify errors, biases, and unsafe outputs, developing and adhering to detailed annotation guidelines to ensure quality and consistency. Wrote and evaluated high-quality prompt-response pairs for supervised fine-tuning (SFT), classified undesirable content, and performed red-teaming to proactively surface model failure modes. Processed thousands of data samples, directly improving the model's dialogue capabilities and its alignment with safety protocols.
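The annotation workflow above can be sketched as a minimal data record with a guideline check. The schema and the label taxonomy here are hypothetical illustrations, not the project's actual guidelines:

```python
from dataclasses import dataclass, field

# Hypothetical label taxonomy; a real project defines this in its guidelines.
ALLOWED_LABELS = {"safe", "biased", "factual_error", "unsafe"}

@dataclass
class AnnotatedSample:
    """One prompt-response pair with annotator judgments."""
    prompt: str
    response: str
    labels: set = field(default_factory=set)  # categories applied by the annotator
    notes: str = ""                           # free-text rationale for reviewers

    def validate(self) -> bool:
        """Guideline check: every label must come from the agreed taxonomy."""
        unknown = self.labels - ALLOWED_LABELS
        if unknown:
            raise ValueError(f"Unknown labels: {sorted(unknown)}")
        return True

sample = AnnotatedSample(
    prompt="How do I pick a strong password?",
    response="Use a long passphrase of unrelated words.",
    labels={"safe"},
    notes="Helpful, no unsafe content.",
)
assert sample.validate()
```

Records like this feed both SFT (the prompt-response text) and safety classification (the labels), so validating labels against a fixed taxonomy at write time keeps the two uses consistent.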