Turing / Talents AI / Labelness
Led the development of specialized datasets for training and fine-tuning large language models (LLMs) across multiple domains, including finance and technology. Designed and implemented data collection, cleaning, and annotation pipelines to ensure data quality and consistency. Collaborated with cross-functional teams to define data requirements and quality standards. Wrote and enforced strict annotation guidelines, improving inter-annotator agreement by 30%. Designed and executed comprehensive evaluation protocols to assess LLM performance, safety, and fairness, and built a suite of automated evaluation scripts to measure metrics such as accuracy, coherence, and bias. Engineered and optimized prompts to improve model performance on a wide range of NLP tasks, including text summarization, question answering, and code generation, and curated a library of more than 500 effective prompt templates for internal use.