Senior AI Engineer - Data Labeling and Prompt Evaluation (Bold Estimation)
As a Senior AI Engineer at Bold Estimation, I designed and managed LLM training and annotation workflows to improve NLP model performance. I refined prompt engineering practices, established annotation guidelines, and standardized labeling practices across multiple labeling platforms. I also analyzed model outputs for errors, performed data quality audits, mentored trainers on review methods, and optimized model evaluation routines.
• Led the creation of training datasets and edge-case annotation standards using Labelbox and Scale AI.
• Conducted A/B and benchmarking tests to validate prompt and labeling improvements.
• Developed QA and data audit scripts in Python and SQL for large-scale training data reviews.
• Enhanced model fairness, safety, and consistency through targeted dataset design and bias monitoring.