NL-to-SQL Annotation & Dataset Builder — Academic Project
I designed and labeled a corpus of over 500 natural language queries, mapping them to structured SQL ground truths. The annotation involved intent recognition and schema-mapping, with rigorous quality control performed throughout the data pipeline. Project outcomes were validated through publication and dataset preparation for AI training tasks.
• Query intent classes were labeled as SELECT, INSERT, JOIN, and aggregate.
• Entity mentions in text were tagged to support schema mapping and structured output.
• Ground-truth data for instruction-following tasks was created for LLM fine-tuning.
• Annotation quality was ensured through systematic reviews and publication of methodology.
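
To illustrate the kind of record such an annotation scheme might produce, here is a minimal Python sketch. It is not the project's actual format: the class names (`EntityMention`, `AnnotatedQuery`), the field names, the `orders` table, the example question, and the instruction/input/output layout are all assumptions made for illustration. Only the four intent classes come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Intent classes named in the project description; the uppercase spelling
# of "aggregate" is an assumption made for consistency.
INTENT_CLASSES = {"SELECT", "INSERT", "JOIN", "AGGREGATE"}


@dataclass
class EntityMention:
    """A tagged span in the question, linked to a schema element (hypothetical layout)."""
    text: str        # surface form as it appears in the question
    start: int       # character offset, inclusive
    end: int         # character offset, exclusive
    schema_ref: str  # assumed "table" or "table.column" target


@dataclass
class AnnotatedQuery:
    """One natural language question with its labels and SQL ground truth."""
    question: str
    intent: str
    sql: str
    entities: List[EntityMention] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of quality-control problems; empty means the record passed."""
        problems = []
        if self.intent not in INTENT_CLASSES:
            problems.append(f"unknown intent: {self.intent}")
        if not self.sql.strip():
            problems.append("empty SQL ground truth")
        for e in self.entities:
            # The stored offsets must reproduce the tagged text exactly.
            if self.question[e.start:e.end] != e.text:
                problems.append(f"span mismatch for entity: {e.text!r}")
        return problems

    def to_instruction_record(self) -> Dict[str, str]:
        """Convert to an instruction/input/output dict (an assumed fine-tuning layout)."""
        return {
            "instruction": "Translate the question into SQL.",
            "input": self.question,
            "output": self.sql,
        }


def mention(question: str, text: str, schema_ref: str) -> EntityMention:
    """Build an EntityMention by locating the first occurrence of text in question."""
    start = question.index(text)  # raises ValueError if text is absent
    return EntityMention(text, start, start + len(text), schema_ref)


# Hypothetical example record; the schema and question are invented.
question = "How many orders did each customer place?"
example = AnnotatedQuery(
    question=question,
    intent="AGGREGATE",
    sql="SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;",
    entities=[
        mention(question, "orders", "orders"),
        mention(question, "customer", "orders.customer_id"),
    ],
)
```

Calling `example.validate()` returns an empty list for a well-formed record, while a record with an unlisted intent or a misaligned entity span yields one message per problem. A check of this shape is one way a systematic review pass could be automated before records are exported with `to_instruction_record()`.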