AI Training Data Preparation & Collection
Prepared and structured training datasets for AI/LLM applications from multi-source data. Performed data cleaning, normalization, and de-duplication to ensure dataset quality. Used Google Sheets and prompt engineering to organize and validate labeled content.
• Managed dataset assembly from community feedback and structured reports
• Enforced validation and consistency checks before dataset delivery
• Automated content generation and summarization with LLM tools
• Applied a linguistics background to maintain language and semantic accuracy