AI Text and Code Evaluation for LLM Fine-Tuning
Contributed to the testing and troubleshooting of a large-scale multilingual model for text generation, summarization, and function calling. Responsibilities included writing and revising prompt–response pairs, evaluating model outputs for accuracy and coherence, and identifying entities for downstream NLP processing. Developed Python scripts to automatically validate annotation batches and enforce consistency across them (a sketch of this style of check appears below). The project covered more than 100,000 text samples and adhered to strict quality standards, including inter-annotator agreement checks and multi-phase review protocols. Supported the integration of labeled datasets into an automated CI/CD pipeline, enabling continuous model retraining.
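
To illustrate the kind of batch-validation scripting described above, here is a minimal sketch. The schema is an assumption: a JSONL file with id, prompt, response, and language fields, and an illustrative language allowlist. The actual project scripts are not reproduced here.

```python
import json
import sys
from collections import Counter

REQUIRED_FIELDS = {"id", "prompt", "response", "language"}  # assumed schema
VALID_LANGUAGES = {"en", "es", "fr", "de", "ja"}            # illustrative allowlist

def validate_batch(path):
    """Validate one annotation batch (JSONL); return a list of error strings."""
    errors = []
    seen_ids = Counter()
    with open(path, encoding="utf-8") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                record = json.loads(raw)
            except json.JSONDecodeError:
                errors.append(f"line {lineno}: malformed JSON")
                continue
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                errors.append(f"line {lineno}: missing fields {sorted(missing)}")
                continue
            if not record["prompt"].strip() or not record["response"].strip():
                errors.append(f"line {lineno}: empty prompt or response")
            if record["language"] not in VALID_LANGUAGES:
                errors.append(f"line {lineno}: unknown language {record['language']!r}")
            seen_ids[record["id"]] += 1
    errors.extend(f"duplicate id {rid!r} ({n} occurrences)"
                  for rid, n in seen_ids.items() if n > 1)
    return errors

if __name__ == "__main__":
    problems = validate_batch(sys.argv[1])
    for p in problems:
        print(p, file=sys.stderr)
    sys.exit(1 if problems else 0)  # non-zero exit lets a CI job reject the batch
```

The non-zero exit code is what makes a check like this usable as a CI/CD gate: a pipeline stage can run it against each incoming batch and block ingestion when validation fails.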
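
On the inter-annotator agreement side, Cohen's kappa is a standard measure for two annotators assigning categorical labels (the project statement above does not specify which metric was used, so this is a generic sketch with made-up labels):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative example: two annotators rating coherence on ten samples.
a = ["good", "good", "bad", "good", "bad", "good", "good", "bad", "good", "good"]
b = ["good", "bad", "bad", "good", "bad", "good", "good", "good", "good", "good"]
print(f"kappa = {cohens_kappa(a, b):.3f}")  # ~0.474 for these labels
```

A batch whose kappa falls below an agreed threshold would typically be routed back into the multi-phase review process rather than passed downstream.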