LLM Fine-Tuning for Scientific Text Classification
From January to December 2023, I led a data labeling project focused on fine-tuning LLMs for scientific and mathematical content. The scope included:

- Text Classification: categorizing research papers by discipline (e.g., biomathematics, physics) and complexity level.
- NER Annotation: identifying and labeling key scientific entities (equations, theorems, datasets) in academic texts.
- SFT Prompt Engineering: crafting and rating prompt-response pairs to improve LLM outputs for technical queries.
- QA Evaluation: assessing model responses for accuracy, logical consistency, and relevance to STEM domains.

Project Size: annotated 5,000+ documents with a 98% inter-annotator agreement score, adhering to strict guidelines for technical precision. Implemented iterative QA checks to resolve edge cases (e.g., ambiguous mathematical notation).
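For context on the agreement metric: raw percent agreement (as reported above) can be complemented by a chance-corrected statistic such as Cohen's kappa. Below is a minimal illustrative sketch of both computations for two annotators over a shared label set; the function names and sample labels are hypothetical, not taken from the project itself.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators assigned the same label."""
    assert len(labels_a) == len(labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected inter-annotator agreement (Cohen's kappa)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: items labeled identically by both annotators.
    po = percent_agreement(labels_a, labels_b)
    # Expected chance agreement, from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    pe = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (po - pe) / (1 - pe)


# Illustrative discipline labels from two hypothetical annotators.
a = ["bio", "phys", "bio", "math"]
b = ["bio", "phys", "math", "math"]
print(percent_agreement(a, b))  # → 0.75
print(cohens_kappa(a, b))
```

Kappa is lower than raw agreement whenever some matches are attributable to chance, which is why reporting both gives a fuller picture of annotation quality.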