Lecture expert model
End-to-end data preparation and annotation project that transformed long-form historical lecture audio into structured, model-ready training data. The source material consisted of publicly available lecture recordings, which were transcribed with automated speech-to-text tooling and then normalized. I systematically identified and labeled entities, with emphasis on proper names, historical figures, philosophical concepts, organizations, and symbolic references. The annotations were extracted into structured tabular formats, manually reviewed, and iteratively refined to improve consistency and reduce transcription noise. The resulting labeled dataset was used in multiple fine-tuning and evaluation runs to assess downstream model performance and semantic recall.
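
The normalize, label, and tabulate steps described above can be illustrated with a minimal Python sketch. It is not the tooling used in the project: the gazetteer entries, label names, filler-word pattern, and column layout are all hypothetical, and a simple dictionary lookup stands in for whatever entity identification method was actually applied.

```python
import csv
import io
import re

# Hypothetical gazetteer (surface form -> label); illustrative entries only.
GAZETTEER = {
    "Plato": "HISTORICAL_FIGURE",
    "Stoicism": "PHILOSOPHICAL_CONCEPT",
    "Royal Society": "ORGANIZATION",
}

# Assumed transcription noise: spoken fillers, optionally followed by , or .
FILLERS = re.compile(r"\b(?:um+|uh+|er+)\b[,.]?\s*", re.IGNORECASE)


def normalize(text):
    """Drop filler words and collapse runs of whitespace."""
    text = FILLERS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def label_entities(text):
    """Return one row per gazetteer match, ordered by position in the text."""
    rows = []
    for surface, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            rows.append({
                "start": match.start(),
                "end": match.end(),
                "text": match.group(),
                "label": label,
            })
    return sorted(rows, key=lambda row: row["start"])


def to_csv(rows):
    """Serialize annotation rows to CSV text for manual review."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["start", "end", "text", "label"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    raw = "Um, Plato  influenced uh Stoicism, and the Royal Society   read both."
    clean = normalize(raw)
    print(clean)
    print(to_csv(label_entities(clean)))
```

Keeping character offsets alongside each label makes it possible to trace a row in the review table back to its position in the normalized transcript, which helps when a correction to the transcript requires re-running the labeling pass.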