Audio Data Collection
I participated in a large-scale audio data collection project with LXT that supported the training of Large Language Models (LLMs). I recorded and annotated high-quality voice samples in English and Spanish, ensuring clarity, consistency, and natural delivery. I followed strict project guidelines to meet accuracy and privacy standards, contributing to a dataset that powered speech recognition and natural language processing models. The work required close attention to detail, time management, and adherence to quality checks: I reviewed submissions for background noise, pronunciation accuracy, and metadata correctness. By consistently meeting performance benchmarks, I helped deliver thousands of validated audio samples that improved the models’ ability to understand diverse accents and real-world speech patterns.