English Singing Voice Synthesis Dataset Creation & Annotation
Developed an English singing voice synthesis corpus of 10–15 hours of recorded vocals. Annotated every recording at the phoneme level, with start and end timestamps in milliseconds, and labeled pitch and musical notes aligned to each vocal segment. Structured the dataset to support training and evaluation of singing voice synthesis (SVS) and text-to-speech (TTS) models.
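
As an illustration of what one phoneme-level annotation could look like, here is a minimal Python sketch. The schema is an assumption made for this example, not the project's actual file format: the `PhonemeSegment` class, its field names, the use of MIDI note numbers for the note label, and the `validate` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeSegment:
    """One annotated phoneme (hypothetical schema, not the project's real format)."""
    phoneme: str      # phoneme symbol, e.g. an ARPAbet label such as "AH"
    start_ms: int     # segment start, in milliseconds from the start of the recording
    end_ms: int       # segment end, in milliseconds from the start of the recording
    midi_note: int    # musical note as a MIDI number (69 = A4); an assumed encoding

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

    @property
    def note_hz(self) -> float:
        # Equal-temperament conversion with A4 = 440 Hz.
        return 440.0 * 2.0 ** ((self.midi_note - 69) / 12.0)


def validate(segments: List[PhonemeSegment]) -> List[str]:
    """Return a list of problems; an empty list means the track is consistent."""
    problems = []
    for i, seg in enumerate(segments):
        if seg.end_ms <= seg.start_ms:
            problems.append(f"segment {i} ({seg.phoneme}) has non-positive duration")
        if i > 0 and seg.start_ms < segments[i - 1].end_ms:
            problems.append(f"segment {i} ({seg.phoneme}) overlaps the previous segment")
    return problems


# Made-up example values for a short sung syllable sequence.
segments = [
    PhonemeSegment("HH", 0, 80, 69),
    PhonemeSegment("AH", 80, 420, 69),
    PhonemeSegment("L", 420, 500, 71),
]

for seg in segments:
    print(seg.phoneme, seg.duration_ms, round(seg.note_hz, 2))
print(validate(segments))
```

Storing times as integer milliseconds keeps boundary comparisons exact, and a simple ordering check like `validate` is one way to catch overlapping or zero-length segments before the data reaches model training.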