Usman Abdulsalam

Singing Voice Corpus Annotator — OpenTrain AI — Freelance

Kaduna, Nigeria
$17.00/hr · Intermediate · Other · Appen · Argilla

Key Skills

Software

Other
Appen
Argilla
CVAT
CrowdSource
Axiom AI
Scale AI
Toloka
Internal/Proprietary Tooling

Top Subject Matter

Singing Voice Corpus Annotation for AI Synthesis
Large Language Model (LLM) Agent Evaluation
Singing Voice Dataset for AI Model Training

Top Data Types

Audio
Text
Video
Document

Top Task Types

Transcription
Data Collection
Red Teaming
Prompt/Response Writing (SFT)
Segmentation
Polygon
Bounding Box
Fine Tuning
RLHF
Question Answering
Evaluation Rating
Text Summarization
Text Generation
Cuboid
Point / Key Point
Computer Programming / Coding
Function Calling
Entity (NER) Classification
Polyline

Freelancer Overview

Singing voice corpus annotator with 2+ years of professional experience across complex professional workflows, research, and quality-focused execution. Education includes a Bachelor of Engineering from Ahmadu Bello University, Zaria (2023) and a Diploma in Computer Engineering from the same university (2016). AI-training focus covers data types such as audio and text, and labeling workflows including transcription, evaluation, and rating.

Intermediate · Hausa · Yoruba · English

Labeling Experience

Founder & Lead Engineer (Deaf-Tech Dataset & Labeling)

Other · Text · Data Collection
I coordinated the creation and validation of a proprietary Hausa-to-Nigerian Sign Language dictionary dataset. I managed collaboration with NSL interpreters and Deaf academies to build and annotate a verified sign language video dataset. The structured dataset held significant value for assistive technology and was used in AI models for translation between Hausa speech and NSL. • Oversaw annotation workflows with domain experts and end-users. • Validated sign accuracy through expert review and community feedback. • Dataset supported model training for real-time Hausa speech-to-sign translation. • Custom labeling and dictionary building for low-resource, regional language-to-sign mapping.


2025 - Present
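A dictionary dataset of this kind is typically organized as per-sign records with an explicit expert-validation trail. A minimal sketch of such a record — the `SignEntry` class, its field names, and the review threshold are all hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SignEntry:
    """One Hausa-to-NSL dictionary record (illustrative schema only)."""
    hausa_word: str                                   # source-language lemma
    gloss_en: str                                     # English gloss for cross-checking
    video_id: str                                     # ID of the annotated sign video clip
    validated_by: list = field(default_factory=list)  # expert reviewers who signed off

    def is_validated(self, min_reviewers: int = 2) -> bool:
        # An entry counts as verified once enough experts have reviewed it.
        return len(self.validated_by) >= min_reviewers

entry = SignEntry("ruwa", "water", "clip_0412")
entry.validated_by += ["interpreter_A", "deaf_academy_B"]
```

Tracking reviewers per entry, rather than a single boolean flag, is what makes the community-feedback loop described above auditable.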

AI Data Specialist (RLHF/Code Evaluation)

Other · Text · RLHF
I evaluated Python code and AI agent responses as part of reinforcement learning from human feedback (RLHF) workflows for large language models. I focused on correctness, coding standards, and security, contributing scores and feedback for use in RLHF fine-tuning pipelines. Hybrid workflows leveraged AI agents for repetitive extraction while I performed quality control and complex judgment. • Reviewed code for correctness, PEP 8 style, and security flaws. • Scored and rated agent responses for RLHF model training. • Supported LLM companies in building high-quality code/data evaluation sets. • Participated in hybrid AI-human annotation and review processes.


2025 - Present
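Reviews like these usually pair automated checks with human judgment before a score enters the RLHF pipeline. A toy sketch of the automated half — the `review_snippet` helper and the 1-to-5 scale are invented for illustration:

```python
import ast

def review_snippet(code: str) -> dict:
    """Toy code reviewer: flags syntax errors and overlong lines,
    then emits an RLHF-style score record (rubric is invented)."""
    issues = []
    try:
        ast.parse(code)  # does the snippet even parse?
    except SyntaxError as exc:
        issues.append(f"syntax error: {exc.msg}")
    for n, line in enumerate(code.splitlines(), 1):
        if len(line) > 79:  # PEP 8 maximum line length
            issues.append(f"line {n} exceeds 79 chars")
    score = 5 - min(len(issues), 4)  # 5 = clean, 1 = many issues
    return {"score": score, "issues": issues}

print(review_snippet("def add(a, b):\n    return a + b\n"))
```

In practice a human reviewer layers correctness and security judgment on top of mechanical checks like these; the point of the sketch is the shape of the score record, not the rubric itself.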

Web Data Extraction and Automation

Other · Text · Data Collection
I built complex Python web scraping workflows to extract structured text data suitable for AI model training and evaluation. I integrated tools for both cloud-based deployment and LLM-assisted data normalization and extraction. I delivered validated, clean datasets in CSV and JSON formats for downstream AI applications and labeling pipelines. • Leveraged BeautifulSoup, Selenium, Playwright, Apify, and OpenRouter in workflow. • Ensured full cycle from data extraction to cleanliness and normalization. • Produced datasets for NLP, search, and mapping AI agents. • Datasets were fully validated for quality control.


2025 - Present
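The extract → normalize → export cycle can be sketched with the standard library alone — the page markup and the field name below are invented, and production runs used BeautifulSoup/Selenium/Playwright as noted above:

```python
import json
from html.parser import HTMLParser

class TitleScraper(HTMLParser):
    """Minimal stdlib stand-in for the extraction step:
    collects the text of every <h2> element on a page."""
    def __init__(self):
        super().__init__()
        self.titles, self._in_h2 = [], False
    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
    def handle_data(self, data):
        if self._in_h2 and data.strip():
            self.titles.append(data.strip())  # normalize whitespace

html = "<h1>Site</h1><h2> First Post </h2><p>...</p><h2>Second Post</h2>"
scraper = TitleScraper()
scraper.feed(html)
records = [{"title": t} for t in scraper.titles]  # normalized rows
print(json.dumps(records))  # JSON deliverable, as in the workflow above
```

The same row structure serializes to CSV with `csv.DictWriter`; the validation step then runs over `records` before delivery.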

LLM Agent Evaluation Scenario Writer

Other · Text
I designed structured evaluation scenarios for LLM-based AI agents simulating practical use cases. I defined golden path behaviors, edge cases, and scoring rubrics in JSON and YAML formats for agent behavior assessment. I reviewed agent outputs and iterated on scenarios to ensure coverage and clarity for fine-tuning and reinforcement learning purposes. • Created multi-turn scenarios including calendar, email, maps, and productivity app simulations. • Documented acceptable behaviors and edge case handling for real-world coverage. • Used JSON/YAML as scenario definition and labeling format. • Work contributed directly to structured LLM agent evaluation and RLHF data pipelines.


2025 - Present
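A scenario definition of the kind described above bundles the dialogue turns, the golden path, edge cases, and a rubric into one machine-readable record. A sketch that builds one and serializes it to JSON — every key name and rubric weight here is invented, not a real platform schema:

```python
import json

# Illustrative multi-turn evaluation scenario (hypothetical schema).
scenario = {
    "id": "calendar_001",
    "turns": [
        {"user": "Book a meeting with Ada tomorrow at 10am."},
        {"user": "Actually, move it to 2pm."},
    ],
    "golden_path": [  # expected tool calls, in order
        "create_event(time='10:00')",
        "update_event(time='14:00')",
    ],
    "edge_cases": ["conflicting existing event", "ambiguous contact name"],
    "rubric": {"task_completed": 3, "no_hallucinated_tools": 2},
}
print(json.dumps(scenario, indent=2))
```

The same structure maps directly onto YAML; keeping the golden path and edge cases in the record itself is what lets reviewers score agent transcripts against it mechanically.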

Singing Voice Corpus Annotation

Other · Audio · Transcription
I annotated English singing voice data at the phoneme level, carefully marking millisecond-precision timestamps and pitch values in Hz. I labeled musical notes using Praat and Sonic Visualiser, and used Montreal Forced Aligner to automate phoneme alignment, drastically reducing manual effort. The result was structured TextGrid files and datasets for singing voice synthesis AI model training. • Data included phoneme segmentation, pitch annotation, and note labeling at high temporal resolution. • Used tools such as Praat, Sonic Visualiser, and Montreal Forced Aligner (MFA). • Delivered approximately 10 to 15 hours of annotated audio for corpus development. • Data supported AI model training for singing voice synthesis.


2025 - Present
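Phoneme tiers like those described above are stored as Praat TextGrid files. A minimal writer sketch for a single interval tier in the long text format — the phoneme boundaries below are made up, and a real corpus would carry pitch and note tiers alongside:

```python
def to_textgrid(intervals, tier="phones"):
    """Serialize (xmin, xmax, label) triples into a single-tier
    Praat TextGrid (long text format). Timings are in seconds."""
    xmax = intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (lo, hi, text) in enumerate(intervals, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {lo}",
            f"            xmax = {hi}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines)

# Two phonemes of a sung syllable, millisecond-precision boundaries.
print(to_textgrid([(0.0, 0.412, "s"), (0.412, 0.93, "i")]))
```

Montreal Forced Aligner emits files in this same TextGrid format, which is why its automatic alignments slot directly into a manual Praat review pass.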

Education


Ahmadu Bello University, Zaria

Bachelor of Engineering, Computer Engineering

2016 - 2023

Ahmadu Bello University, Zaria

Diploma in Computer Engineering, Computer Engineering

2014 - 2016

Work History


Murajaah AI Platform

Backend Developer

Zaria
2025 - Present

Deaf-Tech

Founder and Lead Engineer

Zaria
2025 - Present