Saamir Abbas

AI Evaluation Specialist / LLM Response Analyst

Noida, India
$4.00/hr | Expert | Mercor | OneForma | Appen

Key Skills

Software

Mercor
OneForma
Appen

Top Subject Matter

LLM response evaluation
Manufacturing domain
Conversational AI

Top Data Types

Text
Audio
Video
Document
Image

Top Task Types

Transcription
Bounding Box
Object Detection
Text Generation
Text Summarization
RLHF
Evaluation/Rating
Data Collection
Prompt + Response Writing (SFT)
Classification
Fine-tuning
Question Answering

Freelancer Overview

AI Evaluation Specialist / LLM Response Analyst with 7+ years of professional experience across complex professional workflows, research, and quality-focused execution. Core platforms include Mercor, OneForma, and Turing. Holds a Bachelor of Science from Era University (2025). AI-training focus covers data types such as Text, Audio, and Video and labeling workflows including Evaluation, Rating, and Transcription.

Expert | English | Urdu

Labeling Experience

Mercor

AI Evaluation Specialist / LLM Response Analyst

Mercor | Text
As an AI Evaluation Specialist / LLM Response Analyst at Mercor, I performed structured evaluations of AI-generated responses using rubric-based and side-by-side (SxS) frameworks. I analyzed model outputs for reasoning, correctness, and quality across diverse real-world scenarios. My work supported improvements in AI agent performance, particularly in specialized domains such as manufacturing.
• Conducted high-volume, detailed side-by-side comparisons of LLM responses.
• Provided structured justifications to support the reliability of evaluation scores.
• Identified failure patterns and proposed areas for model improvement.
• Applied domain-specific knowledge to manufacturing use cases.

2026 - Present

OneForma

Search Engine Evaluator / LLM Evaluation Specialist

OneForma | Text
At OneForma, I worked as a Search Engine Evaluator and LLM Evaluation Specialist, focusing on the evaluation and ranking of AI-generated responses across multiple domains. Tasks included search relevance, query intent matching, multimodal and fact-based validation, and prompt evaluation for model improvement. I consistently applied detailed guidelines to large-scale, diverse datasets.
• Performed side-by-side and rubric-based response comparisons across conversational and search domains.
• Evaluated text, audio, and image outputs for multimodal tasks.
• Handled structured content and language detection tasks such as Wikidata validation.
• Ensured accuracy and consistency by following complex project instructions.

2024 - Present

Turing

Business Analyst / AI Evaluation Specialist

Text
During my tenure at Turing as a Business Analyst / AI Evaluation Specialist, I evaluated AI outputs via detailed side-by-side comparisons. I focused on identifying hallucinations and inconsistencies and on providing comprehensive justifications using structured evaluation frameworks. My work combined manual evaluation and basic technical tools to ensure accurate findings for model improvement.
• Compared AI-generated outputs with gold standards for relevance and correctness.
• Used browser, command-line, and Python tools in support of quality analysis.
• Fact-checked information to mitigate hallucinations and errors.
• Documented results for wider QA and development use.

2026 - 2026

RWS

AI Annotation Specialist / QA Reviewer

Audio | Transcription
As an AI Annotation Specialist and QA Reviewer at RWS, I performed verbatim and golden audio transcription along with annotation of non-speech events. I evaluated multi-turn dialogues for conversational AI and conducted quality assurance reviews for transcription accuracy. The position involved designing and assessing complex workflow scenarios and contributing to bilingual dataset annotation.
• Applied rubric-based frameworks for dialogue quality evaluation in English and Urdu.
• Ensured guideline compliance as a promoted reviewer overseeing contributor accuracy.
• Evaluated multi-turn real-world scenarios and decision chains for AI agents.
• Recorded and annotated diverse audio and video datasets for training purposes.

2025 - 2026

Invisible Technologies

Document Annotation Specialist

Document
As a Document Annotation Specialist at Invisible Technologies, I validated data extracted from PDFs using structured JSON schemas. This involved cross-referencing, correcting field-level discrepancies, and verifying numerical and textual consistency in complex documents. I contributed to improving extraction models through quality assurance and edge case identification.
• Compared and corrected AI-extracted data across documents for accuracy.
• Ensured guideline compliance for field validation and totals verification.
• Identified and documented complex edge cases to facilitate model upgrades.
• Maintained consistency across large, structured datasets for performance metrics.

2025 - 2025

Education

Era University

Bachelor of Science, Zoology
2022 - 2025

Work History

Freelance

Graphic Designer / UI-UX Designer / Motion Designer

N/A
2020 - Present