Shad Javed Ansari

LLM Evaluation Engineer (SxS Evaluation & Model QA)

Remote, India
$20.00/hr · Entry Level · CVAT · LabelImg · Label Studio

Key Skills

Software

CVAT
LabelImg
Label Studio
Roboflow
Other

Top Subject Matter

AI Response Evaluation
Prompt Engineering
Model QA

Top Data Types

Computer Code Programming
Image
Text
Video
Document

Top Task Types

Classification
Computer Programming/Coding
Evaluation/Rating
Prompt + Response Writing (SFT)
Segmentation

Freelancer Overview

LLM Evaluation Engineer (SxS Evaluation & Model QA). Brings 3+ years of professional experience across legal operations, contract review, compliance, and structured analysis. Core strengths include internal and proprietary tooling. AI-training focus includes data types such as Text and Computer Code/Programming, and labeling workflows including Evaluation/Rating.

Entry Level · English · Hindi

Labeling Experience

AI Data Processing & Automation Tasks (Python-Based)

Other · Computer Code Programming · Classification · Evaluation/Rating
Worked on Python-based AI training tasks involving automation-style workflows, structured data processing, and AI output evaluation. My responsibilities included writing and debugging Python scripts, extracting and transforming structured data (CSV, JSON, Markdown), and validating outputs against strict formatting and project guidelines. I reviewed and corrected AI-generated code and technical responses, fixed logical and syntax errors, and ensured consistency and correctness across datasets. The project required strong attention to detail, adherence to instructions, and self-QA to deliver reliable, machine-ready outputs for AI training.

2024 - 2025
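The structured-data-validation workflow described in this project can be sketched in plain Python. The field names and validation rules below are illustrative assumptions, not the actual project schema:

```python
import csv
import io
import json

REQUIRED_FIELDS = {"id", "prompt", "response"}  # illustrative schema, not the real one


def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors for one JSON record."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if not str(record.get("id", "")).strip():
        errors.append("empty id")
    return errors


def json_lines_to_csv(lines: list[str]) -> str:
    """Transform JSON-lines records into CSV, skipping rows that fail validation."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=sorted(REQUIRED_FIELDS))
    writer.writeheader()
    for line in lines:
        record = json.loads(line)
        if not validate_record(record):
            writer.writerow({k: record[k] for k in REQUIRED_FIELDS})
    return out.getvalue()
```

Invalid rows are dropped rather than written, which keeps the output machine-ready; a real pipeline would also log the rejected rows for review.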

LLM Coding Evaluator & QA Analyst (Freelance / Contract)

In this ongoing role, I evaluate large language model-generated programming solutions for correctness, efficiency, and robustness. I design adversarial tests, verify outputs through execution, and provide structured feedback on algorithmic reasoning and solution clarity. My evaluations are aimed at improving model reliability and identifying common failure patterns.
• Assessed computer code for correctness and edge-case handling
• Detected hallucinated APIs and logical inconsistencies
• Provided feedback and suggested optimizations to improve models
• Designed and administered adversarial tests for model evaluation

2024 - Present
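Execution-based verification of model-generated code, as described in this role, amounts to running the candidate solution against adversarial cases and recording structured results. This is a minimal sketch; `generated_max` stands in for a hypothetical model output with an edge-case bug:

```python
def run_adversarial_tests(candidate, cases):
    """Execute an LLM-generated function against adversarial cases.

    `candidate` is the generated callable; `cases` is a list of
    (args, expected) pairs. Returns structured per-case feedback.
    """
    results = []
    for args, expected in cases:
        try:
            got = candidate(*args)
            results.append({"args": args, "ok": got == expected, "got": got})
        except Exception as exc:  # a crash is a failed case, not an evaluator error
            results.append({"args": args, "ok": False, "error": repr(exc)})
    return results


# Hypothetical model-generated solution with a classic edge-case bug:
def generated_max(nums):
    best = 0  # wrong for all-negative input
    for n in nums:
        if n > best:
            best = n
    return best


cases = [(([3, 1, 2],), 3), (([-5, -2],), -2)]
```

The all-negative case exposes the bad initializer, the kind of logical inconsistency this role reports back as structured feedback.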

LLM Evaluation Engineer (SxS Evaluation & Model QA)

Text
As an LLM Evaluation Engineer, I conducted rigorous side-by-side evaluations of AI-generated responses using structured rubrics. I focused on assessing factual accuracy, logical consistency, and relevance to real-world scenarios. My duties included detecting hallucinations, ranking outputs, and designing challenging prompts to test model weaknesses.
• Evaluated AI responses by factual and logical criteria
• Identified hallucinations and fabricated information in outputs
• Designed realistic and adversarial prompts for robustness testing
• Provided justifications and corrections based on rubric scoring

2026 - 2026
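Side-by-side (SxS) rubric scoring of the kind described here reduces to weighting per-criterion ratings and ranking the two responses. The criteria and weights below are illustrative, not the actual project rubric:

```python
from dataclasses import dataclass

# Illustrative rubric: criterion -> weight (assumed, not the project's real rubric)
RUBRIC = {"factual_accuracy": 0.5, "logical_consistency": 0.3, "relevance": 0.2}


@dataclass
class Evaluation:
    scores: dict  # criterion -> 1..5 rating

    def weighted(self) -> float:
        return sum(RUBRIC[c] * s for c, s in self.scores.items())


def side_by_side(a: "Evaluation", b: "Evaluation") -> str:
    """Rank two model responses by weighted rubric score."""
    wa, wb = a.weighted(), b.weighted()
    if wa > wb:
        return "A"
    if wb > wa:
        return "B"
    return "tie"
```

In practice the ranking is accompanied by the written justification the rubric scoring demands, not just the label.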

Python Face Recognition Attendance System (AI-Assisted Automation Project)

Other · Image · Classification · Object Detection
Developed a Python-based face recognition system to automate attendance tracking using computer vision techniques. The project involved implementing automation workflows for image capture, face detection, recognition, and structured attendance logging. I processed and stored outputs in structured formats (CSV/JSON), debugged Python scripts to improve accuracy and reliability, and validated results across different scenarios and edge cases. The project demonstrates hands-on experience with Python automation, AI-assisted workflows, and image-based data processing.

2024 - 2024
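The face-matching step of the project above depends on a computer-vision library, but the structured attendance-logging half of the pipeline can be sketched in pure Python. All names are illustrative; the recognized names are assumed to come from the face-recognition stage:

```python
import csv
import io
from datetime import date


def log_attendance(recognized_names, log_day, existing_rows=()):
    """Append one attendance row per person per day, skipping duplicates."""
    seen = {(r["name"], r["date"]) for r in existing_rows}
    rows = list(existing_rows)
    for name in recognized_names:
        key = (name, log_day.isoformat())
        if key not in seen:
            seen.add(key)
            rows.append({"name": name, "date": log_day.isoformat()})
    return rows


def to_csv(rows) -> str:
    """Serialize attendance rows to the CSV format the project logs."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "date"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Deduplicating on (name, date) is what makes repeated detections of the same face in one session produce a single attendance entry.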
Roboflow

Image Annotation & AI Training Support Tasks

Roboflow · Image · Segmentation · Classification
Supported computer vision AI training projects involving image annotation and dataset validation. Responsibilities included classifying and segmenting visual elements, reviewing annotations for accuracy, and ensuring consistency with project-specific guidelines. Performed quality checks, corrected annotation errors, and maintained clean, structured datasets suitable for machine learning model training and evaluation.

2023 - 2024
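Annotation quality checks of the kind this project describes can be automated as simple rule-based validation. The checks below are a generic sketch (label set and box convention are assumptions, with boxes as pixel `(x, y, w, h)`):

```python
def check_annotation(ann, image_w, image_h, allowed_labels):
    """Flag common annotation errors: unknown label, degenerate box,
    or a bounding box that falls outside the image bounds."""
    errors = []
    if ann["label"] not in allowed_labels:
        errors.append(f"unknown label: {ann['label']}")
    x, y, w, h = ann["bbox"]
    if w <= 0 or h <= 0:
        errors.append("degenerate box")
    if x < 0 or y < 0 or x + w > image_w or y + h > image_h:
        errors.append("box outside image")
    return errors
```

Running such checks over a whole dataset is one way to keep annotations consistent with project guidelines before they reach model training.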

Education

Integral University

Bachelor of Computer Application, Computer Application

2020 - 2023
Integral University

Master of Computer Application, Computer Application

2023

Work History

Freelance / Self-Employed

Freelance Python Developer

Remote
2024 - Present
Freelance

Software Engineer

Remote
2023 - 2025