
Sree Durairaj

Expert in LLM Evaluation, Agentic AI Evaluation and Prompt Engineering

Leicester, United Kingdom
$12.00/hr · Intermediate · CVAT · Google Cloud Vertex AI · Labelbox

Key Skills

Software

CVAT
Google Cloud Vertex AI
Labelbox
Label Studio
SuperAnnotate

Top Subject Matter

No subject matter listed

Top Data Types

Computer Code Programming
Document
Video

Top Label Types

Action Recognition
Computer Programming Coding
Evaluation Rating
Fine Tuning
Prompt Response Writing SFT

Freelancer Overview

I work on testing and improving AI systems in real product environments. Most of my experience comes from checking LLM behaviour, reasoning quality, prompt clarity, and multi-step agent workflows in the tools I build. I focus on reducing ambiguity, spotting weak reasoning, and turning messy outputs into clear, reliable steps.

My day-to-day work includes writing prompts, testing them across different models, reviewing failures, and correcting the logic so the model understands what the task actually requires. I break down complex requests into clean instructions, evaluate agent actions one step at a time, and classify errors into patterns that can be learned from and fixed. I also review code-generation outputs, check technical reasoning, and rewrite unclear responses into stable examples. I create prompt-response pairs, rate model outputs, and define “gold standard” behaviours for different task types.

My background in product, workflows, and system logic helps me understand edge cases and real-world user expectations. I work with simple language, clear rules, and consistent judgement so that training data stays predictable and easy for models to follow. I can support projects involving LLM evaluation, agent reasoning checks, prompt writing, instruction refinement, and human feedback on model responses.
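
For illustration, a minimal sketch of what one such reviewed “gold standard” prompt-response record might look like. The field names and the 1-5 rating scale below are assumptions made for this example, not taken from any specific project.

```python
from dataclasses import dataclass, field

@dataclass
class GoldExample:
    """One reviewed prompt-response pair kept as a reference answer."""
    prompt: str          # the cleaned-up instruction given to the model
    response: str        # the corrected / ideal answer
    task_type: str       # e.g. "code_review", "instruction_following"
    rating: int          # 1-5 overall quality score (hypothetical scale)
    issues_fixed: list = field(default_factory=list)  # error patterns corrected

example = GoldExample(
    prompt="Explain, step by step, how to reverse a list in Python without slicing.",
    response="1. Create an empty list. 2. Iterate over the original list with reversed(). "
             "3. Append each item to the new list.",
    task_type="instruction_following",
    rating=5,
    issues_fixed=["ambiguous wording", "missing step"],
)
print(example.task_type, example.rating)
```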

English (Intermediate)

Labeling Experience

Google Cloud Vertex AI

Code Generation Review and LLM Programming Assistance Evaluation

Google Cloud Vertex AI · Computer Code Programming · Fine Tuning · Evaluation Rating
I reviewed model-generated code snippets for correctness, logic, structure, and alignment with the requested task. I checked for syntax issues, missing steps, security concerns, and reasoning errors in how the model approached programming tasks. I also wrote corrected versions of the code, added explanations, and created examples that demonstrate the right way to solve common coding problems. This improved the quality of training data for coding assistants and multi-step reasoning tasks in software development.
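
A hypothetical (not project-specific) example of this kind of correction: a model-generated snippet flagged during review, followed by a rewritten version with the reasoning noted in comments.

```python
# Model output (flagged: mutates the caller's list and mishandles the empty case)
def average(nums):
    nums.sort()
    return sum(nums) / len(nums)

# Corrected version written during review
def average_fixed(nums):
    """Return the arithmetic mean without mutating the caller's list."""
    if not nums:                      # guard against division by zero
        raise ValueError("average of an empty list is undefined")
    return sum(nums) / len(nums)      # sorting is unnecessary for a mean

print(average_fixed([2, 4, 6]))       # 4.0
```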

2025
Google Cloud Vertex AI

Agent Workflow Evaluation and Multi-Step Action Verification

Google Cloud Vertex AI · Computer Code Programming · Action Recognition · Routing
I evaluated AI agent workflows where the model needed to complete a task through multiple steps. My work involved reviewing each action the agent took, checking whether each step logically followed the previous one, and identifying errors in reasoning or tool usage. I also wrote corrected step sequences, improved the clarity of intermediate actions, and defined ideal “golden flows” for how an agent should complete a task. This helped create structured data that supports better agent planning and consistency.
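
A minimal sketch of what a per-step review record for such a trace could look like. The action names, verdict labels, and the “golden flow” below are invented for illustration; real project schemas differ.

```python
agent_trace_review = {
    "task": "Find the latest invoice and email a summary to the customer",
    "steps": [
        {"index": 1, "action": "search_invoices(customer_id=...)",
         "follows_previous_step": True, "verdict": "correct"},
        {"index": 2, "action": "draft_email(body=raw_invoice_json)",
         "follows_previous_step": True, "verdict": "error",
         "error_type": "missing_summarization_step"},
    ],
    # the ideal sequence the agent should have followed
    "golden_flow": ["search_invoices", "summarize_invoice", "draft_email", "send_email"],
}

# Count how many steps were judged correct
correct = sum(s["verdict"] == "correct" for s in agent_trace_review["steps"])
print(f"{correct}/{len(agent_trace_review['steps'])} steps correct")
```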

2025
Labelbox

LLM Output Evaluation and Text Generation Review

Labelbox · Text · Text Generation · Text Summarization
I worked on evaluating LLM-generated responses for accuracy, clarity, reasoning depth, and alignment with user intent. The work included reviewing multi-step answers, identifying reasoning gaps, marking incorrect interpretations, and rating outputs based on task quality. I also created clean prompt-response examples that models could learn from, and tested variations of prompts to see how model behaviour changed. This helped improve instruction following, reduce ambiguity, and create consistent training data.
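
As an illustration, a single response rating might be captured along the axes mentioned above. The schema and the 1-5 scale here are assumptions, not a specific project's format.

```python
response_rating = {
    "prompt_id": "example-001",
    "scores": {
        "accuracy": 4,         # factually correct, one minor omission
        "clarity": 3,          # wording is verbose in places
        "reasoning_depth": 4,  # intermediate steps are mostly shown
        "intent_alignment": 5, # answers the question actually asked
    },
    "reasoning_gaps": ["skips justification for the final recommendation"],
    "rewrite_needed": False,
}

# Simple unweighted average across the four axes
overall = sum(response_rating["scores"].values()) / len(response_rating["scores"])
print(f"overall: {overall:.2f}")
```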

2024
CVAT

Image Annotation & LLM Training Data Preparation for Internal Tools

CVAT · Image · Bounding Box · Polygon
I worked on a small internal project where I used CVAT to annotate images and prepare structured examples for LLM training. My tasks included marking object regions, classifying images into predefined categories, and verifying annotation accuracy. Alongside the image work, I created simple instruction-response examples to help models learn how to describe images, identify objects, and follow human feedback. I also reviewed LLM-generated answers for correctness, rewrote unclear descriptions into clearer versions, and provided step-by-step reasoning when needed. The focus was on being consistent with the labeling rules, avoiding ambiguity, and ensuring that every example matched the guidelines. This project gave me hands-on experience combining image annotation with text-based evaluation for model improvement.
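
A simplified, COCO-style sketch of the kind of record such annotations can be exported as. The file name, categories, and coordinates are made up for illustration, and real CVAT export formats differ in detail.

```python
annotation = {
    "image": {"id": 17, "file_name": "frame_0017.jpg", "width": 1280, "height": 720},
    "annotations": [
        {"category": "person", "type": "bounding_box",
         "bbox": [412, 208, 96, 240]},  # x, y, width, height in pixels
        {"category": "sign", "type": "polygon",
         "segmentation": [[700, 120, 760, 120, 760, 180, 700, 180]]},
    ],
}

# Quick sanity check that every bounding box stays inside the image bounds
img = annotation["image"]
for ann in annotation["annotations"]:
    if ann["type"] == "bounding_box":
        x, y, w, h = ann["bbox"]
        assert 0 <= x and 0 <= y and x + w <= img["width"] and y + h <= img["height"]
print("all boxes within bounds")
```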

2023 - 2024

Education

IIM Indore, India

PG Program in Product Management, Product Management
2021 - 2022

SASTRA University, India

Bachelor of Technology, Electrical & Electronics Engineering
2008 - 2012

Work History

Fortitude Legal

Independent Consultant

Leicester
2025 - Present

Everlast Gyms – Frasers Group

Leicester
2024 - 2025