Senior Reviewer
During this project, I was promoted to reviewer. I assessed the work of other evaluators, corrected their mistakes according to the manual, and gave them constructive feedback.
My experience in AI training data and data labeling is rooted in a strong academic foundation and hands-on industry work. After completing my MS in AI Applications from the University of Strathclyde in Glasgow, I began working with Outlier AI and Stellar AI as a freelance contributor, focusing on data training and collection. In these roles, I developed the ability to analyze datasets critically, identify quality issues, and refine training inputs to improve model outputs. My work involved evaluating LLM responses for accuracy, coherence, and instruction adherence, ensuring that outputs met high-quality standards across a variety of use cases. In addition to evaluation, I specialize in AI prompt engineering—designing and optimizing prompts to enhance model performance across diverse tasks. I have experience in adversarial prompting and testing, where I intentionally stress-test models to uncover weaknesses and guide improvements. I’ve also worked on AI agent task optimization, assessing how models interact with external tools like search and computational systems. A strong focus of my work has been bias and safety testing, ensuring responses remain accurate, fair, and ethically aligned. This combination of analytical rigor, practical experience, and a deep understanding of LLM behavior allows me to contribute effectively to building more reliable and robust AI systems.
This was a small project for a startup called algotomy, which specializes in medical data. I was tasked with collecting empathetic responses to patients' problems. For this project, public mental health forums and various Reddit subreddits were scraped with Python into rows pairing each post with its reply. Unsuitable rows were removed manually, and the company received a quality dataset for building its model.
In this project, the chat history between a user and an AI model was studied. The work was multi-faceted and covered five areas. AI Prompt Engineering: designing and optimizing prompts to improve LLM performance across diverse tasks. LLM Response Evaluation: assessing AI-generated outputs for accuracy, coherence, and instruction adherence, among other criteria. Adversarial Prompting & Testing: crafting structured prompts that deliberately make the model fail, exposing weaknesses to guide improvements. AI Agent Task Optimization: evaluating agent behavior in tool-assisted workflows (Google Search, Maps, Wolfram). Bias & Safety Testing: ensuring responses are accurate, unbiased, and ethically aligned.
This project covered the physics and chemistry domains. Each question and each LLM response was evaluated individually for factual and mathematical correctness.
Master's in Science, AI and Applications
Bachelor in Engineering, Chemical Engineering
Data Collection and Cleaning
Senior Reviewer