Ansar Khangeldin - Experienced ML Researcher and LLM Evaluation Specialist

Key Skills

Software

Google Cloud Vertex AI

Toloka

Don't disclose

Top Subject Matter

No subject matter listed

Top Data Types

3D Sensor

Text

Video

Top Task Types

Action Recognition

Computer Programming Coding

Data Collection

Entity NER Classification

Object Detection

Freelancer Overview

I have hands-on experience designing high-quality training data for large language models, with a focus on adversarial and reasoning-challenging examples. I crafted over 600 complex questions that targeted reasoning gaps and factual inconsistencies in an LLM (under NDA); these were later validated and used to improve model performance. I also validated over 300 question-answer pairs to ensure clarity, correctness, and alignment with model capabilities. My process emphasized edge cases, step-by-step reasoning, and evaluation beyond surface-level accuracy, all of which are critical for robust model improvement. In addition, my background in computer vision research gives me a solid understanding of image and video data. While I haven’t done manual labeling, I have practical experience with dataset preparation, pre-processing, and evaluation pipelines, particularly for tasks such as object and motion detection. This blend of LLM training data expertise and CV data handling lets me bridge NLP and vision data quality needs effectively.

Entry Level

Kazakh

English

Russian

Spanish

Labeling Experience

Expert Validation of Complex QA Pairs for Language Model Fine-Tuning

Don't Disclose

Text

Question Answering

Text Generation

Validated over 300 question-answer pairs intended for LLM training and evaluation. The task involved checking factual correctness, clarity of phrasing, ambiguity resolution, and logical coherence of both questions and answers. Each example was assessed against internal benchmarks for alignment with the model’s target capabilities. Corrections and comments were provided where applicable to improve data quality prior to fine-tuning.

2025 - 2025

Generation of Adversarial Questions for LLMs

Don't Disclose

Text

Question Answering

Crafted over 600 complex, adversarial questions specifically designed to expose reasoning failures and factual inaccuracies in a large language model. Also developed brief answers and annotated each question's type (Brainstorming / Analysis & Logical Reasoning / Open QA / Text Editing). Each example required deep knowledge of multi-step logic, linguistic ambiguity, or underrepresented domains. Additionally, validated over 300 question-answer pairs, checking for answer correctness, clarity, and whether the question truly challenged the model. The work aimed to systematically improve the model’s performance on edge cases that standard benchmarks overlooked.

2025 - 2025

Education

KAUST

Master of Science, IVUL Laboratory

2025 - 2025

MSU Kazakhstan Branch

Bachelor of Science, Applied Mathematics and Informatics

2021 - 2025

Work History

White Rock Group

Machine Learning Engineer/Backend Developer

N/A

2024 - 2025

MSU Kazakhstan Branch

Research Engineer (Thesis)

N/A

2023 - 2025