Abdul Waheed - AI/ML Engineer | Generative & Agentic AI Specialist | LLM Evaluation Expert

Key Skills

Software

AWS SageMaker

Appen

Google Cloud Vertex AI

Label Studio

OpenCV AI Kit (OAK)

Other

Internal/Proprietary Tooling

Top Data Types

Image

Text

Top Label Types

Evaluation Rating

Prompt Response Writing SFT

RLHF

Text Summarization

Translation Localization

Freelancer Overview

AI/ML Engineer with strong expertise in Generative AI, reinforcement learning from human feedback (RLHF), and supervised fine-tuning (SFT)-based LLM evaluation, with contributions to large-scale initiatives such as Google GAIA and Twitter AI (Grok). I craft, curate, and validate datasets that bridge human reasoning and machine learning, from multimodal search workflows to academic question-answer generation.

At Google GAIA, I developed datasets modeling how humans search, analyze, and reason step by step across text, image, video, and audio modalities. This work helped train Gemini models to understand multi-step problem solving, compare model reasoning paths against human logic, and refine outputs for accuracy and coherence.

At Twitter AI, I contributed to RLHF pipelines by designing prompts, reviewing Grok model responses, identifying reasoning gaps, and authoring ideal completions. This process strengthened model alignment and reduced hallucinations.

Overall, my focus is on building high-quality, explainable data that drives human-like reasoning in AI systems while ensuring fairness, linguistic diversity, and real-world performance.

Intermediate

Urdu

Punjabi

English

Labeling Experience

Google GAIA – Human Search Reasoning and Multimodal Dataset Creation

Internal Proprietary Tooling

Image

Text Generation

RLHF

Worked under Turing as part of the Google GAIA (Gemini) data team to create high-quality datasets for training multimodal LLMs. Designed and annotated human reasoning sequences that replicate how users search, analyze, and synthesize information step by step across text, image, video, and audio inputs. Labeled reasoning chains, extracted relevant data, and wrote human-like solutions to teach models structured thinking. Validated model reasoning by comparing outputs with human-generated logic under RLHF and SFT frameworks. Ensured data accuracy, consistency, and diversity across modalities.

2024

Twitter AI (Grok) – Prompt Evaluation and Ideal Completion Generation

Internal Proprietary Tooling

Computer Code Programming

Text Generation

RLHF

Contracted through Turing for the Twitter AI (Grok) team to support reinforcement learning from human feedback (RLHF). Evaluated model responses to diverse user prompts, identified factual and logical inconsistencies, and wrote ideal responses to guide retraining. Collaborated with internal QA teams to assess alignment, tone, and factual accuracy. Provided data-driven feedback that improved model performance, coherence, and user satisfaction.

2024

Education

COMSATS University Islamabad

Bachelor of Science, Computer Science

2020 - 2024

Work History

Hubble42 Inc.

AI/ML Engineer - Generative AI

Lahore

2024 - Present

codSeed

AI/ML Engineer

Lahore

2023 - 2024