AI Evaluation/LLM Benchmarking Engineer
As a Senior Software Engineer, I evaluated AI coding agents and implemented quality-control protocols for LLM benchmarking. My work focused on multilingual data, prompt engineering, and rigorous human and automated audits, assessing agent outputs for structural failures and edge cases in non-English environments.
• Designed Terminal-Bench suites to challenge LLMs in multilingual contexts.
• Built task environments with native-language datasets and realistic constraints.
• Conducted iterative audits combining human review and LLM-based checks.
• Ensured quality through multilayered evaluation and calibration processes.