Jonatan Kovacs - Coder/STEM LLM evaluation expert in English, German, French and Hungarian

Key Skills

Software

Appen

Data Annotation Tech

Other

Top Subject Matter

Evaluating code-generating LLMs (e.g. A/B testing code correctness)

Evaluating LLM response safety/harmlessness

Generating datasets of STEM- and coding-related prompt-answer pairs for LLMs

Top Data Types

Medical DICOM

Text

Video

Top Task Types

Classification

Computer Programming Coding

Evaluation Rating

Question Answering

Text Generation

Freelancer Overview

An expert in scientific A/B testing of general-purpose, STEM-specific, and code-generating LLMs. I was evaluating LLMs as an AI researcher before the AI hype began. What sets me apart is my diverse academic background in maths, physics and other STEM subjects. I am not only a data annotator but also work as an AI researcher and entrepreneur, so I have a deep understanding of what my evaluations are used for. Insights from my ongoing research and LLM evaluation experience allow me to provide concise, well-articulated, and detailed expert opinions on LLM performance.

Intermediate, French, German, English, Hungarian

Labeling Experience

Prompting and Evaluating LLM conversations (general / coding / STEM specific / math)

Data Annotation Tech, Text, Computer Programming Coding

Prompting LLMs with various tasks, including programming, STEM-specific, and mathematical problems. Evaluating the conversations along the following dimensions:
- LLM answer accuracy
- LLM answer safety
- LLM answer verbosity and style
Providing expert opinion along these dimensions.

2023 - 2023

A/B testing STEM-specific LLMs

Data Annotation Tech, Text, Text Generation

A/B testing LLMs that generate STEM-specific text. Dimensions for evaluation:
- LLM STEM-specific accuracy
- LLM answer harmfulness / safety
- LLM instruction following
- LLM answer style and verbosity
Providing expert opinion along these dimensions.

2023 - 2023

A/B testing code-generating LLMs

Data Annotation Tech, Document, Computer Programming Coding

A/B testing LLMs that generate code. Dimensions for evaluation:
- LLM code correctness
- LLM answer harmfulness / safety
- LLM instruction following
- LLM answer style and verbosity
Providing expert opinion along these dimensions.

2023 - 2023

Education

TU Delft

Management of Technology, Management, Leadership, Financial Planning in Technology

Management of Technology

2023 - 2024

Eötvös Loránd Tudományegyetem (ELTE)

Bachelor's in Computer Science, Computer Science and Software Engineering

Bachelor's in Computer Science

2019 - 2023

Work History

SQI Hungarian Software Quality Consulting Ltd.

AI researcher and developer

Budapest

2023 - Present

Data Annotation Tech

Freelance LLM evaluator and data annotator

Budapest

2023 - 2023