Galina Andreeva

LLM Evaluation and Prompt Engineer

Tbilisi, Georgia
$18.00/hr
Expert
Scale AI, Micro1, Roboflow

Key Skills

Software

Scale AI
Micro1
Roboflow
Toloka
Internal/Proprietary Tooling

Top Subject Matter

LLM Evaluation/Data Annotation
Customer Support AI/Conversational Agents
Conversational AI/Voice Agent

Top Data Types

Text
Audio
Document

Top Task Types

Prompt + Response Writing (SFT)
Classification
Transcription
Red Teaming

Freelancer Overview

LLM Evaluation and Prompt Engineer (Recombine.ai) with 4+ years of professional experience across complex workflows, research, and quality-focused execution. Core strengths include internal/proprietary tooling and Google Docs. Education includes a Bachelor of Arts from Moscow State University (2018) and a Bachelor of Arts from IKIP Saraswati Tabanan (2016). AI-training focus covers Text data and labeling workflows including Evaluation, Rating, and Prompt + Response Writing (SFT).

Expert
English, Russian, Indonesian

Labeling Experience

LLM Evaluation and Prompt Engineer (Recombine.ai)

Text
Labeled, evaluated, and annotated Russian-language LLM outputs for accuracy, completeness, style, and safety. Built, maintained, and analyzed over 1,900 evaluation test suites for prompt and dataset improvement. Conducted rubric-based scoring, error tagging, and peer review, and provided feedback to continually improve guideline consistency and dataset quality.
• Constructed challenging edge-case scenarios for stress testing
• Performed prompt engineering for test suites
• Led peer reviews and guideline refinement
• Identified, tagged, and documented error types and failure patterns

2023 - Present

Content & AI Response Writer (XTIX)

Text
Prompt + Response Writing (SFT)
Wrote, structured, and annotated Russian-language customer service responses and macros for AI training and template generation. Reviewed and edited AI-suggested replies for tone, accuracy, safety, and intent categorization to improve LLM agent behavior. Created internal guidelines, built labeled datasets from real support tickets, and adapted tone based on user context for consistency in AI outputs.
• Structured data by intent/category/outcome for AI training
• Reviewed and rewrote AI-generated responses for accuracy and safety
• Developed and updated internal tone and clarity guidelines
• Identified annotation patterns and documented common failures

2025 - 2025

Conversational Writer & QA (Solda.ai)

Text
Prompt + Response Writing (SFT)
Crafted and refined Russian-language dialogue and prompts for a customer-facing LLM-powered voice agent, focusing on natural phrasing and context retention. Reviewed AI agent conversation transcripts for factual accuracy, tone, and safety, and to reduce hallucinations, improving real-world deployment quality. Produced prompt variants and response templates to address failure modes and ensure high-quality outputs in live production.
• Edited and QAed multi-turn call transcripts for style and safety
• Collaborated with ML, engineering, and product on guideline validation
• Monitored and documented user experience post-release
• Established consistent style across sample scenarios and scripts

2023 - 2024

Education

Moscow State University

Bachelor of Arts, Asian and African Studies

2013 - 2018
IKIP Saraswati Tabanan

Bachelor of Arts, Indonesian Language

2015 - 2016

Work History

Garage IT

Operations Specialist

Moscow
2020 - 2023