Shilosky Pierre - LLM Evaluation and Text Generation Specialist in French

Key Skills

Software

Appen

Scale AI

Top Subject Matter

No subject matter listed

Top Data Types

Audio

Document

Text

Top Task Types

Classification

Evaluation Rating

RLHF

Freelancer Overview

I have contributed to the Cypher-RLHF project for Scale AI (via Outlier), performing high-quality annotations and targeted fact-checking to improve model relevance and robustness. That work sharpened my analytical and critical-thinking skills, trained me to locate and verify accurate information quickly, and increased my efficiency in producing consistent, relevant training examples under project guidelines. On the Whatcom project I focused on accurate audio-to-text transcription, which strengthened my communication skills and attention to nuance in spoken language. Together these experiences made me highly meticulous and reliable in labeling and QA tasks. Key strengths I bring: data labeling and transcription accuracy, information verification, adherence to annotation guidelines, efficient throughput, and clear collaboration with project teams.

Entry Level, French, English

Labeling Experience

Whatcom

Appen, Audio, Text Generation

The Whatcom project involved sentence-level audio-to-text transcription under a strict style guide to produce readable, faithful transcripts. Key requirements included preserving intonation via correct punctuation, writing all numbers as words (no digits), and consistently handling disfluencies and non-speech. Quality was maintained through spot checks and peer review, which strengthened my listening accuracy, linguistic attention to detail, and consistency.

2025 - 2025

Cypher - RLHF / Cypher - Evals

Scale AI, Text, Evaluation Rating

The project’s objective was to improve model response accuracy by directly comparing two answers generated from the same prompt. For each comparison, annotators determined whether a set of strict predefined conditions (a minimum of three), which would indicate at least a minor failure in one or both responses, was met. This binary, comparative scope focused the work on measurable model weaknesses and helped prioritize corrective signals for training. My specific tasks included blind evaluation and rating of paired responses against five core criteria: accuracy, instruction-following (respect for prompt restrictions), localization, response length (insufficient, appropriate, or overly verbose), and harmful content (e.g., racism, sexism). The project processed roughly 3,000 batches per week, and each rating required a short written justification to ensure traceability and quality control.

2024 - 2025

Education

Cadi Ayyad University, Center of Excellence FSJES

Bachelor’s Degree, Applied Finance

2022 - 2025

Collège Frère André

High School Diploma, General Education

2014 - 2021

Work History

FINNEXUS Club

Secretary General

Marrakech

2024 - Present

MOAJ

Customer Advisor

Gueliz, Marrakech

2025 - 2025