AI Response Evaluation & Prompt Design (Mindrift/Toloka)
The project consists of generating multi-turn conversations with LLMs, then analysing and rating their responses. Responsibilities include designing realistic prompts across multiple turns, structuring the dialogue with the LLMs appropriately, and evaluating their outputs against a strict set of quality dimensions. This RLHF-style evaluation contributes to improving model performance through detailed qualitative feedback.
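For illustration, a minimal sketch of how per-turn ratings across such dimensions might be recorded; the dimension names, 1-5 scale, and class names below are assumptions for the example, not the actual Mindrift/Toloka rubric or schema.

```python
# Illustrative sketch only: dimension names and the 1-5 scale are assumptions,
# not the platform's actual evaluation rubric.
from dataclasses import dataclass, field

DIMENSIONS = ("helpfulness", "accuracy", "instruction_following", "safety")

@dataclass
class TurnRating:
    """Ratings for a single assistant turn, one score per dimension."""
    turn_index: int
    scores: dict[str, int]   # dimension -> score on an assumed 1-5 scale
    justification: str       # free-text qualitative feedback for the turn

    def __post_init__(self) -> None:
        # Basic validation of dimensions and score range.
        for dim, score in self.scores.items():
            if dim not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dim}")
            if not 1 <= score <= 5:
                raise ValueError(f"{dim} score must be 1-5, got {score}")

@dataclass
class ConversationEvaluation:
    """Per-turn ratings collected across one multi-turn conversation."""
    conversation_id: str
    turns: list[TurnRating] = field(default_factory=list)

    def mean_score(self, dimension: str) -> float:
        """Average a single dimension over all rated turns."""
        return sum(t.scores[dimension] for t in self.turns) / len(self.turns)

# Example: rate two turns of a hypothetical conversation.
evaluation = ConversationEvaluation("conv-001")
evaluation.turns.append(TurnRating(
    0,
    {"helpfulness": 4, "accuracy": 5, "instruction_following": 5, "safety": 5},
    "Accurate and on-topic, slightly verbose.",
))
evaluation.turns.append(TurnRating(
    1,
    {"helpfulness": 3, "accuracy": 4, "instruction_following": 4, "safety": 5},
    "Missed one constraint from the original prompt.",
))
print(f"mean helpfulness: {evaluation.mean_score('helpfulness'):.2f}")
```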