Mercor — N*** (RLHF Evaluation)
Evaluated multi-turn conversational AI responses for frontier model fine-tuning. Assessed LLM responses against structured rubric dimensions, including accuracy, instruction-following, reasoning depth, safety, and stylistic quality. Produced detailed comparative rankings with a written justification for each evaluation. Tasks spanned enterprise technology, code review, and complex multi-step reasoning domains. Applied English and French language skills to cross-lingual quality assessment. Maintained consistently high quality scores across hundreds of evaluations.