LLM Evaluator – Project Babel
Evaluated prompts and responses generated by Large Language Models (LLMs) to refine AI reasoning and dialogue capabilities. Provided in-depth feedback on answer coherence, logic, and style, informing ongoing model improvements. Assisted with fine-tuning model outputs to align responses with human intent and ethical standards.
• Analyzed prompt-response pairs for accuracy
• Documented errors and edge cases
• Suggested improvements for dataset coverage
• Supported the development of best practices for LLM evaluation