AI Training and LLM Evaluation Specialist
I evaluate and train large language models by reviewing, comparing, and rating their responses across various AI platforms. My role involves applying structured rubrics to assess instruction retention, inference coherence, specificity, atomicity, and verifiability. This work helps improve the reliability, accuracy, and overall quality of model-generated responses.
• Conducted systematic evaluations of LLM responses using detailed rubrics
• Ranked model outputs to identify strengths and weaknesses in reasoning and factuality
• Generated prompt improvements and analyzed responses for hallucinations and errors
• Benchmarked AI models through platforms including Alignerr, Handshake AI, and Mercor