Team Lead - Terminal Bench 2.0 (Harbor) | QA Reviewer | LLM Trainer
Led a team conducting rubric-based evaluation and review tasks for benchmark-oriented LLM projects. Performed detailed prompt and response evaluations for LLM training pipelines, including red-teaming, instruction-following assessment, and edge-case identification. Contributed to prompt engineering, annotation, response ranking, and training data refinement workflows.
• Maintained and raised team consistency in LLM quality assurance and review standards.
• Reviewed 30+ LLM coding and reasoning tasks per week for accuracy, safety, completeness, and reasoning quality.
• Identified hidden failure modes and adversarial scenarios, and performed evaluator calibration across projects.
• Liaised with stakeholders to balance throughput targets against rigorous rubric and output criteria.