Ongoing – expert rubric grading and response evaluation for LLM training and benchmarks
I support an ongoing program that supplies high-quality human judgments on challenging, domain-specific prompts and model outputs. When a source document is provided, the work is document-grounded: I keep claims tied to the supplied material, flag gaps rather than inventing facts, and follow strict output rules (structure, audience, and deliverable type). Scoring mixes writing-quality dimensions (clarity, organization, tone fit) with objective non-writing checks (accuracy, instruction adherence, and required calculations or labels when applicable). I maintain consistency across long guideline documents, avoid rubric drift, and write short, evidence-based justifications when awarding partial scores. The role requires careful reading, attention to edge cases, and steady throughput without sacrificing calibration.
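The mix of subjective ratings and objective gates described above can be sketched in code. This is an illustrative assumption only: the dimension names, scales, and capping rule below are hypothetical, not the actual program's rubric.

```python
from dataclasses import dataclass

# Hypothetical rubric: dimensions, 1-5 scales, and the cap value are
# illustrative assumptions, not the real scoring guidelines.
@dataclass
class RubricScore:
    clarity: int               # writing-quality dimensions (1-5)
    organization: int
    tone_fit: int
    accurate: bool             # objective non-writing checks (pass/fail)
    followed_instructions: bool

def overall(score: RubricScore) -> float:
    """Combine writing-quality ratings with objective gates.

    An objective failure caps the overall score, so a fluent but
    inaccurate response cannot outscore a correct one.
    """
    writing = (score.clarity + score.organization + score.tone_fit) / 3
    if not (score.accurate and score.followed_instructions):
        return min(writing, 2.0)  # hard cap when any objective check fails
    return writing

print(overall(RubricScore(5, 4, 5, True, True)))   # strong writing, checks pass
print(overall(RubricScore(5, 5, 5, False, True)))  # capped at 2.0 by failed accuracy
```

The design choice worth noting is the hard cap: averaging objective checks into the writing score would let polish mask factual errors, whereas gating keeps the two judgment types separable, which is how the partial-score justifications stay evidence-based.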