RLHF Prompt Evaluation & Code Output Rating
Curated 2,000+ multilingual prompts and completions for instruction-following and code-generation RLHF pipelines. Rated outputs for correctness, bias, and hallucination; wrote gold-standard responses; and auto-scored code completions by unit-test pass/fail results. This work raised model pass@1 accuracy by 12% and cut reviewer time by 30% within SOC 2-compliant workflows.