Prompt Engineer & AI Model Evaluator | Labelbox (Alignerr Platform) | Freelance
I evaluated and rated coding responses generated by frontier LLMs on the Alignerr platform, following expert-tier workflows. I created and reviewed targeted prompts for coding tasks and systematically ranked side-by-side completions to generate RLHF and alignment data. My work included YAML-based annotation that captured model errors and calibrated feedback to reviewer standards.
• Achieved expert-tier status on Alignerr within one month.
• Scored LLM outputs on correctness, reasoning, instruction-following, and safety.
• Generated preference signals for RLHF training through ranking and evaluation tasks.
• Used structured workflows to annotate model hallucinations, intent misreadings, and prompt drift.