Legal AI Evaluator & Prompt Engineer (Legal & Government Domain)
Evaluated large language model responses in a high‑stakes Legal & Government labeling project. Designed and iterated complex legal prompts requiring multi‑step statutory interpretation, jurisdictional analysis, constitutional reasoning (Fourth Amendment, Due Process, First Amendment, nondelegation), and administrative law principles. Conducted multi‑turn adversarial conversations with anonymized AI models, building context over several turns to probe legal reasoning on circuit splits and doctrinal ambiguities. Performed side‑by‑side comparative evaluations of model outputs, selecting preferred responses and tagging specific legal weaknesses (e.g., factual errors, misapplied precedent, hallucinated citations, jurisdictional errors, overbreadth analysis failures). Ensured prompt difficulty met rigorous AI Critic thresholds, screening out basic factual lookups and requiring professional‑grade legal analysis. Contributed to improved model accuracy, coherence, and legal safety across constitutional, criminal procedure, immigration, and administrative law domains.

*Project size: 300+ labeling tasks. Quality measures: adherence to legal accuracy standards, consistency checks, and detailed weakness tagging following a 20+ tag taxonomy.*