Senior Software Architect & AI Systems Evaluator
Performed reinforcement learning from human feedback (RLHF) to improve the alignment and reliability of large language model (LLM) outputs. Conducted rubric-based comparative model evaluations, hallucination detection, and prompt engineering trials. Assessed AI systems for bias, safety, and factual consistency across technical and cross-domain environments.
• Ran human-in-the-loop (HITL) evaluations to collect preference data for reward modeling.
• Designed reward models and curated demonstration data for supervised fine-tuning (SFT).
• Tested models for prompt injection vulnerabilities and related security threats.
• Audited LLM outputs for PII exposure and HIPAA compliance.