Prompt-to-Output Bias Detection Study (Self-Initiated Project)
Conducted prompt-to-output bias detection experiments across three large language model (LLM) tools. Systematically annotated and categorized outputs, identifying biases, hallucinations, and policy-sensitive cases in generated responses, applying quality-assurance discipline aligned with real-world RLHF evaluation workflows.
• Tested 60+ prompt variations and categorized 180 LLM outputs
• Flagged 23 factual errors and 11 edge cases involving potential bias
• Documented structured annotations in Excel
• Improved detection of demographic and policy-sensitive issues
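The annotation-and-tally workflow described above could be sketched as a small script. This is an illustrative sketch only: the label names, dataclass fields, and CSV layout here are assumptions, not the study's actual schema.

```python
import csv
from collections import Counter
from dataclasses import dataclass, field

# Illustrative label set; the study's real categories may differ.
LABELS = {"factual_error", "demographic_bias", "policy_sensitive", "hallucination"}

@dataclass
class Annotation:
    prompt_id: str             # which prompt variation produced the output
    model: str                 # which of the LLM tools was tested
    output_excerpt: str        # snippet of the generated response
    labels: list = field(default_factory=list)

def tally(annotations):
    """Count how many annotated outputs carry each recognized label."""
    counts = Counter()
    for ann in annotations:
        counts.update(set(ann.labels) & LABELS)
    return counts

def export_csv(annotations, path):
    """Write structured annotations to a CSV file Excel can open directly."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt_id", "model", "output_excerpt", "labels"])
        for ann in annotations:
            writer.writerow(
                [ann.prompt_id, ann.model, ann.output_excerpt, ";".join(ann.labels)]
            )
```

Keeping labels as a semicolon-joined column keeps the export flat and spreadsheet-friendly while still allowing multiple flags per output.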