Medical and Legal Text Evaluation for RLHF Systems
I contributed to several large-scale AI training projects aimed at improving LLM alignment and safety, particularly in the medical and legal domains. Tasks included ranking AI-generated responses for RLHF, evaluating text accuracy, identifying biased or harmful outputs, summarizing complex clinical information, verifying translations, and classifying multilingual medical and legal content. I worked in both English and Italian, following detailed task-specific guidelines to maintain consistency and a quality rating above 95% across more than 1,200 reviewed items. I also participated in prompt-response refinement and red-teaming activities to detect unsafe or hallucinated medical claims. Projects complied with HIPAA and GDPR requirements and demanded strict adherence to confidentiality and accuracy metrics.