LLM Output Evaluator and AI Content Moderator
Evaluated and improved large language model (LLM) outputs across reasoning, code generation, safety, and factual accuracy. Designed and implemented multilingual AI evaluation workflows and content moderation processes for English and Spanish datasets, and performed regular critical reviews of AI-generated outputs to ensure domain accuracy and safety compliance.
• Authored and scored responses for prompt engineering and code generation tasks.
• Conducted chain-of-thought analysis and RLHF (Reinforcement Learning from Human Feedback) evaluations.
• Managed cross-cultural review teams for large-scale annotation projects.
• Applied security and anomaly-detection expertise during LLM output red-teaming.