Evaluating AI-Generated Text at Outlier AI
At Outlier AI, I work as a text annotator assessing the quality of AI-generated responses. Working from a custom labeling schema, I have reviewed thousands of samples, scoring each on coherence, relevance, factual accuracy, bias, and style consistency. I use Label Studio to capture both quantitative ratings and qualitative feedback. The aggregated insights are translated into clear performance metrics and actionable recommendations that directly inform model fine-tuning, contributing to measurable improvements in generation quality and a significant reduction in unsafe or off-topic outputs.
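
To make the aggregation step concrete, below is a minimal Python sketch of how per-dimension ratings could be rolled up into summary metrics. It assumes a simplified, hypothetical export where each record carries one numeric score per dimension; the file name, field names, and 1-5 scale are illustrative and do not reflect the actual Outlier AI schema or Label Studio's export format.

    # Minimal sketch: aggregate per-dimension annotation scores into summary metrics.
    # Assumes a hypothetical JSON export (annotations.json) shaped like:
    #   [{"sample_id": "...", "scores": {"coherence": 4, "relevance": 5, ...}}, ...]
    # Field names and the 1-5 scale are illustrative, not an actual export format.
    import json
    from collections import defaultdict
    from statistics import mean

    DIMENSIONS = [
        "coherence", "relevance", "factual_accuracy", "bias", "style_consistency",
    ]

    def summarize(path="annotations.json"):
        with open(path) as f:
            records = json.load(f)
        # Collect all scores for each quality dimension.
        by_dim = defaultdict(list)
        for rec in records:
            for dim in DIMENSIONS:
                score = rec["scores"].get(dim)
                if score is not None:
                    by_dim[dim].append(score)
        # Report the mean score per dimension and the share of samples
        # falling below an illustrative quality bar of 3.
        return {
            dim: {
                "mean": round(mean(vals), 2),
                "pct_below_3": round(100 * sum(v < 3 for v in vals) / len(vals), 1),
            }
            for dim, vals in by_dim.items() if vals
        }

    if __name__ == "__main__":
        for dim, stats in summarize().items():
            print(f"{dim}: mean={stats['mean']}, below-bar={stats['pct_below_3']}%")

A roll-up of this kind is one straightforward way to turn thousands of individual ratings into the per-dimension performance metrics described above.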