Cross-Model Comparative LLM Evaluation
Compared outputs from multiple frontier LLMs on identical prompts, rating each response for correctness, clarity, style, safety, and grounding. Documented strengths, weaknesses, and qualitative differences between models, and provided detailed per-response feedback to support model benchmarking.
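The workflow above can be sketched as a small rating-and-ranking structure. This is a hypothetical illustration, not the actual tooling used: the model names, prompt IDs, and 1–5 scale are assumptions; only the five rubric dimensions come from the description.

```python
from dataclasses import dataclass

# Rubric dimensions taken from the description above.
DIMENSIONS = ("correctness", "clarity", "style", "safety", "grounding")

@dataclass
class Rating:
    model: str       # placeholder model name, e.g. "model-a"
    prompt_id: str   # identifier of the shared prompt
    scores: dict     # dimension -> score on an assumed 1-5 scale
    notes: str = ""  # free-text qualitative feedback

    def mean_score(self) -> float:
        return sum(self.scores.values()) / len(self.scores)

def compare(ratings):
    """Group ratings by prompt and rank models by mean rubric score."""
    by_prompt = {}
    for r in ratings:
        by_prompt.setdefault(r.prompt_id, []).append(r)
    return {
        pid: sorted(rs, key=lambda r: r.mean_score(), reverse=True)
        for pid, rs in by_prompt.items()
    }

# Two models rated on the same prompt (invented example data).
ratings = [
    Rating("model-a", "p1",
           dict(zip(DIMENSIONS, (5, 4, 4, 5, 5))),
           "Accurate and well grounded."),
    Rating("model-b", "p1",
           dict(zip(DIMENSIONS, (3, 5, 4, 5, 3))),
           "Fluent but weakly grounded."),
]
ranked = compare(ratings)
print(ranked["p1"][0].model)  # model-a ranks first (mean 4.6 vs 4.0)
```

Keeping per-dimension scores alongside free-text notes mirrors the blurb's mix of quantitative rating and qualitative feedback.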