LLM
Comparing two responses provided by an AI agent to the same prompt, evaluating each on completeness, factual accuracy, and AI performance, and choosing the better response overall, guided by the defined gold standard for how the agent should respond to a specific user prompt.