Generalist
This project covers many different tasks; some examples:

- AI Response Evaluation – Evaluating AI answers for correctness, relevance, reasoning quality, and completeness.
- Preference Ranking – Comparing several AI responses to choose the best one based on project guidelines.
- Fact-Checking & Hallucination Detection – Detecting incorrect, misleading, or fabricated information.
- Instruction Following Assessment – Verifying whether the model correctly followed user prompts and constraints.
- Reasoning & Logic Evaluation – Assessing step-by-step reasoning, coherence, and accuracy in problem-solving.
- Tone & Safety Review – Verifying that answers are appropriate, unbiased, safe, and policy-compliant.
- Text Quality Review – Assessing the readability, grammar, structure, and clarity of AI outputs.
- Error Classification – Identifying whether an error is factual, logical, formatting-related, or instruction-related.
- Feedback & Justification Writing – Explaining rankings and corrections.