Agent Error Analysis
This project focused on reviewing and evaluating LLM responses to a human prompt in YAML format. Two answers were provided: a fail trajectory, in which the agent failed the task, and a pass trajectory, in which the agent succeeded. The evaluation was based on whether the success criteria provided by the task's creator were met. My job was to evaluate the fairness of the task and to assess whether the agent succeeded legitimately or relied on some kind of hack.