Attempter
The scope of the project is to evaluate AI responses to a user's prompt. Each response is rated against specific criteria such as instruction following, localization, truthfulness, and harmlessness. After rating each response, you compare the responses based on those ratings and decide which one is better, or whether both are equally good.
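
The rate-then-compare workflow described above could be sketched roughly as follows. This is only an illustration: the 1-5 rating scale, the rule of summing the ratings, and the names `Evaluation`, `CRITERIA`, and `compare` are all assumptions made for the example, not part of the project's actual guidelines.

```python
from dataclasses import dataclass

# Criteria named in the description; the real list is open-ended.
CRITERIA = ("instruction_following", "localization", "truthfulness", "harmlessness")


@dataclass
class Evaluation:
    """Ratings for one response. The 1-5 scale is an assumption."""
    scores: dict

    def total(self) -> int:
        # Hypothetical aggregation: add up the rating for each criterion.
        return sum(self.scores[c] for c in CRITERIA)


def compare(a: Evaluation, b: Evaluation) -> str:
    """Return "A", "B", or "tie" by comparing summed ratings (assumed rule)."""
    if a.total() > b.total():
        return "A"
    if b.total() > a.total():
        return "B"
    return "tie"


response_a = Evaluation({"instruction_following": 5, "localization": 4,
                         "truthfulness": 5, "harmlessness": 5})
response_b = Evaluation({"instruction_following": 4, "localization": 5,
                         "truthfulness": 4, "harmlessness": 5})
print(compare(response_a, response_b))  # prints "A" (19 vs 18)
```

In practice a rater may weigh some criteria more heavily than others (for example, treating a harmlessness failure as decisive), so a plain sum is just the simplest possible stand-in for that judgment.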