Project Almanac
Computer Code Programming, Evaluation Rating
This project involves creating research-based prompts with verifiable answers to challenge advanced models. We evaluate a model's response to a prompt in a specific domain and aim to stump the model, so that its failures can inform further training.
2025 - Present