Large Language Model Evaluation & Prompt Testing
Worked on large-scale AI training projects focused on improving the performance and reliability of large language models (LLMs). The scope involved evaluating and annotating high-volume text datasets across diverse subject areas, including STEM, history, culture, and general knowledge.