LLM Evaluation
I have hands-on experience evaluating AI and chatbot outputs, including those of large language models (LLMs). My work involved rating responses for completeness, accuracy, relevance, and fluency across multiple domains, and creating and testing prompts to ensure the models generate contextually appropriate, high-quality outputs. This evaluation work helped improve model performance, enhance the user experience, and maintain quality standards for training datasets.
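As an illustrative sketch of rubric-based rating on those four dimensions (the criterion names, 1–5 scale, and equal weighting here are hypothetical, not drawn from any specific project):

```python
# Illustrative sketch: scoring an LLM response on a simple rubric.
# The four criteria mirror the dimensions above; the 1-5 scale and
# equal weighting are hypothetical assumptions for this example.

CRITERIA = ("completeness", "accuracy", "relevance", "fluency")

def score_response(ratings: dict) -> float:
    """Average the 1-5 ratings across all four criteria into one score."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for c in CRITERIA:
        if not 1 <= ratings[c] <= 5:
            raise ValueError(f"{c} rating must be between 1 and 5")
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)

# Example: a response that is accurate and fluent but incomplete.
overall = score_response({"completeness": 2, "accuracy": 5,
                          "relevance": 4, "fluency": 5})
print(overall)  # 4.0
```

In practice, per-criterion scores are often kept alongside the aggregate, since a low rating on a single dimension (e.g. accuracy) can matter more than the average suggests.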