AI Model Evaluation and Data Labeling for Large Language Models (LLMs)
Evaluated and refined Generative AI models by assessing and rating AI-generated responses against detailed guidelines, ensuring the accuracy and relevance of outputs in an enterprise context. Identified and corrected issues such as hallucinations (factually inaccurate content generated by the model) and judged how well responses aligned with their prompts. Rated multiple dimensions of response quality, including truthfulness, instruction following, verbosity, and formatting. Contributed to the improvement of AI systems used by enterprise clients by providing thorough, consistent feedback on model performance.