Generalist
Project Scope: LLM Evaluation, Rubric Design & Comparative Response Analysis

As part of a multimodal AI development initiative, contributed to Large Language Model (LLM) alignment and performance improvement through structured evaluation and human-feedback workflows. Responsibilities included generating high-quality prompts and benchmark texts, designing detailed evaluation rubrics, and conducting head-to-head comparative assessments of model-generated responses. Performed qualitative and quantitative analysis of outputs across coherence, factual accuracy, reasoning depth, instruction adherence, and safety compliance. Executed structured question-answering tasks to test domain adaptability and contextual understanding, and documented edge cases and failure patterns to inform iterative fine-tuning and reinforcement learning cycles.
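
To illustrate the kind of comparative evaluation described above, the sketch below shows a weighted multi-dimension rubric applied to a pairwise (A/B) comparison of two model responses. This is a minimal, hypothetical example: the dimension weights, field names, and sample prompt are illustrative assumptions, not the project's actual rubric or tooling.

    # Illustrative sketch: weighted rubric scoring for a head-to-head comparison.
    # Dimension weights and the example data are assumptions for illustration only.
    from dataclasses import dataclass

    # Evaluation dimensions, each scored 1-5 by a human rater (weights assumed).
    RUBRIC = {
        "coherence": 0.2,
        "factual_accuracy": 0.3,
        "reasoning_depth": 0.2,
        "instruction_adherence": 0.2,
        "safety_compliance": 0.1,
    }

    @dataclass
    class Comparison:
        prompt: str
        scores_a: dict   # per-dimension scores for response A
        scores_b: dict   # per-dimension scores for response B
        notes: str = ""  # edge cases / failure patterns observed

        def weighted(self, scores: dict) -> float:
            # Weighted sum across rubric dimensions.
            return sum(RUBRIC[d] * scores[d] for d in RUBRIC)

        def preferred(self) -> str:
            # Head-to-head verdict usable as a preference label for fine-tuning.
            a, b = self.weighted(self.scores_a), self.weighted(self.scores_b)
            return "A" if a > b else "B" if b > a else "tie"

    example = Comparison(
        prompt="Explain photosynthesis to a 10-year-old.",
        scores_a={"coherence": 5, "factual_accuracy": 4, "reasoning_depth": 3,
                  "instruction_adherence": 5, "safety_compliance": 5},
        scores_b={"coherence": 4, "factual_accuracy": 5, "reasoning_depth": 4,
                  "instruction_adherence": 3, "safety_compliance": 5},
        notes="Response B ignored the target audience.",
    )
    print(example.preferred())

In a workflow like this, the per-dimension scores support quantitative comparison while the notes field captures the qualitative edge cases and failure patterns fed back into fine-tuning and reinforcement learning cycles.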