LLM Evaluation & Data Annotation — AdaptiveRAG Project
Built and tested retrieval-augmented generation (RAG) frameworks for LLMs, incorporating faithfulness and hallucination evaluation. Developed and applied custom metrics (exact match, F1, faithfulness, hallucination rate) to annotate and assess output quality. Produced publication-quality documentation covering the experimental setup, annotation guidelines, and evaluation criteria.
• Applied label definitions for LLM answer faithfulness and hallucination
• Annotated and validated responses for quality benchmarking
• Iterated on evaluation protocols as model versions changed
• Communicated annotation outcomes for publication and review
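The metrics named above (EM, F1, hallucination rate) can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation; the function names, the whitespace tokenization, and the "hallucinated" label value are assumptions for the example.

```python
from collections import Counter

def exact_match(prediction: str, reference: str) -> int:
    # EM: 1 if the normalized strings match exactly, else 0
    return int(prediction.strip().lower() == reference.strip().lower())

def token_f1(prediction: str, reference: str) -> float:
    # Token-level F1: harmonic mean of precision and recall
    # over tokens shared between prediction and reference
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def hallucination_rate(labels: list[str]) -> float:
    # Fraction of annotated responses carrying the (hypothetical)
    # "hallucinated" label from the annotation scheme
    return labels.count("hallucinated") / len(labels) if labels else 0.0
```

In practice, EM and F1 would be computed per question against gold answers, while the hallucination rate aggregates human annotation labels across a benchmark set.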