Machine Learning Intern – EmbedNexus
Benchmarked large language models (LLMs) on CPU-only hardware, running structured tasks and evaluating outputs for latency, accuracy, and consistency. Deployed quantized LLMs for chat, image analysis, and file-based Q&A workflows and systematically compared their performance.
• Ran evaluation tasks on LLMs using structured datasets.
• Measured and recorded model responses to benchmark output quality.
• Assessed the consistency and reliability of results under different quantization settings.
• Used prompt-driven validation mechanisms and documented findings.
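A benchmarking loop of this kind can be sketched as below. This is a minimal illustration, not the actual project code: `run_model` is a hypothetical stand-in for the quantized-LLM call, and the consistency check (identical output across repeated runs) is one simple way to operationalize the consistency metric mentioned above.

```python
import statistics
import time

def run_model(prompt: str) -> str:
    # Hypothetical stand-in for a quantized LLM inference call on CPU;
    # any callable taking a prompt and returning text fits this shape.
    return prompt.strip().lower()

def benchmark(prompts, runs=3):
    """Measure per-prompt mean latency and output consistency across repeats."""
    results = {}
    for prompt in prompts:
        latencies, outputs = [], []
        for _ in range(runs):
            start = time.perf_counter()
            outputs.append(run_model(prompt))
            latencies.append(time.perf_counter() - start)
        results[prompt] = {
            "mean_latency_s": statistics.mean(latencies),
            # "consistent" means the model produced the same output every run
            "consistent": len(set(outputs)) == 1,
        }
    return results

if __name__ == "__main__":
    report = benchmark(["What is 2 + 2?", "Summarize this file."])
    for prompt, stats in report.items():
        print(prompt, stats)
```

In a real evaluation, accuracy would additionally be scored against reference answers from the structured dataset; that scoring step is omitted here.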