AI & Software Engineer — Remote
I evaluated and ranked large language model (LLM) outputs to improve reasoning accuracy and reduce errors. My work included optimizing datasets and performing detailed error analysis within AI training pipelines. I also contributed to structured data annotation processes that support LLM development and testing.
• Conducted targeted LLM evaluations focused on reasoning.
• Ranked responses and analyzed their accuracy.
• Improved dataset quality to reduce hallucinations.
• Used data annotation to refine AI models.