Marin Štimac

LLM evaluator, RLHF reward model trainer, Senior IT and AI/ML Product Manager

Zagreb, Croatia
Entry Level · AWS SageMaker

Key Skills

Software

AWS SageMaker

Top Subject Matter

Legal Tech
LLM Evaluation
Risk & Bias Mitigation

Top Data Types

Text
Document
Computer Code Programming

Top Task Types

Text Generation
Question Answering
Text Summarization
Evaluation Rating
Computer Programming Coding
Function Calling
Prompt Response Writing SFT
Red Teaming
RLHF
Fine Tuning
Transcription

Freelancer Overview

A Senior IT and AI/ML Product Manager seeking LLM evaluation, RLHF reward model training, data annotation, and data labeling projects for additional income while job hunting for full-time Senior PM roles. Key skills include:
• Prototyping: Windsurf, VS Code, GitHub Copilot
• Data analysis, ETL, visualization: SQL, PostgreSQL, MS Power Query, MS Power BI, ERD, DAX
• MLOps: advanced prompt/context engineering, AWS SageMaker (data ETL, pipelines, ML model training), DeepEval and GEval (coding model evaluations using LLM-as-a-judge)
• Integrations: RESTful APIs, SOAP, authentication, authorization, tokenization
• Tools: Jira and Confluence Admin, Notion Admin, Quantive Admin (OKR methodology), Camunda Admin (process diagrams and user flows built on the BPMN 2.0 standard), Canva

Entry Level · Latin · English · Italian · Serbian · Slovenian · Spanish · Croatian

Labeling Experience

AI platform builder, LLM evaluator

Text · Evaluation Rating
As a personal portfolio project, I built and evaluated an AI legal tech platform prototype, focusing on output quality and safety. I evaluated LLM-generated legal text using the LLM-as-a-judge architecture, the GEval metric from the Python DeepEval library, and Windsurf. I applied strict rubric criteria to assess hallucinations, bias, and security risks in model outputs.
• Evaluated legal tech LLM outputs for grounding, accuracy, bias, and safety.
• Tuned rubrics and grading criteria using advanced evaluation tooling.
• Used the Python DeepEval library with GEval metrics for systematic review.
• Applied the LLM-as-a-judge architecture in the legal domain.

2025 - 2025

Education

Microsoft Learn

Certification, Power Query, Power BI, ERD, DAX

Certification
2026 - 2026

Udacity, Derek Steer (former CEO of Mode, current CEO of Superframe)

Certification, SQL for data analysis

Certification
2026 - 2026

Work History

Self-Employed

Founder and CEO

Zagreb
2024 - Present

Mer, a Visma company

Product Manager

Zagreb
2021 - 2024