Evaluation of Responses to Video Inputs
Rating LLM responses to one or more user-submitted video clips, judging them for accuracy, helpfulness, and recognition of places, objects, or situations depicted in the videos.
I have nearly two years of experience training LLMs, including Gemini, ChatGPT, and Claude. The vast majority of that experience has been on the Data Annotation platform, with a small amount on Labelbox. In recognition of my diligence and expertise, the Data Annotation platform recognized me as one of its top workers and granted me access to high-level projects. I have participated in a wide variety of projects, mainly evaluating text-based conversations, but I have also worked with audio, image, and video inputs and outputs. I excel at thoroughly assessing LLM outputs for safety, factuality, instruction following, helpfulness, relevance, tone, format, and conciseness. I am adept at crafting complex user prompts (and/or system prompts) that push an LLM's abilities to their limits, particularly in Math/STEM fields. When applicable, I also edit model responses to improve them; I am comfortable formatting with Markdown, LaTeX, and plain text. I speak and write fluently in English and Spanish, enabling me to produce high-level work in bilingual and/or translation projects. Beyond LLM training, I am skilled in technical writing, deep research, fact-checking, data analysis and presentation, and Microsoft Office / Google tools. I produce original, authentic, high-quality work in a timely manner.
Training Claude to use correct logic, reasoning, and calculation on STEM prompts that are highly complex or require multiple steps.
Comparing overall quality of chatbot responses to a wide variety of queries in Spanish, taking into account safety, instruction following, and factuality, among other things.
Conversing with LLMs over multiple turns, then editing substandard responses to achieve optimal quality.
Thoroughly assessing every factual claim in lengthy LLM text responses and identifying inaccurate, disputed, or unsupported claims.
Bachelor of Science, Statistics
Delivery Driver/Customer Service
School Bus Driver