Careerflow.ai

LLM - AI Quality Analyst (Personalization) - Spanish

Reposted 6 Days Ago

Hiring Remotely in ES

Remote

Entry level

Artificial Intelligence • HR Tech • Software • Generative AI

The Role

Design multi-turn prompts to test personalization, evaluate and rank model responses for grounding, integration, and helpfulness, write clear rationales, extract debug info, and clear sessions. Work independently in a remote, full-time contractor role with required daily PST overlap.

Summary Generated by Built In

Role Overview

We’re hiring an AI Quality Analyst to help us evaluate a personalization feature we’re building into Gemini. The idea behind it is pretty straightforward: the model should be able to use what it knows about you (past conversations, Gmail, Search, YouTube activity) to give you answers that actually feel relevant, not just technically correct.

Your job is to put that to the test. You’ll come up with prompts based on your own experiences, run them through the model, and then honestly assess whether the responses felt personalized in a meaningful way or just kind of generic with a personal detail tacked on. It’s equal parts creative and analytical, and the quality of your judgment really does matter here.

Responsibilities

Design multi-turn conversational prompts (typically 1–5 turns) that require the AI to draw on real personal information and experiences.
Evaluate whether the model applied personalization correctly based on what was actually being asked.
Review responses for Grounding issues. Flag anything that looks like a flawed inference or hallucination rather than evidence-backed reasoning.
Assess Integration quality. Does the personal data feel naturally woven in, or does it come across as robotic and forced?
Stack-rank two model responses side-by-side (SxS) based on helpfulness, ease of use, and overall quality.
Write clear, well-structured rationales that reference specific turns in the conversation.
Extract and verify “Debug Info” to confirm chat summaries and data sources were properly used.
Clear evaluation conversations after each session to maintain clean data.

Required Qualifications

Strong English reading and writing skills; the project is conducted entirely in English.
Demonstrated ability to evaluate nuanced or ambiguous AI responses and explain your reasoning clearly.
Comfortable working independently in a remote setup with minimal hand-holding.
Reliable desktop or laptop with a stable internet connection.
Full-time availability in your local time zone with at least 4 hours of daily overlap with PST.

Preferred Qualifications

Experience in data annotation, AI quality evaluation, content moderation, or something similar.
BS/BA degree or equivalent experience in a relevant field: Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or anything analytically rigorous.
Familiarity with personalization concepts and a good instinct for spotting bad inferences or forced connections.
Experience designing prompts or testing AI systems in any capacity.
Sharp attention to detail when comparing side-by-side responses, especially around tone and naturalness.
Ability to write feedback that’s specific and actionable, not just general impressions.

Engagement Details

This is a contractor role starting immediately. We’re running a 24-hour global operation, so schedule consistency matters. There are two commitment options:
• 30 hours/week - at least 4 hours per day, with a minimum 4-hour overlap with PST.
• 40 hours/week - same daily and overlap requirements.

Hiring Process

There are three steps to complete before being considered:

• Screener
• Three assessments
• Language vetting

Shortlisted candidates will receive a Job Interest Form first. Once your profile is reviewed, you’ll have 24 hours to complete an assessment. From there, we’ll get in touch with finalists to go over pre-onboarding requirements.

Skills Required

Strong English reading and writing skills
Demonstrated ability to evaluate nuanced or ambiguous AI responses and explain reasoning clearly
Comfortable working independently in a remote setup with minimal supervision
Reliable desktop or laptop with a stable internet connection
Full-time availability (30 or 40 hours/week) with at least 4 hours of daily overlap with PST

The Company

What We Do

Careerflow.ai is an AI-powered career management platform and 'career copilot' dedicated to helping job seekers land their dream jobs. The company provides a comprehensive end-to-end toolkit featuring an AI resume builder, LinkedIn profile optimizer, and job tracking tools. By streamlining the application process and optimizing professional profiles, Careerflow helps users navigate the competitive job market and get hired at top tech and startup companies faster.