We’re hiring an AI Quality Analyst to help us evaluate a personalization feature we’re building into Gemini. The idea behind it is pretty straightforward: the model should be able to use what it knows about you (past conversations, Gmail, Search, YouTube activity) to give you answers that actually feel relevant, not just technically correct.
Your job is to put that to the test. You’ll come up with prompts based on your own experiences, run them through the model, and then honestly assess whether the responses felt personalized in a meaningful way or just kind of generic with a personal detail tacked on. It’s equal parts creative and analytical, and the quality of your judgment really does matter here.
Responsibilities
Design multi-turn conversational prompts (typically 1–5 turns) that require the AI to draw on real personal information and experiences.
Evaluate whether the model applied personalization correctly based on what was actually being asked.
Review responses for Grounding issues. Flag anything that looks like a flawed inference or hallucination rather than evidence-backed reasoning.
Assess Integration quality. Does the personal data feel naturally woven in, or does it come across as robotic and forced?
Stack-rank two model responses side-by-side (SxS) based on helpfulness, ease of use, and overall quality.
Write clear, well-structured rationales that reference specific turns in the conversation.
Extract and verify “Debug Info” to confirm chat summaries and data sources were properly used.
Clear evaluation conversations after each session to maintain clean data.
Strong English reading and writing skills; the project is conducted entirely in English.
Demonstrated ability to evaluate nuanced or ambiguous AI responses and explain your reasoning clearly.
Comfortable working independently in a remote setup with minimal hand-holding.
Reliable desktop or laptop with a stable internet connection.
Full-time availability in your local time zone with at least 4 hours of daily overlap with PST.
Experience in data annotation, AI quality evaluation, content moderation, or something similar.
BS/BA degree or equivalent experience in a relevant field, such as Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or anything analytically rigorous.
Familiarity with personalization concepts and a good instinct for spotting bad inferences or forced connections.
Experience designing prompts or testing AI systems in any capacity.
Sharp attention to detail when comparing side-by-side responses, especially around tone and naturalness.
Ability to write feedback that’s specific and actionable, not just general impressions.
This is a contractor role starting immediately. We’re running a 24-hour global operation, so schedule consistency matters. There are two commitment options:
• 30 hours/week - at least 4 hours per day, with a minimum 4-hour overlap with PST.
• 40 hours/week - same daily and overlap requirements.
There are three steps to complete before being considered:
• Screener
• Three assessments
• Language vetting
Shortlisted candidates will receive a Job Interest Form first. Once your profile is reviewed, you’ll have 24 hours to complete an assessment. From there, we’ll get in touch with finalists to go over pre-onboarding requirements.
Skills Required
- Strong English reading and writing skills
- Demonstrated ability to evaluate nuanced or ambiguous AI responses and explain reasoning clearly
- Comfortable working independently in a remote setup with minimal supervision
- Reliable desktop or laptop with a stable internet connection
- Full-time availability (30 or 40 hours/week) with at least 4 hours of daily overlap with PST
What We Do
Careerflow.ai is an AI-powered career management platform and 'career copilot' dedicated to helping job seekers land their dream jobs. The company provides a comprehensive end-to-end toolkit featuring an AI resume builder, LinkedIn profile optimizer, and job tracking tools. By streamlining the application process and optimizing professional profiles, Careerflow helps users navigate the competitive job market and get hired at top tech and startup companies faster.








