Applied LLM Engineer - Prompts, Evals & Agents

Reposted 15 Days Ago
8 Locations
In-Office or Remote
Mid level
Artificial Intelligence • Information Technology

Synthflow AI is a no-code platform for deploying voice AI agents that automate phone calls across contact center operations and business process outsourcing (BPO) at scale. We help mid-market and enterprise companies manage routine calls to save teams time and resources.

Our agents have already delivered measurable impact:

  • Over 5 million hours of contact center operations saved

  • 35% more calls answered compared to non-AI operators

  • 45 million calls handled with 99.9% uptime

We are backed by Accel, Atlantic Labs, and Singular and trusted by over 1,000 customers, and we are leading an industry shift toward sophisticated and accessible conversational AI.


The Role

We’re hiring an Applied LLM Engineer who lives and breathes prompt engineering and writes excellent production-grade Python. You’ll run a tight feedback loop with customers, turn real conversations into better prompts and eval datasets, and ship changes that measurably improve agent outcomes. This is a highly applied role working directly with customer feedback.

What You’ll Do
  • Design & iterate prompts (system, tool/function-calling, task prompts) to boost voice AI agent success, reliability, and tone.

  • Build co-pilots for customers to author their own prompts: meta-prompted assistants that suggest structures, lint for risks, autocomplete tool schemas, critique drafts, and generate eval cases.

  • Work directly with customer feedback and conversation logs to identify failure modes; translate them into prompt changes, guardrails, and data improvements.

  • Build eval datasets (success labels, rubrics, edge cases, regressions) and run offline/online evaluations (A/B tests, canaries) to quantify impact.

  • Create Python utilities/services for prompt versioning, config-as-code, rollout/rollback, and guardrails (policies, refusals, redaction).

  • Partner with PM/Success to define success metrics (task completion, first-pass accuracy, cost, latency) and instrument dashboards/alerts.

  • Own LLM integration details: function/tool schemas, output parsing/validation (pydantic), retrieval-aware prompting, and fallback strategies.

  • Ensure privacy & compliance (PII handling, anonymization, regional data boundaries) in datasets and logs.

  • Share learnings via concise docs, playbooks, and internal demos.
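
To make the eval-dataset and offline-evaluation responsibilities above concrete, here is a minimal sketch of a regression-style comparison between two prompt versions. It is an illustration only, not Synthflow's actual tooling: `EvalCase`, `pass_rate`, the two keyword-based "agents", and the intent labels are all hypothetical stand-ins, and a real harness would call an LLM API instead of matching keywords.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One labeled example: a caller utterance and the intent the agent should pick."""
    utterance: str
    expected_intent: str


def pass_rate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the agent's output matches the label."""
    if not cases:
        raise ValueError("eval set is empty")
    hits = sum(agent(case.utterance) == case.expected_intent for case in cases)
    return hits / len(cases)


# Hypothetical stand-in for the current prompt version (keyword rules, no LLM call).
def baseline_agent(utterance: str) -> str:
    return "book_appointment" if "book" in utterance.lower() else "other"


# Hypothetical stand-in for a revised prompt version that covers more phrasings.
def candidate_agent(utterance: str) -> str:
    text = utterance.lower()
    if "book" in text or "schedule" in text:
        return "book_appointment"
    if "cancel" in text:
        return "cancel_appointment"
    return "other"


# A tiny labeled dataset, including edge cases the baseline is expected to miss.
CASES = [
    EvalCase("I'd like to book a visit for Tuesday", "book_appointment"),
    EvalCase("Can you schedule me for next week?", "book_appointment"),
    EvalCase("Please cancel my appointment", "cancel_appointment"),
    EvalCase("What are your opening hours?", "other"),
]

baseline_score = pass_rate(baseline_agent, CASES)    # 0.5
candidate_score = pass_rate(candidate_agent, CASES)  # 1.0
print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f}")
```

Keeping the labeled cases separate from the agent under test means the same dataset can be rerun against every prompt change, which is what makes it usable as a regression check.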

Must-Have Skills
  • Python: 3+ years writing clean, tested, production code (typing, pytest, profiling); experience building small services/APIs (FastAPI preferred).

  • Prompt Engineering: Hands-on experience designing system/tool prompts, meta-prompting, rubric graders, and iterative prompt tuning based on real user data.

  • LLM Integration: Comfortable with major APIs (OpenAI/Anthropic/Google/Mistral), function/tool calling, streaming, and robust output handling.

  • Evaluation Mindset: Ability to define measurable success, create labeled datasets, and run methodical experiments and A/B tests.

  • Product Sense: Comfortable talking with customers, turning qualitative feedback into shipped improvements.

  • Data Hygiene: Practical experience cleaning, labeling, and balancing datasets; awareness of privacy/PII constraints.
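
As a rough illustration of the privacy/PII awareness mentioned above, the sketch below masks email addresses and phone-like numbers in a transcript line before it is stored in a dataset or log. The patterns, the `redact` helper, and the placeholder tokens are assumptions made for this example; simple regexes like these miss names, addresses, and many phone formats, so they are a starting point rather than a compliance solution.

```python
import re

# Assumed, deliberately simple patterns; real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


sample = "Call me back at +49 30 1234567 or mail jane.doe@example.com."
redacted = redact(sample)
print(redacted)  # Call me back at [PHONE] or mail [EMAIL].
```

Redacting before a transcript enters an eval dataset keeps the labeled examples usable without carrying raw contact details along with them.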

Nice-to-Haves
  • Experience building prompt-authoring UIs/SDKs or internal tooling for prompt versioning and governance.

  • Agentic frameworks & tooling: DSPy, MCP, LangGraph, LlamaIndex, Rasa; experience with agent/tool schemas and orchestration.

  • Observability & eval tooling: Langfuse, LangSmith, Braintrust; building eval harnesses and experiment dashboards.

  • RAG & vector stores: Qdrant/Weaviate/Pinecone and retrieval-aware prompting.

  • Experimentation workflows: A/B testing, prompt diffing/versioning.

  • Infra & analytics: light SQL/log analysis, metrics & tracing, simple Grafana/OTel dashboards.

  • Writing public blog posts or talks about applied LLM techniques.

Interview Process
  1. 30-min intro screen – background, role fit, questions both ways.

  2. Practical exercise (Prompt + Python) – design a prompt strategy, a customer-facing co-pilot flow, and a small eval harness.

  3. Team interviews – deep dive on product mindset, experimentation rigor, and collaboration.

  4. Founder/Leadership chat – scope, impact, and ways of working.

Why Join
  • Own the reasoning layer and the customer co-pilot experience used at scale.

  • Ship fast in a tight customer feedback loop and see your impact measured in days, not quarters.

Founded in Berlin in 2023 by serial entrepreneurs Albert Astabatsyan, Hakob Astabatsyan, and Sassun Mirzakhan-Saky, Synthflow AI democratizes access to advanced voice AI with a no-code platform that lets enterprises easily create, deploy, and scale natural-sounding, cost-effective voice agents tailored to their business needs.

Top Skills

FastAPI
Python

The Company
HQ: Berlin
45 Employees

What We Do

Forget lengthy development cycles and expensive machine learning teams. With Synthflow AI you can build sophisticated, tailored AI agents without technical skills or coding - just bring your data and ideas.
