What you’ll be doing
- Stay on top of prompt engineering techniques, research, and best practices.
- Develop and curate golden datasets for prompt evaluation and regression testing across modalities, ensuring long-term quality control and reproducibility.
- Design, test, and refine prompts to support a wide range of generative AI applications—not limited to chat, but also including audio synthesis, avatar animation, lip-sync alignment, and product image generation.
- Document best practices and create reusable prompt templates to support internal stakeholders, improving prompt consistency, clarity, and alignment across teams.
- Refactor existing prompts to follow best-practice approaches.
- Collaborate with cross-functional teams to integrate AI-driven features into real-world product experiences, ensuring prompts are aligned with user needs, system constraints, and business goals.
- Build and maintain prompt libraries with clear versioning, metadata tagging, and usage patterns to support scalable and reusable development.
- Drive continuous improvement in prompt performance by using both automated metrics and human-in-the-loop evaluation pipelines.
- Contribute to and extend our internal evaluation framework—designing new evaluation flows, creating prompt-specific test cases, and defining metrics tailored to multi-modal output.
You will have
- Bachelor’s or Master’s degree in a STEM or related field.
- Practical experience working with large language models and/or multi-modal generative models (e.g., text-to-audio, text-to-image, video, or avatar generation).
- Familiarity with prompt techniques such as zero-shot, few-shot, chain-of-thought, tool usage, and retrieval-based augmentation.
- Strong analytical and linguistic intuition, with the ability to translate abstract goals into effective machine-readable instructions.
- Deep interest in language and communication systems, and how humans and machines can interact effectively through prompt-based interfaces.
- Ability to create and maintain curated evaluation datasets (“golden sets”) to support ongoing testing and performance benchmarking.
- Strong writing and communication skills, with the ability to explain prompt behavior, rationale, and trade-offs to technical and non-technical audiences.
We’ll be excited if you have
- Hands-on experience with Python or another scripting language of choice.
- Experience with Jupyter Notebooks, or LLM ops tools and libraries such as LangChain, LangFuse, PromptLayer, or vector search systems.
- Experience designing or working within evaluation pipelines, including human and automated evaluations, metric design, and result interpretation.
What We Do
Firework is the world's leading immersive digital transformation and engagement platform with shoppable video, live streaming commerce, and monetization capabilities.
Powering over 600 direct-to-consumer brands, retailers, and media publishers worldwide, Firework brings TikTok-like interactive video experiences to your own websites and apps. We enable customers to create and host native, shoppable video content for engaging product discovery, seamless shopping experiences, and a deeper emotional connection with consumers. The company is backed by IDG Capital, Lightspeed Venture Partners, and GSR Ventures, with over $90 million in capital raised to date and offices in the US (SF and NYC), Toronto, Poland, Slovakia, Brazil, and China.
Why Work With Us
We are a diverse team where everyone belongs. We are creative, curious, and cool in a nerdy way. We believe in growth, results, and in each other and that perfection is a work-in-progress. We are just the right amount of extra and want to change the digital game.