Software Engineer, Data

Sorry, this job was removed at 02:28 p.m. (CST) on Friday, Jul 12, 2024
2 Locations
Remote
Hybrid
3-5 Years Experience
Artificial Intelligence • Machine Learning • Other
The Role

Summary

This role can be either fully in-person or remote.


We believe that high-quality data is the most important part of creating high-performance machine learning systems, whether they are simple classifiers or state-of-the-art reasoning agents. Unlike many other organizations, we view this work, and this role, as one of the most important at the company.


In this role, you will work on the most important part of our system: the software infrastructure for collecting, preprocessing, generating, analyzing, and distilling the wide variety of data sources that go into both our primary pretraining data corpus and the datasets for all of our ancillary and secondary models and systems. You will make a meaningful, measurable impact on the performance of our systems, and experience the joy of taking the time to build high-quality software that produces high-quality data.


Example projects

Discover, reprocess, and clean open source datasets that are applicable to our targeted LLM capabilities.

Utilize or develop helper models to accurately classify, tag, or preprocess various forms of text.

Create a high-quality OCR pipeline for extracting pre-training text from images and scans.

Integrate with third parties that can manually generate, label, or fix existing datasets, and do so in an efficient, reliable manner.

Work with researchers to find creative ways to integrate existing LLM tooling into data pipelines.

Collaborate with the product team to discover which types of data have the most downstream impact on end-to-end product power.

Create statistical methods and analyses to investigate quality metrics for our datasets, and to compare data cleaning and generation methodologies.

In summary: create robust systems for making sure all input data into our LLMs is 100% good data.



You are

Detail-oriented. Data mistakes are easy to make and hard to catch.

Passionate about data. You should be happy to look at and deeply engage with the raw data.

An excellent software engineer. We care about engineering best practices.

Familiar with Python.



Compensation and Benefits

Work on the most important part of our system

Work at a place that deeply cares about data quality

Work directly on creating software with human-like intelligence

Very generous compensation

Flexible working hours

Work remotely

Time and budget for learning and self-improvement

Compensation packages are highly variable based on a variety of factors. If your salary requirements fall outside of the stated range, we still encourage you to apply. The range for this role is $170,000–$350,000 cash, $10,000–$2,000,000 in equity.



How to apply

All submissions are reviewed by a person, so we encourage you to include notes on why you're interested in working with us. If you have any other work that you can showcase (open source code, side projects, etc.), certainly include it! We know that talent comes from many backgrounds, and we aim to build a team with diverse skillsets that spike strongly in different areas.


We try to reply either way within a week or two at most (usually much sooner).


Learn more about our full interview process here.



About us

Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. We train our own foundation models optimized for reasoning and prototype agents on top of these models. By using these agents extensively, we gain insights into improving both the capabilities of the underlying models and the interaction design for agents.


We aim to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.

The Company
HQ: San Francisco, CA
15 Employees
Hybrid Workplace
Year Founded: 2021

What We Do

We build AI systems that can reason, in order to enable AI agents that can accomplish larger goals and safely work for us in the real world. To do this, we train foundation models optimized for reasoning. On top of our models, we prototype agents to accelerate our own work, seriously using them in order to shed light on how to improve the underlying model capabilities, as well as the interaction design for agents.

We aim to rekindle the dream of the *personal* computer—for computers to be truly intelligent tools that empower us, giving us freedom, dignity, and agency to do the things we love.
