Research Engineer

New York City, NY, USA
In-Office
Artificial Intelligence • Information Technology • Software
The Role

Hi, I'm Brian, Co-Founder of Egra. We just raised $5.5M to build foundation models for brain signals, and we're looking for research engineers to join our founding team.

You'll have complete ownership over your work from day one. No lengthy onboarding, no waiting for permission, no navigating layers of approval. A small founding team, deep technical problems, and the resources to solve them. You'll define the infrastructure architecture, make critical engineering decisions, and build the systems that make our research possible. If you thrive with high agency and want your work to directly shape the company's trajectory, this is that opportunity.

What you'd be doing

EEG — electrical brain activity recorded from the scalp — is one of the hardest real-world signal modalities in ML: low signal-to-noise ratio, massive subject variability, and device inconsistencies. Most people avoid it for these reasons.

As our research engineer, you'd own the systems that make research possible. To ground that in real examples, here are the kinds of projects you'd own:

  • Building versioned, reproducible preprocessing pipelines for EEG data from multiple sources — handling device-specific normalization, channel mapping across montages, artifact detection, and signal quality checks. If we ask "which preprocessing version produced this result," your systems answer that instantly.

  • Designing the experiment tracking and training infrastructure so we can run dozens of pretraining experiments in parallel without losing track of what changed. Hyperparameters, data splits, preprocessing versions, and model checkpoints are linked and reproducible.

  • Building a data ingestion system that can absorb different EEG formats (EDF, BDF, BIDS, proprietary device exports) and normalize them into a clean internal representation.

  • Optimizing training pipelines for throughput on noisy, variable-length signal data. Mixed precision, smart batching across different recording lengths, efficient data loading for datasets that don't fit neatly into standard loaders.
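As a sketch of the first two bullets: one lightweight way to make "which preprocessing version produced this result" instantly answerable is to derive a version id from the preprocessing config itself and log it with every run. The function and parameter names below are illustrative assumptions, not our actual stack:

```python
import hashlib
import json

def preprocessing_version(config: dict) -> str:
    """Derive a stable version id from a preprocessing config.

    Any change to a parameter (filter band, resample rate, artifact
    threshold, ...) yields a new id, so every result traces back to
    the exact preprocessing that produced it.
    """
    # Canonical serialization: key order and whitespace can't change the hash.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

# Illustrative config; real pipelines would carry many more parameters.
config = {
    "bandpass_hz": [0.5, 40.0],
    "resample_hz": 256,
    "montage": "standard_1020",
    "artifact_reject_uv": 150.0,
}
version = preprocessing_version(config)
```

Logged alongside hyperparameters, data splits, and checkpoints, identical configs always hash to the same id, so answering "which preprocessing produced this result" becomes a lookup, not an investigation.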
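For the ingestion bullet, channel mapping across montages might look like the following hedged sketch. The alias table and function names are hypothetical examples, not a real device mapping:

```python
# Illustrative alias table mapping device-specific channel labels onto
# canonical 10-20 names. Real tables would be per-device and far larger.
CANONICAL_ALIASES = {
    "FP1": "Fp1", "EEG FP1": "Fp1", "FP1-REF": "Fp1",
    "CZ": "Cz", "EEG CZ": "Cz", "CZ-REF": "Cz",
    "O1": "O1", "EEG O1": "O1", "O1-REF": "O1",
}

def normalize_channels(labels: list[str]) -> list[str]:
    """Map raw channel labels to canonical names.

    Unknown labels raise immediately, so a silently mis-mapped montage
    can never reach training.
    """
    lookup = {k.upper(): v for k, v in CANONICAL_ALIASES.items()}
    normalized = []
    for label in labels:
        key = label.strip().upper()
        if key not in lookup:
            raise ValueError(f"Unmapped channel label: {label!r}")
        normalized.append(lookup[key])
    return normalized

print(normalize_channels(["EEG FP1", "CZ-REF", " o1 "]))  # ['Fp1', 'Cz', 'O1']
```

Failing loudly on unmapped labels is the point: with EDF, BDF, BIDS, and proprietary exports all flowing in, a silent channel mismatch is the kind of bug that invalidates months of experiments.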
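And for throughput on variable-length recordings, one common batching trick is length bucketing: sort recordings by length and cut a new batch when lengths diverge, so padding waste stays small. A minimal sketch, with illustrative IDs and tolerance:

```python
def bucket_by_length(recordings, batch_size, tolerance=0.1):
    """Group variable-length recordings into batches of similar length.

    `recordings` is a list of (recording_id, n_samples) pairs. A batch
    is cut when it reaches `batch_size` or when the next recording is
    more than `tolerance` longer than the batch's shortest member, so
    padding overhead per batch stays bounded.
    """
    ordered = sorted(recordings, key=lambda r: r[1])
    batches, current = [], []
    for rec in ordered:
        if current and (len(current) == batch_size
                        or rec[1] > current[0][1] * (1 + tolerance)):
            batches.append(current)
            current = []
        current.append(rec)
    if current:
        batches.append(current)
    return batches

recs = [("a", 100), ("b", 105), ("c", 300), ("d", 310), ("e", 1000)]
print(bucket_by_length(recs, batch_size=2))
# [[('a', 100), ('b', 105)], [('c', 300), ('d', 310)], [('e', 1000)]]
```

This is only the batching half of the bullet; mixed precision and loader efficiency are separate concerns layered on top.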

Where this is going

We're building toward a world where thought is an interface.

You silently compose a message and it types itself. You navigate an AR display without lifting a finger. Software adapts to your cognitive state in real time. A universal interface between human thought and digital action.

The product we're building to get there has three layers:

  1. A Neural Encoder: a foundation model that maps raw EEG into robust, reusable embeddings that work across devices, subjects, and contexts

  2. A Neural API: a stable interface that any app can call: "What is the user's state?" "What intent is most likely?" "What changed?"

  3. Reference applications: proving utility and driving our data collection flywheel
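To make the three layers concrete, here is one hypothetical sketch of what the Neural API surface could look like in Python. Every name and field below is an illustrative assumption, not our actual interface:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class UserState:
    # Illustrative fields only; the real state vocabulary is undefined here.
    attention: float       # 0..1 estimated engagement
    cognitive_load: float  # 0..1 estimated load
    confidence: float      # model confidence in this estimate

class NeuralAPI(Protocol):
    """Hypothetical shape of the stable app-facing layer: apps ask
    questions about state and intent, never about raw EEG or devices."""

    def user_state(self) -> UserState: ...
    def most_likely_intent(self) -> tuple[str, float]: ...
```

The design intent is that the encoder and hardware can keep evolving underneath while this interface stays stable for applications.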

Near-term, the use cases are already real. A limited vocabulary of thought-to-action commands (volume, select, activate, navigate) would feel like magic to consumers. Sleep staging, stress detection, cognitive load monitoring, and engagement measurement are all feasible with today's signal quality. On the clinical side, we're pursuing avenues like epilepsy monitoring and migraine pre-emption as a wedge for high-quality data, credibility, and early revenue.

Hardware matters too. No comfortable, discreet consumer device today covers the brain regions needed for language decoding. We'll eventually design our own. Think a normal-looking baseball cap with dry electrodes hidden in the brim, or something that looks more like AirPods than a medical device. The model needs to be hardware-agnostic, because the form factors will keep evolving.

None of this works without infrastructure.

Most ML research fails because of infrastructure, not ideas. Bad data splits leak information. Preprocessing bugs silently invalidate months of experiments. Training runs can't be reproduced because no one tracked the right things. Results look great until someone realizes the evaluation was wrong.

EEG makes all of this worse. We're dealing with data from different devices, electrode layouts, and sampling rates. As we scale from public datasets to clinical partnerships to consumer data collection, the infrastructure has to handle all of it cleanly.
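One concrete example of the data-split failure mode: with EEG's massive subject variability, splitting by recording leaks subject identity across train and test. A minimal sketch of a deterministic subject-level split (names and fractions illustrative):

```python
import hashlib

def subject_split(recording_subjects: dict, test_fraction: float = 0.2) -> dict:
    """Assign each *subject* (not each recording) to train or test.

    `recording_subjects` maps recording_id -> subject_id. Hashing the
    subject id makes the assignment deterministic across machines and
    reruns, and guarantees no subject's data straddles the split.
    """
    split = {}
    for rec_id, subject in recording_subjects.items():
        h = int(hashlib.sha256(subject.encode()).hexdigest(), 16)
        split[rec_id] = "test" if (h % 1000) / 1000 < test_fraction else "train"
    return split
```

Because the decision is a pure function of the subject id, two recordings from the same subject can never land on opposite sides of the split, and the split reproduces exactly no matter who runs it.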

Research culture

You'll be embedded in the research, not adjacent to it.

You ship infrastructure, not features. Your users are researchers (including the founders), and your success is measured by how fast and confidently they can run experiments.

Reproducibility is a first-class product. We treat experiment reproducibility the way a good engineering team treats test coverage.

You have a voice in research decisions. You'll see patterns the researchers miss: data quality issues, training instabilities, evaluation blind spots. We expect you to flag them.

Failed experiments are documentation, not waste. We write up what doesn't work with the same care as what does.

Who we're looking for

You've built the systems that make ML research actually work. You care deeply about data integrity, reproducibility, and clean abstractions.

You don't need EEG experience, but you should have worked with data that's messy, heterogeneous, and doesn't fit neatly into standard ML pipelines. Audio, sensor data, medical signals, time-series — anything where the preprocessing is half the battle.

You should have:

  • Experience building ML training and data pipelines for real-world data

  • Strong Python skills and comfort with the PyTorch ecosystem

  • Experience with experiment tracking, data versioning, and reproducible workflows

  • The ability to debug data and training issues that span the full stack, from raw signal to loss curve

Bonus points for:

  • Experience with signal processing or time-series data pipelines

  • Comfort with distributed training or mixed-precision optimization

  • Having built internal tools that researchers actually loved using

  • Familiarity with data formats like EDF, BIDS, or HDF5

  • Experience with EEG/BCI data pipelines or neuroscience data tooling (MNE-Python, MOABB, Braindecode)

You should NOT apply if:

  • You've only worked with clean, well-structured datasets

  • You need detailed specs before you can start building

  • You're not comfortable working in a 3–5 person team with no dedicated manager

Interview process

Our process is three conversations:

  1. 30-minute intro call. We'll tell you what we're working on, you'll tell us what you've worked on. Casual, honest, no prep needed.

  2. 30-minute technical conversation. We'll work through a real infrastructure design problem together. No right answer. We want to see how you think about tradeoffs, correctness, and iteration speed.

  3. 30-minute deep dive. You'll meet both founders. We'll dig into past projects, talk about how you debug hard data problems, and figure out if we'd enjoy working together every day.

Benefits
  • Competitive salary and meaningful equity

  • Platinum-tier health insurance

  • Uncapped compute access

  • Full engineering autonomy: own the problem, not just a task list

  • No bureaucracy, no review committees

  • Conference budget + co-author publication support

  • Relocation and visa support (flexible on remote)


The Company
4 Employees

What We Do

Building EEG foundation models. We're hiring! https://jobs.ashbyhq.com/Egra
