Senior Software Engineer, Search Infrastructure

Sorry, this job was removed at 06:22 p.m. (CST) on Monday, Aug 04, 2025
New York, NY
In-Office
175K-250K Annually
Artificial Intelligence • Consumer Web • Machine Learning • Productivity • Sales • Software • Analytics
Scale personalized outreach with better data enrichment
The Role
About Clay

Clay is a creative tool for growth. Our mission is to help businesses grow without huge investments in tooling or manual labor. We’re already helping over 100,000 people grow their businesses with Clay. From local pizza shops to enterprises like Anthropic and Notion, our tool lets you instantly translate any idea that you have for growing your company into reality.
We believe that modern GTM teams win by finding GTM alpha—a unique competitive edge powered by data, experimentation, and automation. Clay is the platform they use to uncover hidden signals, build custom plays, and launch faster than their competitors. We’re looking for sharp, low-ego people to help teams find their GTM alpha.

Why is Clay the best place to work?

  • Customers love the product (100K+ users and growing)

  • We’re growing a lot (6x YoY last year, and 10x YoY the two years before that)

  • Incredible culture (our customers keep applying to work here)

  • Well-resourced (raised a Series B expansion in January 2025 from investors like Sequoia and Meritech)

Read more about why people love working at Clay here and explore our wall of love to learn more about the product.

Data Engineering, Search @ Clay

As a Senior Data Engineer on the Search team, you'll be responsible for building and maintaining the data pipelines that power Clay's comprehensive datasets of companies, people, and job postings. You'll be tackling fundamental challenges in entity resolution—matching millions of records across datasets without common identifiers—while building the foundation for next-generation natural language search capabilities. Our team is scaling from processing millions to billions of records, requiring innovative approaches to data quality, validation, and infrastructure. Strong candidates will have experience building production data pipelines at scale and a deep understanding of search infrastructure.

What You'll Do
  • Design and implement robust entity resolution systems that match and merge records from multiple providers using advanced matching algorithms, enabling large-scale enrichment of customer data

  • Build scalable data pipelines that process billions of profiles while maintaining data accuracy through sophisticated validation and quarantine frameworks

  • Implement modern data architecture patterns that enable point-in-time recovery, analytics at scale, and real-time data quality monitoring

  • Develop systems to normalize and standardize messy real-world data (like locations, company names, and job titles) across billions of records

  • Create intelligent data validation systems that prevent bad data from reaching customers while providing feedback loops for continuous improvement

  • Collaborate with ML engineers to build the data foundation for embedding-based search, enabling users to describe what they're looking for in natural language

What You'll Bring
  • Experience building and maintaining production data pipelines that process millions of records daily

  • Strong proficiency in Python and SQL, with experience in data processing frameworks (Apache Airflow, Prefect, Dagster, or similar)

  • Hands-on experience with search engines (Elasticsearch, OpenSearch, Solr) including data modeling and indexing strategies

  • Understanding of entity resolution, record linkage, and deduplication techniques at scale

  • Experience with both batch and streaming data processing patterns

  • Familiarity with cloud data platforms (AWS, GCP, or Azure) and their data services

  • Strong problem-solving skills with the ability to debug complex data issues across distributed systems

Nice To Haves
  • Experience with workflow orchestration using Dagster or similar modern data orchestration tools

  • Knowledge of ML approaches to entity resolution and experience with embedding pipelines

  • Familiarity with Apache Iceberg or similar table formats for data versioning and time travel

  • Experience with geocoding and location normalization at scale

  • Background in building data platforms that dramatically scale processing capabilities

  • Exposure to our current tech stack:

    • Orchestration: Dagster

    • Search: OpenSearch

    • Databases: PostgreSQL (Aurora), Redis

    • Cloud: AWS (S3, Lambda, ECS)

    • Languages: Python, TypeScript

    • Infrastructure as Code: Terraform

    • Data Validation: Pydantic


The Company
HQ: New York, NY
93 Employees
Year Founded: 2017

What We Do

We are building a new type of data workflow tool that allows any team to find the right data, craft custom workflows, and automate their go-to-market strategy. We like to think of Clay as a composable canvas that unlocks power and creativity for growth-focused teams.

Building a powerful yet easy-to-use tool that enables complex data aggregation, transformation, and automation is no easy feat. It takes a huge amount of creativity, discipline and attention to detail. It also takes a team of brilliant minds, collaborating, supporting and learning from each other.

Why Work With Us

Our team is a vibrant mix of designers, engineers, and GTM experts. We're also aspiring DJs, writers, social workers & more. We believe the right candidate brings a unique perspective, with no pre-set mold for success—just an exciting opportunity to thrive together!
