Sr. AI Data Engineer

Posted Yesterday
Hiring Remotely in United States
Remote
190K-210K Annually
Senior level
Real Estate • Software
The Role
Build and own AI-first data acquisition, ETL/ELT, and serving infrastructure. Create self-healing, LLM-assisted web scrapers and agentic workflows, maintain BigQuery pipelines, design reporting and BI layers, enforce data quality with rule-based and ML/LLM checks, and support production uptime and on-call incident response. Collaborate with engineers, analysts, and product to document and operationalize human-in-the-loop AI systems.
Summary Generated by Built In
 
Senior AI Data Engineer
AI-first data acquisition, serving & reporting

About the role
This is an AI-first role owning the data acquisition, serving, and reporting infrastructure that powers our product. We aggregate public data at scale, and these pipelines are core to what we deliver. We don’t want someone to hand-operate scrapers and hand-debug every breakage—we want an engineer who builds systems that largely run, diagnose, and repair themselves, using LLMs and agentic workflows to keep everything reliable while continuously raising the level of automation.
You own critical systems end to end with minimal hand-holding and treat AI as a core part of the toolkit. Remote, reporting to our Principal Engineer and team lead.
What you’ll do
  • Own and self-heal our fleet of web scrapers—build LLM-assisted resilience so structural, markup, and anti-bot changes are detected, diagnosed, and self-repaired with minimal manual effort. When something does break, agents do the first pass on root-cause and propose fixes; you review and approve.
  • Keep daily scraping runs stable—monitoring, alerting, retries, and graceful handling of upstream failures so data lands reliably each morning
  • Use LLMs for resilient parsing and entity extraction from messy or changing HTML, reducing reliance on brittle selectors
  • Own and optimize the serving layer and the ETL/ELT pipelines feeding our BigQuery warehouse—ensuring data is fresh, performant, and reliable for live use
  • Build our reporting infrastructure—data models, transformations, and dashboards—plus AI-native layers like natural-language query and LLM-generated narrative insight
  • Drive data quality through both rule-based checks and ML/LLM-based anomaly detection, and manage anti-bot challenges (proxies, rate limiting, request patterns) within legal and ethical guidelines
  • Build and maintain production-grade MCP servers and agentic workflows that expose our data and tooling to internal and AI consumers
  • Partner with the Principal Engineer, analysts, product, and leadership; document systems and best practices for maintainability and human-in-the-loop AI operations
What we’re looking for
  • 6+ years in data engineering, including ownership of mission-critical production systems
  • Strong Python with deep experience building, maintaining, and debugging scrapers (e.g., Scrapy, Playwright, Selenium, BeautifulSoup)
  • AI-first: Hands-on experience building LLM-powered and agentic workflows in production—not just calling an API, but designing systems where agents do meaningful work under human supervision—including production-grade MCP servers
  • Prompt engineering and LLM evaluation/observability—reasoning about output quality, cost, latency, and failure modes the way you’d reason about uptime—plus fluency with AI-assisted dev tools (e.g., Claude Code, Cursor)
  • Proven experience designing reporting/analytics layers—data modeling, transformations (e.g., dbt), and BI tools
  • Hands-on with the GCP data stack—BigQuery, Cloud Composer (managed Airflow), Cloud Storage, Cloud Run or GKE—plus advanced SQL and Docker
  • A reliability mindset—proven track record owning systems, triaging failures, and being accountable for uptime; sound judgment on when to use deterministic code versus an LLM
  • Understanding of the legal and ethical considerations around web scraping
Nice to have
  • Experience training, deploying, and maintaining ML models
  • Experience with MotherDuck / DuckDB, ideally serving data to production applications
  • Experience scaling or refactoring distributed scraping systems
  • Knowledge of Pub/Sub, Dataflow, or other large-scale data processing tools
  • Infrastructure-as-code (Terraform)
  • Experience setting data strategy or mentoring other engineers
Logistics
  • Location: Remote (US based)
  • On-call: This role supports daily scraping and nightly processing runs and a production serving layer; some availability for off-hours incident response may be expected
  • Compensation (based on experience): $190K-$210K base salary + bonus

Grace Hill offers a robust suite of benefits, including health, dental, and vision insurance, 401(k), PTO, life insurance, disability insurance, and more.
Unfortunately, we are not able to offer visa sponsorship or assistance. Applicants must be based in the US and authorized to work in the US at the time of hire.
 

About us

Grace Hill provides industry-leading SaaS technology solutions designed to make a positive impact in real estate and improve the lives of people where they work and live. Harnessing years of real estate experience and the understanding that people are better together, Grace Hill helps owners and operators increase property performance, reduce operating risk and grow top talent. More than 500,000 professionals from over 1,700 companies rely on Grace Hill’s talent performance solutions covering policy, training, assessment, survey, and data-driven insights. Visit us at gracehill.com or on LinkedIn.
Our HelloData product solves complex data problems for the multifamily industry, utilizing automated pipelines and AI to provide real-time market insights for the nation's top managers, developers, and investors. Our platform is trusted by the industry’s largest operators to help optimize rents, underwrite operating expenses, and grow NOI with its highly accurate data and user-friendly interface. Since being acquired by Grace Hill in April 2025, HelloData has continued to accelerate at an unbelievable rate, growing ARR by over 300% in 2025 alone and on track for a record-breaking 2026. We combine the agility and innovation of a high-growth startup with the stability and resources of an established enterprise, making us the gold standard in multifamily data analytics.


The Company
HQ: Greenville, SC
248 Employees
Year Founded: 1998
