Grace Hill

Sr. AI Data Engineer

Hiring Remotely in United States

Remote

190K-210K Annually

Senior level

Real Estate • Software

The Role

Build and own AI-first data acquisition, ETL/ELT, and serving infrastructure. Create self-healing, LLM-assisted web scrapers and agentic workflows, maintain BigQuery pipelines, design reporting and BI layers, enforce data quality with rule-based and ML/LLM checks, and support production uptime and on-call incident response. Collaborate with engineers, analysts, and product to document and operationalize human-in-the-loop AI systems.

Summary Generated by Built In

Senior AI Data Engineer
AI-first data acquisition, serving & reporting

About the role

This is an AI-first role owning the data acquisition, serving, and reporting infrastructure that powers our product. We aggregate public data at scale, and these pipelines are core to what we deliver. We don’t want someone to hand-operate scrapers and hand-debug every breakage. We want an engineer who builds systems that largely run, diagnose, and repair themselves, using LLMs and agentic workflows to keep everything reliable while continuously raising the level of automation.
You own critical systems end to end with minimal hand-holding and treat AI as a core part of the toolkit. Remote, reporting to our Principal Engineer and team lead.

What you’ll do

Own and self-heal our fleet of web scrapers—build LLM-assisted resilience so structural, markup, and anti-bot changes are detected, diagnosed, and self-repaired with minimal manual effort. When something does break, agents do the first pass on root-cause and propose fixes; you review and approve.
Keep daily scraping runs stable—monitoring, alerting, retries, and graceful handling of upstream failures so data lands reliably each morning
Use LLMs for resilient parsing and entity extraction from messy or changing HTML, reducing reliance on brittle selectors
Own and optimize the serving layer and the ETL/ELT pipelines feeding our BigQuery warehouse—ensuring data is fresh, performant, and reliable for live use
Build our reporting infrastructure—data models, transformations, and dashboards—plus AI-native layers like natural-language query and LLM-generated narrative insight
Drive data quality through both rule-based checks and ML/LLM-based anomaly detection, and manage anti-bot challenges (proxies, rate limiting, request patterns) within legal and ethical guidelines
Build and maintain production-grade MCP servers and agentic workflows that expose our data and tooling to internal and AI consumers
Partner with the Principal Engineer, analysts, product, and leadership; document systems and best practices for maintainability and human-in-the-loop AI operations

What we’re looking for

6+ years in data engineering, including ownership of mission-critical production systems
Strong Python with deep experience building, maintaining, and debugging scrapers (e.g., Scrapy, Playwright, Selenium, BeautifulSoup)
AI-first: Hands-on experience building LLM-powered and agentic workflows in production—not just calling an API, but designing systems where agents do meaningful work under human supervision—including production-grade MCP servers
Prompt engineering and LLM evaluation/observability—reasoning about output quality, cost, latency, and failure modes the way you’d reason about uptime—plus fluency with AI-assisted dev tools (e.g., Claude Code, Cursor)
Proven experience designing reporting/analytics layers—data modeling, transformations (e.g., dbt), and BI tools
Hands-on with the GCP data stack—BigQuery, Cloud Composer (managed Airflow), Cloud Storage, Cloud Run or GKE—plus advanced SQL and Docker
A reliability mindset—proven track record owning systems, triaging failures, and being accountable for uptime; sound judgment on when to use deterministic code versus an LLM
Understanding of the legal and ethical considerations around web scraping

Nice to have

Experience training, deploying, and maintaining ML models
Experience with MotherDuck / DuckDB, ideally serving data to production applications
Experience scaling or refactoring distributed scraping systems
Knowledge of Pub/Sub, Dataflow, or other large-scale data processing tools
Infrastructure-as-code (Terraform)
Experience setting data strategy or mentoring other engineers

Logistics

Location: Remote (US based)
On-call: This role supports daily scraping and nightly processing runs and a production serving layer; some availability for off-hours incident response may be expected
Compensation (based on experience): $190-210K Base Salary + Bonus

Grace Hill offers a robust suite of benefits, including health, dental and vision insurance, 401K, PTO, life insurance, disability insurance, and more.
Unfortunately, we are not able to offer visa sponsorship or assistance. Applicants must be based in the US and authorized to work in the US at the time of hire.

About us

Grace Hill provides industry-leading SaaS technology solutions designed to make a positive impact in real estate and improve the lives of people where they work and live. Harnessing years of real estate experience and the understanding that people are better together, Grace Hill helps owners and operators increase property performance, reduce operating risk, and grow top talent. More than 500,000 professionals from over 1,700 companies rely on Grace Hill’s talent performance solutions covering policy, training, assessment, survey, and data-driven insights. Visit us at gracehill.com or on LinkedIn.
Our HelloData product solves complex data problems for the multifamily industry, utilizing automated pipelines and AI to provide real-time market insights for the nation's top managers, developers, and investors. Our platform is trusted by the industry’s largest operators to help optimize rents, underwrite operating expenses, and grow NOI with its highly accurate data and user-friendly interface. Since being acquired by Grace Hill in April 2025, HelloData has continued to accelerate at an unbelievable rate, growing ARR by over 300% in 2025 alone and on track for a record-breaking 2026. We combine the agility and innovation of a high-growth startup with the stability and resources of an established enterprise, making us the gold standard in multifamily data analytics.

Skills Required

6+ years in data engineering with ownership of mission-critical production systems
Strong Python experience
Deep experience building, maintaining, and debugging web scrapers (Scrapy, Playwright, Selenium, BeautifulSoup)
Hands-on experience building LLM-powered and agentic workflows in production, including production-grade MCP servers
Prompt engineering and LLM evaluation/observability experience
Proven experience designing reporting/analytics layers, data modeling, transformations (e.g., dbt) and BI tooling
Hands-on with GCP data stack (BigQuery, Cloud Composer/Airflow, Cloud Storage, Cloud Run or GKE)
Advanced SQL and Docker experience
Reliability mindset with proven track record owning systems, triaging failures, and ensuring uptime
Understanding of legal and ethical considerations around web scraping
US-based and authorized to work in the US at time of hire (no visa sponsorship)
Availability for some off-hours incident response / on-call support
Experience training, deploying, and maintaining ML models
Experience with MotherDuck / DuckDB and serving data to production applications
Experience scaling or refactoring distributed scraping systems
Knowledge of Pub/Sub, Dataflow, or other large-scale data processing tools
Infrastructure-as-code experience (Terraform)
Experience setting data strategy or mentoring other engineers