Site Reliability Engineer

Sorry, this job was removed at 10:31 p.m. (CST) on Monday, Feb 03, 2025
Cambridge, MA
In-Office
Artificial Intelligence • Big Data • Machine Learning • Software
Tamr, the leader in data mastering, uses proven, patented AI to help businesses stay ahead.
The Role

At Tamr, we envision a world where people in organizations have accurate, up-to-date, mastered data to deliver impactful business outcomes. We assert that the best way to have such data is via a human-guided machine-learning system that improves over time. Our agile data mastering platform provides the core set of tools that enable companies to use their existing domain expertise and data as fuel for decision-making.


We are currently looking for a Site Reliability Engineer to join our SRE team as we continue to evolve and expand the Tamr data mastering platform and tool suite. With our growing customer base and increasing demand for cloud and hybrid-cloud offerings, we are growing our SRE team to support the development and delivery of new products, deployment to cloud environments, and incorporation of third-party technologies and tools to enable product engineering.


You will play a key role in designing and delivering solutions that make the Tamr SaaS offering scalable, feature-rich, resilient, and secure while providing guidance and mentorship to your team. You'll design and operate automation software to provision, upgrade, monitor, and heal Tamr SaaS deployed on public cloud platforms such as Google Cloud Platform, Amazon Web Services, and Microsoft Azure.


As an SRE, you will participate in a global, uninterrupted on-call rotation and help lead incident management, root cause analysis, continuous improvement activities, and the management of engineering efforts against a service-level agreement (SLA) and error budget.


This position reports to the head of SRE.


As a member of the SRE team, some of the projects you will work on are:

-Manage Tamr SaaS in development and production hosted on public cloud platforms.

-Respond to incidents, facilitate post-mortems, and ensure closure of follow-up action items.

-Develop and drive real-time observability solutions that provide visibility into system health.

-Partner with development teams to improve services through rigorous testing and release procedures.

-Participate in system design consulting, platform management, and capacity planning.

-Balance feature development speed and reliability with well-defined service level objectives.

-Create and maintain self-provisioning infrastructure using tools like Ansible, Terraform, and Docker.

-Improve robustness by automating workflows, improving processes, building CI/CD pipelines, and integrating modern toolsets.

-Participate in a 24x7 on-call rotation.


You might be a good fit if you have 3 or more of the following:

-1+ years of experience in DevOps/SRE/Systems Administration, including some Linux/Unix systems administration.

-1+ years of experience with cloud-based provisioning, monitoring, and troubleshooting (preferably AWS or GCP).

-1+ years of Docker and Kubernetes or OpenShift experience.

-Familiarity with infrastructure automation tools like Terraform and Ansible.

-Experience with one or more scripting languages such as Python.

-Minimum of a Bachelor's degree in Computer Science or equivalent.


Technologies we use:

-Multi-cloud (GCP/AWS/Azure)

-Git, GitOps, Terraform, Ansible

-Kubernetes, Helm, Istio, Docker

-Big Data Technologies (BigTable/HBase, Dataproc/Databricks/Spark)

-PostgreSQL, BigQuery, Snowflake, Synapse

-Java, Python, Scala

-Jenkins


Additional Information 

This position is based at our office in Cambridge, MA.


Tamr provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws.


The Company
HQ: Cambridge, MA
62 Employees
Year Founded: 2012

What We Do

Tamr provides the only AI-Native Master Data Management (MDM) solution that delivers real-time master data for every dashboard, application, and person in your business. Tamr accelerates the discovery, enrichment, and maintenance of Golden Records, enabling informed decision-making, improved revenue growth, and better customer experiences.

Tamr’s patented, AI-centric approach – with human refinement and oversight – delivers value in days or weeks, not months or years like traditional rules-based MDM and DIY solutions. And with intuitive 360-degree entity pages, your business can improve data accessibility across the organization and leverage the best, most accurate data to support analytical and operational use cases in real time. Learn more at tamr.com.

Why Work With Us

The Values, Behaviors, and Culture at Tamr keep us grounded and motivated to do the work we do. We believe that nothing is impossible, we trust each other, we work as a team, we believe in openness and honesty, and we support bold endeavors.
