Cloud Platform Lead Consultant or Senior Consultant

Arity

Posted 3 Hours Ago

Hiring Remotely in United States

Remote

100K-171K Annually

Senior level

Big Data • Transportation • Analytics • Big Data Analytics

The Role

Design, deploy, operate, and optimize multi-cloud data platforms and streaming infrastructure (databases, Kafka/MSK, Flink, analytics engines). Build IaC, CI/CD, monitoring and automation using Python and tooling. Lead incident response, root-cause analysis, performance tuning, and provide application-level guidance. Participate in on-call rotation and adopt emerging automation (AI agents/MCP) to improve DevOps workflows and platform reliability.

Summary Generated by Built In

At Allstate, great things happen when our people work together to protect families and their belongings from life's uncertainties. And for more than 90 years, our innovative drive has kept us a step ahead of our customers' evolving needs. That drive has taken us from advocating for seat belts, air bags and graduated driving laws to being an industry leader in pricing sophistication, telematics and, more recently, device and identity protection.
Job Description
Arity is a part of the Allstate Corporation, which means we have the same innovative drive that keeps us a step ahead of our customers' evolving needs. We collect and analyze enormous amounts of data in order to provide cutting-edge solutions to companies invested in transportation.
We are considering both Lead Consultant and Senior Consultant levels.
The Team
Our engineers are fueled by a passion to impact the future of mobility. They push the boundaries of telematics and transportation tech by creating and supporting cutting-edge products. As part of an Agile team, they are armed with the freedom to innovate and the opportunity to see projects through from start to finish. Using a variety of languages and a top-notch technology stack, our engineers make critical advances in areas like sensor technology, enterprise engineering and platform development. Our team understands what it means to collaborate and communicate in an interconnected global team, all while having trust, transparency and empathy for the end user.
The Operational Data Management team is a specialized group within the Engineering department responsible for the reliability, performance, and scalability of Arity's data infrastructure. We own and operate mission-critical database and streaming platforms, including Apache Cassandra, PostgreSQL, Redis, Valkey, Amazon Redshift, Google BigQuery, Amazon MSK (Kafka), and Google Pub/Sub, as well as analytics and query layers such as Starburst Galaxy and AWS Athena. We partner closely with application development teams to tune, troubleshoot, and optimize applications that depend on these technologies, ensuring that the data platforms powering Arity's mobility insights remain highly available and performant at scale.
The Role
Arity is seeking a Cloud Platform Lead Consultant or Senior Consultant to join our Operational Data Management team within Engineering. This is a fully remote position. In this role, you will design, build, deploy, and operate cloud-native data infrastructure across AWS and Google Cloud Platform while bringing deep hands-on expertise in databases, data streaming, and distributed systems. You will ensure the platforms that ingest, store, and serve billions of miles of driving data remain resilient, observable, and cost-efficient, directly enabling Arity's products and the customers who rely on them to make smarter transportation decisions.
The ideal candidate combines cloud engineering mastery with strong database and streaming fundamentals, advanced production-grade coding skills in Python, and demonstrated hands-on experience building AI agents and Model Context Protocol (MCP) servers to streamline DevOps workflows. A successful candidate rapidly adopts new technologies and delivers production-ready solutions with them, guides application developers on performance improvements and code-level fixes, and independently leads root cause analysis for complex production incidents.
Key Responsibilities

Design, deploy, and manage highly available database platforms including Apache Cassandra, PostgreSQL, Redis, Valkey, Amazon Redshift, and Google BigQuery across multi-cloud environments.
Build, operate, and optimize data streaming infrastructure using Amazon MSK (Kafka), Google Pub/Sub, and Apache Flink to support real-time and batch data pipelines.
Develop and maintain infrastructure-as-code, CI/CD pipelines, and cloud automation using Python and industry-standard tooling to enable repeatable, secure deployments.
Implement comprehensive monitoring, alerting, and observability for data platform services to proactively detect and resolve issues before they impact customers.
Partner with application development teams to troubleshoot, tune, and optimize application performance, query patterns, and data access layers backed by team-managed platforms.
Administer and optimize analytics and query engines including Starburst Galaxy and AWS Athena to deliver performant, cost-effective access to large-scale datasets.
Lead incident response, root cause analysis, and post-incident reviews for production database and streaming systems; drive remediation and preventive improvements.
Participate in an on-call rotation to provide 24x7 support for mission-critical data infrastructure.
Evaluate and adopt emerging technologies, including AI agents and MCP servers, to automate operational tasks, improve developer experience, and accelerate DevOps workflows.
Contribute to capacity planning, disaster recovery, security hardening, and cost optimization initiatives across the data platform estate.
Ability to review application source code, identify root causes of performance or reliability issues, and contribute targeted fixes or optimization guidance in collaboration with development teams.
Demonstrated ability to rapidly adopt unfamiliar technologies and deliver production-ready solutions within days to a week; strong self-directed learning with a track record of picking up new platforms, frameworks, and tools independently.
Proven ability to guide and advise software development teams on application-level performance tuning, query optimization, code-level improvements, and production troubleshooting, functioning as a technical authority on data platform usage patterns.
Strong understanding of distributed systems principles including high availability, fault tolerance, consistency models, and disaster recovery.
Excellent problem-solving, communication, and documentation skills with a track record of ownership in on-call and incident management environments.
Ability to read, debug, and analyze Java application code including Spring Boot and microservice frameworks; proficiency in JVM diagnostics including heap dump analysis, GC tuning, thread dump interpretation, and connection pool (e.g., HikariCP) troubleshooting.

Required Qualifications

3-5 or more years of overall software engineering or infrastructure experience, with at least 2-4 years in site reliability engineering, DevOps, or platform engineering operating production systems at scale.
Demonstrated expertise designing, deploying, and managing cloud infrastructure on AWS and/or Google Cloud Platform, including networking, identity, and security fundamentals.
Strong hands-on experience with relational and NoSQL databases; production experience with PostgreSQL and at least one distributed database such as Apache Cassandra.
Production experience operating data streaming platforms; hands-on experience with Apache Kafka (including Amazon MSK) and a solid understanding of streaming fundamentals (partitions, consumer groups, delivery semantics, backpressure).
Advanced, production-grade Python and Shell scripting development skills, including writing, reviewing, and debugging application code, building custom automation tooling, and developing operational solutions that go well beyond basic scripting.
Strong experience with infrastructure-as-code (e.g., Terraform, Terraform Enterprise (TFE), Env0), Jenkins, Ansible, Git CI/CD pipelines and container orchestration (e.g., Kubernetes) in production environments.
Experience implementing and automating monitoring, logging, and alerting solutions for distributed systems (e.g., Prometheus, Grafana, CloudWatch, Datadog, or equivalent), including building automated runbooks and self-healing remediation workflows.
Proven track record independently leading root cause analysis for complex production incidents that span infrastructure, databases, streaming pipelines, and application code layers.

Desired Skills

Production experience operating and troubleshooting Apache NiFi, including flow design, processor-level debugging, back-pressure configuration, cluster management, and contributing flow-level fixes and optimizations.
Hands-on operational experience with self-managed Apache Flink, including checkpoint management, state backend configuration, TaskManager memory tuning, job graph analysis, and application-level debugging of streaming jobs under backpressure.
Deep experience with Apache Cassandra, PostgreSQL, DynamoDB, Amazon Redshift, ElastiCache and/or Google BigQuery in production environments.
Advanced experience with Apache Kafka, Apache Flink, Google Pub/Sub and operating streaming workloads across both AWS and GCP.
Experience administering or optimizing Starburst Galaxy, Trino, or AWS Athena for large-scale analytics workloads.
Experience building AI agents or Model Context Protocol (MCP) servers to automate DevOps, observability, or operational workflows.
Hands-on experience with large language models (LLMs), including fine-tuning, prompt engineering, RAG pipeline development, or training custom models for operational and DevOps use cases.
Familiarity with data pipeline orchestration tools (e.g., Apache Airflow, dbt) and event-driven architectures.
Experience troubleshooting and supporting applications deployed on enterprise PaaS platforms (e.g., Cloud Foundry, or equivalent) including understanding platform-level resource constraints, routing, and application lifecycle management.
Working proficiency in Golang sufficient to read production application code, interpret runtime behavior (goroutines, memory, pprof profiling), and contribute targeted performance fixes in collaboration with development teams.
AWS or Google Cloud professional-level certifications.
Experience with performance benchmarking, query plan analysis, and database capacity planning for high-throughput workloads.
Familiarity with application profiling, distributed tracing, and performance diagnostic tooling (e.g., APM, query analyzers, flame graphs) to isolate and resolve end-to-end latency issues.
Contributions to open-source database, streaming, or infrastructure projects.

Supervisory Responsibilities

This job does not have supervisory duties.

Skills
Amazon CloudWatch, Amazon ElastiCache, Amazon MQ, Amazon Web Services (AWS), Ansible (Software), Apache Airflow, Apache Cassandra, Apache Flink, Apache Kafka, Apache NiFi, Application Performance, AWS DynamoDB, Cloud Engineering, Cloud Foundry, Cloud Infrastructure, Cloud Monitoring, Cloud Native, Cloud Platform, Datadog, Data Pipelines, Data Query, Distributed Databases, Distributed Systems, Git (+ 20 more)
Compensation
Compensation offered for this role is $100,000.00 - $170,500.00 annually and is based on experience and qualifications.
The candidate(s) offered this position will be required to submit to a background investigation.
Joining our team isn't just a job - it's an opportunity. One that takes your skills and pushes them to the next level. One that encourages you to challenge the status quo. One where you can shape the future of protection while supporting causes that mean the most to you. Joining our team means being part of something bigger - a winning team making a meaningful impact.
Allstate generally does not sponsor individuals for employment-based visas for this position.
Effective July 1, 2014, under Indiana House Enrolled Act (HEA) 1242, it is against public policy of the State of Indiana and a discriminatory practice for an employer to discriminate against a prospective employee on the basis of status as a veteran by refusing to employ an applicant on the basis that they are a veteran of the armed forces of the United States, a member of the Indiana National Guard or a member of a reserve component.
For jobs in San Francisco, please click "here" for information regarding the San Francisco Fair Chance Ordinance.
For jobs in Los Angeles, please click "here" for information regarding the Los Angeles Fair Chance Initiative for Hiring Ordinance.
To view the "EEO Know Your Rights" poster click "here". This poster provides information concerning the laws and procedures for filing complaints of violations of the laws with the Office of Federal Contract Compliance Programs.
To view the FMLA poster, click "here". This poster summarizes the major provisions of the Family and Medical Leave Act (FMLA) and tells employees how to file a complaint.
It is the Company's policy to employ the best qualified individuals available for all jobs. Therefore, any discriminatory action taken on account of an employee's ancestry, age, color, disability, genetic information, gender, gender identity, gender expression, sexual and reproductive health decision, marital status, medical condition, military or veteran status, national origin, race (including traits historically associated with race, including, but not limited to, hair texture and protective hairstyles), religion (including religious dress), sex, or sexual orientation that adversely affects an employee's terms or conditions of employment is prohibited. This policy applies to all aspects of the employment relationship, including, but not limited to, hiring, training, salary administration, promotion, job assignment, benefits, discipline, and separation of employment.
Allstate provides a comprehensive technology setup, including a laptop, monitors, headset, keyboard, and mouse. Employees eligible to work from home also receive a monthly connectivity reimbursement to help offset internet costs.
When working from home, you must have a dedicated, private workspace free from distractions, along with appropriate desk and seating. Reliable internet is required, with minimum speeds of 50 Mbps download and 5 Mbps upload.

Skills Required

3-5+ years overall software engineering or infrastructure experience, including 2-4 years in SRE/DevOps or platform engineering operating production systems at scale.
Expertise designing, deploying, and managing cloud infrastructure on AWS and/or Google Cloud Platform, including networking, identity, and security fundamentals.
Production experience with relational and NoSQL databases; production experience with PostgreSQL and at least one distributed database such as Apache Cassandra.
Production experience operating data streaming platforms; hands-on experience with Apache Kafka (including Amazon MSK) and understanding of streaming fundamentals.
Advanced, production-grade Python and Shell scripting development skills, including building custom automation tooling and operational solutions.
Experience with infrastructure-as-code and CI/CD: Terraform (TFE/Env0), Jenkins, Ansible, Git CI/CD pipelines, and Kubernetes in production environments.
Experience implementing and automating monitoring, logging, and alerting for distributed systems (Prometheus, Grafana, CloudWatch, Datadog) including automated runbooks and remediation workflows.
Proven track record independently leading root cause analysis for complex production incidents spanning infrastructure, databases, streaming pipelines, and application code layers.
Ability to read, debug, and analyze Java application code including Spring Boot and microservice frameworks; proficiency in JVM diagnostics (heap dumps, GC tuning, thread dumps) and connection pool troubleshooting (e.g., HikariCP).
Participate in an on-call rotation to provide 24x7 support for mission-critical data infrastructure.
Experience administering and optimizing analytics/query engines such as Starburst Galaxy and AWS Athena for large-scale datasets.
Production experience operating and optimizing data streaming and batch pipelines across AWS and GCP (e.g., Google Pub/Sub, Amazon MSK).
Experience with production database platforms including Redis, Amazon ElastiCache, Amazon Redshift, and Google BigQuery.
Demonstrated ability to rapidly adopt new technologies and deliver production-ready solutions with minimal ramp time.
Experience operating or troubleshooting applications on enterprise PaaS platforms and familiarity with application performance diagnostics and profiling.
Operational experience with Apache NiFi (flow design and cluster management) and self-managed Apache Flink (checkpointing, state backend, TaskManager tuning).
Hands-on experience building AI agents or Model Context Protocol (MCP) servers and working with large language models (LLMs) for DevOps/observability automation.
Familiarity with data pipeline orchestration tools (Apache Airflow, dbt) and event-driven architectures.
Working proficiency in Golang sufficient to read production code, interpret runtime behavior, and contribute performance fixes.
AWS or Google Cloud professional-level certifications and contributions to open-source database, streaming, or infrastructure projects.

The Company

HQ: Chicago, IL

345 Employees

Year Founded: 2016

What We Do

Founded by The Allstate Corporation in 2016, Arity is a mobility data and analytics company focused on improving transportation. We collect and analyze trillions of miles of driving data, using predictive analytics to build solutions with a single goal in mind: to make transportation smarter, safer and more useful for everyone.

Why Work With Us

At the heart of our mission are the people that work here. At Arity, we believe work and life shouldn’t be at odds with one another. After all, we know that your unique qualities give you a unique perspective. We don’t just want you to see yourself here. We want you to be yourself here.

Arity Offices

Remote Workspace

Employees work remotely.

We are a fully remote company, giving employees the option to work from their home office or local coffee shop across the continental US.

Typical time on-site: None

HQ: Chicago, IL

While Arity is a fully remote company, we do hold office space in the heart of downtown Chicago, IL at the Merchandise Mart building. With plenty of public transportation, employees local to the area, or those traveling to Chicago, can connect in-person with their teams and colleagues.