Distributed Systems Engineer - Data Platform - Logs and Audit Logs

Posted 2 Days Ago
6 Locations
Hybrid
Mid level
Cloud • Information Technology • Security • Software • Cybersecurity
Helping Build a Better Internet
The Role
As a Distributed Systems Engineer, you will design and build logging platforms, develop data connectors, ensure compliance of audit logs, optimize data delivery, and maintain operational health through monitoring and collaboration with other teams.
Summary Generated by Built In
Locations Available: London (UK), Lisbon (Portugal), Austin (US), Denver (US), Atlanta (US)
About the Role
We are actively seeking experienced and highly motivated Distributed Systems Engineers to join Cloudflare's dynamic Data Organisation. This is a pivotal opportunity to contribute to the future of data at Cloudflare, working on systems that are fundamental to our global operations and customer insights. Our organisation is responsible for the entire data lifecycle, encompassing everything from initial ingestion and sophisticated processing to robust storage and efficient retrieval. These systems are the backbone that powers critical logs and analytics, providing our customers with real-time, actionable visibility into the health, performance, and security of their online properties.
Our overarching mission is to empower customers to leverage their data effectively, enabling them to drive superior outcomes for their businesses. To achieve this, we design, build, and maintain a suite of high-performance, massively scalable distributed systems that are engineered to handle an unprecedented scale - processing well over a billion events per second. As an engineer within our organisation, you will be presented with unique challenges across various critical parts of our intricate data stack. This role offers the chance to work on cutting-edge technologies and contribute to solutions that operate at the very edge of internet infrastructure.
Our Data Organisation is strategically composed of several key teams, each focusing on a distinct aspect of our comprehensive data platform:
  • Data Delivery / Data Pipeline: This team is responsible for the design, development, and operation of our distributed data delivery pipeline. This system is a high-throughput, low-latency powerhouse, primarily written in Go, and is tasked with ingesting, processing, and intelligently routing massive volumes of data originating from across Cloudflare's vast global network to multiple core destinations. This involves handling diverse data types and ensuring reliable, timely delivery to various downstream systems.

  • Analytical Database Platform: Engineers on this team contribute to and evolve our core analytical platform, which is powered by ClickHouse. This team is dedicated to building and maintaining a high-performance, scalable database platform meticulously optimised for the immense analytical workloads generated by all of Cloudflare's products and services. This includes ensuring data integrity, query optimisation, and continuous platform scalability to meet ever-growing demands.

  • Data Retrieval (Customer-Facing Products): This department is focused on building and continuously improving our customer-facing products, making data not only accessible but also genuinely actionable for our users. This department comprises two main groups:
    • Analytics and Alerts: Members of this group are at the forefront of developing our public APIs such as the GraphQL Analytics API, providing customers and internal Cloudflare teams with flexible access to their data. They will also work on our alerting platform, empowering users to configure and receive near real-time alerts based on the critical logs and metrics observed by our robust data platform. This includes designing intuitive alerting mechanisms and ensuring the reliability of notification systems.
    • Logs and Audit Logs: This specialised team is dedicated to building a robust and easy-to-use logging platform that powers reliable data delivery and seamless integrations with customer destinations. The team's mission is to make it simple for customers to access, manage, and use their log data - ensuring that critical datasets, including comprehensive audit logs, are delivered securely and efficiently to their preferred storage and analysis platforms. The work spans developing intuitive connectors, ensuring data integrity, optimising delivery pipelines, and upholding strict standards for compliance, performance, and usability.

Responsibilities
This particular role is focused on the Logs and Audit Logs group. As a Distributed Systems Engineer, you will focus on the following areas:
  • Design, build, and operate a robust logging platform, ensuring reliable logging and secure data transfer to a wide array of customer destinations and third-party integrations.
  • Develop and maintain high-performance data connectors and integrations for our log-shipping products, focusing on usability, scalability and data integrity.
  • Create and manage systems for handling comprehensive audit logs, ensuring they are delivered securely and adhere to strict compliance and performance standards.
  • Scale and optimise the data delivery pipeline to handle massive data volumes with low latency, identifying and removing bottlenecks in data processing and routing.
  • Work closely with Product and other engineering teams to define requirements for a new logging platform and integrations.
  • Maintain the operational health of our log delivery platform through comprehensive monitoring and participation in an on-call rotation (with flexibility for out-of-hours technical issue resolution).
  • Collaborate on the architectural evolution of our data egress platform, researching and implementing new technologies to improve efficiency and reliability.
Key Qualifications
  • 3+ years of experience working in software development covering distributed systems and data pipelines.
  • Strong programming skills (Go is preferable), with a deep understanding of software development best practices for building resilient, high-throughput systems.
  • Hands-on experience with modern observability stacks, including Prometheus and Grafana, and a strong understanding of handling high-cardinality metrics at scale.
  • Strong knowledge of SQL, including experience with query optimisation.
  • A solid foundation in computer science, including algorithms, data structures, distributed systems, and concurrency.
  • Strong analytical and problem-solving skills, with a willingness to debug, troubleshoot, and learn about complex problems at high scale.
  • Ability to work collaboratively in a team environment and communicate effectively with other teams across Cloudflare.
  • Experience with data streaming technologies (e.g., Kafka, Flink) is a strong plus.
  • Experience with various logging platforms or SIEMs (e.g., Splunk, Datadog, Sumo Logic) and storage destinations (e.g., S3, R2, GCS) is a plus.
  • Experience with Infrastructure as Code tools like Salt or Terraform is a plus.
  • Experience with Linux container technologies, such as Docker and Kubernetes, is a plus.

If you're passionate about building scalable and performant data platforms using cutting-edge technologies and want to work with a world-class team of engineers, then we want to hear from you! Join us in our mission to help build a better internet for everyone!

Top Skills

Datadog
Docker
Flink
GCS
Go
Grafana
Kafka
Kubernetes
Prometheus
R2
S3
Salt
Splunk
SQL
Sumo Logic
Terraform

The Company
HQ: San Francisco, CA
4,400 Employees
Year Founded: 2010

What We Do

Cloudflare, Inc. (NYSE: NET) is the leading connectivity cloud company on a mission to help build a better Internet. It empowers organizations to make their employees, applications and networks faster and more secure everywhere, while reducing complexity and cost. Cloudflare’s connectivity cloud delivers the most full-featured, unified platform of cloud-native products and developer tools, so any organization can gain the control they need to work, develop, and accelerate their business.

Powered by one of the world’s largest and most interconnected networks, Cloudflare blocks billions of threats online for its customers every day. It is trusted by millions of organizations – from the largest brands to entrepreneurs and small businesses to nonprofits, humanitarian groups, and governments across the globe.

Why Work With Us

Cloudflare employees come from all walks of life. We are mission-driven, and our team is energized by a collaborative, creative environment that celebrates our differences and fosters new ways to grow together.

Cloudflare Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

We are committed to developing a global team that is distributed with a flexible working approach. Doing this equitably and inclusively is essential to our success. Visit our careers site for more on 'How & Where We Work.'

Typical time on-site: Flexible
HQ: San Francisco, CA
Singapore
Austin, TX
Bengaluru, Karnataka
Boston, MA
Champaign, IL
Denver, Colorado
Lisbon, PT
London, GB
Los Angeles, CA
New York, NY
Seattle, WA
Washington, DC
