Cribl

Sr Software Engineer, Storage

Posted 3 Hours Ago

Hiring Remotely in United States

Remote

175K-205K Annually

Senior level

Software

Cribl is the AI Platform for Telemetry.

The Role

Design and build autoscaling, self-healing storage infrastructure on AWS. Own Terraform-based IaC, CI/CD and deployment tooling, observability, capacity planning, cluster lifecycle management, and performance/cost optimizations. Drive reliability, incident response, and automation to operate data-intensive distributed storage with minimal human intervention.

Summary Generated by Built In

Join the company that’s building the telemetry infrastructure for the AI era. At Cribl, we partner with IT and Security teams at many of the world’s biggest enterprises, including half of the Fortune 100, to bridge the gap between AI ambition and infrastructure reality. As the AI Platform for Telemetry, we give customers the choice, control, and flexibility to manage and analyze telemetry for both humans and agents, so they can build what’s next.

We’re one of the fastest‑growing private companies and a leading player in a massive, fast‑moving market. With a global workforce, we’re remote‑first and grounded in a simple idea: software is a people business. Cribl is the place where curious, collaborative people can do their best work, grow fast, and bring their full selves to the herd.

Why You'll Love This Role

Cribl is seeking a Senior Software Engineer to join our Storage team, where you'll design and build the infrastructure that allows Cribl's storage layer to scale autonomously. Our platform ingests, indexes, and serves petabytes of telemetry data on AWS — and you'll own the systems that make that possible: autoscaling clusters, automated provisioning, self-healing infrastructure, and the operational tooling that keeps it all running without human intervention.

This is a platform engineering role at its core. You won't just operate infrastructure — you'll build the systems that operate themselves. Think: cluster lifecycle management, automated capacity planning, infrastructure-as-code pipelines that provision and scale storage tiers end-to-end, and the observability layer that closes the loop. You'll bring infrastructure discipline and DevOps automation to a distributed storage system that needs to grow by orders of magnitude while staying rock-solid.

If you're the kind of engineer who builds autoscalers instead of manually resizing, writes controllers instead of runbooks, and thinks about cluster topology as a software problem — this is your role.

As An Active Member Of Our Team, You Will...

Design and build autoscaling systems for storage clusters — automated provisioning, scale-up/scale-down policies, cluster rebalancing, and node lifecycle management.
Own the infrastructure-as-code stack (Terraform) that defines and deploys storage infrastructure end-to-end on AWS.
Build self-healing automation: health checks, automated failover, capacity rebalancing, and remediation controllers that resolve issues before they page anyone.
Develop the CI/CD pipelines and deployment tooling for storage services — safe rollouts, canary deployments, automated rollback.
Design and implement observability for the entire storage platform — metrics, dashboards, SLOs, alerting, and capacity forecasting that drive automated scaling decisions.
Own cluster management tooling: provisioning new tenants, managing cluster topology, coordinating upgrades and migrations with zero downtime.
Drive performance and cost optimization across the storage data path: ingest pipelines, compaction, partitioning, and query execution.
Partner with product engineering to define scalability limits, load test new features, and harden the system for production readiness.
Contribute to incident response and lead blameless post-mortems, turning operational surprises into systemic automation.
This position will require stand-by, on-call, or off-hours duties.

If You've Got It - We Want It

Significant experience building platform/infrastructure systems that manage, scale, and operate distributed services autonomously — not just using infrastructure, but building the layer that automates it.
Strong software engineering skills in TypeScript/Node.js, Go, or similar languages — you write controllers, operators, and automation, not runbooks.
Deep hands-on experience with infrastructure-as-code (Terraform) and AWS services (EC2, ECS/EKS, ASGs, DynamoDB, S3, CloudWatch).
Experience designing and implementing autoscaling systems, cluster orchestration, or automated provisioning for stateful workloads.
Track record operating data-intensive systems at scale — OLAP databases, NoSQL stores, or distributed storage platforms.
Strong platform engineering fundamentals: SLOs, error budgets, capacity planning, incident response, and a bias toward eliminating toil through software.
Comfortable working with high autonomy in a remote, distributed team and communicating effectively across engineering disciplines.
Strong understanding of Linux systems, networking, and performance profiling at the infrastructure level.

Preferred Qualifications

Experience with DynamoDB or similar NoSQL databases at high throughput: partition design, capacity management, GSI optimization.
Background in cluster management for OLAP or analytical databases: automated provisioning, rolling upgrades, replication topology.
Experience with object storage and data lake architectures (S3, Parquet/ORC formats).
Knowledge of data pipeline optimization: batching strategies, write amplification reduction, partition pruning, compaction policies.
Background in capacity planning, cost optimization, and resource forecasting for storage-heavy workloads on AWS.
Experience building internal platforms or developer tooling that other engineers consume (deployment frameworks, service provisioning, self-service infrastructure).
Opinions about what makes a great on-call experience and a track record of making on-call better for everyone.

The salary for this role is dependent on geographic location and will be based on the individual candidate's job-related knowledge, skills, and experience.
In addition to base salary, for sales and some sales-adjacent roles, employees are eligible to earn incentive compensation (commission). For all other roles, employees are eligible to participate in the Cribl Corporate Bonus Program.
In addition to a competitive salary, Cribl also offers a generous benefits package which includes health, dental, vision, short-term disability, and life insurance, paid holidays and paid time off, a fertility treatment benefit, 401(k), and equity.

Base Salary Range

$175,000–$205,000 USD

Bring Your Whole Self

Diversity drives innovation, enables better decisions to support our customers, and inspires change for the better. We’re building a culture where differences are valued and welcomed, and we work together to bring out the best in each other. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

Interested in joining the Cribl herd? Learn more about the smartest, funniest, most passionate goats you’ll ever meet at cribl.io/about-us.

Skills Required

Significant experience building platform/infrastructure systems that manage, scale, and operate distributed services autonomously
Strong software engineering skills in TypeScript/Node.js, Go, or similar languages
Hands-on experience with infrastructure-as-code (Terraform) and AWS services (EC2, ECS/EKS, ASGs, DynamoDB, S3, CloudWatch)
Experience designing and implementing autoscaling systems, cluster orchestration, or automated provisioning for stateful workloads
Track record operating data-intensive systems at scale (OLAP databases, NoSQL stores, or distributed storage platforms)
Platform engineering fundamentals: SLOs, error budgets, capacity planning, incident response, and toil elimination
Strong understanding of Linux systems, networking, and performance profiling at the infrastructure level
Ability to work with high autonomy in a remote, distributed team and communicate across engineering disciplines
Willingness to participate in stand-by, on-call, or off-hours duties
Experience with DynamoDB or similar NoSQL databases at high throughput (partition design, capacity management, GSI optimization)
Background in cluster management for OLAP or analytical databases including rolling upgrades and replication topology
Experience with object storage and data lake architectures (S3, Parquet/ORC formats)
Knowledge of data pipeline optimization: batching strategies, write amplification reduction, partition pruning, compaction policies
Background in capacity planning, cost optimization, and resource forecasting for storage-heavy workloads on AWS
Experience building internal platforms or developer tooling consumed by other engineers (deployment frameworks, service provisioning)
Track record or strong opinions about improving on-call experience and reducing operational burden

Cribl Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Cribl and has not been reviewed or approved by Cribl.

Affordable Benefits — Medical and dental premiums are fully covered for individuals in the U.S., with low costs for dependents, and the plans are described as low‑cost overall. This positions healthcare expenses favorably for many employees.
Leave & Time Off Breadth — Unlimited PTO, paid holidays, and periodic company “refresh” or winter‑break days provide ample time away. Flexible schedules further support taking time when needed.
Wellbeing & Lifestyle Benefits — A monthly stipend for home office, phone, and internet, plus strong remote‑work setup support, underpin the remote‑first model. Additional perks like recharge days and equipment support bolster day‑to‑day wellbeing.

The Company

HQ: San Francisco, CA

1,000 Employees

Year Founded: 2018

What We Do

Cribl, the AI Platform for Telemetry, empowers enterprises to manage and analyze telemetry for both humans and agents. Trusted by organizations worldwide, including half of the Fortune 100, Cribl bridges the gap between AI ambition and infrastructure reality. No lock-in. No data loss. No compromises. Cribl’s vendor-agnostic platform ensures data remains portable and interoperable. By cost-effectively handling increasing data volume and variety without delay, Cribl gives enterprises the choice, control, and flexibility to build what’s next.

Why Work With Us

We are building the company that will become the industry leader in IT and Security data. But doing that doesn’t mean we’re always serious. We approach our work fearlessly, learn quickly, improve constantly, and celebrate our wins at every turn. And more importantly, we laugh a lot.