Operations Engineer, II

Reposted 4 Days Ago
Hiring Remotely in Australia
Remote
Mid level
Big Data • Software
The Role
The Operations Engineer II ensures the stability and operation of Aerospike's cloud platforms by executing changes, responding to incidents, and improving workflows.

Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative AI, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership of other databases.

Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases. 

At Aerospike, we dream big and deliver even bigger. Our mission is to unleash the power of the world’s real-time data with a database built for infinite scale, speed, and sustainability.

If you're ready to shape the future of data, join us.

Operations Engineer II

As an Operations Engineer II within Aerospike’s Site Reliability Engineering (SRE) organization, you will help ensure the stability, security, and smooth operation of Aerospike’s managed cloud platforms. You’ll execute production changes, assist in incident response, and maintain the runbooks, observability, and automation that keep Aerospike’s services running 24x7 across AWS, Azure, and GCP.

This role focuses on hands-on operational excellence: executing repeatable workflows such as scaling, patching, certificate rotations, and environment provisioning. You will collaborate closely with Senior Operations Engineers, Site Reliability Engineers, and Product Engineering teams to improve processes and reliability.

Key Responsibilities
  • Execute routine operational changes including patching, scaling, node replacements, and certificate rotations across managed environments.
  • Serve as the first responder (L1) for production incidents during business hours—triaging, mitigating, and escalating to SRE (L2) or Product Engineering (L3) as appropriate.
  • Follow and improve operational runbooks to ensure safety, consistency, and reliability of production operations.
  • Monitor system health using tools such as Datadog and PagerDuty; investigate alerts, document findings, and assist in reducing noise and false positives.
  • Perform recurring maintenance and compliance tasks such as patch management, access reviews, and configuration validation.
  • Participate in the change control process, preparing and executing planned changes during maintenance windows.
  • Contribute to the creation and upkeep of operational documentation, including standard operating procedures and post-incident reports.
  • Collaborate with SREs to identify opportunities for automation or efficiency improvements in operational workflows.
  • Support deployment readiness by validating monitoring, alerting, and rollback strategies for new features.
  • Participate in a regional on-call rotation, ensuring consistent operational coverage across time zones.
Required Experience
  • 2–5 years of experience in Cloud Operations, DevOps, or Site Reliability Engineering, with hands-on exposure to production systems.
  • Practical experience operating workloads in AWS, Azure, or GCP.
  • Solid understanding of Linux systems administration and basic networking concepts.
  • Familiarity with Kubernetes (AKS/EKS/GKE) and containerized workloads.
  • Experience executing operational workflows and basic automation using scripting or Infrastructure as Code tools (Terraform, Bash, Python, or PowerShell).
  • Working knowledge of monitoring and alerting systems such as Datadog, Prometheus, or Grafana.
  • Basic understanding of certificate management, system patching, and configuration compliance.
  • Strong attention to detail, disciplined approach to change management, and documentation skills.
  • Clear communication skills for coordinating with global teams and documenting operational activities.
Preferred Skills and Qualifications
  • Experience supporting managed database or storage systems (Aerospike, Cassandra, PostgreSQL, etc.).
  • Familiarity with version control systems (GitHub, GitLab) and CI/CD workflows.
  • Exposure to configuration management tools (Ansible, Puppet, or similar).
  • Certifications in cloud technologies (AWS, Azure, or GCP) or Kubernetes (CKA/CKAD).
  • Experience in compliance-driven or audited environments (SOC 2, PCI, ISO 27001).
What Success Looks Like
  • Reliable execution of operational changes with minimal error or rework.
  • Consistently low mean time to acknowledge (MTTA) for incidents during assigned shifts.
  • Up-to-date and validated runbooks for all supported systems.
  • Active participation in post-incident reviews and operational improvement initiatives.
  • Clear communication, collaboration, and teamwork across SRE and Product Engineering.

Aerospike is an Equal Opportunity Employer. We are committed to providing an environment free from discrimination on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law.




Top Skills

Ansible
AWS
Azure
Bash
Datadog
GCP
Git
GitLab
Grafana
Kubernetes
Linux
PowerShell
Prometheus
Puppet
Python
Terraform

The Company
HQ: Mountain View, CA
191 Employees
Year Founded: 2009

What We Do

The Aerospike Real-time Data Platform enables organizations to act instantly across billions of transactions while reducing server footprint by up to 80%. The Aerospike multi-cloud platform powers real-time applications with predictable sub-millisecond performance at up to petabyte scale, five-nines uptime, and globally distributed, strongly consistent data. Applications built on the Aerospike Real-time Data Platform fight fraud, provide recommendations that dramatically increase shopping cart size, enable global digital payments, and deliver hyper-personalized user experiences to tens of millions of customers. Customers such as Airtel, Experian, European Central Bank, Nielsen, PayPal, Snap, Verizon Media, and Wayfair rely on Aerospike as their data foundation for the future.
