Site Reliability Engineer

Reposted 3 Days Ago
Hiring Remotely in USA
Remote
Mid level
Other
The Role
As a Site Reliability Engineer, you will design cloud platforms, automate operations, maintain infrastructure, and support engineering teams in delivering reliable services.
Summary Generated by Built In

Site Reliability Engineer
OXIO is the first NeoTelco. We are building the world’s largest, most accessible, and most insightful telecom network. Our platform lets anyone spin up their own carrier from a browser, and it supports you as you scale your network to millions of users.

We ensure that users and devices are connected and stay connected wherever they go, across countries, carriers, and cellular technologies. We help them pay less for mobile data. This technology is provided through our Carrier-as-a-Service platform, BrandVNO, a fully customizable telecom service. In addition, we enable clients of our service to extract value from telecom data, enriching their customer experience, business intelligence, and product understanding in the many markets in which we operate.

Come join us in creating a modern technology platform with a group of engineers dedicated to advancing our vision. Our team is passionate about what we build, open to new ideas and challenges, and has its sights set on the future of connectivity.

Responsibilities

  • Design and implement a cloud platform to support OXIO backend services

  • Automate technical operations: deployments, scaling, recovery, etc.

  • Monitor and maintain mission-critical production infrastructure to ensure maximum uptime

  • Participate in an on-call rotation and contribute to a culture of continuous improvement through blameless postmortems

  • Enable the Engineering, Telecom, and Data Engineering teams by providing them with the tools to operate the services they build

Essentials
  • Understanding of Linux/Unix systems (most systems are Linux-based).

  • Familiarity with Linux/Unix system internals like process management, filesystems, memory management, and networking.

  • Proficiency in at least one programming language (Python, Go, or Ruby) and strong skills in scripting (Bash, Perl).

  • Experience with infrastructure provisioning tools such as Terraform, CloudFormation, or Ansible.

  • Familiarity with containerization (Docker) and orchestration tools (Kubernetes).

  • Familiarity with monitoring tools like Prometheus, Grafana, or Datadog.

  • Knowledge of setting up alerts, analyzing logs, and creating dashboards for observability.

  • Familiarity with incident management practices (e.g., runbooks, postmortems).

  • Experience participating in an on-call rotation and handling incidents.

  • Experience in setting up and maintaining Continuous Integration/Continuous Delivery pipelines (Jenkins, GitLab CI, CircleCI, etc.).

  • Hands-on experience with cloud providers (AWS, Google Cloud, Azure).

  • Knowledge of virtualization technologies (VMware, KVM) and cloud-native architecture.

  • Understanding of TCP/IP, DNS, HTTP/HTTPS, load balancing, and firewalls.

Nice to have
  • Strong understanding of deployment strategies (canary releases, blue-green deployments, etc.).

  • Familiarity with high availability and an understanding of failover mechanisms.

  • Familiarity with IAM (Identity and Access Management) and zero trust principles.

  • Experience working with distributed systems (e.g., Kafka, Cassandra, Elasticsearch).

  • Experience building custom monitoring tools or writing complex automation scripts.

  • Functional knowledge of database management (SQL and NoSQL).

  • Familiarity with distributed tracing (Jaeger, OpenTelemetry) and advanced log aggregation strategies (ELK stack, Splunk).

  • Familiarity with performance profiling tools and optimizing application performance under heavy load.

  • Familiarity with load testing and identifying bottlenecks.

  • Familiarity with configuration management using SaltStack for maintaining server configurations.


The Company
HQ: New York, NY
68 Employees
Year Founded: 2018

What We Do

OXIO is the first telecom-as-a-service (TAAS) platform for brands and enterprises. It unbundles mobile telecom infrastructure and captures the data and value that infrastructure generates. OXIO’s 100 percent cloud-based solution blends the wireless infrastructure of many providers, enabling something that wasn't possible before: a custom-purposed, asset-light network delivered to each brand in a matter of days. OXIO's B2B SaaS solution gives brands full control of the wireless experience, including actionable intelligence that drives clear value and results. Mobile data, long locked up in telecom silos, allows brands to get closer to their customers than ever. OXIO is headquartered in New York, with offices in Mexico City and Montreal, Canada. For more information, visit oxio.com.
