Site Reliability Engineer

Posted 7 Days Ago
Chennai, Tamil Nadu
In-Office
Mid level
Artificial Intelligence • Information Technology • Software
The Role
The Site Reliability Engineer is responsible for maintaining and optimizing satellite communication systems, focusing on automation, incident management, and collaboration with engineering teams.

About the role:

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our dynamic team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our satellite communication systems, which leverage AI and ML for automation and optimization. You will play a key role in maintaining the infrastructure, automating deployment processes, and troubleshooting complex issues in a mission-critical environment.

Key Responsibilities:

  • System Reliability & Monitoring:
    • Design, implement, and maintain monitoring and alerting systems to ensure high availability and performance of satellite communication platforms.
    • Proactively identify and address system bottlenecks, vulnerabilities, and other reliability challenges.
    • Ensure infrastructure is capable of supporting AI and ML workloads at scale, with a focus on automation and efficiency.
  • Infrastructure Management & Automation:
    • Build and maintain CI/CD pipelines for satellite communication AI/ML applications, ensuring smooth deployment and integration processes.
    • Implement and optimize cloud-native architectures, using platforms such as AWS, GCP, or Azure, to support AI/ML models and satellite communication systems.
    • Automate scaling, deployment, and configuration of infrastructure to ensure high availability and fault tolerance.
  • Incident Management & Root Cause Analysis:
    • Lead incident response efforts, including troubleshooting, root cause analysis, and resolution of production issues.
    • Implement post-mortem analysis processes to continuously improve the reliability and performance of systems.
    • Apply best practices for incident documentation, including actionable feedback and lessons learned.
  • Collaboration & Continuous Improvement:
    • Work closely with engineering teams, including AI/ML developers, software engineers, and network engineers, to identify areas for improvement and optimize system performance.
    • Collaborate with satellite engineers to integrate AI/ML solutions into the satellite communication stack, ensuring performance optimization and automation.
    • Contribute to the development of internal tools and dashboards to enhance system reliability and transparency.
  • Security & Compliance:
    • Ensure security best practices are implemented across the satellite communication platform, particularly regarding AI/ML data privacy and satellite systems.
    • Collaborate with security teams to ensure systems are compliant with industry standards and regulations.

Qualifications:

  • Required:
    • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
    • 3+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
    • Strong knowledge of cloud platforms (AWS, GCP, Azure) and of containerization and orchestration tools (Docker, Kubernetes).
    • Experience with infrastructure-as-code tools (Terraform, Ansible, etc.).
    • Strong expertise in monitoring, logging, and alerting tools (Prometheus, Grafana, ELK Stack, etc.).
    • Familiarity with AI/ML systems and how they can be scaled and managed in production environments.
    • Experience with scripting languages (Python, Bash, Go, etc.) for automation and tool development.
  • Preferred:
    • Experience with satellite communication systems or space-based infrastructure.
    • Knowledge of networking protocols and technologies related to satellite communication.
    • Experience with machine learning frameworks (TensorFlow, PyTorch, etc.) and deploying AI models in production.
    • Familiarity with disaster recovery, backup strategies, and high-availability configurations for cloud-based systems.
    • Certification in cloud platforms (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, etc.).

Skills & Attributes:

  • Problem-Solving & Critical Thinking:
    Ability to think creatively and analytically to solve complex problems in real time.
  • Collaboration:
    Excellent team player with the ability to work cross-functionally in a collaborative environment.
  • Adaptability:
    Able to thrive in a fast-paced, constantly evolving environment and adapt to new technologies and methodologies.
  • Communication:
    Strong written and verbal communication skills, with the ability to explain technical concepts clearly to non-technical stakeholders.

What We’ll Offer

  • Professional development opportunities
  • Collaborative and innovative work environment
  • Exposure to the aviation and maritime domains, along with related business knowledge
  • Connectivity and content engineering experience, along with related business knowledge
  • Opportunity to work in cross-functional teams and across organizations
  • Performance-based bonus

Neuron is an Equal Opportunity Employer. Employment opportunities at Neuron are based upon one's qualifications and capabilities to perform the essential functions of a particular job. All employment opportunities are provided without regard to race, religion, sex (including sexual orientation and transgender status), pregnancy, childbirth or related medical conditions, national origin, age, veteran status, disability, genetic information, or any other characteristic protected by law.

Top Skills

Ansible
AWS
Azure
Bash
Docker
ELK Stack
GCP
Go
Grafana
Kubernetes
Prometheus
Python
PyTorch
TensorFlow
Terraform

The Company
HQ: Miramar, Florida
185 Employees
Year Founded: 2019

What We Do

Quvia is the first AI-powered quality of experience (QoE) platform for ships, planes and remote sites. The platform seamlessly blends any combination of connectivity, including multiple service providers, satellite orbits and terrestrial networks, into one vendor-neutral environment; continuously measures and analyzes real-time connectivity performance and its impact on QoE; and intelligently orchestrates the network to deliver the best possible QoE, even in the most remote places.

Today, Quvia works with industry-leading companies in aviation, cruise, energy, shipping and more. To learn more, visit: www.quvia.ai.

Not sure how to pronounce Quvia? Think cue-vee-uh. 👌

Recruiting Disclaimer: Quvia will never ask to interview job applicants via text message or ask for personal banking information as part of the interview process. Quvia will never ask job applicants or new hires to send money or deposit checks for the company. In case of doubt, please contact us directly at [email protected].
