Senior CloudOps Engineer

Posted 2 Days Ago
Hiring Remotely in México
Remote
Senior level
Information Technology
The Role
The Senior CloudOps Engineer will manage AWS and Linux/Windows environments, focusing on automation, production lifecycle support, and continuous service availability in a 24x7 SaaS environment.

The Cloud Ops Engineer will support Amazon Web Services (AWS) and Linux/Windows environments. The Cloud Ops Engineer will be responsible for all aspects of the production lifecycle, maintenance, and administration, including but not limited to: infrastructure automation, continuous integration and deployment, product release and support, running a scalable production environment for hosting the ARCOS platform, maintaining application/database availability, and ensuring continuous 24x7 production uptime of our services.

The Cloud Ops Engineer needs to be familiar with AWS, Apache, Tomcat, PostgreSQL, Oracle, Ansible, Jenkins, Jira, Confluence and SaaS operations.  

  • Design, develop and maintain scalable AWS solutions and infrastructure, including but not limited to: EC2, RDS, S3, DynamoDB, ElastiCache, and Route 53.
  • Develop tooling and processes to automate the deployment of SaaS-based applications and their underlying operating systems and infrastructure.
  • Perform PostgreSQL and Oracle database administration, including maintenance, troubleshooting, tuning, optimization, installation, upgrades, backup/recovery, and data migration. 
  • Partner with Engineering, Development, Quality Assurance, Professional Services, and Technical Support to ensure the success of the assigned product offerings and schedules. 
  • Engage in Agile team practices such as daily standups, backlog refinement, release planning and sprint planning. 
  • Coordinate configuration changes, installs, and upgrades with appropriate development teams and product owners while following company change control procedures. 
  • Participate in capacity planning to determine future infrastructure needs. 
  • Participate in 24x7 on-call responsibilities, maintaining the availability and performance of all customer-facing production services. 
  • Triage and participate in the resolution of complex problems, including network connectivity issues, that span multiple tiers of application/infrastructure. 
  • Implement monitoring and reporting capabilities to assist engineering in rapidly identifying issues.  
  • Actively monitor supported systems and respond promptly to security or usability concerns. 
  • Review application logs and analyze events using cloud-native services (e.g. CloudWatch, CloudTrail) or third-party SIEM tools (e.g. Splunk).
  • Upgrade systems and processes as required for enhanced functionality and security compliance. 
  • Maintain product service level agreements. 
  • Accurately document all processes and procedures for routine and non-routine tasks.  
  • All other duties and responsibilities as assigned. 

Requirements
    • Bachelor’s degree in Computer Science or related field, or equivalent work experience. 
    • 4-5 years of system administration experience, ideally in global management and operations of highly trafficked production applications. Experience working in a 24x7 SaaS environment is preferred.  
    • 4-5 years of experience designing solutions for and managing AWS services, including but not limited to: EC2, RDS, S3, DynamoDB, ElastiCache, WAF/Shield, Route 53, IAM and Directory Service.
    • 2 years of experience with CI/CD technologies and best practices. 
    • 2 years of experience with PostgreSQL, Oracle, and SQL Server.
    • Experience with Linux and Windows system administration, automation and performance tuning. 
    • Experience with configuration management and infrastructure as code tools such as Ansible and Terraform. 
    • Experience with Apache, Nginx, Tomcat, Node.js/PM2.
    • Experience with scripting languages, including Bash, Python and PowerShell.
    • Knowledge of Docker, Jira, Confluence.  
    • Advanced knowledge of system vulnerability management and security best practices. 
    • Solid understanding of networking concepts and troubleshooting.  
    • Proven ability to work effectively with highly reliable, highly available, mission-critical technologies, with demonstrated attention to detail and results while meeting deadlines.
    • Ability to manage deployment automation, SaaS operations, internal and external SaaS infrastructure, security, and cost management.
    • Solid understanding of technical issues and opportunities related to modern cloud infrastructure and operations. 
    • Action-oriented, decisive approach to work, with a willingness to take a hands-on role when needed to ensure deliverables are met on time.
    • High-energy, motivated self-starter with the ability to take direction and manage tasks with minimal supervision within an energized, collaborative, and entrepreneurial environment.
    • Excellent written and verbal communication skills.  

Benefits
  • Health Care Plan (Medical, Dental & Vision)
  • Retirement Plan (401k, IRA)
  • Life Insurance (Basic, Voluntary & AD&D)
  • Paid Time Off (Vacation, Sick & Public Holidays)
  • Family Leave (Paternity, Maternity)
  • Short Term & Long Term Disability
  • Training & Development
  • Work From Home

Top Skills

Amazon Web Services (AWS)
Ansible
Apache
Bash
Confluence
Docker
Jenkins
Jira
Linux
Oracle
PostgreSQL
PowerShell
Python
Tomcat
Windows

The Company
HQ: Columbus, OH
226 Employees
Year Founded: 1993

What We Do

ARCOS provides SaaS solutions to solve resource management challenges that companies face. The ARCOS Resource Management platform helps customers plan, respond, restore, and report actions taken during normal operations or unplanned service interruptions. Using ARCOS, utilities, airlines, manufacturers, and industrial facilities improve response time and resource efficiency while increasing customer satisfaction.

The nation's most progressive critical infrastructure companies have chosen ARCOS, a SaaS solution provider, as a mission-critical component of their operations to ensure compliance and consistency with their callout business rules governed by bargained agreements.

The ARCOS System not only improves operational efficiency but also provides a quantitative return on investment, saving O&M budget by improving restoration time, eliminating unnecessary overtime, streamlining daily employee scheduling around shortages caused by after-hours emergency work, and providing comprehensive performance reporting and troubleshooting metrics.

#resourcemanagement #emergencyresponse #mutualassistance #damageassessment #crewmanagement #fieldservicemobility #fieldservicemanagement
