Site Reliability Engineer - Core Technology Infrastructure
Job Description:
- Strong IT professional who provides development and collaborates with Production Support for multiple infrastructure applications and systems, while driving continued IT Operations Management service improvements.
- Strong analytical ability in solving IT problems, working toward automation, and eliminating system and process bottlenecks.
- Ability to lead and coordinate timely issue resolution for critical applications in partnership with technicians from the database, web service, network, storage, OS system administration, application development, and management teams.
- Experience with Java, UNIX OS, Perl and Python scripting, SQL queries, systems analysis, web services, monitoring, documentation, change control, troubleshooting, PowerShell, and process improvement.
Functions:
- Drive standardization for new onboarding processes and controls.
- Research and implement process or technology improvements
- Support customer monitoring requirements and engineer solutions for event and network monitoring
- Develop, test and deploy automated workflow in support of the IT business
- Troubleshoot and resolve system issues.
- Implement changes and upgrades to production systems
- Deliver technical documentation for all projects completed
- Meet on-call coverage requirements and support break-fix needs
Required Skills:
- Strong UNIX, Linux, and Wintel skills, with Perl/Shell/Python scripting
- Programming in Java or .NET
- SQL/Database queries for data extraction
- Develop custom automation in order to streamline support processes
- Perform root cause analysis for recurring problems by partnering with other teams to develop long-term resolutions, including implementing preventative measures to minimize problems and production outages
- Strong root cause diagnosis skills and a desire to learn processes, new products, applications, and technology
- Manage production changes, releases, and upgrades in a collaborative environment in accordance with lifecycle methodology, risk guidelines, and data management
- Support the 24x7 day-to-day maintenance of the infrastructure application systems in operation, including identifying and troubleshooting application and data issues and resolving or escalating them
- Resolve and document incident and service tickets in a timely manner according to Service Level Agreements (SLAs) or assigned completion dates
- Perform testing and shakeout procedures during or after active incidents or deployments, both within and outside regular business hours
- Manage activities related to maintenance of the application systems that run the daily operations of the firm
- Ability to work as part of a team; the candidate will work closely with the Engineering, Development, and other Operational teams.
- Monitor production environments and scheduled jobs, and identify improvements to monitoring.
Desired Skills:
- Great soft skills; people and communication skills are essential.
- Must possess coding experience in Java, .NET, or another object-oriented language
- Good proficiency in system, network, security, and database operations, protocols, and industry-standard technologies.
- Experience with command line interfaces (CLI), third-party APIs, and integration.
- Experience in server administration with Red Hat Enterprise Linux and Windows Server
- Good understanding of developing fault-tolerant solutions and knowledge of horizontal scaling and resiliency/HA.
- Ability to juggle competing priorities and adapt to changes in project scope.
- College degree or higher, or equivalent work experience
Shift:
1st shift (United States of America)
Hours Per Week:
40