Site Reliability Engineering Systems Engineer - Event Management Tools - Core Technology Infrastructure

Sorry, this job was removed at 11:20 a.m. (CST) on Thursday, July 14, 2022

Job Description:

  • Strong IT professional providing Production Support for multiple infrastructure applications and systems, while driving continued IT Operations Management service improvements.
  • Possess strong analytical ability in solving IT problems, working towards automation, and eliminating system and/or process bottlenecks.
  • Ability to lead and coordinate timely issue resolution for critical applications in partnership with other technicians from database, web service, network, storage, OS system admin, application developer, and management teams.
  • Experience with Java, UNIX OS, Perl and Python scripting, SQL queries, systems analysis, web services, monitoring, documentation, change control, troubleshooting, PowerShell, and process improvement.
  • Work with application owners, both Business owners and Engineering teams, along with operation services, to establish Business and Technical monitoring strategies, including instrumentation of the systems, collection of metrics, development of KPIs, and configuration of alerting by static and dynamic thresholds through use of statistical analysis and machine learning.
  • Drive standardization and centralization. Build tools to achieve operational efficiencies and product insight.
  • Design and build an inventory system with a comprehensive list of KPIs and metrics built in and preserved.
  • Develop a performance test plan and test harness that can satisfy 100+ varied products and platforms.
  • Devise programmatic capacity planning routines.
  • Utilize technical area expertise to assess, select, manage, and implement enterprise application components, and to ensure that the technical solution solves the business problem as an organic part of the organization's operational and functional baseline.
  • Participate in the support of Major Incidents with Major Incident Management (MIM), Operations Triage Group (OTG), ECC, and Problem Management (PM) throughout the major incident life cycle by providing monitoring data on the system(s) in question and by addressing deficiencies in technical and business monitoring KPIs.
  • Support Triage efforts during Major Incidents by deconstructing application performance, interoperability, instrumentation, and human factors to facilitate resolution and development of resilient solutions.
  • Support PM's enterprise root cause analysis (RCA) processes in collaboration with appropriate OI&T organizations.
  • Capture technical information from the relevant stakeholders and synthesize it into useful information in various formats for OIT senior management and other VA components.
  • Demonstrate proficiency with DevOps tools, JIRA, ServiceNow, and MS Project, and perform tasks using these tools.


Required Skills
• Strong UNIX, Linux, and Wintel skills; Perl/Shell/Python scripting
• SQL/Database queries for data extraction
• Develop custom automation in order to streamline support processes
• Perform root cause analysis for recurring problems by partnering with other teams to develop long-term resolutions, including implementing preventative measures to minimize problems and production outages
• Strong root cause diagnosis skills and a desire to learn new processes, products, applications, and technology
• Manage production changes, releases, and upgrades in a collaborative environment in accordance with lifecycle methodology and with risk and data management guidelines
• Support the 24x7 day-to-day maintenance of the infrastructure application systems in operation, including identifying and troubleshooting application and data issues and resolving or escalating them
• Resolve and document incident and service tickets in a timely manner according to Service Level Agreements (SLAs) or assigned completion dates
• Perform testing and shakeout procedures during or after active incidents or deployments, both during and after regular business hours
• Manage activities related to maintenance of the application systems that run the daily operations of the firm
• Ability to work as part of a team: the candidate will work closely with the Engineering, Development, and other Operational teams.
• Monitor Production environments and scheduled jobs, and identify improvements to monitoring.

Desired Skills
• Great soft skills: people and communication skills are essential.
• Other coding skills such as Java
• Good proficiency in system, network, security and database operations, protocols, and industry standard technologies.
• Experience with supported tools such as Netcool OMNIbus, WebGUI, and ITM is a plus.
• Experience with command-line interfaces (CLI), third-party APIs, and integration.
• Experience in server administration with Red Hat Enterprise Linux and Windows Server
• Good understanding of developing fault-tolerant solutions and knowledge of horizontal scaling and resiliency/HA.
• Ability to juggle competing priorities and adapt to changes in project scope.
• College degree or higher, or equivalent work experience
Shift:
1st shift (United States of America)
Hours Per Week:
40
Learn more about Bank of America