Lead - Site Reliability Engineering

Sorry, this job was removed at 10:44 p.m. (CST) on Monday, Feb 17, 2025
Hyderabad, Telangana
Cloud • Software
If you’re ready to build your future — and the future of technology — then you’re in the right place.
The Role

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Software Engineering

Job Details

About Salesforce

We’re Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And we empower you to be a Trailblazer, too, driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good, you’ve come to the right place.

At Salesforce, we work on solving the hard, long-term engineering problems that underlie all of our products. Our focus is on building powerful, simple-to-use frameworks, services, and software components that our products use to support the exponential growth of our business while we deliver value to our end customers.

Our team owns mission-critical services like the Core application server and async processing that support the Salesforce platform across multiple substrates, including AWS. These tier-0 distributed services support billions of transactions at B2C scale and are resilient, highly available, and scalable. Architected on CNA principles, our services are built to leverage OSS platforms like K8s, service mesh, and Spinnaker for continuous deployment.

Our goal is to innovate at scale and leverage AI for self-configuration, self-detection, and self-healing. As service owners, we track availability with observability, detect and forecast anomalies, use alert correlation for incident causation, and prioritize automation with AIOps to reduce operational pain and toil while continuing to innovate.

We are looking for passionate engineers who love to own modern, large-scale services in production to join our engineering team and drive our charter forward. If you’re fired up about software performance, solving complex problems, automating everything, and working with great engineers, this is the job for you!

Your responsibilities include:

Responsibilities

* Complete service ownership, from influencing product architecture to operating the service seamlessly in production
* Analyze and remediate production incidents for the Core Application Server and asynchronous processing platform
* Develop deeper insights into platform incidents and influence the engineering backlog to address repeat incidents and prevent incidents proactively
* Leverage the AIOps platform to continuously improve anomaly detection, automate runbooks, and drive our MTTD and MTTR goals
* Understand customer use cases leveraging our platform and services and collaborate with the rest of the engineering organization to identify opportunities to achieve our availability goals
* Engage with engineers developing features on our platform and provide consultative support and onboarding guidance
* Collaborate with the Systems Engineering team on activities such as providing input for OS patching, JDK upgrades, and software configuration
* Collaborate with technical writers to create, update and review documentation for users and operators
* Participate in the team’s 24x7 on-call rotation to address complex problems in real-time and keep services operational and highly available
* Continuously raise standards of engineering excellence by implementing DevOps best practices
* Champion a culture and work environment that promotes diversity and inclusion
* Lead, collaborate, communicate, and mentor

Required Skills

* Bachelor's degree in Computer Science or equivalent experience
* 5 years of work experience
* Knowledge of OO programming and concepts and experience coding in Java, C++ or Python
* Ability to debug complex distributed systems and understand system design, with an eye for performance and scalability bottlenecks, and to provide recommendations to optimize code
* In-depth, hands-on experience with Linux, networking, server, and cloud architectures
* Exposure to container-related technologies such as Kubernetes and Docker
* Proficiency with source control, continuous integration, and testing pipelines

Preferred Skills

* 10+ years of overall experience, including 5+ years in a production engineering, DevOps, SRE, or similar role working on high-scale distributed systems
* Strong background in open source software is preferred
* Experience analyzing heap dumps
* Experience instrumenting code and profiling applications
* Experience evaluating and interpreting large volumes of production data to understand efficiency, latency, memory, and CPU utilization
* Experience with messaging platforms 
* Experience with AWS or another cloud PaaS provider
* Experience in configuration management technologies such as Chef, Puppet or Ansible
* Strong problem-solving, troubleshooting and analytical skills clearly demonstrated in past projects
* Solid understanding of configuration, deployment, management, and maintenance of large cloud-hosted systems, including auto-scaling, monitoring, performance tuning, troubleshooting, and disaster recovery
* Understanding of Java Virtual Machine technology and ability to tune and debug issues related to compilers and garbage collectors

Accommodations

If you require assistance due to a disability when applying for open positions, please submit a request via this Accommodations Request Form.

Posting Statement

At Salesforce we believe that the business of business is to improve the state of our world. Each of us has a responsibility to drive Equality in our communities and workplaces. We are committed to creating a workforce that reflects society through inclusive programs and initiatives such as equal pay, employee resource groups, inclusive benefits, and more. Learn more about Equality at www.equality.com and explore our company benefits at www.salesforcebenefits.com.

Salesforce is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Salesforce does not accept unsolicited headhunter and agency resumes. Salesforce will not pay any third-party agency or company that does not have a signed agreement with Salesforce.

Salesforce welcomes all.

The Company
HQ: San Francisco, CA
72,000 Employees
Hybrid Workplace

What We Do

Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Through Agentforce, our groundbreaking suite of customizable agents and tools, Salesforce brings autonomous AI agents, unified data from any source, and best-in-class Customer 360 apps together on one integrated platform to help companies connect with customers in a whole new way.

Salesforce is democratizing AI agents for businesses of every size and industry so every company can embrace a workforce without limits. Our low-code, open, and secure platform helps companies build and customize Salesforce fast so they can safely scale AI-powered work to every customer and employee experience and transform their business.

Salesforce is proud to be the market leader, but we’re even more proud to lead in philanthropy, innovation and culture. Guided by core values of trust, customer success, innovation, equality, and sustainability, Salesforce is more than a business — we’re a platform for change.

Why Work With Us

There’s no typical day in the life of a Salesforce employee. You could be transforming our next AI innovation — or transforming your community. Closing deals — or closing your laptop for a day of Volunteer Time Off. Driving change for our customers — or driving change within one of our high-performing teams.
