Job Title: Cloud Engineer III
Job Description: The Cloud Engineer (also known as Site Reliability Engineer, or SRE) plays a pivotal role in ensuring the reliability, availability, and performance of Pearson's systems. They take ownership of projects and initiatives, work closely with diverse teams to deliver tangible results, and continuously drive improvements and efficiency through automation and industry best practices.
This role aligns with industry titles such as Site Reliability Engineer or DevOps Engineer.
Key Responsibilities:
- Team Goals: Take consistent action to achieve team objectives and meet service level objectives (SLOs), demonstrating a strong commitment to the team's success.
- Cloud Engineering: Actively participate in projects aligned with Pearson’s objectives, such as improving system observability, implementing SRE best practices, and enhancing security measures. Implement CI/CD pipelines that follow Pearson’s standards for cloud architecture and security. Develop and implement automation scripts and tools that reduce manual effort and improve operational efficiency.
- Incident Management: Execute on-call duties with excellence, showcasing proficiency in swiftly responding to incidents, adeptly troubleshooting complex issues, and actively contributing to comprehensive post-incident reviews.
- Peer Review: Regularly contribute to and review code and documentation, providing and receiving clear, constructive feedback.
- Collaboration & Communication: Collaborate with developers, operations, and product teams, and articulate technical concepts clearly to technical and non-technical stakeholders.
- Continuous Learning: Keep abreast of industry trends, emerging technologies, and best practices in Site Reliability Engineering.
- Skill Development: Proactively seek opportunities for professional and technical development through training, certifications, and personal projects.
- Scalability & Performance: Gain expertise in capacity planning and performance optimization, ensuring that systems are scalable and can handle increased loads while balancing budget constraints and business requirements.
- Documentation: Create and maintain detailed documentation for processes, procedures, and best practices.
- SRE/Cloud Engineering Culture: Uphold and exemplify the core values of SRE culture, including automation, reliability, and data-driven decision-making.
- Cost Optimization: Collect, analyze, and interpret cloud cost data to identify trends, anomalies, and cost-saving opportunities. Translate these findings into concrete, actionable improvements.
- Technical Mastery: Develop an in-depth understanding of your team’s systems, architectures, and technologies.
- Feedback & Self-Reflection: Actively seek feedback and respond to it constructively as part of continuous improvement.
Qualities:
- High level of technical proficiency in your team’s technology stack
- Excellent communication and collaboration skills
- Strong aptitude for problem-solving and analytical thinking
- Commitment to continuous learning and skill development
Skills:
- Cloud platform - AWS
- IaC - Terraform
- Containers and compute - Docker, ECS/Fargate, Lambda, Linux fundamentals
- CI/CD and automation - GitHub Actions, Jenkins, Ansible
- Observability - CloudWatch, New Relic, Grafana
- Scripting - Python, Bash, Groovy
- SRE Practices - RCAs, incident review, runbooks
- AWS Services to Know - VPC networking, Transit Gateway, VPN, security groups, CloudTrail, IAM, Route 53, CloudFront
- Agile - Jira, JSM
- AI tooling - Experience preferred; we use GitHub Copilot, Cursor, and AWS Bedrock
Skills Required
- Experience with AWS cloud platform
- Infrastructure as Code with Terraform
- Container technologies: Docker, ECS/Fargate
- Serverless: AWS Lambda
- Linux fundamentals
- CI/CD tooling: GitHub Actions and Jenkins
- Configuration/automation: Ansible
- Observability tooling: CloudWatch, New Relic, Grafana
- Scripting languages: Python, Bash, Groovy
- Networking and AWS services: VPC, Transit Gateway, VPN, security groups, Route 53, CloudFront
- Security and audit services: CloudTrail, IAM
- SRE practices: incident response, RCAs, runbooks
- Agile tools: Jira, JSM
- AI tooling experience preferred (GitHub Copilot, Cursor, AWS Bedrock)
What We Do
We are the world’s learning company with more than 22,500 employees operating in 70 countries. We provide content, assessment and digital services to learners, educational institutions, employers, governments and other partners globally. We are committed to helping equip learners with the skills they need to enhance their employability prospects and to succeed in the changing world of work. We believe that wherever learning flourishes so do people.