Responsibilities:
- Ensure high availability, scalability, and performance of production systems.
- Implement and maintain SLIs, SLOs, and SLAs for critical services.
- Conduct capacity planning and performance tuning.
- Automate infrastructure provisioning using IaC tools such as Terraform, Terragrunt, and Ansible.
- Develop automation to minimize manual operations and improve deployment workflows.
- Build CI/CD pipelines to support rapid and reliable deployments.
- Design and maintain monitoring, logging, and alerting systems (Datadog).
- Participate in on-call rotations and lead incident response efforts.
- Perform root-cause analysis and develop postmortems to prevent recurring issues.
- Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).
- Optimize system architecture for reliability and fault tolerance.
- Implement best practices for security, networking, and service resilience.
- Work closely with development teams to design reliable microservices and distributed systems.
- Advocate for SRE principles and drive operational excellence across engineering teams.
- Mentor engineers on reliability practices, tooling, and automation strategies.
Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
- 3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
- Strong proficiency with Linux systems and shell scripting.
- Experience with cloud platforms (AWS, Azure).
- Hands-on experience with Kubernetes/ECS and container technologies (Docker).
- Proficiency in at least one programming language (Python or Java).
- Experience with CI/CD pipelines and DevOps tooling.
- Strong understanding of distributed systems, networking, and security fundamentals.
- Strong analytical and problem-solving skills.
- Excellent communication and cross-team collaboration.
- Ability to thrive in fast-paced, high-stakes environments.
- A mindset focused on continuous improvement and operational excellence.
Preferred Qualifications:
- Experience with observability stacks (OpenTelemetry).
- Knowledge of database management (PostgreSQL).
- Experience with configuration management tools (Ansible, Chef, Puppet).
- Familiarity with zero-downtime deployments and chaos engineering practices.
What We Do
Get behind-the-scenes insights from startup tech teams: https://www.myhatchpad.com/newsletter/
hatch I.T. is a specialized technology consulting firm connecting software, product, and data engineers with tech startups in emerging tech markets. We offer customized models that transform the way early-stage and high-growth startups scale. Our flagship programs include:
- Scale – technical consulting and recruiting services for high-growth startups
- Stride – technical strategy and consulting for early-stage startups
- hatchpad – an online community platform connecting startup technologists to network, learn, and advance in their careers
In true startup fashion, our roots can be traced to a garage in Leesburg, VA in 2013. While working with local startups, our Founder & CEO, Tim Winkler, realized that traditional staffing models didn’t align with the growth needs of startups. Working with those firms felt transactional, and the costs were far outside a startup's budget. There was a need for a solution that was relational, community-driven, and flexibly priced. With this in mind, hatch I.T. was formed, along with customized models that transform the way early-stage and high-growth startups scale.
Fast forward 8 years and 15 employees, and hatch has developed a platform that provides a roadmap to guide startups from MVP through all stages of growth. After proving this model with dozens of startups across DC, Maryland, & Virginia, we realized it was needed in all emerging startup markets.
If you’re a startup looking to grow your team, or an engineer looking for a career at an innovative tech company, connect with hatch I.T. today.