Role Overview
As an Infrastructure Architect, you will be responsible for designing, implementing, and operating a resilient, secure, and cost-efficient multi-cloud infrastructure across Google Cloud Platform (GCP) and Tencent Cloud.
You will work closely with engineering, DevOps, and SRE teams to define infrastructure standards, ensure high availability through disaster recovery, enable unified monitoring, and support production stability. You will also provide Level 3 support for complex infrastructure issues and continuously improve platform reliability and cost efficiency.
Key Responsibilities
1. Infrastructure Architecture & Standards
- Design and maintain standardized infrastructure blueprints across GCP (global) and Tencent Cloud (APAC/China).
- Implement secure cross-cloud connectivity, including Tencent Cloud CCN (Cloud Connect Network) and Google Cloud Interconnect, ensuring compliant data flow across regions.
- Define and enforce IAM/CAM standards, network security, and data residency controls in line with GDPR and China MLPS 2.0 requirements.
- Implement and maintain Infrastructure as Code (IaC) using Terraform to ensure consistency, repeatability, and auditability.
- Review infrastructure designs and guide development teams to follow approved patterns.
2. Production Support & Troubleshooting
- Provide Level 3 escalation support for critical infrastructure and cloud-related production issues.
- Participate in major incident response, supporting recovery efforts across multiple regions and cloud providers.
- Perform Root Cause Analysis (RCA) for infrastructure incidents and implement corrective architectural improvements.
- Collaborate with SRE and DevOps teams to improve system stability and operational maturity.
3. Monitoring & Observability
- Design and support a centralized monitoring and observability setup across GCP and Tencent Cloud.
- Implement consistent metrics, logs, and traces using tools such as Prometheus, Grafana, Datadog, or ELK.
- Enable OpenTelemetry for unified tracing and logging across regions.
- Configure alerts and health checks to proactively detect infrastructure degradation.
4. Disaster Recovery & Business Continuity
- Design and maintain DR architectures (Active-Active or Active-Passive) across regions and cloud providers.
- Implement backup, replication, and data recovery mechanisms, including cross-cloud storage strategies.
- Define and track RTO and RPO targets for critical systems.
- Participate in and support DR drills and failover testing to ensure readiness.
5. Cost Optimization & Cloud Efficiency
- Support cost optimization initiatives, including usage analysis and rightsizing of cloud resources.
- Assist in implementing Committed Use Discounts (GCP) and prepaid or spot (bidding) instance models (Tencent Cloud).
- Identify opportunities to reduce data egress and inter-cloud transfer costs.
- Build visibility into infrastructure costs and work with teams to optimize spend without impacting reliability.
Requirements
Required Skills & Experience
- 4-5 years of experience in cloud infrastructure, DevOps, or platform engineering roles.
- Strong hands-on experience with GCP and working knowledge of Tencent Cloud.
- Experience designing multi-region, highly available cloud architectures.
- Solid expertise in Terraform and Infrastructure as Code.
- Practical experience in incident management, troubleshooting, and RCA.
- Understanding of cloud security, networking, and compliance requirements.
- Experience with monitoring and observability tools.
What Success Looks Like
- Stable, repeatable infrastructure deployments across GCP and Tencent Cloud.
- Faster resolution of production incidents with reduced recurrence.
- Tested and reliable disaster recovery mechanisms.
- Improved cost visibility and optimized cloud usage.
- Engineering teams enabled by clear infrastructure standards and patterns.
What We Do
Teleport allows engineers and security professionals to unify access for SSH servers, Kubernetes clusters, web applications, and databases across all environments.