Platform Architect

San Jose, CA
In-Office
150K-275K Annually
Senior level
Artificial Intelligence • Hardware • Software
The Role
Lead the definition and realization of an AI server platform architecture, collaborate across teams, and design advanced hardware systems for high-performance workloads.

About Etched

Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest growing industry in history.

Job Summary

As a Platform Architect, you will lead the definition and realization of our AI server platform architecture, from server board design to rack-level integration and multi-rack POD-scale system orchestration. This is a hands-on technical leadership role that requires deep expertise in PCIe and fabric topologies, power and thermal constraints, system controls, and high-speed networking. Key responsibilities include creating new platform architectures for next-generation Sohu AI servers as part of the future product development roadmap.

You will collaborate cross-functionally with electrical, mechanical, thermal, firmware, and operations teams to architect systems that scale from a single server to full-rack and multi-rack POD deployments.

Key responsibilities

  • Architect the end-to-end hardware system stack, including server-level components, rack-scale systems, and multi-rack POD designs optimized for AI and high-performance workloads

  • Design and implement advanced PCIe Gen5/Gen6 topologies: root complex architecture, retimer placement, switch hierarchy, and accelerator fan-out strategies

  • Define scalable BMC architecture and platform management features across fleet deployments, including telemetry pipelines, orchestration hooks, and API integrations (e.g., Redfish, IPMI)

  • Specify and lead the implementation of chip-to-chip interconnects such as NVLink, UCIe, and other emerging high-bandwidth, low-latency fabrics

  • Develop integration strategies for power distribution, control planes, cooling systems (air and liquid), and shared interconnect fabrics at the rack level

  • Own the networking architecture across servers and racks, including 400G/800G Ethernet, leaf-spine switching, NIC-to-ToR planning, and cross-rack topology

  • Specify power delivery systems for high-density, multi-kilowatt platforms: VRM selection, power trees, sequencing, and protection logic

  • Guide system design decisions with awareness of mechanical and thermal constraints to ensure performance, manufacturability, and serviceability

  • Contribute to rack-level management infrastructure: CDU planning, telemetry aggregation, rack controller architecture, and out-of-band control

  • Support bring-up and validation teams in debugging complex issues at the system, rack, and POD levels

You may be a good fit if you have

  • 8+ years of experience in system or server hardware architecture, ideally in HPC, AI infrastructure, or hyperscale data centers

  • Deep understanding of PCIe protocols and topologies, including bifurcation, retimer tuning, switch fabrics, and accelerator communication

  • Experience with rack-level and multi-rack system design, including shared power and networking infrastructure

  • Strong expertise in BMC systems, control buses, telemetry integration, and orchestration tooling

  • Familiarity with modern high-speed networking technologies: 400G Ethernet, InfiniBand, CXL fabrics, and NIC-switch integration

  • Proven background in power architecture for dense compute systems, including power budgeting, sequencing logic, and VRM optimization

  • Rack-level management infrastructure design experience, including CDU layout, telemetry aggregation, and rack controller implementation

  • Proven track record of building infrastructure for at-scale deployment, such as automated diagnostics, health monitoring, and fleet orchestration frameworks

  • Understanding of thermal design principles such as airflow, heatsink selection, and liquid cooling systems

  • A systems-level perspective with the ability to design scalable, maintainable, and high-performance platforms

  • Excellent communication skills and experience collaborating with hardware, firmware, validation, and mechanical engineering teams

Benefits

  • Medical, dental, and vision packages with generous premium coverage

    • $500 per month credit for waiving medical benefits

  • Housing subsidy of $2k per month for those living within walking distance of the office

  • Relocation support for those moving to San Jose (Santana Row)

  • Various wellness benefits covering fitness, mental health, and more

  • Daily lunch + dinner in our office

How we’re different

Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.

We are a fully in-person team in San Jose (Santana Row), and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.

Top Skills

CXL
Ethernet
InfiniBand
IPMI
NVLink
PCIe
PCIe Gen5
PCIe Gen6
Redfish
UCIe

The Company
HQ: Cupertino, CA
53 Employees
Year Founded: 2022

What We Do

By burning the transformer architecture into our chips, we’re creating the world’s most powerful servers for transformer inference.
