Scale-out Engineer

Posted 8 Hours Ago
Santa Clara, CA
3-5 Years Experience
Hardware • Manufacturing
The Role
The Scale-out Engineer will design and maintain the TT-fabric networking library for AI processors, develop distributed training systems for deep learning, optimize communication for AI clusters, and collaborate with AI model builders and researchers to enhance scale-out infrastructure.
Summary Generated by Built In

Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch, and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.

We're seeking a skilled AI Scale-Out Software Engineer to build and optimize our Tenstorrent scale-out fabric (TT-fabric) for distributed inference and training infrastructure. The ideal candidate will have expertise in deep learning, distributed systems, and low-level networking.

This role is hybrid, based out of Santa Clara, CA; Austin, TX; or Toronto, ON.


Responsibilities:

  • Design, develop, and maintain TT-fabric, a low-level networking library for Tenstorrent AI processors built on top of the Ethernet protocol
  • Design and implement efficient distributed training systems for large-scale deep learning models
  • Optimize network communication for multi-node AI processor clusters
  • Tune system performance for inference and training of key AI models
  • Work within the TT-Metalium team and integrate scale-out APIs into the Programming Model
  • Work with AI model builders and researchers to improve both the scale-out infrastructure and model design


Experience & Qualifications:

    • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field
    • Proven experience in low-level software development
    • Strong proficiency in C/C++
    • Experience with MPI or similar distributed computing frameworks
    • Experience with low-level networking libraries (e.g., libfabric, libibverbs)
    • Knowledge of networking protocols, especially Ethernet and InfiniBand
    • Knowledge of high-performance interconnects
    • Familiarity with RDMA programming
    • Familiarity with large-scale deep learning frameworks (e.g., PyTorch, TensorFlow)
    • Familiarity with network offload engines and SmartNICs
    • Strong communication skills and the ability to work effectively with cross-functional teams
    • Passion for technology and a commitment to pushing the boundaries of what is possible in AI


Compensation for all engineers at Tenstorrent ranges from $100k to $500k, including base and variable compensation targets. Experience, skills, education, background, and location all impact the actual offer made.

Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.

Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that are subject to licensing conditions set by the U.S. government.

Our engineering positions and certain engineering support positions require access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations. Please note that citizenship/permanent residency, asylee, and refugee information and/or documentation will be required and considered as Tenstorrent moves through the employment process.

If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, the offer of employment will be rescinded.

Top Skills

C
C++
The Company
HQ: Toronto, ON
389 Employees
On-site Workplace
Year Founded: 2016

What We Do

Tenstorrent is a next-generation computing company that builds computers for AI.

Headquartered in Toronto, Canada, with U.S. offices in Austin, Texas, and Silicon Valley, and global offices in Belgrade and Bangalore, Tenstorrent brings together experts in the field of computer architecture, ASIC design, advanced systems, and neural network compilers.

Join us: www.tenstorrent.com/careers
