Site Reliability Engineer

Posted 5 Days Ago
San Francisco, CA
In-Office
Mid level
Artificial Intelligence • Cloud • Information Technology • Software • Automation
The Role
Own and operate end-to-end infrastructure for decentralized confidential ML inference: design GPU cluster management, optimize performance, build CI/CD, maintain observability, create runbooks, participate in on-call, and implement cost and security strategies.
Summary Generated by Built In

About The Role

The NEAR AI engineering team is developing decentralized and confidential machine learning infrastructure to power user-owned AI. We currently focus on building infrastructure to enable private and confidential inference that works across different compute providers, as well as a blockchain-based coordination layer that incentivizes compute providers to join the decentralized inference network.

You will own various components and drive critical decisions throughout their life cycles, including architecture, implementation, and maintenance. You will collaborate with highly knowledgeable and skilled colleagues who are passionate about solving hard problems that can disrupt the industry.


What You'll Be Doing:

  • End-to-end infrastructure ownership (for handling telemetry data, for performing testing, etc.)
  • Design and implementation of infrastructure components that manage GPU clusters with special configurations
  • Performance tuning and optimizations
  • Create and maintain runbooks that support the on-call rotation
  • Participate in the on-call rotation
  • Support code releases and delivery
  • Plan and implement infrastructure cost and security strategies
  • Plan and implement effective CI/CD Pipelines to facilitate development processes

What We're Looking For:

  • Agility to quickly learn new programming languages and technologies
  • Ability to write clean and efficient code
  • Ability to transform ambiguous problems into tangible solutions or prototypes
  • Linux systems proficiency
  • Experience with software concurrency or parallelism
  • Experience in building, operating, and scaling cloud infrastructure (GCP, AWS, etc.)
  • Experience with data visualization and observability tooling (Grafana, Graphite, Zabbix, etc.)
  • Detail-oriented mindset with a focus on setting priorities and progressing towards objectives
  • Excellent communication and teamwork skills
  • Bachelor's Degree in Computer Science or a related field

We'd Love If You Have:

  • Experience with NEAR or other blockchain internals
  • Experience with GPUs
  • Experience with Trusted Execution Environments
  • Experience debugging and troubleshooting complex concurrent systems
  • Professional experience with Rust

Location: On-site, San Francisco office

Top Skills

AWS
Blockchain
CI/CD
GCP
GPUs
Grafana
Graphite
Linux
NEAR
Rust
Trusted Execution Environments
Zabbix

The Company
HQ: San Francisco, California
9 Employees
Year Founded: 2017

What We Do

NEAR AI is an artificial intelligence research, engineering, and product development company committed to building an AI future owned by everyone. Founded by AI pioneer and former Google DeepMind researcher Illia Polosukhin, NEAR AI’s verifiable private inference infrastructure empowers developers and enterprises to deploy AI models with full control over their data. With hardware-backed private inference via a simple API, NEAR AI Cloud runs sensitive AI workloads securely and at scale, from privacy-critical consumer interactions to autonomous systems and critical infrastructure. NEAR AI Private Chat brings the same guarantees to users’ everyday questions and research. Serving over 100 million users across platforms such as Brave Nightly and OpenMind, NEAR AI is proven infrastructure for transforming sensitive data into safe intelligence and advancing a user-owned AI future. Learn more at https://near.ai/.
