Impact You'll Make:
- Design and implement the core architecture of our distributed training and inference systems that can handle enterprise-scale data
- Craft elegant integration points between data warehouses, AI processing engines, and our proprietary Graph transformer technology
- Build sophisticated orchestration systems that optimize computational resources while ensuring reliability and restartability
- Develop clean APIs and abstractions that decouple system components for rapid parallel development
- Create scalable, cloud-native infrastructure that grows with our customers' needs while maintaining performance
- Collaborate directly with customers to refine and iterate on real-world deployments
What You Bring:
- Strong foundation in computer science (BS required, MS/PhD preferred) with 3-5+ years of software development experience
- Deep understanding of distributed systems design principles
- Proficiency in languages like Python, Java, or C++
- Problem-solving mindset with ability to tackle novel challenges in uncharted territory
What Sets You Apart:
- Experience with cloud distributed storage, databases, and file systems (AWS, Azure)
- Track record of building and scaling microservices architectures
- Knowledge of AI frameworks like PyTorch or TensorFlow, especially inference serving at scale
- Contributions to open-source projects in distributed systems or data processing
- Understanding of ML fundamentals, especially in enterprise applications
- Experience designing systems that elegantly handle failure modes and restarts
- Hands-on experience with at least one major cloud (AWS / Azure / GCP) and Kubernetes at scale; multi-cloud exposure is a plus.
Benefits
- Stock
- Competitive Salaries
- Medical Insurance
- Dental Insurance
What We Do
Democratizing AI on the Modern Data Stack!
The team behind PyG (PyG.org) is working on a turnkey solution for AI over large-scale data warehouses. We believe the future of ML is a seamless integration between modern cloud data warehouses and AI algorithms. Our ML infrastructure massively simplifies the training and deployment of ML models on complex data.
With over 40,000 monthly downloads and nearly 13,000 GitHub stars, PyG is the ultimate platform for training and development of Graph Neural Network (GNN) architectures. GNNs, currently one of the hottest areas of machine learning, are a class of deep learning models that generalize Transformer and CNN architectures and enable us to apply the power of deep learning to complex data. GNNs are unique in the sense that they can be applied to data of different shapes and modalities.