Responsibilities:
- Design, implement, and maintain CloudWalk’s distributed LLM training pipeline.
- Orchestrate multi-node, multi-GPU runs across Kubernetes and internal clusters.
- Optimize performance, memory, and cost across large training workloads.
- Integrate cutting-edge frameworks (Unsloth, TorchTitan, Axolotl) into production workflows.
- Build internal tools and templates that accelerate research-to-production transitions.
- Collaborate with infra, research, and MLOps teams to ensure reliability and reproducibility.
Requirements:
- Strong background in PyTorch and distributed training (DeepSpeed, FSDP, Accelerate).
- Hands-on experience with large-scale multi-GPU or multi-node training.
- Familiarity with Transformers, Datasets, and mixed-precision techniques.
- Understanding of GPUs, containers, and schedulers (Kubernetes, Slurm).
- Mindset for reliability, performance, and clean engineering.
Bonus:
- Experience with Ray, MLflow, or W&B.
- Knowledge of ZeRO, model parallelism, or pipeline parallelism.
- Curiosity about emerging open-source stacks like Unsloth, TorchTitan, and Axolotl.
What We Do
We are democratizing the payments industry in Brazil by empowering entrepreneurs through technological, inclusive, and life-changing solutions. Based in Brazil, CloudWalk is a high-end global payment network built on modern technology and a proprietary blockchain, focused on bringing a revolution to the payment ecosystem for small and medium-sized businesses. As a unicorn, the company has provided its customers with more than R$ 1 billion in savings by charging fair fees on its transactions and is now present in more than 300,000 businesses across 5,000 Brazilian cities. With investors such as Valor Capital Group, HIVE Ventures, and Coatue, the company has already raised US$ 365.5 million in investments and R$ 3.4 billion in FIDCs for the anticipation of receivables in its network of financial solutions. In 2022, it was the only Brazilian fintech featured in the CB Insights "Retail Tech 100" ranking, under "Protection Solutions for Payments and Frauds".









