NextSilicon is reimagining high-performance computing (HPC) and AI. Our accelerated compute solutions use intelligent, adaptive algorithms to dramatically speed up supercomputers, driving them into a new generation. We have developed a novel software-defined hardware architecture that delivers significant advances in both the HPC and AI domains.
At NextSilicon, everything we do is guided by three core values:
- Professionalism: We strive for exceptional results through unwavering dedication to quality and performance.
- Unity: Collaboration is key to success. That's why we foster a work environment where every employee can feel valued and heard.
- Impact: We're passionate about developing technologies that make a meaningful impact on industries, communities, and individuals worldwide.
The AI Workloads team is responsible for modeling and enabling end-to-end AI workflows on NextSilicon’s next-generation hardware platforms. As an AI Workloads Engineer in Belgrade, you’ll build workflow modeling infrastructure, run and adapt open-source AI systems, and use real workloads to drive performance improvements from chip design through production.
- 4+ years of experience in software engineering.
- Strong Python and PyTorch development experience.
- Solid understanding of LLMs and modern inference workflows (e.g., KV cache, paged attention, speculative/assisted decoding, batching/scheduling).
- Experience running, profiling, and instrumenting open-source AI inference systems (e.g., vLLM or similar).
- Proficiency in C++ for developing software that models or interacts with hardware execution behavior (latency, dataflow, memory access patterns).
- Experience with distributed inference and collectives (e.g., NCCL) and parallelism strategies (TP/PP/EP) is an advantage.
- Experience with dynamic batching systems (e.g., vLLM, TensorRT-LLM) is an advantage.
- Familiarity with MLPerf Inference benchmarks and methodology (Server/Offline scenarios, latency constraints, request arrival patterns) is an advantage.
- Experience programming custom kernels (e.g., CUDA, Triton, or similar) is an advantage.
- Background in performance analysis, simulation, compiler/runtime profiling, or workload modeling is an advantage.
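For a flavor of the inference concepts above, here is a minimal, framework-free sketch of KV caching in an autoregressive decode loop. All names, shapes, and the identity "projection" are illustrative, not any specific library's API: the point is that each step appends one key/value pair and attends over the cache instead of recomputing the whole prefix.

```python
# Toy illustration of KV caching in autoregressive decoding.
# Everything here is simplified and illustrative, not a real framework API.
import math

def attend(q, keys, values):
    """Single-head scaled dot-product attention over cached keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
              for k in keys]
    m = max(scores)                        # subtract max for a stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

class KVCache:
    """Append-only cache: each decode step adds one (key, value) pair,
    so earlier tokens are never re-projected."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

# Decode loop: project only the newest token, reuse everything cached.
cache = KVCache()
for token_vec in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    cache.append(token_vec, token_vec)     # toy "projection": identity
    out = attend(token_vec, cache.keys, cache.values)

print(len(cache))  # 3 cached entries after 3 steps
```

Paged attention and continuous batching build on this same idea, managing cache memory in fixed-size blocks and interleaving many such loops across requests.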
What We Do
We believe in a smarter future and want to create new opportunities for innovation. To achieve this, we're rethinking compute architectures for the future of processing.