Top Engineering Jobs
Cloud • Information Technology • Machine Learning
Lead end-to-end technical delivery of large-scale bare-metal GPU clusters for strategic customers: facility/rack design, GPU cluster bring-up, InfiniBand/RoCE fabric validation, HPC benchmarking and remediation, operational models for BMaaS, and cross-team product feedback. Act as primary technical customer contact, run proofs-of-concept, collaborate with engineering teams, and support security-sensitive, production-ready supercomputers.
Top Skills:
Ansible, Bare Metal as a Service (BMaaS), Bash, BIOS, BMC, Firmware, GB200, GPU Clusters, High-Speed Fabric, HPC, ib_write_bw, InfiniBand, Kubernetes, Linux, NCCL, NVIDIA HGX, NVLink, PXE Boot, Python, RoCE, Slurm, TCP/IP
Cloud • Information Technology • Machine Learning
Lead customer-facing technical engagements focused on networking for HPC/cloud environments. Design, prototype, and deploy Kubernetes-based solutions, optimize customer workloads, contribute product feedback, run proofs-of-concept, and represent CoreWeave at events. Collaborate with engineering teams on product improvements and R&D for emerging solutions.
Top Skills:
Cloud Computing, HPC, InfiniBand, Kubernetes, Kubernetes CSI, NCCL, NVIDIA GPUs
Cloud • Information Technology • Machine Learning
Design, build, and operate a scalable multi-tenant control plane for high-performance AI storage. Optimize exabyte-scale S3-compatible object storage and distributed filesystems using RDMA, SPDK, GPU Direct Storage, and cloud-native tooling. Improve reliability, observability, and performance across storage stacks, collaborate cross-functionally, and mentor engineers.
Top Skills:
C, Ceph, ClickHouse, DAOS, Dashboards, Distributed Filesystem, Go, GPU Direct Storage, Grafana, InfiniBand, Kubernetes, Metrics, NFS, Object Storage, Prometheus, RDMA, RoCE, Rust, S3, SPDK, Telemetry
Cloud • Information Technology • Machine Learning
Design and implement distributed storage solutions for AI workloads, optimize performance, and collaborate with teams to enhance storage capabilities.
Top Skills:
C, ClickHouse, FUSE, Go, GPU Direct Storage, Grafana, Kubernetes, NFS, Prometheus, RDMA, Rust
Cloud • Information Technology • Machine Learning
As a Principal Engineer at CoreWeave, you will design cluster orchestration systems, lead technical direction, ensure system reliability, and mentor engineers while focusing on large-scale GPU cluster performance for AI workloads.
Top Skills:
Cloud-Native Systems, Go, Kubernetes, Slurm, SUNK
Cloud • Information Technology • Machine Learning
As a Senior Software Engineer in Cluster Orchestration, you'll lead multiple services, drive improvements in performance and reliability, and mentor junior engineers.
Top Skills:
C++, Go, Grafana, Kubernetes, OpenTelemetry, Prometheus, Python
Cloud • Information Technology • Machine Learning
As a Staff Engineer, you'll lead the orchestration platform strategy, mentor engineers, and ensure reliable workloads across GPU clusters, directly impacting AI innovation.
Top Skills:
Argo Workflows, Cloud-Native Technologies, Go, Istio, Knative, Kubeflow, Kubernetes, Kueue, Ray, Slurm
Cloud • Information Technology • Machine Learning
Lead operational readiness reviews for large-scale data center projects: identify design, construction, and commissioning gaps; drive corrective actions; validate turnover readiness; and coordinate across design, construction, commissioning, and operations so that mission-critical infrastructure meets standards and is handed off reliably.
Top Skills:
Microsoft Project, Primavera P6, Smartsheet
Cloud • Information Technology • Machine Learning
Build, maintain, and optimize scalable front-end systems and web experiences. Translate Figma designs into reusable components, support large-scale site migrations, improve performance, accessibility, and SEO, and work with cross-functional teams using modern workflows and AI-assisted tools.
Top Skills:
AI-Powered Tools, CI/CD, Contentful, CSS, Figma, Git, Headless CMS, HTML, HubSpot, JavaScript, Next.js, React, SCSS, Vue, Webflow, Wix, WordPress
Cloud • Information Technology • Machine Learning
Lead the reliability engineering and production operations for CoreWeave's Common Services, enhancing reliability and operational excellence across teams.
Top Skills:
Ansible, CI/CD, Helm, Kubernetes, Linux, Terraform


