Infrastructure Engineer - Software Engineer – Infrastructure & Hardware Optimization - Remote
Hello,
We have the job opening below.
If you are interested and your experience matches the job description, please send your updated resume as soon as possible.
Software Engineer – Infrastructure & Hardware Optimization
Location: SF, CA; Portland, OR; Dallas, TX (remote, but candidates must be local to one of these locations)
Duration: 6+ month contract
Job Description: We are seeking a skilled low-level systems engineer to join the team. This individual will focus on infrastructure software that detects, configures, and optimizes AI inference pipelines across heterogeneous hardware accelerators (e.g., NVIDIA / AMD GPUs, TPUs, AWS Inferentia, FPGAs). You will work on hardware abstraction layers, containerized runtime environments, benchmarking, telemetry, and driver orchestration logic for multi-cloud agentic inference deployments.
Ideal Experience:
· 4–7 years of experience in systems software or infrastructure engineering, preferably with exposure to AI/ML workloads.
· Deep expertise in CUDA, NCCL, ROCm, or other accelerator programming frameworks.
· Familiarity with LLM inference runtimes (TensorRT-LLM, vLLM, ONNX Runtime).
· Experience with Kubernetes scheduling, device plugin development, and runtime patching for heterogeneous compute.
· Strong Python/C++ and Linux systems programming skills.
· Passion for building scalable, portable, and secure AI infrastructure.
Responsibilities:
· Design and implement cross-platform hardware detection systems for GPUs/TPUs/NPUs using CUDA, ROCm, and low-level runtime interfaces.
· Build and maintain plugin-based infrastructure for capability scoring, power efficiency tuning, and memory optimization.
· Develop hardware abstraction layers (HAL) and performance benchmarking tools to optimize AI agents for cloud-native inference.
· Extend container-based MLOps systems (Docker/Kubernetes) with support for hardware-specific runtime containers (e.g., TensorRT, vLLM, ROCm).
· Automate driver validation, container security hardening, and runtime health monitoring across deployments.
· Integrate telemetry systems (Prometheus, Grafana) to surface per-device inference performance metrics and health status.
· Collaborate with solutions and DevOps teams to ensure hardware-aware agent deployment across cloud providers.
Additional Information:
All your information will be kept confidential according to EEO guidelines.
What We Do
Cystems Logic is gaining a competitive edge in today's economy, and it's all about speed and flexibility. Companies of every size and level of complexity need to adapt swiftly to evolving global business trends and opportunities. Cystems Logic supports rapid innovation to drive your business process excellence for increased productivity and profitability. We are here to lower the cost of using costly technology and increase return on investment (ROI). Our strength is our talented team, which has a solid foundation in the ERP arena. We are an emerging provider of new SAP implementations, maintenance, and support for small to large companies in specific industry domains.









