AI/ML Systems Engineer

Posted One Month Ago
New York, NY, USA
Hybrid
Mid level
Greentech • Professional Services • Consulting • Energy • Generative AI • Renewable Energy
The Role
The job involves designing a Python-based testing harness for data collection in an emissions measurement study, capturing runtime and energy metrics across AI workloads.
Summary Generated by Built In

AI/ML Systems Engineer — Hardware Power Instrumentation (Contract, 3–4 weeks)

OPF is hiring a contract engineer to run the data collection phase of an emissions measurement study comparing edge versus cloud LLM inference across multi-modal AI workloads. The methodology is being designed by external advisors with prior published work on AI carbon measurement; the engineer's role is to implement that methodology rigorously, capture clean instrumented data, and hand it off cleanly to OPF's internal team for analysis and modeling.

Findings will appear in a co-branded industry publication backed by a defensible methodology — meaning the public artifact is industry-format, but the underlying data has to withstand potential scrutiny.

Engagement scope

You will design, build, and run a Python-based testing harness that captures inference runtime, energy consumption, and component-level power across three edge devices and one cloud GPU instance, then deliver clean, documented outputs to the OPF analytical team for downstream modeling.
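
As one illustration of how sampled power could be turned into per-inference energy, the sketch below integrates power readings over an inference window with the trapezoidal rule. The names `PowerSample` and `energy_joules` are hypothetical and not part of the OPF methodology. The sketch assumes that samples carry timestamps from the same clock as the inference start and end markers, and it does not interpolate at the window edges.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PowerSample:
    """One telemetry reading (hypothetical structure for illustration)."""
    t_s: float      # timestamp in seconds, same clock as the inference markers
    power_w: float  # instantaneous power in watts


def energy_joules(samples: Sequence[PowerSample], start_s: float, end_s: float) -> float:
    """Integrate power over [start_s, end_s] using the trapezoidal rule.

    Only samples whose timestamps fall inside the window are used, so the
    result slightly underestimates energy when the window edges fall
    between two samples.
    """
    window = sorted(
        (s for s in samples if start_s <= s.t_s <= end_s),
        key=lambda s: s.t_s,
    )
    if len(window) < 2:
        raise ValueError("need at least two samples inside the window")
    total = 0.0
    for a, b in zip(window, window[1:]):
        # Area of one trapezoid: mean power times elapsed time.
        total += 0.5 * (a.power_w + b.power_w) * (b.t_s - a.t_s)
    return total


def joules_to_wh(joules: float) -> float:
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return joules / 3600.0
```

With a constant 10 W draw sampled over a 2-second window, this yields 20 J. Short inferences sampled at a low rate will have few points in the window, which is one reason the sampling interval matters.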

The work spans Windows and Linux, three distinct hardware platforms (Intel Core Ultra with NPU, AMD Ryzen AI with XDNA NPU, NVIDIA L4 cloud GPU), multiple deployments (OpenVINO and Lemonade Server for local; vLLM, Triton, and/or FastAPI for cloud, as appropriate per task), and two layers of measurement (software-based estimation and on-die telemetry).
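
A run manifest is one way to keep a matrix like this reproducible. The sketch below is an illustration under stated assumptions: the platform labels and the functions `run_seed` and `build_manifest` are hypothetical, only the Basic Queries task is named in this posting so the task list stops there, and deriving seeds by hashing the run key is one possible approach rather than a prescribed one.

```python
import hashlib
import itertools
from typing import Dict, List, Sequence

# Hypothetical labels for the platforms described above.
PLATFORMS = ["intel_core_ultra", "amd_ryzen_ai", "nvidia_l4"]
# Only one task is named in the posting; the remaining tasks are not listed here.
TASKS = ["basic_queries"]


def run_seed(platform: str, task: str, prompt_index: int, base_seed: int = 0) -> int:
    """Derive a stable 32-bit seed from the run key.

    Hashing the key means a rerun of the same (platform, task, prompt)
    gets the same seed regardless of the order in which runs execute.
    """
    key = f"{base_seed}|{platform}|{task}|{prompt_index}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def build_manifest(
    platforms: Sequence[str], tasks: Sequence[str], prompts_per_task: int
) -> List[Dict[str, object]]:
    """Enumerate one planned run per (platform, task, prompt) combination."""
    return [
        {
            "platform": p,
            "task": t,
            "prompt_index": i,
            "seed": run_seed(p, t, i),
        }
        for p, t, i in itertools.product(platforms, tasks, range(prompts_per_task))
    ]
```

Writing the manifest to disk before any run starts gives the later data-quality checks a fixed list of expected trials to compare against.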

The engagement begins with a proof-of-concept phase on the AMD Ryzen 7 Pro laptop, focused on a single task (Basic Queries), to build the harness, telemetry, and methodology end-to-end before scaling to the rest of the platforms and tasks. Following this, a cloud GPU instance must be provisioned for cloud data collection and harness validation. Once the harness has been validated on one laptop and in the cloud environment, the approach will be scaled to the remaining laptops and tasks. The proof-of-concept outcome serves as an interim milestone before the remainder of the engagement proceeds.

Inputs you will receive on Day 1

  • Three pre-imaged production laptops spanning two chip platforms, with the AMD Ryzen 7 Pro laptop prioritized for the proof-of-concept
  • High-level methodology from OPF & external advisors
  • Defined task list (six multi-modal AI tasks) with pre-selected open-source models and Hugging Face datasets
  • Access to advisors at scheduled checkpoints

Outputs you will be responsible for

  • Reproducible Python test harness with pinned dependencies, configurations, and seeds
  • Cloud GPU instance (NVIDIA L4) provisioned with appropriate deployment per task (vLLM for LLM workloads; Triton or FastAPI for vision, audio, and diffusion workloads)
  • Edge measurement dataset: six tasks × three laptop platforms × ~100–1,000 prompts per task (single measurement per prompt), with batch size locked at 1. This requires synchronized telemetry from both measurement layers: the software estimate (CodeCarbon or equivalent) and on-die telemetry (RAPL for Intel, HWiNFO64 or μProf for AMD, plus vendor NPU/iGPU counters). Results should be delivered in an analysis-ready file (CSV preferred) with units, timestamps, and run metadata
  • Cloud measurement dataset: six tasks × one cloud GPU instance (NVIDIA L4) × task-specific batch size conditions × ~100–1,000 prompts per task (single measurement per prompt), with synchronized GPU telemetry (NVML, DCGM, nvidia-smi), CodeCarbon estimates, and per-run latency and throughput metrics, delivered in the same analysis-ready file as the edge measurements
  • If the methodology requires pinning workloads to GPU, CPU, or NPU, outputs also include offload fractions captured per run directly from inference runtime logs, with technical context on offload patterns and unsupported-op fallbacks documented in the companion writeup; methodology decisions (inclusion, exclusion, caveats) remain with OPF
  • Data quality assurance: validation checks for missing trials, telemetry dropouts, thermal artifacts, and outlier flagging, with anomalies surfaced and interpreted (OPF to address)
  • Reproducibility package: harness code, environment specification, run logs, configuration files, and a README enabling reviewers of the industry publication to re-run, slice, or extend the dataset
  • Handoff session (60–90 minute walkthrough) and short technical companion document covering dataset structure, operator offload patterns and fallback notes, known caveats, and operational notes
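
The data quality checks listed above could start from something like the following sketch. It covers missing trials, telemetry dropouts, and outlier flagging only; thermal artifacts would need platform-specific signals and are not shown. All function names and default thresholds here are hypothetical, and the outlier rule (a modified z-score based on the median absolute deviation) is one common choice, not one specified by OPF.

```python
from statistics import median
from typing import Iterable, List, Sequence, Tuple


def missing_trials(expected_ids: Iterable[int], observed_ids: Iterable[int]) -> List[int]:
    """Return trial IDs that were planned but never recorded."""
    return sorted(set(expected_ids) - set(observed_ids))


def telemetry_gaps(
    timestamps_s: Sequence[float], expected_interval_s: float, tolerance: float = 2.0
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where consecutive samples are too far apart.

    A gap is reported when the spacing exceeds tolerance * expected_interval_s.
    """
    ts = sorted(timestamps_s)
    limit = tolerance * expected_interval_s
    return [(a, b) for a, b in zip(ts, ts[1:]) if (b - a) > limit]


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD. If MAD is zero
    (at least half the values are identical), nothing is flagged.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]
```

These functions flag rows rather than drop them, which matches the division of labor described here: anomalies are surfaced by the engineer, and inclusion or exclusion decisions stay with OPF.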

Out of scope

Statistical modeling, headline framings, chart production for the publication, claims substantiation memos, and the public-facing writeup all sit with the OPF team and external advisors. The engineer produces clean, parsed, and organized instrumented data and the documentation needed to use it; OPF will use this as an input into the publication.

Required experience

  • Strong Python proficiency, experimental discipline, and reproducibility practices
  • Hands-on hardware telemetry — Intel RAPL, NVIDIA NVML, plus at least one of Intel SoC Watch / VTune, AMD μProf, or HWiNFO64
  • Working knowledge of at least one in-scope deployment (OpenVINO, Lemonade Server, vLLM, or Triton)
  • Linux and Windows systems experience, including hardware performance counter telemetry
  • Demonstrated reproducibility discipline: pinned environments, version-controlled configs, documented assumptions

Strongly preferred

  • Prior MLPerf Power or MLCommons Power working group submissions
  • Publications in, or contributions to, AI energy or carbon measurement research or related sustainability topics
  • Experience with NPU-class accelerators (Intel AI Boost, AMD XDNA, or comparable)
  • Experience with GPU benchmarking methodology — DCGM telemetry, clock-pinning for reproducibility, or comparable practices
  • Comfort working alongside external advisors and delivering well-documented handoffs to a downstream analytical team

Engagement structure

  • 3–4 week intensive engagement. If dependencies get delayed, please budget 5–6 weeks.
  • Compensation is based on the following milestones: working harness and proof-of-concept on one platform, full pilot run on one task across all platforms, complete data collection, and final handoff. Cloud compute costs are reimbursed.
  • Screening is a paid technical trial task of less than one day, completed before any commitment (e.g., set up RAPL + CodeCarbon + nvidia-smi on your own machine, run a small benchmark, deliver a short writeup with results).
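
For candidates unfamiliar with RAPL, the sketch below shows one way the counter might be read around a workload on Linux, where the powercap interface exposes it through sysfs. Several things here are assumptions: the package domain is taken to be at `/sys/class/powercap/intel-rapl:0`, which varies by machine; reading `energy_uj` often requires elevated privileges; and the helper names `read_uj`, `energy_delta_uj`, and `measure` are hypothetical. The wraparound handling assumes the counter wraps at most once during the measured call.

```python
import time
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Assumed location of the package-0 RAPL domain; check the machine's sysfs tree.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(path: Path) -> int:
    """Read an integer microjoule value from a powercap sysfs file."""
    return int(path.read_text().strip())


def energy_delta_uj(before_uj: int, after_uj: int, max_range_uj: int) -> int:
    """Counter difference in microjoules, allowing for a single wraparound."""
    if after_uj >= before_uj:
        return after_uj - before_uj
    # The counter wrapped: add the distance to the top of the range.
    return (max_range_uj - before_uj) + after_uj


def measure(fn: Callable[[], T], rapl_dir: Path = RAPL_DIR) -> Tuple[T, float, float]:
    """Run fn once and return (result, elapsed seconds, package energy in joules)."""
    max_range = read_uj(rapl_dir / "max_energy_range_uj")
    before = read_uj(rapl_dir / "energy_uj")
    t0 = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - t0
    after = read_uj(rapl_dir / "energy_uj")
    joules = energy_delta_uj(before, after, max_range) / 1e6
    return result, elapsed, joules
```

A package-level reading like this includes everything running on the CPU package during the call, so an idle baseline measured the same way is usually needed before attributing energy to the benchmark itself.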

Logistics

  • Strong preference for NYC metro area; remote candidates considered with a clear plan for hardware access (e.g., one laptop ships, two stay with the OPF team). Remote candidates must have at least 4 hours of working-day overlap with US Eastern time.
  • Contributor may be acknowledged in the resulting publication. Code may be open-sourced, subject to client review.

To apply

Submit application via this link (https://c4tnu.share.hsforms.com/2srxbvmjzTh6TdgSOJxs7KA). Be ready to provide a portfolio or GitHub link, one example of prior hardware instrumentation work, an estimate of cost for this project, and a short note (≤300 words) on how you'd approach the paid technical task. If selected for the paid screening trial, OPF will request 2 professional references.

The Company
15 Employees
