Ironsite

Senior Applied ML Researcher

San Francisco, CA, USA

In-Office

180K-350K Annually

Senior level

Artificial Intelligence • Computer Vision • Hardware • Software

The Role

Lead design, training, and deployment of large-scale language and multimodal models for construction; build training infrastructure, data pipelines, evaluation and alignment (including RLHF), and production ML systems integrating human labeling and vision-language models.

Summary Generated by Built In

About Ironsite

Construction is one of the most complex and labor-intensive industries. It spends $7 trillion annually on labor, yet loses $1.6 trillion per year in productivity to outdated management tools.

Ironsite combines wearable cameras with human labeling and AI vision-language models to drive on-site productivity, safety, and training for craft workers. We put cameras on construction workers' hard hats and vests, then use advanced computer vision to analyze what's actually happening on job sites.

We help teams reduce labor costs, improve safety, and deliver projects faster. To date, we’ve captured 50,000+ hours of construction footage across 7 states and have recently partnered with the nation’s #2 hard-hat manufacturer, Studson, to develop custom hardware purpose-built for the field.

We’ve raised over $13M from 8VC, South Park Commons, and 30+ leading operators and technologists, including Eric Glyman (Ramp), Jeff Dean (Google), and Scott Wu (Cognition). Now, we’re building the team to scale nationwide.

The Role

We're seeking an exceptional Staff / Principal Research Scientist to build our foundation model capabilities from the ground up. You'll work directly with our team to design, train, and deploy large-scale language and multimodal models tailored to the construction domain. This role offers the rare opportunity to train production LLMs from scratch using our unique, continuously growing dataset of construction footage and operational data.

What You'll Build

Foundation Models from Scratch: Design model architectures, training objectives, and optimization strategies to train large language models and multimodal models (combining vision + language) specifically for construction use cases
Domain-Specific Pre-training: Develop pre-training strategies using construction documentation, safety protocols, building codes, and industry knowledge to create models that understand construction contexts deeply
Multimodal Model Training: Build and train vision-language models that can reason about construction footage, technical drawings, and textual data simultaneously
Training Infrastructure: Design and implement distributed training systems capable of handling billion+ parameter models across GPU clusters
Model Evaluation & Alignment: Create comprehensive evaluation frameworks and fine-tuning pipelines (supervised fine-tuning, RLHF) to align models with construction domain requirements
Data Pipeline Architecture: Build scalable data ingestion, cleaning, and preprocessing systems for training on diverse construction data sources
Production ML Systems: Deploy and monitor computer vision models in challenging real-world environments with strict latency and reliability requirements
Human-AI Collaboration Platform: Develop systems that seamlessly integrate human labeling with AI predictions for continuous model improvement
Cross-modal Understanding: Create models that combine visual data with contextual information for deeper construction insights

Technical Challenges You'll Solve

Training large-scale models efficiently with limited compute budgets while maximizing performance
Developing novel pre-training objectives that capture construction-specific knowledge and temporal reasoning
Implementing efficient attention mechanisms and architectural innovations for long-context understanding of construction projects
Designing evaluation metrics that measure real-world construction task performance beyond standard benchmarks
Balancing model capability with deployment constraints for edge and mobile applications

What We're Looking For

Technical Excellence

[REQUIRED] 5+ years of experience in deep learning research with focus on large-scale model training
[REQUIRED] Demonstrated experience training models at scale (100M+ parameters) from scratch
[REQUIRED] Deep understanding of transformer architectures, attention mechanisms, and modern training techniques (mixed precision, distributed training, gradient accumulation)
Production experience with ML systems at scale (PyTorch, distributed training, model serving)
Experience with language model pre-training, including tokenization strategies, training objectives (CLM, MLM), and scaling laws
Understanding of fine-tuning techniques including instruction tuning, RLHF, and preference optimization (DPO, PPO)
Knowledge of efficient training techniques: LoRA, QLoRA, flash attention, gradient checkpointing
Experience with model evaluation, benchmarking, and safety considerations

Systems Experience

Background in vision-language models or multimodal architectures strongly preferred
Experience building and optimizing data pipelines for large-scale training
Systems-level thinking about training efficiency, hardware utilization, and cost optimization
Familiarity with MLOps for model versioning, experiment tracking, and reproducibility

Preferred Qualifications

Advanced degree (M.S./Ph.D.) in Computer Science, Electrical Engineering, or related field with focus on deep learning
First-author publications at top ML venues (NeurIPS, ICML, ICLR, ACL, CVPR) on language models, multimodal learning, or efficient training
Experience training or contributing to open-source foundation models
Background in domain-specific model development (code, science, medical, etc.)
Experience with video understanding models or temporal reasoning in transformers
Contributions to major ML frameworks or training libraries
Track record of transitioning research prototypes to production systems

Location & Compensation

San Francisco Bay Area (on-site)
Competitive salary and significant equity package
Full benefits including health, dental, vision, and 401(k) with 6% match
Access to dedicated GPU compute resources for research and experimentation

Compensation

The base pay range for this role is $180,000 – $350,000 per year.

The Company

Year Founded: 2024

What We Do

Ironsite AI is a construction technology company that leverages wearable cameras and AI vision models to drive on-site productivity, safety, and training. By equipping workers with smart hard hats, the platform captures real-time data to analyze field activities, optimize labor allocation, and identify safety risks. Its mission is to modernize construction management by providing data-driven insights that help contractors reduce labor costs and deliver projects more efficiently.