Join us as we build VLM Run – the enterprise infrastructure layer for visual intelligence. Our mission is to give developers a unified interface to fine-tune, specialize, and operationalize Vision-Language Models (VLMs) that turn images, PDFs, screenshots, and video into reliable, schema-true structured data for production insights and automation – built for scale, security, and SLAs.
We’re looking for exceptional ML interns (Master’s & PhD students) to help us scale the future of Visual AI. You’ll thrive here if you bring strong research and engineering skills, care about good abstractions, and are excited to ship real product.
ML/CV Development: Improve our core VLM capabilities (see Orion), including vision-language understanding, OCR + function-calling workflows, fine-tuning recipes, and robustness.
ML Infrastructure: Help optimize the VLM stack, focusing on training efficiency, evaluation pipelines, quantization/distillation, and cost-efficient serving and scaling.
High Agency: Own a scoped project end-to-end. Turn ambiguity into experiments, results, and shipped code.
Requirements
Currently pursuing an MS/PhD in CS/EE/Math or equivalent.
Strong Python skills and comfort with PyTorch.
Familiarity with Transformers/ViTs and the Hugging Face ecosystem (transformers, datasets).
Ability to read papers, reproduce results, and communicate findings clearly.
Nice to have
Experience with fine-tuning tooling (peft, trl), evaluation frameworks, or dataset curation.
Familiarity with model serving or performance work (vLLM, TensorRT/Triton, Ray, FlashAttention).
GCP/AWS, Docker, and basic MLOps experience.
Bonus: a GitHub repo with 100+ stars, a recent peer-reviewed paper, or open-source contributions.
🗒️ Other Details
Internship Terms: Winter and Summer positions available (exact dates flexible per academic schedule).
Compensation: Paid internship, competitive with seed-stage ML startups.
Location: Santa Clara, CA – at least some in-person collaboration preferred (we’re right off 101, next to AMD’s HQ offices).
What We Do
VLM Run is an enterprise infrastructure platform for visual intelligence, providing a unified API to fine-tune, specialize, and operationalize Vision-Language Models (VLMs). The company enables enterprises to process and extract structured, schema-true JSON data from unstructured visual sources, including images, PDFs, and videos, with production-grade accuracy, security, and scalability.