As a Senior Software Engineer - Multimodal AI Systems, you will lead the integration, evaluation, and testing of advanced Vision Foundation Models (VFMs) and Vision-Language Models (VLMs). You will play a key role in building scalable systems and evaluation frameworks for multimodal AI applications involving image, video, and semantic understanding.
- Integrate Vision Foundation Models (VFMs) and Vision-Language Models (VLMs) into scalable production systems, developing APIs, inference pipelines, and backend services for multimodal applications
- Collaborate with data scientists and ML engineers to deploy, optimize, and continuously improve AI workflows
- Design and implement automated evaluation frameworks, benchmarking pipelines, and testing strategies to assess model accuracy, robustness, latency, and overall performance across image, video, and multimodal tasks
- Build scalable, reliable, and maintainable infrastructure, including data pipelines for large-scale image, video, and multimodal datasets, while optimizing system performance and throughput
- Analyze model performance, identify failure cases, and contribute to continuous improvement initiatives for AI systems
- Support integration of retrieval-augmented (RAG) systems, working with embeddings, vector databases, and multimodal retrieval to enable semantic search and contextual AI workflows
Who are you?
You are an experienced software engineer with a strong foundation in AI/ML systems and a passion for building scalable, real-world applications. You bring a balance of system design expertise, hands-on coding, and a collaborative mindset.
- Bachelor's or Master's degree in Computer Science, Software Engineering, AI, or a related field
- Proven experience in software engineering, preferably in AI/ML systems
- 4+ years of strong proficiency in Java or Scala, and in Python
- Experience with backend architecture, REST APIs, and microservices
- Hands-on experience with PyTorch and GPU-based inference systems
- Familiarity with distributed systems and scalable data pipelines
- Understanding of computer vision, Vision Foundation Models (VFMs), and Vision-Language Models (VLMs)
- Experience working with multimodal AI systems and models such as CLIP, BLIP, or similar
- Experience in building evaluation frameworks, benchmarking pipelines, and performance testing
- Familiarity with large-scale image/video datasets and data processing techniques
- Exposure to embeddings, vector databases, or retrieval-based systems
- Collaborative, solution-oriented mindset with strong problem-solving skills
What we offer
HERE offers an opportunity to work in a cutting-edge technology environment with challenging problems to solve! You can make a direct impact on the delivery of the company's strategic goals and have the freedom to decide how to perform your work. We will support you in delivering your day-to-day tasks, achieving your personal goals, and developing your skills. Personal development is highly encouraged at HERE. You can take different courses and training at our online Learning Campus and join cross-functional team projects within our Talent Platform.
HERE is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, age, gender identity, sexual orientation, marital status, parental status, religion, sex, national origin, disability, veteran status, and other legally protected characteristics.
Who are we?
HERE Technologies is a location data and technology platform company. We empower our customers to achieve better outcomes - from helping a city manage its infrastructure or a business optimize its assets to guiding drivers to their destination safely.
At HERE we take it upon ourselves to be the change we wish to see. We create solutions that fuel innovation, provide opportunity and foster inclusion to improve people's lives. If you are inspired by an open world and driven to create positive change, join us. Learn more about us on our YouTube Channel.
You will join a team focused on advancing multimodal AI capabilities, working at the intersection of computer vision, large-scale AI systems, and software engineering. The team collaborates closely with data scientists and ML engineers to build scalable, production-grade vision and vision-language solutions that power next-generation products.
What We Do
HERE Technologies is a location data and technology company that created the first digital map over 35 years ago. Today we are the world's leading location platform company with a global footprint across 52 countries. Although our strongest presence is in the automotive industry, we also work with leading companies across a wide range of industries, including transport and logistics, mobility, manufacturing, retail, and the public sector.
Why Work With Us
At HERE, we're always excited about discovering people who share our passion for building innovative solutions that make the world easier to navigate. We believe our success is powered by our team's diversity, creativity and collaboration and we're always looking for opportunities to grow it further.
Hybrid Workspace
Employees engage in a combination of remote and on-site work.

