Speech Software Engineer

2 Locations
Hybrid
215K-235K Annually
Senior level
Artificial Intelligence • Machine Learning • Natural Language Processing • Software
The Role
Lead the architectural evolution of ASAPP's voice infrastructure, optimizing performance for real-time audio processing and collaborating with Speech Scientists and Research Engineers.
At ASAPP, our mission is simple: deliver the best AI-powered customer experience—faster than anyone else. We are guided by principles that shape how we think, build, and execute, including deep customer obsession, purposeful speed, ownership, and a relentless focus on outcomes. We work in small, highly skilled teams, prioritize clarity over complexity, and continuously evolve through curiosity, data, and craftsmanship.

We’re building a globally diverse team of technologists and problem solvers who thrive in fast-paced environments, value collaboration, and approach every challenge with a Day 1 mindset. We have hubs in New York City, Mountain View, Latin America, and India. If you’re driven by continuous learning, rapid iteration, and the challenge of building in a high-growth startup, this is more than a role: it’s a journey.

We are seeking a Senior Speech Software Engineer to drive both the infrastructure and applied speech intelligence behind our real-time voice AI platform. This is not just a systems role — you will operate at the intersection of speech research, model optimization, and production engineering, ensuring our ASR and TTS systems meet the demanding quality, latency, and reliability requirements of enterprise call centers.

You will help evolve our speech stack to deliver human-like, low-latency voice interactions at massive scale, tuning and adapting modern speech models to perform in noisy, real-world customer environments. You will work closely with Speech Scientists, ML Researchers, and Infrastructure Engineers to bridge cutting-edge speech technology with hardened production systems.

What you'll do

  • Speech Model Optimization & Applied Research
    • Tune and optimize ASR and TTS models for real-world call center environments, improving transcription accuracy, noise robustness, and handling of speaker variability
    • Improve spoken output naturalness by refining prosody, pacing, number and spelling pronunciation, and conversational flow
    • Balance latency vs. quality tradeoffs in streaming speech pipelines to maintain real-time responsiveness
    • Evaluate and integrate emerging speech technologies (e.g., noise suppression, voice activity detection, diarization) to measurably improve performance
  • Voice Infrastructure & Systems Engineering
    • Architect and modernize a scalable, high-availability voice infrastructure that replaces legacy systems
    • Build multi-threaded, low-latency server frameworks capable of handling thousands of concurrent real-time audio streams
    • Design and operate streaming ASR → LLM → TTS pipelines that power live AI-driven customer conversations
    • Develop robust media stream handling to ensure reliable audio flow between telephony providers, clients, and ML services
  • Evaluation, Observability & Quality
    • Define and implement speech quality evaluation frameworks, including WER/CER analysis, latency tracking, and perceptual TTS metrics
    • Build tooling and dashboards to monitor production performance and detect regressions in accuracy, latency, or naturalness
    • Create load-testing and simulation tools to model high-concurrency, real-world voice traffic
  • Cross-Functional Collaboration
    • Partner with Speech Scientists and ML Researchers to productionize new ASR and TTS models
    • Work with Security and Compliance teams to ensure voice data handling meets enterprise and regulatory standards
    • Collaborate with Product teams to translate conversational quality requirements into measurable system improvements

What you'll need

  • Core Engineering Background
    • 5+ years of software engineering experience building and operating production-grade distributed systems
    • Strong proficiency in Golang or Python (or willingness to become an expert quickly)
    • Experience designing low-latency, high-concurrency systems, ideally involving real-time media or streaming data
  • Speech & Audio Expertise
    • Practical experience working with ASR and/or TTS systems in applied or production environments
    • Understanding of how to adapt and tune speech models for domain-specific use cases
    • Familiarity with speech quality metrics such as WER, CER, MOS, latency, and streaming stability
    • Strong grasp of audio fundamentals, including sample rates, codecs (Opus, G.711), buffering, packet loss, and jitter
  • Applied ML for Speech
    • Experience evaluating model performance and running structured experiments to improve transcription accuracy and speech naturalness
    • Comfort working with modern ML tooling and model APIs to fine-tune, adapt, or post-process speech model outputs
    • Ability to make pragmatic tradeoffs between model quality, compute cost, and real-time constraints

What we'd like to see

  • Experience with noise reduction, echo cancellation, VAD, diarization, or other speech enhancement technologies
  • Familiarity with forced alignment techniques or phoneme/word-level timing models
  • Hands-on experience deploying ML services with Kubernetes, Docker, and cloud platforms (AWS/GCP/Azure)
  • Knowledge of event-driven and asynchronous systems (e.g., async I/O, event loops, streaming frameworks)
  • Experience analyzing large-scale speech or conversation datasets to drive model or system improvements

ASAPP is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, disability, age, or veteran status. If you have a disability and need assistance with our employment application process, please email us at [email protected] to obtain assistance.

Top Skills

AWS
Azure
Docker
GCP
Go
Hadoop
Hive
Kubernetes
Python
Spark

The Company
Bangalore, Karnataka
389 Employees
Year Founded: 2014

What We Do

Our artificial intelligence and machine learning products deliver automation and human augmentation, allowing individuals and organizations to realize their full potential. Today, the world's largest organizations rely on ASAPP to provide amazingly efficient and effective customer experiences. Our Research & Development team is unparalleled, driving the advancement of AI, machine learning, speech recognition, robotic process automation, natural language processing and more.
