Sr. Director, Cloud Engineering

Wilmington, MA
In-Office
Senior level
Software
The Role
Lead Cloud Engineering at TraceLink, overseeing SRE, Performance & Tools, and Release Engineering. Drive AI initiatives and operational excellence.

Company overview:

TraceLink’s software solutions and Opus Platform help the pharmaceutical industry digitize its supply chain and enable greater compliance, visibility, and decision making. They reduce disruption to the supply of medicines to patients who need them, anywhere in the world.

Founded in 2009 with the simple mission of protecting patients, today TraceLink has 8 offices, over 800 employees, and more than 1,300 customers in over 60 countries around the world. Our expanding product suite continues to protect patients and now also enhances multi-enterprise collaboration through innovative new applications such as MINT.

TraceLink is recognized as an industry leader by Gartner and IDC, and for having a great company culture by Comparably.

TraceLink is seeking a strategic and hands-on Senior Director of Cloud Engineering to lead a multi-disciplinary organization spanning Site Reliability Engineering (SRE), Performance & Tools Engineering, and Release Engineering. This role is critical to ensuring the scalability, reliability, and operational excellence of TraceLink’s cloud-native SaaS platform, while also owning the infrastructure behind both internal and customer-facing AI capabilities.

The Senior Director will be the single-threaded owner of our internal suite of AI-enabled tools for engineering productivity and will be responsible for DevOps and infrastructure support for external AI features integrated into the Opus Platform, such as LLM-powered agentic functionality.

They will drive initiatives that enable AI-powered operational intelligence, cost-optimized infrastructure, and high-velocity product delivery across a globally distributed engineering team.

Responsibilities:

  • Act as a Single-Threaded Owner (STO) for infrastructure and operational excellence, and lead a global organization across three primary areas:

    • SRE, with an SRE Manager and team focused on reliability, observability, incident response, and cloud operations

    • Performance & Tools, building tooling for automated testing, test orchestration, system health monitoring, and integration testing

    • Release Engineering, responsible for CI/CD tooling, release orchestration, and deployment automation

  • Own and evolve TraceLink’s internal suite of AI-enabled tools designed to enhance developer productivity and platform insight

  • Play a leadership role in DevOps and infrastructure operations for AI capabilities integrated into TraceLink’s Opus Platform, including support for LLM-based workflows, inference pipelines, and secure model interactions

  • Evaluate and adopt emerging technologies aligned with the company’s product vision and technical architecture

  • Partner with the CISO, architecture, and product teams to align cloud practices with security, compliance, and business goals

  • Drive maturity in infrastructure as code, observability (OpenTelemetry, Prometheus, Grafana, Jaeger), and release automation (Jenkins, Flux CD, Env0, CodeBuild)

  • Lead the design and rollout of AI-driven anomaly detection, telemetry pipelines, and proactive system health monitoring

  • Extend CI/CD and integration testing systems to support performance testing, distributed tracing, and alerting workflows

  • Contribute substantially to improving product quality through better automated testing

  • Champion cost optimization initiatives, including efficient AWS resource usage (Karpenter, Spot Instances, serverless), and align to target COGS metrics

  • Set high standards for reliability, latency, availability, and scalability of core systems

  • Oversee deployment health, platform smoke tests, and post-deployment validation strategies

  • Monitor and report on platform KPIs, system uptime, alerting noise ratios, and MTTR

  • Lead incident response strategy and reduce manual toil through automation and self-service tools

  • Hire, mentor, and grow high-performing engineering managers and technical leaders

  • Align team OKRs with broader engineering and company goals

  • Foster a culture of engineering rigor, continuous improvement, and cross-functional collaboration

Qualifications:

Required:

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience

  • 5+ years in engineering leadership roles managing multiple cross-functional DevOps/SRE/tooling teams

  • Deep experience with cloud-native architecture, especially AWS services, infrastructure-as-code, CI/CD systems, and observability platforms

  • Proven success running SaaS at scale, including performance, reliability, and cost optimization

  • Hands-on experience with tools such as Terraform, Helm, Docker, Kubernetes, Prometheus, ELK, Redis, Kafka, Karpenter, Jenkins, OpenTelemetry, Grafana, Env0, CodeBuild

  • Experience with AWS Bedrock or equivalent managed foundation model platforms

  • Experience supporting AI/ML-enabled applications, including inference pipelines and secure LLM integration

  • Experience with high-performance inference serving runtimes and gateways such as KServe, vLLM, TensorRT-LLM, TGI, or Envoy AI Gateway

  • Knowledge of techniques for optimizing inference performance and cost, including KV cache management, prompt caching, model quantization, and batching strategies

  • Clear understanding of security practices, DevSecOps, and compliance (e.g., SOC 2, ISO 27001)

  • Excellent communication and stakeholder management skills

Preferred:

  • Advanced degree in Engineering or related field

  • Experience with regulated industries (e.g., healthcare, pharma, or life sciences)

  • Familiarity with reactive frameworks and modern Java/JavaScript application stacks

Please see the TraceLink Privacy Policy for more information on how TraceLink processes your personal information during the recruitment process and, if applicable based on your location, how you can exercise your privacy rights. If you have questions about this privacy notice or need to contact us in connection with your personal data, including any requests to exercise your legal rights referred to at the end of this notice, please contact [email protected].


Top Skills

AWS
AWS Bedrock
CodeBuild
Docker
ELK
Env0
Envoy AI Gateway
Grafana
Helm
Jenkins
Kafka
Karpenter
KServe
Kubernetes
OpenTelemetry
Prometheus
Redis
TensorRT-LLM
Terraform
TGI
vLLM

The Company
HQ: Wilmington, Massachusetts
942 Employees
Year Founded: 2009

What We Do

TraceLink is the only network creation platform company that builds integrated business ecosystems with multienterprise applications - the true foundation for digitalization - delivering customer-centric agility and resiliency for end-to-end supply networks and leveraging the collective intelligence of entire industries.

Delivering end-to-end supply chain solutions, TraceLink's Opus Platform enables speed of innovation and implementation with an open partner model for no-code and low-code development of solutions and applications.

At TraceLink, we blend decades of knowledge in SaaS technology and supply chain business processes with a clear vision for advancing manufacturing industries through disruptive, unconventional software solutions.

With headquarters in Massachusetts, TraceLink has six global offices across North America, South America, Europe, and Asia.
