What Is Scalability and How Do You Build for It? 25 Engineers Weigh In.

Written by Adam Calica
Published on Sep. 30, 2020

When a CEO learns his company will appear on the television show “Shark Tank,” the natural reaction is excitement for potential hockey stick growth. A CTO’s reaction? “Oh no.”

Adam Berlinsky-Schnine, CTO of Apairi, wrote about that exact experience in a blog post for Hacker Noon. In it, he sums up a common scalability fear.

“The main reason scaling is so stressful is that it is not linear: your system could be functioning perfectly one minute, then a small increase in usage tips it over the edge and you have an outage,” he said.

Engineers across the country are looking for the best ways to scale their own tech stacks without disrupting customers or draining funds. When companies are ready to scale, part of that process means outgrowing old systems. Director of Engineering Noah Appel at INTURN, a company that turns excess inventory into capital, said his team overhauled its core SaaS product with scalability in mind. Notably, they transitioned their front end from clunky proprietary systems to community-supported, open-source libraries and frameworks. And CTO Bernard Kowalski said home insurance marketplace Young Alfred transitioned to the cloud for near-infinite scalability.

After a team decides on tech tools, it needs to put processes in place. How are engineers prepared for larger data responsibilities, and what are the playbooks if a mistake or outage happens? Plan out time for tech training and adjustments, and when it's time to scale, the team will be ready.

What is scalability?

Scalability is the capacity of a product, company, system, team, etc. to provide services that match growing demand.

 

Young Alfred

Bernard Kowalski

CHIEF TECHNOLOGY OFFICER

Bernard Kowalski

CTO Bernard Kowalski said scalability is a marriage of tech tools and team processes. It's difficult to add more servers, CPUs and memory without implementing best practices to handle the workload. At home insurance marketplace Young Alfred, scaling in the cloud means constant monitoring, logging and alerting.

 

Describe what scalability means to you. Why is scalability important for the technology you're building?

There are two primary factors that drive scalability. The first is the software itself: design decisions and IT infrastructure. The second is the scalability of teams and processes. It is difficult to build scalable systems without experienced engineers tuning both parts of the engine.

Scaling can also be vertical or horizontal. We usually think about horizontal scaling, where we add more nodes to the system to handle additional work, instead of vertical scaling, where we just throw more memory, more or faster CPUs and more storage onto individual machines.

Vertical scaling is easy if you have the money to spend up front, but it has hard limits. For systems that are expected to handle orders of magnitude more work, planning for horizontal scalability is crucial. The ability to quickly increase processing capacity and manage costs is especially important for startups on an exponential growth trajectory. 
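To make the horizontal-scaling arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. The traffic and per-node numbers are hypothetical, not Young Alfred's figures; the point is simply that capacity grows by adding nodes rather than by resizing one machine.

```python
import math

def nodes_needed(peak_rps: float, per_node_rps: float, headroom: float = 0.3) -> int:
    """Estimate how many identical nodes absorb a peak while keeping spare headroom."""
    usable_per_node = per_node_rps * (1 - headroom)
    return max(1, math.ceil(peak_rps / usable_per_node))

# Hypothetical figures: a spike to 25,000 requests/second against nodes
# that each handle roughly 500 requests/second comfortably.
print(nodes_needed(peak_rps=25_000, per_node_rps=500))  # -> 72
```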

 

How do you build this tech with scalability in mind?

Cloud is the way to go when it comes to scalability. For an average startup, it provides almost infinite scalability for properly designed systems.

 

Cloud is the way to go when it comes to scalability. 

 

What tools or technologies does your team use to support scalability, and why? 

Microservices, together with architectural patterns like CQRS and event sourcing, built on modern cloud infrastructure, help with scale because you break large, complex problems into more manageable pieces. Managed services like ECS, Kubernetes, elastic storage, CDNs, load balancers, data lakes and more enable software engineers and architects to build systems that can naturally scale. Docker is also a game-changer for many software engineers.
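As an illustration of the CQRS and event-sourcing idea Kowalski mentions, here is a minimal in-memory sketch in Python. The domain and class names are invented for this example, not taken from Young Alfred's system: commands append events to a log, and a separate read model is folded from those events, so the write and read sides can be scaled and optimized independently.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AccountOpened:
    account_id: str

@dataclass
class MoneyDeposited:
    account_id: str
    amount: int

class CommandHandler:
    """Write side: validates commands and appends events to the log."""
    def __init__(self, event_log: List):
        self.event_log = event_log

    def open_account(self, account_id: str) -> None:
        self.event_log.append(AccountOpened(account_id))

    def deposit(self, account_id: str, amount: int) -> None:
        self.event_log.append(MoneyDeposited(account_id, amount))

class BalanceReadModel:
    """Read side: a projection rebuilt (or incrementally updated) from events."""
    def __init__(self):
        self.balances: Dict[str, int] = {}

    def apply(self, event) -> None:
        if isinstance(event, AccountOpened):
            self.balances[event.account_id] = 0
        elif isinstance(event, MoneyDeposited):
            self.balances[event.account_id] += event.amount

events: List = []
commands = CommandHandler(events)
commands.open_account("a1")
commands.deposit("a1", 250)

read_model = BalanceReadModel()
for e in events:
    read_model.apply(e)
print(read_model.balances["a1"])  # -> 250
```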

Having all these technologies at hand is great, but combining too many can increase complexity. We find ourselves constantly balancing the complexity trade-off.

Building a scalable system is only half of the success; operating at scale is the other half. We must be able to diagnose and fix application issues that arise from the underlying infrastructure while meeting SLA requirements. 

This is where we need to follow best practices of building for the cloud with proper logging, monitoring and alerting. On the testing side, lower environments that are as close as possible to production are very important. We use canary or blue/green release strategies, where you can safely run new versions of your applications while minimizing the risk of a negative customer experience. 
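A canary rollout can be as simple as a weighted routing decision in front of the two versions. The sketch below is a hypothetical illustration in Python, not Young Alfred's deployment tooling; in practice the weight usually lives in a load balancer or service mesh configuration rather than in application code.

```python
import random

def pick_version(canary_weight: float = 0.05) -> str:
    """Route a small, configurable fraction of requests to the canary build."""
    return "canary" if random.random() < canary_weight else "stable"

# Hypothetical rollout: start at 5%, watch error rates and latency in
# monitoring, then raise the weight (or roll back) based on what you see.
sample = [pick_version() for _ in range(10_000)]
print(sample.count("canary"))  # roughly 500
```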

Finally, approaches like domain-driven design help reduce architectural complexity. Having a proper CI/CD pipeline significantly reduces pain points with the development and deployment of multiple interconnected services. It is a best practice to automate as much as you can in all aspects of the development process. 

 

Common

Eric Rodriguez

VICE PRESIDENT OF OPERATIONS

Eric Rodriguez

At real estate tech company Common, VP of Operations Eric Rodriguez said automation helps them scale while also reducing costs. From there, the engineering team can focus on critical issues.

 

Describe what scalability means to you. Why is scalability important for the technology you're building?

Scalability is about being able to do a lot more with a lot less. Scaling is a critical piece of the operations of any company. You need to think constantly about actions that can be automated to allow you to reduce costs but also to make sure that you have your employees focused on the most critical issues for your business. 

As Common looks to build a resident brand and the next generation of property management, we need to make sure that we’re scaling effectively to reduce costs for the owners and developers of the properties that we manage. This allows us to provide a delightful and mostly self-serve experience to our members at any time.

 

Scalability is about being able to do a lot more with a lot less.

 

How do you build this tech with scalability in mind? 

As an operator, any time I work with a development team to think through building tech, I like to first outline each step of the process in a flow. Once the process is diagrammed, I push myself and others to identify what parts of the process we think can be automated. A frequent question I ask is “Why can it not be automated?” 

That question is the foundation for scalability. It allows us to identify places where automation might be detrimental to the experience of our customers, and it drives alignment on the path forward. Another thing I like to think about is how to make a tool as self-serve as possible so we don't need to keep bothering the engineering team to make tweaks as we learn more about user behavior or as the scope expands. This isn't always possible, but it's definitely important to keep in mind.

 

What tools or technologies does your team use to support scalability, and why? 

Our operations team relies on Salesforce pretty heavily, given that it’s a platform that can be built to meet your team’s specific needs and integrates with a lot of different third-party tools. Through Salesforce, we’ve been able to automate different points of communication in our customer journey and have automated the scheduling and carrying out of tasks across our funnel. We’re now even starting to explore robotic process automation to help us automate any lengthy manual task the team has to carry out. 

 

INTURN

Noah Appel

DIRECTOR OF ENGINEERING

Noah Appel

Director of Engineering Noah Appel said scalability at INTURN, a company that turns excess inventory into capital, means being able to onboard larger clients and new engineers as the company grows. To enhance the customer experience, Appel's team built new languages, frameworks and philosophies into its tech stack.

 

Describe what scalability means to you. Why is scalability important for the technology you’re building?

Scalability in the B2B SaaS world requires the ability to onboard, support and satisfy larger companies that need to transact on larger and larger datasets. It also means the ability to onboard new engineers as our own engineering team grows to meet the needs of larger enterprise customers. As we begin to onboard enterprise fashion and CPG brands, this ability to ingest, manipulate and read massive amounts of data while providing an exceptional user experience places the challenge of scalability front and center.

 

Scalability in the B2B SaaS world requires the ability to onboard, support and satisfy larger companies.  

 

How do you build this tech with scalability in mind? 

In our overhaul of our core SaaS product, starting with scalability in mind, we transitioned our back end from a monolithic to a microservices application and transitioned our front end from clunky proprietary systems to community-supported, open-source libraries and frameworks. We are also rebuilding our DevOps systems to increase self-awareness of how our distributed services interact.

 

What tools or technologies does your team use to support scalability, and why? 

As part of this recent overhaul, we’ve introduced new languages, frameworks and philosophies into our tech stack, all with the goal of building software that can handle larger datasets while improving our end user’s experience. On the service layer, we’ve introduced Go in order to enable our move to a microservices application. 

As a middleware, we’ve introduced GraphQL into the stack to handle resolving requests made from our web app to multiple services at once. And on the front end, we’ve revamped the UI and UX to gracefully handle the demand for accessing and manipulating large amounts of data. The amount of data flowing from server to client increases as we onboard larger enterprise customers, and it’s these technologies that allow us to scale to meet their needs. 

 

Crunchyroll

Jerry Wong

INFRASTRUCTURE ARCHITECT

Jerry Wong

Infrastructure Architect Jerry Wong said scalability is built into every engineering process at anime and manga digital media company Crunchyroll. When analyzing an architectural design, the company asks whether the tech can support the current scale, and whether it can support 10 times the current user base.

 

In your own words, describe what scalability means to you. Why is scalability important for the technology you’re building?

Scalability means that all components of a product — including architecture, infrastructure, as well as the underlying services — are able to handle the current requests of customers and gracefully meet future demands. Scalability is a very important cornerstone to our technology and platform because it serves our fans at a global scale. With over 2 million paying subscribers, our fans expect the best quality, stability and functionalities of our video on demand platform.

 

How do you build this tech with scalability in mind? What are some specific strategies or engineering practices you follow?

Our engineering processes encourage all engineers to design and think with scalability in mind for the microservices they are instantiating. For every product concept, there is always an architectural review that is immediately followed by technical discovery. We always ask two questions when analyzing an architectural design: “Can this design support the current Crunchyroll scale?” and “Can this design support 10 times the users of the current Crunchyroll scale?” These are simple yet effective questions that open up the scalability problem and encourage engineers to think about their design solutions in a very broad way. 

These questions are asked whether we’re building a new continuous delivery pipeline, a new machine learning service that will be able to serve better recommendations to users or even prospecting a build versus buy solution. 

Moving past architectural design, we always iterate toward perfection. Nobody expects the initial design of anything to be perfect from day one. There are supporting functions in our company that encourage effective iteration, such as a multitude of QA/QE load tests, performance tests and more. In essence, after a technology has gone out to production, iterating on that technology is our key to success in terms of scalability.

 

Our engineering processes encourage all engineers to design and think with scalability in mind for the microservices they are instantiating.”

 

What tools or technologies does your team use to support scalability, and why? 

AWS is our cloud provider for building scalable infrastructure. The abundance of services AWS provides enables us to easily create, build and deploy services containing the core functionality of our VOD platform on a global scale. 

Observability is the next key concept we embrace to effectively measure how well a service is performing and to identify key offending components, which we can then iterate on and fix. New Relic is one of the major observability tools we use for instrumentation and to support scalability. Content delivery networks (CDNs) are another major technology we use to better serve our fans. CDNs allow us to cache popular video assets at edge locations closer to our customers, giving them a better, more performant experience. 

 

Instacart

Dustin Pearce

VP OF INFRASTRUCTURE

Dustin Pearce

Vice President of Infrastructure Dustin Pearce said designing systems with limits helped control scalability at online grocery delivery service Instacart. Circuit breakers and controls that limit data access help prevent small tweaks in customer behavior or code changes from turning into tidal wave-sized problems. 

 

In your own words, describe what scalability means to you. Why is scalability important for the technology you’re building?

Scale exaggerates imperfections; this is why simplicity scales. Each part of your system needs to be very well defined and understood in order to scale. As you scale a system, you introduce new complexity in the form of new failure modes. More computers and more connections equal more opportunities for something to go wrong. Reasoning about those failure modes and building resilience into your system is important. It is equally important to make sure you are learning from system outages and pouring that learning back into the system in the form of additional resilience.

 

Scale exaggerates imperfections; this is why simplicity scales.”

 

How do you build this tech with scalability in mind? What are some specific strategies or engineering practices you follow?

When approaching scale, one of the most important aspects is designing systems with limits. Writing data to and reading it from a database is often where scale issues are most acute. Since small changes in the behavior of your users or your code can explode into a tidal wave at scale, you need to design circuit breakers and controls that limit data access. In the earliest stages of scaling, this means that developers have to get used to doing work in batches. The most common mechanism is query interfaces, since they require paging and limit how much data any given query can return.
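As a rough illustration of the kind of bounded query interface Pearce describes (not Instacart's actual code; the table and cap below are made up), a keyset-paged fetch with a hard server-side limit forces callers to work in batches:

```python
# Illustrative keyset-paged query interface with a hard cap. In practice the
# page would come from a SQL query with a LIMIT clause; here an in-memory
# list stands in for the table.
MAX_PAGE_SIZE = 100

ORDERS = [{"id": i, "total": i * 10} for i in range(1, 1001)]  # fake table

def fetch_orders(after_id: int = 0, page_size: int = MAX_PAGE_SIZE):
    page_size = min(page_size, MAX_PAGE_SIZE)  # enforce the limit server-side
    rows = [r for r in ORDERS if r["id"] > after_id][:page_size]
    next_cursor = rows[-1]["id"] if rows else None
    return rows, next_cursor

page, cursor = fetch_orders()
print(len(page), cursor)  # -> 100 100; pass `cursor` back in for the next batch
```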

 

What tools or technologies does your team use to support scalability, and why? 

Web servers and other stateless services use immutable infrastructure and auto-scaling. This allows us to rapidly expand capacity as needed. When working with cloud infrastructure, we don’t troubleshoot or debug a single node. If there is a single node misbehaving, it just gets replaced. This is the “cattle, not pets” mentality made popular by Netflix. On the database front, we keep things simple with typical RDBMS systems managed by our cloud provider. We use our application to limit access to these databases and keep their size and workload manageable by spreading the load across several databases that hold parts of the data. This is a process called horizontal sharding.
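Horizontal sharding can be pictured as a routing function that maps each customer to one of several smaller databases. The version below is a simplified Python illustration with hypothetical shard names, not Instacart's implementation; production systems typically use consistent hashing or a directory service so shards can be added without remapping every key.

```python
import hashlib

# A stable hash of the customer key decides which database holds that
# customer's rows, spreading load across several smaller databases.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(customer_id: str) -> str:
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("customer-42"))  # always routes the same customer to the same shard
```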

 

TrueAccord

Jeffrey Ling

DIRECTOR OF ENGINEERING

Jeffrey Ling

At fintech company TrueAccord, engineers are given autonomy on how they want to design their services. Director of Engineering Jeffrey Ling said they keep teams cohesive by having experienced engineers act as architects and give advice. 

 

In your own words, describe what scalability means to you. Why is scalability important for the technology you’re building?

Scaling software is more than just having servers work through high load. It’s also about being able to enable our team to build features quickly and safely. Enabling that takes thought on the organization as well as the tech.

Scaling is important to us because we work with millions of active accounts under strict federal and state compliance rules. If our system lags the wrong way, we are liable for any mistakes the system makes. Worse, our consumers would lose trust in us and our ability to help them out of their debts. We’re constantly working on ways to allow us to continue to build new features and meet consumer needs while dealing with these constraints. 

 

How do you build this tech with scalability in mind? 

Each engineer has full autonomy over how they want to design their services, though we do provide a recommended toolset with best practices as guidance. We keep things cohesive by having experienced engineers act as architects to find points of reusability and give advice on potential adverse effects. 

As we build our tech, we try to keep a very strong separation of concerns throughout the system. We align the service boundaries with our teams in mind, working through scenarios with the purpose of reducing the amount of cross-team work needed for each scenario. 

 

Scaling software is more than just having servers work through high load.”


What tools or technologies does your team use to support scalability, and why? 

For our back end, we use Go, Scala and Java, as well as Node.js where appropriate. Go is great for light microservices like lambda functions, while Scala and Java are great for complex business logic that needs a more expressive language. 

We use a whole smorgasbord of AWS products to host our services and train our machine learning models. We scale servers with Kubernetes and scale our warehouses with Snowflake. 

For our team, we align on architecture using Miro, which deserves a special shoutout for being a great collaborative whiteboard and diagramming tool. We have a live version of our architecture that anyone can comment and ask questions on. This has allowed teams to work on architecture asynchronously as we embrace remote work. 

 

Postmates

Sanket Agarwal

MANAGER, ENGINEERING

Sanket Agarwal

During the onset of stay-at-home orders, delivery service Postmates hit unexpected growth. Manager of Engineering Sanket Agarwal said established code review and design processes, combined with hiring the right talent, helped the company meet customer demand while keeping essential employees safe.

 

In your own words, describe what scalability means to you. Why is scalability important for the technology you’re building? 

Scalability is providing a high-quality service to our ever-growing base of customers. We need to scale our people organization and our culture, and make our systems robust. We also take on a larger social responsibility as we scale, especially when we are providing essential services and employment during a pandemic.

Postmates is at a point where scale is a way of life. Any line of code that we add affects millions of users. This brings challenges and opportunities. On one hand, we need process and automation to avoid bugs, but on the other, we can harness the power of our user base to build an experience that is magical. For instance, Postmates can recommend food that you may like. You can only deliver such an experience with the power of data and scale.

 

Scalability is providing a high-quality service to our ever-growing base of customers.”

 

How do you build this tech with scalability in mind? 

Robust engineering practices and a culture of customer obsession are key to building products with love, care and scalability in mind. We often care about shaving off a few milliseconds to make the user experience better.

We care about our code review and design processes, and we train people to be self-sufficient. We also invest in a postmortem process where we identify and address key issues with our systems.

Hiring the right individuals is also key. With our brand, we've been able to attract some of the best talent the industry has to offer. They bring the experience of having built these systems and give us foresight into best practices.

During the pandemic, we had a rush of people trying to stay indoors and order food, and we had to react to a surge in demand. We couldn't have prepared for that. But our team huddled during weekends to keep the lights on. The ability to quickly react and adapt is also key to scalability, especially in a hyper-competitive environment.

 

What tools or technologies does your team use to support scalability, and why? 

At Postmates, we rely on battle-hardened technologies. We’ve built a host of custom technology, which doesn’t exist for our use case or scale. But we also heavily rely on open source to avoid reinventing the wheel.

We have built an in-house mapping and planning system that can match couriers to deliveries at scale. This helps optimize delivery times while balancing batching. We also have a highly scalable data infrastructure on top of BigQuery that, along with machine learning, enables recommendation engines like our feed or search.

Most of our applications are built on a service-oriented architecture. We use open source technologies like Kubernetes and Docker to host our services.

 

Grammarly

Kirill Golodnov

SENIOR SOFTWARE ENGINEER

Kirill Golodnov

Senior Software Engineer Kirill Golodnov knew to expect a growth surge when editing software Grammarly became available on Google Docs. Replacing programming languages, retiring outdated AWS instances and adding a caching layer prepared them for traffic spikes.

 

In your own words, describe what scalability means to you. Why is scalability important for the technology you’re building?

Grammarly’s digital writing assistant is used by millions of people every day across web browsers and devices, and we’re always expanding availability. To provide this support, we deliver sophisticated writing suggestions to millions of devices in real time around the world, which means we process millions of simultaneous server connections, requiring hundreds of servers. As Grammarly continues to accelerate growth, finding solutions to these challenges is key to maintaining a seamless experience for our users.

 

We believe it’s important to take a nuanced approach to scalability and not rely on adding more servers without thinking more critically about our pipelines.”

 

How do you build this tech with scalability in mind? 

We believe it’s important to take a nuanced approach to scalability and not rely on adding more servers without thinking more critically about our pipelines. Horizontal scalability cannot solve all problems because it creates new ones; specifically, the need to run and manage thousands of servers. We use a number of tactics to reduce this need by an order of magnitude. We optimize our algorithms and neural networks. We also strive to identify programming languages and runtimes that will lessen our server load. 

We historically used Common Lisp in our suggestion engine, but it was not well optimized for the high-load processes our product requires. So we recently replaced it with Clojure, a dialect of Lisp that runs on the JVM. As a result, we were able to retire many of our AWS instances. We’ve also found that sometimes adding a caching layer helps. When we were preparing to support Grammarly in Google Docs, we were anticipating big spikes in traffic. Caching worked great for us in that scenario.
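For readers unfamiliar with the pattern, a caching layer in front of an expensive service often follows the cache-aside sketch below. This is a generic Python illustration with made-up names, not Grammarly's code; in production the in-process dictionary would typically be a shared cache such as Redis or a CDN.

```python
import time

# Cache-aside sketch: serve repeated reads from a cache with a short TTL so
# a spike of identical requests does not all hit the backend.
_cache: dict = {}
TTL_SECONDS = 30

def expensive_backend_call(doc_id: str) -> str:
    return f"suggestions for {doc_id}"       # stand-in for the real service

def get_suggestions(doc_id: str) -> str:
    entry = _cache.get(doc_id)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]                      # cache hit
    result = expensive_backend_call(doc_id)  # cache miss: do the real work
    _cache[doc_id] = (time.time(), result)
    return result

print(get_suggestions("doc-1"))  # first call misses; repeats within 30s hit the cache
```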

 

What tools or technologies does your team use to support scalability, and why?

Our team runs back ends on AWS. For computations, we run Docker containers on AWS elastic container service with auto-scaling, which is pretty typical these days. For us, the more complex challenge is scaling storage and data processing. Grammarly users who write documents in the Grammarly editor need to be able to trust that they can safely store them there, so we’ve needed to find creative solutions for making sure this storage is distributed and resistant to failure. Ultimately, we came up with a custom architecture based on AWS DynamoDB and S3, coordinated by Apache ZooKeeper, to ensure consistency of stored documents. For big data workflows, our team uses AWS-managed Kafka as well as Apache Spark on AWS EMR, and also AWS Athena and Redshift.


 

Discord

Daisy Zhou

SENIOR SOFTWARE ENGINEER

Daisy Zhou

Senior Software Engineer Daisy Zhou saw an uptick in users at social platform Discord due to COVID-19. Outside of the pandemic, Discord engineers try to autoscale in order to act proactively to user growth, rather than react to customer traffic.

 

In your own words, describe what scalability means to you. Why is scalability important for the technology you're building?

At Discord, we are building a welcoming platform where everyone can talk, hang out with their friends and build their communities. With so many people moving big parts of their life online recently, we have seen a huge, unexpected increase in active users of more than 50 percent since the previous year. Discord has always been known as a fast, reliable product. Scalability is the engineering work necessary to maintain the quality that users expect even in the face of such unexpected growth.

 

How do you build this tech with scalability in mind? 

We run stateless services that can be easily scaled up or down whenever possible. Configuring how and when to autoscale ahead of time is much easier than having to manually rescale when traffic changes drastically. It also simplifies many other aspects of maintaining a service, so it is almost always worth setting services up this way from the beginning when possible. 

For most other scaling optimizations, finding the balance between not over-engineering early and being ready in time to support more load is the trickiest part. It’s hard to pinpoint, but the ideal time to start working on scalability is after the problems we will encounter are clear but before they overwhelm us. We use a combination of resource utilization metrics and performance metrics to measure our system performance as well as the user’s experience of our system. Some clear indicators that a service needs some love are multiple shards running hotter, slightly degraded performance at peak traffic times or degraded performance for individual power users and groups.

Our specific strategies depend on the specific problem. In the last few months, we've worked on two significant scaling projects in the chat infrastructure of Discord that use pretty common strategies: request coalescing and horizontal scalability. We recently built a service that stands in front of our messages database, coalesces requests and will allow us to easily add other optimizations in the future. Another is a re-architecture of our guilds service, which, as our biggest communities have grown, started struggling to handle 100,000 connected sessions per Elixir process. We horizontally scaled the Elixir processes so that we can scale as the number of connected sessions increases.
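Request coalescing means that concurrent callers asking for the same thing share one in-flight backend fetch instead of each issuing their own. Discord's services are written in Elixir; the sketch below is only a language-agnostic illustration of the idea in Python, with a hypothetical fetch function.

```python
import threading
from concurrent.futures import Future
from typing import Callable, Dict

_inflight: Dict[str, Future] = {}
_lock = threading.Lock()

def coalesced_get(key: str, fetch: Callable[[str], object]) -> object:
    """Share one backend fetch among all concurrent callers asking for `key`."""
    with _lock:
        future = _inflight.get(key)
        owner = future is None
        if owner:
            future = Future()
            _inflight[key] = future
    if owner:
        try:
            future.set_result(fetch(key))   # only one caller does the real work
        except Exception as exc:
            future.set_exception(exc)
        finally:
            with _lock:
                _inflight.pop(key, None)
    return future.result()                  # everyone else waits on the same result

# Hypothetical usage: many sessions requesting the same channel's recent messages.
print(coalesced_get("channel:123:recent", lambda k: f"messages for {k}"))
```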

 

We run stateless services that can be easily scaled up or down whenever possible.”

 

What tools or technologies does your team use to support scalability, and why? 

A lot of Discord runs in Google Cloud Platform (GCP), and whenever possible we run stateless services as GCP autoscaling instance groups.

Many of our stateful services that maintain connections to clients, and process and distribute messages in real time, are written in Elixir, which is well suited to the real-time message-passing workload. But in recent years we have run up against some scaling limitations. Discord has written and open-sourced several libraries that ease some pains we've encountered with managing a large, distributed Elixir system with hundreds of servers. These include ZenMonitor for coalescing process-down monitoring, Manifold for batching message passing across nodes, and Semaphore, which is helpful for throttling as services get close to their limits.

Culturally, we don’t shy away from trying out new technologies when our current ones are not cutting it. ScyllaDB and Rust are two examples where our explorations have paid off and they have been good solutions to some of our problems.

 

Udemy

At edtech company Udemy, Vice President of Engineering Vlad Berkovsky and Senior Director of Engineering Cathleen Wang said that scalability is about building solutions that focus on customers' problems without adding unnecessary complexity. To do that, they adopted a hybrid cloud infrastructure and run the website from both a private and a public cloud.

 

Cathleen Wang

SR. DIRECTOR OF ENGINEERING

Cathleen Wang

In your own words, describe what scalability means to you. Why is scalability important for the technology you're building?

“Scalability means building high-quality products and services that solve business problems,” Wang said. “This is important because as our business grows, the needs of features and capabilities naturally become more complex. Designing and building solutions that focus on key customer problems without introducing unnecessary complexity or a less desirable user experience is critical to delivering high-quality products and services that can enable and support business growth. Solutions we build should be both performant and flexible to address business needs.”

 

We designed our infrastructure with scalability in mind.”

 

Vlad Berkovsky

VICE PRESIDENT OF ENGINEERING

Vlad Berkovsky

How do you build this tech with scalability in mind?  

“Decisions on system architecture are typically specific to the application,” Berkovsky said. “Specific issues can arise around site response times, the volume of data reads and writes, and so on. Generally speaking, there are three options for technology scaling: horizontal duplication, splitting the system by function, and splitting the system into individual chunks.

“At Udemy, we designed our infrastructure with scalability in mind. If the infrastructure doesn’t scale, the application scalability won’t save the day. This is why we adopted a hybrid cloud architecture: running the site from both the private and public clouds. Private cloud provides us with predictable performance and cost, and public cloud ensures a virtually unlimited capacity for scaling out as needed. 

“On the people side, we are a DevOps company. Our engineers write and then own their code in production. This continued ownership increases quality and accountability. The goal of the site operations teams is to enable developers to maintain this ownership.”

 

What tools or technologies does your team use to support scalability, and why? 

“Once the capacity question is solved and the capacity is secured, the next big questions are around the application scalability,” Berkovsky said. “It is very difficult to predict which parts of the system will become bottlenecks as the site load grows. This is why we are making use of ‘game days’ when we stress test our site or site components to identify the bottlenecks under specific load profiles.

“Automation is essential to help scale the site operations teams. Automation helps to move the focus from performing repeatable tasks manually to automating these and focusing on more important strategic work. At Udemy, we adopted the infrastructure-as-code approach to managing infrastructure. We don’t allow direct manual changes to the infrastructure; any changes need to be implemented as code to ensure predictable and repeatable results.

“We make use of popular configuration and orchestration tools such as Ansible and Terraform. Because these tools are vendor agnostic, they work equally well in public and private cloud implementations. This allows us to use the same toolkit across multiple cloud platforms, reducing the effort required to manage them.”

 

BigCommerce

Sandeep Ganapatiraju

LEAD SOFTWARE DEVELOPMENT ENGINEER

Sandeep Ganapatiraju

Sandeep Ganapatiraju, a lead software development engineer, said the goal of scalability at e-commerce platform BigCommerce is to adapt to increasing customer usage with the fewest possible technical changes. Multiple simulations of customer types and usage help the team plan when to release new business features. 

 

In your own words, describe what scalability means to you. Why is scalability important for the technology you’re building?

To me, it’s more about the ability of the platform to adapt to increasing customer usage with the least amount of changes needed.

This means having a clear strategy by simulating various customer usage scenarios upfront based on how the business sees usage likely to grow over the next few months, including product releases from alpha, beta, and becoming generally available. 

We can have a clear scalability plan when we calculate that, over the next “x” months, we expect customer usage to grow by “y” while the system is able to tolerate a “y + z” load for a short period of time.
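As a worked example of that “y + z” planning (the numbers below are invented for illustration, not BigCommerce's actual figures), projecting growth over the planning horizon and adding a burst margin gives the capacity target to test against:

```python
# Hypothetical capacity plan: project usage growth over the next x months,
# then size for an extra burst margin (the "y + z" load) the system must
# tolerate for short periods.
current_rps = 2_000
monthly_growth = 0.15          # y: 15% month-over-month growth
months = 6                     # x: planning horizon
burst_margin = 0.25            # z: short-lived spike tolerance

projected = current_rps * (1 + monthly_growth) ** months
target_capacity = projected * (1 + burst_margin)
print(round(projected), round(target_capacity))  # -> 4626 5783
```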

 

Scalability is more about the ability of the platform to adapt to increasing customer usage with the least amount of changes needed.”

 

How do you build this technology with scalability in mind? Share some specific strategies or engineering practices you follow.

We run multiple simulations: the largest user generating even larger usage, an average user operating at average usage, and a sudden, short burst of traffic, such as a sale.

Each of these scenarios is thought through with the business in mind. Then we keep the business informed to support a planned rollout of new features.

 

What tools or technologies does your team use to support scalability, and why? 

BigCommerce uses JMeter, along with query explain plans, for quick checks. Then, once we have a solid test laid out, we use BlazeMeter to test overall system performance.

We do a lot of monitoring of production using Grafana dashboards to see how various services are performing. We also do canary deployments to release a new version to a specific subset of users and catch any unexpected bugs or scaling issues. New Relic alerts us if requests get backed up and time out by more than “x” percent in a given window. We also monitor requests that take a long time with distributed tracing tools to see which specific microservice is slowing the overall request.

 

Sovrn

Theo Chu

ENGINEERING MANAGER, COMMERCE

Theo Chu

At Sovrn, scalability is crucial since their network handles tens of thousands of API requests per second. Engineering Manager Theo Chu said they must anticipate future demand when building in order to prevent latency and keep customers happy. 

 

In your own words, describe what scalability means to you. Why is scalability important for the technology you’re building?

Scalability means building our technology and products from the ground up with future scale in mind. It means anticipating future demand and being able to meet that demand without having to re-engineer or overhaul our system. Scalability is especially crucial for us since we handle tens of thousands of API requests per second. Our data pipelines process and analyze billions of events per day. We support traffic from major publishers on the web and our partners expect us to handle their scale on a daily basis without sacrificing latency or reliability.

 

Scalability means building our technology and products from the ground up with future scale in mind.”

 

How do you build this tech with scalability in mind? 

Building for scalability means designing, building and maintaining engineering systems with a deep technical understanding of the technologies that we use and the performance constraints of our systems. Our approach has been to build for scalability from the bottom up through both technical and product perspectives. On the technical front, we rely on underlying technologies and frameworks that enable scale. 

On the product front, we find that building foundational components early on for anticipated scale pays off much better than having to re-architect the system later. This was exemplified with our Optimize product, where we have been able to scale effortlessly after designing our database to handle hundreds of millions of mappings in earlier iterations.

 

What tools or technologies does your team use to support scalability, and why? 

We rely on a range of technologies that support and process data at scale, including Cassandra, Kafka and Spark. Since these are the foundational blocks of our system, we optimize them heavily and load test each component up to multiples of our current scale to enable scalability and address any bottlenecks. Since our infrastructure is fully on AWS, we also utilize AWS tools that support scalability such as autoscaling groups.

 

Optiver

Will Wood

TEAM LEAD INFRASTRUCTURE & CONTROL

Will Wood

Optiver, a global electronic market maker, uses their own capital at their own risk to provide liquidity to financial markets. Optiver's engineers and traders come together to craft simple solutions to complex problems. Will Wood, based in Chicago, said that to keep tech scalable, his team uses load testing systems they build themselves to see how their tech reacts in specific situations.

 

Describe what scalability means to you. Why is scalability important for the technology you're building?

Scalability means being able to easily handle the next busy market day. The technology I'm building is at the center of the environment, so if it has performance problems, the impact will be large. If it scales well, the firm will be able to remain fully active through extreme market conditions. My goal is that this technology has the same performance characteristics on an extreme day as it has on an average one.

 

My goal is that this technology has the same performance characteristics on an extreme day as it has on an average one.”


How do you build this tech with scalability in mind? 

I try to reduce the variance in performance as much as possible. In order to accomplish this, I choose algorithms with consistent performance, use simple programming language features and keep the resources my systems use isolated from interference. 

I also design my systems to handle a specific load, which is usually some estimate of an extreme day, and to behave in a deterministic way if that load is exceeded. Then I monitor the actual performance and load in the production environment, which signals me when the latter approaches the designed threshold.


What tools or technologies does your team use to support scalability?

On my team, we build our own systems for load testing. This allows us to test very specific scenarios and adapt to changing market conditions and business requirements. We have also developed a system for monitoring the performance of our applications in the production environment. This gives us regular feedback into how our tech is behaving and allows us to quickly notice any degraded performance.

 

Jellyvision

Alex Bugosh

PRINCIPAL SOFTWARE ENGINEER

Alex Bugosh

When open enrollment for healthcare hits, Jellyvision’s benefits tools have to be able to handle an influx in traffic. Bugosh says his team relies on data from previous years, frequent load testing and tools like AWS to help ensure their systems are ready for the busy season.

 

Why is scalability important for the technology you’re building?

At Jellyvision, we face a fairly interesting set of scalability problems. Our ALEX Benefits Counselor product is used by employees to help them choose their healthcare each year during open enrollment. Since most of our customers have open enrollment during the same few months in the fall, we experience a spiked but predictable load pattern. The ability to have those systems scale up and down and deliver results quickly is key for our ability to help our users.

 

The back-end services utilize autoscaling, in case we see any large unexpected spikes in traffic, but are efficient during our off-peak times.” 


How do you build this tech with scalability in mind?

The first and most important strategy we have is identifying the expected usage of our systems when we are initially developing the requirements. Those requirements guide our expectations during code and design reviews and help us to choose the right core technologies and architecture patterns for our systems.

This process leads us to make deliberate choices about what logic should live in the front-end JavaScript applications and what logic should live in our back-end services. Our front end is served entirely by a CDN, which has moved much of the scalability concerns over to AWS. Since our back-end services operate mostly independently, we can add instances at will. The back-end services utilize autoscaling, in case we see any large unexpected spikes in traffic, but are efficient during our off-peak times.

The other part of our strategy depends on having years of data around our expected load levels, based on the number of customers and historic load levels. We perform extensive load testing leading up to our peak times and will adjust our baselines so that autoscaling has less work to do. 


What tools or technologies does your team use to support scalability?

Jellyvision’s scalability story is AWS-centric. We use CloudFront and S3 together to serve our front-end JavaScript code and media assets. We are in the process of moving our core services over from Elastic Beanstalk to ECS and Fargate.

For logging and metrics, Sumo Logic is our vendor. We rely on Sumo Logic’s dashboards, alerts and access to our historical data to prepare for our busy season.

 

Hudson River Trading

Jason Mast

LEAD CORE DEVELOPER

Jason Mast

Fintech company HRT has built its own custom tools and proprietary reliability features to help with scalability issues. More than tools, however, Mast credits his team’s culture for having the biggest impact on scalability. He says that, by adopting a “long-term mindset,” engineers are inspired to design systems able to handle new features and stressors. 

 

Describe what scalability means to you. Why is scalability important for the technology you’re building?

Scalability means understanding how current capacity is being utilized and how technology will behave as it approaches or exceeds saturation. This is particularly important when operating in financial markets, where volume varies greatly and bursts of activity can unexpectedly overwhelm a system that appeared to have substantial headroom.

In finance, opportunities can be fleeting. Failure to scale to meet growth demands can quickly reflect on the bottom line. In proprietary trading, there are several dimensions of growth to consider: market volume, geographies and asset classes, our catalog of trading models and sophistication of machine learning. Our success depends on being able to respond to expansion in all of these dimensions.  

 

In finance, opportunities can be fleeting. Failure to scale to meet growth demands can quickly reflect on the bottom line.”

 

How do you build this tech with scalability in mind?

Building scalable technology starts with company culture. We strive to hire developers that can reason about complex systems and foster an engineering environment that values a long-term mindset. We cultivate that mindset through openly collaborative development and an iterative code review process where we encourage scalability and maintainability. 

Furthermore, we leverage an expansive compute cluster for research and development. This organically encourages modular design by improving parallelism in the cluster and increasing productivity. Since we run the same software in production, those aspects of scalability carry over nicely to the live environment. Meanwhile, a collection of automated stress tests provides continual insight into performance and scaling considerations.
 

What tools or technologies does your team use to support scalability? 

A foundational element of our technology that facilitates scale is a set of refined libraries that provide efficient communication among components. The straightforward access to concise data structures over shared memory or zero-copy network transports eases the burden of building distributed solutions. 

We’ve built proprietary reliability features into our network stack, enabling multicast communication using modern network hardware that provides high-speed fan-out of data to a potentially vast array of processing nodes. We’ve also built custom tools that assess production utilization daily and redistribute workload uniformly. 

Meanwhile, we’ve created a notification platform atop Redis and visualization tools using Grafana that continually communicate utilization of the system at the hardware, operating system and software levels. 

 

20spokes

Ryan Fischer

FOUNDER

Ryan Fischer

To Fischer, founder of development agency 20spokes, building scalable tech should be simple. That is, he said, engineers should keep their architecture lightweight and “easy to update and change.” This way, products are able to accommodate the user’s needs today and in the future. 

 

Describe what scalability means to you. 

Scalability is the ability to adapt to change and new needs. In the tech world, it is mostly attributed to how many users a particular app or site can manage without having performance issues. It’s important to remember the original definition, as you want your product not only to be able to handle increased activity but also be able to adapt in its feature set to meet the needs of the customer. This is crucial in the technology you choose, as it needs to be flexible and remove roadblocks preventing change.
 

During development, one of the best ways to improve scaling is not with tools but peer reviews.”


How do you build your tech with scalability in mind? 

Building to scale is keeping everything simple. All major frameworks used today scale just fine — when there are scaling issues, it tends to stem from how the product was architected. Many times, a product can be overengineered, leading to scaling problems not only with the demand on a server but also in creating new features. 

Keeping it simple means building components to be independent and small, which makes them easy to update and change. The same can be done with the server architecture. 

 

What tools or technologies does your team use to support scalability?

We use services such as Datadog to monitor performance. Google’s Firebase toolset has been improving and is very helpful to monitor the performance of mobile applications. 

During development, one of the best ways to improve scaling is not with tools but peer reviews. All of our code is peer-reviewed to ensure it is meeting our standards before being merged and deployed.

 

Sphera

Albion H.

SOFTWARE ARCHITECT

Albion H.

Sphera, a risk management software company, can’t afford to have its platforms experience even a moment of downtime. That’s why the engineering team turned to Microsoft Azure, a robust technology Albion says is equipped to handle the scaling and support of each one of its products. “The use of managed and proven technologies allows us to keep our teams focused on more creative endeavors and building business value,” Albion said. 

 

Why is scalability important for the technology you're building?

Our goal is to make SpheraCloud the best SaaS operational excellence platform. Our customers are running global operations with tens of thousands of users — they rely on our software to manage operations in their plants and visualize risk in real time. In this environment, even a very small amount of downtime can result in a major disruption to operations and an increase in risk.

 

Even a very small amount of downtime can result in a major disruption to operations and an increase in risk.”


How do you build this tech with scalability in mind? 

People are the most important piece of the scale puzzle. We’ve seen that clarity into roles and responsibilities is critical for scale initiatives to be successful. Absence of clarity usually results in things not getting done because nobody felt they had ownership, or in multiple teams developing the same feature without any coordination. The latter scenario not only wastes money but can also create long-term resentment between teams and destroy employee morale.

Following research in the area of organizational behavior, our engineering department is organized into “squads” of no more than five people to maximize engagement and collaboration. Scaling cross-team collaboration and communication happens via chapters and guilds, as popularized by Spotify. 


What tools or technologies does your team use to support scalability?

Our software runs on Microsoft Azure, and we strive to make use of its core capabilities to enable scalability and robustness. SpheraCloud is composed of multiple modules that seamlessly integrate with each other to feel like a single, cohesive platform. They’re deployed in Azure App Service, which enables us to automatically scale each module horizontally based on load and performance metrics. 

We store our data in Azure SQL Database, which can scale and support even the most demanding web apps. We take advantage of this architecture to direct all our most demanding “read” operations, such as reporting or searching, to “read-only” replicas.
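A simple way to picture that read/write split is a routing function that sends heavy, read-only workloads to a replica connection while writes stay on the primary. The Python sketch below uses placeholder connection strings and workload names, not Sphera's configuration; with Azure SQL, replica routing is commonly expressed through the ApplicationIntent=ReadOnly connection option.

```python
# Placeholder connection strings; real ones would carry server, credentials, etc.
PRIMARY_DSN = "Server=app-sql.example;Database=spheracloud;..."
REPLICA_DSN = "Server=app-sql.example;Database=spheracloud;ApplicationIntent=ReadOnly;..."

READ_ONLY_WORKLOADS = {"report", "search", "dashboard"}

def connection_for(workload: str) -> str:
    """Send demanding read-only work to a replica; keep writes on the primary."""
    return REPLICA_DSN if workload in READ_ONLY_WORKLOADS else PRIMARY_DSN

print(connection_for("report"))    # -> replica connection
print(connection_for("checkout"))  # -> primary connection
```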

We use Azure Cache for Redis, a fully managed, in-memory cache store for session data such as user cookies, roles and permissions, and application resources. Redis gives us sub-millisecond response time and diminishes the load on the database, enabling even greater scalability.

 

Carminati Consulting

Brittany Carminati

PRESIDENT

Brittany Carminati

When the pandemic began, Carminati Consulting had to scale its Immuware™ product to meet the needs of healthcare clients affected by COVID-19 — and do it fast. Carminati said that because the software was designed around custom compliance requirements for each customer, they were able to quickly build out new functions to support facility administrators monitoring the disease. 

 

Why is scalability important for the technology you're building?

Our SaaS product, Immuware™, is a comprehensive employee and occupational health solution. Since the COVID-19 pandemic began, we have rapidly configured the product to handle the demands of an ever-changing landscape for healthcare organizations battling to keep healthcare professionals safe. We would not have been able to serve our new or existing customers if it weren’t for our flexible and scalable platform.

 

We would not have been able to serve our new or existing customers if it weren’t for our flexible and scalable platform.”


How do you build this tech with scalability in mind? 

From its inception, Immuware™ was designed to be an online community for healthcare customers — meaning it is the organization’s job to ensure their workforce is compliant for specific healthcare-related activities. In doing so, we have ensured employees, supervisors and occupational health administrators have access to the necessary data, reports and dashboards to drive compliance. 

Since this initial adoption, we’ve scaled Immuware™ to serve specific niche roles, such as COVID-19 administrators designated to oversee symptom monitoring. Because Immuware™ was designed around the notion of “record types” — meaning each customer has different compliance requirements and, therefore, different record types that need to be tracked — we are able to offer tailored and rapid deployment for customers seeking a specific record type, such as “daily COVID-19 wellness checks.”

 

What tools or technologies does your team use to support scalability?

Azure Cloud hosting has been huge for us. Without Azure web services, we could not scale as easily during peak usage and would not be able to integrate with external systems.

 

Automox

Brad Smith

DIRECTOR OF SOFTWARE ENGINEERING

Brad Smith

When defining system scalability requirements, Brad Smith’s team at Automox always begins with a design session. The director of software engineering said that he thinks about scalability not only as an application or system but as it relates to an entire organization. Smith keeps his organization ahead of the curve by asking engineers to use tests and metrics to guide their decisions and never trying to overcompensate for performance and scale.

 

In your own words, describe what scalability means to you. 

Scalability can be a loaded word. I think most people refer to it as an application or system that increases performance proportional to the services shouldering that load. But you should think beyond scaling web services; an organization needs to be able to scale as well. If a service is deemed scalable, we need the people and processes to keep that service ahead of the curve. 

Here at Automox, scalability is an important part of how we build our infrastructure and organizations and deploy our applications. 

Customers depend on us to make sure their endpoints are patched with the latest fixes. We have to be able to meet their demand. Every time Microsoft has a big Patch Tuesday, our system must be able to respond to the added load without letting our customers down. Most of the time we meet the challenge. But there are times where we fall short. Failure is OK as long as you learn from it and make the system scale further next time.

 

How do you build this tech with scalability in mind? 

When defining system scalability requirements, we always start with a design session. We get together as a group and brainstorm. The team has an opportunity to discuss potential issues or pitfalls and define what success looks like. Each stakeholder is represented at the table. 

There is a natural tendency to make applications and systems bullet-proof, but it’s better to make incremental changes, release them, measure them and make data-driven decisions going forward. We use tests and metrics to guide our decisions and try to stay away from fingers in the wind. 

In a startup, the trade-off between performance and scalability is paramount. Never try to overcompensate for performance and scale. Ninety percent of the time, you’re not going to need it. Paying for resources that you may never use is not being scale-ready. It will make you slow to respond when the next need to scale does occur.

 

What tools or technologies does your team use to support scalability, and why?

Some of the tools we leverage to make scaling easier for us are Kubernetes, actionable metrics with Prometheus and AWS-hosted services like RDS. Kubernetes provides us a way to scale our services (monolith and microservices) with automation and little effort. 

We rely heavily on metrics to gauge the health of our applications and services. Prometheus, along with Thanos, offers a scalable metric back end that will continue to grow with us. 
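To show what actionable metrics can look like in practice, here is a minimal, hypothetical sketch using the Python prometheus_client library; the metric names are invented and this is not Automox's instrumentation. The service exposes a counter and a latency histogram that Prometheus can scrape and Thanos can retain long-term.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names for illustration only.
REQUESTS = Counter("patch_requests_total", "Patch requests handled")
LATENCY = Histogram("patch_request_seconds", "Patch request latency")

def handle_request():
    REQUESTS.inc()
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes metrics from :8000/metrics
    while True:
        handle_request()
```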

When it comes to our datastores, we typically use hosted offerings like Amazon’s RDS. Anytime you can entrust one aspect of your stack to a proven partner, that is one less thing you have to spend time and money on. If the team does not have to worry about scaling or backups for PostgreSQL, that is a win.

Lastly, we use Jira for project management and work tracking. If you do not understand how much work your team can do, then you will never know how or when to scale your team. Precision, as it relates to planned and unplanned work, is key to predicting when to scale up or down.

 

Maxwell

Ben Wright

SENIOR SOFTWARE ENGINEER

Ben Wright

Without a transparent and maintainable code base, Senior Software Engineer Ben Wright said that Maxwell employees wouldn’t be able to achieve their ultimate goal: opening up the homeownership process. To do so, Wright’s team has focused on building the company’s front-end infrastructure with a reusable, component-based architecture. They are also building an external component library every team member can access and rely on. 

 

In your own words, describe what scalability means to you.

As a front-end engineer, scalability means building a maintainable code base that can grow with additional users and developers. This is important for Maxwell as a company and for our technology specifically because, ultimately, our mission is to empower people to make mortgage lending simpler and more accessible. As you can imagine, increasing transparency is only achievable and impactful with more users and more customers. We need the data and scale to have a big impact in this massive industry, so scalability is critical. If we don’t think about scalability, we won’t succeed as a company.

 

How do you build this tech with scalability in mind? 

We have focused on building the front end with a reusable component-based architecture. We have built out an external component library to be the source of truth for engineering, design and product decisions. The library allows us to maintain consistent design in our UI/UX, establish patterns within our component code and provide guidelines for new developers to build new pages or components that match.

 

What tools or technologies does your team use to support scalability, and why? 

We use ESLint and RuboCop to enforce a common code style, vigorous code reviews (both asynchronous and in-person) to ensure code quality, thorough automated testing to prevent bugs and unintended consequences, and more.

 

Artifact Uprising

Austin Mueller

SENIOR SOFTWARE ENGINEER


When it comes to site scalability, Artifact Uprising Senior Software Engineer Austin Mueller sees his job as ensuring performance isn’t impacted by the number of site visitors on any given day. To make that vision a reality, his team uses AWS S3 and Lambda in addition to autoscaling Kubernetes clusters to process customizable, digital photo orders as they come through.

 

In your own words, describe what scalability means to you. 

Because we build and sell products, our site activity varies greatly depending on the time of day and season. For this reason, scalability must be top of mind.

First, scalability means having a reliable service that can handle two or two million customers without downtime, interruptions or delays in service. Second, we want to make sure that our infrastructure can scale up or down without manual oversight. Our goal is not to waste resources if only two people are using our site, and not to let our customer experience suffer if two million people are.  

 

How do you build this tech with scalability in mind?

Part of our day-to-day mindset is thinking ahead about what we are building and how it will behave under load. This means designing our infrastructure and applications in such a way that they can automatically scale up and down to handle traffic spikes. 

Our team puts a lot of effort into learning and implementing scalability best practices. We use Amazon Web Services (AWS), which has wonderful scalability features baked in, for many parts of our infrastructure. We want to make sure we are well equipped to use these services to their fullest potential.

 

What tools or technologies does your team use to support scalability, and why?

One critical feature of our application is the ability to upload and store photos. This service must be able to scale, especially because it involves a lot of network traffic. We work to keep upload wait times to a minimum, even if thousands of people are uploading photos at once. 

We are able to leverage the capabilities of AWS S3 and Lambda to provide an infinitely scalable service that does not degrade, no matter how many photos are being uploaded. We also use autoscaling Kubernetes clusters to process orders as they come through. As more orders are queued up, we can automatically spin up new servers to ensure that orders are processed quickly and efficiently.
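Mueller doesn’t spell out the upload flow, but one common way to let S3 absorb that traffic is to hand clients presigned upload URLs so photo bytes never pass through your own servers, with a Lambda processing each object afterwards. A rough Go sketch, with made-up bucket and key names:

```go
// Illustrative sketch: issuing a presigned S3 upload URL so clients write
// photos directly to S3. Bucket and key names here are hypothetical.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))

	// The client PUTs the photo straight to this URL; S3 absorbs the
	// upload traffic, and a Lambda can process the object afterwards.
	req, err := presigner.PresignPutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("example-photo-uploads"), // hypothetical bucket
		Key:    aws.String("orders/1234/photo.jpg"), // hypothetical key
	}, s3.WithPresignExpires(15*time.Minute))
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("upload URL:", req.URL)
}
```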

 

StackHawk

Topher Lamey

SENIOR SOFTWARE ENGINEER


Senior Software Engineer Topher Lamey emphasizes scalability in his work at StackHawk because of its role in building a high-quality software product for customers. For developers to stay productive in the codebase, Lamey said the CI/CD pipeline must keep changesets flowing through smoothly, and the test and deploy process and architecture need to function seamlessly. 

 

In your own words, describe what scalability means to you. 

Scalability, to me, means that the delivered software scales appropriately across multiple levels. The ability to easily triage and fix environment issues allows each team member to be highly productive in the codebase. 

When it comes to a software product that people will pay money for, scalability refers to how that product will handle the workload of its users (human or not). The product should predictably scale in terms of performance and resource usage to deliver functionality. Factors like monitoring resource usage and key system metrics, architecting services to distribute resource workloads, using proven technology, and writing and profiling performant code all contribute to scalability.

In the early stages of a company, it’s more important to be flexible and figure out what the product is. There’s no need to build for Facebook-scale at that point. However, as a company grows, scalability needs and expectations need to be identified and budgeted for.

 

How do you build this tech with scalability in mind?

We have around a half-dozen engineers, so we need some scalability around our dev process. We have to account for multiple engineers working simultaneously in the same codebase.

In terms of the software, we think about scalability as a requirement as we plan new functionality. As the functionality’s requirements are discussed, we talk about scalability needs. Some general questions as part of the discussion include: What needs to scale in this scenario? What would break first as the usage of this functionality increases? What resources are impacted? 

Then, as we implement functionality, we are collaborative about design options and choices. Our dev process has gates around automated testing and manual code reviews to help spot issues. We then deploy changes to environments that attempt to mirror production, including monitoring and alerting. This way we can be sure new changes scale appropriately.

 

What tools or technologies does your team use to support scalability, and why?

To help scale dev processes, we use GitFlow to simplify changeset management across our projects. Our entire build/deploy process is automated using a mix of Docker Compose, Kubernetes and AWS CodeBuild/ECS. As part of GitFlow, we gate merges to branches based on automated tests and peer code reviews. We deploy changes to test environments that closely mirror production so we have a high degree of confidence that scalability will not be impacted.

Some technologies we use because we know they scale are Spring Boot, Python, Kotlin, AWS RDS/PostgreSQL and Redis. Additionally, we use Logz.io and Grafana to help monitor and handle alerting for our systems. Our internal services communicate with each other using gRPC rather than JSON/REST. gRPC is a highly scalable technology from Google that implements stateless RPC using language-agnostic definitions called protobufs. gRPC provides a way to define and share common RPC message and method definitions across the board. We’ve also gone with Kubernetes because we can easily set up service scaling rules around resource usage. Because our services are stateless, it’s relatively easy to spin up new instances to help process the workload.
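The interview doesn’t show any protobuf definitions or service code, but a stripped-down Go sketch of a stateless gRPC server illustrates the pattern. The scanpb package and ScanService here are hypothetical stand-ins for generated code, not StackHawk’s actual API:

```go
// Minimal, hypothetical gRPC server sketch in Go. "scanpb" stands in for a
// package generated from a shared protobuf definition.
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"

	scanpb "example.com/internal/gen/scanpb" // hypothetical generated code
)

type scanServer struct {
	scanpb.UnimplementedScanServiceServer
}

// Each call is stateless, so any replica behind Kubernetes can serve it.
func (s *scanServer) StartScan(ctx context.Context, req *scanpb.StartScanRequest) (*scanpb.StartScanResponse, error) {
	return &scanpb.StartScanResponse{Accepted: true}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	scanpb.RegisterScanServiceServer(srv, &scanServer{})
	log.Fatal(srv.Serve(lis))
}
```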

 

Fluid Truck Share

Leonardo Amigoni

CTO


Fluid Truck Share’s CTO Leonardo Amigoni appreciates Google’s compiled programming language Golang because of its lightweight nature. He said it has allowed his team to focus on the community truck sharing platform’s business needs rather than being burdened by their own technology. Fluid Truck Share is built on microservices architecture so that the engineering team can scale parts of the system individually. 

 

In your own words, describe what scalability means to you. 

Scalability is a characteristic of a software system or organization that describes its capability to perform well under an increased workload. A system that scales well can maintain its level of efficiency during increased operational demand.

Scalability has become increasingly relevant at Fluid Truck as we acquire more customers and expand into new markets. For this reason, we have migrated away from the traditional monolith application paradigm in favor of a microservice architecture. In a traditional monolith application, all system features are written into a single application. Often they are grouped by feature type, controllers or services. For example, a system may group all user registration and management under an authorization module. This module may contain its own set of services, repositories or models. But ultimately, they are still contained within a single application codebase. 

When certain areas of the codebase need to scale with an increase in user demand, monolith applications often require scaling the entire application. With microservices, the separation of system components allows parts of a system to scale individually.  

 

How do you build this tech with scalability in mind?

Fluid Truck has adopted Golang and Kubernetes for our microservicing needs. Golang’s lightweight nature has allowed us to focus on our business needs rather than being burdened by our technology. Its simplicity has allowed us to expand our infrastructure and maintain our platform by accommodating developers from all backgrounds. 

Moreover, we chose Go because of its simple concurrency model. In an environment where concurrency and parallelism are a must, goroutines have allowed us to scale processes across multiple processor cores using a simpler multithreaded execution model than what we had before.
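As one generic illustration of that model (not Fluid Truck’s actual code), a small Go worker pool fans jobs out across one goroutine per core:

```go
// Small worker-pool sketch showing the goroutine/channel model that makes
// it cheap to spread work across CPU cores. Job contents are illustrative.
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	jobs := make(chan int)
	var wg sync.WaitGroup

	// One worker per core; the Go runtime schedules goroutines across them.
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for j := range jobs {
				// Stand-in for real work, e.g. pricing a reservation.
				fmt.Printf("worker %d handled job %d\n", id, j)
			}
		}(w)
	}

	for j := 0; j < 100; j++ {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
}
```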

 

What tools or technologies does your team use to support scalability, and why? 

Kubernetes allows us to manage our infrastructure by deploying machine-agnostic microservices that can be replicated just about anywhere. It is an orchestration tool for containers that ensures our platform scales based on demand. Microservices are easily scaled using a combination of load balancers and replica sets. We sought to automate our platform’s scalability by containerizing our microservices, and Kubernetes was the right tool to help us manage that task.

 

The Trade Desk

Chris Jansen

SENIOR SOFTWARE ENGINEER


When Senior Software Engineer Chris Jansen first joined The Trade Desk six years ago, the adtech company was handling two million queries per second globally. Now they’re handling up to 11 million. Jansen said that, among other strategies, his team has had a lot of success refactoring their own code to reduce complexity and optimize memory usage.

 

In your own words, describe what scalability means to you. 

When I think of scalability, I think about our platform’s literal ability to scale. Scalability is important at The Trade Desk because, as we’ve often said internally, we’re only 2 percent done. Advertising is a $600 billion industry, and we’re always expanding our piece of the pie. To successfully grow as much and as quickly as we have, we have to consider scalability early. It’s built into how we think about every feature we design and build here.

 

How do you build this tech with scalability in mind? 

A distributed architecture is a fundamental building block to scalability. It allows each major component to scale independently as we grow. For example, at The Trade Desk, the components that handle incoming advertising opportunities or bid requests have had to scale more quickly to account for new inventory sources such as connected television. Compare this with our UI, where user growth has been more linear.

Another core strategy for us is frequently analyzing central processing unit and memory performance to see where we can improve. We’ve had a lot of success refactoring our own code to reduce complexity and optimize memory usage.
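Jansen doesn’t name the profiling tools his team uses; as one generic illustration of the practice, a Go service can expose CPU and heap profiles through net/http/pprof so engineers can hunt for hot spots and allocation-heavy code paths:

```go
// Illustration only: exposing CPU and heap profiles from a Go service via
// net/http/pprof. This is a common pattern, not The Trade Desk's tooling.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// Inspect with, for example:
	//   go tool pprof http://localhost:6060/debug/pprof/profile   (CPU)
	//   go tool pprof http://localhost:6060/debug/pprof/heap      (memory)
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```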

 

What tools or technologies does your team use to support scalability, and why? 

We are increasingly turning to containerized components and tools like Kubernetes and Airflow for management and scaling. Containers are easier to manage and more flexible than dedicated servers. We’re also using Spark for our more data-dense analytics and machine learning.

 

Responses have been edited for length and clarity. Images via listed companies.