Pre-Emptive, Client-Side Rate Limiting
Use any API with enough volume and you’ll run into one of its thornier aspects: rate limits. They can be a pain to work around, even though they exist for excellent security and scalability reasons. Rate limits are nothing to mess with; APIs will throttle your traffic for a short time, and may even ban you permanently if you repeatedly fail to respect the limits. Getting banned means you can’t use the API at all, which could be fatal for your business. To avoid such punishments, you need a form of client-side throttling. How can we use an API effectively while ensuring we don’t get throttled or banned?
Almost all API rate limits can be broken down into three components: API clients are given a budget for the duration of a time window, which they can spend on API calls, each of which has an associated cost. If you exceed the budget within the time window, you’ll get throttled and your requests will be dropped. Any decent API will display all this information prominently in its documentation. As an example, GitHub’s rate limiting for authorized clients has a budget of 5,000 for a time window of one hour, where each call has a cost of one. This means we could make, at most, 5,000 API calls in one hour.
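These three components can be captured in a tiny value type. The GitHub numbers are the real documented limits mentioned above; the class itself is just an illustrative sketch, not any library's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RateLimit:
    budget: int         # total spend allowed per window
    window_secs: int    # length of the time window, in seconds
    cost_per_call: int  # cost of a single API call

    def max_calls(self) -> int:
        """Maximum number of calls we can make in one window."""
        return self.budget // self.cost_per_call

# GitHub's documented limit for authorized clients:
github = RateLimit(budget=5000, window_secs=3600, cost_per_call=1)
print(github.max_calls())  # → 5000
```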
Our goal is to allow maximum throughput to the API without exceeding the budget. Ideally, we want to preemptively avoid making API calls if we’re confident they’ll get throttled. Unfortunately, we don’t have a way to perfectly predict if a given call will get throttled. As a result, we’ll have to make some (hopefully) good guesses. If we’re too conservative, we won’t be using the API to its full potential. If we’re too aggressive, we risk getting throttled or banned. The name of the game here is to get an estimate of our remaining budget that’s as close as possible to the actual remaining budget as determined by the API.
In order to do this, we’re going to create a proxy server in front of the API. The proxy will either make the API call and return its result, or decide preemptively to block the request to avoid getting throttled by the API itself and possibly getting banned. This has the benefit of being transparent to the caller, encapsulating all client-side throttling logic in the proxy server, and giving us a clean, global view of all API calls we’re making. Because we can trust the proxy to ensure we don’t get throttled or banned, the rest of our code can remain ignorant of rate limiting and doesn’t need any major changes; the only change needed to integrate the proxy is to point clients at its endpoint instead of the API’s.
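The proxy’s request path can be sketched as follows. `ThrottlingProxy`, `FixedBudget`, and the limiter interface are all hypothetical names invented for this sketch, and a real proxy would speak HTTP rather than take a callable:

```python
class ThrottlingProxy:
    """Transparent proxy: forwards a call if the estimated budget allows it,
    otherwise rejects it before it ever reaches the API."""

    def __init__(self, forward, limiter):
        self.forward = forward  # callable performing the real API call
        self.limiter = limiter  # anything with can_afford(cost) / record(cost)

    def request(self, endpoint, cost=1):
        if not self.limiter.can_afford(cost):
            raise RuntimeError("blocked pre-emptively: estimated budget exhausted")
        self.limiter.record(cost)
        return self.forward(endpoint)

class FixedBudget:
    """Toy limiter for demonstration: a budget that never replenishes."""
    def __init__(self, budget):
        self.budget = budget
    def can_afford(self, cost):
        return self.budget >= cost
    def record(self, cost):
        self.budget -= cost

proxy = ThrottlingProxy(forward=lambda ep: f"200 OK from {ep}",
                        limiter=FixedBudget(2))
print(proxy.request("/users"))  # → 200 OK from /users
print(proxy.request("/users"))  # → 200 OK from /users
# A third request would be blocked by the proxy, never reaching the API.
```

The callers never see the limiter: they make requests as usual, and only notice the proxy when it refuses to forward one.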
Most APIs will, in their response, tell you what your current remaining budget for the time window is. For a first naive attempt, we will simply use that value as the singular source of truth. On every response we’ll save the returned remaining budget, and before each request we’ll check to make sure we can actually afford it. This solution, although very simple, breaks completely if we ever use the entire budget. Since we only update the remaining budget when we receive responses from the API, once we think we’ve used the entire budget, we’ll never make any more API requests, and the budget will never change. In a terrible catch-22, the only way to update our belief about being throttled would be to receive an API response without getting throttled!
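Sketched in code, the naive tracker and its failure mode look like this. The exact header an API reports its remaining budget in varies, so this sketch just takes the number directly:

```python
class NaiveLimiter:
    """Trusts only the remaining-budget value reported in API responses."""

    def __init__(self, budget):
        self.remaining = budget  # last value the API told us

    def can_afford(self, cost=1):
        return self.remaining >= cost

    def on_response(self, reported_remaining):
        self.remaining = reported_remaining

limiter = NaiveLimiter(budget=5000)
limiter.on_response(0)       # API says: budget exhausted
print(limiter.can_afford())  # → False
# The catch-22: `remaining` only changes in on_response(), but we'll never
# send another request, so we stay stuck at 0 even after the window resets.
```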
A better idea is to keep a log of all requests and their associated costs that we’ve made in the past. To find out what we think our remaining budget is, we take every request that we’ve made in the current time window, sum up all their costs, and subtract that from the total budget. This works great because our estimated remaining budget will increase as previous calls we’ve made fade into the past, outside the current time window. The downside is that we’re completely ignoring the API telling us what our actual remaining budget is as the source of truth; depending on how the API implements its rate limiting, the two values could be significantly different. Maybe your code assumes the time windows end on the hour, but the API uses a sliding window. We have no way of knowing for sure. Such desyncs could lead to erroneously making way more requests than allowed and result in getting throttled or banned despite our efforts to avoid this fate.
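A sliding-window version of this log might look like the following sketch; the injectable clock is only there to make the fade-out behavior easy to demonstrate deterministically:

```python
import time

class RequestLog:
    """Estimates remaining budget purely from our own request history,
    assuming a sliding window (which may not match the API's scheme)."""

    def __init__(self, budget, window_secs, clock=time.monotonic):
        self.budget = budget
        self.window_secs = window_secs
        self.clock = clock
        self.log = []  # entries of (timestamp, cost)

    def record(self, cost=1):
        self.log.append((self.clock(), cost))

    def estimated_remaining(self):
        cutoff = self.clock() - self.window_secs
        # Drop calls that have faded out of the current window.
        self.log = [(t, c) for t, c in self.log if t > cutoff]
        return self.budget - sum(c for _, c in self.log)

now = [0.0]  # fake clock so the example is deterministic
log = RequestLog(budget=5, window_secs=60, clock=lambda: now[0])
log.record()
log.record(cost=2)
print(log.estimated_remaining())  # → 2
now[0] = 61.0                     # both calls fade out of the window
print(log.estimated_remaining())  # → 5
```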
To truly solve this problem, we’ll create a best of both worlds approach. Before each request, we’ll add a log saying that we’ve spent some of the budget. When the request returns, we’ll add a new log specifying the updated finalized remaining budget. To calculate our estimated remaining budget, we’ll go backwards through the log to find the most recent finalized entry, and then go forward to the end, subtracting the cost of all calls we’ve made since then. This approach is the strongest solution, as we use a combination of the API as the source of truth with optimistic estimation for everything that’s happened since. Any massive desyncs will be quickly corrected on every response we receive from the API, and we can accurately estimate our remaining budget as previous calls fade into the past, outside the current time window.
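Here’s one way to sketch that hybrid log; the entry kinds and method names are made up for illustration, and as before the injectable clock just keeps the example deterministic:

```python
import time

SPEND, CHECKPOINT = "spend", "checkpoint"

class HybridLimiter:
    """The API's reported remaining budget is the source of truth;
    calls made since that report are subtracted optimistically."""

    def __init__(self, budget, window_secs, clock=time.monotonic):
        self.budget = budget        # fallback before the first API response
        self.window_secs = window_secs
        self.clock = clock
        self.log = []               # entries of (kind, timestamp, value)

    def before_request(self, cost=1):
        self.log.append((SPEND, self.clock(), cost))

    def on_response(self, reported_remaining):
        self.log.append((CHECKPOINT, self.clock(), reported_remaining))

    def estimated_remaining(self):
        cutoff = self.clock() - self.window_secs
        remaining, start = self.budget, 0
        # Walk backwards to the most recent in-window checkpoint.
        for i in range(len(self.log) - 1, -1, -1):
            kind, t, value = self.log[i]
            if kind == CHECKPOINT and t > cutoff:
                remaining, start = value, i + 1
                break
        # Subtract every in-window spend made since that checkpoint.
        for kind, t, cost in self.log[start:]:
            if kind == SPEND and t > cutoff:
                remaining -= cost
        return remaining

now = [0.0]  # fake clock so the example is deterministic
lim = HybridLimiter(budget=5, window_secs=60, clock=lambda: now[0])
lim.before_request()              # optimistic: assume cost 1 spent
lim.on_response(3)                # API corrects us: 3 left, not 4
lim.before_request()              # another call in flight
print(lim.estimated_remaining())  # → 2
```

Note how the checkpoint from the API overrides our own bookkeeping, while the in-flight spend after it is still counted optimistically.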
We can make a variety of further improvements to this system. To increase burst throughput, we can support multiple concurrent requests. If we estimate that a call will be throttled, we could return the estimated time at which the client should retry, so that the client can intelligently handle throttled calls. For shorter time windows, we could even buffer client connections, so that the client just waits a little longer instead of getting a dropped response. I’d use this approach with care, as it’s almost always better to fail fast than hang indefinitely. Failing fast puts the client in control of what to do, whereas just waiting around puts the client in a position of uncertainty and helplessness. Ideally, the caller should specify whether it wants to fail fast or is willing to wait indefinitely for a response.
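As one example of these refinements, a retry-time estimate can be derived from the same request log. This sketch assumes a sliding window and per-entry costs; the function name and log shape are invented for illustration:

```python
import time

def retry_after(log, budget, window_secs, cost=1, clock=time.monotonic):
    """Estimate how many seconds until `cost` worth of budget frees up,
    given a list of (timestamp, cost) spends and a sliding window."""
    now = clock()
    live = sorted((t, c) for t, c in log if t > now - window_secs)
    spent = sum(c for _, c in live)
    if budget - spent >= cost:
        return 0.0  # affordable right now, no waiting needed
    freed = 0
    for t, c in live:
        freed += c
        if budget - (spent - freed) >= cost:
            # This spend fades out of the window at t + window_secs.
            return (t + window_secs) - now
    return float(window_secs)  # cost exceeds the entire budget

log = [(0.0, 1), (10.0, 1)]  # two calls already made, budget of 2 exhausted
print(retry_after(log, budget=2, window_secs=60, clock=lambda: 30.0))  # → 30.0
```

Returning this value to the caller lets it decide whether to sleep and retry or give up immediately, keeping the fail-fast decision where it belongs.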
Avoiding API throttling can be tricky, but it’s worth taking the time to implement properly considering how disastrous the consequences can be. This solution makes it very unlikely that you’ll ever go over your API limits while still using the API to its full potential. A proxy server provides a clean and transparent way to get peace of mind that you won’t be banned from the API without complicating the rest of your codebase. In this world it’s throttle or be throttled, and preemptive client-side rate limiting might just save your business.