Rate limit
Rate limiting is a technique used to control the rate or frequency of incoming requests or API calls to a system or service. It is implemented to prevent abuse, protect system resources, ensure fair usage, and maintain the overall stability and performance of the system.
A spike arrest policy helps you limit a sudden increase in the number of requests at any point in time. For instance, with a spike arrest policy rate of 10 per minute, the policy performs the following calculation to smooth out sudden spikes in traffic:
It allows no more than 1 request every 6 seconds (60 seconds / 10 requests). In this way, it ensures that all 10 requests cannot be made within the first few seconds of a minute.
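As a rough illustration, here is a minimal sketch of that calculation in Python (the class and method names are hypothetical, not taken from any particular gateway): a rate of 10 per minute is converted into a minimum gap of 6 seconds between consecutive requests.

```python
import time

class SpikeArrest:
    """Smooths a per-minute rate into a minimum gap between requests."""

    def __init__(self, rate_per_minute: int):
        # 10 per minute -> at most 1 request every 60 / 10 = 6 seconds
        self.min_interval = 60.0 / rate_per_minute
        self.last_allowed = None

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.last_allowed is None or now - self.last_allowed >= self.min_interval:
            self.last_allowed = now
            return True
        return False  # too soon after the previous accepted request
```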
A quota policy helps you limit the total number of requests per time interval. For instance, with a quota policy rate of 10 per minute, it is possible to make all 10 requests within the first few seconds of a minute.
In a static time window rate limit, a fixed time interval is defined, and the rate limit is applied within that interval. For example, let's consider a rate limit of 100 requests per minute. In a static time window approach, you would allow up to 100 requests to be made within every 1-minute interval. If a client exceeds this limit within that minute, they would be subject to rate limiting until the next minute starts.
For a static time window approach, it is only necessary to keep track of the number of requests made within each fixed time window.
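As a sketch of the bookkeeping this requires (Python, with assumed names), one counter per client per window index is enough; the counter implicitly resets when the next window starts.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per client within each fixed time window."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        # (client, window index) -> request count; in practice, entries for
        # old window indexes would also be pruned periodically.
        self.counters = defaultdict(int)

    def allow(self, client: str, now=None) -> bool:
        now = time.time() if now is None else now
        window_index = int(now // self.window)  # which fixed interval we are in
        key = (client, window_index)
        if self.counters[key] < self.limit:
            self.counters[key] += 1
            return True
        return False  # limited until the next window starts
```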
In a sliding time window rate limit, the rate limit is applied over a rolling or sliding time interval. Instead of fixed intervals, the rate limit is enforced over a continuous time window that moves with each request. For instance, let's assume a sliding time window rate limit of 100 requests per minute. In this approach, the system keeps track of the requests made within the last minute.
This helps prevent a burst of requests within any given period of time.
Each individual request is tracked and stored in a list / queue, as in the sketch below.
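A minimal sketch of this approach in Python (assuming one limiter instance per client; the names are illustrative), where the queue holds one timestamp per accepted request:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Enforces a limit over a rolling time window using a log of timestamps."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # one entry per accepted request

    def allow(self, now=None) -> bool:
        now = time.time() if now is None else now
        # Discard requests that have slid out of the rolling window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Note that storing every timestamp costs memory proportional to the limit, which is the trade-off for the smoother behavior compared to a fixed window.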
In the token bucket algorithm, a bucket is conceptualized as a container that holds a certain number of tokens. Tokens represent the units of capacity or permission to perform an action or make a request. The bucket is initially filled with a maximum number of tokens.
Tokens are added to the bucket at a constant rate, known as the refill rate.
When a request or action is made, a certain number of tokens are required to perform that action. If there are enough tokens available in the bucket, the action is allowed, and the required number of tokens are consumed from the bucket. If there are not enough tokens available in the bucket, the action is rate-limited or delayed until enough tokens become available. The rate at which tokens are consumed from the bucket determines the rate at which actions can be performed or requests can be made.
Different APIs can have different token costs per request, which allows the system to deliver its resources more efficiently.
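Putting these pieces together, here is a minimal token bucket sketch in Python (the names are illustrative). The `cost` argument reflects the point above: a lightweight API call might consume 1 token while a heavier one consumes several.

```python
import time

class TokenBucket:
    """Bucket of tokens refilled at a constant rate; each request spends tokens."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum number of tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # the bucket starts full
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self, cost: float = 1.0) -> bool:
        """Allow the action if at least `cost` tokens are available, consuming them."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # rate-limited until enough tokens accumulate
```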