Rate Limiting Algorithms Explained

Hey there! So today I want to talk about something that's super important in system design but often gets overlooked until you're dealing with a massive traffic spike - rate limiting algorithms. Trust me, understanding these can save your backend from turning into a pile of burning servers!

When I first started learning about system design, rate limiting seemed like this mysterious black box. But once I dug deeper, I realized it's actually quite elegant. We'll explore five different algorithms, each with its own personality and use cases.

What Are We Covering?

We'll take a high-level look at these five rate limiting algorithms:

  • Token bucket
  • Leaking bucket
  • Fixed window counter
  • Sliding window log
  • Sliding window counter

Don't worry if these sound intimidating - by the end of this post, you'll understand when and why to use each one!

1. Token Bucket Algorithm

This is probably the most popular rate limiting algorithm out there. Companies like Amazon and Stripe use this approach, and for good reason - it's simple, well-understood, and handles traffic bursts gracefully.

How Does It Work?

Imagine you have a bucket that can hold a certain number of tokens. Here's the process:

  • Tokens are added to the bucket at a steady rate (e.g., the bucket is refilled every minute)
  • When the bucket is full, extra tokens just overflow and are lost
  • Each request needs one token to proceed
  • If there's a token available, the request goes through and consumes the token
  • If no tokens are available, the request gets rejected

Here is how we'll implement this in code:

  • Check whether a full window has elapsed since the last refill. For a rate of 100 req/min at current time 10:00:27, a last refill at 9:59:00 means the window has elapsed (87 s > 60 s), while a last refill at 9:59:30 means it hasn't (57 s < 60 s).
  • If the window has elapsed, refill the bucket, consume one token for the current request, and allow it.
  • If the window has not elapsed, check whether the user has any tokens left.
  • If there are tokens left, decrement the token count and allow the request.
  • If there are no tokens left, reject the request.

// Simple token bucket implementation concept
async tokenBucket(ip: string): Promise<boolean> {
    const currentTime = Date.now();
    const key = `token:${ip}`;

    try {
      // Get the current token count and last refill time
      const [tokenCount, lastRefill] = await this.redisClient.hmGet(key, [
        "tokenCount",
        "lastRefill",
      ]);
      const currentTokenCount = tokenCount
        ? parseInt(tokenCount)
        : this.maxRequests;
      const lastRefillTime = lastRefill ? parseInt(lastRefill) : 0;
      const elapsed = currentTime - lastRefillTime;

      // Calculate if we need to refill tokens based on elapsed time
      if (elapsed >= this.windowMs) {
        const newTokenCount = this.maxRequests - 1; // Refill tokens using one for the current request
        await this.redisClient.hSet(key, {
          tokenCount: newTokenCount.toString(),
          lastRefill: currentTime.toString(),
        });
        return true; // Request allowed after refill
      }

      if (currentTokenCount <= 0) {
        console.log(`Rate limit exceeded for IP: ${ip}`);
        return false; // Rate limit exceeded
      }

      // Decrement token count for the current request
      await this.redisClient.hSet(key, {
        tokenCount: (currentTokenCount - 1).toString(),
        lastRefill: lastRefillTime.toString(),
      });
      
      // Set expiration for cleanup
      await this.redisClient.expire(key, Math.ceil(this.windowMs / 1000) + 60);
      return true; // Request allowed
    } catch (err) {
      console.error("Error accessing Redis:", err);
      return false; // Fail closed - deny request on Redis error
    }
  }

Parameters You Need

  • bucket size (this.maxRequests): maximum number of tokens the bucket can hold
  • refill rate (this.windowMs): after how many milliseconds the bucket refills

How Many Buckets Do You Need?

This depends on your requirements:

  • Different API endpoints might need separate buckets (1 post/sec, 150 friends/day, 5 likes/sec)
  • If you're limiting by IP address, each IP gets its own bucket
  • For global limits (like 10k requests/sec total), use one shared bucket
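
To make the keying concrete, here's a hypothetical sketch of how bucket keys might be scoped. The rule names ("post", "like") and limits are illustrative, not from any real API:

```typescript
// Illustrative rate-limit rules: per-user rules and one shared global rule.
type Rule = { maxTokens: number; refillMs: number };

const rules: Record<string, Rule> = {
  post: { maxTokens: 1, refillMs: 1_000 },       // 1 post/sec per user
  like: { maxTokens: 5, refillMs: 1_000 },       // 5 likes/sec per user
  global: { maxTokens: 10_000, refillMs: 1_000 }, // 10k req/sec shared by everyone
};

// One bucket per (rule, subject): per-user rules key on the user id,
// while the global rule always maps to a single shared bucket.
function bucketKey(rule: string, userId: string): string {
  return rule === "global" ? "bucket:global" : `bucket:${rule}:${userId}`;
}

console.log(bucketKey("post", "alice")); // bucket:post:alice
console.log(bucketKey("global", "alice")); // bucket:global
```

Every distinct key gets its own token count in your store, so adding a new limit is just adding a new rule and key pattern.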

Pros and Cons

Pros:

  • Super easy to implement
  • Memory efficient

Cons:

  • Tuning the bucket size and refill rate can be tricky
  • Allows bursts up to the full bucket size at once, which may still overwhelm downstream systems if the bucket is large

2. Leaking Bucket Algorithm

This one's similar to token bucket but with a key difference - requests are processed at a fixed, steady rate. Think of it like a FIFO queue with a controlled output rate.

How It Works

  • Incoming requests go into a queue (the "bucket")
  • If the queue is full, new requests get dropped
  • Requests are processed from the queue at regular intervals
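
Taken literally, these bullets describe a FIFO queue drained by a fixed-rate worker. Here's a minimal in-memory sketch of that idea, with an explicit tick() standing in for the timer so it stays deterministic (the Redis version that follows approximates the same behavior with a counter instead of an actual queue):

```typescript
// Hypothetical queue-based leaking bucket: enqueue on arrival, drain on tick.
class LeakyQueue<T> {
  private queue: T[] = [];

  constructor(private capacity: number) {}

  // Called when a request arrives.
  enqueue(item: T): boolean {
    if (this.queue.length >= this.capacity) return false; // bucket full: drop
    this.queue.push(item);
    return true;
  }

  // Called at the fixed leak rate (e.g., once per interval by a worker).
  tick(): T | undefined {
    return this.queue.shift(); // process the oldest request (FIFO)
  }
}

const bucket = new LeakyQueue<string>(2);
console.log(bucket.enqueue("a")); // true
console.log(bucket.enqueue("b")); // true
console.log(bucket.enqueue("c")); // false (queue full, request dropped)
bucket.tick();                    // one request leaks out
console.log(bucket.enqueue("c")); // true (space freed)
```
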
// Leaking bucket concept
async leakyBucket(ip: string): Promise<boolean> {
    const currentTime = Date.now();
    const key = `leaky:${ip}`;

    try {
      // Get the last request time and bucket level
      const [lastRequest, bucketLevel] = await this.redisClient.hmGet(key, [
        "lastRequest",
        "bucketLevel",
      ]);

      const lastRequestTime = lastRequest ? parseInt(lastRequest) : 0;
      const currentBucketLevel = bucketLevel ? parseInt(bucketLevel) : 0;

      /**
       * Leaky bucket calculations you should consider:
       *
       * Leak Rate: How much time for each request to leak out of the bucket.
       * Leak Rate = Time of each window / maximum allowed requests
       * Example: If window is 10 seconds and max requests is 5, then leak rate is 2 seconds per request.
       * This means every 2 seconds, one request can leak out of the bucket.
       *
       * Leak amount: How many requests can leak out based on the elapsed time since the last request.
       * leakedAmount = elapsed / leakRate;
       */
      const elapsed = currentTime - lastRequestTime;
      const leakedAmount = Math.floor(
        elapsed / (this.windowMs / this.maxRequests)
      );
      const newBucketLevel = Math.max(0, currentBucketLevel - leakedAmount);

      // Check if we can allow the request
      if (newBucketLevel >= this.maxRequests) {
        console.log(`Rate limit exceeded for IP: ${ip}`);
        return false; // Rate limit exceeded
      }

      // Update the bucket level and last request time
      await this.redisClient.hSet(key, {
        lastRequest: currentTime.toString(),
        bucketLevel: (newBucketLevel + 1).toString(),
      });

      // Set expiration for cleanup
      await this.redisClient.expire(key, Math.ceil(this.windowMs / 1000) + 60);
      return true; // Request allowed
    } catch (err) {
      console.error("Error accessing Redis:", err);
      return false; // Fail closed - deny request on Redis error
    }
  }

Parameters

  • Bucket size (this.maxRequests): the queue size (how many requests can wait)
  • Leak rate: how many requests are processed per second

Pros and Cons

Pros:

  • Memory efficient with fixed queue size
  • Provides stable output rate (great for downstream systems)

Cons:

  • Can't handle bursts well - everything gets smoothed out
  • Newer requests might wait behind older ones

3. Fixed Window Counter Algorithm

This is probably the simplest algorithm to understand and implement. Time is divided into fixed windows, and each window has a request counter.

How It Works

  • Divide time into fixed intervals (like 1-minute windows)
  • Each window allows a fixed number of requests
  • Keep a counter for each window
  • When counter reaches the limit, reject new requests for that window
// Fixed window counter
async fixedWindow(ip: string): Promise<boolean> {
    const currentTime = Date.now();
    // Calculate window start time for fixed window algorithm
    const windowStart = Math.floor(currentTime / this.windowMs) * this.windowMs;
    const key = `${ip}:${windowStart}`;

    try {
      const reqCount = await this.redisClient.GET(key);

      // Initialize new window with first request
      if (reqCount === null) {
        await this.redisClient.SET(key, 1, {
          // PX, not EX: this.windowMs is in milliseconds
          expiration: { type: "PX", value: this.windowMs },
        });
        return true; // First request of the window is allowed
      }

      // Check if rate limit exceeded
      if (parseInt(reqCount) >= this.maxRequests) {
        console.log(`Rate limit exceeded for IP: ${ip}`);
        return false; // Rate limit exceeded
      }

      // Increment request count for this window
      await this.redisClient.INCR(key);
      return true; // Request allowed
    } catch (err) {
      console.error("Error accessing Redis:", err);
      return false; // Fail closed - deny request on Redis error
    }
  }

The Edge Problem

Here's where things get tricky. Let's say you allow 5 requests per minute. What happens if someone sends 5 requests at 2:00:50 and another 5 requests at 2:01:10? That's 10 requests in just 20 seconds, even though each individual window stayed within limits!

This "edge effect" is the main weakness of fixed windows - traffic bursts at window boundaries can bypass your limits.
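
You can see the edge effect with a tiny in-memory simulation (the timestamps are made-up milliseconds, purely for illustration):

```typescript
// Hypothetical in-memory fixed-window counter to demonstrate the edge effect.
class FixedWindowDemo {
  private counts = new Map<number, number>();

  constructor(private windowMs: number, private maxRequests: number) {}

  allow(now: number): boolean {
    const window = Math.floor(now / this.windowMs); // which fixed window are we in?
    const count = this.counts.get(window) ?? 0;
    if (count >= this.maxRequests) return false;
    this.counts.set(window, count + 1);
    return true;
  }
}

const limiter = new FixedWindowDemo(60_000, 5); // 5 requests per minute

// 5 requests just before the minute boundary, 5 just after: a 10-request
// burst inside a ~20-second span, yet every single one is allowed.
const times = [50_000, 52_000, 54_000, 56_000, 58_000, 61_000, 63_000, 65_000, 67_000, 69_000];
let allowed = 0;
for (const t of times) if (limiter.allow(t)) allowed++;
console.log(allowed); // 10
```

Each fixed window saw only 5 requests, so the limiter never objected, but the rolling minute around the boundary saw 10.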


4. Sliding Window Log Algorithm

This algorithm fixes the edge problem from fixed windows by keeping track of individual request timestamps.

How It Works

  • Keep a log of all request timestamps
  • When a new request comes in, remove all timestamps older than the window
  • Add the new request's timestamp to the log
  • If log size is within limits, allow the request; otherwise, reject it
// Sliding window log
async slidingLogs(ip: string): Promise<boolean> {
    const currentTime = Date.now();
    const key = `sliding:${ip}`;

    try {
      // Remove old timestamps outside the sliding window
      await this.redisClient.zRemRangeByScore(
        key,
        0,
        currentTime - this.windowMs
      );

      // Get current count of requests in the sliding window
      const requestCount = await this.redisClient.zCard(key);
      
      // Add current request timestamp to sorted set
      await this.redisClient.zAdd(key, {
        score: currentTime,
        value: currentTime.toString(),
      });

      // Check if rate limit would be exceeded
      if (requestCount >= this.maxRequests) {
        console.log(`Rate limit exceeded for IP: ${ip}`);
        return false; // Rate limit exceeded
      }


      // Set expiration for cleanup - key expires after window + buffer time
      await this.redisClient.expire(key, Math.ceil(this.windowMs / 1000) + 60);
      return true; // Request allowed
    } catch (err) {
      console.error("Error accessing Redis:", err);
      return false; // Fail closed - deny request on Redis error
    }
  }

Notice something important: in this implementation, even a rejected request's timestamp is still added to the log, so rejected requests count against future windows too.
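
To make that concrete, here's a hypothetical in-memory version of the same log, where a plain array plays the role of the Redis sorted set:

```typescript
// In-memory sketch of a sliding-window log. Like the Redis version, a
// rejected request's timestamp is still recorded.
class SlidingLogDemo {
  private log: number[] = [];

  constructor(private windowMs: number, private maxRequests: number) {}

  allow(now: number): boolean {
    // Drop timestamps that have fallen outside the rolling window
    this.log = this.log.filter((t) => t > now - this.windowMs);
    const count = this.log.length;
    this.log.push(now); // logged even if we end up rejecting
    return count < this.maxRequests;
  }
}

const demo = new SlidingLogDemo(60_000, 3); // 3 requests per rolling minute
const results = [0, 1_000, 2_000, 3_000, 4_000].map((t) => demo.allow(t));
console.log(results); // first 3 allowed, the rest rejected
```
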

Pros and Cons

Pros:

  • Very accurate - no edge effects
  • Works perfectly for any rolling time window

Cons:

  • Memory hungry - stores every request timestamp
  • Can be expensive to maintain for high-traffic systems

5. Sliding Window Counter Algorithm

This is the hybrid approach - combining the memory efficiency of fixed windows with the accuracy of sliding windows. It's a clever compromise!

How It Works

We use this formula:

Requests in current window + (requests in previous window × overlap percentage)

where the overlap percentage is how much of the previous window still falls inside the rolling window, i.e. (1 − progress through the current window). For example, if we're 70% through the current window, the rolling window still overlaps the last 30% of the previous one:
3 (current) + 5 (previous) × 0.3 = 4.5 requests

If the maximum allowed requests is 10, we can allow the incoming request (only an estimated 4.5 requests have passed so far). If the maximum were 4, we would reject it.
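
As a sanity check, the estimate is just a one-line pure function. Note that the previous window's count is weighted by (1 − progress), which is also how the class below weights it:

```typescript
// Sliding-window estimate: current count plus the previous window's count
// weighted by how much of the previous window still overlaps the rolling
// window (1 - progress through the current window).
function slidingEstimate(current: number, previous: number, progress: number): number {
  return current + previous * (1 - progress);
}

console.log(slidingEstimate(3, 5, 0.7)); // 4.5
```
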

// Sliding window counter
class SlidingWindowCounter {
    private windowSize: number;
    private maxRequests: number;
    private windows = new Map<number, number>();

    constructor(windowSize: number, maxRequests: number) {
        this.windowSize = windowSize * 1000; // convert seconds to ms
        this.maxRequests = maxRequests;
    }

    allowRequest(): boolean {
        const now = Date.now();
        const currentWindow = Math.floor(now / this.windowSize);
        const previousWindow = currentWindow - 1;

        // How far we are through the current window (0..1)
        const windowProgress = (now % this.windowSize) / this.windowSize;

        const currentCount = this.windows.get(currentWindow) || 0;
        const previousCount = this.windows.get(previousWindow) || 0;

        // Estimated requests in the rolling window: the previous window's
        // count is weighted by how much of it still overlaps (1 - progress)
        const estimatedCount = currentCount + previousCount * (1 - windowProgress);

        if (estimatedCount < this.maxRequests) {
            this.windows.set(currentWindow, currentCount + 1);
            // Prune windows older than the previous one to bound memory
            for (const w of this.windows.keys()) {
                if (w < previousWindow) this.windows.delete(w);
            }
            return true;  // request allowed
        }

        return false;     // request rejected
    }
}

Pros and Cons

Pros:

  • Smooths out traffic spikes
  • Memory efficient
  • Good approximation of true sliding window

Cons:

  • Not 100% accurate (assumes even distribution)
  • Might not work for super strict requirements

But here's the thing - according to Cloudflare's experiments with 400 million requests, only 0.003% of requests were incorrectly handled. That's pretty darn good!

Which Algorithm Should You Choose?

Here's my personal take on when to use each:

  • Token bucket: when you need to allow bursts but want overall rate control (most API rate limiting)
  • Leaking bucket: when you need steady, predictable output (message queues, batch processing)
  • Fixed window: when simplicity matters more than perfect accuracy (internal services)
  • Sliding window log: when you need perfect accuracy and can handle the memory cost (financial systems)
  • Sliding window counter: when you want good accuracy with reasonable memory usage (most production systems)

Wrapping Up

Rate limiting isn't just about protecting your servers - it's about creating a fair and predictable experience for all your users. Each algorithm has its sweet spot, and understanding these trade-offs will help you make better architectural decisions.

I hope this breakdown was helpful! Next time you're designing a system and someone asks "how do we handle rate limiting?", you'll have a whole toolkit of solutions to choose from.

Github Link: https://github.com/Harsh2509/rate-limiter

Happy coding! 🚀


- Harsh Chauhan