Hitting the API Ceiling: A Twitter Bookmarking Adventure 🔒

May 30, 2025

14 min read

I have this little weekend ritual — going through tweets I've bookmarked throughout the week. It's a mix of fun threads, technical insights, and memes I don't want to forget. One weekend, I decided to open a bunch of them at once, tab after tab, like a rapid-fire session.

A few seconds in, I hit a wall.

Every tweet after a point refused to load. "Something went wrong" — a classic Twitter message. That's when it hit me: this was rate limiting in action.

I had read about it countless times — in system design articles, interviews, and blog posts. But this was the first time I experienced it firsthand. And honestly? I was fascinated.

So this post is my way of dissecting what rate limiting is, the mathematical models behind it, distributed implementations, and how to build a production-ready solution in NestJS.


🌐 What Is Rate Limiting?

Rate limiting is a technique used to control how frequently a user (or client) can hit a particular resource over a given timeframe.

In other words:
"You're making too many requests — slow down."

It helps prevent:

  • Abuse and denial-of-service (DoS) attacks
  • Accidental request floods (like mine)
  • Overloading databases or services
  • Resource starvation
  • Unfair resource distribution among clients
  • Cascading failures in microservice architectures

According to Alex Xu's "System Design Interview" (ByteByteGo), rate limiting is a critical component in the "Reliability" pillar of distributed systems design, particularly in high-scale services handling millions of requests per second.


🔁 Rate Limiting Algorithms: Mathematical Foundations

Let's break down the mathematical models and implementation complexities of each algorithm:

1. Fixed Window Counter

Mathematical Model:

Let t = current time
Let w = window size in seconds
Let c = counter
Let l = limit

For each request at time t:
    window_start = floor(t / w) * w
    if counter[window_start] < l:
        counter[window_start]++
        allow request
    else:
        deny request

Complexity:

  • Time Complexity: O(1)
  • Space Complexity: O(n) where n is the number of active windows
  • Drawback: Edge case vulnerability at window boundaries

According to ByteByteGo's "System Design Fundamentals," this algorithm can allow up to 2× the rate limit in the worst case when requests cluster around window boundaries: with a limit of 100 per minute, a client can send 100 requests at 0:59 and another 100 at 1:01, squeezing roughly 200 requests into two seconds.
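
To make that concrete, here's a minimal single-process sketch in TypeScript (my own illustration, not a library API), with stale-window cleanup inlined:

// Minimal in-memory fixed window counter (single process, illustrative only)
class FixedWindowLimiter {
	private counters = new Map<number, number>(); // window start (in seconds) -> request count

	constructor(private limit: number, private windowSeconds: number) {}

	allow(nowMs: number = Date.now()): boolean {
		const windowStart = Math.floor(nowMs / 1000 / this.windowSeconds) * this.windowSeconds;
		const count = this.counters.get(windowStart) ?? 0;

		if (count >= this.limit) return false; // over the limit for this window

		this.counters.set(windowStart, count + 1);

		// Drop expired windows so the map doesn't grow without bound
		for (const key of this.counters.keys()) {
			if (key < windowStart) this.counters.delete(key);
		}
		return true;
	}
}

// 100 requests per 60-second window
const fixedWindow = new FixedWindowLimiter(100, 60);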

2. Sliding Window Log

Mathematical Model:

Let t = current time
Let w = window size in seconds
Let l = limit
Let requests = list of timestamps

For each request at time t:
    Remove all timestamps from requests where timestamp < t - w
    if size(requests) < l:
        add t to requests
        allow request
    else:
        deny request

Complexity:

  • Time Complexity: O(n) where n is the number of requests in the window
  • Space Complexity: O(n)
  • Memory Usage: ~8 bytes × number of requests in window per user

Alex Xu notes this as the most accurate algorithm, but warns about its memory consumption at scale.
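
At scale the log usually lives in Redis rather than in process memory: one sorted set per user, scored by timestamp. A rough sketch of that approach with ioredis (the key names and structure are my own) could look like this:

import Redis from "ioredis";

const redis = new Redis();

// Sliding window log: member = unique id, score = timestamp in milliseconds
async function allowRequest(userId: string, limit: number, windowMs: number): Promise<boolean> {
	const key = `sliding_log:${userId}`;
	const now = Date.now();

	const results = await redis
		.multi()
		.zremrangebyscore(key, 0, now - windowMs) // evict timestamps that left the window
		.zadd(key, now, `${now}-${Math.random()}`) // record this request
		.zcard(key) // count what's left in the window
		.pexpire(key, windowMs) // let idle keys disappear
		.exec();

	const count = (results?.[2]?.[1] ?? 0) as number;
	return count <= limit;
}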

3. Sliding Window Counter

Mathematical Model:

Let t = current time
Let w = window size in seconds
Let current_window = floor(t / w) * w
Let prev_window = current_window - w
Let rate = counter[current_window] + counter[prev_window] * (1 - (t - current_window) / w)

If rate < limit:
    counter[current_window]++
    allow request
else:
    deny request

Complexity:

  • Time Complexity: O(1)
  • Space Complexity: O(1) per user
  • Approximation Error: Typically less than 1-2% compared to the sliding log

ByteByteGo's implementation guide recommends this as the sweet spot between accuracy and performance.
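
Here's that weighted calculation as a small TypeScript sketch (single-process, illustrative names):

// Sliding window counter: weight the previous window by how much of it still overlaps
class SlidingWindowCounter {
	private counts = new Map<number, number>(); // window start (ms) -> count

	constructor(private limit: number, private windowMs: number) {}

	allow(nowMs: number = Date.now()): boolean {
		const currentWindow = Math.floor(nowMs / this.windowMs) * this.windowMs;
		const prevWindow = currentWindow - this.windowMs;

		const currentCount = this.counts.get(currentWindow) ?? 0;
		const prevCount = this.counts.get(prevWindow) ?? 0;

		// Fraction of the previous window that still falls inside the sliding window
		const prevWeight = 1 - (nowMs - currentWindow) / this.windowMs;
		const estimatedRate = currentCount + prevCount * prevWeight;

		if (estimatedRate >= this.limit) return false;

		this.counts.set(currentWindow, currentCount + 1);
		return true;
	}
}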

4. Leaky Bucket (Queuing Theory)

Mathematical Model:

Let r = outflow rate (requests/second)
Let b = bucket capacity
Let q = current queue size

For each request:
    leak = (current_time - last_leak_time) * r
    q = max(0, q - leak)
    last_leak_time = current_time

    if q < b:
        q++
        allow request
    else:
        deny request

Complexity:

  • Time Complexity: O(1)
  • Space Complexity: O(b) for the queue
  • Processing Model: FIFO (First-In-First-Out)

Alex Xu highlights this algorithm's use in traffic shaping rather than just limiting.
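
A minimal TypeScript version of the same "leak as time passes" idea, treating the bucket as a meter rather than an actual FIFO queue (illustrative only):

// Leaky bucket as a meter: the "water level" drains at a fixed rate
class LeakyBucket {
	private level = 0; // current queue size
	private lastLeak = Date.now();

	constructor(private capacity: number, private leakRatePerSecond: number) {}

	allow(nowMs: number = Date.now()): boolean {
		// Drain the bucket according to the time elapsed since the last check
		const elapsedSeconds = (nowMs - this.lastLeak) / 1000;
		this.level = Math.max(0, this.level - elapsedSeconds * this.leakRatePerSecond);
		this.lastLeak = nowMs;

		if (this.level < this.capacity) {
			this.level += 1; // admit the request
			return true;
		}
		return false; // bucket is full, shed the request
	}
}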

5. Token Bucket

Mathematical Model:

Let r = token refill rate (tokens/second)
Let b = bucket capacity
Let t = tokens currently in bucket
Let last_refill = timestamp of last refill

For each request requiring n tokens:
    tokens_to_add = (current_time - last_refill) * r
    t = min(b, t + tokens_to_add)
    last_refill = current_time

    if t >= n:
        t -= n
        allow request
    else:
        deny request

Complexity:

  • Time Complexity: O(1)
  • Space Complexity: O(1) per user
  • Burstiness Control: Can configure both sustained and burst rates

ByteByteGo's system design book identifies this as the algorithm used by AWS API Gateway and many other cloud providers.
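
And a token bucket sketch in TypeScript with per-request costs (again a single-process illustration, not how AWS actually implements it):

// Token bucket: refill continuously, spend n tokens per request
class TokenBucket {
	private tokens: number;
	private lastRefill = Date.now();

	constructor(private capacity: number, private refillPerSecond: number) {
		this.tokens = capacity; // start full so an initial burst is allowed
	}

	allow(cost = 1, nowMs: number = Date.now()): boolean {
		const elapsedSeconds = (nowMs - this.lastRefill) / 1000;
		this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
		this.lastRefill = nowMs;

		if (this.tokens >= cost) {
			this.tokens -= cost;
			return true;
		}
		return false;
	}
}

// Sustained rate of 10 requests/second with bursts of up to 50
const bucket = new TokenBucket(50, 10);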

| Algorithm | Mathematical Complexity | Memory Efficiency | Fairness | Burst Handling | Distributed Complexity |
| --- | --- | --- | --- | --- | --- |
| Fixed Window | O(1) time, O(n) space | ✅ High | ❌ Poor | ❌ Poor | ⭐ Simple |
| Sliding Window Log | O(n) time, O(n) space | ❌ Low | ✅ Perfect | ✅ Good | ⭐⭐⭐ Complex |
| Sliding Window Counter | O(1) time, O(1) space | ✅ High | ✅ Good | ✅ Good | ⭐⭐ Moderate |
| Leaky Bucket | O(1) time, O(b) space | ✅ Moderate | ✅ Good | ❌ Poor | ⭐⭐ Moderate |
| Token Bucket | O(1) time, O(1) space | ✅ High | ✅ Good | ✅ Excellent | ⭐⭐ Moderate |

📊 Distributed Rate Limiting: The Real Challenge

According to ByteByteGo, single-server rate limiting is trivial compared to distributed implementations. Here are the key distributed rate limiting patterns:

1. Centralized Redis Approach

FUNCTION checkLimit(user_id, limit, window):
    key = "rate_limit:" + user_id + ":" + Math.floor(Date.now() / window)

    # Redis transaction
    MULTI
    INCR key
    EXPIRE key window
    EXEC

    current_count = RESULT[0]
    return current_count <= limit

Pros:

  • Consistent rate limiting across all API servers
  • Simple implementation

Cons:

  • Redis becomes a single point of failure
  • Network latency adds overhead to every request
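
One refinement worth noting: running the INCR and EXPIRE inside a single Lua script keeps them atomic and sets the expiry exactly once, when the counter is first created. A sketch assuming an ioredis client:

import Redis from "ioredis";

const redis = new Redis();

// Fixed-window check done atomically inside Redis
const FIXED_WINDOW_SCRIPT = `
  local current = redis.call('INCR', KEYS[1])
  if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
  end
  return current
`;

async function checkLimit(userId: string, limit: number, windowSeconds: number): Promise<boolean> {
	const windowStart = Math.floor(Date.now() / 1000 / windowSeconds);
	const key = `rate_limit:${userId}:${windowStart}`;

	const count = (await redis.eval(FIXED_WINDOW_SCRIPT, 1, key, windowSeconds)) as number;
	return count <= limit;
}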

2. Redis Cell Module (Specialized Rate Limiting)

ByteByteGo highlights Redis Cell as a specialized module that implements the Generic Cell Rate Algorithm (GCRA):

CL.THROTTLE user123 15 30 60 1
            |       |  |  |  |
            |       |  |  |  └───> apply 1 token (default if omitted)
            |       |  |  └─────> per 60-second period
            |       |  └────────> 30 tokens per period (0.5 tokens/s sustained)
            |       └───────────> 15 = maximum burst (bucket capacity)
            └─────────────────> key to identify this limiter

3. Distributed Token Bucket with Gossip Protocol

For edge deployments where latency matters, ByteByteGo recommends a gossip-based approach:

1. Each edge node maintains a local token bucket
2. Nodes periodically broadcast consumption rates to peers
3. Nodes adjust local refill rates based on global consumption
4. Probabilistic rate limiting allows some local over-admission while keeping the global average near the limit (a toy sketch follows this list)
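
ByteByteGo describes this pattern at the architecture level rather than in code, so the following is purely my own toy model of the idea: each edge node scales its local refill budget by what its peers report consuming.

// Toy model of gossip-informed limiting (not a production protocol)
class EdgeNodeLimiter {
	private localTokensPerSecond: number;

	constructor(
		private globalTokensPerSecond: number, // cluster-wide budget
		private peerRates = new Map<string, number>() // nodeId -> reported consumption (req/s)
	) {
		this.localTokensPerSecond = globalTokensPerSecond; // optimistic until gossip arrives
	}

	// Called whenever a gossip message from a peer arrives
	onGossip(nodeId: string, reportedRate: number) {
		this.peerRates.set(nodeId, reportedRate);

		// Keep "peers' consumption + our budget" close to the global budget;
		// localTokensPerSecond would then feed a normal token bucket like the one above
		const peerConsumption = [...this.peerRates.values()].reduce((a, b) => a + b, 0);
		this.localTokensPerSecond = Math.max(0, this.globalTokensPerSecond - peerConsumption);
	}
}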

📌 Real-World Example: Twitter's Multi-layered Rate Limiting

What likely happened during my tweet-opening frenzy goes far beyond simple counters. Based on ByteByteGo's analysis of Twitter's architecture:

  1. Front-edge Layer: CDN-level rate limiting triggered first (Fastly or Cloudflare)

    • Uses a sliding window counter with a small window (1-5 seconds)
    • Protects against obvious DDoS attacks
  2. API Gateway Layer: Gateway-level enforcement triggered next

    • Token bucket algorithm with differentiated buckets:
      • Authenticated vs. unauthenticated requests
      • Read vs. write operations
      • Public vs. private endpoints
  3. Service-level Rate Limiting: Individual microservices have their own limits

    • Bookmark service has specific read quotas
    • Storage service has bandwidth throttling
    • Database services have connection pooling constraints
  4. Response Handling:

    • HTTP 429 (Too Many Requests) returned with specific headers:
      X-Rate-Limit-Limit: 100
      X-Rate-Limit-Remaining: 0
      X-Rate-Limit-Reset: 1619194980
      Retry-After: 120
      
    • Client-side app translates this into "Something went wrong"
    • Exponential backoff logic kicks in on the client side (a minimal sketch follows this list)
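
A minimal sketch of that client-side behavior, honoring Retry-After and falling back to exponential backoff (illustrative, nothing like Twitter's actual client code):

// Retry a fetch when the server answers 429, honoring Retry-After if present
async function fetchWithBackoff(url: string, attempt = 0): Promise<Response> {
	const response = await fetch(url);
	if (response.status !== 429 || attempt >= 5) {
		return response;
	}

	// Prefer the server's hint, otherwise back off exponentially (1s, 2s, 4s, ...)
	const retryAfterHeader = response.headers.get("Retry-After");
	const delayMs = retryAfterHeader ? parseInt(retryAfterHeader, 10) * 1000 : 1000 * 2 ** attempt;

	await new Promise((resolve) => setTimeout(resolve, delayMs));
	return fetchWithBackoff(url, attempt + 1);
}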

This multi-layered approach, as described in ByteByteGo's "Distributed Systems Architecture," provides defense in depth while maintaining user experience.


🛠️ Implementing Production-Grade Rate Limiting in NestJS

Let's build a solution that would scale to Twitter's level:

1. The Core Components

First, install the necessary packages:

npm install @nestjs/throttler redis ioredis @nestjs/microservices

2. Create a Redis-backed Storage Implementation

// redis-throttler.storage.ts
import { Injectable } from "@nestjs/common";
import Redis from "ioredis";

// A thin Redis-backed counter store. It is consumed directly by the custom guard
// below rather than implementing @nestjs/throttler's ThrottlerStorage interface,
// whose increment() signature differs from the one we need here.
@Injectable()
export class RedisThrottlerStorage {
	private readonly redis: Redis;

	constructor() {
		this.redis = new Redis({
			host: process.env.REDIS_HOST || "localhost",
			port: parseInt(process.env.REDIS_PORT || "6379", 10),
			keyPrefix: "throttle:",
			enableReadyCheck: true,
			maxRetriesPerRequest: 3,
		});
	}

	async increment(key: string, ttl: number): Promise<number> {
		// INCR and PEXPIRE in one transaction; note that PEXPIRE refreshes the TTL
		// on every hit, so the counter expires `ttl` ms after the most recent request
		const multi = this.redis.multi();
		multi.incr(key);
		multi.pexpire(key, ttl);
		const results = await multi.exec();
		return (results?.[0]?.[1] ?? 0) as number;
	}

	async get(key: string): Promise<number> {
		const value = await this.redis.get(key);
		return value ? parseInt(value, 10) : 0;
	}
}

3. Implement a Custom Rate Limiter Guard

// advanced-throttler.guard.ts
import { CanActivate, ExecutionContext, Injectable } from "@nestjs/common";
import { Reflector } from "@nestjs/core";
import { ThrottlerException } from "@nestjs/throttler";
import { RedisThrottlerStorage } from "./redis-throttler.storage";

@Injectable()
export class AdvancedThrottlerGuard implements CanActivate {
	constructor(
		private readonly reflector: Reflector,
		private readonly storage: RedisThrottlerStorage
	) {}

	async canActivate(context: ExecutionContext): Promise<boolean> {
		const handler = context.getHandler();
		const classRef = context.getClass();

		// Honor @NoThrottle() on the handler or the controller
		const noThrottle = this.reflector.getAllAndOverride<boolean>("no_throttle", [handler, classRef]);
		if (noThrottle) {
			return true;
		}

		// Get metadata from both class and method level
		const limits =
			this.reflector.getAllAndOverride<number>("throttler_limit", [handler, classRef]) || 60;

		const ttl =
			this.reflector.getAllAndOverride<number>("throttler_ttl", [handler, classRef]) || 60000;

		const req = context.switchToHttp().getRequest();
		const res = context.switchToHttp().getResponse();

		// Implement a more sophisticated tracking key
		const trackingKey = this.resolveTrackingKey(req);

		// Check if an API key is present for B2B clients with higher limits
		const isApiKey = !!req.headers["x-api-key"];
		const effectiveLimit = isApiKey ? limits * 5 : limits;

		// Get current request count for this key
		const key = `${trackingKey}:${handler.name}`;
		const count = await this.storage.increment(key, ttl);

		// Add rate limit headers (reset is a Unix timestamp in seconds)
		res.header("X-RateLimit-Limit", effectiveLimit.toString());
		res.header("X-RateLimit-Remaining", Math.max(0, effectiveLimit - count).toString());
		res.header("X-RateLimit-Reset", Math.ceil((Date.now() + ttl) / 1000).toString());

		// Check if limit exceeded
		if (count > effectiveLimit) {
			res.header("Retry-After", Math.ceil(ttl / 1000).toString());
			throw new ThrottlerException(
				`Too many requests. Try again in ${Math.ceil(ttl / 1000)} seconds.`
			);
		}

		return true;
	}

	private resolveTrackingKey(req: any): string {
		// Prioritize authenticated user ID
		if (req.user?.id) {
			return `user:${req.user.id}`;
		}

		// Next check for API key
		if (req.headers["x-api-key"]) {
			return `apikey:${req.headers["x-api-key"]}`;
		}

		// Finally fall back to the client IP (honoring X-Forwarded-For behind a proxy)
		const ip =
			req.headers["x-forwarded-for"]?.split(",")[0].trim() || req.socket?.remoteAddress;

		return `ip:${ip}`;
	}
}

4. Configure Module with Advanced Options

// app.module.ts
import { Module } from "@nestjs/common";
import { APP_GUARD } from "@nestjs/core";
import { ThrottlerModule } from "@nestjs/throttler";
import { AdvancedThrottlerGuard } from "./advanced-throttler.guard";
import { RedisThrottlerStorage } from "./redis-throttler.storage";

@Module({
	imports: [
		ThrottlerModule.forRootAsync({
			useFactory: () => ({
				ttl: 60,
				limit: 30,
			}),
		}),
	],
	providers: [
		RedisThrottlerStorage,
		{
			provide: APP_GUARD,
			useClass: AdvancedThrottlerGuard,
		},
	],
})
export class AppModule {}

5. Implement Route-Specific Rate Limits with Custom Decorators

// throttle.decorators.ts
import { applyDecorators, SetMetadata } from "@nestjs/common";

// applyDecorators composes both metadata keys; chaining with && would only apply the second one
export const Throttle = (limit: number, ttl: number) =>
	applyDecorators(SetMetadata("throttler_limit", limit), SetMetadata("throttler_ttl", ttl));

export const NoThrottle = () => SetMetadata("no_throttle", true);

export const BurstThrottle = () => Throttle(100, 10000); // 100 requests in 10 seconds

export const StrictThrottle = () => Throttle(10, 60000); // 10 requests per minute

6. Apply in Controllers

import { Controller, Get, Post } from "@nestjs/common";
import { StrictThrottle, BurstThrottle } from "./throttle.decorators";
import { TweetsService } from "./tweets.service";

@Controller("tweets")
export class TweetsController {
	constructor(private readonly tweetsService: TweetsService) {}

	@StrictThrottle()
	@Post()
	createTweet() {
		// Writing is expensive, so strict rate limit
		return this.tweetsService.create();
	}

	@BurstThrottle()
	@Get("bookmarks")
	getBookmarks() {
		// Reading bookmarks allows for bursts
		return this.tweetsService.getBookmarks();
	}
}

7. Circuit Breaker Integration

As ByteByteGo recommends, combine rate limiting with circuit breaking:

// circuit-breaker.decorator.ts
import CircuitBreaker from "opossum";

export function WithCircuitBreaker(options = {}) {
	const defaultOptions = {
		timeout: 3000, // If the call takes longer than 3 seconds, count it as a failure
		errorThresholdPercentage: 50, // When 50% of requests fail, open the circuit
		resetTimeout: 30000, // After 30 seconds, let a trial request through
	};

	return function (target: any, propertyKey: string, descriptor: PropertyDescriptor) {
		const originalMethod = descriptor.value;
		// One breaker per method; note that `this` is not bound here, so methods that
		// rely on instance state would need a per-instance breaker instead
		const breaker = new CircuitBreaker(originalMethod, { ...defaultOptions, ...options });

		descriptor.value = async function (...args: any[]) {
			return await breaker.fire(...args);
		};

		return descriptor;
	};
}
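
Applied to a service method, it might look like this (the bookmarksStore dependency is a placeholder for whatever slow downstream call you're protecting):

import { Injectable } from "@nestjs/common";
import { WithCircuitBreaker } from "./circuit-breaker.decorator";

@Injectable()
export class TweetsService {
	constructor(private readonly bookmarksStore: { findAll: () => Promise<unknown[]> }) {}

	@WithCircuitBreaker({ timeout: 2000 })
	async getBookmarks() {
		// If the store keeps timing out, the circuit opens and subsequent calls
		// fail fast instead of piling up behind a slow dependency
		return this.bookmarksStore.findAll();
	}
}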

🔬 Advanced Considerations: Beyond Simple Rate Limiting

ByteByteGo's deep dive into Twitter's architecture reveals several advanced rate limiting patterns:

1. Adaptive Rate Limiting

Twitter adjusts rate limits based on server load:

// getSystemLoad() and getUserTier() are placeholder helpers for the platform's
// load metric and user-tier lookup
async function adaptiveRateLimit(req, userId, baseLimit) {
	const currentServerLoad = await getSystemLoad(); // e.g. a value between 0.0 and 1.0
	const userTier = await getUserTier(userId);

	// Progressive scale-down based on load
	let effectiveLimit = baseLimit;
	if (currentServerLoad > 0.8) {
		// Heavy load: reduce limits, but never below the tier's guaranteed floor
		effectiveLimit = baseLimit * (1 - (currentServerLoad - 0.8) * 5);
		effectiveLimit = Math.max(effectiveLimit, userTier.minimumGuaranteedRate);
	}

	return effectiveLimit;
}

2. Cost-Based Rate Limiting

Not all API endpoints are equal. ByteByteGo recommends token-based rate limiting:

// Costs for different operations
const ENDPOINT_COSTS = {
	"GET /tweets": 1, // Simple read
	"GET /search": 5, // Complex query
	"POST /tweets": 10, // Write operation
	"GET /analytics": 20, // Resource-intensive
};

// Token bucket implementation with costs (assumes an ioredis client named `redis`)
function tokenBucketRateLimit(userId, endpoint) {
	const key = `ratelimit:${userId}`;
	const tokensRequired = ENDPOINT_COSTS[endpoint] || 1;

	// Returns the remaining tokens, or -1 if the request should be rejected
	return redis.eval(
		`
    local key = KEYS[1]
    local tokensRequired = tonumber(ARGV[1])
    local refillRate = tonumber(ARGV[2])   -- tokens per second
    local capacity = tonumber(ARGV[3])
    local now = tonumber(ARGV[4])          -- Unix time in seconds
    
    -- Get current tokens and last refill time (new buckets start full)
    local exists = redis.call('EXISTS', key)
    local bucket = {capacity, now}
    if exists == 1 then
      bucket = {
        tonumber(redis.call('HGET', key, 'tokens')),
        tonumber(redis.call('HGET', key, 'last_refill'))
      }
    end
    
    -- Calculate token refill
    local elapsed = math.max(0, now - bucket[2])
    local tokens = math.min(capacity, bucket[1] + elapsed * refillRate)
    
    -- Check if we have enough tokens
    if tokens >= tokensRequired then
      tokens = tokens - tokensRequired
      redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
      return tokens
    else
      return -1
    end
  `,
		1,
		key,
		tokensRequired,
		1 / 60, // refill rate: 1/60 tokens per second (1 token per minute)
		120, // bucket capacity
		Math.floor(Date.now() / 1000) // current time in seconds, matching the refill rate's unit
	);
}

3. Quota Management with Redis Time Series

ByteByteGo highlights Twitter's use of Redis Time Series for more granular control:

// Creating a time series for a user (RETENTION is in milliseconds: 24 hours)
TS.CREATE user:123:api_calls RETENTION 86400000

// Adding a data point for each API call (* = server-assigned timestamp)
TS.ADD user:123:api_calls * 1

// Counting calls in the last hour (from/to are Unix timestamps in milliseconds)
TS.RANGE user:123:api_calls <now_ms - 3600000> <now_ms> AGGREGATION count 3600000

// Downsampling into a per-minute average (the destination series must exist first)
TS.CREATE user:123:minute_avg RETENTION 86400000
TS.CREATERULE user:123:api_calls user:123:minute_avg AGGREGATION avg 60000

💡 Enterprise Implementation Best Practices

According to ByteByteGo, here are the critical aspects for production-grade rate limiting:

1. Observability

// Prometheus metrics for rate limiting events (using prom-client;
// `logger` is whatever structured logger the app already uses)
import { Counter } from "prom-client";

const rateLimitCounter = new Counter({
	name: "rate_limit_total",
	help: "Total number of rate limited requests",
	labelNames: ["endpoint", "reason", "user_tier"],
});

// Log rate limiting events for analysis
function logRateLimit(userId, endpoint, remaining, reset) {
	rateLimitCounter.inc({ endpoint, reason: "quota_exceeded", user_tier: "standard" });
	logger.info("Rate limit applied", {
		userId,
		endpoint,
		remaining,
		reset,
		timestamp: Date.now(),
	});
}

2. Graceful Degradation

// Different strategies for handling rate limits
// (queueRequest, getFromCache and getSimplifiedResponse are placeholder helpers)
function handleRateLimit(req, res, type) {
	switch (type) {
		case "critical":
			// Critical endpoints - always allow but queue
			return queueRequest(req);

		case "cacheable": {
			// Return cached data with warning
			const cached = getFromCache(req.path);
			return res.status(200).header("X-Served-From-Cache", "true").send(cached);
		}

		case "degradable":
			// Return simplified version
			return res.status(200).send(getSimplifiedResponse(req.path));

		default:
			// Standard rate limit
			return res.status(429).send({ error: "Too many requests" });
	}
}

3. Client SDKs with Built-in Rate Limiting

As recommended by ByteByteGo for enterprise APIs:

// Client SDK with built-in rate limiting
// (calculateRetryTime, updateRateLimits and handleRateLimit are omitted for brevity)
class TwitterClient {
	private tokenBucket: {
		tokens: number;
		lastRefill: number;
		capacity: number;
		refillRate: number; // tokens per millisecond
	};

	constructor() {
		this.tokenBucket = {
			tokens: 150,
			lastRefill: Date.now(),
			capacity: 150,
			refillRate: 150 / 900000, // 150 tokens per 15 minutes
		};
	}

	async request(endpoint, params) {
		// Check local token bucket first
		if (!this.consumeToken(1)) {
			// Client-side retry with exponential backoff
			const retryAfter = this.calculateRetryTime();
			await new Promise((resolve) => setTimeout(resolve, retryAfter));
			return this.request(endpoint, params);
		}

		// fetch does not reject on HTTP errors, so check the status explicitly
		const response = await fetch(`https://api.twitter.com${endpoint}`, params);
		if (response.status === 429) {
			// Update client-side rate limit state from the server's response headers
			this.updateRateLimits(response.headers);

			// Handle the rate limit with the configured strategy
			return this.handleRateLimit(endpoint, params, response);
		}
		return response;
	}

	private consumeToken(count = 1) {
		const now = Date.now();
		const elapsedMs = now - this.tokenBucket.lastRefill;

		// Refill tokens based on elapsed time (refillRate is per millisecond)
		this.tokenBucket.tokens = Math.min(
			this.tokenBucket.capacity,
			this.tokenBucket.tokens + elapsedMs * this.tokenBucket.refillRate
		);
		this.tokenBucket.lastRefill = now;

		// Check if we have enough tokens
		if (this.tokenBucket.tokens >= count) {
			this.tokenBucket.tokens -= count;
			return true;
		}
		return false;
	}
}

🔚 Conclusion: The Art and Science of Rate Limiting

Rate limiting isn't just a defensive measure — it's a fundamental aspect of system design that touches on fairness, resource allocation, economics, and user experience.

As ByteByteGo emphasizes in their "Scale From Zero To Millions Of Users" chapter, properly implemented rate limiting is what allows services to scale gracefully under wildly varying loads without degrading the experience for everyone.

For me, experiencing Twitter's throttle wasn't frustrating — it was inspiring. It made a dry system design topic feel real, almost alive. That single encounter helped cement years of theoretical knowledge.

Beyond the technical implementation, rate limiting represents a broader principle in system design: trading a little individual convenience for overall stability and fairness. By occasionally saying "no" to some requests, we keep the system available for everyone.

Remember: sometimes the best way to understand a system is to break it — one bookmarked tweet at a time.

References

  1. Xu, A., & Tao, L. (2023). System Design Interview: An Insider's Guide Volume 1 & 2. ByteByteGo.
  2. Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media.
  3. ByteByteGo. (2024). Distributed Systems Architecture: From Fundamentals to Production. ByteByteGo Publishing.
  4. NestJS Documentation. (2025). Security: Rate Limiting. Retrieved from https://docs.nestjs.com/security/rate-limiting
  5. Redis Labs. (2024). Redis Cell Documentation. Retrieved from https://redis.io/docs/stack/bloom/cell/