System Design Theory
1. Load Balancing
Theory
Load balancing refers to the process of distributing network traffic across multiple servers to ensure high availability and reliability by preventing any single server from becoming a bottleneck.
Types
-
Layer 4 (Transport Layer)
- Works with network/transport layer protocols (TCP/UDP)
- Faster but less flexible
- Based on IP address and port numbers
-
Layer 7 (Application Layer)
- Content-aware routing
- More flexible but higher latency
- Can route based on URL, headers, cookies
Algorithms
interface LoadBalancer {
// Round Robin
roundRobin(): Server {
currentIndex = (currentIndex + 1) % servers.length;
return servers[currentIndex];
}
// Least Connections
leastConnections(): Server {
return servers.reduce((min, server) =>
server.connections < min.connections ? server : min
);
}
// IP Hash
ipHash(clientIP: string): Server {
const hash = createHash(clientIP);
return servers[hash % servers.length];
}
}
2. Caching
Theory
Caching is a technique that stores copies of frequently accessed data in a high-speed data storage layer to improve retrieval times and reduce database load.
Caching Strategies
- Cache-Aside (Lazy Loading)
class CacheAside {
async get(key: string): Promise<Data> {
// Check cache first
let data = await cache.get(key);
if (data === null) {
// Cache miss: get from DB and update cache
data = await db.get(key);
await cache.set(key, data);
}
return data;
}
}
- Write-Through
class WriteThrough {
async set(key: string, value: Data): Promise<void> {
// Write to DB first
await db.set(key, value);
// Then update cache
await cache.set(key, value);
}
}
- Write-Behind (Write-Back)
class WriteBehind {
async set(key: string, value: Data): Promise<void> {
// Write to cache immediately
await cache.set(key, value);
// Queue DB update
await writeQueue.push({ key, value });
}
private async processWriteQueue() {
while (true) {
const { key, value } = await writeQueue.pop();
await db.set(key, value);
}
}
}
3. Database Sharding
Theory
Sharding is a database architecture pattern that splits large databases into smaller, faster, more manageable pieces called shards based on some partition key.
Sharding Strategies
- Hash-Based Sharding
class HashSharding {
private shardCount: number;
constructor(shardCount: number) {
this.shardCount = shardCount;
}
getShardId(key: string): number {
const hash = createHash(key);
return hash % this.shardCount;
}
}
- Range-Based Sharding
class RangeSharding {
private ranges: Array<{ min: number; max: number; shardId: number }>;
getShardId(value: number): number {
const shard = this.ranges.find(range =>
value >= range.min && value <= range.max
);
return shard.shardId;
}
}
4. Message Queues
Theory
Message queues provide asynchronous communication and decoupling between components in a distributed system.
Patterns
- Publisher-Subscriber
class PubSub {
private subscribers: Map<string, Function[]> = new Map();
publish(topic: string, message: any) {
const topicSubscribers = this.subscribers.get(topic) || [];
topicSubscribers.forEach(subscriber => subscriber(message));
}
subscribe(topic: string, callback: Function) {
if (!this.subscribers.has(topic)) {
this.subscribers.set(topic, []);
}
this.subscribers.get(topic).push(callback);
}
}
- Point-to-Point
class Queue {
private messages: any[] = [];
async send(message: any) {
this.messages.push(message);
}
async receive(): Promise<any> {
return this.messages.shift();
}
}
5. Consistent Hashing
Theory
Consistent hashing is a technique used to distribute data across nodes in a way that minimizes reorganization when nodes are added or removed.
Implementation
class ConsistentHashing {
private nodes: Map<number, string> = new Map();
private replicas: number;
constructor(replicas: number) {
this.replicas = replicas;
}
addNode(node: string) {
for (let i = 0; i < this.replicas; i++) {
const hash = this.getHash(`${node}-${i}`);
this.nodes.set(hash, node);
}
}
removeNode(node: string) {
for (let i = 0; i < this.replicas; i++) {
const hash = this.getHash(`${node}-${i}`);
this.nodes.delete(hash);
}
}
getNode(key: string): string {
const hash = this.getHash(key);
const nodeHashes = Array.from(this.nodes.keys()).sort((a, b) => a - b);
for (const nodeHash of nodeHashes) {
if (nodeHash >= hash) {
return this.nodes.get(nodeHash);
}
}
// Wrap around to first node if key hash is larger than all node hashes
return this.nodes.get(nodeHashes[0]);
}
}
6. CAP Theorem
Theory
The CAP theorem states that a distributed system can only provide two of the following three guarantees:
- Consistency: All nodes see the same data at the same time
- Availability: Every request receives a response
- Partition tolerance: System continues to operate despite network failures
Application
interface SystemRequirements {
consistency: 'strong' | 'eventual';
availability: 'high' | 'medium' | 'low';
partitionTolerance: boolean;
}
class SystemDesign {
determineSystemType(requirements: SystemRequirements) {
if (requirements.partitionTolerance) {
// In practice, partition tolerance is required
if (requirements.consistency === 'strong') {
return 'CP System (e.g., MongoDB, HBase)';
} else {
return 'AP System (e.g., Cassandra, DynamoDB)';
}
}
return 'CA System (Traditional RDBMS, single node)';
}
}
7. Rate Limiting
Theory
Rate limiting is a strategy to control the rate of requests a client can make to a service to prevent abuse and ensure fair usage.
Algorithms
- Token Bucket
class TokenBucket {
private tokens: number;
private lastRefill: number;
private capacity: number;
private refillRate: number;
constructor(capacity: number, refillRate: number) {
this.capacity = capacity;
this.refillRate = refillRate;
this.tokens = capacity;
this.lastRefill = Date.now();
}
tryConsume(): boolean {
this.refill();
if (this.tokens >= 1) {
this.tokens--;
return true;
}
return false;
}
private refill() {
const now = Date.now();
const timePassed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(
this.capacity,
this.tokens + timePassed * this.refillRate
);
this.lastRefill = now;
}
}
- Leaky Bucket
class LeakyBucket {
private queue: any[];
private capacity: number;
private processRate: number;
constructor(capacity: number, processRate: number) {
this.queue = [];
this.capacity = capacity;
this.processRate = processRate;
this.startProcessing();
}
tryAdd(request: any): boolean {
if (this.queue.length < this.capacity) {
this.queue.push(request);
return true;
}
return false;
}
private startProcessing() {
setInterval(() => {
if (this.queue.length > 0) {
const request = this.queue.shift();
this.processRequest(request);
}
}, 1000 / this.processRate);
}
}
8. Back-of-the-envelope Calculations
Theory
Quick, approximate calculations used to estimate system requirements and validate design decisions.
Common Calculations
- Storage Requirements
function calculateStorageNeeds(params: {
dailyActiveUsers: number;
averageDataPerUser: number; // in bytes
retentionDays: number;
replicationFactor: number;
}): number {
const dailyStorage = params.dailyActiveUsers * params.averageDataPerUser;
const totalStorage = dailyStorage * params.retentionDays * params.replicationFactor;
return totalStorage;
}
- Bandwidth Requirements
function calculateBandwidth(params: {
requestsPerSecond: number;
averageRequestSize: number; // in bytes
averageResponseSize: number; // in bytes
}): number {
const totalBytesPerSecond = params.requestsPerSecond *
(params.averageRequestSize + params.averageResponseSize);
const bandwidthMbps = (totalBytesPerSecond * 8) / 1000000;
return bandwidthMbps;
}
Best Practices for System Design
1. Start Simple
- Begin with a minimal viable solution
- Add complexity only when needed
- Document assumptions and constraints
2. Consider Scalability
- Horizontal vs Vertical scaling
- Stateless services when possible
- Caching strategies
3. Plan for Failure
- Design for failures at every layer
- Implement proper monitoring
- Have fallback mechanisms
4. Make Data Decisions Early
- Choose appropriate storage solutions
- Plan data partitioning strategy
- Consider data access patterns
Interview Tips
- Clarify Requirements
class RequirementGathering {
gatherRequirements() {
return {
functional: this.getFunctionalRequirements(),
nonFunctional: this.getNonFunctionalRequirements(),
constraints: this.getConstraints()
};
}
private getFunctionalRequirements() {
// Core features
// User interactions
// System behaviors
}
private getNonFunctionalRequirements() {
// Performance
// Scalability
// Reliability
// Security
}
private getConstraints() {
// Time
// Budget
// Technology
// Team
}
}
- Systematic Approach
class SystemDesignApproach {
designSystem() {
// 1. Requirements gathering
const requirements = this.gatherRequirements();
// 2. High-level design
const architecture = this.createHighLevelDesign(requirements);
// 3. Detailed design
const detailedDesign = this.createDetailedDesign(architecture);
// 4. Identify bottlenecks
const bottlenecks = this.identifyBottlenecks(detailedDesign);
// 5. Scaling solutions
const scalingSolutions = this.proposeSolutions(bottlenecks);
return {
requirements,
architecture,
detailedDesign,
bottlenecks,
scalingSolutions
};
}
}
System Design Interview Guide
Introduction
This guide provides a structured approach to tackling system design interviews. Each section includes key points to cover, examples, and calculation methods.
1. Feature Expectations (5 minutes)
1.1 Use Cases
- List primary user stories
- Define core functionality
- Identify key features
- Determine MVP requirements
Example for a URL Shortener:
Primary Use Cases:
1. Generate short URL from long URL
2. Redirect short URL to original URL
3. Custom short URLs (optional)
4. Analytics (optional)
1.2 Out of Scope
- List features explicitly not covered
- Define system boundaries
- Identify future considerations
Example:
Out of Scope:
1. User authentication
2. URL preview
3. API rate limiting (initially)
1.3 Users
- Define user types
- User roles and permissions
- Geographic distribution
- User behavior patterns
1.4 Scale
- Current users
- Growth projections
- Peak vs. average load
- Geographic distribution
1.5 Usage Patterns
- Daily active users
- Peak hours
- Seasonal variations
- User interaction frequency
2. Estimations (5 minutes)
2.1 Traffic Estimates
Read QPS:
const calculateReadQPS = () => {
const dailyActiveUsers = 1000000;
const avgReadsPerUser = 10;
const secondsInDay = 86400;
const averageQPS = (dailyActiveUsers * avgReadsPerUser) / secondsInDay;
const peakQPS = averageQPS * 2; // 2x for peak
return { averageQPS, peakQPS };
};
Write QPS:
const calculateWriteQPS = () => {
const dailyActiveUsers = 1000000;
const avgWritesPerUser = 2;
const secondsInDay = 86400;
const averageQPS = (dailyActiveUsers * avgWritesPerUser) / secondsInDay;
const peakQPS = averageQPS * 2;
return { averageQPS, peakQPS };
};
2.2 Storage Estimates
const calculateStorageNeeds = () => {
const dailyNewRecords = 1000000;
const recordSize = 1000; // bytes
const daysToKeep = 365;
const totalStorage = dailyNewRecords * recordSize * daysToKeep;
return totalStorage;
};
2.3 Memory Estimates
const calculateCacheNeeds = () => {
const qps = 1000;
const cacheHitRatio = 0.8;
const avgRecordSize = 1000; // bytes
const cacheDuration = 3600; // 1 hour
const cacheSize = qps * (1 - cacheHitRatio) * avgRecordSize * cacheDuration;
return cacheSize;
};
3. Design Goals (5 minutes)
3.1 Performance Requirements
- Read latency: < 100ms
- Write latency: < 200ms
- Availability: 99.99%
- Consistency requirements
- Durability guarantees
3.2 CAP Theorem Considerations
type SystemType = 'CP' | 'AP';
interface SystemRequirements {
consistency: 'strong' | 'eventual';
availability: number; // percentage
partitionTolerance: boolean;
}
const determineSystemType = (requirements: SystemRequirements): SystemType => {
return requirements.consistency === 'strong' ? 'CP' : 'AP';
};
4. High-Level Design (5-10 minutes)
4.1 API Design
// RESTful API Example
interface APIEndpoints {
// Write APIs
POST: {
'/api/v1/resource': {
request: ResourceCreationRequest;
response: ResourceCreationResponse;
};
};
// Read APIs
GET: {
'/api/v1/resource/:id': {
params: { id: string };
response: ResourceResponse;
};
};
}
4.2 Database Schema
-- Example Schema
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE resources (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
data JSONB NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
4.3 Basic Algorithm
class DataProcessor {
async process(data: InputData): Promise<ProcessedData> {
// 1. Validate input
this.validate(data);
// 2. Transform data
const transformed = await this.transform(data);
// 3. Store data
const stored = await this.store(transformed);
// 4. Cache results
await this.cache(stored);
return stored;
}
}
5. Deep Dive (15-20 minutes)
5.1 Component Scaling
Load Balancer Configuration
interface LoadBalancer {
algorithm: 'round-robin' | 'least-connections' | 'ip-hash';
healthCheck: {
path: string;
interval: number;
timeout: number;
unhealthyThreshold: number;
};
ssl: {
enabled: boolean;
cert: string;
key: string;
};
}
Database Sharding
class ShardManager {
private shardCount: number;
constructor(shardCount: number) {
this.shardCount = shardCount;
}
getShardId(key: string): number {
const hash = this.hashFunction(key);
return hash % this.shardCount;
}
}
5.2 Caching Strategy
interface CacheConfig {
strategy: 'cache-aside' | 'write-through' | 'write-behind';
ttl: number;
maxSize: number;
}
class CacheManager {
async get(key: string): Promise<Data> {
const cached = await this.cache.get(key);
if (cached) return cached;
const data = await this.db.get(key);
await this.cache.set(key, data);
return data;
}
}
5.3 Message Queue Architecture
interface QueueConfig {
type: 'kafka' | 'rabbitmq' | 'sqs';
partitions: number;
replicationFactor: number;
retentionPeriod: number;
}
6. Justification (5 minutes)
6.1 Performance Analysis
interface LayerMetrics {
avgLatency: number;
p95Latency: number;
p99Latency: number;
throughput: number;
}
const calculateSystemMetrics = (layers: LayerMetrics[]): SystemMetrics => {
return {
totalLatency: layers.reduce((sum, layer) => sum + layer.avgLatency, 0),
bottlenecks: layers.filter(layer => layer.throughput < targetThroughput)
};
};
6.2 Back-of-the-envelope Calculations
Network Bandwidth
const calculateBandwidth = () => {
const requestSize = 1000; // bytes
const qps = 1000;
const bytesPerSecond = requestSize * qps;
const megabitsPerSecond = (bytesPerSecond * 8) / 1000000;
return megabitsPerSecond;
};
Database IOPS
const calculateIOPS = () => {
const writeQPS = 100;
const readQPS = 1000;
const replicationFactor = 3;
const totalIOPS = (writeQPS * replicationFactor) + readQPS;
return totalIOPS;
};
Best Practices
-
Start with Scale
- Begin with rough numbers
- Use powers of 10
- Consider growth
-
Clarify Requirements
- Ask questions
- State assumptions
- Define scope
-
Draw Diagrams
- Start simple
- Add complexity gradually
- Show data flow
-
Consider Trade-offs
- Cost vs Performance
- Consistency vs Availability
- Complexity vs Maintainability
Common Pitfalls to Avoid
-
Over-engineering
- Don't add unnecessary complexity
- Start simple, then scale
-
Ignoring Requirements
- Listen carefully to requirements
- Validate assumptions
-
Skipping Calculations
- Always do the math
- Verify numbers make sense
-
Not Considering Edge Cases
- Think about failure scenarios
- Plan for system degradation