System Design Notes
Table of Contents
- Introduction to System Design
- High-Level Design (HLD)
- Low-Level Design (LLD)
- System Design Fundamentals
- Interview Templates
- Common Design Patterns
- Case Studies
- Checklists and Best Practices
Introduction to System Design
System design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. It involves two main levels:
- High-Level Design (HLD): System architecture, major components, and their interactions
- Low-Level Design (LLD): Detailed design of individual components, classes, and algorithms
Why System Design Matters
- Scalability: Handle growing user base and data
- Reliability: Ensure system uptime and fault tolerance
- Performance: Optimize for speed and efficiency
- Maintainability: Easy to modify and extend
- Cost-effectiveness: Optimal resource utilization
High-Level Design (HLD)
Definition
HLD provides a bird's-eye view of the entire system, focusing on:
- System architecture and major components
- Data flow between components
- Technology stack decisions
- Infrastructure requirements
- Scalability and reliability strategies
Key Components of HLD
1. System Architecture
┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│   Client    │───▶│ Load Balancer│───▶│ Web Servers │
│ (Web/Mobile)│    │              │    │             │
└─────────────┘    └──────────────┘    └─────────────┘
                                              │
                                              ▼
                             ┌─────────────────────────────────┐
                             │       Application Servers       │
                             └─────────────────────────────────┘
                                              │
                                              ▼
                             ┌─────────────────────────────────┐
                             │         Database Layer          │
                             │  ┌─────────┐  ┌─────────────┐   │
                             │  │ Primary │  │    Cache    │   │
                             │  │   DB    │  │   (Redis)   │   │
                             │  └─────────┘  └─────────────┘   │
                             └─────────────────────────────────┘
2. Core Components
Load Balancer
- Distributes incoming requests
- Types: Layer 4 (TCP) vs Layer 7 (HTTP)
- Algorithms: Round Robin, Weighted, Least Connections
Web Servers
- Handle HTTP requests
- Serve static content
- Examples: Nginx, Apache
Application Servers
- Business logic execution
- API endpoints
- Examples: Node.js, Spring Boot, Django
Database Layer
- Primary database (RDBMS/NoSQL)
- Read replicas
- Caching layer
Message Queues
- Asynchronous processing
- Decoupling services
- Examples: RabbitMQ, Apache Kafka
3. HLD Design Process
1. Requirement Analysis
   - Functional requirements
   - Non-functional requirements (NFRs)
   - Scale estimation
2. Capacity Estimation
   - Traffic patterns
   - Storage requirements
   - Bandwidth calculations
3. Architecture Design
   - Choose architectural pattern
   - Define major components
   - Plan data flow
4. Technology Selection
   - Database choice
   - Programming languages
   - Infrastructure decisions
HLD Example: URL Shortener (like bit.ly)
Requirements:
- Shorten long URLs
- Redirect short URLs to original
- 100M URLs/day, 100:1 read/write ratio
┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│   Client    │───▶│Load Balancer │───▶│ Web Servers │
└─────────────┘    └──────────────┘    └─────────────┘
                                              │
                                              ▼
                                     ┌─────────────────┐
                                     │   App Servers   │
                                     │  - URL encoding │
                                     │  - URL decoding │
                                     │  - Analytics    │
                                     └─────────────────┘
                                              │
                                              ▼
┌─────────────────┐                  ┌─────────────────┐
│      Cache      │◀─────────────────┤    Database     │
│     (Redis)     │                  │  - URL mappings │
│   - Hot URLs    │                  │  - Analytics    │
│   - TTL based   │                  │  - User data    │
└─────────────────┘                  └─────────────────┘
Low-Level Design (LLD)
Definition
LLD provides detailed design of individual components, focusing on:
- Class diagrams and relationships
- API designs
- Database schemas
- Algorithms and data structures
- Interface definitions
Key Components of LLD
1. Class Design
// URL Shortener LLD Example
public class URLShortenerService {
    // Cache TTL in seconds; the one-hour value is an illustrative assumption
    private static final int TTL_SECONDS = 3600;

    private URLRepository urlRepository;
    private CacheService cacheService;
    // Hypothetical ID source used by generateUniqueShortCode(); it was referenced but never declared
    private CounterService counterService;
    private Base62Encoder encoder;

    public ShortenURLResponse shortenURL(ShortenURLRequest request) {
        // Validate URL
        if (!isValidURL(request.getOriginalUrl())) {
            throw new InvalidURLException("Invalid URL provided");
        }
        // Check if URL already exists
        String existingShortCode = urlRepository.findShortCodeByOriginalUrl(
            request.getOriginalUrl()
        );
        if (existingShortCode != null) {
            return new ShortenURLResponse(existingShortCode);
        }
        // Generate unique short code
        String shortCode = generateUniqueShortCode();
        // Save mapping
        URLMapping mapping = new URLMapping(
            shortCode,
            request.getOriginalUrl(),
            request.getUserId(),
            System.currentTimeMillis()
        );
        urlRepository.save(mapping);
        return new ShortenURLResponse(shortCode);
    }

    public String expandURL(String shortCode) {
        // Check cache first
        String cachedUrl = cacheService.get(shortCode);
        if (cachedUrl != null) {
            return cachedUrl;
        }
        // Query database
        URLMapping mapping = urlRepository.findByShortCode(shortCode);
        if (mapping == null) {
            throw new URLNotFoundException("Short URL not found");
        }
        // Cache the result
        cacheService.put(shortCode, mapping.getOriginalUrl(), TTL_SECONDS);
        return mapping.getOriginalUrl();
    }

    // Minimal check: accepts only absolute http(s) URLs with a host
    private boolean isValidURL(String url) {
        try {
            java.net.URI uri = new java.net.URI(url);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (java.net.URISyntaxException | NullPointerException e) {
            return false;
        }
    }

    private String generateUniqueShortCode() {
        // Implementation using counter or random generation
        long id = counterService.getNextId();
        return encoder.encode(id);
    }
}

// Data Models
public class URLMapping {
    private String shortCode;
    private String originalUrl;
    private String userId;
    private long createdAt;
    private long expiresAt;
    // constructors, getters, setters
}

public class ShortenURLRequest {
    private String originalUrl;
    private String userId;
    private long ttl; // Time to live
    // constructors, getters, setters
}
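The service above delegates to a Base62Encoder that is never shown. A minimal sketch of what such an encoder could look like follows; the class name Base62Sketch, the alphabet ordering, and the decode helper are assumptions for illustration, not part of the original design.

```java
// Hypothetical stand-in for the Base62Encoder used by URLShortenerService.
final class Base62Sketch {
    // Digit order is an assumption: 0-9, then A-Z, then a-z.
    private static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Converts a non-negative numeric ID into its base-62 string form.
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        }
        // Digits were produced least-significant first, so reverse them.
        return sb.reverse().toString();
    }

    // Inverse of encode; rejects characters outside the alphabet.
    static long decode(String code) {
        long id = 0;
        for (char c : code.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + c);
            }
            id = id * 62 + digit;
        }
        return id;
    }
}
```

With 7 characters this scheme covers 62^7 (about 3.5 trillion) distinct codes, which at 100M new URLs per day lasts roughly 96 years before the code length has to grow.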
2. Database Schema Design
-- URL Mappings Table
CREATE TABLE url_mappings (
    short_code VARCHAR(7) PRIMARY KEY,
    original_url TEXT NOT NULL,
    user_id VARCHAR(36),
    created_at BIGINT NOT NULL,
    expires_at BIGINT,
    click_count BIGINT DEFAULT 0,
    INDEX idx_user_id (user_id),
    INDEX idx_created_at (created_at)
);

-- Analytics Table
CREATE TABLE url_analytics (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    short_code VARCHAR(7) NOT NULL,
    ip_address VARCHAR(45),
    user_agent TEXT,
    referer TEXT,
    country VARCHAR(2),
    clicked_at BIGINT NOT NULL,
    FOREIGN KEY (short_code) REFERENCES url_mappings(short_code),
    INDEX idx_short_code_time (short_code, clicked_at)
);

-- Users Table
CREATE TABLE users (
    user_id VARCHAR(36) PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at BIGINT NOT NULL,
    subscription_type ENUM('FREE', 'PREMIUM') DEFAULT 'FREE'
);
3. API Design
# OpenAPI Specification
openapi: 3.0.0
info:
  title: URL Shortener API
  version: 1.0.0
paths:
  /api/v1/shorten:
    post:
      summary: Shorten a URL
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                url:
                  type: string
                  format: uri
                customCode:
                  type: string
                  minLength: 4
                  maxLength: 7
                ttl:
                  type: integer
                  description: Time to live in seconds
              required:
                - url
      responses:
        '200':
          description: URL shortened successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  shortCode:
                    type: string
                  shortUrl:
                    type: string
                  originalUrl:
                    type: string
        '400':
          description: Invalid request
        '409':
          description: Custom code already exists
  /api/v1/expand/{shortCode}:
    get:
      summary: Expand a short URL
      parameters:
        - name: shortCode
          in: path
          required: true
          schema:
            type: string
      responses:
        '302':
          description: Redirect to original URL
          headers:
            Location:
              schema:
                type: string
        '404':
          description: Short URL not found
  /api/v1/analytics/{shortCode}:
    get:
      summary: Get URL analytics
      parameters:
        - name: shortCode
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Analytics data
          content:
            application/json:
              schema:
                type: object
                properties:
                  totalClicks:
                    type: integer
                  clicksToday:
                    type: integer
                  topCountries:
                    type: array
                    items:
                      type: object
System Design Fundamentals
1. Scalability Patterns
Horizontal vs Vertical Scaling
Vertical Scaling (Scale Up)      Horizontal Scaling (Scale Out)
┌─────────────────┐              ┌─────┐ ┌─────┐ ┌─────┐
│                 │              │     │ │     │ │     │
│   More Power    │      vs      │ App │ │ App │ │ App │
│  Same Machine   │              │     │ │     │ │     │
│                 │              └─────┘ └─────┘ └─────┘
└─────────────────┘
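Load Balancing Strategies
- Round Robin: Requests go to each server in turn, giving an equal distribution
- Weighted Round Robin: Servers with more capacity receive proportionally more requests
- Least Connections: Route to the server with the fewest active connections
- IP Hash: Hash the client IP so the same client keeps reaching the same server
- Health Checks: Not a routing algorithm in itself; unhealthy servers are removed from rotation so the strategies above only choose among live servers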
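Two of the strategies listed above, Round Robin and Least Connections, can be sketched in a few lines. This is an illustrative sketch only: the Backend, RoundRobinBalancer, and LeastConnectionsBalancer names are hypothetical, and a real balancer would also track health and update connection counts as requests start and finish.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical backend server with a live connection counter.
final class Backend {
    final String name;
    final AtomicInteger activeConnections = new AtomicInteger();

    Backend(String name) {
        this.name = name;
    }
}

// Round Robin: hand out backends in a fixed rotation.
final class RoundRobinBalancer {
    private final List<Backend> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<Backend> backends) {
        this.backends = List.copyOf(backends);
    }

    Backend pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index);
    }
}

// Least Connections: choose the backend with the fewest active connections.
final class LeastConnectionsBalancer {
    private final List<Backend> backends;

    LeastConnectionsBalancer(List<Backend> backends) {
        this.backends = List.copyOf(backends);
    }

    Backend pick() {
        Backend best = backends.get(0);
        for (Backend candidate : backends) {
            if (candidate.activeConnections.get() < best.activeConnections.get()) {
                best = candidate;
            }
        }
        return best;
    }
}
```

Round Robin needs no per-request state beyond a counter, which is why it is a common default; Least Connections tends to behave better when request durations vary widely.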
Database Scaling
Read Replicas Pattern:
┌────────────┐    Write     ┌─────────────┐
│Application │─────────────▶│ Primary DB  │
│  Server    │              │             │
└────────────┘              └─────────────┘
       │                           │
       │                      Replication
       │                           ▼
       │      Read      ┌─────────────────────┐
       └───────────────▶│    Read Replicas    │
                        │  ┌─────┐  ┌─────┐   │
                        │  │ DB1 │  │ DB2 │   │
                        │  └─────┘  └─────┘   │
                        └─────────────────────┘
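2. Consistency Patterns
CAP Theorem
- Consistency: Every read sees the most recent write, so all nodes return the same data
- Availability: Every request to a non-failing node receives a response
- Partition Tolerance: The system keeps operating when the network drops or delays messages between nodes
A distributed system cannot guarantee all three at once. Network partitions cannot be ruled out in practice, so the real choice during a partition is between consistency (CP) and availability (AP).
Consistency Models
- Strong Consistency: A read that follows a completed write always returns that write
- Eventual Consistency: If writes stop, all replicas converge to the same value over time
- Weak Consistency: No guarantee about when, or whether, a read will reflect a prior write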
3. Caching Strategies
Cache Patterns:
1. Cache-Aside (Lazy Loading)
┌─────────────┐   Cache Miss   ┌─────────┐    Query     ┌──────────┐
│ Application │───────────────▶│  Cache  │              │ Database │
└─────────────┘                └─────────┘              └──────────┘
       │                            ▲                        ▲
       └────────────────────────────┼────────────────────────┘
                       Update Cache │ Return Data
2. Write-Through
┌─────────────┐     Write      ┌─────────┐    Write     ┌──────────┐
│ Application │───────────────▶│  Cache  │─────────────▶│ Database │
└─────────────┘                └─────────┘              └──────────┘
3. Write-Behind (Write-Back)
┌─────────────┐     Write      ┌─────────┐  Async Write ┌──────────┐
│ Application │───────────────▶│  Cache  │─────────────▶│ Database │
└─────────────┘                └─────────┘              └──────────┘
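The first two patterns above differ mainly in who populates the cache and when. A minimal sketch, assuming in-memory maps stand in for both Redis and the database (the CachePatternsSketch class and its method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// In-memory maps stand in for the cache and the database in this sketch.
final class CachePatternsSketch {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> database = new HashMap<>();
    int databaseReads = 0; // counts how often a read fell through to the database

    // Cache-Aside: the application checks the cache, and on a miss it
    // queries the database itself and then fills the cache.
    String readCacheAside(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String value = database.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    // Write-Through: every write goes to the database and the cache
    // in the same operation, so the cache never lags behind.
    void writeThrough(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }
}
```

Write-Behind would instead update the cache immediately and queue the database write for later, trading durability for lower write latency; it is omitted here because it needs a background worker.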
4. Database Design Patterns
SQL vs NoSQL Decision Matrix
| Factor | SQL | NoSQL |
|---|---|---|
| Schema | Fixed schema | Flexible schema |
| ACID | Full ACID support | Varies by store; often eventual consistency |
| Scaling | Vertical (primarily) | Horizontal |
| Queries | Complex queries (JOIN) | Simple queries |
| Use Cases | Financial, Traditional apps | Real-time, Big data |
Database Sharding
Horizontal Partitioning (Sharding):
User Data Distribution by User ID:
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│   Shard 1   │   │   Shard 2   │   │   Shard 3   │
│ Users 0-33% │   │Users 34-66% │   │Users 67-100%│
└─────────────┘   └─────────────┘   └─────────────┘
Sharding Key Selection:
- Range-based: Partition by value ranges
- Hash-based: Partition by hash function
- Directory-based: Lookup service for shard location
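Range-based and hash-based selection from the list above can be sketched as two small routing functions. The ShardRouterSketch class, the choice of CRC32 as the hash, and the boundary array are assumptions for illustration; any stable hash would serve the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ShardRouterSketch {
    // Hash-based: a stable hash of the key, reduced modulo the shard count.
    static int hashShard(String userId, int shardCount) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        // getValue() is an unsigned 32-bit value held in a long, so it is never negative.
        return (int) (crc.getValue() % shardCount);
    }

    // Range-based: upperBounds[i] is the largest key stored on shard i,
    // and the array must be sorted in ascending order.
    static int rangeShard(long key, long[] upperBounds) {
        for (int shard = 0; shard < upperBounds.length; shard++) {
            if (key <= upperBounds[shard]) {
                return shard;
            }
        }
        throw new IllegalArgumentException("key is above the last range: " + key);
    }
}
```

Plain modulo hashing remaps most keys whenever the shard count changes, which is why consistent hashing is a common alternative when shards are added or removed often. Range-based routing keeps neighbouring keys together, which helps range scans but can create hot shards.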
Interview Templates
Template 1: System Design Interview Structure (45-60 minutes)
Phase 1: Requirements Gathering (10 minutes)
Questions to Ask:
□ What are the core features needed?
□ How many users are expected?
□ What's the scale (reads vs writes)?
□ What's the latency requirement?
□ Do we need to handle failures?
□ Any specific technology constraints?
Example Clarification:
"For a URL shortener:
- Do we need custom URLs?
- Should URLs expire?
- Do we need analytics?
- What's the expected QPS?"
Phase 2: Capacity Estimation (10 minutes)
Estimation Template:
□ Daily Active Users (DAU)
□ Queries Per Second (QPS)
  - Write QPS = DAU * writes_per_user / seconds_per_day
  - Read QPS = Write QPS * read_to_write_ratio
□ Storage Requirements
  - Data per record * records_per_day * retention_days
□ Bandwidth Requirements
  - QPS * average_request_size
Example Calculation:
"URL Shortener with 100M URLs/day:
- Write QPS: 100M / 86400 = ~1200 QPS
- Read QPS: 1200 * 100 = 120K QPS
- Storage: 500 bytes * 100M * 365 = ~18TB/year"
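The example calculation above can be reproduced in code, which helps when the inputs change during an interview. This is a sketch under the same stated inputs (100M writes per day, 100:1 read ratio, 500 bytes per record); the CapacityEstimateSketch class and its method names are hypothetical, and terabytes are decimal (10^12 bytes).

```java
final class CapacityEstimateSketch {
    static final long SECONDS_PER_DAY = 86_400L;

    // Average write QPS, rounded down; peak traffic is usually higher.
    static long writeQps(long writesPerDay) {
        return writesPerDay / SECONDS_PER_DAY;
    }

    // Read QPS derived from the read-to-write ratio.
    static long readQps(long writeQps, long readToWriteRatio) {
        return writeQps * readToWriteRatio;
    }

    // Storage added per year in decimal terabytes.
    static double storageTerabytesPerYear(long bytesPerRecord, long recordsPerDay) {
        return bytesPerRecord * (double) recordsPerDay * 365 / 1e12;
    }

    public static void main(String[] args) {
        long write = writeQps(100_000_000L);
        System.out.println("Write QPS: " + write);                // 1157, rounded to ~1200 above
        System.out.println("Read QPS: " + readQps(write, 100));   // 115700, rounded to ~120K above
        System.out.println("Storage TB/year: "
            + storageTerabytesPerYear(500, 100_000_000L));        // 18.25
    }
}
```

The exact figures (1157 and 115,700) are slightly below the rounded interview numbers; rounding up is the usual convention because it leaves headroom.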
Phase 3: High-Level Design (15 minutes)
Design Steps:
□ Draw basic architecture
□ Identify major components
□ Show data flow
□ Discuss technology choices
Components Checklist:
□ Load Balancer
□ Web Servers
□ Application Servers
□ Database (Primary/Replica)
□ Cache Layer
□ Message Queues (if needed)
□ CDN (if needed)
Phase 4: Deep Dive - Database Design (10 minutes)
Database Design Template:
□ Define main entities
□ Create table schemas
□ Define relationships
□ Consider indexing strategy
□ Discuss partitioning/sharding
Schema Template:
table_name (
    primary_key TYPE PRIMARY KEY,
    column1 TYPE constraints,
    column2 TYPE constraints,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    INDEX idx_name (columns),
    FOREIGN KEY constraints
)
Phase 5: Scaling and Reliability (10 minutes)
Scaling Checklist:
□ How to handle increased load?
□ Database scaling strategy
□ Caching strategy
□ CDN usage
□ Load balancing
Reliability Checklist:
□ Single points of failure
□ Data backup strategy
□ Disaster recovery
□ Monitoring and alerting
□ Circuit breakers
Template 2: API Design Template
# Standard API Design Template
paths:
  /api/v1/resource:
    get:
      summary: Get resources
      parameters:
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 100
        - name: offset
          in: query
          schema:
            type: integer
            default: 0
      responses:
        '200':
          description: Success
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/components/schemas/Resource'
                  pagination:
                    $ref: '#/components/schemas/Pagination'
        '400':
          $ref: '#/components/responses/BadRequest'
        '500':
          $ref: '#/components/responses/InternalError'
    post:
      summary: Create resource
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateResourceRequest'
      responses:
        '201':
          description: Created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Resource'
        '400':
          $ref: '#/components/responses/BadRequest'
Template 3: Class Design Template
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import org.springframework.stereotype.Service;

// Service Layer Template
@Service
public class ResourceService {
    // Cache TTL in seconds; the one-hour value is an illustrative assumption
    private static final int TTL_SECONDS = 3600;

    private final ResourceRepository repository;
    private final CacheService cacheService;
    private final ValidationService validationService;

    public ResourceService(
        ResourceRepository repository,
        CacheService cacheService,
        ValidationService validationService
    ) {
        this.repository = repository;
        this.cacheService = cacheService;
        this.validationService = validationService;
    }

    public CreateResourceResponse createResource(CreateResourceRequest request) {
        // 1. Validate input
        validationService.validate(request);
        // 2. Business logic
        // getUserId() is assumed on the request; the Resource constructor requires a user ID
        Resource resource = new Resource(
            generateId(),
            request.getName(),
            request.getDescription(),
            request.getUserId(),
            System.currentTimeMillis()
        );
        // 3. Persist
        Resource savedResource = repository.save(resource);
        // 4. Cache (same TTL as the read path)
        cacheService.put(getCacheKey(savedResource.getId()), savedResource, TTL_SECONDS);
        // 5. Return response
        return new CreateResourceResponse(savedResource);
    }

    public GetResourceResponse getResource(String resourceId) {
        // 1. Check cache
        Resource cachedResource = cacheService.get(getCacheKey(resourceId));
        if (cachedResource != null) {
            return new GetResourceResponse(cachedResource);
        }
        // 2. Query database
        Resource resource = repository.findById(resourceId)
            .orElseThrow(() -> new ResourceNotFoundException(resourceId));
        // 3. Cache result
        cacheService.put(getCacheKey(resourceId), resource, TTL_SECONDS);
        return new GetResourceResponse(resource);
    }

    // Random UUIDs are one simple ID scheme; swap in whatever the system uses
    private String generateId() {
        return UUID.randomUUID().toString();
    }

    private String getCacheKey(String resourceId) {
        return "resource:" + resourceId;
    }
}

// Repository Interface Template
public interface ResourceRepository {
    Resource save(Resource resource);
    Optional<Resource> findById(String id);
    List<Resource> findByUserId(String userId, int limit, int offset);
    void deleteById(String id);
    boolean existsById(String id);
}

// Model Template
public class Resource {
    private final String id;
    private String name;
    private String description;
    private final String userId;
    private final long createdAt;
    private long updatedAt;

    public Resource(String id, String name, String description, String userId, long createdAt) {
        this.id = id;
        this.name = name;
        this.description = description;
        this.userId = userId;
        this.createdAt = createdAt;
        this.updatedAt = createdAt;
    }

    // Getters and business methods
    public void updateDetails(String newName, String newDescription) {
        this.name = newName;
        this.description = newDescription;
        this.updatedAt = System.currentTimeMillis();
    }
}
Common Design Patterns
1. Microservices Patterns
Service Decomposition
Decomposition Strategies:
□ By Business Capability
□ By Domain (DDD)
□ By Transaction
□ By Team Structure (Conway's Law)
Example: E-commerce Decomposition
┌─────────────────────────────────────────────────────────┐
│                       API Gateway                       │
└────────────────────────────┬────────────────────────────┘
                             │
     ┌───────────┬───────────┼───────────┬───────────┐
     │           │           │           │           │
     ▼           ▼           ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│  User   │ │ Product │ │Inventory│ │  Order  │ │ Payment │
│ Service │ │ Service │ │ Service │ │ Service │ │ Service │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
     │           │           │           │           │
     ▼           ▼           ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ User DB │ │ Product │ │Inventory│ │Order DB │ │ Payment │
│         │ │   DB    │ │   DB    │ │         │ │   DB    │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
Communication Patterns
1. Synchronous Communication
Client ──HTTP──▶ Service A ──HTTP──▶ Service B
2. Asynchronous Communication
Service A ──Message──▶ Queue ──Message──▶ Service B
3. Event-Driven Architecture
Service A ──Event──▶ Event Bus ──Event──▶ Multiple Services
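The event-driven variant above differs from a point-to-point queue in that one published event fans out to every interested service. A minimal in-process sketch follows; EventBusSketch and its methods are hypothetical, and delivery here is synchronous, whereas a broker such as RabbitMQ or Kafka would deliver asynchronously and across processes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-process stand-in for an event bus; not thread-safe.
final class EventBusSketch {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Registers a handler that receives every payload published to the topic.
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Delivers the payload to all handlers of the topic and returns how many received it.
    int publish(String topic, String payload) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(topic, List.of());
        for (Consumer<String> handler : handlers) {
            handler.accept(payload);
        }
        return handlers.size();
    }
}
```

The publisher never learns which services consume the event, which is the decoupling the pattern is after: adding a new consumer is a new subscribe call rather than a change to Service A.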
2. Data Management Patterns
Database per Service
{
  "pattern": "Database per Service",
  "benefits": [
    "Service independence",
    "Technology diversity",
    "Fault isolation"
  ],
  "challenges": [
    "Data consistency",
    "Complex queries across services",
    "Data duplication"
  ],
  "solutions": {
    "consistency": "Saga Pattern",
    "queries": "CQRS + Event Sourcing",
    "duplication": "Eventual consistency"
  }
}
CQRS (Command Query Responsibility Segregation)
Write Side (Commands):          Read Side (Queries):
┌─────────────┐                 ┌──────────────┐
│   Command   │                 │    Query     │
│   Handler   │                 │   Handler    │
└─────────────┘                 └──────────────┘
       │                               │
       ▼                               ▼
┌─────────────┐     Events      ┌──────────────┐
│  Write DB   │────────────────▶│   Read DB    │
│(Normalized) │                 │(Denormalized)│
└─────────────┘                 └──────────────┘
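3. Resilience Patterns
Circuit Breaker Pattern
import java.util.function.Supplier;

// Not thread-safe as written; guard the state with a lock or atomics before sharing it across threads
public class CircuitBreaker {
    private final int failureThreshold; // failures before the circuit opens
    private final long timeout;         // milliseconds to stay OPEN before allowing a trial call
    private State state = State.CLOSED;
    private int failureCount = 0;
    private long lastFailureTime = 0;

    public CircuitBreaker(int failureThreshold, long timeout) {
        this.failureThreshold = failureThreshold;
        this.timeout = timeout;
    }

    public <T> T execute(Supplier<T> operation) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - lastFailureTime > timeout) {
                // Timeout elapsed: let one trial call through
                state = State.HALF_OPEN;
            } else {
                throw new CircuitBreakerOpenException();
            }
        }
        try {
            T result = operation.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            // Supplier.get() declares no checked exceptions, so RuntimeException covers failures
            onFailure();
            throw e;
        }
    }

    private void onSuccess() {
        failureCount = 0;
        state = State.CLOSED;
    }

    private void onFailure() {
        failureCount++;
        lastFailureTime = System.currentTimeMillis();
        if (failureCount >= failureThreshold) {
            state = State.OPEN;
        }
    }

    enum State { CLOSED, OPEN, HALF_OPEN }

    public static class CircuitBreakerOpenException extends RuntimeException {
        public CircuitBreakerOpenException() {
            super("Circuit breaker is open");
        }
    }
}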
Bulkhead Pattern
Resource Isolation:
┌─────────────────────────────────────────┐
│               Application               │
├─────────────┬─────────────┬─────────────┤
│Thread Pool 1│Thread Pool 2│Thread Pool 3│
│  Critical   │   Normal    │    Batch    │
│ Operations  │ Operations  │ Operations  │
│     10      │     20      │      5      │
│   threads   │   threads   │   threads   │
└─────────────┴─────────────┴─────────────┘
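The isolation shown above maps directly onto separate thread pools. A minimal sketch using the pool sizes from the diagram (10, 20, and 5 threads); the BulkheadSketch class and its submit methods are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One fixed-size pool per workload class, so a flood of work in one
// class cannot consume the threads reserved for another.
final class BulkheadSketch implements AutoCloseable {
    private final ExecutorService critical = Executors.newFixedThreadPool(10);
    private final ExecutorService normal = Executors.newFixedThreadPool(20);
    private final ExecutorService batch = Executors.newFixedThreadPool(5);

    <T> Future<T> submitCritical(Callable<T> task) {
        return critical.submit(task);
    }

    <T> Future<T> submitNormal(Callable<T> task) {
        return normal.submit(task);
    }

    <T> Future<T> submitBatch(Callable<T> task) {
        return batch.submit(task);
    }

    // Stops accepting new work; already submitted tasks still run to completion.
    @Override
    public void close() {
        critical.shutdown();
        normal.shutdown();
        batch.shutdown();
    }
}
```

Even if all 5 batch threads are blocked, a call to submitCritical still runs immediately on its own pool. Note that Executors.newFixedThreadPool queues excess tasks without bound; a production bulkhead would usually cap the queue as well so overload is rejected rather than buffered.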
Case Studies
Case Study 1: Design a Chat Application (like WhatsApp)
Requirements Analysis
Functional Requirements:
□ Send/receive messages
□ Group chats
□ Online status
□ Message history
□ Push notifications
Non-Functional Requirements:
□ 1B users, 50B messages/day
□ Real-time messaging
□ 99.9% availability
□ Support multimedia messages
High-Level Architecture
┌─────────────┐    ┌──────────────┐     ┌─────────────┐
│ Mobile Apps │───▶│ Load Balancer│────▶│   Gateway   │
└─────────────┘    │  (Layer 7)   │     │   Service   │
                   └──────────────┘     └─────────────┘
                                               │
       ┌───────────────────┬───────────────────┤
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Chat     │     │    User     │     │Notification │
│   Service   │     │   Service   │     │   Service   │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Message   │     │    User     │     │   Device    │
│  Database   │     │  Database   │     │  Database   │
│ (Cassandra) │     │  (MongoDB)  │     │   (Redis)   │
└─────────────┘     └─────────────┘     └─────────────┘
Additional Components:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Message   │     │    Media    │     │    Push     │
│    Queue    │     │   Storage   │     │Notification │
│   (Kafka)   │     │    (S3)     │     │    (FCM)    │
└─────────────┘     └─────────────┘     └─────────────┘
Database Design
-- Messages Table (Cassandra-style)
CREATE TABLE messages (
    chat_id TEXT,
    message_id TIMEUUID,
    sender_id TEXT,
    content TEXT,
    message_type TEXT, -- text, image, video
    created_at TIMESTAMP,
    PRIMARY KEY (chat_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
-- User Chats Table
CREATE TABLE user_chats (
user_id TEXT,
chat_id TEXT,
chat_type TEXT, -- direct, group
last_read_message_id TIMEUUID,
created_at TIMESTAMP,
PRIMARY KEY (user_id, chat