Hash Tables
A Hash Table (or Hash Map) is a data structure that stores key-value pairs and provides fast lookup, insertion, and deletion operations using a hash function.
What is a Hash Table?
Hash tables use a hash function to compute an index into an array of buckets, from which the desired value can be found.
Key Concepts:
- Hash Function: Converts key to array index
- Bucket: Storage location for key-value pair
- Collision: When two keys hash to same index
- Load Factor: Ratio of stored elements to table size
Key -> Hash Function -> Index -> Value
"apple" -> hash("apple") -> 3 -> {"color": "red", "taste": "sweet"}
How Hash Tables Work
# Conceptual example
hash_table = [None] * 10 # Array of 10 buckets
def hash_function(key):
"""Simple hash function"""
return sum(ord(c) for c in key) % 10
# Insert
key = "apple"
index = hash_function(key) # e.g., 3
hash_table[index] = ("apple", "red")
# Retrieve
index = hash_function("apple")
value = hash_table[index] # ("apple", "red")
Python Dictionary (Built-in Hash Table)
Python's dict is an optimized hash table implementation.
# Create dictionary
student = {
"name": "Alice",
"age": 20,
"grade": "A"
}
# Access - O(1)
print(student["name"]) # Alice
# Insert/Update - O(1)
student["major"] = "Computer Science"
student["age"] = 21
# Delete - O(1)
del student["grade"]
# Check key existence - O(1)
if "name" in student:
print("Name exists")
# Get with default - O(1)
major = student.get("major", "Undecided")
# Iterate
for key, value in student.items():
print(f"{key}: {value}")
Hash Table Operations
Common Operations
# Create
hash_map = {}
# Insert/Update
hash_map["key1"] = "value1"
hash_map["key2"] = "value2"
# Get
value = hash_map.get("key1") # "value1"
value = hash_map.get("key3", "default") # "default"
# Delete
hash_map.pop("key1") # Returns "value1"
# or
del hash_map["key2"]
# Check existence
exists = "key1" in hash_map # False
# Get all keys
keys = hash_map.keys()
# Get all values
values = hash_map.values()
# Get all items
items = hash_map.items()
# Clear
hash_map.clear()
Time Complexity
| Operation | Average Case | Worst Case |
|---|---|---|
| Search | O(1) | O(n) |
| Insert | O(1) | O(n) |
| Delete | O(1) | O(n) |
| Space | O(n) | O(n) |
Worst case occurs when all keys hash to same index (many collisions)
Implementing a Hash Table
class HashTable:
def __init__(self, size=10):
self.size = size
self.table = [[] for _ in range(size)]
self.count = 0
def _hash(self, key):
"""Hash function using built-in hash"""
return hash(key) % self.size
def insert(self, key, value):
"""
Insert key-value pair
Time Complexity: O(1) average
"""
index = self._hash(key)
bucket = self.table[index]
# Update if key exists
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
# Insert new key-value pair
bucket.append((key, value))
self.count += 1
# Resize if load factor > 0.7
if self.count / self.size > 0.7:
self._resize()
def get(self, key):
"""
Get value by key
Time Complexity: O(1) average
"""
index = self._hash(key)
bucket = self.table[index]
for k, v in bucket:
if k == key:
return v
raise KeyError(key)
def delete(self, key):
"""
Delete key-value pair
Time Complexity: O(1) average
"""
index = self._hash(key)
bucket = self.table[index]
for i, (k, v) in enumerate(bucket):
if k == key:
del bucket[i]
self.count -= 1
return v
raise KeyError(key)
def _resize(self):
"""Resize table when load factor is high"""
old_table = self.table
self.size *= 2
self.table = [[] for _ in range(self.size)]
self.count = 0
for bucket in old_table:
for key, value in bucket:
self.insert(key, value)
def __contains__(self, key):
"""Check if key exists"""
try:
self.get(key)
return True
except KeyError:
return False
def __len__(self):
"""Return number of items"""
return self.count
def __str__(self):
"""String representation"""
items = []
for bucket in self.table:
items.extend(bucket)
return str(dict(items))
# Usage
ht = HashTable()
ht.insert("apple", 5)
ht.insert("banana", 3)
ht.insert("orange", 7)
print(ht.get("apple")) # 5
print("banana" in ht) # True
ht.delete("orange")
print(len(ht)) # 2
Collision Resolution
1. Chaining (Separate Chaining)
Each bucket stores a list of key-value pairs.
# Example: Multiple keys hash to same index
table = [
[],
[("key1", "val1"), ("key5", "val5")], # Collision
[("key2", "val2")],
[]
]
2. Open Addressing
Find next available slot when collision occurs.
def linear_probing(self, key):
"""
Linear probing: try index, index+1, index+2, ...
"""
index = self._hash(key)
original_index = index
while self.table[index] is not None:
if self.table[index][0] == key:
return index
index = (index + 1) % self.size
# Checked all slots
if index == original_index:
return None
return index
Common Hash Table Problems
1. Two Sum
def two_sum(nums, target):
"""
Find two numbers that sum to target
Time Complexity: O(n)
Space Complexity: O(n)
"""
seen = {}
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return None
# Test
nums = [2, 7, 11, 15]
print(two_sum(nums, 9)) # [0, 1]
2. First Non-Repeating Character
def first_non_repeating(s):
"""
Find first non-repeating character
Time Complexity: O(n)
Space Complexity: O(1) - at most 26 letters
"""
char_count = {}
# Count occurrences
for char in s:
char_count[char] = char_count.get(char, 0) + 1
# Find first with count 1
for char in s:
if char_count[char] == 1:
return char
return None
# Test
print(first_non_repeating("leetcode")) # 'l'
print(first_non_repeating("aabbcc")) # None
3. Group Anagrams
def group_anagrams(words):
"""
Group anagrams together
Time Complexity: O(n * k log k) where k is max word length
Space Complexity: O(n * k)
"""
from collections import defaultdict
anagram_groups = defaultdict(list)
for word in words:
# Sort characters as key
key = ''.join(sorted(word))
anagram_groups[key].append(word)
return list(anagram_groups.values())
# Test
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(words))
# [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
4. Longest Consecutive Sequence
def longest_consecutive(nums):
"""
Find longest consecutive sequence
Time Complexity: O(n)
Space Complexity: O(n)
"""
if not nums:
return 0
num_set = set(nums)
longest = 0
for num in num_set:
# Only start sequence if num-1 not in set
if num - 1 not in num_set:
current = num
length = 1
while current + 1 in num_set:
current += 1
length += 1
longest = max(longest, length)
return longest
# Test
nums = [100, 4, 200, 1, 3, 2]
print(longest_consecutive(nums)) # 4 (sequence: 1, 2, 3, 4)
5. Subarray Sum Equals K
def subarray_sum(nums, k):
"""
Count subarrays with sum equal to k
Time Complexity: O(n)
Space Complexity: O(n)
"""
count = 0
prefix_sum = 0
sum_count = {0: 1} # Initialize with 0 sum
for num in nums:
prefix_sum += num
# Check if (prefix_sum - k) exists
if prefix_sum - k in sum_count:
count += sum_count[prefix_sum - k]
# Update sum count
sum_count[prefix_sum] = sum_count.get(prefix_sum, 0) + 1
return count
# Test
nums = [1, 1, 1]
k = 2
print(subarray_sum(nums, k)) # 2
Python Collections for Hash Tables
1. defaultdict
from collections import defaultdict
# Auto-initialize missing keys
word_count = defaultdict(int)
for word in ["apple", "banana", "apple"]:
word_count[word] += 1
print(word_count) # {'apple': 2, 'banana': 1}
# With lists
groups = defaultdict(list)
groups["fruits"].append("apple")
groups["fruits"].append("banana")
2. Counter
from collections import Counter
# Count occurrences
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
counter = Counter(words)
print(counter) # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
# Most common
print(counter.most_common(2)) # [('apple', 3), ('banana', 2)]
# Operations
c1 = Counter(['a', 'b', 'c'])
c2 = Counter(['b', 'c', 'd'])
print(c1 + c2) # Counter({'b': 2, 'c': 2, 'a': 1, 'd': 1})
3. OrderedDict
from collections import OrderedDict
# Maintains insertion order (note: regular dict also maintains order in Python 3.7+)
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3
# Move to end
od.move_to_end('a')
print(od) # OrderedDict([('b', 2), ('c', 3), ('a', 1)])
Hash Function Design
Good Hash Function Properties:
- Deterministic - Same input gives same output
- Uniform Distribution - Minimizes collisions
- Fast Computation - O(1) time
- Uses all input data - Avoids clustering
# Simple hash functions
def hash_string(s, table_size):
"""Polynomial rolling hash"""
hash_value = 0
for char in s:
hash_value = (hash_value * 31 + ord(char)) % table_size
return hash_value
def hash_number(num, table_size):
"""Modulo hash for integers"""
return num % table_size
Applications of Hash Tables
- Caching - Store computed results (LRU cache)
- Databases - Fast indexing and lookup
- Symbol Tables - Compilers and interpreters
- Removing Duplicates - Using set
- Frequency Counting - Character/word counts
- Routing - IP address lookup
- Password Verification - Storing hashed passwords
Advantages
- Fast Operations - O(1) average for insert, delete, search
- Flexible Keys - Any hashable type
- Dynamic - Can grow as needed
- Easy to Implement - Simple interface
Disadvantages
- Space Overhead - Extra memory for hash table
- No Order - Keys are not sorted (use OrderedDict if needed)
- Hash Function - Quality affects performance
- Worst Case - O(n) with many collisions
Practice Problems
Try solving these hash table problems: - Valid Sudoku - Longest substring without repeating characters - Top K frequent elements - Design LRU Cache - Isomorphic strings
When to Use Hash Tables
Use hash tables when you need: - Fast lookup by key (O(1)) - Counting occurrences - Removing duplicates - Caching results - Mapping relationships
Next Tutorial: Sorting Algorithms