What is defaultdict
A defaultdict is a specialized dictionary available in Python's collections module. It is a subclass of the built-in dict class that simplifies handling of missing keys by supplying a default value for them. Instead of raising a KeyError when a missing key is looked up with dd[key], a defaultdict calls a function known as the default factory, stores the returned value under that key, and returns it.
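As a minimal sketch of the behavior described above, the following compares a missing-key lookup on a regular dict and on a defaultdict. The variable names are illustrative only.

```python
from collections import defaultdict

# A regular dict raises KeyError for a missing key.
plain = {}
try:
    plain["missing"]
    plain_raised = False
except KeyError:
    plain_raised = True

# A defaultdict calls its default factory (here int) and stores the result.
counts = defaultdict(int)
value = counts["missing"]

print(plain_raised)   # True
print(value)          # 0
print(dict(counts))   # {'missing': 0}
```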
Understanding Default Factories
Using Built-in Functions as Default Factories
The default factory can be any callable that takes no arguments and returns a value. Python provides several built-in functions that can be used as default factories, such as list, int, and set. These functions create empty lists, integers initialized to zero, and empty sets, respectively.
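The three built-in factories mentioned above can be tried side by side. This is a small illustrative sketch; the keys and values are made up for the example.

```python
from collections import defaultdict

by_list = defaultdict(list)  # missing keys start as []
by_int = defaultdict(int)    # missing keys start as 0
by_set = defaultdict(set)    # missing keys start as set()

by_list["a"].append(1)
by_int["a"] += 5
by_set["a"].add("x")
by_set["a"].add("x")  # adding the same item again has no effect on a set

print(by_list["a"])  # [1]
print(by_int["a"])   # 5
print(by_set["a"])   # {'x'}
```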
Creating Custom Default Factories
You can also create custom default factories by defining your own functions or using lambda expressions. This allows you to tailor the default values to your specific use case.
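One possible custom factory is a named function that builds a fresh object for every missing key. The function new_profile and its fields below are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

def new_profile():
    """Hypothetical factory: return a fresh profile dict for a missing key."""
    return {"visits": 0, "tags": []}

profiles = defaultdict(new_profile)

profiles["alice"]["visits"] += 1
profiles["alice"]["tags"].append("admin")
bob = profiles["bob"]  # looking up a missing key creates a new default profile

print(profiles["alice"])  # {'visits': 1, 'tags': ['admin']}
print(bob)                # {'visits': 0, 'tags': []}
```

Because the factory is called once per missing key, each key gets its own dictionary and list rather than sharing one mutable object.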
Creating a defaultdict
Importing the defaultdict Class
To use defaultdict, you need to import it from the collections module:
from collections import defaultdict
Initializing a defaultdict
Once imported, you can create a defaultdict by passing the default factory as an argument:
dd = defaultdict(list)
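A short sketch of what initialization looks like in use follows. It also shows that existing data can be passed after the factory, and that the factory is available as the default_factory attribute. The keys and values are invented for the example.

```python
from collections import defaultdict

# Start empty and let missing keys become empty lists.
dd = defaultdict(list)
dd["fruits"].append("apple")

# Seed the defaultdict with existing data; the factory still covers new keys.
seeded = defaultdict(list, {"vegetables": ["carrot"]})
seeded["vegetables"].append("leek")
seeded["grains"].append("rice")

print(dict(dd))                    # {'fruits': ['apple']}
print(dict(seeded))                # {'vegetables': ['carrot', 'leek'], 'grains': ['rice']}
print(dd.default_factory is list)  # True
```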
Common defaultdict Use Cases
Counting Elements
A defaultdict with an int default factory is ideal for counting occurrences of elements in a sequence:
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
word_counts = defaultdict(int)
for word in words:
    word_counts[word] += 1
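For reference, here is the counting example as a self-contained script with its result printed. The final sorting step is an addition for illustration and is not part of the original example.

```python
from collections import defaultdict

words = ["apple", "banana", "apple", "orange", "banana", "apple"]

word_counts = defaultdict(int)
for word in words:
    word_counts[word] += 1  # a missing word starts at 0, then is incremented

print(dict(word_counts))  # {'apple': 3, 'banana': 2, 'orange': 1}

# Sort the (word, count) pairs from most to least frequent.
most_common = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)
print(most_common[0])  # ('apple', 3)
```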
Grouping Elements
A defaultdict with a list default factory can be used to group elements based on a certain attribute:
students = [
    {"name": "Alice", "age": 24},
    {"name": "Bob", "age": 22},
    {"name": "Charlie", "age": 24},
    {"name": "David", "age": 22},
]
students_by_age = defaultdict(list)
for student in students:
    students_by_age[student["age"]].append(student)
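The sketch below runs the same grouping pattern end to end. To keep the printed result short, it stores only each student's name instead of the whole record, which is a deliberate variation on the example above.

```python
from collections import defaultdict

students = [
    {"name": "Alice", "age": 24},
    {"name": "Bob", "age": 22},
    {"name": "Charlie", "age": 24},
    {"name": "David", "age": 22},
]

names_by_age = defaultdict(list)
for student in students:
    # The first student of a given age creates an empty list for that age.
    names_by_age[student["age"]].append(student["name"])

print(dict(names_by_age))  # {24: ['Alice', 'Charlie'], 22: ['Bob', 'David']}
```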
Nested defaultdicts
You can use nested defaultdicts to create multi-level dictionaries with default values at each level:
nested_dd = defaultdict(lambda: defaultdict(int))
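To illustrate how a nested defaultdict might be used, the sketch below totals quantities per region and product. The sales data is hypothetical and exists only for this example.

```python
from collections import defaultdict

nested_dd = defaultdict(lambda: defaultdict(int))

# Hypothetical (region, product, quantity) records.
sales = [
    ("north", "apples", 3),
    ("north", "pears", 1),
    ("south", "apples", 2),
    ("north", "apples", 4),
]

for region, product, quantity in sales:
    # Neither level needs to exist beforehand.
    nested_dd[region][product] += quantity

# Convert to plain dicts for display.
totals = {region: dict(products) for region, products in nested_dd.items()}
print(totals)  # {'north': {'apples': 7, 'pears': 1}, 'south': {'apples': 2}}
```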
Combining defaultdicts
To merge two defaultdicts with the same default factory, you can use a loop to update the values:
dd1 = defaultdict(int, {"a": 1, "b": 2})
dd2 = defaultdict(int, {"b": 3, "c": 4})
for key, value in dd2.items():
    dd1[key] += value
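The loop above modifies dd1 in place. If the inputs should stay unchanged, one option is to wrap the same logic in a helper that builds a new defaultdict. The function name merge_counts is a hypothetical choice for this sketch.

```python
from collections import defaultdict

def merge_counts(left, right):
    """Return a new defaultdict(int) holding the per-key sums of both inputs."""
    merged = defaultdict(int, left)  # copies the items of left
    for key, value in right.items():
        merged[key] += value
    return merged

dd1 = defaultdict(int, {"a": 1, "b": 2})
dd2 = defaultdict(int, {"b": 3, "c": 4})

merged = merge_counts(dd1, dd2)
print(dict(merged))  # {'a': 1, 'b': 5, 'c': 4}
print(dict(dd1))     # {'a': 1, 'b': 2}
```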
defaultdict with lambda Functions
Using a lambda function as the default factory lets you return any fixed or computed default value, not just the empty value of a built-in type:
# Create a defaultdict with a default value of 1
dd = defaultdict(lambda: 1)
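The following sketch shows the lambda-based default in use. It also shows why the lambda is needed: the default factory must be callable (or None), so passing the value 1 directly raises a TypeError.

```python
from collections import defaultdict

dd = defaultdict(lambda: 1)

dd["x"] *= 5     # a missing "x" starts at 1, so the result is 5
first = dd["y"]  # a missing "y" is created with the default value 1

print(dd["x"])  # 5
print(first)    # 1

# Passing a non-callable value instead of a factory is rejected.
try:
    defaultdict(1)
    rejected = False
except TypeError:
    rejected = True

print(rejected)  # True
```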
Comparing defaultdict to dict
Key Differences
The main difference between defaultdict and dict is the behavior when a key is not found. defaultdict automatically creates the key with a default value, while dict raises a KeyError.
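One detail worth seeing in code is that the default factory is triggered only by indexing with square brackets. The get method and the in operator behave as they do on a regular dict and do not create keys. A minimal sketch:

```python
from collections import defaultdict

dd = defaultdict(int, {"a": 1})

# get() and "in" do not call the default factory.
got = dd.get("b")
present_before = "b" in dd
size_before = len(dd)

# Indexing a missing key calls the factory and inserts the key.
indexed = dd["b"]
present_after = "b" in dd
size_after = len(dd)

print(got, present_before, size_before)     # None False 1
print(indexed, present_after, size_after)   # 0 True 2
```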
Performance Implications
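Using defaultdict mainly makes code shorter and easier to read by removing explicit key-existence checks and KeyError handling. It can also be slightly faster than those checks, but the difference is usually small and worth measuring for your own workload. Keep in mind that reading a missing key with dd[key] inserts it, which can grow the dictionary unintentionally.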
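If performance matters, it is best measured rather than assumed. The sketch below is one rough way to compare an explicit key check with a defaultdict using the standard timeit module. The function names and the input size are arbitrary choices, and the timings will vary by machine and Python version, so no expected numbers are shown.

```python
import timeit
from collections import defaultdict

words = ["apple", "banana", "apple", "orange", "banana", "apple"] * 1000

def count_with_dict(items):
    """Count items using a plain dict and an explicit key check."""
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts

def count_with_defaultdict(items):
    """Count items using defaultdict(int)."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts

# Both approaches should produce the same counts.
same_result = count_with_dict(words) == count_with_defaultdict(words)

dict_time = timeit.timeit(lambda: count_with_dict(words), number=20)
dd_time = timeit.timeit(lambda: count_with_defaultdict(words), number=20)

print(same_result)
print(f"dict: {dict_time:.4f}s  defaultdict: {dd_time:.4f}s")
```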