2023-03-31

defaultdict in Python

What is defaultdict

A defaultdict is a specialized dictionary available in the Python collections module. It is a subclass of the built-in dict class that simplifies the handling of missing keys. When you look up a missing key with square-bracket indexing, defaultdict does not raise a KeyError. It calls a function known as the default factory, stores the returned value under that key, and returns it.
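
As a minimal sketch of that contrast, the snippet below increments a missing key in a plain dict and in a defaultdict. The key name "apples" and the variable names are only illustrative.

```python
from collections import defaultdict

# A plain dict raises KeyError when a missing key is read.
plain = {}
try:
    plain["apples"] += 1
except KeyError:
    plain_raised = True
else:
    plain_raised = False

# A defaultdict calls its default factory (here int, which returns 0),
# stores the result under the missing key, and then the increment proceeds.
counts = defaultdict(int)
counts["apples"] += 1

print(plain_raised)   # True
print(dict(counts))   # {'apples': 1}
```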

Understanding Default Factories

Using Built-in Functions as Default Factories

The default factory can be any callable that takes no arguments and returns a value. Several built-in types work well as default factories, such as list, int, and set. Called with no arguments, they return an empty list, the integer 0, and an empty set, respectively.
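
The following sketch shows the three built-in factories side by side. The key "fruits" and the variable names are assumptions made for the example.

```python
from collections import defaultdict

lists = defaultdict(list)   # missing keys start as []
totals = defaultdict(int)   # missing keys start as 0
unique = defaultdict(set)   # missing keys start as set()

lists["fruits"].append("apple")
totals["fruits"] += 3
unique["fruits"].add("apple")
unique["fruits"].add("apple")  # the set ignores the duplicate

print(dict(lists))   # {'fruits': ['apple']}
print(dict(totals))  # {'fruits': 3}
print(dict(unique))  # {'fruits': {'apple'}}
```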

Creating Custom Default Factories

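You can also create custom default factories by defining your own functions or using lambda expressions. This allows you to tailor the default values to your specific use case. The factory is called once for each missing key, so every key receives its own freshly created value.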
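
One possible shape for a custom factory is sketched below. The function new_profile and the profile fields are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def new_profile():
    """Hypothetical factory: build a fresh profile for each missing key."""
    return {"visits": 0, "tags": []}

profiles = defaultdict(new_profile)
profiles["alice"]["visits"] += 1
profiles["alice"]["tags"].append("admin")
profiles["bob"]["visits"] += 2   # "bob" gets its own independent profile

# A lambda is enough when the default is a simple constant.
status = defaultdict(lambda: "unknown")

print(profiles["alice"])  # {'visits': 1, 'tags': ['admin']}
print(profiles["bob"])    # {'visits': 2, 'tags': []}
print(status["carol"])    # unknown
```

Because the factory runs again for each new key, the list stored under "alice" is not shared with the one stored under "bob".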

Creating a defaultdict

Importing the defaultdict Class

To use defaultdict, you need to import it from the collections module:

python
from collections import defaultdict

Initializing a defaultdict

Once imported, you can create a defaultdict by passing the default factory as the first argument. Pass the callable itself (list), not the result of calling it (list()):

python
dd = defaultdict(list)
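
A short sketch of what can be done with the new object follows. The keys and the seeded variable are illustrative. Any arguments after the factory are handled as they would be by dict(), and the factory remains available through the default_factory attribute.

```python
from collections import defaultdict

dd = defaultdict(list)
dd["colors"].append("red")    # "colors" is created as [] and then appended to
dd["colors"].append("blue")

# Arguments after the factory are treated like the arguments to dict().
seeded = defaultdict(list, {"sizes": ["S", "M"]})
seeded["sizes"].append("L")

print(dd.default_factory is list)  # True
print(dict(dd))                    # {'colors': ['red', 'blue']}
print(dict(seeded))                # {'sizes': ['S', 'M', 'L']}
```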

Common defaultdict Use Cases

Counting Elements

A defaultdict with an int default factory is ideal for counting occurrences of elements in a sequence:

python
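from collections import defaultdict

words = ["apple", "banana", "apple", "orange", "banana", "apple"]
word_counts = defaultdict(int)

# Each new word starts at 0, so it can be incremented right away.
for word in words:
    word_counts[word] += 1
print(dict(word_counts))  # {'apple': 3, 'banana': 2, 'orange': 1}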

Grouping Elements

A defaultdict with a list default factory can be used to group elements based on a certain attribute:

python
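from collections import defaultdict

students = [
    {"name": "Alice", "age": 24},
    {"name": "Bob", "age": 22},
    {"name": "Charlie", "age": 24},
    {"name": "David", "age": 22},
]

# Each new age starts with an empty list that students are appended to.
students_by_age = defaultdict(list)
for student in students:
    students_by_age[student["age"]].append(student)

# Show only the names in each group.
names_by_age = {age: [s["name"] for s in group] for age, group in students_by_age.items()}
print(names_by_age)  # {24: ['Alice', 'Charlie'], 22: ['Bob', 'David']}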

Nested defaultdicts

You can use nested defaultdicts to create multi-level dictionaries with default values at each level:

python
nested_dd = defaultdict(lambda: defaultdict(int))
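
To illustrate how the two levels fill in on demand, here is a small sketch that totals quantities per region and product. The sales records are made-up sample data.

```python
from collections import defaultdict

nested_dd = defaultdict(lambda: defaultdict(int))

# Hypothetical records: (region, product, quantity)
sales = [
    ("east", "apple", 3),
    ("east", "pear", 1),
    ("west", "apple", 2),
    ("east", "apple", 4),
]

# The outer lookup creates the inner defaultdict, the inner lookup creates the 0.
for region, product, quantity in sales:
    nested_dd[region][product] += quantity

# Convert to plain dicts for a readable printout.
result = {region: dict(products) for region, products in nested_dd.items()}
print(result)  # {'east': {'apple': 7, 'pear': 1}, 'west': {'apple': 2}}
```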

Combining defaultdicts

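To merge two defaultdicts with the same default factory so that the values under shared keys are added together, you can loop over one and add its values to the other. Note that this modifies dd1 in place: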

python
dd1 = defaultdict(int, {"a": 1, "b": 2})
dd2 = defaultdict(int, {"b": 3, "c": 4})

for key, value in dd2.items():
    dd1[key] += value
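
If the inputs should stay unchanged, the same loop can be wrapped in a function that builds a new defaultdict. The helper name merge_counts is hypothetical and this is only one way to arrange it.

```python
from collections import defaultdict

def merge_counts(left, right):
    """Return a new defaultdict(int) holding the per-key sums of both inputs."""
    merged = defaultdict(int, left)  # start from a copy of left's items
    for key, value in right.items():
        merged[key] += value
    return merged

dd1 = defaultdict(int, {"a": 1, "b": 2})
dd2 = defaultdict(int, {"b": 3, "c": 4})
merged = merge_counts(dd1, dd2)

print(dict(merged))  # {'a': 1, 'b': 5, 'c': 4}
print(dict(dd1))     # {'a': 1, 'b': 2}
```

By contrast, dd1.update(dd2) would overwrite the value stored under "b" instead of adding to it.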

defaultdict with lambda Functions

A lambda expression is a convenient default factory when the default should be something a built-in type does not return on its own, such as a non-zero number:

python
# Create a defaultdict with a default value of 1
dd = defaultdict(lambda: 1)
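
A brief sketch of where a default of 1 is useful: multiplying values together, where a starting value of 0 would erase every product. The keys "x" and "y" are illustrative.

```python
from collections import defaultdict

dd = defaultdict(lambda: 1)

# A missing key starts at 1, so it can be multiplied without seeding it first.
dd["x"] *= 5
dd["x"] *= 2

print(dd["x"])  # 10
print(dd["y"])  # 1
```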

Comparing defaultdict to dict

Key Differences

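The main difference between defaultdict and dict is what happens when you index a missing key: dict raises a KeyError, while defaultdict calls the default factory, stores the result under that key, and returns it. Other operations, such as get() and the in operator, behave as they do on a regular dict and do not create keys.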
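
The sketch below checks which operations on a defaultdict actually create a key. The key "a" is arbitrary.

```python
from collections import defaultdict

dd = defaultdict(list)

# Membership tests and get() do not call the default factory.
print("a" in dd)     # False
print(dd.get("a"))   # None
print(len(dd))       # 0

# Indexing does: merely reading a missing key inserts it.
value = dd["a"]
print(value)         # []
print("a" in dd)     # True
print(len(dd))       # 1
```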

Performance Implications

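Using defaultdict removes the need for explicit key-existence checks or exception handling, which mainly makes the code shorter and easier to read. Any speed difference compared with a plain dict is usually small and depends on the workload, so measure before relying on it. One cost to keep in mind is that merely reading a missing key with indexing inserts it, which can grow the dictionary unintentionally.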
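
One way to measure this on your own machine is sketched below with the standard timeit module. The three function names and the input size are assumptions made for the example. No timings are shown because they vary by machine and Python version.

```python
import timeit
from collections import defaultdict

words = ["apple", "banana", "apple", "orange", "banana", "apple"] * 1000

def count_with_check(items):
    """Plain dict with an explicit key-existence check."""
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts

def count_with_try(items):
    """Plain dict with exception handling for missing keys."""
    counts = {}
    for item in items:
        try:
            counts[item] += 1
        except KeyError:
            counts[item] = 1
    return counts

def count_with_defaultdict(items):
    """defaultdict(int), with no check or exception handling needed."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts

# Time each approach on the same input and print the elapsed seconds.
for func in (count_with_check, count_with_try, count_with_defaultdict):
    seconds = timeit.timeit(lambda: func(words), number=20)
    print(f"{func.__name__}: {seconds:.4f}s")
```

All three functions produce the same counts, so the comparison isolates how missing keys are handled.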


Ryusei Kakujo
