Introduction
In this article, I will discuss the process of converting a Pandas DataFrame to a dictionary.
to_dict Method
The main way to convert a Pandas DataFrame into a dictionary is the to_dict() method. Its syntax is as follows:
dataframe.to_dict(orient='dict', into=dict)
The orient parameter controls the shape of the output, and the into parameter sets the mapping type used for every mapping in the result (a plain dict by default).
Orient
Let's explore the orient parameter and its possible values.
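Before going through the values one by one, it can help to see them side by side. The sketch below uses the same sample data as the rest of this article, calls to_dict() with each orient value covered here, and records the type of the top-level object it returns. Only records produces a list; every other value produces a dictionary.

```python
import pandas as pd

# Same sample DataFrame used throughout this article
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# The orient values covered in this article
orients = ['dict', 'list', 'series', 'split', 'records', 'index']

# Map each orient value to the name of the top-level type to_dict() returns
result_types = {o: type(df.to_dict(orient=o)).__name__ for o in orients}

for orient, type_name in result_types.items():
    print(f"{orient}: {type_name}")
```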
dict
Setting the orient parameter to dict (the default) creates a dictionary of dictionaries: the keys of the outer dictionary are the column names, and each inner dictionary maps the index labels to that column's values.
import pandas as pd
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
result = df.to_dict(orient='dict')
print(result)
{'A': {0: 1, 1: 2, 2: 3}, 'B': {0: 4, 1: 5, 2: 6}}
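With the default integer index it is easy to miss that the inner keys are index labels, not positions. Here is a small sketch with a string index; the labels 'x', 'y' and 'z' are arbitrary and chosen only for illustration:

```python
import pandas as pd

# Same data as above, but with string row labels instead of 0, 1, 2
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

result = df.to_dict(orient='dict')
print(result)
# {'A': {'x': 1, 'y': 2, 'z': 3}, 'B': {'x': 4, 'y': 5, 'z': 6}}

# Look up a single value by column name first, then by row label
print(result['B']['y'])  # 5
```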
list
When orient is set to list, the resulting dictionary has the column names as keys and each column's values as a list.
result = df.to_dict(orient='list')
print(result)
{'A': [1, 2, 3], 'B': [4, 5, 6]}
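One consequence worth knowing is that the list orientation does not keep the index. The sketch below, again with illustrative string labels, shows that rebuilding a DataFrame from the output falls back to the default index:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

as_lists = df.to_dict(orient='list')
print(as_lists)
# {'A': [1, 2, 3], 'B': [4, 5, 6]}

# The labels 'x', 'y', 'z' are not part of the output, so a DataFrame
# rebuilt from it gets a fresh default index
rebuilt = pd.DataFrame(as_lists)
print(rebuilt.index.tolist())  # [0, 1, 2]
```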
series
With orient set to series, the output is a dictionary of Pandas Series objects, keyed by column name.
result = df.to_dict(orient='series')
print(result)
{'A': 0    1
1    2
2    3
Name: A, dtype: int64, 'B': 0    4
1    5
2    6
Name: B, dtype: int64}
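Because each value is a real Series, you can keep using Series methods on it. A brief sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.to_dict(orient='series')

# Each value is a full pandas Series, so Series methods remain available
col_a = result['A']
print(type(col_a).__name__)  # Series
print(col_a.sum())           # 6
print(col_a.name)            # A
```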
split
The split orientation generates a dictionary with three keys: index, columns, and data. Their values are the index labels, the column names, and the data values (as a list of rows), respectively.
result = df.to_dict(orient='split')
print(result)
{'index': [0, 1, 2], 'columns': ['A', 'B'], 'data': [[1, 4], [2, 5], [3, 6]]}
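The three keys match the data, index and columns parameters of the pd.DataFrame constructor, so a split dictionary can be unpacked straight back into a DataFrame. A minimal round-trip sketch, assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
split = df.to_dict(orient='split')

# 'index', 'columns' and 'data' are also keyword arguments of
# pd.DataFrame, so the dictionary can be unpacked directly
rebuilt = pd.DataFrame(**split)
print(rebuilt.equals(df))  # True
```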
records
When orient is set to records, the output is a list of dictionaries, one per row of the DataFrame, with the column names as keys.
result = df.to_dict(orient='records')
print(result)
[{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}]
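Records are convenient when you want to process rows one at a time, and since pd.DataFrame also accepts a list of dictionaries, the conversion can be reversed. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
records = df.to_dict(orient='records')

# Each record is an ordinary dict, so rows can be handled individually
row_sums = [row['A'] + row['B'] for row in records]
print(row_sums)  # [5, 7, 9]

# A list of dicts is accepted by the DataFrame constructor
rebuilt = pd.DataFrame(records)
print(rebuilt.equals(df))  # True
```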
index
Setting the orient parameter to index creates a dictionary of dictionaries in which the outer keys are the index labels and each inner dictionary maps the column names to that row's values.
result = df.to_dict(orient='index')
print(result)
{0: {'A': 1, 'B': 4}, 1: {'A': 2, 'B': 5}, 2: {'A': 3, 'B': 6}}
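Since the row labels become dictionary keys, this orientation needs a unique index; pandas raises a ValueError for duplicate labels rather than silently dropping rows. The sketch below (with illustrative labels) shows a normal lookup and that failure case:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])
by_row = df.to_dict(orient='index')
print(by_row['y'])  # {'A': 2, 'B': 5}

# Duplicate row labels cannot all become keys, so to_dict refuses
dup = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'x'])
error_message = None
try:
    dup.to_dict(orient='index')
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```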
Converting DataFrames into OrderedDict
By default, to_dict() returns a standard Python dictionary. However, you can also convert the DataFrame into an OrderedDict by setting the into parameter to collections.OrderedDict. OrderedDict keeps its keys in insertion order, but since Python 3.7 regular dictionaries do this as well, so OrderedDict is mainly useful when you need its extra behavior, such as order-sensitive equality comparisons or move_to_end().
Let's look at an example of converting a DataFrame to an OrderedDict with the orient parameter set to dict.
import pandas as pd
from collections import OrderedDict
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
result = df.to_dict(orient='dict', into=OrderedDict)
print(result)
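OrderedDict([('A', OrderedDict([(0, 1), (1, 2), (2, 3)])), ('B', OrderedDict([(0, 4), (1, 5), (2, 6)]))])
As you can see, the output is an OrderedDict with the column names ('A' and 'B') as keys. Because into applies to every mapping in the result, the inner dictionaries are OrderedDicts as well, and the key order is maintained in each of them. (On Python 3.12 and later, OrderedDict is printed with dict-literal syntax instead, e.g. OrderedDict({0: 1, 1: 2, 2: 3}).)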
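Since regular dictionaries also preserve insertion order, the main practical difference shows up in comparisons: two OrderedDicts are equal only if their keys appear in the same order. A small sketch, using orient='list' for brevity:

```python
import pandas as pd
from collections import OrderedDict

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
plain = df.to_dict(orient='list')
ordered = df.to_dict(orient='list', into=OrderedDict)

# Both keep the DataFrame's column order
print(list(plain), list(ordered))  # ['A', 'B'] ['A', 'B']

# Same items, opposite key order
swapped = OrderedDict([('B', ordered['B']), ('A', ordered['A'])])
print(ordered == swapped)              # False: OrderedDict equality checks order
print(dict(ordered) == dict(swapped))  # True: plain dict equality ignores order
```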
You can also convert the DataFrame to an OrderedDict using other orient values. For example, let's convert the DataFrame with orient set to records:
result = df.to_dict(orient='records', into=OrderedDict)
print(result)
[OrderedDict([('A', 1), ('B', 4)]), OrderedDict([('A', 2), ('B', 5)]), OrderedDict([('A', 3), ('B', 6)])]
In this case, the output is a list of ordered dictionaries, with each dictionary representing a row in the DataFrame. The key order is preserved within each dictionary.
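The into parameter is not limited to OrderedDict; it accepts other mapping types as well. A collections.defaultdict, however, has to be passed as an initialized instance (for example defaultdict(list)) so that pandas knows which default factory to use, and passing the bare class raises a TypeError. A sketch of both cases:

```python
import pandas as pd
from collections import defaultdict

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# An initialized defaultdict tells pandas which default_factory to use
result = df.to_dict(orient='records', into=defaultdict(list))
print(type(result[0]).__name__)        # defaultdict
print(result[0]['A'], result[0]['B'])  # 1 4

# The bare class is rejected because no default_factory can be inferred
error_message = None
try:
    df.to_dict(orient='records', into=defaultdict)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```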