2022-11-18

Converting Pandas DataFrame to Dict

Introduction

In this article, I will discuss the process of converting a Pandas DataFrame to a dictionary.

to_dict Method

The main way to convert a Pandas DataFrame into a dictionary is the DataFrame.to_dict() method. Its basic syntax is as follows:

python
dataframe.to_dict(orient='dict', into=dict)

Two parameters control the output. orient determines the shape of the result, and into determines which mapping type (dict by default) is used for the dictionaries that to_dict() builds.
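Before looking at each value individually, the short sketch below prints the top-level type that to_dict() returns for each orient value covered in this article. The variable names are illustrative. Only records returns a list; the others return a dictionary.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# The orient values covered in this article
orients = ['dict', 'list', 'series', 'split', 'records', 'index']

# Record the name of the top-level type returned for each orient
result_types = {orient: type(df.to_dict(orient=orient)).__name__ for orient in orients}

for orient, type_name in result_types.items():
    print(f'{orient:>8}: {type_name}')
```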

Orient

Let's go through the main values of the orient parameter, using the same two-column DataFrame in every example.

dict

Setting the orient parameter to dict (the default) creates a dictionary of dictionaries: the outer keys are the column names, and each inner dictionary maps an index label to the value in that column.

python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

result = df.to_dict(orient='dict')
print(result)
# {'A': {0: 1, 1: 2, 2: 3}, 'B': {0: 4, 1: 5, 2: 6}}
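The inner keys are index labels, not row positions. With the default RangeIndex the two look identical, so this sketch uses a hypothetical string index ('x', 'y', 'z') to make the difference visible:

```python
import pandas as pd

# Same data as above, but with string index labels chosen for illustration
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

result = df.to_dict(orient='dict')

# Outer keys are column names; inner keys are the index labels
print(result)
print(result['B']['y'])  # prints 5 (column B, row label 'y')
```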

list

When orient is set to list, the resulting dictionary maps each column name to a list of that column's values. The index labels are not included.

python
result = df.to_dict(orient='list')
print(result)
# {'A': [1, 2, 3], 'B': [4, 5, 6]}
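Because list keeps only the values, the index labels are lost. This sketch, again assuming a hypothetical string index, shows that rebuilding a DataFrame from the lists produces a default 0-based index unless the original labels are passed back in:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

result = df.to_dict(orient='list')
print(result)  # the labels 'x', 'y', 'z' do not appear anywhere

# Rebuilding from the lists alone yields a default 0-based index
rebuilt = pd.DataFrame(result)
print(rebuilt.index.tolist())  # prints [0, 1, 2]

# Pass the original labels back explicitly if you need them
restored = pd.DataFrame(result, index=df.index)
```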

series

With orient set to series, the output will be a dictionary of Pandas Series objects, with the column names as keys.

python
result = df.to_dict(orient='series')
print(result)
# {'A': 0    1
# 1    2
# 2    3
# Name: A, dtype: int64, 'B': 0    4
# 1    5
# 2    6
# Name: B, dtype: int64}
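Since the values are real Series objects, they keep the DataFrame's index and their column name, and the usual Series methods still work on them. A brief sketch (col_a is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

result = df.to_dict(orient='series')

col_a = result['A']
print(type(col_a))            # a pandas Series, not a plain dict or list
print(col_a.name)             # prints A
print(col_a.index.tolist())   # prints [0, 1, 2], the DataFrame's index

# Vectorized arithmetic and other Series methods remain available
print((col_a * 10).tolist())  # prints [10, 20, 30]
```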

split

The split orientation generates a dictionary with three keys: index, columns, and data. Their values are the list of index labels, the list of column names, and the data as a list of rows, respectively.

python
result = df.to_dict(orient='split')
print(result)
# {'index': [0, 1, 2], 'columns': ['A', 'B'], 'data': [[1, 4], [2, 5], [3, 6]]}
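Because the three keys match the index, columns, and data arguments of the DataFrame constructor, a split dictionary can be unpacked straight back into a DataFrame. A minimal round-trip sketch (payload is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

payload = df.to_dict(orient='split')

# 'index', 'columns', and 'data' are all DataFrame constructor arguments,
# so the dictionary can be unpacked with ** to rebuild the frame
rebuilt = pd.DataFrame(**payload)

print(rebuilt.equals(df))  # prints True
```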

records

When orient is set to records, the output is a list of dictionaries, with each dictionary representing a row in the DataFrame. The keys in each dictionary correspond to the column names.

python
result = df.to_dict(orient='records')
print(result)
# [{'A': 1, 'B': 4}, {'A': 2, 'B': 5}, {'A': 3, 'B': 6}]
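Records are convenient for row-by-row processing, and a list of dictionaries can be passed back to the DataFrame constructor. The index labels are not part of the records, so the rebuilt frame gets a default 0-based index; it matches here only because the original DataFrame uses one too. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

records = df.to_dict(orient='records')

# Each record is a plain dict, so rows can be handled one at a time
for row in records:
    print(f"A={row['A']}, B={row['B']}")

# A list of dicts is also accepted by the DataFrame constructor
rebuilt = pd.DataFrame(records)
print(rebuilt.equals(df))  # prints True
```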

index

Setting the orient parameter to index creates a dictionary of dictionaries, with the outer dictionary's keys representing the index labels and the inner dictionaries containing the corresponding data.

python
result = df.to_dict(orient='index')
print(result)
# {0: {'A': 1, 'B': 4}, 1: {'A': 2, 'B': 5}, 2: {'A': 3, 'B': 6}}
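The index orientation makes row lookups by label straightforward, and DataFrame.from_dict() with orient='index' can rebuild the frame from it. A short sketch (by_row is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

by_row = df.to_dict(orient='index')

# Look up a whole row, or a single cell, by index label
print(by_row[1])       # the row with label 1
print(by_row[1]['B'])  # prints 5

# from_dict with orient='index' treats the outer keys as row labels
rebuilt = pd.DataFrame.from_dict(by_row, orient='index')
print(rebuilt.equals(df))  # prints True
```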

Converting DataFrames into OrderedDict

By default, to_dict() returns standard Python dictionaries. With the into parameter you can request a different mapping type, such as collections.OrderedDict. OrderedDict keeps its keys in insertion order; since Python 3.7 the built-in dict does this as well, so OrderedDict is now mainly useful when you need order-sensitive equality comparisons or methods such as move_to_end().
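One behavior that still sets OrderedDict apart from a plain dict is equality: comparing two OrderedDicts takes key order into account, while comparing two dicts does not. The sketch below checks this using a hypothetical copy of the DataFrame (df_reversed) whose columns are in reverse order:

```python
from collections import OrderedDict

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Same columns and data, reversed column order
df_reversed = df[['B', 'A']]

plain = df.to_dict(orient='list')
plain_reversed = df_reversed.to_dict(orient='list')

ordered = df.to_dict(orient='list', into=OrderedDict)
ordered_reversed = df_reversed.to_dict(orient='list', into=OrderedDict)

# Plain dicts keep insertion order but compare equal regardless of it
print(list(plain_reversed))         # prints ['B', 'A']
print(plain == plain_reversed)      # prints True

# OrderedDict equality also compares the key order
print(ordered == ordered_reversed)  # prints False
```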

Let's take a look at an example of converting a DataFrame to an OrderedDict with the orient parameter set to dict.

python
import pandas as pd
from collections import OrderedDict

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

result = df.to_dict(orient='dict', into=OrderedDict)
print(result)
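# OrderedDict([('A', OrderedDict([(0, 1), (1, 2), (2, 3)])), ('B', OrderedDict([(0, 4), (1, 5), (2, 6)]))])

As you can see, the output is an OrderedDict with the column names ('A' and 'B') as keys, in the same order as the DataFrame's columns. The into type is applied to every mapping that to_dict() creates, so the inner dictionaries are OrderedDicts as well. (Python 3.12 and later print OrderedDict in a dict-literal style, such as OrderedDict({'A': ...}), but the structure is the same.)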

The into parameter works with the other orient values as well. For example, let's convert the DataFrame with orient set to records:

python
result = df.to_dict(orient='records', into=OrderedDict)
print(result)
# [OrderedDict([('A', 1), ('B', 4)]), OrderedDict([('A', 2), ('B', 5)]), OrderedDict([('A', 3), ('B', 6)])]

In this case, the output is a list of ordered dictionaries, with each dictionary representing a row in the DataFrame. The key order is preserved within each dictionary.
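The into argument accepts either a mapping class or an empty instance of one. The pandas documentation notes that a collections.defaultdict must be passed initialized, so that pandas can reuse its default factory. A sketch with defaultdict(list), where list is an arbitrary choice of factory:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Pass an initialized defaultdict; passing the bare defaultdict class raises TypeError
rows = df.to_dict(orient='records', into=defaultdict(list))

first = rows[0]
print(type(first).__name__)  # prints defaultdict
print(first['A'])            # prints 1

# Missing keys fall back to the default factory instead of raising KeyError
print(first['C'])            # prints []
```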

Ryusei Kakujo

