2022-11-14

Multi-Level Indexing in Pandas

What is Multi-Level Indexing in Pandas

Multi-level indexing, also known as hierarchical indexing, is a way of organizing data in Pandas with multiple levels of indexes. It is useful for representing complex data sets with multiple dimensions, such as financial data or scientific data. In Pandas, a multi-level index consists of multiple levels of indexes, with each level representing a different aspect of the data.

Creating a Multi-Level Index in Pandas

To create a multi-level index in Pandas, we can use the set_index() function. The set_index() function allows us to specify one or more columns to be used as the index, and we can pass multiple columns to create a multi-level index. For example:

python
import pandas as pd

data = {'year': [2019, 2019, 2020, 2020],
        'quarter': [1, 2, 1, 2],
        'revenue': [100, 200, 150, 250]}
df = pd.DataFrame(data)
df = df.set_index(['year', 'quarter'])
print(df)
              revenue
year quarter
2019 1            100
     2            200
2020 1            150
     2            250

In this example, we create a DataFrame with a multi-level index consisting of the year and quarter columns.

Indexing with Multi-Level Indexing in Pandas

Once we have created a multi-level index in Pandas, we can use it to select and manipulate data. We can use the loc[] and iloc[] functions to select data based on the index levels. For example:

python
# Select data for 2019, quarter 1
print(df.loc[(2019, 1)])

# Select data for all quarters in 2020
print(df.loc[2020])

# Select data for all years and quarters
print(df.loc[:])
revenue    100
Name: (2019, 1), dtype: int64
           revenue
quarter
1               150
2               250
              revenue
year quarter
2019 1            100
     2            200
2020 1            150
     2            250

We can also use the xs() function to select data at a particular level of the index. For example:

python
# Select data for quarter 1 for all years
print(df.xs(1, level='quarter'))
      revenue
year
2019      100
2020      150

Aggregating Data with Multi-Level Indexing in Pandas

Another useful feature of multi-level indexing in Pandas is the ability to aggregate data at different levels of the index. We can use the groupby() function to group the data by one or more levels of the index and then apply an aggregation function. For example:

python
# Calculate the total revenue for each year
print(df.groupby(level='year')['revenue'].sum())

# Calculate the average revenue for each quarter
print(df.groupby(level='quarter')['revenue'].mean())
year
2019    300
2020    400
Name: revenue, dtype: int64
quarter
1    125
2    225
Name: revenue, dtype: int64

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!