What is Multi-Level Indexing in Pandas
Multi-level indexing, also known as hierarchical indexing, is a way of organizing data in Pandas with multiple levels of indexes. It is useful for representing complex data sets with multiple dimensions, such as financial data or scientific data. In Pandas, a multi-level index consists of multiple levels of indexes, with each level representing a different aspect of the data.
Creating a Multi-Level Index in Pandas
To create a multi-level index in Pandas, we can use the set_index() function. The set_index() function allows us to specify one or more columns to be used as the index, and we can pass multiple columns to create a multi-level index. For example:
import pandas as pd
data = {'year': [2019, 2019, 2020, 2020],
'quarter': [1, 2, 1, 2],
'revenue': [100, 200, 150, 250]}
df = pd.DataFrame(data)
df = df.set_index(['year', 'quarter'])
print(df)
revenue
year quarter
2019 1 100
2 200
2020 1 150
2 250
In this example, we create a DataFrame with a multi-level index consisting of the year
and quarter
columns.
Indexing with Multi-Level Indexing in Pandas
Once we have created a multi-level index in Pandas, we can use it to select and manipulate data. We can use the loc[]
and iloc[]
functions to select data based on the index levels. For example:
# Select data for 2019, quarter 1
print(df.loc[(2019, 1)])
# Select data for all quarters in 2020
print(df.loc[2020])
# Select data for all years and quarters
print(df.loc[:])
revenue 100
Name: (2019, 1), dtype: int64
revenue
quarter
1 150
2 250
revenue
year quarter
2019 1 100
2 200
2020 1 150
2 250
We can also use the xs()
function to select data at a particular level of the index. For example:
# Select data for quarter 1 for all years
print(df.xs(1, level='quarter'))
revenue
year
2019 100
2020 150
Aggregating Data with Multi-Level Indexing in Pandas
Another useful feature of multi-level indexing in Pandas is the ability to aggregate data at different levels of the index. We can use the groupby()
function to group the data by one or more levels of the index and then apply an aggregation function. For example:
# Calculate the total revenue for each year
print(df.groupby(level='year')['revenue'].sum())
# Calculate the average revenue for each quarter
print(df.groupby(level='quarter')['revenue'].mean())
year
2019 300
2020 400
Name: revenue, dtype: int64
quarter
1 125
2 225
Name: revenue, dtype: int64