Introduction
pandas makes it easy to import and export data in a variety of formats, and to preprocess and transform it along the way. This article shows how to import and export data in various formats with pandas.
Reading Data Files
In this section, I will explore how to import data from various file formats using pandas.
Importing CSV Files
CSV (Comma-Separated Values) files are widely used for storing and sharing tabular data. To import a CSV file, you can use the read_csv() function in pandas:
import pandas as pd
data = pd.read_csv('file.csv')
You can also specify additional arguments, such as the delimiter, encoding, and header information:
data = pd.read_csv('file.tsv', delimiter='\t', encoding='utf-8', header=0)
Importing Excel Files
To import data from an Excel file, you can use the read_excel() function. First, you may need to install the openpyxl package if you haven't already:
$ pip install openpyxl
Then, you can import an Excel file like this:
data = pd.read_excel('file.xlsx', sheet_name='Sheet1')
Importing JSON Files
JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. To import JSON data, use the read_json() function:
data = pd.read_json('file.json')
You can also import JSON data from a URL or a string.
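For example, a JSON string already in memory can be read by wrapping it in a StringIO buffer (passing a raw string directly to read_json() is deprecated in recent pandas versions). The data below is a made-up sample:

```python
from io import StringIO

import pandas as pd

# A JSON string, e.g. one received from a web service
json_str = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'

# Wrap the string in StringIO; passing a literal string to read_json()
# is deprecated in recent pandas versions
data = pd.read_json(StringIO(json_str))
```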
Importing SQL Data
pandas can interact with various SQL databases, such as SQLite, MySQL, and PostgreSQL. To import data from a SQL database, you'll first need to establish a connection to the database using an appropriate Python library. Then, use the read_sql() function to import data from a specific table or query:
import sqlite3
import pandas as pd
connection = sqlite3.connect('database.db')
data = pd.read_sql('SELECT * FROM table_name', connection)
Don't forget to close the connection after you're done:
connection.close()
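As a variation, you can let a with block close the connection for you and bind query values safely through the params argument instead of string formatting. The table and data below are illustrative, using an in-memory database in place of a file:

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() guarantees the connection is closed when the block exits;
# an in-memory database stands in for a real database file here
with closing(sqlite3.connect(':memory:')) as connection:
    connection.execute('CREATE TABLE users (name TEXT, age INTEGER)')
    connection.executemany('INSERT INTO users VALUES (?, ?)',
                           [('Alice', 30), ('Bob', 25)])
    # Parameterized query: the value is bound via params,
    # never interpolated into the SQL string
    data = pd.read_sql('SELECT * FROM users WHERE age > ?', connection,
                       params=(26,))
```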
Importing Data from Web APIs
Web APIs allow you to access data from various online sources. To import data from a web API, you can use the requests library to fetch the data and then convert it to a pandas DataFrame:
import requests
import pandas as pd
response = requests.get('https://api.example.com/data')
data = response.json()
df = pd.DataFrame(data)
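API responses are often nested rather than flat, in which case pandas.json_normalize() can flatten them into columns. The payload below simulates such a response (no real URL is contacted):

```python
import pandas as pd

# A nested payload of the kind a web API might return
payload = {
    "results": [
        {"id": 1, "user": {"name": "Alice", "city": "Oslo"}},
        {"id": 2, "user": {"name": "Bob", "city": "Lima"}},
    ]
}

# Flatten the nested "user" objects into dotted column names
# such as "user.name" and "user.city"
df = pd.json_normalize(payload["results"])
```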
Importing Data from Other Formats
pandas also provides functions for importing data from other formats, such as HTML tables, Parquet files, and HDF5 files. For instance, to import data from an HTML table, use the read_html() function, which returns a list of DataFrames, one per table found on the page:
data = pd.read_html('https://example.com/table.html')
Writing Data Files
In this section, I will explore how to export your pandas DataFrame to various destinations, such as CSV, Excel, and JSON files and SQL databases.
Exporting to CSV Files
To export your DataFrame to a CSV file, use the to_csv() method:
import pandas as pd
# Create a sample DataFrame
data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
# Export DataFrame to a CSV file
df.to_csv('output.csv', index=False)
You can also specify additional arguments, such as the delimiter, encoding, and whether to include the index:
df.to_csv('output.tsv', sep='\t', encoding='utf-8', index=False)
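to_csv() can also write to an in-memory buffer instead of a file, which is handy for checking the exact output or passing it on without touching disk. A sketch that round-trips the tab-separated output through read_csv():

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']})

# Write to an in-memory text buffer instead of a file on disk
buffer = StringIO()
df.to_csv(buffer, sep='\t', index=False)

# Rewind the buffer and read it back to confirm the round trip
buffer.seek(0)
restored = pd.read_csv(buffer, sep='\t')
```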
Exporting to Excel Files
To export your DataFrame to an Excel file, use the to_excel() method. First, you may need to install the openpyxl package if you haven't already:
$ pip install openpyxl
Then, you can export your DataFrame to an Excel file like this:
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)
Exporting to JSON Files
To export your DataFrame to a JSON file, use the to_json() method:
df.to_json('output.json', orient='records')
You can specify the JSON layout using the orient parameter. The available options are:
- records: Exports the DataFrame as an array of JSON objects, where each object represents a row in the DataFrame.
- index: Exports the DataFrame as a JSON object, where keys are index labels and values are objects mapping column names to the corresponding row values.
- columns: Exports the DataFrame as a JSON object, where keys are column names and values are objects mapping index labels to the corresponding column values.
- values: Exports the DataFrame as a nested array, where each inner array represents a row in the DataFrame.
orient='records'
# Create a sample DataFrame
data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
# Export DataFrame as an array of JSON objects (records)
df.to_json('output_records.json', orient='records')
[
{ "Column1": 1, "Column2": "A" },
{ "Column1": 2, "Column2": "B" },
{ "Column1": 3, "Column2": "C" }
]
orient='index'
# Export DataFrame as a JSON object with index labels as keys (index)
df.to_json('output_index.json', orient='index')
{
"0": { "Column1": 1, "Column2": "A" },
"1": { "Column1": 2, "Column2": "B" },
"2": { "Column1": 3, "Column2": "C" }
}
orient='columns'
# Export DataFrame as a JSON object with column names as keys (columns)
df.to_json('output_columns.json', orient='columns')
{
"Column1": { "0": 1, "1": 2, "2": 3 },
"Column2": { "0": "A", "1": "B", "2": "C" }
}
orient='values'
# Export DataFrame as a nested array (values)
df.to_json('output_values.json', orient='values')
[
[1, "A"],
[2, "B"],
[3, "C"]
]
Exporting to SQL Databases
To export your DataFrame to a SQL database, you'll first need to establish a connection to the database using an appropriate Python library. Then, use the to_sql() method to create a new table or append data to an existing table:
import sqlite3
import pandas as pd
# Create a sample DataFrame
data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
# Connect to the database
connection = sqlite3.connect('database.db')
# Export DataFrame to a new SQL table
df.to_sql('new_table', connection, if_exists='replace', index=False)
# Close the connection
connection.close()
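The if_exists parameter controls what happens when the target table already exists: 'fail' (the default) raises an error, 'replace' drops and recreates the table, and 'append' adds the new rows. A sketch using an in-memory database to illustrate appending:

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']})

# An in-memory database stands in for a real database file here
connection = sqlite3.connect(':memory:')

# The first write creates the table
df.to_sql('new_table', connection, if_exists='replace', index=False)
# The second write appends the same rows, doubling the row count
df.to_sql('new_table', connection, if_exists='append', index=False)

count = pd.read_sql('SELECT COUNT(*) AS n FROM new_table', connection)
connection.close()
```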
Exporting to Other Formats
pandas also provides methods for exporting data to other formats, such as HTML tables, Parquet files, and HDF5 files. For instance, to export data to an HTML table, use the to_html() method:
df.to_html('output.html', index=False)