2022-11-13

Importing and Exporting Data with Pandas

Introduction

Pandas enables users to easily import and export data between various formats, perform preprocessing and transformation. This article shows how to import and export various data with Pandas.

Reading Data Files

In this chapter, I will explore how to import data from various file formats using pandas.

Importing CSV Files

CSV (Comma Separated Values) files are widely used for storing and sharing tabular data. To import a CSV file, you can use the read_csv() function in pandas:

python
import pandas as pd

data = pd.read_csv('file.csv')

You can also specify additional arguments, such as the delimiter, encoding, and header information:

python
data = pd.read_csv('file.tsv', delimiter='\t', encoding='utf-8', header=0)

Importing Excel Files

To import data from an Excel file, you can use the read_excel() function. First, you may need to install the openpyxl package if you haven't already:

bash
$ pip install openpyxl

Then, you can import an Excel file like this:

python
data = pd.read_excel('file.xlsx', sheet_name='Sheet1')

Importing JSON Files

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate. To import JSON data, use the read_json() function:

python
data = pd.read_json('file.json')

You can also import JSON data from a URL or a string.

Importing SQL Data

pandas can interact with various SQL databases, such as SQLite, MySQL, and PostgreSQL. To import data from a SQL database, you'll first need to establish a connection to the database using an appropriate Python library. Then, use the read_sql() function to import data from a specific table or query:

python
import sqlite3
import pandas as pd

connection = sqlite3.connect('database.db')

data = pd.read_sql('SELECT * FROM table_name', connection)

Don't forget to close the connection after you're done:

python
connection.close()

Importing Data from Web APIs

Web APIs allow you to access data from various online sources. To import data from a web API, you can use the requests library to fetch the data and then convert it to a pandas DataFrame:

python
import requests
import pandas as pd

response = requests.get('https://api.example.com/data')
data = response.json()

df = pd.DataFrame(data)

Importing Data from Other Formats

pandas also provides functions for importing data from other formats, such as HTML tables, Parquet files, and HDF5 files. For instance, to import data from an HTML table, use the read_html() function:

python
data = pd.read_html('https://example.com/table.html')

Writing Data Files

In this chapter, I will explore how to export your pandas DataFrame to various file formats, such as CSV, Excel, JSON, and SQL databases.

Exporting to CSV Files

To export your DataFrame to a CSV file, use the to_csv() method:

python
import pandas as pd

# Create a sample DataFrame
data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)

# Export DataFrame to a CSV file
df.to_csv('output.csv', index=False)

You can also specify additional arguments, such as the delimiter, encoding, and whether to include the index:

python
df.to_csv('output.tsv', sep='\t', encoding='utf-8', index=False)

Exporting to Excel Files

To export your DataFrame to an Excel file, use the to_excel() method. First, you may need to install the openpyxl package if you haven't already:

bash
$ pip install openpyxl

Then, you can export your DataFrame to an Excel file like this:

python
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)

Exporting to JSON Files

To export your DataFrame to a JSON file, use the to_json() method:

python
df.to_json('output.json', orient='records')

You can specify the JSON orientation using the orient parameter. The available options are:

  • records: Exports the DataFrame as an array of JSON objects, where each object represents a row in the DataFrame.
  • index: Exports the DataFrame as a JSON object, where keys are index labels and values are the corresponding row data represented as an array.
  • columns: Exports the DataFrame as a JSON object, where keys are column names and values are the corresponding column data represented as an array.
  • values: Exports the DataFrame as a nested array, where each inner array represents a row in the DataFrame.

orient='records'

python
# Create a sample DataFrame
data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)

# Export DataFrame as an array of JSON objects (records)
df.to_json('output_records.json', orient='records')
[
  { "Column1": 1, "Column2": "A" },
  { "Column1": 2, "Column2": "B" },
  { "Column1": 3, "Column2": "C" }
]

orient='index'

python
# Export DataFrame as a JSON object with index labels as keys (index)
df.to_json('output_index.json', orient='index')
{
  "0": { "Column1": 1, "Column2": "A" },
  "1": { "Column1": 2, "Column2": "B" },
  "2": { "Column1": 3, "Column2": "C" }
}

orient='columns'

python
# Export DataFrame as a JSON object with column names as keys (columns)
df.to_json('output_columns.json', orient='columns')
{
  "Column1": { "0": 1, "1": 2, "2": 3 },
  "Column2": { "0": "A", "1": "B", "2": "C" }
}

orient='values'

python
# Export DataFrame as a nested array (values)
df.to_json('output_values.json', orient='values')
[
  [1, "A"],
  [2, "B"],
  [3, "C"]
]

Exporting to SQL Databases

To export your DataFrame to a SQL database, you'll first need to establish a connection to the database using an appropriate Python library. Then, use the to_sql() method to create a new table or append data to an existing table:

python
import sqlite3
import pandas as pd

# Create a sample DataFrame

data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)

# Connect to the database

connection = sqlite3.connect('database.db')

# Export DataFrame to a new SQL table

df.to_sql('new_table', connection, if_exists='replace', index=False)

# Close the connection

connection.close()

Exporting to Other Formats

pandas also provides methods for exporting data to other formats, such as HTML tables, Parquet files, and HDF5 files. For instance, to export data to an HTML table, use the to_html() method:

python
df.to_html('output.html', index=False)

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!