2023-06-30

Data Types in Snowflake

Data Types in Snowflake

Snowflake is a widely adopted data warehouse platform that supports most SQL data types to meet a diverse set of needs. This article aims to provide an overview of each of these data types.

Numeric Data Types

We will take a look at the variety of numeric data types supported by Snowflake.

NUMBER

The NUMBER data type is the most general form of numeric data in Snowflake. It can represent both integer and decimal values with a maximum precision of 38 digits. The default precision and scale are (38,0), which means it can hold 38 digits to the left of the decimal point and no digits to the right by default.

DECIMAL, NUMERIC

The DECIMAL and NUMERIC data types are synonymous with the NUMBER data type. They can be used interchangeably and have the same default precision and scale of (38,0).

INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT

These data types are also synonymous with the NUMBER data type, but they differ in that the precision and scale cannot be specified.

FLOAT, FLOAT4, FLOAT8

The FLOAT data type in Snowflake is used to represent approximate numerical data. FLOAT4 and FLOAT8 are specific versions of the FLOAT type that differ in their precision and range.

DOUBLE, DOUBLE PRECISION, REAL

DOUBLE, DOUBLE PRECISION, and REAL are all used to represent floating-point numbers with greater precision than FLOAT. In Snowflake, they are synonymous with FLOAT.

String & Binary Data Types

String and Binary data types deal with alphanumeric and binary data.

VARCHAR

The VARCHAR data type in Snowflake is used to store variable-length alphanumeric characters. By default (and at maximum), it can store up to 16,777,216 bytes of information.

CHAR, CHARACTER

CHAR and CHARACTER are synonymous with VARCHAR. However, their default length is VARCHAR(1), meaning they hold a single character if no length is specified.

STRING

The STRING data type is synonymous with VARCHAR, providing another option for naming this type of data.

TEXT

The TEXT data type is also synonymous with VARCHAR, and is another choice for representing variable-length alphanumeric characters.

BINARY

The BINARY data type in Snowflake is used to store binary data.

VARBINARY

VARBINARY is synonymous with BINARY. Both data types are used to represent binary data.

Logical Data Types

Logical data types provide a way to represent boolean values in Snowflake.

BOOLEAN

The BOOLEAN data type is used to store true or false values. As of the current date (June 30, 2023), it is only supported for accounts provisioned after January 25, 2016.

Date & Time Data Types

Date & Time data types in Snowflake are crucial for handling data that involves timekeeping elements. They provide flexibility and accuracy when working with time-series data, scheduling, and other date-time related information.

DATE

The DATE data type in Snowflake is used to represent a specific day, stored in the format YYYY-MM-DD.

DATETIME

The DATETIME data type is an alias for TIMESTAMP_NTZ. It's used to represent a specific date and time without storing time zone information.

TIME

The TIME data type is used to store a specific time of the day. The format is HH:MI:SS.

TIMESTAMP

The TIMESTAMP data type in Snowflake is a more general type for storing date and time information. By default, it's an alias for one of the TIMESTAMP variations, TIMESTAMP_NTZ.

TIMESTAMP_LTZ

The TIMESTAMP_LTZ is a TIMESTAMP variant that stores the date and time with local time zone information. However, the time zone information is not stored.

TIMESTAMP_NTZ

The TIMESTAMP_NTZ data type represents TIMESTAMP with no time zone. This means that even if a time zone is provided, it is not stored.

TIMESTAMP_TZ

The TIMESTAMP_TZ data type is used to store TIMESTAMP with time zone information. It enables users to store date and time data along with relevant time zone details.

Semi-structured Data Types

As the volume and variety of data grows, the need for semi-structured data types has become increasingly important. Snowflake supports several semi-structured data types, providing greater flexibility in managing and processing less traditional forms of data.

VARIANT

The VARIANT data type in Snowflake is used to store values of any other type, including OBJECT and ARRAY. This makes it a highly flexible data type for handling diverse kinds of data within a single column. The maximum length of a VARIANT is 16 MB.

OBJECT

An OBJECT in Snowflake is analogous to a JSON object, enabling the storage of key-value pairs within a single data type. Each key is a VARCHAR, and each value is a VARIANT. This format allows for storing and querying complex, nested data structures.

ARRAY

The ARRAY data type is similar to arrays in many other programming languages. It can contain 0 or more pieces of data, and each element is accessed by specifying its position in the array. This enables the storage of ordered collections of data within a single column.

Geospatial Data Types

Geospatial data is critical for numerous applications, ranging from GIS systems, logistics and transportation, to environmental science and more. Snowflake provides support for geospatial data types, enhancing its capabilities to store and analyze this kind of data.

GEOGRAPHY

The GEOGRAPHY data type is used to represent geospatial data in Snowflake. It can store a wide range of geographical data and shapes including points, lines, polygons, and more.

GEOMETRY

The GEOMETRY data type is a more general form of the GEOGRAPHY data type. It allows storing any type of spatial data structure, including those not necessarily associated with geographic locations, like abstract shapes or patterns.

Data Type Mappings Between SQL and Handler Languages

When dealing with stored procedures or User-Defined Functions (UDFs) in Snowflake, it's important to understand how data types are mapped between SQL and the underlying handler language such as Java, Python, or Scala. Snowflake's official documentation provides detailed information about valid mappings between SQL data types and supported handler languages:

https://docs.snowflake.com/developer-guide/udf-stored-procedure-data-type-mapping

References

https://docs.snowflake.com/ja/sql-reference/intro-summary-data-types
https://docs.snowflake.com/developer-guide/udf-stored-procedure-data-type-mapping
https://docs.snowflake.com/sql-reference/data-types-semistructured

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!