Data Types in Snowflake
Snowflake is a widely adopted data warehouse platform that supports most SQL data types to meet a diverse set of needs. This article aims to provide an overview of each of these data types.
Numeric Data Types
We will take a look at the variety of numeric data types supported by Snowflake.
NUMBER
The NUMBER
data type is the most general form of numeric data in Snowflake. It can represent both integer and decimal values with a maximum precision of 38 digits. The default precision and scale are (38,0), which means it can hold 38 digits to the left of the decimal point and no digits to the right by default.
DECIMAL, NUMERIC
The DECIMAL
and NUMERIC
data types are synonymous with the NUMBER
data type. They can be used interchangeably and have the same default precision and scale of (38,0).
INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT
These data types are also synonymous with the NUMBER
data type, but they differ in that the precision and scale cannot be specified.
FLOAT, FLOAT4, FLOAT8
The FLOAT
data type in Snowflake is used to represent approximate numerical data. FLOAT4
and FLOAT8
are specific versions of the FLOAT
type that differ in their precision and range.
DOUBLE, DOUBLE PRECISION, REAL
DOUBLE
, DOUBLE PRECISION
, and REAL
are all used to represent floating-point numbers with greater precision than FLOAT
. In Snowflake, they are synonymous with FLOAT
.
String & Binary Data Types
String and Binary data types deal with alphanumeric and binary data.
VARCHAR
The VARCHAR
data type in Snowflake is used to store variable-length alphanumeric characters. By default (and at maximum), it can store up to 16,777,216 bytes of information.
CHAR, CHARACTER
CHAR
and CHARACTER
are synonymous with VARCHAR
. However, their default length is VARCHAR(1)
, meaning they hold a single character if no length is specified.
STRING
The STRING
data type is synonymous with VARCHAR
, providing another option for naming this type of data.
TEXT
The TEXT
data type is also synonymous with VARCHAR
, and is another choice for representing variable-length alphanumeric characters.
BINARY
The BINARY
data type in Snowflake is used to store binary data.
VARBINARY
VARBINARY
is synonymous with BINARY
. Both data types are used to represent binary data.
Logical Data Types
Logical data types provide a way to represent boolean values in Snowflake.
BOOLEAN
The BOOLEAN
data type is used to store true or false values. As of the current date (June 30, 2023), it is only supported for accounts provisioned after January 25, 2016.
Date & Time Data Types
Date & Time data types in Snowflake are crucial for handling data that involves timekeeping elements. They provide flexibility and accuracy when working with time-series data, scheduling, and other date-time related information.
DATE
The DATE
data type in Snowflake is used to represent a specific day, stored in the format YYYY-MM-DD
.
DATETIME
The DATETIME
data type is an alias for TIMESTAMP_NTZ
. It's used to represent a specific date and time without storing time zone information.
TIME
The TIME
data type is used to store a specific time of the day. The format is HH:MI:SS
.
TIMESTAMP
The TIMESTAMP
data type in Snowflake is a more general type for storing date and time information. By default, it's an alias for one of the TIMESTAMP
variations, TIMESTAMP_NTZ
.
TIMESTAMP_LTZ
The TIMESTAMP_LTZ
is a TIMESTAMP
variant that stores the date and time with local time zone information. However, the time zone information is not stored.
TIMESTAMP_NTZ
The TIMESTAMP_NTZ
data type represents TIMESTAMP
with no time zone. This means that even if a time zone is provided, it is not stored.
TIMESTAMP_TZ
The TIMESTAMP_TZ
data type is used to store TIMESTAMP
with time zone information. It enables users to store date and time data along with relevant time zone details.
Semi-structured Data Types
As the volume and variety of data grows, the need for semi-structured data types has become increasingly important. Snowflake supports several semi-structured data types, providing greater flexibility in managing and processing less traditional forms of data.
VARIANT
The VARIANT
data type in Snowflake is used to store values of any other type, including OBJECT
and ARRAY
. This makes it a highly flexible data type for handling diverse kinds of data within a single column. The maximum length of a VARIANT is 16 MB.
OBJECT
An OBJECT
in Snowflake is analogous to a JSON object, enabling the storage of key-value pairs within a single data type. Each key is a VARCHAR
, and each value is a VARIANT
. This format allows for storing and querying complex, nested data structures.
ARRAY
The ARRAY
data type is similar to arrays in many other programming languages. It can contain 0 or more pieces of data, and each element is accessed by specifying its position in the array. This enables the storage of ordered collections of data within a single column.
Geospatial Data Types
Geospatial data is critical for numerous applications, ranging from GIS systems, logistics and transportation, to environmental science and more. Snowflake provides support for geospatial data types, enhancing its capabilities to store and analyze this kind of data.
GEOGRAPHY
The GEOGRAPHY
data type is used to represent geospatial data in Snowflake. It can store a wide range of geographical data and shapes including points, lines, polygons, and more.
GEOMETRY
The GEOMETRY
data type is a more general form of the GEOGRAPHY
data type. It allows storing any type of spatial data structure, including those not necessarily associated with geographic locations, like abstract shapes or patterns.
Data Type Mappings Between SQL and Handler Languages
When dealing with stored procedures or User-Defined Functions (UDFs) in Snowflake, it's important to understand how data types are mapped between SQL and the underlying handler language such as Java, Python, or Scala. Snowflake's official documentation provides detailed information about valid mappings between SQL data types and supported handler languages:
References