2022-12-11

UDF in Snowflake

What is User-Defined Function (UDF)

User-Defined Functions (UDFs) are an integral part of Snowflake. In essence, a UDF is a function you define for use in SQL statements. Like built-in SQL functions, UDFs primarily enhance the functionality of SQL with additional capabilities or tasks that are not natively included in the language.

Moreover, UDFs encapsulate certain functionalities that you can repeatedly call from various locations within your code. This encapsulation promotes code reusability, making it a valuable tool for database developers and administrators working with Snowflake.

UDF Types

Snowflake allows you to create two types of UDFs: Scalar and Tabular Functions.

Scalar Functions

A scalar function is a type of UDF that returns a single value. Each input row to a scalar function yields one output row, which consists of just a single column or value. This characteristic makes scalar functions ideal for tasks that need to manipulate data on a row-by-row basis.

Tabular Functions

On the other hand, a tabular function, also known as a user-defined table function (UDTF), returns a tabular value for each input row. These functions provide a more complex functionality compared to scalar functions.

When creating a UDTF, you need to write methods that conform to an interface required by Snowflake. Specifically, these methods need to:

  • Process each row in a partition. This operation is a necessary part of UDTFs.
  • Initialize the handler once for each partition. This is an optional method but can be beneficial for certain use-cases.
  • Finalize the processing for each partition. Like the initialization, this operation is optional but can be used to clean up resources or finalize computation at the end of processing.

UDF Example

The following example creates a UDF named addone, which includes a handler written in Python. The handler function is addone_py, and this UDF returns an integer.

sql
CREATE OR REPLACE FUNCTION addone(i int)
RETURNS INT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
HANDLER = 'addone_py'
as
$$
def addone_py(i):
  return i+1
$$;

After creating the UDF, you can execute it just like any built-in SQL function. The following example shows how to execute the addone UDF.

sql
SELECT addone(3);

This SQL statement will output 4, as the addone function adds 1 to the input integer.

Supported Languages for UDFs

UDFs in Snowflake support various programming languages for writing a function's logic or handler. This allows developers to use their preferred languages within the constraints of the language and its runtime environment. Regardless of the handler language, you create the function using SQL, specifying your handler and handler language.

The following languages are supported for writing UDF handlers:

  • Java
  • JavaScript
  • Python
  • Scala
  • SQL

Creating a UDF in Snowflake

Creating a UDF or UDTF involves writing a handler for the function and then creating the function itself with a CREATE FUNCTION command.

To create a UDF in Snowflake, follow the steps outlined below:

  1. Write the handler code that executes when the UDF is called. This handler contains the logic of the UDF and can be written in one of the supported languages such as Java, JavaScript, Python, Scala, or SQL.

  2. Choose whether to keep the handler code in-line with the CREATE FUNCTION SQL statement or refer to it on a stage. This decision will depend on the complexity and length of your code, and your organization's code management practices.

  3. Execute a CREATE FUNCTION statement in SQL, specifying properties of the function. These properties include the function’s name, input arguments, return type, and handler function, among others.

Syntax for Creating UDFs

The syntax for creating UDFs varies depending on the language used for writing the handler.

Java and Scala Handler

Use the syntax below if the source code is in-line:

CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY } ] [ SECURE ] FUNCTION [ IF NOT EXISTS ] <name> ( [ <arg_name> <arg_data_type> ] [ , ... ] )
  [ COPY GRANTS ]
  RETURNS { <result_data_type> | TABLE ( <col_name> <col_data_type> [ , ... ] ) }
  [ [ NOT ] NULL ]
  LANGUAGE { JAVA | SCALA }
  [ { CALLED ON NULL INPUT | { RETURNS NULL ON NULL INPUT | STRICT } } ]
  [ VOLATILE | IMMUTABLE ]
  [ RUNTIME_VERSION = <java_jdk_version> ]
  [ COMMENT = '<string_literal>' ]
  [ IMPORTS = ( '<stage_path_and_file_name_to_read>' [ , ... ] ) ]
  [ PACKAGES = ( '<package_name_and_version>' [ , ... ] ) ]
  HANDLER = '<path_to_method>'
  [ TARGET_PATH = '<stage_path_and_file_name_to_write>' ]
  AS '<function_definition>'

Use the following syntax if the handler code will be referenced on a stage (such as in a JAR):

CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY } ] [ SECURE ] FUNCTION [ IF NOT EXISTS ] <name> ( [ <arg_name> <arg_data_type> ] [ , ... ] )
  [ COPY GRANTS ]
  RETURNS { <result_data_type> | TABLE ( <col_name> <col_data_type> [ , ... ] ) }
  [ [ NOT ] NULL ]
  LANGUAGE { JAVA | SCALA }
  [ { CALLED ON NULL INPUT | { RETURNS NULL ON NULL INPUT | STRICT } } ]
  [ VOLATILE | IMMUTABLE ]
  [ RUNTIME_VERSION = <java_jdk_version> ]
  [ COMMENT = '<string_literal>' ]
  IMPORTS = ( '<stage_path_and_file_name_to_read>' [ , ... ] )
  HANDLER = '<path_to_method>'

JavaScript Handler

CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY } ] [ SECURE ] FUNCTION <name> ( [ <arg_name> <arg_data_type> ] [ , ... ] )
  [ COPY GRANTS ]
  RETURNS { <result_data_type> | TABLE ( <col_name> <col_data_type> [ , ... ] ) }
  [ [ NOT ] NULL ]
  LANGUAGE JAVASCRIPT
  [ { CALLED ON NULL INPUT | { RETURNS NULL ON NULL INPUT | STRICT } } ]
  [ VOLATILE | IMMUTABLE ]
  [ COMMENT = '<string_literal>' ]
  AS '<function_definition>'

Python Handler

Use the syntax below if the source code is in-line:

CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY } ] [ SECURE ] FUNCTION [ IF NOT EXISTS ] <name> ( [ <arg_name> <arg_data_type> ] [ , ... ] )
  [ COPY GRANTS ]
  RETURNS { <result_data_type> | TABLE ( <col_name> <col_data_type> [ , ... ] ) }
  [ [ NOT ] NULL ]
  LANGUAGE PYTHON
  [ { CALLED ON NULL INPUT | { RETURNS NULL ON NULL INPUT | STRICT } } ]
  [ VOLATILE | IMMUTABLE ]
  RUNTIME_VERSION = <python_version>
  [ COMMENT = '<string_literal>' ]
  [ IMPORTS = ( '<stage_path_and_file_name_to_read>' [ , ... ] ) ]
  [ PACKAGES = ( '<package_name>[==<version>]' [ , ... ] ) ]
  HANDLER = '<function_name>'
  AS '<function_definition>'

Use the following syntax if the handler code will be referenced on a stage (such as in a module):

CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY } ] [ SECURE ] FUNCTION [ IF NOT EXISTS ] <name> ( [ <arg_name> <arg_data_type> ] [ , ... ] )
  [ COPY GRANTS ]
  RETURNS { <result_data_type> | TABLE ( <col_name> <col_data_type> [ , ... ] ) }
  [ [ NOT ] NULL ]
  LANGUAGE PYTHON
  [ { CALLED ON NULL INPUT | { RETURNS NULL ON NULL INPUT | STRICT } } ]
  [ VOLATILE | IMMUTABLE ]
  RUNTIME_VERSION = <python_version>
  [ COMMENT = '<string_literal>' ]
  IMPORTS = ( '<stage_path_and_file_name_to_read>' [ , ... ] )
  [ PACKAGES = ( '<package_name>[==<version>]' [ , ... ] ) ]
  HANDLER = '<module_file_name>.<function_name>'

SQL Handler

CREATE [ OR REPLACE ] [ { TEMP | TEMPORARY } ] [ SECURE ] FUNCTION <name> ( [ <arg_name> <arg_data_type> ] [ , ... ] )
  [ COPY GRANTS ]
  RETURNS { <result_data_type> | TABLE ( <col_name> <col_data_type> [ , ... ] ) }
  [ [ NOT ] NULL ]
  [ VOLATILE | IMMUTABLE ]
  [ MEMOIZABLE ]
  [ COMMENT = '<string_literal>' ]
  AS '<function_definition>'

Parameters for UDF Creation

Snowflake provides a range of required and optional parameters for each language while creating a UDF. These parameters allow you to customize the behavior and control the properties of the UDF. For detailed information on these parameters, please refer to the official Snowflake documentation linked in the original content.

https://docs.snowflake.com/en/sql-reference/sql/create-function#required-parameters
https://docs.snowflake.com/en/sql-reference/sql/create-function#optional-parameters

Calling a UDF in Snowflake

Once a UDF is created in Snowflake, it can be called just like any other function.

Calling a UDF

The general way to call a UDF is the same as calling any other function. The UDF name is used followed by parentheses, and if the UDF has arguments, they are specified inside the parentheses. The arguments can be specified either by name or by position.

Let's consider the following UDF which concatenates three strings:

sql
CREATE OR REPLACE FUNCTION udf_concatenate_strings(
    first VARCHAR,
    second VARCHAR,
    third VARCHAR)
  RETURNS VARCHAR
  LANGUAGE SQL
  AS
  $$
    RETURN first || second || third;
  $$;

When calling the UDF, the arguments can be specified by name or by position:

sql
SELECT udf_concatenate_strings(
  first => 'one',
  second => 'two',
  third => 'three');
sql
SELECT udf_concatenate_strings(
  'one',
  'two',
  'three');

Note: When specifying arguments, all of them must be specified by name or by position. A mix of both cannot be used.

Calling a UDTF

UDTFs, similar to UDFs, are also called using their names. However, because they return a table, the TABLE keyword is used when calling them, followed by the UDTF name and arguments enclosed in parentheses.

Consider the following example where a UDTF named my_java_udtf is called with a DATE literal '2021-01-16' as the argument:

sql
SELECT ...
  FROM TABLE(my_java_udtf('2021-01-16'::DATE));

Just like with UDFs, the arguments to a UDTF can be specified by name or by position.

References

https://docs.snowflake.com/en/sql-reference/udf-overview
https://docs.snowflake.com/en/developer-guide/udf/udf-creating-sql
https://docs.snowflake.com/en/sql-reference/sql/create-function
https://docs.snowflake.com/en/developer-guide/udf/udf-calling-sql

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!