2023-01-23

Stored Procedures vs. UDFs in Snowflake

Introduction

Stored procedures and UDFs are integral to database programming, serving to encapsulate reusable database logic, thereby promoting efficient coding practices. However, they have distinct purposes and usage characteristics, and understanding the differences between them is key to leveraging their capabilities effectively.

This article aims to shed light on the key differences between stored procedures and UDFs in Snowflake, their individual purposes, and their appropriate use cases. It also outlines their distinct behaviors and explains how to choose the right tool for your specific needs.

Stored Procedure Purpose

In the world of databases, stored procedures have a specific role to play. They are generally used to perform administrative operations by executing SQL statements. Stored procedures provide the means to encapsulate and package a series of SQL statements into a single routine that can be invoked whenever these operations need to be performed.

The body of a stored procedure is allowed, but not required, to explicitly return a value. This value could be an error indicator, a status code, or any other data point that the procedure is designed to output. If a return value is not explicitly specified, a NULL value is implicitly returned. The flexibility in return values allows stored procedures to be more than mere functions; they become an integral part of a robust database management system, serving a variety of administrative functions and controls.

User-Defined Function Purpose

UDFs in Snowflake play a distinct role in contrast to stored procedures. The primary purpose of a UDF is to calculate and return a value. This return value is not optional; it's a mandatory requirement of a UDF. Every UDF must have a return statement that explicitly provides a value, thereby making UDFs valuable for computations, data transformations, and similar tasks.

For example, if you have a JavaScript UDF, the body of this UDF must have a return statement that specifies a value. It could be a calculation result, a transformed data point, or any other value derived from the function's internal logic. As such, UDFs typically serve as a modular and reusable piece of logic within SQL statements, thereby improving the efficiency and maintainability of your database code.

When to Create a Stored Procedure or a UDF

Understanding the purpose of stored procedures and UDFs sets a foundation for deciding when to create and use each. Here are some general recommendations:

Creating a Stored Procedure

Consider creating a stored procedure when:

You're migrating an existing stored procedure from another application or system. Maintaining the logic within a procedure often simplifies the migration process and ensures functional consistency.
You need to perform administrative database operations. These operations include typical queries and data manipulation language (DML) tasks such as SELECT and UPDATE, as well as administrative tasks like data definition language (DDL) operations that involve deleting temporary tables, removing data older than a certain period, or adding users.

Creating a UDF

Consider creating a UDF when:

You're migrating an existing UDF from another application or system. This approach ensures that the logic remains encapsulated in a function, preserving the structural integrity of the original application's design.
You need a function that can be called as part of a SQL statement and must return a value for use in that statement.
Your output needs to include a value for every input row or every group. For example, you might use a UDF in a SQL statement like select MyFunction(col1) from table1; or select MyAggregateFunction(col1) from table1 group by col2;.

Usage and Behavior Differences

Stored Procedures and UDFs have several behavioral differences that impact their usage and suitability for different tasks in Snowflake. These differences include return values, how they are used in SQL, the context they can be called in, the number that can be called in one statement, and their ability to access the database.

UDFs Return a Value; Stored Procedures Need Not

Both stored procedures and functions can return a value. However, the body of a stored procedure is allowed, but not required, to explicitly return a value, such as an error indicator.

UDF Return Values Are Directly Usable in SQL; Stored Procedure Return Values May Not Be

The value returned by a stored procedure cannot be used directly in SQL, unlike the value returned by a function. Even though a stored procedure can return a value, the syntax of the CALL command does not provide a place to store the returned value or a way to operate on it or pass the value to another operation. However, there are indirect ways to use the return value of a stored procedure.

UDFs Can Be Called In the Context of Another Statement; Stored Procedures Are Called Independently

A stored procedure does not evaluate to a value and cannot be used in all contexts in which a general expression can be used. For example, you cannot execute SELECT my_stored_procedure().... On the other hand, a UDF evaluates to a value and can be used in contexts in which a general expression can be used, such as SELECT my_function() ....

Multiple UDFs May Be Called With One Statement; a Single Stored Procedure Is Called With One Statement

A single executable statement can call only one stored procedure. In contrast, a single SQL statement can call multiple functions. Similarly, a stored procedure, unlike a function, cannot be called as part of an expression. However, a stored procedure can call another stored procedure, or call itself recursively.

UDFs May Not Access the Database; Stored Procedures Can

Within a stored procedure, you can execute database operations, such as SELECT, UPDATE, and CREATE. In contrast, UDFs do not have access to an API that can perform database operations. This major difference makes stored procedures a powerful tool for database management and manipulation, while UDFs serve a more specific purpose of calculating and returning values.