2023-03-12

SQLFluff

What is SQLFluff

SQLFluff is an Open Source Software (OSS) that automatically identifies and rectifies issues in your SQL format. It supports various SQL dialects (grammar), making it an effective and flexible tool for diverse SQL environments.

You can access SQLFluff and its latest updates from the official Github repository:

https://github.com/sqlfluff/sqlfluff

Unlike other languages, SQL does not have many specific writing guidelines. However, the trend of non-IT professionals, such as salespeople and marketers, writing SQL to analyze data is rapidly increasing. As more and more people use SQL to promote data analysis, the SQL queries generated will vary widely in writing style. Using SQLFluff to semi-forcedly unify the way queries are written can be seen as a good attempt to maintain a standard and enhance the efficiency of data analysis.

Installation

To install SQLFluff, you need to have Python3 installed on your system.

You can install SQLFluff using pip, which is a package manager for Python. Simply enter the following command:

bash
$ pip3 install sqlfluff

Using SQLFluff

SQLFluff provides two main functionalities: lint and fix.

Lint

The lint is designed to read SQL queries and highlight any issues found.

Consider the following SQL query:

example.sql
       SELECT
  ABC, XYZ
  froM
TMP

To use the lint, input the SQL query as follows:

bash
$ sqlfluff lint example.sql --dialect=bigquery

== [example.sql] FAIL
L:   1 | P:   1 | LT01 | Expected only single space before 'SELECT' keyword.
                       | Found '       '. [layout.spacing]
L:   1 | P:   1 | LT02 | First line should not be indented.
                       | [layout.indent]
L:   1 | P:   1 | LT13 | Files must not begin with newlines or whitespace.
                       | [layout.start_of_file]
L:   1 | P:   8 | LT09 | Select targets should be on a new line unless there is
                       | only one select target.
                       | [layout.select_targets]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   3 | P:   3 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
L:   4 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
All Finished πŸ“œ πŸŽ‰!

The lint will output several lines of text, each line identifying a specific issue in the SQL query.

In this output, L indicates the line number, and P indicates the position of the issue in that line. Next to these, you will see an error code like LT01, LT02, etc., followed by a message describing the identified issue.

For a more detailed understanding of the error codes returned by SQLFluff, refer to the official documentation:

https://docs.sqlfluff.com/en/stable/rules.html

dialect

You can choose from various dialects when using SQLFluff. Use the following command to list available dialects:

bash
$ sqlfluff dialects

==== sqlfluff - dialects ====
ansi:                 ansi dialect [inherits from 'nothing']
athena:                athena dialect [inherits from 'ansi']
bigquery:            bigquery dialect [inherits from 'ansi']
clickhouse:        clickhouse dialect [inherits from 'ansi']
databricks:    databricks dialect [inherits from 'sparksql']
db2:                      db2 dialect [inherits from 'ansi']
duckdb:            duckdb dialect [inherits from 'postgres']
exasol:                exasol dialect [inherits from 'ansi']
greenplum:      greenplum dialect [inherits from 'postgres']
hive:                    hive dialect [inherits from 'ansi']
materialize:   materialize dialect [inherits from 'postgres']
mysql:                  mysql dialect [inherits from 'ansi']
oracle:                oracle dialect [inherits from 'ansi']
postgres:            postgres dialect [inherits from 'ansi']
redshift:        redshift dialect [inherits from 'postgres']
snowflake:          snowflake dialect [inherits from 'ansi']
soql:                    soql dialect [inherits from 'ansi']
sparksql:            sparksql dialect [inherits from 'ansi']
sqlite:                sqlite dialect [inherits from 'ansi']
teradata:            teradata dialect [inherits from 'ansi']
tsql:                    tsql dialect [inherits from 'ansi']

Customizing SQLFluff Rules

SQLFluff allows detailed customization of its rules. You can configure all rules, individual rules, and even the scope of rule application.

The rule settings can be changed by writing your custom configuration in one of the following files:

  • setup.cfg
  • tox.ini
  • pep8.ini
  • .sqlfluff
  • pyproject.toml

For instance, you can configure the .sqlfluff file as follows:

.sqlfluff
[sqlfluff]
dialect = bigquery
exclude_rules = L036

In this case, all rules except for rule L036 will be applied.

fix

While the lint simply points out the issues in your SQL queries according to the set rules, the fix goes a step further. Not only does it identify the problems, but it also attempts to correct them automatically.

To use the fix, use the following command:

bash
$ sqlfluff fix example1.sql --dialect=bigquery

==== finding fixable violations ====
== [example.sql] FAIL
L:   1 | P:   1 | LT01 | Expected only single space before 'SELECT' keyword.
                       | Found '       '. [layout.spacing]
L:   1 | P:   1 | LT02 | First line should not be indented.
                       | [layout.indent]
L:   1 | P:   8 | LT09 | Select targets should be on a new line unless there is
                       | only one select target.
                       | [layout.select_targets]
L:   2 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
L:   3 | P:   3 | CP01 | Keywords must be consistently upper case.
                       | [capitalisation.keywords]
L:   4 | P:   1 | LT02 | Expected indent of 4 spaces.
                       | [layout.indent]
==== fixing violations ====
6 fixable linting violations found

Are you sure you wish to attempt to fix these? [Y/n]

When you execute the command, SQLFluff will first find any violations of the rules. If it finds any, it will ask for confirmation before attempting to fix them. To proceed with the fixes, respond with Y.

Attempting fixes...
Persisting Changes...
== [example.sql] FIXED
Done. Please check your files to confirm.
All Finished πŸ“œ πŸŽ‰!

Once you confirm the fixes, SQLFluff will automatically make the necessary changes to your SQL query.

example.sql
SELECT
    ABC,
    XYZ
FROM
    TMP

Just like the lint, you can also customize the rules for the fix according to your needs.

References

https://github.com/sqlfluff/sqlfluff
https://docs.sqlfluff.com/en/stable/rules.html

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!