2023-04-07

B-Tree Index

What is B-Tree Index

B-Tree, an abbreviation for Balanced Tree, is a data structure that is commonly employed in the database domain for indexing. A B-Tree index helps in efficiently locating and retrieving data in a database by maintaining a sorted structure of the data in a hierarchical manner.

Imagine a tree where the topmost part is known as the root, the bottom ends are called leaves, and the in-between sections are termed branches or internal nodes. In a B-Tree, each node contains a certain number of keys and child pointers. The keys within a node are sorted, and each pointer points to a child node. The main characteristic of a B-Tree is that it is balanced, meaning that all leaf nodes are at the same depth, ensuring an equal number of traversals to reach any leaf.

Advantages of B-Tree Index

B-Tree Indexes come with various advantages, which make them a popular choice for database indexing:

  • Logarithmic Data Access
    B-Trees are height-balanced, which means that data can be accessed, inserted, and deleted in logarithmic time.

  • Efficient Disk Reads
    Since databases typically store data on disk, the tree structure of a B-Tree is efficient for minimizing disk reads.

  • Range Queries
    B-Trees are especially effective for range queries, as they can quickly locate the beginning of the range and then scan through the data linearly.

Which Column to Create the Index

I will explain the considerations you should make when deciding which column or set of columns to index in your database. Creating indexes without proper consideration can lead to suboptimal performance. There are several factors to consider, such as table size, cardinality, and the use of WHERE clauses in your queries.

Table Size

When it comes to table size, it's crucial to understand that indexing is not always beneficial. For small tables, a full table scan might be faster than using an index because the overhead of indexing may outweigh its benefits. However, as a table grows, indexes become increasingly vital for efficient data retrieval.

As a rule of thumb, indexing is more beneficial when the size of the table is large. It is often recommended to keep monitoring the growth of your table and create indexes when you start observing performance degradation in your queries.

Cardinality

Cardinality refers to the number of distinct values in a column. High cardinality means that the column contains a large number of unique values, while low cardinality means that there are only a few distinct values.

High-cardinality columns are often good candidates for indexing. Since these columns have a wide range of values, using an index can significantly reduce the number of disk reads required for a query.

On the other hand, for low-cardinality columns, an index might not be as effective. Since there are only a few unique values, a query will likely need to read a large portion of the table anyway, making the benefits of an index marginal.

Where Clause

The columns that are frequently used in the WHERE clause of your queries are often good candidates for indexing. This is because the WHERE clause is used to filter rows, and an index can help the database engine locate the relevant rows more quickly.

For example, if you frequently run queries that filter data based on a specific date range, it might be beneficial to index the date column.

Creating B-Tree Index

The general SQL syntax for creating a B-Tree index on a single column is as follows:

sql
CREATE INDEX index_name ON table_name (column_name);

For creating a composite index on multiple columns, the syntax is slightly different:

sql
CREATE INDEX index_name ON table_name (column1_name, column2_name, ...);

Some databases allow you to explicitly specify the type of index to create, including B-Tree. For example, in PostgreSQL:

sql
CREATE INDEX index_name ON table_name USING btree (column_name);

Meaningless Patterns of B-Tree Index

I will introduce certain scenarios or patterns where using a B-Tree index does not offer any performance benefits and may sometimes even lead to inefficiencies.

Operations are Performed on the Index Columns

When you apply a function or any operation to a column before comparing it to a value, the database might not be able to use the B-Tree index effectively. For example:

sql
SELECT * FROM employees WHERE age * 2 > 60;

In this example, the database cannot efficiently use the index on the age column because it applies 2 to each entry before making the comparison.

SQL is Being Applied to the Index Column

Similar to applying operations, applying SQL functions to the index column can also render the index useless. For example:

sql
SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH';

In this query, if there is an index on last_name, it might not be used efficiently since the database has to apply the UPPER() function to each entry in the column.

Using the IS NULL Predicate

B-Tree indexes may not always be efficient for columns with a large number of NULL values, especially when querying for NULL values. For example:

sql
SELECT * FROM employees WHERE middle_name IS NULL;

If middle_name contains a high number of NULL values, using a B-Tree index on this column may not provide any significant performance benefit for this query.

Using OR

When using the OR operator, the database may sometimes perform a full table scan instead of using the index, especially when the columns involved have low cardinality.

sql
SELECT * FROM orders WHERE status = 'SHIPPED' OR status = 'PROCESSING';

In this example, if the status column has very few unique values, the index may not be effective.

Using Backward and Middle Matching LIKE Predicates

When using LIKE with patterns that don’t match the beginning of the string, the B-Tree index won't be used efficiently.

sql
SELECT * FROM products WHERE product_name LIKE '%laptop%';

In this query, the database can't take advantage of an index on product_name due to the use of a wildcard at the start of the pattern.

Implicit Type Conversion is Performed

Implicit type conversions can prevent the efficient use of indexes. For example:

sql
SELECT * FROM employees WHERE department_id = '10';

If department_id is of integer type, the database has to perform a type conversion before it can compare values, which can lead to inefficient index usage.

Primary Key and Unique Constraints with B-Tree Index

Here I will explain the relationship between primary keys, unique constraints, and B-Tree indexes in databases.

Primary Key and B-Tree Index

A primary key is a column or a set of columns in a table that uniquely identifies each row in that table. In relational databases, primary keys ensure data integrity by making sure that there are no duplicate rows and that every row can be uniquely identified.

When you define a primary key constraint on a column or set of columns, the database management system automatically creates a unique B-Tree index on those columns. This is because the primary key constraint requires the database to efficiently enforce uniqueness and quickly lookup rows by the primary key. A B-Tree index is well-suited for this purpose due to its logarithmic search characteristics.

Therefore, there's no need to manually create an index on a column that has already been defined as a primary key because a unique B-Tree index is automatically created and managed by the database.

Unique Constraints and B-Tree Index

While primary keys ensure uniqueness for the identifying columns of a table, there are scenarios where you may want to ensure that columns other than the primary key are also unique. This is where unique constraints come into play. A unique constraint ensures that the data contained in a column, or a combination of columns, is unique across all the rows in the table.

Similar to primary keys, when you define a unique constraint on a column or set of columns, the database management system automatically creates a unique B-Tree index on those columns. This index helps the database efficiently enforce the uniqueness constraint.

As with primary keys, there is no need to create an additional index on columns that have a unique constraint, as a B-Tree index is already automatically created and managed by the database.

Ryusei Kakujo

researchgatelinkedingithub

Focusing on data science for mobility

Bench Press 100kg!