2023-05-12

Database

What is a Database

A database is an organized collection of data stored and accessed electronically. It is designed to house large amounts of information in a way that allows for quick and efficient retrieval of specific data points. Databases support storage and manipulation of data and allow users to create, read, update, and delete data in a systematic and organized way.

Types of Databases

Databases fall into two broad categories, each designed to support specific needs and use cases:

  • Relational Databases
    These databases organize data into tables made up of rows and columns, with each column holding one type of information. Relationships can be created between different tables, hence the name "relational". MySQL, PostgreSQL, and SQLite are examples of relational database systems.

  • NoSQL Databases
    These databases are built to service specific needs such as storing key-value pairs, wide-column, graph, or document data. They are designed to excel in areas where traditional relational databases fall short, such as scalability and handling unstructured data. Examples include MongoDB, Cassandra, and Redis.

Database Management Systems

A Database Management System (DBMS) is the software that sits between end users or applications and the database itself. It captures, stores, retrieves, and manipulates the data on their behalf, and it is how users interact with the database. Types of DBMS include Relational DBMS (RDBMS), Hierarchical DBMS, Network DBMS, and Object-Oriented DBMS.

Relational Databases

Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables. In a relational database, each row in the table is a record with a unique ID called the primary key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.

For example, consider the following table which represents a simplistic view of a student database.

StudentID  FirstName  LastName  Major        GPA
1          John       Doe       Physics      3.6
2          Jane       Smith     Mathematics  3.8
3          Mike       Johnson   Chemistry    3.7
4          Alice      Davis     Biology      3.9

In the table above, each row represents a unique student (record) and each column represents an attribute of the student data.
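As a minimal sketch, the table above can be built with Python's built-in sqlite3 module. The column types and the NOT NULL constraints are assumptions made for illustration, since the article does not specify them, and the extra student in the duplicate insert is hypothetical.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY,  -- unique ID for each record
        FirstName TEXT NOT NULL,        -- column types are assumed
        LastName  TEXT NOT NULL,
        Major     TEXT,
        GPA       REAL
    )
    """
)

students = [
    (1, "John", "Doe", "Physics", 3.6),
    (2, "Jane", "Smith", "Mathematics", 3.8),
    (3, "Mike", "Johnson", "Chemistry", 3.7),
    (4, "Alice", "Davis", "Biology", 3.9),
]
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", students)

# A second row with StudentID 1 violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO Students VALUES (1, 'Sam', 'Lee', 'History', 3.5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(duplicate_rejected, row_count)
```

The rejected insert shows what "unique ID" means in practice: the database itself refuses a second record with the same primary key.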

Tables, Records, and Fields

In relational databases, tables are used to store data. The schema defines what fields belong to the database table. Each row in the table is a record. Each record consists of one or more fields. Fields are the different pieces of data that are stored for every record in the table.
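The relationship between schema, fields, and records can be inspected directly. The sketch below assumes SQLite through Python's sqlite3 module and uses SQLite's PRAGMA table_info statement to list the fields of a table. The column types are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets a record be read by field name

conn.execute(
    "CREATE TABLE Students ("
    "StudentID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, "
    "Major TEXT, GPA REAL)"
)
conn.execute("INSERT INTO Students VALUES (2, 'Jane', 'Smith', 'Mathematics', 3.8)")

# The schema: PRAGMA table_info returns one row per field of the table.
fields = [
    (col["name"], col["type"])
    for col in conn.execute("PRAGMA table_info(Students)")
]

# One record: a value for each field defined by the schema.
record = dict(conn.execute("SELECT * FROM Students WHERE StudentID = 2").fetchone())

print(fields)
print(record)
```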

SQL: The Language of Relational Databases

SQL (Structured Query Language) is a standard language for manipulating relational databases. It can be used to create, modify, and delete databases, tables, and records. It can also be used to search for specific records in a table.

For example, to search for students with a GPA higher than 3.7 in the previous table, you could use the following SQL query:

```sql
SELECT * FROM Students WHERE GPA > 3.7;
```

This would return:

StudentID  FirstName  LastName  Major        GPA
2          Jane       Smith     Mathematics  3.8
4          Alice      Davis     Biology      3.9
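Besides searching, SQL can also create, modify, and delete records. The following is a rough sketch of those operations using Python's sqlite3 module. The new student, the updated GPA value, and the column types are hypothetical and serve only as illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, FirstName TEXT, "
    "LastName TEXT, Major TEXT, GPA REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Doe", "Physics", 3.6),
        (2, "Jane", "Smith", "Mathematics", 3.8),
        (3, "Mike", "Johnson", "Chemistry", 3.7),
        (4, "Alice", "Davis", "Biology", 3.9),
    ],
)

query = "SELECT FirstName FROM Students WHERE GPA > ? ORDER BY StudentID"

# Read: the GPA query, with the threshold passed as a bound parameter.
before = [row[0] for row in conn.execute(query, (3.7,))]

# Create: add a new (hypothetical) record.
conn.execute("INSERT INTO Students VALUES (5, 'Sam', 'Lee', 'History', 3.5)")

# Update: change one field of an existing record.
conn.execute("UPDATE Students SET GPA = 3.75 WHERE StudentID = 3")

# Delete: remove a record by its primary key.
conn.execute("DELETE FROM Students WHERE StudentID = 1")

after = [row[0] for row in conn.execute(query, (3.7,))]
remaining_ids = [row[0] for row in conn.execute("SELECT StudentID FROM Students ORDER BY StudentID")]

print(before, after, remaining_ids)
```

Passing values through `?` placeholders rather than formatting them into the SQL string is the usual way to avoid SQL injection.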

NoSQL Databases

NoSQL is usually read as "Not Only SQL" and refers to database management systems that differ from traditional relational databases.

Relational databases (SQL databases) store data in highly normalized tables and answer queries by following the relationships between those tables. This design performs well for many workloads, but as the amount and complexity of data increase, scalability and performance can become a problem. NoSQL databases were designed to address these problems: they are non-relational and can use a variety of data models.
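To make "normalized tables" and "relationships between tables" concrete, here is a small sketch using Python's sqlite3 module. The Majors table and the MajorID column are hypothetical names introduced for this example. SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on, so the sketch enables it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

# Normalized layout: each major is stored once and referenced by ID.
conn.executescript(
    """
    CREATE TABLE Majors (
        MajorID INTEGER PRIMARY KEY,
        Name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Students (
        StudentID INTEGER PRIMARY KEY,
        FirstName TEXT NOT NULL,
        MajorID   INTEGER NOT NULL REFERENCES Majors(MajorID)
    );
    INSERT INTO Majors VALUES (1, 'Physics'), (2, 'Mathematics');
    INSERT INTO Students VALUES (1, 'John', 1), (2, 'Jane', 2);
    """
)

# The query follows the relationship between the two tables with a JOIN.
rows = conn.execute(
    "SELECT s.FirstName, m.Name FROM Students AS s "
    "JOIN Majors AS m ON m.MajorID = s.MajorID "
    "ORDER BY s.StudentID"
).fetchall()

# A student pointing at a major that does not exist is rejected.
try:
    conn.execute("INSERT INTO Students VALUES (3, 'Mike', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, orphan_rejected)
```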

Generally, NoSQL databases can efficiently process large amounts of data and maintain scalability and performance in a distributed environment. As a result, they are often used in both big data and real-time web applications.

However, it is important to understand that NoSQL is not the solution to all problems. There may be cases where the strict consistency and advanced query languages provided by relational databases are necessary. Therefore, which type of database to use depends on the requirements of a specific application or use case.

Types of NoSQL Databases

There are four main types of NoSQL databases:

  • Document Databases
    These databases store data as documents, often in JSON. They are typically schema-less, meaning documents in the same collection do not have to share the same fields or structure. This is advantageous when dealing with data that doesn't fit neatly into a table format and can be particularly useful when working with large amounts of complex, nested data. Examples include MongoDB and Elasticsearch.

  • Key-Value Stores
    Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Redis and Memcached.

  • Wide-Column Stores
    These store data in tables, rows, and dynamic columns. Wide-column stores offer high performance and a highly scalable architecture. Examples include Cassandra and HBase.

  • Graph Databases
    These databases are designed for data that is best represented as a graph: elements (nodes) connected to one another by an arbitrary number of relationships (edges). Examples include Neo4j and OrientDB.
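The four data models above can be approximated with plain Python data structures. This is only an illustration of how the data is shaped, not of how any of the named products work internally, and all keys, names, and values are made up for the example.

```python
import json

# Key-value store: a value looked up by its key.
kv_store = {"session:42": "alice"}

# Document: nested, JSON-like data kept together in one record.
document = {
    "_id": 4,
    "name": {"first": "Alice", "last": "Davis"},
    "major": "Biology",
    "courses": [{"code": "BIO101", "grade": "A"}],
}
document_json = json.dumps(document)

# Wide-column: each row key maps to its own, possibly different, set of columns.
wide_rows = {
    "student:1": {"major": "Physics", "gpa": 3.6},
    "student:2": {"major": "Mathematics", "advisor": "Dr. Example"},
}

# Graph: nodes connected by edges, here as an adjacency list.
follows = {"John": ["Jane"], "Jane": ["Alice"], "Alice": []}


def reachable(graph, start):
    """Return every node that can be reached from start by following edges."""
    seen = set()
    stack = list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


print(kv_store["session:42"])
print(json.loads(document_json)["name"]["first"])
print(sorted(wide_rows["student:2"]))
print(sorted(reachable(follows, "John")))
```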

Use Cases of NoSQL

NoSQL databases are increasingly popular for many applications, and they are particularly useful for working with large sets of distributed data, which is why they are common in big data and real-time web applications. They can store relationship data, semi-structured data, and hierarchical data, and they can scale horizontally by spreading data across multiple machines.
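One common way to scale horizontally is to partition keys across several nodes by hash. The sketch below is a simplified assumption of how that might look, with each "node" modeled as an in-memory dictionary. The function names and the choice of three shards are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Three in-memory dictionaries stand in for three database nodes.
shards = [dict() for _ in range(3)]


def put(key: str, value: str) -> None:
    shards[shard_for(key, len(shards))][key] = value


def get(key: str):
    return shards[shard_for(key, len(shards))].get(key)


for student_id in range(1, 101):
    put(f"student:{student_id}", f"record-{student_id}")

print(get("student:42"))
print(sum(len(shard) for shard in shards))
```

With plain modulo hashing, changing the number of shards remaps most keys. Distributed databases often use consistent hashing or a similar scheme to limit that reshuffling.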

Ryusei Kakujo
