2022-08-11

BigQuery

What is BigQuery

BigQuery is a data warehouse provided by Google Cloud Platform (GCP). It is built for large-scale data analysis and lets users extract information from massive volumes of data quickly. For example, it can run an aggregation with regular expression matching over a dataset of more than 12 billion rows in a matter of seconds.
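To make "aggregation with regular expression matching" concrete, here is a minimal local sketch in Python on a handful of made-up rows. The column layout (language, title, views) and the function name are assumptions for illustration only. In BigQuery itself the same idea would be expressed in SQL, for instance with the `REGEXP_CONTAINS` function combined with `GROUP BY`.

```python
import re
from collections import defaultdict


def sum_views_matching(rows, pattern):
    """Sum views per language for rows whose title matches the pattern."""
    regex = re.compile(pattern)
    totals = defaultdict(int)
    for language, title, views in rows:
        if regex.search(title):  # the "regular expression matching" step
            totals[language] += views  # the "aggregation" step
    return dict(totals)


# Hypothetical sample data; a real table would hold billions of rows.
sample_rows = [
    ("en", "BigQuery", 120),
    ("en", "Big_data", 80),
    ("ja", "BigQuery", 40),
    ("en", "Python", 300),
]

print(sum_views_matching(sample_rows, r"^Big"))
```

The sketch runs on one machine in a single loop; BigQuery's speed comes from running the same kind of filter and aggregation in parallel across many workers.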

BigQuery's high performance and flexibility make it applicable in various business scenarios.

  • Log Analysis
    BigQuery processes large volumes of log data efficiently, enabling detailed analysis of user behavior and system operations. This supports tasks such as website optimization and root cause analysis of incidents.

  • Operational Efficiency
    By aggregating and analyzing data on business processes, companies can identify bottlenecks and areas for improvement, and then act on them to raise operational efficiency.

  • Real-time Analytics
    Because BigQuery supports streaming inserts, incoming data can be analyzed shortly after it arrives. This supports real-time decision-making and fast business responses.

BigQuery Architecture

[Figure: BigQuery architecture. Source: "BigQuery explained: An overview of BigQuery's architecture"]

  • Distributed Storage
    BigQuery's underlying storage layer is distributed: data is spread across multiple servers, which provides fault tolerance and scalability. As a result, performance stays consistently fast as data volume grows.

  • Distributed Memory Shuffle
    Intermediate query results are redistributed between workers and aggregated in memory. This lets BigQuery process complex queries efficiently even on massive datasets.

  • Highly Available Cluster Compute
    BigQuery's compute clusters are highly available: they recover quickly from failures and keep processing running. This allows critical business analyses to run stably without interruption.

  • Petabit Network
    A petabit-class network forms the backbone of BigQuery, enabling rapid data movement and processing.
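The shuffle step described above can be illustrated with a toy sketch. This is not BigQuery's actual implementation; it only shows the general idea that rows are routed to workers by key so that each worker can aggregate its own keys independently. The function names and the worker count are assumptions made for this example.

```python
from collections import defaultdict


def shuffle(rows, num_workers):
    """Route each (key, value) row to a worker chosen by hashing the key."""
    partitions = [[] for _ in range(num_workers)]
    for key, value in rows:
        partitions[hash(key) % num_workers].append((key, value))
    return partitions


def aggregate(partition):
    """Sum values per key inside one worker's partition."""
    totals = defaultdict(int)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


def distributed_sum(rows, num_workers=3):
    """Shuffle rows by key, aggregate each partition, then merge the results."""
    merged = {}
    for partition in shuffle(rows, num_workers):
        # All rows for a given key land in the same partition,
        # so partial results never overlap and can be merged directly.
        merged.update(aggregate(partition))
    return merged


print(distributed_sum([("a", 1), ("b", 2), ("a", 3)]))
```

Because every row with the same key ends up on the same worker, the per-worker results can be combined without any further reconciliation, which is what makes the pattern scale.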

Key Features of BigQuery

  • Scalability of Analytics
    BigQuery supports analysis at any scale, from small datasets to petabytes of data. Its distributed architecture keeps performance consistently high as data volume changes.

  • High Availability of Storage
    To minimize the risk of data loss or corruption, BigQuery stores data redundantly in multiple locations. Data therefore remains accessible even if hardware or servers fail, which avoids business disruption.

  • Serverless
    BigQuery adopts a serverless architecture, relieving users of infrastructure management and scaling concerns. Users can focus on querying and storing data while Google handles backend management.

  • ANSI-Compliant Standard SQL
    Users can query data using standard SQL, making migration from other databases and tools easy. Data analysts and engineers can leverage their existing knowledge effectively.

  • Real-time Data Analysis and Streaming Insertion
    Real-time data flows can be rapidly analyzed. Streaming data can be inserted directly into BigQuery, allowing immediate querying of the data.

  • Integration with Machine Learning
    BigQuery ML enables creating and evaluating machine learning models using SQL queries. Data scientists and engineers can perform data-driven predictions and analyses without complex processes.

  • Integration with BI Tools and Spreadsheets
    BigQuery seamlessly integrates with BI tools like Looker and spreadsheet applications, facilitating efficient data visualization, reporting, and analysis.
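As an illustration of the streaming insertion feature listed above, the following is a minimal sketch that uses the `google-cloud-bigquery` Python client and its `Client.insert_rows_json` method. It assumes the package is installed, that credentials are configured, and that a table with matching columns already exists. The row fields (`user_id`, `action`, `event_time`) and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def build_rows(events):
    """Convert (user_id, action) pairs into JSON-serializable row dicts."""
    now = datetime.now(timezone.utc).isoformat()
    return [
        {"user_id": user_id, "action": action, "event_time": now}
        for user_id, action in events
    ]


def stream_rows(table_id, rows):
    """Send rows through the streaming API and return any per-row errors."""
    # Imported lazily so build_rows can be used without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on application default credentials
    # insert_rows_json returns an empty list when every row was accepted.
    return client.insert_rows_json(table_id, rows)
```

A call might look like `stream_rows("my-project.my_dataset.events", build_rows([("u1", "login")]))`, where the table ID is a placeholder. Rows accepted this way become available to queries shortly after insertion.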

BigQuery Pricing

As a serverless data warehouse, BigQuery charges based on actual usage. This pay-as-you-go model makes it possible to run data analysis without upfront investment or fixed costs.

BigQuery charges consist mainly of query fees and storage fees, with additional charges for specific features or services such as streaming inserts and the BigQuery Storage API.

Query Fees

Query fees have two models: On-Demand and Flat Rate.

  • On-Demand
    In this model, charges are based on the amount of data scanned by queries. Queries are priced at $5 per TB scanned, and the first 1 TB per month is free.
  • Flat Rate
    In this model, you purchase slots (units of virtual CPU) in advance. This provides dedicated processing capacity at a fixed cost.
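The on-demand arithmetic can be sketched as a small function. It uses the $5 per TB rate and the 1 TB monthly free tier quoted above; treating 1 TB as 10^12 bytes is an assumption of this sketch, and actual billing units and rates should be checked against the pricing page.

```python
PRICE_PER_TB_USD = 5.00   # on-demand rate quoted above
FREE_TB_PER_MONTH = 1.0   # monthly free tier quoted above
BYTES_PER_TB = 10 ** 12   # assumption: decimal terabytes


def on_demand_cost(bytes_scanned_this_month):
    """Estimate the monthly on-demand query cost in USD."""
    tb_scanned = bytes_scanned_this_month / BYTES_PER_TB
    billable_tb = max(0.0, tb_scanned - FREE_TB_PER_MONTH)
    return billable_tb * PRICE_PER_TB_USD


# 3 TB scanned in a month: 1 TB is free, 2 TB are billed at $5 each.
print(on_demand_cost(3 * 10 ** 12))  # 10.0
```

The number of bytes a query would scan can be obtained before running it with a dry run, for example `QueryJobConfig(dry_run=True)` in the `google-cloud-bigquery` Python client.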

Storage Fees

Storage is billed in two tiers, depending on how recently the data was modified.

  • Active Storage
    This fee applies to tables or table partitions that have been modified within the last 90 days. The cost is $0.020 per GB per month.
  • Long-Term Storage
    This fee applies to tables or partitions that have not been modified for 90 consecutive days or more. It is discounted by 50% compared to active storage. There is no difference in performance, durability, or availability between active storage and long-term storage.

The first 10 GB of storage usage per month is free.
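A rough monthly storage estimate can be computed from the figures above. The sketch below assumes the free 10 GB is deducted from active storage first and then from long-term storage; that ordering is an assumption made for illustration, not something the pricing description above specifies.

```python
ACTIVE_PRICE_PER_GB = 0.020  # USD per GB per month, as quoted above
LONG_TERM_DISCOUNT = 0.5     # long-term storage is 50% cheaper
FREE_GB_PER_MONTH = 10.0     # free tier quoted above


def monthly_storage_cost(active_gb, long_term_gb):
    """Estimate the monthly storage cost in USD."""
    # Assumption: the free tier is applied to active storage first.
    billable_active = max(0.0, active_gb - FREE_GB_PER_MONTH)
    free_left = max(0.0, FREE_GB_PER_MONTH - active_gb)
    billable_long_term = max(0.0, long_term_gb - free_left)

    long_term_price = ACTIVE_PRICE_PER_GB * (1 - LONG_TERM_DISCOUNT)
    return (billable_active * ACTIVE_PRICE_PER_GB
            + billable_long_term * long_term_price)


# 100 GB active and 500 GB long-term: 90 * 0.02 + 500 * 0.01 USD.
print(round(monthly_storage_cost(100, 500), 2))
```

Since both tiers behave identically at query time, the only lever here is how often tables or partitions are modified: leaving older partitions untouched lets them age into the cheaper tier.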

References

https://cloud.google.com/bigquery/docs/introduction
https://cloud.google.com/blog/products/data-analytics/new-blog-series-bigquery-explained-overview
https://cloud.google.com/bigquery/pricing

Ryusei Kakujo

