An Overview of the Materialized View of Snowflake

Share on facebook
Share on twitter
Share on linkedin
Explore the Materialized View of Snowflake in detail. Read the blog to learn it in indepth.
Data Pipeline: Purpose, Types, Components and More
A data pipeline is a system that handles the processing, storage, and delivery of data.

Facebook disappeared – A bolt from the blue
For a short moment, everyone pondered, “Can Facebook also disappear? See what happens next!

Materialized overview

The primary purpose of Materialized View (MV) is to tweak up the query performance.
Materialized View is a pre-computed data set cumulated of query results, information and stored for later use. Querying a materialized view is quicker than the query against the base table of the view because the data is pre-computed. This exhibition distinction can be huge, when a question is run habitually or is adequately intricate. Consequently materialized views can accelerate expensive aggregation, projection, and selection activities, particularly that run frequently on enormous data sets.
Materialized view would rather avoid the ordinary view in the manner that it has its own physical information. While the standard view alludes to its fundamental information. You can consider a materialized view as a different object similar to a table., However, it consistently syncs with the data of its base table.
Here is the demonstration of the design for MV- The below diagram exhibits how MV is designed to query and link MV with the base table:
(a showcase of the below diagram)
As you probably are aware the physical files of MV are independent of the base table, almost like a new table. The metadata stored in the MV links the files to the source table to represent how far MV is from its base table.
The above diagram shows MV data in the red bar and base table data in the green bar as different versions of its object data. A single number below the green bar represents the table version, and 2 numbers below the red bar represent the MV version- here the first number indicates the MV version and the 2nd number in parentheses is the base table version. For instance, 1 (1) the first red MV version bar signifies that version 1 of MV is based on version 1 of its base table. 2 (4), the second red MV version bar signifies that version 2 of MV is based on version 4 of its base table.
Observantly, querying this MV currently, it cannot be done directly. As seen, there are more updates to be made on the MV version ( including version 5 and version 6), this is not an updated MV. And, if we query on MV, it will bring wrong results as neither the MV is refreshed nor the updated MV data on the base table. Aiming to solve the problem, we inaugurated the files as delta+ (triangle shape with a plus sign as illustrated in the above diagram) and the eliminated files as delta- (triangle shape with a minus sign as illustrated in the above diagram). The delta+ showcases the added files since the last MV refresh whereas the delta- stands for the deleted files since the last MV refresh.
In order to get the right results while querying for MV, we need to focus following basic options at the highest level:
  • The query for the last refreshed MV.
  • Remove the last data from MV since the last refreshed MV to the current base table version (delta+)
  • Add the updated data from MV since the last MV refreshed MV to the current base table version (delta-)

A foundation for better decisions for data engineers

A simplified query profile appears like the following ( inclusive of the 3 components as explained above):

Respectively, there can be two DMLs allowed to be used against an MV alongside the a CREATE statement of DDL.
Refresh: operation as a Snowflake service performed in the background.
Auto clustering: Add a clustering key to an MV and enable auto clustering on the MV ( another Snowflake service performed in the background).
The other two operations supposedly suspend or resume an MV.

Stats File compaction triggers the MV refresh on the base table.

1- No. of files changed on the base table since the last MV refresh.
2- No. of DML operations on the base table since the last refresh.
3- Stats File defragmentation or compaction service in Snowflake background.
The first two include the changes performed by either the end-user or done by the Snowflake services.

Build next-gen applications without any operational burdens

3 Ways to monitor : MV refresh in general

  • “Show materialized views” represents reshreshed_on timestamp and BEHIND_BY columns
  • MATERIALIZED_VIEW_REFRESH_HISTORY table function (in the Information Schema).
  • MATERIALIZED_VIEW_REFRESH_HISTORY View view (in Account Usage).

Commonly misinterpreted elements

  • INFORMATION_SCHEMA.VIEWS does not exhibit materialized views. Materialized views are shown by INFORMATION_SCHEMA.TABLES.
  • Auto clustering on the base table impacts MV maintenance and triggers refresh more often because any data changed on the base table is considered as MV refresh trigger. Also, it does not matter if the data is altered by the end-user of the Snowflake service.
  • MV can be shared with a consumer.
  • The additional column in the base table will not propagate to MV if the MV is defined as SELECT * from its base table. Therefore, the usage of select * is not recommended.
  • The resources monitor does not incorporate MV service as these resources monitors control virtual warehouses defined by the end-user, instead of snowflake provided warehouses.
Connect with our experts
Today to explore how Lyftrondata could help with data stack modernization with an agile, automatic columnar ELT pipeline and give you 95% faster performance.

Recent Posts