feat: add files for dbt materialized views
emily-flambe committed Oct 12, 2022
1 parent d9fe3a2 commit 50e809c
Showing 23 changed files with 741 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
target/
dbt_modules/
logs/
90 changes: 90 additions & 0 deletions README.md
@@ -0,0 +1,90 @@
## dbt_labs_materialized_views

`dbt_labs_materialized_views` is a dbt project containing materializations, helper macros, and some builtin macro overrides that enable use of materialized views in your dbt project. It takes a conceptual approach similar to that of the existing `incremental` materialization:
- In a "full refresh" run, drop and recreate the MV from scratch.
- Otherwise, "refresh" the MV as appropriate. Depending on the database, that could require DML (`refresh`) or no action.

At any point, if the database object corresponding to a MV model exists instead as a table or standard view, dbt will attempt to drop it and recreate the model from scratch as a materialized view.

Materialized views vary significantly across databases, as do their current limitations. Be sure to read the documentation for your adapter.

If you're here, you may also like the [dbt-materialize](https://github.com/MaterializeInc/materialize/tree/main/misc/dbt-materialize) plugin, which enables dbt to materialize models as materialized views in [Materialize](https://materialize.io/).

## Setup

### General installation:

You can install the materialized-view functionality using one of the following methods.

- Install this project as a package ([package-management docs](https://docs.getdbt.com/docs/building-a-dbt-project/package-management))
  - [Local package](https://docs.getdbt.com/docs/building-a-dbt-project/package-management#local-packages): by referencing the [`materialized-views`](https://github.com/dbt-labs/dbt-labs-experimental-features/tree/master/materialized-views) folder.
  - [Git package](https://docs.getdbt.com/docs/building-a-dbt-project/package-management#git-packages) using [project subdirectories](https://docs.getdbt.com/docs/building-a-dbt-project/package-management#git-packages): again by referencing the [`materialized-views`](https://github.com/dbt-labs/dbt-labs-experimental-features/tree/master/materialized-views) folder.
- Copy-paste the files from `macros/` (specifically `default` and your adapter) into your own project.
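
As a rough sketch of the Git-package option, a `packages.yml` entry might look like the following. The repository URL and folder name are taken from the links above; the `revision` value is an assumption, so pin to whichever branch, tag, or commit you have actually verified.

```yaml
# packages.yml (sketch; the revision below is an assumption, not a tested pin)
packages:
  - git: "https://github.com/dbt-labs/dbt-labs-experimental-features.git"
    subdirectory: "materialized-views"  # dbt expects this folder to contain a dbt_project.yml
    revision: master
```

Run `dbt deps` afterwards to fetch the package.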

### Extra installation steps for Postgres and Redshift

The Postgres and Redshift implementations both require overriding the builtin versions of some adapter macros. If you've installed `dbt_labs_materialized_views` as a local package, you can achieve this override by creating a `.sql` file under `macros/` in your project (for example, `macros/overrides.sql`, as in `integration_tests/`) with the following contents:

```sql
{# postgres and redshift #}

{% macro drop_relation(relation) -%}
{{ return(dbt_labs_materialized_views.drop_relation(relation)) }}
{% endmacro %}

{% macro postgres__list_relations_without_caching(schema_relation) %}
{{ return(dbt_labs_materialized_views.postgres__list_relations_without_caching(schema_relation)) }}
{% endmacro %}

{% macro postgres_get_relations() %}
{{ return(dbt_labs_materialized_views.postgres_get_relations()) }}
{% endmacro %}

{# redshift only #}

{% macro redshift__list_relations_without_caching(schema_relation) %}
{{ return(dbt_labs_materialized_views.redshift__list_relations_without_caching(schema_relation)) }}
{% endmacro %}

{% macro load_relation(relation) %}
{{ return(dbt_labs_materialized_views.redshift_load_relation_or_mv(relation)) }}
{% endmacro %}
```

## Postgres

- Supported model configs: none
- [docs](https://www.postgresql.org/docs/9.3/rules-materializedviews.html)

## Redshift

- Supported model configs: `sort`, `dist`, `auto_refresh`
- [docs](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html)
- Anecdotally, `refresh materialized view ...` is very slow to run. By contrast, `auto_refresh` runs in the background with minimal disruption to other workloads, at the cost of the MV's data potentially lagging slightly behind its base tables.
- ❗ MVs do not support late binding, so if an underlying table is cascade-dropped, the MV will be dropped as well. This would be fine, except that MVs don't include their "true" dependencies in `pg_depend`. Instead, a materialized view claims to depend on a table relation called `mv_tbl__[MV_name]__0`, in place of the name of the true underlying table (https://github.com/awslabs/amazon-redshift-utils/issues/499). As such, dbt's runtime cache is unable to reliably know if a MV has been dropped when it cascade-drops the underlying table. This package requires an override of `load_relation()` to perform a "hard" check (database query of `stv_mv_info`) every time dbt's cache thinks a `materializedview` relation may already exist.
- ❗ MVs do appear in `pg_views`, but the only way to tell that they're materialized views is that the `create materialized view` DDL appears in their `definition`, rather than just the bare SQL (as for standard views). There's another Redshift system table, `stv_mv_info`, but it can't effectively be joined with `pg_views` because they're different types of system tables.
- ❗ If a column in the underlying table is renamed, or removed and re-added (e.g. varchar widening), the materialized view cannot be refreshed:
```
Database Error in model test_mv (models/test_mv.sql)
Materialized view test_mv is unrefreshable as a column was renamed for a base table.
compiled SQL at target/run/dbt_labs_experimental_features_integration_tests/test_mv.sql
```
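
For illustration, the Redshift configs listed above could be set in a properties file along these lines. This is only a sketch: the model name `my_mv` and the column `customer_id` are hypothetical, and only the config keys come from this README.

```yaml
version: 2

models:
  - name: my_mv                  # hypothetical model name
    config:
      materialized: materialized_view
      auto_refresh: true         # let Redshift refresh the MV in the background
      sort: customer_id          # hypothetical column
      dist: customer_id          # hypothetical column
```

The same keys can instead be set in the model's own `config()` block, which is how the `test_mv_auto` and `test_mv_manual` integration-test models set `materialized` and `auto_refresh`.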

## BigQuery

- Supported model configs: `auto_refresh`, `refresh_interval_minutes`
- [docs](https://cloud.google.com/bigquery/docs/materialized-views-intro)
- ❗ Although BQ does not have `drop ... cascade`, if the base table of a MV is dropped and recreated, the MV also needs to be dropped and recreated:
```
Materialized view dbt-dev-168022:dbt_jcohen.test_mv references table dbt-dev-168022:dbt_jcohen.base_tbl which was deleted and recreated. The view must be deleted and recreated as well.
```
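
A comparable sketch for BigQuery, again with a hypothetical model name and an arbitrary interval. Judging by `bigquery__create_materialized_view_as` in `macros/bigquery/adapters.sql`, `auto_refresh` is passed through as the `enable_refresh` option and `refresh_interval_minutes` is passed under its own name.

```yaml
version: 2

models:
  - name: my_mv                      # hypothetical model name
    config:
      materialized: materialized_view
      auto_refresh: true             # rendered as enable_refresh in OPTIONS(...)
      refresh_interval_minutes: 60   # arbitrary example value
```

With `auto_refresh: false`, `bigquery__refresh_materialized_view` issues a manual `call bq.refresh_materialized_view(...)` on each run that is not a full refresh; otherwise the refresh step is skipped and logged.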

## Snowflake

- Supported model configs: `secure`, `cluster_by`, `automatic_clustering`, `persist_docs` (relation only)
- [docs](https://docs.snowflake.com/en/user-guide/views-materialized.html)
- ❗ Note: Snowflake MVs are only enabled on enterprise accounts
- ❗ Although Snowflake does not have `drop ... cascade`, if the base table of a MV is dropped and recreated, the MV also needs to be dropped and recreated; otherwise the following error will appear:
```
Failure during expansion of view 'TEST_MV': SQL compilation error: Materialized View TEST_MV is invalid.
```
16 changes: 16 additions & 0 deletions dbt_project.yml
@@ -0,0 +1,16 @@
name: 'dbt_labs_materialized_views'
version: '0.2.0'
config-version: 2
require-dbt-version: ">=1.0.0"

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["seed"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
- "target"
- "dbt_modules"
4 changes: 4 additions & 0 deletions integration_tests/.gitignore
@@ -0,0 +1,4 @@

target/
dbt_modules/
logs/
31 changes: 31 additions & 0 deletions integration_tests/Makefile
@@ -0,0 +1,31 @@
test-postgres:
	dbt deps
	dbt seed --target postgres --full-refresh
	dbt run --target postgres --full-refresh --vars 'update: false'
	dbt run --target postgres --vars 'update: true'
	dbt test --target postgres

test-redshift:
	dbt deps
	dbt seed --target redshift --full-refresh
	dbt run --target redshift --full-refresh --vars 'update: false'
	dbt run --target redshift --vars 'update: true'
	sleep 10 # wait for auto refresh
	dbt test --target redshift

test-snowflake:
	dbt deps
	dbt seed --profile garage-snowflake --full-refresh
	dbt run --profile garage-snowflake --full-refresh --vars 'update: false'
	dbt run --profile garage-snowflake --vars 'update: true'
	dbt test --profile garage-snowflake

test-bigquery:
	dbt deps
	dbt seed --target bigquery --full-refresh
	dbt run --target bigquery --full-refresh --vars 'update: false'
	dbt run --target bigquery --vars 'update: true'
	dbt test --target bigquery

test-all: test-postgres test-redshift test-snowflake test-bigquery
	echo "Completed successfully"
24 changes: 24 additions & 0 deletions integration_tests/dbt_project.yml
@@ -0,0 +1,24 @@

name: 'dbt_labs_materialized_views_integration_tests'
version: '0.2.0'
config-version: 2

profile: 'integration_tests'

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["seed"]
macro-paths: ["macros"]

target-path: "target"
clean-targets:
- "target"
- "dbt_modules"

quoting:
  identifier: false
  schema: false

seeds:
  quote_columns: false
27 changes: 27 additions & 0 deletions integration_tests/macros/overrides.sql
@@ -0,0 +1,27 @@
{# postgres + redshift #}

{% macro drop_relation(relation) -%}
{{ return(dbt_labs_materialized_views.drop_relation(relation)) }}
{% endmacro %}

{% macro postgres__list_relations_without_caching(schema_relation) %}
{{ return(dbt_labs_materialized_views.postgres__list_relations_without_caching(schema_relation)) }}
{% endmacro %}

{% macro postgres_get_relations() %}
{{ return(dbt_labs_materialized_views.postgres_get_relations()) }}
{% endmacro %}

{# redshift only #}

{% macro redshift__list_relations_without_caching(schema_relation) %}
{{ return(dbt_labs_materialized_views.redshift__list_relations_without_caching(schema_relation)) }}
{% endmacro %}

{% macro load_relation(relation) %}
{% if adapter.type() == 'redshift' %}
{{ return(dbt_labs_materialized_views.redshift_load_relation_or_mv(relation)) }}
{% else %}
{{ return(dbt.load_relation(relation)) }}
{% endif %}
{% endmacro %}
17 changes: 17 additions & 0 deletions integration_tests/models/base_tbl.sql
@@ -0,0 +1,17 @@
{{config(
materialized = 'incremental',
unique_key = 'id'
)}}

-- depends on: {{ref('seed_update')}}
-- depends on: {{ref('seed')}}

{% if is_incremental() %}

select * from {{ref('seed_update')}}

{% else %}

select * from {{ref('seed')}}

{% endif %}
11 changes: 11 additions & 0 deletions integration_tests/models/schema.yml
@@ -0,0 +1,11 @@
version: 2

models:
  - name: test_mv_manual
    tests:
      - dbt_utils.equality:
          compare_model: ref('expected')
  - name: test_mv_auto
    tests:
      - dbt_utils.equality:
          compare_model: ref('expected')
12 changes: 12 additions & 0 deletions integration_tests/models/test_mv_auto.sql
@@ -0,0 +1,12 @@
{{config(
materialized = 'materialized_view',
auto_refresh = true
)}}

select

gender,
count(*) as num

from {{ref('base_tbl')}}
group by 1
12 changes: 12 additions & 0 deletions integration_tests/models/test_mv_manual.sql
@@ -0,0 +1,12 @@
{{config(
materialized = 'materialized_view',
auto_refresh = false
)}}

select

gender,
count(*) as num

from {{ref('base_tbl')}}
group by 1
4 changes: 4 additions & 0 deletions integration_tests/packages.yml
@@ -0,0 +1,4 @@
packages:
  - local: ../
  - package: fishtown-analytics/dbt_utils
    version: 0.6.4
3 changes: 3 additions & 0 deletions integration_tests/seed/expected.csv
@@ -0,0 +1,3 @@
gender,num
Female,6
Male,4
6 changes: 6 additions & 0 deletions integration_tests/seed/seed.csv
@@ -0,0 +1,6 @@
id,first_name,last_name,email,gender,ip_address
1,Jacqueline,Hunter,jhunter0@pbs.org,Male,59.80.20.168
2,Kathryn,Walker,kwalker1@ezinearticles.com,Female,194.121.179.35
3,Gerald,Ryan,gryan2@com.com,Male,11.3.212.243
4,Bonnie,Spencer,bspencer3@ameblo.jp,Female,216.32.196.175
5,Harold,Taylor,htaylor4@people.com.cn,Male,253.10.246.136
11 changes: 11 additions & 0 deletions integration_tests/seed/seed_update.csv
@@ -0,0 +1,11 @@
id,first_name,last_name,email,gender,ip_address
1,Jacqueline,Hunter,jhunter0@pbs.org,Male,59.80.20.168
2,Kathryn,Walker,kwalker1@ezinearticles.com,Female,194.121.179.35
3,Gerald,Ryan,gryan2@com.com,Female,11.3.212.243
4,Bonnie,Spencer,bspencer3@ameblo.jp,Female,216.32.196.175
5,Harold,Taylor,htaylor4@people.com.cn,Male,253.10.246.136
6,Jack,Griffin,jgriffin5@t.co,Female,16.13.192.220
7,Wanda,Arnold,warnold6@google.nl,Female,232.116.150.64
8,Craig,Ortiz,cortiz7@sciencedaily.com,Male,199.126.106.13
9,Gary,Day,gday8@nih.gov,Male,35.81.68.186
10,Rose,Wright,rwright9@yahoo.co.jp,Female,236.82.178.100
52 changes: 52 additions & 0 deletions macros/bigquery/adapters.sql
@@ -0,0 +1,52 @@
{% macro bigquery_options() %}
{%- set opts = kwargs -%}
{%- set options -%}
OPTIONS({% for opt_key, opt_val in kwargs.items() if opt_val is not none %}
{{ opt_key }}={{ opt_val }}{{ "," if not loop.last }}
{%- endfor -%})
{%- endset %}
{%- do return(options) -%}
{%- endmacro -%}

{% macro bigquery__create_materialized_view_as(relation, sql, config) -%}

{%- set enable_refresh = config.get('auto_refresh', none) -%}
{%- set refresh_interval_minutes = config.get('refresh_interval_minutes', none) -%}
{%- set sql_header = config.get('sql_header', none) -%}

{{ sql_header if sql_header is not none }}

create materialized view {{relation}}
{{ dbt_labs_materialized_views.bigquery_options(
enable_refresh=enable_refresh,
refresh_interval_minutes=refresh_interval_minutes
) }}
as (
{{sql}}
)

{% endmacro %}


{% macro bigquery__refresh_materialized_view(relation, config) -%}

{%- set is_auto_refresh = config.get('auto_refresh', true) %}

{%- if is_auto_refresh == false -%} {# manual refresh #}

{% set refresh_command %}
call bq.refresh_materialized_view('{{relation|replace("`","")}}')
{% endset %}

{%- do return(refresh_command) -%}

{%- else -%} {# automatic refresh #}

{%- do log("Skipping materialized view " ~ relation ~ " because it is set to refresh automatically") -%}

{%- do return(none) -%}

{%- endif -%}

{% endmacro %}
41 changes: 41 additions & 0 deletions macros/bigquery/materialized_view.sql
@@ -0,0 +1,41 @@
{% materialization materialized_view, adapter='bigquery' -%}

{% set full_refresh_mode = (should_full_refresh()) %}

{% set target_relation = this %}
{% set existing_relation = load_relation(this) %}
{% set tmp_relation = make_temp_relation(this) %}

{{ run_hooks(pre_hooks) }}

{% if existing_relation is none %}
{% set build_sql = dbt_labs_materialized_views.create_materialized_view_as(target_relation, sql, config) %}
{% elif existing_relation.is_view or existing_relation.is_table %}
{#-- Can't replace a table or view with a materialized view in place - we must drop it first --#}
{{ log("Dropping relation " ~ target_relation ~ " because it is a " ~ existing_relation.type ~ " and this model is a materialized view.") }}
{% do adapter.drop_relation(existing_relation) %}
{% set build_sql = dbt_labs_materialized_views.create_materialized_view_as(target_relation, sql, config) %}
{% elif full_refresh_mode %}
{#-- create or replace not yet supported for materialized views --#}
{{ log("Dropping relation " ~ target_relation ~ " because replacing an existing materialized view is not supported.") }}
{% do adapter.drop_relation(existing_relation) %}
{% set build_sql = dbt_labs_materialized_views.create_materialized_view_as(target_relation, sql, config) %}
{% else %}
{% set build_sql = dbt_labs_materialized_views.refresh_materialized_view(target_relation, config) %}
{% endif %}

{% if build_sql %}
{% call statement("main") %}
{{ build_sql }}
{% endcall %}
{% else %}
{{ store_result('main', 'SKIP') }}
{% endif %}

{{ run_hooks(post_hooks) }}

{% do persist_docs(target_relation, model) %}

{{ return({'relations': [target_relation]}) }}

{%- endmaterialization %}