Adding "replicated_deduplication_window=0" setting creates duplicates in DBT entities with a ReplicatedMergeTree engine. The table is not incremental, and I've added allow_automatic_deduplication: "True" to profiles.yml but it doesn't help.
When I run the same dbt-generated table-creation script manually with SETTINGS replicated_deduplication_window=0 removed, it works fine and the table contains 2 records.
Steps to reproduce
Create test model "test_dbt_model"
with source_data as (
select 1 as id
union all
select null as id
)
select *
from source_data
Set its config:
engine: ReplicatedMergeTree()
materialized: table
Run the dbt model: dbt run -s test_dbt_model
The final result shows 4 records inserted, i.e. two pairs of identical records:
default> select *
from dbt.test_dbt_model
limit 1000
[2023-12-04 22:30:59] 4 rows retrieved starting from 1 in 110 ms (execution: 100 ms, fetching: 10 ms)
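To make the duplication visible per value, a grouped count can help. This is only a sketch against the table name used in this report; the `copies` alias is made up for illustration.

```sql
-- Count how many copies of each id exist; NULL ids are grouped together.
-- In the duplicated state described above, each id would be expected
-- to appear more than once.
select id, count() as copies
from dbt.test_dbt_model
group by id
order by id
```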
Expected behaviour
DBT creates a test_dbt_model table with 2 rows: id = 1 and id is null
Generated SQL on insert:
create table dbt.test_dbt_model__dbt_backup
ON CLUSTER "test_v2"
engine = ReplicatedMergeTree()
order by (tuple())
SETTINGS replicated_deduplication_window=0
as (
with source_data as (
select 1 as id
union all
select null as id
)
select *
from source_data
)
Thanks for including complete debugging information with this report!
Before ClickHouse v23.7, dbt runs CREATE TABLE ... AS SELECT ..., which inserts the data as part of table creation. Because this statement is executed ON CLUSTER, every node runs it, and with ClickHouse deduplication turned off each execution inserts the same data. After ClickHouse v23.7, dbt runs CREATE TABLE ... EMPTY AS SELECT ... and then executes the INSERT INTO statement directly, without an ON CLUSTER clause.
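As a rough illustration of the newer two-step flow, here is a sketch based on the generated SQL from this report. The exact statements dbt emits may differ (for example in settings or intermediate table names), so treat this as an approximation rather than the adapter's actual output.

```sql
-- Step 1: create the table on every node with the SELECT's column
-- structure but no rows (EMPTY), so ON CLUSTER no longer inserts data.
create table dbt.test_dbt_model__dbt_backup
ON CLUSTER "test_v2"
engine = ReplicatedMergeTree()
order by (tuple())
SETTINGS replicated_deduplication_window=0
empty as (
    select 1 as id
    union all
    select null as id
);

-- Step 2: insert once, without ON CLUSTER; replication then carries
-- a single copy of the data to the other replicas.
insert into dbt.test_dbt_model__dbt_backup
select 1 as id
union all
select null as id;
```

Since the insert runs only once, the table should end up with the 2 expected rows even with replicated_deduplication_window=0.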
So it's pretty clear that allow_automatic_deduplication: True is required for older ClickHouse versions, and in the next release we'll make that the default if an older version is detected.
Unfortunately there was a subtle bug in the profile handling where allow_automatic_deduplication: True was not actually working, and this will hopefully be fixed in 1.6.1 as well.