Databricks truncates datatypes returned via DESCRIBE EXTENDED which is used by get_columns_in_relation() #779

Open
ShaneMazur opened this issue Aug 27, 2024 · 2 comments · Fixed by #796
Labels
bug Something isn't working

Comments

@ShaneMazur

Describe the bug

I can't say what the full impact of this bug is, but I encountered it while using on_schema_change="sync_all_columns".

In short, get_columns_in_relation() returns truncated data types, and those results feed the queries that handle ALTER statements when data types change in a dataset.


Current Behaviour

This happens because the following statement truncates long data types:

DESCRIBE EXTENDED <catalog>.<schema>.<table>

Truncated field example using DESCRIBE EXTENDED

struct<_info:struct<fieldA:string,fieldB:string>,fieldC:bigint,fieldD:string>,... 78 more fields>
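
A truncated type string like the one above can be detected before it is used to build an ALTER statement. The following is a minimal sketch, not part of dbt-databricks: the function name is hypothetical, and the marker format ("... N more fields") is assumed from the single example in this issue.

```python
import re

# Assumed marker format, taken from the example in this issue:
# "... 78 more fields". Other truncation formats are not covered.
_TRUNCATION_MARKER = re.compile(r"\.\.\. \d+ more fields?")


def is_truncated_type(data_type: str) -> bool:
    """Return True if the type string contains a truncation marker."""
    return _TRUNCATION_MARKER.search(data_type) is not None


if __name__ == "__main__":
    sample = (
        "struct<_info:struct<fieldA:string,fieldB:string>,"
        "fieldC:bigint,fieldD:string>,... 78 more fields>"
    )
    print(is_truncated_type(sample))
    print(is_truncated_type("struct<a:string,b:bigint>"))
```

A check like this could be used to fail early, rather than generating DDL from an incomplete type.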

Requested Behaviour

Ideally, dbt-databricks would instead use the query below to get that information, since it does not truncate data types:

select
    column_name,
    full_data_type,
    comment
from <catalog>.information_schema.columns
where table_schema = '<schema>' and table_name = '<table>'
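
As an illustration of how the query above could be filled in for a specific relation, here is a small sketch. The helper names are hypothetical and are not the adapter's actual implementation. It rejects names containing quotes, backticks, or backslashes rather than trying to escape them.

```python
def _check_name(name: str) -> str:
    """Reject empty names and names that would need escaping in SQL."""
    if not name or any(ch in name for ch in "'`\\"):
        raise ValueError(f"unsupported name: {name!r}")
    return name


def build_columns_query(catalog: str, schema: str, table: str) -> str:
    """Build the information_schema.columns lookup for one table."""
    catalog = _check_name(catalog)
    schema = _check_name(schema)
    table = _check_name(table)
    return (
        "select column_name, full_data_type, comment\n"
        f"from `{catalog}`.information_schema.columns\n"
        f"where table_schema = '{schema}' and table_name = '{table}'"
    )


if __name__ == "__main__":
    # Catalog, schema, and table names here are made up for illustration.
    print(build_columns_query("main", "analytics", "events"))
```

Note that the schema and table are passed as string literals, while the catalog is an identifier.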

Steps To Reproduce

  1. Have a struct field with a very complex (long) data type in your dataset
  2. Run any operation in dbt-databricks that looks up the data type of that field via get_columns_in_relation()
  3. Observe that the struct field you created has a truncated data type

Expected Behaviour

  1. Have a struct field with a very complex (long) data type in your dataset
  2. Run any operation in dbt-databricks that looks up the data type of that field via get_columns_in_relation()
  3. Observe that the struct field you created does not have a truncated data type

System information

Core:
  - installed: 1.8.5
  - latest:    1.8.5 - Up to date!

Plugins:
  - databricks: 1.8.5 - Up to date!
  - spark:      1.8.0 - Up to date!

Additional context

  • This also causes problems when using dbt codegen, as it uses the adapter to look up the data types of columns
ShaneMazur added the bug label Aug 27, 2024
@benc-db
Collaborator

benc-db commented Sep 13, 2024

Thanks for reporting, will investigate

@benc-db
Collaborator

benc-db commented Sep 18, 2024

I need to reopen this issue. I tried to implement the suggested fix and discovered that there is often sync latency between UC and Delta that causes information_schema to be out of date. I can fix that by forcing a sync, but only if the table is Delta. The fix is more complicated than what I originally implemented, so I'm reopening this issue.
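
To make the constraint in the comment above concrete, here is a rough sketch of the branching it implies. It is not the maintainer's implementation. The function name is hypothetical, the use of REPAIR TABLE ... SYNC METADATA as the way to force a sync is an assumption, and falling back to DESCRIBE EXTENDED for non-Delta tables is also an assumption that the comment does not state.

```python
from typing import List


def plan_column_lookup(relation: str, is_delta: bool) -> List[str]:
    """Return the ordered steps for looking up column types.

    Assumptions (not confirmed by the issue):
    - REPAIR TABLE ... SYNC METADATA is the statement that forces the sync.
    - Non-Delta tables fall back to DESCRIBE EXTENDED, which may truncate.
    """
    if is_delta:
        return [
            f"REPAIR TABLE {relation} SYNC METADATA",
            "query information_schema.columns",
        ]
    return [f"DESCRIBE EXTENDED {relation}"]


if __name__ == "__main__":
    # The relation name is made up for illustration.
    for step in plan_column_lookup("main.analytics.events", is_delta=True):
        print(step)
```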
