
Fixing Database tests and Snowflake Dialect - part 3 out of ... #10458

Merged
83 commits merged into develop from wip/radeusgd/snowflake-dialect-3 on Jul 10, 2024

Conversation

@radeusgd (Member) commented on Jul 5, 2024

Pull Request Description

  • Related to Snowflake Dialect #9486
  • Fixes types in literal tables that are used throughout the tests
  • Tries to make testing faster by disabling some edge cases, batching some queries, re-using the main connection, and re-using tables more
  • Implements date/time type mapping and operations for Snowflake
  • Updates type mapping to correctly reflect what Snowflake does
    • Disables warnings for Integer->Decimal coercion, as that is too noisy and is implicitly understood in Snowflake
    • Allows selecting a Decimal column with ..By_Type ..Integer (only in the Snowflake backend), because the Decimal column there is the de-facto replacement for an Integer column.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the Scala, Java, TypeScript, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

@radeusgd added the "CI: No changelog needed" label (do not require a changelog entry for this PR) on Jul 5, 2024
@radeusgd self-assigned this on Jul 5, 2024

public class SnowflakeJDBCUtils {
  private static final DateTimeFormatter dateTimeWithOffsetFormatter =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS XXX");
@radeusgd (Member, Author) commented:

TODO: this is probably why I have the 'remove T' logic later. I should probably update this to:

Suggested change
DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS XXX");
DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS XX");

or something similar.
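
For reference, a minimal standalone Java sketch (not part of this PR) showing how the two patterns differ when formatting the same timestamp - the current pattern emits an ISO-style 'T' separator and a colon in the offset, while the suggested one uses a space separator and an offset without a colon:

import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Minimal sketch (not from the PR) comparing the two formatter patterns discussed above.
public class FormatterComparison {
    public static void main(String[] args) {
        OffsetDateTime ts =
            OffsetDateTime.of(2024, 7, 5, 12, 34, 56, 123_000_000, ZoneOffset.ofHours(2));

        // Current pattern: 'T' separator, offset rendered as "+02:00" (XXX).
        DateTimeFormatter withT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS XXX");
        // Suggested pattern: space separator, offset rendered as "+0200" (XX).
        DateTimeFormatter withoutT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS XX");

        System.out.println(withT.format(ts));    // 2024-07-05T12:34:56.123000000 +02:00
        System.out.println(withoutT.format(ts)); // 2024-07-05 12:34:56.123000000 +0200
    }
}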

@jdunkerley (Member) left a comment:

Can we check that we read integers in as Long columns?

@@ -408,11 +366,11 @@ add_specs suite_builder setup =
     t = table_builder [["X", [Nothing, 1, 2, 3000]], ["Y", [Nothing, True, False, True]]]

     c1 = t.at "X" . cast Value_Type.Char
-    c1.value_type.is_text . should_be_true
+    c1.value_type . should_be_a (Value_Type.Char ...)
A reviewer (Member) commented:

why is this better than .is_text?

@radeusgd (Member, Author) replied:

Because if the test fails, is_text.should_be_true gives us "expected False to be True", which tells me nothing.

The new approach tells me e.g. "expected ... built with constructor Char but got ... built with constructor Integer" - which is much more informative as to what went wrong.
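
The same principle, sketched in Java with JUnit 5 purely as an analogy (the real tests are Enso, and CharType/IntegerType below are hypothetical stand-ins, not project classes): asserting on a boolean hides the actual value, while asserting on the type reports what was actually there.

import static org.junit.jupiter.api.Assertions.assertInstanceOf;
import static org.junit.jupiter.api.Assertions.assertTrue;

// Hypothetical analogy of the two assertion styles discussed above; not code from the PR.
class AssertionStyleSketch {
    interface ValueType {}
    record CharType(int size) implements ValueType {}
    record IntegerType() implements ValueType {}

    void booleanStyle(ValueType actual) {
        // On failure this reports only "expected: <true> but was: <false>".
        assertTrue(actual instanceof CharType);
    }

    void typeStyle(ValueType actual) {
        // On failure this names the actual type (e.g. IntegerType), which is far more informative.
        assertInstanceOf(CharType.class, actual);
    }
}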

values_vec.zip dps_vec v-> dp-> f v dp use_bankers
Batch_Runner.Value batch f

run self (use_bankers : Boolean) (action : Batch_Builder -> Nothing) =
A reviewer (Contributor) commented:

Suggested change
run self (use_bankers : Boolean) (action : Batch_Builder -> Nothing) =
run self (use_bankers : Boolean) (action : (Number -> Integer -> Check_Instance) -> Nothing) =

What is the advantage of batching the operations like this? How does it make it more efficient?

@radeusgd (Member, Author) replied:

Because Snowflake runs in the cloud, the minimum latency from my computer to the Frankfurt data center is ~15-20ms, and the total per-query overhead in Snowflake is usually about 70ms. Merely executing the computation takes much less time - that's why e.g. the Postgres tests were much faster.

With batching we pay the round-trip/per-query overhead only once per batch, so the overall execution time is lower.
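
As an illustration only, a hypothetical JDBC-style sketch of the idea (the PR's actual Batch_Runner is written in Enso): with roughly 70ms of per-query overhead, running N expressions as separate statements pays that overhead N times, while one combined SELECT pays it once.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch, not the PR's code: contrast N round trips with a single batched query.
class BatchingSketch {
    static void unbatched(Connection conn, String[] expressions) throws SQLException {
        for (String expr : expressions) {               // N queries => N * per-query overhead
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT " + expr)) {
                rs.next();
            }
        }
    }

    static void batched(Connection conn, String[] expressions) throws SQLException {
        String sql = "SELECT " + String.join(", ", expressions); // 1 query => overhead paid once
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            rs.next();
        }
    }
}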

@radeusgd (Member, Author) added:

You made me check the efficiency and ooops - there's basically no significant difference!

Test                                                                          | Without batching | With batching
Can round positive decimals correctly                                         | 1409ms           | 1107ms
Can round negative decimals correctly                                         | 1013ms           | 2261ms
Explicit and implicit 0 decimal places work the same                          | 606ms            | 395ms
Can round zero and small decimals correctly                                   | 509ms            | 1233ms
Can round positive decimals to a specified number of decimal places           | 2011ms           | 3930ms
Can round negative decimals to a specified number of decimal places           | 1808ms           | 3504ms
Can round positive decimals to a specified negative number of decimal places  | 1656ms           | 1368ms
Can round negative decimals to a specified negative number of decimal places  | 1743ms           | 949ms

So the problem is: when I call round on a DB_Column, it checks its value_type, which triggers a small no-results query asking the DB for the type.

Thus, regardless of whether I run the N rounds as separate queries or batch them into a single one, I still get N+1 queries with batching (N type checks plus 1 more complex execute) vs 2*N queries without batching (a type check plus an execute for each operation). Apparently, timing-wise the results come out similar.

I want to fix the batching and see if it gets better.

@radeusgd (Member, Author) added:

So in f9bbbe0 I have changed how we work with SQL_Type_Reference: now, when a statement containing a column of that type is read in, the type is cached into the reference, so a separate query just to resolve the reference is avoided. We only perform a separate types-only query if the type is checked before anything has been queried.
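
A rough Java sketch of the caching idea, for illustration only (the actual change lives in Enso's SQL_Type_Reference, and the names below are hypothetical): the type starts out lazy, but once a result set containing the column is read, its metadata fills the cache so the separate types-only query is skipped.

import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.function.IntSupplier;

// Hypothetical sketch of a lazily cached column type reference; not the PR's actual code.
class CachedTypeReference {
    private Integer cachedJdbcType;            // null until the type is known
    private final IntSupplier typesOnlyQuery;  // runs the separate no-results query

    CachedTypeReference(IntSupplier typesOnlyQuery) {
        this.typesOnlyQuery = typesOnlyQuery;
    }

    // Called whenever a statement containing this column is actually read.
    void cacheFrom(ResultSetMetaData metadata, int columnIndex) throws SQLException {
        if (cachedJdbcType == null) {
            cachedJdbcType = metadata.getColumnType(columnIndex);
        }
    }

    // Falls back to the types-only query only if the type is needed before any read happened.
    int get() {
        if (cachedJdbcType == null) {
            cachedJdbcType = typesOnlyQuery.getAsInt();
        }
        return cachedJdbcType;
    }
}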

With that fix the results are much better:

Test                                                                          | Without batching (old measurement) | With fixed batching
Can round positive decimals correctly                                         | 1409ms                             | 673ms
Can round negative decimals correctly                                         | 1013ms                             | 253ms
Explicit and implicit 0 decimal places work the same                          | 606ms                              | 207ms
Can round zero and small decimals correctly                                   | 509ms                              | 165ms
Can round positive decimals to a specified number of decimal places           | 2011ms                             | 291ms
Can round negative decimals to a specified number of decimal places           | 1808ms                             | 371ms
Can round positive decimals to a specified negative number of decimal places  | 1656ms                             | 419ms
Can round negative decimals to a specified negative number of decimal places  | 1743ms                             | 423ms

Now the speed-up is 2-6x, so the batching is worth it. Together with disabling some edge-case tests, the overall Rounding part is down from ~60s to 4s.

# Conflicts:
#	distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Type_Mapping.enso
#	distribution/lib/Standard/Test/0.0.0-dev/src/Suite.enso
#	test/Table_Tests/src/Common_Table_Operations/Column_Operations_Spec.enso
#	test/Table_Tests/src/Common_Table_Operations/Map_To_Table_Spec.enso
#	test/Table_Tests/src/Common_Table_Operations/Util.enso
@radeusgd added the "CI: Ready to merge" label (this PR is eligible for automatic merge) on Jul 10, 2024
The mergify bot merged commit 48c1784 into develop on Jul 10, 2024
36 checks passed
The mergify bot deleted the wip/radeusgd/snowflake-dialect-3 branch on July 10, 2024 at 13:21