Remove the dependency on SHOW statements. Use info_schema and pg_catalog more. #59

Merged: 1 commit merged into cockroachdb:master on Jul 16, 2018

Conversation

@knz (Contributor) commented Jul 4, 2018:

Needed to fix cockroachdb/cockroach#26983
and to complete cockroachdb/cockroach#27098
Informs cockroachdb/cockroach#26993

Fixes #57.
Fixes #50.

@knz knz requested a review from bdarnell July 4, 2018 15:59
@knz changed the title from "Remove the dependency on SHOW statement. Use pg_catalog more." to "Remove the dependency on SHOW statements. Use pg_catalog more." on Jul 4, 2018
@knz changed the title from "Remove the dependency on SHOW statements. Use pg_catalog more." to "Remove the dependency on SHOW statements. Use info_schema and pg_catalog more." on Jul 4, 2018
@bdarnell (Contributor) left a comment:

Thanks for all the updates!


    def has_table(self, conn, table, schema=None):
        # Upstream implementation needs pg_table_is_visible().
        return any(t == table for t in self.get_table_names(conn, schema=schema))

    # The upstream implementations of the reflection functions below depend on
    # get_table_oid() which needs pg_table_is_visible().

    # correlated subqueries which are not yet supported.
    def get_columns(self, conn, table_name, schema=None, **kw):
        res = []
        # TODO(bdarnell): escape table name
bdarnell (Contributor):

This TODO can go away now.

knz (Author):

Done

        for row in conn.execute('''
            SELECT column_name, data_type, is_nullable::bool, column_default
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s''', (schema or self.default_schema_name, table_name)):
bdarnell (Contributor):

This looks like a backwards-incompatible change in the handling of schemas. Previously we used schema as a catalog, and now it's being used as a schema. Maybe that's necessary for broader compatibility, but it may be disruptive.

knz (Author):

We discussed this offline. The outcome of that discussion is that we'll be assuming users were not using cross-db SQLAlchemy schemas in prod.
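
For readers following along, here is a minimal sketch of how rows from the information_schema.columns query above could be turned into the column dictionaries SQLAlchemy's reflection API expects. The type mapping below is a hypothetical placeholder, not the mapping used in this PR:

import sqlalchemy.types as sqltypes

# Hypothetical mapping from information_schema data_type strings to SQLAlchemy
# types; the real dialect maintains a much larger table.
_type_map = {
    'INT': sqltypes.Integer,
    'STRING': sqltypes.String,
    'BOOL': sqltypes.Boolean,
}

def get_columns(self, conn, table_name, schema=None, **kw):
    res = []
    for row in conn.execute('''
            SELECT column_name, data_type, is_nullable::bool, column_default
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s''',
            (schema or self.default_schema_name, table_name)):
        name, type_str, nullable, default = row
        # Fall back to NullType for type names not in the mapping.
        coltype = _type_map.get(type_str, sqltypes.NullType)()
        res.append(dict(name=name, type=coltype, nullable=nullable, default=default))
    return res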

        for row in conn.execute('''
            SELECT index_name, column_name, (not non_unique::bool) as unique, implicit::bool as implicit
            FROM information_schema.statistics
            WHERE table_schema = %s AND table_name = %s
bdarnell (Contributor):

Is information_schema.statistics the right place to get this? Surprising (but I don't doubt it).

The TODO about escaping can be removed, as can the following lines about compatibility with ancient versions that don't have the implicit column. Since CI passes, this is all compatible going back to 1.0.6, right?

bdarnell (Contributor):

Oh, CI's not even returning results on this PR, but we didn't have it set as required, so GitHub showed the green button anyway. Looks like a build was run but failed due to an inability to load the right Python version :(

knz (Author):

These queries on information_schema (and the others) will not work in CockroachDB 1.0 anyway.

Simplified the code accordingly.
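
For illustration, here is roughly how the statistics rows above could be folded into the per-index dictionaries that SQLAlchemy's get_indexes() is expected to return, skipping implicit columns. This is an approximation, not the exact code from this PR:

from collections import OrderedDict

def get_indexes(self, conn, table_name, schema=None, **kw):
    # Each row describes one column of one index; group the rows by index name.
    indexes = OrderedDict()
    for row in conn.execute('''
            SELECT index_name, column_name, (not non_unique::bool) as unique, implicit::bool as implicit
            FROM information_schema.statistics
            WHERE table_schema = %s AND table_name = %s''',
            (schema or self.default_schema_name, table_name)):
        index_name, column_name, unique, implicit = row
        if implicit:
            # Skip columns CockroachDB adds implicitly (e.g. primary key columns).
            continue
        entry = indexes.setdefault(index_name, dict(name=index_name, column_names=[], unique=False))
        entry['column_names'].append(column_name)
        entry['unique'] = unique
    return list(indexes.values())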

    def get_foreign_keys(self, conn, table_name, schema=None, **kw):
        fkeys = []
        # This method is the same as the one in SQLAlchemy's pg dialect, with
        # a tweak to the FK regular expressions to tolerate whitespace between
bdarnell (Contributor):

It's unfortunate we have to duplicate so much to change that regex, but I guess there's no way around it now.
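
Purely for illustration (this is not the actual regex from SQLAlchemy or from this PR), tolerating optional whitespace between the referenced table and its column list usually comes down to adding \s* at the right spot:

import re

# Hypothetical, simplified FK pattern; the real one in the pg dialect is more involved.
FK_RE = re.compile(
    r'FOREIGN KEY \((?P<cols>.*?)\) REFERENCES (?P<reftable>[^\s(]+)\s*\((?P<refcols>.*?)\)')

# Matches both with and without a space before the referenced column list.
assert FK_RE.search('FOREIGN KEY (a, b) REFERENCES other_table(a, b)')
assert FK_RE.search('FOREIGN KEY (a, b) REFERENCES other_table (a, b)')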


class CockroachCompiler(PGCompiler_psycopg2):
# Ideally we'd like to inherit most of the code from
# PGIdentifierPreparer however SQLAlchemy does not export it. So we
bdarnell (Contributor):

It looks like it's just as exported as PGDialect is (which we depend on in dialect.py). Can't we import it from sqlalchemy.dialects.postgresql.base.PGIdentifierPreparer?

knz (Author):

Oh nice, that works. Done.
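
The resolution of this thread amounts to importing the preparer from the postgresql dialect module and subclassing it. A minimal sketch (the class name and the extra keywords shown are stand-ins, not taken from the PR):

from sqlalchemy.dialects.postgresql.base import PGIdentifierPreparer

# Stand-in for the CockroachDB-specific keywords extracted from sql.y.
CRDB_EXTRA_RESERVED_WORDS = {'index', 'nothing'}

class CockroachIdentifierPreparer(PGIdentifierPreparer):
    # Identifiers that collide with any of these words will be quoted.
    reserved_words = PGIdentifierPreparer.reserved_words | CRDB_EXTRA_RESERVED_WORDS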


# This is extracted from CockroachDB's `sql.y`. Add keywords here if *NEW* reserved keywords
bdarnell (Contributor):

Alternatively, we could future-proof this by querying the server's list of reserved words in dialect.initialize (this is where some other version check queries are done).

knz (Author):

That won't do -- we need to keep all the "old" keywords here too. If we're to keep a list of every keyword that has ever been reserved in the past, we have to start with the entire current list.

bdarnell (Contributor):

Why do you need all the old keywords? The keyword list only controls quoting, and you only need to quote things that are considered keywords by the server you're talking to. This package needs to work with old servers, so we couldn't have a static list of the current version's keywords, but I don't see why loading the list dynamically wouldn't work.

There is a slight problem with doing this in dialect.initialize, in that this wouldn't refresh the keyword list if the server got rolling-upgraded out from under you, but that's an issue with the static list too. Ideally you'd refresh this on every connection, or maybe cache it based on a query of the server's version.
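
A hedged sketch of what that could look like, assuming the server exposes PostgreSQL's pg_get_keywords() function (whether CockroachDB supports it, and in which versions, is an assumption here, not something established in this thread):

    def initialize(self, connection):
        super(CockroachDBDialect, self).initialize(connection)
        try:
            # catcode 'R' marks fully reserved keywords in PostgreSQL.
            rows = connection.execute(
                "SELECT word FROM pg_catalog.pg_get_keywords() WHERE catcode = 'R'")
            self.identifier_preparer.reserved_words = {row[0] for row in rows}
        except Exception:
            # Fall back to the static list baked into the dialect.
            pass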

knz (Author):

Ok filed as #62.


from sqlalchemy.orm import sessionmaker, load_only
from sqlalchemy import func
from sqlalchemy import distinct

class AcrossSchemaTest(fixtures.TestBase):
    TEST_DATABASE = 'test_sqlalchemy_across_schema'
bdarnell (Contributor):

This constant is no longer used. As a result, this test isn't really testing what it used to. What's important here is references across two user-defined collections of tables (whether you call them catalogs, schemas, or databases), not references between public and information_schema.

knz (Author):

See my other comment about cross-db schemas.

@knz (Author) left a comment:

Thanks for your comments.

As we discussed separately, I will run the tests against a 2.0 server and see what happens (I have not done so yet).

We'll also refrain from announcing backward compat with cross-db queries in 1.0.


@knz (Author) commented Jul 12, 2018:

It took me a while but I succeeded: the code is now simultaneously compatible with CockroachDB 1.1, 2.0 and 2.1. (I tested.)

PTAL.

I am bumping the version because of the significant change in the versioning approach.

CHANGES.md Outdated
@@ -1,3 +1,10 @@
# Version 0.2.0

Released July 13, 2018
bdarnell (Contributor):

Don't forget to update this date when this merges, and ping Justin or me to ship a release (or I can give you permissions if you want to share ownership of this package).

knz (Author):

Yes I will update and ping.

        self._has_native_jsonb = True
        sversion = connection.scalar("select version()")
        self._is_v2plus = " v2" in sversion
bdarnell (Contributor):

Be consistent about the inclusion of the leading space and trailing dot here and on the next line.

knz (Author):

Done.

        sversion = connection.scalar("select version()")
        self._is_v2plus = " v2" in sversion
        self._is_v21plus = self._is_v2plus and ("v2.0." not in sversion)
        self._has_native_json = self._is_v2plus
bdarnell (Contributor):

Shouldn't _has_native_jsonb be set the same way?

knz (Author):

I suppose so. Done.
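
Putting the two review points together, the version gating under discussion amounts to something like this: a sketch of one consistent form (the exact substrings the PR settled on are not shown in this excerpt), with jsonb gated the same way as json:

    def initialize(self, connection):
        super(CockroachDBDialect, self).initialize(connection)
        # Derive feature flags from the server version string, keeping the
        # leading-space/trailing-dot convention consistent across checks.
        sversion = connection.scalar("select version()")
        self._is_v2plus = " v2." in sversion
        self._is_v21plus = self._is_v2plus and (" v2.0." not in sversion)
        self._has_native_json = self._is_v2plus
        self._has_native_jsonb = self._is_v2plus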

        self._is_v21plus = self._is_v2plus and ("v2.0." not in sversion)
        self._has_native_json = self._is_v2plus

    def is_v2plus(self):
bdarnell (Contributor):

This doesn't appear to be used (and I don't think it should be; it's fine to just access the attribute directly)

knz (Author):

Yes, thanks. Removed.

        return fkeys

    def get_pk_constraint(self, conn, table_name, schema=None, **kw):
        if self._is_v21plus:
            return PGDialect.get_pk_constraint(self, conn, table_name, schema, **kw)
bdarnell (Contributor):

Nit: the preferred way to do this in Python is `return super(CockroachDBDialect, self).get_pk_constraint(conn, table_name, schema, **kw)` (although it doesn't really make a difference unless multiple inheritance is used).

knz (Author):

Yes, thanks; I somehow knew that and still did it wrong. Thanks for pointing it out.


class AcrossSchemaTest(fixtures.TestBase):
    TEST_DATABASE = 'test_sqlalchemy_across_schema'
bdarnell (Contributor):

As we discussed, I'm fine with some backwards-incompatibility here (since it's already kind of broken). But we should have some way to make cross-database references going forward (it may not be common, but remember that this capability did originate with a user request in #30). I think the way to do this in SQLAlchemy is to use a dotted schema name that includes the database, e.g. `schema=self.TEST_DATABASE + '.public'`, which means we should recognize this form and adjust our information_schema queries accordingly. But the base postgres dialect doesn't do this, so I'm not certain about it (maybe cross-database references are rare in postgres because people use schemas for this purpose instead; our lack of user-created schemas makes cross-database support more important).

knz (Author):

Yes, as you found out, this does not work because the base dialect issues more queries on pg_catalog tables, using the schema string as a constraint on various pg_catalog columns.

I would like to move this PR forward nevertheless: the feature was broken in 2.0 anyway, and this PR is about ensuring that the code at least runs with 2.0 and 2.1.

I would really prefer to solve this problem by addressing cockroachdb/cockroach#26443

If we don't go in that direction we'll have no choice but to implement the entire sqlalchemy dialect from scratch.
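
For context on the dotted-schema idea above, recognizing it would roughly mean splitting the schema argument before building the information_schema query. A hypothetical helper, not something this PR implements:

def resolve_schema(schema, default_schema_name='public'):
    # Hypothetical: interpret 'db.schema' as a cross-database reference and
    # return (database_or_None, schema_name).
    if schema and '.' in schema:
        db, sch = schema.split('.', 1)
        return db, sch
    return None, schema or default_schema_name

# e.g. resolve_schema('test_sqlalchemy_across_schema.public')
# -> ('test_sqlalchemy_across_schema', 'public')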

This change also ensures the adapter is compatible across v1.1, v2.0 and v2.1.
@knz (Author) commented Jul 16, 2018:

I bumped the date and informed Justin. Merging. Thanks for all the feedback!

@knz knz merged commit 57dea9a into cockroachdb:master Jul 16, 2018
@knz knz deleted the 20180704-bump branch July 17, 2018 07:39