sql: support JSON/Protobuf #2969

Closed
petermattis opened this issue Oct 30, 2015 · 28 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community

Comments

@petermattis
Collaborator

Support manipulation of JSON/Protobuf data. This might mean adding JSON/Protobuf column types. Or it might mean adding builtin functions to manipulate TEXT columns that are JSON/Protobuf encoded. Need to do a little exploration and design work first to figure out which is the right path to take.
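
To make the two options concrete, a rough sketch: the JSONB column type on the first side is hypothetical at this point, while the builtin on the second side is the existing Postgres spelling (table and column names are invented for illustration):

-- Option 1: a dedicated column type
CREATE TABLE events (id INT PRIMARY KEY, payload JSONB);
SELECT payload->>'user' FROM events;

-- Option 2: builtin functions over TEXT columns holding encoded JSON
CREATE TABLE events_text (id INT PRIMARY KEY, payload_text TEXT);
SELECT json_extract_path_text(payload_text::json, 'user', 'id') FROM events_text;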

@petermattis petermattis added this to the 1.0 milestone Oct 30, 2015
@jess-edwards jess-edwards mentioned this issue Oct 30, 2015
@Slach

Slach commented Dec 21, 2015

This is a very useful feature for semi-structured data.

@MehSha

MehSha commented Feb 10, 2016

The problem is how we are going to select/mutate these columns. If they are treated as text, then what is the value? Couldn't clients just unmarshal that text into whatever structure they have?

How do we filter on the JSON document's content (select * from foo where JSONCOLUMN.Person.Id = 1?)? What about updating part of a document? How do we run aggregate queries on values?
What if an attribute is itself an array? How do we append/prepend/delete items, and how do we query on those array items' attributes?
In my experience, only MongoDB has answered these and many more questions properly (though some areas, like aggregation, still suffer from an alien syntax); others like Couchbase have tried, and the result is simply incomplete.
If you are going to add an understanding of JSON/Protobuf to the parser/execution engine, it vastly increases their scope (code, tests, maintenance, etc.).
Feels like a misdirection...
My 2 cents.
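
For reference, Postgres's jsonb (which the eventual design here drew on) answers most of these questions with operators and builtins rather than new parser syntax. A sketch, assuming a jsonb column jsoncolumn on a table foo:

-- filter on document content
SELECT * FROM foo WHERE jsoncolumn->'Person'->>'Id' = '1';

-- update part of a document in place
UPDATE foo SET jsoncolumn = jsonb_set(jsoncolumn, '{Person,Name}', '"Bob"');

-- append to a nested array, then query its elements
UPDATE foo SET jsoncolumn =
  jsonb_set(jsoncolumn, '{Person,Tags}', (jsoncolumn#>'{Person,Tags}') || '"new"');
SELECT jsonb_array_elements(jsoncolumn#>'{Person,Tags}') FROM foo;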

@petermattis petermattis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed SQL labels Feb 13, 2016
@tbg
Member

tbg commented Jun 3, 2016

See also #3895 (which had a couple of thumbs up, much discussion and in particular linked to https://github.com/mgutz/dat, the Go Postgres Data Access Toolkit).

@Linicks
Contributor

Linicks commented Jul 13, 2016

All,
I have a use case where I would like to mix structured and unstructured data. In some instances, I would need one or more columns to store random data from various sensor groups, etc. Once implemented, will there be a limit on the number of these types of columns?

Thanks!
--Nick

@petermattis
Collaborator Author

@Linicks We haven't even sketched out how a JSON/Protobuf column type would work, so it is hard to say with any certainty what the limits would be. If I were to guess, though, there wouldn't be any specific limit on the number of JSON/Protobuf columns within a table, but overall row size limits (for which we don't have any hard setting at the moment) would still apply.

@kulshekhar

It's great to see that support for a JSON datatype is on the 1.0 roadmap. Would we be able to index columns of this datatype in a way that allows efficient querying of object paths? In Postgres, for example, columns of type jsonb can be queried just as efficiently as other traditional types if they are indexed.
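
For context, the Postgres pattern referenced here is a GIN or expression index over jsonb; a minimal sketch in Postgres syntax (table docs and column data are invented):

-- GIN index: accelerates containment queries over the whole document
CREATE INDEX docs_data_idx ON docs USING GIN (data);
SELECT * FROM docs WHERE data @> '{"kind": "sensor"}';

-- expression (btree) index: accelerates equality/range queries on one path
CREATE INDEX docs_kind_idx ON docs ((data->>'kind'));
SELECT * FROM docs WHERE data->>'kind' = 'sensor';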

@petermattis
Collaborator Author

@kulshekhar We haven't sketched out the design yet, so it's difficult to answer your question with any certainty. I would guess the answer is yes, and we'll definitely look at the Postgres jsonb column type and implementation for inspiration.

@shaxbee

shaxbee commented Aug 16, 2016

@petermattis we've got a need for jsonb column types and functions in a project I'm working on. We would love to contribute code once a spec is agreed on.

@lwansbrough

lwansbrough commented Mar 30, 2017

+1 for Protocol Buffers support. JSON is of course very valid too; however, we have a specific need for ProtoBuf. The challenge for JSON is choosing the correct data type for a given field, without any prior knowledge of the field, in order to create an index on that field. (I believe Postgres' jsonb type uses the first record as a schema, IIRC.) ProtoBuf provides this information, so theoretically it should be easier to allow the creation of indices on specific fields of the ProtoBuf data.

Of course, in order to create a Protocol Buffer field in Cockroach that is both indexable and queryable, CockroachDB needs to be aware of the schema. You'd need to be able to assign a ProtoBuf schema to a field, and also to update that schema (ideally at runtime, so that no downtime is required to upgrade a Protocol Buffer field's schema).

To my knowledge, CockroachDB would be the only database offering such a feature, but it would be hugely desirable to anyone looking to use ProtoBuf in a database.

It might be helpful to take a look at this, which I just found: https://github.com/google/mysql-protobuf/wiki Mind you, I find it rather strange that no demonstration is shown of inserting bytes into the protobuf field.

@sergionalin

sergionalin commented Jun 29, 2017

On JSON support: how about borrowing some ideas from MySQL?
I'm talking about saving the document in an indexed format (see links below), which would slow down writes a bit but would allow for faster selects, faster indexing and, what's even better, smart and fast partial updates (a feature that very few databases, even NoSQL ones, offer).

Overview:
http://mysqlserverteam.com/json-labs-release-native-json-data-type-and-binary-format/
Storage format specs:
https://dev.mysql.com/worklog/task/?id=8132

@NuSkooler

I would really like to see functionality more akin to Postgres's JSON(B) support. For example:
https://www.postgresql.org/docs/9.6/static/functions-json.html

The ability to treat JSON as first class in queries, indexing, etc. is extremely valuable.
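
A taste of that first-class treatment, using two functions from the linked page (table events and jsonb column payload are invented for illustration):

-- explode a document into key/value rows
SELECT key, value FROM events, jsonb_each(payload);

-- explode a nested array into one row per element
SELECT jsonb_array_elements(payload->'items') FROM events;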

@dianasaur323
Contributor

Thanks for your feedback. We are actively working on scoping out this feature, so please provide as much detail as you would like in terms of use case / requirements. cc @nstewart

@xarg

xarg commented Aug 16, 2017

We're avid users of JSONB on Postgres, with SQLAlchemy as the library we use to connect to it. We're exploring CockroachDB. It's great, but JSONB support is the one big missing thing that prevents us from migrating.

The good things in PG:

  • The ability to perform complex JSONB queries via operators ->>, @> and more.
  • The ability to index (GIN/btree) on an expression (super useful on huge JSONB docs).

The not so good:

  • The selector/operator language is rather weird and unnatural. Something closer to the actual JS way of selecting an element in JSON would be much better; you wouldn't have to think twice about it.

For example, if you have an object such as:

{
  "animals": {
    "cats": [
      {
        "id": 1,
        "name": "tom"
      },
      {
        "id": 2,
        "name": "mot"
      }
    ],
    "dogs": [
      {
        "id": 2,
        "name": "dog"
      }
    ]
  }
}

In PG, if you want to select the animals key you do something like json_field->>'animals'. However, if you want to select a path you have a totally different syntax: json_field#>'{animals,cats}'. I'm sure there are important reasons why it's like this (I don't know them), but as an end user I would prefer something more natural, like:

json_field.animals - the db would figure out that json_field is a JSONB and adapt the selection accordingly - is this even possible?
json_field.animals.cats - same idea.
json_field.animals.cats[0]."id" - again, like JS - super easy to remember.

In PG, the operators for contains, exists, and so on are @>, ?, etc. I can never remember them, and even when I see them a few days after writing a query I still can't tell what they do, so I always have to look them up.
A better approach IMHO is to use functions like contains(collection, item), exists(path), concat(field1, field2): it would be immediately clear what they do, without having to look them up. I don't really mind if they're prefixed with something like 'json_' if that's a concern.

Hope my input helps.
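
For readers who haven't met the operators being discussed, here is roughly how the selections above read in today's Postgres spelling (jsonb column json_field; table name t is invented):

-- first cat's name, as a JS-ish chain of -> steps
SELECT json_field->'animals'->'cats'->0->>'name' FROM t;   -- tom

-- containment: does the document contain this sub-document?
SELECT * FROM t WHERE json_field @> '{"animals": {"dogs": [{"id": 2}]}}';

-- path-as-array selector, extracted as text
SELECT * FROM t WHERE json_field#>>'{animals,cats,0,name}' = 'tom';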

@manigandham

JSONPath is becoming the standard for navigating JSON documents (similar to XPath for XML). Postgres has commits toward this that should be released in v11. This would be the best option for CockroachDB to implement as well, so that there's a unified access language.

http://gmile.me/jsonpath/

@xarg

xarg commented Aug 16, 2017

+1 for JSONPath! For simpler queries it would still be useful to have the usual JSON syntax.

I would love to have something like json_path(json_field, '$.animals.cats.[*].name') to select all the cats' names! It would be a dream come true.
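
For comparison, the SQL/JSON path support Postgres ultimately shipped (in v12) spells this almost identically; note the standard path is cats[*] rather than cats.[*] (table name t is invented):

SELECT jsonb_path_query(json_field, '$.animals.cats[*].name') FROM t;
-- "tom"
-- "mot"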

@lwansbrough

lwansbrough commented Aug 16, 2017

It may be a good idea to separate these features, as I would expect the implementations for JSON and Protobuf to look (perhaps dramatically) different. From a Protobuf user's perspective, I think we'd like to see a way to essentially create and update tables by passing a Protobuf schema definition to a create/alter table query. Whether or not the data is stored on the backend as a protobuf is probably irrelevant, so long as it's as compact as possible (we wouldn't be using protobuf if we weren't totally paranoid about bandwidth/storage). I wouldn't want to see overhead introduced because of a choice to use a format designed to reduce overhead :)

I say "dramatically different" because I suspect the way CDB would approach creating the table would be too. Since JSON is all about being dynamic, you'd want to allow flexible schemas in a JSON table/field, whereas with protocol buffers you may want the entire table to be definable as a protocol buffer, not just one field. Although the difference between a protocol-buffer field and a protocol-buffer table is probably irrelevant, as long as you can provide a schema update.

As for querying, I agree with the comments above that querying should be easy (I see no reason why JSONPath couldn't be used for PB) and fields need to be indexable, especially at arbitrary depths.
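
To make the schema-driven idea concrete, a purely hypothetical sketch; none of this syntax exists in CockroachDB, it only illustrates the shape of the proposal:

-- hypothetical: derive a table definition from a .proto message
CREATE TABLE events FROM PROTOBUF SCHEMA '
  message Event {
    int64  id   = 1;
    string name = 2;
  }
';

-- hypothetical: evolve the schema at runtime, without downtime
ALTER TABLE events UPDATE PROTOBUF SCHEMA '<new .proto definition>';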

@manigandham

@lwansbrough

What is the need for Protobuf in a relational database that already has data types and table schemas? This seems more like wanting a translation from Protobuf to SQL, which can be done with an ORM, especially if you don't care about the backend storage itself.

@lwansbrough

@manigandham I mean you can make the exact same argument about JSON. The idea is that given a schema defined externally to the DB (say, inside an application), it becomes very easy to delegate management of the database without having to provide SQL access or build an interfacing layer.

If the data can be stored as a protocol buffer while we're at it, while maintaining the same feature set, then I say by all means; but I suspect the storage impact of using PB in that case vs. a custom solution is probably very minimal.

@lwansbrough

@manigandham I can also tell you firsthand that trying to build a conversion layer between a flat schema (SQL) and schemas with depth is tedious, to say the least.

@nstewart nstewart modified the milestones: 1.2, Later Sep 7, 2017
@vtumuluri

Postgres's JSONB support together with GIN indexing is wonderful for a lot of use cases involving a mix of structured and unstructured data. However, as others have mentioned, the JSON query semantics are slightly complicated and not trivial to reason about. There is a PG extension called JSQuery that tries to address many of the query limitations. I would love it if CockroachDB could support the query/index capabilities of JSQuery, maybe behind a JSONPath interface; a taste of JSQuery's style is sketched below.
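
A rough taste of JSQuery's style, per the extension's README (treat the exact syntax as illustrative; # matches any element of an array, and table/column names are invented):

SELECT * FROM docs WHERE data @@ 'animals.cats.#.name = "tom"'::jsquery;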

@awoods187
Contributor

Team, I'd like to introduce myself. My name is Andy and I am a PM here at CRDB. I'm happy to share that we have heard you loud and clear and are working to support JSONB in 1.2. To that end, we would love to solicit your comments on our current RFC scope proposal found here. In particular, we would love to hear more about how you plan to use JSONB and whether we are adequately thinking about your use case. How do you want to use JSONB? Please feel free to share in the comments below or directly at the RFC.

@xarg

xarg commented Oct 4, 2017

@awoods187 great news! What's the schedule for 1.2?

@awoods187
Contributor

@xarg 1.2 is planned for release in April 2018

@igavva

igavva commented Dec 4, 2017

I look forward to this feature!

@justinj
Contributor

justinj commented Jan 24, 2018

JSONB support will be in the 2.0 release in April, so I'm going to close this issue!

@justinj justinj closed this as completed Jan 24, 2018
@mofirouz
Contributor

@justinj Do you have any links to documentation on JSONB support that you can share publicly here?

@justinj
Contributor

justinj commented Feb 21, 2018

Hi @mofirouz, we don't have any documentation for JSONB yet; it will be ready by the time 2.0 releases in April. That said, it's very similar to Postgres' JSONB feature, and most of their docs should be applicable to its use in CockroachDB. You'll need to use a recent build of CockroachDB to have access to all of the JSONB features, though.

@igavva

igavva commented Feb 22, 2018

Hi. It works! CockroachDB:

root@:26257/> SELECT * FROM bank.json;
+----+-----------------------------------+
| id |               data                |
+----+-----------------------------------+
|  0 | {"bar":2,"foo":1}                 |
|  1 | {"bar":"x","baz":"hello","foo":2} |
|  2 | {"foo":-1,"goo":"hi"}             |
|  3 | {"bup":[1,2,3],"foo":-2}          |
+----+-----------------------------------+

root@:26257/> SELECT * FROM bank.json WHERE data ->> 'foo' > '0' ORDER BY data->>'foo' ASC LIMIT 1;
+----+-------------------+
| id |       data        |
+----+-------------------+
|  0 | {"bar":2,"foo":1} |
+----+-------------------+

root@:26257/> SELECT id, data->>'foo' AS count FROM bank.json WHERE data ->> 'foo' > '0' ORDER BY data->>'foo' ASC LIMIT 100;
+----+-------+
| id | count |
+----+-------+
|  0 |     1 |
|  1 |     2 |
+----+-------+
