sql: support JSON/Protobuf #2969

Closed
petermattis opened this issue Oct 30, 2015 · 28 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community

Comments

@petermattis
Collaborator

Support manipulation of JSON/Protobuf data. This might mean adding JSON/Protobuf column types. Or it might mean adding builtin functions to manipulate TEXT columns that are JSON/Protobuf encoded. Need to do a little exploration and design work first to figure out which is the right path to take.
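
To make the two options concrete, a rough sketch: the JSONB column type on the first side is hypothetical at this point, while the builtin on the second side is the existing Postgres spelling (table and column names are invented for illustration):

-- Option 1: a dedicated column type
CREATE TABLE events (id INT PRIMARY KEY, payload JSONB);
SELECT payload->>'user' FROM events;

-- Option 2: builtin functions over TEXT columns holding encoded JSON
CREATE TABLE events_text (id INT PRIMARY KEY, payload_text TEXT);
SELECT json_extract_path_text(payload_text::json, 'user', 'id') FROM events_text;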

@petermattis petermattis added this to the 1.0 milestone Oct 30, 2015
@jess-edwards jess-edwards mentioned this issue Oct 30, 2015
@Slach

Slach commented Dec 21, 2015

This is a very useful feature for semi-structured data.

@MehSha

MehSha commented Feb 10, 2016

The problem is how we are going to select/mutate these columns. If they are treated as text, then what is the value? Couldn't clients just unmarshal that text into whatever structure they have?

How do we filter on the JSON document's content (select * from foo where JSONCOLUMN.Person.Id = 1?)? What about updating part of a document? How do we run aggregate queries on values?
What if an attribute is itself an array? How do we append/prepend/delete items, and how do we query on those array items' attributes?
In my experience, only MongoDB has answered these and many more questions properly (though some areas, like aggregation, still suffer from an alien syntax); others like Couchbase have tried, and the result is simply incomplete.
If you are going to add an understanding of JSON/Protobuf to the parser/execution engine, it vastly increases their scope (code, tests, maintenance, etc.).
Feels like a misdirection...
My 2 cents.
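
For reference, Postgres's jsonb (which the eventual design here drew on) answers most of these questions with operators and builtins rather than new parser syntax. A sketch, assuming a jsonb column jsoncolumn on a table foo:

-- filter on document content
SELECT * FROM foo WHERE jsoncolumn->'Person'->>'Id' = '1';

-- update part of a document in place
UPDATE foo SET jsoncolumn = jsonb_set(jsoncolumn, '{Person,Name}', '"Bob"');

-- append to a nested array, then query its elements
UPDATE foo SET jsoncolumn =
  jsonb_set(jsoncolumn, '{Person,Tags}', (jsoncolumn#>'{Person,Tags}') || '"new"');
SELECT jsonb_array_elements(jsoncolumn#>'{Person,Tags}') FROM foo;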

@petermattis petermattis added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed SQL labels Feb 13, 2016
@tbg
Member

tbg commented Jun 3, 2016

See also #3895 (which had a couple of thumbs up, much discussion and in particular linked to https://github.com/mgutz/dat, the Go Postgres Data Access Toolkit).

@Linicks
Contributor

Linicks commented Jul 13, 2016

All,
I have a use case where I would like to mix structured and unstructured data. In some instances, I would need one or more columns to store random data from various sensor groups, etc. Once implemented, will there be a limit on the number of these types of columns?

Thanks!
--Nick

@petermattis
Collaborator Author

@Linicks We haven't even sketched out how a JSON/Protobuf column type would work, so it is hard to say with any certainty what the limits would be. If I were to guess, though, there wouldn't be any specific limit on the number of JSON/Protobuf columns within a table, but overall row size limits (for which we don't have any hard setting at the moment) would still apply.

@kulshekhar

It's great to see that support for a JSON datatype is on the 1.0 roadmap. Would we be able to index columns of this datatype in a way that allows efficient querying of object paths? In Postgres, for example, columns of type jsonb can be queried just as efficiently as other traditional types if they are indexed.
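
For context, the Postgres pattern referenced here is a GIN or expression index over jsonb; a minimal sketch in Postgres syntax (table docs and column data are invented):

-- GIN index: accelerates containment queries over the whole document
CREATE INDEX docs_data_idx ON docs USING GIN (data);
SELECT * FROM docs WHERE data @> '{"kind": "sensor"}';

-- expression (btree) index: accelerates equality/range queries on one path
CREATE INDEX docs_kind_idx ON docs ((data->>'kind'));
SELECT * FROM docs WHERE data->>'kind' = 'sensor';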

@petermattis
Collaborator Author

@kulshekhar We haven't sketched out the design yet, so it's difficult to answer your question with any certainty. I would guess the answer is yes, and we'll definitely look at the Postgres jsonb column type and implementation for inspiration.

@shaxbee

shaxbee commented Aug 16, 2016

@petermattis we've got a need for jsonb column types and functions in a project I'm working on. We would love to contribute code once a spec is agreed on.

@lwansbrough

lwansbrough commented Mar 30, 2017

+1 for Protocol Buffers support. JSON is of course very valid too; however, we have a specific need for ProtoBuf. The challenge for JSON is choosing the correct data type for a given field, without any prior knowledge of the field, in order to create an index on that field. (I believe Postgres' jsonb type uses the first record as a schema, IIRC.) ProtoBuf provides this information, so theoretically it should be easier to allow the creation of indices on specific fields of the ProtoBuf data.

Of course, in order to create a Protocol Buffer field in Cockroach that is both indexable and queryable, CockroachDB needs to be aware of the schema. You'd need to be able to assign a ProtoBuf schema to a field, and also to update that schema (ideally at runtime, so that no downtime is required to upgrade a Protocol Buffer field's schema).

To my knowledge, CockroachDB would be the only database offering such a feature, but it would be hugely desirable to anyone looking to use ProtoBuf in a database.

It might be helpful to take a look at this, which I just found: https://github.com/google/mysql-protobuf/wiki Mind you, I find it rather strange that no demonstration is shown of inserting bytes into the protobuf field.

@sergionalin

sergionalin commented Jun 29, 2017

On JSON support: how about borrowing some ideas from MySQL?
I'm talking about saving the document in an indexed format (see links below), which would slow down writes a bit but would allow for faster selects, faster indexing and, what's even better, smart and fast partial updates (a feature that very few databases, even NoSQL ones, offer).

Overview:
http://mysqlserverteam.com/json-labs-release-native-json-data-type-and-binary-format/
Storage format specs:
https://dev.mysql.com/worklog/task/?id=8132

@NuSkooler

I would really like to see functionality more akin to Postgres's JSON(B) support. For example:
https://www.postgresql.org/docs/9.6/static/functions-json.html

The ability to treat JSON as first class in queries, indexing, etc. is extremely valuable.
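
A taste of that first-class treatment, using two functions from the linked page (table events and jsonb column payload are invented for illustration):

-- explode a document into key/value rows
SELECT key, value FROM events, jsonb_each(payload);

-- explode a nested array into one row per element
SELECT jsonb_array_elements(payload->'items') FROM events;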

@dianasaur323
Contributor

Thanks for your feedback. We are actively working on scoping out this feature, so please provide as much detail as you would like in terms of use case / requirements. cc @nstewart

@xarg

xarg commented Aug 16, 2017

We're avid users of JSONB on Postgres, with SQLAlchemy as the library we use to connect to it. We're exploring CockroachDB. It's great, but JSONB support is the one big missing thing that prevents us from migrating.

The good things in PG:

  • The ability to perform complex JSONB queries via operators ->>, @> and more.
  • The ability to index (GIN/btree) on an expression (super useful on huge JSONB docs).

The not so good:

  • The selector/operator language is rather weird and unnatural. Something closer to the actual JS way of selecting an element in JSON would be much better; you wouldn't have to think twice about it.

For example, if you have an object such as:

{
  "animals": {
    "cats": [
      {
        "id": 1,
        "name": "tom"
      },
      {
        "id": 2,
        "name": "mot"
      }
    ],
    "dogs": [
      {
        "id": 2,
        "name": "dog"
      }
    ]
  }
}

In PG, if you want to select the animals key you do something like json_field->>'animals'. However, if you want to select a path you have a totally different syntax: json_field#>'{animals,cats}'. I'm sure there are important reasons why it's like this (I don't know them), but as an end user I would prefer something more natural, like:

json_field.animals - the db would figure out that json_field is a JSONB and adapt the selection accordingly - is this even possible?
json_field.animals.cats - same idea.
json_field.animals.cats[0]."id" - again, like JS - super easy to remember.

In PG, the operators for contains, exists, and so on are @>, ?, etc. I can never remember them, and even when I see them a few days after writing a query I still can't tell what they do, so I always have to look them up.
A better approach IMHO is to use functions like contains(collection, item), exists(path), concat(field1, field2): it would be immediately clear what they do, without having to look them up. I don't really mind if they're prefixed with something like 'json_' if that's a concern.

Hope my input helps.
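
For readers who haven't met the operators being discussed, here is roughly how the selections above read in today's Postgres spelling (jsonb column json_field; table name t is invented):

-- first cat's name, as a JS-ish chain of -> steps
SELECT json_field->'animals'->'cats'->0->>'name' FROM t;   -- tom

-- containment: does the document contain this sub-document?
SELECT * FROM t WHERE json_field @> '{"animals": {"dogs": [{"id": 2}]}}';

-- path-as-array selector, extracted as text
SELECT * FROM t WHERE json_field#>>'{animals,cats,0,name}' = 'tom';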

@manigandham

JSONPath is becoming the standard for navigating JSON documents (similar to XPath for XML). Postgres has commits toward this that should be released in v11. This would be the best option for CockroachDB to implement as well, so that there's a unified access language.

http://gmile.me/jsonpath/

@xarg

xarg commented Aug 16, 2017

+1 for JSONPath! For simpler queries it would still be useful to have the usual JSON syntax.

I would love to have something like json_path(json_field, '$.animals.cats.[*].name') to select all the cats' names! It would be a dream come true.
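
For comparison, the SQL/JSON path support Postgres ultimately shipped (in v12) spells this almost identically; note the standard path is cats[*] rather than cats.[*] (table name t is invented):

SELECT jsonb_path_query(json_field, '$.animals.cats[*].name') FROM t;
-- "tom"
-- "mot"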

@lwansbrough

lwansbrough commented Aug 16, 2017

It may be a good idea to separate these features, as I would expect the implementations for JSON and Protobuf to look (perhaps dramatically) different. From a Protobuf user's perspective, I think we'd like to see a way to essentially create and update tables by passing a Protobuf schema definition to a create/alter table query. Whether or not the data is stored on the backend as a protobuf is probably irrelevant, so long as it's as compact as possible (we wouldn't be using protobuf if we weren't totally paranoid about bandwidth/storage). I wouldn't want to see overhead introduced because of a choice to use a format designed to reduce overhead :)

I say "dramatically different" because I suspect the way CDB would approach creating the table would be too. Since JSON is all about being dynamic, you'd want to allow flexible schemas in a JSON table/field, whereas with protocol buffers you may want the entire table to be definable as a protocol buffer, not just one field. Although the difference between a protocol-buffer field and a protocol-buffer table is probably irrelevant, as long as you can provide a schema update.

As for querying, I agree with the comments above that querying should be easy (I see no reason why JSONPath couldn't be used for PB) and fields need to be indexable, especially at arbitrary depths.
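
To make the schema-driven idea concrete, a purely hypothetical sketch; none of this syntax exists in CockroachDB, it only illustrates the shape of the proposal:

-- hypothetical: derive a table definition from a .proto message
CREATE TABLE events FROM PROTOBUF SCHEMA '
  message Event {
    int64  id   = 1;
    string name = 2;
  }
';

-- hypothetical: evolve the schema at runtime, without downtime
ALTER TABLE events UPDATE PROTOBUF SCHEMA '<new .proto definition>';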

@manigandham

@lwansbrough

What is the need for Protobuf in a relational database that already has data types and table schemas? This seems more like wanting a translation from Protobuf to SQL, which can be done with an ORM, especially if you don't care about the backend storage itself.

@lwansbrough

@manigandham I mean you can make the exact same argument about JSON. The idea is that given a schema defined externally to the DB (say, inside an application), it becomes very easy to delegate management of the database without having to provide SQL access or build an interfacing layer.

If the data can be stored as a protocol buffer while we're at it, while maintaining the same feature set, then I say by all means; but I suspect the storage impact of using PB in that case vs. a custom solution is probably very minimal.

@lwansbrough

@manigandham I can also tell you firsthand that trying to build a conversion layer between a flat schema (SQL) and schemas with depth is tedious, to say the least.

@nstewart nstewart modified the milestones: 1.2, Later Sep 7, 2017
@vtumuluri

Postgres's JSONB support together with GIN indexing is wonderful for a lot of use cases involving a mix of structured and unstructured data. However, as others have mentioned, the JSON query semantics are slightly complicated and not trivial to reason about. There is a PG extension called JSQuery that tries to address many of the query limitations. I would love it if CockroachDB could support the query/index capabilities of JSQuery, maybe behind a JSONPath interface; a taste of JSQuery's style is sketched below.
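
A rough taste of JSQuery's style, per the extension's README (treat the exact syntax as illustrative; # matches any element of an array, and table/column names are invented):

SELECT * FROM docs WHERE data @@ 'animals.cats.#.name = "tom"'::jsquery;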

@awoods187
Contributor

Team, I'd like to introduce myself. My name is Andy and I am a PM here at CRDB. I'm happy to share that we have heard you loud and clear and are working to support JSONB in 1.2. To that end, we would love to solicit your comments on our current RFC scope proposal found here. In particular, we would love to hear more about how you plan to use JSONB and whether we are adequately thinking about your use case. How do you want to use JSONB? Please feel free to share in the comments below or directly at the RFC.

@xarg

xarg commented Oct 4, 2017

@awoods187 great news! What's the schedule for 1.2?

@awoods187
Contributor

@xarg 1.2 is planned for release in April 2018

@igavva

igavva commented Dec 4, 2017

I look forward to this feature!

@justinj
Contributor

justinj commented Jan 24, 2018

JSONB support will be in the 2.0 release in April, so I'm going to close this issue!

@justinj justinj closed this as completed Jan 24, 2018
@mofirouz
Contributor

@justinj Do you have any links to documentation on JSONB support that you can share publicly here?

@justinj
Contributor

justinj commented Feb 21, 2018

Hi @mofirouz, we don't have any documentation for JSONB yet; it will be ready by the time 2.0 releases in April. That said, it's very similar to Postgres' JSONB feature, and most of their docs should be applicable to its use in CockroachDB. You'll need to use a recent build of CockroachDB to have access to all of the JSONB features, though.

@igavva

igavva commented Feb 22, 2018

Hi. It works! CockroachDB:

root@:26257/> SELECT * FROM bank.json;
+----+-----------------------------------+
| id |               data                |
+----+-----------------------------------+
|  0 | {"bar":2,"foo":1}                 |
|  1 | {"bar":"x","baz":"hello","foo":2} |
|  2 | {"foo":-1,"goo":"hi"}             |
|  3 | {"bup":[1,2,3],"foo":-2}          |
+----+-----------------------------------+

root@:26257/> SELECT * FROM bank.json WHERE data ->> 'foo' > '0' ORDER BY data->>'foo' ASC LIMIT 1;
+----+-------------------+
| id |       data        |
+----+-------------------+
|  0 | {"bar":2,"foo":1} |
+----+-------------------+

root@:26257/> SELECT id, data->>'foo' AS count FROM bank.json WHERE data ->> 'foo' > '0' ORDER BY data->>'foo' ASC LIMIT 100;
+----+-------+
| id | count |
+----+-------+
|  0 |     1 |
|  1 |     2 |
+----+-------+
