Key-based item uniqueness #22
This has some of the same difficulties as #21. I'm hoping that as we improve our notion of what a vocabulary is and how to organize them, it will become more clear where this and #21 fit.
Perhaps a way to do this would be to add an objectArray type? For example, converting the following:
To something along the lines of the following:
A keyword such as uniqueKeys (or uniqueKey with just one value, if that makes more sense) could be used with this type to achieve the desired effect.
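A hypothetical sketch of the conversion this comment describes (the `objectArray` type does not exist in JSON Schema, and the property names here are invented for illustration):

```json
// Before: a plain array of objects, with no way to enforce key-based uniqueness
{
  "type": "array",
  "items": {
    "type": "object",
    "properties": { "id": { "type": "string" } }
  }
}

// After: a hypothetical "objectArray" type paired with a uniqueness keyword
{
  "type": "objectArray",
  "uniqueKeys": [ "id" ],
  "items": {
    "type": "object",
    "properties": { "id": { "type": "string" } }
  }
}
```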
@TakingItCasual I'm not sure how your suggestion solves the problem of key-based uniqueness. You seem to be merely adding a new type keyword as a shorthand for functionality that already exists.
We are right now needing almost exactly what @gregsdennis proposed, except that we must have composite keys, so the "uniquenessKey" should be an array. Having said that, I find it very odd that the pointer "#" points to an element in the array somewhere in the instance... which would be very confusing to users, with all other pointers referring to the root of the schema as "#", I believe. Maybe there's no other way, though.
If we do that, that should be done with a Relative JSON Pointer, not changing the meaning of "#".
We could do something like what JSON Path does for array predicates: use "@".
@gregsdennis Or we could just use Relative JSON Pointer, which is designed specifically to solve this problem, is one of the specs we publish, and is used throughout JSON Hyper-Schema already.
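For illustration, a Relative JSON Pointer prefixes a plain JSON Pointer with the number of levels to ascend, so "0/key" means "start at the current item and descend to /key". A hypothetical keyword using one (the keyword name and shape are invented, not from any published spec) might look like:

```json
{
  "type": "array",
  "items": { "type": "object" },
  "uniquenessKey": "0/key"
}
```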
This would be very useful.
This would be very useful, and
Totally agree. Would be really useful.
I'm skeptical about the effectiveness of things like this for a few reasons:
@awwright if we were adding
To your second point, I'm not proposing that this has any relation to data in schemas. I think that item uniqueness holds a valid place within a schema, and identifying that uniqueness based on a property within the item (for example, an ID) is a worthy addition.
I found this issue after asking a question on SO. The view count (currently 21 views after 7 days) may provide insight into prioritizing this issue.
@handrews @awwright In terms of
This is often needed, and often
(Assuming you do not mean
I personally don't feel this is unreasonable; however, I don't think the number of people who've requested it is high enough to divert focus from the remaining tasks for draft-8. (This issue is currently not in the draft-8 milestone. Let's also discuss whether we should move it there or not.)
If the value was simply a string, which represented the value in the object to be checked, it sounds fairly simple to me. I don't think we need to allow for pointers. Values which are IDs for objects should be at the topmost layer, and doing so reduces the complexity of the task here.
@speedplane Thanks for reaching out. Feel free to join our Slack server for further discussion. Priorities are hard, especially when there are existing commitments and limited time. If you want to help us "move faster", keep an eye out for our soon-to-be-announced Open Collective!
Another SO question that involves this.
+1 to the proposal. Consider the following example:
The combination of key1 and key2 is what makes each instance in the array unique.
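The example itself was lost in extraction; a hypothetical instance illustrating the idea (property names assumed) could be:

```json
[
  { "key1": "x", "key2": 1 },
  { "key1": "x", "key2": 2 },
  { "key1": "y", "key2": 1 }
]
```

All three items are distinct as key1 + key2 pairs, so the array would be considered valid under composite uniqueness even though individual key1 and key2 values repeat.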
@chapmanjw, I mentioned the use of multiple keys in the original post, but thanks for adding in an example!
Moving this to the extension vocabularies repo.
Hi all, I'd just like to mention that I'd be interested in this feature too! Not a difficult problem to solve for me with separate code, but nonetheless, if it was eventually included that would be great. Cheers, |
@karenetheridge I prefer having a pointer to a property rather than just a property name. This allows the key to be nested in the object somewhere.
@gregsdennis Interesting. So in the basic case (an array of objects of strings),
…or is that too complicated? (edit: I'm having difficulty thinking of when a nested array might be useful, though. I think you had the better idea.) :)
That's what I'm thinking. I'm not sure if it will be useful, either, but my experience is that someone will want it.
If it doesn't exist, no one can have it. Adoption is purely based on existing functionality. If it's not available, then devs will implement it at the application layer, which... helps no one. You can apply the same thinking recursively all the way up and have JSON Schema support only string types with zero extra validations; people that need those will implement them at the application level. This is not some obscure functionality either: uniqueness of things is a core concept of information-passing objects. If JSON Schema is expressive enough to provide this and its support is widely adopted, then you don't need extra layers to provide that functionality anymore, which simplifies systems and improves robustness :) It's more of a "want" than an "if": it is needed, it just depends on whether anyone wants to implement it.
Stumbled here because I want to specify unique keys. Totally concur with @joaomcarlos. We've had to implement custom code validators for simple things like comparing two dates (even when they are expressed as an integer). It can't be done natively. Now this is not a complete validation model that I can send to customers, like I could an XSD... Now I'm really stuck between the devil and the deep blue sea.
I can think of two scenarios when it comes to uniqueness of multiple keys.

Scenario 1 - similar to @chapmanjw (Dec 9, 2019): the composite combination of key1 and key2 must be unique. As in, two or more objects in the array can have the same key1 or the same key2, but no two objects can have the same key1 + key2 combination.
Example pass case: multiple objects have the same key1/key2, but no two objects have the same key1 + key2 combination.
Example fail case: the first two objects have the same key1 + key2 combination.

Scenario 2 - objects within the array should not have the same value for key1 or key2 (can be extended to as many keys as needed).
Example pass case: both key1 and key2 are unique across all three objects.
Example fail case: although key1 is unique across all three objects, key2 is not. We don't care about the key1 + key2 combination here.

These examples can be extended to have more than two keys. I would like to have both these scenarios included as features.
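To make the contrast between the two scenarios concrete, here is one small hypothetical instance (field names invented) evaluated under each rule:

```json
// Scenario 1 (composite key1 + key2): PASSES,
// because no two objects share the same key1 + key2 pair.
// Scenario 2 (key1 and key2 each independently unique): FAILS,
// because "a" repeats under key1 (and 1 repeats under key2).
[
  { "key1": "a", "key2": 1 },
  { "key1": "a", "key2": 2 },
  { "key1": "b", "key2": 1 }
]
```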
I was about to make a comment similar to @pmsreenivas. Multiple keys would be quite useful, composite or not. My example involves authoring multiple-choice questions. Authoring test questions is pretty common in the education industry. Once such a question is authored, it is posted to the server. The route on the server validates the posted JSON, and it would be nice if I could declare and validate the unique keys. Here's a simplified version of my schema, with a suggestion on how to do so.
Rationale
I like the array form for multiple keys, but I'm not sure about the objects with a single key. Pointers don't add to complexity, because any validator is already going to know how to resolve them from having to support `$ref`. It's easier to validate at the meta-schema level, too. As you mentioned, performance isn't a concern. Rather, it should be improved, since the validator is only required to compare a small number of values rather than an entire tree.
Understanding the composite example is essential to understanding my suggestion. By "composite", I mean a key composed of multiple parts. Using my example, this simple line of code should explain it. So, if you wanted to avoid saying the word "key", we would have to have nested arrays, which is not as easy to read.
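The referenced line of code did not survive extraction; a minimal sketch of what "composite" means here (the property names are hypothetical):

```javascript
// A composite key is a single value built from multiple parts of an item.
// "key1" and "key2" are hypothetical property names.
const item = { key1: "q1", key2: "choiceA" };
const compositeKey = `${item.key1}|${item.key2}`; // "q1|choiceA"
```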
An array of objects (or a single object) is really not that bad. Seeing the word "key" helps the reader understand what they are looking at. Also, we may consider adding other parameters to tune object comparisons, such as "deep" or "exclude" (these are just examples, not advocacy for adding such parameters).
If there is no roadblock to using pointers, I'm just as happy to use pointers. But I wouldn't want pointers to hold up the feature in general. So I'm pointing out that we could go either way and pivot later.
Exactly right. Specifying unique keys does not reduce complexity, but does improve performance.
Oh, you mean that the array items are OR'd together. I'm not sure I agree with that. I'm thinking that each item in the array adds to the composite key (they're AND'd). I'm not certain there's much of a use case for complex boolean logic on deciding a uniqueness key. But to say that
@gregsdennis I provided a use case where I want multiple fields to be unique: the id as well as the text of a choice in a multiple-choice question. And, when working with databases, I have seen many tables that would have both a unique identifier and some textual representation of the item (e.g. title or name) that should be unique as well; otherwise it would cause a user to be confused seeing an apparent duplicate in a list. There are exceptions, of course, for more complex entities where the display text does not need to be unique when other metadata can appear alongside a title or name to distinguish it. There are other people in this thread who have asked for the same, so I don't think it is that obscure of a need. But I am glad my code example cleared up my intent.
Based on my comment posted on Feb 1, 2021: the array items must be OR'd if the keys are not composite and independently unique (scenario 2 in my earlier comment), while they must be AND'd if the keys are compositely unique (scenario 1 in my earlier comment).
I just discovered that multiple unique keys (OR'd) are supported by ajv. That basically provides exactly what I need. Unfortunately, it doesn't support composite keys (AND'd). But, if your JSON Schema validator supports custom keywords (like ajv), then here's some code I've been playing with that is a proof-of-concept. Feel free to use / port it to implement your own uniqueItems keyword.
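The proof-of-concept itself was lost in extraction; below is a standalone sketch (not the original code, and not ajv's actual plugin API) of the composite (AND'd) uniqueness check such a custom keyword would perform, using JSON Pointers resolved against each array item:

```javascript
// Minimal JSON Pointer resolution, relative to the item as root.
// Handles the RFC 6901 escapes ~1 ("/") and ~0 ("~").
function resolvePointer(item, pointer) {
  if (pointer === "") return item;
  return pointer
    .split("/")
    .slice(1) // drop the empty segment before the leading "/"
    .map((seg) => seg.replace(/~1/g, "/").replace(/~0/g, "~"))
    .reduce((obj, seg) => (obj == null ? undefined : obj[seg]), item);
}

// Returns true if every item has a unique combination of values
// at the given pointer paths (composite / AND'd uniqueness).
function uniqueKeysValid(items, pointers) {
  const seen = new Set();
  for (const item of items) {
    const key = JSON.stringify(pointers.map((p) => resolvePointer(item, p)));
    if (seen.has(key)) return false; // duplicate composite key found
    seen.add(key);
  }
  return true;
}
```

A real integration would wrap `uniqueKeysValid` in the validator's custom-keyword mechanism (for example, ajv's keyword registration) rather than calling it directly.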
I've created a draft vocab for this. It can be reviewed here. This vocab adds the `uniqueKeys` keyword. The keyword passes if all of the resulting value sets are unique.
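Based on the shape shown later in this thread (an array of JSON Pointers resolved against each array item), usage might look like the following; treat it as a sketch rather than the normative vocab definition:

```json
{
  "type": "array",
  "items": { "type": "object" },
  "uniqueKeys": [ "/id" ]
}
```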
I've released the vocabulary and an implementation for my validator JsonSchema.Net.
@gregsdennis I can't get it to work. I'm not sure if I'm missing something or if the issue is with the validator which I'm using: https://github.com/sirosen/check-jsonschema
@dudicoco because this is an externally defined vocabulary, the implementation you're using needs to support it. If it supports custom keywords, you may be able to add it yourself. As of right now, my validator is the only one I know of that supports this vocabulary.

There's a lot of discussion in various issues around the difference between vocabularies and their meta-schemas (if they have one). The meta-schema can provide syntax checking (is the keyword's usage syntactically correct?), but the meta-schema can't define the logic behind the keyword. I also cover this in a bit more detail in my docs.

What you'll need to do is check to see if the validator you're using supports custom keywords and possibly implement it yourself. If you want to play with it a bit to see if your schema's doing what you expect, you can use https://json-everything.net/json-schema, which is powered by my library and supports my custom vocabularies.
Thanks @gregsdennis! Is it possible to define all keys as unique by default? See the following schema snippet for example:

```json
"environment_variables": {
  "type": "object",
  "required": [],
  "patternProperties": {
    "(^([a-z]+[.])+[a-z_]+)$|(^([A-Z0-9_])+([A-Z0-9_]?)+([A-Z0-9])+)$": {
      "type": [
        "string"
      ]
    }
  },
  "additionalProperties": false
}
```

In this case we can't specify each key in `uniqueKeys`.
Hey @dudicoco. Please raise the issue over in my repo https://github.com/gregsdennis/json-everything. Happy to discuss your needs there.
Reading all this just reinforces for me that anyone doing document databases should have it impressed upon them that only very simple, fundamental objects should be nested. If you have a document containing an array of unique objects, break out a collection and store that as an array of ids to the new collection objects instead. The tools will allow you to make nested objects that look like they should work, and even do some pretty advanced operations on them, but when you get to adding indexes and constraints or doing reporting, it all falls apart.
@mprevdelta can you share an example?
This feature is similar to
Defining an array of keys to automatically describe a composite key is not useful as it prevents defining multiple unique keys.
It could be defined in either of two ways:
If you define it as option 1 then the vocabulary can be extended such that
Such that
There may be a case to allow a simplified version of the syntax: if a nested array contains only a single item, then it could be rewritten without the array wrapper, so that the previous example would be functionally identical to:
This would be particularly useful for JSON:API resources, where you would quite often want something like the following: for example, an employee has a unique id, a unique username, and a unique combination of several other fields:
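A hypothetical sketch of that employee example under the nested-array syntax (each inner array is a composite key and the outer array OR's them; the field names are invented):

```json
{
  "type": "array",
  "items": { "type": "object" },
  "uniqueKeys": [
    [ "/id" ],
    [ "/username" ],
    [ "/givenName", "/familyName", "/dateOfBirth" ]
  ]
}
```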
@MT-0 what you're describing can be done with my proposal (and what's currently described in the vocab I wrote). You just need an `allOf`:

```json
{
  // ...
  "allOf": [
    { "uniqueKeys": [ "/key1" ] },
    { "uniqueKeys": [ "/key2" ] },
    { "uniqueKeys": [ "/key3", "/key4" ] }
  ]
}
```

It's more verbose, but it's also more explicit and easier to read.
It may be handy to be able to specify key-based uniqueness within an array of items.

Given the schema
the instance
would pass. However, the user may want this to fail, as the value of the `key` property is repeated.

I propose a `uniquenessKey` (or similar) keyword that would allow the author to specify a pointer to an object property that should be unique among all items within the array. This would update the above schema to

(The pointer would be resolved using the item in the array as the root.)