Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serialization support for queries, highlight configs #2594

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

SergioBenitez
Copy link

@SergioBenitez SergioBenitez commented Aug 31, 2023

This is a continuation of #2374 which fully resolves #1942.

The approach here is to expose serializable versions of queries and highlight configurations instead of attempting to provide serialized versions directly. This enables the consumer to serialize the structures as they wish, and importantly, to do so in a way that respects things like byte order properly.

The above is exposed via new serializable() and deserialize() methods on Query and HighlightConfiguration and Serializable{Query,HighlightConfiguration} structures which implement serde::{Serialize, Deserialize} when the newly supported serde feature is enabled. I have successfully used this API to deserialize over 100 of highlight configurations in ~3 ms. This compares to ~3000ms to generate the configurations directly, or a 1000x improvement. The resulting structures can be used immediately without any additional preprocessing.

I should mention that there are a few concerns with the exact approach from #2374, which is replicated here:

  • There is no checking that the const char *src array's size is sufficient. Previously, a size argument was passed in but was unused. We should use the argument to check against the expected size.
  • Some sort of versioning needs to be added to (de)serialization. Otherwise, serialized bytes from a previous version can be deserialized by a new, incompatible version. While the bytes may physically fit, they may not semantically fit.

@SergioBenitez
Copy link
Author

SergioBenitez commented Sep 12, 2023

Checking in on this. Is there a chance this will be merged soon? I'd like to release a library making use of this feature, so I'll need to publish a fork on crates.io to do so if this won't be merged any time soon, which I'd like to avoid.

@ahlinc @amaanq

@ahlinc
Copy link
Contributor

ahlinc commented Sep 12, 2023

There is a lot of interest in queries (de)serialization and a more robust solution will be implemented that covers more use cases with better deserialization performance.

@SergioBenitez
Copy link
Author

There is a lot of interest in queries (de)serialization and a more robust solution will be implemented that covers more use cases with better deserialization performance.

That sounds great.

I wonder if it would make sense to merge something like this PR in the interim, seeing as how my original issue is nearly a year old now. Even if it isn't particularly robust, doesn't cover every use-case (which are missing?), and could be improved performance-wise, it still provides functionality that dramatically changes the utility of tree-sitter by immediately making it usable in cases where it was previously prohibitive due to performance impacts.

Nevertheless, is there anywhere I can follow those efforts and potentially contribute? I'd love to avoid duplicating effort, and of course, maintaining a fork just to have something on crates.io.

ErinvanderVeen and others added 2 commits September 13, 2023 13:27
This commit adds two new functions to query.c
useful for (de)serialization. In an
effort to address issue tree-sitter#1942
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Caching or Serializing a TSQuery
3 participants