Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tree sitter chunking #7417

Merged
merged 3 commits into from
Jun 21, 2023
Merged

Conversation

A-Walrus
Copy link
Contributor

@A-Walrus A-Walrus commented Jun 21, 2023

No description provided.

Call as bytes before slicing, that way you can take bytes that aren't
aligned to chars. Should technically also be slightly faster since you
don't have to check alignment...
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix for TSchunk access is correct and needed 👍 I am not sure sure about the byte alignment in the grapheme functions. Where would that be needed? It's pretty slow and I can't think of a scenario where we would want to align invalid bytes?

@A-Walrus
Copy link
Contributor Author

Where would that be needed? It's pretty slow and I can't think of a scenario where we would want to align invalid bytes?

Currently they're necessary to stop helix from crashing in certain situations, but I think they might just be a bandaid solution and not a fix for the actual bug...
Marking as draft while I try to dig a bit deeper.

@A-Walrus A-Walrus marked this pull request as draft June 21, 2023 13:49
@pascalkuthe
Copy link
Member

pascalkuthe commented Jun 21, 2023

If you have a reproduction case with a tracebacj that would be great. I can tell a lot from that. Maybe a buggy TS grammar could cause this. I actually want to remove the manual grapheme alignment we currently do for TS highlights anyway. The doc formatter can do the byte/char/grapheme alignment on the fly which is a nice perf bump for long lines.

Apart from that I think all other uses with an invalid byte pis are a bug and we would just be fixing the symptom

@A-Walrus
Copy link
Contributor Author

Reproduction case, c/cpp syntax

"😀"

With the emojy copy pasted a bunch

@pascalkuthe
Copy link
Member

A thanks yeah I saw the issue now. Let ne see if I can comeuo with a quick fix.

@pascalkuthe
Copy link
Member

Ah yeah it's related to what I meant. It shouldn't be too hard to track the byte pos while rendering too and simply use byte indecies in the entire syntax highlighting infrastructure. That would fix some inefficencies jn the rendering. I have been plannjng that for a while but didnt get around to it yet. If you are interested you can take a crack at this but it would be a bit more invokved.

IMO it's still a bug in the C TS grammar the the byte index is not byte aligned tough 🤔 Unaligned bytes make sense as an input (since TS doesn't know ahead if time where char boundaries are) but not as an output (a highlighting starting in the middle of a Unicode codepoint makes no sense)

@A-Walrus A-Walrus marked this pull request as ready for review June 21, 2023 14:34
@A-Walrus A-Walrus changed the title Fix grapheme crash Fix tree sitter chunking Jun 21, 2023
@A-Walrus
Copy link
Contributor Author

I got rid of the bandaid and left only the relevant fix. Will file a bug report with the language servers, and see if I can rework syntax highlighting to use bytes everywhere

@the-mikedavis the-mikedavis merged commit eb81cf3 into helix-editor:master Jun 21, 2023
wes-adams pushed a commit to wes-adams/helix that referenced this pull request Jul 4, 2023
Call as bytes before slicing, that way you can take bytes that aren't
aligned to chars. Should technically also be slightly faster since you
don't have to check alignment...
mtoohey31 pushed a commit to mtoohey31/helix that referenced this pull request Jun 2, 2024
Call as bytes before slicing, that way you can take bytes that aren't
aligned to chars. Should technically also be slightly faster since you
don't have to check alignment...
smortime pushed a commit to smortime/helix that referenced this pull request Jul 10, 2024
Call as bytes before slicing, that way you can take bytes that aren't
aligned to chars. Should technically also be slightly faster since you
don't have to check alignment...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants