-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Huge memory usage when used through python bindings #199
Comments
I attached a debugger to Emacs to see what it was doing, and it looks like we end up stuck in this loop: while (ts_stack_is_active(self->stack, version)) {
LOG(
"process version:%d, version_count:%u, state:%d, row:%u, col:%u",
version,
ts_stack_version_count(self->stack),
ts_stack_state(self->stack, version),
ts_stack_position(self->stack, version).extent.row,
ts_stack_position(self->stack, version).extent.column
);
if (!ts_parser__advance(self, version, allow_node_reuse)) return NULL;
LOG_STACK();
position = ts_stack_position(self->stack, version).bytes;
if (position > last_position || (version > 0 && position == last_position)) {
last_position = position;
break;
}
} It also looks like an empty file can trigger the issue. Unsure what makes the Emacs and Python bindings to the ts-cli test runner, as all of the tests are able to run just fine. |
It's possible it's a bug w/ the python bindings, fuzzing and a corpus input of 6000+ files has the cli puttering along just fine |
If you can (and I'd really appreciate it), try bisecting the repo between 0.19.0 and when you first detect this bug. Note that you might have to regenerate the parser between some commits. |
This may be related tree-sitter/tree-sitter#2517, details are in edit notes there. My respect for Gentoo users to be so detailed in reports! |
Thank you for the prompt reply, I'll try bisecting and regenning. |
OK, so now I'm very confused. I wanted to start the bisecting, but of course before that I needed to prepare the verification steps, which I was planning it to be:
And to my surprise, the current master code works? Going back to the tagged versions of 0.20.* also works? After rechecking, I think I found something weird. The When I'm on commit a7be575, I get those 2 files (the diff between my files and the once in repo is huge, not something cosmetic like timestamps, unless I'm missing something): Clearly something is very different in the generator used? The versions on my system are:
Could we maybe investigate what happens here? While, yes, we could take the easy route in Gentoo: run |
I generate on ts master w/ the Rust CLI crate, and I think ahlinc does too. Can you bisect |
Mind linking me the relevant git repos, ideally on exactly which commit you are on master? Just so I can reproduce as well as possible :) |
https://github.com/tree-sitter/tree-sitter/ I constantly change commits ( generating on master right ;) ), anything within the last day or two should work right now - tree-sitter 0.20.8 (6d41d99990cfc8527f424603cdfce1c9af2d78ef) Build it by running cargo install --path cli (you can add --debug for faster build times, but slower generate times...) |
@amaanq Yes, you are very right, and indeed with master tree-sitter-cli I get nearly same file as yours (in the parser.c it moved one enum declaration to just another place, but this doesn't affect anything at all). I guess that the new master CLI uses a little bit different functions/codebase, which affects how bindings access the library (I mean the the master CLI generates code for the master tree-sitter lib), so it doesn't work well for anyone using not-master tree-sitter-lib. When I installed live libtree-sitter on my system, the new tree-sitter-bash works. Now I want to ask you both what is the best way forward? |
Well that's even more interesting, a mismatch of versions between the installed tree-sitter library and generated parsers caused this? I think it's because of this: tree-sitter/tree-sitter#2316 If you feel like going above and beyond, bisecting an installed version before and after that PR w/ my generated-from-master parser.c file would be awesome. (The results of this might help us make a decision more easily) I think we need to bump a new ver... |
Well, why not? Did a full Meaning starting from tree-sitter/tree-sitter@4a00725 the memory leak stops and the libtree-sitter matches the current generated files. I've also tried the other way - find the first commit generating "leaking" files with released libtree-sitter-0.20.8, and also got exactly the same commit tree-sitter/tree-sitter@4a00725 - meaning this is indeed the changing commit, starting from the generated files are incompatible with release libtree-sitter. Why did this commit break? Cause the generator stops adding I guess this commit can be reverted for now to not break backward compatibility - if you want. |
Nice catch, I think we can revert that commit for now and postpone it to couple with some other more important changes to be worth to bump a language version.
|
@arthurzam could you explain this sentence a bit more? Where is it installed and on what moments is it used? |
libtree-sitter is installed on main system, so the header file is also installed globally. For example on gentoo:
So when the tree-sitter-bash's C files are compiled on the system, it |
Thank you for the explanation, so for now this issue with the bash grammar happened because I habitually used a CLI built from the Tree-sitter master to regenerate parser files in this bash repo. |
I assume that Neovim nightly wasn't affected by this issue because they don't trust to generated files and do parsers regeneration during the build process. Or they just didn't do an update. |
@arthurzam a couple of additional questions:
|
This is installed by the makefile of tree-sitter: link
Since the include directive is Thinking more on it, if indeed it had |
Oookay 😄, that will be fixed for sure.
That's an another issue in Tree-sitter, will be fixed too. |
Unless I'm misunderstanding here, in this instance, we (in Gentoo) weren't regenerating the parser anyway, so the header on our system in particular doesn't negate the need for the PR you filed (even if it's fair to ask why it's there). i.e. the issue is the tree-sitter-bash release had one built in a bad env, not that we just happen to have the header around, even if that's an independent problem. EDIT: sorry, I follow now, we still have to build the library each time, duh |
You shouldn't have a system wide |
@thesamesam Could you please patch Gentoo's emerge file for that until the next Tree-sitter release would happen? |
There shouldn't be a system-wide copy of parser.h and upstream plan on dropping this from dev-libs/tree-sitter as it can cause issues if there's a mismatch between the version used for the bundled generated parser vs the header used to build the library. Force use of the correct one in the dist tarball at src/tree_sitter/parser.h instead. Drop the dependency on dev-libs/tree-sitter given we were only depending on it for this header. Bug: tree-sitter/tree-sitter-bash#199 Bug: https://bugs.gentoo.org/912716 Signed-off-by: Sam James <sam@gentoo.org>
There shouldn't be a system-wide copy of parser.h and upstream plan on dropping this from dev-libs/tree-sitter as it can cause issues if there's a mismatch between the version used for the bundled generated parser vs the header used to build the library. Backport that change to avoid mismatches. Bug: tree-sitter/tree-sitter-bash#199 Bug: https://bugs.gentoo.org/912716 Signed-off-by: Sam James <sam@gentoo.org>
There shouldn't be a system-wide copy of parser.h and upstream plan on dropping this from dev-libs/tree-sitter as it can cause issues if there's a mismatch between the version used for the bundled generated parser vs the header used to build the library. Force use of the correct one in the dist tarball at src/tree_sitter/parser.h instead. Drop the dependency on dev-libs/tree-sitter given we were only depending on it for this header. Bug: tree-sitter/tree-sitter-bash#199 Bug: https://bugs.gentoo.org/912716 Signed-off-by: Sam James <sam@gentoo.org>
There shouldn't be a system-wide copy of parser.h and upstream plan on dropping this from dev-libs/tree-sitter as it can cause issues if there's a mismatch between the version used for the bundled generated parser vs the header used to build the library. Backport that change to avoid mismatches. Bug: tree-sitter/tree-sitter-bash#199 Bug: https://bugs.gentoo.org/912716 Signed-off-by: Sam James <sam@gentoo.org>
There shouldn't be a system-wide copy of parser.h and upstream plan on dropping this from dev-libs/tree-sitter as it can cause issues if there's a mismatch between the version used for the bundled generated parser vs the header used to build the library. Force use of the correct one in the dist tarball at src/tree_sitter/parser.h instead. Drop the dependency on dev-libs/tree-sitter given we were only depending on it for this header. Bug: tree-sitter/tree-sitter-bash#199 Bug: https://bugs.gentoo.org/912716 Signed-off-by: Sam James <sam@gentoo.org>
There shouldn't be a system-wide copy of parser.h and upstream plan on dropping this from dev-libs/tree-sitter as it can cause issues if there's a mismatch between the version used for the bundled generated parser vs the header used to build the library. Backport that change to avoid mismatches. Bug: tree-sitter/tree-sitter-bash#199 Bug: https://bugs.gentoo.org/912716 Signed-off-by: Sam James <sam@gentoo.org>
@ahlinc @amaanq Thank you for helping us debug it, and for fixing all problems we found. I believe we can continue from here on Gentoo side on packaging tree-sitter-bash. Ideally you tag a new version with updated includes (
Work on it has been done, we are now at the formalities part until we merge it to downstream. Now that I can check for regressions in parsing, and we have a huge repository of bash code https://github.com/gentoo/gentoo, I'm starting to find regressions in the parsing, so if you aren't against I'll file bugs for them. I believe we can close this bug. Thank you for working with us on this bug, and for working on tree-sitter-bash. |
There shouldn't be a system-wide copy of parser.h and upstream plan on dropping this from dev-libs/tree-sitter as it can cause issues if there's a mismatch between the version used for the bundled generated parser vs the header used to build the library. Force use of the correct one in the dist tarball at src/tree_sitter/parser.h instead. Drop the dependency on dev-libs/tree-sitter given we were only depending on it for this header. Bug: tree-sitter/tree-sitter-bash#199 Bug: https://bugs.gentoo.org/912716 Signed-off-by: Sam James <sam@gentoo.org>
There shouldn't be a system-wide copy of parser.h and upstream plan on dropping this from dev-libs/tree-sitter as it can cause issues if there's a mismatch between the version used for the bundled generated parser vs the header used to build the library. Backport that change to avoid mismatches. Bug: tree-sitter/tree-sitter-bash#199 Bug: https://bugs.gentoo.org/912716 Signed-off-by: Sam James <sam@gentoo.org>
@thesamesam @arthurzam @MatthewGentoo I also want to say thank you for Gentoo team for passion and for many details in replies that helped to find real issues and fix them for both projects! 🎉 P.S: We in Tree-sitter team would be happy to know any details about use cases of Tree-sitter in Gentoo project. |
In Gentoo, we define the packages building instructions as The tool has 2 types of validation strategy: sourcing the files (many wrappers around just calling it through |
Thank you all for all the fixes and improvements. With the release of 0.20.4, I think this bug is closed. Many thanks for the great maintainers! |
When trying to use new tree-sitter-bash (0.20.0, 0.20.1, 0.20.2) through the Python bindings, a huge memory usage is detected. Please note that by huge memory usage I mean really huge and really fast, in 6-8 seconds gets to 16GB and then I need to kill it before filling my RAM on PC. Tested on extra big system and also got to 490 GB RAM before I killed it. I think this is some memory leak or allocation loop, but can't be sure, so I'm afraid to call it as such.
Also, 0.19.0 was working ok. But I'm really thankful for the work the new maintainer has done since. We at Gentoo really want to upgrade so we can use the new improved grammar 😄
Reproducer
Caution: I recommend checking you have ready infra for killing the python process running for that, or a good OOM killer configuration. This really grows fast and can freeze your system. For me I'm testing inside a chroot container, so I have a terminal ready to kill the whole chroot before my system freezes.
At first we were thinking we were passing a very complicated bash scripts (we do have such), but the above example is literally just a simple echo.
Another maintainer at Gentoo noticed it also happens with emacs bindings, so I don't think this is only python bindings issue, maybe the general bindings interface? This is also reproduced across multiple Gentoo systems, even different arches (amd64, arm64, sparc, ppc64, ppc64le).
System configuration
This is done on a Gentoo system. Here are the commands to compile the resulting so file (maybe we are compiling incorrectly?):
Here is the link tree for the so:
The system has:
Any more info, testing, or anything you want I'll be happy to provide.
The text was updated successfully, but these errors were encountered: