feat: support writing the tests in two separate input and output files #1536

Open
wants to merge 6 commits into
base: master

aminya
Contributor

@aminya aminya commented Dec 15, 2021

This adds support for writing the tests in separate neighbour files.
The input file name should match `*.tsin.*`.
The output file name should match `*.tsout.scm`.

For example, `test1.tsin.js` and `test1.tsout.scm`.

The output file is automatically updated if the --update flag is passed.
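
A minimal sketch of how the naming convention above could be resolved into input/output pairs. The helper name `find_test_pairs` is hypothetical and this is not the code in the pull request, which is not shown in this thread; the sketch only illustrates the `*.tsin.*` / `*.tsout.scm` matching rule.

```python
from pathlib import Path


def find_test_pairs(test_dir):
    """Pair each `*.tsin.*` input with its neighbouring `*.tsout.scm` file.

    Hypothetical helper for illustration only. The output file may not
    exist yet, for example before the first run with an update flag.
    """
    pairs = []
    for input_path in sorted(Path(test_dir).glob("*.tsin.*")):
        # "test1.tsin.js" -> "test1"
        stem = input_path.name.split(".tsin.", 1)[0]
        output_path = input_path.with_name(f"{stem}.tsout.scm")
        pairs.append((input_path, output_path))
    return pairs
```

With a directory containing `test1.tsin.js` and `test1.tsout.scm`, the function would return a single pair of those two paths.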

@maxbrunsfeld
Contributor

I don't think I want to support a separate alternative way of writing tests; I'd rather just have one way of doing things. Is there a problem with the current test file format that you're hitting?

This reminds me that we never documented this feature, which allows you to write tests for source code containing --- or ===.

@aminya
Contributor Author

aminya commented Dec 23, 2021

> I don't think I want to support a separate alternative way of writing tests; I'd rather just have one way of doing things. Is there a problem with the current test file format that you're hitting?

Yes. The main reason is that testing existing code is hard with the current format. Tree-sitter uses its own test file format, which differs from how source files in the language are actually written, so turning existing files into tests involves manual work.
The format I am proposing also allows things like snapshot tests. Tree sitter can automatically generate the output if requested, and then use that as the expected output.
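
A rough sketch of the snapshot-test idea described here, assuming a hypothetical `parse` callable that turns source text into an S-expression string. Neither `check_snapshot` nor `parse` comes from tree-sitter or from this pull request; they only illustrate the compare-or-update flow.

```python
from pathlib import Path


def check_snapshot(input_path, expected_path, parse, update=False):
    """Compare a parser's output with a stored snapshot.

    `parse` is a hypothetical callable (source text -> S-expression string)
    standing in for the real parser. Returns True when the test passes.
    """
    input_path, expected_path = Path(input_path), Path(expected_path)
    actual = parse(input_path.read_text()).strip()

    if update:
        # Snapshot mode: record the current output as the new expectation.
        expected_path.write_text(actual + "\n")
        return True

    if not expected_path.exists():
        return False  # no snapshot has been recorded yet
    return expected_path.read_text().strip() == actual
```

The point of the design is that the input file stays an ordinary source file in the target language, and only the expected-output file is specific to the test setup.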

@maxbrunsfeld
Contributor

> Tree sitter can automatically generate the output if requested, and then use that as the expected output.

That's already supported in the current system using the tree-sitter test --update command.

@aminya
Contributor Author

aminya commented Dec 26, 2021

> > Tree sitter can automatically generate the output if requested, and then use that as the expected output.
>
> That's already supported in the current system using the tree-sitter test --update command.

Yeah, but that is hard to do for the files of a language that live in some external project. People have to do this by hand.

This adds support for writing the tests in separate neighbor files.
The input file name should match `*.tstest.*`.
The output file name should match `*.tstest.scm`.

For example, `test1.tstest.js` and `test1.tstest.scm`.

@sogaiu

sogaiu commented Feb 23, 2023

I think one of the original points of the PR was that other projects already might have tests in some existing format and that it would be nice to be able to work with an existing setup without having to change too much.

Apart from that, though, a few things I've noticed while using the built-in corpus tests include:

  • Upon test failure, I don't see information about the start and end positions of nodes (or field names) displayed (the green and red text is pretty though)
  • To get that kind of information, I typically go searching for the file that the failed test was in, look through the file to find the relevant input, copy that input to another file, and then run the parse subcommand (maybe I'm unaware of a better method)

I'm trying out an alternative arrangement where I have one file for each input with a descriptive name and a corresponding file that contains output from the parse subcommand (typically the s-expression output with field and position info).

Now when there is a failure, I am presented with a file path (no searching necessary) as well as the expected and actual parse information. Here's a bit of a sample (some parts are elided as ... for brevity):

1..44
ok 1
ok 2
ok 3
...
ok 19
not ok 20 - test/input/sym_lit-unihan.janet
  ---
  found:
    (source [0, 0] - [1, 0]
      (sym_lit [0, 0] - [0, 6]))
  wanted:
    (source [0, 0] - [1, 0]
      (sym_lit [0, 0] - [0, 5]))
  ...
ok 21
...

I chose TAP (well, something close enough) so I can feed the output to a TAP consumer (of which there are a number to choose from) and see a concise summary.

Below is some sample output assuming the whole output from the example above is fed to it:

...................F........................
======================================================================
FAIL: <file=stream>
- test/input/sym_lit-unihan.janet
----------------------------------------------------------------------

----------------------------------------------------------------------
Ran 44 tests in 0.000s

FAILED (failures=1)
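
A TAP-like report of the kind shown earlier in this comment could be produced along these lines. This is only a sketch: `tap_report` and the `(path, wanted, found)` tuple layout are assumptions made for illustration, not the script actually used to generate the output above.

```python
def tap_report(results):
    """Format test results as TAP-like lines.

    `results` is a list of (path, wanted, found) tuples; a test passes when
    `wanted == found`. The tuple layout is an assumption for this sketch.
    """
    lines = [f"1..{len(results)}"]
    for number, (path, wanted, found) in enumerate(results, start=1):
        if wanted == found:
            lines.append(f"ok {number}")
            continue
        lines.append(f"not ok {number} - {path}")
        lines.append("  ---")
        # Indented diagnostic block with the actual and expected parse output.
        for label, text in (("found", found), ("wanted", wanted)):
            lines.append(f"  {label}:")
            lines.extend(f"    {row}" for row in text.splitlines())
        lines.append("  ...")
    return "\n".join(lines)
```

Because the failing line carries the input file's path, the same path can be passed straight to the parse subcommand when investigating a failure.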

As the path is presented in such a way that no extra info needs to be added to it (e.g. corpus or test/corpus), it's straightforward to view the files that lead to test failures as well as to use parse on them.

It seems that this type of approach could be applied to larger ("real-world") files relatively easily as well. I presume many folks are already doing something like this, since hand-crafted small examples, though useful in more than one way, don't seem sufficient for testing purposes.

@aminya
Contributor Author

aminya commented Oct 25, 2023

@sogaiu Yes, the approach I added in this merge request is crucial for me: I don't need to do any preprocessing to make the tests ready, and any file with the correct extension can be a test input. I treat the tests as snapshot tests, where the generated syntax tree is the snapshot. This has made testing significantly easier, and it would be a great addition to tree-sitter.


3 participants