Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in isColumnSeparator() parsing logic #61

Merged
merged 3 commits into from
Mar 19, 2022
Merged

Conversation

nekno
Copy link
Contributor

@nekno nekno commented Mar 13, 2022

The isColumnSeparator() logic is intended to check that each row contains the same (non-space) character in the same position.

Instead of checking the rows all the way down, however, it was only checking the first and second rows with a call to slice(0,1).

Additionally, it was checking every column, when columns should each be padded by a space, making the minimum distance between columns two spaces.

Take the following table as an example:

AAA	BBB
BAB	ABA
ABA	BAB

If you enter that table as input, and then click the Parse button to parse the output, it considers any position where the characters in the same column position match across two rows to be a column separator.

So it interprets the middle column of each group as a separator, producing the following:

A	A	B	B
B	B	A	A
A	A	B	B

That stripped out the columns of:

A
A
B

and

B
B
A

because they have matching characters in the same column position in the first two rows.

Changes

  • Checks each row for a matching character in the same column position all the way to the last row (not just the first 2 rows), by recursively evaluating all remaining rows with slice(1) on each iteration.
  • Checks that each column separator (aside from the first column at index 0) was preceded by a space.
  • Because each column separator is padded by a space, two column separators should be a minimum of two columns apart. Adding this criterion ensures that empty columns with no value and columns containing a single character can be parsed correctly.

Tests

  1. Single value columns
    A	B
    A	B
    
  2. Empty columns
    A	B	C
    A		C
    
    A		C
    A		C
    
  3. Columns with repeating values
    AAA	BBB
    BAB	ABA
    ABA	BAB
    

@ozh
Copy link
Owner

ozh commented Mar 14, 2022

Hi, thank for this PR. Could you provide an example of a bug that would be fixed with this code? I'm not sure I'm getting it :)

@nekno
Copy link
Contributor Author

nekno commented Mar 18, 2022

Hi @ozh — Your request to produce a test that demonstrated the fix challenged me to come up with a good, representative test, and I found that my code changes didn't address a lot of similar scenarios.

So I added some additional changes to more thoroughly evaluate the conditions for column separators, in order to handle the test cases I added in the updated PR description.

I'm hoping this is good-to-go now. 🤞

@ozh
Copy link
Owner

ozh commented Mar 19, 2022

Super clear explanation. Thanks ! :)

@ozh ozh merged commit d92c1f0 into ozh:gh-pages Mar 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants