Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(csv): more examples for parse and CsvParseStream #5605

Merged
merged 16 commits into from
Aug 2, 2024

Conversation

magurotuna
Copy link
Member

@magurotuna magurotuna commented Aug 1, 2024

This commit adds more examples to parse function and CsvParseStream class to cover all the provided options.

Also fixes a few other things:

  • Replace stale description ParseError with the correct SyntaxError.
  • Fix the default value of comment property. The old comment says the default value is #, but this was wrong.
  • Get negative value in fieldsPerRecord option working in parse as documented (closes csv: fieldsPerRecord: -1 doesn't seem working #5616)

@magurotuna magurotuna requested a review from kt3k as a code owner August 1, 2024 15:30
@github-actions github-actions bot added the csv label Aug 1, 2024
Copy link

codecov bot commented Aug 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.37%. Comparing base (cd0bc9f) to head (01e3d68).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5605   +/-   ##
=======================================
  Coverage   96.37%   96.37%           
=======================================
  Files         466      466           
  Lines       37572    37574    +2     
  Branches     5539     5539           
=======================================
+ Hits        36211    36213    +2     
  Misses       1319     1319           
  Partials       42       42           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

* const result = parse(string, { skipFirstRow: true });
*
* assertEquals(result, [{ a: "d", b: "e", c: "f" }]);
* assertType<IsExact<typeof result, Record<string, string | undefined>[]>>(true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added assertType here because I think this is a very good document for users given that the return type varies depending on the option we pass in.

While I was writing these assertions, though, I was unsure in what cases the value in the return value becomes undefined. Does anyone know why we have Record<string, string | undefined> instead of Record<string, string> here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can get an undefined value if you had the following input a,b,c\nd,,f. Perhaps, we should add another test for that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the following three test cases and these pass. I couldn't find a case where the parsed value becomes undefined

    await t.step({
      name: "BlankField2",
      fn() {
        const input = "a,b,c\nd,,f";
        assertEquals(
          parse(input, { skipFirstRow: true }),
          [{ a: "d", b: "", c: "f" }],
        );
      },
    });

    await t.step({
      name: "Exessive fields with skipFirstRow: true",
      fn() {
        const input = "a,b,c\nd,,f,g";
        assertThrows(
          () => parse(input, { skipFirstRow: true }),
          Error,
          "Error number of fields line: 1\nNumber of fields found: 3\nExpected number of fields: 4",
        );
      },
    });

    await t.step({
      name: "Insufficient fields with skipFirstRow: true",
      fn() {
        const input = "a,b,c\nd,e";
        assertThrows(
          () => parse(input, { skipFirstRow: true }),
          Error,
          "Error number of fields line: 1\nNumber of fields found: 3\nExpected number of fields: 2",
        );
      },
    });

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added (or fixed the existing) tests on what happens if the number of fields in records doesn't match that of the header. This fix contains the change in how we display the line number is error messages (which was 0-based but now changed to 1-based).

a91806b

Copy link
Collaborator

@iuioiua iuioiua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work

* const result = parse(string, { skipFirstRow: true });
*
* assertEquals(result, [{ a: "d", b: "e", c: "f" }]);
* assertType<IsExact<typeof result, Record<string, string | undefined>[]>>(true);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can get an undefined value if you had the following input a,b,c\nd,,f. Perhaps, we should add another test for that case.

* assertEquals(result, [["a", "b", "c"], ["d", "e", "f"]]);
* ```
*
* @example Trim leading space with `trimLeadingSpace: true`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason there's an option to trim only leading spaces? Why not just trim the end too using String.prototype.trim()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either, but according to the head comment of parse.ts, many parts of the implementation was ported from Go which has the same option.
https://github.com/golang/go/blob/go1.12.5/src/encoding/csv/reader.go#L136

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a reason to only trim the start... I suggest changing to trim: boolean. Not a strong opinion. What do we think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked ChatGPT, it answered some application may add leading whitespaces when exporting to CSV. I don't know if this is true at all 😄 Anyway trim: boolean should be useful in some scenarios and I think we could add that after 1.0 without breaking things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think trailing spaces are usually considered as a part of value of the last column (For example, excel handles it in that way)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find the case when this returns Record<string, string | undefined>, but I also noticed that fieldsPerRecord option doesn't seem working as described.

It says:

If negative, no check is made and records may have a variable number of fields.

But parse doesn't seem allowing wrong number of fields even when I passed negative number to this option. Maybe this current state is related to it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But parse doesn't seem allowing wrong number of fields even when I passed negative number to this option. Maybe this current state is related to it?

Oh that's a good find. I'll add a test case and fix it

Copy link
Member

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@magurotuna magurotuna merged commit b0f2088 into denoland:main Aug 2, 2024
13 checks passed
@magurotuna magurotuna deleted the magurotuna/csv branch August 2, 2024 04:24
magurotuna added a commit that referenced this pull request Aug 2, 2024
…#5617)

As we discussed in
#5605 (comment), it seems like
we never get `undefined` as a parse result of fields. If there is a mismatch in
the number of fields across rows, the parse just throws an error. To better
reflect this in typing, this commit removes `undefined` from the record value
type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

csv: fieldsPerRecord: -1 doesn't seem working
3 participants