Skip to content
This repository has been archived by the owner on Jun 28, 2021. It is now read-only.

Problem with unvisible "\r" symbols #58

Closed
kussberg opened this issue Aug 20, 2015 · 20 comments
Closed

Problem with unvisible "\r" symbols #58

kussberg opened this issue Aug 20, 2015 · 20 comments

Comments

@kussberg
Copy link

I am parsing a simple multiline CSV file, that i exported from Google Spreadsheet and i get following error:

events.js:85
      throw er; // Unhandled 'error' event
            ^
Error: Invalid closing quote at line 96; found "\r" instead of delimiter ","
    at Error (native)
    at Parser.__write (/Users/edgarkuskov/Documents/FoodoWorkspace/foodo-api-server/node_modules/csv-parse/lib/index.js:303:151)
    at Parser._transform (/Users/edgarkuskov/Documents/FoodoWorkspace/foodo-api-server/node_modules/csv-parse/lib/index.js:157:10)
    at Transform._read (_stream_transform.js:179:10)
    at Transform._write (_stream_transform.js:167:12)
    at doWrite (_stream_writable.js:301:12)
    at writeOrBuffer (_stream_writable.js:288:5)
    at Writable.write (_stream_writable.js:217:11)
    at ReadStream.ondata (_stream_readable.js:540:20)
    at ReadStream.emit (events.js:107:17)

Does anyone have idea how to fix it?

@wdavidw
Copy link
Member

wdavidw commented Aug 24, 2015

"\r" shall be treated like any other character. I expect your problem to be related to the way quotes are written or escaped. try to isolate the CSV line responsible to your problem and send it to us with the options you're using.

@kussberg
Copy link
Author

I fixed it by making a transformer and replace \n\r to \n.

function fixLineEnding() {
  var Transform = require('stream').Transform;
  var fixEndpoints = new Transform();
  fixEndpoints._transform = function (data, encoding, done) {
    this.push(data.toString().replace(/\r\n/g, '\n'));
    done();
  };
  return fixEndpoints;
}

@wdavidw
Copy link
Member

wdavidw commented Aug 25, 2015

ok, now i understand your problem. did you try setting the option "rowDelimiter" to "\r\n"?

@sidewaiise
Copy link

I just tried this. ( ^ ) Does not fix the problem. I'm not getting many errors, so perhaps this is unique to the start of the file or something.

@wdavidw
Copy link
Member

wdavidw commented Feb 8, 2016

if i could get a test or a sample script to replicate the issue, i'll be happy to fix it.

@jdavisclark
Copy link

@wdavidw @ekussberg: in addition, it would be nice to have some kind of support for dealing with mixed line endings. I know it shouldn't have to be the responsibility of the csv parser, but its a little frustrating when a 3rd party data source randomly starts intermixing line endings in a file and parsing blows up.

Pretty much any utility I've written lately has to support a --strip-random-carriage-returns flag, that conditionally puts a transform stream in front of the csv parser.

@wdavidw
Copy link
Member

wdavidw commented Feb 12, 2016

i am not against the idea, csv input is such a mess sometimes and we often dont control the source. if you wish to work on it, i'll review the patch.

@danopia
Copy link

danopia commented Jun 7, 2016

Hey, I just hit this same situation with Google Sheets.

The header row of my document contains multiline cells. Multiline cells appear to use Linux newlines (\n), while CSV exports use Windows newlines (\r\n) to terminate rows.

So the first newline csv-parse sees is a Linux newline and it assumes that for the rest of the file. I was actually getting \r back in my data. This didn't cause too many problems, at least until the last cell of a row was quoted. The \rs were being treated as normal charactors, and if a cell is quoted then all data of that cell must be inside the quotes.

This seems like a Google Drive bug and I will follow up with that. But in the meantime this library should be able to handle this more easily. Perhaps knowing that \r shouldn't appear in data, or can exist at ends of rows.

Here is a minimal repro:
https://docs.google.com/spreadsheets/d/1vHK01H14ZsztlqFtX4yuUmEaDf0w7pPIGDG_XdSYsYE/edit?usp=sharing (export as CSV)
screen shot 2016-06-06 at 6 49 20 pm

Error: Invalid closing quote at line 6; found "\r" instead of delimiter ","

@wdavidw
Copy link
Member

wdavidw commented Jun 8, 2016

i dont see your problem

parse '"qwer\nasdf",dfh\r\n1,2\r\n4,6\r\n7,3\r\n1,"8,120"\r\n3,2', rowDelimiter: '\r\n', (err, data) ->
   console.log err, data

print

null [ [ 'qwer\nasdf', 'dfh' ],
  [ '1', '2' ],
  [ '4', '6' ],
  [ '7', '3' ],
  [ '1', '8,120' ],
  [ '3', '2' ] ]

@danopia
Copy link

danopia commented Jun 8, 2016

You specified an explicit row delimiter. The super cryptic error message
that spawned this ticket is triggered when auto detection messes up.

In my case I accept dynamic CSV URLs so I can't easily assume all files
will have windows newlines.

If the auto-row-delimiter logic actually looked for the first record-terminator instead of just the first newline, it would work perfectly with this file.

On Wed, Jun 8, 2016, 07:24 Worms David notifications@github.com wrote:

i dont see your problem

parse '"qwer\nasdf",dfh\r\n1,2\r\n4,6\r\n7,3\r\n1,"8,120"\r\n3,2', rowDelimiter: '\r\n', (err, data) ->
console.log err, data

print

null [ [ 'qwer\nasdf', 'dfh' ],
[ '1', '2' ],
[ '4', '6' ],
[ '7', '3' ],
[ '1', '8,120' ],
[ '3', '2' ] ]


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#58 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AACetCbCgO5D1tuoBX3AWMXTdFHeX0asks5qJtCGgaJpZM4FvJSq
.

@dtaylor8726
Copy link

I'm also having this same problem with my CSV file input - I'm really surprised the newline parameter can't accept an argument for "\r\n".

@wdavidw
Copy link
Member

wdavidw commented Jan 15, 2017

The same example with automatic row delimiter detection works as well:

parse '"qwer\nasdf",dfh\r\n1,2\r\n4,6\r\n7,3\r\n1,"8,120"\r\n3,2', (err, data) ->
   console.log err, data

Could you provide me with an example as simple as the one above reproducing the error ?

@danopia, what do you mean by "first record terminator" in "If the auto-row-delimiter logic actually looked for the first record-terminator instead of just the first newline, it would work perfectly with this file." ?

@jd327
Copy link

jd327 commented Jan 24, 2017

Stumbled here after getting [Error: Invalid closing quote at line 2; found "\r" instead of delimiter ","]

I'm converting a Google Sheets doc to CSV (I'm guessing same as @danopia?), and this is what they spit out:

address,name,"
"<CRLF>
sf,test,

For a simple doc that has 2 columns & 2 rows:
simple

Apparently, it thinks there's a 3rd column that's empty, but more importantly the \r is effed. Any suggestions on how to handle this automatically? (cause I have other non-google-spreadsheets docs)

@wdavidw
Copy link
Member

wdavidw commented Jan 26, 2017

What do you mean by "effed". is there an additionnal linefeed after the <CRLF> like in your exemple above or does the string once converted to javascript look like address,name,"\n"\r\nsf,test,. The following is working:

parse 'address,name,"\n"\r\nsf,test,', rowDelimiter: '\r\n', (err, data) ->
  console.log err, data

@wdavidw
Copy link
Member

wdavidw commented Jan 26, 2017

@ivanakimov Looking at your sample, I discovered a bug. Not completely sure if this is the problem we're experiencing in this issue. Please test the latest release and re-open an issue if this isnt the case.

@jd327
Copy link

jd327 commented Jan 26, 2017

@wdavidw whatever you did, that fixed it for me in v1.1.1 -- big thank you.

The reason I didn't want to experiment with rowDelimeter was because it's users' uploaded documents, and I was hoping to auto-detect that. But whatever you patched up, fixed it right up! And "eff'd up" was just me being not nice -- I meant "messed up" ;)

@jonahbron
Copy link

I've tried using version 1.1.1, AND 1.1.11, but the problem still persists for me.

var parse = require('csv-parse');

parse('"foo\nbar"\r\nbaz', {auto_parse: true}, function (error, rows) {
    console.log(error, rows);
});

This script results in the same error:

Error: Invalid closing quote at line 2; found "\r" instead of delimiter ","

@jonahbron
Copy link

Came up with a workaround that strips out carriage returns. Works well in my use-case.

var util = require('util');
var stream = require('stream');

function UniformLineEndings() {
    stream.Transform.call(this);
}
UniformLineEndings.CR_PATTERN = /\r/g;
util.inherits(UniformLineEndings, stream.Transform);
UniformLineEndings.prototype._transform = function (chunk, enc, cb) {
    this.push(chunk.toString().replace(UniformLineEndings.CR_PATTERN, ''));
    cb();
};

// usage
fs.createReadStream(file).pipe(new UniformLineEndings()).pipe(parser);

@wdavidw
Copy link
Member

wdavidw commented Apr 6, 2017

@jonahbron, error is expected in your case, having different row dlimiters cant be suppoorted

@wdavidw wdavidw closed this as completed Apr 6, 2017
@pietersv
Copy link

pietersv commented Jan 4, 2021

The stream transformer by @jonahbron is great. Minor note from applications I find that this.push(chunk.toString().replace(/\r\n|\n|\r/g, '\n'));` works to cover a range of cases such as:

  • Linux CSV with \n (new line without carriage return)
  • MSDos CSV with \r\n (new line with carriage return)
  • Mac Excel Comma Separated Value (.csv) with \r (carriage return without new line)

Update: The Node stream may break the CSV string between a \r and \n so a simple global substitution would occasionally replace \r\n with two line breaks. The following tries to handle that by remembering the final character of the prior chunk:

UniformLineEndings.prototype._transform = function (chunk, enc, cb) {
  var str = chunk.toString()
  if (this.__prior_ending_char == '\r' && str[0] == '\n') str = str.substring(1)
  this.__prior_ending_char = str[str.length-1]
  str = str.replace(/\r\n|\n|\r/g, '\n')
  this.push(str);
  cb();
};

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants