Improve print formatting of strings containing newline characters. (NVIDIA#11108)

String data that contained `\r` or `\n` characters would print badly mangled.

Example column containing (here `\r` stands for a literal carriage-return character in the data):
```
.,e,Infinity,+Infinity,-Infinity,+nAn,-naN,Nan,5f,1.2f,\riNf,NULL
```

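Calling `cudf::test::print()` on this yields mostly garbage but, importantly, plausible-seeming and puzzling garbage. The `\r` moves the cursor back to the start of the line, so the trailing `iNf,NULL` overwrites the first eight characters (`.,e,Infi`) of what was already printed: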

```
iNf,NULLnity,+Infinity,-Infinity,+nAn,-naN,Nan,5f,1.2f,
```
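The overwrite can be reproduced off-device with a small simulation. This is an illustrative sketch only: `render_like_terminal` is a hypothetical helper (not part of cudf) that models a single terminal line on which `'\r'` returns the cursor to column 0, and it assumes the printed rows are joined with commas as in the output above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: approximates how a terminal displays ONE line of text.
// A '\r' moves the cursor back to column 0, and every later character
// overwrites whatever is already on screen at that position.
// Newlines ('\n') are not modeled here.
std::string render_like_terminal(std::string const& text)
{
  std::string screen;
  std::size_t cursor = 0;
  for (char c : text) {
    if (c == '\r') {
      cursor = 0;  // carriage return: jump back to the start of the line
      continue;
    }
    if (cursor < screen.size()) {
      screen[cursor] = c;  // overwrite a character that is already displayed
    } else {
      screen.push_back(c);  // write past the current end of the line
    }
    ++cursor;
  }
  return screen;
}
```

Passing it the comma-joined example rows (with the real carriage return before `iNf`) returns exactly the garbled line shown above.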

The fix is a little postprocessing that replaces the individual characters `'\r'` and `'\n'` with the escaped strings `"\r"` and `"\n"` (a backslash followed by `r` or `n`). Note that this only applies to the output ultimately sent to the print, not to the raw data retrieved from the device with `cudf::test::to_host`.
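A minimal standalone sketch of the same escaping, assuming plain `std::string` input rather than a cudf column. `escape_newlines` is a hypothetical name; unlike the commit, which calls `find`/`replace` in a loop, it builds the result in a single pass, which avoids shifting the tail of the string on every replacement.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns a copy of `in` in which every carriage return
// becomes the two visible characters backslash + 'r', and every line feed
// becomes backslash + 'n'. All other characters are copied unchanged.
std::string escape_newlines(std::string_view in)
{
  std::string out;
  out.reserve(in.size());  // lower bound; escapes only make the result longer
  for (char c : in) {
    switch (c) {
      case '\r': out += "\\r"; break;
      case '\n': out += "\\n"; break;
      default: out += c; break;
    }
  }
  return out;
}
```

For example, the row containing a real carriage return before `iNf` prints as the visible text `\riNf`. Like the commit's version, this does not escape backslashes already present in the data, so a string that literally contains `\` followed by `r` prints the same as one containing a carriage return.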

Authors:
  - https://github.com/nvdbaranec

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Bradley Dice (https://github.com/bdice)
  - Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai/cudf#11108
nvdbaranec authored Jun 21, 2022
1 parent 40ec190 commit 36aa957
Showing 1 changed file with 15 additions and 2 deletions: `cpp/tests/utilities/column_utilities.cu`
```diff
@@ -1054,13 +1054,26 @@ struct column_view_printer {
     if (col.is_empty()) return;
     auto h_data = cudf::test::to_host<std::string>(col);
 
+    // explicitly replace '\r' and '\n' characters with "\r" and "\n" strings respectively.
+    auto cleaned = [](std::string_view in) {
+      std::string out(in);
+      auto replace_char = [](std::string& out, char c, std::string_view repl) {
+        for (std::string::size_type pos{}; out.npos != (pos = out.find(c, pos)); pos++) {
+          out.replace(pos, 1, repl);
+        }
+      };
+      replace_char(out, '\r', "\\r");
+      replace_char(out, '\n', "\\n");
+      return out;
+    };
+
     out.resize(col.size());
     std::transform(thrust::make_counting_iterator(size_type{0}),
                    thrust::make_counting_iterator(col.size()),
                    out.begin(),
-                   [&h_data](auto idx) {
+                   [&](auto idx) {
                      return h_data.second.empty() || bit_is_set(h_data.second.data(), idx)
-                              ? h_data.first[idx]
+                              ? cleaned(h_data.first[idx])
                               : std::string("NULL");
                    });
   }
```
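The per-row logic inside the `std::transform` call (print `NULL` for a null row, otherwise the cleaned string) can be sketched on the host with plain containers. This assumes a validity bitmask stored as 32-bit words, least-significant bit first; `row_is_valid` and `format_rows` are hypothetical stand-ins for `bit_is_set` and the transform, not cudf APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: reads bit `row` of a validity bitmask made of 32-bit
// words, least-significant bit first (assumed layout). 1 means "not null".
inline bool row_is_valid(std::vector<std::uint32_t> const& mask, std::size_t row)
{
  return ((mask[row / 32] >> (row % 32)) & 1u) != 0;
}

// Hypothetical sketch of the transform: an empty mask means the column has no
// nulls; otherwise invalid rows print as "NULL" and valid rows are passed
// through `clean`, a callable taking a std::string and returning std::string.
template <typename Clean>
std::vector<std::string> format_rows(std::vector<std::string> const& data,
                                     std::vector<std::uint32_t> const& mask,
                                     Clean clean)
{
  std::vector<std::string> out;
  out.reserve(data.size());
  for (std::size_t i = 0; i < data.size(); ++i) {
    out.push_back(mask.empty() || row_is_valid(mask, i) ? clean(data[i])
                                                         : std::string("NULL"));
  }
  return out;
}
```

Keeping the cleaning step as a parameter mirrors the diff, where only the non-null branch changes from `h_data.first[idx]` to `cleaned(h_data.first[idx])`.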
