Basic text record format returns 0 records #444

yruslan · 2021-11-30T07:53:10Z

Describe the bug

The basic text record format ('D2'), which uses underlying Spark RDDs to split text files efficiently, returns 0 records when reading an ASCII file.

To Reproduce

      val df = spark
        .read
        .format("cobol")
        .option("copybook_contents", copybook)
        .option("record_format", "D2")
        .load(path)

      df.count

Returns 0.

      val df = spark
        .read
        .format("cobol")
        .option("copybook_contents", copybook)
        .option("record_format", "D")
        .load(path)

      df.count

Returns a non-zero value.

Expected behaviour

The behavior should be the same for 'D' and 'D2' formats if the input file is in basic ASCII format (e.g. 7-bit English text).
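
A minimal sketch of how the expected behaviour could be checked, assuming the same `spark`, `copybook`, and `path` values as in the reproduction. The object and method names (`RecordFormatCheck`, `countRecords`, `formatsAgree`) are hypothetical helpers introduced for illustration and are not part of the library. Only the reader options already shown in this issue are used.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not part of the library.
object RecordFormatCheck {
  // Counts the records read from `path` with the given record_format,
  // using the same reader options as the reproduction in this issue.
  def countRecords(spark: SparkSession, copybook: String, path: String, recordFormat: String): Long =
    spark.read
      .format("cobol")
      .option("copybook_contents", copybook)
      .option("record_format", recordFormat)
      .load(path)
      .count()

  // Returns true when every listed record format yields the same count.
  // The counting function is passed in, so the comparison itself
  // can be exercised without a Spark session.
  def formatsAgree(count: String => Long, formats: Seq[String] = Seq("D", "D2")): Boolean =
    formats.map(count).distinct.size == 1
}
```

With a 7-bit ASCII input file, `RecordFormatCheck.formatsAgree(fmt => RecordFormatCheck.countRecords(spark, copybook, path, fmt))` would be expected to return `true`. With the behaviour described in this issue it returns `false`, because 'D2' yields 0 records while 'D' yields a non-zero count.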


yruslan added the bug label Nov 30, 2021

yruslan self-assigned this Nov 30, 2021

yruslan added a commit that referenced this issue Dec 1, 2021

#444 Fix reading ASCII data using D2 record format

0f6aad9

yruslan added a commit that referenced this issue Dec 2, 2021

#444 Fix reading ASCII data using D2 record format

547d951

yruslan closed this as completed Dec 2, 2021
