
[SPARK-29378][R] Upgrade SparkR to use Arrow 0.15 API #26555

Closed · wants to merge 6 commits into master from dongjoon-hyun/SPARK-R-TEST

Conversation

@dongjoon-hyun (Member) commented Nov 16, 2019

What changes were proposed in this pull request?

[SPARK-29376] Upgrade Apache Arrow to version 0.15.1 upgraded Scala/Java/Python to Arrow 0.15. This PR aims to upgrade SparkR to use the Arrow 0.15 API. Currently, it's broken.

Why are the changes needed?

First of all, it turns out that our Jenkins jobs (including the PR builder) skip the Arrow tests. Arrow 0.15 has a breaking R API change (ARROW-5505) and we missed it. AppVeyor was the only CI running the SparkR Arrow tests, but it's broken now.

Jenkins

Skipped ------------------------------------------------------------------------
1. createDataFrame/collect Arrow optimization (@test_sparkSQL_arrow.R#25)
- arrow not installed

Second, Arrow throws an OOM in the AppVeyor environment (Windows, JDK8) like the following, because that environment still has Arrow 0.14.

Warnings -----------------------------------------------------------------------
1. createDataFrame/collect Arrow optimization (@test_sparkSQL_arrow.R#39) - createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.sparkr.enabled' is set to true; however, failed, attempting non-optimization. Reason: Error in handleErrors(returnStatus, conn): java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:669)
	at org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$3.readNextBatch(ArrowConverters.scala:243)

It is due to the version mismatch: the JVM side is now on Arrow 0.15, which changed the IPC stream framing (ARROW-6313: a continuation token plus 8-byte alignment), while AppVeyor's R side still writes the old Arrow 0.14 framing. The 0.15 reader below misinterprets the length prefix on the old format and attempts to allocate a huge buffer, hence the OOM.

int messageLength = MessageSerializer.bytesToInt(buffer.array());
if (messageLength == IPC_CONTINUATION_TOKEN) {
  buffer.clear();
  // ARROW-6313, if the first 4 bytes are continuation message, read the next 4 for the length
  if (in.readFully(buffer) == 4) {
    messageLength = MessageSerializer.bytesToInt(buffer.array());
  }
}

// Length of 0 indicates end of stream
if (messageLength != 0) {
  // Read the message into the buffer.
  ByteBuffer messageBuffer = ByteBuffer.allocate(messageLength);

After upgrading to 0.15, we hit ARROW-5505. This PR upgrades the Arrow version in AppVeyor and fixes the issue.
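For context, a minimal sketch of the code path these tests exercise, assuming a local SparkR session and the R arrow package installed (the config key is the one quoted in the warnings above):

```r
library(SparkR)

# Enable the SparkR Arrow optimization (the key quoted in the warnings above).
sparkR.session(master = "local[1]",
               sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))

# createDataFrame()/collect() take the Arrow path when the R 'arrow' package
# is available; otherwise SparkR warns and falls back to the non-optimized path.
df <- createDataFrame(mtcars)
rdf <- collect(df)

sparkR.session.stop()
```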

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the AppVeyor tests.

This PR passed here: https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/28909044

SparkSQL Arrow optimization: Spark package found in SPARK_HOME: C:\projects\spark\bin\..
................

@dongjoon-hyun dongjoon-hyun changed the title AppVeyor test AppVeyor Arrow test Nov 16, 2019
@dongjoon-hyun dongjoon-hyun changed the title AppVeyor Arrow test [WIP] AppVeyor Arrow test Nov 16, 2019
@dongjoon-hyun dongjoon-hyun changed the title [WIP] AppVeyor Arrow test [WIP] AppVeyor Arrow test (JDK8) Nov 16, 2019
@dongjoon-hyun (Member, Author) commented Nov 16, 2019:

AppVeyor is broken as we see in this PR.


@srowen (Member) commented Nov 16, 2019:

It seems like a good idea to match these, yeah. Different error now?

6. Error: gapply() Arrow optimization - type specification (date and timestamp) (@test_sparkSQL_arrow.R#285) 
invalid 'n' argument
1: tryCatch({
       ret <- gapply(df, "a", function(key, grouped) {
           grouped
       }, schema(df))
       expect_equal(collect(ret), rdf)
   }, finally = {
       callJMethod(conf, "set", "spark.sql.execution.arrow.sparkr.enabled", arrowEnabled)
   }) at C:\projects\spark\bin\../R/pkg/tests/fulltests/test_sparkSQL_arrow.R:285
2: tryCatchList(expr, classes, parentenv, handlers)
3: expect_equal(collect(ret), rdf) at C:\projects\spark\bin\../R/pkg/tests/fulltests/test_sparkSQL_arrow.R:289
4: compare(object, expected, ...)
5: collect(ret)
6: collect(ret)
7: .local(x, ...)
8: tryCatch({
       doServerAuth(conn, authSecret)
       arrowTable <- read_arrow(readRaw(conn))
       if (useAsTibble) {
           as_tibble <- get("as_tibble", envir = asNamespace("arrow"))
           as.data.frame(as_tibble(arrowTable), stringsAsFactors = stringsAsFactors)
       }
       else {
           as.data.frame(arrowTable, stringsAsFactors = stringsAsFactors)
       }
   }, finally = {
       close(conn)
   })
9: tryCatchList(expr, classes, parentenv, handlers)
10: read_arrow(readRaw(conn))
11: as.data.frame(read_table(stream))
12: read_table(stream)
13: readRaw(conn)
14: readBin(con, raw(), as.integer(dataLen), endian = "big")

Still seems like some protocol/version mismatch. I don't know much about this, but it looks like readRaw gets back info from conn that causes it to try to read an invalid amount, like -1. I think that kind of -1 return has something to do with being out of memory, so maybe we're back where we started.
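A minimal R sketch of that failure mode, using hypothetical bytes (readRaw in SparkR boils down to the readBin call at the bottom of the traceback):

```r
# Hypothetical bytes: a length prefix that decodes to -1 as a signed integer.
con <- rawConnection(as.raw(c(0xFF, 0xFF, 0xFF, 0xFF)))
dataLen <- readBin(con, integer(), n = 1, endian = "big")  # reads -1
# The follow-up read rejects the negative count before reading anything:
readBin(con, raw(), as.integer(dataLen), endian = "big")   # Error: invalid 'n' argument
close(con)
```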

Does the machine need more memory or is that irrelevant?

@dongjoon-hyun (Member, Author) replied:

Correct. The previous one was due to the version mismatch; messageLength seems to be read in the wrong way.

(See the MessageSerializer.readMessage snippet quoted in the PR description above.)

And, yes. Now, with this PR, it shows the previous blocker of SPARK-29378:

1. createDataFrame/collect Arrow optimization (@test_sparkSQL_arrow.R#39) - createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.sparkr.enabled' is set to true; however, failed, attempting non-optimization.
Reason: Error in arrow::FileOutputStream(fileName): attempt to apply non-function
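For reference, a minimal sketch of the ARROW-5505 change that triggers this error (file name hypothetical): Arrow's R classes became R6 objects instantiated via $create() instead of plain function calls.

```r
library(arrow)

# Arrow R < 0.15: classes were callable like plain functions.
# sink <- arrow::FileOutputStream(tempfile())   # on 0.15 this fails with
#                                               # "attempt to apply non-function"

# Arrow R >= 0.15 (ARROW-5505): use the R6 $create() factory instead.
sink <- arrow::FileOutputStream$create(tempfile())
sink$close()
```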

@dongjoon-hyun dongjoon-hyun changed the title [WIP] AppVeyor Arrow test (JDK8) [SPARK-29378][R] Upgrade SparkR to use Arrow 0.15 API Nov 16, 2019
  schema <- batch$schema
- stream_writer <- arrow::RecordBatchStreamWriter(stream, schema)
+ stream_writer <- arrow::RecordBatchStreamWriter$create(stream, schema)
@dongjoon-hyun (Member, Author) commented Nov 16, 2019:

ARROW-5505 changes the Arrow R API. cc @srowen, @HyukjinKwon, @felixcheung

@dongjoon-hyun (Member, Author):

@shaneknapp, it seems that our Jenkins doesn't have the R Arrow package installed and skips these tests.

@dongjoon-hyun (Member, Author):

This is not a Windows/JDK8 issue. Arrow 0.15 has a breaking change in its R interface.

@shaneknapp (Contributor) replied:

> @shaneknapp, it seems that our Jenkins doesn't have the R Arrow package installed and skips these tests.

This is correct. Should I infer that we need to add R Arrow to our worker nodes?

Member:

Actually, I configured AppVeyor to run the Arrow-related R tests for now, so at least they are being tested in AppVeyor.

Yes, it would be nice to have the Arrow R package installed on the Jenkins workers to test it; however, I remember it's a bit tricky to install for now (you cannot just install it via CRAN; it needs some manual preparation - see https://github.com/apache/arrow/tree/master/r#installation).

I am worried it might mess up the current Jenkins workers' environment. If you see any concern, @shaneknapp, you can just skip installing it for now. It's being tested via AppVeyor at least, so it's neither a must nor urgent.

@shaneknapp (Contributor) replied:

Actually, R arrow installed easily and without issue (surprising, I know!). :)

sknapp@ubuntu-testing:~$ Rscript -e "packageVersion('arrow')"
[1] ‘0.15.1.1’

I will install this on the remaining workers later today/tomorrow.
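For reference, a sketch of the equivalent R calls (mirroring the CRAN install line used in appveyor.yml below):

```r
# Assuming CRAN's arrow package now installs cleanly, as observed above:
install.packages("arrow", repos = "https://cloud.r-project.org/")
packageVersion("arrow")  # e.g. '0.15.1.1' on the Jenkins worker
```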

@dongjoon-hyun (Member, Author):

Thank you, @shaneknapp !

Member:

Thank you!

dongjoon-hyun added a commit that referenced this pull request Nov 16, 2019
### What changes were proposed in this pull request?

This PR aims to ignore `GitHub Action` and `AppVeyor` file changes. When we touch these files, the Jenkins job should not trigger full testing.

### Why are the changes needed?

Currently, these files are categorized as `root`, so they trigger full testing and end up wasting Jenkins resources.
- #26555
```
[info] Using build tool sbt with Hadoop profile hadoop2.7 under environment amplab_jenkins
From https://github.com/apache/spark
 * [new branch]      master     -> master
[info] Found the following changed modules: sparkr, root
[info] Setup the following environment variables for tests:
```

### Does this PR introduce any user-facing change?

No. (Jenkins testing only).

### How was this patch tested?

Manually.
```
$ dev/run-tests.py -h -v
...
Trying:
    [x.name for x in determine_modules_for_files([".github/workflows/master.yml", "appveyor.xml"])]
Expecting:
    []
...
```

Closes #26556 from dongjoon-hyun/SPARK-IGNORE-APPVEYOR.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

@BryanCutler (Member) left a comment:

Thanks for fixing this @dongjoon-hyun !

appveyor.yml Outdated
@@ -43,9 +43,8 @@ install:
- ps: .\dev\appveyor-install-dependencies.ps1
# Required package for R unit tests
- cmd: R -e "install.packages(c('knitr', 'rmarkdown', 'e1071', 'survival'), repos='https://cloud.r-project.org/')"
Member:

Would it be better to restore this line from #26041 to install arrow?

cmd: R -e "install.packages(c('knitr', 'rmarkdown', 'e1071', 'survival', 'arrow'), repos='https://cloud.r-project.org/')"

@dongjoon-hyun (Member, Author) replied:

Thanks. Yes. That will be better.

@dongjoon-hyun (Member, Author) commented:

I'm still working on the other errors.


@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29378][R] Upgrade SparkR to use Arrow 0.15 API [WIP][SPARK-29378][R] Upgrade SparkR to use Arrow 0.15 API Nov 16, 2019
@SparkQA commented Nov 16, 2019:

Test build #113945 has finished for PR 26555 at commit 96f20f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [WIP][SPARK-29378][R] Upgrade SparkR to use Arrow 0.15 API [SPARK-29378][R] Upgrade SparkR to use Arrow 0.15 API Nov 16, 2019
@dongjoon-hyun (Member, Author) commented:

Now this PR passes all R tests (Jenkins + AppVeyor, including Arrow 0.15).
Could you review this PR again, @srowen, @BryanCutler, @HyukjinKwon, @felixcheung, @gatorsmile?

@viirya (Member) left a comment:

I think this means the minimum supported version is 0.15? Then we should also update documentation like sparkr.md.

@dongjoon-hyun (Member, Author) replied:

There is already a documentation JIRA for that, @viirya.

@viirya (Member) commented Nov 17, 2019:

@dongjoon-hyun Thanks! I see. Then let's update the R document in that.

@viirya (Member) commented Nov 17, 2019:

From the Arrow PR apache/arrow#5279, this change looks good.

@dongjoon-hyun (Member, Author) left a comment:

Thank you all for the review. I'll merge this PR to recover the AppVeyor build and tests.
Since AppVeyor covers only the Windows environment, we need to enable these Arrow tests in Jenkins as well to protect our branches.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-R-TEST branch November 17, 2019 02:29
@HyukjinKwon (Member) left a comment:

Haven't taken a close look, but looks good. Thanks @dongjoon-hyun.

@dongjoon-hyun (Member, Author) replied:

Thanks, @HyukjinKwon !

@@ -42,10 +42,8 @@ install:
   # Install maven and dependencies
   - ps: .\dev\appveyor-install-dependencies.ps1
   # Required package for R unit tests
-  - cmd: R -e "install.packages(c('knitr', 'rmarkdown', 'e1071', 'survival'), repos='https://cloud.r-project.org/')"
-  # Use Arrow R 0.14.1 for now. 0.15.0 seems not working for now. See SPARK-29378.
+  - cmd: R -e "install.packages(c('knitr', 'rmarkdown', 'e1071', 'survival', 'arrow'), repos='https://cloud.r-project.org/')"
   - cmd: R -e "install.packages(c('assertthat', 'bit64', 'fs', 'purrr', 'R6', 'tidyselect'), repos='https://cloud.r-project.org/')"
@HyukjinKwon (Member) commented:

This line was to install Arrow's dependencies manually (because devtools started requiring testthat 2.0.0+, while SparkR requires 1.0.2). So I think we can remove this line too. Let me make a quick followup.
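(For illustration only, a hedged sketch of how one could pin testthat to the version SparkR expects, using the remotes package; this is not what the AppVeyor script actually does.)

```r
# Hypothetical: install the exact testthat version SparkR's tests expect,
# instead of the >= 2.0.0 that devtools would pull in.
remotes::install_version("testthat", version = "1.0.2",
                         repos = "https://cloud.r-project.org/")
```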

@dongjoon-hyun (Member, Author) replied:

Thanks. Go ahead!

@HyukjinKwon (Member):

Made a followup - #26566. For clarification, it actually makes no behavioral difference, just less code.

HyukjinKwon added a commit that referenced this pull request Nov 18, 2019
Remove manual installation of Arrow dependencies in AppVeyor build

### What changes were proposed in this pull request?

This PR removes the manual installation of Arrow dependencies in the AppVeyor build.

### Why are the changes needed?

It's unnecessary. See #26555 (comment)

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

AppVeyor will test.

Closes #26566 from HyukjinKwon/SPARK-29378.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
dongjoon-hyun added a commit that referenced this pull request Nov 18, 2019: the same #26556 (SPARK-IGNORE-APPVEYOR) change as above, cherry-picked from commit d0470d6.
rshkv pushed commits to palantir/spark that referenced this pull request on Jun 24 and Jul 15, 2020, cherry-picking this change (Closes apache#26555 from dongjoon-hyun/SPARK-R-TEST).