Merge with master, include changes to method args
commit 68d6de4
Author: cafreeman <cfreeman@alteryx.com>
Date:   Thu Feb 26 09:08:52 2015 -0600

    Fix typos

commit 3294949
Author: Chris Freeman <cfreeman@alteryx.com>
Date:   Wed Feb 25 18:19:39 2015 -0600

    Restore `rdd` argument to `getJRDD`

commit c652b4c
Author: cafreeman <cfreeman@alteryx.com>
Date:   Wed Feb 25 16:22:36 2015 -0600

    Update method signatures to use generic arg

    Replace the `rdd` argument in all of the S4 methods with `x`. This will allow us to standardize the code as other Spark components get added and we need to set up multiple dispatch on S4 methods.

    In any cases where `x` was used as a generic iterator, I've replaced it with `i` except in a few cases where a different letter made sense. For example, in some of the pair functions, we now use `function(k)` and `function(v)` for the key/value functions.
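
The naming convention described above can be sketched in plain R with the `methods` package. This is only an illustration, not code from the commit: the classes `ToyRDD` and `ToyFrame` and the generics `countItems` and `keysOf` are hypothetical names standing in for an RDD, some later Spark component, and the package's real S4 methods.

```r
library(methods)

# Hypothetical classes: one stands in for an RDD, the other for a component
# that might be added later and share the same generics.
setClass("ToyRDD", representation(items = "list"))
setClass("ToyFrame", representation(rows = "list"))

# The generic names its argument `x` rather than `rdd`, so the same signature
# reads naturally for every class that receives a method.
setGeneric("countItems", function(x) standardGeneric("countItems"))

setMethod("countItems", signature(x = "ToyRDD"),
          function(x) length(x@items))

setMethod("countItems", signature(x = "ToyFrame"),
          function(x) length(x@rows))

# Inner helper functions iterate with `i`, so they do not shadow the outer `x`.
setGeneric("keysOf", function(x) standardGeneric("keysOf"))

setMethod("keysOf", signature(x = "ToyRDD"),
          function(x) lapply(x@items, function(i) i[[1]]))
```

With this layout, adding a method for a new class only requires another `setMethod` call; the generic itself does not change.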

commit c10148e
Merge: 08102b0 910e3be
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Fri Feb 20 17:55:47 2015 -0800

    Merge pull request apache#174 from shivaram/sparkr-runner

    [SPARKR-178] Integrate with SparkR with spark-submit

commit 910e3be
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Fri Feb 20 10:41:54 2015 -0800

    Add a timeout for initialization
    Also move sparkRBackend.stop into a finally block

commit bf52b17
Merge: 88bf97f 08102b0
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Fri Feb 20 10:36:35 2015 -0800

    Merge remote-tracking branch 'amplab-sparkr/master' into sparkr-runner

    Conflicts:
    	pkg/R/sparkR.R

commit 08102b0
Merge: 06bf250 179ab38
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Thu Feb 19 23:39:41 2015 -0800

    Merge pull request apache#176 from lythesia/master

    [SPARKR-193] Retry backend connection if it doesn't come up

commit 179ab38
Author: lythesia <iranaikimi@gmail.com>
Date:   Fri Feb 20 12:02:47 2015 +0800

    add try counts and increase time interval
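
A retry loop with a bounded number of attempts and a growing wait can be sketched as follows. The function name `retryConnect`, its parameters, and the doubling schedule are assumptions made for illustration; the commit may count attempts and space them differently.

```r
# Hypothetical helper: call `connect` until it succeeds or `maxTries` attempts
# have been made, waiting longer after each failure.
retryConnect <- function(connect, maxTries = 5, initialWait = 0.1) {
  wait <- initialWait
  for (attempt in seq_len(maxTries)) {
    # Capture an error as a value instead of letting it abort the loop.
    result <- tryCatch(connect(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < maxTries) {
      Sys.sleep(wait)
      wait <- wait * 2  # assumed doubling; not taken from the commit
    }
  }
  stop("backend did not come up after ", maxTries, " attempts")
}
```

In practice `connect` would be a closure that opens the socket to the backend; here it can be any function that either returns a value or signals an error.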

commit 06bf250
Merge: 17eda4c 06d99f0
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Thu Feb 19 16:50:50 2015 -0800

    Merge pull request apache#173 from shivaram/windows-space-fix

    [SPARKR-200][SPARKR-149] Fix path, classpath separator for Windows

commit 88bf97f
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Thu Feb 19 16:45:03 2015 -0800

    Create SparkContext for R shell launch

commit f9268d9
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Thu Feb 19 15:58:17 2015 -0800

    Fix code review comments

commit e6ad12d
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Thu Feb 19 12:35:45 2015 -0800

    Update comment describing sparkR-submit

commit 17eda4c
Merge: 0981dff ba2b72b
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Thu Feb 19 11:19:14 2015 -0800

    Merge pull request apache#175 from falaki/docfix

    Minor documentation cleanup

commit ba2b72b
Author: Hossein <hossein@databricks.com>
Date:   Thu Feb 19 10:35:07 2015 -0800

    Spark 1.1.0 is default

commit 4cd7d3f
Author: lythesia <iranaikimi@gmail.com>
Date:   Thu Feb 19 21:51:44 2015 +0800

    retry backend connection

commit 749e2d0
Author: Hossein <hossein@databricks.com>
Date:   Wed Feb 18 22:56:25 2015 -0800

    Updated README

commit bc04cf4
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Wed Feb 18 18:24:44 2015 -0800

    Use SPARKR_BACKEND_PORT in sparkR.R as default
    Change SparkRRunner to use EXISTING_SPARKR_BACKEND_PORT to
    differentiate between the two

commit 22a19ac
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Wed Feb 18 14:34:32 2015 -0800

    Use a semaphore to wait for backend to initialize
    Also pick a random port to avoid collisions

commit 0981dff
Merge: fd8f8a9 0cda231
Author: Zongheng Yang <zongheng.y@gmail.com>
Date:   Wed Feb 18 09:50:06 2015 -0800

    Merge pull request apache#168 from sun-rui/SPARKR-153_2

    [SPARKR-153] phase 2: implement aggregateByKey() and foldByKey().

commit 86fc639
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Tue Feb 17 23:29:51 2015 -0800

    Move sparkR-submit into pkg/inst

commit fd8f8a9
Merge: 384e6e2 a33dbea
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Tue Feb 17 23:17:39 2015 -0800

    Merge branch 'hqzizania-master'

commit a33dbea
Merge: 384e6e2 9c391c7
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Tue Feb 17 23:17:00 2015 -0800

    Merge branch 'master' of https://github.com/hqzizania/SparkR-pkg into hqzizania-master

    Conflicts:
    	pkg/R/RDD.R

commit 384e6e2
Merge: 2271030 1f5a6ac
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Tue Feb 17 20:41:17 2015 -0800

    Merge pull request apache#171 from hlin09/hlin09

    [SPARKR-159] Adds support of pipeRDD().

commit 1f5a6ac
Author: hlin09 <hlin09pu@gmail.com>
Date:   Tue Feb 17 22:57:37 2015 -0500

    fixed comments

commit 5292be7
Author: hlin09 <hlin09pu@gmail.com>
Date:   Mon Feb 16 16:05:11 2015 -0500

    Adds support of pipeRDD().

commit 0cda231
Author: Sun Rui <rui.sun@intel.com>
Date:   Mon Feb 16 16:51:34 2015 +0800

    [SPARKR-153] phase 2: implement aggregateByKey() and foldByKey().

commit 95ee6b4
Merge: 67fbc60 2271030
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Sun Feb 15 23:45:54 2015 -0800

    Merge remote-tracking branch 'amplab-sparkr/master' into sparkr-runner

commit 67fbc60
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Sun Feb 15 23:44:59 2015 -0800

    Add support for SparkR shell to use spark-submit
    This ensures that SparkConf options are read in both
    batch and interactive modes

commit 2271030
Merge: 52f94c4 7fcb46a
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Sun Feb 15 19:11:18 2015 -0800

    Merge pull request apache#167 from sun-rui/removePartionByInRDD

    Remove partitionBy() in RDD.

commit 7fcb46a
Author: Sun Rui <rui.sun@intel.com>
Date:   Mon Feb 16 10:44:20 2015 +0800

    Remove partitionBy() in RDD.

commit 52f94c4
Merge: 5836650 59e2d54
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Sun Feb 15 10:59:36 2015 -0800

    Merge pull request apache#160 from lythesia/master

    [SPARKR-137] Move pair RDD functions into a new file

commit 59e2d54
Merge: d968664 5836650
Author: lythesia <iranaikimi@gmail.com>
Date:   Sun Feb 15 11:54:23 2015 +0800

    merge with upstream

commit 5836650
Merge: c91ede2 141723e
Author: Zongheng Yang <zongheng.y@gmail.com>
Date:   Sat Feb 14 22:45:02 2015 -0500

    Merge pull request apache#163 from sun-rui/SPARKR-153_1

    [SPARKR-153] phase 1: implement fold() and aggregate().

commit 141723e
Author: Sun Rui <rui.sun@intel.com>
Date:   Sun Feb 15 10:25:11 2015 +0800

    fix comments.

commit c91ede2
Merge: 7972858 9d335a9
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Fri Feb 13 16:14:32 2015 -0800

    Merge pull request apache#164 from hlin09/hlin09

    Makes git to ignore Eclipse meta files.

commit 9d335a9
Author: hlin09 <hlin09pu@gmail.com>
Date:   Fri Feb 13 19:00:19 2015 -0500

    Makes git to ignore Eclipse meta files.

commit 94066bf
Author: Sun Rui <rui.sun@intel.com>
Date:   Fri Feb 13 20:03:30 2015 +0800

    [SPARKR-153] phase 1: implement fold() and aggregate().

commit 9c391c7
Merge: 5f29551 7972858
Author: hqzizania <qian.huang@intel.com>
Date:   Thu Feb 12 16:44:25 2015 +0800

    Merge remote-tracking branch 'upstream/master'

commit 5f29551
Author: hqzizania <qian.huang@intel.com>
Date:   Thu Feb 12 16:26:20 2015 +0800

    	modified:   pkg/R/RDD.R
    	modified:   pkg/R/context.R

commit d968664
Author: lythesia <iranaikimi@gmail.com>
Date:   Thu Feb 12 12:21:21 2015 +0800

    fix comment

commit 7972858
Merge: bd6705b f4573c1
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Wed Feb 11 09:05:22 2015 -0800

    Merge pull request apache#159 from sun-rui/SPARKR-150_2

    [SPARKR-150] phase 2: implement takeOrdered() and top().

commit 7690878
Author: lythesia <iranaikimi@gmail.com>
Date:   Wed Feb 11 13:53:14 2015 +0800

    separate out pair RDD functions

commit f4573c1
Author: Sun Rui <rui.sun@intel.com>
Date:   Wed Feb 11 11:29:28 2015 +0800

    Use reduce() instead of sortBy().take() to get the ordered elements.
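
The idea of getting the ordered elements with a reduce rather than a full sort can be sketched locally, without Spark. Here a plain list of numeric vectors stands in for the partitions of an RDD, and `takeOrderedLocal` is a hypothetical name; the actual SparkR implementation is not shown in this log and may differ.

```r
# Hypothetical stand-in for an RDD: a list of partitions.
partitions <- list(c(9, 3, 7), c(1, 8), c(6, 2, 5))

takeOrderedLocal <- function(parts, num) {
  # Each partition keeps only its own `num` smallest elements.
  perPartition <- lapply(parts, function(p) head(sort(p), num))
  # Candidates are merged pairwise and trimmed back to `num` after each merge,
  # so no step ever sorts more than 2 * num elements.
  Reduce(function(a, b) head(sort(c(a, b)), num), perPartition)
}
```

The benefit over sorting everything first is that only `num` elements per partition ever need to be carried into the merge step.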

commit 63e62ed
Author: Sun Rui <rui.sun@intel.com>
Date:   Tue Feb 10 19:09:17 2015 +0800

    [SPARKR-150] phase 2: implement takeOrdered() and top().

commit 050390b
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Mon Feb 9 21:40:27 2015 -0800

    Fix bugs in inferring R file

commit 8398f2e
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Mon Feb 9 21:15:16 2015 -0800

    Add sparkR-submit helper script
    Also adjust R file path for YARN cluster mode

commit bd6705b
Merge: 0c6e071 c7964c9
Author: Zongheng Yang <zongheng.y@gmail.com>
Date:   Mon Feb 9 19:21:31 2015 -0800

    Merge pull request apache#154 from sun-rui/SPARKR-150

    [SPARKR-150] phase 1: implement sortBy() and sortByKey().

commit c7964c9
Merge: 7feac38 0c6e071
Author: Sun Rui <rui.sun@intel.com>
Date:   Tue Feb 10 09:41:00 2015 +0800

    Merge with upstream master.

commit 7feac38
Author: Sun Rui <rui.sun@intel.com>
Date:   Mon Feb 9 18:40:28 2015 +0800

    Use default arguments for sortBy() and sortByKey().

commit de2bfb3
Author: Sun Rui <rui.sun@intel.com>
Date:   Mon Feb 9 15:42:14 2015 +0800

    Fix minor comments and add more test cases.

commit 0c6e071
Merge: 343b6ab f5038c0
Author: Zongheng Yang <zongheng.y@gmail.com>
Date:   Sun Feb 8 22:59:49 2015 -0800

    Merge pull request apache#157 from lythesia/master

    [SPARKR-161] Support reduceByKeyLocally()

commit f5038c0
Author: lythesia <iranaikimi@gmail.com>
Date:   Sun Feb 8 11:49:18 2015 +0800

    pull out anonymous functions in groupByKey

commit ba6f044
Author: lythesia <iranaikimi@gmail.com>
Date:   Sat Feb 7 15:37:07 2015 +0800

    fixes for reduceByKeyLocally

commit 343b6ab
Author: Oscar Olmedo <oscarjr@gmail.com>
Date:   Fri Feb 6 18:57:37 2015 -0800

    Export sparkR.stop
    Closes apache#156 from oscaroboto/master

commit 25639cf
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Fri Feb 6 11:55:36 2015 -0800

    Replace tabs with spaces

commit bb25920
Merge: 08ff30b 345f1b8
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Fri Feb 6 11:53:17 2015 -0800

    Merge branch 'dputler-master'

commit b082a35
Author: lythesia <iranaikimi@gmail.com>
Date:   Fri Feb 6 16:36:34 2015 +0800

    add reduceByKeyLocally

commit 7ca6512
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Thu Feb 5 23:10:57 2015 -0800

    First cut of SparkRRunner

commit 345f1b8
Author: dputler <dan.putler@gmail.com>
Date:   Wed Feb 4 22:17:23 2015 -0800

    [SPARKR-195] Implemented project style guidelines for if-else statements

commit 8043559
Author: Sun Rui <rui.sun@intel.com>
Date:   Thu Feb 5 12:12:47 2015 +0800

    Add a TODO to use binary search in the range partitioner.
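
For reference, a binary search over sorted range bounds might look like the sketch below. The function name `rangePartition`, the zero-based partition index, and the rule that a key equal to a bound falls into the lower partition are all assumptions; the commit only records a TODO and does not define these details.

```r
# Hypothetical range partitioner lookup. `bounds` must be sorted ascending;
# with k bounds there are k + 1 partitions, numbered from 0.
rangePartition <- function(key, bounds) {
  lo <- 1L
  hi <- length(bounds) + 1L
  # Invariant: the first index whose bound is >= key lies in [lo, hi],
  # where length(bounds) + 1 means "no such bound".
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (bounds[[mid]] < key) {
      lo <- mid + 1L
    } else {
      hi <- mid
    }
  }
  lo - 1L
}
```

Each lookup inspects on the order of log2(k) bounds instead of scanning all k of them.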

commit 91b2fd6
Author: Sun Rui <rui.sun@intel.com>
Date:   Thu Feb 5 11:09:29 2015 +0800

    Add more test cases.

commit 0c53d6c
Author: dputler <dan.putler@gmail.com>
Date:   Wed Feb 4 09:00:49 2015 -0800

    Data frames now coerced to lists, and messages issued for a data frame or matrix on how they are parallelized

commit d9da451
Author: Sun Rui <rui.sun@intel.com>
Date:   Wed Feb 4 21:46:49 2015 +0800

    [SPARKR-150] phase 1: implement sortBy() and sortByKey().

commit 08ff30b
Merge: 554bda0 9767e8e
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Date:   Tue Feb 3 22:48:57 2015 -0800

    Merge pull request apache#153 from hqzizania/master

    [SPARKR-160] Support collectAsMap()

commit 9767e8e
Author: hqzizania <qian.huang@intel.com>
Date:   Wed Feb 4 14:21:50 2015 +0800

    	modified:   pkg/man/collect-methods.Rd

commit 5d69f0a
Author: hqzizania <qian.huang@intel.com>
Date:   Wed Feb 4 14:01:00 2015 +0800

    	modified:   pkg/R/RDD.R

commit 4914091
Author: hqzizania <qian.huang@intel.com>
Date:   Wed Feb 4 13:46:15 2015 +0800

    	modified:   pkg/inst/tests/test_rdd.R

commit a95823e
Author: hqzizania <qian.huang@intel.com>
Date:   Wed Feb 4 09:35:43 2015 +0800

    	modified:   pkg/R/RDD.R

commit 554bda0
Merge: c662f29 f34bb88
Author: Zongheng Yang <zongheng.y@gmail.com>
Date:   Mon Feb 2 19:29:01 2015 -0800

    Merge pull request apache#147 from shivaram/sparkr-ec2-fixes

    Bunch of fixes for longer running jobs

commit f34bb88
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Mon Feb 2 10:08:43 2015 -0800

    Remove profiling information from this PR

commit 60da1df
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Sun Feb 1 06:38:58 2015 +0000

    Initialize timing variables

commit 179aa75
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Sun Feb 1 06:28:27 2015 +0000

    Bunch of fixes for longer running jobs
    1. Increase the timeout for socket connection to wait for long jobs
    2. Add some profiling information in worker.R
    3. Put temp file writes before stdin writes in RRDD.scala

commit 06d99f0
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Sat Jan 31 10:35:39 2015 -0800

    Fix URI to have right number of slashes

commit add97f5
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Sat Jan 31 10:18:06 2015 -0800

    Use URL encode to create valid URIs for jars

commit 73430c6
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Date:   Sat Jan 31 00:06:50 2015 -0800

    Make SparkR work on paths with spaces on Windows
cafreeman committed Feb 26, 2015
1 parent 0d07770 commit 8c241a3
Showing 16 changed files with 887 additions and 301 deletions.
15 changes: 5 additions & 10 deletions README.md
@@ -13,12 +13,6 @@ SparkR requires Scala 2.10 and Spark version >= 0.9.0. Current build by default
Apache Spark 1.1.0. You can also build SparkR against a
different Spark version (>= 0.9.0) by modifying `pkg/src/build.sbt`.

-SparkR also requires the R package `rJava` to be installed. To install `rJava`,
-you can run the following command in R:
-
-install.packages("rJava")
-
-
### Package installation
To develop SparkR, you can build the scala package and the R package using

@@ -31,9 +25,9 @@ If you wish to try out the package directly from github, you can use [`install_g

SparkR by default uses Apache Spark 1.1.0. You can switch to a different Spark
version by setting the environment variable `SPARK_VERSION`. For example, to
-use Apache Spark 1.2.0, you can run
+use Apache Spark 1.3.0, you can run

-SPARK_VERSION=1.2.0 ./install-dev.sh
+SPARK_VERSION=1.3.0 ./install-dev.sh

SparkR by default links to Hadoop 1.0.4. To use SparkR with other Hadoop
versions, you will need to rebuild SparkR with the same version that [Spark is
@@ -97,8 +91,9 @@ To run one of them, use `./sparkR <filename> <args>`. For example:

./sparkR examples/pi.R local[2]

-You can also run the unit-tests for SparkR by running
+You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

+R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
./run-tests.sh

## Running on EC2
@@ -110,7 +105,7 @@ Instructions for running SparkR on EC2 can be found in the
Currently, SparkR supports running on YARN with the `yarn-client` mode. These steps show how to build SparkR with YARN support and run SparkR programs on a YARN cluster:

```
-# assumes Java, R, rJava, yarn, spark etc. are installed on the whole cluster.
+# assumes Java, R, yarn, spark etc. are installed on the whole cluster.
cd SparkR-pkg/
USE_YARN=1 SPARK_YARN_VERSION=2.4.0 SPARK_HADOOP_VERSION=2.4.0 ./install-dev.sh
```
3 changes: 3 additions & 0 deletions pkg/NAMESPACE
@@ -2,6 +2,7 @@
exportClasses("RDD")
exportClasses("Broadcast")
exportMethods(
+"aggregateByKey",
"aggregateRDD",
"cache",
"checkpoint",
@@ -19,6 +20,7 @@ exportMethods(
"flatMap",
"flatMapValues",
"fold",
+"foldByKey",
"foreach",
"foreachPartition",
"fullOuterJoin",
@@ -41,6 +43,7 @@ exportMethods(
"numPartitions",
"partitionBy",
"persist",
+"pipeRDD",
"reduce",
"reduceByKey",
"reduceByKeyLocally",