Create CreateTableDesc in HiveQl #17
Commits on May 7, 2015
[SPARK-7328] [MLLIB] [PYSPARK] Pyspark.mllib.linalg.Vectors: Missing items
Add 1. Class methods squared_dist 3. parse 4. norm 5. numNonzeros 6. copy
I made a few vectorizations wrt squared_dist and dot as well. I have added support for SparseMatrix serialization in a separate PR (apache#5775) and plan to complete support for Matrices in another PR.
Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes apache#5872 from MechCoder/local_linalg_api and squashes the following commits: a8ff1e0 [MechCoder] minor ce3e53e [MechCoder] Add error message for parser 1bd3c04 [MechCoder] Robust parser and removed unnecessary methods f779561 [MechCoder] [SPARK-7328] Pyspark.mllib.linalg.Vectors: Missing items
Commit 347a329
[SPARK-5726] [MLLIB] Elementwise (Hadamard) Vector Product Transformer
See https://issues.apache.org/jira/browse/SPARK-5726 Author: Octavian Geagla <ogeagla@gmail.com> Author: Joseph K. Bradley <joseph@databricks.com> Closes apache#4580 from ogeagla/spark-mllib-weighting and squashes the following commits: fac12ad [Octavian Geagla] [SPARK-5726] [MLLIB] Use new createTransformFunc. 90f7e39 [Joseph K. Bradley] small cleanups 4595165 [Octavian Geagla] [SPARK-5726] [MLLIB] Remove erroneous test case. ded3ac6 [Octavian Geagla] [SPARK-5726] [MLLIB] Pass style checks. 37d4705 [Octavian Geagla] [SPARK-5726] [MLLIB] Incorporated feedback. 1dffeee [Octavian Geagla] [SPARK-5726] [MLLIB] Pass style checks. e436896 [Octavian Geagla] [SPARK-5726] [MLLIB] Remove 'TF' from 'ElementwiseProductTF' cb520e6 [Octavian Geagla] [SPARK-5726] [MLLIB] Rename HadamardProduct to ElementwiseProduct 4922722 [Octavian Geagla] [SPARK-5726] [MLLIB] Hadamard Vector Product Transformer
Commit 658a478
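To make the elementwise (Hadamard) product concrete, here is a minimal plain-Python sketch. It is not the MLlib transformer: the function name `elementwise_product` and the list-based vectors are assumptions made only for illustration.

```python
from typing import List, Sequence


def elementwise_product(scaling_vec: Sequence[float], v: Sequence[float]) -> List[float]:
    """Multiply two equal-length vectors component by component.

    Hypothetical helper for illustration; it only mirrors the math of a
    Hadamard product, not any Spark API.
    """
    if len(scaling_vec) != len(v):
        raise ValueError("vectors must have the same length")
    return [w * x for w, x in zip(scaling_vec, v)]
```

For example, scaling `[1.0, 2.0, 3.0]` by `[0.0, 1.0, 2.0]` zeroes the first component and doubles the third.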
[SPARK-6948] [MLLIB] compress vectors in VectorAssembler
The compression is based on storage. brkyvz Author: Xiangrui Meng <meng@databricks.com> Closes apache#5985 from mengxr/SPARK-6948 and squashes the following commits: df56a00 [Xiangrui Meng] update python tests 6d90d45 [Xiangrui Meng] compress vectors in VectorAssembler
Commit e43803b
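One way to read "compression is based on storage" is: keep whichever representation needs fewer bytes. The sketch below assumes 8-byte doubles for values and 4-byte ints for sparse indices; the byte accounting, the function name `compress`, and the tuple layout are illustrative assumptions, not the rule used inside Spark.

```python
from typing import List, Tuple, Union

DOUBLE_BYTES = 8  # assumed size of one stored value
INT_BYTES = 4     # assumed size of one stored sparse index

Dense = Tuple[str, List[float]]
Sparse = Tuple[str, int, List[int], List[float]]


def compress(values: List[float]) -> Union[Dense, Sparse]:
    """Return the cheaper of a dense or sparse encoding of `values`."""
    indices = [i for i, x in enumerate(values) if x != 0.0]
    dense_bytes = DOUBLE_BYTES * len(values)
    sparse_bytes = (INT_BYTES + DOUBLE_BYTES) * len(indices)
    if sparse_bytes < dense_bytes:
        # Sparse form: (tag, size, non-zero indices, non-zero values)
        return ("sparse", len(values), indices, [values[i] for i in indices])
    return ("dense", list(values))
```

Under these assumptions a vector with one non-zero out of five entries is stored sparse, while a mostly non-zero vector stays dense.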
[SQL] [MINOR] make star and multialias extend NamedExpression
`Star` and `MultiAlias` are used only in the `analyzer` and are substituted after analysis, so, just like `Alias`, they do not need to extend `Attribute`.
Author: scwf <wangfei1@huawei.com> Closes apache#5928 from scwf/attribute and squashes the following commits: 73a0560 [scwf] star and multialias do not need extend attribute
Commit 97d1182
[SPARK-7277] [SQL] Throw exception if the property mapred.reduce.tasks is set to -1
JIRA: https://issues.apache.org/jira/browse/SPARK-7277
Automatically determining the number of reducers (that is, setting `mapred.reduce.tasks` to `-1`) is not supported, so we should throw an exception to users.
Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#5811 from viirya/no_neg_reduce_tasks and squashes the following commits: e518f96 [Liang-Chi Hsieh] Consider other wrong setting values. fd9c817 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into no_neg_reduce_tasks 4ede705 [Liang-Chi Hsieh] Throw exception instead of warning message. 68a1c70 [Liang-Chi Hsieh] Show warning message if mapred.reduce.tasks is set to -1.
Commit ea3077f
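The validation described here can be sketched in a few lines of Python. The function name `validate_reduce_tasks` is hypothetical, and rejecting every value below 1 (not only `-1`) is an assumption drawn from the squashed commit "Consider other wrong setting values"; the actual Spark check may differ.

```python
def validate_reduce_tasks(raw_value: str) -> int:
    """Parse a `mapred.reduce.tasks` setting and fail fast on unsupported values."""
    n = int(raw_value)
    if n < 1:
        # Assumption: any value below 1 is treated as unsupported,
        # since the reducer count cannot be determined automatically.
        raise ValueError(
            f"mapred.reduce.tasks={n} is not supported: "
            "the number of reducers must be set explicitly to a positive value"
        )
    return n
```

Raising instead of logging a warning makes the misconfiguration visible at the point where the property is set.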
[SPARK-5281] [SQL] Registering table on RDD is giving MissingRequirementError
Go through the context classloader when reflecting on user types in ScalaReflection. Replaced calls to `typeOf` with `typeTag[T].in(mirror)`. The convenience method assumes all types can be found in the classloader that loaded scala-reflect (the primordial classloader). This assumption is not valid in all contexts (sbt console, Eclipse launchers). Fixed SPARK-5281
Author: Iulian Dragos <jaguarul@gmail.com> Closes apache#5981 from dragos/issue/mirrors-missing-requirement-error and squashes the following commits: d103e70 [Iulian Dragos] Go through the context classloader when reflecting on user types in ScalaReflection
Commit 937ba79
[SPARK-2155] [SQL] [WHEN D THEN E] [ELSE F] add CaseKeyWhen for "CASE a WHEN b THEN c * END"
Avoid translating to CaseWhen, which evaluates the key expression many times.
Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#5979 from cloud-fan/condition and squashes the following commits: 3ce54e1 [Wenchen Fan] add CaseKeyWhen
Commit 35f0173
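The difference between the two forms can be illustrated with a small Python sketch. Both functions are hypothetical stand-ins for the expressions named in the commit, and SQL NULL semantics are deliberately ignored.

```python
from typing import Any, Callable, List, Tuple


def case_when(branches: List[Tuple[Callable[[], bool], Any]], else_value: Any = None) -> Any:
    """CASE WHEN p1 THEN v1 ...: each branch carries its own predicate."""
    for predicate, value in branches:
        if predicate():
            return value
    return else_value


def case_key_when(key: Callable[[], Any], branches: List[Tuple[Any, Any]], else_value: Any = None) -> Any:
    """CASE key WHEN c1 THEN v1 ...: the key is evaluated exactly once."""
    key_value = key()
    for candidate, value in branches:
        if key_value == candidate:
            return value
    return else_value
```

Translating `CASE a WHEN b THEN c` into `case_when` would require predicates such as `lambda: a() == b`, so `a` is re-evaluated once per branch tested; `case_key_when` computes it a single time.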
[SPARK-7450] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
Author: tedyu <yuzhihong@gmail.com> Closes apache#5897 from tedyu/master and squashes the following commits: 473bf9d [tedyu] Address Josh's review comments 1719c5b [tedyu] Correct upper bound in for loop b51dcaf [tedyu] Add unit test in BitSetSuite for BitSet#anySet() 83f9f87 [tedyu] Merge branch 'master' of github.com:apache/spark 817e3f9 [tedyu] Replace constant 8 with SIZE_OF_LONG 75a467b [tedyu] Correct offset for UNSAFE.getLong() 855374b [tedyu] Remove second loop since bitSetWidthInBytes is WORD aligned 093b7a4 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 63ee050 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 4ca0ef6 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 3e9b691 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
Commit 88063c6
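The idea behind this change, scanning the bit set one 8-byte word at a time instead of byte by byte, can be sketched in Python with `struct` as a stand-in for `UNSAFE.getLong()`. The function name `any_set` and the `bytes` input are assumptions for illustration; the word-alignment requirement follows the squashed commit noting that `bitSetWidthInBytes` is word aligned.

```python
import struct

SIZE_OF_LONG = 8  # bytes per word, matching the constant named in the commit list


def any_set(bitset: bytes) -> bool:
    """Return True if any bit is set, reading one 64-bit word per step."""
    if len(bitset) % SIZE_OF_LONG != 0:
        raise ValueError("bit set width must be a multiple of 8 bytes")
    for offset in range(0, len(bitset), SIZE_OF_LONG):
        (word,) = struct.unpack_from("<q", bitset, offset)
        if word != 0:
            return True
    return False
```

Because the width is a whole number of words, no trailing byte-wise loop is needed.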
Commits on May 8, 2015
[SPARK-7305] [STREAMING] [WEBUI] Make BatchPage show friendly information when jobs are dropped by SparkListener
If jobs are dropped by SparkListener, at least we can show the job ids in BatchPage. Screenshot: ![b1](https://cloud.githubusercontent.com/assets/1000778/7434968/f19aa784-eff3-11e4-8f86-36a073873574.png)
Author: zsxwing <zsxwing@gmail.com> Closes apache#5840 from zsxwing/SPARK-7305 and squashes the following commits: aca0ba6 [zsxwing] Fix the code style 718765e [zsxwing] Make generateNormalJobRow private 8073b03 [zsxwing] Merge branch 'master' into SPARK-7305 83dec11 [zsxwing] Make BatchPage show friendly information when jobs are dropped by SparkListener
Commit 22ab70e
[SPARK-6908] [SQL] Use isolated Hive client
This PR switches Spark SQL's Hive support to use the isolated hive client interface introduced by apache#5851, instead of directly interacting with the client. By using this isolated client we can now allow users to dynamically configure the version of Hive that they are connecting to by setting `spark.sql.hive.metastore.version` without the need to recompile. This also greatly reduces the surface area for our interaction with the hive libraries, hopefully making it easier to support other versions in the future.
Jars for the desired hive version can be configured using `spark.sql.hive.metastore.jars`, which accepts the following options:
- a colon-separated list of jar files or directories for hive and hadoop.
- `builtin` - attempt to discover the jars that were used to load Spark SQL and use those. This option is only valid when using the execution version of Hive.
- `maven` - download the correct version of hive on demand from maven.
By default, `builtin` is used for Hive 13.
This PR also removes the test step for building against Hive 12, as this will no longer be required to talk to Hive 12 metastores. However, the full removal of the Shim is deferred until a later PR.
Remaining TODOs:
- Remove the Hive Shims and inline code for Hive 13.
- Several HiveCompatibility tests are not yet passing.
  - `nullformatCTAS` - As detailed below, we now are handling CTAS parsing ourselves instead of hacking into the Hive semantic analyzer. However, we currently only handle the common cases and not things like CTAS where the null format is specified.
  - `combine1` now leaks state about compression somehow, breaking all subsequent tests. As such we currently add it to the blacklist
  - `part_inherit_tbl_props` and `part_inherit_tbl_props_with_star` do not work anymore. We are correctly propagating the information
  - "load_dyn_part14.*" - These tests pass when run on their own, but fail when run with all other tests. It seems our `RESET` mechanism may not be as robust as it used to be?
Other required changes:
- `CreateTableAsSelect` no longer carries parts of the HiveQL AST with it through the query execution pipeline. Instead, we parse CTAS during the HiveQL conversion and construct a `HiveTable`. The full parsing here is not yet complete as detailed above in the remaining TODOs. Since the operator is Hive specific, it is moved to the hive package.
- `Command` is simplified to be a trait that simply acts as a marker for a LogicalPlan that should be eagerly evaluated.
Author: Michael Armbrust <michael@databricks.com> Closes apache#5876 from marmbrus/useIsolatedClient and squashes the following commits: 258d000 [Michael Armbrust] really really correct path handling e56fd4a [Michael Armbrust] getAbsolutePath 5a259f5 [Michael Armbrust] fix typos 81bb366 [Michael Armbrust] comments from vanzin 5f3945e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient 4b5cd41 [Michael Armbrust] yin's comments f5de7de [Michael Armbrust] cleanup 11e9c72 [Michael Armbrust] better coverage in versions suite 7e8f010 [Michael Armbrust] better error messages and jar handling e7b3941 [Michael Armbrust] more permisive checking for function registration da91ba7 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient 5fe5894 [Michael Armbrust] fix serialization suite 81711c4 [Michael Armbrust] Initial support for running without maven 1d8ae44 [Michael Armbrust] fix final tests? 1c50813 [Michael Armbrust] more comments a3bee70 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient a6f5df1 [Michael Armbrust] style ab07f7e [Michael Armbrust] WIP 4d8bf02 [Michael Armbrust] Remove hive 12 compilation 8843a25 [Michael Armbrust] [SPARK-6908] [SQL] Use isolated Hive client
Commit cd1d411
[SPARK-7452] [MLLIB] fix bug in topBykey and update test
The `toArray` function of the BoundedPriorityQueue does not necessarily preserve order. Add a counter-example as the test, which would fail the original implementation.
Author: Shuo Xiang <shuoxiangpub@gmail.com> Closes apache#5990 from coderxiang/topbykey-test and squashes the following commits: 98804c9 [Shuo Xiang] fix bug in topBykey and update test
Commit 92f8f80
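The same pitfall exists with Python's `heapq`: the list backing a heap is heap-ordered, not sorted, so dumping it directly does not give a ranked result. The sketch below is an analogy only; `top_by_key` is a hypothetical helper, not the MLlib implementation.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Tuple


def top_by_key(pairs: Iterable[Tuple[Hashable, float]], num: int) -> Dict[Hashable, List[float]]:
    """Keep the `num` largest values per key, returned in descending order."""
    heaps: Dict[Hashable, List[float]] = {}
    for key, value in pairs:
        heap = heaps.setdefault(key, [])
        if len(heap) < num:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Replace the current smallest retained value.
            heapq.heapreplace(heap, value)
    # The heap's internal list is not sorted, so sort explicitly before returning.
    return {key: sorted(heap, reverse=True) for key, heap in heaps.items()}
```

With the input `("a", 1), ("a", 5), ("a", 3), ("a", 4)` and `num=3`, the internal heap for `"a"` ends up as `[3, 5, 4]`; only the explicit sort yields `[5, 4, 3]`.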
[SPARK-6986] [SQL] Use Serializer2 in more cases.
With apache@0a2b15c, the serialization stream and deserialization stream have enough information to determine whether they are handling a key-value pair, a key, or a value. It is safe to use `SparkSqlSerializer2` in more cases.
Author: Yin Huai <yhuai@databricks.com> Closes apache#5849 from yhuai/serializer2MoreCases and squashes the following commits: 53a5eaa [Yin Huai] Josh's comments. 487f540 [Yin Huai] Use BufferedOutputStream. 8385f95 [Yin Huai] Always create a new row at the deserialization side to work with sort merge join. c7e2129 [Yin Huai] Update tests. 4513d13 [Yin Huai] Use Serializer2 in more places.
Commit 3af423c
[SPARK-7470] [SQL] Spark shell SQLContext crashes without hive
This only happens if you have `SPARK_PREPEND_CLASSES` set. Then I built it with `build/sbt clean assembly compile` and just ran it with `bin/spark-shell`.
```
...
15/05/07 17:07:30 INFO EventLoggingListener: Logging events to file:/tmp/spark-events/local-1431043649919
15/05/07 17:07:30 INFO SparkILoop: Created spark context..
Spark context available as sc.
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
	at java.lang.Class.getDeclaredConstructors0(Native Method)
	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2493)
	at java.lang.Class.getConstructor0(Class.java:2803)
...
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	... 52 more
<console>:10: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:10: error: not found: value sqlContext
       import sqlContext.sql
              ^
```
yhuai marmbrus
Author: Andrew Or <andrew@databricks.com> Closes apache#5997 from andrewor14/sql-shell-crash and squashes the following commits: 61147e6 [Andrew Or] Also expect NoClassDefFoundError
Commit 714db2e
[SPARK-7232] [SQL] Add a Substitution batch for spark sql analyzer
Added a new batch named `Substitution` before the `Resolution` batch. The motivation is that in some cases we want to perform substitutions on the parsed logical plan before resolving it. Consider these two cases:
1. CTE: for a CTE we first build a row logical plan
```
'With Map(q1 -> 'Subquery q1
                  'Project ['key]
                   'UnresolvedRelation [src], None)
 'Project [*]
  'Filter ('key = 5)
   'UnresolvedRelation [q1], None
```
In the `With` logical plan there is a map that stores (`q1 -> subquery`); we first want to strip off the `With` command and substitute the `q1` of `UnresolvedRelation` with the `subquery`.
2. Window functions: the user may define named windows, and we need to substitute the window name in the child with the concrete window. This should also be done in the Substitution batch.
Author: wangfei <wangfei1@huawei.com> Closes apache#5776 from scwf/addbatch and squashes the following commits: d4b962f [wangfei] added WindowsSubstitution 70f6932 [wangfei] Merge branch 'master' of https://github.com/apache/spark into addbatch ecaeafb [wangfei] address yhuai's comments 553005a [wangfei] fix test case 0c54798 [wangfei] address comments 29aaaaf [wangfei] fix compile 1c9a092 [wangfei] added Substitution bastch
Commit f496bf3
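The CTE case can be sketched in Python over a toy plan representation. Everything here is an assumption made for illustration: plans are plain dicts with an `op` field, and `substitute_cte` is a hypothetical function, not Catalyst's rule. The sketch does not recurse into the CTE definitions themselves.

```python
from typing import Any, Dict, Optional

Plan = Dict[str, Any]  # toy plan node: {"op": ..., "child": ..., ...}


def substitute_cte(plan: Plan, ctes: Optional[Dict[str, Plan]] = None) -> Plan:
    """Strip `With` nodes and replace references to CTE names with their subqueries."""
    ctes = ctes or {}
    if plan["op"] == "With":
        # Drop the With node; its definitions become visible to the child.
        return substitute_cte(plan["child"], {**ctes, **plan["ctes"]})
    if plan["op"] == "UnresolvedRelation" and plan["name"] in ctes:
        return ctes[plan["name"]]
    if "child" in plan:
        return {**plan, "child": substitute_cte(plan["child"], ctes)}
    return plan
```

Applied to the plan shown in the commit message, the `With` node disappears and the `UnresolvedRelation [q1]` under the filter is replaced by the `Subquery q1` plan, leaving only `UnresolvedRelation [src]` for the later resolution step.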
[SPARK-7392] [CORE] bugfix: Kryo buffer size cannot be larger than 2M
Author: Zhang, Liye <liye.zhang@intel.com> Closes apache#5934 from liyezhang556520/kryoBufSize and squashes the following commits: 5707e04 [Zhang, Liye] fix import order 8693288 [Zhang, Liye] replace multiplier with ByteUnit methods 9bf93e9 [Zhang, Liye] add tests d91e5ed [Zhang, Liye] change kb to mb
Commit c2f0821
[SPARK-6869] [PYSPARK] Add pyspark archives path to PYTHONPATH
Based on apache#5478, which provides a PYSPARK_ARCHIVES_PATH env. With this PR, we only need to export PYSPARK_ARCHIVES_PATH=/user/spark/pyspark.zip,/user/spark/python/lib/py4j-0.8.2.1-src.zip in conf/spark-env.sh when PySpark is not installed on each node of Yarn. I ran a Python application successfully on yarn-client and yarn-cluster with this PR. andrewor14 sryza Sephiroth-Lin Can you take a look at this? Thanks.
Author: Lianhui Wang <lianhuiwang09@gmail.com> Closes apache#5580 from lianhuiwang/SPARK-6869 and squashes the following commits: 66ffa43 [Lianhui Wang] Update Client.scala c2ad0f9 [Lianhui Wang] Update Client.scala 1c8f664 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869 008850a [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869 f0b4ed8 [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869 150907b [Lianhui Wang] Merge remote-tracking branch 'remotes/apache/master' into SPARK-6869 20402cd [Lianhui Wang] use ZipEntry 9d87c3f [Lianhui Wang] update scala style e7bd971 [Lianhui Wang] address vanzin's comments 4b8a3ed [Lianhui Wang] use pyArchivesEnvOpt e6b573b [Lianhui Wang] address vanzin's comments f11f84a [Lianhui Wang] zip pyspark archives 5192cca [Lianhui Wang] update import path 3b1e4c8 [Lianhui Wang] address tgravescs's comments 9396346 [Lianhui Wang] put zip to make-distribution.sh 0d2baf7 [Lianhui Wang] update import paths e0179be [Lianhui Wang] add zip pyspark archives in build or sparksubmit 31e8e06 [Lianhui Wang] update code style 9f31dac [Lianhui Wang] update code and add comments f72987c [Lianhui Wang] add archives path to PYTHONPATH
Commit ebff732
[SPARK-3454] separate json endpoints for data in the UI
Exposes data available in the UI as json over http. Key points:
* new endpoints, handled independently of existing XyzPage classes. Root entrypoint is `JsonRootResource`
* Uses jersey + jackson for routing & converting POJOs into json
* tests against known results in `HistoryServerSuite`
* also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages.
Author: Imran Rashid <irashid@cloudera.com> Closes apache#5940 from squito/SPARK-3454_better_test_files and squashes the following commits: 1a72ed6 [Imran Rashid] rats 85fdb3e [Imran Rashid] Merge branch 'no_php' into SPARK-3454 1fc65b0 [Imran Rashid] Revert "Revert "[SPARK-3454] separate json endpoints for data in the UI"" 1276900 [Imran Rashid] get rid of giant event file, replace w/ smaller one; check both shuffle read & shuffle write 4e12013 [Imran Rashid] just use test case name for expectation file name 863ef64 [Imran Rashid] rename json files to avoid strange file names and not look like php
Commit c796be7
[SPARK-7383] [ML] Feature Parity in PySpark for ml.features
Implemented python wrappers for Scala functions that don't exist in `ml.features` Author: Burak Yavuz <brkyvz@gmail.com> Closes apache#5991 from brkyvz/ml-feat-PR and squashes the following commits: adcca55 [Burak Yavuz] add regex tokenizer to __all__ b91cb44 [Burak Yavuz] addressed comments bd39fd2 [Burak Yavuz] remove addition b82bd7c [Burak Yavuz] Parity in PySpark for ml.features
Commit f5ff4a8
[SPARK-7474] [MLLIB] update ParamGridBuilder doctest
Multiline commands are properly handled in this PR. oefirouz ![screen shot 2015-05-07 at 10 53 25 pm](https://cloud.githubusercontent.com/assets/829644/7531290/02ad2fd4-f50c-11e4-8c04-e58d1a61ad69.png) Author: Xiangrui Meng <meng@databricks.com> Closes apache#6001 from mengxr/SPARK-7474 and squashes the following commits: b94b11d [Xiangrui Meng] update ParamGridBuilder doctest
Commit 65afd3c
[SPARK-6824] Fill the docs for DataFrame API in SparkR
This patch also removes the RDD docs from being built as a part of roxygen, simply by deleting the `'` from the `#'` comment prefix.
Author: hqzizania <qian.huang@intel.com> Author: qhuang <qian.huang@intel.com> Closes apache#5969 from hqzizania/R1 and squashes the following commits: 6d27696 [qhuang] fixes in NAMESPACE eb4b095 [qhuang] remove more docs 6394579 [qhuang] remove RDD docs in generics.R 6813860 [hqzizania] Fill the docs for DataFrame API in SparkR 857220f [hqzizania] remove the pairRDD docs from being built as a part of roxygen c045d64 [hqzizania] remove the RDD docs from being built as a part of roxygen
Commit 008a60d
[SPARK-7436] Fixed instantiation of custom recovery mode factory and added tests
Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes apache#5977 from jacek-lewandowski/SPARK-7436 and squashes the following commits: ff0a3c2 [Jacek Lewandowski] SPARK-7436: Fixed instantiation of custom recovery mode factory and added tests
Commit 35d6a99
[SPARK-7298] Harmonize style of new visualizations
- Colors on the timeline now match the rest of the UI
- The expandable buttons to show timeline view, DAG, etc are now more visible
- Timeline text is smaller
- DAG visualization text and colors are more consistent throughout
- Fix some JavaScript style issues
- Various small fixes throughout (e.g. inconsistent capitalization, some confusing names, HTML escaping, etc)
Author: Matei Zaharia <matei@databricks.com> Closes apache#5942 from mateiz/ui and squashes the following commits: def38d0 [Matei Zaharia] Add some tooltips 4c5a364 [Matei Zaharia] Reduce stage and rank separation slightly 43dcbe3 [Matei Zaharia] Some updates to DAG fac734a [Matei Zaharia] tweaks 6a6705d [Matei Zaharia] More fixes 67629f5 [Matei Zaharia] Various small tweaks
Commit a1ec08f
[SPARK-7133] [SQL] Implement struct, array, and map field accessor
It's the first step: generalize UnresolvedGetField to support all map, struct, and array TODO: add `apply` in Scala and `__getitem__` in Python, and unify the `getItem` and `getField` methods to one single API(or should we keep them for compatibility?). Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#5744 from cloud-fan/generalize and squashes the following commits: 715c589 [Wenchen Fan] address comments 7ea5b31 [Wenchen Fan] fix python test 4f0833a [Wenchen Fan] add python test f515d69 [Wenchen Fan] add apply method and test cases 8df6199 [Wenchen Fan] fix python test 239730c [Wenchen Fan] fix test compile 2a70526 [Wenchen Fan] use _bin_op in dataframe.py 6bf72bc [Wenchen Fan] address comments 3f880c3 [Wenchen Fan] add java doc ab35ab5 [Wenchen Fan] fix python test b5961a9 [Wenchen Fan] fix style c9d85f5 [Wenchen Fan] generalize UnresolvedGetField to support all map, struct, and array
Commit 2d05f32
[SPARK-6627] Finished rename to ShuffleBlockResolver
The previous cleanup-commit for SPARK-6627 renamed ShuffleBlockManager to ShuffleBlockResolver, but didn't rename the associated subclasses and variables; this commit does that. I'm unsure whether it's ok to rename ExternalShuffleBlockManager, since that's technically a public class? cc pwendell Author: Kay Ousterhout <kayousterhout@gmail.com> Closes apache#5764 from kayousterhout/SPARK-6627 and squashes the following commits: 43add1e [Kay Ousterhout] Spacing fix 96080bf [Kay Ousterhout] Test fixes d8a5d36 [Kay Ousterhout] [SPARK-6627] Finished rename to ShuffleBlockResolver
Commit 4b3bb0e
[SPARK-7490] [CORE] [Minor] MapOutputTracker.deserializeMapStatuses: close input streams
GZIPInputStream allocates native memory that is not freed until close() or when the finalizer runs. It is best to close() these streams explicitly. stephenh made the same change for serializeMapStatuses in commit b0d884f. This is the same change for deserialize. (I ran the unit test suite! it seems to have passed. I did not make a JIRA since this seems "trivial", and the guidelines suggest it is not required for trivial changes)
Author: Evan Jones <ejones@twitter.com> Closes apache#5982 from evanj/master and squashes the following commits: 0d76e85 [Evan Jones] [CORE] MapOutputTracker.deserializeMapStatuses: close input streams
Commit 25889d8
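The pattern of closing compression streams explicitly, rather than waiting for garbage collection, looks like this in Python. It is an analogy only: the function names and the JSON payload are assumptions, and the native-memory behaviour described in the commit is specific to Java's GZIPInputStream.

```python
import gzip
import io
import json
from typing import Any


def serialize_statuses(statuses: Any) -> bytes:
    """Gzip-compress a JSON payload, closing the stream deterministically."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(json.dumps(statuses).encode("utf-8"))
    # The with-block has closed the gzip stream, so the trailer is flushed.
    return buffer.getvalue()


def deserialize_statuses(data: bytes) -> Any:
    """Decompress and parse, closing the input stream as soon as reading is done."""
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode="rb") as gz:
        return json.loads(gz.read().decode("utf-8"))
```

The `with` statement plays the role of the explicit `close()` call, on both the write and the read path.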
[MINOR] Ignore python/lib/pyspark.zip
Add `python/lib/pyspark.zip` to `.gitignore`. After merging apache#5580, `python/lib/pyspark.zip` will be generated when building Spark. Author: zsxwing <zsxwing@gmail.com> Closes apache#6017 from zsxwing/gitignore and squashes the following commits: 39b10c4 [zsxwing] Ignore python/lib/pyspark.zip
Commit dc71e47
[WEBUI] Remove debug feature for vis.js
`vis.min.js` refers to `vis.map`, which in turn refers to `vis.js`. This is used for debugging `vis.js`, but the debug feature is not needed for Spark itself. This issue is really minor, so I did not file it in JIRA. /CC andrewor14
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#5994 from sarutak/remove-debug-feature-for-vis and squashes the following commits: 8be038f [Kousuke Saruta] Remove vis.map entry from .rat-exclude 7404945 [Kousuke Saruta] Removed debug feature for vis.js
Commit c45c09b
[SPARK-7489] [SPARK SHELL] Spark shell crashes when compiled with scala 2.11
Spark shell crashes when compiled with scala 2.11 and SPARK_PREPEND_CLASSES=true. There is a similar resolved JIRA issue, SPARK-7470, and a PR, apache#5997, which handled the same issue only in scala 2.10.
Author: vinodkc <vinod.kc.in@gmail.com> Closes apache#6013 from vinodkc/fix_sqlcontext_exception_scala_2.11 and squashes the following commits: 119061c [vinodkc] Spark shell crashes when compiled with scala 2.11
Commit 4e7360e
[MINOR] Defeat early garbage collection of test suite variable
The JVM is free to collect references to variables that no longer participate in a computation. This simple patch adds an operation to the variable 'rdd' to ensure it is not collected early in the test suite's explicit calls to GC. ref: http://bugs.java.com/view_bug.do?bug_id=6721588 Author: Tim Ellison <t.p.ellison@gmail.com> Closes apache#6010 from tellison/master and squashes the following commits: 77d1c8f [Tim Ellison] Defeat early garbage collection of test suite variable by aggressive JVMs
Commit 31da40d
[SPARK-7466] DAG visualization: fix orphan nodes
Simple fix. We were comparing an option with `null`. Before: <img src="https://issues.apache.org/jira/secure/attachment/12731383/before.png" width="250px"/> After: <img src="https://issues.apache.org/jira/secure/attachment/12731384/after.png" width="250px"/> Author: Andrew Or <andrew@databricks.com> Closes apache#6002 from andrewor14/dag-viz-orphan-nodes and squashes the following commits: a1468dc [Andrew Or] Fix null check
Andrew Or committed May 8, 2015
Commit 3b0c5e7
[MINOR] [CORE] Allow History Server to read kerberos opts from config file.
Order of initialization code was wrong.
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#5998 from vanzin/hs-conf-fix and squashes the following commits: 00b6b6b [Marcelo Vanzin] [minor] [core] Allow History Server to read kerberos opts from config file.
Marcelo Vanzin authored and Andrew Or committed May 8, 2015
Commit 9042f8f
[SPARK-7378] [CORE] Handle deep links to unloaded apps.
The code was treating deep links as if they were attempt IDs, so for example if you tried to load "/history/app1/jobs" directly, that would fail because the code would treat "jobs" as an attempt id. This change modifies the code to try both cases - first without an attempt id, then with it, so that deep links are handled correctly. This assumes that the links in the Spark UI do not clash with the attempt id namespace, though, which is the case for YARN at least, which is the only backend that currently publishes attempt IDs. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes apache#5922 from vanzin/SPARK-7378 and squashes the following commits: 96f648b [Marcelo Vanzin] Fix comparison. ed3bcd4 [Marcelo Vanzin] Merge branch 'master' into SPARK-7378 23483e4 [Marcelo Vanzin] Fat fingers. b728f08 [Marcelo Vanzin] [SPARK-7378] [core] Handle deep links to unloaded apps.
Marcelo Vanzin authored and Andrew Or committed May 8, 2015
Commit 5467c34
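The "try both cases" lookup for deep links can be sketched as follows. The names `resolve_history_path` and `can_load`, and the returned tuple shape, are assumptions for illustration; the real History Server code is Scala and is structured differently.

```python
from typing import Callable, List, Optional, Tuple


def resolve_history_path(
    path: str,
    can_load: Callable[[str, Optional[str]], bool],
) -> Optional[Tuple[str, Optional[str], List[str]]]:
    """Split "/history/<app>[/<attempt>]/<rest...>" into (app_id, attempt_id, rest).

    First try to load the app without an attempt id; only if that fails,
    treat the next path segment as an attempt id.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "history":
        return None
    app_id, rest = parts[1], parts[2:]
    if can_load(app_id, None):
        return app_id, None, rest
    if rest and can_load(app_id, rest[0]):
        return app_id, rest[0], rest[1:]
    return None
```

With this ordering, "/history/app1/jobs" resolves to app `app1` with the deep link `jobs`, instead of failing because `jobs` was taken for an attempt id. As the commit message notes, this relies on UI link names not clashing with attempt ids.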
[SPARK-7390] [SQL] Only merge other CovarianceCounter when its count is greater than zero
JIRA: https://issues.apache.org/jira/browse/SPARK-7390
Also fix a minor typo.
Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#5931 from viirya/fix_covariancecounter and squashes the following commits: 352eda6 [Liang-Chi Hsieh] Only merge other CovarianceCounter when its count is greater than zero.
Commit 90527f5
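Why the guard matters can be seen in a small Python sketch of a mergeable covariance accumulator. The field names and the update formulas (the standard pairwise co-moment merge) are assumptions; they are not confirmed to match Spark's CovarianceCounter line for line. Without the `other.count > 0` check, merging two empty counters would divide by a total count of zero.

```python
from dataclasses import dataclass


@dataclass
class CovarianceCounter:
    x_avg: float = 0.0
    y_avg: float = 0.0
    ck: float = 0.0   # running co-moment: sum of (x - x_avg) * (y - y_avg)
    count: int = 0

    def add(self, x: float, y: float) -> "CovarianceCounter":
        dx = x - self.x_avg
        self.count += 1
        self.x_avg += dx / self.count
        self.y_avg += (y - self.y_avg) / self.count
        self.ck += dx * (y - self.y_avg)
        return self

    def merge(self, other: "CovarianceCounter") -> "CovarianceCounter":
        # Only merge when the other side has data; this also avoids 0 / 0.
        if other.count > 0:
            total = self.count + other.count
            dx = other.x_avg - self.x_avg
            dy = other.y_avg - self.y_avg
            self.ck += other.ck + dx * dy * self.count * other.count / total
            self.x_avg = (self.x_avg * self.count + other.x_avg * other.count) / total
            self.y_avg = (self.y_avg * self.count + other.y_avg * other.count) / total
            self.count = total
        return self

    def cov(self) -> float:
        """Sample covariance; requires at least two observations."""
        return self.ck / (self.count - 1)
```

Merging partial counters built on different partitions gives the same covariance as a single pass over all the data, and merging an empty partition is a no-op.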
[SPARK-4699] [SQL] Make caseSensitive configurable in spark sql analyzer
based on apache#3558 Author: Jacky Li <jacky.likun@huawei.com> Author: wangfei <wangfei1@huawei.com> Author: scwf <wangfei1@huawei.com> Closes apache#5806 from scwf/case and squashes the following commits: cd51712 [wangfei] fix compile d4b724f [wangfei] address michael's comment af512c7 [wangfei] fix conflicts 4ef1be7 [wangfei] fix conflicts 269cf21 [scwf] fix conflicts b73df6c [scwf] style issue 9e11752 [scwf] improve SimpleCatalystConf b35529e [scwf] minor style a3f7659 [scwf] remove unsed imports 2a56515 [scwf] fix conflicts 6db4bf5 [scwf] also fix for HiveContext 7fc4a98 [scwf] fix test case d5a9933 [wangfei] fix style eee75ba [wangfei] fix EmptyConf 6ef31cf [wangfei] revert pom changes 5d7c456 [wangfei] set CASE_SENSITIVE false in TestHive 966e719 [wangfei] set CASE_SENSITIVE false in hivecontext fd30e25 [wangfei] added override 69b3b70 [wangfei] fix AnalysisSuite 5472b08 [wangfei] fix compile issue 56034ca [wangfei] fix conflicts and improve for catalystconf 664d1e9 [Jacky Li] Merge branch 'master' of https://github.com/apache/spark into case 12eca9a [Jacky Li] solve conflict with master 39e369c [Jacky Li] fix confilct after DataFrame PR dee56e9 [Jacky Li] fix test case failure 05b09a3 [Jacky Li] fix conflict base on the latest master branch 73c16b1 [Jacky Li] fix bug in sql/hive 9bf4cc7 [Jacky Li] fix bug in catalyst 005c56d [Jacky Li] make SQLContext caseSensitivity configurable 6332e0f [Jacky Li] fix bug fcbf0d9 [Jacky Li] fix scalastyle check e7bca31 [Jacky Li] make caseSensitive configuration in Analyzer and Catalog 91b1b96 [Jacky Li] make caseSensitive configurable in Analyzer f57f15c [Jacky Li] add testcase 578d167 [Jacky Li] make caseSensitive configurable
Commit 6dad76e
-
[SPARK-5913] [MLLIB] Python API for ChiSqSelector
Add a Python API for mllib.feature.ChiSqSelector https://issues.apache.org/jira/browse/SPARK-5913 Author: Yanbo Liang <ybliang8@gmail.com> Closes apache#5939 from yanboliang/spark-5913 and squashes the following commits: cdaac99 [Yanbo Liang] Python API for ChiSqSelector
Commit 35c9599
-
I needed to run some d2 instances, so I updated the spark_ec2.py accordingly Author: Brendan Collins <bcollins@blueraster.com> Closes apache#6014 from brendancol/ec2-instance-types-update and squashes the following commits: d7b4191 [Brendan Collins] Merge branch 'ec2-instance-types-update' of github.com:brendancol/spark into ec2-instance-types-update 6366c45 [Brendan Collins] added back cc1.4xlarge fc2931f [Brendan Collins] updated ec2 instance types 80c2aa6 [Brendan Collins] vertically aligned whitespace 85c6236 [Brendan Collins] vertically aligned whitespace 1657c26 [Brendan Collins] updated ec2 instance types
Commit 1c78f68
Commits on May 9, 2015
-
[SPARK-6955] Perform port retries at NettyBlockTransferService level
Currently we're doing port retries in the TransportServer level, but this is not specified by the TransportContext API and it has other further-reaching impacts like causing undesirable behavior for the Yarn and Standalone shuffle services. Author: Aaron Davidson <aaron@databricks.com> Closes apache#5575 from aarondav/port-bind and squashes the following commits: 3c2d6ed [Aaron Davidson] Oops, never do it. a5d9432 [Aaron Davidson] Remove shouldHostShuffleServiceIfEnabled e901eb2 [Aaron Davidson] fix local-cluster mode for ExternalShuffleServiceSuite 59e5e38 [Aaron Davidson] [SPARK-6955] Perform port retries at NettyBlockTransferService level
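As a rough illustration of what "port retries" at the service level can look like, the following Python sketch tries successive ports until a bind succeeds. The function name `bind_with_retries` and its parameters are assumptions made for this example; it is not Spark's or Netty's API.

```python
import socket


def bind_with_retries(start_port: int, max_retries: int, host: str = "127.0.0.1"):
    """Try start_port, start_port + 1, ... until a bind succeeds.

    Hypothetical helper: the retry loop lives in the caller that owns the
    service, so the low-level server itself only ever binds once.
    """
    last_error = None
    for offset in range(max_retries + 1):
        port = start_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError as err:
            # Port is unavailable; release the socket and try the next one.
            sock.close()
            last_error = err
    raise OSError(f"could not bind after {max_retries} retries") from last_error
```

Keeping the loop out of the lower-level server means other users of that server, such as a shuffle service with a fixed port, fail fast instead of silently moving to a different port.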
Commit ffdc40c
-
[SPARK-7469] [SQL] DAG visualization: show SQL query operators
The DAG visualization currently displays only low-level Spark primitives (e.g. `map`, `reduceByKey`, `filter`). For SQL, these aren't particularly useful. Instead, we should display higher-level physical operators (e.g. `Filter`, `Exchange`, `ShuffleHashJoin`). cc marmbrus **Before**: https://issues.apache.org/jira/secure/attachment/12731586/before.png **After** (pay attention to the words): https://issues.apache.org/jira/secure/attachment/12731587/after.png Author: Andrew Or <andrew@databricks.com> Closes apache#5999 from andrewor14/dag-viz-sql and squashes the following commits: 0db23a4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-sql 1e211db [Andrew Or] Update comment 0d49fd6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-sql ffd237a [Andrew Or] Fix style 202dac1 [Andrew Or] Make ignoreParent false by default e61b1ab [Andrew Or] Visualize SQL operators, not low-level Spark primitives 569034a [Andrew Or] Add a flag to ignore parent settings and scopes
Andrew Or committed May 9, 2015
Commit bd61f07
-
[SPARK-7237] Clean function in several RDD methods
Author: tedyu <yuzhihong@gmail.com> Closes apache#5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex 56d7c92 [tedyu] Consolidate import of Random f6014c0 [tedyu] Remove cleaning in RDD#filterWith 36feb6c [tedyu] Try to get correct syntax 55d01eb [tedyu] Try to get correct syntax c2786df [tedyu] Correct syntax d92bfcf [tedyu] Correct syntax in test 164d3e4 [tedyu] Correct variable name 8b50d93 [tedyu] Address Andrew's review comments 0c8d47e [tedyu] Add test for mapWith() 6846e40 [tedyu] Add test for flatMapWith() 6c124a9 [tedyu] Clean function in several RDD methods
Commit 54e6fa0
-
[SPARK-7488] [ML] Feature Parity in PySpark for ml.recommendation
Adds a Python API for `ALS` under `ml.recommendation` in PySpark. Also adds seed as a settable parameter in the Scala implementation of ALS. Author: Burak Yavuz <brkyvz@gmail.com> Closes apache#6015 from brkyvz/ml-rec and squashes the following commits: be6e931 [Burak Yavuz] addressed comments eaed879 [Burak Yavuz] readd numFeatures 0bd66b1 [Burak Yavuz] fixed seed 7f6d964 [Burak Yavuz] merged master 52e2bda [Burak Yavuz] added ALS
Commit 84bf931
-
[SPARK-7451] [YARN] Preemption of executors is counted as failure cau…
…sing Spark job to fail Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on. andrewor14 Author: Ashwin Shankar <ashankar@netflix.com> Closes apache#5993 from ashwinshankar77/SPARK-7451 and squashes the following commits: 90900cf [Ashwin Shankar] Fix log info message cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
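The check described above can be sketched roughly as follows. This is a hypothetical Python model, not the YARN allocator code: the class `ExecutorFailureTracker` is invented for illustration, and the constant `-102` is assumed to match YARN's `ContainerExitStatus.PREEMPTED`.

```python
# Assumed to mirror YARN's ContainerExitStatus.PREEMPTED; treat as illustrative.
PREEMPTED = -102


class ExecutorFailureTracker:
    """Hypothetical tracker that ignores preempted containers."""

    def __init__(self, max_failures: int):
        self.max_failures = max_failures
        self.failures = 0
        self.log = []

    def on_container_completed(self, container_id: str, exit_status: int) -> None:
        if exit_status == 0:
            return  # clean exit, nothing to record
        if exit_status == PREEMPTED:
            # Preemption is the scheduler's decision, not an executor fault:
            # log at INFO level and move on without counting a failure.
            self.log.append(f"INFO container {container_id} was preempted")
            return
        self.failures += 1
        self.log.append(f"WARN container {container_id} exited with {exit_status}")

    def exceeded(self) -> bool:
        return self.failures > self.max_failures


tracker = ExecutorFailureTracker(max_failures=1)
for cid in ("c1", "c2", "c3"):
    tracker.on_container_completed(cid, PREEMPTED)
preempted_only_failures = tracker.failures
tracker.on_container_completed("c4", 1)
tracker.on_container_completed("c5", 1)
```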
Commit b6c797b
-
[SPARK-7231] [SPARKR] Changes to make SparkR DataFrame dplyr friendly.
Changes include 1. Rename sortDF to arrange 2. Add new aliases `group_by`, `sample_frac`, and `summarize` 3. Add more user-friendly column addition (mutate) and rename 4. Support mean as an alias for avg in Scala and also support n_distinct, n as in dplyr Using these changes we can pretty much run the examples as described in http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html with the same syntax. The only thing missing in SparkR is automatic resolution of column names when used in an expression, i.e. something like `select(flights, delay)` works in dplyr, but right now we need `select(flights, flights$delay)` or `select(flights, "delay")`. This is a complicated change, though, and I'll file a new issue for it. cc sun-rui rxin Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes apache#6005 from shivaram/sparkr-df-api and squashes the following commits: 5e0716a [Shivaram Venkataraman] Fix some roxygen bugs 1254953 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into sparkr-df-api 0521149 [Shivaram Venkataraman] Changes to make SparkR DataFrame dplyr friendly. Changes include 1. Rename sortDF to arrange 2. Add new aliases `group_by` and `sample_frac`, `summarize` 3. Add more user friendly column addition (mutate), rename 4. Support mean as an alias for avg in Scala and also support n_distinct, n as in dplyr
Commit 0a901dd
-
[SPARK-7375] [SQL] Avoid row copying in exchange when sort.serializeM…
…apOutputs takes effect This patch refactors the SQL `Exchange` operator's logic for determining whether map outputs need to be copied before being shuffled. As part of this change, we'll now avoid unnecessary copies in cases where sort-based shuffle operates on serialized map outputs (as in apache#4450 / SPARK-4550). This patch also includes a change to copy the input to RangePartitioner partition bounds calculation, which is necessary because this calculation buffers mutable Java objects. Review on Reviewable: https://reviewable.io/reviews/apache/spark/5948 Author: Josh Rosen <joshrosen@databricks.com> Closes apache#5948 from JoshRosen/SPARK-7375 and squashes the following commits: f305ff3 [Josh Rosen] Reduce scope of some variables in Exchange 899e1d7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-7375 6a6bfce [Josh Rosen] Fix issue related to RangePartitioning: ad006a4 [Josh Rosen] [SPARK-7375] Avoid defensive copying in exchange operator when sort.serializeMapOutputs takes effect.
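The reason a defensive copy is needed whenever rows are buffered can be shown with a small Python sketch. The generator below is a hypothetical stand-in for an operator that reuses one mutable row object; it is not Spark code. If each row is serialized immediately, no copy is needed, but anything that holds on to the objects must copy them first.

```python
def rows_reusing_buffer(values):
    """Yield the same mutable row object each time, overwritten in place."""
    row = [None]
    for v in values:
        row[0] = v
        yield row


# Buffering without copying: every entry is the same object,
# so all of them show the last value written.
buffered_without_copy = list(rows_reusing_buffer([1, 2, 3]))

# Buffering with a defensive copy: each entry keeps its own value.
buffered_with_copy = [list(r) for r in rows_reusing_buffer([1, 2, 3])]

# Consuming each row immediately (e.g. serializing it) needs no copy.
serialized = [repr(r) for r in rows_reusing_buffer([1, 2, 3])]
```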
Commit cde5483
-
[SPARK-7262] [ML] Binary LogisticRegression with L1/L2 (elastic net) …
…using OWLQN in new ML package 1) Handle scaling and addBias internally. 2) L1/L2 elasticnet using OWLQN optimizer. Author: DB Tsai <dbt@netflix.com> Closes apache#5967 from dbtsai/lor and squashes the following commits: fa029bb [DB Tsai] made the bound smaller 0806002 [DB Tsai] better initial intercept and more test 5c31824 [DB Tsai] fix import c387e25 [DB Tsai] Merge branch 'master' into lor c84e931 [DB Tsai] Made MultiClassSummarizer private f98e711 [DB Tsai] address feedback a784321 [DB Tsai] fix style 8ec65d2 [DB Tsai] remove new line f3f8c88 [DB Tsai] add more tests and they match R which is good. fix a bug 34705bc [DB Tsai] first commit
Commit 86ef4cf
-
[SPARK-7498] [ML] removed varargs annotation from Params.setDefaults
In SPARK-7429 and PR apache#5960, I added the varargs annotation to Params.setDefault which takes a variable number of ParamPairs. It worked locally and on Jenkins for me. However, mengxr reported issues compiling on his machine. So I'm reverting the change introduced in apache#5960 by removing varargs. Author: Joseph K. Bradley <joseph@databricks.com> Closes apache#6021 from jkbradley/revert-varargs and squashes the following commits: 098ed39 [Joseph K. Bradley] removed varargs annotation from Params.setDefaults taking multiple ParamPairs
Commit 2992623
-
[SPARK-7438] [SPARK CORE] Fixed validation of relativeSD in countAppr…
…oxDistinct Author: Vinod K C <vinod.kc@huawei.com> Closes apache#5974 from vinodkc/fix_countApproxDistinct_Validation and squashes the following commits: 3a3d59c [Vinod K C] Reverted removal of validation relativeSD<0.000017 799976e [Vinod K C] Removed testcase to assert IAE when relativeSD>3.7 8ddbfae [Vinod K C] Remove blank line b1b00a3 [Vinod K C] Removed relativeSD validation from python API,RDD.scala will do validation 122d378 [Vinod K C] Fixed validation of relativeSD in countApproxDistinct
Commit dda6d9f
-
[SPARK-7403] [WEBUI] Link URL in objects on Timeline View is wrong in…
… case of running on YARN When we use Spark on YARN and access AllJobPage via the ResourceManager's proxy, the link URL in the objects that represent each job on the timeline view is wrong. In timeline-view.js, the link is generated as follows: `window.location.href = "job/?id=" + getJobId(this);` This assumes that the URL displayed in the web browser ends with "jobs/", but when we access AllJobPage via the proxy, the displayed URL does not end with "jobs/". The proxy doesn't return status code 301 or 302, so the displayed URL still indicates the base URL, not "/jobs", even though AllJobPage is displayed. ![2015-05-07 3 34 37](https://cloud.githubusercontent.com/assets/4736016/7501079/a8507ad6-f46c-11e4-9bed-62abea170f4c.png) Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes apache#5947 from sarutak/fix-link-in-timeline and squashes the following commits: aaf40e1 [Kousuke Saruta] Added Copyright for vis.js 01bee7b [Kousuke Saruta] Fixed timeline-view.js in order to get correct href
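The relative-link problem can be reproduced with standard URL resolution. The Python sketch below uses `urllib.parse.urljoin`, which resolves relative references much as a browser does; the host names and application ID are made up for the example.

```python
from urllib.parse import urljoin


def job_link(page_url: str, job_id: int) -> str:
    """Resolve the relative link "job/?id=N" against the displayed page URL."""
    return urljoin(page_url, f"job/?id={job_id}")


# Direct access: the displayed URL ends with "jobs/", so the link is correct.
direct = job_link("http://driver:4040/jobs/", 3)

# Via a proxy that serves the jobs page at the base URL without redirecting:
# the same relative link resolves one level too high and loses "jobs/".
proxied = job_link("http://rm:8088/proxy/application_1/", 3)
```

Here `direct` resolves to `http://driver:4040/jobs/job/?id=3`, while `proxied` resolves to `http://rm:8088/proxy/application_1/job/?id=3`, which lacks the `jobs/` segment.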
Commit 12b95ab
-
[STREAMING] [DOCS] Fix wrong url about API docs of StreamingListener
A small fix for the wrong URL of the API document (org.apache.spark.streaming.scheduler.StreamingListener). Author: dobashim <dobashim@oss.nttdata.co.jp> Closes apache#6024 from dobashim/master and squashes the following commits: ac9a955 [dobashim] [STREAMING][DOCS] Fix wrong url about API docs of StreamingListener
Commit 7d0f172
-
Commit d25a4aa
-
Commit d166afa
-
Commit f4e243f
Commits on May 10, 2015
-
Commit a8260e8
-
Commit f87ace6
-