This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala.

spark-examples/spark-scala-examples

Explanations of all the Spark SQL, RDD, DataFrame and Dataset examples in this project are available at https://sparkbyexamples.com/. All of these examples are written in Scala and tested in our development environment.

Table of Contents (Spark Examples in Scala)

Spark RDD Examples

  • Create a Spark RDD using Parallelize
  • Spark – Read multiple text files into single RDD?
  • Spark load CSV file into RDD
  • Different ways to create Spark RDD
  • Spark – How to create an empty RDD?
  • Spark RDD Transformations with examples
  • Spark RDD Actions with examples
  • Spark Pair RDD Functions
  • Spark Repartition() vs Coalesce()
  • Spark Shuffle Partitions
  • Spark Persistence Storage Levels
  • Spark RDD Cache and Persist with Example
  • Spark Broadcast Variables
  • Spark Accumulators Explained
  • Convert Spark RDD to DataFrame | Dataset

Spark SQL Tutorial

  • Spark Create DataFrame with Examples
  • Spark DataFrame withColumn
  • Ways to Rename column on Spark DataFrame
  • Spark – How to Drop a DataFrame/Dataset column
  • Working with Spark DataFrame Where Filter
  • Spark SQL “case when” and “when otherwise”
  • Collect() – Retrieve data from Spark RDD/DataFrame
  • Spark – How to remove duplicate rows
  • How to Pivot and Unpivot a Spark DataFrame
  • Spark SQL Data Types with Examples
  • Spark SQL StructType & StructField with examples
  • Spark schema – explained with examples
  • Spark Groupby Example with DataFrame
  • Spark – How to Sort DataFrame column explained
  • Spark SQL Join Types with examples
  • Spark DataFrame Union and UnionAll
  • Spark map vs mapPartitions transformation
  • Spark foreachPartition vs foreach | what to use?
  • Spark DataFrame Cache and Persist Explained
  • Spark SQL UDF (User Defined Functions)
  • Spark SQL DataFrame Array (ArrayType) Column
  • Working with Spark DataFrame Map (MapType) column
  • Spark SQL – Flatten Nested Struct column
  • Spark – Flatten nested array to single array column
  • Spark explode array and map columns to rows

Spark SQL Functions

  • Spark SQL String Functions Explained
  • Spark SQL Date and Time Functions
  • Spark SQL Array functions complete list
  • Spark SQL Map functions – complete list
  • Spark SQL Sort functions – complete list
  • Spark SQL Aggregate Functions
  • Spark Window Functions with Examples

Spark Data Source API

  • Spark Read CSV file into DataFrame
  • Spark Read and Write JSON file into DataFrame
  • Spark Read and Write Apache Parquet
  • Spark Read XML file using Databricks API
  • Read & Write Avro files using Spark DataFrame
  • Using Avro Data Files From Spark SQL 2.3.x or earlier
  • Spark Read from & Write to HBase table | Example
  • Create Spark DataFrame from HBase using Hortonworks
  • Spark Read ORC file into DataFrame
  • Spark 3.0 Read Binary File into DataFrame

Spark Streaming & Kafka

  • Spark Streaming – Different Output modes explained
  • Spark Streaming files from a directory
  • Spark Streaming – Reading data from TCP Socket
  • Spark Streaming with Kafka Example
  • Spark Streaming – Kafka messages in Avro format
  • Spark SQL Batch Processing – Produce and Consume Apache Kafka Topic
