AWS EMR Spark

Steps for Deploying Spark App on Amazon EMR

Step 1

Test your application locally in Scala IDE using sample data.

Step 2
- Remove all local file paths and any local master setting on the Spark context (such as setMaster("local[*]")) from the Scala file.
- Use SBT to package your application
  - Create an empty directory sbt
  - Inside it, run sbt new scala/hello-world.g8
  - Add your Scala files under the sbt\movies\src\main\scala directory
  - Edit sbt\movies\build.sbt
- ```
name := "MostRatedMovies100k"

version := "1.0"

organization := "com.forsynet.sparkemr"

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.5" % "provided"
)
```
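- At the command prompt, run "sbt assembly". The task comes from the sbt-assembly plugin, which must be declared in sbt\movies\project\plugins.sbt.
- This creates a single jar containing the application and its non-provided dependencies at sbt\movies\target\scala-2.11\MostRatedMovies100k-assembly-1.0.jar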
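
The packaging step above depends on the sbt-assembly plugin being registered for the project. A minimal sketch of that setup, assuming the project root is sbt/movies as in the layout above and assuming plugin version 0.14.10 (the version is a guess on our part; check which release matches your sbt version):

```shell
# Assumed project root, matching the directory layout described above.
PROJECT_DIR="${PROJECT_DIR:-sbt/movies}"

# sbt reads plugin declarations from the project/ subdirectory.
mkdir -p "$PROJECT_DIR/project"

# Version 0.14.10 is an assumption; use the release that matches your sbt version.
cat > "$PROJECT_DIR/project/plugins.sbt" <<'EOF'
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
EOF

echo "Plugin registered. Next: cd $PROJECT_DIR && sbt assembly"
```

Because spark-core is marked "provided" in build.sbt, the assembled jar leaves Spark itself out and relies on the Spark installation on the cluster.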

Step 3
- Upload the jar file and the data files to an S3 bucket. Use the UI or the CLI commands below.
- ```
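# Set credentials and the default region
aws configure
# Create the bucket (outside us-east-1, also pass --create-bucket-configuration LocationConstraint=<region>)
aws s3api create-bucket --bucket rupeshemr
# Upload the data files; the data/ prefix in the bucket is an assumed destination
aws s3 sync data/ s3://rupeshemr/data/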
```
- Verify that the data was uploaded to the S3 bucket
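
The upload and the verification can also be scripted. The sketch below is one possible approach, not the only one: it wraps the commands in two helper functions (upload_jar and verify_upload are names made up for this example) so that nothing runs until you call them. It assumes the jar is stored under the shorter name that Step 6 later copies from the bucket.

```shell
BUCKET="rupeshemr"
# Jar produced by "sbt assembly" in Step 2 (forward slashes for a Unix-style shell).
JAR="sbt/movies/target/scala-2.11/MostRatedMovies100k-assembly-1.0.jar"

# Upload the jar under the name that Step 6 expects (assumed rename).
upload_jar() {
    aws s3 cp "$JAR" "s3://$BUCKET/MostRatedMovies-1.0.jar"
}

# List everything in the bucket with sizes and a total object count.
verify_upload() {
    aws s3 ls "s3://$BUCKET/" --recursive --human-readable --summarize
}

# Usage:
#   upload_jar
#   verify_upload
```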

Create an Amazon EMR cluster

Step 4
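Use the AWS CLI or the UI to create the cluster. The security configuration and the Kerberos attributes file referenced below must already exist, and Spark has to be requested explicitly with --applications.

aws emr create-cluster \
    --instance-type m3.xlarge \
    --release-label emr-5.10.0 \
    --applications Name=Spark \
    --service-role EMR_DefaultRole \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \
    --security-configuration mySecurityConfiguration \
    --kerberos-attributes file://kerberos_attributes.json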
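
For a cluster that does not need Kerberos, a variant of the command can be sketched as below. The key pair name, cluster name, instance count and log prefix are placeholders chosen for this example, and the two function names are hypothetical. Attaching a key pair is what makes the SSH login in Step 6 possible, and the log URI determines where the job logs land in S3.

```shell
# Hypothetical values; replace with your own key pair and bucket.
KEY_NAME="my-emr-key"
LOG_BUCKET="rupeshemr"

# Create a three-node cluster with Spark installed and an EC2 key pair attached.
create_spark_cluster() {
    aws emr create-cluster \
        --name "spark-movies" \
        --release-label emr-5.10.0 \
        --applications Name=Spark \
        --instance-type m3.xlarge \
        --instance-count 3 \
        --service-role EMR_DefaultRole \
        --ec2-attributes "KeyName=$KEY_NAME,InstanceProfile=EMR_EC2_DefaultRole" \
        --log-uri "s3://$LOG_BUCKET/logs/"
}

# Print the current cluster state (for example STARTING, RUNNING or WAITING).
# $1: the cluster id printed by create-cluster
cluster_state() {
    aws emr describe-cluster --cluster-id "$1" \
        --query 'Cluster.Status.State' --output text
}
```

A cluster in the WAITING state is up and idle, which is the point at which the job can be submitted.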

Verify that the cluster was created.

Step 5
Add an SSH inbound rule (TCP port 22) to the security group of the master node.
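
From the CLI, the inbound rule can be added roughly as follows. This is a sketch: allow_ssh is a helper name invented for this example, and the group id and address in the usage line are placeholders. Restricting the rule to your own address is safer than opening port 22 to everyone.

```shell
# Open TCP port 22 on a security group for a single CIDR range.
# $1: security group id of the master node
# $2: CIDR range allowed to connect
allow_ssh() {
    aws ec2 authorize-security-group-ingress \
        --group-id "$1" \
        --protocol tcp \
        --port 22 \
        --cidr "$2"
}

# Usage (placeholder values):
#   allow_ssh sg-0123456789abcdef0 203.0.113.10/32
```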

Step 6
SSH into the EMR master node and copy the jar file from the S3 bucket:

aws s3 cp s3://rupeshemr/MostRatedMovies-1.0.jar ./
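
Logging in to the master node can be sketched as below, assuming the cluster was created with an EC2 key pair. The helper names master_dns and ssh_master, the cluster id and the key path are placeholders for this example; hadoop is the default login user on EMR nodes.

```shell
# Look up the public DNS name of the master node.
# $1: cluster id
master_dns() {
    aws emr describe-cluster --cluster-id "$1" \
        --query 'Cluster.MasterPublicDnsName' --output text
}

# Open an SSH session on the master node.
# $1: path to the private key file, $2: master public DNS name
ssh_master() {
    ssh -i "$1" "hadoop@$2"
}

# Usage (placeholder values):
#   ssh_master ~/my-emr-key.pem "$(master_dns j-XXXXXXXXXXXXX)"
```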

Submit the Spark Job

Step 7
- ```
spark-submit MostRatedMovies-1.0.jar
```
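
If the jar manifest does not name a main class, or you prefer to state the resource manager explicitly, a more verbose submission can be sketched like this. The helper name submit_job and the class name in the usage line are hypothetical (the class name is derived from the organization in build.sbt). Client deploy mode keeps the driver output in your terminal, which makes the results easy to check.

```shell
# Submit a Spark application to YARN with an explicit entry point.
# $1: fully qualified main class
# $2: path to the application jar
submit_job() {
    spark-submit \
        --class "$1" \
        --master yarn \
        --deploy-mode client \
        "$2"
}

# Usage (the class name is an assumption):
#   submit_job com.forsynet.sparkemr.MostRatedMovies MostRatedMovies-1.0.jar
```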
Verify the top-rated movies in the job output.
Use the Spark History Server UI (port 18080 on the master node) to review the history of the submitted job.

Check the Amazon S3 bucket for the logs created for the job.
