Lightweight approach to set up Jupyter for pySpark on Hadoop/YARN cluster

A lightweight approach to start a Jupyter Notebook instance on a Spark cluster edge node.

Create a virtualenv

# use any directory you like
cd ~
# it is strongly recommended to create the virtualenv from the same
# Python version your Spark cluster uses
pip install --user virtualenv
python -m virtualenv venv
source venv/bin/activate
pip install jupyter
pip install pandas
# you can add any libs you want, but they will be available
# only on the edge node (driver), not on the executors
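For example, if your executors run /usr/bin/python3 (a placeholder path, not something this repo specifies), you can build the venv from that same interpreter so driver and executor Python versions match:

# placeholder interpreter path; match the Python your Spark executors use
virtualenv -p /usr/bin/python3 venv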

Clone this repo to your edge node

git clone https://github.com/ledovsky/set-up-jupyter-for-pyspark.git

Check out the start_jupyter.sh script and set your Jupyter working directory and SPARK_HOME in it.
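The script in the repo is the source of truth; the following is only a rough sketch of the kind of settings it needs (all paths are placeholders):

#!/usr/bin/env bash
# illustrative sketch only; adjust to your cluster
export SPARK_HOME=/opt/spark   # placeholder: your Spark installation
NOTEBOOK_DIR=~/notebooks       # placeholder: your Jupyter working directory

source ~/venv/bin/activate
cd "$NOTEBOOK_DIR"
jupyter notebook --no-browser --ip 0.0.0.0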

Run the script afterwards

cd ~/set-up-jupyter-for-pyspark
./start_jupyter.sh

Open a notebook. Set a proper PYSPARK_PYTHON (the Python interpreter for executors; it must be present on every cluster node) and JAVA_HOME if it is not already set.
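For example, a first notebook cell could look like this (both paths below are placeholders; use interpreters and JVMs that actually exist on your nodes):

import os

# Python that is present on every cluster node (placeholder path)
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
# set JAVA_HOME only if it is not already configured (placeholder path)
os.environ.setdefault("JAVA_HOME", "/usr/lib/jvm/java-8-openjdk-amd64")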

Try the sample cells; everything should work.
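A minimal smoke test along these lines (not necessarily the repo's exact sample cells, and assuming start_jupyter.sh has made pyspark importable) should run on YARN:

from pyspark.sql import SparkSession

# SPARK_HOME and the YARN configs are picked up from the environment
spark = (SparkSession.builder
         .master("yarn")
         .appName("jupyter-smoke-test")
         .getOrCreate())

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
print(df.count())  # expect 2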
