Create a getting started on K8s page #1932
This reverts commit f80cb6a. Rollback changes to README
Add Version Matrix in a spreadsheet.
Add Version Matrix in a spreadsheet. Signed-off-by: Hao Zhu <viadeazhu@gmail.com>
This reverts commit 744bb42.
This is a new getting-started-kubernetes.md with more examples and details for a quick-start.
Fixed one typo
I assume this is meant to replace https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#running-on-kubernetes? If so, we probably need to remove that section.
Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
reword Co-authored-by: Jason Lowe <jlowe@nvidia.com>
OK, I think the doc is fine, but how does it fit in with the existing docs? Why don't we just remove that section here? Or were you going to do that in a follow-up? If so, the follow-up needs to be done right away so we don't forget about it.
build
I was able to go through the documentation and create a K8s cluster, but it looks like the GPU Operator plugin isn't working properly. I tried both the DeepOps deployment several times (on a desktop) and the alternative option of using kubeadm + Helm (on top of a new Ubuntu 18.04 install on AWS EC2). I had trouble getting Ansible to authenticate on EC2, so I went ahead with kubeadm + Helm. Here are bug reports similar to what I encountered with the GPU Operator plugin: NVIDIA/gpu-operator#83, NVIDIA/gpu-operator#166. spark-shell can be started, but it complains that no resources are available when running GPU-enabled tasks. The Spark worker web UI on port 4040 confirms that no resources were assigned. Are there alternatives to using the GPU Operator? Before adding the GPU Operator plugin, I confirmed that my Dockerfile works with a CPU-only Spark run on K8s.
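For anyone hitting the same symptom, one way to narrow it down is to check whether the nodes advertise any `nvidia.com/gpu` resource to the scheduler at all. The snippet below is a minimal sketch, not part of the PR: it assumes the JSON produced by `kubectl get nodes -o json` has already been loaded, and the node names and sample data are hypothetical.

```python
import json

# Resource name advertised by the NVIDIA device plugin.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(node_list: dict) -> dict:
    """Map each node name to its allocatable GPU count.

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    Nodes that do not advertise the GPU resource are reported as 0.
    """
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports resource quantities as strings, e.g. "1".
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts


# Hypothetical sample standing in for real `kubectl get nodes -o json` output.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "gpu-node"},
     "status": {"allocatable": {"cpu": "8", "nvidia.com/gpu": "1"}}},
    {"metadata": {"name": "cpu-node"},
     "status": {"allocatable": {"cpu": "8"}}}
  ]
}
""")

for node_name, gpus in gpu_allocatable(sample).items():
    print(f"{node_name}: {gpus} allocatable GPU(s)")
```

If every node reports 0, the problem is most likely in the device plugin or driver setup on the cluster rather than in the Spark configuration.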
As discussed, how to install a working K8s cluster with NVIDIA GPU support is outside the scope of this article. Regarding the error message such as [...] Of course, you can refer to the K8s documentation for more K8s cluster management tasks, such as how to create a service account, how to create a ClusterRoleBinding, how to assign Resource Quotas, etc.
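As an illustration of the service account and ClusterRoleBinding tasks mentioned above, here is a minimal sketch that builds the two manifests as plain Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts. The names `spark`, `spark-role`, the `default` namespace, and the built-in `edit` ClusterRole are assumptions borrowed from the example in the Apache Spark "Running on Kubernetes" documentation; your cluster may need different names or narrower permissions.

```python
import json


def spark_rbac_manifests(namespace="default", account="spark",
                         binding="spark-role", cluster_role="edit"):
    """Build a ServiceAccount and a ClusterRoleBinding for the Spark driver.

    All default names are illustrative assumptions, not values from this PR.
    """
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": account, "namespace": namespace},
    }
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": binding},
        # Grant the existing ClusterRole to the service account above.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": account, "namespace": namespace}
        ],
    }
    # Wrap both objects in a v1 List so they can be applied in one step.
    return {"apiVersion": "v1", "kind": "List",
            "items": [service_account, role_binding]}


manifest_json = json.dumps(spark_rbac_manifests(), indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, after which the Spark job would reference the account through `spark.kubernetes.authenticate.driver.serviceAccountName`.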
First of all, the GPU Operator is not a must for testing this Spark feature. Today I also tried using DeepOps to deploy a K8s cluster on a single EC2 machine, and it works fine for me as well. For example, when I was testing a K8s install with DeepOps, assuming we use all default settings,
Here the [...] Eventually, the root cause turned out to be the entry below, which Ansible added to /etc/hosts:
After removing the above entries from /etc/hosts, I am pretty sure there could be more K8s-related issues, but this is just my quick summary of using DeepOps to install a K8s cluster on a single EC2 instance.
* doing some test * Revert "doing some test" This reverts commit f80cb6a. Rollback changes to README * Update download.md Add Version Matrix in a spreadsheet. * Update download.md Add Version Matrix in a spreadsheet. Signed-off-by: Hao Zhu <viadeazhu@gmail.com> * Revert "Update download.md" This reverts commit 744bb42. * Create getting-started-kubernetes.md This is a new getting-started-kubernetes.md with more examples and details for a quick-start. * Fixed one typo Fixed one typo * Update docs/get-started/getting-started-kubernetes.md Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo. Co-authored-by: Jason Lowe <jlowe@nvidia.com> * changed nav_order to 6 * Update docs/get-started/getting-started-kubernetes.md Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md Fixed typo Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md re-word. 
Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md fix typo Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * create "To delete the Driver POD" section create "To delete the Driver POD" section * Add a note. "This is a quick start guide which uses default settings which may be different from your cluster." 
* add spark.kubernetes.memoryOverheadFactor=0.6 add spark.kubernetes.memoryOverheadFactor=0.6 * Changed to spark.executor.memoryOverhead=3G Changed to spark.executor.memoryOverhead=3G * Added a note to explain the jar location * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * reword reword * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> * Update docs/get-started/getting-started-kubernetes.md reword Co-authored-by: Jason Lowe <jlowe@nvidia.com> 
Co-authored-by: Jason Lowe <jlowe@nvidia.com>