
Spark job with eventhubs in Databricks ends: Failed to find data source: eventhubs #284

Closed
ritazh opened this issue Mar 20, 2018 · 8 comments

ritazh (Member) commented Mar 20, 2018

Bug Report:
When running the following Spark job in Databricks:

val df = spark
  .readStream
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

  • Actual behavior
    ERROR Uncaught throwable from user code: java.lang.ClassNotFoundException: Failed to find data source: eventhubs. Please find packages at http://spark.apache.org/third-party-projects.html

  • Expected behavior
    The eventhubs data source should be found and the stream should load.

  • Spark version
    2.3.0

cc @erikschlegel

ritazh (Member, Author) commented Mar 20, 2018

This was resolved by calling .format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider") directly.
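For reference, the workaround in full looks like this (a sketch; `spark` and `ehConf` are as in the original report, and a live cluster with the connector attached is assumed):

```scala
// Workaround: reference the source provider class directly instead of the
// short name "eventhubs". This bypasses the DataSourceRegister short-name
// lookup, which is what fails when the class cannot be resolved.
val df = spark
  .readStream
  .format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider")
  .options(ehConf.toMap)
  .load()
```

The short name and the fully-qualified class resolve to the same provider, so the resulting stream is identical either way.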

sabeegrewal (Contributor) commented

Hey @ritazh, this is an issue with how your Databricks cluster is configured. The most common cause is multiple versions of the library being attached to the cluster. I'd recommend:

  • Detach (but don't delete) the Maven package
  • Restart your cluster
  • Confirm in the "Libraries" tab of your cluster that there is no mention of the Maven package
  • Re-attach a single package

Then format("eventhubs") will work 👍 It works for me on DBR 3.5 and 4.0!

sabeegrewal (Contributor) commented

Thanks for your interest in the connector :) Please let me know if anything else comes up! If not, feel free to close out the issue.

@sabeegrewal sabeegrewal self-assigned this Mar 21, 2018
ManjunathGuntha commented

Hey @sabeegrewal, I am facing the same issue. In my case I am building the JAR locally using IntelliJ and running it on a Databricks cluster using spark-submit. Can you please let me know what the issue could be?

sabeegrewal (Contributor) commented

@ManjunathGuntha follow the steps I listed above, just with the JAR you're building locally. You have too many copies of EventHubsSourceProvider available on the cluster.

ganges-morekonda commented

I'm experiencing the same issue. I am trying to connect to an IoT Hub (Event Hub-compatible endpoint) from a Jupyter Notebook.

ehConf = { 'eventhubs.connectionString' : eventHubNSConnStr }
df = spark.readStream.format('eventhubs').options(**ehConf).load()

An error occurred while calling o196.load.
: java.lang.ClassNotFoundException: Failed to find data source: eventhubs. Please find packages at http://spark.apache.org/third-party-projects.html

Any help will be much appreciated!

kawofong commented

@ganges-morekonda
Your cluster cannot find the appropriate package. I resolved the same issue by installing com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.9 on my cluster. Hope this helps!
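For setups outside the Databricks library UI (e.g. the spark-submit case mentioned above), the equivalent is to pass the Maven coordinate at submit time. A sketch, where app.py stands in for your application and the coordinate is the one from this thread:

```shell
# Pull the connector from Maven Central at submit time so the "eventhubs"
# data source is on the classpath. The _2.11 suffix assumes a Scala 2.11
# build of Spark; match it to your cluster's Scala/Spark version.
spark-submit \
  --packages com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.9 \
  app.py
```
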

ytaous commented Oct 11, 2019

I am having the same issue. The only workaround for me is the one posted by Rita: calling .format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider") directly.
