Info Logs leak the eventhub connection string #484

Closed
alex-shchetkov opened this issue Apr 1, 2020 · 2 comments
@alex-shchetkov

Summary: Full eventhub connection string is being printed in the INFO log during Microbatch Execution

  • Actual behavior:
    Logger - org.apache.spark.sql.execution.streaming.MicroBatchExecution
    Message - "Using Source [org.apache.spark.sql.eventhubs.EventHubsSource@1e59e6ae] from DataSourceV1 named 'eventhubs' [DataSource(org.apache.spark.sql.SparkSession@b4c1fb2,eventhubs,List(),None,List(),None,Map(eventhubs.consumerGroup -> my-group, eventhubs.connectionString -> Endpoint=sb://my-eventhub.servicebus.windows.net/;SharedAccessKeyName=msl;SharedAccessKey=actual-key-is-here;EntityPath=topic-name, eventhubs.startingPosition -> {"enqueuedTime": "2020-03-30T00:00:00.0000Z", "isInclusive": true}),None)]"

  • Expected behavior
    The connection string should not be printed in any logs at any time

  • Spark version
    2.4.4

  • spark-eventhubs artifactId and version
    com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.14.1

@nyaghma self-assigned this Apr 3, 2020
@sjkwak
Member

sjkwak commented Apr 4, 2020

@alex-shchetkov - the log message comes from the Spark runtime. I believe this logging was added in Spark 2.4; it logs the contents of the options (e.g., the eventhub config) that are set when initializing a Spark session.

@spzSource

spzSource commented Jan 16, 2021

Hi @nyaghma and @sjkwak

I have to raise a concern about #491.

In that PR, connection strings are encrypted with AES using the library version as the key. This is completely insecure, since anybody can decrypt the value. Hence the applied fix does not solve the initial issue; it just hides it.
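To illustrate the concern, here is a minimal sketch of why a key derived from a public value protects nothing. It is not the code from the PR: `derive_key` and `xor_stream` are hypothetical helpers, and a toy XOR stream cipher built on `hashlib` stands in for AES because the Python standard library has no AES. The version string is the one from this issue.

```python
import hashlib

# Public value, visible to anyone who can see the artifact coordinates.
LIBRARY_VERSION = "2.3.14.1"


def derive_key(public_value: str) -> bytes:
    # Hypothetical key derivation: the key depends only on public information.
    return hashlib.sha256(public_value.encode("utf-8")).digest()


def xor_stream(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher (NOT AES and not secure): XOR the data with a
    # SHA-256 based keystream. Applying it twice with the same key
    # returns the original data.
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, keystream))


secret = b"SharedAccessKey=actual-key-is-here"
ciphertext = xor_stream(secret, derive_key(LIBRARY_VERSION))

# Someone who only has the ciphertext and knows the published version
# can rebuild the same key and recover the secret.
recovered = xor_stream(ciphertext, derive_key("2.3.14.1"))
print(recovered == secret)
```

The same reasoning applies to any symmetric cipher, AES included: if the key can be reconstructed from public information, the ciphertext is only obfuscated.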

Also, by forcing users to encrypt the connection string, it creates additional trouble when using this library under .NET for Apache Spark (a.k.a. the .NET backend), where it's not possible to directly access JVM methods.

To be precise, the following code doesn't work in the .NET backend for Spark:

sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)

This forces users to re-implement the encrypt logic in .NET (and in other Spark backends), which is quite odd.

My view is that the correct fix would be a change to the Spark runtime, so that Spark itself does not log sensitive content.
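As a rough sketch of the kind of masking meant here (not Spark's actual code; `redact_connection_string` is a hypothetical helper), the secret part of the connection string could be replaced before the options map is rendered into a log message:

```python
import re

# Matches only the secret value. "SharedAccessKeyName=" is left alone because
# "=" must directly follow "SharedAccessKey" for the pattern to match.
_KEY_PATTERN = re.compile(r"(SharedAccessKey=)[^;,\s]+")


def redact_connection_string(text: str, mask: str = "***") -> str:
    # Keep the "SharedAccessKey=" prefix and replace the value with the mask.
    return _KEY_PATTERN.sub(lambda m: m.group(1) + mask, text)


message = (
    "Map(eventhubs.consumerGroup -> my-group, "
    "eventhubs.connectionString -> "
    "Endpoint=sb://my-eventhub.servicebus.windows.net/;"
    "SharedAccessKeyName=msl;SharedAccessKey=actual-key-is-here;"
    "EntityPath=topic-name)"
)
redacted = redact_connection_string(message)
print(redacted)
```

With this approach the endpoint, key name, and entity path stay readable for debugging, while the key itself never reaches the log.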
