Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KYUUBI #6754] Improve the performance of ranger access requests #6758

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

wankunde
Copy link

@wankunde wankunde commented Oct 18, 2024

🔍 Description

Issue References 🔗

This pull request fixes #6754

Describe Your Solution 🔧

Right now in RuleAuthorization we use an ArrayBuffer to collect access requests, which is very slow because each new PrivilegeObject needs to be compared with all access requests.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Before

Behavior With This Pull Request 🎉

After

Related Unit Tests

Add benchmark
Before

Java HotSpot(TM) 64-Bit Server VM 17.0.12+8-LTS-286 on Mac OS X 14.6
Apple M3
Collecting files ranger access request:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
50000 files benchmark                            181863         189434         NaN         -0.0 -181863368958.0       1.0X

After

Java HotSpot(TM) 64-Bit Server VM 17.0.12+8-LTS-286 on Mac OS X 14.6
Apple M3
Collecting files ranger access request:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
50000 files benchmark                              1332           1372          44         -0.0 -1331675209.0       1.0X

Checklist 📝

Be nice. Be informative.

extensions/spark/kyuubi-spark-authz/pom.xml Outdated Show resolved Hide resolved
extensions/spark/kyuubi-spark-authz/pom.xml Outdated Show resolved Hide resolved
@@ -34,23 +35,22 @@ case class RuleAuthorization(spark: SparkSession) extends Authorization(spark) {
val auditHandler = new SparkRangerAuditHandler
val ugi = getAuthzUgi(spark.sparkContext)
val (inputs, outputs, opType) = PrivilegesBuilder.build(plan, spark)
val requests = new ArrayBuffer[AccessRequest]()
var requests = new HashMap[(AccessResource, AccessType), AccessRequest]()
Copy link
Member

@pan3793 pan3793 Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why immutable.HashMap instead of mutable.HashMap is used here?

BTW, Scala 2.13 changed the alias of HashMap from mutable.HashMap to immutable.HashMap, to make the code clear, please write new immutable.HashMap or new mutable.HashMap inline

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and please leave a few comments here to explain the background

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and, was immutable.Map.newBuilder considered?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Change to mutable.HashMap
  2. Added some comments
  3. I'm not sure if immutable.Map.newBuilder is better than mutable.HashMap?

}

// scalastyle:on
test("KYUUBI #6754: improve the performance of ranger access requests") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

benchmark cases are not expected to run in regular test workflow, I remember we ported Spark benchmark framework, can we write it in that style?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added benchmark, thanks

@wankunde wankunde changed the title [KYUUBI #6754] Improve the performance of ranger access requests [WIP][KYUUBI #6754] Improve the performance of ranger access requests Oct 20, 2024
@wankunde wankunde changed the title [WIP][KYUUBI #6754] Improve the performance of ranger access requests [KYUUBI #6754] Improve the performance of ranger access requests Oct 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Improve the performance of ranger access requests
2 participants