Add SchedulerConfig for the scheduler configurations, like event_loop_buffer_size, finished_job_data_clean_up_interval_seconds, finished_job_state_clean_up_interval_seconds #472

yahoNanJing · 2022-10-28T13:04:29Z

Which issue does this PR close?

Closes #469.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

yahoNanJing · 2022-10-28T13:08:18Z

Hi @andygrove, @Dandandan, @avantgardnerio, could you help review this PR? It refactors the scheduler configurations so that we can easily add more scheduler-related configurations in the future.

avantgardnerio

I reviewed the config changes; they seem to solve an immediate problem without making the existing pattern any worse.
I didn't review the functional change (cleaning up finished job data and state); I assume that works.
Let's discuss the runtime HashMap-based config somewhere, and if we all agree on a resolution, update this issue.

avantgardnerio · 2022-10-29T01:40:18Z

ballista/core/src/config/query.rs

+
+impl BallistaConfig {
+    /// Create a configuration builder
+    pub fn builder() -> BallistaConfigBuilder {


This is more of a philosophical nit: I don't love the builder patterns that are propagating through the codebase. AFAICT, builders came from Java and were built on nullability and mutable state, because Java didn't have a using statement or a spread operator.

In Rust we have the equivalent of spread, so I think we could just do:

let my_config = ValidConfig { param_a, param_b, ..Default::default() };

To accomplish the same thing in a much more concise manner.
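For readers unfamiliar with the "spread" being referred to, here is a minimal sketch of Rust's struct update syntax. `ExampleConfig`, its fields, and its default values are hypothetical and are not taken from Ballista; the sketch only assumes the config type implements `Default`.

```rust
/// Hypothetical config type, used only to illustrate struct update syntax.
/// Field names and default values are assumptions, not Ballista's.
#[derive(Debug, Clone, PartialEq)]
struct ExampleConfig {
    batch_size: usize,
    shuffle_partitions: usize,
    job_name: String,
}

impl Default for ExampleConfig {
    fn default() -> Self {
        Self {
            batch_size: 8192,
            shuffle_partitions: 2,
            job_name: String::from("unnamed"),
        }
    }
}

/// Override two fields and take every other field from the defaults.
fn custom_config(batch_size: usize, job_name: &str) -> ExampleConfig {
    ExampleConfig {
        batch_size,
        job_name: job_name.to_string(),
        // Struct update syntax: remaining fields come from this value.
        ..ExampleConfig::default()
    }
}
```

Calling `custom_config(1024, "etl")` would yield a value whose `shuffle_partitions` is still the default, with no builder type or mutable intermediate state involved.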

avantgardnerio · 2022-10-29T01:45:33Z

ballista/core/src/config.rs

            default_value,
        }
    }
 }

-/// Ballista configuration builder
-pub struct BallistaConfigBuilder {
-    settings: HashMap<String, String>,


Oh, it looks like this was already a HashMap based thing. I guess this PR is refactoring that pattern?

avantgardnerio · 2022-10-29T01:49:10Z

ballista/core/src/config.rs


-    /// All available configuration options
-    pub fn valid_entries() -> HashMap<String, ConfigEntry> {


I feel like it would make more sense to just serde this once from untyped things like env vars into a typed struct and use that everywhere. A quick googling reveals: https://github.com/softprops/envy
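A rough sketch of the "parse once into a typed struct" idea, written against the standard library only so it stays self-contained. A crate such as envy would do this step through serde instead. `TypedSettings`, the environment variable names, the `u64` type, and the fallback values are all assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical typed settings; the field set, the `u64` type and the
/// fallback values below are illustrative assumptions.
#[derive(Debug, PartialEq)]
struct TypedSettings {
    event_loop_buffer_size: usize,
    finished_job_data_clean_up_interval_seconds: u64,
}

/// Look up one key and parse it, falling back to `default` when absent.
fn get_or<T: FromStr>(
    raw: &HashMap<String, String>,
    key: &str,
    default: T,
) -> Result<T, String> {
    match raw.get(key) {
        Some(v) => v
            .parse::<T>()
            .map_err(|_| format!("invalid value for {}: {}", key, v)),
        None => Ok(default),
    }
}

/// Convert untyped key/value pairs into the typed struct exactly once,
/// failing fast on values that do not parse.
fn parse_settings(raw: &HashMap<String, String>) -> Result<TypedSettings, String> {
    Ok(TypedSettings {
        event_loop_buffer_size: get_or(raw, "EVENT_LOOP_BUFFER_SIZE", 10_000)?,
        finished_job_data_clean_up_interval_seconds: get_or(
            raw,
            "FINISHED_JOB_DATA_CLEAN_UP_INTERVAL_SECONDS",
            300,
        )?,
    })
}

/// Read the process environment once at startup.
/// Note: `std::env::vars` panics if a variable is not valid Unicode.
fn settings_from_env() -> Result<TypedSettings, String> {
    let raw: HashMap<String, String> = std::env::vars().collect();
    parse_settings(&raw)
}
```

After startup, the rest of the code would only see `TypedSettings`, so string keys and parse errors cannot leak past the entry point.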

avantgardnerio · 2022-10-29T01:52:05Z

ballista/scheduler/src/main.rs

@@ -74,7 +79,7 @@ async fn start_server(
    addr: SocketAddr,
    scheduling_policy: TaskSchedulingPolicy,
    slots_policy: SlotsPolicy,
-    event_loop_buffer_size: usize,


This looks like the actual goal of the PR?

To aid in PR review, I think it can really help if the author comments on a few of their own lines of code to highlight things like this.

@daltonmadolin I think you're currently working on the same thing, PTAL

If merging this PR makes things no worse, and fixes the 8 arguments clippy issue, I'm fine with it.

yahoNanJing · 2022-10-29T03:18:28Z

Thanks @avantgardnerio for your comment. I'll look into https://github.com/softprops/envy later. As for the concern about using HashMap, I think we can raise a separate issue or PR if serde turns out to be better.

andygrove · 2022-10-30T16:50:18Z

ballista/core/src/config/query.rs

+
+/// Ballista configuration, mainly for the query
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct BallistaConfig {


What is the intent of moving the configs into a query namespace? Not all configs will be related to the execution of a single query.

The existing configurations all seem to be related to query execution, and may be used by both the scheduler and the executor. The reason for adding the query namespace is to distinguish them from the newly added SchedulerConfig: the name BallistaConfig may be too general and may give users the wrong impression that it includes the SchedulerConfig.

yahoNanJing · 2022-10-31T16:14:20Z

Hi @andygrove, do you have any more concerns?

andygrove · 2022-10-31T17:42:28Z

@yahoNanJing I don't see any updates in the user guide page that covers configuration. I think it would make the review easier if I can see the new docs explaining how to use these configs.

yahoNanJing · 2022-11-01T18:53:40Z

Hi @avantgardnerio and @DaltonModlin, as part of the current configuration refactoring, I've just added advertise_endpoint to the scheduler config. Could you help review the related changes?

yahoNanJing · 2022-11-01T19:10:22Z

Hi @andygrove, the scheduler config has just been added to the user guide.

avantgardnerio

Does this turn the command line argument into --advertise-flight-result-route-endpoint? That seems overly verbose for a command line argument name. --help will show the full description. I think we could safely get away with --advertise-flight-endpoint at most.

andygrove · 2022-11-01T19:19:50Z

docs/source/user-guide/configs.md

+
+| key                                            | type      | default      | description                                                                                                                                                                      |
+|------------------------------------------------|-----------|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| scheduler_policy                               | Utf8      | pull-staged  | Sets the scheduling policy for the scheduler, possible values: pull-staged, push-staged.                                                                                         |


The table is using underscore (scheduler_policy) but the example shell command is using hyphen (scheduler-policy).
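One way to keep the two spellings consistent is to derive one from the other mechanically. The helpers below are a hypothetical sketch of that mapping; the thread does not say how the scheduler actually generates its flag names, so treat the function names and the conversion rule as assumptions.

```rust
/// Convert a snake_case config key (as in the docs table) into the
/// hyphenated command-line flag form.
fn key_to_flag(key: &str) -> String {
    format!("--{}", key.replace('_', "-"))
}

/// Convert a hyphenated command-line flag back into the snake_case key.
fn flag_to_key(flag: &str) -> String {
    flag.trim_start_matches("--").replace('-', "_")
}
```

With this rule, `scheduler_policy` in the table corresponds to `--scheduler-policy` on the command line, so the docs could list either form as long as they state which one they use.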

andygrove · 2022-11-01T23:03:52Z

ballista/scheduler/src/main.rs

+    let mut config_builder = SchedulerConfigBuilder::default()
+        .set(
+            BALLISTA_SCHEDULER_EVENT_LOOP_BUFFER_SIZE,
+            &opt.event_loop_buffer_size.to_string(),
+        )
+        .set(
+            BALLISTA_FINISHED_JOB_DATA_CLEANUP_DELAY_SECS,
+            &opt.finished_job_data_clean_up_interval_seconds.to_string(),
+        )
+        .set(
+            BALLISTA_FINISHED_JOB_STATE_CLEANUP_DELAY_SECS,
+            &opt.finished_job_state_clean_up_interval_seconds.to_string(),
+        );


I don't understand the intention here of creating key-value pairs for these configs. It looks like these configs are just command-line arguments to the scheduler and there is no need to serialize them or send them over the network?

Shouldn't the scheduler config just be a simple struct like this?

struct SchedulerConfig { event_loop_buffer_size: ..., finished_job_data_clean_up_interval_seconds: ..., ... }

Thanks @andygrove for your comments. I agree with you; it's a bit over-designed. I'll change it back and use a plain struct for the scheduler config.
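A minimal sketch of the plain-struct shape discussed above, assuming the values come straight from parsed command-line options. Only `event_loop_buffer_size: usize` is confirmed by the diff; the `u64` interval types and the `Opt` stand-in are assumptions, and the struct that was eventually merged may differ.

```rust
/// Plain scheduler config with typed fields instead of string key/value pairs.
/// The `u64` types for the two intervals are an assumption.
#[derive(Debug, Clone, PartialEq)]
struct SchedulerConfig {
    event_loop_buffer_size: usize,
    finished_job_data_clean_up_interval_seconds: u64,
    finished_job_state_clean_up_interval_seconds: u64,
}

/// Hypothetical stand-in for the parsed command-line options (`opt` in the diff).
struct Opt {
    event_loop_buffer_size: usize,
    finished_job_data_clean_up_interval_seconds: u64,
    finished_job_state_clean_up_interval_seconds: u64,
}

impl From<&Opt> for SchedulerConfig {
    /// Copy the typed values across directly; no `to_string()` round trip
    /// and no runtime key lookup is needed.
    fn from(opt: &Opt) -> Self {
        SchedulerConfig {
            event_loop_buffer_size: opt.event_loop_buffer_size,
            finished_job_data_clean_up_interval_seconds: opt
                .finished_job_data_clean_up_interval_seconds,
            finished_job_state_clean_up_interval_seconds: opt
                .finished_job_state_clean_up_interval_seconds,
        }
    }
}
```

A function such as `start_server` could then take a single `SchedulerConfig` argument rather than one parameter per setting, which also addresses the argument-count lint mentioned earlier in the review.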

yahoNanJing · 2022-11-02T02:53:13Z

Does this turn the command line argument into --advertise-flight-result-route-endpoint? That seems overly verbose for a command line argument name. --help will show the full description. I think we could safely get away with --advertise-flight-endpoint at most.

Hi @avantgardnerio, is it only for the result, or for both the result and the intermediate shuffle data? If it's only for the result, it's better to include 'result' in the configuration name. How about advertise-flight-result-endpoint?

avantgardnerio · 2022-11-02T14:39:50Z

advertise-flight-result-endpoint

How about --advertise-flightsql-endpoint, which should remove any ambiguity as this is only related to FlightSQL, not regular flights between clients?

A tangent not for this PR: we probably just want separate --advertise-host and --flightsql-port at some point, as I imagine other services may eventually need to know what host is listening. And now that I think about it, we shouldn't need the port to be specified as that shouldn't get remapped by anything.

yahoNanJing · 2022-11-02T15:35:22Z

--advertise-flightsql-endpoint

Seems good to me.

andygrove

Thanks @yahoNanJing! That is looking much cleaner now and easier to review.

kyotoYaho added 2 commits October 28, 2022 11:17

Move data cleanup caller explicitly to the event loop

e32c564

Refactor BallistaConfig by extracting common validation logic

ae691f1

yahoNanJing requested review from andygrove and Dandandan October 28, 2022 13:04

yahoNanJing mentioned this pull request Oct 28, 2022

Long running stability #466

Open

9 tasks

kyotoYaho added 2 commits October 28, 2022 22:23

Create a separate mod for BallistaConfig

c35d126

Add SchedulerConfig for the scheduler configurations, like event_loop…

0ebca57

…_buffer_size, finished_job_data_clean_up_interval_seconds, finished_job_state_clean_up_interval_seconds

yahoNanJing force-pushed the issue-469 branch from d35d3c2 to 0ebca57 Compare October 28, 2022 14:24

avantgardnerio mentioned this pull request Oct 29, 2022

Refactor config #479

Open

avantgardnerio approved these changes Oct 29, 2022

andygrove reviewed Oct 30, 2022

kyotoYaho added 4 commits November 1, 2022 17:10

Fix conflicts

53e0079

Rename the scheduler config advertise_endpoint to advertise_flight_re…

3d818d0

…sult_route_endpoint and add it to the SchedulerConfig

Allow redundant configurations in ValidConfiguration

60d1709

Don't need to be delayed for cleaning up shuffle data of failed job

df02487

yahoNanJing force-pushed the issue-469 branch from d951e05 to 26c8611 Compare November 1, 2022 19:06

Update user-guide for the scheduler configurations

8335ea9

yahoNanJing force-pushed the issue-469 branch from 26c8611 to 8335ea9 Compare November 1, 2022 19:08

avantgardnerio reviewed Nov 1, 2022

andygrove reviewed Nov 1, 2022

andygrove mentioned this pull request Nov 1, 2022

Add grpc service of cleaning up job shuffle data for the scheduler to make it able to be triggered by client explicitly #485

Merged

Fix doc config name format

798e273

Change the SchedulerConfig to be an explicit struct

5dec45f

yahoNanJing force-pushed the issue-469 branch 2 times, most recently from ce09a10 to 6b2cb02 Compare November 2, 2022 06:11

Fix doc format

b4f371c

yahoNanJing force-pushed the issue-469 branch from 0ac60b5 to b4f371c Compare November 2, 2022 07:25

yahoNanJing force-pushed the issue-469 branch from 24f7f71 to f4c4d64 Compare November 2, 2022 15:46

Rename advertise-flight-result-route-endpoint to advertise-flight-sql…

fa52343

…-endpoint

yahoNanJing force-pushed the issue-469 branch from f4c4d64 to fa52343 Compare November 2, 2022 15:48

andygrove approved these changes Nov 2, 2022

andygrove merged commit 926605e into apache:master Nov 2, 2022
