-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Druid automated quickstart #13365
Druid automated quickstart #13365
Conversation
This pull request introduces 1 alert when merging 8c9dcc2 into 309cae7 - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
This pull request introduces 1 alert when merging 342e5d7 into 309cae7 - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
This is a great work to ease the initial setup work for users especially when they're new to Druid. This startup script relies on a Python script, I know that writting python scripts is much simpler than writing bash scripts, but my question is that what if they don't have a Python interpreter? Which version of Python do they need to run this script? Python 2 or Python 3? Will they struggle on installing a python interpreter? |
@FrankChen021 thanks for going over the change.
It requires python 3.
Installing python is straightforward on most environments, IMO it shouldn't pose any challenge. |
…anges in druid-quickstart.py
This pull request introduces 1 alert when merging 68a1e66 into 5b625ce - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
This pull request introduces 1 alert when merging e570606 into 5b625ce - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! I haven't reviewed the code, but I do have some comments based on building this branch on my machine and running it:
- On my machine (64GB RAM) the script decided to set total memory = 52GB and then allocated that chunk of memory 100% to on-heap and direct memory for the various processes. It doesn't leave any room for page cache or for tasks on MMs. The calculations should be adjusted to take those items into account. (By "take page cache into account", I mean leave some memory free so the OS can use it to cache mmapped segments.) I suggest comparing the calculated numbers here against the single-server configs we have: https://druid.apache.org/docs/latest/operations/single-server.html. For example, 64GB RAM would match our "small" config, where we give the Historical a
4g
heap; the script currently generates12752m
. - The way the script accepts command-line arguments is nonstandard; consider using
argparse
instead and accepting--x=y
style arguments. It would help with error messages, too. I tried an invalidtotalMemory
setting and noticed that the error message had a traceback in it, which makes it look messy.argparse
wouldn't do that. - I noticed that when I provide a very low
totalMemory
(like40m
) the calculations generate negative numbers and then the processes crash. It would be better for the script to print a nice error message. - The script should be less chatty in the normal case. Feel free to provide a
--verbose
mode, but normally, I don't think it should print anything. The first line the user should see is the one frombin/greet
: "Starting Apache Druid." - When you have scripts tail-calling out to other scripts, it's best to use
exec
(in bash) andexecv
or one of the other exec variants (in python). This way, each call replaces the original one rather than forming a chain. It is cleaner and uses fewer resources. For the bash scriptstart-druid
, refer topost-index-task
for an example of how you can useexec
. For python refer to https://docs.python.org/3/library/os.html foros.exec*
functions.
This pull request introduces 1 alert when merging b413d15 into fa3ab27 - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
@gianm thanks for the feedback.
After going over https://druid.apache.org/docs/latest/operations/single-server.html except for nano and xlarge profiles, the ratio is close to 50% This script also provides user the ability to run any subset of services, which implies one can start standalone services, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done a partial review. Added some comments.
supervise script changes to process java opts array use argparse, leave free memory, logging
This pull request introduces 3 alerts when merging a4d14f7 into fbf76ad - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
This pull request introduces 3 alerts when merging 08507b4 into d85fb8c - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
then | ||
exec python2 "$WHEREAMI/start-druid-main.py" "$@" | ||
else | ||
exec python "$WHEREAMI/start-druid-main.py" "$@" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should do a elif [ -x "$(command -v python)" ]
and if that fails (in the else) it should print out a nice warning that python could not be found and is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done as a followup also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you! Let's remember, as part of the next release cycle, to test this on a bunch of OSes so we can be sure things work as expected.
docs/operations/single-server.md
Outdated
Druid includes a set of reference configurations and launch scripts for single-machine deployments. | ||
These configuration bundles are located in `conf/druid/single-server/`. | ||
|
||
The `auto` configuration sizes runtime parameters based on available processors and memory. Other configurations include hard-coded runtime parameters for various server sizes. Most users should stick with `auto`. Refer below [Druid auto start](#Druid-auto-start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the anchor needs to be #druid-auto-start
not #Druid-auto-start
examples/bin/supervise
Outdated
); | ||
|
||
usage() unless $opt{'conf'} && $opt{'vardir'}; | ||
usage() unless ((@{$opt{'command'}} || $opt{'conf'}) && $opt{'vardir'}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough perl but something in this line is messed up
When I run
./bin/start-micro-quickstart
Can't use an undefined value as an ARRAY reference at /Users/vadim/Projects/apache-druid/distribution/target/apache-druid-26.0.0-SNAPSHOT/bin/supervise line 202.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
examples/bin/supervise
Outdated
); | ||
|
||
usage() unless $opt{'conf'} && $opt{'vardir'}; | ||
usage() unless (($opt{'conf'} || @{$opt{'command'}}) && $opt{'vardir'}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this should be:
usage() unless (($opt{'command'} && @{$opt{'command'}}) || $opt{'conf'}) && $opt{'vardir'}
breaking down ($opt{'command'} && @{$opt{'command'}})
:
$opt{'command'}
means "there is some value at$opt{'command'}
- we expect this to be an arrayref, so,
@{$opt{'command'}}
means "the arrayref at$opt{'command'}
is nonempty"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Druid automated quickstart * remove conf/druid/single-server/quickstart/_common/historical/jvm.config * Minor changes in python script * Add lower bound memory for some services * Additional runtime properties for services * Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py * File end newline * Limit the ability to start multiple instances of a service, documentation changes * simplify script arguments * restore changes in medium profile * run-druid refactor * compute and pass middle manager runtime properties to run-druid supervise script changes to process java opts array use argparse, leave free memory, logging * Remove extra quotes from mm task javaopts array * Update logic to compute minimum memory * simplify run-druid * remove debug options from run-druid * resolve the config_path provided * comment out service specific runtime properties which are computed in the code * simplify run-druid * clean up docs, naming changes * Throw ValueError exception on illegal state * update docs * rename args, compute_only -> compute, run_zk -> zk * update help documentation * update help documentation * move task memory computation into separate method * Add validation checks * remove print * Add validations * remove start-druid bash script, rename start-druid-main * Include tasks in lower bound memory calculation * Fix test * 256m instead of 256g * caffeine cache uses 5% of heap * ensure min task count is 2, task count is monotonic * update configs and documentation for runtime props in conf/druid/single-server/quickstart * Update docs * Specify memory argument for each profile in single-server.md * Update middleManager runtime.properties * Move quickstart configs to conf/druid/base, add bash launch script, support python2 * Update supervise script * rename base config directory to auto * rename python script, changes to pass repeated args to supervise * remove exmaples/conf/druid/base dir * add docs * restore changes in conf dir * update start-druid-auto * remove hashref for commands in supervise script * start-druid-main java_opts array is comma separated * update entry point script name in python script * Update help docs * documentation changes * docs changes * update docs * add support for running indexer * update supported services list * update help * Update python.md * remove dir * update .spelling * Remove dependency on psutil and pathlib * update docs * Update get_physical_memory method * Update help docs * update docs * update method to get physical memory on python * udpate spelling * update .spelling * minor change * Minor change * memory comptuation for indexer * update start-druid * Update python.md * Update single-server.md * Update python.md * run python3 --version to check if python is installed * Update supervise script * start-druid: echo message if python not found * update anchor text * minor change * Update condition in supervise script * JVM not jvm in docs
Did we forgot to change the default values in Configuration reference doc page? |
Thanks for pointing it out. I will update it. |
Description
Setting up a new Druid cluster could involve tuning several runtime properties required by each service. Each process must also be given the appropriate jvm arguments so that the system can perform optimally.
Druid already contains several single-server profiles such as
nano-quickstart
,micro-quickstart
,large-quickstart
, etc. But it is still difficult to have a setup which does not match exactly with any of these profiles. A lot of the properties in these profiles can also be commoned out and simplified.This PR adds a new script
start-druid
in order to:Usage
Command:
Examples:
Run all services with total system memory and default runtime properties
./start-druid
Run one instance each of specific services with given total memory
./start-druid --services=broker,router,historical --memory=10g
Run services with overridden properties
./start-druid --services=broker,router,middleManager,historical --config=rootConfigDir
Directory
_common
must be present insiderootConfigDir
and must contain filescommon.runtime.properties
andcommon.jvm.config
with mandatory properties and any other property that you wish to override. Please refer toconf/druid/auto/_common
for an example.runtime.properties
for that service. For the example command above, create a filerootConfigDir/broker/runtime.properties
. This file should contain the properties you wish to override.broker
,router
,middleManager
,historical
inside therootConfigDir
.Add a file
jvm.config
specifying the following values in each of the service directories.- Xmx***
- Xms***
- XX:MaxDirectMemorySize***
(required only for broker and historical)For middleManager, alongwith jvm args, you must override
druid.indexer.runner.javaOptsArray
inrootConfigDir/middleManager/runtime.properties
Design
Flow
start-druid
with one of the possible invocationsstart-druid
invokes python scriptstart-druid-main.py
which computes the memory required for each servicestart-druid-main.py
invokesbin/supervise
with the commands for each service to start.bin/supervise
starts the services, one at a time, by invokingrun-druid
(same as existing flow while using./bin/start-micro-quickstart
etc.)Parameters
start-druid
accepts these optional parameters:memory
config
runtime.properties
andjvm.config
can be givenexamples/conf/druid/auto
services
zk
verbose
compute
Validations
memory
should be in valid format (integer followed by m or g)memory
should satisfy the total minimum requirements<rootConfDir>/<service>/jvm.config
should either be present for all the services being started or none<rootConfDir>/middleManager/runtime.properties
must havedruid.indexer.runner.javaOptsArray
parameter, if middleManager is to be started with overridden memoryMemory distribution
Running existing profiles
For example, starting
small
in verbose mode:Limitations
Major changes
start-druid
, bash script with functionality as described abovestart-druid-main
, python script with functionality as described aboveconf/druid/auto
directorysupervise
, to accept service startup commands as arguments, refactored the method which reads the supervise config filerun-druid
script to accept optional parameters for jvm argument and middle manager task paramsA note on runtime properties
The following runtime properties have not been added in the service-specific
runtime.properties
files insideconf/druid/auto
because their default values computed in the code seem adequate.max(10, (numCores * 17) / 16 + 2) + 30
max(numCores - 1, 1)
max(25m, heap_size * 0.02)
min(1g, directMemory/ (numThreads + numMergeBuffers + 1))
max(2, numThreads / 4)
~(heap_size / 10)
max(numCores - 1, 1)
middle-manager/runtime.properties
if specified, otherwise computed bystart-druid
max(10, ((numCores * 17) / 16 + 2) + 30)
Release note
Add a new script
start-druid
which greatly simplifies starting Druid services on a single server.Usage:
This PR has: