Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce node level limits if node is started in production env #16733

Merged
merged 1 commit into from
Feb 22, 2016

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Feb 19, 2016

This commit tries to 'guess' if a user starts a node in production by
checking if any network host is configured. If that is the case soft-limits
that are only logged otherwise are enforced like number of open file descriptors.

Closes #16727

* - discovery.zen.minimum_master_nodes
* - discovery.zen.ping.unicast.hosts is set if we use zen disco
* - ensure we can write in all data directories
* - fail if mlock all failed and was configured */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️, this is one that I've wanted to do for a long time.

@bleskes
Copy link
Contributor

bleskes commented Feb 20, 2016

LGTM. Great!

@s1monw s1monw force-pushed the enforce_limits branch 2 times, most recently from d104334 to ba5c4a3 Compare February 21, 2016 17:33
@s1monw
Copy link
Contributor Author

s1monw commented Feb 21, 2016

@jasontedor @bleskes I added some tests and I wonder if we should also switch to strict mode when folks configure profiles?

This commit tries to 'guess' if a user starts a node in production by
checking if any network host is configured. If that is the case soft-limits
that are only logged otherwise are enforced like number of open file descriptors.

Closes elastic#16727
s1monw added a commit that referenced this pull request Feb 22, 2016
Enforce node level limits if node is started in production env
@s1monw s1monw merged commit 1e15ae6 into elastic:master Feb 22, 2016
@s1monw s1monw deleted the enforce_limits branch February 22, 2016 18:30
@rmuir
Copy link
Contributor

rmuir commented Feb 26, 2016

This commit is broken on OS X.

@rmuir
Copy link
Contributor

rmuir commented Feb 26, 2016

The referenced workaround #16813 does not work either. We should revert this commit.

@rmuir
Copy link
Contributor

rmuir commented Feb 26, 2016

I tried to revert it myself, but github does not allow it. I'm too lazy to revert it manually, but this should be reverted, and only committed when actually tested.

@jasontedor
Copy link
Member

@rmuir It does. But you have to sysctl kern.maxfiles, and kern.maxfilesperproc to higher limits, reboot, and set the JVM flag in both GRADLE_OPTS (if you're running tests) and ES_JAVA_OPTS (the GRADLE_OPTS is necessary if you're running tests because a forked process will not be granted more descriptors than its parent).

@rmuir
Copy link
Contributor

rmuir commented Feb 26, 2016

yeah thats intuitive. we should revert this.

@jasontedor
Copy link
Member

@rmuir I agree, the situation is complicated. I spent some time last night reading JVM and Darwin sources (and running dtraces) to understand the implications. The short version is that this flag is weird, especially on OS X. I'll post the longer version of my notes later tonight or early tomorrow on the issue #16813 that @jaymode filed.

An immediate idea I have for temporary relief is to disable the check on snapshot builds.

@rmuir
Copy link
Contributor

rmuir commented Feb 26, 2016

Lets enumerate the problems here.

I sit here with my brother, who works statistics in elasticsearch, because its a nice opportunity to test nick's PR (#16817), since my brother actually does this stuff, you know, the hard way right now.

Anyway, lets enumerate the shitty experience we had here:

  1. he installed ES on his operating system, and of course its "log4j not configured properly", but continues to run. so logging isnt working, and garbage leniency just lets it run. Fucking awesome guys.
  2. attempts to index anything with threads > 1 results in RejectedExecutionException/TransportReplicationAction/nonsense. If me, my brother, mike, and ryan cant figure the shit out, users have no fucking hope.
  3. after discussing with ryan and nik, the idea is, lets try master, it might be better. master neither runs nor compiles on freebsd, because gradle 2.4 just fails in strange ways. doesnt fail with "you need at least gradle 2.x", just fails in a strange way. fucking fantastic.
  4. download latest version of gradle, fails also because "missing native libraries". Guess setAccessible only carries you so far groovy guys! wonderful.
  5. try to run master on my mac, binding to the ethernet port, since my mac is "already setup", then the idea is, my brother can just index to it, and we can get shit done. This check fails.
  6. Try passing the linked workaround from jay, this also fails, it seems to just make ES think there is then a limit of only 256 file descriptors.

@s1monw
Copy link
Contributor Author

s1monw commented Feb 27, 2016

I have no problem reverting this if it causes problems. It didn't cause problems in my test runs.
I have problems with statements like this:

At this point, I am officially embarrassed to be affiliated with this company or this software in any way. This is fucking shit trash guys.

I tried to make things better and preserve the OOB experience. If there are problems then lets reiterate but with statements like this I loose any kind of enjoyment to work with you @rmuir and I am sure am not the only one. I will revert this check now.

@jasontedor
Copy link
Member

I will revert this check now.

I chatted with @s1monw via another channel, and we are going to try the approach of disabling the check on snapshot builds. I opened #16835.

@jasontedor
Copy link
Member

I'll post the longer version of my notes later tonight or early tomorrow on the issue #16813 that @jaymode filed.

I posted my notes.

@rmuir
Copy link
Contributor

rmuir commented Feb 27, 2016

As usual, i'm not attacking anyone personally. Just the code! I do attack the code, and i believe it needs some attacking! But lets be clear, its not about any persons. Just a bunch of crap we should clean up.

@clintongormley clintongormley added :Core/Infra/Settings Settings infrastructure and APIs and removed >enhancement labels Feb 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants