
max file descriptor issues on OS X #16813

Closed
jaymode opened this issue Feb 25, 2016 · 2 comments
@jaymode (Member) commented Feb 25, 2016

With the recent change in #16733 we introduced the notion of running in a production mode, which is detected by inspecting whether network.host is set. On OS X this is problematic with the Oracle JDK, since it limits the java process to 10240 file descriptors by default unless the -XX:-MaxFDLimit VM option is passed. This limit will cause elasticsearch to fail to start (see below).

The JDK 7 documentation states that this option is only relevant to Solaris, but according to this issue it applies to all platforms and the documentation will not be updated. The JDK 8 documentation no longer lists this option even though it is still in use.

Some output from a terminal:

$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
$ ulimit -n
65536
$ launchctl limit
    cpu         unlimited      unlimited      
    filesize    unlimited      unlimited      
    data        unlimited      unlimited      
    stack       8388608        67104768       
    core        0              unlimited      
    rss         unlimited      unlimited      
    memlock     unlimited      unlimited      
    maxproc     2048           2048           
    maxfiles    65536          65536  
$ bin/elasticsearch
Exception in thread "main" java.lang.IllegalStateException: max file descriptors [10240] for elasticsearch process likely too low, increase it to at least [65536]
    at org.elasticsearch.bootstrap.Bootstrap.enforceOrLogLimits(Bootstrap.java:401)
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:192)
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:283)
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:37)
Refer to the log for complete error details.
$ export ES_JAVA_OPTS=-XX:-MaxFDLimit
$ bin/elasticsearch
[2016-02-25 15:00:33,638][INFO ][node                     ] [Hector] version[3.0.0], pid[30941], build[c9c4cac/2016-02-25T13:06:48.503Z]
[2016-02-25 15:00:33,638][INFO ][node                     ] [Hector] initializing ...
[2016-02-25 15:00:33,988][INFO ][plugins                  ] [Hector] modules [lang-mustache, lang-painless, ingest-grok, lang-expression, lang-groovy], plugins []
[2016-02-25 15:00:34,008][INFO ][env                      ] [Hector] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [123.2gb], net total_space [232.6gb], spins? [unknown], types [hfs]
[2016-02-25 15:00:34,008][INFO ][env                      ] [Hector] heap size [989.8mb], compressed ordinary object pointers [true]
[2016-02-25 15:00:35,338][INFO ][node                     ] [Hector] initialized
[2016-02-25 15:00:35,338][INFO ][node                     ] [Hector] starting ...
[2016-02-25 15:00:35,414][INFO ][transport                ] [Hector] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2016-02-25 15:00:35,420][INFO ][discovery                ] [Hector] elasticsearch/pqUOEZAZQtGy_Sbax5uLIg
[2016-02-25 15:00:38,453][INFO ][cluster.service          ] [Hector] new_master {Hector}{pqUOEZAZQtGy_Sbax5uLIg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-02-25 15:00:38,473][INFO ][http                     ] [Hector] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2016-02-25 15:00:38,474][INFO ][node                     ] [Hector] started
[2016-02-25 15:00:38,492][INFO ][gateway                  ] [Hector] recovered [0] indices into cluster_state

In the above scenario, the only change to elasticsearch.yml is setting network.host to localhost, and I am running a build from master.

We may want to consider adding this option to the elasticsearch script or documenting it.

@jasontedor (Member) commented
This setting is weird. Note that it's enabled by default, and the documentation says:

Bump the number of file descriptors to max.

So why do we want to disable it to increase the number of file descriptors past 10240 on OS X?

I dove into the OpenJDK code to understand this flag and how it interacts with each of the major operating systems. When this flag is enabled, all of them basically delegate to getrlimit with the resource RLIMIT_NOFILE, and then the JVM tries to set the soft limit to the hard limit (it silently ignores failure). The one exception to this is OS X, which takes the minimum of OPEN_MAX and the hard limit as the new soft limit.
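
A minimal C sketch of that behavior as described above (an illustration, not the actual HotSpot source; the Apple-only cap and the OPEN_MAX fallback definition are taken from this discussion):

/* Sketch of what the JVM does when -XX:+MaxFDLimit is in effect. */
#include <stdio.h>
#include <sys/resource.h>

#ifndef OPEN_MAX
#define OPEN_MAX 10240 /* per /usr/include/sys/syslimits.h on OS X */
#endif

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    /* Try to raise the soft limit to the hard limit ... */
    rl.rlim_cur = rl.rlim_max;
#ifdef __APPLE__
    /* ... except on OS X, where the new soft limit is capped at OPEN_MAX. */
    if (rl.rlim_cur > OPEN_MAX)
        rl.rlim_cur = OPEN_MAX;
#endif
    /* Failure is silently ignored, just like in the JVM. */
    (void)setrlimit(RLIMIT_NOFILE, &rl);

    getrlimit(RLIMIT_NOFILE, &rl);
    printf("soft limit is now %llu\n", (unsigned long long)rl.rlim_cur);
    return 0;
}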

The exception on OS X stems from this note in man setrlimit:

setrlimit() now returns with errno set to EINVAL in places that historically succeeded. It no longer accepts "rlim_cur = RLIM_INFINITY" for RLIM_NOFILE. Use "rlim_cur = min(OPEN_MAX, rlim_max)".

The constant OPEN_MAX is defined as:

#define OPEN_MAX 10240 /* max open files per process - todo, make a config option? */

in /usr/include/sys/syslimits.h. A todo. 😞 This explains the 10240 number that we are seeing in the output.

When the flag is disabled, the process's soft limit is used as-is for the number of file descriptors. This is why raising the soft limit on OS X avoids the 10240 cap. Sneaky and rather counterintuitive.
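
To make that concrete: on OS X an explicit soft limit above OPEN_MAX is accepted as long as it does not exceed the hard limit; only rlim_cur = RLIM_INFINITY is rejected, per the man page excerpt above. A small check, with 65536 as an illustrative value (assuming the hard limit permits it):

/* Raise the soft RLIMIT_NOFILE above OPEN_MAX with an explicit value. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    getrlimit(RLIMIT_NOFILE, &rl);
    rl.rlim_cur = 65536; /* illustrative; must not exceed rl.rlim_max */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        perror("setrlimit");
    getrlimit(RLIMIT_NOFILE, &rl);
    printf("soft limit is now %llu\n", (unsigned long long)rl.rlim_cur);
    return 0;
}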

The situation with OS X gets weird though. If you look at int fdalloc(proc_t p, int want, int *result) in bsd/kern/kern_descrip.c there is this code:

    lim = min((int)p->p_rlimit[RLIMIT_NOFILE].rlim_cur, maxfiles);

The limit is the minimum of the soft limit and maxfiles. What is maxfiles? It's a global variable defined in bsd/conf/param.c:

#define MAXFILES (OPEN_MAX + 2048)
int maxfiles = MAXFILES;

Wait, so is the max files still only 12288?

I ran this dtrace one-liner:

$ dtrace -n 'BEGIN { trace(`maxfiles); exit(0); }'

which gives

dtrace: description 'BEGIN ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN         12288

This then leads us to kern.maxfiles and kern.maxfilesperproc, which need to be increased via sysctl, followed by a reboot.
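
As a cross-check that doesn't require dtrace, both values can be read with sysctlbyname; a minimal sketch (OS X only):

/* Print kern.maxfiles and kern.maxfilesperproc via sysctlbyname. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>

static void print_sysctl_int(const char *name) {
    int value = 0;
    size_t len = sizeof(value);
    if (sysctlbyname(name, &value, &len, NULL, 0) == 0)
        printf("%s = %d\n", name, value);
    else
        perror(name);
}

int main(void) {
    print_sysctl_int("kern.maxfiles");        /* 12288 here, matching the dtrace output */
    print_sysctl_int("kern.maxfilesperproc");
    return 0;
}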

Of course, none of this makes sense on Windows (insert joke about how Windows is not a major operating system), where getrlimit doesn't even exist. On Windows, MaxFDLimit has no effect (the flag is recognized at startup, but has no impact on the runtime behavior of the JVM).

I note that there is a comment on the OpenJDK bug that @jaymode linked to saying that this flag is going to be deprecated, but it still appears in the JDK 9 sources, so I'm skeptical of that at this time.

My conclusion from all of this is that it is way too complicated. I think we should take a different route than working around the limit via this JVM flag and all these other dances on OS X. Instead, I think that we should just disable the flag if the build is a snapshot build.

I opened #16835.

@jasontedor (Member) commented

Closed by #16835

facebook-github-bot pushed a commit to facebook/buck that referenced this issue Aug 4, 2021
…al actions).

Summary:
`Too many open files` issue on macOS.

`-XX:-MaxFDLimit` - the way to stop the Java VM from restricting the number of open files to 10240

Related links:
* https://stackoverflow.com/questions/16451343/java-file-limit-on-osx-lower-than-in-bash/16535804#16535804
* elastic/elasticsearch#16813

Reviewed By: jiawei-lyu

fbshipit-source-id: a40d30529a561e9b7319022d56fbfee9fb48dac4