Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix leaks in HTTP::Server specs #7197

Merged

Conversation

straight-shoota
Copy link
Member

This fixes several issues with HTTP::Server specs.

Previously, there were some weird influences between individual specs. For example, when disabling handles exception during SSL handshake (e.g. marking it pending), the following spec closes gracefully would fail with the following error:

End of file reached (IO::EOFError)
         from src/io.cr:814:31 in 'read_line:chomp'
         from src/http/content.cr:196:7 in 'read_chunk_size'
         from src/http/content.cr:178:7 in 'next_chunk'
         from src/http/content.cr:145:7 in 'peek'
         from src/io.cr:632:37 in 'gets'
         from src/io.cr:605:5 in 'gets'
         from src/io.cr:575:5 in 'gets'
         from src/io.cr:574:3 in 'gets'
         from spec/std/http/server/server_spec.cr:609:11 in '->'
         from src/spec/methods.cr:255:3 in 'it'
         from spec/std/http/server/server_spec.cr:587:7 in '->'
         from src/spec/context.cr:255:3 in 'describe'
         from src/spec/methods.cr:16:5 in 'describe'
         from spec/std/http/server/server_spec.cr:586:5 in '->'
         from src/spec/context.cr:255:3 in 'describe'
         from src/spec/methods.cr:16:5 in 'describe'
         from spec/std/http/server/server_spec.cr:234:3 in '__crystal_main'
         from src/crystal/main.cr:97:5 in 'main_user_code'
         from src/crystal/main.cr:86:7 in 'main'
         from src/crystal/main.cr:106:3 in 'main'
         from __libc_start_main

@RX14 had already identified reusing fildescriptors by leftover fibers as probable reason.

And in fact, a large number of specs was not fully contained and leaking either open ports or unstopped fibers.

One issue was an actual bug in HTTP::Server#bind_tcp (and similar methods) which would bind a new socket and try to add it to the server. If that failed (because the server was already closed), it would raise an error but the newly created socket would not be terminated.

The other issues were all introduced by the specs themselves, a few missing server.close.

I also added a helper run_server which starts the server in a new fiber, then yields to a block for interacting with it and finally ensures that it is properly shut down. This should avoid any breaking specs from leaking server fibers.

I'm not sure if ensure_no_ports_leaking should stay around. It's rather rudimentary. If a open port is leaked at some point, all following specs will fail because of this.

This might not fix all the issues and I still don't completely understand the reasons why exactly one spec failed when the other was deactivated. But it already improves quite a lot.
I'll try to run the specs randomized to see if there might be any other issues with couplings between specs.

@straight-shoota
Copy link
Member Author

The CI failure shows there are even more specs leaking open ports in other files. I'll take a look at it tomorrow.

@j8r
Copy link
Contributor

j8r commented Dec 18, 2018

The lsof trick isn't a satisfying solution 😕

@straight-shoota
Copy link
Member Author

@j8r No it's not. I'm mainly using it to manually detect any leaking open ports. It's not solid enough to be included in automatic tests.

That being said, it would be nice if specs could automatically fail if they leak fibers or open file descriptors. I don't think this can be implemented, though.

src/http/server.cr Outdated Show resolved Hide resolved
@straight-shoota straight-shoota force-pushed the jm/fix/http-server-specs branch 2 times, most recently from e0e8e33 to e4e0a5d Compare December 19, 2018 09:16
@straight-shoota straight-shoota changed the title Fix HTTP::Server specs Fix leaks in HTTP::Server specs Dec 19, 2018
@straight-shoota
Copy link
Member Author

This PR is ready for review.

It hopefully removes all interdependencies between individual server specs, and so far it looks good. The initial example with deactivating a specific spec no longer reproduces. I've run over a thousand permutations of specs (random order + leaving some out) and found no other breaking specs.

This should eliminate fiber and port leaks in most places of the stdlib spec suite. A few specs specs still leak fibers (see CI results). Some are caused by the implementation of parallel (see #7204). I can't see why Channel::Buffered pings leaks a fiber. It doesn't even help to give it some time to clean up. This looks weird.

The websocket spec negotiates over HTTPS correctly also leaks a fiber, which might be caused by #7199, but I'm not sure. This needs further investigation and finally refactoring the specs (for negotiates over HTTP correctly, too).

The bcrypt spec is an odd place for a fiber finishing. It's reducing global fiber count by 1, consistently among all platforms (see CI results for 24c3f37). I'm not sure what's going on there.

src/http/server.cr Outdated Show resolved Hide resolved
@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib topic:stdlib:concurrency kind:refactor labels Dec 30, 2018
@sdogruyol
Copy link
Member

@straight-shoota I'd really like to see this merged. IMHO this is as good as it's, just needs a rebase.

@asterite
Copy link
Member

If I checkout this branch and run:

bin/crystal spec spec/std/http/client/client_spec.cr 

I get:

Error reading socket: Connection reset by peer (Errno)

in four specs. I don't know what's going on...

This is on mac osx. I don't know why CI is happy.

@straight-shoota
Copy link
Member Author

I don't have a Mac, can anyone reproduce?

Which specs are failing exactly? And is it this plain branch or with 1000.times do?

@asterite
Copy link
Member

@straight-shoota The specs that fail for me, without 1000.times, randomly, are:

  • "doesn't read the body if request was HEAD"
  • "tests connect_timeout"
  • "tests empty Content-Type"
  • "tests read_timeout"

What I see happening is: the server writes the response and then does io.close (in test_server) but this could happen before the client reads the response, hence we get "connection reset by peer" (I think). Could this be it?

If I put Fiber.yield right after io.flush then it works well all the time.

However, I don't understand now why it never happens that the server ends up raising because of writing the response after waiting for read_timeout when the client should have already been closed (what I saw in #7402). Maybe it's because of the channel and receive waiting... but I'm not sure. Actually, if I write client.close after the expect_raises then I get the error... which means that if we don't do that then it can happen later automatically when the GC runs and the server error can happen. Makes sense?

And also, if you try it locally but put everything inside 20.times do, can you reproduce the errors I'm seeing?

@straight-shoota
Copy link
Member Author

Okay, so it effectively happens in all the specs using test_server.

However, I can't reproduce any such error on linux.

So it seems to be specific to MacOS. But since it doesn't trigger on CI, there must be something else to it. Would be great if a few Mac users could test this on other machines.

@straight-shoota
Copy link
Member Author

I'd like to see this merged. New PRs like #7610 add specs that could benefit from the spec helpers defined here.

I can't debug the issues on MacOS. But they seem to be related to changes to the test_server helper used in client_spec. So I guess we could merge this without these changes (essentially the third commit) and. WDYT?

@asterite
Copy link
Member

asterite commented Apr 1, 2019

Oh, I thought this was already merged.

Yes, let's please merge this in whatever way that works.

@straight-shoota
Copy link
Member Author

@asterite I removed the changes to test_server. Could you run your test on MacOS again?

@asterite
Copy link
Member

asterite commented Apr 1, 2019

@straight-shoota

$ bin/crystal spec spec/std/http/client/client_spec.cr 
Error in line 1: while requiring "./spec/std/http/client/client_spec.cr"

in spec/std/http/client/client_spec.cr:136: undefined method 'run_server'

      run_server(server) do
      ^~~~~~~~~~

@straight-shoota
Copy link
Member Author

Ooops, sry. Fixed.

Copy link
Member

@asterite asterite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@straight-shoota It's working fine now, thanks!

@asterite
Copy link
Member

asterite commented Apr 1, 2019

Feel free to merge. Then I'll rebase the remote_address PR.

`#bind_tcp`, `#bind_unix` and `#bind_tls` would bind a new socket and
try to add it to the server. If that failed (because the server was
already closed), it would raise an error but the newly created socket
would not be terminated, effectively leaking it.
Spec helper for concurrently running an `HTTP::Server` in a spec and
finally closing it before continuing execution in the parent scope.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:refactor topic:stdlib:networking topic:stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants