
Lingering renderer processes #2

Closed
mihe opened this issue Mar 6, 2018 · 13 comments

@mihe
Contributor

mihe commented Mar 6, 2018

Just gave this build a try in our application and in your sample cefmixer application. In both cases I have excellent performance. Disabling vsync in Chromium definitely helped with responsiveness.
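(The thread doesn't show how vsync was disabled; one common way, purely as an assumption here, is appending Chromium's disable-gpu-vsync switch from the CefApp handler. MyApp is a placeholder name:)

#include "include/cef_app.h"

// Hypothetical sketch: disable GPU vsync via a standard Chromium switch.
// OnBeforeCommandLineProcessing and AppendSwitch are stock CEF API, but the
// patched build may well disable vsync internally instead.
class MyApp : public CefApp {
 public:
  void OnBeforeCommandLineProcessing(
      const CefString& process_type,
      CefRefPtr<CefCommandLine> command_line) override {
    command_line->AppendSwitch("disable-gpu-vsync");
  }

 private:
  IMPLEMENT_REFCOUNTING(MyApp);
};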

I do have an issue that shows up in both applications, though: the --type=renderer processes hang around after the browser process has closed, and they all seem to be spinning on one thread (each pegging one full core, i.e. 25% of my 4-core CPU). It seems somewhat random when this happens.

Any idea what might be causing this?

Might be worth mentioning that I have an AMD RX 580 graphics card.

@wesselsga
Contributor

When you saw the issue with the cefmixer application, were you just using the stock aquarium URL? I've tried to reproduce this here with no luck yet. Just wondering if there's a particular page you were navigating to?

@mihe
Contributor Author

mihe commented Mar 6, 2018

I am able to reproduce the issue with the default fishgl URL in cefmixer. It's very random though.

The only way I've been able to reproduce it somewhat consistently is if I launch 6+ instances of the application simultaneously and close them all after a couple of seconds. But even then there's no guarantee that it happens.

@wesselsga
Contributor

wesselsga commented Mar 14, 2018

I still have not been able to reproduce this issue locally. I'm curious whether you still see it if you build the latest cefmixer project. I recently modified how the lifetime of CEF is handled in the test app (I moved away from using the multi_threaded_message_loop option). The new code can be found in:

void CefModule::startup() { ... }
void CefModule::shutdown() { ... }
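
A minimal sketch of what that dedicated-thread lifetime might look like (an assumption based on the description above, not the actual cefmixer code; CefModule's real internals may differ):

#include <thread>
#include "include/cef_app.h"
#include "include/base/cef_bind.h"
#include "include/wrapper/cef_closure_task.h"

class CefModule {
 public:
  void startup(HINSTANCE instance) {
    // Run CEF on its own thread rather than letting CEF spawn one
    // internally via multi_threaded_message_loop.
    thread_ = std::thread([instance]() {
      CefSettings settings;
      settings.multi_threaded_message_loop = false;  // this thread owns the loop
      settings.windowless_rendering_enabled = true;
      CefMainArgs main_args(instance);
      CefInitialize(main_args, settings, nullptr, nullptr);
      CefRunMessageLoop();  // blocks until CefQuitMessageLoop() runs
      CefShutdown();        // clean shutdown happens on the same thread
    });
  }

  void shutdown() {
    // Post the quit to the CEF UI thread, then wait for CefShutdown() to finish.
    CefPostTask(TID_UI, base::Bind(&CefQuitMessageLoop));
    if (thread_.joinable())
      thread_.join();
  }

 private:
  std::thread thread_;
};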

If it still happens, do you have a DEBUG build of CEF that you can attach to the hung process?

@mihe
Contributor Author

mihe commented Mar 14, 2018

I'll give the new cefmixer a try as soon as I can to see if I can still reproduce the issue there.

As I've mentioned, we experience this issue in our own application as well. We only recently switched multi_threaded_message_loop from false to true, and I've had the issue appear with both configurations.

I haven't compiled my own build of your patched CEF, but instead made use of your binary distributions, so unfortunately I'm not able to debug it.

You don't happen to have built your distribution with debug symbols, by any chance? Otherwise I'll see about compiling CEF/Chromium myself if the new cefmixer doesn't work out.

@wesselsga
Contributor

It will take me a little while, but I will also get the symbols together for the sample distribution.

@mihe
Contributor Author

mihe commented Mar 15, 2018

So I finally managed to get my own build going, with debug symbols and everything, and I managed to reproduce the issue in the latest cefmixer.

Here's the call stack for the lingering render process (on the CrRendererMain thread):

NtCreateEvent()
CreateEventW()
base::WaitableEvent::WaitableEvent(base::WaitableEvent::ResetPolicy reset_policy, base::WaitableEvent::InitialState initial_state) Line 27
ui::Gpu::EstablishGpuChannelSync() Line 343
content::RenderThreadImpl::EstablishGpuChannelSync() Line 1972
content::RenderThreadImpl::RequestNewLayerTreeFrameSink(int routing_id, scoped_refptr<content::FrameSwapMessageQueue> frame_swap_message_queue, const GURL & url, const base::RepeatingCallback<void (std::unique_ptr<cc::LayerTreeFrameSink,std::default_delete<cc::LayerTreeFrameSink> >)> & callback) Line 2062
content::RenderWidget::RequestNewLayerTreeFrameSink(const base::RepeatingCallback<void (std::unique_ptr<cc::LayerTreeFrameSink,std::default_delete<cc::LayerTreeFrameSink> >)> & callback) Line 1002
content::RenderWidgetCompositor::RequestNewLayerTreeFrameSink() Line 1220
base::debug::TaskAnnotator::RunTask(const char * queue_function, base::PendingTask * pending_task) Line 53
blink::scheduler::TaskQueueManager::ProcessTaskFromWorkQueue(blink::scheduler::internal::WorkQueue * work_queue, blink::scheduler::LazyNow time_before_task, base::TimeTicks * time_after_task) Line 543
blink::scheduler::TaskQueueManager::DoWork(blink::scheduler::internal::Sequence::WorkType work_type) Line 343
base::debug::TaskAnnotator::RunTask(const char * queue_function, base::PendingTask * pending_task) Line 53
blink::scheduler::internal::ThreadControllerImpl::DoWork(blink::scheduler::internal::Sequence::WorkType work_type) Line 99
base::debug::TaskAnnotator::RunTask(const char * queue_function, base::PendingTask * pending_task) Line 53
base::MessageLoop::RunTask(base::PendingTask * pending_task) Line 399
base::MessageLoop::DoWork() Line 462
base::MessagePumpDefault::Run(base::MessagePump::Delegate * delegate) Line 37
base::RunLoop::Run() Line 136
content::RendererMain(const content::MainFunctionParams & parameters) Line 218
content::ContentMainRunnerImpl::Run() Line 717
service_manager::MainRun(service_manager::MainParams & params) Line 467
service_manager::Main(service_manager::MainParams & params) Line 514
content::ContentMain(const content::ContentMainParams & params) Line 19
CefExecuteProcess(const CefMainArgs & args, scoped_refptr<CefApp> application, void * windows_sandbox_info) Line 200
cef_execute_process(const _cef_main_args_t * args, _cef_app_t * application, void * windows_sandbox_info) Line 194

Let me know if you need anything else.

@mihe
Contributor Author

mihe commented Mar 16, 2018

Can confirm that I get the same call stack in our own application as well. It's constantly running the RenderWidgetCompositor::RequestNewLayerTreeFrameSink task and failing, which sends another request, forever and ever.
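
Conceptually, the spin looks something like this (illustrative pseudocode reconstructed from the call stack above, not actual Chromium source):

// The renderer needs a GPU channel to create a new frame sink, but the
// browser process that brokers the channel is already gone. The failure
// path re-posts the request, so CrRendererMain never goes idle and the
// process never exits.
void RequestNewLayerTreeFrameSink() {
  auto channel = EstablishGpuChannelSync();   // fails: browser is gone
  if (!channel) {
    PostTask(&RequestNewLayerTreeFrameSink);  // retried immediately, forever
    return;
  }
  // normal path: create the LayerTreeFrameSink on the channel
}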

@wesselsga
Contributor

wesselsga commented Mar 16, 2018

With DEBUG builds I have seen a few issues with shutdown and DCHECKs failing. I also added a --grid option to cefmixer and noticed more DCHECK failures with multiple HTML view instances. The --grid option lets you specify something like --grid=2x2, and cefmixer will tile multiple HTML views. It works well for stress testing.

I believe the code I added to CEF originally is problematic:

compositor->SetAcceleratedWidget(gfx::kNullAcceleratedWidget);

This code was intended to get Chromium to use the offscreen FBO, but it seems to have negative consequences. I removed it, since it turned out to be unnecessary: there is another flag on the compositor, set via the EnableSharedTextures method, that tells it to use shared textures.
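
In other words, the change is roughly this (a sketch of the edit described above; the exact EnableSharedTextures signature isn't shown in this thread and is assumed here to take a bool):

// Before: forcing the offscreen FBO path via a null accelerated widget.
// compositor->SetAcceleratedWidget(gfx::kNullAcceleratedWidget);  // removed

// After: rely only on the shared-texture flag already on the compositor.
compositor->EnableSharedTextures(true);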

Anyway, I did reproduce the hung process issue here, and my call stack was the same as yours. I'm testing the above change now to see if the results are more stable; it did resolve the DCHECK startup failures with multiple instances.

@mihe
Contributor Author

mihe commented Mar 17, 2018

I am unfortunately still able to reproduce the issue with your latest commit, both with and without the --grid=2x2 argument.

@wesselsga
Contributor

wesselsga commented Mar 18, 2018

The only way I can seem to reproduce this here is by running a DEBUG build in the debugger and then terminating it prematurely with a Stop Debugging command. I have yet to reproduce this locally with a Release build.

Do you have any tips for reproducing this with a Release build? I've been attempting to launch several instances and quickly close them, with no luck yet. Even force-killing the browser process with Task Manager does not seem to leave a lingering render process.

@mihe
Contributor Author

mihe commented Mar 18, 2018

One thing that made it easier for me to reproduce was to do...

// ...
while (msg.message != WM_QUIT)
{
	// Simulate an abrupt, premature exit after ~1000 iterations of the
	// message loop, deliberately skipping all CEF shutdown work.
	static int counter = 0;
	if (counter++ > 1000)
		exit(0);

	if (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE))
// ...

... in main.cpp, and then launch 5-6 of them in parallel and keep launching them as they disappear.

You can probably come up with better ways to do the same thing, but that worked well enough for me, and lets me reproduce the issue within 20-30 launches at most.
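
For example, one hypothetical variant (untested against this bug) is to trigger the abrupt exit on wall-clock time instead of an iteration count, so the abort point doesn't depend on how busy the message loop is:

	// Inside the same message loop, replacing the counter above
	// (requires #include <chrono>):
	static const auto start = std::chrono::steady_clock::now();
	if (std::chrono::steady_clock::now() - start > std::chrono::seconds(2))
		exit(0);  // abrupt exit, deliberately skipping CEF shutdown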

I have had this appear without doing Stop Debugging or exit(0), but using those methods does seem to make the issue appear more frequently.

@mihe
Contributor Author

mihe commented Mar 18, 2018

Alright, so I just downloaded the 3.3325.1750 build from the Spotify automated builds website, pointed cefmixer at it, and commented out the OnAcceleratedPaint/shared_textures_enabled code. I can happily/unfortunately report that I see the same issue there.

So I assume that means it can't be an issue with your branch. I'm gonna go back and try a couple more builds from their archive and see if I can find when it started happening.

Apologies for not trying this earlier on. We jumped straight from 3.3112.1650 to your branch, so I can only assume that this bug appeared somewhere in between.

@mihe mihe closed this as completed Mar 18, 2018
@mihe
Contributor Author

mihe commented Mar 18, 2018

For posterity's sake: it seems to have been introduced with Chromium 64. I don't see the issue in CEF 3.3239.1723.g071d1c1 / Chromium 63.0.3239.132, and I do see it in CEF 3.3282.1728.g2171fc7 / Chromium 64.0.3282.119.

I'll try and track down what caused it and post an issue in the appropriate forum.
