
RpcClientImpl hangs when calling stop #99

Closed
baimushan opened this issue Jun 22, 2016 · 7 comments

Comments

@baimushan

Stack trace:
#0 0x00007f1834e7b22d in pthread_join () from /lib64/libpthread.so.0
#1 0x00000000006215ab in sofa::pbrpc::ThreadGroupImpl::stop (this=0x228f6b0) at src/sofa/pbrpc/thread_group_impl.h:182
#2 0x00000000006176c7 in sofa::pbrpc::RpcClientImpl::Stop (this=0x22be000) at src/sofa/pbrpc/rpc_client_impl.cc:109

Inspecting the io_service in memory shows the following:
(gdb) p *(boost::asio::detail::task_io_service * const) 0x22738e0
$26 = {<boost::asio::detail::service_base<boost::asio::detail::task_io_service>> = {<boost::asio::io_service::service> = {<boost::noncopyable_::noncopyable> = {<No data fields>},
      _vptr.service = 0xa24a90 <vtable for boost::asio::detail::task_io_service+16>, key_ = {type_info_ = 0xa245a0 <typeinfo for boost::asio::detail::typeid_wrapper<boost::asio::detail::task_io_service>>,
        id_ = 0x0}, owner_ = @0x228f6d0, next_ = 0x0}, static id = {<boost::asio::io_service::id> = {<boost::noncopyable_::noncopyable> = {<No data fields>}, <No data fields>}, <No data fields>}},
  one_thread_ = false, mutex_ = {<boost::noncopyable_::noncopyable> = {<No data fields>}, mutex_ = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 7, __kind = 0, __spins = 0, __list = {
          __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 12 times>, "\a", '\000' <repeats 26 times>, __align = 0}}, task_ = 0x2266e10,
  task_operation_ = {<boost::asio::detail::task_io_service_operation> = {next_ = 0x0, func_ = 0x0, task_result_ = 0}, <No data fields>}, task_interrupted_ = false, outstanding_work_ = {value_ = 3},
  op_queue_ = {<boost::noncopyable_::noncopyable> = {<No data fields>}, front_ = 0x0, back_ = 0x0}, stopped_ = false, shutdown_ = false, first_idle_thread_ = 0x7f182991fce0}

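My understanding is that after stop() is called, the outstanding_work_ counter of task_io_service drops to 0 and its run() function exits,
which in turn lets pthread_join return. Where could the problem be?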
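For comparison, the documented contract of `boost::asio::io_service::run()` is that it blocks until all work has finished and no handlers remain to be dispatched, or until the io_service has been stopped; `stop()` sets a stopped flag and wakes the running threads rather than decrementing the outstanding-work count. The sketch below is a std-only toy model of those two exit conditions, under that reading of the documentation. `ToyIoService` and `stop_and_join` are hypothetical names, not Boost.Asio or sofa-pbrpc code. In the dump above the outstanding-work count is 3 and the stopped flag is false, which would be consistent with `run()` still blocking.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy event loop. It only mimics the documented exit conditions
// of boost::asio::io_service::run(); it is not Boost.Asio's implementation.
class ToyIoService {
 public:
  // Like constructing an io_service::work object: keeps run() from returning.
  void add_work() {
    std::lock_guard<std::mutex> lk(mu_);
    ++outstanding_work_;
  }

  // Queue a handler; each queued handler counts as outstanding work.
  void post(std::function<void()> fn) {
    std::lock_guard<std::mutex> lk(mu_);
    ++outstanding_work_;
    handlers_.push(std::move(fn));
    cv_.notify_one();
  }

  // Blocks until stop() is called or no outstanding work remains.
  // Returns the number of handlers this thread executed.
  int run() {
    int executed = 0;
    std::unique_lock<std::mutex> lk(mu_);
    for (;;) {
      cv_.wait(lk, [this] {
        return stopped_ || outstanding_work_ == 0 || !handlers_.empty();
      });
      if (stopped_ || outstanding_work_ == 0) return executed;
      std::function<void()> fn = std::move(handlers_.front());
      handlers_.pop();
      lk.unlock();
      fn();  // run the handler without holding the lock
      ++executed;
      lk.lock();
      if (--outstanding_work_ == 0) cv_.notify_all();  // wake idle runners
    }
  }

  // Sets the stopped flag and wakes every thread blocked in run().
  // Note: it does not touch outstanding_work_.
  void stop() {
    std::lock_guard<std::mutex> lk(mu_);
    stopped_ = true;
    cv_.notify_all();
  }

  bool stopped() const {
    std::lock_guard<std::mutex> lk(mu_);
    return stopped_;
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> handlers_;
  int outstanding_work_ = 0;
  bool stopped_ = false;
};

// Hypothetical thread-group shutdown: signal the loop first, then join.
// Joining without the stop() call would block for as long as outstanding
// work (e.g. the add_work() guard) keeps run() alive.
void stop_and_join(ToyIoService& io, std::vector<std::thread>& threads) {
  io.stop();
  for (std::thread& t : threads) t.join();
  threads.clear();
}
```

With `add_work()` in place, threads inside `run()` only return once `stop_and_join` calls `stop()`; the outstanding-work count stays at 1 the whole time, so it is the stopped flag, not the counter, that lets the joins complete.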

@baimushan
Author

At the time of stopping, the value of stream_map is 3. Could that be related?

@baimushan
Author

I also noticed something else:
#0 0x00007f1833fb2163 in epoll_wait () from /lib64/libc.so.6
#1 0x000000000061f888 in boost::asio::detail::epoll_reactor::run (this=0x2266e10, block=<optimized out>, ops=...) at /usr/local/include/boost/asio/detail/impl/epoll_reactor.ipp:392
#2 0x0000000000624671 in boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., this=0x22738e0) at /usr/local/include/boost/asio/detail/impl/task_io_service.ipp:396
#3 boost::asio::detail::task_io_service::run (this=0x22738e0, ec=...) at /usr/local/include/boost/asio/detail/impl/task_io_service.ipp:153
#4 0x000000000062521e in boost::asio::io_service::run (this=0x228f6d0) at /usr/local/include/boost/asio/impl/io_service.ipp:59
#5 sofa::pbrpc::ThreadGroupImpl::thread_run (param=0x22bf100) at src/sofa/pbrpc/thread_group_impl.h:263
#6 0x00007f1834e7a9d1 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f1833fb1b6d in clone () from /lib64/libc.so.6

It hangs in epoll_wait,
but printing the state of epoll_reactor gives:
(gdb) p timer_fd_
$6 = 515
so the timeout passed to epoll_wait should not be -1. I don't understand why it hangs here.
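A thread inside `epoll_wait()` returns only when a registered descriptor becomes ready, the timeout expires, or a signal interrupts the call. As we understand it, Boost.Asio's epoll reactor wakes a blocked thread on `stop()` by making an internal interrupter descriptor readable (an eventfd on Linux builds that support it). The sketch below isolates that mechanism under those assumptions; the class `BlockingPoller` and its methods are hypothetical and are not Boost.Asio or sofa-pbrpc code. It deliberately passes a timeout of -1, so the worker can leave `epoll_wait()` only through the eventfd.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <stdexcept>
#include <thread>

// Hypothetical, Linux-only sketch of the "interrupter" pattern: one thread
// blocks in epoll_wait() with an infinite timeout (-1) and is woken by
// writing to an eventfd registered with the same epoll instance.
class BlockingPoller {
 public:
  BlockingPoller() {
    epfd_ = epoll_create1(EPOLL_CLOEXEC);
    evfd_ = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    epoll_event ev{};
    ev.events = EPOLLIN;  // level-triggered: stays ready until drained
    ev.data.fd = evfd_;
    if (epfd_ < 0 || evfd_ < 0 ||
        epoll_ctl(epfd_, EPOLL_CTL_ADD, evfd_, &ev) != 0) {
      close_fds();
      throw std::runtime_error("epoll/eventfd setup failed");
    }
  }
  ~BlockingPoller() { close_fds(); }
  BlockingPoller(const BlockingPoller&) = delete;
  BlockingPoller& operator=(const BlockingPoller&) = delete;

  // Blocks with no timeout until a registered fd is ready.
  // Returns the number of ready descriptors reported by epoll_wait().
  int wait_for_interrupt() {
    epoll_event events[8];
    int n;
    do {
      n = epoll_wait(epfd_, events, 8, -1);
    } while (n < 0 && errno == EINTR);  // retry if a signal interrupted us
    return n;
  }

  // Makes the eventfd readable, waking any thread in wait_for_interrupt().
  void interrupt() {
    std::uint64_t one = 1;
    ssize_t written = write(evfd_, &one, sizeof(one));
    (void)written;  // a full 8-byte write is expected for an eventfd
  }

 private:
  void close_fds() {
    if (evfd_ >= 0) close(evfd_);
    if (epfd_ >= 0) close(epfd_);
    evfd_ = epfd_ = -1;
  }

  int epfd_ = -1;
  int evfd_ = -1;
};
```

A worker thread that calls `wait_for_interrupt()` returns once another thread calls `interrupt()`. If `interrupt()` is never called, joining that worker blocks indefinitely, which has the same shape as the reported hang: one thread parked in `epoll_wait`, another parked in `pthread_join`.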

@zd-double
Collaborator

@baimushan Could you describe how you are using it and in what scenario? I'll try to reproduce it locally.

@baimushan
Copy link
Author

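Our server keeps starting and stopping. Before it exits, we stop the RpcClient instance, and that is when we run into this case.
Given how epoll_wait is used here, it shouldn't be able to hang, should it?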

@zd-double
Collaborator

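Following the scenario you described, I could not reproduce the hang. Could you leave an email address or other contact details so we can communicate more easily?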

@baimushan
Author

My QQ: 406455861

@zd-double
Collaborator

@baimushan, we recently reproduced the issue you reported in our environment, and the fix has been merged into the master branch. Thanks!

bluebore closed this as completed Sep 7, 2016