Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dpip adds show may fails #156

Open
icymoon opened this issue Mar 30, 2018 · 9 comments
Open

dpip adds show may fails #156

icymoon opened this issue Mar 30, 2018 · 9 comments
Assignees

Comments

@icymoon
Copy link
Collaborator

icymoon commented Mar 30, 2018

Here is log, several types:
multicast_msg_send: mc_msg(type:12, cid:0->slaves) timeout(2000 us), drop...
SAPOOL: sa_pool_stats: mc msg send fail: msg dropped
MSGMGR: multicast_msg_send: repeated sequence number for multicast msg: type 12, seq 0
SAPOOL: sa_pool_stats: mc msg send fail: invalid parameter
MSGMGR: multicast_msg_send: repeated sequence number for multicast msg: type 12, seq 0
SAPOOL: sa_pool_stats: mc msg send fail: invalid parameter

NETIF: [netif_sockopt_get] msg dropped for netif sockmsg opt 502)

MSGMGR: sockopt_ctl: socket msg<type=GET, id=502> callback failed
MSGMGR: [sockopt_msg_send:msg#502] errcode set in sockopt msg reply: msg dropped

@beacer
Copy link
Contributor

beacer commented Mar 30, 2018

@icymoon How about your script to monitor sa_pool or other details ? is it easy to reproduce or not ?
@ywc689 pls help to check the msg issue.

@icymoon
Copy link
Collaborator Author

icymoon commented Mar 30, 2018

It easy to reproduce with workload.
Monitors call dpip, ipvsadm command with different parameters.

@icymoon
Copy link
Collaborator Author

icymoon commented Mar 30, 2018

每段开始出问题的时间,都是以这样的消息开头的
MSGMGR: multicast_msg_send: mc_msg(type:11, cid:0->slaves) timeout(2000 us), drop...
SERVICE: dp_vs_copy_stats: send message fail.
或者
msg_send: uc_msg(type:8, cid:0->13, flags=2) timeout(2000 us), drop...
NETIF: [netif_sockopt_get] msg dropped for netif sockmsg opt 502)

前者的概率更大些似乎

@beacer
Copy link
Contributor

beacer commented Mar 30, 2018

As discussed with @icymoon , to reproduce it:

  1. test with 2G bps througput.
  2. run dpip addr show with dead loop: while [ true ]; dpip addr show; done > xxx.log

@icymoon
Copy link
Collaborator Author

icymoon commented Mar 30, 2018

reproduced with this work load:
# ./wrk -t26 -c5000 -d15m http://xxxx/
Running 15m test @ http://xxx/
26 threads and 5000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.01ms 48.57ms 2.00s 99.11%
Req/Sec 34.96k 19.43k 155.07k 73.96%
813153656 requests in 15.00m, 583.85GB read
Socket errors: connect 3332, read 0, write 0, timeout 16561
Requests/sec: 903409.67
Transfer/sec: 664.22MB

@ywc689
Copy link
Collaborator

ywc689 commented Apr 1, 2018

  1. DPVS lcore multicast msg is master-to-all-slaves round trip message , i.e. the master lcore send messages to all slaves, and wait until all slaves reply with a response message if the message is blockable. So the message would time out if any lcore get slower than 2000us/2, which is very likely to happen in the stress test. Increase the msg timeout in the dpvs.conf may helps.
  2. The error log repeated sequence number is a result of that dpvs is too busy to process the frequently generated dpip commands of the same type. We can make use of the seq field in dpvs_msg to solve the problem. But the root cause still is dpvs get slower than dpip.

@icymoon
Copy link
Collaborator Author

icymoon commented Apr 2, 2018

谢谢@ywc689 或者我只尝试读per core的计数然后在用户态加起来,可以规避这个问题?

@ywc689
Copy link
Collaborator

ywc689 commented Apr 2, 2018

@icymoon 你把dpvs.conf配置文件中ctrl_defs/lcore_msg/sync_msg_timeout_us这个超时时间调大一些,不让系统出现msg timeout,就应该可以拿到统计数据了。 这个问题可能是由于个别lcore处理循环中出现了卡顿或者压力过大,我们会进一步排查原因。

@icymoon
Copy link
Collaborator Author

icymoon commented Apr 10, 2018

谢谢@ywc689 统计数据抖一下没关系,最多是图不好看,我最怕的是处理卡顿意味着有稳定性或者性能隐患。。。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants