
Mellanox Technologies MT27500 dual-port NIC bonding fails with Segmentation fault #975

Closed
fishyu-mushroom opened this issue Jul 3, 2024 · 2 comments


fishyu-mushroom commented Jul 3, 2024

This host has three Mellanox NICs in total; one of them is a dual-port card used for the internal-network bond. dpdk0 and dpdk1 are the two ports of that dual-port Mellanox card.
NIC information (screenshot attached).
The error output at startup is:
root@network-10-48-16-2:/home/intsig# dpvs -- -l 0-10 &
current thread affinity is set to FFFFFFFFFFFF
EAL: Detected 48 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Invalid NUMA socket, default to 0
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:01:00.0 (socket 0)
mlx5_pci: No kernel/verbs support for VF LAG bonding found.
common_mlx5: Failed to load driver = mlx5_pci.

EAL: Requested device 0000:01:00.0 cannot be used
EAL: Invalid NUMA socket, default to 0
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:01:00.1 (socket 0)
mlx5_pci: No kernel/verbs support for VF LAG bonding found.
common_mlx5: Failed to load driver = mlx5_pci.

EAL: Requested device 0000:01:00.1 cannot be used
EAL: Invalid NUMA socket, default to 0
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:02:00.1 (socket 0)
common_mlx5: RTE_MEM is selected.
mlx5_pci: Size 0xFFFF is not power of 2, will be aligned to 0x10000.
EAL: Invalid NUMA socket, default to 0
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:43:00.1 (socket 0)
mlx5_pci: Size 0xFFFF is not power of 2, will be aligned to 0x10000.
EAL: No legacy callbacks, legacy socket not created
DPVS: dpvs version: 1.9-6, build on 2024.07.03.18:33:27
DPVS: dpvs-conf-file: /etc/dpvs.conf
DPVS: dpvs-pid-file: /var/run/dpvs.pid
DPVS: dpvs-ipc-file: /var/run/dpvs.ipc
CFG_FILE: Opening configuration file '/etc/dpvs.conf'.
CFG_FILE: log_level = DEBUG
CFG_FILE: kni = on
NETIF: pktpool_size = 524287 (round to 2^n-1)
NETIF: pktpool_cache_size = 256 (round to 2^n)
NETIF: netif device config: dpdk0
NETIF: dpdk0:rx_queue_number = 8
NETIF: dpdk0:nb_rx_desc = 1024 (round to 2^n)
NETIF: dpdk0:rss = all
NETIF: dpdk0:tx_queue_number = 8
NETIF: dpdk0:nb_tx_desc = 1024 (round to 2^n)
NETIF: dpdk0:kni_name = dpdk0.kni
NETIF: netif device config: dpdk1
NETIF: dpdk1:rx_queue_number = 8
NETIF: dpdk1:nb_rx_desc = 1024 (round to 2^n)
NETIF: dpdk1:rss = all
NETIF: dpdk1:tx_queue_number = 8
NETIF: dpdk1:nb_tx_desc = 1024 (round to 2^n)
NETIF: dpdk1:kni_name = dpdk1.kni
NETIF: netif device config: dpdk2
NETIF: dpdk2:rx_queue_number = 8
NETIF: dpdk2:nb_rx_desc = 1024 (round to 2^n)
NETIF: dpdk2:rss = all
NETIF: dpdk2:tx_queue_number = 8
NETIF: dpdk2:nb_tx_desc = 1024 (round to 2^n)
NETIF: dpdk2:kni_name = dpdk2.kni
NETIF: netif device config: dpdk3
NETIF: dpdk3:rx_queue_number = 8
NETIF: dpdk3:nb_rx_desc = 1024 (round to 2^n)
NETIF: dpdk3:rss = all
NETIF: dpdk3:tx_queue_number = 8
NETIF: dpdk3:nb_tx_desc = 1024 (round to 2^n)
NETIF: dpdk3:kni_name = dpdk3.kni
NETIF: netif bonding config: bond0
NETIF: bonding bond0:mode=4
NETIF: bonding bond0:slave0=dpdk0
NETIF: bonding bond0:slave1=dpdk1
NETIF: bonding bond0:primary=dpdk0
NETIF: bonding bond0:kni_name=bond0.kni
NETIF: bonding bond0 options: dedicated_queues=off
NETIF: netif worker config: cpu0
NETIF: cpu0:type = master
NETIF: cpu0:cpu_id = 0
NETIF: netif worker config: cpu1
NETIF: cpu1:type = slave
NETIF: cpu1:cpu_id = 1
NETIF: worker cpu1:bond0 queue config
NETIF: worker cpu1:bond0 rx_queue_id += 0
NETIF: worker cpu1:bond0 tx_queue_id += 0
NETIF: worker cpu1:dpdk2 queue config
NETIF: worker cpu1:dpdk2 rx_queue_id += 0
NETIF: worker cpu1:dpdk2 tx_queue_id += 0
NETIF: worker cpu1:dpdk3 queue config
NETIF: worker cpu1:dpdk3 rx_queue_id += 0
NETIF: worker cpu1:dpdk3 tx_queue_id += 0
DTIMER: sched_interval = 500
NEIGHBOUR: arp_unres_qlen = 128
NEIGHBOUR: arp_reachable_timeout = 60
IPSET: ipset_hash_pool_size = 131072 (round to 2^n-1)
IPV4: ipv4:forwarding = off
IPV4: inet_def_ttl = 64
IP4FRAG: ip4_frag_buckets = 4096
IP4FRAG: ip4_frag_bucket_entries = 16 (round to 2^n)
IP4FRAG: ip4_frag_max_entries = 4096
IP4FRAG: ip4_frag_ttl = 1
IPV6: ipv6:disable = off
IPV6: ipv6:forwarding = off
RT6: route6:method = hlist
RT6: ipv6:route:recycle_time = 10
MSGMGR: msg_ring_size = 4096 (round to 2^n)
MSGMGR: sync_msg_timeout_us = 20000
MSGMGR: priority_level = low
IPVS: conn_pool_size = 2097152 (round to 2^n-1)
IPVS: conn_pool_cache = 256 (round to 2^n)
IPVS: conn_init_timeout = 3
IPVS: uoa_max_trail = 3
IPVS: udp_timeout_oneway = 60
IPVS: udp_timeout_normal = 300
IPVS: udp_timeout_last = 3
IPVS: tcp_timeout_none = 2
IPVS: tcp_timeout_established = 90
IPVS: tcp_timeout_syn_sent = 3
IPVS: tcp_timeout_syn_recv = 30
IPVS: tcp_timeout_fin_wait = 7
IPVS: tcp_timeout_time_wait = 7
IPVS: tcp_timeout_close = 3
IPVS: tcp_timeout_close_wait = 7
IPVS: tcp_timeout_last_ack = 7
IPVS: tcp_timeout_listen = 120
IPVS: tcp_timeout_synack = 30
IPVS: tcp_timeout_last = 2
IPVS: synack_mss = 1452
IPVS: synack_ttl = 63
IPVS: synproxy_synack_options_sack ON
IPVS: close_client_window ON
IPVS: rs_syn_max_retry = 3
IPVS: ack_storm_thresh = 10
IPVS: max_ack_saved = 3
IPVS: synproxy_conn_reuse ON
IPVS: synproxy_conn_reuse: CLOSE
IPVS: synproxy_conn_reuse: TIMEWAIT
SAPOOL: sapool_filter_enable = on
NETIF: Add bonding device "bond0"
mode: 4
primary: dpdk0
numa_node: 0
slaves: dpdk0 dpdk1
bond_ethdev_mode_set(1603) - Using mode 4, it is necessary to do TX burst and RX burst at least every 100ms.
NETIF: created bondig device bond0: mode=4, primary=dpdk0, numa_node=0
NETIF: bonding device port id range: [2, 3)
DTIMER: [01] timer initialized 0x7f4e9b41d600.
DTIMER: [03] timer initialized 0x7f4e9a41b600.
DTIMER: [02] timer initialized 0x7f4e9ac1c600.
DTIMER: [04] timer initialized 0x7f4e99c1a600.
DTIMER: [05] timer initialized 0x7f4e99419600.
DTIMER: [07] timer initialized 0x7f4e93ffd600.
DTIMER: [06] timer initialized 0x7f4e98c18600.
DTIMER: [10] timer initialized 0x7f4e927fa600.
DTIMER: [09] timer initialized 0x7f4e92ffb600.
DTIMER: [08] timer initialized 0x7f4e937fc600.
DTIMER: [00] timer initialized 0x5633b42d5660.
NETIF: LCORE STATUS
enabled: 0 1 2 3 4 5 6 7 8 9 10
disabled: 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

[1]+ Segmentation fault (core dumped)

The dpvs configuration is as follows:

global_defs {
log_level DEBUG
!log_file /var/log/dpvs.log
! log_async_mode on
kni on
}

! netif config
netif_defs {
pktpool_size 524287
pktpool_cache 256

<init> device dpdk0 {
    rx {
        queue_number        8
        descriptor_number   1024
        rss                 all
    }
    tx {
        queue_number        8
        descriptor_number   1024
    }
    ! mtu                   1072
    ! promisc_mode
    ! allmulticast
    kni_name                dpdk0.kni
}
<init> device dpdk1 {
    rx {
        queue_number        8
        descriptor_number   1024
        rss                 all
    }
    tx {
        queue_number        8
        descriptor_number   1024
    }
    ! mtu                    1072
    ! promisc_mode
    ! allmulticast
    kni_name                dpdk1.kni
}
<init> device dpdk2 {
    rx {
        queue_number        8
        descriptor_number   1024
        rss                 all
    }
    tx {
        queue_number        8
        descriptor_number   1024
    }
    ! mtu                   1072
    ! promisc_mode
    ! allmulticast
    kni_name                dpdk2.kni
}
<init> device dpdk3 {
    rx {
        queue_number        8
        descriptor_number   1024
        rss                 all
    }
    tx {
        queue_number        8
        descriptor_number   1024
    }
    ! mtu                   1072
    ! promisc_mode
    ! allmulticast
    kni_name                dpdk3.kni
}

<init> bonding bond0 {
    mode                    4
    slave                   dpdk0
    slave                   dpdk1
    primary                 dpdk0
    kni_name                bond0.kni
    options                 dedicated_queues=off
}

}

! worker config (lcores)
worker_defs {
worker cpu0 {
type master
cpu_id 0
}

<init> worker cpu1 {
    type    slave
    cpu_id  1
    port    bond0 {
        rx_queue_ids     0
        tx_queue_ids     0
    }
    port    dpdk2 {
        rx_queue_ids     0
        tx_queue_ids     0
    }
    port    dpdk3 {
        rx_queue_ids     0
        tx_queue_ids     0
    }

}

}

! timer config
timer_defs {
# cpu job loops to schedule dpdk timer management
schedule_interval 500
}

! dpvs neighbor config
neigh_defs {
unres_queue_length 128
timeout 60
}

! dpvs ipset config
ipset_defs {
ipset_hash_pool_size 131072
}

! dpvs ipv4 config
ipv4_defs {
forwarding off
default_ttl 64
fragment {
bucket_number 4096
bucket_entries 16
max_entries 4096
ttl 1
}
}

! dpvs ipv6 config
ipv6_defs {
disable off
forwarding off
route6 {
method hlist
recycle_time 10
}
}

! control plane config
ctrl_defs {
lcore_msg {
ring_size 4096
sync_msg_timeout_us 20000
priority_level low
}
}

! ipvs config
ipvs_defs {
conn {
conn_pool_size 2097152
conn_pool_cache 256
conn_init_timeout 3
! expire_quiescent_template
! fast_xmit_close
! redirect off
}

udp {
    ! defence_udp_drop
    uoa_mode        opp
    uoa_max_trail   3
    timeout {
        oneway      60
        normal      300
        last        3
    }
}

tcp {
    ! defence_tcp_drop
    timeout {
        none        2
        established 90
        syn_sent    3
        syn_recv    30
        fin_wait    7
        time_wait   7
        close       3
        close_wait  7
        last_ack    7
        listen      120
        synack      30
        last        2
    }
    synproxy {
        synack_options {
            mss             1452
            ttl             63
            sack
            ! wscale        0
            ! timestamp
        }
        close_client_window
        ! defer_rs_syn
        rs_syn_max_retry    3
        ack_storm_thresh    10
        max_ack_saved       3
        conn_reuse_state {
            close
            time_wait
            ! fin_wait
            ! close_wait
            ! last_ack
        }
    }
}

}

! sa_pool config
sa_pool {
pool_hash_size 16
flow_enable on
}

fishyu-mushroom changed the title from "Mellanox Technologies MT27500 dual-port NIC bonding fails." to "Mellanox Technologies MT27500 dual-port NIC bonding fails with Segmentation fault" on Jul 3, 2024
fishyu-mushroom (Author) commented:

If only one port of the dual-port NIC is used (screenshot of NIC information attached), and dpdk0 and dpdk1 are then bonded, DPVS starts up normally.

Is bonding simply not supported on the Mellanox Technologies MT27500 dual-port NIC?

fishyu-mushroom (Author) commented:

Root cause: the two ports had already been bonded at the OS (kernel) level. Removing the kernel-level bond resolved the problem.
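The fix above can be checked and applied with standard iproute2 tools. A minimal sketch, assuming the kernel-level bond is named bond0 (the name is an assumption; list the actual bonds on your system first):

```shell
# List any kernel-level bond masters that exist
# (each bond master exposes a "bonding" directory in sysfs):
ls -d /sys/class/net/*/bonding 2>/dev/null

# Inspect which slave ports a bond holds
# (assuming the bond is named bond0 -- adjust to your system):
cat /proc/net/bonding/bond0

# Tear the kernel bond down so the mlx5 PMD can claim the ports itself:
ip link set dev bond0 down
ip link delete dev bond0 type bond
```

After removing the kernel bond, DPVS can create its own bonding device (bond0 in dpvs.conf) over dpdk0 and dpdk1 without the mlx5 probe failure.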
