I am working on a maximum-throughput site-to-site VPN solution and have dedicated VPN hardware on each end, as well as dedicated traffic-generation computers (I used iperf, iperf3, and the SoftEther traffic generator). I am bench testing, and everything is directly connected. End-to-end latency through the loaded tunnel is ~5 ms.
Pre-test 1: direct iperf (no SoftEther tunnel) between "vpnserver" and "vpnbridge" gives ~920 Mbps with sub-0.1 ms latency over 1 GigE NIC hardware and a 1 GigE switch.
Test 1: "vpnserver" and "vpnbridge" are each a quad-core i7-6700K @ 4.00 GHz with 1 GigE NIC hardware and a 1 GigE switch. I can max the tunnel out at ~850/850 Mbps bidirectional throughput. The number of active TCP sessions does not appear to matter at all.
Pre-test 2: direct iperf (no SoftEther tunnel) between "vpnserver" and "vpnbridge" gives ~9.15 Gbps with sub-0.1 ms latency over 10 GigE NIC hardware and a 10 GigE switch.
Test 2: "vpnserver" and "vpnbridge" are each an octa-core Xeon D-1541 @ 2.10 GHz with 10 GigE NIC hardware and a 10 GigE switch. The tunnel maxes out at ~425/425 Mbps bidirectional throughput. The number of active TCP sessions does not appear to matter at all; I tried everything from 32 down to 1.
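As a rough illustration of how differently the two setups behave, the sketch below expresses each tunnel rate as a fraction of the raw (no-tunnel) iperf rate, using only the figures quoted above. It assumes the raw and tunnel numbers are comparable per-direction rates, which the tests do not state explicitly.

```python
# Throughput figures quoted above, in Mbps (tunnel figures are per direction).
tests = {
    "Test 1 (i7-6700K, 1 GigE)": {"raw": 920, "tunnel": 850},
    "Test 2 (Xeon D-1541, 10 GigE)": {"raw": 9150, "tunnel": 425},
}


def tunnel_efficiency(raw_mbps: float, tunnel_mbps: float) -> float:
    """Fraction of the raw (no-tunnel) iperf rate achieved through the tunnel."""
    return tunnel_mbps / raw_mbps


for name, t in tests.items():
    # Prints 92.4% for Test 1 and 4.6% for Test 2.
    print(f"{name}: {tunnel_efficiency(t['raw'], t['tunnel']):.1%} of raw link")
```

The 1 GigE tunnel runs close to the link ceiling, while the 10 GigE tunnel uses under 5% of the available link, which suggests the limit in Test 2 lies in the VPN hosts rather than the network.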
Enabling SoftEther Cascade "half-duplex" mode appeared to have no impact on overall throughput.
Disabling QoS appeared to have no impact on overall throughput.
During Test 2, I went into the BIOS and reduced the number of active cores:
8 cores: ~425/425 Mbps bidirectional throughput
4 cores: ~425/425 Mbps bidirectional throughput
2 cores: ~390/390 Mbps bidirectional throughput
1 core: ~375/375 Mbps bidirectional throughput
(The reduced core count throughput is from my memory, I was not recording at the time.)
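To quantify how little the core count mattered, this sketch turns the recalled numbers into a speedup relative to one core and a parallel efficiency (speedup divided by core count). A perfectly parallel workload would show a speedup close to the core count. Since the figures are from memory, treat the result as indicative only.

```python
# Per-direction tunnel throughput (Mbps) vs. active cores, as recalled above.
measured = {1: 375, 2: 390, 4: 425, 8: 425}


def scaling(results: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {cores: (speedup vs. 1 core, parallel efficiency)}."""
    base = results[1]
    out = {}
    for cores, mbps in sorted(results.items()):
        speedup = mbps / base
        out[cores] = (speedup, speedup / cores)
    return out


for cores, (speedup, eff) in scaling(measured).items():
    print(f"{cores} core(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

With these inputs, eight cores give only about a 1.13x speedup over one core (roughly 14% parallel efficiency). That is consistent with, though not proof of, most of the work running on a single thread.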
I also tried compression, but that really slowed things down (~125/125 Mbps bidirectional throughput).
My traffic load is iperf with "-d" and many sessions running, which matches my real-world use case: many users on each side of the tunnel accessing multiple resources (not just a single bulk file copy).
My current conclusion, based on my testing and results, is that SoftEther appears to be bound to a single CPU core, so maximum throughput is set by the processing capacity of one core.
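One rough cross-check of the single-core hypothesis is to normalize each tunnel's throughput by the clock speed quoted for its CPU. This sketch uses the base clocks stated above and ignores turbo frequencies and microarchitecture differences (Skylake vs. Broadwell), so it is only a back-of-envelope comparison.

```python
# Per-direction tunnel throughput (Mbps) and base clocks (GHz) quoted above.
# Assumption: turbo clocks and per-clock performance differences are ignored.
platforms = {
    "i7-6700K": {"ghz": 4.00, "mbps": 850},
    "Xeon D-1541": {"ghz": 2.10, "mbps": 425},
}


def mbps_per_ghz(mbps: float, ghz: float) -> float:
    """Tunnel throughput normalized by the quoted base clock."""
    return mbps / ghz


for name, p in platforms.items():
    print(f"{name}: {mbps_per_ghz(p['mbps'], p['ghz']):.1f} Mbps per GHz")
```

The two results land close together (about 212 vs. 202 Mbps per GHz), which fits a clock-bound, single-threaded bottleneck. The i7 figure is probably a lower bound, since 850 Mbps is already near the ~920 Mbps ceiling of the 1 GigE link.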
Can anyone confirm or deny this, please?
Did I miss something in my testing? Suggestions?
Does anyone know of a way to increase overall site-to-site throughput?
Is there a methodology for getting SoftEther to use all available cores, for example by establishing one tunnel per core and balancing the overall traffic load across them (a single traffic flow would be limited to a single tunnel, but the aggregate would be spread across all available cores)?
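The tunnel-per-core idea resembles ECMP-style flow hashing: hash each flow's 5-tuple to pick a tunnel, so that all packets of one flow stay on one tunnel (avoiding reordering) while many flows spread across tunnels. Below is a minimal sketch of just the assignment step. The `Flow` type, the `tunnel_for_flow` helper, the addresses, and the one-tunnel-per-core count are hypothetical illustrations, not a SoftEther feature.

```python
import hashlib
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical 5-tuple identifying one TCP/UDP flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


def tunnel_for_flow(flow: Flow, n_tunnels: int) -> int:
    """Deterministically map a flow to a tunnel index in [0, n_tunnels)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % n_tunnels


if __name__ == "__main__":
    N_TUNNELS = 8  # assumption: one tunnel (one VPN process) per core
    # Hypothetical flows: 32 client ports talking to one iperf server port.
    flows = [Flow("10.0.1.10", "10.0.2.20", 40000 + i, 5201, "tcp") for i in range(32)]
    load = Counter(tunnel_for_flow(f, N_TUNNELS) for f in flows)
    for tunnel in range(N_TUNNELS):
        print(f"tunnel {tunnel}: {load[tunnel]} flows")
```

SHA-256 is used instead of Python's built-in `hash()` because string hashing is randomized per process, so the mapping would change across restarts. The spread is statistical: with only a few flows some tunnels may sit idle, and a single bulk flow still rides one tunnel.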
Thank you!
Maximum Throughput Site to Site VPN Solution
-
- Posts: 22
- Joined: Wed Jan 25, 2017 8:40 pm
Re: Maximum Throughput Site to Site VPN Solution
Attached are a couple of images showing system load (htop) and throughput (bwm-ng).
-
- Posts: 22
- Joined: Wed Jan 25, 2017 8:40 pm
Re: Maximum Throughput Site to Site VPN Solution
This is a comparison image showing system load and throughput.
-
- Posts: 2458
- Joined: Mon Feb 24, 2014 11:03 am
Re: Maximum Throughput Site to Site VPN Solution
Please try Windows.
-
- Posts: 22
- Joined: Wed Jan 25, 2017 8:40 pm
Re: Maximum Throughput Site to Site VPN Solution
thisjun wrote:
> Please try Windows.
Thank you for the suggestion. Windows is not an option for this project, and I do not have a copy to use for testing; everything is Unix-based (Linux and macOS).
-
- Posts: 2458
- Joined: Mon Feb 24, 2014 11:03 am
Re: Maximum Throughput Site to Site VPN Solution
Local bridge performance on Linux is poor because Linux can receive only one packet per system call.
macOS is even worse.
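To put a one-packet-per-system-call limitation into numbers, this sketch converts a throughput into a packet rate and a per-packet time budget. It assumes full-size 1500-byte packets and one receive call per packet; real traffic containing smaller packets (for example TCP ACKs) would need even more calls per second.

```python
def packets_per_second(mbps: float, packet_bytes: int = 1500) -> float:
    """Packets per second needed to carry `mbps` at a fixed packet size."""
    return mbps * 1_000_000 / (packet_bytes * 8)


def ns_per_packet(mbps: float, packet_bytes: int = 1500) -> float:
    """Time budget per packet (ns) if one packet is handled per call."""
    return 1e9 / packets_per_second(mbps, packet_bytes)


# Rates from this thread: Test 2 tunnel, Test 1 tunnel, raw 10 GigE iperf.
for rate in (425, 850, 9150):
    pps = packets_per_second(rate)
    print(f"{rate} Mbps: {pps:,.0f} packets/s, {ns_per_packet(rate):,.0f} ns per packet")
```

At 425 Mbps that is roughly 35,000 packets per second per direction, while filling a 10 GigE link with 1500-byte packets would take about 762,500 per second, leaving roughly 1.3 µs per packet.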