Still trying to figure out how to maximize IPsec throughput, and it seems that Protectli hardware may be a better bet than Netgate. For example, compare the Protectli FW6C (CPU: i5-7200U) with the Netgate SG-5100 (CPU: Atom C3558). Equivalently configured, the Protectli is about $150 less ($546 vs $699), and its CPU scores nearly twice as high in benchmarks (https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-7200U+%40+2.50GHz&id=2865 vs https://www.cpubenchmark.net/cpu.php?cpu=Intel+Atom+C3558+%40+2.20GHz&id=3129).
The C3558 has more cores and over double the cache. Increased cache is hard to quantify in terms of benefit, but is usually quite significant. On the other hand, it has a lower clock speed and no Turbo Boost. In the end I’d say these are pretty balanced, especially because IPsec performance AFAIK is going to be limited by the AES-NI blocks within the CPU, which for a given generation are the same for all of the CPUs (if one of these is going to have an edge in AES-NI, I expect it to be the C3558, because it is newer and designed for networking/edge appliances).
I’ve been disappointed by the IPsec throughput using the C3558 or i3-4130T; it seems to max out at 220 mbps. What about the C3758 or the Xeon D-1541?
I don’t have any firsthand knowledge, but I would say to make sure that you have the hardware acceleration enabled for AES. The documentation about this is here: https://docs.netgate.com/pfsense/en/latest/hardware/cryptographic-accelerator-support.html but specifically “IPsec will take advantage of cryptodev automatically when a supported cipher is chosen. … For AES-NI acceleration, use AES-GCM on both sides of the tunnel.”
Yeah, thanks, but I’ve had hardware acceleration enabled on both pfsense boxes (C3558 and i3-4130t).
Problem completely solved. With proper crypto settings, ipsec throughput is 900 mbps on gbps LAN.
I spoke too fast.
IPsec throughput is 900 mbps flowing from the SG-5100 (C3558) to the i3-4130T box, but maxes out at only 100 mbps in the other direction. What could cause such asymmetry? Could it be an asymmetry between encryption and decryption? If so, on which side?
At 900 mbps (SG-5100 to i3 box), the C3558 is at 75-80% CPU and the i3 at 25%. In the other direction, at 100 mbps, the C3558 is at 20% and the i3 at 9%.
I would double check your settings on the i3 side. Something seems off.
You can change the cipher in these commands to the one you’re using and see the raw speed your chips can handle. I’m pretty sure this measures encryption, and decryption is generally 1.5-2x faster.
openssl speed -elapsed aes-256-cbc
openssl speed -elapsed -evp aes-256-cbc
I checked that both boxes have identical ipsec settings:
Phase 1: AES128-GCM, 128 bits, SHA1, DH group 1 (768 bits)
Phase 2: AES128-GCM 128 bits
I agree that the issue is likely with the i3’s encryption, but I cannot figure out the problem.
Go into Diagnostics > Command Prompt and run these on each.
openssl speed -elapsed aes-128-gcm
openssl speed -elapsed -evp aes-128-gcm
I can’t find any documentation on whether there are different generations of AES-NI or which ciphers they support. But I’m thinking that the 4th gen i3 just might not have good support for AES-GCM. Those commands will give you some concrete numbers to go off of.
I can’t speak to Netgate, but I’ve been rocking a Protectli for about 18 months and I can’t speak highly enough about it and the company. Rock solid device and amazing support. Actually, ridiculously great support considering how inexpensive the devices are.
The first command generated this:
Error: bad option or value
mdc2 md4 md5 hmac sha1 sha256 sha512 whirlpoolrmd160
idea-cbc seed-cbc rc2-cbc rc5-cbc bf-cbc
des-cbc des-ede3 aes-128-cbc aes-192-cbc aes-256-cbc aes-128-ige aes-192-ige aes-256-ige
camellia-128-cbc camellia-192-cbc camellia-256-cbc rc4
rsa512 rsa1024 rsa2048 rsa4096
dsa512 dsa1024 dsa2048
ecdsap160 ecdsap192 ecdsap224 ecdsap256 ecdsap384 ecdsap521
ecdsak163 ecdsak233 ecdsak283 ecdsak409 ecdsak571
ecdsab163 ecdsab233 ecdsab283 ecdsab409 ecdsab571
ecdhp160 ecdhp192 ecdhp224 ecdhp256 ecdhp384 ecdhp521
ecdhk163 ecdhk233 ecdhk283 ecdhk409 ecdhk571
ecdhb163 ecdhb233 ecdhb283 ecdhb409 ecdhb571
idea seed rc2 des aes camellia rsa blowfish
-engine e use engine e, possibly a hardware device.
-evp e use EVP e.
-decrypt time decryption instead of encryption (only EVP).
-mr produce machine readable output.
-multi n run n benchmarks in parallel.
The second command:
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 66926332 aes-128-gcm’s in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 42692863 aes-128-gcm’s in 3.02s
Doing aes-128-gcm for 3s on 256 size blocks: 22995223 aes-128-gcm’s in 3.02s
Doing aes-128-gcm for 3s on 1024 size blocks: 6861409 aes-128-gcm’s in 3.01s
Doing aes-128-gcm for 3s on 8192 size blocks: 1034296 aes-128-gcm’s in 3.02s
OpenSSL 1.0.2o-freebsd 27 Mar 2018
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
The ‘numbers’ are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-gcm 356940.44k 906062.00k 1947047.72k 2335944.42k 2809683.84k
What do you make of this?
Great! Which model do you have? And as far as you can tell, is IPsec throughput good?
Have you double checked that AES-NI offload is on under System > Advanced > Miscellaneous > Cryptographic Hardware?
That should be more than enough speed to saturate your bandwidth.
On a side note, use ``` on both sides of command output to preserve the formatting.
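To put the `openssl speed` figures above in perspective, here is a rough conversion from the reported "1000s of bytes per second" to gigabits per second. It is only a sketch: the numbers are copied from the posted output, the 1 Gbps link rate is taken from the thread, and the helper name `kbytes_to_gbps` is made up for illustration. It measures the raw cipher on one core and ignores packet handling, so real tunnel throughput will be lower.

```python
# Figures copied from the `openssl speed -elapsed -evp aes-128-gcm` output above,
# in thousands of bytes per second, keyed by block size in bytes.
openssl_kbytes_per_s = {
    16: 356940.44,
    64: 906062.00,
    256: 1947047.72,
    1024: 2335944.42,
    8192: 2809683.84,
}

LINK_GBPS = 1.0  # gigabit LAN, as described in the thread


def kbytes_to_gbps(kbytes_per_s: float) -> float:
    """Convert openssl's '1000s of bytes per second' to gigabits per second."""
    return kbytes_per_s * 1000 * 8 / 1e9


for block_size, rate in openssl_kbytes_per_s.items():
    gbps = kbytes_to_gbps(rate)
    print(f"{block_size:>5} B blocks: {gbps:6.2f} Gbps "
          f"({gbps / LINK_GBPS:.1f}x the link rate)")
```

Even the slowest case (16-byte blocks) works out to a bit under 3 Gbps, and 1024-byte blocks to roughly 18.7 Gbps, which is why the raw AES-GCM speed is unlikely to be the bottleneck on a gigabit link.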
Turns out the solution was, as you suggested, setting Crypto Hardware as: AES-NI CPU-based acceleration.
Now I get close to 900 mbps in both directions. But interestingly, the C3758 is nearly maxed out (CPU usage at 94%) when it’s on the download side. The i3 is solid, with encryption keeping its CPU rolling along at 40%.
That download cpu usage is odd. You would expect the upload to be the highest. Do you have Suricata, Snort, or pfBlocker running? It sounds like it’s doing something more than just decrypting the traffic.
Good to hear you’re getting good speed now. It’s amazing how much difference those dedicated chips can make.
I made a typo: the maxed out chip is a C3558 (as mentioned earlier in the thread; I never used a C3758). The i3-4130T is just systematically more capable, which is not surprising if you check out the single thread and multithread benchmarks of these two CPUs. This would seem to indicate that a custom build based on an i3-8100 would be sufficient for most uses (if its 65W and fan requirements are acceptable), as it should be able to handle two simultaneous gbps IPsec tunnels.
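As a back-of-the-envelope check on that headroom argument, the CPU readings reported above can be extrapolated to 100% load. This is a rough sketch, not a measurement: it assumes throughput scales linearly with CPU usage, treats "close to 900 mbps" as exactly 900, and the function name `projected_max_mbps` is purely illustrative.

```python
def projected_max_mbps(observed_mbps: float, cpu_percent: float) -> float:
    """Linearly extrapolate observed throughput to 100% CPU (rough upper bound)."""
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in (0, 100]")
    return observed_mbps * 100.0 / cpu_percent


# (throughput in mbps, CPU usage in percent) as reported in the thread
observations = {
    "C3558 (download side)": (900, 94),
    "i3-4130T": (900, 40),
}

for name, (mbps, cpu) in observations.items():
    print(f"{name}: ~{projected_max_mbps(mbps, cpu):.0f} mbps at 100% CPU")
```

Under those assumptions the C3558 tops out just above a single gigabit tunnel, while the i3-4130T would have room for roughly two. Real scaling is rarely perfectly linear, so treat these as ballpark figures only.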
I have the FW4A with the following specs:
- Intel® Atom™ CPU E3845 @ 1.91GHz
- 4 CPUs: 1 package(s) x 4 core(s)
- AES-NI CPU Crypto: Yes (inactive)
- RAM: 8GB, Disk: 56G
I’m not running VPN yet, so I can’t comment on IPSEC throughput, but I have a dozen packages running on it over a 1Gbps WAN and the CPU utilization never cracks 10%. Device is always cool.
I avoid captive portal unless it is absolutely necessary, as many devices such as phones will not always prompt for it. As for enabling it, when you turn it on in pfSense you just choose whichever interface, either physical or VLAN, you want it to apply to.