Controlling IPsec crypto cores to obtain better performance
The following examples illustrate how crypto engines are mapped to dataplane cores, and how you can control that mapping from the CLI to obtain the desired aggregate throughput.
Example 1
In this example, a vRouter with 10 cores has IPsec site-to-site tunnels to a single remote network. The default dataplane core assignment leaves only a single free core for use as a crypto engine. Because all crypto SAs are assigned to that one core, IPsec forwarding performance is suboptimal.
set interfaces dataplane dp0s5 address '1.1.1.1/24'
set interfaces dataplane dp0s6 address '192.85.1.1/24'
set interfaces dataplane dp0s7 address '10.10.10.1/24'
set security vpn ipsec esp-group ESP lifetime '86400'
set security vpn ipsec esp-group ESP pfs 'disable'
set security vpn ipsec esp-group ESP proposal 1 encryption 'aes128gcm128'
set security vpn ipsec esp-group ESP proposal 1 hash 'null'
set security vpn ipsec ike-group IKE ike-version '2'
set security vpn ipsec ike-group IKE lifetime '86400'
set security vpn ipsec ike-group IKE proposal 1 dh-group '2'
set security vpn ipsec ike-group IKE proposal 1 encryption 'aes256'
set security vpn ipsec ike-group IKE proposal 1 hash 'sha2_512'
set security vpn ipsec site-to-site peer 1.1.1.5 authentication mode 'pre-shared-secret'
set security vpn ipsec site-to-site peer 1.1.1.5 authentication pre-shared-secret 'test'
set security vpn ipsec site-to-site peer 1.1.1.5 default-esp-group 'ESP'
set security vpn ipsec site-to-site peer 1.1.1.5 ike-group 'IKE'
set security vpn ipsec site-to-site peer 1.1.1.5 local-address '1.1.1.1'
set security vpn ipsec site-to-site peer 1.1.1.5 tunnel 1 local prefix '192.85.1.0/24'
set security vpn ipsec site-to-site peer 1.1.1.5 tunnel 1 protocol 'all'
set security vpn ipsec site-to-site peer 1.1.1.5 tunnel 1 remote prefix '196.85.1.0/24'
The following monitor dataplane and show dataplane command output shows that a single core handles all crypto traffic, which limits the vRouter total crypto throughput to 1.3 Mpps.
Dataplane CPU activity
Core Interface RX Rate TX Rate Idle
--------------------------------------------------------
1 dp0s7 0 250 µs
2 dp0s7 0 250 µs
3 dp0s6 1.5M 1 µs
4 dp0s6 1.5M 3 µs
5 dp0s5 0 250 µs
6 dp0s5 0 250 µs
7 dp0s5 0 250 µs
8 dp0s5 1.5M 10 µs
9 [crypt] 1.3M 0 µs
RX TX Slow Path
Interface Packets Rate Packets Rate In Out
------------------------------------------------------------------------------
[crypt] 333087232 1.3M
dp0s5 397954426 1.5M 3 11
dp0s6 780658237 2.9M 51 6
dp0s7 0 0 0 7
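The bottleneck in the output above can be checked with a small script. The following is a minimal sketch, not part of the vRouter tooling: the helper names `parse_rate` and `crypt_rates` are hypothetical, and the sample lines are transcribed by hand from the monitor dataplane output above, assuming the first value after the name is the packet rate.

```python
import re

# Multipliers for the rate suffixes seen in the output (assumed decimal).
SUFFIX = {"K": 1e3, "M": 1e6, "G": 1e9}

def parse_rate(text):
    """Convert a rate such as '1.3M' or '796.8K' to packets per second."""
    if text[-1] in SUFFIX:
        return float(text[:-1]) * SUFFIX[text[-1]]
    return float(text)

def crypt_rates(lines):
    """Collect the rate from every line that names a [crypt] process."""
    rates = []
    for line in lines:
        match = re.search(r"\[crypt\]\s+(\S+)", line)
        if match:
            rates.append(parse_rate(match.group(1)))
    return rates

# Lines copied from the output above (only the rows that carry traffic).
sample = [
    "3 dp0s6 1.5M 1 µs",
    "4 dp0s6 1.5M 3 µs",
    "8 dp0s5 1.5M 10 µs",
    "9 [crypt] 1.3M 0 µs",
]

rates = crypt_rates(sample)
print("crypt processes:", len(rates))
print("aggregate crypto rate (Mpps):", sum(rates) / 1e6)
```

With a single [crypt] row, the aggregate crypto rate equals that one core's rate, which is why the total stays at 1.3 Mpps no matter how much traffic arrives on dp0s6.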
Because dp0s7 carries only low volumes of traffic, it can be limited to a single core. dp0s5 can be reduced to 2 cores because its 40Gb link is underutilized, and dp0s6 keeps its 2 cores. The following configuration applies these assignments.
set interfaces dataplane dp0s7 cpu-affinity 1
set interfaces dataplane dp0s5 cpu-affinity 2-3
set interfaces dataplane dp0s6 cpu-affinity 4-5
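To see which cores this configuration leaves without an interface, the affinity values can be expanded and subtracted from the set of forwarding cores. This is an illustrative sketch only: the function names are hypothetical, it handles just the `N` and `N-M` forms used above, and it assumes the forwarding cores are numbered 1 to 9 as in the earlier output.

```python
def parse_affinity(spec):
    """Expand a cpu-affinity value such as '1' or '2-3' into a set of cores."""
    if "-" in spec:
        low, high = spec.split("-")
        return set(range(int(low), int(high) + 1))
    return {int(spec)}

def free_cores(all_cores, affinities):
    """Return the cores that no interface is pinned to, in ascending order."""
    used = set()
    for spec in affinities.values():
        used |= parse_affinity(spec)
    return sorted(set(all_cores) - used)

# Values taken from the three set commands above.
affinities = {"dp0s7": "1", "dp0s5": "2-3", "dp0s6": "4-5"}

# Assumption: forwarding cores are numbered 1..9 on this 10-core vRouter.
free = free_cores(range(1, 10), affinities)
print("cores without an interface:", free)
```

Under these assumptions, cores 6 to 9 carry no interface traffic and remain available for crypt processes.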
Following a reboot, the CPU assignment matches the configuration. There are now 2 crypt processes, and the vRouter total crypto throughput increases to 2.8 Mpps.
Dataplane CPU activity
Core Interface RX Rate TX Rate Idle
--------------------------------------------------------
1 dp0s7 0 250 µs
dp0s7 0 250 µs
2 dp0s5 0 250 µs
dp0s5 0 250 µs
3 dp0s5 0 250 µs
dp0s5 1.5M 10 µs
4 dp0s6 1.5M 1 µs
5 dp0s6 1.5M 0 µs
8 [crypt] 1.3M 0 µs
9 [crypt] 1.5M 0 µs
vyatta@dut-1:~$ show vpn ipsec sa
Peer ID / IP Local ID / IP
------------ -------------
1.1.1.5 1.1.1.1
Tunnel Id State Bytes Out/In Encrypt Hash DH A-Time L-Time
------ ---------- ----- ------------- ------------ -------- -- ------ ------
1 1 up 10.0G/9.7G aes128gcm128 null 2 11 86400
vyatta@dut-1:~$ show dataplane
RX TX Slow Path
Interface Packets Rate Packets Rate In Out
------------------------------------------------------------------------------
[crypt] 1796307664 2.8M
dp0s5 3304762213 1.5M 36 40
dp0s6 3845688861 2.9M 30 7
dp0s7 0 0 0 6
Further performance improvements can be made by splitting the traffic across multiple tunnels, whose crypt processes can run on the other 2 free cores. In this example, the customer's traffic profile is split between TCP and other protocols, so better performance can be obtained by creating a second tunnel for the TCP traffic. The second tunnel creates a second pair of SAs, and therefore a second pair of crypt processes.
set security vpn ipsec site-to-site peer 1.1.1.5 tunnel 2 local prefix '192.85.1.0/24'
set security vpn ipsec site-to-site peer 1.1.1.5 tunnel 2 remote prefix '196.85.1.0/24'
set security vpn ipsec site-to-site peer 1.1.1.5 tunnel 2 protocol tcp
Now there are 4 crypt processes, and the vRouter total crypto throughput increases to 5.4 Mpps.
Dataplane CPU activity
Core Interface RX Rate TX Rate Idle
--------------------------------------------------------
1 dp0s7 0 250 µs
dp0s7 0 250 µs
2 dp0s5 0 250 µs
dp0s5 0 250 µs
3 dp0s5 0 250 µs
dp0s5 2.6M 10 µs
4 dp0s6 1.5M 0 µs
5 dp0s6 1.5M 1 µs
6 [crypt] 1.3M 0 µs
7 [crypt] 1.3M 0 µs
8 [crypt] 1.4M 0 µs
9 [crypt] 1.3M 0 µs
vyatta@dut-1:~$ show vpn ipsec sa
Peer ID / IP Local ID / IP
------------ -------------
1.1.1.5 1.1.1.1
Tunnel Id State Bytes Out/In Encrypt Hash DH A-Time L-Time
------ ---------- ----- ------------- ------------ -------- -- ------ ------
1 5 up 40.9G/34.0G aes128gcm128 null 2 106 86400
2 6 up 21.4G/17.0G aes128gcm128 null 2 105 86400
vyatta@dut-1:~$ show dataplane
RX TX Slow Path
Interface Packets Rate Packets Rate In Out
--------------------------------------------------------------------------
[crypt] 5549204792 5.4M
dp0s5 2770036752 2.6M 31 36
dp0s6 3219935397 2.9M 30 7
dp0s7 0 0 0 6
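The relationship between tunnels, SAs, and crypt processes in this example can be sketched as simple arithmetic. This is a simplified model for planning purposes, not a description of how the dataplane actually places processes: it assumes one crypt process per SA, as the outputs above suggest, and the count of 4 free cores is taken from the cpu-affinity configuration. The function names and the 3-tunnel case are hypothetical.

```python
SAS_PER_TUNNEL = 2  # an IPsec tunnel has one inbound and one outbound SA

def crypt_processes(tunnels):
    """Number of crypt processes, assuming one process per SA."""
    return tunnels * SAS_PER_TUNNEL

def processes_sharing(tunnels, free_cores):
    """Crypt processes that would not get a free core to themselves."""
    return max(0, crypt_processes(tunnels) - free_cores)

# Assumption: cores 6-9 are free after the cpu-affinity change above.
FREE_CORES = 4

for tunnels in (1, 2, 3):
    print(
        "tunnels:", tunnels,
        "crypt processes:", crypt_processes(tunnels),
        "sharing a core:", processes_sharing(tunnels, FREE_CORES),
    )
```

In this model, 1 tunnel uses 2 of the 4 free cores and 2 tunnels use all 4, which matches the outputs above. A hypothetical third tunnel would leave 2 crypt processes without a dedicated core.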
If a vRouter with 9 cores initially had no free cores, rather than a single one, the crypt processes would share cores with the dataplane forwarding threads, as shown below.
Dataplane CPU activity
Core Interface RX Rate TX Rate Idle
--------------------------------------------------------
1 dp0s7 0 250 µs
[crypt] 1.3M 0 µs
2 dp0s7 0 250 µs
[crypt] 1.5M 0 µs
3 dp0s5 0 250 µs
4 dp0s5 0 250 µs
5 dp0s5 0 250 µs
6 dp0s5 1.5M 7 µs
7 dp0s6 1.5M 0 µs
8 dp0s6 1.5M 1 µs
Example 2
In the following example, the crypt processes must share CPU cycles with dataplane forwarding. However, because the dp0s7 interface is receiving no traffic in this case, the performance matches that of dedicated cores as shown in the previous example, and the vRouter total crypto throughput is 2.8 Mpps.
vyatta@dut-1:~$ show dataplane
RX TX Slow Path
Interface Packets Rate Packets Rate In Out
------------------------------------------------------------------------------
[crypt] 500745504 2.8M
dp0s5 271005437 1.5M 3 10
dp0s6 537440476 2.9M 53 7
dp0s7 0 0 0 7
Performance can be improved by creating a second tunnel for the TCP traffic, resulting in a total crypto throughput of 4.3 Mpps. Here the performance does not match that of the dedicated cores, because core 6 is now doing both packet forwarding and crypt processing.
Dataplane CPU activity
Core Interface RX Rate TX Rate Idle
--------------------------------------------------------
1 dp0s7 0 250 µs
2 dp0s7 0 250 µs
3 dp0s5 0 250 µs
[crypt] 1.3M 0 µs
4 dp0s5 0 250 µs
[crypt] 1.4M 0 µs
5 dp0s5 0 250 µs
[crypt] 796.8K 2 µs
6 dp0s5 2.3M 0 µs
[crypt] 796.8K 0 µs
7 dp0s6 1.5M 2 µs
8 dp0s6 1.5M 1 µs
vyatta@dut-1:~$ show dataplane
RX TX Slow Path
Interface Packets Rate Packets Rate In Out
------------------------------------------------------------------------------
[crypt] 410930913 4.3M
dp0s5 665245216 2.3M 10 15
dp0s6 1166023120 2.9M 53 7
dp0s7 0 0 0 7
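The cost of sharing cores can be quantified from the figures above. The sketch below is a worked calculation only: the per-process rates are transcribed by hand from the Example 2 output, the 5.4 Mpps figure is the aggregate reported in Example 1 with two tunnels on dedicated cores, and the variable names are hypothetical.

```python
# Per-process crypt rates in Mpps from the shared-core output above
# (the processes listed under cores 3, 4, 5 and 6).
shared_rates = [1.3, 1.4, 0.7968, 0.7968]

# Aggregate reported in Example 1 with two tunnels on dedicated cores.
DEDICATED_TOTAL_MPPS = 5.4

shared_total = sum(shared_rates)
fraction_of_dedicated = shared_total / DEDICATED_TOTAL_MPPS

print("shared-core aggregate (Mpps):", round(shared_total, 1))
print("fraction of dedicated-core aggregate:", round(fraction_of_dedicated, 2))
print("slowest crypt process (Mpps):", min(shared_rates))
```

The per-process rates sum to roughly 4.3 Mpps, in line with the show dataplane total, and the two processes running at 796.8K account for the gap to the dedicated-core figure.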