How did they build that — EC2 Enhanced Networking

Among the flurry of new features AWS introduced in 2013 is a performance enhancement known as "Enhanced Networking". According to the blurb: "enhanced networking on your instance results in higher performance (packets per second), lower latency, and lower jitter". The requirements are that you install the Intel 10GbE virtual function driver (ixgbevf) in your instance and enable a feature called SR-IOV.
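As a quick way to see which driver an interface inside the instance is actually bound to, here is a minimal sketch that reads the standard Linux sysfs symlink for the device. The function names and the interface name `eth0` are my own assumptions for illustration; the `sysfs_root` parameter exists only so the lookup can be pointed at a different directory.

```python
import os


def nic_driver(interface, sysfs_root="/sys/class/net"):
    """Return the kernel driver bound to a network interface, or None.

    On Linux, <sysfs_root>/<interface>/device/driver is a symlink whose
    final path component is the driver name.
    """
    link = os.path.join(sysfs_root, interface, "device", "driver")
    try:
        target = os.readlink(link)
    except OSError:
        # No such interface, or a purely virtual one (e.g. lo) with no device.
        return None
    return os.path.basename(target)


def uses_sriov_vf_driver(interface, sysfs_root="/sys/class/net"):
    """True if the interface is bound to the Intel virtual function driver."""
    return nic_driver(interface, sysfs_root) == "ixgbevf"


if __name__ == "__main__":
    # "eth0" is an assumed interface name; adjust for your instance.
    print("eth0 driver:", nic_driver("eth0"))
    print("SR-IOV VF driver in use:", uses_sriov_vf_driver("eth0"))
```

Running `ethtool -i eth0` should report the same driver name.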

The AWS cloud is built around virtualization technology — specifically your instances are virtual machines running on top of a version of the open source Xen Hypervisor.
The hypervisor is what guarantees the isolation between my instance and your instance when they both run on the same set of CPUs.

The hypervisor intercepts all I/O from the virtual machine so that the virtual machine is abstracted from the hardware. This provides security as well as portability, since the VM doesn’t need to care about the drivers for the I/O hardware. The VM sees a NIC that is software defined, and as a result the hypervisor can inspect all traffic to and from the VM. This allows AWS to control the network traffic between the VM and the rest of the infrastructure, and is used to deliver features such as security groups and ACLs.
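To make "stateful" concrete, here is a toy sketch of the kind of check a software data path could apply per packet: inbound traffic is admitted if it targets an allowed port or if it is a reply to a flow the instance started. This is purely illustrative and not AWS's implementation; the `Packet` and `StatefulFilter` names, and the rule model itself, are assumptions of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str = "tcp"


class StatefulFilter:
    """Toy security-group-style filter (hypothetical, for illustration)."""

    def __init__(self, allowed_inbound_ports):
        self.allowed_inbound_ports = set(allowed_inbound_ports)
        self.flows = set()  # 5-tuples of flows initiated by the instance

    def outbound(self, pkt):
        # Allow all outbound traffic and remember the flow so that
        # the matching replies are admitted later.
        self.flows.add(
            (pkt.proto, pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
        )
        return True

    def inbound(self, pkt):
        # A reply carries the tracked 5-tuple with source and destination swapped.
        reverse = (pkt.proto, pkt.dst_ip, pkt.dst_port, pkt.src_ip, pkt.src_port)
        if reverse in self.flows:
            return True
        # Otherwise the packet must match an explicit inbound rule.
        return pkt.dst_port in self.allowed_inbound_ports


if __name__ == "__main__":
    fw = StatefulFilter(allowed_inbound_ports=[22])
    reply = Packet("198.51.100.7", 443, "10.0.0.5", 51000)
    print(fw.inbound(reply))   # no matching flow yet
    fw.outbound(Packet("10.0.0.5", 51000, "198.51.100.7", 443))
    print(fw.inbound(reply))   # now a reply to a tracked flow
```

The point of the sketch is the flow table: every packet costs at least one lookup, and the table has to live somewhere that sees all of the VM's traffic.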

The downside of processing all network traffic to/from the VM is that host CPU cycles are consumed processing this traffic. This is quite a significant overhead compared to a bare-metal instance. The hypervisor needs to apply stateful firewall rules on every packet, switch the packet and encapsulate it. Some estimates put this overhead as high as 70% of the CPU available to the hypervisor (at 10 Gb/s). Software processing also introduces noisy-neighbor problems: variable jitter and high latency are common at 10 Gb/s.
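A back-of-the-envelope calculation shows why per-packet software processing is expensive at this speed. The sketch below works out the worst-case packet rate on a 10 Gb/s link with minimum-size Ethernet frames, and the cycle budget per packet on a single core. The 3 GHz clock is an assumed figure for illustration, not a measurement of any AWS host.

```python
LINE_RATE_BPS = 10_000_000_000   # 10 Gb/s link
MIN_FRAME_BYTES = 64             # minimum Ethernet frame
WIRE_OVERHEAD_BYTES = 20         # 8 bytes preamble/SFD + 12 bytes inter-frame gap
CPU_HZ = 3_000_000_000           # assumed 3 GHz core (illustrative only)


def max_packets_per_second(line_rate_bps, frame_bytes):
    """Upper bound on frames per second for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


def cycles_per_packet(cpu_hz, pps):
    """CPU cycles one core can spend on each packet at that rate."""
    return cpu_hz / pps


pps = max_packets_per_second(LINE_RATE_BPS, MIN_FRAME_BYTES)
budget = cycles_per_packet(CPU_HZ, pps)
print(f"{pps / 1e6:.2f} Mpps, {budget:.0f} cycles per packet")
# prints "14.88 Mpps, 202 cycles per packet"
```

Roughly 200 cycles has to cover the firewall lookup, the switching decision and the encapsulation. With full-size 1518-byte frames the rate drops to about 0.81 Mpps and the budget is far more comfortable, so small-packet workloads are where the overhead hurts most.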

Slide2

Fortunately, SR-IOV (Single Root I/O Virtualization) provides a direct path for the VM to access the underlying hardware NIC. Bypassing the hypervisor allows close to line-rate performance. Enhanced Networking takes advantage of this, which is why your AMI needs to have SR-IOV drivers installed.

Slide3

Great, but now that the hypervisor is out of the path, how does AWS provide software-defined features such as security groups and ACLs? The current generation of SR-IOV NICs (AWS uses the Intel 82599) does not have stateful firewalls or the ability to process a large number of ACLs. Furthermore, we know that AWS must be using some kind of encapsulation or tunnelling to make VPCs possible. The Intel 82599 does not provide encapsulation support.
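To illustrate what encapsulation involves, here is a minimal sketch that wraps a tenant's frame in a VXLAN-style header carrying a 24-bit network identifier. I don't know what format AWS actually uses; VXLAN (RFC 7348) is only a stand-in here, the function names are hypothetical, and the outer UDP/IP/Ethernet headers are left out for brevity.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08          # "I" flag from RFC 7348
VXLAN_HEADER = struct.Struct("!II")  # two 32-bit words, network byte order


def vxlan_encap(vni, inner_frame):
    """Prepend an 8-byte VXLAN header to the tenant's Ethernet frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags in the top byte, 24 reserved bits.
    # Word 1: VNI in the top 24 bits, 8 reserved bits.
    header = VXLAN_HEADER.pack(VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(packet):
    """Return (vni, inner_frame) from a VXLAN-encapsulated payload."""
    if len(packet) < VXLAN_HEADER.size:
        raise ValueError("packet shorter than VXLAN header")
    word0, word1 = VXLAN_HEADER.unpack_from(packet)
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return word1 >> 8, packet[VXLAN_HEADER.size:]


if __name__ == "__main__":
    wrapped = vxlan_encap(5001, b"tenant frame bytes")
    print(wrapped[:8].hex())
    print(vxlan_decap(wrapped))
```

The identifier is what lets two tenants reuse the same private address range on shared hardware, and adding or stripping it on every packet is exactly the kind of work the 82599 cannot do on its own.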

The solution then would be to do the extra processing elsewhere: either off the host, or in the host using a co-processor. This schematic shows processing happening at the TOR (top-of-rack) switch. The drawback is that even intra-host traffic has to be tromboned via the TOR. Furthermore, the switch now becomes a pretty big bottleneck, and a failure in the switch could lead to several hosts losing network connectivity.

Slide4

 

Using a co-processor would be the best solution. Tilera is one such processor that comes to mind. Since the Tilera provides general purpose processing cores, the encap/decap/filtering/stateful firewall processing could be done in software instead of ASICs or FPGAs.

Slide5

 

The software/hardware solution could allow AWS to introduce further innovations in its networking portfolio, including end-to-end encryption, IDS and IPS.

Disclaimer: I have no knowledge of AWS internals. This is just an exploration of “how did they build it?”.

Update: a confirmation of sorts on Werner Vogels’ blog: http://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
