In my last post I argued that security groups eliminate the need for network security devices in certain parts of the datacenter. The trick that enables this is the network firewall in the hypervisor. Each hypervisor hosts dozens or hundreds of VMs — and provides a firewall per VM. The figure below shows a typical setup, with Xen as the hypervisor. Ingress network traffic flows through the hardware into the control domain (“dom0”) where it is switched in software (so called virtual switch or vswitch) to the appropriate VM.
The vswitch provides filtering functions that can block or allow certain types of traffic into the VM. Traffic between the VMs on the same hypervisor goes through the vswitch as well. The vswitch used in this design is the Linux Bridge; the firewall function is provided by netfilter ( “iptables”).
Security groups drop all traffic by default and only allow those configured by the rules. Suppose the red VMs in the figure (“Guest 1” and “Guest 4”) are in a security group “management”. We want to allow access to them from the subnet 192.168.1.0/24 on port 22 (ssh). The iptables rules might look like this:
iptables -A FORWARD -p tcp --dport 22 --src 192.168.1.0/24 -j ACCEPT iptables -A FORWARD -j DROP
Line 1 reads: for packets forwarded across the bridge (vswitch) that are destined for port 22, and are from source 192.168.1.0/24, allow (ACCEPT) them. Line 2 reads: DROP everything. The rules form a chain: packets traverse the chain until they match. (this is highly simplified: we want to match on the particular bridge ports that are connected to the VMs in question as well).
Now, let’s say we want to allow members of the ‘management’ group access their members over ssh as well. Let’s say there are 2 VMs in the group, with IPs of ‘A’ and ‘B’. We calculate the membership and for each VM’s firewall, we write additional rules:
#for VM A iptables -I FORWARD -p tcp --dport 22 --source B -j ACCEPT #for VM B iptables -I FORWARD -p tcp --dport 22 --source A -j ACCEPT
As we add more VMs to this security group, we have to add more such rules to each VM’s firewall. (A VM’s firewall is the chain of iptables rules that are specific to the VM). If there are ‘N’ VMs in the security group, then each VM has N-1 iptables rules for just this one security group rule. Remember that a packet has to traverse the iptables rules until it matches or gets dropped at the end. Naturally each rule adds latency to a packet (at least to the connection-initiating ones). After a certain number (few hundreds) of rules, the latency tends to go up hockey-stick fashion. In a large cloud, each VM could be in several security groups and each security group could have rules that interact with other security groups — easily leading to several hundred rules.
Aha, you might say, why not just summarize the N-1 source IPs and write a single rule like:
iptables -I FORWARD -p tcp --dport 22 --source <summary cidr> -j ACCEPT
Unfortunately, this isn’t possible since it is never guaranteed that the N-1 IPs will be in a single CIDR block. Fortunately this is a solved problem: we can use ipsets. We can add the N-1 IPs to a single named set (“ipset”). Then:
ipset -A mgmt <IP1> ipset -A mgmt <IP2> ... iptables -I FORWARD -p tcp --dport 22 -m set match-set mgmt src -j ACCEPT
IPSets matching is usually very fast and fixes the ‘scale up’ problem. In practice, I’ve seen it handle tens of thousands of IPs without significantly affecting latency or CPU load.
The second (perhaps more challenging) problem is that when the membership of a group changes, or a rule is added / deleted, a large number of VM firewalls have to be updated. Since we want to build a very large cloud, this usually means thousands or tens of thousands of hypervisors have to be updated with these changes. Let’s say in the single group/single rule example above, there are 500 VMs in the security groups. Adding a VM to the group means that 501 VM firewalls have to be updated. Adding a rule to the security group means that 500 VM firewalls have to be updated. In the worst case, the VMs are on 500 different hosts — making this a very big distributed systems problem.
If we consider a typical datacenter of 40,000 hypervisor hosts, with each hypervisor hosting an average of 25 VMs, this becomes the million firewall problem.
Part 2 will examine how this is solved in CloudStack’s Basic Zone.