AWS CloudFormation provides a simple yet powerful way to create ‘stacks’ of cloud resources with a single call. The stack is described in a parameterized template file; creating the stack is simply a matter of supplying the stack parameters. The template describes resources such as instances and security groups, and provides a language for expressing the ordering dependencies between them.
CloudStack doesn’t have any such tool (although it has been discussed). I was interested in exploring what it would take to provide stack creation services to a CloudStack deployment. As I read through various sample templates, it became clear that the structure of a template imposes an ordering on its resources. For example, an ‘Instance’ resource might refer to a ‘SecurityGroup’ resource, which means the security group has to be created successfully before the instance can be created. Parsing LAMP_Single_Instance.template, for example, the following dependencies emerge:
    WebServer depends on ["WebServerSecurityGroup", "WaitHandle"]
    WaitHandle depends on []
    WaitCondition depends on ["WaitHandle", "WebServer"]
    WebServerSecurityGroup depends on []
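As a minimal sketch of how such a listing could be derived, the snippet below walks a template and records every `Ref`, `Fn::GetAtt` and `DependsOn` that points at another resource. The embedded template is a hypothetical, heavily trimmed stand-in for the LAMP sample (not the real file), and the helper names `references` and `dependencies` are made up for illustration; this is not necessarily how stackmate does it.

```ruby
require 'json'

# Hypothetical, trimmed-down template in the spirit of the LAMP sample;
# only the fields that create dependencies are kept.
TEMPLATE_JSON = <<~EOS
  {
    "Resources": {
      "WebServer": {
        "Type": "AWS::EC2::Instance",
        "Properties": {
          "SecurityGroups": [ { "Ref": "WebServerSecurityGroup" } ],
          "UserData": { "Fn::Join": [ "", [ "signal ", { "Ref": "WaitHandle" } ] ] }
        }
      },
      "WaitHandle": { "Type": "AWS::CloudFormation::WaitConditionHandle" },
      "WaitCondition": {
        "Type": "AWS::CloudFormation::WaitCondition",
        "DependsOn": "WebServer",
        "Properties": { "Handle": { "Ref": "WaitHandle" }, "Timeout": "300" }
      },
      "WebServerSecurityGroup": {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": { "GroupDescription": "Enable HTTP access" }
      }
    }
  }
EOS

# Recursively collect every name mentioned via "Ref" or "Fn::GetAtt".
def references(node)
  case node
  when Hash
    node.flat_map do |key, value|
      case key
      when 'Ref'        then [value]
      when 'Fn::GetAtt' then [Array(value).first]
      else references(value)
      end
    end
  when Array then node.flat_map { |item| references(item) }
  else []
  end
end

# Map each resource to the other resources it needs, dropping references
# to anything that is not itself a resource (e.g. template parameters).
def dependencies(template)
  resources = template.fetch('Resources')
  resources.to_h do |name, body|
    deps = references(body) + Array(body['DependsOn'])
    [name, deps.uniq.select { |dep| resources.key?(dep) }]
  end
end

deps = dependencies(JSON.parse(TEMPLATE_JSON))
deps.each { |name, needs| puts "#{name} depends on #{needs.inspect}" }
```

Filtering against the resource names matters because `Ref` is also used for template parameters, which are inputs rather than things to be created.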
These dependencies can be expressed as a Directed Acyclic Graph (DAG); what remains is to extract an ordering by performing a topological sort of the DAG. Once sorted, we need an execution engine that can take the schedule and execute it. Fortunately for me, Ruby has both: the TSort module, which performs topological sorts, and the wonderful Ruote workflow engine by @jmettraux. Given the topological sort produced by TSort:
["WebServerSecurityGroup", "WaitHandle", "WebServer", "WaitCondition"]
You can write a process definition in Ruote:
    Ruote.define 'my_stack' do
      sequence do
        participant 'WebServerSecurityGroup'
        participant 'WaitHandle'
        participant 'WebServer'
        participant 'WaitCondition'
      end
    end
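For reference, here is a small sketch of the sorting step on its own, using Ruby's standard `TSort` module on the dependencies listed earlier. The `dependencies` hash is typed in by hand for illustration; in practice it would come from parsing the template.

```ruby
require 'tsort'

# Resource => resources it depends on (same data as the dependency
# listing earlier in the post, entered by hand for this sketch).
dependencies = {
  'WebServer'              => %w[WebServerSecurityGroup WaitHandle],
  'WaitHandle'             => [],
  'WaitCondition'          => %w[WaitHandle WebServer],
  'WebServerSecurityGroup' => []
}

# TSort needs two callables: one that enumerates all nodes, and one that
# enumerates the children (here: the prerequisites) of a given node.
each_node  = ->(&block) { dependencies.each_key(&block) }
each_child = ->(name, &block) { dependencies.fetch(name).each(&block) }

# Children are emitted before their parents, so prerequisites come first.
# A circular dependency in the template would raise TSort::Cyclic.
order = TSort.tsort(each_node, each_child)
p order
```

Because prerequisites are always emitted before the resources that need them, the resulting array can be handed to a `sequence` as is.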
What remains is to implement the ‘participants’ named in the process definition. For the most part, that means making API calls to CloudStack to create the security group and the instance. Here, the freshly minted CloudStack Ruby client from @chipchilders came in handy.
Stackmate is the result of this investigation. Satisfyingly, it is only about 350 lines of Ruby.
Ruote gives a nice split between defining the flow and doing the actual work. We can ask Ruote to roll back (cancel) a process that has launched but not finished. We can create resources concurrently instead of in sequence. Ruote supports many more workflow patterns besides. The best part is that writing the participants is relatively trivial: just pick the right CloudStack API call to make.
While prototyping the design, I had to make a LOT of instance creation calls to my CloudStack installation. Since I don’t have a ginormous cloud in my back pocket, the excellent CloudStack simulator filled the role.
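To give a feel for what a participant ultimately has to do, here is a rough sketch of building a signed CloudStack API request by hand with only the standard library. It follows the signing scheme described in the CloudStack developer documentation (sort the parameters, lower-case the query string, HMAC-SHA1 with the secret key, Base64, URL-encode). It is not the API of the CloudStack Ruby client mentioned above, and the helper names, keys and host are placeholders.

```ruby
require 'openssl'
require 'uri'

# Percent-encode a value; spaces become %20 rather than '+'.
def encode(value)
  URI.encode_www_form_component(value.to_s).gsub('+', '%20')
end

# Build a signed query string for a CloudStack API command.
# (Hypothetical helper, written for this sketch.)
def signed_query(command, params, api_key:, secret_key:)
  all = params.merge('command' => command, 'apikey' => api_key, 'response' => 'json')
  query = all.sort_by { |key, _| key.downcase }
             .map { |key, value| "#{key}=#{encode(value)}" }
             .join('&')
  # The signature is computed over the lower-cased, sorted query string.
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret_key, query.downcase)
  signature = [digest].pack('m0') # strict Base64, no trailing newline
  "#{query}&signature=#{encode(signature)}"
end

query = signed_query('createSecurityGroup',
                     { 'name' => 'WebServerSecurityGroup', 'description' => 'Enable HTTP access' },
                     api_key: 'PLACEHOLDER_API_KEY', secret_key: 'PLACEHOLDER_SECRET')

# Placeholder host; a participant would issue an HTTP GET on this URL.
puts "http://cloudstack.example:8080/client/api?#{query}"
```

A participant for an ‘Instance’ resource would look much the same, with `deployVirtualMachine` as the command and the zone, template and service offering IDs as parameters.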
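To make the flow/work split and the idea of rollback concrete without pulling in Ruote, here is a deliberately tiny stand-in written in plain Ruby. It is not Ruote and not how stackmate works; `Step` and `run_stack` are hypothetical names, and the create/undo lambdas stand in for real CloudStack API calls.

```ruby
# A step pairs a "create" action with the "undo" action that reverses it.
Step = Struct.new(:name, :create, :undo)

# Run the steps in order. If one raises, undo the completed steps in
# reverse order and report failure.
def run_stack(steps, log)
  done = []
  steps.each do |step|
    step.create.call
    log << "created #{step.name}"
    done << step
  end
  true
rescue StandardError => e
  log << "failed: #{e.message}"
  done.reverse_each do |step|
    step.undo.call
    log << "rolled back #{step.name}"
  end
  false
end

noop = -> {}
steps = [
  Step.new('WebServerSecurityGroup', noop, noop),
  Step.new('WaitHandle', noop, noop),
  # Simulate a failure part-way through the stack.
  Step.new('WebServer', -> { raise 'resource limit exceeded' }, noop),
  Step.new('WaitCondition', noop, noop)
]

log = []
ok = run_stack(steps, log)
puts log
puts "stack created: #{ok}"
```

The flow (the ordered list of steps) knows nothing about what each step does, which is the same separation Ruote provides between the process definition and its participants.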
- As it stands today, stackmate is run from the command line and the workflow executes on the client side (the server being CloudStack). This mode is good for CloudStack developers performing a pre-checkin test or QA developers developing automated tests. For a production CloudStack, however, stackmate needs to be a web service and provide a user interface to launch CloudFormation templates.
- TSort generates a topologically sorted sequence; this can be further optimized by executing some steps in parallel.
- More participants need to be written to support templates with VPC resources
- Implement rollback and timeout
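As a sketch of the parallelism idea from the list above, the dependency graph can be peeled into stages, where everything within a stage has all of its prerequisites satisfied and could be created concurrently (in Ruote terms, each stage could plausibly become a `concurrence` nested inside the outer `sequence`). The `parallel_stages` helper is hypothetical and the dependencies are entered by hand.

```ruby
# Resource => resources it depends on (entered by hand for this sketch).
dependencies = {
  'WebServer'              => %w[WebServerSecurityGroup WaitHandle],
  'WaitHandle'             => [],
  'WaitCondition'          => %w[WaitHandle WebServer],
  'WebServerSecurityGroup' => []
}

# Repeatedly peel off every resource whose prerequisites are all finished.
# Each peeled-off group is one stage; stages must run one after another,
# but the members of a stage are independent of each other.
def parallel_stages(dependencies)
  remaining = dependencies.dup
  finished = []
  stages = []
  until remaining.empty?
    ready = remaining.select { |_, needs| (needs - finished).empty? }.keys
    raise ArgumentError, 'dependency cycle detected' if ready.empty?

    stages << ready
    finished.concat(ready)
    ready.each { |name| remaining.delete(name) }
  end
  stages
end

stages = parallel_stages(dependencies)
stages.each_with_index { |stage, i| puts "stage #{i + 1}: #{stage.join(', ')}" }
```

For the LAMP example this only saves one step (the security group and the wait handle can be created together), but wider templates with many independent resources would benefit more.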
Given Ruote’s power, Ruby’s flexibility and the generality of CloudFormation templates:
- We should be able to write CloudStack-specific templates (e.g., to take care of stuff like network offerings)
- We should be able to execute AWS templates on clouds like Google Compute Engine
- QA automation suddenly becomes a matter of writing templates rather than error-prone API call sequences
- Templates can include custom resources such as third-party services: for example, after launching an instance, make an API call to a monitoring service to start monitoring port 80 on the instance, or, for QA automation, make a call to a testing service
- Even more general-purpose, complex workflows: can we add approval workflows, exception workflows and so on? For example, a manager has to approve before the stack can be launched. Or, if the launch fails due to resource limits, trigger an approval workflow so the manager can temporarily bump up resource limits.