Introduction
MicroVMs are a technology I was playing with for the first product we considered spinning out, the Virtual Server Room, a sort of virtual appliance micro-cluster in a box made up of back office IT servers. I thought I would write a bit about MicroVMs because I think they are going to have a special place in a future where virtualization is a dominant technology.
MicroVMs are to virtual appliances what small embedded systems like Linksys routers are to larger scale hardware appliances, like a NetScreen-50 firewall. Today’s typical virtual appliance is a simplistic packaging of a traditional OS and a bundled application. Some folks, like rPath and JumpBox, have either created tools for more sophisticated packaging of these virtual appliances, making them more like a traditional appliance, or, in the case of JumpBox, literally attempted to recreate the hardware appliance experience with a virtual machine.
I think that over time we’ll start seeing more sophisticated virtual appliances, but they will continue in the mold of the current crop: under the hood, essentially a general purpose OS that has been heavily customized.
Contrast this with the small embedded appliances we see every day, like Linksys routers, the iPhone, and similar products, or more esoteric embedded systems like those from Dust Networks. Most embedded systems have specialized operating systems. Even when based on a more general purpose OS, they tend to have been stripped of almost anything recognizable from the original OS, becoming in effect an embedded OS. This is usually done for power or simplicity concerns.
In fact, the smaller the appliance, the less likely it is to share attributes with general purpose computers. Smaller appliances often have no disk drives, no consoles, and no serial port access, with only a single button to reset them to ‘factory default’.
Imagine the equivalent for today’s virtual appliances. This is what I call a MicroVM.
MicroVMs
Given that resources such as disk, memory, and compute are so cheap, why would a MicroVM that mimics a typical small form-factor embedded system be of interest? There are a number of reasons, which are generally similar to those for a virtual appliance, including:
- Increased security
- Ease of deployment
- Disposability
But in my mind, perhaps the most interesting usage is the temporary deployment of services at run-time in front of already deployed virtual machines. For example, say that you want to deploy a network sniffer in front of an already deployed virtual appliance. With a physical server this is easy today: an IT staffer will simply put a laptop or a network sniffing device temporarily in-line with the server. But with a virtual server it is much harder. In fact, we’re regularly seeing more complex network setups inside of a single physical server, while the tools to troubleshoot and debug those setups have not become more sophisticated.
An Example
As part of the Virtual Server Room (VSR) project, which never saw the light of day outside of the lab, I developed a MicroVM based on the excellent pfSense (firewall) live CD-ROM. This firewall MicroVM built on top of VMware technology had some interesting characteristics:
- Booted from a ‘live’ CD-ROM
- Configuration state was saved to a ‘floppy disk’
- RAM footprint was 64MB
- No disk drives
This allowed me to install a network firewall in front of every customer’s Virtual Server Room using a mere 64MB of RAM, 35MB of disk space for the ISO CD-ROM, and 1.44MB per MicroVM for the ‘floppy disk’ image. In return, for each customer I had a full-fledged, easy to use and maintain firewall including routing, NAT, and a dedicated DMZ segment for ‘public’ virtual appliances. (In this particular case, the pfSense live CD-ROM is essentially a MicroVM out of the box; I just added control, configuration, and provisioning tools on top of it.)
The embedded characteristics made adding a firewall to each VSR trivial and painless.
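To put rough numbers on that footprint, here is a minimal sketch of the arithmetic. The 64MB, 35MB, and 1.44MB figures come from the example above; the function name and the assumption that a single read-only ISO is shared by every MicroVM on the host are mine, for illustration only.

```python
# Figures from the example above. Sharing one ISO across all MicroVMs
# is an assumption made for this sketch.
RAM_PER_MICROVM_MB = 64
ISO_SIZE_MB = 35
FLOPPY_IMAGE_MB = 1.44


def firewall_footprint(customers: int) -> dict:
    """Return total RAM and disk (in MB) for one firewall MicroVM per customer."""
    if customers < 0:
        raise ValueError("customers must be non-negative")
    ram_mb = customers * RAM_PER_MICROVM_MB
    # One shared read-only ISO (assumed), plus one floppy image per MicroVM.
    shared_iso_mb = ISO_SIZE_MB if customers else 0
    disk_mb = shared_iso_mb + customers * FLOPPY_IMAGE_MB
    return {"ram_mb": ram_mb, "disk_mb": round(disk_mb, 2)}


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, firewall_footprint(n))
```

Under these assumptions the disk cost grows by only a floppy image per customer, so RAM is the resource that actually scales with the number of firewalls.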
Wrapping Up
This is a very small example, but I think in the future we’re going to see most virtualization platforms (what the folks at CloudScale call a ‘virtual fabric’) such as VMware Virtual Center, VirtualIron, EC2, and XenSource make it very easy to change and modify the ‘physical’ structure inside the fabric itself at run-time. In those cases, adding MicroVMs on the fly for diagnostics, security, or similar capabilities is a no-brainer. How many security and network engineers would love the ability to slap a specialized tool into their current networks on-demand and in seconds? Most, I think.
We’re going to see a number of interesting use cases like this in the near future. MicroVMs, coming to you soon…
July 12th, 2007
This entry is partly a rebuttal to Billy Marshall’s recent blog entry “Amazon and the CIO Nightmare” and also partly an opportunity to transition this blog to a place to expound a bit on what interests me in technology and IT.
Quick Background
One of the things I did in my first several jobs was try to automate myself out of those same jobs. In fact, in one of my early positions I was actually chided for using so much automation, given the net result was that I had spare time to read USENET news. My immediate manager thought that I was wasting time, but the developers had a different perspective, reassured by the fact that the systems just ran, seemingly with no problems and zero outages. (A very unusual circumstance in any organization.)
Since then, reducing operational costs in terms of labor, capacity planning, and scalability has been something close to my heart.
The Misconception
This is why Billy’s article feels off-track. It propagates the myth that the costs of IT infrastructure are solely those of bits and bytes. It’s true, Amazon’s Simple Storage Service (S3), Elastic Compute Cloud (EC2), and related services do bring new economies of scale to IT resources such as storage and compute. These services are available to anyone with a credit card and surely this compelling cost structure will lead to wide adoption over time.
Or it would, except there is a major flaw in the theory. It’s not enough to compare the costs of storage and compute without also comparing the management costs involved.
Operational Costs
As everyone knows, the costs of storage and compute (per unit) have been on a steady decline for as long as we can remember. Amazon Web Services simply combines Amazon’s buying power with today’s already rock-bottom storage and compute costs and delivers the result to the masses in an on-demand fashion.
For storage this turns out to be an amazing advantage. Download one simple tool, upload your files to Amazon’s S3 and voila, you’re done. You can retrieve them or share them at your leisure. My 10-year-old cousin could do it.
But Amazon’s Elastic Compute Cloud (EC2) is a whole different ball game. In a matter of a few minutes my 10-year-old cousin can instantiate a server: a plain jane Amazon virtual machine, one customized by a third party, or even a sophisticated rPath appliance. And then?
A single virtual machine, even a virtual appliance, has little value in isolation for folks in the enterprise. And probably little value to most enterprise employees given that any rPath appliance they instantiate on Amazon could disappear without notice and with complete loss of data (more below).
The reality is, for the enterprise, once the storage / compute have been commoditized the real cost that is left is human labor. Again, a single server or virtual appliance has little value in isolation. At the bare minimum it requires care and feeding, but in most cases it requires other servers or appliances to provide adjunct services.
Appliance Clusters
Servers and appliances work in groups to achieve a given effect. For example, at the edge of its network an enterprise will monitor and protect against a multitude of potential malicious activities, including viruses, spam, automated attacks, real-time attackers, inappropriate content, information leaks, and many more. Not only is there no single appliance that solves this whole problem, but savvy customers don’t want a single appliance. They want one application on one server or appliance.
Soon your cluster (aka ‘architecture’) of security appliances is a large group requiring significant care and feeding. Virtual appliances themselves don’t provide this cluster management capability and neither do commodity virtual grids such as Amazon’s Elastic Compute Cloud.
The Devil in the Details
But it’s even more complex than this. Let’s say you want to deploy a full-scale production system to leverage Amazon’s scale. Many challenges will arise.
For one, Amazon doesn’t guarantee your server will stay around. It’s a virtual server that goes away immediately upon being shut down, whether due to a mistake, by maliciousness, or due to hardware failure. All data is lost and nothing is recoverable. There are ways to solve this particular problem, but it takes skill and expertise to build redundant systems / storage around Amazon’s service offerings, particularly EC2.
Large scale systems don’t run themselves. They need to be tested, monitored, and triaged (when problems arise); new code needs to be deployed; and capacity needs to be added when needed to handle load. [sidenote: my personal opinion is that these are all issues of “scale” obviously.]
This matters to the enterprise.
Simply put, while most Amazon Web Services are self-service in the truest sense, the Elastic Compute Cloud is an order of magnitude (or more) more complex in terms of management and usage. Just provisioning a server on demand in minutes doesn’t help one build real web-scale systems.
The Real Cost Comparison
There are different estimates of how many servers a single sysadmin/IT staffer can run, but they range between 20 and 100 servers per admin. Let’s look at both ends of the spectrum. We’ll assume a median income (Silicon Valley) for an IT worker of $80,000 per year, and that all of your EC2 instances are wired up 24×365, which is 8760 hours total per year.
With 1 instance, that’s $9.13/hour of labor. At 20, it’s $.46/hour and at 100, it’s a mere $.09. Of course, this is just a back of the napkin calculation, but the point I’m driving to here is that at the low end of the spectrum, where an administrator runs 20 servers, your labor costs are roughly 5x the cost of your EC2 instance (about $.10 per instance-hour), not including non-labor related management costs. At the high end, where a single administrator runs 100 servers, your labor costs are still roughly 90% of your costs to use EC2.
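The napkin math can be written down in a few lines. This is only a sketch of the arithmetic above: the salary and hours figures are the ones from this post, while the function names are hypothetical and the EC2 hourly price is left as a parameter you supply, since it is not a fixed quantity.

```python
HOURS_PER_YEAR = 24 * 365   # 8760 hours, as assumed above
ANNUAL_SALARY = 80_000      # the post's Silicon Valley figure, in dollars


def labor_cost_per_server_hour(servers_per_admin: int,
                               annual_salary: float = ANNUAL_SALARY) -> float:
    """Admin labor cost attributed to one server for one hour."""
    if servers_per_admin <= 0:
        raise ValueError("servers_per_admin must be positive")
    return annual_salary / HOURS_PER_YEAR / servers_per_admin


def labor_to_compute_ratio(servers_per_admin: int,
                           instance_price_per_hour: float) -> float:
    """Labor cost as a multiple of the hourly instance price (caller-supplied)."""
    if instance_price_per_hour <= 0:
        raise ValueError("instance_price_per_hour must be positive")
    return labor_cost_per_server_hour(servers_per_admin) / instance_price_per_hour


if __name__ == "__main__":
    for servers in (1, 20, 100):
        cost = labor_cost_per_server_hour(servers)
        print(f"{servers:>3} servers per admin: ${cost:.2f} of labor per server-hour")
```

Passing your own instance price to `labor_to_compute_ratio` shows how far labor dominates compute at each end of the range.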
When Utility Computing Matters
This is why utility computing (i.e. the ‘compute’ not the ’storage’) will matter most when we have driven the costs of systems management to a place where they are reasonable. Is it 10x the cost of the compute? 5x? 50%? I don’t know. The market will tell us for sure, but I will put a stake in the ground and say that the compelling cost structure of utility computing will really matter when the operational costs to run that compute infrastructure are within 2x the costs of the infrastructure itself.
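One way to make that stake in the ground concrete is to ask how many servers each administrator would have to run before labor falls within 2x of the compute price. The sketch below is a rough illustration only: it counts labor as the sole operational cost, the function name is hypothetical, and the salary and instance price are inputs you supply rather than facts about any provider.

```python
import math


def min_servers_per_admin(annual_salary: float,
                          instance_price_per_hour: float,
                          max_ratio: float = 2.0,
                          hours_per_year: int = 8760) -> int:
    """Smallest number of servers per admin at which hourly labor cost per
    server is at most max_ratio times the hourly instance price.

    Only labor is modeled; other operational costs are ignored (assumption).
    """
    if annual_salary <= 0 or instance_price_per_hour <= 0 or max_ratio <= 0:
        raise ValueError("all inputs must be positive")
    labor_per_hour = annual_salary / hours_per_year
    allowed_labor_per_server_hour = max_ratio * instance_price_per_hour
    return math.ceil(labor_per_hour / allowed_labor_per_server_hour)


if __name__ == "__main__":
    # Example inputs: the salary used earlier and an illustrative instance price.
    print(min_servers_per_admin(annual_salary=80_000, instance_price_per_hour=0.10))
```

The useful part is the shape of the relationship: the cheaper the compute gets, the more servers each administrator has to handle for the 2x threshold to hold.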
June 20th, 2007
Apologies for the radio silence. We have been rapidly evolving the Virtual Server Room product over the past several months. The result has been the decision to morph into a service that leverages Amazon’s Elastic Compute Cloud (EC2) and to roll that service into a new company and organization, CloudScale Networks.
You can find out more at the CloudScale website. In the meantime, neoTactics will continue as an IT-focused consultancy.
May 7th, 2007
Yesterday, virt-factory, an internal Red Hat project, went public. It is a framework for deploying and managing virtual appliances in a datacenter environment. It’s great to see things heading in this direction, one we feel is imminent. This project is brought to you by David Lutterkort, who works in Red Hat’s emerging technologies group and is also a principal contributor to Puppet, a next generation configuration management system that we are interested in.
March 16th, 2007