Storage requirements are exploding, causing more and more small and medium businesses to employ creative solutions to stem the tide. In December, Hu Yoshida, CTO of Hitachi Data Systems (HDS), posted a blog entry about projected enterprise data growth. The entire posting is worth a read, and its chart of projected growth paints the picture especially well.

Experience with our clients bears this out, and we think most organizations are ‘feeling the pinch’. Perhaps the most interesting aspect of this trend is that, as Hu points out in the article, data falls into different kinds of buckets. He chooses to discuss structured vs. unstructured data, but there are other ways to slice this pie.
Case In Point
For example, we find that most clients struggle with price/performance issues for large volumes of data. To take an extreme case, a friend of ours at LucasFilm is currently working to make their storage more cost-effective.
LucasFilm has three tiers of storage:
- Tier-1: high-speed, low-capacity disk storage
- Tier-2: average-speed, high-capacity disk storage
- Tier-3: slow, medium-capacity, long-term tape storage (AKA ‘archives’)
This is fairly typical for most businesses, and the cost difference between tiers can be dramatic. In LucasFilm's case, tier-1 storage can cost up to 30x as much as tier-2.
To stay cost-effective, LucasFilm keeps expanding its tier-2 capacity and spends considerable time and money shuffling data between tier-1 and tier-2 storage.
Unfortunately, most smaller businesses do not have LucasFilm's resources. Even for larger organizations, increased spending on storage and storage management is undesirable.
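To see why moving data down to tier-2 pays off, here is a minimal sketch of a blended cost per GB. It assumes only the 30x ratio quoted above and expresses costs in relative units (tier-2 = 1 unit/GB). It ignores tier-3 and the labor of moving data, and the tier-1 fractions used are hypothetical, not LucasFilm figures.

```python
# Blended storage cost in relative units, assuming tier-2 costs 1 unit/GB
# and tier-1 costs 30x that (the ratio quoted for LucasFilm).
TIER2_COST = 1.0
TIER1_COST = 30.0 * TIER2_COST


def blended_cost_per_gb(tier1_fraction: float) -> float:
    """Average cost per GB when `tier1_fraction` of the data sits on tier-1
    and the rest on tier-2 (tier-3 and data-migration labor are ignored)."""
    if not 0.0 <= tier1_fraction <= 1.0:
        raise ValueError("tier1_fraction must be between 0 and 1")
    return tier1_fraction * TIER1_COST + (1.0 - tier1_fraction) * TIER2_COST


if __name__ == "__main__":
    # Hypothetical data placements, for illustration only.
    for fraction in (0.5, 0.2, 0.1):
        print(f"{fraction:.0%} on tier-1 -> {blended_cost_per_gb(fraction):.1f} units/GB")
```

Going from half the data on tier-1 to a tenth of it cuts the blended cost from 15.5 to 3.9 units/GB, roughly 4x, which is why the shuffling effort can be worth it.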
New Solutions
Fortunately, there is a very compelling new solution to this problem: Sun’s ZFS is now available and production-worthy. ZFS is a revolutionary new open source filesystem that provides the capabilities of a NetApp storage appliance in terms of redundancy, ease of use, and protocol support (e.g. NFS, iSCSI, Windows file sharing). It requires Solaris, OpenSolaris, NexentaOS, or FreeBSD, but will run on most modern hardware.
ZFS provides a cost-effective option for tier-2 (and some tier-1) storage. No special hardware or RAID controllers are necessary; it is designed to work on both inexpensive commodity and enterprise-class hardware.
Using ZFS, you can now build high-capacity, redundant storage systems for as little as $0.25-$0.50/GB, which is pretty close to the street price of the drives themselves. Alternatively, you can build high-performance, redundant tier-1 storage systems, roughly equivalent in quality to enterprise solutions, for as little as $2/GB, which is practically unheard of.
For example, we recently priced a StoreVault S550 (NetApp’s SMB-targeted appliance) against a ‘roll your own’ ZFS appliance using NexentaStor (see below). Total cost for a 10TB solution was $23K for the StoreVault vs. $6K for the NexentaStor-based ZFS appliance. List price for a 1TB 7200 RPM SATA drive for the StoreVault? $1,500. List price for buying your own 1TB SATA drive? Less than $200.
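The quotes above translate into per-GB figures with simple arithmetic. This sketch assumes decimal terabytes (1 TB = 1,000 GB) and that both totals cover the same 10TB of capacity; the drive markup is a lower bound because the retail drive price is "less than" $200.

```python
# Cost per GB for the two 10TB quotes above, using decimal units
# (1 TB = 1,000 GB, as drive vendors count capacity).
CAPACITY_GB = 10 * 1000

quotes = {
    "StoreVault S550": 23_000,
    "NexentaStor ZFS appliance": 6_000,
}


def cost_per_gb(total_dollars: float, capacity_gb: float) -> float:
    """Total system price divided by capacity."""
    return total_dollars / capacity_gb


for name, total in quotes.items():
    print(f"{name}: ${cost_per_gb(total, CAPACITY_GB):.2f}/GB")

# Per-drive markup: $1,500 list vs. under $200 for a 1TB SATA drive,
# so the true markup is at least this ratio.
drive_markup = 1500 / 200
print(f"Drive markup: at least {drive_markup:.1f}x")
```

That works out to about $2.30/GB vs. $0.60/GB, a difference of roughly 3.8x overall, with the per-drive markup alone at 7.5x or more.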
When ‘New’ Works
Some might consider ZFS too ‘new’ to put into production; however, we have been running it successfully in production for over a year and are very happy with it. It has reduced our storage and storage management costs dramatically and allowed a small consulting business to have enterprise-class storage without enterprise-class prices.
Still, if you need more, or want enterprise support, you can get it. We recommend one of the following two options:
- Solaris + Sun support
- NexentaStor from Nexenta
Right now we prefer Nexenta as they are 100% focused on delivering a ZFS-based storage solution. They provide a commercial solution and enterprise support using a specialized NexentaOS-based storage distribution called NexentaStor.
NexentaStor
NexentaStor turns commodity hardware into a sophisticated storage appliance, like NetApp, but at a quarter of the cost. Best of all, NexentaStor is based on NexentaOS, itself a repackaging of OpenSolaris with a Debian (GNU/Linux) userland and utilities. This means network-based upgrades, easier management, and a very small distribution.
Among the most exciting items in Nexenta's pipeline are CompactFlash-based versions of NexentaStor. With judicious hardware selection, you can eliminate most, if not all, single points of failure in your storage systems, dramatically increasing MTBF (mean time between failures) and putting your ‘commodity’ hardware in the same class as comparable enterprise solutions such as EMC or NetApp.
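The MTBF point can be illustrated with the standard series-system approximation: if every component is a single point of failure and failure rates are constant, the system failure rate is the sum of the component rates, so MTBF_system = 1 / Σ(1/MTBF_i). The component list and MTBF figures below are hypothetical placeholders, not vendor specifications; the sketch only shows that swapping a spinning boot disk for a more reliable flash module raises system MTBF.

```python
# Series-system MTBF under a constant-failure-rate (exponential) assumption:
# every listed component is a single point of failure, so failure rates add.
from typing import Dict


def system_mtbf(component_mtbf_hours: Dict[str, float]) -> float:
    """Return system MTBF in hours for components that all must work."""
    total_failure_rate = sum(1.0 / mtbf for mtbf in component_mtbf_hours.values())
    return 1.0 / total_failure_rate


# Hypothetical MTBF figures in hours, for illustration only.
with_boot_disk = {
    "power_supply": 100_000,
    "motherboard": 200_000,
    "boot_disk": 300_000,
}
with_flash_boot = {
    "power_supply": 100_000,
    "motherboard": 200_000,
    "compact_flash": 1_000_000,
}

print(f"Boot disk:    {system_mtbf(with_boot_disk):,.0f} hours")
print(f"CompactFlash: {system_mtbf(with_flash_boot):,.0f} hours")
```

With these made-up numbers, system MTBF rises from about 54,500 to 62,500 hours. The model deliberately ignores redundancy (mirrored drives, dual power supplies), which improves availability in a different way.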
Conclusion
Whichever way you go, ZFS is a rock-solid, production-capable filesystem with a compelling value proposition. We especially like its use as a tier-2 storage solution.
I’ll follow up shortly with some specific, exciting use cases for ZFS, including its filesystem-level compression and its use as a backing store for virtual machines.
Disclaimer: neoTactics is a Nexenta certified partner and a proud member of the Sun Startup Essentials program.
April 28th, 2008
Introduction
MicroVMs are a technology I experimented with for the first product we considered spinning out: the Virtual Server Room, a sort of virtual appliance micro-cluster in a box made up of back-office IT servers. I thought I would write a bit about MicroVMs because I think they are going to have a special place in a future where virtualization is a dominant technology.
MicroVMs are to virtual appliances what small embedded systems like Linksys routers are to larger-scale hardware appliances like a NetScreen-50 firewall. Today’s virtual appliance is typically a simplistic packaging of a traditional OS and a bundled application. Some companies have gone further: rPath has created tools for more sophisticated packaging of virtual appliances, closer to a traditional appliance, and JumpBox literally attempts to recreate the hardware appliance experience with a virtual machine.
I think that over time we’ll start seeing more sophisticated virtual appliances, but they will continue in the mold of the current crop: under the hood, each will essentially be a heavily customized general-purpose OS.
Contrast this with the small embedded appliances we see every day, like Linksys routers, the iPhone, and similar products, or more esoteric embedded systems like those from Dust Networks. Most embedded systems have specialized operating systems. Even when based on a general-purpose OS, they tend to be stripped of almost everything recognizable from the original, becoming in effect an embedded OS. This is usually done for reasons of power consumption or simplicity.
In fact, the smaller the appliance, the less likely it is to share attributes with general-purpose computers. Smaller appliances have no disk drives, no console, and no serial port access, just a single button to reset them to ‘factory default’.
Imagine the equivalent for today’s virtual appliances. This is what I call a MicroVM.
MicroVMs
Given that resources such as disk, memory, and compute are so cheap, why would a MicroVM that mimics a typical small form-factor embedded system be of interest? There are a number of reasons, generally similar to those for virtual appliances, including:
- Increased security
- Ease of deployment
- Disposability
But in my mind, perhaps the most interesting use is the temporary deployment of services at run-time in front of already deployed virtual machines. For example, say you want to put a network sniffer in front of an already deployed virtual appliance. With a physical server this is easy today: an IT staffer simply puts a laptop or a network-sniffing device temporarily in-line with the server. With a virtual server it is much harder. In fact, we regularly see increasingly complex network setups inside a single physical server, while the tools to troubleshoot and debug those setups have not become more sophisticated.
An Example
As part of the Virtual Server Room (VSR) project, which never saw the light of day outside the lab, I developed a MicroVM based on the excellent pfSense (firewall) live CD-ROM. This firewall MicroVM, built on top of VMware technology, had some interesting characteristics:
- Booted from a ‘live’ CD-ROM
- Configuration state was saved to a ‘floppy disk’
- RAM footprint was 64MB
- No disk drives
This allowed me to install a network firewall in front of every customer’s Virtual Server Room using a mere 64MB of RAM, 35MB of disk space for the ISO CD-ROM image, and 1.44MB per MicroVM for the ‘floppy disk’ image. In return, each customer got a full-fledged firewall that was easy to use and maintain, including routing, NAT, and a dedicated DMZ segment for ‘public’ virtual appliances. (In this particular case, the pfSense live CD-ROM is essentially a MicroVM out of the box; I just added control, configuration, and provisioning tools on top of it.)
The embedded characteristics made adding a firewall to each VSR trivial and painless.
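A quick way to see why this scales well is to total the footprint across many customers. This sketch uses the per-MicroVM figures quoted above and assumes that a single 35MB ISO is shared by all firewall MicroVMs, while each gets its own 64MB of RAM and 1.44MB floppy image; the customer count of 50 is hypothetical.

```python
# Aggregate footprint of N pfSense-based firewall MicroVMs, one per customer.
# Assumption: a single 35MB ISO is shared by every MicroVM; RAM and the
# 1.44MB floppy image are allocated per VM.
from typing import Tuple

RAM_PER_VM_MB = 64
SHARED_ISO_MB = 35
FLOPPY_PER_VM_MB = 1.44


def firewall_footprint(num_customers: int) -> Tuple[int, float]:
    """Return (total RAM in MB, total disk in MB) for one firewall per customer."""
    ram_mb = RAM_PER_VM_MB * num_customers
    disk_mb = SHARED_ISO_MB + FLOPPY_PER_VM_MB * num_customers
    return ram_mb, disk_mb


ram, disk = firewall_footprint(50)  # hypothetical: 50 customer VSRs
print(f"RAM: {ram} MB, disk: {disk:.1f} MB")
```

Under these assumptions, 50 firewalls need about 3.2GB of RAM but only around 107MB of disk, so RAM, not storage, is the resource to plan for.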
Wrapping Up
This is a very small example, but I think that in the future most virtualization platforms (what the folks at CloudScale call a ‘virtual fabric’), such as VMware Virtual Center, VirtualIron, EC2, and XenSource, will make it very easy to change the ‘physical’ structure inside the fabric itself at run-time. In those cases, adding MicroVMs on the fly for diagnostics, security, or similar capabilities will be a no-brainer. How many security and network engineers would love the ability to slap a specialized tool into their current networks on demand, in seconds? Most, I think.
We’re going to see a number of interesting use cases like this in the near future. MicroVMs, coming to you soon…
July 12th, 2007
This entry is partly a rebuttal to Billy Marshall’s recent blog entry, “Amazon and the CIO Nightmare”, and partly an opportunity to turn this blog into a place where I can expound a bit on what interests me in technology and IT.
Quick Background
In my first several jobs, I tried to automate myself out of those same jobs. In fact, in one of my early positions I was actually chided for using so much automation, because the net result was that I had spare time to read USENET news. My immediate manager thought I was wasting time, but the developers had a different perspective, reassured by the fact that the systems just ran, with seemingly no problems and zero outages (a very unusual circumstance in any organization).
Since then, reducing operational costs in terms of labor, capacity planning, and scalability has been something close to my heart.
The Misconception
This is why Billy’s article feels off-track: it propagates the myth that the costs of IT infrastructure are solely those of bits and bytes. It’s true that Amazon’s Simple Storage Service (S3), Elastic Compute Cloud (EC2), and related services bring new economies of scale to IT resources such as storage and compute. These services are available to anyone with a credit card, and surely this compelling cost structure will lead to wide adoption over time.
Or it would, except there is a major flaw in the theory: it’s not enough to compare these costs without also comparing the management costs involved.
Operational Costs
As everyone knows, the per-unit costs of storage and compute have been declining steadily for as long as we can remember. Amazon Web Services simply combines Amazon’s buying power with today’s already rock-bottom storage and compute costs and delivers them to the masses on demand.
For storage, this turns out to be an amazing advantage. Download one simple tool, upload your files to Amazon’s S3, and voilà, you’re done. You can retrieve or share them at your leisure. My 10-year-old cousin could do it.
But Amazon’s Elastic Compute Cloud (EC2) is a whole different ball game. In a matter of minutes, my 10-year-old cousin can instantiate a server: a plain-Jane Amazon virtual machine, one customized by a third party, or even a sophisticated rPath appliance. And then?
A single virtual machine, even a virtual appliance, has little value in isolation for folks in the enterprise, and probably little value to most enterprise employees, given that any rPath appliance they instantiate on Amazon could disappear without notice and with complete loss of data (more below).
The reality is that, for the enterprise, once storage and compute have been commoditized, the real cost left is human labor. Again, a single server or virtual appliance has little value in isolation. At a bare minimum it requires care and feeding, and in most cases it requires other servers or appliances to provide adjunct services.
Appliance Clusters
Servers and appliances work in groups to achieve a given effect. For example, at the edge of its network an enterprise will monitor and protect against a multitude of potentially malicious activities, including viruses, spam, automated attacks, real-time attackers, inappropriate content, information leaks, and many more. No single appliance solves all of these problems, and savvy customers don’t want one that tries to: they want one application on one server or appliance.
Soon your cluster (aka ‘architecture’) of security appliances is a large group requiring significant care and feeding. Virtual appliances themselves don’t provide this cluster management capability and neither do commodity virtual grids such as Amazon’s Elastic Compute Cloud.
The Devil in the Details
But it’s even more complex than this. Let’s say you want to deploy a full-scale production system to leverage Amazon’s scale. Many challenges will arise.
For one, Amazon doesn’t guarantee your server will stay around. It’s a virtual server that goes away immediately upon being shut down, whether by mistake, through malice, or due to hardware failure. All data on it is lost and nothing is recoverable. There are ways to solve this particular problem, but it takes skill and expertise to build redundant systems and storage around Amazon’s service offerings, particularly EC2.
Large-scale systems don’t run themselves. They need to be tested, monitored, and triaged when problems arise; new code needs to be deployed; and capacity needs to be added to handle load. [Side note: in my opinion, these are all, obviously, issues of “scale”.]
This matters to the enterprise.
Simply put, while most Amazon Web Services are self-service in the truest sense, the Elastic Compute Cloud is an order of magnitude (or more) more complex to manage and use. Being able to provision a server on demand in minutes doesn’t, by itself, help anyone build real web-scale systems.
The Real Cost Comparison
There are different estimates of how many servers a single sysadmin/IT staffer can run, but they range between 20 and 100 servers per person. Let’s look at both ends of the spectrum. We’ll assume a median (Silicon Valley) income for an IT worker of $80,000 per year, and that all of your EC2 instances run 24×365, which is 8,760 hours per year.
With one instance per administrator, that’s $9.13 of labor per instance-hour. At 20, it’s $0.46/hour, and at 100, it’s a mere $0.09. Of course, this is just a back-of-the-napkin calculation, but the point I’m driving at is that at the low end of the spectrum your labor management costs are roughly 5x the cost of your EC2 instance ($0.10/hour for a small instance), not including non-labor management costs. At the high end, where a single administrator runs 100 servers, your labor costs are still roughly 90% of your costs to use EC2.
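The napkin math above can be reproduced in a few lines. The salary and hours come from the text; the $0.10/hour comparison price is assumed here as the reference (EC2's small-instance rate), and non-labor management costs are ignored.

```python
# Labor cost per EC2 instance-hour for one administrator, using the
# figures above: an $80,000/year salary and instances running 24x365.
SALARY_PER_YEAR = 80_000
HOURS_PER_YEAR = 24 * 365  # 8,760
EC2_PRICE_PER_HOUR = 0.10  # assumed reference price (small instance)


def labor_cost_per_instance_hour(servers_per_admin: int) -> float:
    """Hourly salary cost spread evenly across the servers one admin runs."""
    return SALARY_PER_YEAR / HOURS_PER_YEAR / servers_per_admin


for servers in (1, 20, 100):
    labor = labor_cost_per_instance_hour(servers)
    ratio = labor / EC2_PRICE_PER_HOUR
    print(f"{servers:>3} servers/admin: ${labor:.2f}/hour, {ratio:.1f}x the EC2 price")
```

At 20 servers per admin, labor comes to about 4.6x the instance price; even at 100, it is still about 0.9x.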
When Utility Computing Matters
This is why utility computing (i.e. the ‘compute’, not the ‘storage’) will matter most once we have driven the costs of systems management down to a reasonable level. Is that 10x the cost of the compute? 5x? 50%? I don’t know; the market will tell us. But I will put a stake in the ground and say that the compelling cost structure of utility computing will really matter when the operational costs of running that compute infrastructure are within 2x the costs of the infrastructure itself.
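Under the same assumptions as the napkin math above ($80,000 salary spread over 8,760 instance-hours per year, an assumed $0.10/hour instance price, labor only), the 2x threshold translates into a minimum number of servers each administrator would need to manage. This is an illustrative calculation, not a forecast.

```python
# Minimum servers per administrator for labor to stay within a given
# multiple of the compute price. Same assumptions as the earlier napkin
# math; EC2_PRICE_PER_HOUR is an assumed reference price.
import math

SALARY_PER_YEAR = 80_000
HOURS_PER_YEAR = 24 * 365
EC2_PRICE_PER_HOUR = 0.10


def min_servers_per_admin(max_cost_multiple: float) -> int:
    """Smallest servers-per-admin ratio at which labor cost per instance-hour
    is no more than `max_cost_multiple` times the instance price."""
    hourly_labor = SALARY_PER_YEAR / HOURS_PER_YEAR
    return math.ceil(hourly_labor / (max_cost_multiple * EC2_PRICE_PER_HOUR))


print(min_servers_per_admin(2.0))  # labor within 2x of compute
```

That works out to about 46 servers per administrator, toward the middle of the 20-100 range quoted earlier; reaching parity (1x) would take about 92.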
June 20th, 2007