There is ongoing confusion about the relationship between clouds and datacenters. You can see this with the term ‘virtual datacenter’ in reference to servers in the cloud or the lengths to which some folks go to bring datacenter-centric methodologies to the cloud[1]. Unfortunately, clouds are not datacenters. They ride on top of datacenters, they span datacenters, but they are not a datacenter.
Why does this matter, what’s the difference, and how are they related?
Read on for more.
What is a Datacenter?
Once again, Wikipedia to the rescue where we find that a datacenter is:
… a facility used to house computer systems and associated components, such as telecommunications and storage systems. It generally includes redundant or backup power supplies, redundant data communications connections, environmental controls (e.g., air conditioning, fire suppression) and security devices.
This is a nice clean definition, but what should immediately jump out at you is how tied to locality the notion of a datacenter is. It has specific redundant power supplies, redundant communications, environmental controls, and its own security devices.
A datacenter is a specific place housing specific servers. Clearly clouds need a datacenter, but the actual promise of clouds is that you no longer need to think about datacenters.
Where are your servers? [2]
Out there. Somewhere.
In the clouds. You don’t know.
You don’t care.
Clouds
Being an abstraction of datacenters, clouds are a representation of the very notion of location independence. The ‘cloud’ will give you some resources: storage, servers, applications, and you don’t care where or how so long as they are reasonably reliable and redundant.
Reliability and redundancy come from cloud providers using multiple datacenters, so clouds almost certainly span more than one datacenter, but they are not themselves datacenters.
The canonical example of this is Amazon Web Services (AWS), the market leader. They have three distinct ‘availability zones’, which they guarantee to be 100% independent in terms of power, cooling, network, storage, etc. However, your cloud interface is a single interface to all three ‘zones’.
At the very least, the market demands this. We work with a number of different cloud providers and all are either already multi-site, multi-datacenter in nature, or planning to be in the near future.
A brief aside: it’s worth pointing out that while clouds don’t care about a particular locality, they care deeply about relative network locality and topology. For example, if you want to serve Europe, you would rather do it from a European cloud, and you would rather serve U.S. West Coast users from the location ‘closest’ to them. You still don’t care exactly where the datacenters that support that cloud are, just that they are ‘close’ to the market you serve. I talked very briefly about the ‘reach’ of clouds in an earlier posting here.
The Cloud Abstraction
So, tying this down a bit further, you can think of clouds as an abstraction of the raw parts that a datacenter provides: servers, storage, networking, etc.
Datacenters give you specific servers, network, storage, power and cooling. Clouds provide you abstracted versions of these components that are not tied to a specific datacenter: virtual servers, virtual storage, and virtual networking.
Why do we need an abstraction? Because it allows us to think about and use infrastructure in a new (old) way that is fundamental to how the Internet functions. This notion is called ‘distributed computing’.
Distributed Computing
The distributed computing entry on Wikipedia gives us some background, but suffers from an academic writing style. A simpler definition is:
A programming paradigm focusing on designing distributed, open, scalable, transparent, fault tolerant systems. This paradigm is a natural result of the use of computers to form networks.
Distributed computing techniques are the result of our highly connected computers. Cloud computing is simply a natural outgrowth of the need to service our desire to build distributed, open, scalable, transparent, and fault tolerant applications. What is Google if not one gigantic distributed computing application?
Datacenters come from an older mainframe paradigm. Clouds are a result of the need for a new paradigm that is highly distributed and highly connected.
Clouds as Distributed Computing
Clouds are a form of distributed computing. Distributed computing can be simplified down to the act of running software or an application in a multi-server, distributed manner. While you can run an application distributed within a single datacenter, usually this is not considered ‘distributed’.
There is a classic set of concerns about distributed computing that are very well formed called the Fallacies of Distributed Computing. Briefly recapped, the Fallacies are:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
These fallacies explain the traditional problems with distributed computing, but although you might have a multi-server application in your local datacenter, the datacenter environments themselves mask most of the normal distributed computing issues. This is because inside a single location, the network is usually very reliable, extremely low latency, high bandwidth, secure, stable, under a single user or group’s control, and homogeneous, or nearly so.
Clouds (re)surface all of these distributed computing concerns that are so critical. What do you do when the network or systems are unreliable? When latency is highly variable? When a potential attacker can be on the wire between any two nodes?
These are critical questions that matter when considering clouds vs. datacenters.
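To make the first of those questions concrete, here is a minimal sketch of one common answer to an unreliable network: retry the call with exponential backoff and jitter. Everything named here is hypothetical and for illustration only; `TransientNetworkError` stands in for whatever timeout or connection error your client library raises, and the delay values are arbitrary assumptions rather than recommendations.

```python
import random
import time


class TransientNetworkError(Exception):
    """Hypothetical error standing in for a timeout or dropped connection."""


def call_with_retries(operation, attempts=5, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call `operation`, retrying transient failures with backoff and jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientNetworkError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # The delay ceiling doubles on each attempt; a random delay below
            # that ceiling keeps many clients from retrying in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))


def make_flaky_operation(failures_before_success):
    """Build a fake remote call that fails a fixed number of times, then works."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientNetworkError("simulated network failure")
        return "ok"

    return operation, state
```

Inside a single datacenter you can often get away without this kind of wrapper. Across a cloud, where the network is not reliable and latency is not zero, something like it tends to end up around every remote call.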
Why Does the Distinction Matter?
It matters because the older datacenter-centric model of systems management is more of a centralized and monolithic model that does not take into account the Fallacies of Distributed Computing or the ephemeral nature of the clouds themselves. Today’s datacenter tools and methodologies are focused either on the datacenter itself or on individual servers. For example, you’ll find many enterprise ‘datacenter management’ tools today, but none are suitable for managing clouds. You’ll also find a bevy of tools that help you manage a single server or even a small group of servers, but you need more than this in the cloud.
Just digging in briefly, if you look at a typical datacenter there is significant support infrastructure that facilitates the efficient running of that datacenter, like:
- NTP (Network Time Protocol) servers for synchronizing time
- DNS (Domain Name System) servers for mapping hostnames to IP addresses
- DHCP servers to give hosts IP addresses
- Network installation systems such as Kickstart, Jumpstart, and RIS
- Monitoring services such as Nagios, Munin, HP OpenView, and Sitescope
- Power control and environmental monitoring systems
This isn’t even a complete list. Many of these tools are either unnecessary or infeasible in the cloud, yet most of the functionality they provide is still required; it is just delivered differently than inside the datacenter. Most datacenter management tools don’t map directly to the cloud, so either new solutions are needed or the old tools need to change.
So if the cloud isn’t a datacenter, if those tools don’t map, if managing a single server isn’t enough, and if you want to be able to scale your application, then what makes sense?
Cloud Oriented Architectures
Cloud Oriented Architecture (COA) is an application-centric architecture. What does it mean to be application-centric? I think we’re still figuring this out, but at the very least, being application-centric means:
- Deploy your application wherever you like: external clouds or internal datacenters
- Applications can go on one server or many
- Add more resources and capacity to your application at any time
- Compose your application from many ‘services’, on one cloud or many
- Determine the health and state of your application on demand
For additional ideas, James Urquhart (The Wisdom of Clouds) takes a deep dive into COA.
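As a rough illustration of what application-centric might look like in code, here is a small sketch in which the application, not the server or the datacenter, is the unit you ask questions of. The `Service` and `Application` classes and their fields are entirely hypothetical; they are not taken from any COA specification or product, and a real system would discover this state rather than have it passed in.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    location: str  # e.g. an external cloud or an internal datacenter
    healthy_instances: int
    desired_instances: int

    def is_healthy(self) -> bool:
        return self.healthy_instances >= self.desired_instances


@dataclass
class Application:
    name: str
    services: list = field(default_factory=list)

    def add_capacity(self, service_name: str, extra: int) -> None:
        """Request more instances of one service, wherever it happens to run."""
        for service in self.services:
            if service.name == service_name:
                service.desired_instances += extra
                return
        raise KeyError(service_name)

    def health(self) -> dict:
        """Summarize application state across every service and location."""
        unhealthy = [s.name for s in self.services if not s.is_healthy()]
        return {
            "application": self.name,
            "locations": sorted({s.location for s in self.services}),
            "healthy": not unhealthy,
            "unhealthy_services": unhealthy,
        }
```

The point of the sketch is the shape of the interface: capacity and health are requested per application, and the location of each service is just another attribute.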
One thing you might have started to notice is that COA is similar to other well-established distributed computing application architectures, such as Service Oriented Architecture (SOA) and Event Driven Architecture (EDA). Historically, these have been developer notions more than operations, IT, or systems administrator notions.
Operations in the Cloud
Cloud Oriented Architectures, cloud APIs, and the difference between cloud and datacenter paradigms mean that there are many new challenges for IT and Operations folks. Most of the early adopters and innovators in the cloud to date have been developers, like Don MacAskill of SmugMug and his exploits with SkyNet. There is still a dearth of operations and IT experts developing innovative new tools in the cloud. This is part of the neoTactics focus and the reason for the CloudScale Project.
It’s going to take new and ongoing innovation to deliver COA to the cloud, melding the skills of developers with the wisdom of Internet, IT, and web operators.
[1] http://www.itbusinessedge.com/blogs/dcc/?p=72
[2] Please excuse my attempt at Haiku. :)
October 8th, 2008
Rails doesn’t scale, Amazon’s Elastic Compute Cloud does, but wait you need Scalr to actually scale.
So… What is ‘scaling’?
Everyone’s an expert on it these days it seems. It’s talked about in terms of languages, cloud computing grids, databases, and more.
Just today, Ola Bini provides a humorous swag at how it’s talked about in terms of languages. Recently, John Willis had a nice overview in terms of the ‘cloud’.
I’ve helped scale networks, applications, systems, and databases. The one thing I’m certain about scaling is that it is not a one-size-fits-all solution. There is no silver bullet.
Unfortunately, the fact that there is no silver bullet hasn’t stopped people from talking about it like there is or assigning blame when ‘scaling’ fails.
It would be great if the conversation could evolve a bit. These are age-old issues that have a new shine because hardware, bandwidth, and software have all rapidly commoditized. In the world of systems management and architecture design there are new tools like cloud computing grids and virtualization, but the actual problems haven’t changed.
Scaling a system, application, or database is a non-trivial task that requires specific knowledge of the problem domain. No amount of hand-waving will create automagical scaling. We can only create better tools that help us build better infrastructure.
May 4th, 2008
Storage requirements are exploding causing more and more small and medium businesses to employ creative solutions to stem the tide. In December, Hu Yoshida, CTO of Hitachi Data Systems (HDS), posted a blog entry about projected enterprise data growth. The entire posting is worth a read, but the included chart really paints the picture well:

Experience with our clients bears this out and we think most organizations are ‘feeling the pinch’. Perhaps most interesting about this trend is that, as Hu points out in the article, data falls into different kinds of buckets. He chooses to talk about structured vs. unstructured data, but there are other ways to slice this pie.
Case In Point
For example, we find that most clients are struggling with price/performance issues for large pieces of data. To take an extreme case, a friend of ours at LucasFilm is currently struggling with increasing the cost-effectiveness of their storage solutions.
LucasFilm has three types of storage solution:
- Tier-1, high speed, low capacity disk storage
- Tier-2, average speed, high capacity disk storage
- Tier-3, slow speed, medium capacity long term tape storage (AKA ‘archives’)
This is fairly typical for most businesses. The difference in costs can be quite dramatic between tiers. In LucasFilm's case their cost for tier-1 storage can be as much as 30x the cost for tier-2.
In order to be cost-effective, LucasFilm continues to expand tier-2 storage capacity and spends considerable time and money shuffling data between tier-1 and tier-2 storage.
Unfortunately most smaller businesses do not have LucasFilm's resources. Even for larger organizations, increased spending on storage and storage management is undesirable.
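To see why shuffling data between tiers is worth the effort, here is a small worked sketch of blended storage cost. It normalizes tier-2 cost to 1.0 and uses the 30x figure above as the tier-1 multiplier; since that figure is an upper bound ("as much as"), treat the results as worst-case. The tier-1 fractions in the loop are made-up inputs for illustration, not LucasFilm's actual numbers.

```python
def relative_blended_cost(tier1_fraction, tier1_multiplier=30.0):
    """Average cost per unit of data, with tier-2 cost normalized to 1.0."""
    if not 0.0 <= tier1_fraction <= 1.0:
        raise ValueError("tier1_fraction must be between 0 and 1")
    return tier1_fraction * tier1_multiplier + (1.0 - tier1_fraction) * 1.0


# Hypothetical fractions of total data kept on tier-1 storage.
for fraction in (0.05, 0.10, 0.25, 0.50):
    cost = relative_blended_cost(fraction)
    print(f"{fraction:.0%} on tier-1 -> {cost:.2f}x the all-tier-2 cost")
```

Under these assumptions, keeping even a tenth of your data on tier-1 roughly quadruples the average cost per unit stored, which is why cheaper storage that can absorb part of the tier-1 workload is so attractive.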
New Solutions
Fortunately, there is a very compelling new solution to this problem. Sun’s ZFS is now available and production worthy. ZFS is a revolutionary new open source filesystem that provides all of the capabilities of a NetApp storage appliance, in terms of redundancy, ease of use, and capabilities (e.g. NFS, iSCSI, Windows File Sharing). It requires Solaris, OpenSolaris, NexentaOS, or FreeBSD to run, but will work on most modern hardware.
ZFS provides a cost-effective option for tier-2 (and some tier-1) storage solutions. No special hardware or RAID controllers are necessary. It is designed to work on both inexpensive commodity and enterprise class hardware.
Using ZFS you can now build high-capacity, redundant storage systems for as little as $0.25 to $0.50/GB, which is pretty close to street price for the drives themselves. Alternatively, you can build tier-1 high-performance redundant storage systems, roughly equivalent in quality to enterprise solutions, for as low as $2/GB, which is practically unheard of.
For example, we recently priced a StoreVault S550 (one of NetApp’s SMB-targeted appliances) against a ‘roll your own’ ZFS appliance using NexentaStor (see below). Total cost for a 10TB solution was $23K for the StoreVault vs. $6K for the NexentaStor-based ZFS appliance. List price for a 1TB 7200 RPM SATA drive for StoreVault? $1500. List price for buying your own 1TB SATA drive? Less than $200.
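The quoted prices are easier to compare per gigabyte. The sketch below does that arithmetic using only the figures above; the one assumption is that a terabyte means 1000 GB, the decimal convention drive vendors use. The commodity drive price is entered as $200, although the figure above is "less than $200", so the drive ratio is a lower bound.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes, as drive vendors quote them


def cost_per_gb(total_cost_usd, capacity_tb):
    """Convert a total system price and capacity in TB to dollars per GB."""
    return total_cost_usd / (capacity_tb * GB_PER_TB)


storevault = cost_per_gb(23_000, 10)
zfs = cost_per_gb(6_000, 10)
drive_ratio = 1500 / 200  # StoreVault list price vs. commodity drive price

print(f"StoreVault S550: ${storevault:.2f}/GB")
print(f"NexentaStor ZFS: ${zfs:.2f}/GB")
print(f"System price ratio: {storevault / zfs:.1f}x")
print(f"Drive price ratio: at least {drive_ratio:.1f}x")
```

That works out to about $2.30/GB for the StoreVault and $0.60/GB for the ZFS appliance.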
When ‘New’ Works
Some might consider ZFS too ‘new’ to put in production; however, we have been running it successfully in production for over a year and are very happy with it. It’s reduced our storage and storage management costs dramatically and allowed a small consulting business to have enterprise class storage without enterprise class prices.
Still, if you need more or desire enterprise support, you can get it. We recommend one of the following two options:
- Solaris + Sun support
- NexentaStor from Nexenta
Right now we prefer Nexenta as they are 100% focused on delivering a ZFS-based storage solution. They provide a commercial solution and enterprise support using a specialized NexentaOS-based storage distribution called NexentaStor.
NexentaStor
NexentaStor turns commodity hardware into a sophisticated storage appliance, like NetApp, but at a quarter the cost. Best of all, the NexentaStor solution is based on NexentaOS, itself a re-packaging of OpenSolaris using Debian (Linux) userland and utilities. This means network-based upgrades, easier management, and a very small distribution.
Some very exciting items in the pipeline from Nexenta include CompactFlash-based versions of NexentaStor. With judicious selection of your hardware you can remove most, if not all, points of failure in your storage systems, dramatically increasing MTBF (mean time between failures) and putting your ‘commodity’ hardware into the same class as similar enterprise solutions such as EMC or NetApp.
Conclusion
Whichever way you go, ZFS is a rock-solid, production-capable filesystem with a compelling value proposition. We especially like its use as a tier-2 storage solution.
I’ll follow up shortly and talk about some specific exciting use cases for ZFS, including its capabilities for compression at the filesystem level and its use as a backing store for virtual machines.
Disclaimer: neoTactics is a Nexenta certified partner and a proud member of the Sun Startup Essentials program.
April 28th, 2008
I missed the announcement of Google’s App Engine[1] by a few hours, but it was quickly brought to my attention. Initially, it looked like a direct competitor to technology we have been working on. Fortunately, it’s not.
CloudScale made a decision in January to move away from building a commodity product that serviced the mass of web developers for a number of reasons. First, we had some inkling that Google was headed this direction. Second, after we saw the Heroku application it seemed apparent to us that they had really solved this particular problem very elegantly.
Now that Google has launched App Engine we can see that it is very similar to Heroku. Both have outstanding technology and are putting a stake in the ground that matters. It goes something like this:
If you are a small developer or want to vet an initial application idea you can do it without having to run or maintain your own infrastructure.
How many of you wind up spinning your wheels configuring databases, operating systems, application servers, etc. instead of working on your core application? We’ve all been there.
So how does this affect CloudScale? We think that generally it helps the CloudScale project.
None of our current ALPHA customer use cases can run under Google’s App Engine. The leverage we help our CloudScale-related customers derive from EC2 is largely related to helping them rapidly deploy and manage the life-cycle of very complex heterogeneous web applications.
This is a value that Amazon, Google, and Heroku can’t provide, while some of our other competitors can, but not as well. Most of the others are weighed down by needing to support commodity web developers for free or very little.
At the end of the day, it’s the app that matters. All applications go through an evolution from a proof-of-concept to a scalable web app to a very large globally distributed web app. In the initial phases, the app is the whole of the app. In the later stages, the scalability of the app is a critical component of the application itself.
CloudScale is firmly targeted at the phase after the proof of concept, to help you build a fully scalable web application. Most importantly, it’s designed to scale up to a large, globally distributed web application.
[1] There is more background on App Engine on Niall Kennedy’s blog.
April 11th, 2008
Greg Borenstein, principal behind Music for Dozens and out-loud thinker, sums up the potential long-term impact of Amazon’s successful cloud computing model. It’s an insightful article and, I think, worthy of a close read, including the comments.
First Greg correctly sums up the obvious:
With the announcement of Amazon’s much-anticipated SimpleDB service this week, we now officially live in a world where the kind of enormous systems run by Google, Yahoo, Ebay, et al — systems that power huge portions of the web (where 500+ million users is totally mundane) — are available on demand in small doses and at reasonable prices to anyone who needs them.
And then talks about the impact on typical web developers:
On this infrastructure, the only real difference between running a small application (a custom CMS for a medium-sized non-profit, for example) and a large one (say, Digg) is the size of your monthly bill.
…
Within a few years, a scale of computation that is currently only available to a handful of multi-billion dollar companies will be available to any pair of dorm room-bound hacker kids with $30/mo. and a pair of MacBooks.
Unfortunately, Greg misses the mark here. While Amazon Web Services (AWS) are a strong evolutionary step forward, it’s clear that by themselves, despite Amazon’s excellent marketing, they do not create ‘web-scale’ (we prefer ‘cloud-scale’) systems. They only enable developing web-scale systems without deploying your own infrastructure.
Even today’s giants, like Amazon, rely heavily on know-how and expertise that is not readily available to the masses. There are no systematic approaches to building scalable, distributed, secure, and reliable systems. Well, not yet anyway . . . (more below)
Greg makes some additional comparisons to watershed technology moments:
For a few clues as to what happens when the current, if obscure, state of the art becomes an industry-standard lowest common denominator, it helps to look at some history. This has happened at least twice in the last thirty years: once when industry standardization around x86 hardware lead to the collapse in prices for DOS- and then Windows-compatible PCs that made them ubiquitous around the world; and again when these same PCs reached a level of power and the open source software written to run on them reached a level of affordability and reliability that, together, they displaced the expensive and proprietary server systems and radically lowered the barrier to entry for web-development leading, as Tim O’Reilly has clearly outlined, to Web 2.0.
I think these are apt comparisons, but they are not taken to their logical conclusion. It’s truly the synergy of commodity hardware and commodity (open source) software that allowed cheap x86 systems to displace Big Iron. The software, in this case, provides the systematic mechanisms for repeatably leveraging cheap hardware. Without it, people would have to individually cook their own operating system and the applications on top of it or pay for expensive enterprise software.
You can think of open source software in this context as the institutionalized know-how for running cheap commodity hardware.
There is currently no parallel for web-based cloud computing services
Tools like Amazon’s S3, EC2, SimpleDB, and the others are simply one-off point services. They are more akin, respectively, to a disk drive, a CPU, and a rolodex. By themselves they have only minimal use. It is only when they are combined well (and with other services?) by an operations or development architect with know-how that you get a synergistic effect that allows the creation of cloud-scale computing infrastructure.
Greg’s first commenter hits the nail on the head:
The mere existence of these services is far from breaking down the barrier necessary for *anyone* to scale to amazon/ebay/google size services. There are at least, as I see it, 2 things still missing that won’t be “commoditized.”
First, is the know how. Building a scalable application (really, application infrastructure) is not the same as building a basic app. And if anything, the general trends have shown that there is a great wide void of knowledge in the market about how to do this successfully.
Second, is the management of this architecture. Ask any company running a large number of EC2 instances how they are managing it. Most likely, they are using a number of custom built tools. If they’re lucky, they managed to shoehorn some of the existing systems management tools into doing what they need. Either way, there’s a lot that goes into making this work.
This is accurate, and it is part of why CloudScale Networks exists. It is possible to write a new generation of software that institutionalizes the running of Internet infrastructure. Possible and inevitable.
Greg follows through to the crux of the matter by highlighting how the new economics suddenly make it feasible for folks to pursue long tail plays or other business models that were previously not possible.
Cloud computing is here to stay and it is absolutely a game-changing evolution of Internet infrastructure. We need to embrace and hasten the arrival of this model.
December 18th, 2007