I work for a startup where we have what I think is a fairly common configuration: a mixture of development environments and services that all need to be connected together.  Our primary setup is an ASP.NET/IIS 6 service with some satellite services written in PHP5 hosted on Apache 2.

Up until recently our satellite services were hosted with GoDaddy on a shared machine; needless to say, performance was not up to speed for production purposes.  So we began shopping around, looking at various grid computing solutions.  We chose grid computing not because we need massive processing power or many boxes, but because we wanted an environment that was redundant without having to invest in several dedicated boxes to get there.  Our research led us to three grid computing providers: Amazon's Elastic Compute Cloud (EC2), Rackspace's Mosso service, and ServePath's GoGrid.  At the time, none of these grid systems was out of beta, so we expected our mileage to vary.

The problem we were trying to solve through grid computing was to host (redundantly) several PHP pages and scripts that process XML files received by FTP push from a third party.  The end result of some of the PHP pages - used essentially as a web service for our ASP.NET/IIS6 environment - is a set of generated images.  One large part of our processing generates tons of images and is handled asynchronously on a schedule managed by cron.

Amazon's Elastic Compute Cloud (EC2)

While EC2 was not the first grid solution we looked at, we did consider it.  EC2 is a fairly complex setup with a pay-as-you-go model that seems to be typical of grid computing.  It requires that you also use Amazon's Simple Storage Service (S3) for storing your images (think of images as virtual computers).  The idea behind EC2 is that you configure your images and then spin up any number of them to perform a task when the compute power is required.  There's no guarantee that the internal state of an image is retained, and it's possible that images may go down (a kernel panic in your virtual machine, or Amazon's own needs).  To use EC2 optimally you must write your software to take advantage of other Amazon services, such as S3 for file storage or SimpleDB for database access.  You could stand up a MySQL instance on one of your images, but with no guarantee that the data you store in it will be there the next time your image spins up.  That's the nature of EC2 - store the data on other redundant data stores and spin up images to do the processing as required.

Running an EC2 virtual machine 24/7 for a month (i.e. 31 days) costs about $72, assuming fairly moderate outbound traffic and some usage of S3 (S3 fees are separate from the CPU time costs of EC2 but are included in my estimate).  This configuration would be similar to what you would get from a dedicated box running Apache 2 with PHP5 and MySQL to do something like host multiple sites.  All in all, not bad.  Additionally, you pay a fee for public IP addresses, which you then link to virtual machines.

So EC2 is fairly impressive, and with the community tools it's usable.  As with S3, the only tools you have for working with it are the ones created by the community.  There is a bit of a learning curve, some key management, and of course you must also conform to the Amazon S3/SimpleDB data storage APIs.  This means that if you want to install blog software that needs a MySQL database... well, you can do it, but there are no guarantees that your data will survive between machine spin-ups.  When a machine is up and running you can ssh into it.

We ended up not using EC2 due to its complexity and the necessity of using Amazon APIs for storing data.  Also, after our initial setup we "lost" a box - and I don't mean we could ping it but didn't know where it was in our apartment; I mean it was inexplicably gone.

Rackspace's Mosso

Mosso is a grid computing solution that sells itself more as a web host reseller than anything else.  Their headline claim is the ability to host ASP.NET applications under IIS6 (or 7 now) and PHP, Ruby, Perl, and Python applications under a specially configured Apache 1.3 instance at the same URL.  Yes, that means: www.yourdomain.com/somepage.aspx and www.yourdomain.com/anotherpage.php.  You could do this yourself with IIS7 and FastCGI, installing and configuring the various other languages, but where would be the fun in that?  They also offer SQL Server (costs more) and MySQL database hosting.

Essentially the virtual machine aspect of the grid is abstracted away from you.  You are presented with a web management front end (one of two: a "final" version that's buggy or a beta AJAXy version that's also buggy) that allows you to create accounts, databases, and database users, and to configure cron jobs (with a minimum interval of 5 minutes).  Everything there points to you subleasing Mosso grid time and hosting to other people, but you could certainly use it for your own non-subleasing needs.  It appears that you can configure pricing structures for your clients, but we didn't need to do anything with that, so it's a side we did not explore.

Mosso is also pay-as-you-go, but with a $100 up-front monthly entry fee and additional monthly fees for SQL Server databases.  You pay for bandwidth and processor cycles, and it looks like after the $100 it's fairly cheap for a "dedicated" box - as dedicated as a virtual instance abstracted away from you can be.  There is no ssh/scp access, but they have temperamental FTP access (it disconnects you constantly).  The tech support is pretty good: there's a live operator text-based chat and a phone line, both of which are accessible (and responsive) at all hours of the day.

We started with Mosso, ran on it for about a month, and encountered so many problems that we had to move away from it.  An annoyance, but not a deal killer, was the naming conventions - for example, ftp1.ftptoyoursite.com bit one of our contractors, since he thought "ftptoyoursite" meant he should stick in the hosted domain name.  Paths on the SAN are long and contain an 8 character numeric directory, databases have a numeric prefix, and database users have the same prefix.

We ran into two real problems.  First, the cron scheduling was unreliable: tasks scheduled for midnight actually ran at noon, yet tasks scheduled for 1 AM ran at 1 AM (not 1 PM, as one would then expect).  We got around this by moving the task that needs to run at midnight into our Ruby script and guarding it with a time check - surprisingly, the grid's clock was correct even though its scheduler confused midnight and noon.

The other, larger problem - and this was the deal killer - was output write speed, along with the technical support that wasn't.  As I said before, one of our PHP processes involves writing out a large number of images to many subdirectories (it uses gd2, and while it's not the most performance-friendly code, it executes quickly on local installations).  On Mosso the execution of this process was taking longer than 6 minutes, and possibly several hours - and this is a process we needed scheduled to run every 15 minutes!  I communicated with Mosso tech support via text chat, email, and phone; after a week the problem was still not resolved, and I had to badger them to get updates.

Overall a cool idea, but it probably needs a few more months in the oven before it's ready for prime time; the tech support responsiveness could also improve.
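A time check of the kind we used to work around the midnight scheduling bug might look roughly like the sketch below.  It is in Python rather than the Ruby we actually ran, and everything in it is an illustrative assumption: the function names, the 15-minute window, and the idea that the wrapper script is started by a schedule that does fire correctly.

```python
from datetime import datetime, time, timedelta

# Hypothetical: how often the wrapper script is started by a schedule that works.
RUN_INTERVAL = timedelta(minutes=15)


def is_midnight_window(now: datetime, interval: timedelta = RUN_INTERVAL) -> bool:
    """Return True if `now` falls within the first `interval` after midnight."""
    midnight = datetime.combine(now.date(), time(0, 0))
    return midnight <= now < midnight + interval


def run_scheduled_tasks(now: datetime, regular_task, midnight_task) -> list:
    """Always run the regular task; run the midnight task only inside the window."""
    results = [regular_task()]
    if is_midnight_window(now):
        results.append(midnight_task())
    return results


if __name__ == "__main__":
    # Placeholder tasks standing in for the real processing steps.
    ran = run_scheduled_tasks(
        datetime.now(),
        regular_task=lambda: "regular processing",
        midnight_task=lambda: "nightly job",
    )
    print(ran)
```

The window has to match how often the script is started; otherwise the midnight task could run twice or be skipped entirely.  The check also relies on the machine's clock being right, which on Mosso it was.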

ServePath's GoGrid

GoGrid is another grid computing solution that's more like EC2 - the virtualization is not abstracted away from you.  You choose server images, fire them up, and off you go.  Additionally, you get a free load balancer, so you don't have to program against an API (a la EC2) to get load balancing.  That's not to say there isn't an API available, because there is; we just didn't have to explicitly use it.  Because the implementation is not abstracted, you do end up with separate virtual machines running different OSes - Windows for IIS6 and Linux for Apache, for example - but through some magic you can get the same level of functionality that Mosso provides.

Like Amazon's EC2, there are many images available - though not quite the cesspool of choices EC2 provides.  Fortunately, your choices are limited to one of several Windows 2003, CentOS, or Red Hat Enterprise images.  We didn't stand up a Windows 2003 server, but one would assume that to configure it you just RDC to it and you're on your way.  The CentOS image we did use immediately allowed us to ssh to it and yum away our configuration.  Something missing from GoGrid, which we've been promised soon, is the ability to back up your virtual machine images.  Unlike EC2 (which is a totally different data storage paradigm anyway), our VMs retain state between going up and down, which is reassuring.  I was immediately comfortable with the management UI presented by GoGrid, and once our instance (a CentOS Apache/MySQL image) was up I was sshing into it and configuring everything.

GoGrid is also pay-as-you-go, but they charge on RAM time instead of processor time.  The load balancer is free (update: you can have multiple free load balancers if you need them).  I think we're looking at $50 a month for a 1 GB RAM machine running 24/7, which is what we require for our purposes.  For redundancy you can spin up another instance of your image, link it to the load balancer, and there you go.  One thing you'll note - their hosting software attempts to place virtual images in different server clusters, so if one cluster goes down the other images aren't affected (redundancy!), but to verify that this is indeed the case you need to phone them and have their techs check.
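To put the rough monthly figures from this post side by side, here is a small Python sketch.  The dollar amounts are just the estimates quoted above; the 31-day month for every provider and the assumption that a second redundant instance costs the same as the first are mine, not published pricing, and the function names are made up for illustration.

```python
HOURS_PER_MONTH = 31 * 24  # 744 hours, assuming a 31-day month for every provider

# Rough monthly estimates quoted in this post (USD), not official price lists.
MONTHLY_ESTIMATE = {
    "EC2": 72.0,     # one VM running 24/7, including some S3 usage
    "Mosso": 100.0,  # entry fee only; usage and SQL Server cost extra
    "GoGrid": 50.0,  # one 1 GB RAM VM running 24/7
}


def implied_hourly_rate(monthly_cost: float, hours: int = HOURS_PER_MONTH) -> float:
    """Spread a monthly estimate evenly over every hour of the month."""
    return monthly_cost / hours


def redundant_cost(monthly_cost: float, instances: int = 2) -> float:
    """Assumption: each additional instance costs the same as the first."""
    return monthly_cost * instances


if __name__ == "__main__":
    for provider, cost in MONTHLY_ESTIMATE.items():
        rate = implied_hourly_rate(cost)
        print(f"{provider}: ${cost:.2f}/month, about ${rate:.3f}/hour")
    two_vms = redundant_cost(MONTHLY_ESTIMATE["GoGrid"])
    print(f"Two GoGrid instances: ${two_vms:.2f}/month")
```

The Mosso number is only a floor, since bandwidth and processor cycles are billed on top of the entry fee, so the comparison is not quite apples to apples.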

We settled on GoGrid because configuration was a breeze, administering our servers did not require community supported tools (standard industry programs were suitable), our code ran exceptionally well, and for a beta their uptime has been incredible thus far (100%).  Something we only discovered after the fact is their great communication - we know what's going on at GoGrid in terms of scheduled maintenance via email updates (Mosso also had this in the form of a blog - but the number of problems they encountered with their solution was rather distressing).

Updated for grammar, spelling, completeness; content remains essentially the same.