Creating a virtual datacenter with Scalr and Amazon Web Services
I was recently tasked with putting together a new server setup under Amazon Web Services for our company’s web products. We do a bit of everything, hosting both our own PHP-based products and open-source projects like Drupal and WordPress. We need the hosting to be both reliable and able to scale to very high traffic on short notice. We’ve been using AWS since the beginning, but I decided to create a new setup using Scalr, the free open-source EC2 management solution. This has turned out to be a simply phenomenal solution for anyone who needs to do large-scale web hosting, so I’ll go through some of the details.
MySQL
Since we host Drupal- and WordPress-based sites, we need a reliable MySQL server with fault tolerance and a solid backup system. This is where Scalr really shines. Out of the box, it can create a MySQL host that will:
- automatically spin up new slave servers when load has become too high and take them down when load is too low
- host up to 60GB of data locally (faster), 1TB on EBS (slightly slower), or more on an array of 1TB EBS partitions
- automatically back up to S3
The only part that isn’t done automatically is configuring your software to use the correct master or slave MySQL server, depending on the server configuration of the moment and on whether a read or a write query is happening. At first, I thought this was going to be difficult to deal with, especially since we have Drupal and WordPress installations that I don’t want to custom-patch to do this. But then I discovered MySQL Proxy.
MySQL Proxy
MySQL Proxy is a very small, simple piece of software that has some truly amazing functionality. We run it on each web-serving instance, where it provides a virtual MySQL server on localhost that proxies write queries to the master server and read queries to a random slave. It will even fail over to another slave if necessary. And it is easy to reconfigure dynamically: we use an init.d script to start or restart the proxy whenever there is a Scalr host-up/host-down event. This lets any software on the box simply use a default MySQL config and leaves the heavy lifting to a centralized system. If someday we scale to the point where we exceed the write throughput of an extra-large EC2 MySQL master, we can use the proxy’s scripting capabilities to create universal partitioning and sharding policies that are distributed to the web nodes. Wonderful.
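The routing decision described above can be sketched in a few lines. MySQL Proxy itself is scripted in Lua, so the Python below is only an illustration of the idea, not our actual proxy script; the function names and the list of read-only statement prefixes are assumptions made for the example, and a real splitter also has to keep reads inside a transaction on the master.

```python
import random

# Assumption: statements starting with these keywords only read data.
# Everything else is treated as a write and goes to the master.
READ_PREFIXES = ("select", "show", "describe", "explain")


def is_read_query(sql):
    """Return True if the statement looks like a read-only query."""
    return sql.lstrip().lower().startswith(READ_PREFIXES)


def pick_backend(sql, master, slaves, rng=random):
    """Send writes to the master and reads to a random slave.

    Falls back to the master when no slave is currently up, which is
    the simplest possible form of failover.
    """
    if is_read_query(sql) and slaves:
        return rng.choice(slaves)
    return master
```

On a host-up or host-down event, only the `slaves` list would need to change; the software talking to localhost never notices.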
Shared filesystem
One way that standard PHP software is not built for scalability is that many packages assume the existence of a shared filesystem to store uploaded files, in addition to a shared MySQL data store. In an ideal world, all software would store its files on S3, as all our in-house software does. But we don’t live in an ideal world. Again, I thought this was going to be a difficult problem that I would have to solve with NFS, which I’ve used before and which works but is difficult to set up, not very fast, and not very configurable. But then I discovered a guide to using GlusterFS on Scalr, which is another simple but amazing solution. We have one Gluster server which serves a 1TB EBS partition to all the web servers. We’re storing all our code for everything on there for now, but if that turns out to slow things down, we’ll store code locally with an rsync script that will sync from the share. The partition is automatically backed up with incremental snapshots using built-in Amazon functionality. This server is a single point of failure for the entire cluster, so we will eventually use a mirrored GlusterFS configuration, which Gluster easily supports.
Web servers
Scalr provides auto-scaling for web servers and out-of-the-box load balancing of multiple servers using a separate load-balancing instance. This is simple and solid, a great solution. Each web server also runs a 64MB memcached server, but Scalr can also provide dedicated memcached servers. The GlusterFS share lets us distribute Apache configurations and code to all the instances simultaneously. I set up Postfix to relay through an AuthSMTP account, so outgoing mail can be sent normally by PHP without being blocked as spam, as most mail coming directly from EC2 instances is.
Backend servers
We have backend servers performing certain periodic tasks (DB maintenance, web crawling, etc.). We control these with cron jobs that insert events into SQS. Scalr scales this role based on the length of the SQS queue, so it only runs one server unless there’s a backlog, in which case it magically creates enough servers to take care of it and then shuts them down.
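The scaling rule amounts to a small calculation. This is a minimal sketch of the idea, not Scalr’s actual algorithm; the function name, the jobs-per-worker figure, and the worker limits are all made-up parameters for illustration.

```python
import math


def desired_workers(queue_length, jobs_per_worker=100, min_workers=1, max_workers=10):
    """Return how many backend servers a queue of this length calls for.

    One server always stays up; extra servers are added only while
    there is a backlog, up to a hard ceiling.
    """
    needed = math.ceil(queue_length / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With these assumed defaults, an empty queue keeps a single server running, and the count drops back to one as soon as the backlog clears.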
Scalr highs and lows
Scalr is great, great software, but it’s not perfect. It doesn’t handle roles like our Gluster master well, where exactly one server must be running with a certain disk mounted all the time. I’m not convinced it would auto-recover from a Gluster master failure, but I think it would from just about any other detected failure (as always, the real risk of downtime comes from undetected failure). If the MySQL host goes down, new servers are created to replace it, and the same goes for web servers and load balancers.
The scripting interface makes it fairly easy to configure your software to respond to events and follow the dynamic configuration of the cluster. The rebundling feature kicks ass; it means that all you have to do to install new server software is log into one instance, make the change, and then tell Scalr, which packages up the instance, creates an AMI, and then replaces all instances of that role with new instances based on the new AMI. This beats writing AMI creation scripts hands-down.
On the other hand, it basically only supports Ubuntu, and only Ubuntu from the base AMIs that they have prepackaged for you. You can easily build whatever you want on that, but for some applications that could be a dealbreaker. For me, it’s fine; Ubuntu is a great distro. The Scalr interface definitely has some rough edges and some real bugs, and there are some places where I can see we might need to patch it to get the functionality we need. But that’s open-source software for you. The only real competitor to this is RightScale, which costs big money. I’ll take the free one with the bugs that I can fix if I have to, thanks.
Tips
- Run Scalr on a non-AWS host that doesn’t disappear, because nothing is watching over the Scalr server itself. It needs at least 512MB of RAM, so the 512MB Slicehost slice works well. Don’t try to install it on anything that’s not Ubuntu; it needs obscure PHP modules that are easily available as Ubuntu packages.
- Security: lock down your Scalr install to only accept connections over an SSH tunnel (except for requests from the AWS farm to /query-env, /event_handler.php, or /config_opts.php). If you don’t, the master keys to everything are protected by nothing more than over-the-wire passwords, which is not necessarily unacceptable, but makes me nervous in a way that the good old SSH private key doesn’t.
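The access rule in the security tip can be written down as a small check. This is a sketch of the policy only, not a web server configuration or anything that ships with Scalr; the function name is hypothetical, and it assumes that requests arriving through the SSH tunnel appear to come from the loopback address.

```python
import ipaddress

# Endpoints the farm instances must be able to reach directly (from the tip above).
PUBLIC_PATHS = ("/query-env", "/event_handler.php", "/config_opts.php")


def is_allowed(remote_addr, path):
    """Allow everything from loopback (the SSH tunnel); otherwise only farm endpoints."""
    if ipaddress.ip_address(remote_addr).is_loopback:
        return True
    bare = path.split("?", 1)[0]  # ignore any query string
    return any(bare == p or bare.startswith(p + "/") for p in PUBLIC_PATHS)
```

In practice the same rule would live in the web server or firewall configuration, ideally with the public endpoints further restricted to the farm’s own addresses.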
Advantages
This is a game-changer for startups, or for anyone who needs a datacenter solution. I created this setup in under a week of work with next to no up-front investment. No buying servers from Dell, no forecasting of usage patterns, no driving to datacenters, no tape backups. And because of a combination of Amazon’s amazing pricing and the on-demand nature of Scalr, I expect to end up paying less than half of what a traditional datacenter solution would cost. When you have traditional servers, you have to buy enough of them to handle the biggest amount of traffic you expect in the next month + 50%, or risk being caught trying to rush-ship new servers to meet demand, which is a place you don’t want to be. With this solution, if we have clients with campaigns that take off overnight, we just pay for what we actually need during each hour, which I think will probably average out to be about half of the iron we would need under the max + 50% rule.
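To make the max + 50% comparison concrete, here is a back-of-the-envelope sketch. The traffic profile is entirely made up for illustration and is not our real usage; the function names are also just for this example.

```python
import math


def provisioned_server_hours(hourly_demand, headroom=0.5):
    """Fixed fleet sized for peak demand plus headroom, running every hour."""
    fleet = math.ceil(max(hourly_demand) * (1 + headroom))
    return fleet * len(hourly_demand)


def on_demand_server_hours(hourly_demand):
    """Pay only for the servers actually needed in each hour."""
    return sum(hourly_demand)


# Hypothetical day: 2 servers overnight, 4 during the day, a 3-hour spike to 10.
demand = [2] * 8 + [4] * 13 + [10] * 3

print(provisioned_server_hours(demand))  # 360 server-hours
print(on_demand_server_hours(demand))    # 98 server-hours
```

How large the gap is depends entirely on how spiky the traffic is; a flat load would show a much smaller difference than this invented profile does.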
There are reasons beyond the strict bottom line as well. Much of the money you spend on web hosting of any kind goes into power and cooling, using electricity generated inside the United States, which today mostly means coal. So it’s in all of our interests to minimize our datacenter costs, because every dollar that we spend indirectly increases global warming just a bit. Additionally, because of their simply absurd scale, Amazon’s datacenters are, I’m sure, significantly more electricity-efficient per computing unit than any colocation facility, which increases the carbon savings.
About this entry
You’re currently reading “Creating a virtual datacenter with Scalr and Amazon Web Services,” an entry on Scott Martin
- Published:
- 07.11.09 / 12pm