I’ve been reading all the great things about Amazon EC2 (or Elastic Compute Cloud) and lots of pricing comparisons with VPS and dedicated hosting. I finally got an EC2 account and tinkered a bit and there’s a big difference between EC2 and Virtual Private Server or Dedicated hosting that most of the preliminary write-ups I’ve seen completely overlook.
– With a VPS or Dedicated server you can shut down, reboot, crash, or have an outage and your data stays on disk.
– With an Amazon EC2 account when you shut down, reboot, crash, or have an outage any new data that was not in the server image that you originally uploaded is gone!
This makes it totally inadequate for running a production database server. On the forums people have argued for using replication or clustering solutions while others have advocated streaming redo logs to Amazon’s S3 service. However, let’s look at the disaster recovery scenario if all your virtual machines go down (and don’t think it can’t happen: racks overload their circuits, generators fail during power outages, AC units can’t keep up during heat waves, etc.):
– With a VPS or dedicated server you start up your servers, do a data integrity check, and you’re up and running again!
– With EC2 you restart the database server image, then restore the full database backup from the previous night (which you’ve been backing up to S3), then you rerun all of the redo logs (which you’ve also been backing up to S3).
The latter scenario isn’t that bad with a small database, but with a large database it could add hours to your recovery time. I imagine Amazon will eventually add permanent storage to EC2, and if/when that happens you’d be able to compare Amazon EC2 to a VPS, but right now it’s comparing apples and oranges!
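To put rough numbers on that second scenario, here is a back-of-the-envelope sketch in Python. The function name and every figure in it (database size, redo log volume, restore and replay throughput) are made-up assumptions for illustration, not measurements from EC2 or S3.

```python
def recovery_hours(backup_gb, redo_gb, restore_gb_per_hour,
                   replay_gb_per_hour, boot_minutes=5.0):
    """Estimate time to recover a database on a fresh instance.

    All inputs are hypothetical planning numbers:
    - backup_gb: size of last night's full backup pulled from S3
    - redo_gb: volume of redo logs written since that backup
    - restore_gb_per_hour / replay_gb_per_hour: assumed throughput
    - boot_minutes: assumed time to restart the server image
    """
    restore = backup_gb / restore_gb_per_hour  # full restore from S3
    replay = redo_gb / replay_gb_per_hour      # re-running redo logs
    return boot_minutes / 60.0 + restore + replay


if __name__ == "__main__":
    # A small database versus a large one, same assumed throughput.
    for label, backup, redo in [("small", 5, 0.5), ("large", 500, 20)]:
        hours = recovery_hours(backup, redo,
                               restore_gb_per_hour=100,
                               replay_gb_per_hour=10)
        print(f"{label}: about {hours:.1f} hours to recover")
```

The point is only that recovery time grows with both the backup size and the amount of redo to replay, while the VPS scenario skips both steps.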
Umm no. Amazon makes you think about disaster recovery; it’s built in. It’s been faster than our VPS/DS and cheaper, and we built a system to do database replication plus one-minute log backups to S3. At MOST we can lose a minute’s data, not bad for a new 2.0 startup.
Ben, what do you mean “umm no”? What you describe is exactly the 2nd DR scenario I describe. Perhaps you didn’t read the entire post?
I wrote a couple of similar posts (huge datacenter and a call to google) about these new Amazon services. I read quite a bit on their forums, and the database area is one that I think will really keep them from being widely used until they get a good solution in place. That doesn’t mean it isn’t viable, but it’s definitely not for the beginner.
One company, however, is really singing the praises of Amazon’s S3 product: SmugMug. Their CEO has a blog and talks about how much they are saving in hardware costs using Amazon’s storage; several hundred thousand in under a year no less.
One idea that I’m working on will use a combination of a VPS solution and EC2. It would utilize EC2 servers for image processing thus saving the main server for the web application. In this scenario I don’t need it to be persistent, just up long enough to process the images, and since you can start and stop instances programmatically, it can scale as needed and I won’t have to have the servers running 24/7, which is great since they charge by the hour and not by month.
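As a rough illustration of why hourly billing matters for a batch workload like this, the sketch below compares paying only for the hours a job runs against keeping the same instances up around the clock. The hourly rate, instance count, and job length are hypothetical, and rounding each run up to a whole hour is an assumption about how partial hours are billed.

```python
import math


def on_demand_cost(job_hours_per_day, instances, hourly_rate, days=30):
    """Cost when instances are started for one batch run per day.

    Assumes (hypothetically) that a partial hour is billed as a full hour.
    """
    billed_hours = math.ceil(job_hours_per_day) * instances * days
    return billed_hours * hourly_rate


def always_on_cost(instances, hourly_rate, days=30):
    """Cost when the same instances run 24/7 for the whole period."""
    return 24 * instances * days * hourly_rate


if __name__ == "__main__":
    rate = 0.10  # hypothetical price per instance-hour
    batch = on_demand_cost(job_hours_per_day=2.5, instances=4, hourly_rate=rate)
    fixed = always_on_cost(instances=4, hourly_rate=rate)
    print(f"started on demand: {batch:.2f} per month")
    print(f"running 24/7:      {fixed:.2f} per month")
```

The gap between the two figures shrinks as the job runs for more of the day, so the savings depend entirely on how bursty the processing really is.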
Robert, I think S3 is great, but even SmugMug is only using S3 to back up their users’ images. The primary copy that gets served up on their website is stored locally on their own machines.
I think your idea of using EC2 to process images is interesting, but I would do some serious testing with regard to network latency if you have to transfer the image up to EC2 and then resize it, all while the end user is waiting…
True, SmugMug does have some primary storage, but the CEO said that they are slowly moving things over to S3 so that they don’t have to buy any more storage hardware. I believe he said they were spending some 40k a month just in new hardware that they don’t have to do now.
As for my potential use, I’m not worried about the latency because it wouldn’t be while the user is waiting for a response. The use case isn’t upload an image, process and show me the results. It’s more of an archiving process where they would batch upload hundreds of files for storage.
The advantages outweigh the disadvantages right now for most people, and the storage issues will be worked out. Why all the FUD?
Also, note that data is *not* lost on reboot. I reboot my instances all the time, and all the data added to the image since it was loaded is still there when it fires back up.
I do see great power in EC2 for web applications; you just need to keep in mind that you can’t use it for all the parts of your application.
For example, if your concern is persistence of your data, the solution is simple: keep the database server on physical hardware that you control and move the front-end / back-end logic of your application to a virtual machine (an AMI in EC2) that you can instantiate / unload on demand.
If you experience latency issues in your EC2/DB connections, put a memcached server in EC2 and cache as many (read) queries as you possibly can.
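The caching suggestion above is usually done as a cache-aside lookup: check the cache first and only go to the remote database on a miss. Here is a minimal sketch of that pattern. To keep it self-contained, a small in-memory `TTLCache` class stands in for a memcached client; that class, `cached_query`, and the sample query are all hypothetical names, not part of any real library.

```python
import time


class TTLCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # entry is stale, drop it
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def cached_query(cache, key, run_query, ttl=60):
    """Cache-aside read: return the cached result, else query and store it.

    A result of None is treated as a miss and is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = run_query()  # the slow round trip to the remote database
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    cache = TTLCache()

    def slow_lookup():
        print("hitting the database")
        return [("alice", 3), ("bob", 5)]

    cached_query(cache, "top_users", slow_lookup)  # goes to the database
    cached_query(cache, "top_users", slow_lookup)  # served from the cache
```

This only helps read traffic; writes still cross the network to the database, and the `ttl` value decides how stale a cached read is allowed to be.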
This gives you the flexibility of using (and paying for) only the resources that you need with a granularity of “hours”, instead of paying full rate for multiple servers even when your traffic is low (e.g. at 2:00 a.m.).
You will also save on the overhead of maintaining all those servers yourself: installing software, doing updates, managing repositories, etc. In EC2 all you have is an image that you instantiate multiple times (as many as you need depending on your traffic at any given time).
And finally you will escape hardware issues. You don’t have to worry about defective hardware and downtime for preventive maintenance; Amazon does that for you 🙂
I think the advantages clearly outweigh the cons 🙂
Dear Todd,
Everything you said about Amazon EC2 is true. But like any other revolutionary technology platform, the cloud has to be met with a major paradigm shift in end-to-end architecture.
As an Enterprise Technology Architect, I think there are dozens of architecture design patterns that let us run a flawless service that is cost-efficient at the same time.
The architecture approach, software platform and technology facilities have to evolve, and we have to stop looking at the cloud as a PHYSICAL SERVER.
Monstrous mainframe-era tools like Oracle RDBMS, SAP and other dinosaurs belong to the ice age, and we have a new era ahead of us. So let’s just start thinking of the enterprise differently, fasten our seat belts and fly up to the cyber era.
As a case study, I have managed to create a complete, enterprise-scale core banking system running on EC2 (capable of real-time sync with local private clouds based on OpenStack).
There can be 100% availability and business continuity using a technology mashup of EC2, locally hosted private clouds and the right application architecture and platform.
The right technology, the right architecture and a smart mashup give you any possibility at the lowest cost these days.
The statement:
“With an Amazon EC2 account when you shutdown, reboot, crash, or have an outage any new data that was not in the server image that you originally uploaded is gone!”
is not totally correct.
In EC2, when you launch an instance you can configure the shutdown behavior: ‘Stop’ or ‘Terminate’. If you don’t want to lose your data when you restart, select ‘Stop’. This allows you to start and stop the instance without losing data.
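For anyone launching instances from code rather than the console, the same choice can be made at launch time. The sketch below assumes the third-party boto3 library, configured AWS credentials, and an EBS-backed image (the ‘Stop’ option does not apply to instance store-backed instances). The helper names, the AMI ID and the instance type are placeholders for illustration.

```python
def launch_params(image_id, instance_type, keep_data_on_shutdown=True):
    """Build arguments for an EC2 launch request.

    'stop' keeps the instance and its EBS root volume when the OS
    shuts down; 'terminate' discards the instance.
    """
    behavior = "stop" if keep_data_on_shutdown else "terminate"
    return {
        "ImageId": image_id,            # placeholder AMI ID
        "InstanceType": instance_type,  # placeholder instance type
        "MinCount": 1,
        "MaxCount": 1,
        "InstanceInitiatedShutdownBehavior": behavior,
    }


def launch(image_id, instance_type):
    """Launch one instance that stops (rather than terminates) on shutdown."""
    import boto3  # third-party; imported here so the helper above runs without it

    ec2 = boto3.client("ec2")
    return ec2.run_instances(**launch_params(image_id, instance_type))


if __name__ == "__main__":
    # Prints the request only; call launch(...) to actually start an instance.
    print(launch_params("ami-00000000", "t3.micro"))
```

Note that a stopped instance still pays for its attached storage, so ‘Stop’ trades a small ongoing cost for keeping the data.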