I’ve been reading all the great things about Amazon EC2 (Elastic Compute Cloud) and lots of pricing comparisons with VPS and dedicated hosting. I finally got an EC2 account and tinkered a bit, and there’s a big difference between EC2 and Virtual Private Server (VPS) or dedicated hosting that most of the preliminary write-ups I’ve seen completely overlook.
- With a VPS or dedicated server, you can shut down, reboot, crash, or have an outage, and your data stays on disk.
- With Amazon EC2, when you shut down, reboot, crash, or have an outage, any new data that was not in the server image you originally uploaded is gone!
This makes it totally inadequate for running a production database server. On the forums, some people have argued for using replication or clustering solutions, while others have advocated streaming redo logs to Amazon’s S3 service. However, let’s look at the disaster recovery scenario if all your virtual machines go down (and don’t think it can’t happen: racks overload their circuits, generators fail during power outages, ACs can’t keep up during heat waves, etc.):
- With a VPS or dedicated server, you start up your servers, do a data integrity check, and you’re up and running again!
- With EC2 you restart the database server image, then restore the full database backup from the previous night (which you’ve been backing up to S3), then you rerun all of the redo logs (which you’ve also been backing up to S3).
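To make the EC2 recovery steps concrete, here is a minimal sketch of "restore the nightly backup, then replay the redo logs." It is a toy model, not any real database’s recovery code: the backup is assumed to be a plain key/value dict taken at time `backup_ts`, and each redo-log entry is a hypothetical `LogEntry(ts, key, value)` record.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical redo-log record: a write of `value` to `key` at time `ts`."""
    ts: int
    key: str
    value: str


def recover(backup: Dict[str, str], backup_ts: int, logs: List[LogEntry]) -> Dict[str, str]:
    """Restore the full backup, then replay every log entry written after it."""
    db = dict(backup)  # step 1: restore the nightly full backup
    for entry in sorted(logs, key=lambda e: e.ts):  # step 2: replay oldest first
        if entry.ts > backup_ts:  # older entries are already in the backup
            db[entry.key] = entry.value
    return db


backup = {"a": "1", "b": "2"}
logs = [LogEntry(90, "a", "old"), LogEntry(110, "b", "3"), LogEntry(105, "c", "4")]
print(recover(backup, 100, logs))  # {'a': '1', 'b': '3', 'c': '4'}
```

The replay loop has to touch every log entry since the last full backup, which is why this path gets slower as the database and its log volume grow.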
The EC2 scenario isn’t that bad with a small database, but with a large database it could add hours to your recovery time. I imagine Amazon will eventually add permanent storage to EC2, and if/when that happens you’ll be able to compare Amazon EC2 to a VPS, but right now it’s comparing apples and oranges!
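A back-of-the-envelope way to see why size matters: recovery time is roughly the time to restore the backup plus the time to replay the logs. The throughput figures below are purely illustrative assumptions, not measured EC2 or S3 numbers.

```python
def estimated_recovery_hours(backup_gb: float, log_gb: float,
                             restore_gb_per_hour: float,
                             replay_gb_per_hour: float) -> float:
    """Rough recovery time: restore the full backup, then replay the logs.

    Both throughput arguments are assumed, illustrative values.
    """
    return backup_gb / restore_gb_per_hour + log_gb / replay_gb_per_hour


# Hypothetical rates: 50 GB/h to restore, 10 GB/h to replay logs.
small = estimated_recovery_hours(2, 0.5, 50, 10)   # about 0.09 h (a few minutes)
large = estimated_recovery_hours(500, 20, 50, 10)  # 10 h restore + 2 h replay
print(small, large)
```

Whatever the real rates turn out to be, both terms scale linearly with data volume, while the VPS path (reboot plus integrity check) does not involve copying the whole database back in.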
Umm no. Amazon makes you think about disaster recovery; it’s built in. It’s been faster and cheaper than our VPS/DS, and we built a system to do database replication plus 1-minute log backups to S3. At MOST we can lose a minute’s worth of data, not bad for a new 2.0 startup.
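The "at most a minute" claim follows from the shipping interval: anything written since the last upload to S3 is what you lose. The sketch below models only the log-shipping half (not the replication), and assumes uploads happen at t = 0, 60, 120, ... seconds and that each upload includes every write up to that instant; `lost_writes` is a hypothetical helper, not part of any AWS API.

```python
import math
from typing import List


def lost_writes(write_times: List[float], crash_time: float,
                ship_interval: float = 60.0) -> List[float]:
    """Writes that never reached S3 if logs ship every `ship_interval` seconds.

    Assumes shipments at t = 0, ship_interval, 2 * ship_interval, ... and that
    a shipment at time T includes every write with timestamp <= T.
    """
    last_ship = math.floor(crash_time / ship_interval) * ship_interval
    return [t for t in write_times if last_ship < t <= crash_time]


writes = [10, 59, 61, 100, 119]
print(lost_writes(writes, crash_time=119.5))  # [61, 100, 119]
```

Because `last_ship` is never more than one interval before the crash, every lost write falls inside a window shorter than `ship_interval`, which is where the one-minute bound comes from.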
Ben, what do you mean, “umm no”? What you describe is exactly the second DR scenario I describe. Perhaps you didn’t read the entire post?
I wrote a couple of similar posts (huge datacenter and a call to google) about these new Amazon services. I’ve read quite a bit on their forums, and the database area is the one I think will really keep them from being widely used until they get a good solution in place. That doesn’t mean it isn’t totally viable, but it’s definitely not for the beginner.
One company, however, is really singing the praises of Amazon’s S3 product: SmugMug. Their CEO has a blog and talks about how much they are saving in hardware costs by using Amazon’s storage, several hundred thousand dollars in under a year, no less.
One idea that I’m working on will use a combination of a VPS and EC2. It would use EC2 servers for image processing, leaving the main server free for the web application. In this scenario I don’t need the instances to be persistent, just up long enough to process the images. Since you can start and stop instances programmatically, it can scale as needed, and I won’t have to keep servers running 24/7, which is great since Amazon charges by the hour and not by the month.
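The hourly-billing argument can be sketched as simple arithmetic. This assumes the work splits evenly across the instances, that each instance is billed for whole hours (partial hours rounded up), and an illustrative rate of $0.10 per instance-hour; check Amazon’s current pricing before relying on any of these.

```python
import math


def batch_cost(num_images: int, seconds_per_image: float,
               instances: int, rate_per_instance_hour: float) -> float:
    """Cost of instances started only for one batch and shut down afterwards.

    Assumes an even split of images and billing in whole instance-hours.
    """
    images_per_instance = math.ceil(num_images / instances)
    hours = math.ceil(images_per_instance * seconds_per_image / 3600)
    return instances * hours * rate_per_instance_hour


def always_on_cost(instances: int, rate_per_instance_hour: float,
                   hours_in_month: int = 720) -> float:
    """Cost of keeping the same instances running around the clock for a month."""
    return instances * hours_in_month * rate_per_instance_hour


# Hypothetical batch: 1000 images at 10 s each on 2 instances.
print(batch_cost(1000, 10, 2, 0.10))  # 2 instances x 2 billed hours x $0.10
print(always_on_cost(2, 0.10))        # 2 instances x 720 hours x $0.10
```

The gap between the two numbers is the whole appeal of starting instances on demand for bursty batch work.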
Robert, I think S3 is great, but even SmugMug is only using S3 to back up their users’ images. The primary copy that gets served on their website is stored locally on their own machines.
I think your idea of using EC2 to process images is interesting, but I would do some serious testing with regard to network latency if you have to transfer the image up to EC2 and then resize it, all while the end user is waiting…
True, SmugMug does have some primary storage, but the CEO said that they are slowly moving things over to S3 so that they don’t have to buy any more storage hardware. I believe he said they were spending some $40k a month on new hardware that they no longer need to buy.
As for my potential use, I’m not worried about the latency because the processing wouldn’t happen while the user is waiting for a response. The use case isn’t “upload an image, process it, and show me the results.” It’s more of an archiving process, where users would batch upload hundreds of files for storage.
The advantages outweigh the disadvantages right now for most people, and the storage issues will be worked out. Why all the FUD?
Also, note that data is *not* lost on reboot. I reboot my instances all the time, and all the data added to the image since it was loaded is still there when it fires back up.