Excerpt: Why and when should you choose in-house or cloud (Amazon AWS, Rackspace, Heroku)? Here I present a thorough pros and cons list to help people understand the differences, strengths and weaknesses.
This is a huge topic. It’s impossible to discuss this without some definitions.
What is In-house?
A server in the basement or garage might be the extreme version of this. However, for this post I’d like to define it as:
Fully-owned servers and hardware, physically located within company property
Hosting solutions where you buy and own the hardware yourself, but others are responsible for power, bandwidth and physical security (and perhaps firewalls), should probably be grouped with this. An example in Norway would be Ventelo Hosting. But it’s not really in-house, is it? It’s a hybrid of hosted and in-house, much closer to in-house than cloud.
What is cloud?
Well, it turns out I can’t define it as one thing. It’s too generic and needs to be split into fundamentally different concepts.
IaaS – Infrastructure as a Service. Examples: Amazon (EC2 and S3), Rackspace, Linode, Azure
All they provide you is the hardware, but more importantly, hardware on demand. They typically offer “pay per usage” models, so that you can run ten times as many servers during peak hours and keep just a few during slow hours (usually at night). You have to manage the operating system and all the apps running on it. For a web app you’d have to set up load balancing, caching servers and database servers, just like you would with the in-house solution.
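To make the “pay per usage” point concrete, here is a rough sketch of the arithmetic. The price, the peak window and the server counts are all made-up assumptions for illustration, not any provider’s real numbers.

```python
# Hypothetical numbers only: a sketch of pay-per-usage vs. a fixed fleet.
PRICE_CENTS_PER_SERVER_HOUR = 10   # assumed price, not a real quote
PEAK_HOURS = range(18, 22)         # assumed 4-hour evening peak
PEAK_SERVERS = 20                  # ten times the off-peak fleet
OFF_PEAK_SERVERS = 2


def daily_cost_cents(servers_by_hour, price_cents):
    """Cost of one day, given the number of servers running in each hour."""
    return sum(servers_by_hour) * price_cents


# On demand: many servers during the peak, a few the rest of the day.
on_demand = [PEAK_SERVERS if hour in PEAK_HOURS else OFF_PEAK_SERVERS
             for hour in range(24)]

# Fixed fleet: sized for the peak and kept running around the clock.
fixed = [PEAK_SERVERS] * 24

on_demand_cost = daily_cost_cents(on_demand, PRICE_CENTS_PER_SERVER_HOUR)
fixed_cost = daily_cost_cents(fixed, PRICE_CENTS_PER_SERVER_HOUR)

print(f"on demand: {on_demand_cost / 100:.2f} per day")
print(f"fixed:     {fixed_cost / 100:.2f} per day")
```

With these assumed numbers the on-demand fleet uses 120 server-hours a day against 480 for the fixed fleet. The gap shrinks as traffic gets flatter.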
PaaS – Platform as a Service. Examples: Heroku, Engine Yard, Slicehost, Google App Engine
Within PaaS there are extreme differences too. Heroku gives you everything and has addons, just like Firefox, for extending your web application. They make adding SaaS even easier, and you don’t have to set up a load balancer, Varnish cache or ruby/rails versions. Just use git and push to it. Then you can script it to autoscale when traffic increases. You’re limited to what they have to offer.
Engine Yard is closer to Rackspace than to Heroku, but they define themselves as a PaaS. This is because they handle the OS and give you a lot of stuff for free too. They get you closer to the hardware than Heroku does and you have more freedom, as you can install more yourself (= more work). Engine Yard also has different performance options than Heroku. If you have slow single-thread performance, running on a bigger CPU might be a solution. Heroku gives you no such possibility; Engine Yard gives you more (the same as Amazon EC2, I think; otherwise EC2 would be the next performance improvement step).
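The “script it to autoscale” idea boils down to a rule that maps current traffic to a number of app instances. Here is a minimal sketch of such a rule. The function name, the per-instance capacity and the limits are assumptions for illustration; actually applying the result would go through the platform’s own tooling, which is not shown here.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      minimum=1, maximum=20):
    """Return how many app instances to run for the current traffic.

    capacity_per_instance is the (assumed, measured by you) number of
    requests per second one instance can handle comfortably.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Never drop below the minimum, never exceed the budget ceiling.
    return max(minimum, min(maximum, needed))


# Hypothetical traffic samples over a day (requests per second).
for rps in (0, 120, 5000):
    print(rps, "->", desired_instances(rps, capacity_per_instance=50))
```

The upper limit is there on purpose: it caps the bill if traffic (or a bug) explodes.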
SaaS – Software as a Service. Examples: Cloudant, MongoHQ, Amazon DB, New Relic.
You just configure it and use it when running the app. Heroku’s addons list has good examples of what I would call SaaS. I refuse to put Gmail, Salesforce and the like in this group, although they probably belong there. This concept is probably the one that has been out there the longest, but not so much in the form of search databases or mongodb clusters. Those are the game changers for us as developers.
Where should S3 be? I think SaaS. What do you think?
For another description of the cloud variants, check out this article IaaS-PaaS-and-SaaS-Terms-Explained-and-Defined
Pros and cons for in-house server-park vs cloud
As I couldn’t really talk about cloud as a standalone topic, I can’t really compare against it as one either. I’ve chosen to include Amazon AWS (EC2, S3 ++), Engine Yard and Heroku, making this a four-way pro-con. If you see more, please enlighten me in a comment.
In-house
Pros:
- closest to the hardware
  - you can create VMs and run multiple apps on one machine, even multiple instances of the same Rails app on a single server with a load balancer in front, making the most of the hardware.
  - the Varnish cache server should not run in a VM but directly on the server (as I understand it, that is best for it)
- fit hardware to need. If you have a lot of data, vertically scaling your DB server is the most effective option performance-wise.
- closeness to users, if your users are close to your server park.
  - set up server parks around the country or countries, close to users, to reduce latency and save a few ms per HTTP request.
- only you can access the servers physically.
- fixed cost
Cons:
- not scalable
  - to scale up you need to order physical hardware, then install it and set it up
  - to scale down (you have too much hardware, maybe because you moved some services to the cloud), you .. ? try to sell?
- closeness to user
  - if your users are all in your state or country, OK. If not, you would have to be Google (they put servers everywhere).
- the costs are fixed, but high. You do the math.
- if some hardware breaks, you lose 1/x of your computing capacity, along with the data written since the last backup (depending on the hardware failure)
- backups must be set up on a different machine
- set up and maintain the operating system
  - setting up VMs
  - install security updates asap
- security updates and hardware renewal all require downtime, unless you build an infrastructure for seamlessly moving to other servers (like they do in the cloud)
- a hardware crash might occur during peak hours and tends to take your app out of service for a long time.
- manual installation and maintenance of standard software, VMs and more.
- manually set up and maintain the web application infrastructure
  - load balancer
  - Varnish cache server
  - app instances
  - automated deployment
  - database server
- if people want to hurt you (or steal), they know where your servers are, physically.
- build security measures yourself.
Most ideally used when:
- In short: when your app needs vertical scaling and can justify the cost
- Your app has an intense database load (read about 37Signals’ vertically scaled ~160GB RAM campfire DB server)
- You need extreme single-thread performance (super-high CPU)
- Staffing is cheap, which reduces the long-run cost of your servers, and traffic is smooth (no particular scalability needs).
Rackspace
Pros:
- closeness to hardware
- horizontal scaling – 1 to 50 servers in minutes – hardware on demand (both dedicated servers and cloud hosting)
- vertical scaling – choose from 256 MB RAM to 15.5 GB RAM for cloud servers. Up to 128 GB for a managed instance (you’ll have to do most of the OS work and installation yourself)
- closeness to user (they have server parks available around the globe)
- nearly half of the Fortune 100 host with them (according to their homepage)
- no need to install and set up the load balancer, operating system or security packages yourself
- server monitoring.
- fixed cost possible
- on-demand pricing – pay for use
- can scale up at peak hours and scale down to a minimum in slow hours (night) to save money. First and foremost, you give your users a better experience
- cheap to get started
- 100% network uptime guarantee
Cons:
- requires some real dollars each month, minimum $11
- a managed 128 GB RAM server is cheaper to own yourself.
- install and set up ruby, rails and gems yourself (like in-house, but you can easily save the setup for later duplication)
- automate deployment yourself, using capistrano or similar (like in-house)
- ramp-up time. First-time deployment needs a bit of setup to get going (as with in-house). Not as well suited to experimentation or fast prototypes as heroku.
- Rackspace employees may access your servers, no matter how unlikely.
Heroku
I guess most of the pros and cons below apply to GAE (Google App Engine) as well, and to that other PHP cloud thing. Windows? I don’t care, really.
Pros:
- minimal setup. Push your app source code in git to heroku and voila! You’re up and running.
- Free plan including database, own domain name and tons of addons.
- Ideal for prototyping
  - and if it gets popular, your app is scalable and can meet the growing use.
- horizontal scaling. You can scale hardware with a scrollbar.
- scale at peak hours. Script it to add hardware when traffic goes up, and reduce hardware when traffic is low
- on-demand pricing – pay per use.
- one-click addons. Monitoring with New Relic, mongodb hosting with mongohq. Huge list.
- easy interface.
- No need for OS knowledge.
- Failover and load balancer included.
- Makes you follow good practices, e.g.
  - The read-only file system makes you move disk writes off your server (disk writes destroy your server’s performance).
  - Delay work to background jobs, as heroku cannot scale vertically and single-thread performance is low. Optimize flow and/or write better code.
  - Develop better caching to reduce server hits, as scaling is more expensive. Better caching is a bigger improvement than vertically scaling the server.
- Focus on developer time, rather than server cost.
- Fewer people needed: no knowledge of the OS, or of setting up load balancers or Varnish cache, is required. This is the startup’s wet dream.
- Used by hundreds of thousands, including NITH (Norwegian IT college), Thoughtworks (for clients) and BestBuy
Cons:
- No vertical scaling. Limited to ‘compute units’. If you need high single-thread performance, this will not suffice.
- Expensive for bigger apps. Adding a ‘compute unit’ doesn’t make much difference; you need many.
- More expensive than in-house? I doubt it! It depends. Do you have varying traffic during the day, week or month? In-house doesn’t scale up or down on demand, remember? There might be an intersection point somewhere around 100 compute units, if scaling is not needed. I would go for a managed environment like Rackspace or Engine Yard anyhow.
- Limited to what heroku offers: their OS, their load balancer and their supported ruby versions (and php).
- No OS access. No ssh. They eliminate the need, given the above limitations.
- heroku staff may access your server and data (there are >400,000 apps running…)
Further reading:
- GitHub’s move to Rackspace in 2009 is a recommended read: github-is-moving-to-rackspace
- For a bigger, less technical overview of the cloud and its actors: Gartner Magic Quadrant
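The “intersection point” between in-house and pay-per-use cost mentioned in the Heroku cons can be estimated with simple arithmetic, assuming traffic is flat (no scaling needed). This is only a sketch: the cost model is deliberately simplistic and every number is a hypothetical placeholder, not a real price.

```python
import math


def break_even_units(inhouse_fixed, inhouse_per_unit, cloud_per_unit):
    """Smallest number of units at which in-house is no more expensive.

    Assumed monthly cost model (a simplification):
      in-house: inhouse_fixed + units * inhouse_per_unit
      cloud:    units * cloud_per_unit
    Returns None if the cloud is never more expensive per unit.
    """
    saving_per_unit = cloud_per_unit - inhouse_per_unit
    if saving_per_unit <= 0:
        return None
    return math.ceil(inhouse_fixed / saving_per_unit)


# Hypothetical monthly figures: staff and housing as the fixed part,
# hardware write-off and power per unit, and a cloud price per unit.
units = break_even_units(inhouse_fixed=4000,
                         inhouse_per_unit=20,
                         cloud_per_unit=70)
print("in-house breaks even at", units, "units")
```

Plug in your own numbers. If your traffic varies a lot, the flat-traffic assumption no longer holds and the cloud side gets cheaper than this suggests.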
This post is not meant as a dictionary reference. I know what I know. Whatever I don’t know or am mistaken about, I hope and expect you’ll tell me. I appreciate it!
The goal of the post is to show you the whys and whens, clear up some doubts and clarify what it’s all about.