I'm creating an application that will be hosted on Amazon EC2, and a lot of the data that'll be saved is document-oriented (as well as tweets and such related to those documents).
Right now I'm at a crossroads... should I use SimpleDB or CouchDB? What are the pros and cons of each? Should I just try both for a month and decide then?
You may find the article Amazon SimpleDB and CouchDB Compared to be useful.
I've also found that MongoDB gives excellent performance.
Keep in mind that if your code lives in EC2, SimpleDB will presumably be hosted in the same data center as your code, which would give SimpleDB lower latency than CouchDB for requests from an EC2 server. Also, Amazon doesn't charge you bandwidth costs between EC2 and SimpleDB.
I would expect SimpleDB to be both faster and cheaper for code running in EC2, for those reasons.
SimpleDB is hosted and maintained by Amazon for you; CouchDB is all up to you. That's the big difference.
I would absolutely benchmark the two solutions against your own use case, if that's possible, i.e. if you can build a reasonable subset of your application to run on either database (they have quite different APIs, so this might not be easy).
If you develop in a .NET environment, there's an excellent library for SimpleDB called Simple Savant, which really eases the integration.
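As a rough illustration of that kind of benchmark, here is a minimal Python sketch. It assumes you wrap each database behind the same two hypothetical methods, `put` and `get`; the `InMemoryStore` class is only a stand-in so the harness runs on its own, and real SimpleDB or CouchDB adapters would replace it.

```python
import time
from typing import Callable, Dict


def time_workload(workload: Callable[[], None], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


class InMemoryStore:
    """Hypothetical stand-in backend; swap in a SimpleDB or CouchDB adapter."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, key: str, doc: dict) -> None:
        self._docs[key] = doc

    def get(self, key: str) -> dict:
        return self._docs[key]


def make_workload(store, n_docs: int) -> Callable[[], None]:
    """Build a write-then-read workload shaped like the application's data."""
    def run() -> None:
        for i in range(n_docs):
            store.put(f"doc-{i}", {"id": i, "tweets": [f"tweet {i}"]})
        for i in range(n_docs):
            store.get(f"doc-{i}")
    return run


def compare(backends: Dict[str, object], n_docs: int = 1000) -> Dict[str, float]:
    """Time the same workload against every backend and return seconds per backend."""
    return {
        name: time_workload(make_workload(store, n_docs))
        for name, store in backends.items()
    }
```

Running the harness from an EC2 instance, rather than from a development machine, would also capture the latency difference discussed above.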
I've built some live solutions using SimpleDB and it works very well, especially with a caching layer in front of it (cf memcached et al). However I've recently started scoping out a new project and have decided to move to CouchDB for the primary reason of having control over the data.
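To illustrate the caching layer mentioned above, here is a minimal read-through cache sketch in Python. It is not the setup used in those projects: the in-process dictionary stands in for memcached, and `fetch` is a hypothetical callable that would perform the actual SimpleDB lookup.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Serve reads from a local cache and fall back to the database on a miss."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical function that queries the database
        self._ttl = ttl_seconds      # how long a cached value stays valid
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit, no database request
        value = self._fetch(key)     # cache miss or expired entry
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after a write so the next read goes to the database."""
        self._entries.pop(key, None)
```

The time-to-live bounds how stale a cached value can get, which is the usual trade-off with a cache in front of a remote store.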
As your commitment to SimpleDB grows, it gets harder and harder to migrate away to anything else (ah the joys of vendor lock in) and, frankly, that just isn't great for our business.
I remain a strong evangelist of cloud tech, and Amazon in particular, but I feel a lot better running CouchDB on EC2 than I do with SimpleDB.
Roger
I've been trying to wrap my head around the best solution for hosting development sites for our company lately.
To be completely frank, I'm new to AWS and its architecture, so more than anything I just want to know if I should keep learning about it or find another, more suitable solution.
Right now we have a dedicated server which hosts our own website, our intranet, and a lot of websites we've developed for clients.
Our own website and the intranet aren't an issue; however, I'm not quite sure about the websites we produced for our clients.
There are about 100 of them right now. These sites are only used pre-launch so our clients can populate them with content. As soon as the content is done, we host the website somewhere else. The copy on our development server is then no longer used, but we keep it there so that if the client wants a new template or function, we can show it there before sending it to production.
This means the development sites have almost zero traffic, with perhaps at most 5 or so people adding content to them at any given time (5 people for all 100 sites, not 5 per site).
These sites need to be available at all times, and should always feel snappy.
These are not static sites, they all require a database connection.
Is AWS (EC2, or any other kind of instance, such as Lightsail) a valid solution for hosting these sites? Or should I just downgrade our current dedicated server to a VPS and only worry about hosting our main site on AWS?
I'll put this in an answer because it's too long, but it's just advice.
If you move those sites to AWS you're likely to end up paying (significantly) more than you do now. You can use the Simple Monthly Calculator to get an idea.
To clarify, AWS is cost-effective for certain workloads. It is cost effective because it can scale automatically when needed so you don't have to provision for peak traffic all the time. And because it's easy to work with, so it takes fewer people and you don't have to pay a big ops team. It is cost effective for small teams that want to run production workloads with little operational overhead, up to big teams that are not yet big enough to build their own cloud.
Your sites are development sites that just sit there and see very little activity. Which means those sites are probably under the threshold of cost effective.
You should clarify why you want to move. If the reason is that you want as close to 100% uptime as possible, then AWS is a good choice. But it will cost you, both in terms of bill paid to Amazon and price of learning to set up such infrastructure. If cost is a primary concern, you might want to think it over.
That said, if your requirements for the next year or more are predictable enough and you have someone who knows what they are doing in AWS, there are ways to lower the cost, so it might be worth it. But without further detail it's hard for anyone to give you a definitive answer.
However. You also asked if you should keep learning AWS. Yes. Yes, you should. If not AWS, one of the other major clouds. Cloud and serverless[1] are the future of much of this industry. For some that is very much the present. Up to you if you start with those dev sites or something else.
[1] "Serverless" is as misleading a name as NoSQL. It doesn't mean no servers.
Edit:
You can find a list of EC2 (Elastic Compute Cloud) instance types here. That's CPU and RAM. Realistically, the cheapest instance is about $8 per month. You also need storage, which is called EBS (Elastic Block Store). There are multiple types of that too; you probably want GP2 (General Purpose SSD).
I assume you also have one or more databases behind those sites. You can either set up the database(s) on EC2 instance(s), or use RDS (Relational Database Service). Again, multiple choices there. You probably don't want Multi-AZ there for dev. In short, Multi-AZ means two RDS instances so that if one crashes the other one takes over, but it's also double the price. You also pay for storage there, too.
And, depending on how you set things up you might pay for traffic. You pay for traffic between zones, but if you put everything in the same zone traffic is free.
Storage and traffic are pretty cheap though.
This is only the most basic of the basics. As I said, it can get complicated. It's probably worth it, but if you don't know AWS you might end up paying more than you should. Take it slow and keep reading.
I am trying to use DynamoDB for the backend DB of my application, but am having a hard time finding useful information about it.
What is the best source of examples and tutorial information for syntax structures etc?
AWS docs are really confusing. Or am I the only person sitting with these problems?
Oh, and is the newly launched AWS DocumentDB (basically MongoDB) going to make DynamoDB pointless to learn, or is there still merit in learning DynamoDB?
The pricing models of DocumentDB and DynamoDB are completely different. There is definitely a place for both; IMO, DynamoDB is not going away any time soon.
As far as tutorials go, there are tons of AWS re:Invent videos on YouTube, and this site allows you to search and find them easily: https://reinventvideos.com/. Good place to start.
What is a better mBaaS that supports offline sync and caching?
I am evaluating several mBaaS solutions for my hybrid mobile app under development. I looked at Kinvey, Kii, buddy, and Telerik BackEnd platform. I have also come across some open source solutions like openmobster and dreamfactory. I am looking to store data in SQLite in the mobile app and then sync it back with an online data store. Kinvey has this support, but their pricing model (per user) is not suitable in my scenario. I can see that openmobster does this, but how it does so is what I need to understand. Can I host it on an Azure VM or something? Also, please suggest any other solution, commercial or open source, capable of doing offline sync and caching with push notifications and data storage.
DreamFactory could be a good fit for your scenario. It is open source and comes with a full 30 days of free support. After that it's only about $25/month for a developer account, and even this isn't a requirement to use the product. It's specifically a support package.
To address your question a little more in-depth... I don't believe DreamFactory supports offline syncing at the moment, though they plan to very soon. In regards to SQLite, DreamFactory's (DSP) product has a built-in SQLite driver to connect to that DB. However, it hasn't been tested enough for them to say it is a fully supported RDBMS. One of the beautiful things about DreamFactory is that you're able to host the DSP (DreamFactory Service Platform) on Azure and Amazon EC2 instances (cloud solutions), host it locally on your own server, or even use its own free hosted edition!
I would definitely take a little time to look into DF. It doesn't seem to me like you have much to lose. Especially, considering it's a free open-source product!
Feel free to ask me any questions you may have about DreamFactory!
-Mark
I have a web app running on php, mysql, apache on a virtual windows server. I want to redesign it so it is scalable (for fun so I can learn new things) on AWS.
I can see how to setup an EC2 and dump it all in there but I want to make it scalable and take advantage of all the cool features on AWS.
I've tried googling but just can't find a simple guide (note - I have no command line experience of Linux)
Can anyone direct me to detailed resources that can lead me through the steps and teach me? Or alternatively, summarise the steps in an answer so I can research based on what you say.
Thanks
AWS is growing and changing all the time, so there aren't a lot of books to help. Amazon offers training that's excellent. I took their three day class on Architecting with AWS that seems to be just what you're looking for.
Of course, not everyone can afford to spend the travel time and money to attend a class. The AWS re:Invent conference in November 2012 had a lot of sessions related to what you want, and most (maybe all) of the sessions have videos available online for free. Building Web Scale Applications With AWS is probably relevant (slides and video available), as is Dissecting an Internet-Scale Application (slides and video available).
A great way to understand these options better is by fiddling with your existing application on AWS. It will be easy to just move it to an EC2 instance in AWS, then start taking more advantage of what's available. The first thing I'd do is get rid of the MySQL server on your own machine and use one offered with RDS. Once that's stable, create one or more read replicas in RDS, and change your application to read from them for most operations, reading from the main (writable) database only when you need completely current results.
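The read-replica routing described above can be sketched roughly as follows. This is a Python illustration of the idea only (the application in question is PHP), and the endpoint strings in the usage note are hypothetical placeholders, not real RDS hostnames.

```python
import random
from typing import List


class ReplicaRouter:
    """Pick a database endpoint: replicas for most reads, primary for the rest."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)

    def for_write(self) -> str:
        # All writes must go to the main (writable) database.
        return self.primary

    def for_read(self, need_current: bool = False) -> str:
        # Replicas can lag behind the primary, so reads that need
        # completely current results also go to the primary.
        if need_current or not self.replicas:
            return self.primary
        return random.choice(self.replicas)
```

For example, `ReplicaRouter("primary.example", ["replica-1.example"]).for_read()` returns the replica endpoint, while `for_read(need_current=True)` returns the primary.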
Does your application keep any data on the web server, other than in the database? If so, get rid of all local storage by moving that data off the EC2 instance. Some of it might go to the database, some (like big files) might be suitable for S3. DynamoDB is a good place for things like session data.
All of the above reduces the load on the web server to just your application code, which helps with scalability. And now that you keep no state on the web server, you can use ELB and Auto-scaling to automatically run multiple web servers (and even automatically launch more as needed) to handle greater load.
Does the application have any long running, intensive operations that you now perform on demand from a web request? Consider not performing the operation when asked, but instead queueing the request using SQS, and just telling the user you'll get to it. Now have long running processes (or cron jobs or scheduled tasks) check the queue regularly, run the requested operation, and email the result (using SES) back to the user. To really scale up, you can move those jobs off your web server to dedicated machines, and again use auto-scaling if needed.
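The queue-and-worker pattern described above might look like the following minimal Python sketch. To keep it self-contained, the standard library's `queue.Queue` stands in for SQS and the `send_email` callable stands in for SES; `handle_web_request`, `build_report`, and `drain_queue` are hypothetical names used only for illustration.

```python
import queue
from typing import Callable


def handle_web_request(jobs: queue.Queue, user_email: str, report_id: int) -> str:
    """Web tier: enqueue the request instead of doing the slow work inline."""
    jobs.put({"email": user_email, "report_id": report_id})
    return "Your report is being generated; we'll email it to you."


def build_report(report_id: int) -> str:
    """Placeholder for the long-running, intensive operation."""
    return f"report {report_id} contents"


def drain_queue(jobs: queue.Queue, send_email: Callable[[str, str], None]) -> int:
    """Worker (cron job or long-running process): process everything queued so far."""
    processed = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return processed
        send_email(job["email"], build_report(job["report_id"]))
        processed += 1
```

Because the web tier and the worker only share the queue, the worker can later move to dedicated machines without changing the request handler.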
Do you need bigger machines, or perhaps can live with smaller ones? CloudWatch metrics can show you how much IO, memory, and CPU are used over time. You can use provisioned IOPS with EC2 or RDS instances to improve performance (at a cost) as needed, and use different-size instances for more memory or CPU.
All this AWS setup and configuration can be done with the AWS web console, or command-line tools, or SDKs available in many languages (Python's boto library is great). After learning the basics, look into CloudFormation to automate it better (I've written a couple of posts about that so far).
That's a bit of the 10,000 foot high view of one approach. You'll need to discover the details of each AWS service when you try to use them. AWS has good documentation about all of them.
Depending on how you look at it, this is more of a comment than it is an answer, but it was too long to write as a comment.
What you're asking for really can't be answered on SO; it's a huge, complex question. What you're basically asking is "How do I design a highly scalable, durable application that can be deployed on a cloud-based platform?" The answer depends largely on:
The specifics of your application--what does it do and how does it work?
Your tolerance for downtime balanced against your budget
Your present development and deployment workflow
The resources/skill sets you have on-staff to support the application
What your launch time frame looks like.
I run a software consulting company that specializes in consulting on Amazon Web Services architecture. About 80% of our business is investigating and answering these questions for our clients. It's a multi-week long project each time.
However, to get you pointed in the right direction, I'd recommend that you look at Elastic Beanstalk. It's a PaaS-like service that abstracts away the underlying AWS resources, making AWS easier to use for developers who don't have a lot of sysadmin experience. Think of it as "training wheels" for designing an autoscaling application on AWS.
If I want to write a Django app with Amazon SimpleDB, should I install a local SimpleDB server in my development environment? If so, is there a good one around? simpledb-dev seems to be no longer maintained. Or should I access the DB on the cloud directly?
I would access SimpleDB directly; just point to a different account or a different set of domains.
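One way to point at a different set of domains is to derive each domain name from the current environment. This is a minimal sketch: the `APP_ENV` environment variable and the `domain_name` helper are assumptions made for illustration, not part of Django or of any SimpleDB client library.

```python
import os
from typing import Optional


def domain_name(base: str, env: Optional[str] = None) -> str:
    """Return an environment-specific SimpleDB domain name, e.g. 'dev_articles'."""
    if env is None:
        # APP_ENV is a hypothetical variable; default to "dev" so local
        # runs never touch production domains by accident.
        env = os.environ.get("APP_ENV", "dev")
    return f"{env}_{base}"
```

Development code would then read and write `domain_name("articles")`, which resolves to `dev_articles` locally and to `prod_articles` where `APP_ENV=prod` is set.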
By the way, I don't think there are any "local SimpleDB" servers. You would have to write your own test implementation which sounds like a nightmare, but then again maybe you have a lot of free time.
Also, you are probably going to want to actually get a feel for SimpleDB which will be easier if you are using the real thing.