choosing the right EC2 Instance - amazon-web-services

I am currently working on a file-sharing Android project for my college, in which I want to use an AWS EC2 instance as my application server backend. But which EC2 instance type should I use? I am very confused about this.
My server contains a few PHP scripts for login, signup, and database operations (adding rows for a shared file link, deleting them, etc.). I am using AWS S3 to store the files before sharing the link to the file. Sharing the link is done over TCP/IP sockets, so I want to use the same server as the socket server as well. There are many instance types here: https://aws.amazon.com/ec2/instance-types/ but I don't know which one to use for my particular project.

I prefer to choose from http://www.ec2instances.info/, but honestly the decision remains up to you.
The only rule of thumb is: "start with small instances and get big when needed". Define some alerts on your instance's CPU/RAM usage and move to a bigger instance type when they fire.
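If you go the alerting route, a CloudWatch alarm on CPU utilization is easy to script. A minimal boto3 sketch (the instance ID, region, and SNS topic ARN are placeholders):

```
# Minimal sketch: alert when average CPU stays above 80% for two
# 5-minute periods. Instance ID and SNS topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="ec2-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```

When the alarm keeps firing, stop the instance, change its instance type, and start it again.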

Related

Saving images on files or on database when using docker

I'm using Docker for a project; the main goal is to keep the application available even if one of the nodes (it's a six-node cluster with Docker Swarm) goes down.
The application is basically a Django app that saves user-uploaded images, among other models. I'm currently saving the images as files, but since that requires specifying a volume local to a single machine, I would like to know if it would be better to save the images in a database cluster, so they would stay available even if a whole node goes down. Or is there another way?
Edit:
Note: The cluster runs locally and doesn't have internet access
The two options are to perform the file sharing via the database or via the file system.
For file-system sharing, you can use something like GlusterFS: to each container it looks like it is mounting a host-local volume, but the volume is actually shared between the hosts via GlusterFS.
To my mind, if it's your own application (i.e. you can modify it at will), saving the files in the database is the easier approach for most developers.
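As a rough illustration of the database approach, the image bytes can live in a model themselves, so they travel with the database cluster (a sketch; the model and field names are made up):

```
# Sketch: keep uploaded image bytes in the database itself so they are
# replicated wherever the database cluster is. All names are hypothetical.
from django.db import models

class UserImage(models.Model):
    owner = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    filename = models.CharField(max_length=255)
    content_type = models.CharField(max_length=100)
    data = models.BinaryField()  # raw image bytes
    uploaded_at = models.DateTimeField(auto_now_add=True)
```

Keep in mind that large binaries bloat the database, so the file-system route above stays attractive for big uploads.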
The best solution is often to go for a hosted option (such as MongoDB Atlas). Making a database resilient and highly available is really hard, and unless you are an expert on Docker and Mongo, I would strongly recommend going with a hosted option.

Medium Hadoop / Spark Cluster Administration

Please let me know if this question is more appropriate for a different channel, but I was wondering what the recommended tools are for installing, configuring, and deploying Hadoop/Spark across a large number of remote servers. I'm already familiar with how to set up all of the software, but I'm trying to determine what I should start using so that I can easily deploy across many servers. I've started to look into configuration management tools (i.e. Chef, Puppet, Ansible), but I was wondering which is the best and most user-friendly option to start with. I also do not want to use spark-ec2. Should I be creating homegrown scripts that loop through a hosts file containing IPs? Should I use pssh? pscp? etc. I just want to be able to SSH to as many servers as needed and install all of the software.
If you have some experience with a scripting language, then you can go for Chef. Recipes are already available for deploying and configuring the cluster, and it's very easy to start with.
And if you want to do it on your own, you can use the sshxcute Java API, which runs scripts on remote servers. You can build up the commands there and pass them to the sshxcute API to deploy the cluster.
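sshxcute is a Java API; as an illustration of the same idea in Python (not that library's actual interface), a loop over a hosts file with paramiko might look like this (the username, key path, and script path are hypothetical):

```
# Sketch: run the same install script on every host listed in hosts.txt.
# Username, key path, and script path are placeholders.
import os
import paramiko

key = os.path.expanduser("~/.ssh/id_rsa")

with open("hosts.txt") as f:
    hosts = [line.strip() for line in f if line.strip()]

for host in hosts:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="ubuntu", key_filename=key)
    _, stdout, stderr = client.exec_command("bash /tmp/install_hadoop.sh")
    print(host, stdout.read().decode())
    client.close()
```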
Check out Apache Ambari. It's a great tool for central management of configs, adding new nodes, monitoring the cluster, etc. This would be your best bet.

How to only push local changes without destroying the container?

I have deployed my app (PHP buildpack) to production with cf push app-name. After that I worked on further features and bugfixes. Now I would like to push my local changes to production. But when I do that, all the images (e.g. profile images) which were saved on the production server get lost with every push.
How do I take over only the changes in the code without losing any stored files on the production server?
It should be like a "git pull"
Your application container should be stateless. To persist data, you should use the offered services. The Swisscom Application Cloud offers an S3 compatible Dynamic Storage (e.g. for pictures or user avatars) or different database services (MongoDB, MariaDB and others). If you need to save user data, you should save it in one of these services instead of the local filesystem of the app's container. If you keep your app stateless, you can migrate and scale it more easily. You can find more information about how your app should be structured to run in a modern cloud environment here. To get more information about how to use your app with a service, please check this link.
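For illustration, reading the bound service's credentials from VCAP_SERVICES and pointing an S3 client at its endpoint might look roughly like this (the service label "dynstrg" and the credential key names are assumptions; check your app's actual VCAP_SERVICES payload):

```
# Sketch: store an upload in the bound S3-compatible service instead of
# the container's disk. Service label and credential keys are assumptions.
import json
import os

import boto3

creds = json.loads(os.environ["VCAP_SERVICES"])["dynstrg"][0]["credentials"]

s3 = boto3.client(
    "s3",
    endpoint_url="https://" + creds["accessHost"],
    aws_access_key_id=creds["accessKey"],
    aws_secret_access_key=creds["sharedSecret"],
)
s3.upload_file("profile.png", "my-bucket", "avatars/user-42.png")
```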
Quote from Avoid Writing to the Local File System
Applications running on Cloud Foundry should not write files to the local file system for the following reasons:
Local file system storage is short-lived. When an application instance crashes or stops, the resources assigned to that instance are reclaimed by the platform, including any local disk changes made since the app started. When the instance is restarted, the application will start with a new disk image. Although your application can write local files while it is running, the files will disappear after the application restarts.
Instances of the same application do not share a local file system. Each application instance runs in its own isolated container. Thus a file written by one instance is not visible to other instances of the same application. If the files are temporary, this should not be a problem. However, if your application needs the data in the files to persist across application restarts, or the data needs to be shared across all running instances of the application, the local file system should not be used. We recommend using a shared data service like a database or blobstore for this purpose.
In the future your problem will be "solved" by Volume Services (experimental). You will have a persistent disk for your app.
Cloud Foundry application developers may want their applications to mount one or more volumes in order to write to a reliable, non-ephemeral file system. By integrating with service brokers and the Cloud Foundry runtime, providers can offer these services to developers through an automated, self-service, and on-demand user experience.
Please subscribe to our newsletter for feature announcements. Please also monitor the CF community for upstream development.

Deploying WordPress on Elastic Beanstalk?

Suppose I create a site in WordPress, which is running on Elastic Beanstalk. Now, on the running app I will create posts/pages, upload images, etc. That is, some data (videos, files, and database records) will be added to the running application.
3 questions:
If WordPress is running on Elastic Beanstalk with multiple Amazon EC2 instances actually running my WordPress install, will those files propagate automatically to all running instances? And will this also happen if a new EC2 instance is fired up, for example to handle increased load?
From what I see in the AWS console, I can deploy different versions of an app. But as per the scenario above, if I deploy a new version, won't I lose all the data added directly to the running app (i.e. files and database records)? How do I keep those and at the same time deploy a new version of my app?
The WordPress team keeps issuing upgrades. Can I directly upgrade my running WordPress install through the web interface? Or do I have to upgrade my local version of WordPress first and then upload the new version of the app to Beanstalk? If it's the latter, then again I am back to point 1, i.e. changes made by users directly to the older version of the running app. How do I preserve those changes?
I've been working on this as well, and have learned a couple of things that are relevant here -- your question about uploads in particular has been on my mind:
(1) The best way to handle uploads, it seems to me, is to go the NFS/NAS route as you suggest, or better yet, to use an Amazon S3 plugin for WordPress, so that every upload is automatically copied to S3 and the URLs in your WordPress media library reflect the FQDN of your bucket rather than your specific site. That way you could have one or ten WP nodes in your Beanstalk and media/images would be independent of any one of those servers.
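The plugin does the work for you, but the idea underneath is just copying each upload to S3 and serving it from the bucket's URL instead of the web server's. Roughly (the bucket name and paths are placeholders):

```
# Sketch of what an S3 offload plugin does under the hood: copy the upload
# to S3 and return a URL that does not depend on any single web node.
import boto3

BUCKET = "my-wp-media"  # placeholder bucket name
s3 = boto3.client("s3")

def offload_upload(local_path, key, content_type="image/jpeg"):
    s3.upload_file(local_path, BUCKET, key,
                   ExtraArgs={"ContentType": content_type})
    return f"https://{BUCKET}.s3.amazonaws.com/{key}"

print(offload_upload("photo.jpg", "2016/05/photo.jpg"))
```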
(2) You should absolutely be using RDS here. Few things are easier to work with and as stress-free as a Multi-AZ, reserved MySQL RDS instance. Either that, or your own EC2 instance running MySQL independently of the Beanstalk, but why run that when RDS is so much easier?
(3) Yes, you definitely have to commit changes to your Git repository or local files first (new plugins, changes to themes, WP upgrades) and then upload/install them as a revision of the Beanstalk code. Otherwise, changes you make via the web interface on one node will never be in the new load for a new node; in fact you'll have an upgraded database but an older set of code in the Beanstalk application, which is likely to create errors of one kind or another.
I took an AWS architecture course, and their advice for EC2 and Beanstalk is to start thinking of server instances as entirely disposable: find easy ways for your boxes to provision themselves during bootstrapping and to take over work from one another, with no precious resources living on just one box. Losing an instance should never be a big deal. (This is definitely not how we thought in the world of physical servers, where we got everything tweaked 'just so'.)
Good luck!
Well, I'm no expert, but since no one has answered, I'll give it my best shot.
You are absolutely right, kind of. While each EC2 instance does have some local storage, it is destroyed and reset with each new instance. Because of this, Amazon offers things like Elastic Block Store and S3 for persistent files. I don't know exactly how one would configure WP to use them, but that is likely the solution.
I think this problem is solved by my answer to #1. As for the database, all of your EC2 instances should be pulling from the same RDS location. Again, while you could run MySQL on each EC2 instance, in the interest of persistence a separate database makes more sense.
You, again, have most everything right. Local development should always precede live deployment. Upgrading locally then pushing to the live servers will make sure all of your instances remain identical.
Truth be told, I am still in the process of learning all of this too; as I said, I'm not an expert. Hopefully someone else will come along and give a more informed answer. However, the key conceptual hurdle here is elastic scalability, and the salient point of this idea is the separation between what is elastic/scalable/disposable and what is persistent.
Hopefully that helps.
I have deployed a small WordPress site on EB, S3, and RDS. S3 holds all static data, such as media uploads (this works through a plugin). RDS holds the database. EB holds the latest deployed application. The application is deployed from my dev environment with a build script, so I just have to press one button to redeploy.
I wrote an article about it here: http://www.cortexcode.com/wordpress-to-aws-code-example/
While it was annoying to work with at first, the speed of AWS is nice, and now it's easier than ever. I used to have to upload a bunch of files over FTP; this is far more efficient. :-)
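For what it's worth, the one-button build script can be as simple as wrapping the AWS and EB CLIs (a sketch; the bucket and environment names are mine, not from the article):

```
# Sketch of a one-button redeploy: sync media to S3, then deploy the code.
# Assumes the AWS CLI and EB CLI are installed; names are placeholders.
import subprocess

subprocess.run(["aws", "s3", "sync", "wp-content/uploads",
                "s3://my-wp-media/uploads"], check=True)
subprocess.run(["eb", "deploy", "my-wordpress-env"], check=True)
```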
As an addition to all the great answers already:
1) I can highly recommend EFS, and also S3 for media files, so they are served with high availability in combination with CloudFront. For WordPress there is one plugin that really speeds this up (I'm not affiliated with them, I just really like the plugin). There is also an assets plugin if you'd like to serve JS and CSS files from S3. For the EFS solution, take a look at the AWS Labs docs on GitHub, and specifically this file on how they mount the uploads directory.
In general, Elastic Beanstalk is really great for WordPress, but you'll need a different mindset compared to other hosting solutions (shared hosting, managed hosting).
OK, I researched this particular issue a lot, and this is what I learned:
(1) If a WordPress user uploads some files, those files will be uploaded only to the virtual machine that happens to be serving his request at that time. E.g. if the WordPress site is cloud-deployed across 5 virtual machines, when a user makes a request he is directed to one virtual machine (the one with the lowest load at that point), and his uploads are stored only on that server. Current Platform-as-a-Service solutions (like Amazon Elastic Beanstalk and AppFog) cannot propagate the changes to all running instances. Propagating changes to all servers, or having all running instances use common storage, are the only two solutions to this problem. (An example of common storage would be all 5 virtual machines using Network-Attached Storage (NAS).)
(2) With reference to the platforms currently available, such as Amazon Elastic Beanstalk and AppFog: even if a user makes changes directly to the running app, these platforms rely on the local version of the code (which the admin initially deployed to the cloud), and there is no way to update that local version (on the admin's PC) with the changes a user made to the running app. Hence those changes, i.e. files, are lost. Similarly, changes users make to the running app's database are also lost, unless the admin's local app uses exactly the same database as the one deployed to the cloud.
(3) Any changes to running apps first have to be made to the local app on admin's PC and then pushed to cloud.
I am working on a cloud PaaS that addresses all of these concerns: updates can be made to running apps, and code changes made to a running app are also pushed to a code repository accessible to the user. The proof of concept is ready; hopefully it will be as good as I hope it should be :) Currently the only thing that actually exists is the website (anyacloudpanel.com); design work is ongoing :)
If there is some rule that I should not mention my website (Anya Cloud Panel), then I am sorry; please feel free to edit my answer and remove the URL :)
Thanks,
Arvind.
Deploying WordPress to AWS Elastic Beanstalk does require some changes to the normal WordPress deployment, as mentioned here a few times. To answer your questions, here is a great tutorial explaining stateless applications and how to deploy to Elastic Beanstalk:
Deploying WordPress to Amazon Web Services AWS EC2 and RDS via ElasticBeanstalk
Be careful if you use a theme from ThemeForest, for example. Some of them are incompatible with the WordPress S3 plugin. Then you're stuck: you cannot deploy your WordPress site to the cloud.

Can I run 1000 AWS micro instances simultaneously?

I have a simple pure C program that takes an integer as input, runs for a while (let's say an hour), and then returns a text file. I want to run this program 1000 times with input integers from 1 to 1000.
Currently I'm running this program in parallel on 4 processors, which takes 250 hours. The program fits in an AWS micro instance (I've tested it). Would it be possible to use 1000 micro instances on AWS to do the whole job in one hour, at a cost of roughly $20 ($0.02 per instance-hour)?
If it is possible, does anybody have some guidelines on how to do that?
If not, does anybody have a similarly low-budget alternative?
Thanks a lot!
In order to achieve this, you will need to:
Create an S3 bucket to store your bootstrapping scripts, application, input data, and output data.
Custom lightweight AMI: you might want to create a custom lightweight AMI which knows how to download the bootstrapping script.
Bootstrapping script: downloads your software from your S3 bucket, parses a custom instance tag containing the integer [1..1000], and downloads any additional data (see the launch sketch after this answer).
Your application: does the processing.
End-of-processing script: uploads the result to another S3 bucket and terminates the instance; you might also want to send an SNS notification to communicate the end-of-processing status.
If you need result consolidation, you might want to create another instance to act as a coordinator, waiting for all "end of processing" notifications before finishing the run. In that case you might also consider using Amazon's Hadoop MapReduce engine in the future, since it will do almost all of this heavy lifting for free.
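On the launch side, a hedged boto3 sketch of tagging each instance with its input integer (the AMI ID is a placeholder, and the JobIndex tag name is mine):

```
# Sketch: launch 1000 micro instances, each tagged with its input integer.
# The bootstrapping script baked into the AMI reads its own JobIndex tag,
# runs the program, uploads the result to S3, and shuts the instance down.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

for i in range(1, 1001):
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder custom AMI
        InstanceType="t1.micro",
        MinCount=1,
        MaxCount=1,
        InstanceInitiatedShutdownBehavior="terminate",
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "JobIndex", "Value": str(i)}],
        }],
    )
```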
You didn't specify what language you'd like to do this in, and since the app you use to deploy your program doesn't have to be in the same language, I would recommend C#. There are a number of examples around the web on how to programmatically spawn new Amazon instances using the SDK. For example:
How to start an Amazon EC2 instance programmatically in .NET
You can create your own AMIs with the program already present, but that might end up being a pain if you want to make adjustments, since it will require you to recreate the entire AMI. I'd recommend creating an extra instance, or simply hosting the program at a location accessible via a public URL. Then I would create some kind of service, installed on the AMI ahead of time, that lets me specify the URL to download the app from, along with whatever command-line parameters I want for that particular instance.
Hope this helps.
Be careful: by default you can only run 20 instances per region. You need to ask Amazon for a limit increase in order to use more instances.
If you want low cost and don't mind some delay, you should use Spot Instances.