Updating all files on AWS EC2

I'm trying to determine the "best" way for a small company to keep web app EC2 instances in sync with current files while using autoscaling.
From my research, CloudFormation, Chef, Puppet, OpsWorks, and others seem like the tools to do so. All of them seem to have a decent learning curve, so I am hoping someone can point me in the right direction and I'll learn one.
The initial setup I am after is:
Route53
1x Load Balancer
2x EC2 (different AZ) - Apache/PHP
1x ElastiCache Redis
2x EC2 (different AZ) w/ MySQL
Email through Google Apps
Customer File/Image Storage via S3
CloudFront for CDN
The only major challenge I can see is versioning/syncing the web/app servers. We're small now, so I could probably just update the EBS volumes manually or even use rsync, but I would rather automate it and be set up for autoscaling.

This is probably too broad of a question and may be closed, but let me give you a few thoughts.
Why not use RDS for MySQL?
You need to get into the mindset of making and promoting disk images. In the cloud world, you don't want to be rsyncing a bunch of files around from server to server. When you are ready to publish a revised set of code, just make an image from your staging environment, start new EC2 instances behind your ELB based on that image, and turn off the old instances. You may have a slightly different deployment sequence if you need to coordinate with DB schema changes, but that is a pretty straightforward approach.
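For illustration, here is a rough sketch of that image-promotion flow using the AWS CLI and a classic launch configuration; all IDs and names below are placeholders, and the exact steps depend on how your Auto Scaling group is set up.

    # 1. Bake an AMI from the staging instance (placeholder IDs throughout).
    aws ec2 create-image --instance-id i-0staging123 --name "webapp-2015-06-01" --no-reboot

    # 2. Create a launch configuration that points at the new AMI.
    aws autoscaling create-launch-configuration \
        --launch-configuration-name webapp-lc-v2 \
        --image-id ami-0new12345 \
        --instance-type t2.small \
        --security-groups sg-0abc12345

    # 3. Switch the Auto Scaling group over; instances launched from now on use the new image.
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name webapp-asg \
        --launch-configuration-name webapp-lc-v2

    # 4. Replace old instances one at a time; Auto Scaling launches substitutes from the new AMI.
    aws autoscaling terminate-instance-in-auto-scaling-group \
        --instance-id i-0old123456 --no-should-decrement-desired-capacity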
You should still seek to automate some of your activities using tools such as those you mentioned. You don't need to do this all at once. Just figure out a manual part in your process that you want to automate and do it.

Related

Amazon ec2 set up best practices for Rails app with mysql or postgres

I have to set up EC2 for a medium Rails app running on Apache2, MySQL, Capistrano, and a few background services. I would like to know the best practices developers usually follow to set up a Rails app, and what kind of setup is easy to scale and covers at least:
auto deployment
security
regular data backup and an easy and quick way to restore the data
server recovery
fault tolerance
I am also interested in how to monitor server status and performance; any other best practices would also be helpful.
PS: also take into account that my app database will grow fast.
I think a good look into the AWS docs and in particular the architecture center would be the best place to start. However, let me address as many of your questions as I can.
Database
The easiest way to get a scalable, fault-tolerant database on AWS is to use the Relational Database Service (RDS). You should read the docs and best practices to ensure you get the most out of it, e.g. by using multiple AZs.
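As a minimal sketch (identifiers, class, storage size, and credentials are placeholders), creating a Multi-AZ MySQL instance from the CLI looks roughly like this:

    # Hedged example only; replace the identifier, instance class, and credentials with your own.
    aws rds create-db-instance \
        --db-instance-identifier railsapp-db \
        --engine mysql \
        --db-instance-class db.m3.medium \
        --allocated-storage 100 \
        --master-username appuser \
        --master-user-password 'REPLACE_ME' \
        --multi-az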
EC2 Servers
The most recommended way to structure your servers is to decouple them into web servers (which serve HTML to users) and app servers (application logic, usually returning JSON or XML). See this architecture example.
However, the key is to use an AutoScaling group behind an Elastic Load Balancer.
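A rough sketch of wiring those together with the CLI (names are placeholders; it assumes a launch configuration and a classic ELB already exist):

    # Two-AZ Auto Scaling group registered with an existing classic ELB (placeholder names).
    aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name web-asg \
        --launch-configuration-name web-lc \
        --min-size 2 --max-size 6 --desired-capacity 2 \
        --availability-zones us-east-1a us-east-1b \
        --load-balancer-names web-elb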
Automation
If you want to use Capistrano, just install it on your servers. You could create a pre-configured AMI with it installed, along with whatever else you want, or install it in a deployment script. However, the most recommended method for this kind of thing is to use the AWS OpsWorks service, which is Chef in the cloud.
Server Recovery & Fault Tolerance
If you use EC2 Auto Scaling and a server becomes unavailable (e.g. the hardware fails or it stops responding to EC2 health checks), Auto Scaling will automatically terminate it and launch a replacement.
With the addition of the ELB and ELB health checks, instances that stop responding to web requests can be taken out of service by the ELB.
Read the docs for more details on this.
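For example, switching an existing group from EC2 health checks to ELB health checks is a one-liner (the group name and grace period are placeholders):

    # Instances the ELB judges unhealthy are then replaced, not just deregistered.
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name web-asg \
        --health-check-type ELB \
        --health-check-grace-period 300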
Backup and Recovery
For backing up data on EBS volumes attached to EC2 instances, use EBS snapshots. However, the best architectures keep EC2 instances stateless: they store nothing except application code, so it wouldn't matter if they died. In that case all data, including user files, can live in S3, where you have a number of backup options such as Cross-Region Replication and archiving to Glacier.
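A minimal snapshot example (the volume ID is a placeholder), e.g. run nightly from cron:

    # Point-in-time snapshot of a single EBS volume.
    aws ec2 create-snapshot \
        --volume-id vol-0abc12345 \
        --description "nightly backup of web data volume"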
Monitoring
AWS provides CloudWatch, which gives you hypervisor-visible metrics such as network in/out, CPU utilization, and more. If you want more data, you can publish custom metrics such as memory usage. In addition to CloudWatch, you could use a server-level monitoring tool.
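Pushing a custom metric is a single CLI call; the namespace, dimension, and value below are made-up examples of data you would collect yourself (e.g. from a cron script):

    # Publish memory utilization for this instance as a custom CloudWatch metric.
    aws cloudwatch put-metric-data \
        --namespace "WebApp/System" \
        --metric-name MemoryUtilization \
        --dimensions InstanceId=i-0abc12345 \
        --unit Percent \
        --value 63.2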
Deployment
I recommend AWS Code Deploy.
Security
Use security groups to open only the ports users should be able to connect on, and lock down sensitive ports (e.g. 22) to a specific set of IPs. You can also use network ACLs to block undesired traffic. AWS provides more information and suggestions here.
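As a sketch (the group ID and office CIDR are placeholders), that might look like:

    # Open HTTP/HTTPS to the world, but SSH only to a single trusted network.
    aws ec2 authorize-security-group-ingress --group-id sg-0abc12345 --protocol tcp --port 80  --cidr 0.0.0.0/0
    aws ec2 authorize-security-group-ingress --group-id sg-0abc12345 --protocol tcp --port 443 --cidr 0.0.0.0/0
    aws ec2 authorize-security-group-ingress --group-id sg-0abc12345 --protocol tcp --port 22  --cidr 203.0.113.0/24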
I also recommend you read this Whitepaper.

Do I need to duplicate code on every EC2 instance running behind an ELB?

Hi, this is a very noob question, but I am trying to deploy my Node.js API server on AWS.
Everything is working fine with one m1.large instance that my front end, hosted on S3, connects to.
Now I want to scale and put my EC2 instance, and possibly many more, behind an ELB and an Auto Scaling group.
Do I need to duplicate my server code on every EC2 instance?
If so, I assume I'll have to create a separate DB server which all of the EC2 instances will connect to.
Am I right? Can anyone experienced with AWS answer this? I tried googling, but most of the links point to detailed tutorials that don't actually answer my question.
Any help would be much appreciated. Thanks
Yep, that's basically correct. The code needs to be on all instances fronted by the load balancer. For the database, you may want to look into RDS.
Of course not, but you certainly can.
That's what EFS volumes are for: they are volumes shared by more than one EC2 instance, though you have to choose a region that supports them since they are only available in certain regions. As an AWS certified architect candidate, let me suggest more than two options.
You can follow your first approach: create an EC2 instance, put your code on it, create an AMI, and use that AMI to launch your upcoming EC2 instances through the Auto Scaling group. In my opinion this is a poor choice, because on every code change you have to update the instance, create a new AMI, and create a new Auto Scaling launch configuration. Lots of steps, but it will work.
Second approach: like the first, but don't create an AMI each time. Instead, upload your code to a private repo (GitHub, Bitbucket, etc.), install SSM and the appropriate roles for managing EC2, and on every code change push to the repo and pull it onto your EC2 instances using SSM, as sketched below. You could also write a Bitbucket webhook that calls an API to run the git pull command on each EC2 instance; that could count as a third approach, but it needs more coding!
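A rough sketch of that SSM step (it assumes the SSM agent and an instance profile with SSM permissions are in place, the repo is already cloned on each instance, and the tag value and paths are placeholders):

    # Run "git pull" on every instance in the Auto Scaling group via SSM Run Command.
    aws ssm send-command \
        --document-name "AWS-RunShellScript" \
        --targets "Key=tag:aws:autoscaling:groupName,Values=web-asg" \
        --parameters 'commands=["cd /var/www/html && git pull origin master"]'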
Last but not least: use an EFS volume, put your code there, mount the volume on your EC2 instances, add an auto-mount entry for every boot, point Apache's document root at the EFS folder, and create an AMI with this configuration. Voila! Every new EC2 instance will use the same code, located on this shared network volume. Whenever you need to change something, log in to a separate instance outside of your Auto Scaling group for a short while, upload your changes, and turn it off; all of your EC2 instances will pick up the new code immediately. Of course, you can also pull the changes from a repo as in the second approach.
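A minimal sketch of the mount-at-boot part (the file system ID, region, and web root are placeholders; it assumes the NFS client utilities are installed and an EFS mount target is reachable from the instance's subnet):

    # Mount the shared EFS volume at the web root on every boot.
    sudo mkdir -p /var/www/html
    echo "fs-0abc12345.efs.us-east-1.amazonaws.com:/ /var/www/html nfs4 nfsvers=4.1,_netdev 0 0" | sudo tee -a /etc/fstab
    sudo mount -a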
There may be more approaches; I'm using the EFS one, with private repos of course, and so far I haven't faced any problems (fingers crossed)!
One other option is to use Elastic Beanstalk to deploy Node.js applications; here is the guide specific to Node.js. It takes care of most of what you would otherwise have to set up yourself on plain EC2, for example the ELB, Auto Scaling, CloudWatch, etc.
For the database, you may want to use a master with read replicas. Another option is to evaluate a NoSQL database like DynamoDB if it fits your use case; the scalability of DynamoDB tables is managed by AWS, so you don't need to worry about it.
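For the read-replica route, adding a replica to an existing RDS instance is a single call (identifiers are placeholders):

    # Create a read replica of an existing RDS MySQL instance.
    aws rds create-db-instance-read-replica \
        --db-instance-identifier api-db-replica-1 \
        --source-db-instance-identifier api-db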

How to setup shared persistent storage for multiple AWS EC2 instances?

I have a service hosted on Amazon Web Services. There I have multiple EC2 instances running with the exact same setup and data, managed by an Elastic Load Balancer and scaling groups.
Those instances are web servers running web applications based on PHP. So currently there are the very same files etc. placed on every instance. But when the ELB / scaling group launches a new instance based on load rules etc., the files might not be up-to-date.
Additionally, I'd rather like to use a shared file system for PHP sessions etc. than sticky sessions.
So my question is: for those reasons, and maybe more coming up in the future, I would like to have a shared file system that I can attach to my EC2 instances.
What approach would you suggest to resolve this? Are there any solutions offered by AWS directly, so I can rely on their services rather than rolling my own with DRBD and so on? What is the easiest approach: DRBD, NFS, something else? Is S3 also feasible for these purposes?
Thanks in advance.
As mentioned in a comment, AWS has announced EFS (http://aws.amazon.com/efs/) a shared network file system. It is currently in very limited preview, but based on previous AWS services I would hope to see it generally available in the next few months.
In the meantime there are a couple of third party shared file system solutions for AWS such as SoftNAS https://aws.amazon.com/marketplace/pp/B00PJ9FGVU/ref=srh_res_product_title?ie=UTF8&sr=0-3&qid=1432203627313
S3 is possible but not always ideal, the main blocker being that it does not natively support any filesystem protocols; all interactions have to go through the AWS API or HTTP calls. Additionally, when using it for session stores, the 'eventually consistent' model will likely cause issues.
That being said, if all you need is updated resources, you could create a simple script, run either from cron or on startup, that downloads the files from S3.
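A minimal sketch of such a script (the bucket name and web root are placeholders; it assumes the instance has an IAM role with read access to the bucket):

    #!/bin/bash
    # Pull the latest code from S3; run at boot (e.g. from user data) or every few minutes from cron.
    aws s3 sync s3://example-webapp-code/current /var/www/html --delete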
Finally, in the case of static resources like CSS and images, don't store them on your web server in the first place; there are plenty of articles covering the benefits of storing and serving static web resources directly from S3 while keeping the dynamic content on your servers.
From what we can tell at this point, EFS is expected to provide basic NFS file sharing on SSD-backed storage. Once available, it will be a v1.0 proprietary file system. There is no encryption, and it's AWS-only; the data is completely under AWS control.
SoftNAS is a mature, proven, advanced ZFS-based NAS filer that is full-featured, including encrypted EBS and S3 storage, storage snapshots for data protection, writable clones for DevOps and QA testing, RAM and SSD caching for maximum IOPS and throughput, deduplication and compression, cross-zone HA, and a 100% up-time SLA. It supports NFS with LDAP and Active Directory authentication, CIFS/SMB with AD users/groups, iSCSI multi-pathing, FTP and (soon) AFP. SoftNAS instances and all storage are completely under your control, as are the EBS and S3 encryption keys (you can use EBS encryption or any Linux-compatible encryption and key management approach you prefer or require).
The ZFS filesystem is a proven filesystem that is trusted by thousands of enterprises globally. Customers are running more than 600 million files in production on SoftNAS today - ZFS is capable of scaling into the billions.
SoftNAS is cross-platform and runs on cloud platforms other than AWS, including Azure, CenturyLink Cloud, Faction cloud, VMware vSphere/ESXi, VMware vCloud Air, and Hyper-V, so your data is not limited or locked into AWS; more platforms are planned. It provides cross-platform replication, making it easy to migrate data between any supported public cloud, private cloud, or premises-based data center.
SoftNAS is backed by industry-leading technical support from cloud storage specialists (it's all we do), something you may need or want.
Those are some of the more noteworthy differences between EFS and SoftNAS. For a more detailed comparison chart:
https://www.softnas.com/wp/nas-storage/softnas-cloud-aws-nfs-cifs/how-does-it-compare/
If you are willing to roll your own HA NFS cluster, and be responsible for its care, feeding and support, then you can use Linux and DRBD/corosync or any number of other Linux clustering approaches. You will have to support it yourself and be responsible for whatever happens.
There's also GlusterFS. It does well up to 250,000 files (in our testing) and has been observed to suffer from an IOPS brownout when approaching 1 million files, and IOPS blackouts above 1 million files (according to customers who have used it). For smaller deployments it reportedly works reasonably well.
Hope that helps.
CTO - SoftNAS
For keeping your web server sessions in sync, you can easily switch to Redis or Memcached as your session handler. This is a simple setting in php.ini, and all servers can point at the same Redis or Memcached server for sessions. You can use Amazon's ElastiCache, which will manage the Redis or Memcached instance for you.
http://phpave.com/redis-as-a-php-session-handler/ <- explains how to set up Redis with PHP pretty easily
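As a hedged example (the ElastiCache endpoint is a placeholder and the phpredis extension is assumed to be installed), the php.ini change amounts to:

    # Point PHP's session handler at the shared Redis endpoint, then restart Apache.
    printf '%s\n' 'session.save_handler = redis' \
        'session.save_path = "tcp://sessions.abc123.0001.use1.cache.amazonaws.com:6379"' \
        | sudo tee /etc/php.d/redis-sessions.ini
    sudo service httpd restart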
Keeping your files in sync is a little more complicated.
How do I push new code changes to all my web servers?
You could use Git. When you deploy, you can configure multiple servers as push targets so your branch (master) is pushed to all of them; every new build then goes out to every web server (see the sketch below).
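One way to sketch that (hostnames and repo paths are placeholders; each server needs a repo plus a hook or checkout step that updates the web root):

    # Add several push URLs to one remote so a single push hits every web server.
    git remote add production ssh://deploy@web1.example.com/var/repo/site.git
    git remote set-url --add --push production ssh://deploy@web1.example.com/var/repo/site.git
    git remote set-url --add --push production ssh://deploy@web2.example.com/var/repo/site.git
    git push production master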
What about new machines that launch?
I would set up new machines to run an rsync script against a trusted source, your master web server. That way they sync their web folders with the master when they boot and end up identical even if the AMI contained old web files.
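The boot-time sync could be as simple as the following (the host, user, and paths are placeholders; it assumes key-based SSH to the master):

    # Mirror the master's web root onto this instance at boot, deleting stale files.
    rsync -avz --delete deploy@master.internal:/var/www/html/ /var/www/html/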
What about files that change and need to be live updated?
Store any user-uploaded files in S3. If a user uploads a document on server 1, the file is stored in S3 and its location is stored in a database. If a different user is on server 2, they can see the same file and access it as if it were on server 2: the file is retrieved from S3 and served to the client.
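A rough sketch of that flow with the CLI (bucket and key are placeholders; in practice you would use the SDK from your application code):

    # 1. Store the uploaded file in S3 and record the key in your database.
    aws s3 cp /tmp/upload-42.pdf s3://example-user-files/uploads/42.pdf
    # 2. Hand the client a time-limited URL so it can fetch the file straight from S3.
    aws s3 presign s3://example-user-files/uploads/42.pdf --expires-in 3600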
GlusterFS is also an open-source distributed file system used by many to create shared storage across EC2 instances.
Until Amazon EFS hits production, the best approach in my opinion is to build a storage backend that exports NFS from EC2 instances, maybe using Pacemaker/Corosync to achieve HA.
You could create an EBS volume that stores the files and have Pacemaker unmount/detach the volume and then attach/mount it on the healthy NFS cluster node.
Hi, we currently use a product called SoftNAS in our AWS environment. It allows us to choose between both EBS- and S3-backed storage, and it has built-in replication as well as a high-availability option. It may be something you want to check out; I believe they offer a free trial you can try on AWS.
We are using ObjectiveFS and it is working well for us. It uses S3 for storage and is straight forward to set up.
They've also written a doc on how to share files between EC2 instances.
http://objectivefs.com/howto/how-to-share-files-between-ec2-instances

Auto scaling and data replication on EC2

Here is my scenario.
We have an ELB set up with two reserved EC2 instances acting as web servers under it (Amazon Linux).
There are some rapidly changing files (pdf, xls, jpg, etc.) on the web servers which are consumed by the websites hosted on the EC2 instances. Code files are identical, and we will be sure to update both servers manually at the same time with new code as and when needed.
The main problem is the user uploaded content which is stored on the EC2 instance.
What is the best approach to make sure that the uploaded files are available on both the servers almost instantly ?
Many people have suggested the use of rsync or unison, but this would involve setting up a cron job. I am looking for something like FileSystemWatcher in C#, which is triggered ONLY when the contents of the specified folder change. Moreover, because of the ELB, we are not sure which of the EC2 instances the user will actually be connected to when the files are uploaded.
To add to the above, we have one more staging server which pushes certain files to one of the EC2 web servers. We want these files replicated to the other instance too.
I was wondering whether S3 can solve the problem, and whether this setup would still be good if we decide to enable auto scaling.
I am confused at this stage. Please help.
S3 is the right choice for your case. That way you don't have to sync files between EC2 instances, and it is probably the best choice if you need to enable auto scaling. You should not put any data on EC2 instances; they should be stateless so that you can easily auto scale.
To use S3, your application must write to it instead of directly to the local file system. That should be quite easy; there are many libraries in each language that can help you store files in S3.

Is s3cmd a safe option for sync EC2 instances?

I have the following problem: we are working on a project on AWS which will use auto scaling, so the EC2 instances will start and die very often. Freezing images, updating the launch configurations, Auto Scaling groups, alarms, etc. takes a while, and several things can go wrong.
I just want the new instances to sync the most recent code, so I was thinking about fetching it from S3 using s3cmd once the instance finishes booting, and manually updating it every time we have new code to upload. So my doubts are:
Is it too risky to store the code on S3? How secure are the files in there? With the s3cmd encryption password, is it unlikely that someone will be able to decrypt them?
What other options would be good for this? I was thinking about rsync, but then I think I would need to store the servers' private key inside them, which I don't think is a good idea.
Thanks for any advice.
You might be a candidate for Elastic Beanstalk - using a plain vanilla AMI.
Then package your application and use AWS's ebextensions mechanism to customize the instance when it is spun up. ebextensions will allow you to do anything you like to the image, in place, as it is deploying: change .htaccess, erase a file, add a cron job, whatever.
When you have code updates, package them, upload them, and do a rolling update.
All instances will use your latest code, including auto-scaled ones.
The key concept here is to never have your real data in the instance, where it might go away if an instance dies or is shut down.
Elastic Beanstalk will allow you to set up the load balancing, auto-scaling, monitoring, etc.
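A hedged sketch of that workflow with the EB command-line tool (application and environment names are placeholders; customizations such as cron jobs or .htaccess tweaks go in .ebextensions/*.config files inside the package):

    pip install awsebcli                       # the Elastic Beanstalk CLI
    eb init my-app --platform php --region us-east-1
    eb create my-app-prod                      # provisions the ELB, Auto Scaling group, and instances
    # ...commit code changes, then roll them out to all instances, including auto-scaled ones:
    eb deploy my-app-prod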