Integrating AWS EC2, RDS and... S3? - amazon-web-services

I'm following this tutorial and it works like a charm:
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/awsgsg-wah-linux.pdf
However, the tutorial uses only Amazon EC2 and RDS. I was wondering what happens when my setup scales out to multiple EC2 instances and I need to update my PHP files.
Do I have to distribute them manually across those instances? As far as I know, the instances are not synced with each other.
So I decided to use S3 as a replacement for my /var/www, so that the PHP files are centralised in one place.
Whenever the number of EC2 instances grows, the files remain in one place and I don't need to upload them to multiple instances.
Is it best practice to use a centralised file server (S3) for /var/www? I ask because I still have permission issues when it's mounted using s3fs.
Thank you.

Put your /var/www/ in S3, and when new instances launch, have them run 'aws s3 sync' from your bucket; you can do that in the user data. You also have to pick a 'master' instance where you make changes: a sync script uploads the changes to S3, and rsync copies them to your live front ends (FE). This is needed because if you have 3 FE that already downloaded /var/www/ from S3 and you want to make a new change, you would otherwise have to run an s3 sync on every instance.
You can manage changes on your 'master' instance with inotify. Inotify can detect a change in /var/www/ and execute two commands: first an aws s3 sync, then an rsync to the rest of your instances. You can get the list of your instances from the ELB through the AWS API.
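As a rough illustration of the two steps described above, here is a minimal Python sketch that only builds the commands. The bucket URI, host addresses and the `ec2-user` login are placeholders, not values from the question, and the trigger (for example inotify watching /var/www/) is left out.

```python
import shlex


def build_push_commands(docroot, bucket_uri, frontends, ssh_user="ec2-user"):
    """Return the commands a 'master' instance would run after a change.

    The docroot is first synced up to S3, then copied with rsync to each
    live front end. Bucket, hosts and user are illustrative placeholders.
    """
    src = docroot.rstrip("/") + "/"
    commands = [["aws", "s3", "sync", src, bucket_uri]]
    for host in frontends:
        # -a preserves permissions and timestamps, -z compresses in transit
        commands.append(["rsync", "-az", src, f"{ssh_user}@{host}:{src}"])
    return commands


def build_user_data(bucket_uri, docroot):
    """Return a user data script that pulls the docroot from S3 at launch."""
    return "#!/bin/bash\n" + shlex.join(["aws", "s3", "sync", bucket_uri, docroot]) + "\n"
```

Each returned command list could be passed to `subprocess.run(cmd, check=True)` on the 'master' instance. The front-end list is passed in explicitly here; looking it up from the ELB is a separate step.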
The last thing is to enable termination protection on your 'master' instance.
Your architecture should look like the one described here: http://www.markomedia.com.au/scaling-wordpress-in-amazon-cloud/
Good luck!!

Related

AWS EC2 Apache file persistence

I'm new to AWS and a little perplexed about the /var/www/html folder on an EC2 instance on which Apache has been installed.
After setting up an Elastic Beanstalk service and uploading the files, I see that they are stored in the regular /var/www/html folder of the instance.
From reading AWS documents, it seems that instances may be deleted and re-provisioned, which is why use of an S3 bucket, EFS or EBS is recommended.
Why, then, is source code stored in the EC2 instance when using an apache server? Won't these files potentially be deleted with the instance?
If you manually uploaded some files and data to /var/www/html, then of course they will be wiped out when AWS replaces or deletes the instance, e.g. due to autoscaling.
All files that you use on EB should be part of your deployment package, and all files that, e.g., your users upload should be stored outside of EB, e.g. on S3.
Even if the instance is terminated for some reason, the source code is part of your deployment package, so Beanstalk can provision a replacement instance with the exact same application and configuration. Basically you are losing this data, but it doesn't matter.
The data loss concern is for anything you do that is not part of the automated provisioning/deployment, i.e. any manual configuration changes or any data your application may write to ephemeral storage. This is what you would need a more persistent storage option for.
It seems that, when the app is first deployed, all files are uploaded to an S3 bucket, from where they are copied into the relevant directory of each new instance. When new instances have to be created (such as for auto-scaling) or replaced, those instances also pull the code from the S3 bucket. This is also how the app is re-deployed: the bucket is updated and each instance makes a new copy of the code.
Apologies if this is stating the obvious to some people, but I had seen a few similar queries.

Using AWS Lambda to copy S3 files to an on-premise LAN folder

Problem:
We need to transfer all files (CSV format) stored in an AWS S3 bucket to an on-premise LAN folder using Lambda functions. This will be a scheduled task that runs every hour; each run transfers the files from S3 to the on-premise LAN folder again, replacing the existing ones. The files are not large (preferably under a few MB).
I am not able to find any AWS managed service to accomplish this task.
I am a newbie to AWS, any solution to this problem is most welcome.
Thanks,
Actually, I am looking for a solution by which I can push S3 files to an on-premise folder automatically.
For that you need to make the on-premise network visible to the logic (Lambda, whatever...) "pushing" the content. The default solution is the AWS site-to-site VPN.
There are multiple options for setting up the VPN; you can choose based on your needs.
Then the on-premise network will look just like another subnet.
However, a VPN has its own complexity and cost. In most cases it is much easier to "pull" the data from the on-premise environment.
To sync data there are multiple options. For a managed service, I could point to the Storage Gateway (S3 File Gateway), which, based on your description, sounds like insane overkill.
Maybe you could start with a simple cron job (or a scheduled task if working on Windows) that runs a CLI command to sync the S3 content or just copy specified files.
Check out S3 Sync, I think it will help you accomplish this task: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/sync.html#examples
To run any AWS CLI command on your computer, you will need to set up credentials, and the account/role you set up must have permission to do the task (e.g. access S3).
Check out AWS CLI setup here: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
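A minimal sketch of the "pull" approach in Python, assuming the AWS CLI is installed and configured on the on-premise machine. The bucket URI, the local folder and the script path in the crontab comment are hypothetical placeholders.

```python
import subprocess


def build_pull_command(bucket_uri, local_dir, only_csv=True):
    """Build an 'aws s3 sync' command that pulls from S3 into a local folder."""
    cmd = ["aws", "s3", "sync", bucket_uri, local_dir]
    if only_csv:
        # Filters are applied in order: exclude everything, then re-include CSV files
        cmd += ["--exclude", "*", "--include", "*.csv"]
    return cmd


def pull(bucket_uri, local_dir):
    """Run the sync; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_pull_command(bucket_uri, local_dir), check=True)


# Hypothetical crontab entry to run a script that calls pull() every hour:
#   0 * * * * /usr/bin/python3 /opt/scripts/pull_s3.py
```

The scheduled script would end with a call such as `pull("s3://example-bucket/exports/", "/mnt/lan/exports/")`. Note that sync only transfers files that are new or changed, which may or may not match the "replace everything" requirement.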

Upload files to Amazon EC2 in a private network from Github Actions

As part of our workflow, we want to upload files to our Amazon EC2 instance automatically.
It currently only allows whitelisted IP ranges to connect over SSH. And since we are running GitHub Actions, it seems odd to whitelist roughly 1500 IP ranges.
Does anyone have an intelligent solution for this?
SCP and/or rsync don't matter for us.
It's merely getting access that I need help with.
I have access to the ssh key, and I can get a hold of an admin to get temporary access to the AWS Console should I need it.
Since the EC2 instance is in a private network, there are many hurdles to getting GitHub Actions SSH access to it.
I would work with a decoupled architecture. Have the GitHub action upload the files to S3.
Then either
Lambda can load the file onto the EC2 instance (S3 trigger for Lambda)
OR
a process running on the EC2 instance can poll for new events on the S3 bucket via SNS (S3 polling)
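For the Lambda option, the first step is reading which object was uploaded from the S3 event notification. The sketch below covers only that part; the `handler` name is a hypothetical entry point, and how the file then reaches the instance is deployment-specific and not shown.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event):
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            continue  # not an S3 record
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (spaces become '+')
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Hypothetical Lambda entry point: returns the URIs of the uploaded objects."""
    return [f"s3://{bucket}/{key}" for bucket, key in objects_from_s3_event(event)]
```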

On-Premise file backup to aws

Use case:
I have one directory on-premise that I want to back up, let's say every midnight, and I want to be able to restore it if something goes wrong.
It doesn't seem like a complicated task, but reading through the AWS documentation, even this can be cumbersome and costly. Setting up Storage Gateway locally seems unnecessarily complex for a simple task like this, and setting it up on EC2 is costly as well.
What I have done:
Reading through this + some other blog posts:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
https://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html
What I have found:
1. Setting up a file gateway (locally or as an EC2 instance):
It just mounts the files to an S3 bucket, and that's it. So my on-premise app will constantly write to this bucket. The documentation doesn't mention anything about scheduled backup and recovery.
2. Setting up a volume gateway:
Here I can make a scheduled synchronization/backup to S3, but using a whole volume for it would be a big overhead.
3. Standalone S3:
Just using a bare S3 bucket and copying my backup there via the AWS API/SDK with a manually created scheduled job.
Solutions:
Using point 1 from above, enable versioning, so that the versions of the files serve as recovery points.
Using point 3.
I think I am looking for a mix of file and volume gateway: working at the file level and taking asynchronous scheduled snapshots of the files.
How should this be handled? Isn't there a really easy way to just send a backup of a directory to AWS?
The easiest way to back up a directory to Amazon S3 would be:
Install the AWS Command-Line Interface (CLI)
Provide credentials via the aws configure command
When required run the aws s3 sync command
For example
aws s3 sync folder1 s3://bucketname/folder1/
This will copy any files from the source to the destination. It will only copy files that have been added or changed since a previous sync.
Documentation: sync — AWS CLI Command Reference
If you want to be more fancy and keep multiple backups, you could copy to a different target directory, or create a zip file first and upload the zip file, or even use a backup program like Cloudberry Backup that knows how to use S3 and can do traditional-style backups.
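One way to keep multiple backups with a different target directory per run is to put the date into the S3 prefix. This is only a sketch; the bucket name, the `name` prefix and the local path are hypothetical.

```python
from datetime import date


def backup_destination(bucket, name, day):
    """Return a dated S3 prefix such as s3://bucket/name/2024-01-31/."""
    return f"s3://{bucket}/{name}/{day.isoformat()}/"


def backup_command(local_dir, bucket, name, day):
    """Build the 'aws s3 sync' command that uploads local_dir to that day's prefix."""
    return ["aws", "s3", "sync", local_dir, backup_destination(bucket, name, day)]


def restore_command(local_dir, bucket, name, day):
    """Build the reverse command that pulls a given day's backup back down."""
    return ["aws", "s3", "sync", backup_destination(bucket, name, day), local_dir]
```

A midnight scheduled job could run `subprocess.run(backup_command("/srv/data", "example-backup-bucket", "data", date.today()), check=True)`. Because every day gets a fresh prefix, each run uploads the full directory rather than only the changes.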

AWS Windows EC2 Pull From S3 on Upload

I have a subset of Windows EC2 instances that I would like to continuously copy files to whenever files are uploaded to a specific S3 bucket. Files will be uploaded to this bucket anywhere between once a month and several times a month, but will need to be copied to the instances within an hour of upload. EC2 instances will be continually added to and removed from this subset of instances. I would like this functionality to be controlled by the EC2 instance, so that whenever a new instance is created, it can be configured to pull from this bucket. Ideally, this would happen instantaneously upon upload (vs. a cron job running periodically). I have researched AWS Lambda and S3 notifications, and I am unsure whether these are the correct methods to use. What solution is best suited to this model of copying files?
If you don't need "real time" presence of the files, you could run s3 sync on each instance from a scheduled job (the easy option), or use an S3 notification that triggers a Lambda function, which in turn delivers an EC2 Run Command to the instances.
If the instances are in an Auto Scaling group, you can use aws s3 cp in the user data section of your launch configuration to accomplish this.
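The notification-to-Run-Command route could be sketched roughly as below. It assumes the instances are managed by Systems Manager, have the AWS CLI installed, and carry a tag identifying the subset. The tag key and value, the destination folder and the `handler` name are all hypothetical.

```python
def build_run_command_request(bucket, dest_dir, tag_key, tag_value):
    """Build the arguments for an SSM SendCommand call targeting tagged instances."""
    return {
        # Target by tag so instances added to the subset are picked up automatically
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "DocumentName": "AWS-RunPowerShellScript",
        "Parameters": {"commands": [f'aws s3 sync "s3://{bucket}" "{dest_dir}"']},
    }


def handler(event, context):
    """Hypothetical Lambda entry point triggered by the S3 upload notification."""
    import boto3  # imported lazily; provided by the Lambda Python runtime

    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    request = build_run_command_request(bucket, r"C:\incoming", "Role", "file-consumer")
    return boto3.client("ssm").send_command(**request)["Command"]["CommandId"]
```

Instances launched later would still need an initial pull at boot, for example from user data, since a Run Command only reaches instances that exist when the upload happens.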