I have a subset of Windows EC2 instances that I would like to continuously copy files to whenever files are uploaded to a specific S3 bucket. Files will be uploaded to this bucket anywhere between once a month to several times a month but will need to be copied to the instances within an hour of upload. EC2 instances will be continually added and removed from this subset of instances. I would like this functionality to be controlled by the EC2 instance so that whenever a new instance is created, it can be configured to pull from this bucket. Ideally, this would be an instantaneous upon upload (vs a cron job running periodically). I have researched AWS Lamba and S3-notifications, and I am unsure if these are the correct methods to use. What solution is best suited to fit this model of copying files?
If you don't need "real time" presence of the files, you might think to run s3 sync on each instance by a cron job (easy one) or s3-notification->with some lambda works to deliver EC2 Run Command.
If the instances are in an autoscaling group, you can use aws s3 copy in the user data section of your launch config to accomplish this.
Related
I have 11 directories in Linux EC2 Instance where the external API adds data (.CSV files) to. I will need to schedule a job to copy ONLY those csv files from 11 directories into matching directories inside the Windows EC2 Instance daily. Both the Instances are on the same VPC but on different Security groups.
How can I accomplish the file transfer from Linux EC2 to a Windows EC2 in AWS?
"Pushing" content to a computer is always difficult due to security. And, in this situation, it is also cross-platform.
A simple solution would be:
Copy data from the source (Linux) computer to Amazon S3 on a regular schedule
Copy data from Amazon S3 to the destination (Windows) computer on a regular schedule
This can be done by triggering a script from cron / Scheduled Task which runs the AWS Command-Line Interface (CLI) aws s3 sync command. This is smart enough to copy files, but will only copy files that have been added/changed since the last use of the sync command.
See: aws s3 sync — AWS CLI Command Reference
You could copy the files hourly rather than daily, since there's no disadvantage.
Can someone explain me whats the best way to transfer data from a harddrive on an EC2 Instance (running Windows Server 2012) to an S3 Bucket for the same AWS Account on a daily basis?
Backround idea to this:
I'm generating a .csv file for one of our Business partners daily at 11:00 am and I want to deliver it to S3 (he has access to our S3 Bucket).
After that he can pull it out of S3 manually or automatically whenever he wants.
Hope you can help me, I only found manually solutions with the CLI, but no automated way for daily transfers.
Best Regards
You can directly mount S3 buckets as mounted drives on your EC2 instances. This way you don't even need some sort of triggers/daily task scheduler along with third party service as objects would be directly available in the S3 bucket.
For Linux typically you would use Filesystem in Userspace (FUSE). Take a look at this repo if you need it for Linux: https://github.com/s3fs-fuse/s3fs-fuse.
Regarding Windows, there is this tool:
https://tntdrive.com/mount-amazon-s3-bucket.aspx
If these tools don't suit you or if you don't want to mount directly the s3 bucket, here is another option: Whatever you can do with the CLI you should be able to do with the SDK. Therefore if you are able to code in one of the various language AWS Lambda proposes - C#/Java/Go/Powershell/Python/Node.js/Ruby - you could automate that using a Lambda function along with a daily task scheduler triggering at 11a.m.
Hope this helps!
Create a small application that uploads your file to an S3 bucket (there are a some example here). Then use Task Scheduler to execute your application on a regular basis.
Working on a problem at the moment where I want to export a file on an EC2 instance running a Windows AMI at four hour intervals to an S3 bucket. Currently, the architecture I'm thinking is as follows.
1. CloudWatch Events rule using scheduled trigger
2. Rule triggers Lambda function to run
3. Lambda function would use some form of the AWS CLI on the windows EC2 instance to extract (sync, cp, etc.) the file
4. File is placed is S3 bucket
Does anyone see a path that's more efficient than this one? I want to ensure that I'm handling this in the most straightforward manner. Thanks in advance for any input!
It is quite difficult to have external code (eg an AWS Lambda function) cause something to execute on a Windows computer. You could use Systems Manager Run Command, but that's a rather complex solution.
It would be much simpler to have the Windows computer push the files to Amazon S3:
Create a scheduled task in Windows
Use aws s3 cp or aws s3 sync to copy the files to Amazon S3
Done!
Your solution seems solid. Alternatively you may want to write daemon-like service (background process) that runs on each EC2 and does the data transfer from that instance to S3. What I like about your solution is how you can centrally control the scheduling easily. For my distributed solution you can have the processes read from central config, but that seems more complicated than the CW/Lambda solution.
For the EC2 process solution, this may be useful:
How to mount Amazon S3 Bucket as a Windows Drive, but it should be easy (and more scalable) to just use the AWS SDK instead to talk to S3
I am using AWS Data Pipeline to import some CSV data from S3 to Redshift. I also added a ShellCommandActivity to remove all S3 files after the copy activity completed. I attached a picture with the whole process.
Everything works fine but each activity starts it's own EC2 instance. Is it possible that the ShellCommandActivity to reuse the same EC2 instance as the RedshiftCopyActivity, after the copy command completed?
Thank you!
Unless you can do all activities in shell or CLI, it is not possible to do everything in the same instance.
One suggestion I can give is to move on to new technologies. AWS Data Pipeline is outdated (4 years old). You should use AWS Lambda which will cost you a fraction of what you are paying and you can load the files into Redshift as soon as the files are uploaded to S3. Clean up is automatic and Lambda is much more powerful than AWS Data Pipeline. The tutorial A Zero-Administration Amazon Redshift Database Loader is the one you want. Yes, there is some learning curve, but as the title suggest it is a zero administration load.
In order for the ShellCommandActivity to run on the same EC2 instance, I edited my ShellCommandActivity using Architect and for the Runs On option a chose Ec2Instance. The ShellCommandActivity gets mapped automatically to the same EC2Instance as the RedshiftCopyActivity. Now the whole process looks like this:
Thank you!
I'm following this tutorial and 100% works like a charm :
http://docs.aws.amazon.com/gettingstarted/latest/wah-linux/awsgsg-wah-linux.pdf
but, in that tutorial, it use Amazon EC2 and RDS only. I was wondering what if my servers scaled up into multiple EC2 instances then I need to update my PHP files.
do I have to distribute it manually across those instances? because, as far as I know, those instances are not synced each other.
so, I decided to use S3 as replacement of my /var/www so the PHP files is now centralised in one place.
so, whenever those EC2 scaled up, the files remains in one place and I don't need to upload to multiple EC2.
is this the best practice to have centralised file server (S3) for /var/www ? because currently I still having permission issue when it's mounted using s3fs.
thank you.
You have to put your /var/www/ in S3 and when your instances scaled up have to make 'aws s3 sync' from your bucket, you can do that in the userdata. Also you have to select a 'master' instance where you make changes, a sync script upload changes to S3 and with rsync it copy changes to your alive FE. This is because if you have 3 FE that downloaded /var/www/ from S3 and you want to make a new change you would have to make a s3 sync in all your instances.
You can manage changes in your 'master' instance with inotify. Inotify can detect a change in /var/www/ and exec two commands, one could be aws s3 sync and then a rsync to the rest of your instances. You can get the list of your instances from the ELB through the AWS API.
The last thing is check the instance terminate protection in your 'master' instance.
Your architecture should look like here http://www.markomedia.com.au/scaling-wordpress-in-amazon-cloud/
Good look!!