I am automating installation of a distributed software using Chef. It has multiple nodes and on the master node, it creates an encryption key in a file. That file should be present on all the slave nodes. I was planning to publish this file to S3 from master node and download it on slave nodes. I know we can use s3_file cookbook to download file from s3. I don't know how to upload a file to s3 in Chef. So, looking for suggestion how can I upload to S3 or what is other workaround if s3 uploading is not available. thanks
There is nothing specific in Chef for this, and I would highly recommend not handling this in Chef as then you have to deal with all kinds of race conditions when multiple nodes are booting simultaneously :) Probably just create and upload the key manually to your secrets management system (S3 buckets with ACLs are a simple option, but there is also SSM Parameter Store, or the newer AWS Secrets Manager), and then just deal with the download end from Chef.
You can use AWS CLI to upload a file to s3. If it is windows then use PowerShell to upload a file otherwise use a terminal script to do so. First, you need to install AWS CLI and then use AWS s3 cp command to upload a local file to AWS.
Related
Use case:
I have one directory on-premise, I want to make a backup for it let's say at every midnight. And want to restore it if something goes wrong.
Doesn't seem a complicated task,but reading through the AWS documentation even this can be cumbersome and costly.Setting up Storage gateway locally seems unnecessarily complex for a simple task like this,setting up at EC2 costly also.
What I have done:
Reading through this + some other blog posts:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
https://docs.aws.amazon.com/storagegateway/latest/userguide/WhatIsStorageGateway.html
What I have found:
1.Setting up file gateway (locally or as an EC2 instance):
It just mount the files to an S3. And that's it.So my on-premise App will constantly write to this S3.The documentation doesn't mention anything about scheduled backup and recovery.
2.Setting up volume gateway:
Here I can make a scheduled synchronization/backup to the a S3 ,but using a whole volume for it would be a big overhead.
3.Standalone S3:
Just using a bare S3 and copy my backup there by AWS API/SDK with a manually made scheduled job.
Solutions:
Using point 1 from above, enable versioning and the versions of the files will serve as a recovery point.
Using point 3
I think I am looking for a mix of file-volume gateway: Working on file level and make an asynchronus scheduled snapshot for them.
How this should be handled? Isn't there a really easy way which will just send a backup of a directory to the AWS?
The easiest way to backup a directory to Amazon S3 would be:
Install the AWS Command-Line Interface (CLI)
Provide credentials via the aws configure command
When required run the aws s3 sync command
For example
aws s3 sync folder1 s3://bucketname/folder1/
This will copy any files from the source to the destination. It will only copy files that have been added or changed since a previous sync.
Documentation: sync — AWS CLI Command Reference
If you want to be more fancy and keep multiple backups, you could copy to a different target directory, or create a zip file first and upload the zip file, or even use a backup program like Cloudberry Backup that knows how to use S3 and can do traditional-style backups.
I have been assigned with automating the task of fetching some files(csv/Excel) from certain websites and then loading them to S3. I would like to know if a script can be written in someway so that if you put the url and the s3 bucket path and run the script, the files would load onto S3; and if yes, how would I go about it?
Open to alternative solutions as well.
Thanks!
You can write a simple bash script, or any other language of your choice. Below are the steps for bash script:
Download the csv file using curl or wget.
Upload using aws-cli tools, aws s3 cp
Make sure you have aws-cli installed on the machine where you will be running this script.
You may want to read more about AWS CLI and AWS CLI S3 commands.
I need to transfer all our files (With folders structure) to AWS S3. I have researched lot about how this done.
Most of the places mentioned s3fs. But looks like this is bit old. And I have tried to install s3fs to my exsisting CentOS 6 web server. But its stuck on $ make command. (Yes there is Makefile.in)
And as per this answer AWS S3 Transfer Acceleration is the next better option. But still I have to write a PHP script (My application is PHP) to transfer all folders and files to S3. It is working same as how file save in S3 (API putObject), but faster. Please correct me if I am wrong.
Is there any other better solution (I prefer FTP) to transfer 1TB files with folders from CentOS 6 server to AWS S3? Is there any way to use FTP client in EC2 to transfer files from outside CentOS 6 to AWS S3?
Use the aws s3 sync command of the AWS Command-Line Interface (CLI).
This will preserve your directory structure and can be restarted in case of disconnection. Each execution will only copy new, changed or missing files.
Be aware that 1TB is a lot of data and can take significant time to copy.
An alternative is to use AWS Snowball, which is a device that AWS can send to you. It can hold 50TB or 80TB of data. Simply copy your data to the device, then ship it back to AWS and they will copy the data to Amazon S3.
Is there anyway to copy the file from s3 to EMR master node other than doing scp?
http://s3tools.org/s3cmd will allow you to upload and download files, even sync entire buckets to a linux server through the command line.
You can use s3distcp
With this you can download s3 files to master node
I would like to grab a file straight of the Internet and stick it into an S3 bucket to then copy it over to a PIG cluster. Due to the size of the file and my not so good internet connection downloading the file first onto my PC and then uploading it to Amazon might not be an option.
Is there any way I could go about grabbing a file of the internet and sticking it directly into S3?
Download the data via curl and pipe the contents straight to S3. The data is streamed directly to S3 and not stored locally, avoiding any memory issues.
curl "https://download-link-address/" | aws s3 cp - s3://aws-bucket/data-file
As suggested above, if download speed is too slow on your local computer, launch an EC2 instance, ssh in and execute the above command there.
For anyone (like me) less experienced, here is a more detailed description of the process via EC2:
Launch an Amazon EC2 instance in the same region as the target S3 bucket. Smallest available (default Amazon Linux) instance should be fine, but be sure to give it enough storage space to save your file(s). If you need transfer speeds above ~20MB/s, consider selecting an instance with larger pipes.
Launch an SSH connection to the new EC2 instance, then download the file(s), for instance using wget. (For example, to download an entire directory via FTP, you might use wget -r ftp://name:passwd#ftp.com/somedir/.)
Using AWS CLI (see Amazon's documentation), upload the file(s) to your S3 bucket. For example, aws s3 cp myfolder s3://mybucket/myfolder --recursive (for an entire directory). (Before this command will work you need to add your S3 security credentials to a config file, as described in the Amazon documentation.)
Terminate/destroy your EC2 instance.
[2017 edit]
I gave the original answer back at 2013. Today I'd recommend using AWS Lambda to download a file and put it on S3. It's the desired effect - to place an object on S3 with no server involved.
[Original answer]
It is not possible to do it directly.
Why not do this with EC2 instance instead of your local PC? Upload speed from EC2 to S3 in the same region is very good.
regarding stream reading/writing from/to s3 I use python's smart_open
You can stream the file from internet to AWS S3 using Python.
s3=boto3.resource('s3')
http=urllib3.PoolManager()
urllib.request.urlopen('<Internet_URL>') #Provide URL
s3.meta.client.upload_fileobj(http.request('GET', 'Internet_URL>', preload_content=False), s3Bucket, key,
ExtraArgs={'ServerSideEncryption':'aws:kms','SSEKMSKeyId':'<alias_name>'})