aws cli copy command halted - amazon-web-services

I used PuTTY to connect to my AWS instance and ran a cp command to copy files into my S3 bucket.
aws s3 cp local s3://server_folder --recursive
Partway through, my internet dropped out and the copy halted, even though the AWS instance was still running properly. Is there a way to make sure the cp command keeps running even if I lose my connection?

Alternatively, you can use Minio Client, aka mc. It is open source and compatible with AWS S3. Minio Client is available for Windows as well as macOS and Linux.
The mc mirror command copies local content to a remote AWS S3 bucket. If the upload fails because of a network issue, mc session resume restarts the upload from the point where the connection was terminated.
mc supports these commands.
COMMANDS:
ls List files and folders.
mb Make a bucket or folder.
cat Display contents of a file.
pipe Write contents of stdin to one target. When no target is specified, it writes to stdout.
share Generate URL for sharing.
cp Copy one or more objects to a target.
mirror Mirror folders recursively from a single source to single destination.
diff Compute differences between two folders.
rm Remove file or bucket [WARNING: Use with care].
access Set public access permissions on bucket or prefix.
session Manage saved sessions of cp and mirror operations.
config Manage configuration file.
update Check for a new software update.
version Print version.
You can check docs.minio.io for more details.
Hope it helps.
Disclaimer: I work for Minio.

Related

How to open a file that is stored in a bucket connected to the google cloud VM instance

I am new to google cloud, and I need to run a single python script in a compute engine.
I opened a new VM compute engine instance, opened a new bucket, uploaded the script to the bucket and I can see that the VM is connected to the bucket since when I run the command to list the buckets in the VM it finds the bucket and states the script is indeed there.
What I'm missing out on is how do I run the script? Or more generally how do I access these files?
Was looking for a suitable command but could not find any, but I have a feeling there should be such a command (since the VM can find the bucket and the files contained in it, I guess it can also access them somehow). How should I proceed to run the script from here?
The bucket's content is not attached to a volume in the VM. They are totally independent. With that being said, you first have to copy the python file from the bucket to your compute instance by using the gsutil cp command as below:
gsutil cp gs://my-bucket/main.py .
Once you have the file locally in your compute instance, you can simply run the python file.
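The two steps above can be sketched in Python as follows. This is only an illustration, assuming gsutil is installed and on the PATH of the VM; the function names are hypothetical and not part of any Google tool.

```python
import subprocess
import sys


def fetch_and_run_commands(bucket_uri, local_name):
    """Return the two commands: copy the script locally, then run it."""
    copy_cmd = ["gsutil", "cp", bucket_uri, local_name]
    run_cmd = [sys.executable, local_name]  # run with the current interpreter
    return copy_cmd, run_cmd


def fetch_and_run(bucket_uri, local_name):
    """Copy the script out of the bucket, then execute the local copy."""
    copy_cmd, run_cmd = fetch_and_run_commands(bucket_uri, local_name)
    subprocess.run(copy_cmd, check=True)  # raises CalledProcessError if the copy fails
    subprocess.run(run_cmd, check=True)
```

Calling fetch_and_run("gs://my-bucket/main.py", "main.py") on the VM would then do the same as typing the gsutil cp command followed by running the script.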

Does gsutil rsync bucket1 bucket2 download/upload files locally?

I have a dumb question.
So I have terabytes of data to rsync between two GCP buckets.
I'm not too sure how gsutil rsync works behind the scenes.
Does it have to download the files locally before it uploads them to the destination, or does it just magically move things over from the source bucket to the destination?
The answer to your question is in the gsutil rsync documentation:
Note 2: If you are synchronizing a large amount of data between clouds you might consider setting up a Google Compute Engine account and running gsutil there. Since cross-provider gsutil data transfers flow through the machine where gsutil is running, doing this can make your transfer run significantly faster than running gsutil on your local workstation.
So for cross-provider transfers, yes: the data flows through the machine where gsutil is running before it reaches the destination.
I ran a test with rsync and the debug flag and noticed this behaviour:
When you copy an object (using cp or rsync) between buckets, it is not downloaded to your local machine. I used a file of ~4 GB and glances to measure network usage during the rsync operation, and the objects were moved directly to the target bucket.
If you run the following command, you will notice that the SDK performs a POST request indicating the transfer between buckets:
gsutil -d rsync gs://sourcebucket gs://targetbucket
https://storage.googleapis.com/storage/v1/b/sourcebucket/o/bigfile.iso/rewriteTo/b/targetbucket/o/bigfile.iso
The rewriteTo behaviour is covered in the Cloud Storage documentation.
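To make the shape of that request URL explicit, here is a small sketch that rebuilds it from the bucket and object names. The helper name rewrite_url is hypothetical. The percent-encoding of object names is an assumption based on how the JSON API expects object names in paths. The sketch only builds the URL and does not send the POST request.

```python
from urllib.parse import quote


def rewrite_url(src_bucket, src_object, dst_bucket, dst_object):
    """Build the rewriteTo URL for a bucket-to-bucket copy."""
    # Object names are percent-encoded, including any "/" inside them.
    return (
        "https://storage.googleapis.com/storage/v1"
        f"/b/{quote(src_bucket, safe='')}/o/{quote(src_object, safe='')}"
        f"/rewriteTo/b/{quote(dst_bucket, safe='')}/o/{quote(dst_object, safe='')}"
    )


print(rewrite_url("sourcebucket", "bigfile.iso", "targetbucket", "bigfile.iso"))
```

Because both buckets appear in a single request to the storage service, the object's bytes never need to pass through the machine that issues it.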

How to Transfer multiple files from Linux server to AWS

Can someone let me know how to transfer multiple files from Linux server to AWS?
If you want to copy the data to Amazon S3, the easiest method is to use the AWS Command-Line Interface (CLI), either:
aws s3 cp --recursive or
aws s3 sync
The sync command automatically recurses into sub-directories and is generally the better option because it can be re-run and only copies files modified or added since the previous execution. Thus, it can be used to continue the copy after a failure, or the next day when new files have been added.
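Because sync can simply be re-run after a failure, one way to automate that is a small retry loop around the command. The following is a sketch only, assuming the AWS CLI is installed and configured. The function name, the attempts and delay defaults, and the run parameter (there so the command runner can be swapped out) are all hypothetical.

```python
import subprocess
import time


def sync_with_retries(source, destination, attempts=5, delay=30, run=subprocess.run):
    """Re-run `aws s3 sync` until it exits cleanly or the attempts run out."""
    command = ["aws", "s3", "sync", source, destination]
    for attempt in range(1, attempts + 1):
        result = run(command)
        if result.returncode == 0:
            return attempt  # number of attempts it took
        if attempt < attempts:
            time.sleep(delay)  # give the network a moment to recover
    raise RuntimeError(f"sync still failing after {attempts} attempts")
```

Each re-run only transfers what is still missing or changed, so a retry after a dropped connection does not start the whole copy again.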
Did you try using scp or sftp to transfer the files? If your local machine runs Linux, you can use the console; on a Windows machine, use PuTTY.

Unzip file on ec2 and save it on S3?

I have a zip file that is about 20 GB large and contains about 400'000 images that I was able to move to my EC2 instance by using wget. Now I want to unzip the files and save them to my S3.
Preferably, I would not need to unzip them to the EC2 instance first. Can I somehow use unzip -options over SSH to extract each file to S3?
I have found answers like this https://stackoverflow.com/a/9722141/2335675. But I don't understand what he actually means by "unzipping it to S3". Can I do this while connected to my EC2 instance by SSH? Does Amazon have some kind of built-in unzip command that extracts to S3 instead of the current server?
I can see other people have asked this question, but I'm unable to find a direct answer on how to actually do it.
How I solved it:
I created a secondary volume on my EC2 instance with about three times the size of the zip file, to leave room for the extracted files as well. See guide here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-add-volume-to-instance.html
While connected to the EC2 instance by SSH, I used the unzip command to unzip the file to the new volume.
I used aws s3 cp myfolder s3://mybucket/myfolder --recursive to move all my files into my S3 bucket.
I deleted my temporary volume and all files on it.
Everything was done using SSH. No script or programming was required.
Remember that you need to use sudo to have permission for many of these steps.
The first solution:
Mount s3 on ec2 using s3fs.
Extract files to the mount point.
The second solution:
Use Python and its AWS library boto:
extract one file to a temporary location using zipfile,
upload it to S3 using boto,
then delete the temporary file.
Repeat from the extraction step until all files are done.
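The second solution's loop can be sketched like this. It is a minimal illustration, not the answerer's actual script: the function name upload_zip_members and the upload callback are hypothetical, and the callback stands in for whatever S3 client call you use.

```python
import os
import tempfile
import zipfile


def upload_zip_members(zip_path, upload):
    """Extract one member at a time, upload it, then delete the local copy."""
    uploaded = []
    with zipfile.ZipFile(zip_path) as archive, tempfile.TemporaryDirectory() as tmp:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data to upload
            local_path = archive.extract(info, path=tmp)
            try:
                upload(local_path, info.filename)  # key mirrors the path inside the zip
            finally:
                os.remove(local_path)  # keep disk usage to one file at a time
            uploaded.append(info.filename)
    return uploaded
```

With boto3 (the current successor to boto, assumed here), the callback could be lambda path, key: s3.upload_file(path, "mybucket", key), where s3 = boto3.client("s3") and "mybucket" is a placeholder bucket name. Only one extracted file exists on disk at any moment, so no extra volume is needed for the full 20 GB of images.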

downloading a file from Internet into S3 bucket

I would like to grab a file straight off the Internet and put it into an S3 bucket, to then copy it over to a PIG cluster. Because of the size of the file and my not-so-good internet connection, downloading the file onto my PC first and then uploading it to Amazon might not be an option.
Is there any way I could grab a file off the internet and put it directly into S3?
Download the data via curl and pipe the contents straight to S3. The data is streamed directly to S3 and not stored locally, avoiding any memory issues.
curl "https://download-link-address/" | aws s3 cp - s3://aws-bucket/data-file
As suggested above, if download speed is too slow on your local computer, launch an EC2 instance, ssh in and execute the above command there.
For anyone (like me) less experienced, here is a more detailed description of the process via EC2:
Launch an Amazon EC2 instance in the same region as the target S3 bucket. Smallest available (default Amazon Linux) instance should be fine, but be sure to give it enough storage space to save your file(s). If you need transfer speeds above ~20MB/s, consider selecting an instance with larger pipes.
Open an SSH connection to the new EC2 instance, then download the file(s), for instance using wget. (For example, to download an entire directory via FTP, you might use wget -r ftp://name:passwd@ftp.com/somedir/.)
Using AWS CLI (see Amazon's documentation), upload the file(s) to your S3 bucket. For example, aws s3 cp myfolder s3://mybucket/myfolder --recursive (for an entire directory). (Before this command will work you need to add your S3 security credentials to a config file, as described in the Amazon documentation.)
Terminate/destroy your EC2 instance.
[2017 edit]
I gave the original answer back in 2013. Today I'd recommend using AWS Lambda to download the file and put it on S3. That gives the desired effect: placing an object on S3 with no server involved.
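A Lambda function for this might look roughly like the sketch below. The event fields "url", "bucket" and "key" are a hypothetical shape chosen for illustration, and the optional s3_client argument is there only so a stand-in client can be passed in. The function's role also needs permission to write to the bucket, and the download has to finish within Lambda's execution time limit.

```python
import urllib.request


def handler(event, context, s3_client=None):
    """Stream the file at event["url"] into s3://<bucket>/<key>."""
    if s3_client is None:
        import boto3  # provided by the Lambda Python runtime
        s3_client = boto3.client("s3")
    with urllib.request.urlopen(event["url"]) as response:
        # upload_fileobj reads the response in chunks, so the file does not
        # have to fit in memory or be written to local disk first
        s3_client.upload_fileobj(response, event["bucket"], event["key"])
    return {"bucket": event["bucket"], "key": event["key"]}
```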
[Original answer]
It is not possible to do it directly.
Why not do this with EC2 instance instead of your local PC? Upload speed from EC2 to S3 in the same region is very good.
For streaming reads and writes from/to S3, I use Python's smart_open.
You can stream the file from internet to AWS S3 using Python.
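import boto3
import urllib3

s3 = boto3.resource('s3')
http = urllib3.PoolManager()
s3Bucket, key = '<bucket_name>', '<object_key>'  # target bucket and object key
# preload_content=False keeps the response as a stream instead of loading it into memory
response = http.request('GET', '<Internet_URL>', preload_content=False)  # Provide URL
s3.meta.client.upload_fileobj(response, s3Bucket, key,
    ExtraArgs={'ServerSideEncryption': 'aws:kms', 'SSEKMSKeyId': '<alias_name>'})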