How to mount GCP Bucket in Google Colab - google-cloud-platform

I want to access the files in my GCP bucket from Colab. I followed these instructions
As you can see from the screen shots, there is nothing in the folder after mounting. What am I missing? The Data folder in my bucket is full of data.

Turns out you can't mount the bucket name + a path within the bucket. Removing the /Data/ after the name of the bucket led to a successful mount. So since my bucket name is hellonearth, the command is just:
!gcsfuse --implicit-dirs hellonearth myfolder
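A minimal sketch of the fix above, assuming the mount target was written as a bucket name followed by a path. The helper names bucket_of and subdir_of are hypothetical. The --only-dir flag in the trailing comment is, as far as I know, available in recent gcsfuse releases, but check gcsfuse --help on your version before relying on it.

```shell
# Hypothetical helpers: split "bucket/some/path/" into its two parts.
bucket_of() {
  # Everything before the first slash is the bucket name.
  printf '%s\n' "${1%%/*}"
}

subdir_of() {
  # Everything after the first slash, without a trailing slash.
  case "$1" in
    */*) rest="${1#*/}"; printf '%s\n' "${rest%/}" ;;
    *)   printf '\n' ;;
  esac
}

spec="hellonearth/Data/"
echo "bucket: $(bucket_of "$spec")"
echo "subdir: $(subdir_of "$spec")"

# In a Colab cell the mount would then look like one of these
# (left as comments because they need gcsfuse and credentials):
#   !gcsfuse --implicit-dirs hellonearth myfolder
#   !gcsfuse --implicit-dirs --only-dir Data hellonearth myfolder
```

With the first form, the Data folder shows up as myfolder/Data after mounting.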

You are using the commands the wrong way.
After !apt -qq install gcsfuse, run:
!mkdir folderOnColab
!gcsfuse yourBucketName folderOnColab
gcsfuse expects the bare bucket name, without a gs:// prefix.
Then run !ls instead of just ls.

Automatically mounting S3 bucket using s3fs on Amazon CentOS

I have tried all the answers provided in similar questions but none is helpful.
I installed S3 Fuse so that I can mount S3 bucket. After the installation, I performed the following the steps:
Step 1 Create the mount point for the S3 bucket: mkdir -p /var/s3fs-drive-fs
Step 2 Then I am able to mount the S3 bucket in the new directory with the IAM role by running the following commands: s3fs myresearchdatasets /var/s3fs-drive-fs -o iam_role=EC2-to-S3-Buckets-Role -o allow_other, and it works fine.
However, I found out that the bucket disappears each time I reboot the system, which means I have to run the command above to remount the S3 bucket each time after restarting the system.
I found the steps to complete an Automatic mount at reboot by editing the fstab file with the lines below
s3fs myresearchdatasets /var/s3fs-drive-fs fuse_netdev,allow_other,iam_role=EC2-to-S3-Buckets-Role,umask=777, 0 0
To check whether the fstab is working correctly, I tried mount /var/s3fs-drive-fs/
but I got the following errors, "mount: can't find /var/s3fs-drive-fs/ in /etc/fstab"
Can anyone help me please?
The first field should include the mount type and the bucket name, e.g.,
s3fs#mybucket /path/to/mountpoint fuse _netdev,allow_other 0 0
The s3fs README has other examples.
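As a hedged illustration of the field layout described above, the sketch below prints an fstab entry in the s3fs#bucket form. The function name s3fs_fstab_line is hypothetical, and the options string is only an example built from values in the question; it is not a statement of what your setup needs.

```shell
# Hypothetical helper: print an fstab entry in the "s3fs#bucket" form.
s3fs_fstab_line() {
  bucket="$1"; mount_dir="$2"; options="${3:-_netdev,allow_other}"
  # Fields: device, mount point, filesystem type, options, dump, pass.
  printf 's3fs#%s %s fuse %s 0 0\n' "$bucket" "$mount_dir" "$options"
}

s3fs_fstab_line myresearchdatasets /var/s3fs-drive-fs \
  "_netdev,allow_other,iam_role=EC2-to-S3-Buckets-Role"
```

The printed line can be appended to /etc/fstab (for example with sudo tee -a /etc/fstab) and then checked with mount /var/s3fs-drive-fs.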

AWS S3FS How to

Here's the current scenario -
I have multiple S3 Buckets, which have SQS events configured for PUTs of Objects from a FTP, which I have configured using S3FS.
Also, I have multiple Directories on an EC2, on which a User can PUT an object, which gets synced with the different S3 buckets (using S3FS), which generate SQS events(using S3's SQS events).
Here's what I need to achieve,
Instead of Multiple S3 buckets, I need to consolidate the logic on Folder level,
ie. I have now created Different Folders for each Bucket that I had created previously, I have created separate SQS events for PUT in individual Folders.
Now the Bucket level logic of S3FS, I want to tweak for Folder level in a Single S3 bucket.
i.e. I want to create 3 different directories on the EC2, e.g. A, B, C.
If I PUT an object in Directory A of the EC2, the object must get synced with Folder A in the S3 bucket,
Similarly for Directory B and folder B of S3 and Directory C on EC2 and Folder C on the S3.
Here are the steps I created for installing S3FS -
Steps -
ssh into the EC2
sudo apt-get install automake autotools-dev g++ git libcurl4-gnutls-dev libfuse-dev libssl-dev libxml2-dev make pkg-config
git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
./autogen.sh
./configure
make
sudo make install
Mounting S3 Bucket to File System
echo access-key-id:secret-access-key > /etc/passwd-s3fs
chmod 600 /etc/passwd-s3fs
mkdir /mnt/bucketname
echo s3fs#bucketname /mnt/bucketname fuse _netdev,rw,nosuid,nodev,allow_other 0 0 >> /etc/fstab
mount -a
Now these steps achieve sync between a particular Directory on the EC2 and the S3 bucket,
How do I tweak this to sync say 2 different Directories on the EC2 with 2 different Folders on the S3.
I am a Linux and AWS newbie, Please help me out.
Do not mount the S3 bucket to the file system. Use AWS S3 CLI and Cron to Sync the EC2 Directory with the S3 Bucket Directory.
Install S3CMD on the EC2 instance (http://tecadmin.net/install-s3cmd-manage-amazon-s3-buckets/#)
Start a cron job for achieving the Sync with the local directory and the S3 Bucket Subfolder.
Create a Script File for example "script.sh"
#!/bin/bash
aws s3 sync /path/to/folder/A s3://mybucket/FolderA
aws s3 sync /path/to/folder/B s3://mybucket/FolderB
aws s3 sync /path/to/folder/C s3://mybucket/FolderC
Add a cron entry like this, which runs the script every minute:
* * * * * /root/scripts/script.sh
And you will achieve your use case.
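A hedged variant of the script above, assuming the three directories share one parent and map to FolderA, FolderB and FolderC. BUCKET, BASE_DIR, RUN and the sync_cmd helper are hypothetical names introduced here. By default the sketch only prints the commands, so it can be checked before cron runs it for real.

```shell
#!/bin/bash
# Sketch: sync local directories A, B, C to FolderA, FolderB, FolderC.
# BUCKET and BASE_DIR are placeholders; RUN=1 actually calls the AWS CLI.
BUCKET="${BUCKET:-mybucket}"
BASE_DIR="${BASE_DIR:-/path/to/folder}"

sync_cmd() {
  # Print the aws s3 sync command for one directory name.
  printf 'aws s3 sync %s/%s s3://%s/Folder%s\n' "$BASE_DIR" "$1" "$BUCKET" "$1"
}

for name in A B C; do
  if [ "${RUN:-0}" = "1" ]; then
    aws s3 sync "$BASE_DIR/$name" "s3://$BUCKET/Folder$name"
  else
    sync_cmd "$name"   # default: only show what would be run
  fi
done
```

In the crontab the entry would then set RUN=1, for example: * * * * * RUN=1 /root/scripts/script.sh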

How to move files from amazon ec2 to s3 bucket using command line

In my amazon EC2 instance, I have a folder named uploads. In this folder I have 1000 images. Now I want to copy all images to my new S3 bucket. How can I do this?
First option: s3cmd
Use s3cmd
s3cmd get s3://AWS_S3_Bucket/dir/file
Take a look at this s3cmd documentation
if you are on linux, run this on the command line:
sudo apt-get install s3cmd
or, on CentOS or Fedora:
yum install s3cmd
Example of usage:
s3cmd put my.file s3://pactsRamun/folderExample/fileExample
Second Option
Using Cli from amazon
Update
As @tedder42 said in the comments, use sync instead of cp.
Take a look at the following syntax:
aws s3 sync <source> <target> [--options]
Example:
aws s3 sync . s3://my-bucket/MyFolder
More information and examples available at Managing Objects Using High-Level s3 Commands with the AWS Command Line Interface
aws s3 sync your-dir-name s3://your-s3-bucket-name/folder-name
Important: This will copy each item in your named directory into the s3 bucket folder you selected. This will not copy your directory as a whole.
Or, to copy one selected file, use cp (sync works on directories, not on single files):
aws s3 cp your-dir-name/file-name s3://your-s3-bucket-name/folder-name/file-name
Or you can use . (the current directory) to select everything. Note that this will copy your directory as a whole and also generate metadata and save them to your S3 bucket folder.
aws s3 sync . s3://your-s3-bucket-name/folder-name
To copy from EC2 to S3 use the below code in the Command line of EC2.
First, you have to give "IAM Role with full s3 Access" to your EC2 instance.
aws s3 cp Your_Ec2_Folder s3://Your_S3_bucket/Your_folder --recursive
Also note that when syncing with S3, the AWS CLI is multithreaded and uploads multiple parts of a file at one time. The number of concurrent requests can be tuned with the s3.max_concurrent_requests setting.
aws s3 mv /home/inbound/ s3://test/ --recursive --region us-west-2
This can be done very simply. Follow the following steps:
Open the AWS EC2 on console.
Select the instance and navigate to actions.
Select instances settings and select Attach/Replace IAM Role
When this is done, connect to the AWS instance and the rest will be done via the following CLI commands:
aws s3 cp filelocation/filename s3://bucketname
Hence you don't need to install or do any extra efforts.
Please note... the file location refers to the local address. And the bucketname is the name of your bucket.
Also note: This is possible if your instance and S3 bucket are in the same account.
Cheers.
A dry-run feature is available for testing.
To begin with, I would assign the EC2 instance a role that allows it to read from and write to S3.
SSH into the instance and perform the following:
vi tmp1.txt
aws s3 mv ./ s3://bucketname-bucketurl.com/ --dryrun
If this works, then all you have to do is create a script to upload all files with a specific name pattern from this folder to the S3 bucket.
I have written the following command in my script to move files older than 2 minutes from the current directory to the bucket/folder:
cd dir; find . -maxdepth 1 -name '*.txt' -mmin +2 -exec aws s3 mv '{}' s3://bucketurl.com \;
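Before moving anything, it can help to preview which files the age filter selects. The sketch below is one hedged way to do that. The helper name old_txt_files is hypothetical, and it relies on the -maxdepth and -mmin options, which GNU and BSD find provide but POSIX does not require.

```shell
# Hypothetical helper: list *.txt files directly inside a directory
# that were last modified more than N minutes ago.
old_txt_files() {
  dir="$1"; minutes="$2"
  find "$dir" -maxdepth 1 -type f -name '*.txt' -mmin +"$minutes"
}

# Preview the candidates in the current directory.
old_txt_files . 2

# The move itself is left as a comment because it needs AWS credentials;
# s3://bucketurl.com is the placeholder used in the answer above.
# old_txt_files . 2 | while IFS= read -r f; do aws s3 mv "$f" s3://bucketurl.com; done
```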

Amazon S3 with s3fs and fuse, transport endpoint is not connected

Redhat with Fuse 2.4.8
S3FS version 1.59
From the AWS online management console i can browse the files on the S3 bucket.
When i log-in (ssh) to my /s3 folder, i cannot access it.
also the command: "/usr/bin/s3fs -o allow_other bucket /s3"
return: s3fs: unable to access MOUNTPOINT /s3: Transport endpoint is not connected
What could be the reason? How can i fix it ? does this folder need to be unmount and then mounted again ?
Thanks !
Well, the solution was simple: to unmount and mount the dir. The error transport endpoint is not connected was solved by unmounting the s3 folder and then mounting again.
Command to unmount
fusermount -u /s3
Command to mount
/usr/bin/s3fs -o allow_other bucketname /s3
Takes 3 minutes to sync.
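The unmount and remount steps above could be wrapped in a small check, sketched here under the assumption that a stale mount shows up as the same "Transport endpoint is not connected" message when the directory is listed. The function names is_stale_mount and remount_if_stale are hypothetical; the two commands inside are the ones from this answer.

```shell
# Hypothetical helper: succeed if listing the mount point fails with the
# "Transport endpoint is not connected" error from the question.
is_stale_mount() {
  # Only stderr of ls goes into the pipe; stdout is discarded.
  LC_ALL=C ls "$1" 2>&1 >/dev/null | grep -q 'Transport endpoint is not connected'
}

remount_if_stale() {
  mp="$1"; bucket="$2"
  if is_stale_mount "$mp"; then
    fusermount -u "$mp"                           # unmount the dead FUSE mount
    /usr/bin/s3fs -o allow_other "$bucket" "$mp"  # mount it again
  else
    echo "$mp looks fine, nothing to do"
  fi
}

# Example call (placeholders from the answer above):
#   remount_if_stale /s3 bucketname
```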
I don't recommend accessing S3 via quick-and-dirty FUSE drivers.
S3 isn't really designed to act as a file system;
see this Stack Overflow answer for a nice summary.
You would probably never dare to mount a Linux mirror website just because it holds files. This is comparable.
Let your process write files to your local file system, then sync to your S3 bucket with tools like cron and s3cmd.
If you insist on using s3fs:
echo "yourawskey:yourawssecret" | sudo tee /etc/passwd-s3fs > /dev/null
sudo chmod 640 /etc/passwd-s3fs
sudo /usr/bin/s3fs yours3bucket /yourmountpoint -ouse_cache=/tmp
Verify with mount
Source: http://code.google.com/p/s3fs/wiki/FuseOverAmazon
I was using old security credential before. Regeneration of security credentials (AccessId, AccessKey) solved the issue.
This was a permissions issue on the bucket for me. Adding the "list" and "view permissions" for "everyone" in the AWS UI allowed bucket access.
If you don't want to allow everyone access, then make sure you are using the AWS credentials associated with the user that has access to the bucket in S3Fuse
I had this problem and found that the bucket name can only contain lowercase characters. Trying to access a bucket named "BUCKET1" via either https://BUCKET1.s3.amazonaws.com or https://bucket1.s3.amazonaws.com will fail, but if the bucket is called "bucket1", https://bucket1.s3.amazonaws.com will succeed.
So it is not enough to lowercase the name on the s3fs command line; you MUST also create the bucket with a lowercase name.
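A rough pre-check for the naming problem described above, as a sketch only. The function name is_valid_bucket_name is hypothetical, and the pattern covers just the general rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; a letter or digit at each end). It does not cover every S3 rule, for example it does not reject names formatted like IP addresses.

```shell
# Hypothetical helper: succeed only for names that fit the general rules.
is_valid_bucket_name() {
  # LC_ALL=C keeps [a-z] limited to ASCII lowercase letters.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

for name in bucket1 BUCKET1; do
  if is_valid_bucket_name "$name"; then
    echo "$name: ok"
  else
    echo "$name: invalid"
  fi
done
```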
If you have already added an entry to /etc/fstab that mounts the directory automatically, just unmount the directory and reboot the server.
To unmount: sudo umount /dir
The following line should be present in /etc/fstab; only then will it mount automatically after reboot:
s3fs#bucketname /s3 fuse allow_other,nonempty,use_cache=/tmp/cache,multireq_max=500,uid=505,gid=503 0 0
This issue could be due to the policy attached to the IAM user. Make sure the IAM user has AdministratorAccess.
I faced the same issue, and changing the policy to AdministratorAccess fixed it.

Downloading an entire S3 bucket?

I noticed that there does not seem to be an option to download an entire s3 bucket from the AWS Management Console.
Is there an easy way to grab everything in one of my buckets? I was thinking about making the root folder public, using wget to grab it all, and then making it private again but I don't know if there's an easier way.
AWS CLI
See the "AWS CLI Command Reference" for more information.
AWS recently released their Command Line Tools, which work much like boto and can be installed using
sudo easy_install awscli
or
sudo pip install awscli
Once installed, you can then simply run:
aws s3 sync s3://<source_bucket> <local_destination>
For example:
aws s3 sync s3://mybucket .
will download all the objects in mybucket to the current directory.
And will output:
download: s3://mybucket/test.txt to test.txt
download: s3://mybucket/test2.txt to test2.txt
This will download all of your files using a one-way sync. It will not delete any existing files in your current directory unless you specify --delete, and it won't change or delete any files on S3.
You can also do S3 bucket to S3 bucket, or local to S3 bucket sync.
Check out the documentation and other examples.
Whereas the above example is how to download a full bucket, you can also download a folder recursively by performing
aws s3 cp s3://BUCKETNAME/PATH/TO/FOLDER LocalFolderName --recursive
This will instruct the CLI to download all files and folder keys recursively within the PATH/TO/FOLDER directory within the BUCKETNAME bucket.
You can use s3cmd to download your bucket:
s3cmd --configure
s3cmd sync s3://bucketnamehere/folder /destination/folder
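There is another tool you can use called rclone. This code sample from the Rclone documentation syncs a local directory to a bucket:
rclone sync /home/local/directory remote:bucket
To download a bucket instead, swap the source and destination:
rclone sync remote:bucket /home/local/directory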
I've used a few different methods to copy Amazon S3 data to a local machine, including s3cmd, and by far the easiest is Cyberduck.
All you need to do is enter your Amazon credentials and use the simple interface to download, upload, sync any of your buckets, folders or files.
You've many options to do that, but the best one is using the AWS CLI.
Here's a walk-through:
Download and install AWS CLI in your machine:
Install the AWS CLI using the MSI Installer (Windows).
Install the AWS CLI using the Bundled Installer for Linux, OS X, or Unix.
Configure AWS CLI:
Make sure you input valid access and secret keys, which you received when you created the account.
Sync the S3 bucket using:
aws s3 sync s3://yourbucket /local/path
In the above command, replace the following fields:
yourbucket >> your S3 bucket that you want to download.
/local/path >> path in your local system where you want to download all the files.
To download using AWS S3 CLI:
aws s3 cp s3://WholeBucket LocalFolder --recursive
aws s3 cp s3://Bucket/Folder LocalFolder --recursive
To download using code, use the AWS SDK.
To download using GUI, use Cyberduck.
The answer by @Layke is good, but if you have a ton of data and don't want to wait forever, you should read "AWS CLI S3 Configuration".
The following commands will tell the AWS CLI to use 1,000 threads to execute jobs (each a small file or one part of a multipart copy) and look ahead 100,000 jobs:
aws configure set default.s3.max_concurrent_requests 1000
aws configure set default.s3.max_queue_size 100000
After running these, you can use the simple sync command:
aws s3 sync s3://source-bucket/source-path s3://destination-bucket/destination-path
or
aws s3 sync s3://source-bucket/source-path c:\my\local\data\path
On a system with CPU 4 cores and 16GB RAM, for cases like mine (3-50GB files) the sync/copy speed went from about 9.5MiB/s to 700+MiB/s, a speed increase of 70x over the default configuration.
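For reference, the two aws configure set commands above store their values in the CLI config file (normally ~/.aws/config) as a nested s3 section. The sketch below prints what that fragment is expected to look like. The helper name s3_tuning_fragment is hypothetical, and the exact indentation the CLI writes may differ.

```shell
# Hypothetical helper: print the [default] profile fragment that the
# two "aws configure set default.s3.*" commands are expected to produce.
s3_tuning_fragment() {
  printf '[default]\ns3 =\n  max_concurrent_requests = %s\n  max_queue_size = %s\n' "$1" "$2"
}

s3_tuning_fragment 1000 100000
```

Comparing this against the real file is a quick way to confirm that the settings were saved to the profile you actually use.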
This works 100% for me; I have downloaded all files from an AWS S3 bucket.
Install AWS CLI. Select your operating system and follow the steps here: Installing or updating the latest version of the AWS CLI
Check AWS version: aws --version
Run config command: aws configure
aws s3 cp s3://yourbucketname your\local\path --recursive
Eg (Windows OS): aws s3 cp s3://yourbucketname C:\aws-s3-backup\project-name --recursive
Check out this link: How to download an entire bucket from S3 to local folder
If you use Visual Studio, download "AWS Toolkit for Visual Studio".
After installed, go to Visual Studio - AWS Explorer - S3 - Your bucket - Double click
In the window you will be able to select all files. Right click and download files.
For Windows, S3 Browser is the easiest way I have found. It is excellent software, and it is free for non-commercial use.
Use this command with the AWS CLI:
aws s3 cp s3://bucketname . --recursive
Another option that could help some OS X users is Transmit.
It's an FTP program that also lets you connect to your S3 files. And, it has an option to mount any FTP or S3 storage as a folder in the Finder, but it's only for a limited time.
I've done a bit of development for S3 and I have not found a simple way to download a whole bucket.
If you want to code in Java the jets3t lib is easy to use to create a list of buckets and iterate over that list to download them.
First, get an access key and secret key pair from the AWS Management Console so you can create an S3Service object:
AWSCredentials awsCredentials = new AWSCredentials(YourAccessKey, YourAwsSecretKey);
s3Service = new RestS3Service(awsCredentials);
Then, get an array of your bucket's objects:
S3Object[] objects = s3Service.listObjects(YourBucketNameString);
Finally, iterate over that array to download the objects one at a time with:
S3Object obj = s3Service.getObject(bucket, fileName);
file = obj.getDataInputStream();
I put the connection code in a threadsafe singleton. The necessary try/catch syntax has been omitted for obvious reasons.
If you'd rather code in Python you could use Boto instead.
After looking around BucketExplorer, "Downloading the whole bucket" may do what you want.
The AWS CLI is the best option both for uploading an entire folder or repository to AWS S3 and for downloading an entire AWS S3 bucket locally.
To upload whole folder to AWS S3: aws s3 sync . s3://BucketName
To download whole AWS S3 bucket locally: aws s3 sync s3://BucketName .
You can also assign path like BucketName/Path for particular folder in AWS S3 bucket to download.
If you only want to download the bucket from AWS, first install the AWS CLI in your machine. In terminal change the directory to where you want to download the files and run this command.
aws s3 sync s3://bucket-name .
If you also want to sync the both local and s3 directories (in case you added some files in local folder), run this command:
aws s3 sync . s3://bucket-name
You can do this with MinIO Client as follows: mc cp -r https://s3-us-west-2.amazonaws.com/bucketName/ localdir
MinIO also supports sessions, resumable downloads, uploads and many more. MinIO supports Linux, OS X and Windows operating systems. It is written in Golang and released under Apache Version 2.0.
AWS CLI is the best option to download an entire S3 bucket locally.
Install AWS CLI.
Configure AWS CLI for using default security credentials and default AWS Region.
To download the entire S3 bucket use command
aws s3 sync s3://yourbucketname localpath
Reference to AWS CLI for different AWS services: AWS Command Line Interface
To add another GUI option, we use WinSCP's S3 functionality. It's very easy to connect, only requiring your access key and secret key in the UI. You can then browse and download whatever files you require from any accessible buckets, including recursive downloads of nested folders.
Since it can be a challenge to clear new software through security and WinSCP is fairly prevalent, it can be really beneficial to just use it rather than try to install a more specialized utility.
If you use Firefox with S3Fox, that DOES let you select all files (shift-select first and last) and right-click and download all.
I've done it with 500+ files without any problem.
When in Windows, my preferred GUI tool for this is CloudBerry Explorer Freeware for Amazon S3. It has a fairly polished file explorer and FTP-like interface.
You can use sync to download whole S3 bucket. For example, to download whole bucket named bucket1 on current directory.
aws s3 sync s3://bucket1 .
If you have only files there (no subdirectories) a quick solution is to select all the files (click on the first, Shift+click on the last) and hit Enter or right click and select Open. For most of the data files this will download them straight to your computer.
Try this command:
aws s3 sync yourBucketnameDirectory yourLocalDirectory
For example, if your bucket name is myBucket and local directory is c:\local, then:
aws s3 sync s3://myBucket c:\local
For more information about awscli, see the AWS CLI installation guide.
It's always better to use awscli for downloading / uploading files to s3. Sync will help you to resume without any hassle.
aws s3 sync s3://bucketname/ .
aws s3 sync s3://<source_bucket> <local_destination>
is a great answer, but it won't work if the objects are in the Glacier Flexible Retrieval storage class, even if the files have been restored. In that case you need to add the flag --force-glacier-transfer.
Here is a summary of what you have to do to copy an entire bucket:
1. Create a user that can operate with AWS s3 bucket
Follow this official article: Configuration basics
Don't forget to:
tick "programmatic access" so that you can work with AWS via the CLI.
add the right IAM policy to your user to allow it to interact with the S3 bucket
2. Download, install and configure AWS CLI
See this link allowing to configure it: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html
You can use the following command in order to add the keys you got when you created your user:
$ aws configure
AWS Access Key ID [None]: <your_access_key>
AWS Secret Access Key [None]: <your_secret_key>
Default region name [None]: us-west-2
Default output format [None]: json
3. Use the following command to download content
You can use a recursive cp command, but this walkthrough uses the aws s3 sync command:
aws s3 sync s3://your_bucket /local/path
To see which files would be downloaded before actually downloading them, you can use the --dryrun option.
To improve speed, you can adjust s3 max_concurrent_requests and max_queue_size properties. See: http://docs.aws.amazon.com/cli/latest/topic/s3-config.html
You can exclude/include some files using --exclude and --include options. See: https://docs.aws.amazon.com/cli/latest/reference/s3/
For example, the command below shows all the .png files in the bucket that would be downloaded. Rerun the command without --dryrun to actually download them. Note that sync is always recursive, so it takes no --recursive flag.
aws s3 sync s3://your_bucket /local/path --exclude "*" --include "*.png" --dryrun
Windows users need to download S3EXPLORER from this link, which also has installation instructions: http://s3browser.com/download.aspx
Then provide your AWS credentials (secret key, access key and region) to S3EXPLORER. Configuration instructions are at: s3browser.com/s3browser-first-run.aspx
Now all your S3 buckets will be visible in the left panel of S3EXPLORER.
Simply select the bucket, click the Buckets menu in the top left corner, then select the "Download all files to" option from the menu.
Then browse a folder to download the bucket at a particular place
Click on OK and your download would begin.
aws s3 sync is the perfect solution. It is not two-way; it works one way, from source to destination. Also, if you have lots of items in the bucket, it is a good idea to create an S3 endpoint first so that the download is faster (the traffic goes over the internal network rather than the internet) and incurs no charges.
As @layke said, it is best practice to download files with the S3 CLI, as it is safe and secure. But in some cases people need to use wget to download a file, and here is the solution:
aws s3 presign s3://<your_bucket_name>/<object_key>
This gets you a temporary pre-signed URL for that one object, which you can use to download it from S3 with wget or any other download client.
You just need to pass --recursive & --include "*" in the aws s3 cp command as follows: aws --region "${BUCKET_REGION}" s3 cp s3://${BUCKET}${BUCKET_PATH}/ ${LOCAL_PATH}/tmp --recursive --include "*" 2>&1
In addition to the suggestions for aws s3 sync, I would also recommend looking at s5cmd.
In my experience I found this to be substantially faster than the AWS CLI for multiple downloads or large downloads.
s5cmd supports wildcards so something like this would work:
s5cmd cp s3://bucket-name/* ./folder