Uploaded folders won't show up - google-cloud-platform

I'm having trouble when uploading folders to my google-cloud-storage bucket: they won't show up in the drive mounted at /mnt/willferbucket (with gcsfuse).
It doesn't matter whether I upload them through the web frontend or with gsutil; the only folders that show up are the ones I create (not upload! It doesn't matter whether that's via the web frontend or directly on the mounted drive). In the example below these are "canvas" and "prints", so that part is working.
So.. "ls" on the mounted drive looks like this:
root@ubuntu-2:/mnt/willferbucket# ls
canvas helloWorld.py helloWorldSimple.py prints test.txt
But as you can see, when using gsutil, my uploaded folders do show up and I'm able to download them (same in the web frontend: they show up):
root@ubuntu-2:/mnt/w# gsutil ls gs://willferbucket
gs://willferbucket/helloWorld.py
gs://willferbucket/helloWorldSimple.py
gs://willferbucket/test.txt
gs://willferbucket/canvas/
gs://willferbucket/prints/
gs://willferbucket/test/
gs://willferbucket/testfolder/
gs://willferbucket/tst/
I couldn't find out the reason for this behaviour :(
Maybe someone can help, or is facing the same problem.
Thanks for your reply

I have done the same test as you, using gcsfuse to mount one of my Cloud Storage buckets on my local system, and the listing also appears incomplete to me.
I have reported this situation to the Google Cloud Storage engineering team on your behalf, and the issue is currently being handled by them. You can keep track of any progress they make by following this link.
Please click on the star button next to the issue number to get email notifications on how the situation is being handled.
UPDATE
The Google Cloud Storage engineering team has come back to me and pointed out that the reason not all files and directories are listed when using gcsfuse is what are called implicit directories: directories that were not created through gcsfuse itself (like an mkdir operation inside the mounted bucket) but through other means, such as the Cloud Storage console. These directories may not be recognized by gcsfuse and are therefore not shown in the mounted bucket in your file system.
There is a way to solve this. Whenever you run the gcsfuse command to mount a Cloud Storage bucket into your local file system, add the --implicit-dirs flag so that all implicit directories are included. Here is an example:
gcsfuse --implicit-dirs [YOUR_BUCKET] /path/to/your/local/dir/
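If you mount through mount(8) or /etc/fstab instead of invoking gcsfuse directly, the same behaviour should be available via the implicit_dirs mount option. A minimal sketch, using the same placeholders as above:
# Equivalent mount through mount(8); implicit_dirs is the fstab-style spelling of --implicit-dirs
sudo mount -t gcsfuse -o implicit_dirs [YOUR_BUCKET] /path/to/your/local/dir/
# Or, to make it persistent across reboots, an /etc/fstab line (all on one line):
# [YOUR_BUCKET] /path/to/your/local/dir gcsfuse rw,implicit_dirs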

Related

How to create a folder in Google Drive Sync created cloud directory?

This question assumes you have used Google Drive Sync or at least have knowledge of what files it creates in your cloud drive
While using rclone to sync a local Ubuntu directory to a Google Drive (a.k.a. gdrive) location, I found that rclone wasn't able to create a directory there (error googleapi: Error 500: Internal Error, internalError; the Google Cloud Platform API console revealed that the gdrive API call drive.files.create was failing).
By location I mean the root of the directory structure that the Google Drive Sync app creates on the cloud (e.g., in Computers/laptopName/(syncedFolder1,syncedFolder2,...)). In the current case, the gdrive sync app (famously unavailable on Linux) was running on a separate Windows machine. It was in this location that rclone wasn't able to create a dir.
Forget rclone. Trying to manually create the folder in the web app also fails as follows.
Working...
Could not execute action
Why is this happening, and how can I achieve this - making a directory in the cloud location where gdrive sync has put all my synced folders?
Basically you can't. I found an explanation here
If I am correct in my suspicion, there are a few things you have to understand:
Even though you may be able to create folders inside the Computers isolated containers, doing so will immediately create that folder not only in your cloud, but on that computer/device. Any changes to anything inside the Computers container will automatically be synced to the device/computer the container is linked to, just like any change on the device/computer side is also synced to the cloud.
It is not possible to create anything at the "root" level of each container in the cloud. If that were permitted then the actual preferences set in Backup & Sync would have to be magically altered to add that folder to the preferences. Thus this is not done.
So while folders inside the synced folders may be created, no new modifications may be made in the "root" dir

Google Cloud Bucket mounted on Compute Engine Instance using gcsfuse does not create files

I have been able to mount Google Cloud Bucket using
gcsfuse --implicit-dirs production-xxx-appspot /mount
or equally
sudo mount -t gcsfuse -o implicit_dirs,allow_other,uid=1000,gid=1000,key_file=service-account.json production-xxx-appspot /mount
Mounting works fine.
What happens is that when I execute the following commands after mounting, they also work fine:
mkdir /mount/files/
cp -rf /home/files/* /mount/files/
However, when I use :
mcedit /mount/files/a.txt
or
vi /mount/files/a.txt
The output says that there is no such file available, which makes sense.
Is there any other way to handle this situation, so that applications can directly create files on the mounted Google Cloud bucket rather than creating files locally and copying them afterwards?
If you do not want to create files locally and upload them later, you should consider using a file storage system like Google Drive.
Google Cloud Storage is an object storage system, which means objects cannot be modified; you have to write each object completely, at once. Object storage also does not work well with traditional databases, because writing objects is a slow process and writing an app to use an object storage API is not as simple as using file storage.
In a file storage system, data is stored as a single piece of information inside a folder, just like you would organize pieces of paper inside a manila folder. When you need to access that piece of data, your computer needs to know the path to find it. (Beware - it can be a long, winding path.)
If you want to use Google Cloud Storage, you need to create your file locally and then push it to your bucket.
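As a minimal sketch of that workflow with gsutil (using the bucket name from the question above; the local path and object name are just placeholders):
# Create/edit the file on the local disk first
vi /home/files/a.txt
# Then push the finished file to the bucket (the mounted view will pick it up)
gsutil cp /home/files/a.txt gs://production-xxx-appspot/files/a.txt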
Here is an example of how to configure Google Cloud Storage with Node.js: File Upload example
Here is a tutorial on How to mount Object Storage on Cloud Server using s3fs-fuse
If you want to know more about storage formats please follow this link
More information about reading and writing to Cloud Storage in this link

Google Cloud Storage - files not showing

I have over 30 Leaflet maps hosted on my Google Cloud Platform bucket (for example) and it has always been an easy process to upload my folder (which includes an html file with sub-folders including .js and .css files) and share the map publicly.
I tried uploading another map today, but within the folder there are no files showing and I get the following message "There are no live objects in this folder. If you have object versioning enabled, this folder may contain archived versions of objects, which aren't visible in the console. You can list archived object versions using gsutil or the APIs."
Does anyone know what is going on here?
We have also seen this problem, and it seems that the issue is limited to buckets that have spaces in the name.
It's also not reproducible through the gcloud web console, but if you use gsutil to upload a file to a bucket with a space in the name then it won't be visible on the web UI.
I can see from your screenshot that your bucket also has spaces (%20 in the url).
If you need a workaround asap, you could rename your bucket...
But google should fix this soon, I hope.
There is currently an open issue on GCS/Console integration.
If file names contain any symbols that need URL encoding, they are not visible in the console, but they are accessible via gsutil/the API (which is currently the recommended workaround).
The issue has been resolved as of 8-May-2018 10:00 UTC.
This can happen if the file doesn't have an extension: the UI treats it as a folder and lets you navigate into it, showing a blank folder instead of the file contents.
We had the same symptom (files show up in API but invisible on the web and via CLI).
The issue turned out to be that we were saving files to "./uploads", which Google interprets as "create a directory literally called '.' and then a subdirectory called uploads."
The fix was to upload to "uploads/" instead of "./uploads". We also just ran a mass copy operation via the API for everything under "./uploads". All visible now!
I also had spaces in my URL and it was not working properly yesterday. I checked this morning and everything is working as expected. I still have the spaces in my URL, btw.

Backup strategies for AWS S3 bucket [closed]

I'm looking for some advice or best practice to back up S3 bucket.
The purpose of backing up data from S3 is to prevent data loss because of the following:
S3 issue
issue where I accidentally delete this data from S3
After some investigation I see the following options:
Use versioning http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html
Copy from one S3 bucket to another using AWS SDK
Backup to Amazon Glacier http://aws.amazon.com/en/glacier/
Backup to production server, which is itself backed up
What option should I choose and how safe would it be to store data only on S3? Want to hear your opinions.
Some useful links:
Data Protection Documentation
Data Protection FAQ
Originally posted on my blog: http://eladnava.com/backing-up-your-amazon-s3-buckets-to-ec2/
Sync Your S3 Bucket to an EC2 Server Periodically
This can be easily achieved by utilizing multiple command line utilities that make it possible to sync a remote S3 bucket to the local filesystem.
s3cmd
At first, s3cmd looked extremely promising. However, after trying it on my enormous S3 bucket -- it failed to scale, erroring out with a Segmentation fault. It did work fine on small buckets, though. Since it did not work for huge buckets, I set out to find an alternative.
s4cmd
The newer, multi-threaded alternative to s3cmd. Looked even more promising, however, I noticed that it kept re-downloading files that were already present on the local filesystem. That is not the kind of behavior I was expecting from the sync command. It should check whether the remote file already exists locally (hash/filesize checking would be neat) and skip it in the next sync run on the same target directory. I opened an issue (bloomreach/s4cmd/#46) to report this strange behavior. In the meantime, I set out to find another alternative.
awscli
And then I found awscli. This is Amazon's official command line interface for interacting with their different cloud services, S3 included.
It provides a useful sync command that quickly and easily downloads the remote bucket files to your local filesystem.
$ aws s3 sync s3://your-bucket-name /home/ubuntu/s3/your-bucket-name/
Benefits:
Scalable - supports huge S3 buckets
Multi-threaded - syncs the files faster by utilizing multiple threads
Smart - only syncs new or updated files
Fast - thanks to its multi-threaded nature and smart sync algorithm
Accidental Deletion
Conveniently, the sync command won't delete files in the destination folder (local filesystem) if they are missing from the source (S3 bucket), and vice-versa. This is perfect for backing up S3 -- in case files get deleted from the bucket, re-syncing it will not delete them locally. And in case you delete a local file, it won't be deleted from the source bucket either.
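For reference, aws s3 sync does have an optional --delete flag that would mirror deletions from the bucket to the local copy; for a backup like this you want to leave it off, so the default invocation shown above is what you want:
# Do NOT use this variant for backups: --delete removes local files that were deleted in the bucket
# $ aws s3 sync s3://your-bucket-name /home/ubuntu/s3/your-bucket-name/ --delete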
Setting up awscli on Ubuntu 14.04 LTS
Let's begin by installing awscli. There are several ways to do this, however, I found it easiest to install it via apt-get.
$ sudo apt-get install awscli
Configuration
Next, we need to configure awscli with our Access Key ID & Secret Key, which you must obtain from IAM, by creating a user and attaching the AmazonS3ReadOnlyAccess policy. This will also prevent you or anyone who gains access to these credentials from deleting your S3 files. Make sure to enter your S3 region, such as us-east-1.
$ aws configure
Preparation
Let's prepare the local S3 backup directory, preferably in /home/ubuntu/s3/{BUCKET_NAME}. Make sure to replace {BUCKET_NAME} with your actual bucket name.
$ mkdir -p /home/ubuntu/s3/{BUCKET_NAME}
Initial Sync
Let's go ahead and sync the bucket for the first time with the following command:
$ aws s3 sync s3://{BUCKET_NAME} /home/ubuntu/s3/{BUCKET_NAME}/
Assuming the bucket exists, the AWS credentials and region are correct, and the destination folder is valid, awscli will start to download the entire bucket to the local filesystem.
Depending on the size of the bucket and your Internet connection, it could take anywhere from a few seconds to hours. When that's done, we'll go ahead and set up an automatic cron job to keep the local copy of the bucket up to date.
Setting up a Cron Job
Go ahead and create a sync.sh file in /home/ubuntu/s3:
$ nano /home/ubuntu/s3/sync.sh
Copy and paste the following code into sync.sh:
#!/bin/sh
# Echo the current date and time
echo '-----------------------------'
date
echo '-----------------------------'
echo ''
# Echo script initialization
echo 'Syncing remote S3 bucket...'
# Actually run the sync command (replace {BUCKET_NAME} with your S3 bucket name)
/usr/bin/aws s3 sync s3://{BUCKET_NAME} /home/ubuntu/s3/{BUCKET_NAME}/
# Echo script completion
echo 'Sync complete'
Make sure to replace {BUCKET_NAME} with your S3 bucket name, twice throughout the script.
Pro tip: You should use /usr/bin/aws to link to the aws binary, as crontab executes commands in a limited shell environment and won't be able to find the executable on its own.
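If you're not sure where the binary lives on your machine, you can check (the path below is what the apt-get install typically gives you, but it may differ):
$ which aws
/usr/bin/aws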
Next, make sure to chmod the script so it can be executed by crontab.
$ sudo chmod +x /home/ubuntu/s3/sync.sh
Let's try running the script to make sure it actually works:
$ /home/ubuntu/s3/sync.sh
The output should look similar to the script's own log lines: the date header, 'Syncing remote S3 bucket...', the list of files being synced, and finally 'Sync complete'.
Next, let's edit the current user's crontab by executing the following command:
$ crontab -e
If this is your first time executing crontab -e, you'll need to select a preferred editor. I'd recommend selecting nano as it's the easiest for beginners to work with.
Sync Frequency
We need to tell crontab how often to run our script and where the script resides on the local filesystem by writing a command. The format for this command is as follows:
m h dom mon dow command
The following command configures crontab to run the sync.sh script every hour (specified via the minute:0 and hour:* parameters) and to have it pipe the script's output to a sync.log file in our s3 directory:
0 * * * * /home/ubuntu/s3/sync.sh > /home/ubuntu/s3/sync.log
You should add this line to the bottom of the crontab file you are editing. Then, go ahead and save the file to disk by pressing Ctrl + O and then Enter. You can then exit nano by pressing Ctrl + X. crontab will now run the sync task every hour.
Pro tip: You can verify that the hourly cron job is being executed successfully by inspecting /home/ubuntu/s3/sync.log, checking its contents for the execution date & time, and inspecting the logs to see which new files have been synced.
All set! Your S3 bucket will now get synced to your EC2 server every hour automatically, and you should be good to go. Do note that over time, as your S3 bucket gets bigger, you may have to increase your EC2 server's EBS volume size to accommodate new files. You can always increase your EBS volume size by following this guide.
Taking into account the related link, which explains that S3 has 99.999999999% durability, I would discard your concern #1. Seriously.
Now, if #2 is a valid use case and a real concern for you, I would definitely stick with options #1 or #3. Which one of them? It really depends on some questions:
Do you need any other of the versioning features or is it only to avoid accidental overwrites/deletes?
Is the extra cost imposed by versioning affordable?
Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. Is this OK for you?
Unless your storage use is really huge, I would stick with bucket versioning. This way, you won't need any extra code/workflow to back up data to Glacier, to other buckets, or even to any other server (which is a really bad choice IMHO, please forget about it).
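For reference, a minimal sketch of turning versioning on with awscli (the bucket name is a placeholder):
# Enable versioning on the bucket; overwrites and deletes then keep the previous object versions
$ aws s3api put-bucket-versioning \
    --bucket {BUCKET_NAME} \
    --versioning-configuration Status=Enabled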
How about using the readily available Cross Region Replication feature on the S3 bucket itself? Here are some useful articles about the feature:
https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
http://docs.aws.amazon.com/AmazonS3/latest/UG/cross-region-replication.html
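A rough sketch of setting it up with awscli (the bucket names and IAM role ARN are placeholders; versioning must be enabled on both buckets, and the role must allow S3 to replicate on your behalf):
# Versioning is a prerequisite on both the source and the destination bucket
$ aws s3api put-bucket-versioning --bucket {SOURCE_BUCKET} --versioning-configuration Status=Enabled
$ aws s3api put-bucket-versioning --bucket {BACKUP_BUCKET} --versioning-configuration Status=Enabled
# replication.json (the role ARN is a placeholder for an IAM role S3 can assume):
# {
#   "Role": "arn:aws:iam::{ACCOUNT_ID}:role/{REPLICATION_ROLE}",
#   "Rules": [
#     { "ID": "backup", "Prefix": "", "Status": "Enabled",
#       "Destination": { "Bucket": "arn:aws:s3:::{BACKUP_BUCKET}" } }
#   ]
# }
$ aws s3api put-bucket-replication --bucket {SOURCE_BUCKET} \
    --replication-configuration file://replication.json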
You can back up your S3 data using the following methods:
Schedule the backup process using AWS Data Pipeline. It can be done in the 2 ways mentioned below:
a. Using CopyActivity of Data Pipeline, with which you can copy from one S3 bucket to another S3 bucket.
b. Using ShellActivity of Data Pipeline and "S3distcp" commands to do a recursive copy of S3 folders from one bucket to another (in parallel).
Use versioning inside the S3 bucket to maintain different versions of the data.
Use Glacier to back up your data (use it when you don't need to restore the backup quickly to the original buckets, since it takes some time to get the data back from Glacier as it is stored in a compressed format, or when you want to save some cost by avoiding another S3 bucket for backup). This option can easily be set using a lifecycle rule on the S3 bucket you want to back up.
Option 1 can give you more security in case you accidentally delete your original S3 bucket, and another benefit is that you can store your backups in date-wise folders in another S3 bucket; this way you know what data you had on a particular date and can restore a backup from a specific date. It all depends on your use case.
You'd think there would be an easier way by now to just keep some sort of incremental backups in a different region.
None of the suggestions above are really simple or elegant solutions. I don't really consider Glacier an option, as I think that's more of an archival solution than a backup solution. When I think backup, I think disaster recovery from a junior developer recursively deleting a bucket, or perhaps an exploit or bug in your app that deletes stuff from S3.
To me, the best solution would be a script that just backs up one bucket to another region, once daily and once weekly, so that if something terrible happens you can just switch regions. I don't have a setup like this; I've looked into it but haven't gotten around to doing it, because it would take a bit of effort, which is why I wish there were some stock solution to use.
While this question was posted some time ago, I thought it important to mention MFA delete protection with the other solutions. The OP is trying to solve for the accidental deletion of data. Multi-factor authentication (MFA) manifests in two different scenarios here -
Permanently deleting object versions - Enable MFA delete on the bucket's versioning.
Accidentally deleting the bucket itself - Set up a bucket policy denying delete without MFA authentication.
Couple this with cross-region replication and versioning to reduce the risk of data loss and improve the recovery scenarios.
Here is a blog post on this topic with more detail.
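For the first scenario, a sketch of what enabling MFA delete looks like with awscli (it can only be enabled by the bucket owner's root credentials; the bucket name, MFA device ARN, and token code are placeholders):
# Require an MFA token to permanently delete object versions or change the bucket's versioning state
$ aws s3api put-bucket-versioning \
    --bucket {BUCKET_NAME} \
    --versioning-configuration Status=Enabled,MFADelete=Enabled \
    --mfa "arn:aws:iam::{ACCOUNT_ID}:mfa/root-account-mfa-device {MFA_CODE}"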
As this topic was created a long time ago and is still quite relevant, here is some updated news:
External backup
Nothing has changed; you can still use the CLI, or any other tool, to schedule a copy somewhere else (in or out of AWS).
There are tools to do that, and previous answers were very specific.
"Inside" backup
S3 now supports versioning, which keeps previous versions of objects. It means that you can create and use a bucket normally and let S3 manage the lifecycle in the same bucket.
An example of a possible config, if you delete a file, would be:
File marked as deleted (still available but "invisible" to normal operations)
File moved to Glacier after 7 days
File removed after 30 days
You first need to activate versioning, then go to the Lifecycle configuration. Pretty straightforward: previous versions only, and deletion is what you want.
Then, define your policy. You can add as many actions as you want (but each transition costs you). You can't store objects in Glacier for less than 30 days.
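A sketch of such a policy with awscli, matching the 7-day / 30-day example above (the bucket name is a placeholder, and versioning must already be enabled):
# lifecycle.json -- acts only on previous (noncurrent) versions of objects
# {
#   "Rules": [
#     { "ID": "previous-versions", "Status": "Enabled", "Filter": { "Prefix": "" },
#       "NoncurrentVersionTransitions": [ { "NoncurrentDays": 7, "StorageClass": "GLACIER" } ],
#       "NoncurrentVersionExpiration": { "NoncurrentDays": 30 } }
#   ]
# }
$ aws s3api put-bucket-lifecycle-configuration \
    --bucket {BUCKET_NAME} \
    --lifecycle-configuration file://lifecycle.json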
If you have a lot of data and an existing bucket, the first sync will take a long time. In my case, I had 400 GB and it took about 3 hours the first time. So I think keeping a replica is a good solution for S3 bucket backup.

How can I copy clone/duplicate a folder on S3?

I want to make a copy of the folders and images on my s3 bucket for my development server. How can I do that?
I just wanted to write an updated answer here:
You can now use Amazon's AWS Management Console (under the S3 tab) to right-click on any folder (or file) in Bucket A, click Copy, then navigate to Bucket B, right-click, and click Paste.
This makes it extremely easy to copy the contents of your production bucket over to your dev bucket.
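If you prefer the command line, a sketch of the same copy with awscli (the bucket names and folder prefix are placeholders):
# Recursively copy a "folder" (key prefix) from the production bucket to the dev bucket
$ aws s3 sync s3://{PROD_BUCKET}/images/ s3://{DEV_BUCKET}/images/
# or, equivalently
$ aws s3 cp s3://{PROD_BUCKET}/images/ s3://{DEV_BUCKET}/images/ --recursive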
If you are using Linux and just want to drag copies down to the local filesystem, then you could use s3sync:
http://www.s3sync.net/wiki
If you wanted to access the files directly on s3, you could mount s3 as a fuse filesystem locally, but beware that accessing files using this method is dependent on your connection, so there could be speed issues. I've used s3fs perfectly well for accessing backups etc:
(can only post one link atm, but google s3fs - it's hosted on googlecode)
If you just need a copy, then s3sync is the easiest option.
Hope this helps.
I have to say, in conclusion, I recommend using a GUI. They've already laid out the work for you.
My best recommendation is Bucket Explorer (works on all OSes).
Second runner-up is CloudBerry (only on PCs).
Bucket Explorer has a sweet, very easy-to-understand GUI, and has a lot of great perks, analytics, and usability that outweigh all the others I experimented with.