Replacing object in Google Cloud Bucket - google-cloud-platform

I am trying to replace an object(video) in google cloud bucket after doing certain operations over it giving it the same name. Giving it same name because it's already shared to multiple users. While doing an operation over it and while replacing it, some chunks of video becomes temporarily unavailable for people who are playing that video at that time and they face issue for a few seconds because of this.
So I have a question that whether its possible to replace the object in-place without affecting the existing version loaded in some places. Also to add I have CDN above this bucket too. Can object versioning on this bucket help me here? I want to keep the name same so that I dont have to send this link again to everyone

I had a similar situation. When I called support, they had me name the new file EXACTLY the same as the original file. Delete the original file from your bucket. Upload the new file that has the exact same name, and the URL will be the same as the original URL.

Related

Replace content in all files inside s3 bucket

I have a s3 bucket which is mapped to a domian say xyz.com . When ever a user register on xyz.com a file is created and stored in s3 bucket. Now i have 1000 of files in s3 and I want to replace some text in those files. All files have common name in start ex abc-{rand}.txt
The safest way of doing this would be to regenerate them again through the same process you originally used.
Personally I would try to avoid find and replace as it could lead to modifying parts that you did not intend.
Run multiple generations in parallel and override the existing files. This will ensure the files you generate will match your expectation and will not need to be modified again.
As a suggestion enable versioning before any of these interactions if you want the ability to rollback quickly in a scenario where it needs to be reverted.
Sadly, you can't do this in place in S3. You have to download them, change their content and re-upload.
This is because S3 is an object storage system, not regular file system.
To simply working with S3 files, you can use third part tool s3fs-fuse. The tool will make the S3 appear like a filesystem on your os.

How can I detect orphaned objects in S3 that aren't mapped to our database?

I am trying to find possible orphans in an S3 bucket. What I mean is that we might delete something out of the DB, and for whatever reason, it doesn't get cleared from S3. This can be a bug in our system or something of that nature. I want to double check against our API that the object in S3 maps to something that exists - the naming convention let's us map things together like that.
Scraping an entire bucket every X days seems unscalable. I was thinking that for each object in the bucket, it can add itself to an SQS queue for the relevant checking to happen, every 30 days or so.
I've only found events around uploads and specific modifications over at https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html. Is there anything more generalized I can't find? Any creative solutions to this problem?
You should activate Amazon S3 Inventory, which can provide a regular CSV file (as often as daily) that contains a list of every object in the Amazon S3 bucket.
You could then trigger some code that compares the contents of the CSV file against the database to find 'orphan' objects.

Google cloud object persists after deletion

I'm using Chrome (vs Cloud SDK / command line) to repeatedly replace a file in a bucket. Dragging / dropping a file to overwrite the existing one, and / or deleting it first and putting it back (changed).
At a certain point the file stops updating and remains in a persistent state, even if I literally rm -r its parent folder.
i.e., I could have /bucket/css/file.css and rm -r /bucket/css and the file will still be available to the public.
From your second answer, it seems that your bucket has the option of “Object Versioning” enabled.
When Object Versioning is enabled for a bucket, Cloud Storage creates an archived version of an object each time the live version of the object is overwritten or deleted.
To verify that “Object Versioning” is enabled on your bucket you can use the following command:
gsutil versioning get gs://[BUCKET_NAME]
The response looks like the following if Object Versioning is enabled:
gs://[BUCKET_NAME]: Enabled
However, according to the official documentation there is no limit to the number of older versions of an object you will create if you continue to upload to the same object in a versioning-enabled bucket.
Having said that, I tried to reproduce your case in my own bucket. The steps I followed are:
1.Enable Object Versioning for my bucket
2.Upload a file in the bucket with the name “example.png”, using the GCP Console.
3.Drag and drop another file with the same name (“example.png”), but different content.
4.Check option “Replace existing object”
5.Check if the file has been updated. It had.
6.Repeated the process 50 times (since you said you had 40 archived versions of your file) by uploading the different files one after the other, every time overriding the previous one. Each time I uploaded a different content file, a new archived version of that file was created. Each time file updated accordingly without any problems.
Please review the steps I followed and let me know if there is any additional action from your side.
Since you are able to delete the files via gsutil command, deletion is working fine and you have all permissions required. Would you be able to clean-up the cookies of your web browser, and try deleting it again? You can also try to use incognito window mode to check if its working.
Furthermore, if Object Versioning is on, you can disable it and try deleting the object again.Note that object deletion can not be undone and once you delete the object it will be completely removed.
Additionally, a good practice suggested along with Object Versioning is to create an Object Lifecycle rule for the bucket that would delete all the objects that have been stored for more than a specific amount of time. You can use this as a workaround for deleting either live or archived versions of your object (if Object Versioning is actually enabled) and to accomplish it, you can follow this link.
Generally, you can review Deleting data best practices here.
Note that, according to Cloud Storage Object Limits a single particular object can only be updated or overwritten up to once per second. For more information, check here.
I used the gsutil to delete it and it worked... temporarily. It seems there were like 40 cached versions of the file with hash tag ids.
At some point it stops updating / deleting the file. :(
gsutil rm -r gs://bucket/path/to/folder/

Google Cloud Storage bucket has stopped overwriting files by default when uploading with the Python library

I have an App Engine cron job that runs every week, uploading a file called logs.json to a Google Cloud Storage bucket.
For the past few months, this file has been overwritten each time the new version was uploaded.
In the last few weeks, rather than overwriting the file, the existing copy has been retained and the new one uploaded under a different name, e.g. logs_XHjYmP3.json.
This is a simplified snippet from the Django storage class where the upload is performed. I have verified that the filename is correct at the point of upload:
# Prints 'logs.json'
print(file.name)
blob.upload_from_file(file, content_type=content_type)
blob.make_public()
Reading the documentation, it says:
The effect of uploading to an existing blob depends on the
“versioning” and “lifecycle” policies defined on the blob’s bucket. In
the absence of those policies, upload will overwrite any existing
contents.
The versioning for the bucket is set to suspended, and I'm not aware of any other settings or any changes I have made that would affect this.
How can I make the file upload overwrite any existing file with the same name?
After further testing, although print(file.name) looked correct, the incorrect filename was actually coming from Django's get_available_name() storage class method. That method was trying to generate a unique filename if the file already existed. I have added the method to my custom storage class, and, if the file meets the criteria, I just return the existing name to allow it to overwrite. I'm still not sure why it started doing this, however.

How to change file upload date in Amazon S3 using AWS CLI

I need to move some files (thousands) to Amazon S3 bucket, from where they will be displayed to the end-user by another application (instead of the current one).
Problem is, that these files have creation/upload date now (dates very between 2012 and 2017, when they were uploaded to current application), and when I move them they all start to be of the same date. That is a problem because when you look at the files in the new application, you don't understand the time hierarchy which is sometimes very important.
Is there any way I can modify upload date of a file(s) in S3?
The Last Modification Date is generated by Amazon S3 and cannot be set via the API.
If dates and other information (eg user) are important to your application, you can store it as metadata on the object. Then, retrieve the metadata when displaying dates, user, etc.
What I did was renaming the file to something else and then renaming it again to its original name.
As you cannot rename directly, you have to copy the file to a new name, and then copy it back to its original name. (and delete the auxiliary file, of course)
It is not optimal, but that's the solution when using AWS client. I hope one day AWS will have all function the FTP used to have.
You can just copy over the same object and the timestamp will update.
This technique is also used to prolong the expire of an object in a bucket with a lifecycle rule.