Google creates a storage bucket without my interaction - google-cloud-platform

I was inspecting the infrastructure I have on Google Cloud to tie up any loose ends...
Then I noticed that Cloud Storage has 5 buckets, even though I only created 2 of them.
these 5 buckets are:
1 - a bucket I created
2 - a bucket I created
3 - PROJECT.backups
4 - gcf-sources-CODE-us-central1
5 - us.artifacts.PROJECT.appspot.com
I understand that the backups bucket comes from Firebase Realtime Database backups and the sources bucket comes from the Firebase Cloud Functions code. BUT where does the artifacts bucket come from? This bucket alone is TWICE the size of all the other buckets combined.
Its contents are just binary files named like "sha256:HASH", some of which are larger than 200 MB.
I deleted this bucket and it was re-created [without my interaction] the next day.
Does anyone know what might be using it? How can I track it down? What is it for?

The us.artifacts.<project id>.appspot.com bucket is created and used by Cloud Build to store the container images it generates. One of the processes that writes objects to this bucket is Cloud Functions; you can tell because the first time you create a function, GCP asks you to enable the Cloud Build API, and this bucket then appears in the Cloud Storage section. App Engine also stores objects in this bucket each time you deploy a new version of an app.
As mentioned in the documentation, in the case of App Engine, once the deployment has completed, the images in the us.artifacts.<project id>.appspot.com bucket are no longer needed, so it is safe to delete them. However, if you are only using Cloud Functions, deleting the objects in this bucket is not recommended. Even if you are not experiencing issues now, you may in the future, so instead of deleting all of the objects manually, you can use Object Lifecycle Management to delete the objects in this bucket periodically, for instance every 7 days. To do so, navigate to the Lifecycle tab of the us.artifacts.<project id>.appspot.com bucket and add a new lifecycle rule that deletes objects older than X days.
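The same 7-day rule can also be applied from the command line with gsutil instead of the console. A minimal sketch of the lifecycle configuration file (the 7-day age and the bucket name are placeholders for your own values):

```json
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 7}
    }
  ]
}
```

Save it as lifecycle.json and apply it with `gsutil lifecycle set lifecycle.json gs://us.artifacts.PROJECT.appspot.com`.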

This is your Docker registry. Each time you push an image (either via docker push or via the Cloud Build service), GCP stores the image layers in those buckets.

Related

Google Cloud storage bucket not listing deleted objects

Two days after manually deleting all the objects in a multi-region Cloud Storage bucket (e.g. us.artifacts.XXX.com) without Object Versioning, I noticed that the bucket size hadn't decreased at all. Only when trying to delete the bucket did I discover that it actually still contained the objects I had presumably deleted.
Why aren't those objects displayed in the bucket's list view, even with Show deleted data enabled?
When deploying a Function for the first time, two buckets are created automatically:
gcf-sources-XXXXXX-us-central1
us.artifacts.project-ID.appspot.com
You can observe these two buckets from the GCP Console by clicking on Cloud Storage from the left panel.
The files you're seeing in the us.artifacts.project-ID.appspot.com bucket are related to a recent change in how the runtime (for Node 10 and up) is built, as this post explains.
I also found that this bucket has no object versioning, retention policy or lifecycle rules. Even if you delete the bucket, it will be created again when you deploy the related function. So if you are seeing unexpected amounts of Cloud Storage used, this is likely caused by a known issue with the cleanup of artifacts created during the function deployment process, as indicated here.
Until the issue is resolved, you can avoid hitting storage limits by creating an auto-deletion rule in the Cloud Console:
In the Cloud Console, select your project > Storage > Browser to open the storage browser.
Select the "artifacts" bucket from the list.
Under the Lifecycle tab, add a rule to auto-delete old images. Choose a deletion interval that works within your normal rate of deployments.
If possible, try to reproduce this scenario with a new function. In the meantime, take into account that if you delete many objects at once, you can track deletion progress by clicking the Notifications icon in the Cloud Console.
In addition, the Google Cloud Status Dashboard provides information about regional or global incidents affecting Google Cloud services such as Cloud Storage.
Never mind! Eventually (at some point between 2 and 7 days after the deletion) the bucket size decreased and the objects were no longer displayed in the "Delete bucket" dialog.

My GCP project automatically created 2 storage buckets

My GCP project is named Mobisium. I found out that there are 2 auto-created buckets in the storage browser, named mobisium-bucket and mobisium-daisy-bkt-asia. I have never used buckets in this project. The mobisium-bucket bucket is empty and mobisium-daisy-bkt-asia contains one file called daisy.log. Both buckets are Location Type: Multi-region. I read in a Stack Overflow question's comments that if buckets are created automatically as multi-region, you will be charged.
My questions are:
Am I being charged for these buckets?
Are these buckets required? If not, should I delete them?
According to the documentation, you are charged for:
data storage
network
operations
So you will be charged for them if they contain data. You can also view all charges associated with your billing account.
These bucket names suggest that some services created them, though it's hard to tell which services from the names alone. Sometimes when you enable a service, it creates buckets for itself.
A newly created project shouldn't have any buckets, so if this really is a new project (created from scratch) you could try deleting them.
If this happens in another project as well (not only this one), it would be a good idea to contact support, because this is not normal behavior.

GCP-Storage: do files appear before upload is complete

I want to transfer files into a VM whenever a new file is added to storage. The problem is that I want the transfer to happen only once the upload is complete.
So my question is: do files appear even while the upload is still in progress? In other words, if I build a program that looks for new files every second, would it transfer files from GCS to the VM even if the upload is incomplete, or would the transfer only start once the upload is complete?
Google Cloud Storage uploads are strongly consistent for object uploads. This means that the object is not visible until the object is 100% uploaded and any Cloud Storage housekeeping (such as replication) is complete. You cannot see nor access an object until the upload has completed and your software/tool receives a success response.
Google Cloud Storage Consistency
Do files appear even when the upload is still going on? Which means, if I build a program that looks for new files every second, would it transfer the files from GCS to the VM even if the upload is incomplete, or would the transfer start only once the upload is complete?
No, your program will not see new objects until they are 100% available. In Google Cloud Storage there are no partial uploads.
Files do not appear in the Cloud Storage UI until they have been completely uploaded to the specified bucket.
I have linked how Google Cloud Platform manages consistency in Cloud Storage buckets here.
You could use gsutil to list all the files in one of your Cloud Storage Buckets at any moment, as stated here.
As for the application you are trying to develop, I highly suggest using Google Cloud Functions in conjunction with triggers.
In this case, you could use the google.storage.object.finalize trigger to execute your function every time a new object is uploaded to one of your buckets. You can see examples of this here.
The Cloud Function ensures that the object has been fully uploaded to the bucket before you attempt to transfer it to your GCE instance.
Therefore, once the upload completes, the only thing left is to run gcloud compute scp to copy the files to your Google Compute Engine instance via scp, as stated here.
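A minimal sketch of such a finalize-triggered function in Python. The instance name, zone, and the idea of shelling out to gcloud are illustrative assumptions, not the only way to do the copy:

```python
import shlex
import subprocess

# Hypothetical target VM; replace with your own instance name and zone.
INSTANCE = "my-vm"
ZONE = "us-central1-a"

def build_scp_command(name, dest="/tmp"):
    """Build the gcloud command that copies a locally downloaded object to the VM."""
    return (f"gcloud compute scp /tmp/{shlex.quote(name)} "
            f"{INSTANCE}:{dest} --zone={ZONE}")

def on_finalize(event, context):
    """Background Cloud Function fired on google.storage.object.finalize.

    The finalize event is only delivered once the object is fully written,
    so no partial upload can ever reach this point.
    """
    bucket = event["bucket"]
    name = event["name"]
    cmd = build_scp_command(name)
    # In a real deployment you would first download gs://{bucket}/{name}
    # to local disk, then run:
    # subprocess.run(shlex.split(cmd), check=True)
    return cmd
```

Deployed with `--trigger-event google.storage.object.finalize`, the function only ever sees complete objects, which sidesteps the polling question entirely.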

Identifying and deleting S3 Objects that are not being accessed?

I have recently joined a company that uses S3 Buckets for various different projects within AWS. I want to identify and potentially delete S3 Objects that are not being accessed (read and write), in an effort to reduce the cost of S3 in my AWS account.
I read this, which helped me to some extent.
Is there a way to find out which objects are being accessed and which are not?
There is no native way of doing this at the moment, so all the options are workarounds depending on your use case.
You have a few options:
Tag each S3 object with its last access date (e.g. 2018-10-24). First turn on object-level logging for your S3 bucket and set up CloudWatch Events for CloudTrail. The tag can then be updated by a Lambda function that runs on a CloudWatch Event fired on each Get event. Finally, create a function that runs on a scheduled CloudWatch Event and deletes all objects whose date tag is older than your chosen cutoff.
Query CloudTrail logs: write a custom function to query the last access times from object-level CloudTrail logs. This could be done with Athena, or a direct query against S3.
Create a Separate Index, in something like DynamoDB, which you update in your application on read activities.
Use a Lifecycle Policy on the S3 Bucket / key prefix to archive or delete the objects after x days. This is based on upload time rather than last access time, so you could copy the object to itself to reset the timestamp and start the clock again.
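Whichever tracking mechanism you choose (tags, CloudTrail queries, or a DynamoDB index), the pruning step reduces to comparing each object's last-access time against a cutoff. A minimal sketch of that decision logic; the access-time index itself is assumed to have been built by one of the options above:

```python
from datetime import datetime, timedelta

def stale_keys(last_access, now, max_age_days):
    """Return S3 keys whose last recorded access is older than max_age_days.

    last_access: dict mapping S3 key -> datetime of the last read/write,
    as reconstructed from CloudTrail logs, tags, or your own index.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(k for k, t in last_access.items() if t < cutoff)
```

The scheduled function would then delete (or archive) exactly the keys this returns, leaving recently used objects untouched.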
No objects in Amazon S3 are required by other AWS services, but you might have configured services to use the files.
For example, you might be serving content through Amazon CloudFront, providing templates for AWS CloudFormation or transcoding videos that are stored in Amazon S3.
If you didn't create the files and you aren't knowingly using the files, you can probably delete them. But you are the only person who can know whether they are necessary.
There is a recent AWS blog post that I found to be a very interesting and cost-optimized approach to this problem.
Here is the description from AWS blog:
The S3 server access logs capture S3 object requests. These are generated and stored in the target S3 bucket.
An S3 inventory report is generated for the source bucket daily. It is written to the S3 inventory target bucket.
An Amazon EventBridge rule is configured that will initiate an AWS Lambda function once a day, or as desired.
The Lambda function initiates an S3 Batch Operations job to tag objects in the source bucket. Objects to be expired are identified using the following logic:
Capture the number of days (x) configuration from the S3 Lifecycle configuration.
Run an Amazon Athena query that will get the list of objects from the S3 inventory report and server access logs. Create a delta list with objects that were created earlier than 'x' days, but not accessed during that time.
Write a manifest file with the list of these objects to an S3 bucket.
Create an S3 Batch operation job that will tag all objects in the manifest file with a tag of "delete=True".
The Lifecycle rule on the source S3 bucket will expire all objects that were created prior to 'x' days. They will have the tag given via the S3 batch operation of "delete=True".
Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs
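The final step of that flow is a lifecycle rule that matches on the tag the batch job applied. A sketch of such a rule, where the tag key/value follow the blog's "delete=True" convention and the number of days stands in for the 'x' above:

```json
{
  "Rules": [
    {
      "ID": "expire-tagged-objects",
      "Status": "Enabled",
      "Filter": {"Tag": {"Key": "delete", "Value": "True"}},
      "Expiration": {"Days": 7}
    }
  ]
}
```

Applied with `aws s3api put-bucket-lifecycle-configuration`, this expires only the objects the Athena delta query flagged, not everything of a given age.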

Delete AWS codeDeploy Revisions from S3 after successfull deployment

I am using codeDeploy addon for bitbucket to deploy my codes directly from Bitbucket Git repository to my EC2 instances via AWS codeDeploy. However, after a while, I have a lot of revisions in my codeDeploy console which were stored in one S3 bucket. So what should I do to save my S3 storage from keeping old codeDeploy revisions?
Is it possible to delete these revisions automatically after a successful deployment?
Is it possible to delete them automatically if there is X number of successful revision? For example, delete an old revision if we have three new successful revisions.
CodeDeploy keeps every revision from Bitbucket because the service always needs the last successful revision for features like automatic rollback. So the previous revision can't simply be overwritten when doing a deployment. But all revisions older than the last successful revision can be deleted.
Unfortunately, CodeDeploy doesn't have an elegant way to handle those obsolete revisions at the moment. It would be great if there were an overwrite option when Bitbucket pushes to S3.
CodeDeploy is purely a deployment tool; it cannot manage the revisions in the S3 bucket.
I would recommend you look into lifecycle management for S3. Since you are using a version-controlled bucket (I assume), there is always one latest version and zero or more obsolete versions. You can set a lifecycle configuration of type NoncurrentVersionExpiration so that obsolete versions are deleted after a number of days.
This method still cannot maintain a fixed number of deployments, since AWS only allows specifying lifecycle rules in days, but it's probably the best alternative for your use case.
[1] http://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-set-lifecycle-configuration-intro.html
[2] http://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html
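A sketch of such a NoncurrentVersionExpiration configuration; the 30-day window and rule ID are placeholders you would tune to your deployment cadence:

```json
{
  "Rules": [
    {
      "ID": "expire-old-revisions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }
  ]
}
```

Apply it with `aws s3api put-bucket-lifecycle-configuration --bucket YOUR_BUCKET --lifecycle-configuration file://lifecycle.json`. The current (latest) revision is never touched, so rollback to the most recent deployment keeps working.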
CodeDeploy does not offer a feature like Jenkins' "keep the last X [successful or not] runs".
However, with an S3 Lifecycle rule you can expire (delete) the S3 objects automatically after, for example, 3 months.
On one hand, this solution is a nice FinOps action when there is constant activity during the expiration window (at least 3 deployments): it preserves CodeDeploy's automatic rollback process while reducing the S3 cost.
On the other hand, this solution is less effective when you have spiky activity or, worse, no deployments at all during the specified S3 expiration delay: if a deployment happens 12 months after the previous one and fails, CodeDeploy will not be able to roll back, since the previous artifacts are no longer available in S3.
As mitigation, I recommend using Intelligent-Tiering: it can divide the S3 cost by 4 without interfering with CodeDeploy's capabilities. You can also set an expiration of 12 months to delete ancient artifacts.
Another solution is to write a Lambda, scheduled by a weekly CloudWatch Events rule, that will:
List deployments using your own success/fail status criteria
Get the deployment details for each
Filter these deployments again using your criteria (date, user, ...)
Delete the S3 objects using the deployment details
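A sketch of the filtering step such a Lambda would perform, with the actual boto3 CodeDeploy/S3 calls left out. The record shape (id, status, create_time) and the retention parameters are assumptions loosely modelled on CodeDeploy's deployment info:

```python
from datetime import datetime, timedelta

def deployments_to_prune(deployments, now, keep_days=90, keep_last_succeeded=1):
    """Pick deployments whose S3 revisions are safe to delete.

    deployments: list of dicts with 'id', 'status' and 'create_time'.
    The most recent successful deployments are always kept so that
    CodeDeploy's automatic rollback keeps working.
    """
    cutoff = now - timedelta(days=keep_days)
    succeeded = sorted((d for d in deployments if d["status"] == "Succeeded"),
                       key=lambda d: d["create_time"], reverse=True)
    # Never prune the revisions rollback depends on.
    protected = {d["id"] for d in succeeded[:keep_last_succeeded]}
    return [d["id"] for d in deployments
            if d["id"] not in protected and d["create_time"] < cutoff]
```

The Lambda would feed the returned IDs into get_deployment to resolve each revision's S3 location, then issue the delete_object calls.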