Changing Storage Class from Multi-Regional to Coldline in Google Cloud Platform

I just finished my 1 year free trial with Google Cloud Platform and I am now being billed.
When I set my first project up, it looks like I set it up as Multi-Regional. I would only use Google Cloud Storage in the event of a catastrophic failure in my home where I lose data on both internal and external hard drives (i.e. fire, etc.). I believe for this type of backup I only need Coldline storage. I did change my project over to Coldline, but it looks like it only changes new data, not the originally stored data, because I am still being charged for Multi-Regional storage.
From what I understand, I have to change the Object Storage Class either by overwriting the data using "gsutil rewrite -s [STORAGE_CLASS] gs://[PATH_TO_OBJECT]" or by Object Lifecycle Management. I could not figure out how to do either, so I need help with this (I am not even sure where to type these commands or which approach to use; I am not a programmer!!).
I also saw in another post that my gsutil command needs to be version 4.22 or higher. How do I check this? I also saw in that post that the [PATH_TO_OBJECT] is My Bucket. I see a Project Name, Project ID, and Project Number. Which of these (if any) are used in that field for My Bucket?
Thank you for any help

I also saw in another post that my gsutil command needs to be version
4.22 or higher. How do I check this?
Get the gsutil version:
gsutil version
Update the Cloud SDK which includes gsutil:
Windows:
Open a command prompt with Administrator rights
gcloud components update
Linux:
gcloud components update
I see a Project Name, Project ID, and Project number. Which of these
(if any) are used in that field for My Bucket.
Use the PROJECT_ID. To get a list of the projects that you have access to, run the following command; it will list each project.
gcloud projects list
To see which is your default project:
gcloud config list project
If the default project is blank or the wrong one, set the default project with the following command:
gcloud config set project [PROJECT_ID]
From what I understand, I have to change the Object Storage Class
either by overwriting the data
Assuming your bucket name is mybucket.
STEP 1: Change the default storage class for the bucket:
gsutil defstorageclass set coldline gs://mybucket
STEP 2: Change the storage class for each object manually. This is an option if you only want to change a few files (a Python sketch for changing every object at once follows after step 4).
gsutil rewrite -s coldline gs://mybucket/objectname
STEP 3: Verify the existing lifecycle policy. Change step 4 accordingly if an existing policy exists.
gsutil lifecycle get gs://mybucket
STEP 4: Change the lifecycle of the bucket. This policy will move all files older than 7 days to coldline storage.
POLICY (write to lifecycle.json):
{
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "SetStorageClass",
          "storageClass": "COLDLINE"
        },
        "condition": {
          "age": 7,
          "matchesStorageClass": [
            "MULTI_REGIONAL",
            "STANDARD",
            "DURABLE_REDUCED_AVAILABILITY"
          ]
        }
      }
    ]
  }
}
Command:
gsutil lifecycle set lifecycle.json gs://mybucket
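If you are comfortable running a short script instead of per-object commands, here is a minimal Python sketch (not part of the original answer) that applies step 2 to every existing object using the google-cloud-storage client. It assumes the library is installed, default credentials are set up, and the bucket name mybucket from the example above.

# Sketch: rewrite every object in the bucket to Coldline.
# Assumes `pip install google-cloud-storage` and application default credentials.
from google.cloud import storage

client = storage.Client()

for blob in client.list_blobs("mybucket"):      # bucket name from the example above
    if blob.storage_class != "COLDLINE":
        blob.update_storage_class("COLDLINE")   # rewrites the object in place
        print(f"{blob.name} -> COLDLINE")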

Related

GCP: How to copy files automatically from Project A to Project B every Monday?

GCP is a new thing for me, but I want to know if it's possible to copy a specific file (e.g. myFiles.csv) from a bucket in project A to a bucket in project B every Monday at 6:00 AM.
I need it because myFiles.csv is overwritten every Monday and I need to share it with project B.
You can use Storage Transfer Service:
https://cloud.google.com/storage-transfer/docs/create-transfers#google-cloud-console
With this service, you can select the source and destination buckets as well as the scheduling options (a recurring schedule, in your case).
Source bucket (project A):
In this example, I selected a folder team_league in a bucket called mazlum_dev.
In the prefix field, I added the name of the file I want to transfer, input_team_slogans.json.
You have to put your own file name for your job.
Destination bucket (project B):
You have to select the output folder of your destination bucket.
Scheduling options:
You can also use the Cloud SDK with gsutil if needed:
gsutil cp gs://your_bucket_project_a/your_file gs://your_bucket_project_b/output/
But then you have to find a way to cron this command yourself, which is why I recommend the first solution: everything is native and integrated for your need.
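If you do go the scripted route, here is a minimal Python sketch (not part of the original answer) of the same copy using the google-cloud-storage client. The bucket names follow the gsutil example above and the output/ destination path is kept; credentials with access to both projects are assumed, and scheduling (cron, Cloud Scheduler, etc.) is still a separate concern.

# Sketch: copy myFiles.csv from the project A bucket to the project B bucket.
# Assumes google-cloud-storage is installed and the credentials can read the
# source bucket and write to the destination bucket.
from google.cloud import storage

client = storage.Client()
src_bucket = client.bucket("your_bucket_project_a")
dst_bucket = client.bucket("your_bucket_project_b")

blob = src_bucket.blob("myFiles.csv")
src_bucket.copy_blob(blob, dst_bucket, new_name="output/myFiles.csv")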
Follow the steps below:
Click on the Web console link Storage > Transfer to create a new transfer. Then select the source bucket you want to copy from (e.g. in Project A). Once you get to the destination part of the transfer form, you can write/paste the target bucket (e.g. in Project B) right in its text input, even if that bucket is from another project. It will show a green icon once the target has been verified as an existing bucket. You can then continue the form to finalize your setup.
Once you start the transfer from the form, you can follow its progress by hitting the refresh button on top of the console.
Bucket identifiers are globally unique, which is key to this solution.
Refer to this SO link for more information.

Google Dataprep copy flows from one project to another

I have two Google projects: dev and prod. I also import data from different storage buckets located in these projects: dev-bucket and prod-bucket.
After I have made and tested changes in the dev environment, how can I smoothly apply (deploy/copy) the changes to prod as well?
What I do now is export the flow from dev and then re-import it into prod. However, each time I need to manually do the following in the prod flows:
Change the dataset that serve as inputs in the flow
Replace the manual and scheduled destinations for the right BigQuery dataset (dev-dataset-bigquery and prod-dataset-bigquery)
How can this be done more smoothly?
If you want to copy data between Google Cloud Storage (GCS) buckets dev-bucket and prod-bucket, Google provides a Storage Transfer Service with this functionality. https://cloud.google.com/storage-transfer/docs/create-manage-transfer-console You can either manually trigger data to be copied from one bucket to another or have it run on a schedule.
For the second part, it sounds like both dev-dataset-bigquery and prod-dataset-bigquery are loaded from files in GCS? If this is the case, the BigQuery Transfer Service may be of use. https://cloud.google.com/bigquery/docs/cloud-storage-transfer You can trigger a transfer job manually, or have it run on a schedule.
As others have said in the comments, if you need to verify data before initiating transfers from dev to prod, a CI system such as Spinnaker may help. If the verification can be automated, a system such as Apache Airflow (running on Cloud Composer, if you want a hosted version) provides more flexibility than the transfer services.
Follow the procedure below to move a flow from one environment to another using the API, and to update the dataset and the output for the new environment.
1) Export a plan:
GET
https://api.clouddataprep.com/v4/plans/<plan_id>/package
2) Import the plan:
POST
https://api.clouddataprep.com/v4/plans/package
3) Update the input dataset:
PUT
https://api.clouddataprep.com/v4/importedDatasets/<dataset_id>
{
  "name": "<new_dataset_name>",
  "bucket": "<bucket_name>",
  "path": "<bucket_file_name>"
}
4) Update the output:
PATCH
https://api.clouddataprep.com/v4/outputObjects/<output_id>
{
  "publications": [
    {
      "path": [
        "<project_name>",
        "<dataset_name>"
      ],
      "tableName": "<table_name>",
      "targetType": "bigquery",
      "action": "create"
    }
  ]
}
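As a rough illustration (not part of the original answer), steps 1 and 3 could be scripted with Python's requests library. The access token, plan ID, dataset ID and field values below are placeholders, and step 2 (uploading the exported package) is omitted.

# Sketch: call the Dataprep API endpoints listed above with placeholder values.
import requests

TOKEN = "YOUR_DATAPREP_ACCESS_TOKEN"
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}
BASE = "https://api.clouddataprep.com/v4"

# 1) Export a plan package
package = requests.get(f"{BASE}/plans/YOUR_PLAN_ID/package", headers=HEADERS)

# 3) Point an imported dataset at the new environment's bucket
requests.put(
    f"{BASE}/importedDatasets/YOUR_DATASET_ID",
    headers=HEADERS,
    json={"name": "new_dataset_name", "bucket": "bucket_name", "path": "bucket_file_name"},
)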

Permissions Issue with Google Cloud Data Fusion

I'm following the instructions in the Cloud Data Fusion sample tutorial and everything seems to work fine, until I try to run the pipeline right at the end. Cloud Data Fusion Service API permissions are set for the Google managed Service account as per the instructions. The pipeline preview function works without any issues.
However, when I deploy and run the pipeline it fails after a couple of minutes. Shortly after the status changes from provisioning to running, the pipeline stops with the following permissions error:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X.",
    "reason" : "forbidden"
  } ],
  "message" : "xxxxxxxxxxx-compute@developer.gserviceaccount.com does not have storage.buckets.create access to project X."
}
xxxxxxxxxxx-compute@developer.gserviceaccount.com is the default Compute Engine service account for my project.
"Project X" is not one of mine though, I've no idea why the pipeline startup code is trying to create a bucket there, it does successfully create temporary buckets ( one called df-xxx and one called dataproc-xxx) in my project before it fails.
I've tried this with two separate accounts and get the same error in both places. I had tried adding storage/admin roles to the various service accounts to no avail but that was before I realized it was attempting to access a different project entirely.
I believe I was able to reproduce this. What's happening is that the BigQuery Source plugin first creates a temporary working GCS bucket to export the data to, and I suspect it is attempting to create it in the Dataset Project ID by default, instead of your own project as it should.
As a workaround, create a GCS bucket in your account, and then in the BigQuery Source configuration of your pipeline, set the "Temporary Bucket Name" configuration to "gs://<your-bucket-name>"
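If it helps, here is a minimal sketch (not part of the original answer) of creating that bucket with the google-cloud-storage Python client; the project ID, bucket name and location are placeholders, and the bucket can just as easily be created in the console.

# Sketch: create the temporary GCS bucket in your own project, then point the
# BigQuery Source's "Temporary Bucket Name" setting at it. Names are placeholders.
from google.cloud import storage

client = storage.Client(project="your-project-id")
bucket = client.create_bucket("your-temp-bucket-name", location="US")
print(f"Set Temporary Bucket Name to gs://{bucket.name}")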
You are missing the permission setup steps that come after you create an instance. The instructions for giving your service account the right permissions are on this page: https://cloud.google.com/data-fusion/docs/how-to/create-instance

Can't delete directory from Amazon S3

I'm using the web interface of Amazon's S3, and when I right-click a folder X and choose Delete, X seems to be deleted. Then when I refresh the interface, X may either disappear or remain; if I keep clicking refresh, the folder is either missing or present. Is this a bug, or am I doing something wrong? The folder is still present, as far as I can tell; one of my EMR jobs complains that the output folder X still exists.
I had the same problem in the AWS web interface after recursively deleting a "folder" in a bucket with the AWS Command Line Interface (CLI). Some objects randomly reappeared (not files, but "folders") in the web interface. Even though I tried to delete these folders in the web interface, they were still there (the interface said the operation was successful...).
Solution that worked for me in the AWS web interface: right-click the folder -> Cut, and Paste it into another folder. Worked great, and then I deleted the new folder. Gone!
I tried the same as Kristoffer's answer, but Cut/Paste to another folder made the new folder undeletable.
Further hacking: create a new temporary bucket, Cut/Paste the folder into this bucket, and delete the bucket.
S3 does not actually use folders. Instead, the path separators in object paths are treated like folders. If you want to remove a folder, all the contents of the folder will have to be deleted.
If there is any delay in deleting all of the contents, the folder may continue to exist.
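For example, here is a minimal boto3 sketch (not from the original answer) that removes a "folder" by deleting every object sharing its key prefix; the bucket name and prefix are placeholders.

# Sketch: delete all objects under a prefix, which is what "deleting a folder"
# actually means in S3. Bucket name and prefix are placeholders.
import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("your-bucket-name")
bucket.objects.filter(Prefix="your-folder/").delete()
# On a versioned bucket, use bucket.object_versions.filter(...) instead
# so old versions and delete markers are removed as well.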
As of March 2017 the AWS Console UI has changed and you can no longer enter the 'versioning mode' described in my old post.
It seems folders with versioned files can now be deleted freely without restriction.
If this is not the case please drop a comment so I can correct this post.
Previous Version of AWS Console
If you are using the AWS Management Console and you have versioning turned ON, you must be in 'versioning mode' to delete the folder.
You enter 'versioning mode' by going to the top of the page and next to where it says 'Versions:' select the Show button. Then you can proceed to right-click and delete your folder.
Hope this helps someone.
I encountered this issue when I was unable to delete an empty folder from an S3 bucket that had Versioning enabled.
I was able to delete the empty folder by using the "empty bucket configuration" from the S3 Buckets listing:
Select the bucket you'd like to empty, and click the Delete button:
AWS warns you that the bucket isn't empty, and offers a link to use the empty bucket configuration. Click the link:
Proceed through this screen by typing permanently delete to delete all the objects in this bucket:
You should then be able to verify that your S3 bucket is truly empty.
I tried various alternatives to delete a folder with subfolders in it from the web interface, without luck.
I had an installation of S3 Browser, tried it from the S3 Browser interface, and it worked.
I think I'm seeing similar behavior. My bucket has versioning turned on; even with an empty folder/directory within the bucket, attempting to "delete" the folder/directory within the bucket via the AWS web UI console does not result in it actually being removed. I presume the "deleted" versions of the files within that path still exist (but are not visible in the web console), therefore the bucket isn't truly empty, and isn't truly getting deleted. You may need to check via the CLI tools if existing deleted versions of files in that folder/directory exist (but are not visible in the web console) and delete the files permanently, then attempt to remove the folder/directory in your bucket.
I have the same problem in that I can't delete an S3 bucket; I get the message "An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied".
After a while, I deleted the bucket policy (Permissions tab -> Bucket Policy) and it worked like a charm with:
aws s3 rb s3://elasticbeanstalk-us-west-..../ --force
I hope this helps! It's another option.
Pablo
had an "elastic-bean-stalk" bucket and had to delete "bucket policy" before it would delete.
pitney
I had the same problem and didn't have access to the Amazon console, but I could delete it with this Java code:
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import java.util.ArrayList;
import java.util.List;

AmazonS3Client amazonS3Client = new AmazonS3Client(basicAWSCredentials);

// List every object under the "folder" prefix, then delete them in one request.
ObjectListing objectListing = amazonS3Client.listObjects("bucketName", "prefix");
DeleteObjectsRequest deleteObjectsRequest = new DeleteObjectsRequest("bucketName");
List<DeleteObjectsRequest.KeyVersion> keysToDelete = new ArrayList<>();
objectListing.getObjectSummaries().forEach(s3ObjectSummary ->
        keysToDelete.add(new DeleteObjectsRequest.KeyVersion(s3ObjectSummary.getKey())));
deleteObjectsRequest.setKeys(keysToDelete);
amazonS3Client.deleteObjects(deleteObjectsRequest);
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.7.4</version>
</dependency>
Try deleting with another account, such as an administrator account. For me it only worked with this method.
If you're having trouble fully deleting an empty folder in an S3 bucket that has versioning turned on (i.e. removing all trace of the folder, including in 'Show versions' mode), you can usually get around it by deleting the folder's delete marker(s) using the API/CLI:
$ aws s3api list-object-versions --bucket YOUR-BUCKET --prefix PATH-TO-YOUR-FOLDER
{
    "DeleteMarkers": [
        {
            "Owner": {
                "DisplayName": "YOUR-ACCOUNT-NAME",
                "ID": "YOUR-ACCOUNT-CANONICAL-ID"
            },
            "Key": "PATH-TO-YOUR-FOLDER/",
            "VersionId": "UNIQUE-VERSION-ID",
            "IsLatest": true,
            "LastModified": "2022-12-09T07:18:57+00:00"
        }
    ]
}
$ aws s3api delete-objects --bucket YOUR-BUCKET --delete 'Objects=[{Key=PATH-TO-YOUR-FOLDER/,VersionId=UNIQUE-VERSION-ID}]'
{
    "Deleted": [
        {
            "Key": "PATH-TO-YOUR-FOLDER/",
            "VersionId": "UNIQUE-VERSION-ID",
            "DeleteMarker": true,
            "DeleteMarkerVersionId": "UNIQUE-VERSION-ID"
        }
    ]
}
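The same two calls could be scripted; below is a minimal boto3 sketch (not part of the original answer) that lists the delete markers under a prefix and removes them. The bucket name and prefix are placeholders, and buckets with more than 1,000 versions would need pagination.

# Sketch: mirror the two CLI calls above with boto3.
import boto3

s3 = boto3.client("s3")
bucket = "YOUR-BUCKET"
prefix = "PATH-TO-YOUR-FOLDER/"

resp = s3.list_object_versions(Bucket=bucket, Prefix=prefix)
markers = [{"Key": m["Key"], "VersionId": m["VersionId"]}
           for m in resp.get("DeleteMarkers", [])]
if markers:
    s3.delete_objects(Bucket=bucket, Delete={"Objects": markers})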
Try using the new S3 console. The delete feature works for folders.

Force CloudFront distribution/file update

I'm using Amazon's CloudFront to serve static files of my web apps.
Is there no way to tell a CloudFront distribution that it needs to refresh its files, or to point out a single file that should be refreshed?
Amazon recommends that you version your files like logo_1.gif, logo_2.gif and so on as a workaround for this problem, but that seems like a pretty stupid solution. Is there absolutely no other way?
Good news. Amazon finally added an Invalidation Feature. See the API Reference.
This is a sample request from the API Reference:
POST /2010-08-01/distribution/[distribution ID]/invalidation HTTP/1.0
Host: cloudfront.amazonaws.com
Authorization: [AWS authentication string]
Content-Type: text/xml
<InvalidationBatch>
   <Path>/image1.jpg</Path>
   <Path>/image2.jpg</Path>
   <Path>/videos/movie.flv</Path>
   <CallerReference>my-batch</CallerReference>
</InvalidationBatch>
As of March 19, Amazon now allows CloudFront's cache TTL to be 0 seconds, so you (theoretically) should never see stale objects. So if you have your assets in S3, you could simply go to AWS Web Panel => S3 => Edit Properties => Metadata, then set your "Cache-Control" value to "max-age=0".
This is straight from the API documentation:
To control whether CloudFront caches an object and for how long, we recommend that you use the Cache-Control header with the max-age= directive. CloudFront caches the object for the specified number of seconds. (The minimum value is 0 seconds.)
With the Invalidation API, it does get updated in a few minutes.
Check out PHP Invalidator.
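For reference, the Cache-Control metadata described above can also be set programmatically. Here is a minimal boto3 sketch (not part of the original answer) that copies an object onto itself with replaced metadata, since S3 headers cannot be edited in place; the bucket and key names are placeholders.

# Sketch: set Cache-Control on an existing S3 object by copying it onto itself.
import boto3

s3 = boto3.client("s3")
bucket, key = "your-bucket", "assets/logo.gif"

s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    CacheControl="max-age=0",
    MetadataDirective="REPLACE",  # required so the new header is applied
)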
Bucket Explorer has a UI that makes this pretty easy now. Here's how:
Right click your bucket. Select "Manage Distributions."
Right click your distribution. Select "Get Cloudfront invalidation list"
Then select "Create" to create a new invalidation list.
Select the files to invalidate, and click "Invalidate." Wait 5-15 minutes.
Automated update setup in 5 mins
OK, guys. The best possible way for now to perform automatic CloudFront updates (invalidation) is to create a Lambda function that is triggered every time a file is uploaded to the S3 bucket (a new one or a rewritten one).
Even if you have never used Lambda functions before, it is really easy -- just follow my step-by-step instructions and it will take just 5 minutes:
Step 1
Go to https://console.aws.amazon.com/lambda/home and click Create a lambda function
Step 2
Click on Blank Function (custom)
Step 3
Click on the empty (dashed) box and select S3 from the combo box
Step 4
Select your Bucket (same as for CloudFront distribution)
Step 5
Set an Event Type to "Object Created (All)"
Step 6
Set Prefix and Suffix, or leave them empty if you don't know what they are.
Step 7
Check Enable trigger checkbox and click Next
Step 8
Name your function (something like: YourBucketNameS3ToCloudFrontOnCreateAll)
Step 9
Select Python 2.7 (or later) as Runtime
Step 10
Paste the following code instead of the default Python code:
from __future__ import print_function
import boto3
import time

def lambda_handler(event, context):
    for items in event["Records"]:
        path = "/" + items["s3"]["object"]["key"]
        print(path)
        client = boto3.client('cloudfront')
        invalidation = client.create_invalidation(
            DistributionId='_YOUR_DISTRIBUTION_ID_',
            InvalidationBatch={
                'Paths': {
                    'Quantity': 1,
                    'Items': [path]
                },
                'CallerReference': str(time.time())
            })
Step 11
Open https://console.aws.amazon.com/cloudfront/home in a new browser tab and copy your CloudFront distribution ID for use in next step.
Step 12
Return to the Lambda tab and paste your distribution ID in place of _YOUR_DISTRIBUTION_ID_ in the Python code. Keep the surrounding quotes.
Step 13
Set handler: lambda_function.lambda_handler
Step 14
Click on the role combo box and select Create a custom role. A new browser tab will open.
Step 15
Click View policy document, click Edit, click OK, and replace the role definition with the following (as is):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudfront:CreateInvalidation"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Step 16
Click Allow. This will return you to the Lambda form. Double-check that the role you just created is selected in the Existing role combo box.
Step 17
Set Memory (MB) to 128 and Timeout to 5 sec.
Step 18
Click Next, then click Create function
Step 19
You are good to go! From now on, each time you upload/re-upload any file to S3, it will be invalidated in all CloudFront edge locations.
PS - When you are testing, make sure that your browser is loading images from CloudFront, not from local cache.
PPS - Please note that only the first 1,000 invalidation paths per month are free; each invalidation path over the limit costs $0.005 USD. Additional charges for the Lambda function may also apply, but they are extremely cheap.
If you have boto installed (which is not just for Python, but also installs a bunch of useful command line utilities), it offers a command line utility called cfadmin, or 'cloud front admin', which offers the following functionality:
Usage: cfadmin [command]
cmd - Print help message, optionally about a specific function
help - Print help message, optionally about a specific function
invalidate - Create a cloudfront invalidation request
ls - List all distributions and streaming distributions
You invalidate things by running:
$sam# cfadmin invalidate <distribution> <path>
One very easy way to do it is FOLDER versioning.
So if you have, for example, hundreds of static files, simply put all of them into a folder named by year + version.
For example, I use a folder called 2014_v1 where I keep all my static files...
Inside my HTML I always reference the folder. (Of course, I have a PHP include where I have set the name of the folder.) So by changing one file, it actually changes in all my PHP files.
If I want a complete refresh, I simply rename the folder to 2014_v2 in my source and change the PHP include to 2014_v2.
All HTML automatically changes and asks for the new path; CloudFront gets a cache MISS and requests it from the source.
Example:
SOURCE.mydomain.com is my source,
cloudfront.mydomain.com is a CNAME to the CloudFront distribution.
So the PHP calls this file:
cloudfront.mydomain.com/2014_v1/javascript.js
and when I want a full refresh, I simply rename the folder in the source to "2014_v2" and change the PHP include so it sets the folder to "2014_v2".
Like this there is no delay for invalidation and NO COST!
This is my first post on Stack Overflow, hope I did it well!
In Ruby, using the fog gem:
AWS_ACCESS_KEY = ENV['AWS_ACCESS_KEY_ID']
AWS_SECRET_KEY = ENV['AWS_SECRET_ACCESS_KEY']
AWS_DISTRIBUTION_ID = ENV['AWS_DISTRIBUTION_ID']
conn = Fog::CDN.new(
  :provider              => 'AWS',
  :aws_access_key_id     => AWS_ACCESS_KEY,
  :aws_secret_access_key => AWS_SECRET_KEY
)
images = ['/path/to/image1.jpg', '/path/to/another/image2.jpg']
conn.post_invalidation AWS_DISTRIBUTION_ID, images
Even with invalidation, it still takes 5-10 minutes for the invalidation to process and refresh on all Amazon edge servers.
The current AWS CLI supports invalidation in preview mode. Run the following in your console once:
aws configure set preview.cloudfront true
I deploy my web project using npm. I have the following scripts in my package.json:
{
  "build.prod": "ng build --prod --aot",
  "aws.deploy": "aws s3 sync dist/ s3://www.mywebsite.com --delete --region us-east-1",
  "aws.invalidate": "aws cloudfront create-invalidation --distribution-id [MY_DISTRIBUTION_ID] --paths /*",
  "deploy": "npm run build.prod && npm run aws.deploy && npm run aws.invalidate"
}
Having the scripts above in place you can deploy your site with:
npm run deploy
Set TTL=1 hour and replace
http://developer.amazonwebservices.com/connect/ann.jspa?annID=655
Just posting to inform anyone visiting this page (first result for 'Cloudfront File Refresh')
that there is an easy-to-use and easy-to-access online invalidator available at swook.net.
This new invalidator is:
Fully online (no installation)
Available 24x7 (hosted by Google) and does not require any memberships.
There is history support, and path checking to let you invalidate your files with ease. (Often with just a few clicks after invalidating for the first time!)
It's also very secure, as you'll find out when reading its release post.
Full disclosure: I made this. Have fun!
Go to CloudFront.
Click on your ID/Distributions.
Click on Invalidations.
Click create Invalidation.
In the giant example box type * and click invalidate
Done
If you are using AWS, you probably also use its official CLI tool (sooner or later). AWS CLI version 1.9.12 or above supports invalidating a list of file names.