GCP Dataflow Error: Path "gs://..." is not a valid filepattern. The pattern must be of the form "gs://<bucket>/path/to/file" - google-cloud-platform

I am trying to create a dataflow from Pub-Sub to BigQuery in GCP console.
In the "Create job from template" screen, I am having a trouble what to enter for "Temporary Location" box. It says "Path and filename prefix for writing temporary files. ex: gs://MyBucket/tmp".
So I specified something like this: "gs://${GOOGLE_CLOUD_PROJECT}-test/dataflow/tmp"
But I am getting this error (the dataflow folder does exist, by the way):
Path "gs://${GOOGLE_CLOUD_PROJECT}-test/dataflow/tmp" is not a valid filepattern. The pattern must be of the form "gs://<bucket>/path/to/file".
I tried different patterns but to no avail. Any idea how to resolve this?

The error is complaining that it wants an actual bucket name:
The pattern must be of the form "gs://<bucket>/path/to/file".
The console form takes the value literally, so ${GOOGLE_CLOUD_PROJECT} is not expanded there; you need to enter the already-expanded bucket name. You can resolve it in Cloud Shell:
export PROJECT_ID=$(gcloud config list --format 'value(core.project)')
export BUCKET_NAME="${PROJECT_ID}-test"
gsutil ls "gs://${BUCKET_NAME}/dataflow/tmp"
I wondered about the -test suffix and have tried to reflect it in the code above.
You can list all valid bucket names with gsutil ls.
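If you launch the template from Cloud Shell instead of the console form, the shell expands the variable for you. A minimal sketch, assuming the Google-provided Pub/Sub-to-BigQuery template; the job name, topic, dataset, and table below are placeholders:

export PROJECT_ID=$(gcloud config list --format 'value(core.project)')
export BUCKET_NAME="${PROJECT_ID}-test"
# The shell expands ${BUCKET_NAME} here, so Dataflow receives a literal gs:// path
gcloud dataflow jobs run pubsub-to-bq-test \
  --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
  --region us-central1 \
  --staging-location "gs://${BUCKET_NAME}/dataflow/tmp" \
  --parameters "inputTopic=projects/${PROJECT_ID}/topics/my-topic,outputTableSpec=${PROJECT_ID}:my_dataset.my_table"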

Related

Correct naming convention for Cloud Run: gcloud builds submit media bucket names (The specified bucket does not exist)

I am following this tutorial to upload my existing Django project running locally to Google Cloud Run. I believe I have followed all the steps correctly to create the bucket and grant it the necessary permissions. But when I try to run:
gcloud builds submit \
--config cloudmigrate.yaml \
--substitutions=_INSTANCE_NAME=cgps-reg-2-postgre-sql,_REGION=us-central1
I get the error:
Step #3 - "collect static": google.api_core.exceptions.NotFound: 404 POST https://storage.googleapis.com/upload/storage/v1/b/cgps-registration-2_cgps-reg-2-static-files-bucket/o?uploadType=multipart&predefinedAcl=publicRead:
I was a little confused by this line that seems to tell you to put the bucket name in the location field, but I think it's perhaps just a typo in the tutorial. I was not sure if I should leave the location at the default "Multi-Region" or change it to "us-central1", where everything else in the project is.
I interpreted the instructions for telling the project the name of the bucket as PROJECT_ID + "_" + BUCKET_NAME:
or in my case
cgps-registration-2_cgps-reg-2-static-files-bucket
But this naming convention is clearly not correct, as the error says it cannot find a bucket with this name. So what am I missing here?
Credit for this answer really goes to dazwilken. The answer he gave in a comment is the correct one:
Your bucket name is cgps-reg-2-static-files-bucket. This is its globally unique name. You should not prefix it (again) with the project name when referencing it. The error is telling you (correctly) that the bucket (called cgps-registration-2_cgps-reg-2-static-files-bucket) does not exist. It does not. The bucket is called cgps-reg-2-static-files-bucket.
Because bucket names must be unique, one way to create them is to combine another unique name, i.e. the Google Cloud Project ID, in their naming. The tutorial likely confused you by using this approach without explaining it.
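In other words, reference the bucket by its unqualified name. A quick way to confirm, using the names from the question:

# The bucket exists under its globally unique, unprefixed name
gsutil ls -b gs://cgps-reg-2-static-files-bucket
# Re-run the build once the settings reference that exact name
gcloud builds submit \
  --config cloudmigrate.yaml \
  --substitutions=_INSTANCE_NAME=cgps-reg-2-postgre-sql,_REGION=us-central1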

Is there a way to copy Google Cloud Storage object from SDK Shell to network drive like Box?

Is there a way to copy a GCS object via SDK Shell to a network drive like Box?
What I've tried is below. Thanks.
gsutil cp gs://your-bucket/some_file.tif C:/Users/Box/01. name/folder
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
There appears to be a typo in your destination:
C:/Users/Box/01. name/folder
There is a space after the period and before 'name' - you'll need to either wrap it in quotes or escape that space. Looks like you're on Windows; here's a doc on how to escape spaces in file paths.
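For example, quoting the destination lets cp treat it as a single directory path (the paths are the ones from the question):

gsutil cp gs://your-bucket/some_file.tif "C:/Users/Box/01. name/folder"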

Hive / S3 error: "No FileSystem for scheme: s3"

I am running Hive from a container (this image: https://hub.docker.com/r/bde2020/hive/) in my local computer.
I am trying to create a Hive table stored as a CSV in S3 with the following command:
CREATE EXTERNAL TABLE local_test (name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 's3://mybucket/local_test/';
However, I am getting the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.io.IOException No FileSystem for scheme: s3)
What is causing it?
Do I need to set up something else?
Note:
I am able to run aws s3 ls mybucket and also to create Hive tables in another directory, like /tmp/.
Problem discussed here.
https://github.com/ramhiser/spark-kubernetes/issues/3
You need to add a reference to the AWS SDK jars to the Hive library path. That way it can recognize the file schemes s3, s3n, and s3a.
Hope it helps.
EDIT1:
hadoop-aws-2.7.4 contains the implementations for interacting with those file systems. Inspecting the jar shows it has all the implementations needed to handle those schemes.
org.apache.hadoop.fs is where Hadoop looks up which file system implementation it needs.
The classes below are implemented in that jar:
org.apache.hadoop.fs.[s3|s3a|s3native]
The only thing still missing is that the library is not being added to the Hive library path. Is there any way you can verify that the path is added to the Hive library path?
EDIT2:
Reference for setting the library path:
How can I access S3/S3n from a local Hadoop 2.6 installation?
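For example, one way to put the jars on the Hive library path before starting the CLI. This is a rough sketch; the jar locations and versions are assumptions that depend on your image:

# Point Hive at the S3 connector and the matching AWS SDK jar
export HIVE_AUX_JARS_PATH=/opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.7.4.jar:/opt/hadoop/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar
# Restart the Hive CLI so the s3/s3n/s3a FileSystem classes are on the classpath
hive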

Is there any way to get s3 uri from aws web console?

I want to download a directory from my s3.
When I need a file, the S3 management console (AWS web console) allows me to download it, but for a directory I have to use the aws-cli, like:
$ aws s3 cp s3://mybucket/mydirectory/ . --recursive
My question is: Is there any way to get the s3 uri (s3://mybucket/mydirectory/) from s3 management console?
Its URL is available, but it is slightly different from the S3 URI required by the aws-cli. I could not find any menu to get the URI.
Thank you in advance!
No, it is not displayed in the console. However, it is simply:
s3://<bucket-name>/<key>
Directories are actually part of the key. For example, foo.jpg stored in an images directory will actually have a key (filename) of images/foo.jpg.
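For instance, listing a prefix with the CLI shows that the "directory" is part of each key (the bucket and prefix here are placeholders):

# Each object is listed with its full key, e.g. images/foo.jpg
aws s3 ls s3://mybucket/images/ --recursive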
(self-answer)
Because it seems there was no such way, I have created one:
pip install aws-s3-url2uri
The command aws_s3_url2uri will be available after installation.
This command internally converts web console URLs to S3 URIs, so it works with URLs, URIs, and local paths:
aws_s3_url2uri ls "https://console.aws.amazon.com/s3/home?region=<regionname>#&bucket=mybucket&prefix=mydir/mydir2/"
calls
aws s3 ls s3://mybucket/mydir/mydir2/
internally.
To convert an S3 URL displayed in the console such as https://s3.us-east-2.amazonaws.com/my-bucket-name/filename to an S3 URI, remove the https://s3.us-east-2.amazonaws.com/ portion and replace it with s3://, like so:
s3://my-bucket-name/filename
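If you want to script that conversion, here is a rough sketch; the endpoint pattern is an assumption and only covers path-style URLs like the one above:

echo "https://s3.us-east-2.amazonaws.com/my-bucket-name/filename" \
  | sed -E 's#https://s3[.0-9a-z-]*\.amazonaws\.com/#s3://#'
# prints: s3://my-bucket-name/filename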
It looks like this feature is now available in the AWS Web Console.
It is accessible in two ways:
Selecting the checkbox next to the file and clicking on "Copy S3 URI" button.
Clicking on the file, and clicking on the "Copy S3 URI" button on the top right.
You can get the value from the console by selecting the file in the console. Choose Copy path on the Overview tab to copy the S3:// link to the object.
It is possible to get the S3 URI for a proper key/file in the console by selecting the key and clicking on the Copy path button; this will place the S3 URI for the file on the clipboard.
However, directories are not keys as such but just key prefixes, so this will not work for them.
You may fail to get the S3 URI if you have just created a new bucket.
You can get the S3 URI after creating a new folder in your bucket: select the checkbox next to the newly created folder, then copy the S3 URI that appears at the top.

aws configure delete access key profile

I seem to be having difficulty deleting the access key profile I created for a test user using
aws configure --profile testuser
I have tried deleting the entries in my ~/.aws directory; however, when I run aws configure, I am getting the following error.
botocore.exceptions.ProfileNotFound: The config profile (testuser) could not be found
A workaround is adding [profile testuser] to my ~/.aws/config file, but I don't want to do that. I want to remove all traces of this testuser profile from my machine.
The Configuring the AWS Command Line Interface documentation page lists various places where configuration files are stored, such as:
Linux: ~/.aws/credentials
Windows: C:\Users\USERNAME\.aws\credentials
There is also a default profile, which sounds like something that might be causing your situation:
Linux: export AWS_DEFAULT_PROFILE=user2
Windows: set AWS_DEFAULT_PROFILE=user2
I suggest checking to see whether that environment variable has been set.
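To remove the profile completely, delete its sections from both files and clear any environment variable that still references it (the paths below assume Linux/macOS):

# The profile appears as [testuser] in ~/.aws/credentials and as
# [profile testuser] in ~/.aws/config; delete both sections with a text editor.
# Then make sure nothing still points at the deleted profile:
unset AWS_PROFILE AWS_DEFAULT_PROFILE
aws configure list   # should now fall back to the default profile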
Look for a hidden folder: .aws/credentials.
Its path is most likely '/Users/COMPUTER_NAME/.aws/credentials'.
Change COMPUTER_NAME to your computer name. There you will find two files, config and credentials; edit them with a regular text editor.