Google Cloud Functions deployment through Cloud Source Repositories stopped working - google-cloud-platform

I have a script that deploys a GCP Function using the following command:
gcloud beta functions deploy pipeline-helper --set-env-vars PROPFILE_BUCKET=${my_bucket},PROPFILE_PATH=${some_property} --source https://source.developers.google.com/projects/{PROJECT}/repos/{REPO}/fixed-aliases/1.0.1/paths/ --entry-point onFlagFileCreation --runtime nodejs6 --trigger-resource ${my_bucket} --trigger-event google.storage.object.finalize --region europe-west1 --memory 1G --timeout 300s
That worked for a few days, the last successful run being on December 4th. Then, when launched on December 27th, the command failed with the following output (with the debug option added):
Deploying function (may take a while - up to 2 minutes)...
..failed.
DEBUG: (gcloud.beta.functions.deploy) OperationError: code=13, message=Failed to retrieve function source code
Traceback (most recent call last):
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 841, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 770, in Run
resources = command_instance.Run(args)
File "/usr/lib/google-cloud-sdk/lib/surface/functions/deploy.py", line 203, in Run
return _Run(args, track=self.ReleaseTrack(), enable_env_vars=True)
File "/usr/lib/google-cloud-sdk/lib/surface/functions/deploy.py", line 157, in _Run
return api_util.PatchFunction(function, updated_fields)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/util.py", line 308, in CatchHTTPErrorRaiseHTTPExceptionFn
return func(*args, **kwargs)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/util.py", line 364, in PatchFunction
operations.Wait(op, messages, client, _DEPLOY_WAIT_NOTICE)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py", line 126, in Wait
_WaitForOperation(client, request, notice)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py", line 101, in _WaitForOperation
sleep_ms=SLEEP_MS)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 219, in RetryOnResult
result = func(*args, **kwargs)
File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py", line 65, in _GetOperationStatus
raise exceptions.FunctionsError(OperationErrorToString(op.error))
FunctionsError: OperationError: code=13, message=Failed to retrieve function source code
ERROR: (gcloud.beta.functions.deploy) OperationError: code=13, message=Failed to retrieve function source code
Build step 'Execute shell' marked build as failure
Finished: FAILURE
My problem relates to the use of the --source option of this command when it points to a Cloud Source Repositories URL (it works with a GCS bucket or a local directory).
I tried using the minimal valid source repository URL https://source.developers.google.com/projects/PROJECT/repos/REPO as mentioned in the official doc here ... with no success (same error).
After that, I cloned the official "Google Cloud Functions - Hello World" sample to Cloud Source Repositories and tried to deploy it using an equivalent command ... again with no success (same error). However, I was able to deploy it from a zip uploaded to a GCS bucket in my project or from a local directory, but not from Cloud Source Repositories.
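For reference, here is a minimal sketch of the two --source variants I compared, wrapped in Python (the project, repo, and bucket names are placeholders, not my real ones):
import subprocess

def deploy(source):
    # Deploy the sample function with a given --source (placeholder names)
    subprocess.run(
        [
            "gcloud", "beta", "functions", "deploy", "pipeline-helper",
            "--entry-point", "onFlagFileCreation",
            "--runtime", "nodejs6",
            "--trigger-resource", "my-bucket",
            "--trigger-event", "google.storage.object.finalize",
            "--region", "europe-west1",
            "--source", source,
        ],
        check=True,
    )

# Fails with OperationError code=13 ("Failed to retrieve function source code"):
# deploy("https://source.developers.google.com/projects/PROJECT/repos/REPO")

# Works: a zip of the function source uploaded to a GCS bucket (placeholder path):
# deploy("gs://my-bucket/function-source.zip")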
The account used to deploy the Function (xxx-compute@developer.gserviceaccount.com) has the following roles:
Stackdriver Debugger Agent
Cloud Functions Developer
Cloud Functions Service Agent
Editor
Service Account User
Source Repository Writer
Cloud Source Repositories Service Agent
Storage Object Creator
Storage Object Viewer
Any help would be greatly appreciated

As mentioned in my last comment to @Raj, the problem was due to a bug in GCP that is now fixed. The support people were kind and responsive.
Everything is working as expected now!

Related

AWS CLI failed to display response

I am using aws-cli version 2.8.8.
I connect to AWS using LDAP, and the authentication is successful.
If I run the command aws s3 ls, I get results.
However, when I run aws dynamodb list-tables, nothing gets displayed. The same goes for aws ec2 describe-instances: no response.
When I run the same command in debug mode, I can see an exception in the awscli.clidriver file:
Exception details
2022-11-03 12:27:55,762 - MainThread - awscli.clidriver - DEBUG - Exception caught in main()
Traceback (most recent call last):
File "awscli/clidriver.py", line 458, in main
File "awscli/clidriver.py", line 593, in __call__
File "awscli/clidriver.py", line 769, in __call__
My team members use the same CLI version and account, and they can access all the data. The issue is specific to my Mac terminal.
I tried searching for this issue online, but no one seems to have reported it. It could be something in my terminal setup, but I am not able to identify the root cause.
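For comparison, the equivalent call through boto3 would look like the sketch below (the region name is a placeholder; it should match whatever the CLI profile resolves). It can help tell whether the problem is the CLI itself or the credentials/region it picks up:
import boto3

# Same ListTables call the CLI makes; prints the table names, or raises a
# botocore error if credential/region resolution is the real problem.
dynamodb = boto3.client("dynamodb", region_name="eu-west-1")
print(dynamodb.list_tables()["TableNames"])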

MLflow proxied artifact access: Unable to locate credentials

I am using MLflow to track my experiments, with an S3 bucket as the artifact store. For accessing it, I want to use proxied artifact access, as described in the docs. However, this does not work for me, since the client looks for credentials locally (but the server should handle this).
Expected Behaviour
As described in the docs, I would expect that I do not need to specify my AWS credentials locally, since the server handles this for me. From the docs:
This eliminates the need to allow end users to have direct path access to a remote object store (e.g., s3, adls, gcs, hdfs) for artifact handling and eliminates the need for an end-user to provide access credentials to interact with an underlying object store.
Actual Behaviour / Error
Whenever I run an experiment on my machine, I run into the following error:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
So the error occurs locally. However, this should not happen, since the server should handle the auth instead of me needing to store my credentials locally. Also, I would expect that I would not even need the boto3 library locally.
Solutions Tried
I am aware that I need to create a new experiment, because existing experiments might still use a different artifact location, as proposed in this SO answer as well as in the note in the docs. Creating a new experiment did not solve the error for me. Whenever I run the experiment, I get an explicit log in the console confirming this:
INFO mlflow.tracking.fluent: Experiment with name 'test' does not exist. Creating a new experiment.
Related questions (#1 and #2) refer to a different scenario, which is also described in the docs.
Server Config
The server runs on a kubernetes pod with the following config:
mlflow server \
--host 0.0.0.0 \
--port 5000 \
--backend-store-uri postgresql://user:pw@endpoint \
--artifacts-destination s3://my_bucket/artifacts \
--serve-artifacts \
--default-artifact-root s3://my_bucket/artifacts
I would expect my config to be correct, looking at doc page 1 and page 2.
I am able to see the MLflow UI if I forward the port to my local machine. I can also see the experiment runs marked as failed, because of the error above.
My Code
The relevant part of my code which fails is the logging of the model:
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("test2")
...
# this works
mlflow.log_params(hyperparameters)
model = self._train(model_name, hyperparameters, X_train, y_train)
y_pred = model.predict(X_test)
self._evaluate(y_test, y_pred)
# this fails with the error from above
mlflow.sklearn.log_model(model, "artifacts")
Question
I am probably overlooking something. Is there a need to indicate locally that I want to use proxied artifact access? If yes, how do I do this? Is there something I have missed?
Full Traceback
File "/dir/venv/lib/python3.9/site-packages/mlflow/models/model.py", line 295, in log
mlflow.tracking.fluent.log_artifacts(local_path, artifact_path)
File "/dir/venv/lib/python3.9/site-packages/mlflow/tracking/fluent.py", line 726, in log_artifacts
MlflowClient().log_artifacts(run_id, local_dir, artifact_path)
File "/dir/venv/lib/python3.9/site-packages/mlflow/tracking/client.py", line 1001, in log_artifacts
self._tracking_client.log_artifacts(run_id, local_dir, artifact_path)
File "/dir/venv/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 346, in log_artifacts
self._get_artifact_repo(run_id).log_artifacts(local_dir, artifact_path)
File "/dir/venv/lib/python3.9/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 141, in log_artifacts
self._upload_file(
File "/dir/venv/lib/python3.9/site-packages/mlflow/store/artifact/s3_artifact_repo.py", line 117, in _upload_file
s3_client.upload_file(Filename=local_file, Bucket=bucket, Key=key, ExtraArgs=extra_args)
File "/dir/venv/lib/python3.9/site-packages/boto3/s3/inject.py", line 143, in upload_file
return transfer.upload_file(
File "/dir/venv/lib/python3.9/site-packages/boto3/s3/transfer.py", line 288, in upload_file
future.result()
File "/dir/venv/lib/python3.9/site-packages/s3transfer/futures.py", line 103, in result
return self._coordinator.result()
File "/dir/venv/lib/python3.9/site-packages/s3transfer/futures.py", line 266, in result
raise self._exception
File "/dir/venv/lib/python3.9/site-packages/s3transfer/tasks.py", line 139, in __call__
return self._execute_main(kwargs)
File "/dir/venv/lib/python3.9/site-packages/s3transfer/tasks.py", line 162, in _execute_main
return_value = self._main(**kwargs)
File "/dir/venv/lib/python3.9/site-packages/s3transfer/upload.py", line 758, in _main
client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
File "/dir/venv/lib/python3.9/site-packages/botocore/client.py", line 508, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/dir/venv/lib/python3.9/site-packages/botocore/client.py", line 898, in _make_api_call
http, parsed_response = self._make_request(
File "/dir/venv/lib/python3.9/site-packages/botocore/client.py", line 921, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/dir/venv/lib/python3.9/site-packages/botocore/endpoint.py", line 119, in make_request
return self._send_request(request_dict, operation_model)
File "/dir/venv/lib/python3.9/site-packages/botocore/endpoint.py", line 198, in _send_request
request = self.create_request(request_dict, operation_model)
File "/dir/venv/lib/python3.9/site-packages/botocore/endpoint.py", line 134, in create_request
self._event_emitter.emit(
File "/dir/venv/lib/python3.9/site-packages/botocore/hooks.py", line 412, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/dir/venv/lib/python3.9/site-packages/botocore/hooks.py", line 256, in emit
return self._emit(event_name, kwargs)
File "/dir/venv/lib/python3.9/site-packages/botocore/hooks.py", line 239, in _emit
response = handler(**kwargs)
File "/dir/venv/lib/python3.9/site-packages/botocore/signers.py", line 103, in handler
return self.sign(operation_name, request)
File "/dir/venv/lib/python3.9/site-packages/botocore/signers.py", line 187, in sign
auth.add_auth(request)
File "/dir/venv/lib/python3.9/site-packages/botocore/auth.py", line 407, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
The problem is that the server is running with the wrong parameters: --default-artifact-root needs to either be removed or set to mlflow-artifacts:/ (see the quick check after the help excerpt below).
From mlflow server --help:
--default-artifact-root URI Directory in which to store artifacts for any
new experiments created. For tracking server
backends that rely on SQL, this option is
required in order to store artifacts. Note that
this flag does not impact already-created
experiments with any previous configuration of
an MLflow server instance. By default, data
will be logged to the mlflow-artifacts:/ uri
proxy if the --serve-artifacts option is
enabled. Otherwise, the default location will
be ./mlruns.
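A quick client-side check (a sketch; the experiment name is arbitrary): with --serve-artifacts enabled and no s3:// default artifact root, a new run's artifact URI should start with mlflow-artifacts:/, meaning uploads are proxied through the tracking server and no local AWS credentials are needed.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("proxied-artifacts-check")  # any new experiment name

with mlflow.start_run() as run:
    # Should print something like mlflow-artifacts:/... rather than s3://...
    print(run.info.artifact_uri)
    # Small artifact upload to exercise the proxy end to end
    mlflow.log_text("hello", "check.txt")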
I had the same problem, and the accepted answer didn't solve my issue.
Neither removing --default-artifact-root nor setting it to mlflow-artifacts:/ instead of s3 worked for me. Moreover, it gave me an error saying that since I have a remote backend-store-uri, I need to set --default-artifact-root when running the mlflow server.
How I solved it: I find the error self-explanatory. The reason it states that it was unable to find credentials is that MLflow uses boto3 underneath for all the transfers. Since I had set up my environment variables in .env, just loading the file was enough for me and solved the issue. If you have a similar scenario, run the following commands before starting your mlflow server:
set -a
source .env
set +a
This will load the environment variables and you will be good to go.
Note:
I was using remote servers for both the backend store and the artifact storage, namely Postgres and MinIO.
For a remote backend, backend-store-uri is a must; otherwise you will not be able to start up your mlflow server.
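To double-check that the shell which will start the server actually exposes those variables, here is a small sketch (assuming boto3 is installed in that environment):
import boto3

# boto3 resolves credentials from env vars, ~/.aws files, etc.; if this prints
# None, the .env values never reached the process that runs `mlflow server`.
creds = boto3.session.Session().get_credentials()
print("credentials resolved via:", creds.method if creds else None)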
The answer by @bk_ helped me. I ended up with the following command to get my tracking server running with a proxied connection for artifact storage:
mlflow server \
--backend-store-uri postgresql://postgres:postgres@postgres:5432/mlflow \
--default-artifact-root mlflow-artifacts:/ \
--serve-artifacts \
--host 0.0.0.0

gcloud 403 permission errors with wrong project

I used to work at a company and had previously set up my gcloud with gcloud init or gcloud auth login (I don't recall which one). We were using Google Container Engine (GKE).
I've since left the company and been removed from the permissions on that project.
Today, I wanted to set up a brand new App Engine project for myself, unrelated to the previous company.
Why is it that I can't run any commands without getting the error below? gcloud init, gcloud auth login, and even gcloud --help or gcloud config list all display errors. It seems like it's trying to log in to my previous company's project with gcloud container clusters, but I'm not typing that command at all, and I am in a different zone and interested in a different project. Where is my gcloud config getting these defaults?
Is this a case where I need to delete my .config/gcloud folder? That seems like a rather extreme solution just to log in to a different project.
Traceback (most recent call last):
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/gcloud.py", line 65, in <module>
main()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/gcloud.py", line 61, in main
sys.exit(googlecloudsdk.gcloud_main.main())
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/gcloud_main.py", line 130, in main
gcloud_cli = CreateCLI([])
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/gcloud_main.py", line 119, in CreateCLI
generated_cli = loader.Generate()
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 329, in Generate
cli = self.__MakeCLI(top_group)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 517, in __MakeCLI
log.AddFileLogging(self.__logs_dir)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/core/log.py", line 676, in AddFileLogging
_log_manager.AddLogsDir(logs_dir=logs_dir)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/core/log.py", line 365, in AddLogsDir
self._CleanUpLogs(logs_dir)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/core/log.py", line 386, in _CleanUpLogs
self._CleanLogsDir(logs_dir)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/core/log.py", line 412, in _CleanLogsDir
os.remove(log_file_path)
OSError: [Errno 13] Permission denied: '/Users/terence/.config/gcloud/logs/2017.07.27/19.07.37.248117.log'
And the log file:
/Users/terence/.config/gcloud/logs/2017.07.27/19.07.37.248117.log
2017-07-27 19:07:37,252 DEBUG root Loaded Command Group: ['gcloud', 'container']
2017-07-27 19:07:37,253 DEBUG root Loaded Command Group: ['gcloud', 'container', 'clusters']
2017-07-27 19:07:37,254 DEBUG root Loaded Command Group: ['gcloud', 'container', 'clusters', 'get_credentials']
2017-07-27 19:07:37,330 DEBUG root Running [gcloud.container.clusters.get-credentials] with arguments: [--project: "REMOVED_PROJECT", --zone: "DIFFERENT_ZONE", NAME: "REMOVED_CLUSTER_NAME"]
2017-07-27 19:07:37,331 INFO ___FILE_ONLY___ Fetching cluster endpoint and auth data.
2017-07-27 19:07:37,591 DEBUG root (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission for "projects/REMOVED_PROJECT/zones/DIFFERENT_ZONE/clusters/REMOVED_CLUSTER_NAME".
Traceback (most recent call last):
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 712, in Execute
resources = args.calliope_command.Run(cli=self, args=args)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 871, in Run
resources = command_instance.Run(args)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/surface/container/clusters/get_credentials.py", line 69, in Run
cluster = adapter.GetCluster(cluster_ref)
File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/lib/googlecloudsdk/api_lib/container/api_adapter.py", line 213, in GetCluster
raise api_error
HttpException: ResponseError: code=403, message=Required "container.clusters.get" permission for "projects/REMOVED_PROJECT/zones/DIFFERENT_ZONE/clusters/REMOVED_CLUSTER_NAME".
2017-07-27 19:07:37,596 ERROR root (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission for "projects/REMOVED_PROJECT/zones/DIFFERENT_ZONE/clusters/REMOVED_CLUSTER_NAME".
I had to delete my .config/gcloud folder to make this work, although I don't believe that is a good "solution".
Okay, so I'm not sure if things have changed, but I ran into a similar issue. Please try this before nuking your configuration.
gcloud supports multiple accounts and you can see what account is active by running gcloud auth list.
ACTIVE ACCOUNT
* Work-Email@company.com
Personal-Email@gmail.com
If you are not on the correct one, you can do
$ gcloud config set account Personal-Email@gmail.com
That will set the correct account. Running gcloud auth list again should now show your personal account as ACTIVE.
If you haven't authenticated with your personal account yet, you'll need to log in first. You can run gcloud auth login Personal-Email@gmail.com, follow the flow from there, and then return to the steps above.
Make sure to set the project ID and whatever else you may need when switching.
From there, I found it's STILL possible that you might not be authenticated correctly. You may need to restart your terminal session; for me, even a simple source ~/.bash_profile was sufficient. (Perhaps I needed to do this to refresh the GOOGLE_APPLICATION_CREDENTIALS environment variable, but I'm not sure.)
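If you want to see exactly which account and project the active configuration points at, without trusting the current shell state, here is a small sketch that reads the SDK's default config file (the standard macOS/Linux path; adjust if yours differs):
import configparser
import pathlib

# gcloud stores the active configuration as an INI file under ~/.config/gcloud
cfg_path = pathlib.Path.home() / ".config" / "gcloud" / "configurations" / "config_default"
cfg = configparser.ConfigParser()
cfg.read(cfg_path)
print("account:", cfg.get("core", "account", fallback="<unset>"))
print("project:", cfg.get("core", "project", fallback="<unset>"))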
Hope this helps. Try this before nuking your configuration.
Instead of deleting the whole .config/gcloud folder, try renaming or deleting just the .config/gcloud/logs folder.
This solution worked for me :)

boto3 throws an error when packaged as an RPM

I am using boto3 in my project, and when I package it as an RPM it raises an error while initializing the EC2 client.
<class 'botocore.exceptions.DataNotFoundError'>:Unable to load data for: _endpoints. Traceback -Traceback (most recent call last):
File "roboClientLib/boto/awsDRLib.py", line 186, in _get_ec2_client
File "boto3/__init__.py", line 79, in client
File "boto3/session.py", line 200, in client
File "botocore/session.py", line 789, in create_client
File "botocore/session.py", line 682, in get_component
File "botocore/session.py", line 809, in get_component
File "botocore/session.py", line 179, in <lambda>
File "botocore/session.py", line 475, in get_data
File "botocore/loaders.py", line 119, in _wrapper
File "botocore/loaders.py", line 377, in load_data
DataNotFoundError: Unable to load data for: _endpoints
Can anyone help me here? Probably boto3 requires some runtime resolution of its data files which it is not able to do inside the RPM.
I tried setting LD_LIBRARY_PATH in /etc/environment, which did not work:
export LD_LIBRARY_PATH="/usr/lib/python2.6/site-packages/boto3:/usr/lib/python2.6/site-packages/boto3-1.2.3.dist-info:/usr/lib/python2.6/site-packages/botocore:
I faced the same issue:
botocore.exceptions.DataNotFoundError: Unable to load data for: ec2/2016-04-01/service-2
I figured out that the data directory was missing. Updating botocore by running the following solved my issue:
pip install --upgrade botocore
Botocore depends on a set of service definition files that it uses to generate clients on the fly. Boto3 further depends on another set of files that it uses to generate resource clients. You will need to include these in any install of boto3 or botocore. The files need to be located in the 'data' folder at the root of the respective library.
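To check whether a packaged install can actually see those files, here is a small sketch that prints botocore's data search paths (the AWS_DATA_PATH directory at the end is a placeholder you could ship inside the RPM):
import os
import botocore.session

# The loader walks these directories looking for the _endpoints and service-2
# JSON files; if none of them contains botocore's "data" folder, client
# creation fails with DataNotFoundError.
session = botocore.session.get_session()
loader = session.get_component("data_loader")
for path in loader.search_paths:
    print(path, "(exists)" if os.path.isdir(path) else "(missing)")

# Optionally point botocore at data shipped alongside the RPM (placeholder
# path), set before any session or client is created:
# os.environ["AWS_DATA_PATH"] = "/opt/myapp/botocore-data"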
I faced a similar issue which was due to an old version of botocore. Once I updated it, it started working.
Please consider using the command below.
pip install --upgrade botocore
Also, please ensure you have set up a boto configuration profile.
Boto3 searches for credentials in the order below; a sketch of the first option follows the list.
Passing credentials as parameters in the boto3.client() method
Passing credentials as parameters when creating a Session object
Environment variables
Shared credential file (~/.aws/credentials)
AWS config file (~/.aws/config)
Assume Role provider
Boto2 config file (/etc/boto.cfg and ~/.boto)
Instance metadata service on an Amazon EC2 instance that has an IAM role configured.
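For example, the first option above looks like this (a sketch with placeholder values, not real keys):
import boto3

# Credentials passed directly to the client take precedence over environment
# variables, shared credential files, and the instance metadata service.
ec2 = boto3.client(
    "ec2",
    region_name="us-east-1",
    aws_access_key_id="AKIAEXAMPLEKEY",
    aws_secret_access_key="exampleSecretKey123",
)
print(ec2.meta.region_name)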

Unable to launch Spark cluster on AWS using the spark-ec2 script - "AWS was not able to validate the provided access credentials"

I have tried both of the commands below and set the environment variables prior to launching the script, but I am hit with an "AWS was not able to validate the provided access credentials" error. I don't think there is an issue with the keys.
I would appreciate any sort of help to fix this.
I am on ubuntu t2.micro instance.
https://spark.apache.org/docs/latest/ec2-scripts.html
export AWS_SECRET_ACCESS_KEY=
export AWS_ACCESS_KEY_ID=
./spark-ec2 -k admin-key1 -i /home/ubuntu/admin-key1.pem -s 3 launch my-spark-cluster
./spark-ec2 --key-pair=admin-key1 --identity-file=/home/ubuntu/admin-key1.pem --region=ap-southeast-2 --zone=ap-southeast-2a launch my-spark-cluster
AuthFailure
AWS was not able to validate the provided access credentials
Traceback (most recent call last):
File "./spark_ec2.py", line 1465, in <module>
main()
File "./spark_ec2.py", line 1457, in main
real_main()
File "./spark_ec2.py", line 1277, in real_main
opts.zone = random.choice(conn.get_all_zones()).name
File "/cskmohan/spark-1.4.1/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 1759, in get_all_zones
[('item', Zone)], verb='POST')
File "/cskmohan/spark-1.4.1/ec2/lib/boto-2.34.0/boto/connection.py", line 1182, in get_list
raise self.ResponseError(response.status, response.reason, body)
boto.exception.EC2ResponseError: EC2ResponseError: 401 Unauthorized
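A standalone way to check whether the exported key pair itself is valid, independent of spark-ec2 (a sketch using boto3 rather than the boto 2 bundled with the script; the region matches the one passed to the launch command):
import boto3

# Uses AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment; an
# AuthFailure here confirms the credentials themselves are being rejected.
ec2 = boto3.client("ec2", region_name="ap-southeast-2")
zones = ec2.describe_availability_zones()["AvailabilityZones"]
print([z["ZoneName"] for z in zones])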