I have a simple pipeline that worked fine 3 weeks ago, but when I went back to the code to enhance it and tried to run it again, it returned the following error:
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 10.280192286716229 seconds before retrying exists because we caught exception: TypeError: string indices must be integers
Traceback for above exception (most recent call last):
I am running the Dataflow script via Cloud Shell on Google Cloud Platform, simply by executing python3 <dataflow.py>.
The code is as follows; it used to submit the job to Dataflow without issue:
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam import coders

pipeline_args = [
    '--runner=DataflowRunner',
    '--job_name=my-job-name',
    '--project=my-project-id',
    '--region=europe-west2',
    '--temp_location=gs://mybucket/temp',
    '--staging_location=gs://mybucket/staging'
]

options = PipelineOptions(pipeline_args)
p = beam.Pipeline(options=options)

rows = (
    p | 'Read daily Spot File' >> beam.io.ReadFromText(
        file_pattern='gs://bucket/filename.gz',
        compression_type='gzip',
        coder=coders.BytesCoder(),
        skip_header_lines=0))

p.run()
Any advice as to why this has started happening would be greatly appreciated.
Thanks in advance.
I'm not sure about the original issue, but I can speak to Usman's post, which seems to describe an issue I ran into myself.
The Python client libraries don't authenticate via gcloud auth; they use the GOOGLE_APPLICATION_CREDENTIALS environment variable instead. So before you run the Python command to launch the Dataflow job, you will need to set that environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key"
More info on setting up the environment variable: https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable
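If you prefer to keep everything in the launcher script, the same variable can also be set from Python before the pipeline is constructed (a minimal sketch; the key path is a placeholder):

import os

# Placeholder path to a service account key with the roles described below.
# This must run before Beam first looks up credentials, i.e. before
# PipelineOptions is built and p.run() is called.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/key.json'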
Then you'll have to make sure that the account you set up has the necessary permissions in your GCP project.
Permissions and service accounts:

User service account or user account: it needs the Dataflow Admin role at the project level and must be able to act as the worker service account (source: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker_service_account).

Worker service account: there will be one worker service account per Dataflow pipeline. This account needs the Dataflow Worker role at the project level plus the necessary permissions on the resources accessed by the Dataflow pipeline (source: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#worker_service_account). Example: if the Dataflow pipeline's input is a Pub/Sub topic and its output is a BigQuery table, the worker service account needs read access to the topic as well as write permission on the BQ table. A way to point a job at a specific worker service account is sketched after this list.

Dataflow service account: this is the account that gets created automatically when you enable the Dataflow API in a project. It automatically gets the Dataflow Service Agent role at the project level (source: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#service_account).
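For reference, a minimal sketch of selecting a specific worker service account via Beam's service_account_email pipeline option (the account and project names are placeholders):

pipeline_args = [
    '--runner=DataflowRunner',
    '--project=my-project-id',
    '--region=europe-west2',
    '--temp_location=gs://mybucket/temp',
    # Placeholder: a worker service account with the Dataflow Worker role
    # plus access to the pipeline's sources and sinks.
    '--service_account_email=my-worker-sa@my-project-id.iam.gserviceaccount.com',
]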
I ran into the same kind of error while following this [tutorial][1]:
python3 PubSubToGCS.py \
--project=mango-s-11-a81277cf \
--region=us-east1 \
--input_topic=projects/mango-s-11-a81277cf/topics/pubsubtest \
--output_path=gs://mangobucket001/samples/output \
--runner=DataflowRunner \
--window_size=2 \
--num_shards=2 \
--temp_location=gs://mangobucket001/temp
INFO:apache_beam.runners.portability.stager:Downloading source distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/tmpaelpxw64', 'apache-beam==2.33.0', '--no-deps', '--no-binary', ':all:']
INFO:apache_beam.runners.portability.stager:Staging SDK sources from PyPI: dataflow_python_sdk.tar
INFO:apache_beam.runners.portability.stager:Downloading binary distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/tmpaelpxw64', 'apache-beam==2.33.0', '--no-deps', '--only-binary', ':all:', '--python-version', '38', '--implementation', 'cp', '--abi', 'cp38', '--platform', 'manylinux1_x86_64']
INFO:apache_beam.runners.portability.stager:Staging binary distribution of the SDK from PyPI: apache_beam-2.33.0-cp38-cp38-manylinux1_x86_64.whl
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter.
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.33.0
INFO:root:Using provided Python SDK container image: gcr.io/cloud-dataflow/v1beta3/python38-fnapi:2.33.0
INFO:root:Python SDK container image set to "gcr.io/cloud-dataflow/v1beta3/python38-fnapi:2.33.0" for Docker environment
INFO:apache_beam.runners.dataflow.internal.apiclient:Defaulting to the temp_location as staging_location: gs://mangobucket001/temp
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:apache_beam.runners.dataflow.internal.apiclient:Starting GCS upload to gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658/pickled_main_session...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 3.054787201157167 seconds before retrying _gcs_file_copy because we caught exception: OSError: Could not upload to GCS path gs://mangobucket001/temp/beamapp-maan-0214122440-788240.1644841480.788658: access denied. Please verify that credentials are valid and that you have write access to the specified path.
Traceback for above exception (most recent call last):
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 559, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/maan/.local/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 642, in stage_file
raise IOError((
(the same GCS upload attempt, access-denied warning and traceback then repeat with exponential backoff waits of roughly 5, 19, 39, 61 and 122 seconds)
[1]: https://cloud.google.com/pubsub/docs/pubsub-dataflow
Related
I used to be able to use Secret Manager to get secrets from my GCP account in Google Colab.
Now, whenever I try to run the following code:
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
name = f"projects/my_project_here/secrets/my_secret_name_here/versions/latest"
response = client.access_secret_version(request={"name": name})
I get the following error over and over:
ERROR:grpc._plugin_wrapping:AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x7fb313b41850>" raised exception!
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/credentials.py", line 111, in refresh
self._retrieve_info(request)
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/credentials.py", line 88, in _retrieve_info
request, service_account=self._service_account_email
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/_metadata.py", line 234, in get_service_account_info
return get(request, path, params={"recursive": "true"})
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/_metadata.py", line 187, in get
response,
google.auth.exceptions.TransportError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7fb313b9c290>)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/grpc/_plugin_wrapping.py", line 90, in __call__
context, _AuthMetadataPluginCallback(callback_state, callback))
File "/usr/local/lib/python3.7/dist-packages/google/auth/transport/grpc.py", line 101, in __call__
callback(self._get_authorization_headers(context), None)
File "/usr/local/lib/python3.7/dist-packages/google/auth/transport/grpc.py", line 88, in _get_authorization_headers
self._request, context.method_name, context.service_url, headers
File "/usr/local/lib/python3.7/dist-packages/google/auth/credentials.py", line 133, in before_request
self.refresh(request)
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/credentials.py", line 117, in refresh
six.raise_from(new_exc, caught_exc)
File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7fb313b9c290>)
How can I see which identity I am running as in Google Colab, and then add that identity in GCP so I can fetch these secrets again?
I know I can go to GCP -> Secret Manager -> My_Secret -> Permissions -> +Grant Access, but I don't know 1) who to add, or 2) why this permission changed on its own with no intervention on anyone's end.
Both were originally running under my email (and they still are), so this worked without me ever having to touch secret access, because access was granted via the App Engine default service account <-> Secret Manager Secret Accessor binding.
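One way to take the Colab metadata server (the source of the 404 above) out of the equation is to pass explicit service account credentials to the client, the same pattern used in the BigQuery answer further down. A sketch, assuming you have a key file for an account with the Secret Manager Secret Accessor role; the path and names are placeholders:

from google.cloud import secretmanager
from google.oauth2 import service_account

# Placeholder key file for a service account that has been granted
# the Secret Manager Secret Accessor role on the secret.
credentials = service_account.Credentials.from_service_account_file(
    "/content/drive/MyDrive/my-key.json")

client = secretmanager.SecretManagerServiceClient(credentials=credentials)
name = "projects/my_project_here/secrets/my_secret_name_here/versions/latest"
response = client.access_secret_version(request={"name": name})
print(response.payload.data.decode("utf-8"))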
I'm trying to run the Google Cloud Vision quickstart guide, but when calling response = client.label_detection(image=image) I get the following error:
ERROR:grpc._plugin_wrapping:AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x7fc1f3f66350>" raised exception!
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/credentials.py", line 111, in refresh
self._retrieve_info(request)
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/credentials.py", line 88, in _retrieve_info
request, service_account=self._service_account_email
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/_metadata.py", line 234, in get_service_account_info
return get(request, path, params={"recursive": "true"})
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/_metadata.py", line 187, in get
response,
google.auth.exceptions.TransportError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7fc1fb046910>)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/grpc/_plugin_wrapping.py", line 90, in __call__
context, _AuthMetadataPluginCallback(callback_state, callback))
File "/usr/local/lib/python3.7/dist-packages/google/auth/transport/grpc.py", line 101, in __call__
callback(self._get_authorization_headers(context), None)
File "/usr/local/lib/python3.7/dist-packages/google/auth/transport/grpc.py", line 88, in _get_authorization_headers
self._request, context.method_name, context.service_url, headers
File "/usr/local/lib/python3.7/dist-packages/google/auth/credentials.py", line 133, in before_request
self.refresh(request)
File "/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/credentials.py", line 117, in refresh
six.raise_from(new_exc, caught_exc)
File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7fc1fb046910>)
---------------------------------------------------------------------------
_InactiveRpcError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
71 try:
---> 72 return callable_(*args, **kwargs)
73 except grpc.RpcError as exc:
7 frames
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7fc1fb046910>)"
debug_error_string = "UNKNOWN:Error received from peer vision.googleapis.com:443 {grpc_message:"Getting metadata from plugin failed with error: (\"Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\\nb\'\'\", <google.auth.transport.requests._Response object at 0x7fc1fb046910>)", grpc_status:14, created_time:"2022-09-13T14:48:30.187999422+00:00"}"
>
The above exception was the direct cause of the following exception:
ServiceUnavailable Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
72 return callable_(*args, **kwargs)
73 except grpc.RpcError as exc:
---> 74 raise exceptions.from_grpc_error(exc) from exc
75
76 return error_remapped_callable
ServiceUnavailable: 503 Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7fc1fb046910>)
I'm executing the example code in a Google Colab notebook. Here's my code:
#connect to google drive
from google.colab import drive
drive.mount('/content/drive')
#download and extract google cloud client
!curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-401.0.0-linux-x86_64.tar.gz
!tar -xf google-cloud-cli-401.0.0-linux-x86_64.tar.gz
#set credentials environment variable
!export GOOGLE_APPLICATION_CREDENTIALS=/content/drive/MyDrive/imagelabeling-1663076432940-26cfebd304cf.json
#install and init google cloud
!./google-cloud-sdk/install.sh --usage-reporting False --quiet
!./google-cloud-sdk/bin/gcloud init
#install google-cloud-vision libraries
!pip install --upgrade google-cloud-vision
#download cat image
!wget https://raw.githubusercontent.com/googleapis/python-vision/master/samples/snippets/quickstart/resources/wakeupcat.jpg
#run labeling code
import io
import os
# Imports the Google Cloud client library
from google.cloud import vision
# Instantiates a client
client = vision.ImageAnnotatorClient()
# The name of the image file to annotate
file_name = os.path.abspath('/content/wakeupcat.jpg')
# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = vision.Image(content=content)
# Performs label detection on the image file
response = client.label_detection(image=image)
labels = response.label_annotations
print('Labels:')
for label in labels:
    print(label.description)
During !./google-cloud-sdk/bin/gcloud init I select "Re-initialize this configuration [default] with new settings", then log in with my Google username by clicking the web link and copy-pasting the auth code. After that I select the test project I've set up in the console.
I have confirmed that billing is enabled for this test project. I have also enabled the Vision API for the project, created a service account with the 'Owner' role, and created a JSON key file, which is located at the mounted Google Drive path `/content/drive/MyDrive/imagelabeling-1663076432940-26cfebd304cf.json`
Did I miss something?
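One detail worth double-checking (an assumption on my part): in Colab each ! command runs in its own subshell, so !export does not change the environment of the Python kernel that later creates the Vision client. Setting the variable from Python keeps it visible to the client library, for example:

import os

# Same key file referenced in the question (path unchanged).
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = (
    '/content/drive/MyDrive/imagelabeling-1663076432940-26cfebd304cf.json')

from google.cloud import vision
client = vision.ImageAnnotatorClient()  # picks up the credentials set above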
I am trying to schedule a query using the BigQuery Data Transfer API. I have granted the required bigquery.admin permission and enabled the BigQuery Data Transfer API.
Permission Documentation:
https://cloud.google.com/bigquery-transfer/docs/enable-transfer-service
I also tried giving the service account the project Owner permission, but it still gives the same error.
Code Documentation: (Setting up a scheduled query with a service account)
https://cloud.google.com/bigquery/docs/scheduling-queries
The part where the error occurs:
transfer_config = transfer_client.create_transfer_config(
    bigquery_datatransfer.CreateTransferConfigRequest(
        parent=parent,
        transfer_config=transfer_config,
        service_account_name=service_account_name,
    )
)
Error StackTrace
Traceback (most recent call last):
File "/home/ubuntu/prod/venv_trellai/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 73, in error_remapped_callable
return callable_(*args, **kwargs)
File "/home/ubuntu/prod/venv_trellai/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/ubuntu/prod/venv_trellai/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "The caller does not have permission"
debug_error_string = "{"created":"#1633536014.842657676","description":"Error received from peer ipv4:142.250.192.138:443","file":"src/core/lib/surface/call.cc","file_line":1070,"grpc_message":"The caller does not have permission","grpc_status":7}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "__main__.py", line 728, in <module>
mbc.schedule_query()
File "/home/ubuntu/prod/trell-ds-framework/data_engineering/data_migration/schedule_quries.py", line 62, in schedule_query
service_account_name=service_account_name,
File "/home/ubuntu/prod/venv_trellai/lib/python3.6/site-packages/google/cloud/bigquery_datatransfer_v1/services/data_transfer_service/client.py", line 647, in create_transfer_config
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/home/ubuntu/prod/venv_trellai/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/home/ubuntu/prod/venv_trellai/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.PermissionDenied: 403 The caller does not have permission
The service account (key file) has all of these roles:
BigQuery Admin
BigQuery Data Transfer Service Agent
Service Account Token Creator
Storage Admin
I am already setting the JSON authentication credentials via the environment variable, but it still gives a permission error.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = Constants.BIG_QUERY_SERVICE_ACCOUNT_CRED
Can anyone help me out here? Thanks in advance.
Take a look at this page on authentication: https://cloud.google.com/bigquery/docs/authentication/service-account-file#python
Assuming you're using a service account, you can provide the credentials explicitly to confirm they work as expected:
from google.cloud import bigquery
from google.oauth2 import service_account

# TODO(developer): Set key_path to the path to the service account key file.
# key_path = "path/to/service_account.json"

credentials = service_account.Credentials.from_service_account_file(
    key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

client = bigquery.Client(credentials=credentials, project=credentials.project_id,)
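The same pattern should carry over to the Data Transfer client from the question; a sketch under that assumption, reusing key_path from above:

from google.cloud import bigquery_datatransfer
from google.oauth2 import service_account

# key_path as above: a key for a service account with the BigQuery Admin and
# Data Transfer roles, belonging to the project the query is scheduled in.
credentials = service_account.Credentials.from_service_account_file(
    key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

transfer_client = bigquery_datatransfer.DataTransferServiceClient(
    credentials=credentials)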
I would recommend checking whether the service account you are using belongs to the project you are working in and has all the permissions needed to schedule the query. My best guess is that the service account is pointing to another project.
Also, the service account needs one extra role: Service Account Token Creator.
I'm trying to use s3fs in Python to connect to an S3 bucket. The associated credentials are saved in a profile called 'pete' in ~/.aws/credentials:
[default]
aws_access_key_id=****
aws_secret_access_key=****
[pete]
aws_access_key_id=****
aws_secret_access_key=****
This seems to work in AWS CLI (on Windows):
$>aws s3 ls s3://my-bucket/ --profile pete
PRE other-test-folder/
PRE test-folder/
But I get a permission denied error when I run what should be equivalent code using the s3fs package in Python:
import s3fs
import requests
s3 = s3fs.core.S3FileSystem(profile = 'pete')
s3.ls('my-bucket')
I get this error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 504, in _lsdir
async for i in it:
File "C:\ProgramData\Anaconda3\lib\site-packages\aiobotocore\paginate.py", line 32, in __anext__
response = await self._make_request(current_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\aiobotocore\client.py", line 154, in _make_api_call
raise error_class(parsed_response, operation_name)
ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<ipython-input-9-4627a44a7ac3>", line 5, in <module>
s3.ls('ma-baseball')
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 993, in ls
files = maybe_sync(self._ls, self, path, refresh=refresh)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 97, in maybe_sync
return sync(loop, func, *args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 68, in sync
raise exc.with_traceback(tb)
File "C:\ProgramData\Anaconda3\lib\site-packages\fsspec\asyn.py", line 52, in f
result[0] = await future
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 676, in _ls
return await self._lsdir(path, refresh)
File "C:\ProgramData\Anaconda3\lib\site-packages\s3fs\core.py", line 527, in _lsdir
raise translate_boto_error(e) from e
PermissionError: Access Denied
I have to assume it's not a config issue within s3 because I can access s3 through the CLI. So something must be off with my s3fs code, but I can't find a whole lot of documentation on profiles in s3fs to figure out what's going on. Any help is of course appreciated.
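Two common ways to make sure the 'pete' profile is actually picked up by the underlying botocore session, sketched under the assumption that nothing else is wrong with the profile itself (older s3fs releases used profile_name instead of profile, so the keyword depends on your version):

import os
import s3fs

# Option 1: let botocore select the profile via the environment.
os.environ["AWS_PROFILE"] = "pete"
s3 = s3fs.S3FileSystem()
print(s3.ls("my-bucket"))

# Option 2: pass the profile explicitly (profile_name on older s3fs versions).
s3 = s3fs.S3FileSystem(profile="pete")
print(s3.ls("my-bucket"))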
I'm trying to execute the following Python code:
import logging
import sys
import docker, boto3
from base64 import b64decode
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
LOCAL_REPOSITORY = '111111111111.dkr.ecr.us-east-1.amazonaws.com/my_repo:latest'
image = '111111111111.dkr.ecr.us-east-1.amazonaws.com/my_repo'
ecr_registry, _ = image.split('/', 1)
client = docker.from_env()
# Get login credentials from AWS for the ECR registry.
ecr = boto3.client('ecr')
response = ecr.get_authorization_token()
token = b64decode(response['authorizationData'][0]['authorizationToken'])
username, password = token.decode('utf-8').split(':', 1)
# Log in to the ECR registry with Docker.
client.login(username, password, registry=ecr_registry)
logging.info("loggined")
client.images.pull(image, auth_config={
    username: username,
    password: password
})
And got exception:
C:\myPath>python app/pull_example.py
INFO:botocore.credentials:Found credentials in environment variables.
INFO:root:loggined
Traceback (most recent call last):
File "C:\Python3\lib\site-packages\docker\api\client.py", line 261, in _raise_for_status
response.raise_for_status()
File "C:\Python3\lib\site-packages\requests\models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localnpipe/v1.35/images/create?fromImage=111111111111.dkr.ecr.us-east-1.amazonaws.com%2Fmy_repo
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "app/pull_example.py", line 41, in <module>
password: password
File "C:\Python3\lib\site-packages\docker\models\images.py", line 445, in pull
repository, tag=tag, stream=True, **kwargs
File "C:\Python3\lib\site-packages\docker\api\image.py", line 415, in pull
self._raise_for_status(response)
File "C:\Python3\lib\site-packages\docker\api\client.py", line 263, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "C:\Python3\lib\site-packages\docker\errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("Get https://111111111111.dkr.ecr.us-east-1.amazonaws.com/v2/my_repo/tags/list: no basic auth credentials")
What is the problem? Why can't I pull the image even after the client.login call, which completes without any exceptions? What is the correct way to log in and pull an image from an ECR repository with docker-py?
This happened due to https://github.com/docker/docker-py/issues/2157
Deleting ~/.docker/config.json fixed the issue.
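A side note on the snippet in the question: docker-py documents auth_config as a dict with the literal keys 'username' and 'password', while the question's dict uses the variable values as keys. A minimal sketch of the pull call with quoted keys (image, username and password obtained exactly as in the question):

# Assumption: image, username and password come from the code in the question.
client.images.pull(image, auth_config={
    'username': username,
    'password': password,
})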