Airflow - can't authenticate when opening spreadsheet - google-cloud-platform

I'm trying to connect to a spreadsheet and read its cells with Airflow so I can eventually add the data to a database. For now I've created a Google Sheets document and have authenticated via the Google client in my browser; however, when I run the code it fails with the following error: "User must be authenticated when user project is provided".
Details: "\[{'#type': 'type.googleapis.com/google.rpc.ErrorInfo',
'reason': 'USER_PROJECT_DENIED',
'domain': 'googleapis.com',
'metadata': {'service': 'sheets.googleapis.com',
'consumer': 'projects/User-2023-01-19'}}\]"\>
Here's the code I'm running, after authenticating via "gcloud auth application-default login":
from airflow.providers.google.suite.hooks.sheets import GSheetsHook

def get_data_from_spreadsheet():
    # No arguments, so the hook uses the default gcp_conn_id ("google_cloud_default")
    hook = GSheetsHook()
    spreadsheet = hook.get_values('-Sheets key-', 'B3')

get_data_from_spreadsheet()
I've tried running the code block above, expecting to get the data posted in the sheet, and instead get an error.
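For debugging, it can help to exercise the same credentials outside Airflow. Here is a minimal sketch, assuming the google-api-python-client package and a read-only Sheets scope ('-Sheets key-' again stands in for the real spreadsheet ID); the project it prints is the quota project that the 'consumer' field in the error refers to:

import google.auth
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']  # assumed scope

# Picks up the credentials created by "gcloud auth application-default login"
creds, project = google.auth.default(scopes=SCOPES)
print('Quota project attached to the credentials:', project)

service = build('sheets', 'v4', credentials=creds)
result = service.spreadsheets().values().get(
    spreadsheetId='-Sheets key-', range='B3').execute()
print(result.get('values', []))

If this standalone call fails with the same USER_PROJECT_DENIED error, the problem is in the local credentials rather than in Airflow.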

Related

Pytest on Flask based API - test by calling the remote API

I'm new to using Pytest on APIs. From my understanding, testing creates another instance of Flask. Additionally, the tutorials I have seen suggest creating a separate DB table instance to add, fetch and remove data for test purposes. However, I simply plan to use the remote API URL as the host and make the calls against it.
Now, I set my conftest like this, where the flag --testenv would indicate to make the get/post call on the host listed below:
import pytest

def pytest_addoption(parser):
    """Add option to pass --testenv=api_server to pytest cli command"""
    parser.addoption(
        "--testenv", action="store", default="exodemo", help="my option: type1 or type2"
    )

@pytest.fixture(scope="module")
def testenv(request):
    return request.config.getoption("--testenv")

@pytest.fixture(scope="module")
def testurl(testenv):
    if testenv == 'api_server':
        return 'http://api_url:5000/'
    else:
        return 'http://localhost:5000'
And my test file is written like this:
import json

from app import app

def test_nodes():
    t_client = app.test_client()
    truth = [
        {
            # *body* (elided in the original post)
        }
    ]
    res = t_client.get('/topology/nodes')
    print(res)
    assert res.status_code == 200
    assert truth == json.loads(res.get_data())
I run the code using this:
python3 -m pytest --testenv api_server
What I expect is that the test file simply calls the remote API with the credentials, fetches the data regardless of how it gets pulled in the remote code, and brings it back for assertion. However, I am getting a 400 BAD REQUEST error, like this:
assert 400 == 200
E + where 400 = <WrapperTestResponse streamed [400 BAD REQUEST]>.status_code
single_test.py:97: AssertionError
--------------------- Captured stdout call ----------------------
{"timestamp": "2022-07-28 22:11:14,032", "level": "ERROR", "func": "connect_to_mysql_db", "line": 23, "message": "Error connecting to the mysql database (2003, \"Can't connect to MySQL server on 'mysql' ([Errno -3] Temporary failure in name resolution)\")"}
<WrapperTestResponse streamed [400 BAD REQUEST]>
Does this mean that the test file is still trying to look up the database locally? I am also unable to figure out which host the test URL is being sent to, so I am stuck here. Looking for some help.
Thanks.
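One observation worth making concrete: app.test_client() always runs the app in-process and never contacts a deployed host, which fits the MySQL name-resolution error coming from local code. A test that genuinely calls the remote API would go through an HTTP client instead. Here is a minimal sketch, assuming the requests package and that the deployed server exposes the same route (test_nodes_remote is a hypothetical name):

from urllib.parse import urljoin

import requests

def test_nodes_remote(testurl):
    # testurl is the conftest fixture above; with --testenv=api_server it
    # points at the deployed server instead of localhost
    truth = [
        {
            # expected body, elided as in the original post
        }
    ]
    res = requests.get(urljoin(testurl, '/topology/nodes'))
    assert res.status_code == 200
    assert truth == res.json()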

Airflow PostgresToGoogleCloudStorageOperator authentication error

Is there any documentation regarding how to authenticate with Google Cloud Storage from an Airflow DAG?
I am trying to use Airflow's PostgresToGoogleCloudStorageOperator to upload the results of a query to Google Cloud Storage. However, I am getting the error below.
{gcp_api_base_hook.py:146} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
{taskinstance.py:1150} ERROR - 403 POST https://storage.googleapis.com/upload/storage/v1/b/my_test_bucket123/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
I have configured "gcp_conn_id" in the Airflow UI and provided the JSON from my service account key, but I am still getting this error.
Below is the entire DAG file:
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.postgres_to_gcs_operator import PostgresToGoogleCloudStorageOperator

DESTINATION_BUCKET = 'test'
DESTINATION_DIRECTORY = "uploads"

dag_params = {
    'dag_id': 'upload_test',
    'start_date': datetime(2020, 7, 7),
    'schedule_interval': '@once',
}

with DAG(**dag_params) as dag:
    upload = PostgresToGoogleCloudStorageOperator(
        task_id="upload",
        bucket=DESTINATION_BUCKET,
        filename=DESTINATION_DIRECTORY + "/{{ execution_date }}" + "/{}.json",
        sql='''SELECT * FROM schema.table;''',
        postgres_conn_id="postgres_default",
        gcp_conn_id="gcp_default",
    )
Sure. There is very nice documentation, with examples, for the Google connection:
https://airflow.apache.org/docs/apache-airflow-providers-google/stable/connections/gcp.html
Also GCP has this very nice overview of all the authentication methods you can use:
https://cloud.google.com/endpoints/docs/openapi/authentication-method
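As a concrete illustration of that documentation, here is one way to define the connection in code rather than in the UI, using Airflow's AIRFLOW_CONN_* environment variable mechanism. This is a sketch only: the JSON connection format needs Airflow 2.3+, the extra field names (key_path, project) match recent versions of the Google provider and may differ in older ones, and the key path and project ID are placeholders. The connection name must match the gcp_conn_id used in the DAG:

import json
import os

# "GCP_DEFAULT" maps to the conn id "gcp_default" used by the operator above
os.environ['AIRFLOW_CONN_GCP_DEFAULT'] = json.dumps({
    "conn_type": "google_cloud_platform",
    "extra": {
        "key_path": "/path/to/service-account-key.json",  # placeholder
        "project": "my-project-id",                       # placeholder
    },
})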

Dialogflow: Agent metadata not found for agentId

I'm trying to use Dialogflow's detect_intent in Python and I keep getting:
404 com.google.apps.framework.request.NotFoundException: Agent metadata not found for agentId: ####-####-####-####-####
Here's a snippet of my code:
import os

import google.cloud.dialogflow as dialogflow
from CONFIG import DIALOGFLOW_PROJECT_ID

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = 'credentials/dialogflow.json'

def predict_intent(text, language):
    session_client = dialogflow.SessionsClient()
    # SESSION_ID is defined elsewhere in the module
    session = session_client.session_path(DIALOGFLOW_PROJECT_ID, SESSION_ID)
    text_input = dialogflow.TextInput(text=text, language_code=language)
    query_input = dialogflow.QueryInput(text=text_input)
    response = session_client.detect_intent(session=session, query_input=query_input)  # ERROR
    return response.query_result.intent.display_name
I tried running the function multiple times; some calls succeed, but most raise the exception.
I can train the bot using the same interface and it works fine.
I'm using Python 3.7 and the following Google Cloud modules: google-api-core==2.0.1, google-auth==2.0.2, google-cloud-dialogflow==2.7.1, googleapis-common-protos==1.53.0.
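For reference, here is a self-contained variant of the same call with a freshly generated session ID. The project ID argument is a placeholder, and an intermittent 404 of this kind often indicates that requests are resolving to a project (or region) other than the one hosting the agent, so printing the session path can help confirm what is actually being called:

import uuid

import google.cloud.dialogflow as dialogflow

def predict_intent(project_id, text, language='en'):
    client = dialogflow.SessionsClient()
    # A fresh session id per conversation; any unique string works
    session = client.session_path(project_id, uuid.uuid4().hex)
    print('Calling session:', session)
    query_input = dialogflow.QueryInput(
        text=dialogflow.TextInput(text=text, language_code=language))
    response = client.detect_intent(session=session, query_input=query_input)
    return response.query_result.intent.display_name

print(predict_intent('my-dialogflow-project', 'hello'))  # placeholder project id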

GCP/Python: Capturing actual error in subprocess.popen() while csv import from Hive to CloudSQL

I have Python 3.6.8 on GNU/Linux (kernel 3.10) on GCP, and I'm trying to load data from Hive into Cloud SQL.
import subprocess
import sys
from shlex import quote  # assumed: "quote" in the original is shlex.quote

# Note: with an argument list and no shell, quoting is not strictly required
gc_cmd_import_csv_p1 = subprocess.Popen(['gcloud', 'sql', 'import', 'csv',
                                         '{}'.format(quote(cloudsql_instance)),
                                         '{}'.format(quote(load_csv_files)),
                                         '--database={}'.format(quote(cloudsql_db)),
                                         '--table={}'.format(quote(cloudsql_table_name)),
                                         '--user={}'.format(quote(db_user_name)),
                                         '--quiet'],
                                        stdout=subprocess.PIPE,
                                        stderr=subprocess.PIPE,
                                        universal_newlines=True)
import_cmd_op, import_cmd_error = gc_cmd_import_csv_p1.communicate()
import_cmd_return_code = gc_cmd_import_csv_p1.returncode
if import_cmd_return_code:
    print("""[ERROR] Unable to import data from Hive to CloudSQL.
    Error description: {}
    Error Code(s): {}
    Issue file name: {}
    """.format(import_cmd_error, import_cmd_return_code, load_csv_files))
    sys.exit(9)
print("[INFO] Data Import completed from HIVE to CloudSQL.")
In case of any error above, I get a message like:
Error description: ERROR: (gcloud.sql.import.csv) HTTPError 403: The client is not authorized to make this request.Error Code(s): 1
But when I actually run the same import command directly as shown below:
gcloud sql import csv test-cloud-sql-instance gs://test-server-12345/app1/data/lookup_table/000000_0 --database=test_db --table=name_lookup --user=test_user --quiet
I'm getting the actual error like below:
ERROR: (gcloud.sql.import.csv) [ERROR_RDBMS] ERROR: extra data after last expected column CONTEXT: COPY name_lookup, line 16902:
I want that message (extra data after last expected column ... line 16902) to be shown in the Python script instead of the HTTPError 403 error. How can I capture it?
Please note: there is no actual authentication issue, despite what the HTTP error suggests.
After a long discussion with the GCP admin, we found the issue.
We tried to execute the same import command using os.system() and again got the HTTP error. The admin then revisited the GCP IAM documentation and created a role for the P-SQL user. The issue is resolved now.
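A hedged debugging sketch that follows from that resolution: since the interactive shell and the script can run under different identities, printing the account and project seen by the gcloud that Python launches makes such a mismatch visible. The gcloud subcommands are standard; the rest is illustrative:

import subprocess

# Show which identity and project the gcloud launched from Python actually
# uses; a 403 from the script but not from the shell usually means they differ.
for args in (['gcloud', 'auth', 'list'],
             ['gcloud', 'config', 'get-value', 'project']):
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    print(' '.join(args))
    print(proc.stdout.strip() or proc.stderr.strip())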

How to set credentials to use Gmail API from GCE VM Command Line?

I'm trying to enable my Linux VM on GCE to access my Gmail account in order to send emails.
I've found this article, https://developers.google.com/gmail/api/quickstart/python, which reads some Gmail account information (it's useful since I just want to test my connection).
from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():
    """Shows basic usage of the Gmail API.
    Lists the user's Gmail labels.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    # Call the Gmail API
    results = service.users().labels().list(userId='me').execute()
    labels = results.get('labels', [])

    if not labels:
        print('No labels found.')
    else:
        print('Labels:')
        for label in labels:
            print(label['name'])

if __name__ == '__main__':
    main()
However, I'm not sure which credentials I need to set. When I set and use:
Service Account: I receive the following message: ValueError: Client secrets must be for a web or installed app.
OAuth client ID for Web Application type: the code runs well, but I receive the following message when trying to first authorize the application's access:
Error 400: redirect_uri_mismatch
The redirect URI in the request, http://localhost:60443/, does not match the ones authorized for the OAuth client. To update the authorized redirect URIs, visit: https://console.developers.google.com/apis/credentials/oauthclient/${your_client_id}?project=${your_project_number}
OAuth client ID for Desktop type: the code runs well, but the first authorization attempt fails with:
localhost connection has been refused
Does anyone know what the correct setup is and whether I'm missing anything?
Thanks
[Edit, Nov 17th]
After adding the Gmail scope to my VM's scopes, I ran the Python script and got the following error:
Traceback (most recent call last):
File "quickstart2.py", line 29, in <module>
main()
File "quickstart2.py", line 18, in main
results = service.users().labels().list(userId="107628233971924038182").execute()
File "/home/lucasnunesfe9/.local/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/lucasnunesfe9/.local/lib/python3.7/site-packages/googleapiclient/http.py", line 915, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://gmail.googleapis.com/gmail/v1/users/107628233971924038182/labels?alt=json returned "Precondition check failed.">
I checked the error HTTP link and it shows:
{
  "error": {
    "code": 401,
    "message": "Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
    "errors": [
      {
        "message": "Login Required.",
        "domain": "global",
        "reason": "required",
        "location": "Authorization",
        "locationType": "header"
      }
    ],
    "status": "UNAUTHENTICATED"
  }
}
Is any manual procedure needed as a first authorization step?
PS: reinforcing that I have already enabled "G Suite Domain-wide Delegation" for my service account. This action generated an OAuth 2.0 client ID, which is what the Python script is using (the userId variable).
Personally, I never understood this example. I think it's too old (it's even Python 2.6 compatible!). Anyway, you can forget the first part and get credentials like this:
from googleapiclient.discovery import build
import google.auth

# Scopes requested for the default credentials
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

def main():
    creds, project_id = google.auth.default(scopes=SCOPES)
    service = build('gmail', 'v1', credentials=creds)

    # Call the Gmail API
    results = service.users().labels().list(userId='me').execute()
    labels = results.get('labels', [])

    if not labels:
        print('No labels found.')
    else:
        print('Labels:')
        for label in labels:
            print(label['name'])

if __name__ == '__main__':
    main()
However, because you will use a service account to access your Gmail account, you have to replace the userId with the ID that you want, and grant the service account access to the Gmail API from the G Suite admin console.
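To make that concrete, here is a minimal sketch of delegated access with a downloaded service account key, assuming domain-wide delegation has been granted for the Gmail scope in the admin console; the key path and mailbox address are placeholders:

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

# Placeholder key file; the service account must have domain-wide delegation
creds = service_account.Credentials.from_service_account_file(
    'service-account-key.json', scopes=SCOPES)

# Impersonate the mailbox to read; this is what the delegation grant authorizes
delegated = creds.with_subject('someone@your-domain.com')

service = build('gmail', 'v1', credentials=delegated)
labels = service.users().labels().list(userId='me').execute()
print([label['name'] for label in labels.get('labels', [])])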
Separately, for the VM's default credentials to carry the right scope, you need to add it to your VM's service account scopes. Stop the VM, run this (long) command, and start your VM again. The command:
Takes the current scopes of your VM (and cleans/formats them)
Adds the gmail scope
Sets the scopes on the VM (it's a BETA command)
gcloud beta compute instances set-scopes <YOUR_VM_NAME> --zone <YOUR_VM_ZONE> \
--scopes=https://www.googleapis.com/auth/gmail.readonly,\
$(gcloud compute instances describe <YOUR_VM_NAME> --zone <YOUR_VM_ZONE> \
--format json | jq -c ".serviceAccounts[0].scopes" | sed -E "s/\[(.*)\]/\1/g" | sed -E "s/\"//g")
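Once the VM is back up, a quick way to confirm the change took effect is to ask the metadata server, from inside the VM, for the scopes attached to the default service account; a small sketch assuming the requests package:

import requests

# Lists the scopes of the VM's default service account (run inside the VM)
resp = requests.get(
    'http://metadata.google.internal/computeMetadata/v1/'
    'instance/service-accounts/default/scopes',
    headers={'Metadata-Flavor': 'Google'})
print(resp.text)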