Does Google CloudML Serving allow connecting to NoSQL DBs e.g. Datastore? - google-cloud-ml

We have a model currently being served on Cloud ML. As a modification, we added connections to Datastore, which return 403 Insufficient privileges.
The mock code generating the error is:
from google.cloud import datastore
import datetime

# create & upload task
client = datastore.Client()
key = client.key('Task')
task = datastore.Entity(
    key, exclude_from_indexes=['description'])
task.update({
    'created': datetime.datetime.utcnow(),
    'description': 'description',
    'done': False
})
client.put(task)

# now list tasks
query = client.query(kind='Task')
query.order = ['created']
return list(query.fetch())
The next step would be adding credentials (a service account) and exporting the new key path via the GOOGLE_APPLICATION_CREDENTIALS environment variable. However, since obtaining this account is difficult (company layering), I'd like to save time by asking the question first.
Is going through Cloud Functions the only way to communicate with a NoSQL DB? Is that the common approach?

When you create your model version, you need a custom prediction routine, and you define a service account that has access to your resources:
gcloud components install beta
gcloud beta ai-platform versions create your-version-name \
    --service-account your-service-account-name@your-project-id.iam.gserviceaccount.com
...
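For reference, a minimal sketch of what such a custom prediction routine could look like, with the Datastore client created inside the predictor so it runs under the service account attached to the version; the module, class and entity names here are hypothetical:
# predictor.py -- hypothetical module, referenced via --prediction-class predictor.DatastorePredictor
import os
import pickle

from google.cloud import datastore


class DatastorePredictor(object):
    """Custom prediction routine that can also read/write Datastore."""

    def __init__(self, model):
        self._model = model
        # Uses the service account passed via --service-account; no key file needed.
        self._datastore = datastore.Client()

    def predict(self, instances, **kwargs):
        predictions = self._model.predict(instances)
        # Example side effect: record each request as a 'Task' entity.
        entity = datastore.Entity(self._datastore.key('Task'))
        entity.update({'n_instances': len(instances)})
        self._datastore.put(entity)
        return predictions.tolist()

    @classmethod
    def from_path(cls, model_dir):
        with open(os.path.join(model_dir, 'model.pkl'), 'rb') as f:
            return cls(pickle.load(f))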

Related

Try deploying a custom ML model with an endpoint by using a custom image on Google Cloud's Vertex AI

I have been banging my head against this for a while, and Google Cloud does not have much documentation about this issue. What I am trying to do is deploy a custom ML model on Google Cloud Vertex AI by:
Uploading the model to the Model Registry in Vertex AI
Creating an endpoint
Deploying the uploaded model to the created endpoint
Steps 1 and 2 are easy to implement, and I am not facing any issues. However, step 3 always fails for some reason, and even the logs don't give me much information.
For Step 1:
This is the Dockerfile I am using to create a custom image to serve my ML model:
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
COPY requirements-base.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
COPY serve.py serve.py
COPY model.pkl model.pkl
And this is what my serve.py file looks like:
from fastapi import Request, FastAPI, Response
import json
import catboost
import pickle
import os

app = FastAPI(title="Sentiment Analysis")

AIP_HEALTH_ROUTE = os.environ.get('AIP_HEALTH_ROUTE', '/health')
AIP_PREDICT_ROUTE = os.environ.get('AIP_PREDICT_ROUTE', '/predict')

@app.get(AIP_HEALTH_ROUTE, status_code=200)
async def health():
    return {'health': 'ok'}

@app.post(AIP_PREDICT_ROUTE)
async def predict(request: Request):
    with open('model.pkl', 'rb') as file:
        model = pickle.load(file)
    data = request.get_json()
    input_data = data['input']
    predictions = model.predict(input_data)
    return json.dumps({'predictions': predictions.tolist()})

if __name__ == '__main__':
    app.run(debug=True, host="0.0.0.0", port=8080)
After building the image, I push it to Artifact Registry on Google Cloud.
Is there an issue with how I have written the serve.py file or the Dockerfile?
Or is there an easier way to deploy custom ML models on Google Cloud for MLOps and prediction purposes?
I tried a couple of manual approaches from the Google Cloud Vertex AI console and also using gcloud commands.
In the manual process, after importing the model with the custom image, I clicked Deploy to an endpoint. But this seems to always fail and takes forever.
Similarly, using gcloud, I first create the endpoint, then upload my model to the registry, and then deploy the model to the created endpoint. But this approach also fails.
At the end of the day I want my model to be successfully deployed on the endpoint and to give the right answers for predictions. Or, somehow, host my custom ML model on Google Cloud and make predictions with it in a reasonable and manageable way!
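For what it's worth, the same three steps can also be scripted with the google-cloud-aiplatform Python SDK; a minimal sketch, where the project, region, image URI and display names are placeholders:
from google.cloud import aiplatform

# Placeholders -- replace with your own project, region and image.
aiplatform.init(project="my-project", location="us-central1")

# 1. Upload the model, pointing Vertex AI at the custom serving container.
model = aiplatform.Model.upload(
    display_name="sentiment-analysis",
    serving_container_image_uri="us-central1-docker.pkg.dev/my-project/my-repo/sentiment:latest",
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    serving_container_ports=[8080],
)

# 2. Create an endpoint.
endpoint = aiplatform.Endpoint.create(display_name="sentiment-endpoint")

# 3. Deploy the model to the endpoint; this is the step where container and
#    health-check problems usually surface in the deployment logs.
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=1,
)

# Smoke test once deployment succeeds (payload shape must match what serve.py expects).
print(endpoint.predict(instances=[["some text to score"]]))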

Creating REST API in GCP to read data from BigQuery

I am very new to Google Cloud Platform, hence this basic question.
I am looking for an API which will be hosted in GCP. An external application will call the API to read data from BigQuery.
Can anyone help me out with any example code/approach?
I am looking for an end-to-end, cloud-based solution based on Python.
I can't provide you with a complete code example, but:
You can set up your Python API (using Flask, for example)
You can then use the Python client to connect to BigQuery: https://cloud.google.com/bigquery/docs/reference/libraries
Deploy your Python API to Google App Engine, Cloud Run, Kubernetes, Compute Engine, etc.
Do not forget to set up CORS and potentially auth.
That's it (a minimal Flask sketch follows below).
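A minimal sketch of that setup, using Flask plus the BigQuery client; the route name and query below are placeholders:
from flask import Flask, jsonify
from google.cloud import bigquery

app = Flask(__name__)
bq_client = bigquery.Client()


@app.route("/names", methods=["GET"])
def names():
    # Placeholder query against a public dataset -- swap in your own table.
    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 10
    """
    rows = bq_client.query(query).result()
    return jsonify([dict(row) for row in rows])


if __name__ == "__main__":
    # Local testing only; behind App Engine / Cloud Run use gunicorn or similar.
    app.run(host="0.0.0.0", port=8080)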
You can create a Python program using the BigQuery client, then deploy this program as an HTTP Cloud Function or a Cloud Run service:
from flask import escape
from google.cloud import bigquery
import functions_framework
import json

@functions_framework.http
def your_http_function(request):
    # HTTP Cloud Function.
    request_json = request.get_json(silent=True)
    request_args = request.args

    # example of retrieving an argument param from the HTTP call
    if request_json and 'name' in request_json:
        name = request_json['name']
    elif request_args and 'name' in request_args:
        name = request_args['name']

    # Construct a BigQuery client object.
    client = bigquery.Client()

    query = """
        SELECT name, SUM(number) as total_people
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE state = 'TX'
        GROUP BY name, state
        ORDER BY total_people DESC
        LIMIT 20
    """
    query_job = client.query(query)  # Make an API request.
    rows = query_job.result()  # Waits for the query to finish.

    results = [dict(row) for row in rows]
    for row in results:
        print(row['name'])

    # Return a JSON-serializable response instead of the raw RowIterator.
    return json.dumps(results, default=str)
In this example you deploy your Python code as a Cloud Function.
Your function can then be invoked with an HTTP call passing a name parameter:
https://GCP_REGION-PROJECT_ID.cloudfunctions.net/hello_http?name=NAME
You can also use Cloud Run, which gives more flexibility because you deploy a Docker image.

'403 Permission denied while getting Drive credentials' when using Deployment Manager to create an 'external table' in BigQuery

Steps to reproduce:
Create a sheet in Google Sheets
Enable the Deployment Manager & Google Drive APIs in Google Cloud Platform
Add the Deployment Manager service account with view permissions on the sheet
Create a dataset with Deployment Manager
Create a table with Deployment Manager, referencing the external sheet in sourceUris
Partial Python template:
def GenerateConfig(context):
    name: str = context.env['name']
    dataset: str = context.properties['dataset']
    tables: list = context.properties['tables']
    location: str = context.properties.get('location', 'EU')

    resources = [{
        'name': name,
        'type': 'gcp-types/bigquery-v2:datasets',
        'properties': {
            'datasetReference': {
                'datasetId': dataset,
            },
            'location': location
        },
    }]

    for t in tables:
        resources.append({
            'name': '{}-tbl'.format(t["name"]),
            'type': 'gcp-types/bigquery-v2:tables',
            'properties': {
                'datasetId': dataset,
                'tableReference': {
                    'tableId': t["name"]
                },
                'externalDataConfiguration': {
                    'sourceUris': ['https://docs.google.com/spreadsheets/d/123123123123123-123123123123/edit?usp=sharing'],
                    'sourceFormat': 'GOOGLE_SHEETS',
                    'autodetect': True,
                    'googleSheetsOptions': {
                        'skipLeadingRows': '1',
                    }
                }
            },
        })

    return {'resources': resources}
I've found a few leads such as this one, but they all reference using 'scopes' to add https://www.googleapis.com/auth/drive.
I'm not sure how to add scopes to a Deployment Manager request, or really how scopes work.
Any help would be appreciated.
Yes, using scopes solves the problem. However, even after adding the scopes, I was facing the same error. Sharing the Google Sheets document with the GCP service account helped me get rid of this error.
To summarize: use scopes and share the document with the GCP service account that you will use for querying the table.
Also, this document is helpful for querying external tables.
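On the querying side (outside Deployment Manager), the same two requirements show up: the credentials need the Drive scope in addition to BigQuery, and the sheet must be shared with that identity. A minimal sketch, assuming the external table already exists:
import google.auth
from google.cloud import bigquery

# Request both the BigQuery and Drive scopes, since the external table reads a Sheet.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",
    ]
)
client = bigquery.Client(credentials=credentials, project=project)

# Placeholder dataset/table names.
for row in client.query("SELECT * FROM `my_dataset.my_sheet_table` LIMIT 10").result():
    print(dict(row))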
I was having the same issue when running Airflow DAGs on Cloud Composer, which is the managed Airflow service on Google Cloud Platform.
Essentially you need to:
Share the file with the email of the service account (give Viewer or Editor permissions based on what the DAG is supposed to execute)
Enable Google Drive OAuth Scopes
Depending on the Cloud Composer version you are using, the second step should be executed in a slightly different way:
For Cloud Composer 1
You should add the Google Drive OAuth Scope through the User Interface:
"https://www.googleapis.com/auth/drive"
Alternatively, if you are using Infrastructure as Code (e.g. Terraform), you can specify oauth_scopes as shown below:
config {
  ...
  node_config {
    ...
    oauth_scopes = [
      "https://www.googleapis.com/auth/drive",
    ]
  }
}
For Cloud Composer 2
Since Cloud Composer v2 uses GKE Autopilot, it does not support OAuth scopes at the environment level. You can, however, specify the scope at the level of the connection that your Airflow operator uses to initiate the connection.
If you are using the default GCP connection (i.e. google_cloud_default, which is automatically created upon deployment of the Cloud Composer instance), then all you need to do is add Google Drive ("https://www.googleapis.com/auth/drive") to the scopes of the connection (through the Airflow Connections UI).
Alternatively, you can create a new connection, again specify Google Drive in its scopes field, and then pass the name of this connection in the gcp_conn_id argument of your operator.
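For illustration, a sketch of passing such a Drive-scoped connection to an operator; the connection id, project, dataset and table names are placeholders:
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="sheet_backed_table_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # 'google_cloud_drive_scoped' is a hypothetical connection whose Scopes field
    # includes https://www.googleapis.com/auth/drive.
    query_sheet_table = BigQueryInsertJobOperator(
        task_id="query_sheet_backed_table",
        gcp_conn_id="google_cloud_drive_scoped",
        configuration={
            "query": {
                "query": "SELECT * FROM `my_project.my_dataset.my_sheet_table`",
                "useLegacySql": False,
            }
        },
    )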

How can I grant individual permissions in Google Cloud Platform for BigQuery users using python

I need to set up very fine-grained access control for user accounts in GCP using a Python script.
I know that via the UI or the gcloud utility I can grant the role roles/bigquery.user, but it comes with a lot of other permissions I don't want this service account to have.
How can I grant individual permissions via a Python script?
Go to the BigQuery console, click the arrow at the right of a dataset and then click Share dataset.
Then add the e-mail of the user there.
You can choose one of the 3 roles available: Viewer/Owner/Editor.
Do this in every dataset for every user.
Update to do it via Python script
You can do it with a Python script following this small tutorial.
The code will be something like:
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset(client.dataset('dataset1'))

entry = bigquery.AccessEntry(
    role='READER',
    entity_type='userByEmail',
    entity_id='user1@example.com')

assert entry not in dataset.access_entries
entries = list(dataset.access_entries)
entries.append(entry)
dataset.access_entries = entries

dataset = client.update_dataset(dataset, ['access_entries'])  # API request

# assert entry in dataset.access_entries

Google Cloud Composer and Google Cloud SQL

What ways do we have available to connect to a Google Cloud SQL (MySQL) instance from the newly introduced Google Cloud Composer? The intention is to get data from a Cloud SQL instance into BigQuery (perhaps with an intermediary step through Cloud Storage).
Can the Cloud SQL proxy be exposed in some way on pods that are part of the Kubernetes cluster hosting Composer?
If not, can the Cloud SQL proxy be brought in by using the Kubernetes Service Broker? -> https://cloud.google.com/kubernetes-engine/docs/concepts/add-on/service-broker
Should Airflow be used to schedule and call GCP API commands like 1) export the MySQL table to Cloud Storage, 2) read the MySQL export into BigQuery?
Perhaps there are other methods that I am missing to get this done.
"The Cloud SQL Proxy provides secure access to your Cloud SQL Second Generation instances without having to whitelist IP addresses or configure SSL." -Google CloudSQL-Proxy Docs
CloudSQL Proxy seems to be the recommended way to connect to CloudSQL above all others. So in Composer, as of release 1.6.1, we can create a new Kubernetes Pod to run the gcr.io/cloudsql-docker/gce-proxy:latest image, expose it through a service, then create a Connection in Composer to use in the operator.
To get set up:
Follow Google's documentation
Test the connection using info from Arik's Medium Post
Check that the pod was created: kubectl get pods --all-namespaces
Check that the service was created: kubectl get services --all-namespaces
Jump into a worker node: kubectl --namespace=composer-1-6-1-airflow-1-10-1-<some-uid> exec -it airflow-worker-<some-uid> bash
Test the MySQL connection: mysql -u composer -p --host <service-name>.default.svc.cluster.local
Notes:
Composer now uses namespaces to organize pods
Pods in different namespaces don't talk to each other unless you give them the full path <k8-service-name>.<k8-namespace-name>.svc.cluster.local
Creating a new Composer Connection with the full path will enable successful connection
We had the same problem, but with a Postgres instance. This is what we did to get it to work:
Create a sqlproxy deployment in the Kubernetes cluster where Airflow runs. This was a copy of the existing airflow-sqlproxy used by the default airflow_db connection, with the following changes to the deployment file:
replace all instances of airflow-sqlproxy with the new proxy name
edit under 'spec: template: spec: containers: command: -instances', replacing the existing instance name with the new instance we want to connect to
Create a Kubernetes service, again as a copy of the existing airflow-sqlproxy-service, with the following changes:
replace all instances of airflow-sqlproxy with the new proxy name
under 'spec: ports', change to the appropriate port (we used 5432 for a Postgres instance)
In the Airflow UI, add a connection of type Postgres with the host set to the newly created service name.
You can follow these instructions to launch a new Cloud SQL proxy instance in the cluster.
re #3: That sounds like a good plan. There isn't a Cloud SQL to BigQuery operator to my knowledge, so you'd have to do it in two phases like you described.
Adding the Medium post from the comments by @Leo to the top level: https://medium.com/@ariklevliber/connecting-to-gcp-composer-tasks-to-cloud-sql-7566350c5f53. Once you follow that article and have the service set up, you can connect from your DAG using SQLAlchemy like this:
import os
from datetime import datetime, timedelta
import logging

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

logger = logging.getLogger(os.path.basename(__file__))

INSTANCE_CONNECTION_NAME = "phil-new:us-east1:phil-db"

default_args = {
    'start_date': datetime(2019, 7, 16)
}


def connect_to_cloud_sql():
    '''
    Create a connection to CloudSQL
    :return:
    '''
    import sqlalchemy
    try:
        PROXY_DB_URL = "mysql+pymysql://<user>:<password>@<cluster_ip>:3306/<dbname>"
        logger.info("DB URL: %s", PROXY_DB_URL)
        engine = sqlalchemy.create_engine(PROXY_DB_URL, echo=True)
        for result in engine.execute("SELECT NOW() as now"):
            logger.info(dict(result))
    except Exception:
        logger.exception("Unable to interact with CloudSQL")


dag = DAG(
    dag_id="example_sqlalchemy",
    default_args=default_args,
    # schedule_interval=timedelta(minutes=5),
    catchup=False  # If you don't set this then the dag will run according to the start date
)

t1 = PythonOperator(
    task_id="example_sqlalchemy",
    python_callable=connect_to_cloud_sql,
    dag=dag
)

if __name__ == "__main__":":
    connect_to_cloud_sql()
Here, in Hoffa's answer to a similar question, you can find a reference on how WePay keeps BigQuery synchronized every 15 minutes using an Airflow operator.
From said answer:
Take a look at how WePay does this: https://wecode.wepay.com/posts/bigquery-wepay
The MySQL to GCS operator executes a SELECT query against a MySQL table. The SELECT pulls all data greater than (or equal to) the last high watermark. The high watermark is either the primary key of the table (if the table is append-only), or a modification timestamp column (if the table receives updates). Again, the SELECT statement also goes back a bit in time (or rows) to catch potentially dropped rows from the last query (due to the issues mentioned above).
With Airflow they manage to keep BigQuery synchronized to their MySQL database every 15 minutes.
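A rough sketch of that two-step pattern with the contrib operators of that Airflow generation; the connection id, table, bucket and watermark column are made-up names:
from datetime import datetime

from airflow.models import DAG
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

with DAG(
        'mysql_to_bigquery_sync',
        start_date=datetime(2019, 7, 16),
        schedule_interval='*/15 * * * *',  # every 15 minutes, as in the WePay setup
        catchup=False) as dag:

    # Pull everything at or above the last high watermark (here: a modification timestamp).
    extract = MySqlToGoogleCloudStorageOperator(
        task_id='mysql_to_gcs',
        mysql_conn_id='my_mysql_conn',  # hypothetical Airflow connection
        sql="SELECT * FROM orders WHERE updated_at >= '{{ prev_ds }}'",
        bucket='my-staging-bucket',
        filename='orders/{{ ds }}/export_{}.json',
    )

    load = GoogleCloudStorageToBigQueryOperator(
        task_id='gcs_to_bq',
        bucket='my-staging-bucket',
        source_objects=['orders/{{ ds }}/export_*.json'],
        destination_project_dataset_table='my_project.my_dataset.orders',
        source_format='NEWLINE_DELIMITED_JSON',
        write_disposition='WRITE_APPEND',
        autodetect=True,
    )

    extract >> load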
Now we can connect to Cloud SQL without creating the Cloud SQL proxy ourselves. The operator creates it automatically. The code looks like this:
from airflow.models import DAG
from airflow.contrib.operators.gcp_sql_operator import CloudSqlInstanceExportOperator

export_body = {
    'exportContext': {
        'fileType': 'CSV',
        'uri': EXPORT_URI,
        'databases': [DB_NAME],
        'csvExportOptions': {
            'selectQuery': SQL
        }
    }
}

default_dag_args = {}

with DAG(
        'postgres_test',
        schedule_interval='@once',
        default_args=default_dag_args) as dag:

    sql_export_task = CloudSqlInstanceExportOperator(
        project_id=GCP_PROJECT_ID,
        body=export_body,
        instance=INSTANCE_NAME,
        task_id='sql_export_task'
    )
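If the goal is still "Cloud SQL into BigQuery", a possible follow-up task (continuing the snippet above) loads the exported CSV from Cloud Storage into BigQuery; the bucket, object path and table name are placeholders, and the import belongs at the top of the file:
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

bq_load_task = GoogleCloudStorageToBigQueryOperator(
    task_id='gcs_to_bq_task',
    bucket='my-export-bucket',                  # placeholder: the bucket behind EXPORT_URI
    source_objects=['exports/my_table.csv'],    # placeholder object path
    destination_project_dataset_table='my_project.my_dataset.my_table',
    source_format='CSV',
    write_disposition='WRITE_TRUNCATE',
    dag=dag,  # attach to the DAG defined above
)

sql_export_task >> bq_load_task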