How to connect Janusgraph deployed in GCP with Python runtime? - google-cloud-platform

I have deployed JanusGraph using Helm on Google Kubernetes Engine, following this documentation:
https://cloud.google.com/architecture/running-janusgraph-with-bigtable
I'm able to run Gremlin queries from Google Cloud Shell.
[Screenshot of Google Cloud Shell]
Now I want to access JanusGraph from Python. I tried the code below, but it's unable to connect to JanusGraph inside the GCP container.
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
graph = Graph()
g = graph.traversal().withRemote(DriverRemoteConnection('gs://127.0.0.1:8182/gremlin','g'))
value = g.V().has('name','hercules').values('age')
print(value)
Here's the output I'm getting:
[['V'], ['has', 'name', 'hercules'], ['values', 'age']]
Whereas the output should be:
30
Has anyone tried to access JanusGraph using Python inside GCP?

You need to end the query with a terminal step such as next() or toList(). What you are seeing is the query bytecode printed, because the query was never submitted to the server due to the missing terminal step. So you need something like this:
value = g.V().has('name','hercules').values('age').next()
print(value)
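For completeness, a minimal end-to-end sketch is shown below. Note that gremlin-python connects over WebSocket, so the URL scheme should be ws:// rather than the gs:// shown in the question; the host and port assume the Gremlin Server is reachable locally, for example via a port-forward to the JanusGraph service in GKE.

from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Assumes the JanusGraph/Gremlin Server is reachable on localhost:8182,
# e.g. via `kubectl port-forward` from the GKE cluster.
connection = DriverRemoteConnection('ws://127.0.0.1:8182/gremlin', 'g')
g = Graph().traversal().withRemote(connection)

# Terminal steps such as next() or toList() actually submit the traversal.
age = g.V().has('name', 'hercules').values('age').next()
print(age)  # e.g. 30

connection.close()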

Related

Creating REST API in GCP to read data from BigQuery

I'm very new to Google Cloud Platform, hence this basic question.
I am looking for an API that will be hosted in GCP; an external application will call the API to read data from BigQuery.
Can anyone help me out with example code or an approach?
I'm looking for an end-to-end, cloud-based solution in Python.
I can't provide you with a complete code example, but:
You can set up your Python API using a framework such as Flask.
You can then use the Python client library to connect to BigQuery: https://cloud.google.com/bigquery/docs/reference/libraries
Deploy your Python API to App Engine, Cloud Run, Kubernetes, Compute Engine, etc.
Don't forget to set up CORS and any authentication you may need.
That's it. A minimal sketch of the Flask + BigQuery part is shown below.
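For illustration only, here is a minimal sketch of such an API. The route name and the public dataset queried are assumptions; substitute your own table, query, and authentication.

from flask import Flask, jsonify
from google.cloud import bigquery

app = Flask(__name__)
bq_client = bigquery.Client()  # Uses Application Default Credentials

@app.route("/names")
def names():
    # Query a public dataset as a stand-in for your own table.
    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 10
    """
    rows = bq_client.query(query).result()
    return jsonify([dict(row) for row in rows])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)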
You can create a Python program using the BigQuery client library, then deploy it as an HTTP Cloud Function or a Cloud Run service:
from google.cloud import bigquery
import functions_framework

@functions_framework.http
def your_http_function(request):
    # HTTP Cloud Function.
    request_json = request.get_json(silent=True)
    request_args = request.args

    # Example: retrieve the 'name' argument passed in the HTTP call
    name = None
    if request_json and 'name' in request_json:
        name = request_json['name']
    elif request_args and 'name' in request_args:
        name = request_args['name']

    # Construct a BigQuery client object.
    client = bigquery.Client()

    query = """
        SELECT name, SUM(number) as total_people
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE state = 'TX'
        GROUP BY name, state
        ORDER BY total_people DESC
        LIMIT 20
    """
    query_job = client.query(query)  # Make an API request.
    rows = query_job.result()        # Waits for the query to finish.

    # Return a JSON-serializable response (a bare RowIterator is not).
    return {'names': [row.name for row in rows]}
You have to deploy your Python code as a Cloud Function in this example.
Your function can then be invoked with an HTTP call that passes a name parameter:
https://GCP_REGION-PROJECT_ID.cloudfunctions.net/hello_http?name=NAME
You can also use Cloud Run, which gives you more flexibility because you deploy a Docker image.
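If you go the Cloud Run route, a minimal Dockerfile for the functions-framework example above could look like the sketch below. The base image, file names, and requirements listed are assumptions; adjust them to your project.

FROM python:3.11-slim
WORKDIR /app
# requirements.txt would list google-cloud-bigquery and functions-framework
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py .
# Cloud Run injects $PORT; functions-framework serves the target function on it.
CMD exec functions-framework --target=your_http_function --port=${PORT:-8080}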

Google cloud functions + google cloud sql + google cloud scheduler timeout

I am trying to deploy Python code with Google Cloud Functions and Cloud Scheduler that writes a simple table to a Google Cloud SQL PostgreSQL database.
I created a PostgreSQL database.
Added the Cloud SQL Client role to the App Engine default service account.
Created a Cloud Pub/Sub topic.
Created a Cloud Scheduler job.
So far so good. Then I created the following function, main.py:
from sqlalchemy import create_engine
import pandas as pd
from google.cloud.sql.connector import Connector
import sqlalchemy

connector = Connector()

def getconn():
    # INSTANCE_CONNECTION_NAME, DB_USER, DB_PASS and DB_NAME are defined elsewhere
    conn = connector.connect(
        INSTANCE_CONNECTION_NAME,
        "pg8000",
        user=DB_USER,
        password=DB_PASS,
        db=DB_NAME
    )
    return conn

pool = sqlalchemy.create_engine(
    "postgresql+pg8000://",
    creator=getconn,
)

def testdf(event, context):
    df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                       "b": [1, 2, 3, 4, 5]})
    df.to_sql("test",
              con=pool,
              if_exists="replace",
              schema='myschema')
And the requirements.txt contains:
pandas
sqlalchemy
pg8000
cloud-sql-python-connector[pg8000]
When I test the function, it always times out. There's no error, just these logs:
I can't figure out why. I have tried several code snippets from:
https://colab.research.google.com/github/GoogleCloudPlatform/cloud-sql-python-connector/blob/main/samples/notebooks/postgres_python_connector.ipynb#scrollTo=UzHaM-6TXO8h
and from
https://codelabs.developers.google.com/codelabs/connecting-to-cloud-sql-with-cloud-functions#2
I think the way I set up the permissions and roles is causing the timeout. Any ideas?
Thx

How to publish an exported Google AutoML machine as an API?

I have built a machine learning model using Google's AutoML Tables interface. Once the model was trained, I exported it in a docker container to my local machine by following the steps detailed on Google's official documentation page: https://cloud.google.com/automl-tables/docs/model-export. Now, on my machine, it exists inside a Docker container, and I am able to run it successfully using the following command:
docker run -v exported_model:/models/default/0000001 -p 8080:8080 -it gcr.io/cloud-automl-tables-public/model_server
Once it is running on localhost via Docker, I am able to make predictions using the following Python code:
import requests
import json
vector = [1, 1, 1, 1, 1, 2, 1]
input = {"instances": [{"column_1": vector[0],
"column_2": vector[1],
"column_3": vector[2],
"column_4": vector[3],
"column_5": vector[4],
"column_6": vector[5],
"column_7": vector[6]}]}
jsonData = json.dumps(input)
response = requests.post("http://localhost:8080/predict", jsonData)
print(response.json())
I need to publish this model as an API for my client to use. I have considered AWS EC2 and Azure Functions; however, I have not had any success so far. Ideally, I plan to use the FastAPI interface, but I do not know how to do this in a dockerized context.
I have since solved the problem; however, it comes at a cost. It is possible to deploy an AutoML model from GCP, but it will incur a small ongoing charge. I don't think there is a way around this, otherwise Google would lose revenue.
Once the model is up and running, the following Python code can be used to make predictions:
from google.cloud import automl_v1beta1 as automl

project_id = 'chatbot-286a1'
compute_region = 'us-central1'
model_display_name = 'DNALC_4p_global_20201112093636'

inputs = {'sessionDuration': 60.0, 'createdStartDifference': 1206116.042162, 'confirmedStartDifference': 1206116.20255, 'createdConfirmedDifference': -0.160388}

client = automl.TablesClient(project=project_id, region=compute_region)

feature_importance = False
if feature_importance:
    response = client.predict(
        model_display_name=model_display_name,
        inputs=inputs,
        feature_importance=True,
    )
else:
    response = client.predict(
        model_display_name=model_display_name, inputs=inputs
    )

print("Prediction results:")
for result in response.payload:
    print("Predicted class name: {}".format(result.tables.value))
    print("Predicted class score: {}".format(result.tables.score))
    break
In order for the code to work, the following resources may be helpful:
Installing the AutoML client library for Python:
How to install google.cloud automl_v1beta1 for python using anaconda?
Authenticating with AutoML in Python:
https://cloud.google.com/docs/authentication/production
(Remember to put the path to the JSON authentication key file in an environment variable; this is for security purposes.)
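For example, on the machine running the code (the path below is just a placeholder for your own service account key file):

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"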

import JSON file to google firestore using cloud functions

I'm new to GCP and Python. I have been given a task to import a JSON file into Google Firestore using Google Cloud Functions with Python.
Kindly assist, please.
I was able to achieve this setup using the code below. Posting it for your reference:
CLOUD FUNCTIONS CODE
REQUIREMENTS.TXT (dependencies)
google-api-python-client==1.7.11
google-auth==1.6.3
google-auth-httplib2==0.0.3
google-cloud-storage==1.19.1
google-cloud-firestore==1.6.2
MAIN.PY
from google.cloud import storage
from google.cloud import firestore
import json

client = storage.Client()

def hello_gcs_generic(data, context):
    print('Bucket: {}'.format(data['bucket']))
    print('File: {}'.format(data['name']))
    bucketValue = data['bucket']
    filename = data['name']
    print('bucketValue : ', bucketValue)
    print('filename : ', filename)

    # Download the uploaded JSON file from the Cloud Storage bucket
    testFile = client.get_bucket(data['bucket']).blob(data['name'])
    dataValue = json.loads(testFile.download_as_string(client=None))
    print(dataValue)

    # Write the parsed JSON into Firestore
    db = firestore.Client()
    doc_ref = db.collection(u'collectionName').document(u'documentName')
    doc_ref.set(dataValue)
Cloud Functions are serverless functions provided by Google. The beauty of a Cloud Function is that it is invoked whenever its trigger fires and is torn down by itself once execution is complete. Cloud Functions are single-purpose functions. Besides Python, you can also write them in Node.js and Go. You can create a Cloud Function very easily from the Cloud Functions quickstart (https://cloud.google.com/functions/docs/quickstart-console).
Your task is to import a JSON file into Google Firestore. You can write that part with the Firestore Python client, like any normal Python program, and add it in the Cloud Functions console or upload it via gcloud. The trigger part is still missing here: as I mentioned, a Cloud Function is serverless and only executes when an event happens on the attached trigger. You haven't mentioned any trigger (i.e. when you want the function to run). Once you give information about the trigger, I can give more insight on a resolution.
Ishank Aggarwal, you can add the above code snippet as a Cloud Function with the following steps:
Go to https://console.cloud.google.com/functions/
Create a function with a name, your requirements, Python as the runtime, and your GCS bucket as the trigger.
Once you create it, any change in your bucket will trigger the function and execute your code.
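As an alternative to the console, the same function can be deployed from the command line with gcloud; a sketch, assuming a first-generation function and a placeholder bucket name:

gcloud functions deploy hello_gcs_generic \
    --runtime python39 \
    --trigger-resource YOUR_BUCKET_NAME \
    --trigger-event google.storage.object.finalize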

Google Cloud Composer and Google Cloud SQL

What ways do we have available to connect to a Google Cloud SQL (MySQL) instance from the newly introduced Google Cloud Composer? The intention is to get data from a Cloud SQL instance into BigQuery (perhaps with an intermediary step through Cloud Storage).
Can the Cloud SQL Proxy be exposed in some way on pods that are part of the Kubernetes cluster hosting Composer?
If not, can the Cloud SQL Proxy be brought in by using the Kubernetes Service Broker? -> https://cloud.google.com/kubernetes-engine/docs/concepts/add-on/service-broker
Should Airflow be used to schedule and call GCP API commands, e.g. 1) export the MySQL table to Cloud Storage, 2) read the MySQL export into BigQuery?
Perhaps there are other methods that I am missing to get this done.
"The Cloud SQL Proxy provides secure access to your Cloud SQL Second Generation instances without having to whitelist IP addresses or configure SSL." -Google CloudSQL-Proxy Docs
The Cloud SQL Proxy seems to be the recommended way to connect to Cloud SQL above all others. So in Composer, as of release 1.6.1, we can create a new Kubernetes pod to run the gcr.io/cloudsql-docker/gce-proxy:latest image, expose it through a Service, and then create a Connection in Composer to use in the operator (a minimal manifest sketch follows the notes below).
To get set up:
Follow Google's documentation.
Test the connection using info from Arik's Medium post.
Check that the pod was created: kubectl get pods --all-namespaces
Check that the service was created: kubectl get services --all-namespaces
Jump into a worker node: kubectl --namespace=composer-1-6-1-airflow-1-10-1-<some-uid> exec -it airflow-worker-<some-uid> bash
Test the MySQL connection: mysql -u composer -p --host <service-name>.default.svc.cluster.local
Notes:
Composer now uses namespaces to organize pods.
Pods in different namespaces don't talk to each other unless you give them the full path: <k8-service-name>.<k8-namespace-name>.svc.cluster.local
Creating a new Composer Connection with the full path will enable a successful connection.
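For reference, a minimal sketch of what that proxy Deployment and Service could look like; the instance connection name, resource names, and port are assumptions to adapt to your own project (Google's documentation linked above is authoritative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudsql-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudsql-proxy
  template:
    metadata:
      labels:
        app: cloudsql-proxy
    spec:
      containers:
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:latest
        command: ["/cloud_sql_proxy",
                  "-instances=PROJECT:REGION:INSTANCE=tcp:0.0.0.0:3306"]
---
apiVersion: v1
kind: Service
metadata:
  name: cloudsql-proxy-service
spec:
  selector:
    app: cloudsql-proxy
  ports:
  - port: 3306
    targetPort: 3306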
We had the same problem but with a Postgres instance. This is what we did, and got it to work:
create a sqlproxy deployment in the Kubernetes cluster where airflow runs. This was a copy of the existing airflow-sqlproxy used by the default airflow_db connection with the following changes to the deployment file:
replace all instances of airflow-sqlproxy with the new proxy name
edit under 'spec: template: spec: containers: command: -instances', replace the existing instance name with the new instance we want to connect to
create a kubernetes service, again as a copy of the existing airflow-sqlproxy-service with the following changes:
replace all instances of airflow-sqlproxy with the new proxy name
under 'spec: ports', change to the appropriate port (we used 5432 for a Postgres instance)
in the airflow UI, add a connection of type Postgres with host set to the newly created service name.
You can follow these instructions to launch a new Cloud SQL proxy instance in the cluster.
re #3: That sounds like a good plan. There isn't a Cloud SQL to BigQuery operator to my knowledge, so you'd have to do it in two phases like you described.
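For illustration, a sketch of that two-phase approach using the contrib operators that ship with Airflow 1.10; the connection ID, bucket, dataset, and table names below are placeholders:

from datetime import datetime

from airflow.models import DAG
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

with DAG('mysql_to_bq_example',
         start_date=datetime(2019, 1, 1),
         schedule_interval='@daily',
         catchup=False) as dag:

    # Phase 1: dump the MySQL table to Cloud Storage as newline-delimited JSON.
    extract = MySqlToGoogleCloudStorageOperator(
        task_id='mysql_to_gcs',
        mysql_conn_id='my_cloudsql_connection',  # the proxy-backed connection
        sql='SELECT * FROM my_table',
        bucket='my-staging-bucket',
        filename='exports/my_table_{{ ds }}.json',
    )

    # Phase 2: load the export into BigQuery.
    load = GoogleCloudStorageToBigQueryOperator(
        task_id='gcs_to_bq',
        bucket='my-staging-bucket',
        source_objects=['exports/my_table_{{ ds }}.json'],
        destination_project_dataset_table='my_project.my_dataset.my_table',
        source_format='NEWLINE_DELIMITED_JSON',
        write_disposition='WRITE_TRUNCATE',
        autodetect=True,
    )

    extract >> load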
Adding the Medium post from @Leo's comment to the top level: https://medium.com/@ariklevliber/connecting-to-gcp-composer-tasks-to-cloud-sql-7566350c5f53 . Once you follow that article and have the service set up, you can connect from your DAG using SQLAlchemy like this:
import os
import logging
from datetime import datetime, timedelta

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

logger = logging.getLogger(os.path.basename(__file__))

INSTANCE_CONNECTION_NAME = "phil-new:us-east1:phil-db"

default_args = {
    'start_date': datetime(2019, 7, 16)
}

def connect_to_cloud_sql():
    '''
    Create a connection to CloudSQL
    :return:
    '''
    import sqlalchemy
    try:
        PROXY_DB_URL = "mysql+pymysql://<user>:<password>@<cluster_ip>:3306/<dbname>"
        logger.info("DB URL: %s", PROXY_DB_URL)
        engine = sqlalchemy.create_engine(PROXY_DB_URL, echo=True)
        for result in engine.execute("SELECT NOW() as now"):
            logger.info(dict(result))
    except Exception:
        logger.exception("Unable to interact with CloudSQL")

dag = DAG(
    dag_id="example_sqlalchemy",
    default_args=default_args,
    # schedule_interval=timedelta(minutes=5),
    catchup=False  # If you don't set this then the dag will run according to start date
)

t1 = PythonOperator(
    task_id="example_sqlalchemy",
    python_callable=connect_to_cloud_sql,
    dag=dag
)

if __name__ == "__main__":
    connect_to_cloud_sql()
Here, in Hoffa's answer to a similar question, you can find a reference on how WePay keeps it synchronized every 15 minutes using an Airflow operator.
From said answer:
Take a look at how WePay does this:
https://wecode.wepay.com/posts/bigquery-wepay
The MySQL to GCS operator executes a SELECT query against a MySQL table. The SELECT pulls all data greater than (or equal to) the last high watermark. The high watermark is either the primary key of the table (if the table is append-only), or a modification timestamp column (if the table receives updates). Again, the SELECT statement also goes back a bit in time (or rows) to catch potentially dropped rows from the last query (due to the issues mentioned above).
With Airflow they manage to keep BigQuery synchronized to their MySQL database every 15 minutes.
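In other words, each scheduled run issues a query along these lines against MySQL; the table, column, and overlap window here are purely illustrative:

-- Pull everything at or after the last high watermark, minus a small
-- overlap to catch rows missed by the previous run.
SELECT *
FROM payments
WHERE modified_at >= DATE_SUB(:last_high_watermark, INTERVAL 15 MINUTE);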
Now we can connect to Cloud SQL without creating the Cloud SQL Proxy ourselves; the operator will create it automatically. The code looks like this:
from airflow.models import DAG
from airflow.contrib.operators.gcp_sql_operator import CloudSqlInstanceExportOperator

export_body = {
    'exportContext': {
        'fileType': 'CSV',
        'uri': EXPORT_URI,
        'databases': [DB_NAME],
        'csvExportOptions': {
            'selectQuery': SQL
        }
    }
}

default_dag_args = {}

with DAG(
        'postgres_test',
        schedule_interval='@once',
        default_args=default_dag_args) as dag:

    sql_export_task = CloudSqlInstanceExportOperator(
        project_id=GCP_PROJECT_ID,
        body=export_body,
        instance=INSTANCE_NAME,
        task_id='sql_export_task'
    )