How to build an API call to a BigQuery table

I would very much appreciate it if anyone could point me to how to create an API call to one specific table on BigQuery. E.g.: I will send a request with {user_id} and it will return {user_status}.
I've tried googling around, and the only thing I could find is a two-step approach:
Export the table from BigQuery -> CSV -> PostgreSQL on Google Cloud SQL.
Set up an API using Flask on Google App Engine that hits the PostgreSQL DB deployed on Google Cloud SQL.
I would very much appreciate it if anyone knows a more direct way.
Thank you.
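A more direct route, sketched here purely as an illustration (this is not from the original post): skip the Cloud SQL copy and query BigQuery straight from a small Flask service with the official Python client. The project, dataset, table, and column names below are placeholders taken from the question.

from flask import Flask, jsonify
from google.cloud import bigquery  # pip install google-cloud-bigquery

app = Flask(__name__)
client = bigquery.Client()  # on App Engine this picks up the service account automatically

@app.route("/users/<user_id>")
def user_status(user_id):
    # Parameterized query against a placeholder dataset/table.
    query = "SELECT user_status FROM `my_project.my_dataset.users` WHERE user_id = @user_id"
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("user_id", "STRING", user_id)]
    )
    rows = list(client.query(query, job_config=job_config).result())
    if not rows:
        return jsonify({"error": "user not found"}), 404
    return jsonify({"user_status": rows[0]["user_status"]})

Keep in mind that BigQuery is an analytical store, so per-request latency will be noticeably higher than with Cloud SQL; the Cloud SQL route described above is still the better fit for low-latency lookups.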


What is the best way to replicate data from Oracle GoldenGate on-premise to AWS (SQL or NoSQL)?
I was just checking this for Azure. My company is looking for solutions for moving data to the cloud, with the following requirements:
Minimal impact on on-prem legacy/3rd-party systems.
No Oracle DB instances on the cloud side.
Minimum "hops" for the data between the source and the destination.
PaaS over IaaS solutions.
Out-of-the-box features over native code and in-house development.
Oracle Server 12c or above.
Some custom filtering solution.
Some custom transformations.
** Filtering can be done in GoldenGate, in NiFi, in Azure mapping, or in ksqlDB.
The solutions are divided into two groups:
If the solution is allowed to touch/read the log file of the Oracle server, you can use Azure ADF, Azure Synapse, K2View, Apache NiFi, or the Oracle CDC Adapter for Big Data (check versions) to move data directly to the cloud, buffered by Kafka; however, the records inside Kafka will be in a special-schema JSON format.
If you must use the GG trail file as input to your sync/ETL paradigm, you can:
use a custom data provider that translates the trail file into a FlowFile for NiFi (you need to write it; see this 2-star project on GitHub for a direction);
use the GitHub project with GG for Big Data and Kafka over Kafka Connect, which also gives you translated SQL DML and DDL statements and makes the solution much more readable.
Other solutions are corner cases, but I hope this gives you what you needed.
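To make the "special-schema JSON format" concrete, a change record published to Kafka by the GoldenGate for Big Data JSON formatter looks roughly like the example below (field names follow the formatter's common defaults and may differ in your configuration; the table, columns, and values are made up):

import json

# Illustrative shape of one GoldenGate for Big Data JSON change record.
sample_record = """
{
  "table": "HR.EMPLOYEES",
  "op_type": "U",
  "op_ts": "2020-01-01 12:00:00.000000",
  "current_ts": "2020-01-01T12:00:01.000000",
  "pos": "00000000020030005000",
  "before": {"EMPLOYEE_ID": 101, "STATUS": "ACTIVE"},
  "after":  {"EMPLOYEE_ID": 101, "STATUS": "INACTIVE"}
}
"""

# Downstream consumers (NiFi, ksqlDB, custom code) typically parse the record
# and work with the after-image; op_type is I/U/D for insert/update/delete.
record = json.loads(sample_record)
print(record["op_type"], record["after"])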
In my company's case we have Oracle as the source DB and Snowflake as the target DB. We've built the following processing sequence:
An on-premise OGG Extract works with the on-premise Oracle DB.
A Datapump sends the trails to another host.
On that host, an OGG for Big Data Replicat processes the trails and then sends the result as JSON to an AWS S3 bucket.
Since Snowflake can handle JSON as a source of data and works with S3 buckets, it loads the JSON into staging tables where further processing takes place.
You can read more about this approach here: https://www.snowflake.com/blog/continuous-data-replication-into-snowflake-with-oracle-goldengate/
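As a rough sketch of that last step (assuming a staging table with a single VARIANT column and an external stage pointing at the S3 bucket; all names and credentials below are placeholders, not the setup from the answer above), the load can be driven with the Snowflake Python connector:

import snowflake.connector  # pip install snowflake-connector-python

# Placeholder connection details.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="LOAD_WH", database="STG_DB", schema="OGG",
)
cur = conn.cursor()

# One VARIANT column is enough to land the raw GoldenGate JSON records.
cur.execute("CREATE TABLE IF NOT EXISTS ogg_staging (record VARIANT)")

# External stage pointing at the bucket that OGG for Big Data writes to.
cur.execute("""
    CREATE STAGE IF NOT EXISTS ogg_stage
    URL = 's3://my-ogg-bucket/json/'
    CREDENTIALS = (AWS_KEY_ID = 'my-key-id' AWS_SECRET_KEY = 'my-secret')
""")

# Load the JSON files into the staging table; further processing (MERGE into
# target tables, flattening with LATERAL FLATTEN, etc.) happens from here.
cur.execute("COPY INTO ogg_staging FROM @ogg_stage FILE_FORMAT = (TYPE = 'JSON')")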

How to call a Bigquery stored procedure in Nifi

I have a BigQuery stored procedure which runs on some GCS objects and does magic with them. The procedure works perfectly when run manually, but I want to call it from NiFi. I have worked with HANA and know that I need a JDBC driver to connect and perform a query.
I could either use the ExecuteProcess processor or the ExecuteSQL processor; I don't know, to be honest.
I am not sure how to achieve this in NiFi with BigQuery stored procedures. Could anyone help me with this?
Thanks in advance!!
Updated with a new error, if someone could help.
Option 1: ExecuteProcess
The closest thing to "executing it manually" is installing the Google Cloud SDK and running this inside ExecuteProcess:
bq query --use_legacy_sql=false 'CALL STORED_PROCEDURE(ARGS)'
or
bq query --use_legacy_sql=false 'SELECT STORED_PROCEDURE(ARGS)'
(Stored procedures are a standard-SQL feature, so legacy SQL has to be turned off.)
Option 2: ExecuteSQL
If you want to use ExecuteSQL in NiFi to call the stored procedure, you'll need the BigQuery JDBC driver.
Both 'select' and 'call' methods will work with BigQuery.
Which option is better?
I believe ExecuteSQL is easier than ExecuteProcess.
Why? Because you need to install the Google Cloud SDK on all systems that might run ExecuteProcess, and you must pass the Google Cloud credentials to them.
That means sharing the job is not easy.
Plus, this might require administrator rights on all the machines.
In the ExecuteSQL case you'll need to:
1 - Copy the JDBC driver to the lib directory inside your NiFi installation.
2 - Connect to BigQuery using pre-generated access/refresh tokens - see the JDBC Driver for Google BigQuery Install and Configuration Guide - that's OAuth type 2.
The good part is that when you export the flow, the credentials are embedded in it: no need to mess with credentials.json files etc. (this could also be bad from a security standpoint).
Distributing JDBC JARs is easier than installing the Google Cloud SDK: just drop a file into the lib folder. If you need it on more than one node, you can scp/sftp it, or distribute it with Ambari.
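For reference, with OAuth type 2 the Database Connection URL you put in the connection pool controller service that ExecuteSQL uses (typically DBCPConnectionPool) looks roughly like the line below; the project ID, tokens, client ID, and secret are placeholders, and the exact property names are listed in the Simba install guide mentioned above:

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=my-project;OAuthType=2;OAuthAccessToken=<access-token>;OAuthRefreshToken=<refresh-token>;OAuthClientId=<client-id>;OAuthClientSecret=<client-secret>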

How to show snapshot API response data in data-studio?

I need to design and display a Compute Engine snapshot report for different projects in the cloud in Data Studio. For this, I am trying to use the Google Compute Engine snapshot API below to retrieve the data.
https://compute.googleapis.com/compute/v1/projects/my-project/global/snapshots
The data may change every day depending on the snapshots created from the disks, so the report should display all the updated data.
Can this rest-api be called directly from Google data-studio?
Alternatively, what is the best/simplest way to display the response in data-studio?
You can use a Community Connector in Data Studio to directly pull the data from the API.
Currently, there is no way to connect GCP Compute Engine (GCE) resource data or use the REST API in Data Studio. The only GCP products with built-in connectors are the following:
BigQuery
Cloud Spanner
Cloud SQL for MySQL
Google Cloud Storage
MySQL
PostgreSQL
A possible way to design and display a Compute Engine snapshot report for different projects in the cloud in Data Studio is by creating a Google Apps Script (to call the snapshot REST API) that writes into a Google Sheet, and then importing the sheet into Data Studio.
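Apps Script is one way to make that call; purely as an illustration (and not the author's exact setup), the same snapshot data can be pulled with the Python API client and then written to a Sheet or a BigQuery table that Data Studio can read. 'my-project' below is a placeholder project ID:

from googleapiclient.discovery import build  # pip install google-api-python-client

# Uses Application Default Credentials.
compute = build("compute", "v1")
request = compute.snapshots().list(project="my-project")

rows = []
while request is not None:
    response = request.execute()
    for snap in response.get("items", []):
        # Keep only the fields the report needs.
        rows.append((snap["name"], snap["diskSizeGb"], snap["status"], snap["creationTimestamp"]))
    request = compute.snapshots().list_next(previous_request=request, previous_response=response)

# 'rows' can now be pushed to a Google Sheet (e.g. with gspread) or loaded into
# BigQuery, both of which Data Studio connects to directly.
print(rows)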
Additionally, if you have any questions regarding Data Studio, I would suggest reviewing the following resources:
Data Studio Help Center
Data Studio Help Community
EDIT: My apologies, it seems that there is a way to show snapshot API response data in Data Studio by using a Community Connector to directly pull the data from the API.

How to schedule a query (Export Data) from Google BigQuery to external storage space (e.g. Box)

I read many articles and solutions regarding scheduling queries to external storage places in Google BigQuery, but they didn't seem to be that clear.
Note: My company has a subscription only to Google BigQuery and not to the complete set of cloud services (Google Cloud Platform).
I know how to do it manually but I am looking to automate the process since I need the same data every week.
Any suggestions will be appreciated. Thank you.
Option 1
You can use Apache Airflow, which provides the option to create scheduled tasks on top of BigQuery using the BigQuery operators.
You can find the basic steps required to start setting this up in this link.
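As a rough sketch of that option (operator import paths vary by Airflow version, and the table, bucket, and schedule below are placeholders), a DAG that exports the table every week could look like this:

from datetime import datetime
from airflow import DAG
# In newer Airflow releases this operator lives under
# airflow.providers.google.cloud.transfers.bigquery_to_gcs as BigQueryToGCSOperator.
from airflow.contrib.operators.bigquery_to_gcs import BigQueryToCloudStorageOperator

with DAG(
    dag_id="weekly_bq_export",  # placeholder DAG name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    export_table = BigQueryToCloudStorageOperator(
        task_id="export_table_to_gcs",
        source_project_dataset_table="my-project.my_dataset.my_table",   # placeholder
        destination_cloud_storage_uris=["gs://my-bucket/exports/my_table_*.csv"],
        export_format="CSV",
        print_header=True,
    )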
Option 2
You can use the Google BigQuery command-line tool to export your data as you do from the web UI, for example:
bq --location=[LOCATION] extract --destination_format [FORMAT] --compression [COMPRESSION_TYPE] --field_delimiter [DELIMITER] --print_header [BOOLEAN] [PROJECT_ID]:[DATASET].[TABLE] gs://[BUCKET]/[FILENAME]
Once you get this working, you can use any scheduling process of your liking to schedule the run of this job.
BTW: Airflow can also run this command-line tool for you (e.g. via its Bash operator).
Once the file is in GCS, you can use the Box for G Suite integration to see and manage your files.

Can we implement data lineage on queries run via Google BigQuery?

Could anyone provide some pointers on how to implement data lineage on a DW-type solution built on Google BigQuery, using Google Cloud Storage as the source and Google Cloud Composer as the workflow manager to implement a series of SQLs?
If you have your data in Cloud Storage, you might like to use something like the GoogleCloudStorageToBigQueryOperator to first load your data into BigQuery, and then use the BigQueryOperator to run your queries.
Then you can see how your different DAGs, tasks, etc. are running in the Airflow web UI inside Composer.
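A minimal sketch of that pattern (the bucket, file, dataset, and SQL below are placeholders, and the operator import paths depend on your Composer/Airflow version):

from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

with DAG(
    dag_id="gcs_to_bq_lineage_demo",  # placeholder
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_raw = GoogleCloudStorageToBigQueryOperator(
        task_id="load_raw",
        bucket="my-bucket",                                 # placeholder
        source_objects=["incoming/events.csv"],             # placeholder
        destination_project_dataset_table="my_dataset.raw_events",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    transform = BigQueryOperator(
        task_id="transform",
        sql="SELECT user_id, COUNT(*) AS events FROM my_dataset.raw_events GROUP BY user_id",
        destination_dataset_table="my_dataset.user_event_counts",
        write_disposition="WRITE_TRUNCATE",
        use_legacy_sql=False,
    )

    # Task-level ordering, which is also what shows up as lineage in the Airflow UI.
    load_raw >> transform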