GCP workflow: load external sql file? - google-cloud-platform

I am planning to have a Cloud Scheduler job that calls a GCP Workflow every day at 8 a.m. My workflow will have around 15 different steps and will consist only of transformations (update, delete, add) on BigQuery. Some queries will be quite long, so I am wondering if there is a way to load a .sql file into a GCP Workflows task1.yaml?
#workflow entrypoint
ProcessItem:
  params: [project, gcsPath]
  steps:
    - initialize:
        assign:
          - dataset: wf_samples
          - input: ${gcsPath}
          - sqlQuery: QUERY HERE
...

You need to do something like the following (of course you can assign the result to a variable such as input):
#workflow entrypoint
main:
  steps:
    - getSqlfile:
        call: http.get
        args:
          url: https://raw.githubusercontent.com/jisaw/sqlzoo-solutions/master/select-in-select.sql
          headers:
            Content-Type: "text/plain"
        result: queryFromFile
    - final:
        return: ${queryFromFile.body}
For Cloud Storage that may look like:
call: http.get
args:
  url: https://storage.cloud.google.com/................./q1.sql
  headers:
    Content-Type: "text/plain"
  auth:
    type: OIDC
result: queryFromFile
Or even with this format (different URL syntax + OAuth2):
call: http.get
args:
  url: https://storage.googleapis.com/................./q1.sql
  headers:
    Content-Type: "text/plain"
  auth:
    type: OAuth2
result: queryFromFile
Make sure that the invoker has the right permission to access the Cloud Storage file.
Note: on further testing, for this to work correctly the text/plain MIME type must be set on the GCS file.
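To put the pieces together, the fetched file body can be passed straight into a BigQuery query step. A minimal sketch, assuming the BigQuery connector (googleapis.bigquery.v2.jobs.query, also used further down this page) and a hypothetical bucket/object name in place of your real path:
#workflow entrypoint
main:
  steps:
    - getSqlfile:
        call: http.get
        args:
          # my-bucket/q1.sql is a placeholder; use your own GCS path
          url: https://storage.googleapis.com/my-bucket/q1.sql
          headers:
            Content-Type: "text/plain"
          auth:
            type: OAuth2
        result: queryFromFile
    - runQuery:
        call: googleapis.bigquery.v2.jobs.query
        args:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          body:
            useLegacySql: false
            # the downloaded .sql text becomes the query string
            query: ${queryFromFile.body}
        result: queryResult
    - final:
        return: ${queryResult}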

Related

How to trigger Email notification to Group of users from BigQuery on threshold?

I have written the below query in GCP BigQuery, where I am using the ERROR function to raise an error message when the quantity column exceeds the threshold of 1000.
SELECT ERROR(CONCAT("Over threshold: ", CAST(quantity AS STRING)))
FROM `proj.dataset.table`
WHERE quantity > 1000
I get the email notification when I schedule this query in BigQuery, but I want to send that notification to a group of users through BigQuery.
How can I achieve this?
You could achieve this and a lot more with the serverless Cloud Workflows product and an external email provider such as SendGrid, Mailchimp or Mailgun that offers a REST API.
You basically set up a workflow that handles the steps for you:
run the BigQuery query
on error, trigger an email step
you could even combine them: if the returned results match a condition, execute another step
The main workflow would be like this:
#workflow entrypoint
main:
  steps:
    - getList:
        try:
          call: BQ_Query
          args:
            query: SELECT ERROR('demo') from (select 1) where 1>0
          result: result
        except:
          as: e
          steps:
            - sendEmail:
                call: sendGridSend
                args:
                  secret: sendgrid_email_dev_apikey
                  from: from@domain.com
                  to:
                    - email: email1@domain.com
                    - email: email2@domain.com
                  subject: "This is a test"
                  content: ${"Error message from BigQuery" + e.body.error.message}
                  contentType: "text/plain"
                result: callResult
    - final:
        return: ${callResult}
sendgrid_email_dev_apikey is the secret name; I've used Secret Manager to store SendGrid's API key. If you want to use Mailchimp, there are examples in this GitHub repo.
The workflow invoker could be a Cloud Scheduler entry. So instead of launching the scheduled queries from the BigQuery interface, you set them up in a scheduled workflow. You must give the invoker service account permission to read secrets and to run BigQuery jobs.
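For the Cloud Scheduler entry itself, a minimal sketch (the job name, location, workflow name and service account below are placeholders; the workflow is triggered through the Workflow Executions API with an OAuth token):
gcloud scheduler jobs create http run-bq-email-workflow \
  --schedule="0 8 * * *" \
  --http-method=POST \
  --uri="https://workflowexecutions.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/workflows/WORKFLOW_NAME/executions" \
  --oauth-service-account-email=invoker-sa@PROJECT_ID.iam.gserviceaccount.com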
The rest of the Workflow is here:
BQ_Query:
  params: [query]
  steps:
    - runBQquery:
        call: googleapis.bigquery.v2.jobs.query
        args:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          body:
            useLegacySql: false
            query: ${query}
        result: queryResult
    - documentFound:
        return: ${queryResult}
sendGridSend:
  params: [secret, from, to, subject, content, contentType]
  steps:
    - getSecret:
        call: http.get
        args:
          url: ${"https://secretmanager.googleapis.com/v1/projects/" + sys.get_env("GOOGLE_CLOUD_PROJECT_NUMBER") + "/secrets/" + secret + "/versions/latest:access"}
          auth:
            type: OAuth2
        result: sendGridKey
    - decodeSecrets:
        assign:
          - decodedKey: ${text.decode(base64.decode(sendGridKey.body.payload.data))}
    - sendMessage:
        call: http.post
        args:
          url: https://api.sendgrid.com/v3/mail/send
          headers:
            Content-Type: "application/json"
            Authorization: ${"Bearer " + decodedKey }
          body:
            personalizations:
              - to: ${to}
            from:
              email: ${from}
            subject: ${subject}
            content:
              - type: ${contentType}
                value: ${content}
        result: sendGridResult
    - returnValue:
        return: ${sendGridResult}
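As a side note, the getSecret step above could also use the Secret Manager connector instead of a raw http.get; a minimal sketch, assuming the googleapis.secretmanager.v1 connector is available in your Workflows runtime. The connector returns the API body directly, so the decode step reads sendGridKey.payload.data rather than sendGridKey.body.payload.data:
    - getSecret:
        call: googleapis.secretmanager.v1.projects.secrets.versions.access
        args:
          name: ${"projects/" + sys.get_env("GOOGLE_CLOUD_PROJECT_ID") + "/secrets/" + secret + "/versions/latest"}
        result: sendGridKey
    - decodeSecrets:
        assign:
          - decodedKey: ${text.decode(base64.decode(sendGridKey.payload.data))}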
Since you receive a mail notification, I guess you are using the BigQuery Data Transfer service.
According to this paragraph, only the person who set up the transfer will receive the mail notification. However, if you're using Gmail you can automatically forward these messages to a list of users.
This link should guide you through it.

Google Cloud Workflows - "ResourceLimitError" when doing HTTP requests in a loop

We are using GCP Workflows to make API calls for a status check every n seconds via an http.post call.
Everything was fine until recently, when all of our workflows started failing with an internal error:
{"message":"ResourceLimitError: Memory usage limit exceeded","tags":["ResourceLimitError"]}
I found out that when we use GET with query params, the failure happens a bit later than with POST and a body.
Here is the testing workflow:
main:
  steps:
    - init:
        assign:
          - i: 0
          - body:
              foo: 'thisismyhorsemyhorseisamazing'
    - doRequest:
        call: http.request
        args:
          url: https://{my-location-and-project-id}.cloudfunctions.net/workflow-test
          method: GET
          query: ${body}
        result: res
    - sleepForOneSecond:
        call: sys.sleep
        args:
          seconds: 1
    - logCounter:
        call: sys.log
        args:
          text: ${"Iteration - " + string(i)}
          severity: INFO
    - increaseCounter:
        assign:
          - i: ${i + 1}
    - checkIfFinished:
        switch:
          - condition: ${i < 500}
            next: doRequest
        next: returnOutput
    - returnOutput:
        return: ${res.body}
It can do up to 37 requests with GET and 32 with POST, and then execution stops with an error. Those numbers don't change.
For reference, the Firebase function returns 200 on both POST and GET with the following JSON:
{
  "bar": "thisismyhorsemyhorseisamazing",
  "fyz": []
}
Any ideas what is going wrong there? I don't think the 64 KB quota for variables is exceeded. It shouldn't be calculated as the sum of all assignments, should it?
This looks like an issue with the product. I found this Google issue tracker entry where the problem has been reported.
It is better to follow up on the public issue tracker.

AWS API Gateway error 500 on Postman / success on API Gateway test

I am facing a strange issue with a Lambda integration in API Gateway (tried proxy integration as well, same issue).
The Lambda first hits AppSync and returns either JSON content on error or an XLSX file on success.
While testing in the API Gateway test console it brings back status 200 and the binary result as expected, but when I try it externally through Postman it fails.
More info:
Integration type: Lambda
Success response:
response = buffer.toString("base64");
Error response:
response = JSON.stringify(err);
Serverless API Gateway setup:
exportXls:
  handler: ./src/apiGatewayLambdas/exportxls/exportXls.handler
  role: AppSyncLambdaRole
  events:
    - http:
        path: /api/exportxls
        method: post
        integration: lambda
        contentHandling: CONVERT_TO_BINARY
Apparently API Gateway with Lambda or proxy integration encodes the body to base64, so I changed my Lambda to:
let buffer = Buffer.from(_event.body, "base64");
let body = buffer.toString("ascii");
body = JSON.parse(body);
and everything worked as expected.

Include cookie in swagger doc requests

My web service API checks whether a certain cookie is included in the requests, but I couldn't figure out how to include a cookie in my Swagger doc API calls.
I've tried two approaches:
Adding the cookie as an editable field like this in my .yaml file:
paths:
  /myApi/create:
    parameters:
      - name: Cookie
        in: header
        description: cookie
        required: true
        type: string
In the HTML file of Swagger UI, adding:
window.authorizations.add(
  "Cookie",
  new ApiKeyAuthorization("Cookie", 'Name=Val', 'header')
);
But with both approaches my API doesn't get the cookie. I was wondering how I can do this? Thanks!
OpenAPI/Swagger spec 2.0 does not support cookie authentication. For the next version (3.0), the discussion to support it can be found in the following:
https://github.com/OAI/OpenAPI-Specification/issues/15
UPDATE: OpenAPI spec 3.0 will support cookies: https://github.com/OAI/OpenAPI-Specification/blob/OpenAPI.next/versions/3.0.md#parameter-locations
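For reference, cookie authentication in OpenAPI 3.0 looks roughly like the sketch below (the path and the cookie name SESSIONID are just illustrative):
openapi: 3.0.0
info:
  title: With Cookie Authentication
  version: '1'
paths:
  /say-hi:
    get:
      summary: Say Hello
      responses:
        '200':
          description: OK
      security:
        - cookieAuth: []
components:
  securitySchemes:
    cookieAuth:
      type: apiKey
      in: cookie
      name: SESSIONID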
Maybe it is too late, but you should check the following example:
swagger: '2.0'
info:
  version: '1'
  title: With Cookie Authentication
  description: With Cookie Authentication
securityDefinitions:
  myCookie:
    type: apiKey
    name: Cookie
    in: header
paths:
  /say-hi:
    get:
      summary: Say Hello
      description: Say Hello
      responses:
        200:
          description: OK
      security:
        - myCookie: []

Android app uploaded in Device Farm via AWS SDK for Go never changed status from INITIALIZED

I'm trying to use the AWS SDK for Go to automate app runs in AWS Device Farm. But any app uploaded with the Go version of the SDK never changes status from "INITIALIZED". If I upload it via the AWS Console web UI, everything is fine.
Example of code for upload:
func uploadApp(client *devicefarm.DeviceFarm, appType, projectArn string) string {
    params := &devicefarm.CreateUploadInput{
        Name:       aws.String(*appName),
        ProjectArn: aws.String(projectArn),
        Type:       aws.String(appType),
    }
    resp, err := client.CreateUpload(params)
    if err != nil {
        log.Fatal("Failed to upload an app because of: ", err.Error())
    }
    log.Println("Upload ARN:", *resp.Upload.Arn)
    return *resp.Upload.Arn
}
In response I got something like:
{
  Upload: {
    Arn: "arn:aws:devicefarm:us-west-2:091463382595:upload:c632e325-266b-4bda-a74d-0acec1e2a5ae/9fbbf140-e377-4de9-b7df-dd18a21b2bca",
    Created: 2016-01-15 14:27:31 +0000 UTC,
    Name: "app-debug-unaligned.apk",
    Status: "INITIALIZED",
    Type: "ANDROID_APP",
    Url: "bla-bla-bla"
  }
}
Over time the status never changes from "INITIALIZED". As I mentioned, apps uploaded and scheduled from the UI work fine.
How can I figure out the reason for this?
=======================================
Solution:
1) After CreateUpload, you need to upload the file using the pre-signed S3 link from the response.
2) The upload should be done via an HTTP PUT request to the received URL, with the file content in the body.
3) The ContentType parameter should be specified in &devicefarm.CreateUploadInput, and the same value should be used for the Content-Type header of the PUT request.
4) If the PUT request is sent from Go code, the Content-Length header should be set manually (a Go sketch of these steps follows below).
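A sketch of steps 2-4 in Go (the helper name putAppFile and the error handling are illustrative, not from the original code; the contentType argument must match the ContentType value passed to CreateUploadInput):
import (
    "bytes"
    "fmt"
    "io/ioutil"
    "net/http"
)

// putAppFile PUTs the app binary to the pre-signed URL returned by CreateUpload.
func putAppFile(uploadURL, contentType, filePath string) error {
    data, err := ioutil.ReadFile(filePath)
    if err != nil {
        return err
    }
    req, err := http.NewRequest(http.MethodPut, uploadURL, bytes.NewReader(data))
    if err != nil {
        return err
    }
    // Same value as the ContentType field in CreateUploadInput.
    req.Header.Set("Content-Type", contentType)
    // net/http infers Content-Length for a bytes.Reader body; setting it explicitly does no harm.
    req.ContentLength = int64(len(data))
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("upload failed with status %s", resp.Status)
    }
    return nil
}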
When you call the CreateUpload API, Device Farm will return an "Upload" response containing a "Url" field.
{
  Upload: {
    Arn: "arn:aws:devicefarm:us-west-2:....",
    Created: 2016-01-15 14:27:31 +0000 UTC,
    Name: "app-name.apk",
    Status: "INITIALIZED",
    Type: "ANDROID_APP",
    Url: "bla-bla-bla"
  }
}
The returned Url, "bla-bla-bla", is a pre-signed S3 URL for you to upload your application. Documentation on using a pre-signed URL to upload an object: http://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html
Once your application has been uploaded, the app will be processed. The status of your upload will change to "PROCESSING" and "SUCCEEDED" (or "FAILED" if something is wrong). Once it's in "SUCCEEDED" status, you can use it to schedule a run.
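If you want to wait for that transition in code, a small polling sketch with the same SDK (the retry count and sleep interval are arbitrary choices):
import (
    "fmt"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/service/devicefarm"
)

// waitForUpload polls GetUpload until Device Farm reports a terminal status.
func waitForUpload(client *devicefarm.DeviceFarm, uploadArn string) (string, error) {
    for i := 0; i < 30; i++ {
        out, err := client.GetUpload(&devicefarm.GetUploadInput{Arn: aws.String(uploadArn)})
        if err != nil {
            return "", err
        }
        status := *out.Upload.Status
        if status == "SUCCEEDED" || status == "FAILED" {
            return status, nil
        }
        time.Sleep(10 * time.Second)
    }
    return "", fmt.Errorf("upload %s is still processing after polling timed out", uploadArn)
}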