Variable passing throwing error in BigQueryInsertJobOperator in Airflow - google-cloud-platform

I have written a BigQueryInsertJobOperator in Airflow to select and insert data to a Big Query table. But I am facing issue with variable passing. I am getting below error while executing Airflow DAG.
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 911, in to_api_repr
configuration = self._configuration.to_api_repr()
File "/home/airflow/.local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 683, in to_api_repr
query_parameters = resource["query"].get("queryParameters")
AttributeError: 'str' object has no attribute 'get'
Here is my Operator code:
dag = DAG(
'bq_to_sql_operator',
default_args=default_args,
schedule_interval="#daily",
template_searchpath="/opt/airflow/dags/scripts",
user_defined_macros={"BQ_PROJECT": BQ_PROJECT, "BQ_EDW_DATASET": BQ_EDW_DATASET, "BQ_STAGING_DATASET": BQ_STAGING_DATASET},
catchup=False
)
t1 = BigQueryInsertJobOperator(
task_id='bq_write_to_umc_cg_service_agg_stg',
configuration={
"query": "{% include 'umc_cg_service_agg_stg.sql' %}",
"useLegacySql":False,
"allow_large_results":True,
"writeDisposition": "WRITE_TRUNCATE",
"destinationTable": {
'projectId': BQ_PROJECT,
'datasetId': BQ_STAGING_DATASET,
'tableId': UMC_CG_SERVICE_AGG_STG_TABLE_NAME
}
},
params={'BQ_PROJECT': BQ_PROJECT, 'BQ_EDW_DATASET': BQ_EDW_DATASET, 'BQ_STAGING_DATASET': BQ_STAGING_DATASET },
gcp_conn_id=BQ_CONN_ID,
location=BQ_LOCATION,
dag=dag
)
My SQL file looks like as below:
select
faccs2.employer_key employer_key,
faccs2.service_name service_name,
gender,
approximate_age_band,
state,
relationship_map_name,
account_attribute1_name,
account_attribute1_value,
account_attribute2_name,
account_attribute2_value,
account_attribute3_name,
account_attribute3_value,
account_attribute4_name,
account_attribute4_value,
account_attribute5_name,
account_attribute5_value,
count(distinct faccs2.sf_service_id) total_service_count
from `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.fact_account_cg_case_survey` faccs
inner join `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.fact_account_cg_case_service` faccs2 on faccs.sf_case_id = faccs2.sf_case_id
inner join `{{params.BQ_PROJECT}}.{{params.BQ_EDW_DATASET}}.dim_account` da on faccs2.account_key = da.account_key
left join `{{params.BQ_PROJECT}}.{{params.BQ_STAGING_DATASET}}.stg_account_selected_attr_tmp2` attr on faccs.account_key = attr.account_key
where not da.is_test_account_flag
and attr.gender is not null
and coalesce(faccs.case_status,'abc') <> 'Closed as Duplicate'
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16;
Can someone please help me how to fix this issue.

I think that the query configuration should be in a nested document called query:
t1 = BigQueryInsertJobOperator(
task_id='bq_write_to_umc_cg_service_agg_stg',
configuration={
"query": {
"query": "{% include 'umc_cg_service_agg_stg.sql' %}",
"useLegacySql":False,
"allow_large_results":True,
"writeDisposition": "WRITE_TRUNCATE",
"destinationTable": {
'projectId': BQ_PROJECT,
'datasetId': BQ_STAGING_DATASET,
'tableId': UMC_CG_SERVICE_AGG_STG_TABLE_NAME
}
}
},
params={'BQ_PROJECT': BQ_PROJECT, 'BQ_EDW_DATASET': BQ_EDW_DATASET, 'BQ_STAGING_DATASET': BQ_STAGING_DATASET },
gcp_conn_id=BQ_CONN_ID,
location=BQ_LOCATION,
dag=dag
)
With your provided configuration dict, an internal method try to access queryParameters which should be in the dict configuration["query"], but it finds str instead of dict.

Consider below script what I've used at work.
target_date = '{{ ds_nodash }}'
...
# DAG task
t1= bq.BigQueryInsertJobOperator(
task_id = 'sample_task,
configuration = {
"query": {
"query": f"{{% include 'your_query_file.sql' %}}",
"useLegacySql": False,
"queryParameters": [
{ "name": "target_date",
"parameterType": { "type": "STRING" },
"parameterValue": { "value": f"{target_date}" }
}
],
"parameterMode": "NAMED"
},
},
location = 'asia-northeast3',
)
-- in your_query_file.sql, #target_date value is passed as a named parameter.
DECLARE target_date DATE DEFAULT SAFE.PARSE_DATE('%Y%m%d', #target_date);
SELECT ... FROM ... WHERE partitioned_at = target_date;
You can refer to configuration JSON field specification on the link below.
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#queryrequest
parameterMode string
Standard SQL only. Set to POSITIONAL to use positional (?) query parameters or to NAMED to use named (#myparam) query parameters in this query.
queryParameters[] object (QueryParameter)
jobs.query parameters for Standard SQL queries.
queryParameters is an array of QueryParameter which has following JSON format.
{
"name": string,
"parameterType": {
object (QueryParameterType)
},
"parameterValue": {
object (QueryParameterValue)
}
}
https://cloud.google.com/bigquery/docs/reference/rest/v2/QueryParameter

Related

How to manipulate strings and array in Apache Velocity request mapping template in AWS appSync

This is my first time working with VTL, so please correct my code if there is something very silly in it.
What I want to achieve
{
"cities": ["jaipur", "mumbai", "delhi", "sheros", "jalandhar", "bengaluru"]
}
Graphql schema:
type Query{
getCitiesForGroups: String
}
Default response template:
#if($ctx.error)
$utils.error($ctx.error.message, $ctx.error.type)
#end
$utils.toJson($utils.rds.toJsonObject($ctx.result)[0])
Using the default response template, the result I was getting
{ "data": { "getCitiesForGroups": [ "{groupcity=jaipur}", "{groupcity=mumbai}", "{groupcity=delhi}", "{groupcity=sheros}", "{groupcity=jalandhar}", "{groupcity=bengaluru}" ] } }
My request template
{
"version": "2018-05-29",
"statements": [
"select DISTINCT LOWER(city) as city from public.Groups"
]
}
To get the desired output, I changed the response template as I wanted to loop over the response I got from the DB, and remove the {city=} from the string by using the substring method given in AWS resolver mapping docs and this is where I am facing the problem.
My response template
#if($ctx.error)
$utils.error($ctx.error.message, $ctx.error.type)
#end
#set ($rawListOfCities = $utils.rds.toJsonObject($ctx.result)[0])
#set ($sanitisedListOfCities = [])
#foreach( $city in $rawListOfCities )
#set ($equalToIndex = $city.indexOf("="))
#set ($equalToIndex = $equalToIndex + 1)
#set ($curlyLastIndex = $city.lastIndexOf("}"))
#set ($tempCity = $city.substring($equalToIndex, $curlyLastIndex))
## #set ($tempCity = $city)
$util.qr($sanitisedListOfCities.add($tempCity))
#end
$util.toJson($sanitisedListOfCities)
The response that I am getting:
{
"data": {
"getCitiesForGroups": "[null, null, null, null, null, null]"
}
}
However, when I use the line #set ($tempCity = $city) and comment out the line above it, I get the following response:
{
"data": {
"getCitiesForGroups": "[{city=jaipur}, {city=mumbai}, {city=delhi}, {city=sheros}, {city=jalandhar}, {city=bengaluru}]"
}
}
Which means $city has the value {city=jaipur}, so which I want to sanitize and add it to ($sanitisedListOfCities) and return it as the response.
But I am getting null as the result substring method.
So how can I sanitize the response from DB and return it?
I had a similar issue and this is how I solved it.
First, your Graphql schema should look like,
type Query {
getCitiesForGroups: cityList
}
type cityList {
cities: [String]
}
Your request mapping template,
{
"version": "2018-05-29",
"statements": [
"select DISTINCT LOWER(city) as city from public.Groups"
]
}
And finally your response mapping template
#set($cityList = [])
#set($resMap = {})
#if($ctx.error)
$utils.error($ctx.error.message, $ctx.error.type)
#end
#foreach($item in $utils.rds.toJsonObject($ctx.result)[0])
$util.qr($cityList.add($item.city))
#end
$util.qr($resMap.put("cities", $cityList))
#return($resMap)
Expected response(complete)
{
"data": {
"getCitiesForGroups": {
"cities": [
"jaipur",
"mumbai",
"delhi",
"sheros",
"jalandhar",
"bengaluru"
]
}
}
}
I hope this helps you.

jsonPath expression expected value but return list of values

I want to check response "class_type" value has "REGION".
I test springboot API using mockMvc.
the MockHttpServletResponse is like this.
Status = 200
Error message = null
Headers = {Content-Type=[application/json;charset=UTF-8]}
Content type = application/json;charset=UTF-8
Body =
{"result":true,
"code":200,
"desc":"OK",
"data":{"total_count":15567,
"items": ...
}}
this is whole response object.
Let's take a closer look, especially items.
"items": [
{
"id": ...,
"class_type": "REGION",
"region_type": "MULTI_CITY",
"class": "com.model.Region",
"code": "AE-65GQ6",
...
},
{
"id": "...",
"class_type": "REGION",
"region_type": "CITY",
"class": "com.model.Region",
"code": "AE-AAN",
...
},
I tried using jsonPath.
#When("User wants to get list of regions, query is {string} page is {int} pageSize is {int}")
public void userWantsToGetListOfRegionsQueryIsPageIsPageSizeIs(String query, int page, int pageSize) throws Exception {
mockMvc().perform(get("/api/v1/regions" ))
.andExpect(status().is2xxSuccessful())
.andDo(print())
.andExpect(jsonPath("$.data", is(notNullValue())))
.andExpect(jsonPath("$.data.total_count").isNumber())
.andExpect(jsonPath("$.data.items").isArray())
.andExpect(jsonPath("$.data.items[*].class_type").value("REGION"));
log.info("지역 목록");
}
but
jsonPath("$.data.items[*].class_type").value("REGION")
return
java.lang.AssertionError: Got a list of values ["REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION","REGION"] instead of the expected single value REGION
I want to just check "$.data.items[*].class_type" has "REGION".
How can I change this?
One option would be to check whether you have elements in your array which have the class_type equal to 'REGION':
public static final String REGION = "REGION";
mockMvc().perform(get("/api/v1/regions"))
.andExpect(jsonPath("$.data.items[?(#.class_type == '" + REGION + "')]").exists());

Azure Cosmos query to convert into List

This is my JSON data, which is stored into cosmos db
{
"id": "e064a694-8e1e-4660-a3ef-6b894e9414f7",
"Name": "Name",
"keyData": {
"Keys": [
"Government",
"Training",
"support"
]
}
}
Now I want to write a query to eliminate the keyData and get only the Keys (like below)
{
"userid": "e064a694-8e1e-4660-a3ef-6b894e9414f7",
"Name": "Name",
"Keys" :[
"Government",
"Training",
"support"
]
}
So far I tried the query like
SELECT c.id,k.Keys FROM c
JOIN k in c.keyPhraseBatchResult
Which is not working.
Update 1:
After trying with the Sajeetharan now I can able to get the result, but the issue it producing another JSON inside the Array.
Like
{
"id": "ee885fdc-9951-40e2-b1e7-8564003cd554",
"keys": [
{
"serving": "Government"
},
{
"serving": "Training"
},
{
"serving": "support"
}
]
}
Is there is any way that extracts only the Array without having key value pari again?
{
"userid": "e064a694-8e1e-4660-a3ef-6b894e9414f7",
"Name": "Name",
"Keys" :[
"Government",
"Training",
"support"
]
}
You could try this one,
SELECT C.id, ARRAY(SELECT VALUE serving FROM serving IN C.keyData.Keys) AS Keys FROM C
Please use cosmos db stored procedure to implement your desired format based on the #Sajeetharan's sql.
function sample() {
var collection = getContext().getCollection();
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
'SELECT C.id,ARRAY(SELECT serving FROM serving IN C.keyData.Keys) AS keys FROM C',
function (err, feed, options) {
if (err) throw err;
if (!feed || !feed.length) {
var response = getContext().getResponse();
response.setBody('no docs found');
}
else {
var response = getContext().getResponse();
var map = {};
for(var i=0;i<feed.length;i++){
var keyArray = feed[i].keys;
var array = [];
for(var j=0;j<keyArray.length;j++){
array.push(keyArray[j].serving)
}
feed[i].keys = array;
}
response.setBody(feed);
}
});
if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
Output:

Postman eval() - how to evaluate a property part of Json

I have a json response object like this:
"results": [
{
"seq": "882818::048313",
"id": "user1"
}
]
}
I have the entire json payload and id field name stored in 2 separate variables:
var jsonObj = pm.response.json();
var myfield = "id";
What I would like to do is below:
console.log("Value of id is: " + eval(jsonObj) + eval(".") + eval(myField));
I tried this way and getting error: Unexpected identifier.
I don't want to hardcode the name of the property but instead make it dynamic.
Please help.
If your response body looks like this:
{
"results": [
{
"seq": "882818::048313",
"id": "user1"
}
]
}
Below statement will work (no need to use eval)
console.log("Value of id is: " + jsonObj.results[0][myField]);

"type mismatch error, expected type LIST" for querying a one-to-many relationship in AppSync

The schema:
type User {
id: ID!
createdCurricula: [Curriculum]
}
type Curriculum {
id: ID!
title: String!
creator: User!
}
The resolver to query all curricula of a given user:
{
"version" : "2017-02-28",
"operation" : "Query",
"query" : {
## Provide a query expression. **
"expression": "userId = :userId",
"expressionValues" : {
":userId" : {
"S" : "${context.source.id}"
}
}
},
"index": "userIdIndex",
"limit": #if(${context.arguments.limit}) ${context.arguments.limit} #else 20 #end,
"nextToken": #if(${context.arguments.nextToken}) "${context.arguments.nextToken}" #else null #end
}
The response map:
{
"items": $util.toJson($context.result.items),
"nextToken": #if(${context.result.nextToken}) "${context.result.nextToken}" #else null #end
}
The query:
query {
getUser(id: "0b6af629-6009-4f4d-a52f-67aef7b42f43") {
id
createdCurricula {
title
}
}
}
The error:
{
"data": {
"getUser": {
"id": "0b6af629-6009-4f4d-a52f-67aef7b42f43",
"createdCurricula": null
}
},
"errors": [
{
"path": [
"getUser",
"createdCurricula"
],
"locations": null,
"message": "Can't resolve value (/getUser/createdCurricula) : type mismatch error, expected type LIST"
}
]
}
The CurriculumTable has a global secondary index titled userIdIndex, which has userId as the partition key.
If I change the response map to this:
$util.toJson($context.result.items)
The output is the following:
{
"data": {
"getUser": {
"id": "0b6af629-6009-4f4d-a52f-67aef7b42f43",
"createdCurricula": null
}
},
"errors": [
{
"path": [
"getUser",
"createdCurricula"
],
"errorType": "MappingTemplate",
"locations": [
{
"line": 4,
"column": 5
}
],
"message": "Unable to convert \n{\n [{\"id\":\"87897987\",\"title\":\"Test Curriculum\",\"userId\":\"0b6af629-6009-4f4d-a52f-67aef7b42f43\"}],\n} to class java.lang.Object."
}
]
}
If I take that string and run it through a console.log in my frontend app, I get:
{
[{"id":"2","userId":"0b6af629-6009-4f4d-a52f-67aef7b42f43"},{"id":"1","userId":"0b6af629-6009-4f4d-a52f-67aef7b42f43"}]
}
That's clearly an object. How do I make it... not an object, so that AppSync properly reads it as a list?
SOLUTION
My response map had a set of curly braces around it. I'm pretty sure that was placed there in the generator by Amazon. Removing them fixed it.
I think I'm not seeing the complete view of your schema, I was expecting something like:
schema {
query: Query
}
Where Query is RootQuery, in fact you didn't share us your Query definition. Assuming you have the right Query definition. The main problem is in your response template.
> "items": $util.toJson($context.result.items)
This means that you are passing a collection named: *"items"* to Graphql query engine. And you are referring this collection as "createdCurricula". So solve this issue your response-mapping-template is the right place to fix. How? just replace the above line with the following.
"createdCurricula": $util.toJson($context.result.items),
Please the main thing to note here is, the mapping template is a bridge between your datasources and qraphql, feel free to make any computation, or name mapping but don't forget that object names in that response json are the one should match in schema/query definition.
Thanks.
Musema
change to result type to $util.toJson($ctx.result.data.posts)
The exception msg says that it expected a type list.
Looking at:
{
[{"id":"2","userId":"0b6af629-6009-4f4d-a52f-67aef7b42f43"},{"id":"1","userId":"0b6af629-6009-4f4d-a52f-67aef7b42f43"}]
}
I don't see that createdCurricula is a LIST.
What is currently in DDB is:
"id": "0b6af629-6009-4f4d-a52f-67aef7b42f43",
"createdCurricula": null