I want many GYP scripts to share a common target, so I decided to move it to a separate include file. The simplest test case that produces an error:
foo.gyp
{
'includes' : [
'bar.gypi',
],
}
bar.gypi
{
'targets': [
{
'target_name' : 'phony',
'type' : 'none',
'actions' : [
{
'action_name' : '_phony_',
'inputs' : ['',],
'outputs' : ['',],
'action' : ['_phony_',],
'message' : '_phony_',
},
],
},
],
}
Produces error:
IndexError: string index out of range while reading includes of foo.gyp while trying to load foo.gyp
Some observations:
If I delete the actions from the target, everything parses fine
If I move the targets (with their actions) into foo.gyp, everything parses fine
Am I doing something wrong?
It looks like the "outputs" list cannot be empty or contain an empty string:
# gyp/make.py:893
self.WriteLn("%s: obj := $(abs_obj)" % QuoteSpaces(outputs[0]))
You may have empty inputs, but in that case the phony action will fire only once. I haven't found any mention of phony actions in the GYP documentation, but I have the following variant working:
# bar.gypi
{
'targets': [
{
'target_name' : 'phony',
'type' : 'none',
'actions' : [
{
'action_name' : '_phony_',
'inputs' : ['./bar.gypi'], # The action depends on this file
'outputs' : ['test'], # Some dummy file
'action' : ['echo', 'test'],
'message' : 'Running phony target',
},
],
},
],
}
I could try to find a better way if you tell me more about the task you are trying to solve.
I'm trying to run a Dataflow batch job using the template "Text file on Cloud Storage to BigQuery". The first three steps work, but the last stage fails with the following error:
Error message from worker: java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000, reached max retries: 3, last failed job: { "configuration" : { "jobType" : "LOAD", "labels" : { "beam_job_id" : "2022-11-10_02_06_07-15255037958352274885" }, "load" : { "createDisposition" : "CREATE_IF_NEEDED", "destinationTable" : { "datasetId" : "minerals_test_dataset", "projectId" : "jio-big-data-poc", "tableId" : "mytable01" }, "ignoreUnknownValues" : false, "sourceFormat" : "NEWLINE_DELIMITED_JSON", "useAvroLogicalTypes" : false, "writeDisposition" : "WRITE_APPEND" } }, "etag" : "LHqft9L/H4XBWTNZ7BSRXA==", "id" : "jio-big-data-poc:asia-south1.beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2", "jobReference" : { "jobId" : "beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2", "location" : "asia-south1", "projectId" : "jio-big-data-poc" }, "kind" : "bigquery#job", "selfLink" : "https://bigquery.googleapis.com/bigquery/v2/projects/jio-big-data-poc/jobs/beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2?location=asia-south1", "statistics" : { "creationTime" : "1668074949767", "endTime" : "1668074949869", "startTime" : "1668074949869" }, "status" : { "errorResult" : { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" }, "errors" : [ { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" } ], "state" : "DONE" }, "user_email" : "49449455496-compute#developer.gserviceaccount.com", "principal_subject" : "serviceAccount:49449455496-compute#developer.gserviceaccount.com" }. org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:200) org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:153) org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:378)
I tried running the same job with other datasets' CSV files; the JavaScript UDF and JSON schema follow the documentation, but the job fails at the same stage. What could be a possible solution to this error?
The JSON schema you provided doesn't match the BigQuery schema of your table:
"Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" }, "errors" : [ { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" } ]
There is a field called marks that does not seem to exist in the BigQuery table.
If you update your BigQuery schema so that it exactly matches the fields of your input JSON lines, that will solve the issue.
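For example, if marks is a legitimate field that is simply missing from the table, one option is to add it to the existing schema before rerunning the job. Below is a minimal sketch using the google-cloud-bigquery Python client; the assumption that marks is a nullable STRING is mine, so adjust the type to whatever your JSON actually contains.

from google.cloud import bigquery

# Assumes application default credentials with access to the project.
client = bigquery.Client(project="jio-big-data-poc")

table = client.get_table("jio-big-data-poc.minerals_test_dataset.mytable01")

# Append the missing field; existing fields must stay exactly as they are.
new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField("marks", "STRING", mode="NULLABLE"))

table.schema = new_schema
client.update_table(table, ["schema"])

Alternatively, drop the marks field from the JSON schema you pass to the template so that it matches the existing table.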
I have a Mongo collection that stores saved-search data for a Vue/Laravel app, with records like the following:
{
"_id" : ObjectId("6202f3357a02e8740039f343"),
"q" : null,
"name" : "FCA last 3 years",
"frequency" : "Daily",
"scope" : "FederalContractAwardModel",
"filters" : {
"condition" : "AND",
"rules" : [
{
"id" : "awardDate",
"operator" : "between_relative_backward",
"value" : [
"now-3.5y/d",
"now/d"
]
},
{
"id" : "subtypes.extentCompeted",
"operator" : "in",
"value" : [
"Full and Open Competition"
]
}
]
},
The problem is the value in the rules array item that has the decimal:
"value" : [
"now-3.5y/d",
"now/d"
]
in particular the decimal. Because of a UI error, the user was allowed to enter a decimal value, so this needs to be fixed to remove the decimal, like so:
"value" : [
"now-3y/d",
"now/d"
]
My problem is writing a Mongo query to identify these records (I'm a Mongo noob): I need to find records in this collection that have an item in the filters.rules array whose 'value' array contains an entry with a decimal.
Piece of cake, right?
Here's as far as I've gotten.
myCollection.find({"filters.rules": })
but I'm not sure where to go from here.
UPDATE: After running the regex provided by @R2D2, I found that it also brings up records with a valid date string, e.g.
"rules" : [
{
"id" : "dueDate",
"operator" : "between",
"value" : [
"2018-09-10T19:04:00.000Z",
null
]
},
so what I need to do is filter out cases where the period has a double 0 on either side (i.e. 00.00). If I read the regex correctly, this part
[^\.]
is excluding characters, so I would want something like
[^00\.00]
but running this query
db.collection.find( {
"filters.rules.value": { $regex: /\.[^00\.00]*/ }
} )
still returns the same records, even though it works as expected in a regex tester. What am I missing?
To find all documents containing at least one value string with a dot (.), try:
db.collection.find( {
"filters.rules.value": { $regex: /\.[^\.]*/ }
} )
Or you can filter only the fields that need fixing via aggregation, as follows:
[direct: mongos]> db.tes.aggregate([ {$unwind:"$filters.rules"}, {$unwind:"$filters.rules.value"}, {$match:{ "filters.rules.value": {$regex: /\.[^\.]*/ } }} ,{$project:{_id:1,oldValue:"$filters.rules.value"}} ])
[
{ _id: ObjectId("6202f3357a02e8740039f343"), oldValue: 'now-3.5y/d' }
]
[direct: mongos]>
Later to update those values:
db.collection.update({
"filters.rules.value": "now-3.5y/d"
},
{
$set: {
"filters.rules.$[x].value.$": "now-3,5y/d-CORRECTED"
}
},
{
arrayFilters: [
{
"x.value": "now-3.5y/d"
}
]
})
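If you prefer to script the cleanup, here is a minimal pymongo sketch of the same idea with a slightly tighter pattern: it only matches a digit, a dot, more digits and then a date-math unit letter (e.g. 3.5y), so ISO timestamps like 2018-09-10T19:04:00.000Z are not picked up. The connection string, database and collection names, and the exact unit letters are assumptions; adjust them to your setup.

import re
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
coll = client["mydb"]["saved_searches"]            # assumed db/collection names

# Matches "now-3.5y/d" but not "2018-09-10T19:04:00.000Z":
# a digit, a literal dot, one or more digits, then a date-math unit letter.
pattern = re.compile(r"\d\.\d+[yMwdhms]")

for doc in coll.find({"filters.rules.value": pattern}):
    print(doc["_id"], doc.get("name"))

Once the offending documents are identified, the arrayFilters update shown above can be applied to each of them.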
I am trying to connect to Snowflake from an EMR cluster launched by the Airflow EMR operator, but I'm getting the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling
o147.load. : java.lang.ClassNotFoundException: Failed to find data
source: net.snowflake.spark.snowflake. Please find packages at
http://spark.apache.org/third-party-projects.html
These are the steps I am adding via my EmrAddStepsOperator to run the script load_updates.py, and I am specifying my Snowflake packages in the "Args":
STEPS = [
{
"Name" : "convo_facts",
"ActionOnFailure" : "TERMINATE_CLUSTER",
"HadoopJarStep" : {
"Jar" : "command-runner.jar",
"Args" : ["spark-submit", "s3://dev-data-lake/spark_files/cf/load_updates.py", \
"--packages net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4", \
"INPUT=s3://dev-data-lake/table_exports/public/", \
"OUTPUT=s3://dev-data-lake/emr_output/cf/"]
}
}
]
JOB_FLOW_OVERRIDES = {
'Name' : 'cftest',
'LogUri' : 's3://dev-data-lake/emr_logs/cf/log.txt',
'ReleaseLabel' : 'emr-5.32.0',
'Instances' : {
'InstanceGroups' : [
{
'Name' : 'Master nodes',
'Market' : 'ON_DEMAND',
'InstanceRole' : 'MASTER',
'InstanceType' : 'r6g.4xlarge',
'InstanceCount' : 1,
},
{
'Name' : 'Slave nodes',
'Market' : 'ON_DEMAND',
'InstanceRole' : 'CORE',
'InstanceType' : 'r6g.4xlarge',
'InstanceCount' : 3,
}
],
'KeepJobFlowAliveWhenNoSteps' : True,
'TerminationProtected' : False
},
'Applications' : [{
'Name' : 'Spark'
}],
'JobFlowRole' : 'EMR_EC2_DefaultRole',
'ServiceRole' : 'EMR_DefaultRole'
}
And this is how I am adding the Snowflake credentials in my load_updates.py script to extract the data into a PySpark dataframe:
# Set options below
sfOptions = {
"sfURL" : "xxxx.us-east-1.snowflakecomputing.com",
"sfUser" : "user",
"sfPassword" : "xxxx",
"sfDatabase" : "",
"sfSchema" : "PUBLIC",
"sfWarehouse" : ""
}
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
query_sql = """select * from cf""";
messages_new = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
.options(**sfOptions) \
.option("query", query_sql) \
.load()
I'm not sure if I am missing something here or where I am going wrong.
The option --packages should be placed before s3://.../load_updates.py in the spark-submit command. Otherwise, it will be treated as an application argument.
Try this:
STEPS = [
{
"Name": "convo_facts",
"ActionOnFailure": "TERMINATE_CLUSTER",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"spark-submit",
"--packages",
"net.snowflake:snowflake-jdbc:3.8.0,net.snowflake:spark-snowflake_2.11:2.4.14-spark_2.4",
"s3://dev-data-lake/spark_files/cf/load_updates.py",
"INPUT=s3://dev-data-lake/table_exports/public/",
"OUTPUT=s3://dev-data-lake/emr_output/cf/"
]
}
}
]
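A quick way to see the difference from inside load_updates.py is to print what reached the script as application arguments versus what Spark picked up as configuration. This is only a diagnostic sketch, assuming spark is the SparkSession created in the script: with the original ordering the --packages string shows up in sys.argv, while with the corrected ordering it appears under spark.jars.packages.

import sys

# With the broken step definition, "--packages ..." leaks into the application arguments.
print("Application args:", sys.argv[1:])

# With the corrected step, the package coordinates are visible as Spark configuration.
print("spark.jars.packages =",
      spark.sparkContext.getConf().get("spark.jars.packages", "<not set>"))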
I'm trying to play around with GYP and got stuck defining a "default variable".
I have two files (one main, and one expected to store common data, included by the main one):
1) v_common.gypi:
{
'variables': {
'mymodule%': "blblblb",
'mymoduleLibs' : "<(mymodule)/Libs",
},
'target_defaults': {
},
}
2) mymodule.gyp
{
'variables':{
},
'includes': [
'v_common.gypi',
], # includes
'targets': [
{
'target_name': 'myModule',
'type': 'none',
'actions' : [
{
'action_name': 'create_libs_folder',
'inputs': ['one_file'],
'outputs':['blabla'],
'action': ['mkdir', '<(mymoduleLibs)'],
}
]
},
], # targets
}
Per my expectations:
mymodule should get the value "blblblb" (since it wasn't defined anywhere previously),
then I should be able to use it to compute the value of mymoduleLibs,
and finally mymoduleLibs should be usable in mymodule.gyp.
But I just get an error saying that mymodule is an "Undefined variable". If I define mymodule exactly as in the example below (without the percent sign), everything works fine:
'variables': {
'mymodule': "blblblb",
'mymoduleLibs' : "<(mymodule)/Libs",
}
Any ideas?
I've found the issue. It's described here: https://groups.google.com/forum/?fromgroups#!topicsearchin/gyp-developer/default/gyp-developer/1EWXAXe-qWs
The correct workaround is to define default variables in a nested 'variables': {...} sub-dict, so they are evaluated before the other variables are expanded, like below:
{
'variables': {
'variables': {
'mymodule%': "blblblb",
},
'mymoduleLibs' : "<(mymodule)/Libs",
},
'target_defaults': {
},
}
I've installed PL/JSON 1.05 in Oracle XE 11g and written a PL/SQL function to extract values from the output of the Amazon AWS describe-instances call.
Obtaining the values of top-level items such as the reservation ID works, but I am unable to get values nested at lower levels of the JSON.
E.g. this example works (using the cut-down AWS JSON inline):
DECLARE
obj JSON;
reservations JSON_LIST;
l_tempobj JSON;
instance JSON;
L_id VARCHAR2(20);
BEGIN
obj:= json('{
"Reservations": [
{
"ReservationId": "r-5a33ea1a",
"Instances": [
{
"State": {
"Name": "stopped"
},
"InstanceId": "i-7e02503e"
}
]
},
{
"ReservationId": "r-e5930ea5",
"Instances": [
{
"State": {
"Name": "running"
},
"InstanceId": "i-77859692"
}
]
}
]
}');
reservations := json_list(obj.get('Reservations'));
l_tempobj := json(reservations);
DBMS_OUTPUT.PUT_LINE('============');
FOR i IN 1 .. l_tempobj.count
LOOP
DBMS_OUTPUT.PUT_LINE('------------');
instance := json(l_tempobj.get(i));
instance.print;
l_id := json_ext.get_string(instance, 'ReservationId');
DBMS_OUTPUT.PUT_LINE(i||'] Instance:'||l_id);
END LOOP;
END;
returning
============
------------
{
"ReservationId" : "r-5a33ea1a",
"Instances" : [{
"State" : {
"Name" : "stopped"
},
"InstanceId" : "i-7e02503e"
}]
}
1] Instance:r-5a33ea1a
------------
{
"ReservationId" : "r-e5930ea5",
"Instances" : [{
"State" : {
"Name" : "running"
},
"InstanceId" : "i-77859692"
}]
}
2] Instance:r-e5930ea5
But this example, which should return the instance ID, doesn't:
DECLARE
l_clob CLOB;
obj JSON;
reservations JSON_LIST;
l_tempobj JSON;
instance JSON;
L_id VARCHAR2(20);
BEGIN
obj:= json('{
"Reservations": [
{
"ReservationId": "r-5a33ea1a",
"Instances": [
{
"State": {
"Name": "stopped"
},
"InstanceId": "i-7e02503e"
}
]
},
{
"ReservationId": "r-e5930ea5",
"Instances": [
{
"State": {
"Name": "running"
},
"InstanceId": "i-77859692"
}
]
}
]
}');
reservations := json_list(obj.get('Reservations'));
l_tempobj := json(reservations);
DBMS_OUTPUT.PUT_LINE('============');
FOR i IN 1 .. l_tempobj.count
LOOP
DBMS_OUTPUT.PUT_LINE('------------');
instance := json(l_tempobj.get(i));
instance.print;
l_id := json_ext.get_string(instance, 'Instances.InstanceId');
DBMS_OUTPUT.PUT_LINE(i||'] Instance:'||l_id);
END LOOP;
END;
returning
============
------------
{
"ReservationId" : "r-5a33ea1a",
"Instances" : [{
"State" : {
"Name" : "stopped"
},
"InstanceId" : "i-7e02503e"
}]
}
1] Instance:
------------
{
"ReservationId" : "r-e5930ea5",
"Instances" : [{
"State" : {
"Name" : "running"
},
"InstanceId" : "i-77859692"
}]
}
2] Instance:
The only change from the first example to the second is replacing 'ReservationId' with 'Instances.InstanceId', but in the second example, although the block succeeds and the instance.print statement outputs the full JSON, the code doesn't populate the instance ID into l_id, so it is not shown in the DBMS_OUTPUT.
I also get the same result (i.e. no value in L_id) if I just use 'InstanceId'.
My assumption, from reading the examples, was that JSON Path should let me select the values using dot notation for nested values, but it doesn't seem to work. I also tried extracting 'Instances' into a temp variable of type JSON_LIST and then accessing it from there, but wasn't able to get a working example either.
Any help appreciated. Many Thanks.
See ex8.sql. In particular, it says:
JSON Path for PL/JSON:
never raises an exception (null is returned instead)
arrays are 1-indexed
use dots to navigate through the json scopes.
the empty string as path returns the entire json object.
JSON Path only work with JSON as input.
7 get types are supported: string, number, bool, null, json, json_list and date!
spaces inside [ ] are not important, but is important otherwise
Thus, your path should be:
l_id := json_ext.get_string(instance, 'Instances[1].InstanceId');
Or, without directly using json_ext:
l_id := instance.path('Instances[1].InstanceId');