SAS libname json with arrays

We are having a problem with the JSON libname engine: our dataset carries over the wrong field value across object hierarchies.
here is a simple program:
filename resp "dataset.json";
filename pmap "map.json";
run;
libname example JSON fileref=resp map=pmap;
proc datasets lib=example;
run;
data objects;
set example.objects;
run;
the json dataset "dataset.json" looks like this:
{
"objects": [
{
"field": "wrong_answer"
},
{
"objectHierarchy": [
{
"map_to_this_level": "demo2"
},
{
"map_to_this_level": "demo1"
},
{
"map_to_this_level": "demo"
}
],
"field": "right_answer"
}
]
}
and the map "map.json" looks like this:
{
"DATASETS": [
{
"DSNAME": "objects",
"TABLEPATH": "/root/objects/objectHierarchy",
"VARIABLES": [
{
"NAME": "map_to_this_level",
"TYPE": "CHARACTER",
"PATH": "/root/objects/objectHierarchy/map_to_this_level",
"CURRENT_LENGTH": 10
},{
"NAME": "field1",
"TYPE": "CHARACTER",
"PATH": "/root/objects/field",
"CURRENT_LENGTH": 12
}
]
}
]
}
the resulting dataset "example.objects" looks like this:
map_to_this_level field
demo2 wrong_answer
demo1
demo right_answer
My question is: why does the wrong_answer value, from the field on the first object (which has no objectHierarchy), get mapped onto a row of data from the next object, which has actual values for its objectHierarchy?
the data should look like this:
map_to_this_level field
demo2 right_answer
demo1
demo right_answer

I presume a human expectation of:
field1 map_to_this_level
------------ -----------------
wrong_answer <missing>
right_answer demo2
right_answer demo1
right_answer demo
The JSON library engine is a serial decoder. The order of the properties in the JSON being parsed does not mate well with the map specified and the operation of the internal map interpreter (i.e. the SAS JSON library engine black box).
Consider this example with these small changes:
in json the field comes before objectHierarchy
in json wrong_answer has an empty array objectHierarchy. Note: if the objectHierarchy was not present, no row would be output for wrong_answer
in map the field1 value is retained using SAS JSON map feature DATASETS/VARIABLES/OPTIONS:["RETAIN"]
filename response catalog "work.json.sandbox.source";
data _null_;
file response; input; put _infile_;
datalines4;
{
"objects": [
{
"field": "wrong_answer"
,
"objectHierarchy": []
},
{
"field": "right_answer"
,
"objectHierarchy": [
{
"map_to_this_level": "demo2"
},
{
"map_to_this_level": "demo1"
},
{
"map_to_this_level": "demo"
}
]
}
]
}
;;;;
run;
filename pmap catalog "work.json.pmap.source";
data _null_;
file pmap; input; put _infile_;
datalines4;
{
"DATASETS": [
{
"DSNAME": "objects",
"TABLEPATH": "/root/objects/objectHierarchy",
"VARIABLES": [
{
"NAME": "map_to_this_level",
"TYPE": "CHARACTER",
"PATH": "/root/objects/objectHierarchy/map_to_this_level",
"CURRENT_LENGTH": 10
},
{
"NAME": "field1",
"TYPE": "CHARACTER",
"PATH": "/root/objects/field",
"CURRENT_LENGTH": 12
, "OPTIONS": ["RETAIN"]
}
]
}
]
}
;;;;
run;
libname example JSON fileref=response map=pmap;
ods listing; options nocenter nodate nonumber formdlim='-'; title;
dm 'clear output';
proc datasets lib=example;
run;
proc print data=example.alldata;
run;
proc print data=example.objects;
run;
dm 'output';
Output
map_to_
this_
Obs level field1
1 wrong_answer
2 demo2 right_answer
3 demo1 right_answer
4 demo right_answer
If your JSON cannot be trusted to be aligned with the mappings processed by the SAS JSON library engine, you will have to either:
work with the JSON provider, or
find an alternative interpreting mediary (Python, C#, etc.) that can output modified JSON, or an alternate interpreted form such as CSV, that can be consumed by SAS.
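As a sketch of such a mediary, the following Python script flattens the sample JSON into CSV rows matching the "human expectation" table above (one row per hierarchy entry, the parent's field repeated; an object with no hierarchy still gets one row). The function and column names here are illustrative, not part of any SAS facility:

```python
import csv
import io
import json

def flatten_objects(doc):
    """Yield one row per objectHierarchy entry, repeating the parent's
    'field' value; an object with no hierarchy yields a single row."""
    for obj in doc.get("objects", []):
        field = obj.get("field", "")
        hierarchy = obj.get("objectHierarchy", [])
        if not hierarchy:
            yield {"map_to_this_level": "", "field": field}
        for level in hierarchy:
            yield {"map_to_this_level": level.get("map_to_this_level", ""),
                   "field": field}

doc = json.loads("""
{"objects": [
  {"field": "wrong_answer"},
  {"objectHierarchy": [{"map_to_this_level": "demo2"},
                       {"map_to_this_level": "demo1"},
                       {"map_to_this_level": "demo"}],
   "field": "right_answer"}
]}
""")

# write a CSV that SAS can read with a simple PROC IMPORT / DATA step
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["map_to_this_level", "field"])
writer.writeheader()
writer.writerows(flatten_objects(doc))
print(out.getvalue())
```

Because the flattening happens before SAS sees the data, the result no longer depends on property order or on the map interpreter's retain behavior.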

Related

Create a table in AWS athena parsing dynamic keys in nested json

I have JSON files each line of the format below and I would like to parse this data and index it to a table using AWS Athena.
{
"123": {
"abc": {
"id": "test",
"data": "ipsum lorum"
},
"abd": {
"id": "test_new",
"data": "lorum ipsum"
}
}
}
Can a table with this format be created for the above data? In the documentation, it is mentioned that struct can be used for parsing nested JSON, however, there are no sample examples for dynamic keys.
You could cast JSON to map or array and transform it in any way you want. In this case you could use map_values and CROSS JOIN UNNEST to produce rows from JSON objects:
with test AS
(SELECT '{ "123": { "abc": { "id": "test", "data": "ipsum lorum" }, "abd": { "id": "test_new", "data": "lorum ipsum" } } }' AS str),
struct_like AS
(SELECT cast(json_parse(str) AS map<varchar,
map<varchar,
map<varchar,
varchar>>>) AS m
FROM test),
flat AS
(SELECT item
FROM struct_like
CROSS JOIN UNNEST(map_values(m)) AS t(item))
SELECT
key,
value['id'] AS id,
value['data'] AS data
FROM flat
CROSS JOIN unnest(item) AS t(key, value)
The result:
key id data
abc test ipsum lorum
abd test_new lorum ipsum
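For readers checking the logic outside Athena, the same two-level flattening (map_values over the dynamic outer key, then one row per inner key/value pair) can be sketched in plain Python; the row shape mirrors the SQL result above:

```python
import json

raw = ('{ "123": { "abc": { "id": "test", "data": "ipsum lorum" }, '
       '"abd": { "id": "test_new", "data": "lorum ipsum" } } }')

rows = []
# map_values(m): iterate the values under the dynamic outer key ("123");
# CROSS JOIN UNNEST: one row per (key, value) pair of the inner map.
for item in json.loads(raw).values():
    for key, value in item.items():
        rows.append({"key": key, "id": value["id"], "data": value["data"]})

for r in rows:
    print(r["key"], r["id"], r["data"])
```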

elasticsearch in json string (and / or )

I am new to AWS elasticsearch but need to create queries to search the follow data with different criteria.
search_metadata (JSON string with key/value pairs) - "{\"number\":\"111\"; \"area\":\"central\"; \"code\":\"1111\"; \"type\":\"internal\"}"
category - "statement" or "bill" or "email"
datetime - "2019-05-04T00:00:00" or "2019-07-16T00:01:00"
flag - "good" or "bad"
I need to construct a query that does the following:
AND or OR conditions on the search_metadata field (the JSON string) -> not sure how to do it.
along with an AND condition for category, datetime range, and flag. -> Do I need to use multi-match for flag and category?
"query": {
"bool": {
"must": [
{
"match_phrase": {
"search_metadata": "number 111" --> not sure about AND or OR with "area" and others
}
},
{
"range": {
"datetime": {
"gte": "2019-05-04T00:00:00Z",
"lte": "2019-07-16T00:01:00Z"
}
}
}
]
}
}
}
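For reference, in a bool query every clause under must is combined with AND; OR is expressed with a nested bool holding should clauses. A sketch of such a request body built as a Python dict (field names come from the question; the choice of term vs. match depends on the index mapping and is an assumption here):

```python
# AND semantics: all clauses under "must" have to match.
query = {
    "query": {
        "bool": {
            "must": [
                {"match_phrase": {"search_metadata": "number 111"}},
                {"match_phrase": {"search_metadata": "area central"}},
                # exact filters on category and flag (assumes keyword fields)
                {"term": {"category": "statement"}},
                {"term": {"flag": "good"}},
                {"range": {"datetime": {"gte": "2019-05-04T00:00:00Z",
                                        "lte": "2019-07-16T00:01:00Z"}}},
            ]
        }
    }
}

# OR semantics: swap one must clause for a nested bool with "should";
# minimum_should_match=1 means at least one alternative must match.
or_clause = {"bool": {"should": [
    {"match_phrase": {"search_metadata": "area central"}},
    {"match_phrase": {"search_metadata": "code 1111"}}],
    "minimum_should_match": 1}}
```

Note that querying phrases inside an escaped JSON string is fragile; if possible, index the metadata as separate fields (or a nested object) so term-level queries apply cleanly.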

How to create array of nested fields and arrays in BigQuery

I am trying to create a table in BigQuery according to a json schema which I will put in GCS and push to a pub/sub topic from there. I need to create some arrays and nested fields in order to achieve that.
By using STRUCT and ARRAY_AGG I can achieve arrays of structs, but I couldn't figure out how to create a struct of arrays.
Imagine that I have a json schema as below:
{
"vacancies": {
"id": "12",
"timestamp": "2019-08-22T04:04:26Z",
"version": "1.0",
"positionOpening": {
"documentId": {
"value": "505"
},
"statusCode": "Closed",
"registrationDate": "2014-05-07T16:11:22Z",
"lastUpdated": "2014-05-07T16:14:56Z",
"positionProfiles": [
{
"positionTitle": "Data Scientist for international company",
"positionQualifications": [
{
"experienceSummary": [
{"measure": {"value": "10","unitCode": "ANN"}},
{"measure": {"value": "4","unitCode": "ANN"}}
],
"educationRequirement": {
"programs": ["Physics","Computer Science"],
"programConcentrations": ["Data Analysis","Python Programming"]
},
"languageRequirement": [
{
"competencyName": "English",
"requiredProficiencyLevel": {"scoresNumeric": [{"value": "100"},{"value": "95"}]}
},
{
"competencyName": "French",
"requiredProficiencyLevel": {"scoresNumeric": [{"value": "95"},{"value": "70"}]}
}
]
}
]
}
]
}
}
}
How can I create a SQL query to get this as a result?
Thanks in advance for the help!
You might have to build a temp table to do this.
The first CREATE statement takes a denormalized table and converts it to a table with an array of structs.
The second CREATE statement takes that temp table and embeds the array into an (array of) struct(s).
You could remove the internal struct from the first query, and the array wrapper from the second query, to build a strict struct of arrays. But this should be flexible enough that you can create an array of structs, a struct of arrays, or any combination of the two, as many times as you want, up to the 15 levels of nesting that BigQuery allows.
The final outcome of this code would be a table with one column (column1) of a standard datatype, as well as an array of structs called OutsideArrayOfStructs. That struct has two columns of "standard" datatypes, as well as an array of structs called InsideArrayOfStructs.
CREATE OR REPLACE TABLE dataset.tempTable as (
select
column1,
column2,
column3,
ARRAY_AGG(
STRUCT(
ArrayObjectColumn1,
ArrayObjectColumn2,
ArrayObjectColumn3
)
) as InsideArrayOfStructs
FROM
sourceDataset.sourceTable
GROUP BY
column1,
column2,
column3 );
CREATE OR REPLACE TABLE dataset.finalTable as (
select
column1,
ARRAY_AGG(
STRUCT(
column2,
column3,
InsideArrayOfStructs
)
) as OutsideArrayOfStructs
FROM
dataset.tempTable
GROUP BY
Column1 )
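To check the shape the two grouping steps produce, here is the same nesting sketched in Python with the placeholder column names from the SQL above (the source rows are invented for illustration):

```python
from itertools import groupby
from operator import itemgetter

# denormalized source rows (placeholder columns, as in the SQL above)
source = [
    {"column1": "a", "column2": "x", "column3": 1, "ArrayObjectColumn1": "p"},
    {"column1": "a", "column2": "x", "column3": 1, "ArrayObjectColumn1": "q"},
    {"column1": "a", "column2": "y", "column3": 2, "ArrayObjectColumn1": "r"},
]

# step 1: GROUP BY column1..3, ARRAY_AGG(STRUCT(...)) -> InsideArrayOfStructs
key123 = itemgetter("column1", "column2", "column3")
temp = [
    {"column1": k[0], "column2": k[1], "column3": k[2],
     "InsideArrayOfStructs": [{"ArrayObjectColumn1": r["ArrayObjectColumn1"]}
                              for r in rows]}
    for k, rows in groupby(sorted(source, key=key123), key=key123)
]

# step 2: GROUP BY column1, wrap the remaining columns (including the inner
# array) in structs -> OutsideArrayOfStructs
key1 = itemgetter("column1")
final = [
    {"column1": k,
     "OutsideArrayOfStructs": [{"column2": r["column2"],
                                "column3": r["column3"],
                                "InsideArrayOfStructs": r["InsideArrayOfStructs"]}
                               for r in rows]}
    for k, rows in groupby(sorted(temp, key=key1), key=key1)
]
```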

Azure Cosmos query to convert into List

This is my JSON data, which is stored in Cosmos DB:
{
"id": "e064a694-8e1e-4660-a3ef-6b894e9414f7",
"Name": "Name",
"keyData": {
"Keys": [
"Government",
"Training",
"support"
]
}
}
Now I want to write a query to eliminate keyData and get only the Keys (like below):
{
"userid": "e064a694-8e1e-4660-a3ef-6b894e9414f7",
"Name": "Name",
"Keys" :[
"Government",
"Training",
"support"
]
}
So far I have tried a query like
SELECT c.id,k.Keys FROM c
JOIN k in c.keyPhraseBatchResult
which does not work.
Update 1:
After trying Sajeetharan's suggestion I am now able to get the result, but the issue is that it produces another JSON object inside the array, like:
{
"id": "ee885fdc-9951-40e2-b1e7-8564003cd554",
"keys": [
{
"serving": "Government"
},
{
"serving": "Training"
},
{
"serving": "support"
}
]
}
Is there any way to extract only the array values without the key/value pair wrapper?
{
"userid": "e064a694-8e1e-4660-a3ef-6b894e9414f7",
"Name": "Name",
"Keys" :[
"Government",
"Training",
"support"
]
}
You could try this one,
SELECT C.id, ARRAY(SELECT VALUE serving FROM serving IN C.keyData.Keys) AS Keys FROM C
Please use a Cosmos DB stored procedure to implement your desired format, based on @Sajeetharan's SQL.
function sample() {
var collection = getContext().getCollection();
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
'SELECT C.id,ARRAY(SELECT serving FROM serving IN C.keyData.Keys) AS keys FROM C',
function (err, feed, options) {
if (err) throw err;
if (!feed || !feed.length) {
var response = getContext().getResponse();
response.setBody('no docs found');
}
else {
var response = getContext().getResponse();
var map = {};
for(var i=0;i<feed.length;i++){
var keyArray = feed[i].keys;
var array = [];
for(var j=0;j<keyArray.length;j++){
array.push(keyArray[j].serving)
}
feed[i].keys = array;
}
response.setBody(feed);
}
});
if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
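The stored procedure's reshaping step amounts to the following plain-Python post-processing (a sketch, assuming the feed shape shown in Update 1; the sample document is copied from the question):

```python
# documents as returned by the ARRAY(SELECT serving ...) query
feed = [
    {"id": "ee885fdc-9951-40e2-b1e7-8564003cd554",
     "keys": [{"serving": "Government"},
              {"serving": "Training"},
              {"serving": "support"}]},
]

# collapse each {"serving": ...} wrapper down to its bare value
for doc in feed:
    doc["keys"] = [item["serving"] for item in doc["keys"]]
```

The SELECT VALUE form in the answer above avoids the wrapper entirely, so this post-processing is only needed when SELECT VALUE is not used.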

cannot get correct syntax for pljson

I've installed PL/JSON 1.05 in Oracle XE 11g and written a PL/SQL function to extract values from the return of Amazon AWS describe-instances.
Trying to obtain the values for top-level items such as the reservation ID works, but I am unable to get values nested within lower levels of the JSON.
e.g. this example works (using the cut-down AWS JSON inline):
DECLARE
obj JSON;
reservations JSON_LIST;
l_tempobj JSON;
instance JSON;
l_id VARCHAR2(20);
BEGIN
obj:= json('{
"Reservations": [
{
"ReservationId": "r-5a33ea1a",
"Instances": [
{
"State": {
"Name": "stopped"
},
"InstanceId": "i-7e02503e"
}
]
},
{
"ReservationId": "r-e5930ea5",
"Instances": [
{
"State": {
"Name": "running"
},
"InstanceId": "i-77859692"
}
]
}
]
}');
reservations := json_list(obj.get('Reservations'));
l_tempobj := json(reservations);
DBMS_OUTPUT.PUT_LINE('============');
FOR i IN 1 .. l_tempobj.count
LOOP
DBMS_OUTPUT.PUT_LINE('------------');
instance := json(l_tempobj.get(i));
instance.print;
l_id := json_ext.get_string(instance, 'ReservationId');
DBMS_OUTPUT.PUT_LINE(i||'] Instance:'||l_id);
END LOOP;
END;
returning
============
------------
{
"ReservationId" : "r-5a33ea1a",
"Instances" : [{
"State" : {
"Name" : "stopped"
},
"InstanceId" : "i-7e02503e"
}]
}
1] Instance:r-5a33ea1a
------------
{
"ReservationId" : "r-e5930ea5",
"Instances" : [{
"State" : {
"Name" : "running"
},
"InstanceId" : "i-77859692"
}]
}
2] Instance:r-e5930ea5
but this example, which tries to return the instance ID, doesn't:
DECLARE
l_clob CLOB;
obj JSON;
reservations JSON_LIST;
l_tempobj JSON;
instance JSON;
L_id VARCHAR2(20);
BEGIN
obj:= json('{
"Reservations": [
{
"ReservationId": "r-5a33ea1a",
"Instances": [
{
"State": {
"Name": "stopped"
},
"InstanceId": "i-7e02503e"
}
]
},
{
"ReservationId": "r-e5930ea5",
"Instances": [
{
"State": {
"Name": "running"
},
"InstanceId": "i-77859692"
}
]
}
]
}');
reservations := json_list(obj.get('Reservations'));
l_tempobj := json(reservations);
DBMS_OUTPUT.PUT_LINE('============');
FOR i IN 1 .. l_tempobj.count
LOOP
DBMS_OUTPUT.PUT_LINE('------------');
instance := json(l_tempobj.get(i));
instance.print;
l_id := json_ext.get_string(instance, 'Instances.InstanceId');
DBMS_OUTPUT.PUT_LINE(i||'] Instance:'||l_id);
END LOOP;
END;
returning
============
------------
{
"ReservationId" : "r-5a33ea1a",
"Instances" : [{
"State" : {
"Name" : "stopped"
},
"InstanceId" : "i-7e02503e"
}]
}
1] Instance:
------------
{
"ReservationId" : "r-e5930ea5",
"Instances" : [{
"State" : {
"Name" : "running"
},
"InstanceId" : "i-77859692"
}]
}
2] Instance:
The only change from the first example to the second is replacing 'ReservationId' with 'Instances.InstanceId'. In the second example, although the function succeeds and the instance.print statement outputs the full JSON, the code does not populate the instance ID into l_id, so nothing is output by DBMS_OUTPUT.
I also get the same result (i.e. no value in l_id) if I just use 'InstanceId'.
From reading the examples, my assumption was that JSON Path would allow me to select nested values using dot notation, but it doesn't seem to work. I also tried extracting 'Instances' into a temp variable of type JSON_LIST and then accessing it from there, but wasn't able to get a working example.
Any help appreciated. Many Thanks.
See ex8.sql. In particular, it says:
JSON Path for PL/JSON:
never raises an exception (null is returned instead)
arrays are 1-indexed
use dots to navigate through the json scopes.
the empty string as path returns the entire json object.
JSON Path only work with JSON as input.
7 get types are supported: string, number, bool, null, json, json_list and date!
spaces inside [ ] are not important, but whitespace is significant elsewhere
Thus, your path should be:
l_id := json_ext.get_string(instance, 'Instances[1].InstanceId');
Or, without directly using json_ext:
l_id := instance.path('Instances[1].InstanceId');
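The reason the dot-only path returns null is that Instances is an array, so a subscript is required, and PL/JSON arrays are 1-indexed. The equivalent lookup in Python (0-indexed) makes this concrete:

```python
import json

# one reservation object, as printed by instance.print in the question
instance = json.loads("""
{"ReservationId": "r-5a33ea1a",
 "Instances": [{"State": {"Name": "stopped"},
                "InstanceId": "i-7e02503e"}]}
""")

# 'Instances.InstanceId' fails in PL/JSON because Instances is a list,
# not an object; PL/JSON's 'Instances[1].InstanceId' corresponds to
# index 0 here.
instance_id = instance["Instances"][0]["InstanceId"]
print(instance_id)  # i-7e02503e
```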