Athena-express query returns nested array as a string - amazon-athena

I have this JSON data in AWS S3; it's an array of objects:
[{"usefulOffer": "Nike shoe","webStyleId": "123","skus": [{"rmsSkuId": "456","eventIds": ["", "7", "8", "9"]},{"rmsSkuId": "777","eventIds": ["B", "Q", "W", "H"]}],"timeStamp": "4545"},
{"usefulOffer": "Adidas pants","webStyleId": "35","skus": [{"rmsSkuId": "16","eventIds": ["2", "4", "boo", "la"]}],"timeStamp": "999"},...]
This is the query I used to create the table/schema in Athena for the data above:
CREATE EXTERNAL TABLE IF NOT EXISTS table (
usefulOffer STRING,
webStyleId STRING,
skus array<struct<rmsSkuId: STRING, eventIds: array<STRING>>>,
`timeStamp` STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://...'
When I query Athena through athena-express with 'SELECT * FROM table', it returns nicely formatted JSON, except that the nested array comes back as a string:
[
{
usefuloffer: 'Nike shoe',
webstyleid: '123',
skus: '[{rmsskuid=456, eventids=[, 7, 8, 9]}, {rmsskuid=777, eventids=[B, Q, W, H]}]',
timestamp: '4545'
},
{
usefuloffer: 'Adidas pants',
webstyleid: '35',
skus: '[{rmsskuid=16, eventids=[2, 4, boo, la]}]',
timestamp: '999'
},
I tried creating the table/schema without the WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true') option, but then the output format was completely broken.
How can I get the nested array back as an actual array instead of a string?
Thank you for the help!
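As an aside (not from the original post): a common workaround for this is to cast the complex column to JSON in the query itself, e.g. SELECT usefulOffer, webStyleId, CAST(skus AS JSON) AS skus, `timeStamp` FROM table — Athena then serializes the column as valid JSON text instead of Presto's {k=v} rendering, and the client can parse it. A minimal Python sketch of the client-side parse, assuming that cast has been applied (the row values are illustrative):

```python
import json

# Assumed shape: with CAST(skus AS JSON), Athena returns the skus column
# as a valid JSON string rather than the unparseable {rmsskuid=456, ...} text.
row = {
    "usefuloffer": "Nike shoe",
    "webstyleid": "123",
    "skus": '[{"rmsSkuId":"456","eventIds":["","7","8","9"]},'
            '{"rmsSkuId":"777","eventIds":["B","Q","W","H"]}]',
    "timestamp": "4545",
}

skus = json.loads(row["skus"])  # now a real list of dicts
print(skus[0]["rmsSkuId"])      # -> 456
```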

Related

Extracting tag value from a JSON string field in Oracle

I have an Oracle column that stores a JSON string. This is an example value:
{
"data": {
"employer": {
"status": "active",
"name1": {
"content": "My favorite company"
}
}
}
}
I am interested in getting the value of the content tag contained in the first occurrence of the name1 tag. So in this example, what I want to get is "My favorite company" (without the quotes).
How do I do this in an Oracle SQL query?
If you are using Oracle 12.2 or higher, you may use the query below (the JSON path must be a quoted string giving the full path to the tag):
SELECT JSON_VALUE(YOUR_COLUMN, '$.data.employer.name1.content')
FROM YOUR_TABLE;
Here's one option; it should be OK if the JSON data is simple, and that's what your example suggests:
SQL> select * from test;
JSON
--------------------------------------------------------------------------------
{
"data": {
"employer": {
"status": "active",
"name1": {
"content": "My favorite company"
}
}
}
}
Query:
- the temp CTE finds the "content" string and returns everything that follows it
- the final query extracts what's between the 3rd and 4th double-quote characters
SQL> with temp as
2 (select substr(json,
3 instr(json, '"content"')
4 ) content
5 from test
6 )
7 select substr(content,
8 instr(content, '"', 1, 3) + 1,
9 instr(content, '"', 1, 4) - instr(content, '"', 1, 3) - 1
10 ) result
11 from temp;
RESULT
--------------------------------------------------------------------------------
My favorite company
SQL>
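For readers more comfortable outside SQL, the same substr/instr logic can be sketched in Python (a rough equivalent of the answer above, not part of it):

```python
s = ('{ "data": { "employer": { "status": "active", '
     '"name1": { "content": "My favorite company" } } } }')

# Like the temp CTE: everything from the first occurrence of "content" onward
tail = s[s.index('"content"'):]

# Like the final query: the value sits between the 3rd and 4th double quotes
# of that tail, i.e. split('"') element index 3
result = tail.split('"')[3]
print(result)  # -> My favorite company
```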

Create a table in AWS athena parsing dynamic keys in nested json

I have JSON files where each line has the format below, and I would like to parse this data and index it into a table using AWS Athena.
{
"123": {
"abc": {
"id": "test",
"data": "ipsum lorum"
},
"abd": {
"id": "test_new",
"data": "lorum ipsum"
}
}
}
Can a table be created for the above data? The documentation mentions that struct can be used for parsing nested JSON, but there are no sample examples for dynamic keys.
You could cast the JSON to a map or array and transform it any way you want. In this case you could use map_values and CROSS JOIN UNNEST to produce rows from the JSON objects:
with test AS
(SELECT '{ "123": { "abc": { "id": "test", "data": "ipsum lorum" }, "abd": { "id": "test_new", "data": "lorum ipsum" } } }' AS str),
struct_like AS
(SELECT cast(json_parse(str) AS map<varchar,
map<varchar,
map<varchar,
varchar>>>) AS m
FROM test),
flat AS
(SELECT item
FROM struct_like
CROSS JOIN UNNEST(map_values(m)) AS t(item))
SELECT
key,
value['id'] AS id,
value['data'] AS data
FROM flat
CROSS JOIN unnest(item) AS t(key, value)
The result:
key id data
abc test ipsum lorum
abd test_new lorum ipsum
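The shape of that transformation can be sketched in plain Python (an analogy for the map_values + UNNEST steps, not Athena code):

```python
import json

doc = json.loads('{ "123": { "abc": { "id": "test", "data": "ipsum lorum" }, '
                 '"abd": { "id": "test_new", "data": "lorum ipsum" } } }')

# map_values: drop the outer dynamic key ("123") and keep only its value;
# UNNEST: emit one row per inner dynamic key ("abc", "abd")
rows = [(key, value["id"], value["data"])
        for inner in doc.values()
        for key, value in inner.items()]
```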

How to create array of nested fields and arrays in BigQuery

I am trying to create a table in BigQuery according to a json schema which I will put in GCS and push to a pub/sub topic from there. I need to create some arrays and nested fields in order to achieve that.
By using struct and array_agg I can build arrays of structs, but I couldn't figure out how to create a struct of arrays.
Imagine that I have a json schema as below:
{
"vacancies": {
"id": "12",
"timestamp": "2019-08-22T04:04:26Z",
"version": "1.0",
"positionOpening": {
"documentId": {
"value": "505"
},
"statusCode": "Closed",
"registrationDate": "2014-05-07T16:11:22Z",
"lastUpdated": "2014-05-07T16:14:56Z",
"positionProfiles": [
{
"positionTitle": "Data Scientist for international company",
"positionQualifications": [
{
"experienceSummary": [
{"measure": {"value": "10","unitCode": "ANN"}},
{"measure": {"value": "4","unitCode": "ANN"}}
],
"educationRequirement": {
"programs": ["Physics","Computer Science"],
"programConcentrations": ["Data Analysis","Python Programming"]
},
"languageRequirement": [
{
"competencyName": "English",
"requiredProficiencyLevel": {"scoresNumeric": [{"value": "100"},{"value": "95"}]}
},
{
"competencyName": "French",
"requiredProficiencyLevel": {"scoresNumeric": [{"value": "95"},{"value": "70"}]}
}
]
}
]
}
]
}
}
}
How can I create a SQL query to get this as a result?
Thanks in advance for the help!
You might have to build a temp table to do this.
This first create statement would take a denormalized table convert it to a table with an array of structs.
The second create statement would take that temp table and embed the array into a (array of) struct(s).
You could remove the internal struct from the first query, and the array wrapper from the second query, to build a struct of arrays instead. But this should be flexible enough that you can create an array of structs, a struct of arrays, or any combination of the two, as many times as you want, up to the 15 levels of nesting that BigQuery allows.
The final outcome of this code would be a table with one column (column1) of a standard datatype, plus an array of structs called OutsideArrayOfStructs. That struct has two columns of "standard" datatypes, plus an array of structs called InsideArrayOfStructs.
CREATE OR REPLACE TABLE dataset.tempTable as (
select
column1,
column2,
column3,
ARRAY_AGG(
STRUCT(
ArrayObjectColumn1,
ArrayObjectColumn2,
ArrayObjectColumn3
)
) as InsideArrayOfStructs
FROM
sourceDataset.sourceTable
GROUP BY
column1,
column2,
column3 );
CREATE OR REPLACE TABLE dataset.finalTable as (
select
column1,
ARRAY_AGG(
STRUCT(
column2,
column3,
InsideArrayOfStructs
)
) as OutsideArrayOfStructs
FROM
dataset.tempTable
GROUP BY
column1 );
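As a plain-Python analogy of the two ARRAY_AGG passes (an illustration, not BigQuery code — the column names and sample rows are made up): each pass groups the flat rows one level deeper.

```python
from collections import defaultdict

# Flat, denormalized rows: (column1, column2, column3, then the three
# inner-array columns from the first CREATE statement)
rows = [
    ("a", "x", 1, "i", "j", "k"),
    ("a", "x", 1, "m", "n", "o"),
    ("a", "y", 2, "p", "q", "r"),
]

# Pass 1 (tempTable): group by (column1, column2, column3)
# -> array of inner structs per group
pass1 = defaultdict(list)
for c1, c2, c3, i1, i2, i3 in rows:
    pass1[(c1, c2, c3)].append({"ArrayObjectColumn1": i1,
                                "ArrayObjectColumn2": i2,
                                "ArrayObjectColumn3": i3})

# Pass 2 (finalTable): group by column1
# -> array of structs, each embedding its inner array
pass2 = defaultdict(list)
for (c1, c2, c3), inner in pass1.items():
    pass2[c1].append({"column2": c2, "column3": c3,
                      "InsideArrayOfStructs": inner})
```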

How can I convert json map to List String?

I need to get the country name and country code into a List as "Andorra (AD)". I can load the JSON into a map, but I cannot convert it to a List. How can I convert the JSON map to a List<String>?
"country": [
{
"countryCode": "AD",
"countryName": "Andorra",
"currencyCode": "EUR",
"isoNumeric": "020"
},
You can use the .map() function:
var countryList = country.map((c) => '${c["countryName"]} (${c["countryCode"]})').toList();
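The same transformation written as a Python comprehension, for comparison (a sketch, not part of the Dart answer):

```python
country = [
    {"countryCode": "AD", "countryName": "Andorra",
     "currencyCode": "EUR", "isoNumeric": "020"},
]

# Build "Name (CODE)" strings, one per entry in the list
country_list = [f'{c["countryName"]} ({c["countryCode"]})' for c in country]
print(country_list)  # -> ['Andorra (AD)']
```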

Sort documents based on first character in field value

I have a set of data like this:
[{name: "ROBERT"}, {name: "PETER"}, {name: "ROBINSON"} , {name: "ABIGAIL"}]
I want to make a single mongodb query that can find:
Any data which name starts with letter "R" (regex: ^R)
Followed by any data which name contains letter "R" NOT AS THE FIRST CHARACTER, like: peteR, adleR, or caRl
so it produces:
[{name: "ROBERT"}, {name: "ROBINSON"}, {name: "PETER"}]
It basically just displays any data that contains the "R" character, but I want to sort it so that data with "R" as the first character appears before the rest.
So far I've come up with two separate queries, followed by an operation to eliminate duplicated results, and then joined them. Is there a more efficient way to do this in Mongo?
What you want is to add a weight to your documents and sort them accordingly.
First you need to select only those documents that $match your criteria using regular expressions.
To do that, you need to $project your documents and add the "weight" based on the value of the first character of your string using a logical $condition processing.
The condition here is $eq, which adds weight 1 to the document if the lowercase of the first character in the name is "r", or 0 if it's not.
The $substr and $toLower string aggregation operators together return the first character of the name in lowercase.
Finally you $sort your documents by weight in descending order.
db.coll.aggregate(
[
{ "$match": { "name": /R/i } },
{ "$project": {
"_id": 0,
"name": 1,
"w": {
"$cond": [
{ "$eq": [
{ "$substr": [ { "$toLower": "$name" }, 0, 1 ] },
"r"
]},
1,
0
]
}
}},
{ "$sort": { "w": -1 } }
]
)
which produces:
{ "name" : "ROBERT", "w" : 1 }
{ "name" : "ROBINSON", "w" : 1 }
{ "name" : "PETER", "w" : 0 }
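The match-weight-sort pipeline can be mimicked in plain Python (an illustration of the idea, not MongoDB code):

```python
docs = [{"name": "ROBERT"}, {"name": "PETER"},
        {"name": "ROBINSON"}, {"name": "ABIGAIL"}]

# $match: keep only names containing "r" (case-insensitive)
matched = [d for d in docs if "r" in d["name"].lower()]

# $project + $sort: weight names starting with "r" higher, then sort
# descending by that weight; Python's stable sort keeps input order
# among equal weights, just like the aggregation output above
result = sorted(matched,
                key=lambda d: d["name"].lower().startswith("r"),
                reverse=True)
```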
Try this (it matches the documents but does not apply the requested ordering):
db.collectionname.find({ "name": /R/ })