Extracting tag value from a JSON string field in Oracle - regex

I have an Oracle column that stores a JSON string. This is an example value:
{
  "data": {
    "employer": {
      "status": "active",
      "name1": {
        "content": "My favorite company"
      }
    }
  }
}
I am interested in getting the value of the content tag contained in the first occurrence of the name1 tag. So in this example, what I want to get is "My favorite company" (without the quotes).
How do I do this in an Oracle SQL query?

If you are using Oracle 12.2 or higher, you may use the query below (the JSON path must be quoted and walk down to the content key):
SELECT JSON_VALUE(YOUR_COLUMN, '$.data.employer.name1.content')
FROM YOUR_TABLE;
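For example, run against the sample value from the question (a minimal sketch; the table and column names are placeholders):
WITH your_table AS (
  SELECT '{"data":{"employer":{"status":"active","name1":{"content":"My favorite company"}}}}' AS your_column
  FROM dual
)
SELECT JSON_VALUE(your_column, '$.data.employer.name1.content') AS result
FROM your_table;
This returns My favorite company without the surrounding quotes.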

Here's one option; it should be OK as long as the JSON data is simple, which is what your example suggests:
SQL> select * from test;
JSON
--------------------------------------------------------------------------------
{
"data": {
"employer": {
"status": "active",
"name1": {
"content": "My favorite company"
}
}
}
}
Query:
the temp CTE finds the "content" string and returns everything that follows it
the final query extracts what's between the 3rd and 4th double-quote characters
SQL> with temp as
2 (select substr(json,
3 instr(json, '"content"')
4 ) content
5 from test
6 )
7 select substr(content,
8 instr(content, '"', 1, 3) + 1,
9 instr(content, '"', 1, 4) - instr(content, '"', 1, 3) - 1
10 ) result
11 from temp;
RESULT
--------------------------------------------------------------------------------
My favorite company
SQL>
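Since the question title mentions regex, here is an alternative sketch (not one of the original answers; it assumes Oracle 11g or later for REGEXP_SUBSTR's subexpression argument, and a single, simply formatted "content" key):
SELECT REGEXP_SUBSTR(json,
                     '"content"\s*:\s*"([^"]*)"',
                     1, 1, NULL, 1) AS result
FROM test;
The final argument (1) asks for the first capture group, i.e. the text between the quotes that follow "content".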

Related

Athena-express query returns nested array as a string

I have this JSON data in AWS S3; it's an array of objects:
[{"usefulOffer": "Nike shoe","webStyleId": "123","skus": [{"rmsSkuId": "456","eventIds": ["", "7", "8", "9"]},{"rmsSkuId": "777","eventIds": ["B", "Q", "W", "H"]}],"timeStamp": "4545"},
{"usefulOffer": "Adidas pants","webStyleId": "35","skus": [{"rmsSkuId": "16","eventIds": ["2", "4", "boo", "la"]}],"timeStamp": "999"},...]
This is the query I used to create the table/schema in Athena for the data above:
CREATE EXTERNAL TABLE IF NOT EXISTS table (
  usefulOffer STRING,
  webStyleId STRING,
  skus array<struct<rmsSkuId: STRING, eventIds: array<STRING>>>,
  `timeStamp` STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://...'
When I query Athena through athena-express with 'SELECT * FROM table', it returns nicely formatted JSON, except that the nested array comes back as a string:
[
  {
    usefuloffer: 'Nike shoe',
    webstyleid: '123',
    skus: '[{rmsskuid=456, eventids=[, 7, 8, 9]}, {rmsskuid=777, eventids=[B, Q, W, H]}]',
    timestamp: '4545'
  },
  {
    usefuloffer: 'Adidas pants',
    webstyleid: '35',
    skus: '[{rmsskuid=16, eventids=[2, 4, boo, la]}]',
    timestamp: '999'
  },
I tried creating the table/schema without the "WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')" option, but then the result came back in a completely broken format.
How can I get the nested array back as an array rather than as a string?
Thank you for the help!

MongoDB regex returning "like" instead of "literal" match

I need to perform a case-insensitive find.
But I am getting case-insensitive "like" behaviour instead, where anything that is "like" my string matches.
I can overcome this in the console or a tool by adding ^ and $; however, I cannot figure out how to do that in code, when I'm passing a variable into the query.
module.exports.getCollege = function( name, callback ) {
  const query = [
    {
      $match: {
        'name': { $regex: name, $options: 'i' }
      }
    },
    {
      $project: {
        'address': 1,
        'name': 1
      }
    }
  ];
  College.aggregate( query ).exec( callback );
}
This returns
University of Michigan (object),
University of Michigan Flint campus (object)
I just want University of Michigan (object)
How do I get a LITERAL match, but case insensitive???
+++++++++++++++++++++++++++
Response to D. SM - "use collation...":
It returns the correct result, but only because the string's case already matches exactly; case-insensitive matching is still not working.
How do I get a LITERAL match, but case insensitive???
Specify collation in your query and use $eq instead of $regex.
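A minimal sketch of that suggestion, assuming Mongoose (the question already uses College.aggregate); the collation strength of 2 is an assumption:
module.exports.getCollege = function( name, callback ) {
  const query = [
    // literal, whole-value match instead of a regex
    { $match: { name: { $eq: name } } },
    { $project: { address: 1, name: 1 } }
  ];
  College.aggregate( query )
    // strength 2 compares base characters and diacritics but ignores case
    .collation( { locale: 'en', strength: 2 } )
    .exec( callback );
}
If you prefer to stay with $regex, the other route is to anchor and escape the variable yourself, e.g. new RegExp('^' + escapedName + '$', 'i'), where escapedName is the input with regex metacharacters escaped (the escaping helper is hypothetical).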

Getting the index of an element in an array of objects according to a field using regex in MongoDB

Let's say I have the following partial document:
{
  "groups": [
    {
      "text": "Example text 1 blah blah blah",
      "key": 1
    },
    {
      "text": "Example text 2 blah blah blah",
      "key": 2
    },
    {
      "text": "Example text 3 blah blah blah",
      "key": 3
    }
  ]
}
I would like to search for 'Example text 2' using regex on the text field and retrieve the index of the object.
Basically a combination of $indexOfArray and regex; I couldn't find a way to use them together.
What I'm trying to do is first check whether an object with a matching field exists; if it does, change its key value and bump the object to index 0 (first in the array). Is there a better way to do this? (I'm planning on swapping the found object's index with index 0.)
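One possible sketch for the "find the index" part (not from the original thread; it assumes MongoDB 4.2+ so that $regexMatch is available): map each element of groups to a boolean and let $indexOfArray locate the first true:
db.collection.aggregate([
  {
    $project: {
      matchedIndex: {
        $indexOfArray: [
          {
            $map: {
              input: "$groups",
              in: { $regexMatch: { input: "$$this.text", regex: /^Example text 2/ } }
            }
          },
          true
        ]
      }
    }
  }
])
$indexOfArray returns -1 when nothing matches, which also covers the "does it exist" check; the swap-to-index-0 update would still be a separate step.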

Sort documents based on first character in field value

I have a set of data like this:
[{name: "ROBERT"}, {name: "PETER"}, {name: "ROBINSON"}, {name: "ABIGAIL"}]
I want to make a single MongoDB query that can find:
any data whose name starts with the letter "R" (regex: ^R)
followed by any data whose name contains the letter "R" NOT AS THE FIRST CHARACTER, like: peteR, adleR, or caRl
so that it produces:
[{name: "ROBERT"}, {name: "ROBINSON"}, {name: "PETER"}]
It basically just displays any data that contains the "R" character, but I want to sort it so that data with "R" as the first character appears before the rest.
So far I've come up with 2 separate queries, followed by an operation to eliminate any duplicated results, and then joined them. Is there a more efficient way to do this in Mongo?
What you want is to add a weight to your documents and sort them accordingly.
First you need to select only those documents that $match your criteria using regular expressions.
Then you $project your documents and add the "weight" based on the value of the first character of the string, using logical $cond processing.
The condition here is $eq, which adds weight 1 to the document if the lowercase of the first character in the name is "r", or 0 if it is not.
The $substr and $toLower string aggregation operators respectively extract the first character and lowercase it.
Finally you $sort your documents by weight in descending order.
db.coll.aggregate(
  [
    { "$match": { "name": /R/i } },
    { "$project": {
      "_id": 0,
      "name": 1,
      "w": {
        "$cond": [
          { "$eq": [
            { "$substr": [ { "$toLower": "$name" }, 0, 1 ] },
            "r"
          ]},
          1,
          0
        ]
      }
    }},
    { "$sort": { "w": -1 } }
  ]
)
which produces:
{ "name" : "ROBERT", "w" : 1 }
{ "name" : "ROBINSON", "w" : 1 }
{ "name" : "PETER", "w" : 0 }
Try this:
db.collectioname.find({ "name": /R/ })

Mongodb Regex Query first 2 characters of the string

In one of my MongoDB collections, I have a date string in mm/dd/yyyy format. Now, I want to query on the 'mm' part.
For example, 05/20/2016 and 04/05/2015.
I want to take the first 2 characters of the string and query for '05'. With that, the only result I should get is 05/20/2016.
How can I achieve this?
Thanks!
For a regex solution, the following will suffice:
var search = "05",
    rgx = new RegExp("^" + search); // equivalent to var rgx = /^05/;
db.collection.find({ "leave_start": rgx });
Testing
var leave_start = "05/06/2016",
test = leave_start.match(/^05/);
console.log(test); // ["05", index: 0, input: "05/06/2016"]
console.log(test[0]); // "05"
or
var search = "05",
    rgx = new RegExp("^" + search),
    leave_start = "05/12/2016";
var test = leave_start.match(rgx);
console.log(test); // ["05", index: 0, input: "05/12/2016"]
console.log(test[0]); // "05"
Another alternative is to use the aggregation framework and take advantage of the $substr operator to extract the first 2 characters of the field; the $match stage then filters documents based on the new substring field:
db.collection.aggregate([
  {
    "$project": {
      "leave_start": 1,
      "monthSubstring": { "$substr": [ "$leave_start", 0, 2 ] }
    }
  },
  { "$match": { "monthSubstring": "05" } }
])