jq 1.5 find nested elements by regex when parent unknown

Given the JSON structure below, I would like to find the first occurrence of the object ccc so I can add a new object to its children ddd. However, I do not know the key name of the parent or how many levels deep it may be.
Object to find:
"children": {
"ccc": [{
"id": "ddd",
"des": "object d",
"parent": "ccc"
}]
}
Full JSON, stored in $myJson:
{
"zzz": [{
"id": "aaa",
"des": "object A",
"parent": "zzz",
"children": {
"aaa": [{
"id": "bbb",
"des": "object B",
"parent": "aaa",
"children": {
"bbb": [{
"id": "ccc",
"des": "object C",
"parent": "bbb",
"children": {
"ccc": [{
"id": "ddd",
"des": "object d",
"parent": "ccc"
}]
}
}, {
"id": "eee",
"des": "object e",
"parent": "bbb"
}]
}
},{
"id": "fff",
"des": "object f",
"parent": "aaa"
}]
}
}]}
Following some other answers, I have tried combinations of
output=($(jq -r '.. | with_entries(select(.key|match("ccc";"i")))' <<< ${myjson}))
or
output=($(jq -r '.. | to_entries | map(select(.key | match("ccc";"i"))) | map(.value)' <<< ${myjson}))
All give errors of a similar nature: jq: error (at <stdin>:1): number (0) cannot be matched, as it is not a string

In the following, I'll assume you want to add "ADDITIONAL" to the array at EVERY key that matches a given regex (here "ccc"):
walk(if type == "object"
then with_entries(if (.key|test("ccc"))
then .value += ["ADDITIONAL"] else . end)
else . end)
If your jq does not have walk/1, then you can simply copy-and-paste its def from the jq FAQ or builtin.jq
Alternative formulation
If you have the following general-purpose helper function handy (e.g. in your ~/.jq):
def when(filter; action): if (filter?) // null then action else . end;
then the above solution shrinks down to:
walk(when(type == "object";
with_entries(when(.key|test("ccc"); .value += ["ADDITIONAL"]))))
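For readers who want to see the same traversal outside jq, here is a minimal Python sketch of the idea: walk the structure bottom-up and append a value to every array stored under a key that matches the regex. The function name append_at_matching_keys and the trimmed sample document are assumptions made for illustration, and unlike the jq filter this sketch silently skips matching keys whose value is not an array.

```python
import json
import re


def append_at_matching_keys(node, pattern, addition):
    """Return a copy of node with `addition` appended to every list
    stored under a dict key that matches `pattern`."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            # Recurse first so nested matches are handled before the parent.
            value = append_at_matching_keys(value, pattern, addition)
            if re.search(pattern, key) and isinstance(value, list):
                value = value + [addition]
            result[key] = value
        return result
    if isinstance(node, list):
        return [append_at_matching_keys(item, pattern, addition) for item in node]
    # Strings, numbers, booleans and None are returned unchanged.
    return node


# Trimmed-down version of the question's document (illustrative only).
doc = {"zzz": [{"id": "aaa", "children": {"ccc": [{"id": "ddd"}]}}]}
updated = append_at_matching_keys(doc, "ccc", "ADDITIONAL")
print(json.dumps(updated, indent=2))
```

Because scalars are returned untouched and only keys are tested against the regex, the "number cannot be matched" error from the question cannot occur here.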

Related

Problem replacing brackets in concatenated JSON formatted string with -replace and REGEX

I have a script that gets a JSON list from an API. The call to the API will only return 100 records, so I use a while loop to make multiple calls and then append each response to a string variable $rawData. Once the loop is done, I end up with improper brackets wherever two responses were joined, and it ends up looking like this:
}
}
]
[
{
Properly formatted, it should look like this:
}
},
{
I have tried the following:
$rawData -replace "\s\s\}\]\[",','
$rawData -replace "\s\s\}\n\]\[",','
$rawData -replace "\s\s\}`n\]\[",','
$rawData -replace "\}\]\[",','
$rawData -replace "\}`n\]\[",','
$rawData -replace "\}\n\]\[",','
All of these return the following, which totally baffles me:
}
}
][
{
Where is the comma that was specified as the replacement? How did it remove the newline after the ]? Then I tried to simplify it using:
$rawData -replace "\]\[",','
Output:
}
}
,
{
So close! I am assuming that there is a newline there but I can't figure out how to get rid of it. I have scoured so many sites and am now completely frustrated. If possible, can you tell me the right syntax and why my attempts have failed?
----EDIT----
Thank you MClayton! That solved the issue and it is definitely more efficient. I'm still trying to grasp what is happening with the ConvertTo(From)-Json cmdlets. I am still very interested in why the -replace did not work and how to make it work for future reference. If anyone has an answer for that it would be very helpful.
To summarise (and fill in a few blanks), you're basically doing something like this:
$response1 = @"
[
{ "object": { "name": "object 1"} },
{ "object": { "name": "object 2"} },
{ "object": { "name": "object 3"} }
]
"@
$response2 = @"
[
{ "object": { "name": "object 4"} },
{ "object": { "name": "object 5"} },
{ "object": { "name": "object 6"} }
]
"@
$result = $response1 + $response2;
which is giving a result like:
[
{ "object": { "name": "object 1"} },
{ "object": { "name": "object 2"} },
{ "object": { "name": "object 3"} }
][
{ "object": { "name": "object 4"} },
{ "object": { "name": "object 5"} },
{ "object": { "name": "object 6"} }
]
and that isn't valid json.
Rather than try to patch up the text with a regex, the suggestion by @Colyn1337 is to parse the JSON, join the objects and then turn it back into text:
$result = @( $response1, $response2 ) | ConvertFrom-Json
$result
#object
#------
#@{name=object 1}
#@{name=object 2}
#@{name=object 3}
#@{name=object 4}
#@{name=object 5}
#@{name=object 6}
Now you can either work on the objects directly, or you can turn it back into a single valid json string:
$json = $result | ConvertTo-Json
$json
#[
# {
# "object": {
# "name": "object 1"
# }
# },
# ... etc...
# {
# "object": {
# "name": "object 6"
# }
# }
#]
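The same parse, join, and re-serialise idea can be sketched in Python for comparison. This is only an illustration: merge_json_pages is a hypothetical helper name, and the two short response strings stand in for the pages returned by the API.

```python
import json


def merge_json_pages(pages):
    """Parse each page (a JSON array as text) and concatenate the records."""
    merged = []
    for page in pages:
        merged.extend(json.loads(page))
    return merged


# Stand-ins for two API responses, each a complete JSON array.
response1 = '[{"object": {"name": "object 1"}}, {"object": {"name": "object 2"}}]'
response2 = '[{"object": {"name": "object 3"}}]'

records = merge_json_pages([response1, response2])
merged_json = json.dumps(records, indent=2)  # one valid JSON array again
print(merged_json)
```

Working on parsed records means the exact whitespace and line endings between the brackets never matter, which is what made the regex attempts fragile.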

MongoDB - Find numbers that starts with a string

I'm trying to make a query that gets all the prices that start with '12'.
I have a collection like this:
{
"place": "Costa Rica",
"name": "Villa Lapas",
"price": 1353,
},
{
"place": "Costa Rica",
"name": "Hotel NWS",
"price": 1948,
},
{
"place": "Costa Rica",
"name": "Hotel Papaya",
"price": 1283,
},
{
"place": "Costa Rica",
"name": "Hostal Serine",
"price": 1248,
},
And I want my results like this:
{
'prices': [
1248,
1283
]
}
I'm converting all the prices to string in order to use a regex function. But I don't understand very well how to use the regex in my query.
My query returns:
{ "prices" : null }
{ "prices" : null }
Could someone please guide me? :)
db.collection.aggregate([
{'$project': {
'_id': 0,
'price': {'$toString': '$price'}
}},
{'$project': {
'prices': {'$regexFind': { 'input': "$price", 'regex': '^12' }}
}}
]).pretty();
You are almost correct.
db.test.aggregate([
  {'$project': {
    '_id': 0,
    // renamed from 'price' to 'prices': this is the field I meant
    'prices': {'$toString': '$price'}
  }},
  {'$match': {
    // same field name here
    'prices': {'$regex': '^12'}
  }}
])
You need to use $match with $regex, which yields the result you expected.
$regexFind does not filter documents: it is evaluated for every document and returns null wherever the input doesn't match the pattern.
Also, in the first $project you named the field price instead of prices. If the second stage refers to the field by the name the first stage gave it, the pipeline matches.
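To make the filter-versus-find distinction concrete, here is a small plain-Python sketch over an in-memory list. It is not MongoDB driver code: hotels mirrors the question's collection, prices_matching is a hypothetical helper, and the sort is an assumption added only to reproduce the order shown in the desired result.

```python
import re

# In-memory stand-in for the collection from the question.
hotels = [
    {"place": "Costa Rica", "name": "Villa Lapas", "price": 1353},
    {"place": "Costa Rica", "name": "Hotel NWS", "price": 1948},
    {"place": "Costa Rica", "name": "Hotel Papaya", "price": 1283},
    {"place": "Costa Rica", "name": "Hostal Serine", "price": 1248},
]


def prices_matching(documents, pattern):
    """Keep only the prices whose string form matches the regex."""
    regex = re.compile(pattern)
    return sorted(
        doc["price"] for doc in documents if regex.search(str(doc["price"]))
    )


result = {"prices": prices_matching(hotels, r"^12")}
print(result)
```

Documents that do not match are dropped rather than mapped to a null entry, which is the behaviour $match gives you and $regexFind inside $project does not.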

What is the grep to remove all lines but name and country

My file has the following pattern.
[
{
"id": 8050879,
"coord": { "lon": -1.65825, "lat": 42.808472 },
"country": "ES",
"geoname": { "cl": "P", "code": "PPLL", "parent": 6359749 },
"name": "Iturrama",
"stat": { "level": 1.0, "population": 24846 },
"stations": [
{ "id": 5493, "dist": 4, "kf": 1 },
{ "id": 28697, "dist": 32, "kf": 1 }
],
"zoom": 14
},
{
"id": 5406990,
"coord": { "lon": -122.064957, "lat": 37.906311 },
"country": "US",
"geoname": { "cl": "P", "code": "PPL", "parent": 5339268 },
"langs": [
{ "bg": "Уолнът Крийк" },
{ "de": "Walnut Creek" },
{ "en": "Walnut Creek" },
{ "eo": "Walnut Creek" },
{ "link": "http://en.wikipedia.org/wiki/Walnut_Creek%2C_California" },
{ "post": "94595" }
],
"name": "Walnut Creek",
"stat": { "level": 1.0, "population": 64173 },
"stations": [
{ "id": 374, "dist": 9, "kf": 1 },
{ "id": 10103, "dist": 9, "kf": 1 },
],
"zoom": 11
},
...
]
I would like to get
[
{
"country": "ES",
"name": "Iturrama"
},
{
"country": "US",
"name": "Walnut Creek"
},
...
]
I have been using
grep -v id filename > result
then
grep -v coord result > result
grep -v geoname result > result
...
until I get my pattern, but I noticed I am deleting every line that has id anywhere in it.
So if I have a name: "cIDadel", it will be deleted too.
Can anyone help me with that?
Don't use tools that are not syntax-aware, such as grep, to parse structured data like JSON. grep cannot differentiate the underlying types, i.e. object, array, or any other. Use a proper parser like jq, with which you can simply do
jq 'map({country, name})' json_file
See it work in jq-playground. Downloading and setting up jq is pretty easy: see Download jq.
If you need to use shell tools for some reason instead of JSON parsing, use AWK.
file.awk
/^\[$/ {print($0)}
/^\{$/ {print($0)}
/"country"/ {print($0)}
/"name"/ {print($0)}
/^ *\},$/ {print($1)}
/^\]$/ {print($0)}
Call:
awk -f file.awk yourdata.txt
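If jq is not available but Python is, the projection can be sketched with the standard json module. This is an illustrative equivalent of the jq filter above, not a replacement for it: keep_fields is a hypothetical helper, and the two records are cut-down copies of the question's data.

```python
import json


def keep_fields(records, fields=("country", "name")):
    """Project each record onto the given fields (missing ones become None)."""
    return [{field: record.get(field) for field in fields} for record in records]


# Cut-down copies of the two records shown in the question.
cities = json.loads("""
[
  {"id": 8050879, "country": "ES", "name": "Iturrama", "zoom": 14},
  {"id": 5406990, "country": "US", "name": "Walnut Creek", "zoom": 11}
]
""")

slim = keep_fields(cities)
print(json.dumps(slim, indent=2))
```

Because the selection is made on parsed keys, a value such as "cIDadel" in the name field is never confused with the id key.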

How do I make an User required JSON

I have a JSON file containing three objects. The 2nd and 3rd objects are missing some fields that I actually need, and I want to fill in the missing fields with my own values. My code is below.
I tried this so far:
with open("final.json") as data1:
    a = json.load(data1)
final = []
for item in a:
    d = {}
    d["AppName"] = item["name"]
    d["AppId"] = item["id"]
    d["Health"] = item["health"]
    d["place1"] = item["cities"][0]["place1"]
    d["place2"] = item["cities"][0]["place2"]
print(final)
Error: I am getting a KeyError
My Input JSON file has similar data:
[{
"name": "python",
"id": 1234,
"health": "Active",
"cities": {
"place1": "us",
"place2": "newyork"
}
},
{
"name": "java",
"id": 2345,
"health": "Active"
}, {
"name": "python",
"id": 1234
}
]
I am expecting output:
[{
"name": "python",
"id": 1234,
"health": "Active",
"cities": {
"place1": "us",
"place2": "newyork"
}
},
{
"name": "java",
"id": 2345,
"health": "Null",
"cities": {
"place1": "0",
"place2": "0"
}
}, {
"name": "python",
"id": 1234,
"health": "Null",
"cities": {
"place1": "0",
"place2": "0"
}
}
]
I see two issues with the code that you have posted.
First, you are referring to the 'cities' field in your input JSON as if it is a list when it is, in fact, an object.
Second, to handle JSON containing objects which may be missing certain fields, you should use the Python dictionary get method. This method takes a key and an optional value to return if the key is not found (default is None).
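for item in a:
    d = {}
    d["AppName"] = item["name"]
    d["AppId"] = item["id"]
    # get() returns the fallback instead of raising KeyError
    d["Health"] = item.get("health", "Null")
    d["place1"] = item.get("cities", {}).get("place1", "0")
    d["place2"] = item.get("cities", {}).get("place2", "0")
    # collect the result; the original loop never stored d
    final.append(d)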
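Putting it together, here is a self-contained sketch that uses get() to produce the nested structure shown in the expected output. The helper name with_defaults and the inline raw string (in place of reading final.json) are assumptions made so the example runs on its own.

```python
import json

# Inline copy of the question's input, used instead of final.json.
raw = """
[{"name": "python", "id": 1234, "health": "Active",
  "cities": {"place1": "us", "place2": "newyork"}},
 {"name": "java", "id": 2345, "health": "Active"},
 {"name": "python", "id": 1234}]
"""


def with_defaults(item):
    """Copy one record, filling in fallbacks for any missing fields."""
    cities = item.get("cities", {})
    return {
        "name": item["name"],
        "id": item["id"],
        "health": item.get("health", "Null"),
        "cities": {
            "place1": cities.get("place1", "0"),
            "place2": cities.get("place2", "0"),
        },
    }


final = [with_defaults(item) for item in json.loads(raw)]
print(json.dumps(final, indent=2))
```

Note that the second record keeps "health": "Active" because that field is present in the input; only fields that are actually missing receive the fallback values.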

Get all instances of text in curly braces between brackets

Let's say I have some text like this:
{
"source": "Analytics 13 {Employee_Info.acl} {Employee_Data}",
"lastRecNo": "3",
"columns": {
"ID": "numeric",
"NAME": "character",
"EFFECTIVE_DATE": "date",
"ROLE": "character"
},
"data": [{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
},
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
},
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}]
}
And I want to RegEx match every object inside the "data" array (including the curly braces).
So the first match would be:
{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
}
the second would be:
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
}
and the third would be
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}
What regex pattern would I use to do that in PowerShell?
Notice the first match actually has some extra curly braces in the text of the "ROLE" field, which shouldn't interfere with the match.
I've tried this so far '(?<={).*?(?=})', but the first match is:
"source": "Analytics 13 {Employee_Info.acl
This result isn't a part of the "data" array and it doesn't include the curly braces in the match. I know I'm missing something that says "make sure we are inside the brackets/"data" array", and I'm probably not taking into account the extra curly braces in the "ROLE" field in the first object of the "data" array, which I want to ignore.
Your task can be easily done using ConvertFrom-Json and ConvertTo-Json cmdlets.
Here is a brief example:
First, load the text file's content into a variable.
$JSON = @"
[
{
"source": "Analytics 13 {Employee_Info.acl} {Employee_Data}",
"lastRecNo": "3",
"columns": {
"ID": "numeric",
"NAME": "character",
"EFFECTIVE_DATE": "date",
"ROLE": "character"
},
"data": [{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
},
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
},
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}]
}
]
"@
Then convert it from JSON using the ConvertFrom-Json cmdlet.
ConvertFrom-Json -InputObject $JSON
Output:
source lastRecNo columns data
------ --------- ------- ----
Analytics 13 {Employee_Info.acl} {Employee_Data} 3 @{ID=numeric; NAME=character; EFFECTIVE_DATE=date; ROLE=character} {@{ID=1; NAME=Bill Smith; EFFECTIVE_DATE=2018-10-01; ROLE=Director {Regional},{Call Center}}, @...
You can then convert the items in DATA back to JSON format using the ConvertTo-Json cmdlet. All together:
$PSObject = ConvertFrom-Json -InputObject $JSON
foreach ($item in $PSObject.data){
ConvertTo-Json $item
}
Output:
{
"ID": 1,
"NAME": "Bill Smith",
"EFFECTIVE_DATE": "2018-10-01",
"ROLE": "Director {Regional},{Call Center}"
}
{
"ID": 2,
"NAME": "Ellen Jones",
"EFFECTIVE_DATE": "2018-07-01",
"ROLE": "Manager"
}
{
"ID": 3,
"NAME": "Sam Edwards",
"EFFECTIVE_DATE": "2018-09-01",
"ROLE": "Supervisor"
}
You can now add filter conditions for the DATA items inside the foreach loop.
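For comparison, the same parse-then-reserialise approach can be sketched in Python with the standard json module. The variable names and the shortened inline text are assumptions for illustration; the point is that braces inside string values such as the ROLE field need no special handling once a real parser is used.

```python
import json

# Shortened copy of the question's text, including the tricky braces.
text = """
{
  "source": "Analytics 13 {Employee_Info.acl} {Employee_Data}",
  "lastRecNo": "3",
  "data": [
    {"ID": 1, "NAME": "Bill Smith", "EFFECTIVE_DATE": "2018-10-01",
     "ROLE": "Director {Regional},{Call Center}"},
    {"ID": 2, "NAME": "Ellen Jones", "EFFECTIVE_DATE": "2018-07-01",
     "ROLE": "Manager"},
    {"ID": 3, "NAME": "Sam Edwards", "EFFECTIVE_DATE": "2018-09-01",
     "ROLE": "Supervisor"}
  ]
}
"""

document = json.loads(text)

# One JSON string per object in "data", curly braces included.
matches = [json.dumps(item, indent=4) for item in document["data"]]

for match in matches:
    print(match)
```

A filter condition can be added to the list comprehension, for example keeping only items whose ROLE equals "Manager".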