sed regex find & replace (awk solutions welcome)

I'm working on a JSON file (for MongoDB) and need to convert a field name to Database Reference. I'm attempting to do it via sed (though I'm open to solutions using awk, etc), but I'm a complete noob with the tool and am struggling.
Input:
...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : "C00465971",
"RecipCode" : "RW",
"Amount" : 500,
....
Output needed:
...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
"ref" : "Cmtes",
"$id" : "C00465971",
"$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,
....
My sed command attempt is:
sed -r 's/\"CmteID\" \: \(\"[\w\d]\{9\}\",\)/\"CmteID\" : { \
\"ref\" : \"Cmtes\", \
\"$id\" : \1 \
\"$db\" : \"OpenSecrets\" \
}/' <IN_FILE >OUT_FILE
but I get this error when I run it:
sed: -e expression #1, char 198: invalid reference \1 on `s' command's RHS
Any help would be appreciated. Thanks.

An awk approach:
awk '$1=="\"CmteID\"" {$3="{\n\t\"ref\" : \"Cmtes\",\
\n\t\"$id\" : "$3"\
\n\t\"$db\" : \"OpenSecrets\"\n},"}1' infile
Explanation
When the first field matches ($1=="\"CmteID\""), the third field is replaced with the expected block. The only variable part is the CmteID value, which is reused via $3 on the "$id" line. Because the original $3 already ends with a comma, no extra comma is added after it.
The line breaks (escaped with \) are only there to make the code easier to read.
Results
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
"ref" : "Cmtes",
"$id" : "C00465971",
"$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,

sed is for simple substitutions on individual lines, that is all. This problem is not like that, so this is not a job for sed.
$ cat tst.awk
BEGIN { FS=OFS=" : " }
$1 == "\"CmteID\"" {
print $1, "{"
print " \"ref\"", "\"Cmtes\","
print " \"$id\"", $2
print " \"$db\"", "\"OpenSecrets\""
$0 = "},"
}
{ print }
$ awk -f tst.awk file
...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
"ref" : "Cmtes",
"$id" : "C00465971",
"$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,
....

awk to the rescue!
$ awk '$1=="\"CmteID\""{print $1 ": {";
print "\t\"ref\" : \"Cmtes\",";
print "\t\"$id\" : "$3;
print "\t\"$db\" : \"OpenSecrets\"";
print "},";
next}1' jsonfile
...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID": {
"ref" : "Cmtes",
"$id" : "C00465971",
"$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,
....
with some cleanup
$ awk -v NT="\n\t" 'function q(x) {return "\"" x "\""}
$1==q("CmteID") {$3 = "{" \
NT q("ref") " : " q("Cmtes") "," \
NT q("$id") " : " $3 \
NT q("$db") " : " q("OpenSecrets") \
"\n},"}1' jsonfile
...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
"ref" : "Cmtes",
"$id" : "C00465971",
"$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,
....

Many languages have built-in JSON parsers. PHP is one of them:
#!/usr/bin/php
<?php
$infile = $argv[1];
$outfile = $argv[2];
$data = json_decode(file_get_contents($infile), true);
$id = $data["CmteID"];
$data["CmteID"] = array("ref"=>"Cmtes", "\$id"=>$id, "\$db"=>"OpenSecrets");
file_put_contents($outfile, json_encode($data));
Untested but it should work. Make it executable and call ./myscript.php IN_FILE OUT_FILE.
My main point being, JSON is structured data rather than plain text, and using text replacement on it can lead to problems, just as with other structured formats such as XML!
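The same parse, modify, and serialize idea can be sketched in Python. This is only an illustration under assumptions: the sample record is a complete JSON object built from the fields in the question (the question shows just a fragment), and the helper name to_dbref and its parameters are hypothetical. The "ref" key is kept exactly as the question writes it.

```python
import json


def to_dbref(doc, field="CmteID", collection="Cmtes", db="OpenSecrets"):
    """Return a copy of doc with doc[field] replaced by a reference sub-document.

    to_dbref and its default arguments are hypothetical names for this sketch.
    """
    out = dict(doc)
    out[field] = {"ref": collection, "$id": doc[field], "$db": db}
    return out


# Assumed sample record, assembled from the fields shown in the question.
source = (
    '{"FECTransID": 4030720141206780377, "CID": "N00031103", '
    '"CmteID": "C00465971", "RecipCode": "RW", "Amount": 500}'
)

converted = to_dbref(json.loads(source))
print(json.dumps(converted, indent=4))
```

Because the data is parsed rather than pattern-matched, whitespace or line-break differences in the input do not affect the result.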

This might work for you (GNU sed):
sed -r 's/"CmteID" : (.*)/"CmteID" : { \
"ref" : "Cmtes", \
"$id" : \1 \
"$db" : "OpenSecrets" \
},/' fileIn >fileOut
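This was a case of over-quoting. With -r (extended regular expressions) in force, \( and \) match literal parentheses, so the attempt defined no capture group and \1 had nothing to refer to; the grouping parentheses must be left unescaped. Likewise, \{9\} should be written {9}, and \w and \d are not available inside a bracket expression.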
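The escaped-versus-unescaped parentheses point can be illustrated in Python, whose re module treats parentheses the same way as sed -r does here. This is a sketch, not the sed command itself; the character class [A-Z0-9]{9} is an assumption about what the ID looks like, based on the single sample value.

```python
import re

line = '"CmteID" : "C00465971",'

# Escaped parentheses match literal "(" and ")" characters, so no group exists.
escaped = re.compile(r'"CmteID" : \("[A-Z0-9]{9}",\)')
# Unescaped parentheses create capture group 1.
grouped = re.compile(r'"CmteID" : ("[A-Z0-9]{9}",)')

print(escaped.groups, grouped.groups)  # number of capture groups in each pattern

# \\1 in the replacement refers to the captured value, including its trailing comma.
replacement = '"CmteID" : {\n"ref" : "Cmtes",\n"$id" : \\1\n"$db" : "OpenSecrets"\n},'
result = grouped.sub(replacement, line)
print(result)
```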

Related

Remove characters that matched regex on log4j2 event using replace parameter

Given this log event:
complete: task = { 'status' : 0, 'task' : '{ 'id' : 9149263125397547267, 'process' : 'A-SIGN', 'in' : '/file/log4j-api-2.11.0.jar' }' }
How can I transform highlighted part into status: 0 using regex?
So far I was able to get this part complete: task = { 'status' : 0, using the following code:
/^(?:[^:]*[:]){2}[^:]*([,])/
Any thoughts?
Pattern: '(status)'\s:\s(\d+)
Replace: $1 : $2
Demo: https://regex101.com/r/nb53XO/1
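As a quick check of the pattern and replacement above, here is a sketch using Python's re module. Note the assumption: log4j2 uses Java regular expressions and $1-style references, whereas Python writes the same references as \1 and \2; for this particular pattern the two behave alike.

```python
import re

event = (
    "complete: task = { 'status' : 0, 'task' : '{ 'id' : 9149263125397547267, "
    "'process' : 'A-SIGN', 'in' : '/file/log4j-api-2.11.0.jar' }' }"
)

# Same pattern as above; group 1 is the key name, group 2 the numeric value.
pattern = re.compile(r"'(status)'\s:\s(\d+)")

# Drops the quotes around the key and keeps the value.
rewritten = pattern.sub(r"\1 : \2", event)
print(rewritten)
```

Only the quoted status key is touched; the rest of the event is passed through unchanged.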

JMeter : Regex extracting from an array response

I want to extract the addressId for a given housenumber in a response with a long array. The array response looks like this (snippet):
: : "footprint":null,
: : "type":null,
: : "addressId":"0011442239",
: : "streetName":"solitudestr.",
: : "streetNrFirstSuffix":null,
: : "streetNrFirst":null,
: : "streetNrLastSuffix":null,
: : "streetNrLast":null,
: : "houseNumber":"25",
: : "houseName":null,
: : "city":"stuttgart",
: : "postcode":"70499",
: : "stateOrProvince":null,
: : "countryName":null,
: : "poBoxNr":null,
: : "poBoxType":null,
: : "attention":null,
: : "geographicAreas":
: : [
: : ],
: : "firstName":null,
: : "lastName":null,
: : "title":null,
: : "region":"BW",
: : "additionalInfo":null,
: : "properties":
: : [
: : ],
: : "extAddressId":null,
: : "entrance":null,
: : "district":null,
: : "addressLine1":null,
: : "addressLine2":null,
: : "addressLine3":null,
: : "addressLine4":null,
: : "companyName":null,
: : "contactName":null,
: : "houseNrExt":null,
: : "derbyStack":false
: },
: {
: : "footprint":null,
: : "type":null,
: : "addressId":"0011442246",
: : "streetName":"solitudestr.",
: : "streetNrFirstSuffix":null,
: : "streetNrFirst":null,
: : "streetNrLastSuffix":null,
: : "streetNrLast":null,
: : "houseNumber":"26",
: : "houseName":null,
: : "city":"stuttgart",
: : "postcode":"70499",
: : "stateOrProvince":null,
: : "countryName":null,
: : "poBoxNr":null,
: : "poBoxType":null,
: : "attention":null,
: : "geographicAreas":
: : [
: : ],
: : "firstName":null,
: : "lastName":null,
: : "title":null,
: : "region":"BW",
: : "additionalInfo":null,
: : "properties":
: : [
: : ],
: : "extAddressId":null,
: : "entrance":null,
: : "district":null,
: : "addressLine1":null,
: : "addressLine2":null,
: : "addressLine3":null,
: : "addressLine4":null,
: : "companyName":null,
: : "contactName":null,
: : "houseNrExt":null,
: : "derbyStack":false
: },
I only show two house numbers in this response as an example, but the original response is bigger.
Q: How can I match the addressId for a specific houseNumber (I have these house numbers in my CSV dataset)? I could write a regex that extracts all addressIds, but then I'd have to use the correct match number in JMeter. However, I cannot assume that the ordering will remain the same across the different environments we test the script against.
I would recommend reconsidering using regular expressions to deal with JSON data.
Starting from JMeter 3.0 you have a JSON Path PostProcessor. Using it you can execute arbitrary JSONPath queries so extracting the addressID for the given houseNumber would be as simple as:
$..[?(@.houseNumber == '25')].addressId
You can use a JMeter Variable instead of the hard-coded 25 value like:
$..[?(@.houseNumber == '${houseNumber}')].addressId
If for some reason you have to use JMeter < 3.0 you still can have JSON Path postprocessing capabilities using JSON Path Extractor via JMeter Plugins
See Advanced Usage of the JSON Path Extractor in JMeter article, in particular Conditional Select chapter for more information.
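To make the intent of the JSONPath filter concrete, here is a rough Python equivalent. It is a sketch under assumptions: the response is taken to be a flat JSON array of address objects (the JSONPath above also searches nested levels), the two records are trimmed to a few fields, and address_ids is a hypothetical helper name.

```python
import json

# Assumed, shortened version of the response: a flat array of address objects.
response = json.loads("""
[
  {"addressId": "0011442239", "streetName": "solitudestr.", "houseNumber": "25"},
  {"addressId": "0011442246", "streetName": "solitudestr.", "houseNumber": "26"}
]
""")


def address_ids(records, house_number):
    """Return the addressId of every record whose houseNumber equals house_number."""
    return [r["addressId"] for r in records if r.get("houseNumber") == house_number]


print(address_ids(response, "25"))
```

Selecting by field value, rather than by position, is what makes the result independent of the order in which the records arrive.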
You may use a regex that will capture the digits after addressId and before a specific houseNumber if you use an unrolled tempered greedy token (for better efficiency) in between them to make sure the regex engine does not overflow to another record.
"addressId":"(\d+)"(?:[^\n"]*(?:\n(?!: +: +\[)[^\n"]*|"(?!houseNumber")[^\n"]*)*"houseNumber":"25"|$)
See the regex demo (replace 25 with the necessary house number)
Details:
"addressId":" - literal string
(\d+) - Group 1 ($1$ template value) capturing 1+ digits
" - a quote
(?:[^\n"]*(?:\n(?!: +: +\[)[^\n"]*|"(?!houseNumber")[^\n"]*)*"houseNumber":"25"|$) - a non-capturing group with 2 alternatives, one being $ (end of string) or:
[^\n"]* - zero or more chars other than newline and "
(?: - then come 2 alternatives:
\n(?!: +: +\[)[^\n"]* - a newline not followed with : : [ like string and followed with 0+chars other than a newline and "
| - or
"(?!houseNumber")[^\n"]* - a " not followed with houseNumber and followed with 0+chars other than a newline and "
)* - that may repeat 0 or more times
"houseNumber":"25" - house number literal string.

How to replace a multiline block from a file (JSON format) with sed, awk or other OS X tools?

I'm looking for a one-liner to execute in the terminal to replace a multiline text block with my own content inside a text file. I'm on OS X (not GNU sed) and not able to install any additional tools.
What I want to do is to replace in
{
"user" :
{
"name": "Andreas",
"age": 34
},
"viewer" :
{
"name": "Pedro",
"age": 41
}
}
two lines between the curly brackets inside the "user" block with own values to get the result:
{
"user" :
{
"name": "Mike",
"age": 29
},
"viewer" :
{
"name": "Pedro",
"age": 41
}
}
Simple search for the lines containing "name" or "age" would not work as they can belong to another structure and should not be modified.
By combining several examples I found I got this one working:
sed -i '' -n $'1h;1! H;$ {;g;s#"user"[^{]*[^}]*#"user" :\\\n\\\t{\\\n\\\t\\\t"name": "Mike",\\\n\\\t\\\t"age": 29\\\n\\\t#p;}' config.json
However it seems to be quite complex and here are my questions.
How can the matching pattern be modified to detect only the content between the brackets, so I don't have to recreate the "user" key?
Is there another, more elegant solution? sed, awk or any other system tools included in OS X are welcome.
Parsing JSON with line-oriented tools is not a very good idea (you should have a look at jq), but awk can help.
For example, you can check when user appears and, from there, act on the subsequent lines:
awk '/user/ {f=NR}
NR==f+2 {sub ("Andreas","Mike")}
NR==f+3 {sub (34, 29)}
1' file
You can also provide the new values as parameters.
If you don't know the value of the parameters, use a regular expression to match the content inside:
awk '/user/ {f=NR} NR==f+2 {sub (/: ".*,$/,": \"Mike\",")} NR==f+3 {sub (/: [0-9]+$/, ": 29")} 1' a
Test
$ awk '/user/ {f=NR} NR==f+2 {sub ("Andreas","Mike")} NR==f+3 {sub (34, 29)} 1' a
{
"user" :
{
"name": "Mike",
"age": 29
},
"viewer" :
{
"name": "Pedro",
"age": 41
}
}
sed -i '' -e '1h;1!H;$!d;x;s/\("user" :[^}]*"name": \)"[^"]*"\([^}]*"age": \)[0-9]*/\1"Mike"\229/' config.json
Try this, but I cannot be sure that the same structure does not also appear nested inside another block. It replaces only the first occurrence.
sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk.
$ cat tst.awk
BEGIN { split("name \"Mike\" age 29",map) }
/"user"/ { inUser = 1 }
inUser {
for (i=1;i in map;i+=2) {
if ($1 == "\""map[i]"\":") {
sub(/: [^ ,]+/,": "map[i+1])
}
}
if (/}/) {
inUser = 0
}
}
{ print }
$
$ awk -f tst.awk file
{
"user" :
{
"name": "Mike",
"age": 29
},
"viewer" :
{
"name": "Pedro",
"age": 41
}
}
The above will fail if the replacement string contains & since it's being used as the 2nd arg to sub() - if that could happen then you'd use match() and substr() instead of sub() so the replacement text is treated as a literal string:
if ($1 == "\""map[i]"\":") {
match($0,/: [^ ,]+/)
$0 = substr($0,1,RSTART-1) ": "map[i+1] substr($0,RSTART+RLENGTH)
}
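For comparison with the line-based approaches above, here is a sketch that parses the file content as JSON and changes only the "user" object. It assumes a Python interpreter is available on the machine, which the question does not confirm; update_user is a hypothetical helper name, and the config text is the example from the question.

```python
import json

config_text = """
{
    "user" :
    {
        "name": "Andreas",
        "age": 34
    },
    "viewer" :
    {
        "name": "Pedro",
        "age": 41
    }
}
"""


def update_user(text, name, age):
    """Parse text as JSON, replace name and age under "user", and re-serialize."""
    data = json.loads(text)
    data["user"].update({"name": name, "age": age})
    return json.dumps(data, indent=4)


updated = update_user(config_text, "Mike", 29)
print(updated)
```

Since the change is addressed by key, a "name" or "age" field in any other block, such as "viewer", is left alone. The output formatting follows json.dumps rather than the original file's layout.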

Escaping a square bracket in a MongoDB regex / PCRE

I need to query a MongoDB database for documents whose field x starts with [text. I tried the following:
db.collection.find({x:{$regex:/^[text/}})
which fails because [ is part of the regex syntax. So I've spent some time trying to find how to escape [ from my regex... without any success so far.
Any help appreciated, thanks!
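Put a backslash \ in front of the square bracket. In a string pattern the backslash itself must be doubled:
db.collection.find({"x":{"$regex":"\\[text"}})
db.collection.find({"x":{"$regex":"^\\[text"}})
Or, with a regex literal, where a single backslash is enough:
db.collection.find({"x":{"$regex":/\[text/}})
db.collection.find({"x":{"$regex":/^\[text/}})
The anchored form (with ^) returns only documents whose x starts with [text; the unanchored form matches [text anywhere in the value.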
For example:
Given a collection containing the following documents
{ "_id" : ObjectId("55644128dd771680e5e5f094"), "x" : "[text" }
{ "_id" : ObjectId("556448d1dd771680e5e5f099"), "x" : "[text sd asd " }
{ "_id" : ObjectId("55644a06dd771680e5e5f09a"), "x" : "new text" }
using db.collection.find({"x":{"$regex":"\\[text"}}) returns the following results:
{ "_id" : ObjectId("55644128dd771680e5e5f094"), "x" : "[text" }
{ "_id" : ObjectId("556448d1dd771680e5e5f099"), "x" : "[text sd asd " }
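When the prefix comes from a variable, the escaping can be done programmatically. The sketch below uses Python's re.escape to build the pattern string and checks it locally against the three sample values; it does not talk to MongoDB. One assumption to flag: Python's escaping rules are taken to agree with PCRE for this input, which holds for [ but is not guaranteed for every character.

```python
import re

prefix = "[text"

# re.escape puts a backslash in front of regex metacharacters such as "[".
pattern = "^" + re.escape(prefix)

# Filter document in the shape used above; shown for illustration, not sent anywhere.
query = {"x": {"$regex": pattern}}
print(query)

# Local check against the sample values of x from the example documents.
values = ["[text", "[text sd asd ", "new text"]
matches = [v for v in values if re.search(pattern, v)]
print(matches)
```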

NodeJS MongoDB $match Regex Format to JSON

Saving a JSON document like
{
"$match" : {
"value" : { "$regex" : "^mystring" , "$options" : "i"}
}
}
in MongoDB results internally in
{
"$match" : {
"value" : /^mystring/i
}
}
This no longer seems to be valid JSON. For example, if I try to send it as a JSON result in Node.js, I only get back this:
{
"$match": {
"value": {}
}
}
Is there a way to force the { "$regex" : "^mystring" , "$options" : "i"} syntax or another solution?