Splitting a JSON file: extracting data between curly braces with regex

I have a JSON file that I want to split into separate parts.
The file's content is shown below.
I want to split the content at the curly braces {}:
"1010320": {
"abc": [
"1012220",
"hiiiiiiiii."
],
"xyz": "Describe"
},
"1012757": {
"pqr": [
"1013757",
"x"
]
},
"1014220": {
"abc": [
"1018420",
"sooooo"
],
"answer": "4th"
},
"1019660": {
"abc": [
"1031920",
"welcome"
],
"xyz": "Describing&Interpreting"
},
"1034280": {
"abc": [
"1040560",
"Ok..."
],
"nop": "Student Question"
},
The output should be:
1) "abc": [
"1012220",
"hiiiiiiiii."
],
"xyz": "Describe"
2) "pqr": [
"1013757",
"x"
]
3) "abc": [
"1018420",
"sooooo"
],
"answer": "4th"
Please help.

I think this will be useful for you:
(?<=\{)\n\s+((?:[\n]+|.*)+?)\n\}
regex demo here : http://regex101.com/r/rS3wI5
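If a JSON parser is available, the same split can be done without a regex. The following is a minimal Python sketch, assuming the fragment shown in the question is the inside of a single JSON object; the added outer braces and the helper name split_parts are assumptions for illustration, not part of the original file.

```python
import json

# Fragment in the shape shown in the question: "key": {...} pairs with a
# trailing comma and no enclosing braces (shortened to three entries).
fragment = '''
"1010320": {"abc": ["1012220", "hiiiiiiiii."], "xyz": "Describe"},
"1012757": {"pqr": ["1013757", "x"]},
"1014220": {"abc": ["1018420", "sooooo"], "answer": "4th"},
'''


def split_parts(text):
    """Return the value of every top-level key, in file order."""
    # Assumption: adding the outer braces and dropping the trailing comma
    # turns the fragment into one valid JSON object.
    wrapped = "{" + text.strip().rstrip(",") + "}"
    return list(json.loads(wrapped).values())


parts = split_parts(fragment)
for number, part in enumerate(parts, start=1):
    # Print each part with a running number, similar to the requested output.
    print(f"{number}) {json.dumps(part, indent=4)}")
```

Unlike the regex, this does not depend on indentation or line breaks, but it does require the text to be valid JSON once it is wrapped.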

How to extract an element in an array if the filter element is 2 levels down

My ListInputSecurityGroup task returns this json:
{
"output": [
{
"Arn": "arn:aws:medialive:eu-north-1:xxx:inputSecurityGroup:1977625",
"Id": "1977625",
"Inputs": [],
"State": "IDLE",
"Tags": {},
"WhitelistRules": [
{
"Cidr": "5.5.5.5/32"
}
]
},
{
"Arn": "arn:aws:medialive:eu-north-1:xxx:inputSecurityGroup:5411101",
"Id": "5411101",
"Inputs": [],
"State": "IDLE",
"Tags": {
"use": "some_other_use"
},
"WhitelistRules": [
{
"Cidr": "1.1.1.1/0"
}
]
},
{
"Arn": "arn:aws:medialive:eu-north-1:xxx:inputSecurityGroup:825926",
"Id": "825926",
"Inputs": [
"4011716"
],
"State": "IN_USE",
"Tags": {
"use": "for_rtmp_pipeline"
},
"WhitelistRules": [
{
"Cidr": "0.0.0.0/0"
}
]
}
]
}
I want to use OutputPath to extract the InputSecurityGroup with the tag {use: for_rtmp_pipeline}. According to this JSONPath tester, the expression $.output[?(@.Tags.use == for_rtmp_pipeline)] works and returns the 3rd element of this array. But when used in the Step Function itself, or in the Data Flow Simulator, it doesn't return anything. Is this a limitation of the JSONPath engine in AWS, or is there a different syntax? How can I extract the one element I want?
Note that in the tester the searched string should be in quotes, while in AWS there's no need for quotes.
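For reference, the selection that the filter expression is meant to perform can be written out in plain Python. This sketch says nothing about which JSONPath syntax Step Functions accepts; it only shows the intended result on a shortened copy of the task output, and the helper name groups_with_tag is hypothetical.

```python
# Shortened copy of the task output from the question (only the fields
# needed for the filter are kept).
response = {
    "output": [
        {"Id": "1977625", "State": "IDLE", "Tags": {}},
        {"Id": "5411101", "State": "IDLE", "Tags": {"use": "some_other_use"}},
        {"Id": "825926", "State": "IN_USE", "Tags": {"use": "for_rtmp_pipeline"}},
    ]
}


def groups_with_tag(groups, key, value):
    """Keep the elements whose Tags object maps key to value."""
    # .get() is used twice so that a missing Tags object or a missing tag
    # simply fails the comparison instead of raising KeyError.
    return [g for g in groups if g.get("Tags", {}).get(key) == value]


selected = groups_with_tag(response["output"], "use", "for_rtmp_pipeline")
print([g["Id"] for g in selected])
```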

Logic App: replace "," with "." in a column of the CSV table

I have a flow in which I want to edit a column of the CSV table and replace "," with ".".
How do I do that? The replace function expression in the Logic App does not offer the column:
It asks me to take the complete body when I use the replace function.
However, the details column, which I want to edit, is available:
How should I replace the "," in the details column?
I tried this, but then I don't see the variable I initialized.
For instance, I took this as my sample .csv file, which I retrieve from my storage account.
First I used Parse CSV as you did, then initialized a string variable and used the Append to string variable action on the Productname column. Finally, I used the replace function expression to replace ',' with '.'.
NOTE: I append '|' after each Productname value so that the string can be split again afterwards.
Here is my Logic App workflow.
The Compose action expression:
split(replace(variables('Productname'),',','.'),'|')
OUTPUT:
Here is my workflow that you can refer to:
{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"Compose": {
"inputs": "@split(replace(variables('Productname'),',','.'),'|')",
"runAfter": {
"For_each_2": [
"Succeeded"
]
},
"type": "Compose"
},
"For_each_2": {
"actions": {
"Append_to_string_variable": {
"inputs": {
"name": "Productname",
"value": "@{items('For_each_2')?['Productname']}|"
},
"runAfter": {},
"type": "AppendToStringVariable"
}
},
"foreach": "@body('Parse_CSV')",
"runAfter": {
"Initialize_variable": [
"Succeeded"
]
},
"type": "Foreach"
},
"Get_blob_content_(V2)": {
"inputs": {
"host": {
"connection": {
"name": "@parameters('$connections')['azureblob']['connectionId']"
}
},
"method": "get",
"path": "/v2/datasets/@{encodeURIComponent(encodeURIComponent('AccountNameFromSettings'))}/files/@{encodeURIComponent(encodeURIComponent('JTJmY29udGFpbmVyMjQwOCUyZlByb2R1Y3RzLmNzdg=='))}/content"
},
"metadata": {
"JTJmY29udGFpbmVyMjQwOCUyZlByb2R1Y3RzLmNzdg==": "/container2408/Products.csv"
},
"runAfter": {},
"type": "ApiConnection"
},
"Initialize_variable": {
"inputs": {
"variables": [
{
"name": "Productname",
"type": "string"
}
]
},
"runAfter": {
"Parse_CSV": [
"Succeeded"
]
},
"type": "InitializeVariable"
},
"Parse_CSV": {
"inputs": {
"body": {
"content": "@{base64(body('Get_blob_content_(V2)'))}",
"headers": "Productid,Productname"
},
"host": {
"connection": {
"name": "@parameters('$connections')['plumsail']['connectionId']"
}
},
"method": "post",
"path": "/flow/v1/Documents/jobs/ParseCsv"
},
"runAfter": {
"Get_blob_content_(V2)": [
"Succeeded"
]
},
"type": "ApiConnection"
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"parameters": {
"$connections": {
"defaultValue": {},
"type": "Object"
}
},
"triggers": {
"manual": {
"inputs": {
"schema": {}
},
"kind": "Http",
"type": "Request"
}
}
},
"parameters": {
"$connections": {
"value": {
"azureblob": {
"connectionId": "/subscriptions/<subscription id>/resourceGroups/<Your resource group name>/providers/Microsoft.Web/connections/azureblob",
"connectionName": "azureblob",
"id": "/subscriptions/<subscription id>/providers/Microsoft.Web/locations/northcentralus/managedApis/azureblob"
},
"plumsail": {
"connectionId": "/subscriptions/<subscription id >/resourceGroups/<Your resource group name>/providers/Microsoft.Web/connections/plumsail",
"connectionName": "plumsail",
"id": "/subscriptions/<subscription id>/providers/Microsoft.Web/locations/northcentralus/managedApis/plumsail"
}
}
}
}
}
I used the item() function in an expression and did it directly:
@replace(item()?['details'],',','')
Strangely, it didn't work at first, but now it is working.
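Outside of Logic Apps, the per-row replacement that the item() expression performs can be sketched in Python. The sample CSV text and the helper name replace_in_column below are assumptions for illustration; only the details column name comes from the question.

```python
import csv
import io

# Hypothetical sample data; the details column contains commas inside
# quoted fields, which is what the question wants to replace.
sample_csv = 'Productid,details\n1,"red, large"\n2,"blue, small"\n'


def replace_in_column(csv_text, column, old=",", new="."):
    """Parse the CSV text and replace old with new in one column only."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only the chosen column is touched; all other fields are kept as is.
        row[column] = row[column].replace(old, new)
        rows.append(row)
    return rows


rows = replace_in_column(sample_csv, "details")
for row in rows:
    print(row["Productid"], row["details"])
```

Working on the parsed rows rather than on the raw file body avoids replacing the commas that separate the columns.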

Removing ids from Postman Collection with bash script - sed and regex

I'm trying to solve an issue with Postman collections.
Test scripts added to a collection generate an additional "id" field.
The id field changes after each export of the collection to a file.
Because of this, PRs with changes to Postman collections are very hard to read.
I want to solve this with a git pre-commit hook and a bash script that removes all ids from the script objects of the collection.
There are three possible locations of the id in a script object:
First element of object
"script":{
"id": "83d9076e-64c7-47fa-9b50-b7635718c925",
"exec": [
"console.log(\"foo\");"
],
"type": "text/javascript"
}
Middle of object
"script":{
"exec": [
"console.log(\"foo\");"
],
"id": "83d9076e-64c7-47fa-9b50-b7635718c925",
"type": "text/javascript"
}
End of object
"script":{
"exec": [
"console.log(\"foo\");"
],
"type": "text/javascript",
"id": "83d9076e-64c7-47fa-9b50-b7635718c925"
}
From a regex point of view, cases 1 and 2 are the same:
.*"id": "[a-f0-9-]*",
Case 3 is different, and the regex that handles it is:
,\n.*"id": "[a-f0-9-]*"
As mentioned before, I want to use these regexes in a bash script:
postmanClean.sh
#!/bin/bash
COLLECTION_FILES=$(find . -type f -name "*postman_collection.json")
for POSTMAN_COLLECTION in ${COLLECTION_FILES}
do
echo "Harmonizing Postman $POSTMAN_COLLECTION"
sed -i -e 's/.*"id": "[a-f0-9-]*"\,//' ${POSTMAN_COLLECTION} # Remove test/script ID
sed -i -e 's/\,\n.*"id": "[a-f0-9-]*"//' ${POSTMAN_COLLECTION} # Remove test/script ID
done
The above solution is incorrect. I tried different options, but these regexes are not working.
How do I build these expressions properly so that they work with the sed command?
Collection file:
demo.postman_collection.json
{
"info": {
"_postman_id": "258b2fe2-5768-47f8-9e82-70971bab6bbd",
"name": "demo",
"schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
},
"item": [
{
"name": "One",
"item": [
{
"name": "Demo 1",
"event": [
{
"listen": "test",
"script": {
"id": "83d9076e-64c7-47fa-9b50-b7635718c925",
"exec": [
"console.log(\"foo\");"
],
"type": "text/javascript"
}
}
],
"protocolProfileBehavior": {
"disableBodyPruning": true
},
"request": {
"method": "GET",
"header": [],
"body": {
"mode": "raw",
"raw": "foo"
},
"url": {
"raw": "https://postman-echo.com/delay/1",
"protocol": "https",
"host": [
"postman-echo",
"com"
],
"path": [
"delay",
"1"
]
}
},
"response": []
}
],
"protocolProfileBehavior": {}
},
{
"name": "Two",
"item": [
{
"name": "Demo 2",
"event": [
{
"listen": "test",
"script": {
"exec": [
"console.log(\"bar\");"
],
"type": "text/javascript",
"id": "facb28f7-c54d-46e2-adb2-4c929fd1edd3"
}
}
],
"protocolProfileBehavior": {
"disableBodyPruning": true
},
"request": {
"method": "GET",
"header": [],
"body": {
"mode": "raw",
"raw": "bar"
},
"url": {
"raw": "https://postman-echo.com/delay/2",
"protocol": "https",
"host": [
"postman-echo",
"com"
],
"path": [
"delay",
"2"
]
}
},
"response": []
},
{
"name": "Demo 3",
"event": [
{
"listen": "test",
"script": {
"exec": [
"console.log(\"foobar\");"
],
"id": "facb28f7-c54d-46e2-adb2-4c929fd1edd3",
"type": "text/javascript"
}
}
],
"protocolProfileBehavior": {
"disableBodyPruning": true
},
"request": {
"method": "GET",
"header": [],
"body": {
"mode": "raw",
"raw": "bar"
},
"url": {
"raw": "https://postman-echo.com/delay/3",
"protocol": "https",
"host": [
"postman-echo",
"com"
],
"path": [
"delay",
"3"
]
}
},
"response": []
}
],
"protocolProfileBehavior": {}
}
],
"protocolProfileBehavior": {}
}
I think jq is the right tool for this job, and the solution can be as simple as walk(del(.id?)). Here is a rewrite of your script using jq:
#!/bin/bash
COLLECTION_FILES=$(find . -type f -name "*postman_collection.json")
for f in ${COLLECTION_FILES}
do
echo "Harmonizing Postman $f"
jq --indent 4 'walk(del(.id?))' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
And a demo (note how jq takes care of removing the extra , after "type": "text/javascript", which would otherwise invalidate the JSON):
$ cp demo.postman_collection.json demo.postman_collection.json.bak
$ ./postmanClean.sh
Harmonizing Postman ./demo.postman_collection.json
$ diff demo.postman_collection.json.bak demo.postman_collection.json
17d16
< "id": "83d9076e-64c7-47fa-9b50-b7635718c925",
65,66c64
< "type": "text/javascript",
< "id": "facb28f7-c54d-46e2-adb2-4c929fd1edd3"
---
> "type": "text/javascript"
104d101
< "id": "facb28f7-c54d-46e2-adb2-4c929fd1edd3",
$
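Where jq is not installed, the same recursive deletion can be sketched in Python. This is an approximation of what walk(del(.id?)) does, not a drop-in replacement for the script above; the helper name strip_ids and the shortened sample collection are assumptions.

```python
import json


def strip_ids(node, key="id"):
    """Return a copy of node with every member named key removed, at any depth."""
    if isinstance(node, dict):
        # Drop the key itself and recurse into the remaining values.
        return {k: strip_ids(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_ids(v, key) for v in node]
    return node  # strings, numbers, booleans and null are left untouched


# Shortened collection in the shape of demo.postman_collection.json.
collection = {
    "info": {"_postman_id": "258b2fe2-5768-47f8-9e82-70971bab6bbd", "name": "demo"},
    "item": [{
        "name": "Demo 1",
        "event": [{
            "listen": "test",
            "script": {
                "id": "83d9076e-64c7-47fa-9b50-b7635718c925",
                "exec": ["console.log(\"foo\");"],
                "type": "text/javascript",
            },
        }],
    }],
}

cleaned = strip_ids(collection)
print(json.dumps(cleaned, indent=4))
```

Because the data is parsed and re-serialized, the position of the id inside the object and any trailing comma no longer matter; "_postman_id" is kept because only members named exactly "id" are dropped.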
You don't need to distinguish the two patterns: sed can match any line that contains the "id": "..." pattern and delete that entire line with the d command. This way you do not need to care about newlines, whitespace, or whether the id line itself ends with a comma.
Executed on your example,
sed -i '/"id": "[a-f0-9-]*"/d' demo.postman_collection.json
removes all the id lines (except "_postman_id", of course). Be aware that when the id is the last member of the object (case 3), the preceding line keeps its trailing comma, so the result is no longer valid JSON.

How to use regex inside an $in query using Morphia?

MongoDB allows regular expressions of the form /pattern/ inside $in, without using the $regex operator.
http://docs.mongodb.org/manual/reference/operator/query/in/
How can I do this using Morphia?
If I give a field criteria with the in operator and values of type java.util.regex.Pattern, the generated query is
$in: [$regex: 'given pattern'], which doesn't return the expected results at all.
Expected: $in: [/pattern1 here/, /pattern2 here/]
Actual, using Pattern objects: $in: [$regex: /pattern1 here/, $regex: /pattern2 here/]
I'm not entirely sure what to make of your code examples, but here's a working Morphia code snippet:
Pattern regexp = Pattern.compile("^" + email + "$", Pattern.CASE_INSENSITIVE);
mongoDatastore.find(EmployeeEntity.class).filter("email", regexp).get();
Note that this is really slow. It can't use an index and will always require a full collection scan, so avoid it at all cost!
Update: I've added a specific code example. $in is not required to search inside an array. Simply use /^I/ as you would on a string field:
> db.profile.find()
{
"_id": ObjectId("54f3ac3fa63f282f56de64bd"),
"tags": [
"India",
"Australia",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ac4da63f282f56de64be"),
"tags": [
"Island",
"Antigua"
]
}
{
"_id": ObjectId("54f3ac5ca63f282f56de64bf"),
"tags": [
"Spain",
"Mexico"
]
}
{
"_id": ObjectId("54f3ac6da63f282f56de64c0"),
"tags": [
"Israel"
]
}
{
"_id": ObjectId("54f3ad17a63f282f56de64c1"),
"tags": [
"Germany",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ad56a63f282f56de64c2"),
"tags": [
"ireland"
]
}
> db.profile.find({ tags: /^I/ })
{
"_id": ObjectId("54f3ac3fa63f282f56de64bd"),
"tags": [
"India",
"Australia",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ac4da63f282f56de64be"),
"tags": [
"Island",
"Antigua"
]
}
{
"_id": ObjectId("54f3ac6da63f282f56de64c0"),
"tags": [
"Israel"
]
}
{
"_id": ObjectId("54f3ad17a63f282f56de64c1"),
"tags": [
"Germany",
"Indonesia"
]
}
Note: The position in the array makes no difference, but the search is case sensitive. Use /^I/i if this is not desired or Pattern.CASE_INSENSITIVE in Java.
Single RegEx Filter
Use .filter(), .criteria(), or .field():
query.filter("email", Pattern.compile("reg.*exp"));
// or
query.criteria("email").contains("reg.*exp");
// or
query.field("email").contains("reg.*exp");
Morphia converts this into:
find({"email": { $regex: "reg.*exp" } })
Multiple RegEx Filters
query.or(
query.criteria("email").contains("reg.*exp"),
query.criteria("email").contains("reg.*exp.*2"),
query.criteria("email").contains("reg.*exp.*3")
);
Morphia converts this into:
find({"$or" : [
{"email": {"$regex": "reg.*exp"}},
{"email": {"$regex": "reg.*exp.*2"}},
{"email": {"$regex": "reg.*exp.*3"}}
]
})
Unfortunately, according to the MongoDB Manual 3.4:
"You cannot use $regex operator expressions inside an $in."
Otherwise, we could do:
Pattern[] patterns = new Pattern[] {
Pattern.compile("reg.*exp"),
Pattern.compile("reg.*exp.*2"),
Pattern.compile("reg.*exp.*3"),
};
query.field().in(patterns);
Hopefully, one day Morphia will support that :)

How to match an array of substrings against an array of strings using MongoDB?

I have the following collection structure:
{
"_id": ObjectId("54c784d71e14acf9ae833f9f"),
"vms": [
{
"name": "ABC",
"ids": [
"abc.60a980004270457730244662385a4f69",
"abc.60a980004270457730244662385a4f6d"
]
},
{
"name": "PQR",
"ids": [
"abc.6d867d9c7acd60001aed76eb2c70bd53",
"abc.60a980004270457730244662385a4f6d"
]
},
{
"name": "XYZ",
"ids": [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
]
}
I have an array that contains substrings of ids. Here is the array for reference:
myArray = [ "4270457730244662385a4f69","4270457730244662385a4f6d" , "4270457730244662385a4f6b"]
I want to find the vms entries whose ids contain none of the elements of myArray as a substring, using MongoDB.
Currently I am only able to search for a single element using a regex.
For the example above, I want this output:
[
{
"name": "XYZ",
"ids": [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
]
How do I find substrings in an array using MongoDB?
It is possible to do this using a regex. You can match a string against multiple substrings using the "or" (alternation) operator, which is | in regex. Search for 'Boolean "or"' on Wikipedia.
MongoDB query using aggregation:
db.collection_name.aggregate([
{$unwind: "$vms"},
{$match: {
"vms.ids": {$not: /.*(4270457730244662385a4f69|4270457730244662385a4f6d|4270457730244662385a4f6b).*/}}
}
])
Output will be
{
"_id" : ObjectId("54c784d71e14acf9ae833f9f"),
"vms" : {
"name" : "XYZ",
"ids" : [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
}
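The alternation does not have to be typed by hand. The following Python sketch builds it from myArray and applies the same "no id matches" test to an in-memory copy of the vms array; it only illustrates the matching logic and does not talk to MongoDB. The variable and helper names are assumptions.

```python
import re

my_array = [
    "4270457730244662385a4f69",
    "4270457730244662385a4f6d",
    "4270457730244662385a4f6b",
]

# Join the substrings with | so that one regex matches any of them.
# re.escape guards against substrings that contain regex metacharacters.
pattern = re.compile("|".join(re.escape(s) for s in my_array))

vms = [
    {"name": "ABC", "ids": ["abc.60a980004270457730244662385a4f69",
                            "abc.60a980004270457730244662385a4f6d"]},
    {"name": "PQR", "ids": ["abc.6d867d9c7acd60001aed76eb2c70bd53",
                            "abc.60a980004270457730244662385a4f6d"]},
    {"name": "XYZ", "ids": ["abc.600605b00237d91016cdc38f376bd31d",
                            "abc.600605b00237d91016cdc38f376cd32f"]},
]


def without_matches(entries, regex):
    """Keep the entries in which no id contains any of the substrings."""
    return [e for e in entries if not any(regex.search(i) for i in e["ids"])]


result = without_matches(vms, pattern)
print([e["name"] for e in result])
```

The string in pattern.pattern has the same a|b|c shape as the alternation inside the regex of the aggregation query above.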