I'm having trouble writing a regex that gathers all data between [ and ].
I'm testing with http://regexr.com/
String data:
{
"Items": [
{
"UserID": "1487840267893246",
"Timestamp": 1487204364877,
},
{
"UserID": "1487840267893336",
"Timestamp": 1487204364888,
}
],
"Count": 2,
"ScannedCount": 3
}
The code below (run in AWS Lambda) is intended to pull all characters between the [ and ] and output them. (\[[^]*\]) works in the regex tester above, but only returns "undefined" in Lambda. Why?
Items = data.match(/"(\[[^]*\])"/);
console.log(Items);
An alternative solution was to extract the data into an array as follows:
userID = data.match(/"UserID":"([^"]+)"/g);
console.log(userID);
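Try the dotall flag (s), which lets . match line breaks. JavaScript has no inline (?s) form, so the flag goes after the closing slash:
Items = data.match(/\[.*\]/s);
And you didn't need those brackets (the capturing parentheses). The literal quotes around the pattern have to go too, since the data has no " directly next to the [ or the ].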
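For reference, here is a minimal sketch of why the quoted pattern finds nothing and what the alternatives look like. It is written in Python rather than the Lambda's JavaScript; in Node.js the parsing step would be JSON.parse(data).Items. The sample string is the data from the question with the trailing commas removed so that it is valid JSON, which is an assumption about the real payload.

```python
import json
import re

# Sample data from the question, trailing commas removed (assumption)
# so that it parses as valid JSON.
data = '''{
  "Items": [
    {"UserID": "1487840267893246", "Timestamp": 1487204364877},
    {"UserID": "1487840267893336", "Timestamp": 1487204364888}
  ],
  "Count": 2,
  "ScannedCount": 3
}'''

# Pattern wrapped in literal quotes, as in the question: no match, because
# the [ in the data is preceded by a space, not by a double quote.
quoted = re.search(r'"(\[.*\])"', data, re.DOTALL)

# Without the quotes, DOTALL lets "." cross line breaks, so the match
# runs from the first [ to the last ].
unquoted = re.search(r'\[.*\]', data, re.DOTALL)

# Parsing the string avoids the regex entirely.
items = json.loads(data)["Items"]
user_ids = [item["UserID"] for item in items]

print(quoted)      # None
print(user_ids)
```

If the payload is already an object rather than a string, there is nothing to match against and the Items property can be read directly.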
Related
I am trying to get the value for the first group match of a pattern entity from the json response of Watson Assistant. The pattern is a simple regex to recognize sequences of numbers: ([0-9]+)
The json response looks like this:
"entity": "ID",
"location": [
18,
23
],
"value": "id",
"confidence": 1.0,
"groups": [
{
"group": "group_0",
"location": [
18,
23
]
}
]
},
{
"entity": "sys-number",
"location": [
18,
23
],
"value": "12345",
"confidence": 1.0,
"metadata": {
"numeric_value": 12345.0
}
}
]
So, the group is matched alright, but the field "value" is populated with the string literal from the entity config. I would have expected to find the actual value there (which is the one in the value field of the next entity, sys-number).
How do I need to change the config so that the value is included as-is in the value field (or somewhere else) and so that I don't have to extract the entity from the text string using the location values? Is it possible at all?
Thanks a lot
Cheers,
Martin
To access the value of a pattern-based entity, you can use either <? @entity_name.literal ?> or <? @entity_name.groups[0] ?> (the latter if more groups are captured). You can find more info in the docs: https://cloud.ibm.com/docs/services/assistant?topic=assistant-entities
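If the matched text cannot be read from the entity directly, the fallback mentioned in the question (slicing the input with the location values) can be sketched as follows. The input sentence is hypothetical, chosen so that the number sits at offsets 18 to 23; only the location pair comes from the response above.

```python
# Hypothetical user input; not part of the original response.
input_text = "Please look up id 12345"

# "location" as reported for group_0 in the response, treated as
# [start, end) with an exclusive end offset (assumption: the span
# covers five characters, like the value "12345").
start, end = 18, 23

group_text = input_text[start:end]
print(group_text)
```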
I have what seems to be a very simple problem though I can't get it to work.
I have a token stream of words and I want to remove any token that is a single word e.g. [the quick, brown, fox] should be outputted as [the quick].
I've tried using pattern_capture token filters and used many types of patterns but it only generates new tokens, and doesn't remove old ones.
Here is the analyzer I've built (abbreviated for clarity)
"analyzer": {
"job_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"some_custom_char_filter"
],
"filter": [
other filters....,
"dash_drop",
"trim",
"unique",
"drop_single_word"
]
}
},
"char_filter": {...},
"filter": {
"dash_drop": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [
"([^-]+)\\s?(?!-.+)",
"- (.+)"
]
},
"drop_single_word": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [**nothing here works**]
}
}
}
I know I'm using a whitespace tokenizer that breaks sentences into words, but not shown here is the use of shingles to create new n-grams.
The dash_drop filter is used to split sentences containing - into tokens without the -, so for example: my house - my rules would split into [my house, my rules].
Any help is greatly appreciated.
I have a JSON with format variables in it, similar to string format, and I'd like to be able to load it with the variables replaced by actual values.
For example, if the JSON is:
[
{
"role": "President",
"name": "{first_name}",
"age": "{first_age}"
},
{
"role": "Vice President",
"name": "{second_name}",
"age": "{second_age}"
}
]
And the dictionary I'd like to format with is:
{"first_name": "Bob", "first_age": "50", "second_name": "Bill", "second_age": "35"}
I'd like to get:
[
{
"role": "President",
"name": "Bob",
"age": "50"
},
{
"role": "Vice President",
"name": "Bill",
"age": "35"
}
]
I tried converting the JSON to a string, using format, and then turning it back to a list of dictionaries:
from ast import literal_eval
literal_eval(str(raw_json).format(**json_params))
But the dictionaries' curly brackets confuse the format function and give me a KeyError exception. I suppose I could replace every pair of curly brackets which don't have a variable name between them with double curly brackets, but that's bound to go wrong and also not very Pythonic.
What would be the most elegant way to solve that issue?
What you are looking for is a templating engine.
The template is a JSON string, and the data must be injected into this template.
The right tool for that in Python is jinja2.
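Short of a full templating engine, a minimal standard-library sketch is to format only the string values after the JSON has been parsed, so the structural braces never reach str.format. This is a different technique from jinja2, and fill_placeholders is a hypothetical helper name, not part of any library.

```python
import json


def fill_placeholders(node, params):
    """Recursively apply str.format to every string value in parsed JSON."""
    if isinstance(node, str):
        return node.format(**params)
    if isinstance(node, list):
        return [fill_placeholders(item, params) for item in node]
    if isinstance(node, dict):
        return {key: fill_placeholders(value, params) for key, value in node.items()}
    return node  # numbers, booleans and None pass through unchanged


raw_json = json.loads('''[
  {"role": "President", "name": "{first_name}", "age": "{first_age}"},
  {"role": "Vice President", "name": "{second_name}", "age": "{second_age}"}
]''')
json_params = {"first_name": "Bob", "first_age": "50",
               "second_name": "Bill", "second_age": "35"}

result = fill_placeholders(raw_json, json_params)
print(json.dumps(result, indent=2))
```

Only values are formatted here; keys are left alone. A missing placeholder name still raises KeyError, which is usually what you want for a template with a typo.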
I have the following json array:
[
{
"roleId": 128,
"roleName": "B",
"Permissions": []
},
{
"roleId": 310,
"roleName": "ROLE",
"Permissions": [
{
"permissionId": 8074,
"isPermissionActive": true
},
{
"permissionId": 2271,
"isPermissionActive": true
},
{
"permissionId": 8075,
"isPermissionActive": true
},
{
"permissionId": 2275,
"isPermissionActive": true
}
]
},
{
"roleId": 201,
"roleName": "B",
"Permissions": []
},
{
"roleId": 5,
"roleName": "B",
"Permissions": []
}
]
I need to select the roleId of the JSON objects with non-empty permissions. I've tried using a negative lookahead, but it doesn't seem to work.
Before selecting the roleId, I wanted to get a correct match on the object, so I've tried \{(\s*.*?(?!"Permissions":\s*\[\]))*\}, but it seems to match all the objects and even the inner objects of the permissions array. How should I change the lookahead to match properly? I'm currently matching in Sublime Text 3.
Try the following Regex:
(?<="roleId": )\d*(?=(?:[^{]*"Permissions": \[(?!])))
Explanation:
The first part (?<="roleId": )\d* uses a positive lookbehind to find the roleId.
The second part (?=(?:[^{]*"Permissions": \[(?!]))) uses a positive lookahead to find the "Permissions" key that follows the roleId, with a negative lookahead (?!]) inside it to reject empty permissions arrays.
See detailed explanation here
While this doesn't directly answer your question, I would like to point out that it might be more natural to process JSON with JavaScript or with one of many languages that can easily convert JSON to native types. Alternatively, you could use a JSON processing language such as jq:
jq '
map( select(.Permissions != []) | .roleId )
'
which produces an array of role IDs with non-empty permissions.
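The same selection can be sketched in Python with the standard json module, as one example of a language that converts JSON to native types. The array below is an abbreviated copy of the one in the question (fewer permission entries), used only for illustration.

```python
import json

# Abbreviated version of the array from the question.
roles_text = '''[
  {"roleId": 128, "roleName": "B", "Permissions": []},
  {"roleId": 310, "roleName": "ROLE", "Permissions": [
    {"permissionId": 8074, "isPermissionActive": true}
  ]},
  {"roleId": 201, "roleName": "B", "Permissions": []},
  {"roleId": 5, "roleName": "B", "Permissions": []}
]'''

roles = json.loads(roles_text)

# An empty list is falsy, so this keeps only roles with at least one permission.
role_ids = [role["roleId"] for role in roles if role["Permissions"]]
print(role_ids)  # [310]
```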
A lookbehind to check that the number is a "roleId".
And a lookahead to check that the "Permissions" array contains a mustache (an opening curly brace).
(?<="roleId": )\d+(?=[^\}]*?"Permissions":\s*\[\s*\{)
Test it here
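As a quick check outside the editor, the pattern above can be run through Python's re module. This is only a sketch: Sublime Text uses a different regex engine, and the sample text is an abbreviated, indented copy of the array from the question.

```python
import re

# Abbreviated, indented copy of the array from the question.
text = '''[
  {
    "roleId": 128,
    "roleName": "B",
    "Permissions": []
  },
  {
    "roleId": 310,
    "roleName": "ROLE",
    "Permissions": [
      {
        "permissionId": 8074,
        "isPermissionActive": true
      }
    ]
  },
  {
    "roleId": 5,
    "roleName": "B",
    "Permissions": []
  }
]'''

# Lookbehind: the digits must follow "roleId": (with exactly one space).
# Lookahead: before the next }, "Permissions" must open with [ and then {.
pattern = re.compile(r'(?<="roleId": )\d+(?=[^\}]*?"Permissions":\s*\[\s*\{)')

matches = pattern.findall(text)
print(matches)  # ['310']
```

The match depends on the exact spacing after "roleId": and on "Permissions" coming after roleId inside each object, so it is fragile against reformatted or reordered JSON.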
I have prepared a JSON request like the one below.
[{
"type": "John",
"attributes": {
"AA": [{
"value": "1234"
}]
}
},
{
}
]
I need to replace the fragment below with an empty string, i.e. ''.
,
{
}
Could you please provide a solution for this?
The final result should look like this:
[{
"type": "John",
"attributes": {
"AA": [{
"value": "1234"
}]
}
}
]
This regex matches the given sequence; however, you would probably need to change it to accept all possibilities:
/, \n\{\W+?\}/
Just replace the match with nothing.
Do you get the response as a JSON object or as a string?
If you get the response as an object you have to stringify it before applying the replace function:
payload = JSON.parse(JSON.stringify(payload).replace(/,\{\}/, ''))
If the response you posted above is already stringified and you haven't parsed it into an object, the method is:
payload = payload.replace(/,\s*\{\s*\}/, '')
To achieve this, we can use a DataWeave expression, either in a Transform Message component or in MEL.
In this case I prefer to use it in MEL: #[dw('payload filter (sizeOf $) > 0')]
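The filter-by-size idea can be sketched outside Mule as well. The snippet below is Python, not DataWeave, and is only an illustration of the logic: parse the payload, keep the elements that have at least one key, and serialize again. The variable names are hypothetical.

```python
import json

# The request from the question, including the empty trailing object.
payload_text = '''[{
  "type": "John",
  "attributes": {
    "AA": [{
      "value": "1234"
    }]
  }
},
{
}
]'''

payload = json.loads(payload_text)

# Keep only elements with at least one key (the analogue of sizeOf $ > 0).
cleaned = [item for item in payload if len(item) > 0]

print(json.dumps(cleaned, indent=2))
```

Working on the parsed structure avoids depending on the exact whitespace between the comma and the braces, which the regex-based replacements have to account for.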
You can use the flatten operator here, as shown below; it should remove the empty JSON object. You can also try replacing {} with null and adding skipNullOn="everywhere".
flatten payload