Regex for JSON with negative lookahead

I have the following json array:
[
{
"roleId": 128,
"roleName": "B",
"Permissions": []
},
{
"roleId": 310,
"roleName": "ROLE",
"Permissions": [
{
"permissionId": 8074,
"isPermissionActive": true
},
{
"permissionId": 2271,
"isPermissionActive": true
},
{
"permissionId": 8075,
"isPermissionActive": true
},
{
"permissionId": 2275,
"isPermissionActive": true
}
]
},
{
"roleId": 201,
"roleName": "B",
"Permissions": []
},
{
"roleId": 5,
"roleName": "B",
"Permissions": []
}
]
I need to select the roleId of the JSON objects with non-empty permissions. I've tried using a negative lookahead, but it doesn't seem to work.
Before selecting the roleId, I wanted to get a correct match on the JSON object, so I tried \{(\s*.*?(?!"Permissions":\s*\[\]))*\}, but it seems to match all the objects and even the inner objects of the permissions array. How should I change the lookahead to match properly? I'm currently matching in Sublime Text 3.

Try the following Regex:
(?<="roleId": )\d*(?=(?:[^{]*"Permissions": \[(?!])))
Explanation:
The first part (?<="roleId": )\d* uses a positive lookbehind to find the roleId.
The second part (?=(?:[^{]*"Permissions": \[(?!]))) is a positive lookahead that requires "Permissions": [ to follow the roleId without an intervening {. The nested negative lookahead (?!]) rejects an empty array, so only roleIds with non-null permissions match.
See detailed explanation here

While this doesn't directly answer your question, I would like to point out that it might be more natural to process JSON with JavaScript or with one of many languages that can easily convert JSON to native types. Alternatively, you could use a JSON processing language such as jq:
jq '
map( select(.Permissions != []) | .roleId )
'
which produces an array of role IDs with non-empty permissions.
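For comparison, the same selection can be sketched in JavaScript by parsing the JSON instead of matching it with a regex. This is only an illustration: the helper name roleIdsWithPermissions is hypothetical, and the sample data is a shortened copy of the array from the question.

```javascript
// Shortened copy of the array from the question (two permissions instead of four).
const rolesJson = `[
  { "roleId": 128, "roleName": "B", "Permissions": [] },
  { "roleId": 310, "roleName": "ROLE", "Permissions": [
      { "permissionId": 8074, "isPermissionActive": true },
      { "permissionId": 2271, "isPermissionActive": true }
  ] },
  { "roleId": 201, "roleName": "B", "Permissions": [] },
  { "roleId": 5, "roleName": "B", "Permissions": [] }
]`;

// Hypothetical helper: returns the roleId of every role whose
// Permissions array exists and is non-empty.
function roleIdsWithPermissions(jsonText) {
  return JSON.parse(jsonText)
    .filter((role) => Array.isArray(role.Permissions) && role.Permissions.length > 0)
    .map((role) => role.roleId);
}

console.log(roleIdsWithPermissions(rolesJson)); // [ 310 ]
```

Because the text is parsed first, whitespace and key order in the input no longer matter, which is the main weakness of the regex approaches.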

Use a lookbehind to check that the number is a "roleId",
and a lookahead to check that the "Permissions" array starts with an opening curly brace, i.e. contains at least one object.
(?<="roleId": )\d+(?=[^\}]*?"Permissions":\s*\[\s*\{)
Test it here
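A quick way to check this pattern outside the editor is to run it in JavaScript, whose regex engine also supports lookbehind in current Node.js versions. The sketch below assumes the text is formatted exactly as in the question (one key per line, a single space after "roleId":); the variable names are illustrative only.

```javascript
// Shortened copy of the question's text, keeping its line layout.
const sampleText = `[
{
"roleId": 128,
"roleName": "B",
"Permissions": []
},
{
"roleId": 310,
"roleName": "ROLE",
"Permissions": [
{
"permissionId": 8074,
"isPermissionActive": true
}
]
},
{
"roleId": 5,
"roleName": "B",
"Permissions": []
}
]`;

// Lookbehind: the digits must follow "roleId": .
// Lookahead: before the next }, "Permissions": must open an array
// whose first non-whitespace character is {.
const roleIdPattern = /(?<="roleId": )\d+(?=[^}]*?"Permissions":\s*\[\s*\{)/g;

const matchedRoleIds = sampleText.match(roleIdPattern) ?? [];
console.log(matchedRoleIds); // [ '310' ]
```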

Related

WireMock not matching regex with negative lookahead

I'm currently facing an issue when trying to get my standalone WireMock to match a GET request with a certain path pattern using a regex with a negative lookahead:
{
"request": {
"method": "GET",
"urlPathPattern": "\/my\/interesting\/path\/(\\?![0-9]*$)(\b[0-9A-Z]{11}\b)"
},
"response": {
"status": 200,
"body": "",
"headers": {
"Content-Type": "application/json"
}
}
}
When checking the WireMock logs, a near miss is logged. As suggested by WireMock, I escaped the question mark operator within my regex with a double backslash, but this did not help either.
I expect the URL path pattern to match URLs that end with an eleven-character string of uppercase letters and digits, such as:
http://myapp:8080/my/interesting/path/ABCDEF12345
I've already checked if my regex is valid and matches the cases that I would expect it to, which it does.
What might be of use: I'm using WireMock version 2.33.2 (docker image wiremock/wiremock:2.33.2)
If I understand you correctly, you need to match an 11-character string at the end that consists of uppercase letters and digits only, but not of digits only. If WireMock's regex engine does not support negative lookahead (and that is what it looks like if your regex does not match, which would not be all that surprising, since quite a few regex implementations do not support lookaheads), you have two choices.
You can create 11 different possible endings and join them with |, requiring a letter at each of the 11 positions in turn, like this:
/my/interesting/path/([A-Z][A-Z0-9]{10}|[A-Z0-9][A-Z][A-Z0-9]{9}|[A-Z0-9]{2}[A-Z][A-Z0-9]{8}|[A-Z0-9]{3}[A-Z][A-Z0-9]{7}|[A-Z0-9]{4}[A-Z][A-Z0-9]{6}|[A-Z0-9]{5}[A-Z][A-Z0-9]{5}|[A-Z0-9]{6}[A-Z][A-Z0-9]{4}|[A-Z0-9]{7}[A-Z][A-Z0-9]{3}|[A-Z0-9]{8}[A-Z][A-Z0-9]{2}|[A-Z0-9]{9}[A-Z][A-Z0-9]|[A-Z0-9]{10}[A-Z])$
or
You use priorities and define three matches (taking your comment of not allowing 11 digits to match a digit-only ending into account) like this:
{
"priority": 1,
"request": {
"method": "GET",
"urlPathPattern": "/my/interesting/path/[0-9]{11}$"
},
"response": {
"status": 404,
"body": "",
"headers": {}
}
}
and
{
"priority": 2,
"request": {
"method": "GET",
"urlPathPattern": "/my/interesting/path/[0-9]*$"
},
"response": {
"status": 200,
"body": "whatever is necessary for the digits-only or empty url",
"headers": {
"Content-Type": "application/json"
}
}
}
and
{
"priority": 3,
"request": {
"method": "GET",
"urlPathPattern": "/my/interesting/path/[0-9A-Z]{11}$"
},
"response": {
"status": 200,
"body": "",
"headers": {
"Content-Type": "application/json"
}
}
}
The first match (priority 1) picks up any URL that ends in 11 digits, so the second one is never tried for 11 digits. The third match (priority 3) is only tried if neither the first (priority 1) nor the second (priority 2) matched, which guarantees that the ending is not digits-only when the third one matches.
The priority field is documented in the 'stubbing' part of the WireMock documentation: https://wiremock.org/docs/stubbing/
Hope that gets you going...
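To gain some confidence that the eleven-branch alternation above accepts the same endings as the lookahead version, the two can be compared in plain JavaScript. This is only a sketch run outside WireMock: it builds the branches programmatically rather than copying the long pattern, and the sample endings are made up for illustration.

```javascript
// Path prefix taken from the stub mapping in the question.
const prefix = "/my/interesting/path/";

// Reference pattern: the negative lookahead rejects digits-only endings.
const withLookahead = new RegExp(`^${prefix}(?![0-9]*$)[0-9A-Z]{11}$`);

// Lookahead-free variant: branch i requires a letter at position i,
// with i alphanumerics before it and 10 - i after it.
const branches = Array.from(
  { length: 11 },
  (_, i) => `[A-Z0-9]{${i}}[A-Z][A-Z0-9]{${10 - i}}`
);
const withoutLookahead = new RegExp(`^${prefix}(${branches.join("|")})$`);

// Illustrative endings: mixed, letter last, digits only, too short, lowercase.
const samples = ["ABCDEF12345", "1234567890A", "12345678901", "ABCDEF1234", "abcdef12345"];
for (const ending of samples) {
  const url = prefix + ending;
  console.log(ending, withLookahead.test(url), withoutLookahead.test(url));
}
```

Both patterns should agree on every sample: the first two endings match, the last three do not.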

How can I add a validation regex to a Kontent slug element using the Kontent JS Management SDK

Hi there :) I'm struggling to add a validation regex to the URL Slug in my Content Type. I can set it manually e.g.
But I want to set it programmatically using the JS Management SDK. This is one of the things I have tried...
const mod: ContentTypeModels.IModifyContentTypeData[] = [
{
op: 'addInto',
path: '/elements/codename:page_url',
value: {
validation_regex: {
regex: '^[a-zA-Z-/]{1,60}$',
flags: 'i',
validation_message: 'URL slug must only contain (English/Latin) characters, forward slashes and hyphens',
is_active: true,
},
},
},
]
That gives me the error >> Invalid operation with index '0': Unexpected path part 'codename:page_url'
In the hope that the problem is just with the path I have tried several other permutations, without success.
Is what I want possible in place i.e. without deleting and re-adding the element? And if so how?
The addInto operation is for adding new elements, so if there is no URL slug element yet, you can add a new one and specify the regular expression:
[
{
"op": "addInto",
"path": "/elements",
"value":{
"depends_on": {
"element": {
"id": "d395c03d-2b20-4631-adc6-bc4cd9c88b0b"
}
},
"validation_regex": {
"regex": "^[a-zA-Z-/]{1,60}$",
"flags": "i",
"validation_message": "URL slug must only contain (English/Latin) characters, forward slashes and hyphens",
"is_active": true
},
"name": "some_slug",
"guidelines": null,
"is_required": false,
"type": "url_slug",
"codename": "some_slug"
}
}
]
To update only the regex of an existing URL slug element, you need to use the replace operation instead:
[
{
"op": "replace",
"path": "/elements/codename:some_type/validation_regex",
"value":{
"regex": "^[a-zA-Z-/]{1,60}$",
"flags": "i",
"validation_message": "URL slug must only contain (English/Latin) characters, forward slashes and hyphens",
"is_active": true
}
}
]
You can find more info in our API reference -> https://kontent.ai/learn/reference/management-api-v2/#operation/modify-a-content-type

Elasticsearch Token filter for removing tokens with a single word

I have what seems to be a very simple problem though I can't get it to work.
I have a token stream of words and I want to remove any token that is a single word e.g. [the quick, brown, fox] should be outputted as [the quick].
I've tried using pattern_capture token filters and used many types of patterns but it only generates new tokens, and doesn't remove old ones.
Here is the analyzer I've built (abbreviated for clarity)
"analyzer": {
"job_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"some_custom_char_filter"
],
"filter": [
other filters....,
"dash_drop",
"trim",
"unique",
"drop_single_word"
]
}
},
"char_filter": {...},
"filter": {
"dash_drop": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [
"([^-]+)\\s?(?!-.+)",
"- (.+)"
]
},
"drop_single_word": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [**nothing here works**]
}
}
}
I know I'm using a whitespace tokenizer that breaks sentences into words, but not shown here is the use of shingles to create new n-grams.
The purpose of the dash_drop filter is to split sentences containing - into tokens without the -, so for example: my house - my rules would split into [my house, my rules].
Any help is greatly appreciated.

Regex working with calculator but not with AWS lambda function

I'm having trouble with a regex that should gather all data between [ and ].
Testing with the program: http://regexr.com/
String data
{
"Items": [
{
"UserID": "1487840267893246",
"Timestamp": 1487204364877,
},
{
"UserID": "1487840267893336",
"Timestamp": 1487204364888,
}
],
"Count": 2,
"ScannedCount": 3
}
The below (fired in AWS lambda) has the intention of pulling all chars between the [ and ] and outputting it. (\[[^]*\]) works with the regex calc above, but only returns "undefined" in Lambda. Why?
Items = data.match(/"(\[[^]*\])"/);
console.log(Items);
An alternative solution was to extract the data into an array as follows
userID = data.match(/"UserID":"([^"]+)"/g);
console.log(userID);
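Try the dotall flag. JavaScript does not accept a bare inline (?s), so put the s flag after the closing slash, and drop the literal double quotes from the pattern, because the data has no quote directly before [ or after ]:
Items = data.match(/\[.*\]/s);
And you didn't need those capturing parentheses.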
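The difference can be reproduced outside Lambda with a small script. This is a sketch under the assumption that data is a plain string shaped like the sample in the question (reformatted here onto fewer lines, without the trailing commas); the UserID pattern additionally allows whitespace after the colon, which the original alternative did not.

```javascript
// String shaped like the sample data in the question.
const data = `{
  "Items": [
    { "UserID": "1487840267893246", "Timestamp": 1487204364877 },
    { "UserID": "1487840267893336", "Timestamp": 1487204364888 }
  ],
  "Count": 2,
  "ScannedCount": 3
}`;

// The original pattern demands a literal " right before [ and right after ],
// which never occurs in the data, so match() returns null.
const quotedAttempt = data.match(/"(\[[^]*\])"/);

// With the s (dotAll) flag, . also matches newlines, so the whole array is captured.
const itemsMatch = data.match(/\[.*\]/s);

// Pulling out just the IDs; \s* tolerates the space after the colon.
const userIds = [...data.matchAll(/"UserID":\s*"([^"]+)"/g)].map((m) => m[1]);

console.log(quotedAttempt); // null
console.log(itemsMatch[0]);
console.log(userIds);
```

If the string is valid JSON, JSON.parse(data).Items is usually a more robust way to get at the array than any regex.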

How to replace the payload from json object in mule

I have prepared a JSON request like the one below.
[{
"type": "John",
"attributes": {
"AA": [{
"value": "1234"
}]
}
},
{
}
]
I need to replace the part below with an empty string (''):
,
{
}
Could you please provide a solution for this?
In the end, it should look like this:
[{
"type": "John",
"attributes": {
"AA": [{
"value": "1234"
}]
}
}
]
This regex matches the given sequence; however, you would probably need to change it to accept all possibilities:
/, \n\{\W+?\}/
Just replace the match with nothing.
Do you get the response as a JSON object or as a string?
If you get the response as an object you have to stringify it before applying the replace function:
payload = JSON.parse(JSON.stringify(payload).replace(/,\{\}/, ''))
If the response you posted above is already stringified and you haven't parsed it into an object, the method is:
payload = payload.replace(/\,\s+\n\s+\{\n\s+\}/,'')
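If the payload is available as a parsed array, filtering out empty objects avoids depending on the exact whitespace of the serialized text. The sketch below is plain JavaScript, not Mule-specific, and the variable names are illustrative; it shows the stringify-and-replace approach next to a filter on the parsed array.

```javascript
// Payload from the question as a parsed array, including the empty object.
const payload = [
  { type: "John", attributes: { AA: [{ value: "1234" }] } },
  {},
];

// Approach from the answer above: serialize, strip ",{}", parse again.
// JSON.stringify emits no whitespace, so the simple pattern is enough here,
// but it would miss an empty object in the first position.
const viaReplace = JSON.parse(JSON.stringify(payload).replace(/,\{\}/g, ""));

// Alternative: keep only the objects that have at least one key.
const viaFilter = payload.filter((item) => Object.keys(item).length > 0);

console.log(JSON.stringify(viaReplace));
console.log(JSON.stringify(viaFilter));
```

Both results contain only the "John" object for this input.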
To achieve this, we can use a DataWeave expression, either in Transform Message or in MEL.
In this case I prefer to use it in MEL: #[dw('payload filter (sizeOf $) > 0')]
You can use the flatten operator here, as shown below; it should remove the empty JSON object. You can also try replacing {} with null and adding skipNullOn="everywhere".
flatten payload