WireMock not matching regex with negative lookahead - regex

I'm currently facing an issue when trying to get my standalone WireMock to match a GET request with a certain path pattern using a regex with a negative lookahead:
{
"request": {
"method": "GET",
"urlPathPattern": "\/my\/interesting\/path\/(\\?![0-9]*$)(\b[0-9A-Z]{11}\b)"
},
"response": {
"status": 200,
"body": "",
"headers": {
"Content-Type": "application/json"
}
}
}
When checking the WireMock logs, a near miss is logged. As suggested by WireMock, I escaped the question mark operator within my regex with a double backslash. Though, this did not help either.
I expect the URL path pattern to match URLs that end with an eleven-character string of uppercase letters and digits, such as:
http://myapp:8080/my/interesting/path/ABCDEF12345
I've already checked if my regex is valid and matches the cases that I would expect it to, which it does.
What might be of use: I'm using WireMock version 2.33.2 (docker image wiremock/wiremock:2.33.2)

If I understand you correctly, you need to match an 11-character string at the end of the path that consists only of uppercase letters and digits, but not of digits alone, right? If WireMock's regex engine does not support negative lookahead (and that is what it looks like if it doesn't match your regex, which wouldn't be all that surprising, since quite a few regex implementations do not support lookaheads), you have two choices.
You can create 11 different possible endings and put them together with | looking for a letter at each of the 11 positions like this:
/my/interesting/path/([A-Z][A-Z0-9]{10}|[A-Z0-9][A-Z][A-Z0-9]{9}|[A-Z0-9]{2}[A-Z][A-Z0-9]{8}|[A-Z0-9]{3}[A-Z][A-Z0-9]{7}|[A-Z0-9]{4}[A-Z][A-Z0-9]{6}|[A-Z0-9]{5}[A-Z][A-Z0-9]{5}|[A-Z0-9]{6}[A-Z][A-Z0-9]{4}|[A-Z0-9]{7}[A-Z][A-Z0-9]{3}|[A-Z0-9]{8}[A-Z][A-Z0-9]{2}|[A-Z0-9]{9}[A-Z][A-Z0-9]|[A-Z0-9]{10}[A-Z])$
or
You can use priorities and define three stubs (taking into account your comment that an 11-digit ending must not fall through to the digits-only stub), like this:
{
"priority": 1,
"request": {
"method": "GET",
"urlPathPattern": "/my/interesting/path/[0-9]{11}$"
},
"response": {
"status": 404,
"body": "",
"headers": {}
}
}
and
{
"priority": 2,
"request": {
"method": "GET",
"urlPathPattern": "/my/interesting/path/[0-9]*$"
},
"response": {
"status": 200,
"body": "whatever is necessary for the digits-only or empty url",
"headers": {
"Content-Type": "application/json"
}
}
}
and
{
"priority": 3,
"request": {
"method": "GET",
"urlPathPattern": "/my/interesting/path/[0-9A-Z]{11}$"
},
"response": {
"status": 200,
"body": "",
"headers": {
"Content-Type": "application/json"
}
}
}
The first match (priority 1) will pick up any URL that ends in 11 digits so that the second one is never tried for 11 digits. The third match (priority 3) will then only be tried, if the first one (priority 1) and second one (priority 2) did not match, thus guaranteeing there are not just digits if the third one matches.
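The priority ordering described above can be sketched in a few lines of Python. This is only a simulation under stated assumptions: it uses Python's re module rather than WireMock itself, treats the lowest priority number as the first stub tried, requires the whole path to match, and the names STUBS and first_matching_priority are made up for illustration.

```python
import re

# (priority, urlPathPattern, response status), mirroring the three stubs above.
STUBS = [
    (1, r"/my/interesting/path/[0-9]{11}$", 404),
    (2, r"/my/interesting/path/[0-9]*$", 200),
    (3, r"/my/interesting/path/[0-9A-Z]{11}$", 200),
]


def first_matching_priority(path):
    """Return the priority of the first stub whose pattern matches the whole path."""
    for priority, pattern, _status in sorted(STUBS):
        if re.fullmatch(pattern, path):
            return priority
    return None  # no stub matched


for path in (
    "/my/interesting/path/ABCDEF12345",  # mixed letters and digits
    "/my/interesting/path/12345678901",  # exactly 11 digits
    "/my/interesting/path/123",          # digits only, shorter
    "/my/interesting/path/abc",          # lowercase, matches nothing
):
    print(path, "->", first_matching_priority(path))
```

The priority 3 stub is only reached by paths that the two digits-only stubs rejected, which is what keeps all-digit endings out of it.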
The priority field is documented in the 'stubbing' part of the WireMock documentation: https://wiremock.org/docs/stubbing/
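For the first option (the single pattern with 11 alternatives), here is a minimal Python sketch that generates the alternation and compares it with the lookahead version on a few endings. It assumes Python's re module, not WireMock's engine, and build_alternation is a hypothetical helper; the generated branches are equivalent to the hand-written ones above but spelled uniformly with {0} and {1} counts.

```python
import re

ALNUM = "[A-Z0-9]"
PREFIX = "/my/interesting/path/"


def build_alternation(length=11):
    """One branch per position i: i alphanumerics, one letter, then the rest."""
    branches = [
        f"{ALNUM}{{{i}}}[A-Z]{ALNUM}{{{length - 1 - i}}}" for i in range(length)
    ]
    return PREFIX + "(" + "|".join(branches) + ")$"


ALTERNATION = build_alternation()
# The intent of the original lookahead pattern, written without extra escaping.
LOOKAHEAD = PREFIX + r"(?![0-9]*$)[0-9A-Z]{11}$"

for ending in ("ABCDEF12345", "12345678901", "1234567890A", "ABCDEF1234", "abcdef12345"):
    path = PREFIX + ending
    by_alternation = re.fullmatch(ALTERNATION, path) is not None
    by_lookahead = re.fullmatch(LOOKAHEAD, path) is not None
    print(ending, by_alternation, by_lookahead)
```

Both columns should agree for every ending, since "at least one letter somewhere" and "not digits only" describe the same set of 11-character strings.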
Hope that gets you going...

Related

How can I add a validation regex to a Kontent slug element using the Kontent JS Management SDK

Hi there :) I'm struggling to add a validation regex to the URL Slug in my Content Type. I can set it manually e.g.
But I want to set it programmatically using the JS Management SDK. This is one of the things I have tried...
const mod: ContentTypeModels.IModifyContentTypeData[] = [
{
op: 'addInto',
path: '/elements/codename:page_url',
value: {
validation_regex: {
regex: '^[a-zA-Z-/]{1,60}$',
flags: 'i',
validation_message: 'URL slug must only contain (English/Latin) characters, forward slashes and hyphens',
is_active: true,
},
},
},
]
That gives me the error >> Invalid operation with index '0': Unexpected path part 'codename:page_url'
In the hope that the problem is just with the path I have tried several other permutations, without success.
Is what I want possible in place i.e. without deleting and re-adding the element? And if so how?
The addInto operation is for adding new elements, so if there is no url slug element you can add a new one and specify the regular expression:
[
{
"op": "addInto",
"path": "/elements",
"value":{
"depends_on": {
"element": {
"id": "d395c03d-2b20-4631-adc6-bc4cd9c88b0b"
}
},
"validation_regex": {
"regex": "^[a-zA-Z-/]{1,60}$",
"flags": "i",
"validation_message": "URL slug must only contain (English/Latin) characters, forward slashes and hyphens",
"is_active": true
},
"name": "some_slug",
"guidelines": null,
"is_required": false,
"type": "url_slug",
"codename": "some_slug"
}
}
]
To update just the regex of an existing URL slug element, you need to use the replace operation instead:
[
{
"op": "replace",
"path": "/elements/codename:some_type/validation_regex",
"value":{
"regex": "^[a-zA-Z-/]{1,60}$",
"flags": "i",
"validation_message": "URL slug must only contain (English/Latin) characters, forward slashes and hyphens",
"is_active": true
}
}
]
You can find more info in our API reference -> https://kontent.ai/learn/reference/management-api-v2/#operation/modify-a-content-type
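As a rough illustration of the replace operation above, the following Python sketch builds the same payload for a given element codename and checks which slugs the validation regex would accept. It makes several assumptions: build_replace_operation is a hypothetical helper, not part of any Kontent SDK; the codename page_url is taken from the question; and the regex is evaluated with Python's re module, which may differ in detail from the engine Kontent uses.

```python
import json
import re

SLUG_REGEX = "^[a-zA-Z-/]{1,60}$"
MESSAGE = "URL slug must only contain (English/Latin) characters, forward slashes and hyphens"


def build_replace_operation(element_codename):
    """Build the body of a 'replace' operation targeting validation_regex."""
    return [
        {
            "op": "replace",
            "path": f"/elements/codename:{element_codename}/validation_regex",
            "value": {
                "regex": SLUG_REGEX,
                "flags": "i",
                "validation_message": MESSAGE,
                "is_active": True,
            },
        }
    ]


def slug_is_valid(slug):
    """Apply the slug regex with the case-insensitive flag, as in the payload."""
    return re.search(SLUG_REGEX, slug, re.IGNORECASE) is not None


payload = build_replace_operation("page_url")
print(json.dumps(payload, indent=2))
print(slug_is_valid("about-us/team"), slug_is_valid("page_1"))
```

Note that the path points at the validation_regex property of the element, not at the element itself, which is the difference from the addInto attempt in the question.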

How do I extract a string of numbers from random text in Power Automate?

I am setting up a flow to organize and save emails as PDF in a Dropbox folder. The first email that will arrive includes a 10 digit identification number which I extract along with an address. My flow creates a folder in Dropbox named in this format: 2023568684 : 123 Main St. Over a few weeks, additional emails arrive that I need to put into that folder. The subject always has a 10 digit number in it. I was building around each email and using functions like split, first, last, etc. to isolate the 10 digits ID. The problem is that there is no consistency in the subjects or bodies of the messages to be able to easily find the ID with that method. I ended up starting to build around each email format individually but there are way too many, not to mention the possibility of new senders or format changes.
My idea is to use List files in folder when a new message arrives which will create an array that I can filter to find the folder ID the message needs to be saved to. I know there is a limitation on this because of the 20 file limit but that is a different topic and question.
For now, how do I find a random 10 digit number in a randomly formatted email subject line so I can use it with the filter function?
For this requirement, you really need regex, and at present PowerAutomate doesn't support regular expressions, but the good news is that it looks like support is coming ...
https://powerusers.microsoft.com/t5/Power-Automate-Ideas/Support-for-regex-either-in-conditions-or-as-an-action-with/idi-p/24768
There is a connector but it looks like it's not free ...
https://plumsail.com/actions/request-free-license
To get around it for now, my suggestion would be to create a function app in Azure and let it do the work. This may not be your cup of tea but it will work.
I created a .NET (C#) function with the following code (straight in the portal) ...
#r "Newtonsoft.Json"
using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using Newtonsoft.Json;
public static async Task<IActionResult> Run(HttpRequest req, ILogger log)
{
    // Read the JSON request body: { "Text": "<base64 string>", "Pattern": "<regex>" }
    string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
    dynamic data = JsonConvert.DeserializeObject(requestBody);
    // The text arrives base64 encoded, so decode it before searching
    string strToSearch = System.Text.Encoding.UTF8.GetString(Convert.FromBase64String((string)data?.Text));
    string regularExpression = data?.Pattern;
    // Collect every match of the pattern in the decoded text
    var matches = System.Text.RegularExpressions.Regex.Matches(strToSearch, regularExpression);
    // Serialize the matches to JSON, ignoring reference loops
    var responseString = JsonConvert.SerializeObject(matches, new JsonSerializerSettings()
    {
        ReferenceLoopHandling = ReferenceLoopHandling.Ignore
    });
    return new ContentResult()
    {
        ContentType = "application/json",
        Content = responseString
    };
}
Then in PowerAutomate, call the HTTP action passing in a base64 encoded string of the content you want to search ...
This is the expression in the JSON ... base64(variables('String to Search')) ... and this is the JSON you need to pass in ...
{
"Text": "@{base64(variables('String to Search'))}",
"Pattern": "[0-9]{10}"
}
This is an example of the response ...
[
{
"Groups": {},
"Success": true,
"Name": "0",
"Captures": [],
"Index": 33,
"Length": 10,
"Value": "2023568684"
},
{
"Groups": {},
"Success": true,
"Name": "0",
"Captures": [],
"Index": 98,
"Length": 10,
"Value": "8384468684"
}
]
Next, add a Parse JSON action and use this schema ...
{
"type": "array",
"items": {
"type": "object",
"properties": {
"Groups": {
"type": "object",
"properties": {}
},
"Success": {
"type": "boolean"
},
"Name": {
"type": "string"
},
"Captures": {
"type": "array"
},
"Index": {
"type": "integer"
},
"Length": {
"type": "integer"
},
"Value": {
"type": "string"
}
},
"required": [
"Groups",
"Success",
"Name",
"Captures",
"Index",
"Length",
"Value"
]
}
}
Finally, extract the first value that you find which matches the regex pattern. It returns multiple results if found so if you need to, you can do something with those.
This is the expression ... @{first(body('Parse_JSON'))?['Value']}
From this string ...
We're going to search for string 2023568684 within this text and we're also going to try and find 8384468684, this should work.
... this is the result ...
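To try a pattern locally before deploying anything, the request and response round trip can be approximated in Python. This is a sketch under assumptions, not the Azure function itself: find_matches is a hypothetical stand-in that returns only the Index, Length and Value fields, and it uses Python's re module rather than .NET's Regex class (for a simple pattern like [0-9]{10} on plain ASCII text the two behave the same).

```python
import base64
import json
import re


def find_matches(request_body):
    """Decode the base64 'Text', apply 'Pattern', and return the matches as JSON."""
    data = json.loads(request_body)
    text = base64.b64decode(data["Text"]).decode("utf-8")
    matches = [
        {"Index": m.start(), "Length": len(m.group()), "Value": m.group()}
        for m in re.finditer(data["Pattern"], text)
    ]
    return json.dumps(matches)


subject = (
    "We're going to search for string 2023568684 within this text and "
    "we're also going to try and find 8384468684, this should work."
)
body = json.dumps(
    {
        "Text": base64.b64encode(subject.encode("utf-8")).decode("ascii"),
        "Pattern": "[0-9]{10}",
    }
)

result = json.loads(find_matches(body))
first_id = result[0]["Value"]  # same idea as taking first(...) of the parsed array
print(result)
print(first_id)
```

Taking the first element mirrors the flow's final step; if a subject line could contain more than one 10-digit number, the remaining entries are available in the same list.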
Don't have a Premium PowerAutomate licence so can't use the HTTP action?
You can do this exact same thing using the LogicApps service in Azure. It's the same engine with some slight differences re: connectors and behaviour.
Instead of the HTTP, use the Azure Functions action.
In relation to your action to fire when an email is received, in LogicApps, it will poll every x seconds/minutes/hours/etc. rather than fire on event. I'm not 100% sure which email connector you're using but it should exist.
Dropbox connectors exist, that's no problem.
You can export your PowerAutomate flow into a LogicApps format so you don't have to start from scratch.
https://learn.microsoft.com/en-us/azure/logic-apps/export-from-microsoft-flow-logic-app-template
If you're concerned about cost, don't be. Just make sure you use the consumption plan. Costs only really rack up for these services when the apps run for minutes at a time on a regular basis. Just keep an eye on it for your own mental health.
To get the function URL, you can find it in the function itself. You have to be in the function ...

Elasticsearch Token filter for removing tokens with a single word

I have what seems to be a very simple problem though I can't get it to work.
I have a token stream of words and I want to remove any token that is a single word e.g. [the quick, brown, fox] should be outputted as [the quick].
I've tried using pattern_capture token filters and used many types of patterns but it only generates new tokens, and doesn't remove old ones.
Here is the analyzer I've built (abbreviated for clarity)
"analyzer": {
"job_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"char_filter": [
"some_custom_char_filter"
],
"filter": [
other filters....,
"dash_drop",
"trim",
"unique",
"drop_single_word"
]
}
},
"char_filter": {...},
"filter": {
"dash_drop": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [
"([^-]+)\\s?(?!-.+)",
"- (.+)"
]
},
"drop_single_word": {
"type": "pattern_capture",
"preserve_original": false,
"patterns": [**nothing here works**]
}
}
}
I know I'm using a whitespace tokenizer that breaks sentences into words, but not shown here is the use of shingles to create new nGrams.
The purpose of the dash_drop filter is to split sentences containing - into tokens without the -, so for example: my house - my rules would split into [my house, my rules].
Any help is greatly appreciated.
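To make the intended behaviour concrete, here is a plain Python sketch of the two steps described in the question. It is not an Elasticsearch filter definition and says nothing about which filter type can achieve this; dash_drop and drop_single_word are just illustrative functions named after the filters above.

```python
import re


def dash_drop(text):
    """Split a sentence on ' - ' and return the parts without the dash."""
    return [part.strip() for part in re.split(r"\s+-\s+", text)]


def drop_single_word(tokens):
    """Keep only tokens that contain whitespace, i.e. more than one word."""
    return [token for token in tokens if re.search(r"\s", token.strip())]


print(dash_drop("my house - my rules"))
print(drop_single_word(["the quick", "brown", "fox"]))
```

The second function shows the distinction the question is after: the token is removed from the stream entirely rather than being replaced by a captured substring.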

How to handle \n or \r or \r\n in json Schema validation?

I use a regex to reject tags like <...>, and it works fine, but I want to allow \n, \r or \r\n in my description. I get the description from an email, so it contains many such characters.
eg: (Description)"You get solution\n on stack\roverflow"
"StringRegex": {
"anyOf": [{
"type": "null"
}, {
"type": "string",
"pattern": "^(?:(?!<.*?>).)*$"
}]
}
Error:
{
"ObjectKey": "[0].",
"Message": "should match pattern \"^(?:(?!<.*?>).)*$\""
}
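One likely reason for the error is that . does not match line breaks, so the pattern stops at the first \n. The Python sketch below reproduces that and tries a variant in which . is replaced by [\s\S]. Treat it as an assumption-laden illustration: ALLOW_LINE_BREAKS is a hypothetical rewrite, not a pattern from the question; Python's . excludes only \n, whereas in ECMAScript-style engines (which JSON Schema patterns are usually evaluated with) it also excludes \r; and the validator in use may behave differently again.

```python
import re

# The pattern from the schema above.
ORIGINAL = r"^(?:(?!<.*?>).)*$"
# Hypothetical variant: [\s\S] matches any character, line breaks included.
ALLOW_LINE_BREAKS = r"^(?:(?!<[\s\S]*?>)[\s\S])*$"

description = "You get solution\n on stack\roverflow"


def is_valid(pattern, value):
    """Return True when the value satisfies the pattern."""
    return re.search(pattern, value) is not None


print(is_valid(ORIGINAL, description))            # rejected because of the \n
print(is_valid(ALLOW_LINE_BREAKS, description))   # line breaks are accepted
print(is_valid(ALLOW_LINE_BREAKS, "some <b>tag</b>"))  # tags are still rejected
```

The negative lookahead is kept unchanged in spirit, so the tag check still applies at every position; only the set of characters that may be consumed is widened.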

Regex for json with negative lookahead

I have the following json array:
[
{
"roleId": 128,
"roleName": "B",
"Permissions": []
},
{
"roleId": 310,
"roleName": "ROLE",
"Permissions": [
{
"permissionId": 8074,
"isPermissionActive": true
},
{
"permissionId": 2271,
"isPermissionActive": true
},
{
"permissionId": 8075,
"isPermissionActive": true
},
{
"permissionId": 2275,
"isPermissionActive": true
}
]
},
{
"roleId": 201,
"roleName": "B",
"Permissions": []
},
{
"roleId": 5,
"roleName": "B",
"Permissions": []
}
]
I need to select the roleId of the JSON objects with non-empty permissions. I've tried using a negative lookahead, but it doesn't seem to work.
Before selecting the roleId, I wanted to get a correct match on each JSON object, so I tried \{(\s*.*?(?!"Permissions":\s*\[\]))*\}, but it seems to match all the objects and even the inner objects of the permission array. How should I change the lookahead to match properly? I'm currently matching in Sublime Text 3.
Try the following Regex:
(?<="roleId": )\d*(?=(?:[^{]*"Permissions": \[(?!])))
Explanation:
The first part (?<="roleId": )\d* uses a positive lookbehind to find the roleId.
The second part (?=(?:[^{]*"Permissions": \[(?!]))) uses a negative lookahead to find non-empty permissions, wrapped in a positive lookahead so that only the roleId of an object with non-empty permissions is matched.
See detailed explanation here
While this doesn't directly answer your question, I would like to point out that it might be more natural to process JSON with JavaScript or with one of many languages that can easily convert JSON to native types. Alternatively, you could use a JSON processing language such as jq:
jq '
map( select(.Permissions != []) | .roleId )
'
which produces an array of role IDs with non-empty permissions.
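The same selection can be written with Python's standard json module, as a small sketch of the parse-then-filter approach. The sample below is a shortened copy of the array from the question, and role_ids_with_permissions is a name chosen for illustration.

```python
import json

ROLES_JSON = """
[
  {"roleId": 128, "roleName": "B", "Permissions": []},
  {"roleId": 310, "roleName": "ROLE", "Permissions": [
    {"permissionId": 8074, "isPermissionActive": true},
    {"permissionId": 2271, "isPermissionActive": true}
  ]},
  {"roleId": 201, "roleName": "B", "Permissions": []},
  {"roleId": 5, "roleName": "B", "Permissions": []}
]
"""


def role_ids_with_permissions(text):
    """Parse the array and keep the roleId of every role with a non-empty list."""
    return [role["roleId"] for role in json.loads(text) if role["Permissions"]]


print(role_ids_with_permissions(ROLES_JSON))
```

Because the data is parsed first, the result does not depend on whitespace or key order in the input, which is the main fragility of the regex approaches.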
A lookbehind to check if the number is a "roleId".
And a lookahead to check if the "Permissions" have a mustache.
(?<="roleId": )\d+(?=[^\}]*?"Permissions":\s*\[\s*\{)
Test it here
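Both regexes from the answers can be checked against a pretty-printed copy of the data with Python's re module. This is a sketch under assumptions: the question used Sublime Text's regex engine, not Python's, and the patterns depend on the exact "roleId": spacing, which json.dumps with indent=2 happens to produce.

```python
import json
import re

roles = [
    {"roleId": 128, "roleName": "B", "Permissions": []},
    {
        "roleId": 310,
        "roleName": "ROLE",
        "Permissions": [
            {"permissionId": 8074, "isPermissionActive": True},
            {"permissionId": 2271, "isPermissionActive": True},
        ],
    },
    {"roleId": 201, "roleName": "B", "Permissions": []},
    {"roleId": 5, "roleName": "B", "Permissions": []},
]
# Pretty-print so the text resembles the layout shown in the question.
text = json.dumps(roles, indent=2)

# Pattern from the first answer: "Permissions": [ must not be followed by ].
FIRST = r'(?<="roleId": )\d*(?=(?:[^{]*"Permissions": \[(?!])))'
# Pattern from the last answer: "Permissions": [ must be followed by {.
SECOND = r'(?<="roleId": )\d+(?=[^\}]*?"Permissions":\s*\[\s*\{)'

first_ids = re.findall(FIRST, text)
second_ids = re.findall(SECOND, text)
print(first_ids, second_ids)
```

Both lookaheads are confined to a single role object by their negated character classes ([^{] and [^\}]), which is what stops a role with empty permissions from borrowing the Permissions array of the next one.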