How to use RegExp grouping in JavaScript to parse strings

I need to create a RegExp that will allow me to use groups to properly parse a string for some comparison logic.
Consider the following list of strings:
const testSet: string[] = [
"alpha-4181a",
"alpha-4181a-2",
"alpha-4181a_3",
"example",
"smokeTest"
]
Note the -2 and _3 which are valid methods of versioning in this naming convention. We wish to maintain support for such.
If we loop through the above set, I expect to get each string in full, WITHOUT the versioning suffix if one exists (as shown below):
const returnSet: string[] = [
"alpha-4181a",
"alpha-4181a",
"alpha-4181a",
"example",
"smokeTest"
]
So far I have the following regex:
/([-_]\d?)$/gi
which does properly identify the versioning at the end of the string. From here, I would like to create an additional group that matches everything that is NOT the versioning convention, but I can't seem to figure it out...

You just need to match everything before the versioning at the end. You also need lazy matching, which is what +? does (see this question for more).
const testSet = [
"alpha-4181a",
"alpha-4181a-2",
"alpha-4181a_3",
"example",
"smokeTest"
];
const resultSet = testSet.map((x) => x.match(/^(.+?)(?:[_-]\d)?$/)?.[1] ?? x);
// (?:[_-]\d)? is the optional versioning suffix at the end
// (.+?) lazily matches everything before it (capture group 1)
console.log(resultSet);
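If version suffixes can have more than one digit (say -12), the same idea can be wrapped in a small helper. This is only a sketch: the stripVersion name and the \d+ quantifier are assumptions, not part of the naming convention described in the question.

```typescript
// Hypothetical helper: removes a trailing "-N" or "_N" version suffix.
// Assumption: a version is one or more digits after a single "-" or "_".
function stripVersion(name: string): string {
  // Group 1 lazily captures the base name; the optional non-capturing
  // group consumes the version suffix, if there is one.
  const match = name.match(/^(.+?)(?:[-_]\d+)?$/);
  return match ? match[1] : name;
}

const names: string[] = [
  "alpha-4181a",
  "alpha-4181a-2",
  "alpha-4181a_12",
  "example",
  "smokeTest",
];
const stripped: string[] = names.map(stripVersion);
console.log(stripped);
```

Note that any name whose last segment is purely numeric (for example build-42) would also lose that segment, so this only fits conventions where the base name never ends that way.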

Related

Custom validator to ban a specific wordlist

I need a custom validator to ban a specific list of banned words from a textarea field.
I need exactly this type of implementation, I know that it's not logically correct to let the user type part of a query but it's exactly what I need.
I tried with a regExp but it has a strange behaviour.
My RegExp
/(drop|update|truncate|delete|;|alter|insert)+./gi
my Validator
export function forbiddenWordsValidator(sqlRe: RegExp): ValidatorFn {
return (control: AbstractControl): { [key: string]: any } | null => {
const forbidden = sqlRe.test(control.value);
return forbidden ? { forbiddenSql: { value: control.value } } : null;
};
}
my formControl:
whereCondition: new FormControl("", [
Validators.required,
forbiddenWordsValidator(this.BAN_SQL_KEYWORDS)...
It works only in certain cases, and I don't understand why the same string is rejected one time but accepted if I delete a character and retype it. Sometimes the validator also returns ok after I type a whitespace.
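There are several issues here:
The global g modifier makes RegExp#test (and similar methods) stateful: after a successful match the regex's lastIndex advances, so the next call resumes from that position and the results alternate unexpectedly. It must be removed.
The . at the end requires one more character (any character other than a line break) after the keyword, so a keyword at the very end of the input is not matched. It must be removed too.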
Use
/drop|update|truncate|delete|;|alter|insert/i
Or, to match the words as whole words use
/\b(?:drop|update|truncate|delete|alter|insert)\b|;/i
This way, insert in insertion and drop in dropout won't get "caught" (=matched).
See the regex demo.
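The effect of the g flag can be reproduced outside Angular. The snippet below is a minimal sketch using the question's keyword list; the variable names are illustrative only.

```typescript
// A regex with the g flag keeps state in lastIndex between test() calls.
const withGlobal = /drop|update|truncate|delete|;|alter|insert/gi;
const input = "drop table users";

const first = withGlobal.test(input);  // match found, lastIndex moves past "drop"
const second = withGlobal.test(input); // search resumes at lastIndex, finds nothing, lastIndex resets
const third = withGlobal.test(input);  // starts from index 0 again

console.log(first, second, third);

// Without g, every call starts from the beginning of the string.
const withoutGlobal = /\b(?:drop|update|truncate|delete|alter|insert)\b|;/i;
const stableA = withoutGlobal.test(input);
const stableB = withoutGlobal.test(input);
console.log(stableA, stableB);

// Whole-word matching leaves longer words alone.
const partialWords = withoutGlobal.test("insertion dropout");
console.log(partialWords);
```

This is why the same textarea value could pass validation on one keystroke and fail on the next: the validator reuses a single regex object across calls.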
It's not a great idea to give such power to the user.

regex breaks when I use a colon(:)

I just started working with Elasticsearch. By that I mean I have to query an already running Elasticsearch database. Is there good documentation of the regex syntax it follows? I know about the one on the official site, but it's not very helpful.
The more specific problem is that I want to query for lines of the sort:
10:02:37:623421|0098-TSOT {TRANSITION} {ID} {1619245525} {securityID} {} {fromStatus} {NOT_PRESENT} {toStatus} {WAITING}
or
01:01:36:832516|0058-CT {ADD} {0} {3137TTDR7} {23} {COM} {New} {0} {0} {52} {1}
and more of a similar structure. I don't want a generalized regex. If possible, could someone give me a regex for each of these that would run with Elasticsearch?
I noticed that it matches if the regexp matches with a substring too when I ran with:
query = {"query":
{"regexp":
{
"message": "[0-9]{2}"
}
},
"sort":
[
{"@timestamp":"asc"}
]
}
But it won't match anything if I use:
query = {"query":
{"regexp":
{
"message": "[0-9]{2}:.*"
}
},
"sort":
[
{"@timestamp":"asc"}
]
}
I want to write regexes that are more specific, and different for each of the two examples given near the top.
It turns out my message is stored in tokenized form rather than raw form, and : is one of the default delimiters of the tokenizer in Elasticsearch. As a result, I can't use a regexp query on the whole message, because the pattern is matched against each token individually.
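One way around this is to run the regexp query against a non-analyzed copy of the field. The sketch below assumes such a keyword sub-field exists and is called message.keyword; that name depends on the index mapping and is not confirmed by the question. Elasticsearch regexp patterns have to match the entire term, so each pattern ends with .* and the literal | is escaped.

```typescript
// Assumption: "message.keyword" is a hypothetical non-analyzed copy of "message".
const FIELD = "message.keyword";

// hh:mm:ss:micros|NNNN-TSOT followed by anything
const tsotPattern = "[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{6}\\|[0-9]{4}-TSOT .*";
// hh:mm:ss:micros|NNNN-CT followed by anything
const ctPattern = "[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{6}\\|[0-9]{4}-CT .*";

// Builds the request body for a regexp query on the assumed field.
function buildQuery(pattern: string) {
  return { query: { regexp: { [FIELD]: pattern } } };
}

console.log(JSON.stringify(buildQuery(tsotPattern), null, 2));
console.log(JSON.stringify(buildQuery(ctPattern), null, 2));
```

The two patterns differ only in the tag after the dash, so each query selects one of the two line types from the question.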

Regular Expression If 2nd parameter is Enrollment

I have the response below:
{
"id": "3452",
"enrollable_id": "3452",
"enrollable_type": "Enrollment"
}
{
"id": "3453",
"enrollable_id": "3453",
"enrollable_type": "Task"
}
{
"id": "3454",
"enrollable_id": "3454",
"enrollable_type": "Enrollment"
}
{
"id": "3455",
"enrollable_id": "3455",
"enrollable_type": "Task"
}
I would like to get id [3452 and 3454] only if enrollable_type= Enrollment. This is for jmeter regex extractor so it would be great if I can just use one liner regex to fetch 3452 and 3454.
The RegEx you are looking for is:
_id":\s*"([^"]+(?=[^\0}]+_type":\s*"E))
Try it online!
Explanation
_id":\s*" Finds the place where the enrollable_id is
[^"]+(?= Matches the ID if:
[^\0}]+_type":\s* Finds the place where enrollable_type is
"E Checks if the enrollable type begins with an uppercase E
) End if
( ) Captures the ID
It's important to note that this RegEx matches only valid entries (those whose type starts with E) and captures the ID in group 1. This means you will need to read each match's capture group rather than the match as a whole.
Disclaimer
The above RegEx contains backslashes, which you will need to escape if using the RegEx as a string literal.
This is the RegEx with all necessary-to-escape characters escaped:
_id":\\s*"([^"]+(?=[^\\0}]+_type":\\s*"E))
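To illustrate the difference between the match and its capture, here is a minimal TypeScript sketch that runs the pattern over the sample response. The captureIds helper is hypothetical and exists only for this demonstration.

```typescript
// Sample response from the question, kept as plain text (it is not valid JSON as a whole).
const response = `{
"id": "3452",
"enrollable_id": "3452",
"enrollable_type": "Enrollment"
}
{
"id": "3453",
"enrollable_id": "3453",
"enrollable_type": "Task"
}
{
"id": "3454",
"enrollable_id": "3454",
"enrollable_type": "Enrollment"
}
{
"id": "3455",
"enrollable_id": "3455",
"enrollable_type": "Task"
}`;

// Hypothetical helper: collects capture group 1 of every match.
function captureIds(text: string): string[] {
  // The g flag lets exec() walk through all matches in turn.
  const idPattern = /_id":\s*"([^"]+(?=[^\0}]+_type":\s*"E))/g;
  const ids: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = idPattern.exec(text)) !== null) {
    ids.push(m[1]); // m[0] is the whole match, m[1] is the captured ID
  }
  return ids;
}

const capturedIds = captureIds(response);
console.log(capturedIds);
```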
It's usually a bad idea to parse structured data with just a regex, but if you're intent on going this route then here you go:
"(\d+)"\s*,\s*(?="enrollable_type":\s*"Enrollment")
This assumes that enrollable_type always follows enrollable_id and that everything is quoted consistently, with a little allowance for variance in white space. You should be able to handle a little more variance if necessary, such as when you're unsure whether you can depend on keys or data being quoted (["']?). However, if you can't depend on the order of the properties (such as if the type comes before the id), then you should abandon using a regex.
Here's a sample working in JavaScript
const text = `{ "id": "3452", "enrollable_id": "3452", "enrollable_type": "Enrollment" } { "id": "3453", "enrollable_id": "3453", "enrollable_type": "Task" } { "id": "3454", "enrollable_id": "3454", "enrollable_type": "Enrollment" } { "id": "3455", "enrollable_id": "3455", "enrollable_type": "Task" }`;
const re = /"(\d+)"\s*,\s*(?="enrollable_type":\s*"Enrollment")/g;
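let match;
// exec() with the g flag returns successive matches until it yields null
while ((match = re.exec(text)) !== null) {
  console.log(match[1]); // capture group 1 holds the id
}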
Your response seems to be a JSON one (however it's malformed). If this is the case and it's really JSON - I would recommend going for JSON Extractor instead as regular expressions are fragile, sensitive to markup change, new lines, order of elements, etc. while JSON Extractor looks only into the content.
The relevant JSON Path query would be something like:
$..[?(@.enrollable_type == 'Enrollment')].enrollable_id
More information: JMeter's JSON Path Extractor Plugin - Advanced Usage Scenarios
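For comparison, here is roughly what that filter does, written as a TypeScript sketch outside JMeter. It assumes the response has first been repaired into a valid JSON array; the Enrollable interface and the array wrapping are illustrative and not part of the original response.

```typescript
// Hypothetical shape of one record in the response.
interface Enrollable {
  id: string;
  enrollable_id: string;
  enrollable_type: string;
}

// Assumption: the objects are wrapped in [...] and separated by commas.
const body = `[
  {"id": "3452", "enrollable_id": "3452", "enrollable_type": "Enrollment"},
  {"id": "3453", "enrollable_id": "3453", "enrollable_type": "Task"},
  {"id": "3454", "enrollable_id": "3454", "enrollable_type": "Enrollment"},
  {"id": "3455", "enrollable_id": "3455", "enrollable_type": "Task"}
]`;

const records = JSON.parse(body) as Enrollable[];

// Keep only Enrollment records, then project the ID field.
const enrollmentIds: string[] = records
  .filter((r) => r.enrollable_type === "Enrollment")
  .map((r) => r.enrollable_id);

console.log(enrollmentIds);
```

Unlike the regex versions, this does not care about property order or white space.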
You can extract the data in two ways.
First, using the JSON Extractor.
For this to work, the response data must follow JSON syntax rules.
Use the following JSON path in the JSON Extractor:
$..[?(@.enrollable_type=="Enrollment")].id
and set Match No. to -1.
Second, using the Regular Expression Extractor with the following regex:
id": "(.+?)",\s*(.+?)\s*"enrollable_type": "Enrollment
template : $1$2$3$4$
Match No. -1
You can inspect the stored variables with a Debug Sampler.
More information: extract variables

FW/1 pattern matching N digits

I am trying to match routes where IDs have exactly 6 numbers
This does not work:
variables.framework.routes = [
{ "main/{id:[0-9]{6}}" = "main/home/eid/:id"},
{ "main/home" = "main/home"},
{ "*" = "main/404"}
];
This does:
variables.framework.routes = [
{ "main/{id:[0-9]+}" = "main/home/eid/:id"},
{ "main/home" = "main/home"},
{ "*" = "main/404"}
];
The second one, of course, matches any number of digits. I wonder whether I have to escape the {.
It looks like FW/1 only allows a limited regular expression syntax for the routes declaration. So I don't think your first example will work. From what I could find the limited regular expression syntax in routes was added to FW/1 version 3.5. I found some discussion on the topic and this specific comment describing the requested behavior - https://github.com/framework-one/fw1/issues/325#issuecomment-118572702
{placeholder:regex}, so we could have product/{id:[0-9]+}-:name.html that targets product.detail?id={id:[0-9]+}&name=:name.
You need to repeat the placeholder with the regex in the target route too (could be changed).
You can't put } in your placeholder specific regex.
Let me know if a PR is welcome for this add-on.
Notice the bullet point stating that } (a closing brace) is not allowed in the placeholder regex. That rules out the {6} quantifier in your first example.
Here is a link to the code referenced by that pull-request which was included in 3.5 - https://github.com/framework-one/fw1/commit/9543b78552dbd27a526083ac72a3846bd86eeb90
And here is a link to the updated documentation for version 3.5 where some information was added about this feature - http://framework-one.github.io/documentation/developing-applications.html#url-routes
Snippet of that doc here:
Placeholder variables in the route are identified either by a leading colon or by braces (specifying a variable name and a regex to restrict matches) and can appear in the URL as well, for example { "/product/:id" = "/product/view/id/:id" } specifies a match for /product/something which will be treated as if the URL was /product/view/id/something - section: product, item: view, query string id=something. Similarly, { "/product/{id:[0-9]+}" = "/product/view/id/:id" } specifies a match for /product/42 which will be treated as if the URL was /product/view/id/42, and only numeric values will match the placeholder.
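Since } cannot appear inside the placeholder regex, one possible workaround is to spell out the repetition instead of using {6}. The TypeScript sketch below only builds and checks such a pattern; the repeatAtom helper is hypothetical, and whether FW/1 anchors the resulting route so that longer IDs are rejected has not been verified here.

```typescript
// Hypothetical helper: repeats a regex atom so no {n} quantifier is needed.
function repeatAtom(atom: string, count: number): string {
  let out = "";
  for (let i = 0; i < count; i++) {
    out += atom;
  }
  return out;
}

// Six digit classes in a row, with no braces in the pattern itself.
const sixDigits = repeatAtom("[0-9]", 6);

// The route line this would produce (CFML syntax, shown here as a string only).
const routeLine = '{ "main/{id:' + sixDigits + '}" = "main/home/eid/:id" }';
console.log(routeLine);

// Local sanity check of the pattern using a JavaScript RegExp.
const check = new RegExp("^" + sixDigits + "$");
console.log(check.test("123456"), check.test("12345"), check.test("1234567"));
```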

Grunt: replace wildcard value when using grunt-text-replace

I'm no regex master, and I'm pretty sure a regex is what is needed in this instance.
I currently have a text replacement task like so:
configSeed: {
src: ['src/*/local/app-config.js'],
overwrite: true,
replacements: [
{
from: 'var CONFIG_SEED_STRING = null;',
to: 'var CONFIG_SEED_STRING = "{"some_stringified_dynamic_json":"values"}";'
}
]
}
Which works fine the first time the config file is saved, the above string is replaced.
However, once the string has been replaced, further changes to the config are not applied, because null is obviously no longer there to be found.
null is where my wildcard value needs to be: the value could be either null (initially) or, after an earlier replacement, a valid JSON string.
If my assumption about a wildcard being needed is true, would that trigger recursion upon save? Or does Grunt have in-built protection against this situation? [edit: I've tested this by replacing the string with the same value, recursion does not occur.]
So, assuming it is safe to use a wildcard where I want to, could I please get help with a regex value to be replaced?
Alternative solutions also welcome, for example my code base is unchanging enough that I could viably replace a line of code completely, if that's possible.
Thanks for any help provided.
Omg, I actually did it, what a feeling. After some painful reading on regex again:
configSeed: {
src: ['src/*/local/app.js'],
overwrite: true,
replacements: [
{
from: /var CONFIG_SEED_STRING = '[^']*'/g,
to: 'var CONFIG_SEED_STRING = \'{"foo":"bar"}\''
},
{
from: 'var CONFIG_SEED_STRING = null',
to: 'var CONFIG_SEED_STRING = \'{"foo":"bar"}\''
}
]
}
Not perfect, because I have two from/tos, but it catches both null and valid JSON data in between single quoted String value for CONFIG_SEED_STRING.
Instant reward time for writing a regex! I'm allowing myself 15 minutes of Youtube at work.
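The two from/to pairs could probably be merged by letting one regex accept either null or a single-quoted string. The sketch below checks that idea with plain String.replace in TypeScript rather than inside Grunt; applySeed is a hypothetical helper, and it assumes the JSON never contains a single quote.

```typescript
// Matches the assignment whether the current value is null or a single-quoted string.
const seedPattern = /var CONFIG_SEED_STRING = (?:null|'[^']*')/g;

// Hypothetical helper: swaps in a new JSON seed string.
function applySeed(source: string, json: string): string {
  // A replacer function keeps any "$" in the JSON from being read as a replacement pattern.
  return source.replace(seedPattern, () => "var CONFIG_SEED_STRING = '" + json + "'");
}

const fresh = "var CONFIG_SEED_STRING = null;";
const once = applySeed(fresh, '{"foo":"bar"}');
const twice = applySeed(once, '{"foo":"baz"}');

console.log(once);
console.log(twice);
```

In the Grunt task, the same regex would go in a single from entry, with the to value unchanged.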