Extracting user-defined Groovy tokens from strings - regex

Groovy here: I need to scan a string for a substring of the form:
${token}:<someValue>]
That is:
A user-defined (dynamic) token string (could be anything at runtime); then
A colon (:); then
Anything (<someValue>); then finally
A right square bracket (])
So basically something like:
def String fetchTokenValue(String toScan, String token) {
    if(toScan.matches(".*${token}:.*]")) {
        String everythingBetweenColonAndRBracket = ???
        return everythingBetweenColonAndRBracket
    } else {
        return 'NO_DICE'
    }
}
Such that the output would be as follows:
fetchTokenValue('swkokd sw:defroko swodjejr blah:fizzbuzz] wdkerko', 'blah') => 'fizzbuzz'
fetchTokenValue('swkokd sw:defroko swodjejr blah:fizzbuzz] wdkerko', 'boo') => 'NO_DICE'
I'm struggling with the regex as well as how to, if a match is made, extract all the text between the colon and the right square bracket. We can assume there will only ever be one match, or simply operate on the first match that is found (if it exists).
Any ideas where I'm going awry?

You may use the [^\]]* subpattern (a negated character class [^...] matches any char other than those defined inside it) to match zero or more chars other than ], wrap it in a capturing group, and return only the Group 1 contents. It is also a good idea to escape the input token automatically, so that a token containing regex metacharacters does not produce an illegal or unintended pattern:
import java.util.regex.*;
def String fetchTokenValue(String toScan, String token) {
    def matcher = ( toScan =~ /.*${Pattern.quote(token)}:([^\]]*)].*/ )
    if(matcher.matches()) {
        return matcher.group(1)
    } else {
        return 'NO_DICE'
    }
}
println fetchTokenValue('swkokd sw:defroko swodjejr blah:fizzbuzz] wdkerko', 'blah')
See the online Groovy demo
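For comparison, here is a minimal Python sketch of the same idea, assuming `re.escape` as the counterpart of `Pattern.quote`. The function name `fetch_token_value` is only illustrative. It uses `re.search`, so it returns the first match instead of requiring the whole string to match.

```python
import re

def fetch_token_value(to_scan: str, token: str) -> str:
    # re.escape makes the token match literally, even if it contains
    # regex metacharacters such as "." or "(".
    pattern = re.escape(token) + r":([^\]]*)\]"
    match = re.search(pattern, to_scan)
    # Group 1 holds everything between the colon and the closing bracket.
    return match.group(1) if match else "NO_DICE"

text = "swkokd sw:defroko swodjejr blah:fizzbuzz] wdkerko"
print(fetch_token_value(text, "blah"))  # fizzbuzz
print(fetch_token_value(text, "boo"))   # NO_DICE
```

Without the escaping step, a token such as `a.b` would also match `axb`, and a token such as `(` would make the pattern invalid.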

You could use this regex, which captures everything up to a ] in a group:
def String fetchTokenValue(String toScan, String token) {
    def match = toScan =~ /.+${token}:([^\]]+)/
    if(match) { match[0][1] } else { 'NO_DICE' }
}
def str = 'swkokd sw:defroko swodjejr blah:fizzbuzz] wdkerko'
assert fetchTokenValue(str, 'blah') == 'fizzbuzz'
assert fetchTokenValue(str, 'boo') == 'NO_DICE'

Related

scala regex meaning

I am new to Scala and hate regex :D
Currently I am debugging a piece of code:
def validateReslutions(reslutions: String): Unit = {
  val regex = "(\\d+-\\d+[d,w,m,h,y],?)*"
  if (!reslutions.matches(regex)) {
    throw new Error("no match")
  } else {
    print("matched")
  }
}
validateReslutions(reslutions = "(20-1w,100-1w)")
The problem is that it produces no match for this input. How can I correct the regex so that it matches?
Your (20-1w,100-1w) string contains a pair of parentheses at the start and end; only the part between them matches your (\d+-\d+[d,w,m,h,y],?)* regex. Since String#matches requires a full string match, the check fails and the Error is thrown.
Add the parentheses to the regex to avoid the exception:
def validateReslutions(reslutions: String): Unit = {
  val regex = """\((\d+-\d+[dwmhy],?)*\)"""
  if (!reslutions.matches(regex)) {
    throw new Error("no match")
  } else {
    print("matched")
  }
}
validateReslutions(reslutions = "(20-1w,100-1w)")
// => matched
See the Scala demo.
Note the triple quotes used to define the string literal, inside which you can use single backslashes to define literal backslash chars.
Also, note that the commas were removed from the character class: inside a character class a comma matches a literal comma in the text, it does not mean "or".
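Both points can be checked with a short Python sketch, assuming `re.fullmatch` as a stand-in for `String#matches` (both require the whole string to match). The helper name `is_valid_resolutions` is only illustrative.

```python
import re

# Same pattern as the corrected Scala regex: literal parentheses
# around zero or more "<digits>-<digits><unit>" items.
RESOLUTIONS = re.compile(r"\((\d+-\d+[dwmhy],?)*\)")

def is_valid_resolutions(value: str) -> bool:
    # fullmatch succeeds only if the entire string matches the pattern.
    return RESOLUTIONS.fullmatch(value) is not None

print(is_valid_resolutions("(20-1w,100-1w)"))  # True
print(is_valid_resolutions("20-1w,100-1w"))    # False: parentheses are required

# A comma inside a character class is just another literal member.
print(re.fullmatch(r"[d,w]", ",") is not None)  # True
```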

regex to extract substring for special cases

I have a scenario where I want to extract a substring based on the following conditions:
Search for any pattern myvalue=123& and extract myvalue=123
If "myvalue" is present at the end of the line without "&", extract myvalue=123
For example:
The string is abcdmyvalue=123&xyz => then it should return myvalue=123
The string is abcdmyvalue=123 => then it should return myvalue=123
The first scenario works for me with the following regex: myvalue=(.*?(?=[&,""]))
I am looking for how to modify this regex to cover my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in advance!
Some notes about the pattern that you tried:
If you only want to match, you can omit the capture group
e* matches an e char 0+ times
The part .*?(?=[&,""]) matches as few chars as possible until it can assert either & , or " to the right, so the positive lookahead expects a single char to be present to the right; at the end of the line there is none, so the match fails
You could shorten the pattern to a match only, using a negated character class that matches 0+ times any character except a whitespace char or &:
myvalue=[^&\s]*
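A minimal Python sketch of this pattern, covering both scenarios from the question (the helper name `extract_myvalue` is only illustrative):

```python
import re

# The negated class stops at "&", at whitespace, or at the end of the string,
# so no lookahead is needed.
MYVALUE = re.compile(r"myvalue=[^&\s]*")

def extract_myvalue(line: str):
    match = MYVALUE.search(line)
    return match.group(0) if match else None

print(extract_myvalue("abcdmyvalue=123&xyz"))  # myvalue=123
print(extract_myvalue("abcdmyvalue=123"))      # myvalue=123
```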
function regex(data) {
  var test = data.match(/=(.*)&/);
  if (test === null) {
    return data.split('=')[1]
  } else {
    return test[1]
  }
}
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
Here is working code. If there is no & after the value, match returns null and the first branch simply splits the string on = and returns the value. If & is present, the regex extracts the value between = and &.
If you want to do it with a single regex, you can use an alternation:
var test = data.match(/=(.*)&|=(.*)/)
const result = test[1] ? test[1] : test[2];
console.log(result);

regex - how to specify the expressions to exclude

I need to replace two characters {, } with {\n, \n}.
But they must not be enclosed in '' or "".
I tried this code to achieve that
text = 'hello(){imagine{myString("HELLO, {WORLD}!")}}'
replaced = re.sub(r'{', "{\n", text)
Ellipsis...
Naturally, this code also replaces curly brackets that are enclosed in quote marks.
What are the negative statements like ! or not that can be used in regular expressions?
And the following is what I wanted.
hello(){
imagine{
puts("{HELLO}")
}
}
In a nutshell, what I want to do is:
Search for { and }.
If the brace is not enclosed in '' or "",
replace { or } with {\n or \n}.
In the opposite case, I can solve it with (?P<a>\".*){(?P<b>.*?\").
But I have no clue how I can solve it in my case.
First replace all { characters with {\n. This also turns a quoted "{ into "{\n, so afterwards replace every "{\n back with "{. Note that this only restores a brace that directly follows a double quote.
text = 'hello(){imagine{puts("{HELLO}")}}'
replaced = text.replace('{', '{\n').replace('"{\n', '"{')
You may match single and double quoted (C-style) string literals (those that support escape entities with backslashes) and then match { and } in any other context that you may replace with your desired values.
See Python demo:
import re
text = 'hello(){imagine{puts("{HELLO}")}}'
dblq = r'(?<!\\)(?:\\{2})*"[^"\\]*(?:\\.[^"\\]*)*"'
snlq = r"(?<!\\)(?:\\{2})*'[^'\\]*(?:\\.[^'\\]*)*'"
rx = re.compile(r'({}|{})|[{{}}]'.format(dblq, snlq))
print(rx.pattern)
def repl(m):
    if m.group(1):
        return m.group(1)
    elif m.group() == '{':
        return '{\n'
    else:
        return '\n}'
# Examples
print(rx.sub(repl, text))
print(rx.sub(repl, r'hello(){imagine{puts("Nice, Mr. \"Know-all\"")}}'))
print(rx.sub(repl, "hello(){imagine{puts('MORE {HELLO} HERE ')}}"))
The pattern that is generated in the code above is
((?<!\\)(?:\\{2})*"[^"\\]*(?:\\.[^"\\]*)*"|(?<!\\)(?:\\{2})*'[^'\\]*(?:\\.[^'\\]*)*')|[{}]
It can actually be reduced to
(?<!\\)((?:\\{2})*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'))|[{}]
See the regex demo.
Details:
The pattern matches 2 main alternatives. The first one matches single- and double-quoted string literals.
(?<!\\) - no \ immediately to the left is allowed
((?:\\{2})*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')) - Group 1:
(?:\\{2})* - 0+ repetitions of two consecutive backslashes
(?: - a non-capturing group:
"[^"\\]*(?:\\.[^"\\]*)*" - a double quoted string literal
| - or
'[^'\\]*(?:\\.[^'\\]*)*' - a single quoted string literal
) - end of the non-capturing group
| - or
[{}] - a { or }.
In the repl function, Group 1 is checked first. If it matched, the match is a single- or double-quoted string literal and is returned unchanged, so it stays where it was. Otherwise, if the match value is {, it is replaced with {\n; else it is replaced with \n}.
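As a quick check, the reduced pattern can be dropped into the same replacement callback. This is a minimal sketch, assuming the reduced pattern is copied verbatim into a raw triple-quoted string; `add_newlines` is only an illustrative wrapper name.

```python
import re

# Reduced pattern: Group 1 is a quoted string literal (not preceded by an
# unpaired backslash); the second alternative is a bare { or }.
RX = re.compile(
    r'''(?<!\\)((?:\\{2})*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'))|[{}]'''
)

def repl(m):
    if m.group(1):
        # A quoted literal was matched: put it back unchanged.
        return m.group(1)
    return '{\n' if m.group() == '{' else '\n}'

def add_newlines(text: str) -> str:
    return RX.sub(repl, text)

print(add_newlines('hello(){imagine{puts("{HELLO}")}}'))
```

Braces inside either kind of quoted literal are skipped because the literal is consumed as a whole before the `[{}]` alternative gets a chance to match inside it.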
Replace { with {\n:
text.replace('{', '{\n')
Replace } with \n}:
text.replace('}', '\n}')
Now to fix the braces that were quoted:
text.replace('"{\n','"{')
and
text.replace('\n}"', '}"')
Combined together:
replaced = text.replace('{', '{\n').replace('}', '\n}').replace('"{\n','"{').replace('\n}"', '}"')
Output
hello(){
imagine{
puts("{HELLO}")
}
}
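One caveat is worth spelling out: the chained replace calls only undo braces that sit directly next to a double quote. The sketch below wraps the chain in a hypothetical helper, `naive_reformat`, and runs it on a second, made-up input where the braces are in the middle of the quoted text.

```python
def naive_reformat(text: str) -> str:
    # Same four replace steps as above, in the same order.
    return (text.replace('{', '{\n')
                .replace('}', '\n}')
                .replace('"{\n', '"{')
                .replace('\n}"', '}"'))

# Works for the sample input, where the quoted braces touch the quotes.
print(naive_reformat('hello(){imagine{puts("{HELLO}")}}'))

# Braces in the middle of a quoted string are still split across lines.
print(repr(naive_reformat('f(){g("a {b} c")}')))
```

For inputs like the second one, an approach that recognises whole string literals is the safer choice.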
You can check the similarities with the input and try to match them.
text = 'hello(){imagine{puts("{HELLO}")}}'
replaced = text.replace('){', '){\n').replace('{puts', '{\nputs').replace('}}', '\n}\n}')
print(replaced)
output:
hello(){
imagine{
puts("{HELLO}")
}
}
UPDATE
try this: https://regex101.com/r/DBgkrb/1

Validators.pattern doesn't work with Regex

I have a reactive form and one of the controls should be validated with the pattern .*[^\s].*. It works well in template-driven forms, but not in the reactive form. What should I change to fix it?
this._form = this._formBuilder.group({
  description: ['', [Validators.required, Validators.pattern('.*[^\S].*')]]
});
Have a look at the Validator.js:
/**
 * Validator that requires a control to match a regex to its value.
 */
static pattern(pattern: string|RegExp): ValidatorFn {
  if (!pattern) return Validators.nullValidator;
  let regex: RegExp;
  let regexStr: string;
  if (typeof pattern === 'string') {
    regexStr = `^${pattern}$`;
    regex = new RegExp(regexStr);
  } else {
    regexStr = pattern.toString();
    regex = pattern;
  }
  return (control: AbstractControl): {[key: string]: any} => {
    if (isEmptyInputValue(control.value)) {
      return null;  // don't validate empty values to allow optional controls
    }
    const value: string = control.value;
    return regex.test(value) ? null :
                               {'pattern': {'requiredPattern': regexStr, 'actualValue': value}};
  };
}
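The point is that the regex pattern is passed either as a regex literal or as a string literal. A regex literal is used as is (pattern.toString() is only kept for the error object), so you may use
/\S*\s.*/
When you pass a string, ^ and $ are added around it, but every backslash must be doubled, because in a string literal '\S' is just 'S'. That is why '.*[^\S].*' reaches the RegExp constructor as .*[^S].*. The equivalent string pattern you may pass to Validators.pattern is "\\S*\\s.*", which becomes ^\S*\s.*$.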
Note that ^\S*\s.*$ will match a string that starts with 0+ non-whitespace chars, then has 1 obligatory whitespace, and then has any 0+ chars other than line break chars. It is a bit faster than /^.*\s.*$/. Note that \s = [^\S].
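The two claims in that note can be checked with a small Python sketch. It assumes Python's `re` as a stand-in for JavaScript's RegExp and only single-line ASCII samples; `re.fullmatch` plays the role of the added ^ and $. The name `has_whitespace` is only illustrative.

```python
import re

FAST = re.compile(r"\S*\s.*")  # pattern suggested above
SLOW = re.compile(r".*\s.*")   # more obvious variant

def has_whitespace(value: str) -> bool:
    # True when the whole (single-line) value contains at least one whitespace char.
    return FAST.fullmatch(value) is not None

samples = ["ab cd", "abcd", " ", "", "a\tb", "trailing "]
for s in samples:
    # Both patterns accept and reject the same single-line samples.
    assert has_whitespace(s) == (SLOW.fullmatch(s) is not None)
    print(repr(s), has_whitespace(s))

# \s and [^\S] accept exactly the same single characters here.
for ch in [" ", "\t", "a", "1"]:
    assert bool(re.fullmatch(r"\s", ch)) == bool(re.fullmatch(r"[^\S]", ch))
```

The sketch says nothing about speed; it only shows that the two patterns agree on these inputs.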

How to exclude a string in groovy using regex?

I've got a string of characters, such as XXXabcdacefgabcdcbefgabcdmn. I need to exclude the string "efg" from the original string, and each item must start with "abcd" (so I cannot simply use split).
Here is my sample code:
def str = "XXXabcdacefgabcdcbefgabcdmn"
def matcher = (str =~ /^\/(?!efg)([a-z0-9]+)$/)
// I tried a solution I found via Google, but it doesn't work.
matcher.each {
    println it
}
The expected result should be:
abcdac
abcdcb
abcdmn
Any comments are much appreciated.
def s = "XXXabcdacefgabcdcbefgabcdmn"
def m = s =~ /abcd(?:(?!efg).)*/
(0..<m.count).each { print m[it] + '\n' }
Explanation:
abcd # 'abcd'
(?: # group, but do not capture (0 or more times):
(?! # look ahead to see if there is not:
efg # 'efg'
) # end of look-ahead
. # any character except \n
)* # end of grouping
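The same tempered pattern behaves identically in Python, which makes it easy to try out. A minimal sketch, with `items_tempered` as an illustrative helper name:

```python
import re

# "abcd", then any chars as long as "efg" does not start at the current position.
TEMPERED = re.compile(r"abcd(?:(?!efg).)*")

def items_tempered(s: str):
    # No capturing groups in the pattern, so findall returns the full matches.
    return TEMPERED.findall(s)

print(items_tempered("XXXabcdacefgabcdcbefgabcdmn"))
# ['abcdac', 'abcdcb', 'abcdmn']
```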
You could also split here:
def s = "XXXabcdacefgabcdcbefgabcdmn"
def m = s.split(/efg/)*.dropWhile { it != 'a' }
println m.join('\n')
I need to exclude the string "efg" from the original string and it should start with "abcd"
This might help you. Get matched group from desired index.
(abcd(.*?)(?=efg)|(?<=efg).*$)
OR try this one as well
(abcd(.*?)(?=efg|$))
Sample code:
def str = "XXXabcdacefgabcdcbefgabcdmn"
def matcher = str =~ /(abcd(.*?)(?=efg)|(?<=efg).*$)/
matcher.each { println it[0] }
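The second pattern from this answer, a lazy match that stops before "efg" or at the end of the string, can be sketched in Python like this. The capture groups are dropped here because only the whole match is needed; `items_lazy` is an illustrative helper name.

```python
import re

# Lazy .*? expands one char at a time until "efg" or the end of the string follows.
LAZY = re.compile(r"abcd.*?(?=efg|$)")

def items_lazy(s: str):
    return [m.group(0) for m in LAZY.finditer(s)]

print(items_lazy("XXXabcdacefgabcdcbefgabcdmn"))
# ['abcdac', 'abcdcb', 'abcdmn']
```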
Here is something without using regex but pure tools provided by Groovy. :)
def str = "XXXabcdacefgabcdcbefgabcdmn"
assert ['abcdac', 'abcdcb', 'abcdmn'] ==
str.split(/efg/).findAll { it.contains(/abcd/) }*.dropWhile { it != 'a' }
Ok, you can use this pattern with a matcher:
(?:(?=abcd)|\G(?!\A)efg)((?:(?!efg).)*)
The substrings you need are in the first capturing group.
Another way:
(?:(?=abcd)|\G(?!\A)efg)((?>[^e]+|e(?!fg))*)