Groovy regular expression - is it different to "normal" regular expressions?

I am having issues generating a regular expression that works in Groovy.
My aim is: given a string like:
/products/prodA/index-tab2.html
to get a match only when the part of the string after the last / ends with "-tab<n>.html", where <n> is a single digit.
My initial attempt is with
([^\/]+)(?<=-tab[0-9]\.html$)
which I tested here http://gskinner.com/RegExr/ against the following test data:
/products/prodA/index-tab2.html
/products/prodA/index.html
/products-tab2/prodA/index-tab2.html
and got matches on "index-tab2.html" - so far so good (or so I thought).
Next step is to put this into Groovy:
log.info("KPF: pageName is ${pageName} ")
def matcher = pageName =~ /([^\/]+)(?<=tab[0-9]\.html$)/
if (matcher.matches()) {
log.debugEnabled && log.debug("KPF: Filename has tab = $filename")
} else {
log.debugEnabled && log.debug("KPF: Filename does not have tab")
}
however when I test the code with the input
/products/prodA/index-tab2.html
(there is no trailing space - verified - but left out of this example)
I get the following logged:
2013-07-02~12:51:10 INFO (xxx.site.controllers.PageController # line 35) KPF: pageName is /products/prodA/index-tab2.html (xxx)
2013-07-02~12:51:10 DEBUG (xxx.site.controllers.PageController # line 44) KPF: Filename does not have tab (xxx)
So which regex is "wrong" and how do I get the match I need?

matcher.matches() requires that the whole string match the regular expression, so it will only return true if pageName contains no slashes at all. You probably want to use find() instead of matches(), which returns true if a match is found anywhere in the string.
log.info("KPF: pageName is ${pageName} ")
def matcher = pageName =~ /([^\/]+)(?<=tab[0-9]\.html$)/
if (matcher.find()) {
log.debugEnabled && log.debug("KPF: Filename has tab = ${matcher.group(1)}")
} else {
log.debugEnabled && log.debug("KPF: Filename does not have tab")
}
Or indeed just if(matcher) as a Matcher coerces to boolean in Groovy by calling find(). This is done to support syntax like
if(pageName =~ /..../)
but in your case you need a reference to the actual Matcher in order to extract the parenthesised group.
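The difference between the two calls can also be checked in plain Java, since Groovy's =~ operator produces a java.util.regex.Matcher. The following is only a minimal sketch: the class name TabPageCheck and the helper names are made up for illustration, and it uses the first pattern from the question (the one with "-tab").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
class TabPageCheck {
    // A run of non-slash characters that ends in "-tab<digit>.html"
    // at the very end of the input.
    static final Pattern TAB_PAGE =
        Pattern.compile("([^/]+)(?<=-tab[0-9]\\.html$)");

    // matches() succeeds only if the pattern consumes the whole input.
    static boolean wholeStringMatches(String pageName) {
        return TAB_PAGE.matcher(pageName).matches();
    }

    // find() searches anywhere in the input; returns the file name or null.
    static String findTabFile(String pageName) {
        Matcher m = TAB_PAGE.matcher(pageName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "/products/prodA/index-tab2.html";
        System.out.println(wholeStringMatches(page)); // false
        System.out.println(findTabFile(page));        // index-tab2.html
        System.out.println(findTabFile("/products/prodA/index.html")); // null
    }
}
```

With matches() the leading slash already makes the whole-string match impossible, while find() skips ahead to the last path segment.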

Related

Matcher of Regex expression is false while the expression, pattern and string are all valid

I am using a regular expression like so:
@Test
fun timePatternFromInstantIsValid() {
val instantOfSometimeEarlier = Instant.now().minus(Duration.ofMinutes((1..3).random().toLong()))
val timeOfEvent = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss").withZone(ZoneId.of("UTC")).format(instantOfSometimeEarlier)
val regex = "(\\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01]))T(?:(?:([01]?\\d|2[0-3]):)?([0-5]?\\d):)?([0-5]?\\d)"
val acceptedDatePattern: Pattern = Pattern.compile(regex)
val matcher: Matcher = acceptedDatePattern.matcher(timeOfEvent)
val isMatchToAcceptedDatePattern: Boolean = matcher.matches()
print(isMatchToAcceptedDatePattern)
}
isMatchToAcceptedDatePattern returns false for some reason, which suggests something is wrong with my regex. But when I check it on multiple regex websites I get a valid match. Any ideas why?
try it yourself:
https://www.regextester.com/ or here:
https://regex101.com/
my regex - raw (as in the websites):
(\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))T(?:(?:([01]?\d|2[0-3]):)?([0-5]?\d):)?([0-5]?\d)
pattern example returned like this (it gets returned without the " ' " near the "T"):
2021-04-01T11:12:51 (when I debug this is what I get)
date pattern:
yyyy-MM-ddTHH:mm:ss
Could someone enlighten me, please?
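You use matcher.matches(), which behaves as if ^ and $ were added to the start and end of your regex, so the entire input has to match. Your regex starts with \d{2}, which covers only two of the four year digits, so that cannot succeed.
Instead you should either:
use matcher.find(), which returns true if a match can be found anywhere in the input, or
prepend \d{2} to your regex and keep using matcher.matches().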
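Both suggestions can be tried side by side. This is a rough sketch in plain Java rather than the Kotlin test from the question; the class name TimestampCheck and its helpers are illustrative only, and the sample timestamp is the one quoted in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for comparing matches() and find().
class TimestampCheck {
    // The regex exactly as posted in the question (two-digit year part).
    static final String ORIGINAL =
        "(\\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01]))T"
        + "(?:(?:([01]?\\d|2[0-3]):)?([0-5]?\\d):)?([0-5]?\\d)";

    // The same regex with \d{2} prepended so it covers a four-digit year.
    static final String FULL_YEAR = "\\d{2}" + ORIGINAL;

    // True only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches some part of the input.
    static boolean foundAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String time = "2021-04-01T11:12:51";
        System.out.println(matchesWhole(ORIGINAL, time));  // false
        System.out.println(foundAnywhere(ORIGINAL, time)); // true
        System.out.println(matchesWhole(FULL_YEAR, time)); // true
    }
}
```

Online testers typically behave like find(), which is why the original regex appears to work there.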

Dart: RegExp by example

I'm trying to get my Dart web app to: (1) determine if a particular string matches a given regex, and (2) if it does, extract a group/segment out of the string.
Specifically, I want to make sure that a given string is of the following form:
http://myapp.example.com/#<string-of-1-or-more-chars>[?param1=1&param2=2]
Where <string-of-1-or-more-chars> is just that: any string of 1+ chars, and where the query string ([?param1=1&param2=2]) is optional.
So:
Decide if the string matches the regex; and if so
Extract the <string-of-1-or-more-chars> group/segment out of the string
Here's my best attempt:
String testURL = "http://myapp.example.com/#fizz?a=1";
String regex = "^http://myapp.example.com/#.+(\?)+\$";
RegExp regexp= new RegExp(regex);
Iterable<Match> matches = regexp.allMatches(regex);
String viewName = null;
if(matches.length == 0) {
// testURL didn't match regex; throw error.
} else {
// It matched, now extract "fizz" from testURL...
viewName = ??? // (ex: matches.group(2)), etc.
}
In the above code, I know I'm using the RegExp API incorrectly (I'm not even using testURL anywhere), and on top of that, I have no clue how to use the RegExp API to extract (in this case) the "fizz" segment/group out of the URL.
The RegExp class comes with a convenience method for a single match:
RegExp regExp = new RegExp(r"^http://myapp.example.com/#([^?]+)");
var match = regExp.firstMatch("http://myapp.example.com/#fizz?a=1");
print(match[1]);
Note: I used anubhava's regular expression (yours was not escaping the ? correctly).
Note2: even though it's not necessary here, it is usually a good idea to use raw-strings for regular expressions since you don't need to escape $ and \ in them. Sometimes using a triple-quote raw-string is convenient too: new RegExp(r"""some'weird"regexp\$""").
Try this regex:
String regex = "^http://myapp.example.com/#([^?]+)";
Then call allMatches on testURL (not on regex) and read group 1 of the first match:
var match = matches.elementAt(0);
print("${match.group(1)}"); // output : fizz

Simple Regular Expression matching

I'm new to regular expressions and I'm trying to use RegExp on the GWT client side. I want to do simple * matching (say, if the user enters 006*, I want to match 006...). I'm having trouble writing this. What I have is:
input = (006*)
input = input.replaceAll("\\*", "(" + "\\" + "\\" + "S\\*" + ")");
RegExp regExp = RegExp.compile(input);
It returns true for strings like BKLFD006* too. What am I doing wrong?
Put a ^ at the start of the regex you're generating.
The ^ character means to match at the start of the source string only.
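A minimal sketch of the anchored approach, written against java.util.regex rather than GWT's RegExp class. The class WildcardMatch and its method names are hypothetical, and quoting the literal characters with Pattern.quote is an addition that the question's code does not have.

```java
import java.util.regex.Pattern;

// Hypothetical helper: turns a user pattern such as "006*" into a regex.
class WildcardMatch {
    static Pattern wildcardToRegex(String userInput) {
        // Anchor at the start so "BKLFD006..." no longer matches "006*".
        StringBuilder regex = new StringBuilder("^");
        for (char c : userInput.toCharArray()) {
            if (c == '*') {
                regex.append("\\S*"); // the wildcard: any run of non-space chars
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal char
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if candidate starts with something the wildcard pattern accepts.
    static boolean startsWithWildcard(String userInput, String candidate) {
        return wildcardToRegex(userInput).matcher(candidate).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithWildcard("006*", "0061234"));   // true
        System.out.println(startsWithWildcard("006*", "BKLFD0061")); // false
    }
}
```

If the whole candidate must be covered rather than just its beginning, appending $ to the generated regex would be the natural next step.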
I think you are mixing two things here, namely replacement and matching.
Matching is used when you want to extract the part of the input string that matches a specific pattern. That seems to be what you want here. To get one or more digits at the very start of the string that are followed by a star, you can use the following regex:
^[0-9]+(?=\*)
and here is a Java snippet:
String subjectString = "006*";
String ResultString = null;
Pattern regex = Pattern.compile("^[0-9]+(?=\\*)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
ResultString = regexMatcher.group();
}
On the other hand, replacement is used when you want to replace a re-occurring pattern from the input string with something else.
For example, if you want to replace all digits followed by a star with the same digits surrounded by parentheses then you can do it like this:
String input = "006*";
String result = input.replaceAll("^([0-9]+)\\*", "($1)");
Notice the use of $1 to reference the digits that were captured using the capture group ([0-9]+) in the regex pattern.

Capture a substring which is not present in the source string

I want regular expression search and replace as follows:
Input string ends with images, videos, friends
Output string contains the matched suffix
Else
Output string contains the suffix profile
Example input/output:
/john-smith-images -> /user-images
/john-smith-videos -> /user-videos
/john-smith -> /user-profile
I tried this regex which captures the suffix, if present:
/.+?(images|videos|friends)?$/
I am restricted to a single regular expression and a regex-only solution, because I need to use this in mod_rewrite/IIRF/IIS URL rewrite.
Instead of using .replace(), consider using .match() or .test() inside a conditional, and handling the non-matching case separately.
Give this a try:
text.replace(/[\w-]+(?=(images|videos|friends))/, 'user-').replace(/[\w-]+-(?!(images|videos|friends))\w*/, 'user-profile')
Use String#replace with callback like this:
var regexp = /.+?(images|videos|friends|)$/;
function cb ($0, $1) {
var r = $1 ? $1 : 'profile';
return '/user-' + r;
}
console.log("/john-smith-images".replace(regexp, cb));
console.log("/john-smith-videos".replace(regexp, cb));
console.log("/john-smith".replace(regexp, cb));
Output:
/user-images
/user-videos
/user-profile
Live Demo: http://ideone.com/MVRcku

Groovy Regular matching everything between quotes

I have this regex
regex = ~/\"([^"]*)\"/
so I'm looking for all text between quotes.
Now I have the following string
options = 'a:2:{s:10:"Print Type";s:8:"New Book";s:8:"Template";s:9:"See Notes";}'
However, doing
regex.matcher(options).matches() => false
Shouldn't this be true, and shouldn't I have 4 groups?
The matches() method tries to match the entire string against the regex, which fails here.
See this tutorial for more info.
I don't know Groovy, but it looks like the following should work:
def mymatch = 'a:2:{s:10:"Print Type";s:8:"New Book";s:8:"Template";s:9:"See Notes";}' =~ /"([^"]*)"/
Now mymatch.each { println it[1] } should print all the matches.
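Under the hood this amounts to calling find() repeatedly on a java.util.regex.Matcher. A small Java sketch of that loop, with the class name QuotedStrings and the method extractQuoted invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that collects every double-quoted substring.
class QuotedStrings {
    static List<String> extractQuoted(String input) {
        // Same pattern as in the question: a quote, any non-quote chars, a quote.
        Matcher m = Pattern.compile("\"([^\"]*)\"").matcher(input);
        List<String> result = new ArrayList<>();
        // Each find() call advances to the next match; group(1) is the
        // text between the quotes.
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String options = "a:2:{s:10:\"Print Type\";s:8:\"New Book\";"
            + "s:8:\"Template\";s:9:\"See Notes\";}";
        // prints [Print Type, New Book, Template, See Notes]
        System.out.println(extractQuoted(options));
    }
}
```

Note that this yields four matches, each with one capture group, rather than a single match with four groups.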