End RegEx after certain digit

I am working on a filter based on browser version and am having a little trouble. It has to be in RegEx, which loves to match everything possible.
I want to select:
12.0
8.0
18.0.1025.168
The problem I am having (looking at 12.0 specifically, although it is a problem for all three) is that it also selects things other than 12.0. I have been trying to use negated sets and non-capturing groups, but it just isn't quite working.
Currently I have: ((?:18.0.1025.168[^.]|(12.0)[^.]|(?:8.0)[^.]))
I have used \d in the negated sets, but it seems as though I have to choose either \d or . because it does not allow special characters within the set.
Things that I need to make sure are not selected include any variation of the following (the 9s could be any digit):
9.12.0
912.0
92.09
12.0.9
Any input on what I should look into, or another symbol I could use, would be greatly appreciated. Also, if needed I can break this into 3 different formulas that will all fire, but I would like to avoid that if possible.

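What about (\A|^)(12\.0|8\.0|18\.0\.1025\.168)($|\z)? The dots are escaped so that each one matches only a literal period rather than any character.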
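A minimal Java sketch of the anchored alternation, with Java standing in for whatever engine the browser filter actually uses. The EMBEDDED pattern is an added assumption for the case where the version sits inside a longer string such as a user-agent: negative lookarounds reject a match that has a digit or period directly on either side. It also shows that [\d.] mixes \d and a literal . in one set, which Java and most other flavors accept.

```java
import java.util.regex.Pattern;

final class VersionFilter {
    // Dots are escaped so "." matches only a literal period, not any character.
    private static final String VERSIONS = "(?:12\\.0|8\\.0|18\\.0\\.1025\\.168)";

    // Whole-string match: the input must be exactly one of the versions.
    static final Pattern EXACT = Pattern.compile("^" + VERSIONS + "$");

    // Assumed embedded case (e.g. a user-agent string): no digit or "."
    // may appear immediately before or after the matched version.
    static final Pattern EMBEDDED =
            Pattern.compile("(?<![\\d.])" + VERSIONS + "(?![\\d.])");

    static boolean isExact(String s) {
        return EXACT.matcher(s).find();
    }

    static boolean containsVersion(String s) {
        return EMBEDDED.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"12.0", "8.0", "18.0.1025.168", "9.12.0", "912.0", "92.09", "12.0.9"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isExact(s) + " / " + containsVersion(s));
        }
    }
}
```

Whole-string anchoring is the simpler choice when the field holds only the version; the lookaround form avoids that requirement.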


Are named capture groups supported? If so, how do I use them?

I understand that VSCode uses the JavaScript regex engine.
The latest JavaScript specification allows named capture groups.
However, I can't tell whether this is enabled in VSCode v1.43.
I am using the following notations in the general find command:
(?<name-of-capture>pattern to find)( other stuff )(\k<name-of-capture>)
(?<name-of-capture>pattern to find)( other stuff )(\g<name-of-capture>)
I have also tried the combinations \k'name' and \g'name', and these have no effect.
If anyone has insights into this, I would appreciate hearing them.
If you want to use an inline backreference, it works in VSCode:
(?<group>[a-z]+) \d+ \k<group>
matches abc 1 abc.
However, the newer JavaScript-style $<group> replacement does not work, and neither does the .NET-style replacement backreference ${group}, probably due to the issue referred to by @JW.
NOTE: The maintainers say the issue needs 20 votes, and there are 3 days left before they close it and turn down the suggestion to support backreferences in replacements. If you want this feature implemented, please consider voting for that issue.
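Named groups and \k<name> can also be checked outside VSCode. The sketch below uses Java's engine, which supports the same (?<name>...) and \k<name> syntax; ${name} is Java's replacement syntax and says nothing about VSCode's replace box. It also illustrates something worth checking in the question: both Java and JavaScript require group names to be identifier-like, so a hyphenated name such as name-of-capture is rejected as a syntax error, and \g<name> is not JavaScript syntax at all. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

final class NamedGroups {
    // "group" captures a word; \k<group> requires the same word again later.
    // Group names must start with a letter and contain only letters and digits,
    // so a name like "name-of-capture" throws PatternSyntaxException in Java.
    static final Pattern REPEATED = Pattern.compile("(?<group>[a-z]+) \\d+ \\k<group>");

    static boolean matchesRepeat(String s) {
        return REPEATED.matcher(s).matches();
    }

    // In Java replacement strings, ${name} refers to a named group.
    static String swap(String s) {
        return Pattern.compile("(?<first>[a-z]+) (?<second>[a-z]+)")
                .matcher(s)
                .replaceAll("${second} ${first}");
    }

    public static void main(String[] args) {
        System.out.println(matchesRepeat("abc 1 abc"));
        System.out.println(matchesRepeat("abc 1 abd"));
        System.out.println(swap("hello world"));
    }
}
```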

Improve regex that works

I'm not a regex expert, so please be nice :-)
I created this regex to verify whether a user submitted a day of the week (in Italian):
/((lun|mart|giov)e|mercol(e?)|vener)d(ì|i('?)|í)|sabato|domenica/
This regex works perfectly and matches the following:
lunedi
lunedì
lunedí
lunedi'
martedi
martedì
martedí
martedi'
mercoledi
mercoledì
mercoledí
mercoledi'
mercoldi
mercoldì
mercoldí
mercoldi'
giovedi
giovedì
giovedí
giovedi'
venerdi
venerdì
venerdí
venerdi'
sabato
domenica
Now consider the first part of the regex and focus on venerdì: as you can see, I added a separate alternative (|) just to handle venerdì, because of that “r”.
Everything works fine, but I’d like to ask whether there is any way to start the regex like this:
(lun|mar|giov|ven)e
and then handle that “r” somehow.
I read about backreferences and conditionals, but I’m not sure they can be of any help.
My idea is something like: “if the first group captured ‘ven’, then add an ‘r’ after the ‘e’ that follows the group.”
Is this possible?
Don't "golf" your regex. If you want to improve it at all, make it more readable. While it is certainly worthwhile to use different cases for the different "i" variants, everything else should IMHO be kept as simple as possible.
How about something like this?
(lune|marte|mercole?|giove|vener)d(ì|i'?|í)|sabato|domenica
Don't use backreferences and other advanced features if you don't need them, just to make your regex a few chars shorter. Even if you would still understand what it means, think about your fellow co-developers -- or just yourself two months from now.
I just removed a few redundant (...) and the "shared e" part. Note how (besides the (...)) it is the same length, whether you use (lun|mart|giov)e or lune|marte|giove, but the latter is arguably more readable. Similarly, a backreference or some conditional would likely make your regex longer instead of shorter -- and considerably more complicated.
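One way to back up the claim that the simplified pattern is equivalent is to run both against every listed form. A Java sketch, assuming the input is a single word (matches() is used, so the whole string must match one alternative); ì and í are written as \u00EC and \u00ED escapes to keep the source ASCII-only:

```java
import java.util.regex.Pattern;

final class ItalianWeekday {
    // The question's pattern and the answer's simplified pattern.
    static final Pattern ORIGINAL = Pattern.compile(
            "((lun|mart|giov)e|mercol(e?)|vener)d(\u00EC|i('?)|\u00ED)|sabato|domenica");
    static final Pattern SIMPLIFIED = Pattern.compile(
            "(lune|marte|mercole?|giove|vener)d(\u00EC|i'?|\u00ED)|sabato|domenica");

    // matches() requires the whole input to match one of the top-level alternatives.
    static boolean isWeekday(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"lunedi", "mercoldi'", "venerd\u00EC", "sabato", "venerdi''"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWeekday(ORIGINAL, s) + " / " + isWeekday(SIMPLIFIED, s));
        }
    }
}
```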

RegEx to match sets of literal strings along with value ranges

Utter RegEx noob here with a project involving RegEx I need to modify. Has been a blast learning all of this.
I need to search for/verify a set of values that start with one of two string combinations (NC or KH) followed by a numeric range that is unique to each prefix: NC01-NC13 or KH01-KH11.
I have been able to pull off the first common "chunk" of this with:
^(NC|KH)0[1-9]$
to verify NC01-NC09 or KH01-KH09. The next part is completely throwing me: I need the leading digit of the two-digit number to be 1 instead of 0, with the second digit restricted to 0–3 for NC and 0–1 for KH.
I have found plenty of references for selecting between two strings (which is where I got (NC|KH) from), but nothing as detailed as restricting the following values based on the matched text.
Any and all help would be greatly appreciated, as would any good references/books/tutorials on RegEx (I am currently using Regular-Expressions.info).
The best way to do this is to separate the two cases altogether:
((NC(0[1-9]|1[0-3]))|(KH(0[1-9]|1[01])))
Wrap it in ^...$ as in the question to validate a whole string; 0[1-9] rather than 0\d keeps NC00 and KH00 out. You might want to turn some of those internal capturing groups into non-capturing groups, but that makes the regex a little harder to read.
Edit: You might also be able to do this with positive lookbehind.
Edit: Here's a regex using lookbehind. It's a lot messier, and not really necessary here, but hopefully demonstrates the utility:
(KH|NC)(0[1-9]|(?<=KH)(1[01])|(?<=NC)(1[0-3]))
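A Java sketch (Java supports the fixed-width lookbehind used here) checking that the separate-cases pattern and the lookbehind pattern accept the same codes. Both are wrapped in ^...$ as in the question and use 0[1-9] instead of 0\d; those are assumptions of this sketch, and the class name is illustrative:

```java
import java.util.regex.Pattern;

final class UnitCode {
    // One alternative per prefix, each with its own allowed range.
    static final Pattern SEPARATE =
            Pattern.compile("^(NC(0[1-9]|1[0-3])|KH(0[1-9]|1[01]))$");

    // Shared prefix group; lookbehind picks the allowed "1x" range per prefix.
    static final Pattern LOOKBEHIND =
            Pattern.compile("^(KH|NC)(0[1-9]|(?<=KH)1[01]|(?<=NC)1[0-3])$");

    static boolean valid(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String code : new String[] {"NC01", "NC13", "NC14", "KH11", "KH12", "KH00"}) {
            System.out.println(code + " -> " + valid(SEPARATE, code) + " / " + valid(LOOKBEHIND, code));
        }
    }
}
```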
Sticking with your original idea of alternatives for NC or KH, do the same for the numbers. Try this:
^(NC|KH)(0[1-9]|1[0-3])$
Hope that makes sense
EDIT:
Based on @Patrick's comment below, and sticking with this original answer, you could use this (although I bet there's a better way):
^((NC|KH)(0[1-9]|1[0-1])|(NC1[2-3]))$
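In a pattern shaped like ^A|B$, the anchors belong to the individual alternatives: ^ only constrains A and $ only constrains B, so with a search (find()) a string like NC01-extra still passes. A small Java sketch comparing that ungrouped form with a grouped one, in which both anchors apply to every alternative (class name illustrative):

```java
import java.util.regex.Pattern;

final class AnchorScope {
    // Ungrouped: ^ binds to the first alternative only, $ to the second only.
    static final Pattern UNGROUPED = Pattern.compile("^(NC|KH)(0[1-9]|1[0-1])|(NC1[2-3])$");

    // Grouped: the whole alternation sits between ^ and $.
    static final Pattern GROUPED = Pattern.compile("^((NC|KH)(0[1-9]|1[01])|NC1[23])$");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"NC12", "NC01-extra", "xxNC12"}) {
            System.out.println(s + " -> " + found(UNGROUPED, s) + " / " + found(GROUPED, s));
        }
    }
}
```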

Selenium server fails a regexp pattern that passes in Selenium IDE

I have a (I guess) simple question.
I am running Selenium test cases (HTML, Selenese) after a successful build in Hudson. The test cases pass when I run them in the IDE after the build, but not on the server. The cases that fail contain a regexp like this:
regexp:The profile details have been updated, it can take up to [0-9]*\s\w*\(.\) until the changes are fully visible.
The goal is to match times like 1 hour(s), 30 second(s), 4 minute(s).
Has anyone encountered a problem like this, and how did you solve it?
I would use a slightly simpler expression. Assuming you do not need to handle things like 1 second or 2 seconds, but only things like 1 second(s) or 2 second(s), try replacing the regex part with:
[0-9]+ [a-z]+\(s\)
Some older regex flavors (POSIX ERE) don't allow the shorthand character classes, so if those rules are somehow being invoked, then it would fail on \s and \w (and maybe even the .). The above expression should work almost anywhere; though it is not as flexible, it should be flexible enough for your situation.
Some flavors that are even older (POSIX BRE) don't support the shorthand repetition and also interpret curly braces and parentheses differently. If that is the flavor being expected, it might fail on any normal expression, and you'd need to use something like:
[0-9]\{1,\} [a-z]\{1,\}(s)
If neither of these work, then there is a significant difference in the engines implementing your regex OR there is some difference in how the page is rendering and being parsed/searched/analyzed by the Selenium Server.
If that is the case, there could be <br> tags or formatting tags (like <i> or <span>) that are being parsed as text rather than being stripped from the string. Then something like changes are fully visible might be parsed as changes are <em>fully</em> visible, and the test would fail.
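Java's engine is only a stand-in here, since the flavor the Selenium server actually uses is not known from the question. Still, the sketch shows that both the original and the simplified pattern accept the message, and that inline markup such as <em>fully</em> makes both fail, which is the kind of failure described above. HEAD and TAIL are just the fixed text split around the duration; the class name is illustrative:

```java
import java.util.regex.Pattern;

final class ProfileMessage {
    // Fixed text around the duration (no regex metacharacters in either part).
    private static final String HEAD = "The profile details have been updated, it can take up to ";
    private static final String TAIL = " until the changes are fully visible";

    // The question's pattern: final "." is unescaped, so it matches any character.
    static final Pattern ORIGINAL = Pattern.compile(HEAD + "[0-9]*\\s\\w*\\(.\\)" + TAIL + ".");

    // The simpler suggestion: digits, one space, lowercase letters, literal "(s)".
    static final Pattern SIMPLE = Pattern.compile(HEAD + "[0-9]+ [a-z]+\\(s\\)" + TAIL + "\\.");

    static boolean matches(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String plain = HEAD + "30 second(s)" + TAIL + ".";
        String tagged = HEAD + "1 hour(s) until the changes are <em>fully</em> visible.";
        System.out.println(matches(ORIGINAL, plain) + " " + matches(SIMPLE, plain));
        System.out.println(matches(ORIGINAL, tagged) + " " + matches(SIMPLE, tagged));
    }
}
```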
When you test locally versus in the Hudson run, are there differences in OS platform?
I know that I had some problems with Windows XP versus Unix (where Hudson was deployed). I don't think that is the case here, though; just a thought.

Regex PatternRepository pattern on BlackBerry 5 - how to ignore case

I hope this title makes sense - I need case-insensitive regex matching on BlackBerry 5.
I have a regular expression defined as:
public static final String SMS_REG_EXP = "(?i)[(htp:/w\\.)]*cobiinteractive\\.com/[\\w|\\%]+";
It is intended to match "cobiinteractive.com/" followed by some text. The preceding [(htp:/w\.)] is just there because on my device I needed to override the internal link recognition that the phone applies (shameless hack).
The app loads at start-up. The idea is that I want to pick up links to my site from SMS and email, and process them with my app.
I add it to the PatternRepository using:
PatternRepository.addPattern(
ApplicationDescriptor.currentApplicationDescriptor(),
GlobalConstants.SMS_REG_EXP,
PatternRepository.PATTERN_TYPE_REGULAR_EXPRESSION,
applicationMenu);
On the OS 4.5 / 4.7 simulators and on a Curve 8900 device (running 4.5), this works.
On the OS 5 simulators and the Bold 9700 I tested, the app fails to compile the pattern with an IllegalArgumentException("unrecognized character after (?").
I have also tried (naively) to set the pattern to "/rockstar/i" but that only matches the exact string - this is possibly the correct direction to take, but if so, I don't know how to implement it on the BB.
How would I modify my regex in order to pick up case insensitive patterns using the PatternRepository as above?
PS: would the "correct" way be to use the [Cc][Oo][Bb][Ii]2... etc pattern? This is OK for a short string, but I am hoping for a more general solution, if possible.
Well, it's not a real solution for the general problem, but this workaround is easy, safe and performant:
As you're dealing with URLs here, and the host part of a URL is not case-sensitive...
(it doesn't matter whether we write google.com or GooGLE.COm or whatever; the path after the / can be case-sensitive, though)
The simplest solution (we all love the KISS principle) is to first lowercase (or uppercase, if you like) the input and then do the regex match; it no longer matters whether the match is case-sensitive, because we know exactly what we are dealing with.
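A minimal Java SE sketch of the lowercase-first approach. It only applies where your own code receives the text; PatternRepository scans messages itself, so it may not help there. The class and method names are hypothetical, and [\w%] is a simplification of the original [\w|\%], which also accepted a literal |:

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class LowercaseFirst {
    // Written entirely in lowercase, so no (?i) flag is needed.
    // Assumption: [\w%] replaces the original [\w|\%] (which also allowed "|").
    static final Pattern SITE = Pattern.compile("cobiinteractive\\.com/[\\w%]+");

    static boolean mentionsSite(String text) {
        // Normalize case first; Locale.ROOT avoids locale-specific rules
        // such as the Turkish dotless i.
        return SITE.matcher(text.toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsSite("Visit COBIInteractive.COM/Promo123 today"));
        System.out.println(mentionsSite("Visit example.com/promo"));
    }
}
```

Note that the lowercased copy is used only for matching; if the matched path is passed on, take it from the original text, since paths can be case-sensitive.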
Since nobody else has answered this question relating to the PatternRepository class, I will self-answer so I can close it.
One way to do this would be to use a pattern like: [Cc][Oo][Bb][Ii]2[Nn][Tt][Ee][Rr][Aa][Cc][Tt][Ii][Vv][Ee]... etc where for each letter in the string, you put 2 options. Fortunately my string is short.
This is not an elegant solution, but it works. Unfortunately I don't know of a way to modify the string passed to PatternRepository and I think the crash when using the (?i) modifier is a bug in BB.
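Typing the [Cc][Oo]... classes by hand is error-prone, so a small helper can generate them from the literal text. caseInsensitiveLiteral is a hypothetical Java SE helper, not part of any BlackBerry API, and it assumes ASCII input. The generated string uses only bracket classes and backslash escapes, which the original pattern already relies on, but whether PatternRepository accepts every generated pattern is untested:

```java
import java.util.regex.Pattern;

final class CaseClasses {
    // Hypothetical helper: "cobi.com" becomes "[Cc][Oo][Bb][Ii]\.[Cc][Oo][Mm]".
    // Letters become [Xx] classes, digits pass through, and every other
    // character is backslash-escaped so it matches literally.
    static String caseInsensitiveLiteral(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            char upper = Character.toUpperCase(c);
            char lower = Character.toLowerCase(c);
            if (upper != lower) {
                sb.append('[').append(upper).append(lower).append(']');
            } else if (Character.isDigit(c)) {
                sb.append(c);
            } else {
                sb.append('\\').append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = caseInsensitiveLiteral("cobiinteractive.com/") + "[\\w%]+";
        System.out.println(regex);
        System.out.println(Pattern.compile(regex).matcher("CobiInteractive.COM/Promo").matches());
    }
}
```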
Use the port of the Jakarta regex library:
https://code.google.com/p/regexp-me/
If you use Unicode support, it's going to eat memory, but if you just want case-insensitive matching, you simply need to pass the RE.MATCH_CASEINDEPENDENT flag when you compile your regex:
new RE("yourCaseInsensitivePattern", RE.MATCH_CASEINDEPENDENT | OTHER_FLAGS)