how to group in regex matching correctly? - regex

Consider the following scenario.
Input string = "WIPR.NS"
I have to replace this with "WIPR2.NS".
I am using the following logic:
match pattern = "(.*)\.NS$" (any string that ends with .NS)
replace pattern = "$12.NS"
In the above case, since there is no group with index 12, I get the result $12.NS.
But what I want is "WIPR2.NS".
If the text I insert does not start with a digit, this works; it only fails for the digit 2.
How can I resolve this case?
Thanks in advance,
Alok

Usually depends entirely on your regex engine (I'm not familiar with those that use $1 to represent a capture group, I'm more used to \1 but you'd have the same problem with that).
Some will provide a delimiter that you can use, like:
replace pattern = "${1}2.NS"
which clearly indicates that you want capture group 1 followed by the literal 2.NS.
In fact, by looking at this page, it appears that's exactly the way to do it (assuming .NET):
To replace with the first backreference immediately followed by the digit 9, use ${1}9. If you type $19, and there are less than 19 backreferences, the $19 will be interpreted as literal text, and appear in the result string as such.
Also keep in mind that Jay provides an excellent answer for this specific use case that doesn't require capture groups at all (by just replacing .NS with 2.NS).
You may want to look into that as a possibility - I'll leave this answer here since:
it's the accepted answer; and
it's probably better for the more complex cases, like changing X([A-Z])4([A-Z]) with X${1}5${2}, where you have variable text on either side of the bit you wish to modify.
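For comparison, here is a minimal sketch of the same disambiguation in Python. The choice of Python is an assumption, since the question does not name a language. Python's re module writes the delimited form as \g<1> rather than ${1}, so the digit that follows is treated as literal text.

```python
import re

# \g<1> refers to group 1; the "2" after it is then plain replacement text.
result = re.sub(r"(.*)\.NS$", r"\g<1>2.NS", "WIPR.NS")
print(result)
```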

You don't need to do anything with what precedes the .NS, since only what is being matched is subject to replacement.
match pattern = "\.NS$" (any string that ends with .NS -- don't forget to escape the .)
replace pattern = "2.NS"
You can further refine this with lookaround zero-width assertions, but that depends on your regex engine, and you have not specified the environment/programming language in which you are working.
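As a rough illustration of both ideas, the sketch below uses Python's re module (an assumption, since the environment was not specified). The first call replaces only the matched suffix; the second uses a lookahead to match the empty position just before .NS and inserts the digit there.

```python
import re

symbol = "WIPR.NS"

# Replace only the matched suffix; the text before it is never touched.
plain = re.sub(r"\.NS$", "2.NS", symbol)

# Lookahead variant: zero-width match right before ".NS", so "2" is inserted.
lookahead = re.sub(r"(?=\.NS$)", "2", symbol)

print(plain, lookahead)
```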

Related

RegEx Replace - Remove Non-Matched Values

Firstly, apologies; I'm fairly new to the world of RegEx.
Secondly (more of an FYI), I'm using an application that only has RegEx Replace functionality, therefore I'm potentially going to be limited on what can/can't be achieved.
The Challenge
I have a free text field (labelled Description) that primarily contains "useless" text. However, some records will contain either one or multiple IDs that are useful and I would like to extract said IDs.
Every ID will have the same three-letter prefix (APP) followed by a five digit numeric value (e.g. 12911).
For example, I have the following string in my Description Field;
APP00001Was APP00002TEST APP00003Blah blah APP00004 Apple APP11112OrANGE APP
THE JOURNEY
I've managed to very crudely put together an expression that is close to what I need (although, I actually need the reverse);
/!?APP\d{1,5}/g
Result;
THE STRUGGLE
However, on the Replace, I'm only able to retain the non-matched values;
Was TEST Blah blah Apple OrANGE APP
THE ENDGAME
I would like the output to be;
APP00001 APP00002 APP00003 APP00004 APP11112
Apologies once again if this is somewhat of a 'noddy' question; but any help would be much appreciated and all ideas welcome.
Many thanks in advance.
You could use an alternation | to capture either the pattern starting with a word boundary in group 1 or match 1+ word chars followed by optional whitespace chars.
What you capture in group 1 can be used as the replacement. The matches will not be in the replacement.
Using !? matches an optional exclamation mark. You could prepend that to the pattern, but it is not part of the example data.
\b(APP\d{1,5})\w*|\w+\s*
See a regex demo
In the replacement use capture group 1, usually written as $1 or \1 depending on the tool.
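To see the alternation at work, here is a small sketch using Python's re module. Python is an assumption here, because the application in the question only exposes a generic replace feature. In Python 3.5 and later, a group that did not take part in the match is replaced by an empty string, which is what removes the unwanted words.

```python
import re

description = ("APP00001Was APP00002TEST APP00003Blah blah "
               "APP00004 Apple APP11112OrANGE APP")

# Alternative 1 captures an ID in group 1 and swallows the trailing word chars.
# Alternative 2 matches any other word plus following whitespace (group 1 unset).
pattern = r"\b(APP\d{1,5})\w*|\w+\s*"

# strip() drops the single space left over before the final removed word.
ids = re.sub(pattern, r"\1", description).strip()
print(ids)
```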

Regular expression: matching part of words [duplicate]

I'm trying to make a Regex that matches this string {Date HH:MM:ss}, but here's the trick: HH, MM and ss are optional, but it needs to be "HH", not just "H" (the same thing applies to MM and ss). If a single "H" shows up, the string shouldn't be matched.
I know I can use H{2} to match HH, but I can't seem to use that functionality plus the ? to match zero or one time (zero because it's optional, and one time max).
So far I'm doing this (which is obviously not working):
Regex dateRegex = new Regex(@"\{Date H{2}?:M{2}?:s{2}?\}");
Next question. Now that I have the match on the first string, I want to take only the HH:MM:ss part and put it in another string (that will be the format for a TimeStamp object).
I used the same approach, like this:
Regex dateFormatRegex = new Regex(@"(HH)?:?(MM)?:?(ss)?");
But when I try that on "{Date HH:MM}" I don't get any matches. Why?
If I add a space like this Regex dateFormatRegex = new Regex(@" (HH)?:?(MM)?:?(ss)?");, I have the result, but I don't want the space...
I thought that the first parenthesis needed to be escaped, but \( won't work in this case. I guess because it's not a parenthesis that is part of the string to match, but a key-character.
(H{2})? matches zero or two H characters.
However, in your case, writing it twice would be more readable:
Regex dateRegex = new Regex(@"\{Date (HH)?:(MM)?:(ss)?\}");
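The difference between the two spellings can be checked with a short sketch. It uses Python's re module rather than .NET, on the assumption that both engines treat this construct the same way: H{2}? is read as a lazy "exactly two", so it still requires two H characters, while (H{2})? makes the pair optional.

```python
import re

optional_pair = re.compile(r"(H{2})?")  # zero or two H
lazy_pair = re.compile(r"H{2}?")        # exactly two H, matched lazily

# For each input: (does optional_pair match it fully, does lazy_pair match it fully)
results = {
    text: (bool(optional_pair.fullmatch(text)), bool(lazy_pair.fullmatch(text)))
    for text in ["", "H", "HH"]
}
print(results)
```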
Besides that, make sure there are no functions available for whatever you are trying to do. Parsing dates is pretty common and most programming languages have functions in their standard library - I'd almost bet 1k of my reputation that .NET has such functions, too.
In your edit you mention an unwanted leading space in the result. To check a leading or trailing condition together with your regex without including it in the result, you can use the lookaround feature of regex.
new Regex(@"(?<=Date )(HH)?:?(MM)?:?(ss)?")
(?<=...) is a lookbehind pattern.
Regex test site with this example.
For input Date HH:MM:ss, it will match both regexes (with or without lookbehind).
But input FooBar HH:MM:ss will still match a simple regex, but the lookbehind will fail here. Lookaround doesn't change the content of the result, but it prevents false matches (e.g., this second input that is not a Date).
Find more information on regex and lookaround here.
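A minimal sketch of the lookbehind behaviour, written with Python's re module instead of .NET (an assumption made only to keep the example short). The lookbehind requires "Date " immediately before the match but does not include it in the result.

```python
import re

pattern = re.compile(r"(?<=Date )(HH)?:?(MM)?:?(ss)?")

# "Date " must precede the match, but it is not part of the matched text.
date_match = pattern.search("{Date HH:MM}")
print(date_match.group(0), date_match.groups())

# No position in this input is preceded by "Date ", so nothing matches.
other_match = pattern.search("FooBar HH:MM:ss")
print(other_match)
```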

Get all matches for a certain pattern using RegEx

I am not really a RegEx expert and hence asking a simple question.
I have a few parameters that I need to use which are in a particular pattern
For example
$$DATA_START_TIME
$$DATA_END_TIME
$$MIN_POID_ID_DLAY
$$MAX_POID_ID_DLAY
$$MIN_POID_ID_RELTM
$$MAX_POID_ID_RELTM
And these will be replaced at runtime in a string with their values (a SQL statement).
For example I have a simple query
select * from asdf where asdf.starttime = $$DATA_START_TIME and asdf.endtime = $$DATA_END_TIME
Now when I try to use the RegEx pattern
\$\$[^\W+]\w+$
I do not get all the matches (I get only the last match).
I am trying to test my usage here https://regex101.com/r/xR9dG0/2
If someone could correct my mistake, I would really appreciate it.
Thanks!
This will do the job:
\$\$\w+/g
See Demo
Some clarification on why your regex behaves the way it does:
\$\$[^\W+]\w+$
An unescaped $ means end of string, so your pattern only matches something at the very end of the string; that's why it's getting only the last match.
The character class [^\W+] doesn't really make sense. A class starting with [^ negates the characters inside it, \W is the negation of word characters, and + inside a class is a literal + character. So you are saying: match any character that is not a non-word character and not a + sign. I guess that was not what you wanted.
To match the next word, just \w+ will do. The global modifier /g ensures that you do not stop at the first match.
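The effect of the trailing $ can be reproduced with a short sketch in Python (an assumption, since the question only links to regex101). re.findall plays the role of the /g modifier here.

```python
import re

query = ("select * from asdf where asdf.starttime = $$DATA_START_TIME "
         "and asdf.endtime = $$DATA_END_TIME")

# Original pattern: the unescaped $ anchors the match to the end of the string.
anchored = re.findall(r"\$\$[^\W+]\w+$", query)

# Corrected pattern: every $$NAME parameter is found.
all_params = re.findall(r"\$\$\w+", query)

print(anchored)
print(all_params)
```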
Based on what you said you wanted to match, this should work. Note that it won't match $$lower_case_strings; if you want those too, add the "i" flag.
\${2}[A-Z_]+/g

Interesting easy looking Regex

I am re-phrasing my question to clear confusions!
I want to match if a string has certain letters for this I use the character class:
[ACD]
and it works perfectly!
but I want to match if the string has those letter(s) 2 or more times either repeated or 2 separate letters
For example:
[AKL] should match:
ABCVL
AAGHF
KKUI
AKL
But the above should not match the following:
ABCD
KHID
LOVE
because those are there but only once!
that's why I was trying to use:
[ACD]{2,}
But it's not working, so it's probably not the right regex. Can a regex guru help me solve this puzzle?
Thanks
PS: I will use it on MySQL. A different approach is also welcome, but I'd like to use regex for a smarter and shorter query!
To ensure that a string contains at least two occurrences of letters from a set (let's say A, K and L as in your example), you can write something like this:
[AKL].*[AKL]
Since the MySQL regex engine is a DFA (this holds for versions before 8.0.4, which moved to the ICU library), there is no need to use a negated character class like [^AKL] in place of the dot to avoid backtracking, or a lazy quantifier, which that engine does not support at all.
example:
SELECT 'KKUI' REGEXP '[AKL].*[AKL]';
will return 1
You can follow this link that speaks on the particular subject of the LIKE and the REGEXP features in MySQL.
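The pattern can be checked against the strings from the question with a quick sketch. It uses Python's re module instead of MySQL, which is an assumption made only so the example is easy to run; re.search looks for the pattern anywhere in the string, much like REGEXP does.

```python
import re

pattern = re.compile(r"[AKL].*[AKL]")

should_match = ["ABCVL", "AAGHF", "KKUI", "AKL"]
should_not_match = ["ABCD", "KHID", "LOVE"]

# True when the string holds at least two letters from the set, anywhere.
matches = {s: bool(pattern.search(s)) for s in should_match + should_not_match}
print(matches)
```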
If I understood you correctly, this is quite simple:
[A-Z].*?[A-Z]
This looks for something in your set, [A-Z], and then lazily matches characters until it (potentially) comes across the set, [A-Z], again.
As @Enigmadan pointed out, a lazy match is not necessary here: [A-Z].*[A-Z]
The expression you are using searches for runs of two or more consecutive characters from the set ACDFGHIJKMNOPQRSTUVWXZ.
However, your expression excludes Y (...UVWXZ]), so Z cannot be found, since it is not adjacent to another character from your set. The same applies to A, because B is also excluded ([ACD...). For example, Z and A would match in a string like ZABCDEFGHIJKLMNOPQRSTUVWXYZA.
If those letters were not excluded on purpose, it would probably be better to use a range like [A-Z].
If you want 2 or more of a match on [AKL], then you may use just [AKL] and may have match >= 2.
I am not good at SQL regex, but may be something like this?
check (dbo.RegexMatch( ['ABCVL'], '[AKL]' ) >= 2)
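To put it in simple English, use [AKL] as your regex, and check that the number of matches in the string is at least 2. Here's how I would do it in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns true when the string contains at least two characters from the set.
private boolean search2orMore(String string) {
    Matcher matcher = Pattern.compile("[ACD]").matcher(string);
    int counter = 0;
    // Each successful find() is one more occurrence of A, C or D.
    while (matcher.find()) {
        counter++;
    }
    return counter >= 2;
}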
You can't use [ACD]{2,} because it only matches two or more consecutive characters from the set, and will fail when the matching characters are separated by other characters.
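The same counting idea, and the reason the {2,} quantifier falls short, can be sketched in Python. The helper name has_two_or_more is hypothetical, and the set [AKL] is taken from the question's examples.

```python
import re

def has_two_or_more(text, letters="AKL"):
    """Count single-character matches and require at least two of them."""
    return len(re.findall(f"[{letters}]", text)) >= 2

# A and L are separated by other letters, so counting still finds both.
counted = has_two_or_more("ABCVL")

# {2,} needs the letters to be adjacent, so it finds nothing in the same string.
adjacent_only = re.search(r"[AKL]{2,}", "ABCVL")

print(counted, adjacent_only)
```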
your question is not very clear, but here is my trial pattern
\b(\S*[AKL]\S*[AKL]\S*)\b
Demo
pretty sure this should work in any case
(?<l>[^AKL\n]*[AKL]+[^AKL\n]*[AKL]+[^AKL\n]*)[\n\r]
To use it with other letters, replace AKL with the ones you need. This can easily be done dynamically; tell me if you need it.
Is this what you are looking for?
".*(.*[AKL].*){2,}.*" (without quotes)
It matches if there are at least two occurrences of your characters, surrounded by anything.
It is .NET regex, but should be same for anything else
Edit
Overall, MySQL regular expression support is pretty weak.
If you only need to match your capture group a minimum of two times, then you can simply use:
select * from ... where ... regexp('([ACD].*){2,}') #could be `2,` or just `2`
If you need to match your capture group more than two times, then just change the number:
select * from ... where ... regexp('([ACD].*){3}')
#This number should match the number of matches you need
If you needed a minimum of 7 matches and you were using your previous capture group [ACDF-KM-XZ]
e.g.
select * from ... where ... regexp('([ACDF-KM-XZ].*){7,}')
Response before edit:
Your regex is trying to find at least two characters from the set [ACDFGHIJKMNOPQRSTUVWXZ].
([ACDFGHIJKMNOPQRSTUVWXZ]){2,}
The reason A and Z are not being matched in your example string (ABCDEFGHIJKLMNOPQRSTUVWXYZ) is because you are looking for two or more characters that are together that match your set. A is a single character followed by a character that does not match your set. Thus, A is not matched.
Similarly, Z is a single character preceded by a character that does not match your set. Thus, Z is not matched.
The characters B, E, L and Y do not match your set:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
If you were to do a global search in the string, only the runs CD, FGHIJK and MNOPQRSTUVWX would be matched.
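The explanation above can be verified with a short sketch in Python (an assumption; the question targets MySQL). re.findall acts as the global search and returns each run of two or more adjacent characters from the set.

```python
import re

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Runs of two or more adjacent characters from the set; lone A and Z are skipped.
runs = re.findall(r"[ACDFGHIJKMNOPQRSTUVWXZ]{2,}", alphabet)
print(runs)
```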

Regex, continue matching after lookaround

I'm having trouble with lookaround in regex.
Here is the problem: I have a big file to edit, and I want to replace one function with another, keeping the first parameter but removing the second one.
Let say we have :
func1(paramIWantToKeep, paramIDontWant)
or
func1(func3(paramIWantToKeep), paramIDontWant)
I want to change with :
func2(paramIWantToKeep) in both cases.
so I try using positive lookahead
func1\((?=.+), paramIDontWant\)
For now, I am just trying not to select the first parameter (I'll handle the parentheses afterwards).
But it doesn't work: it appears that my regex, after the positive lookahead (.+) succeeds, looks for , paramIDontWant\) at the same position it was at before the lookahead (right after the opening parenthesis).
So my question is: how do I continue a regex after a matching group, here after (.+)?
Thanks.
PS: Sorry for the english and/or the bad construction of my question.
Edit : I use Sublime Text
The first thing you need to understand is that a regex will always match a consecutive string. There will never be gaps.
Therefore, if you want to replace 123abc456 with abc, you can't simply match 123456 and remove it.
Instead, you can use a capturing group. This will allow you to remember a section of the regex for later.
For example, to replace 123abc456 with abc, you could replace this regex:
\d+([a-z]+)\d+
with this string:
$1
What that does is actually replaces the match with the contents of the first capturing group. In this case, the capturing group was ([a-z]+), which matches abc. Thus, the entire match is replaced with just abc.
An example you may find more useful:
Given:
func1(foo, bar)
replacing this regex:
\w+\((\w+),\s*\w+\)
with this string:
func2($1)
results in:
func2(foo)
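Both replacements can be tried with a small sketch in Python's re module. This is an assumption, as the question uses Sublime Text; Python writes the group reference as \1 where Sublime Text uses $1.

```python
import re

# Keep only the letters captured by group 1.
letters = re.sub(r"\d+([a-z]+)\d+", r"\1", "123abc456")

# Rename the function and keep only the first argument.
call = re.sub(r"\w+\((\w+),\s*\w+\)", r"func2(\1)", "func1(foo, bar)")

print(letters, call)
```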
import re

# Sample calls: the first argument is kept, everything after the first comma is dropped.
tests = [
    "func1(paramKeep,paramLose)",
    "func1(paramKeep,((paramLose(dog,cat))))",
    "func1(func3(paramKeep),paramDont)",
    "func1(func3(paramKeep),paramDont,((i)),don't,want,these)",
]
# Group 1: function name, "(" and the shortest text up to the first comma.
# Group 2: the first comma and everything up to the last ")".
# Group 3: the final ")".
reg = r'(\w+\(.*?(?=,))(,.*)(\))'
for t in tests:
    keep, lose, end = re.match(reg, t).groups()
    print(keep + end)
Produces
>>>
func1(paramKeep)
func1(paramKeep)
func1(func3(paramKeep))
func1(func3(paramKeep))
Apply these two regexp in this order
s/(func1)([^,]*)(, )?(paramIDontWant)(.)/func2$2$5/;
s/(func2\()(func3\()(paramIWantToKeep).*/$1$3)/;
These cope with the two examples you gave. I guess that the real-world code you are editing is slightly more complicated, but the general idea of applying a series of regexps might be helpful.
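As a rough translation of the two substitutions into Python (an assumption; the originals are written in Perl-style s/// syntax), the sketch below applies them in the same order. The helper name rewrite_call is hypothetical.

```python
import re

def rewrite_call(line):
    """Apply the two substitutions in order, as in the s/// commands above."""
    # Step 1: rename func1 to func2 and drop the second parameter.
    line = re.sub(r"(func1)([^,]*)(, )?(paramIDontWant)(.)", r"func2\2\5", line)
    # Step 2: unwrap func3(...) when it encloses the parameter to keep.
    line = re.sub(r"(func2\()(func3\()(paramIWantToKeep).*", r"\1\3)", line)
    return line

simple = rewrite_call("func1(paramIWantToKeep, paramIDontWant)")
nested = rewrite_call("func1(func3(paramIWantToKeep), paramIDontWant)")
print(simple, nested)
```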