Regular expression not working - regex

I am using the regex (?<=^\d+\s*).*?\t to extract just the Resources\blahblah part from the following text:
10 _Resources\index.test FAIL
11 _Resources\index.test FAIL
12 Resources\index.test FAIL
13set\Relicensing Statement.test FAIL
but it captures the following text:
0 _Resources\index.test
1 _Resources\index.test
2 Resources\index.test
3set\Relicensing Statement.test
I just want the part like Resources\index.test, without the leading numbers or spaces. Why is it failing? If I run just ^\d+\s* on its own, it matches any number of digits and spaces, but it does not work as a look-behind prefix.

Since you commented you were using Notepad++, how about matching ^\d+\s*([^\t]*).*$ and replacing by \1 ?

From NSRegularExpression (I saw it was tagged):
Look-behind assertion. True if the parenthesized pattern matches text
preceding the current input position, with the last character of the
match being the input character just before the current position. Does
not alter the input position. The length of possible strings matched
by the look-behind pattern must not be unbounded (no * or +
operators.)
The same restriction holds in most languages.
Can't you extract $1 from (?:^\d+\s*)(.*?\t) instead?
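A minimal Python sketch of the capture-group approach suggested above. The sample lines and the tab before FAIL are assumptions based on the \t in the original pattern. In an engine that does accept a variable-width look-behind, ^\d+\s* is already satisfied after the first digit, so the match can begin there, which is consistent with the captured text shown in the question.

```python
import re

# Hypothetical sample lines; the tab before FAIL is an assumption
# based on the \t in the original pattern.
lines = [
    "10 _Resources\\index.test\tFAIL",
    "12 Resources\\index.test\tFAIL",
    "13set\\Relicensing Statement.test\tFAIL",
]

# Python's re module, like many engines, rejects a variable-width look-behind.
try:
    re.compile(r"(?<=^\d+\s*).*?\t")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Consume the leading digits and spaces, then capture everything up to the tab.
pattern = re.compile(r"^\d+\s*([^\t]*)")
paths = [pattern.match(line).group(1) for line in lines]
print(lookbehind_ok, paths)
```

Consuming the prefix and reading group 1 avoids the look-behind entirely, so it should carry over to engines with the fixed-width restriction.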

Related

Regular Expression: Find a specific group within other groups in VB.Net

I need to write a regular expression that has to replace everything except for a single group.
E.g.
IN: OK THT PHP This is it 06222021
OUT: This is it
IN: NO MTM PYT Get this content 111111
OUT: Get this content
I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))
This RegEx creates 4 groups, using the first entry as an example the groups are:
OK THT PHP
This is it
06222021
Space character
I need a way to:
Replace groups 1, 3 and 4 with String.Empty
OR
Get group 2 ONLY
You don't need 4 groups. A single capture group, referenced in the replacement, is enough, and the last part should match 6-8 digits instead of only 6.
Note that \w{0,2} also matches an empty string; use \w{1,2} if there has to be at least one word character.
^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
^ Start of string
\w{0,2}\s\w{0,3}\s\w{0,3}\s Match three runs of word characters (each with its own quantifier), each followed by a whitespace char
(.*?) Capture group 1, match any char as few times as possible
\s\d{6,8} Match a whitespace char and 6-8 digits
\s? Match an optional whitespace char
$ End of string
Example code
Dim s As String = "OK THT PHP This is it 06222021"
Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
Console.WriteLine(result)
Output
This is it
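For comparison, a short Python sketch of the same replacement applied to both sample inputs from the question. It only illustrates the pattern; the original answer targets VB.Net, where the replacement is written $1 rather than \1.

```python
import re

# Same pattern as in the answer: the capture group holds the text to keep.
pattern = re.compile(r"^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$")

inputs = [
    "OK THT PHP This is it 06222021",
    "NO MTM PYT Get this content 111111",
]

# Replace the whole match with the contents of group 1.
results = [pattern.sub(r"\1", s) for s in inputs]
print(results)
```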
My approach does not rely on groups and does not use a Replace operation. The match itself yields the desired result.
It uses look-around expressions. To find a pattern between two other patterns, you can use the general form
(?<=prefix)find(?=suffix)
This will only return find as match, excluding prefix and suffix.
If we insert your expressions, we get
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6}\s?)
where I simplified (\s|) as \s?. We can also drop it completely, since we don't care about trailing spaces.
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6})
Note that this works also if we have more than 6 digits because regex stops searching after it has found 6 digits and doesn't care about what follows.
This also gives a match if other things precede our pattern like in 123 OK THT PHP This is it 06222021. We can exclude such results by specifying that the search must start at the beginning of the string with ^.
If the exact length of the words and numbers does not matter, we simply write
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+)
If the find part can contain numbers, we must specify that we want to match until the end of the line with $ (and include a possible space again).
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+\s?$)
Finally, we use a quantifier for the 3 occurrences of word-space:
(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)
This is compact and will only return This is it or Get this content.
// input is the string to search, e.g. "OK THT PHP This is it 06222021"
string result = Regex.Match(input, @"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)").Value;
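The variable-width look-behind above relies on .NET. Python's standard re module only accepts fixed-width look-behinds, so this sketch swaps the look-behind for a consumed prefix plus a capture group and keeps the look-ahead for the suffix. The extract helper is a hypothetical name used for illustration.

```python
import re

# Prefix is consumed by a non-capturing group; suffix is checked by a look-ahead.
pattern = re.compile(r"^(?:\w+\s){3}(.*?)(?=\s\d+\s?$)")

def extract(text):
    """Return the text between the three leading words and the trailing number."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(extract("OK THT PHP This is it 06222021"))
print(extract("NO MTM PYT Get this content 111111"))
# Digits inside the wanted part are kept because the suffix is anchored with $.
print(extract("OK THT PHP Room 101 is it 06222021"))
```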

Unable to replace some values in regex

This is my input:
0,0,0,1
1,023,1230,1,0
,1,0,01-09-2018,1,
I want to replace the 0s and 1s that are exactly one character long. The rest should stay as they are.
I already tried JavaScript code that splits each string on "," and then checks for strings of length 1 and replaces them according to that logic. But that is a tedious method that takes a lot of time.
I want a regex that can do the replacements across the entire input.
I have already tried this regex: ((0|1)(?<=,))|((0|1)(?=,)). But the output is wrong.
The expected output is:
N,N,N,Y
Y,023,1230,Y,N
,Y,N,01-09-2018,Y,
You can use the following regexps with comma word boundaries:
(?<![^,])1(?![^,])
(?<![^,])0(?![^,])
Replace with the appropriate substring.
They match 1 or 0 only when enclosed with commas or start/end of string positions.
(?<![^,]) - a negative lookbehind that matches a position not immediately preceded with a char other than ,
(?![^,]) - a negative lookahead that matches a position not immediately followed with a char other than ,.
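A minimal Python sketch of these lookarounds, merging both patterns into one with [01] and a replacement callback. The 1 to Y and 0 to N mapping is read off the expected output in the question; the function name is hypothetical. Each line is processed separately, because a newline counts as "a char other than ," and would otherwise block a match at the start of a line.

```python
import re

# A lone 0 or 1: not preceded and not followed by any non-comma character.
SINGLE_FLAG = re.compile(r"(?<![^,])[01](?![^,])")

def flags_to_letters(line):
    """Replace stand-alone 1 with Y and stand-alone 0 with N in one CSV line."""
    return SINGLE_FLAG.sub(lambda m: "Y" if m.group() == "1" else "N", line)

text = "0,0,0,1\n1,023,1230,1,0\n,1,0,01-09-2018,1,"
converted = "\n".join(flags_to_letters(line) for line in text.split("\n"))
print(converted)
```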

Regex to extract last period and md5 string

I have the following regular expression:
/^[a-f0-9]{8}$/ --- This expression extracts an 8 character string as a md5 hash, for example: if I have the following string "hello world .305eef9f x1xxx 304ccf9f test1232" it will return "304ccf9f"
I also have the following regular expression:
/.[^.]*$/ --- This expression extracts a string after the last period (included), for example, if I have "hello world.this.is.atest.case9.23919sd3xxxs" it will return ".23919sd3xxxs"
The thing is, I've read a bit about regex, but I can't combine both expressions in order to find the md5 string after the last period (included), for example:
topLeftLogo.93f02a9d.controller.99f06a7s ----> must return ".99f06a7s"
Thanks in advance for your time and help!
/^[a-f0-9]{8}$/ --- This expression extracts an 8 character string as a md5 hash
Yes but it doesn't return "304ccf9f" from "hello world .305eef9f x1xxx 304ccf9f test1232" because ^ in regex means start of string. How is it possible for it to match in middle of a string?
/.[^.]*$/ --- This expression extracts a string after the last period
No. It will only do that if you escape the first dot: \.
To combine these two you have to replace ^ with \.:
\.[a-f0-9]{8}$
To match 8 characters from the range [a-f0-9] after the last dot, you might use (if supported) a negative lookahead (?!.*\.) to match your values and assert that what follows does not contain a dot:
\.[a-f0-9]{8}(?!.*\.)
If you want to match characters from a-z instead of a-f like 99f06a7s you could use [a-z0-9]
About the first example
This regex ^[a-f0-9]{8}$ will match one of the ranges in the character class 8 times from the start until the end of the string due to the anchors ^ and $. It would not find a match in hello world .305eef9f x1xxx 304ccf9f test1232 on the same line.
About the second example
.[^.]*$ will match any single character, followed by zero or more characters that are not a dot, up to the end of the string. That would for example also match a single a, and it is not bound to first matching a dot, because you have to escape the dot to match it literally.
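The points above can be checked with a short Python sketch using the strings from the question. Note that 99f06a7s contains an s, which is outside [a-f0-9], so the hex-only pattern finds nothing in that example and the [a-z0-9] variant is needed.

```python
import re

name = "topLeftLogo.93f02a9d.controller.99f06a7s"
sentence = "hello world .305eef9f x1xxx 304ccf9f test1232"

# ^ and $ force the whole string to be exactly 8 hex characters.
anchored = re.search(r"^[a-f0-9]{8}$", sentence)

# Escaped dot plus 8 characters at the end of the string.
hex_only = re.search(r"\.[a-f0-9]{8}$", name)   # fails: 's' is not in a-f
alnum = re.search(r"\.[a-z0-9]{8}$", name)      # succeeds with a-z

# An unescaped dot matches any character, so a lone "a" matches too.
unescaped = re.search(r".[^.]*$", "a")
escaped = re.search(r"\.[^.]*$", "a")

print(anchored, hex_only, alnum.group(), unescaped.group(), escaped)
```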
I'm adding this in case someone needs to solve a similar problem:
Case 1: we want to get the 8-character hexadecimal ([a-f0-9]) string that sits between the last period of the base name and the file extension, for example to remove that "hashed" part:
Example:
file.name2222.controller.2567d667.js ------> returns .2567d667
We will need to use the following regex:
\.[a-f0-9]{8}(?=\.\w+$)
Case 2: we want the same as above, but without the leading period:
Example:
file.name2222.controller.2567d667.js ------> returns 2567d667
We will need to use the following regex
[a-f0-9]{8}(?=\.\w+$)
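Both cases, plus the removal of the hashed part mentioned in Case 1, sketched in Python with the example filename from above. The variable names are only illustrative.

```python
import re

filename = "file.name2222.controller.2567d667.js"

# Case 1: hash including the leading period, just before the extension.
with_dot = re.search(r"\.[a-f0-9]{8}(?=\.\w+$)", filename).group()

# Case 2: the same hash without the leading period.
without_dot = re.search(r"[a-f0-9]{8}(?=\.\w+$)", filename).group()

# Removing the hashed part; the look-ahead leaves the extension untouched.
stripped = re.sub(r"\.[a-f0-9]{8}(?=\.\w+$)", "", filename)

print(with_dot, without_dot, stripped)
```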

Regex to extract values from look behind groups along with subsequent repetitions

In a Java program, I need to match a text input with a regular expression pattern. Simplified, the text input looks like this: "100|200|123,124,125".
The output from the above match should find three matches, where all matches will return the two fixed subgroups - 100 and 200 and the variable repeating sub-group 123/124/125.
Match 1 - 123
Match 2 - 124
Match 3 - 125.
Each of these match output should also include 100 and 200 in two separate groups.
So basically, matches will target extracting patterns such as '100|200|123', '100|200|124', '100|200|125'.
I have used this regex: (?<=(?:(?<first>\d+)\|(?<second>\d+)\|)|,)(?<vardata>\d+)(?=,|$).
But I get this error: + A quantifier inside a look-behind makes it non-fixed width
As stated in comments above, you cannot use variable length assertions in lookbehind in Java regex.
However you can use this regex based on \G:
(?:(\d+)\|(\d+)\||(?<!^)\G,)(\d+)
RegEx Details:
\G asserts position at the end of the previous match or the start of the string for the first match.
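You will get the comma-separated numbers in group(3) on each iteration of the find loop. group(1) and group(2) hold the first 2 numbers of the input string, but only on the first match; on later matches the \G alternative is taken and they are null, so store them when the first match is found.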
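Python's standard re module has no \G anchor, so the following sketch does not use the technique above. It swaps in a two-step approach (match the fixed prefix once, then split the repeating part) to produce the same three results. The expand function name is hypothetical.

```python
import re

def expand(text):
    """Return (first, second, value) triples for input like '100|200|123,124,125'."""
    # Step 1: validate the whole input and capture the two fixed numbers
    # plus the comma-separated tail.
    m = re.fullmatch(r"(\d+)\|(\d+)\|(\d+(?:,\d+)*)", text)
    if m is None:
        return []
    first, second, values = m.groups()
    # Step 2: pair the fixed numbers with each repeating value.
    return [(first, second, v) for v in values.split(",")]

print(expand("100|200|123,124,125"))
```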

Regular expression search

I've got a list of lines about 1500 in total. I'm trying to write a regular expression to find the ones that do not contain exactly 8 of the string ?d . Now the problem is there could be other characters in the middle of the ?d's. I don't care about the other characters being there, but I do need exactly 8 (total) of the ?d's.
For example, this line is OK: ?d?u?d?u?d?u?d?d?d?d?d (8 ?d)
This line is not: ?d?d?d?d?u?d?d?d?d?u?d (9 ?d)
This line is not: ?d?l?u?d?d?d?d?d?d?d?d (9 ?d)
The problem is the other characters (which are ?u and ?l) can occur anywhere in the line. Is there a regular expression, or series of regular expressions, that can do this? I'm using Notepad++ regular expressions.
It doesn't have to be all in one shot. For instance, I've already done regular expression searches for [\?d]{9,11} which helped, but only eliminated 27 bad lines.
This does what you need:
^(?=(?:\?d.*?){8})(?!(?:\?d.*?){9}).+$
It starts from the beginning, ensures the line contains 8 ?d groups, but rejects it if it contains 9 of them (or more). Full explanation:
^ start of the string
(?=(?:\?d.*?){8}) positive lookahead: must be followed by this pattern: (?:\?d.*?){8}
\?d.*? matches the literal string ?d, followed by zero or more characters, matching as few as necessary
{8} 8 occurrences in a row of the preceding pattern
(?!(?:\?d.*?){9}) negative lookahead: must not be followed by this pattern: (?:\?d.*?){9}
\?d.*? matches the literal string ?d, followed by zero or more characters, matching as few as necessary
{9} 9 occurrences in a row of the preceding pattern
.+ match any characters
$ end of the string
Edited
use this pattern
^(?!(?:(?:[^?]|\?(?!d))*?\?d){8}(?:[^?]|\?(?!d))*$)(.*)
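A quick Python check of this pattern against the three example lines from the question, with a plain substring count as a cross-check. The pattern matches a line only when it does not contain exactly eight ?d tokens, which is what the question asks to find.

```python
import re

# Matches lines whose number of "?d" tokens is anything other than exactly 8.
NOT_EXACTLY_EIGHT = re.compile(
    r"^(?!(?:(?:[^?]|\?(?!d))*?\?d){8}(?:[^?]|\?(?!d))*$)(.*)"
)

lines = [
    "?d?u?d?u?d?u?d?d?d?d?d",  # 8 x ?d, should not be flagged
    "?d?d?d?d?u?d?d?d?d?u?d",  # 9 x ?d
    "?d?l?u?d?d?d?d?d?d?d?d",  # 9 x ?d
]

flagged = [line for line in lines if NOT_EXACTLY_EIGHT.match(line)]

# Cross-check without a regex: count the literal substring.
by_count = [line for line in lines if line.count("?d") != 8]

print(flagged == by_count, flagged)
```

Outside an editor such as Notepad++, the substring count is probably the simpler option; the regex is mainly useful where only a search pattern can be supplied.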
^(?!(?:[^d]*\?d){8}$).*$
You can try this simple regex. See demo.
https://regex101.com/r/uH5sT1/2