Regular expression search - regex

I've got a list of about 1500 lines. I'm trying to write a regular expression to find the ones that do not contain exactly 8 occurrences of the string ?d. The problem is that there can be other characters between the ?d's. I don't care about those other characters, but I do need exactly 8 ?d's in total.
For example, this line is OK: ?d?u?d?u?d?u?d?d?d?d?d (8 ?d)
This line is not: ?d?d?d?d?u?d?d?d?d?u?d (9 ?d)
This line is not: ?d?l?u?d?d?d?d?d?d?d?d (9 ?d)
The problem is the other characters (which are ?u and ?l) can occur anywhere in the line. Is there a regular expression, or series of regular expressions, that can do this? I'm using Notepad++ regular expressions.
It doesn't have to be all in one shot. For instance, I've already done regular expression searches for [\?d]{9,11} which helped, but only eliminated 27 bad lines.

This does what you need:
^(?=(?:\?d.*?){8})(?!(?:\?d.*?){9}).+$
Demo
It starts from the beginning, ensures the line contains 8 ?d groups, but rejects it if it contains 9 of them (or more). Full explanation:
^ start of the string
(?=(?:\?d.*?){8}) positive lookahead: must be followed by this pattern: (?:\?d.*?){8}
\?d.*? matches the literal string ?d, followed by zero or more characters, matching as few as necessary
{8} 8 occurrences in a row of the preceding pattern
(?!(?:\?d.*?){9}) negative lookahead: must not be followed by this pattern: (?:\?d.*?){9}
\?d.*? matches the literal string ?d, followed by zero or more characters, matching as few as necessary
{9} 9 occurrences in a row of the preceding pattern
.+ match any characters
$ end of the string
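As a quick sanity check outside Notepad++, the pattern can be run over the three sample lines from the question. This is only a sketch using Python's re module (an assumption, since the question is about Notepad++; the lookaheads and lazy quantifiers used here should behave the same way). The helper name has_exactly_eight is made up for illustration.

```python
import re

# Pattern from the answer above: eight "?d" tokens required, nine rejected.
EXACTLY_EIGHT = re.compile(r"^(?=(?:\?d.*?){8})(?!(?:\?d.*?){9}).+$")

def has_exactly_eight(line: str) -> bool:
    """Return True when the answer's pattern matches the line."""
    return EXACTLY_EIGHT.match(line) is not None

samples = [
    "?d?u?d?u?d?u?d?d?d?d?d",  # 8 x ?d
    "?d?d?d?d?u?d?d?d?d?u?d",  # 9 x ?d
    "?d?l?u?d?d?d?d?d?d?d?d",  # 9 x ?d
]

for line in samples:
    # str.count gives an independent tally to compare with the regex result.
    print(line, line.count("?d"), has_exactly_eight(line))
```

Note that both lookaheads are evaluated at ^, so the first ?d has to sit at the very start of the line. A line that begins with ?u would not match, even if it holds eight ?d tokens.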

Edited
use this pattern
^(?!(?:(?:[^?]|\?(?!d))*?\?d){8}(?:[^?]|\?(?!d))*$)(.*)
Demo
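To list the lines that do not hold exactly eight ?d tokens (which is what the question asks for), the pattern above can be applied line by line. A minimal Python sketch, assuming the lines are already in a list; find_bad_lines is a hypothetical helper, and str.count is used only as an independent cross-check.

```python
import re

# Pattern from the edited answer: matches a line unless it holds exactly 8 "?d".
NOT_EIGHT = re.compile(r"^(?!(?:(?:[^?]|\?(?!d))*?\?d){8}(?:[^?]|\?(?!d))*$)(.*)")

def find_bad_lines(lines):
    """Return the lines the pattern flags, i.e. those without exactly 8 '?d'."""
    return [line for line in lines if NOT_EIGHT.match(line)]

lines = [
    "?d?u?d?u?d?u?d?d?d?d?d",  # 8 x ?d, kept
    "?d?d?d?d?u?d?d?d?d?u?d",  # 9 x ?d, flagged
    "?d?l?u?d?d?d?d?d?d?d?d",  # 9 x ?d, flagged
    "?u?d?d",                  # 2 x ?d, flagged
]

bad = find_bad_lines(lines)
# Cross-check against a plain substring count.
assert bad == [line for line in lines if line.count("?d") != 8]
print(bad)
```

The pattern is applied to one line at a time on purpose: [^?] also matches a newline, so running it over a whole multi-line string could let the count spill across lines.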

^(?!(?:[^d]*\?d){8}$).*$
You can try this simple regex. See demo.
https://regex101.com/r/uH5sT1/2

Related

Regex for String with first two characters fixed and rest digits

Is there a regular expression for the following?
String of length 8
First two characters fixed: 'UE' or 'ue'
Remaining 6 characters must be digits [0-9]
Eg: https://regex101.com/r/PufypE/1
The expression I tried:
\^(UE|ue){2}[0-9]{6}\
but it's not working (no match found!)
You want:
\b(UE|ue)[0-9]{6}\b
You don't need the {2} after (UE|ue), since the group already specifies both characters exactly. The \b is a word boundary, so this will also match entries in a list like the one in your comment: UE123456,ue654321. A good site for experimenting with regexes like this is http://regex101.com
Regex should be:
^[Uu][Ee][0-9]{6}$
(UE|ue){2} in your regex requires two consecutive occurrences of UE or ue (for example UEue), not a two-character prefix.
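Both suggestions can be tried side by side. The sketch below uses Python's re module (an assumption; the question does not name a language), and is_valid_code is a hypothetical helper. The group in the word-boundary pattern is made non-capturing here so that findall returns the whole match.

```python
import re

# Whole-string check, as in the second answer.
ANCHORED = re.compile(r"^[Uu][Ee][0-9]{6}$")
# Word-boundary search, as in the first answer (group made non-capturing).
BOUNDED = re.compile(r"\b(?:UE|ue)[0-9]{6}\b")

def is_valid_code(text: str) -> bool:
    """True when the string is UE/ue followed by six digits."""
    return ANCHORED.match(text) is not None

print(is_valid_code("UE123456"))
print(is_valid_code("UEUE123456"))
print(BOUNDED.findall("UE123456,ue654321"))
```

One difference worth knowing: [Uu][Ee] also accepts mixed case such as Ue123456, while (UE|ue) does not.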

Regular expression not working

I'm using the regex (?<=^\d+\s*).*?\t to try to extract just the Resources\blahblah part from the following text:
10 _Resources\index.test FAIL
11 _Resources\index.test FAIL
12 Resources\index.test FAIL
13set\Relicensing Statement.test FAIL
but it captures the following text:
0 _Resources\index.test
1 _Resources\index.test
2 Resources\index.test
3set\Relicensing Statement.test
I just want the parts like Resources\index.test, without the leading numbers or spaces. Why is it failing? If I run just ^\d+\s* it matches any number of digits and spaces, but it does not work as a prefix.
Since you commented you were using Notepad++, how about matching ^\d+\s*([^\t]*).*$ and replacing by \1 ?
From NSRegularExpression (I saw it was tagged):
Look-behind assertion. True if the parenthesized pattern matches text
preceding the current input position, with the last character of the
match being the input character just before the current position. Does
not alter the input position. The length of possible strings matched
by the look-behind pattern must not be unbounded (no * or +
operators.)
The same restriction applies in most languages.
Can't you extract $1 from (?:^\d+\s*)(.*?\t)?
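The capture-group approach from the answers can be sketched in Python, whose re module has the same fixed-width restriction on look-behind. The sample lines assume a tab separates the path from FAIL, as the \t in the question's pattern suggests; the names PATH and paths are illustrative only.

```python
import re

# Python's re also rejects a variable-width look-behind at compile time.
try:
    re.compile(r"(?<=^\d+\s*).*?\t")
    lookbehind_rejected = False
except re.error:
    lookbehind_rejected = True

# Assumed sample data: a tab separates the path from the status column.
lines = [
    "10 _Resources\\index.test\tFAIL",
    "11 _Resources\\index.test\tFAIL",
    "12 Resources\\index.test\tFAIL",
    "13set\\Relicensing Statement.test\tFAIL",
]

# The leading digits and whitespace are consumed instead of looked behind;
# the path (everything up to the first tab) lands in group 1.
PATH = re.compile(r"^\d+\s*([^\t]*)")

paths = [PATH.match(line).group(1) for line in lines]
print(lookbehind_rejected, paths)
```

In an engine that does accept the look-behind, ^\d+\s* is already satisfied after the first digit alone, which would explain why the reported matches start at the second digit.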

Regular Expression beginning of string with special characters

Using this as an example string
+$43073$7
I need the 5-digit sequence from it. I'm using the regex
#"\$+(?<lot>\d{5})"
which is matching up any +$ in the string. I tried
#"^\$+(?<lot>\d{5})"
as the +$ are always at the beginning of the string. What will work?
If you use the anchor ^, you need to put the + symbol first, and don't forget to escape it, because + is a metacharacter in regex that repeats the previous token one or more times.
#"^\+\$(?<lot>\d{5})"
And without the anchor, it would be like
#"\$(?<lot>\d{5})"
And get the 5 digit number you want from group index 1.
DEMO
I would match what you want:
\d+
or if you only want digits after "special" characters at the start of input:
^\W+(\d+)
grabbing group 1
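For comparison, here is how the patterns above behave on the example string in Python. This is a sketch under the assumption that Python is acceptable for illustration; note that Python's re spells named groups (?P<lot>...) rather than (?<lot>...). The constant names are made up.

```python
import re

text = "+$43073$7"

# The question's anchored attempt: "$" is expected first, but the string starts with "+".
ORIGINAL_ATTEMPT = re.compile(r"^\$+(?P<lot>\d{5})")
# Corrected anchored form: literal "+", literal "$", then five digits.
ANCHORED = re.compile(r"^\+\$(?P<lot>\d{5})")
# Looser form from the last answer: any run of leading non-word characters.
LEADING_SYMBOLS = re.compile(r"^\W+(\d+)")

print(ORIGINAL_ATTEMPT.match(text))
print(ANCHORED.match(text).group("lot"))
print(LEADING_SYMBOLS.match(text).group(1))
```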

Limit number of alpha characters in regular expression

I've been struggling to figure out how to best do this regular expression.
Here are my requirements:
Up to 8 characters
Can only be alphanumeric
Can only contain up to three alpha characters [a-z] (zero alpha characters are valid too)
Any ideas would be appreciated.
This is what I've got so far, but it only looks for contiguous letter characters:
^(\d|([A-Za-z])(?!([A-Za-z]{3,}))){0,8}$
I'd write it like this:
^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$
It has two parts:
(?=[a-z0-9]{0,8}$)
Looks ahead and matches up to 8 alphanumeric characters to the end of the string
(?:\d*[a-z]){0,3}\d*$
Essentially allowing injection of up to 3 [a-z] among \d*
Rubular
On rubular.com
12345678 // matches
123456789
#(#*#$
12345 // matches
abc12345 // matches
abcd1234
12a34b5c // matches
12ab34cd
123a456 // matches
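The same pattern can be exercised in Python on a few of the strings listed above. This is a sketch only; the name LIMITED is an assumption for illustration, and the pattern is used unchanged.

```python
import re

# Lookahead caps the length at 8; the rest allows at most three letters among digits.
LIMITED = re.compile(r"^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$")

candidates = ["12345678", "123456789", "abcd1234", "12a34b5c", "12ab34cd", "123a456"]

for candidate in candidates:
    print(candidate, bool(LIMITED.match(candidate)))
```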
Alternatives
I do think regex is the best solution for this, but since the string is short, it would be a lot more readable to do this in two steps as follows:
It must match [a-z0-9]{0,8}
Then, delete all \d
The length must now be <= 3
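The three steps above could look like this in Python (a sketch; the function name is_valid_two_step is hypothetical, and lowercase-only input is assumed, as in the pattern above).

```python
import re

def is_valid_two_step(text: str) -> bool:
    # Step 1: at most 8 characters, lowercase letters and digits only.
    if re.fullmatch(r"[a-z0-9]{0,8}", text) is None:
        return False
    # Step 2: delete every digit; what remains is the letters.
    letters_only = re.sub(r"\d", "", text)
    # Step 3: at most three letters may remain.
    return len(letters_only) <= 3

print(is_valid_two_step("12a34b5c"))
print(is_valid_two_step("abcd1234"))
```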
Do you have to do this in exactly one regular expression? It is possible to do that with standard regular expressions, but the regular expression will be rather long and complicated. You can do better with some of the Perl extensions, but depending on what language you're using, they may or may not be supported. The cleanest solution is probably to check whether the string matches:
^[A-Za-z0-9]{0,8}$
but doesn't match:
([A-Za-z].*){4}
i.e. it's an alphanumeric string of up to 8 characters (first regular expression) that doesn't contain 4 or more alpha characters, possibly separated by other characters (second regular expression).
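A rough Python rendering of this match-one, reject-the-other idea follows. The constant and function names are assumptions for illustration; the two patterns are the ones given above.

```python
import re

ALNUM_UP_TO_8 = re.compile(r"^[A-Za-z0-9]{0,8}$")
FOUR_LETTERS = re.compile(r"([A-Za-z].*){4}")

def is_valid_two_regex(text: str) -> bool:
    """Must match the first pattern and must not contain the second."""
    if ALNUM_UP_TO_8.match(text) is None:
        return False
    # search() scans the whole string for four letters in any positions.
    return FOUR_LETTERS.search(text) is None

print(is_valid_two_regex("12a34b5c"))
print(is_valid_two_regex("12ab34cd"))
```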
/^(?!(?:\d*[a-z]){4})[a-z0-9]{0,8}$/i
Explanation:
[a-z0-9]{0,8} matches up to 8 alphanumerics.
The lookahead must come before the part of the pattern that consumes the string.
(?:\d*[a-z]) matches one alphabetic character, wherever it is. The {4} brings the count to 4. So the negative lookahead stops the regex from matching when 4 alphabetic characters can be found (i.e. it limits the count to ≤3).
It's better not to exploit regex like this. Suppose you use this solution, are you sure you will know what the code is doing when you revisit it 1 year later? A clearer way is just check rule-by-rule, e.g.
if len(theText) <= 8 and theText.isalnum():
    if sum(1 for c in theText if c.isalpha()) <= 3:
        pass  # valid
The easiest way to do this would be in multiple steps:
Test the string against /^[a-z0-9]{0,8}$/i -- the string is up to 8 characters and only alphanumeric
Make a copy of the string, delete all non-alphabetic characters
See if the resulting string has a length of 3 or less.
If you want to do it in one regular expression, you can use something like:
/^(?=\d*(?:[a-z]?\d*){0,3}$)[a-z0-9]{0,8}$/i
Which looks for an alphanumeric string of length 0 to 8 (^[a-z0-9]{0,8}$), but first uses a lookahead ((?=\d*(?:[a-z]?\d*){0,3}$)) to make sure that the string has at most 3 alphabetic characters.
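To see that the single expression and the multi-step version agree, both can be run over the same inputs. A Python sketch; re.IGNORECASE stands in for the /i flag, and rule_by_rule is a hypothetical reference implementation of the steps above.

```python
import re

AT_MOST_3_LETTERS = re.compile(
    r"^(?=\d*(?:[a-z]?\d*){0,3}$)[a-z0-9]{0,8}$", re.IGNORECASE
)

def rule_by_rule(text: str) -> bool:
    """Multi-step check used as a reference."""
    if not re.fullmatch(r"[a-z0-9]{0,8}", text, re.IGNORECASE):
        return False
    # Drop everything that is not a letter and count what is left.
    letters = re.sub(r"[^a-z]", "", text, flags=re.IGNORECASE)
    return len(letters) <= 3

candidates = ["", "12345678", "123456789", "ABC12345", "abcd1234", "12a34b5c", "12ab34cd"]

for candidate in candidates:
    single = bool(AT_MOST_3_LETTERS.match(candidate))
    print(repr(candidate), single, rule_by_rule(candidate))
```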

Regex negative match query

I've got a regex issue: I'm trying to ignore just the number '41', while 4, 1, 14, etc. should all match.
I've got this [^\b41\b], which is effectively what I want, but it also ignores every standalone occurrence of the values 1 and 4.
As an example, this matches "41", but I want it to NOT match:
\b41\b
Try something like:
\b(?!41\b)(\d+)
The (?!...) construct is a negative lookahead, so this means: find a word boundary that is not followed by "41" and another word boundary, then capture the sequence of digits after it.
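A small Python check of this pattern (a sketch; the sample text and the name NOT_41 are made up for illustration):

```python
import re

NOT_41 = re.compile(r"\b(?!41\b)(\d+)")

# Only the standalone 41 is skipped; 410 and 141 are different numbers.
print(NOT_41.findall("4 1 14 41 410 141"))  # ['4', '1', '14', '410', '141']
```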
You could use a negative look-ahead assertion to exclude 41:
/\b(?!41\b)\d+\b/
This regular expression is to be interpreted as: At any word boundary \b, if it is not followed by 41\b ((?!41\b)), match one or more digits that are followed by a word boundary.
Or the same with a negative look-behind assertion:
/\b\d+\b(?<!\b41)/
This regular expression is to be interpreted as: Match one or more digits that are surrounded by word boundaries, but only if the text just before the end of the match is not \b41 ((?<!\b41)).
Or you can even use just basic syntax:
/\b(\d|[0-35-9]\d|\d[02-9]|\d{3,})\b/
This matches only sequences of digits surrounded by word boundaries of either:
one single digit
two digits where the first is not 4 or the second is not 1
three or more digits
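The three variants can be compared on one input to confirm they pick out the same numbers. This is a Python sketch with made-up sample text; the patterns are taken from the answer without the surrounding slashes.

```python
import re

text = "4 1 14 41 410 141 7"

# Negative look-ahead variant.
lookahead = re.findall(r"\b(?!41\b)\d+\b", text)
# Negative look-behind variant (fixed width, so Python accepts it).
lookbehind = re.findall(r"\b\d+\b(?<!\b41)", text)
# Basic-syntax variant; the single group spans the whole number.
basic = re.findall(r"\b(\d|[0-35-9]\d|\d[02-9]|\d{3,})\b", text)

print(lookahead)
print(lookbehind)
print(basic)
```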
This is similar to the question "Regular expression that doesn’t contain certain string", so I'll repeat my answer from there:
^((?!41).)*$
This will work for an arbitrary string, not just 41. See my response there for an explanation.
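Note that this last pattern tests the whole string and rejects it if 41 appears anywhere, including inside longer numbers such as 141, so it answers a slightly different question than the word-boundary versions. A short Python sketch (the name NO_41_ANYWHERE and the inputs are illustrative):

```python
import re

# Each character is consumed only if "41" does not start at that position.
NO_41_ANYWHERE = re.compile(r"^((?!41).)*$")

for candidate in ["14", "4 1", "41", "141", "410"]:
    print(candidate, bool(NO_41_ANYWHERE.match(candidate)))
```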