Regular expression: allow whitespace without counting it - regex

How to get
[\d ]{6}
to match:
1 23456
1 2 3456
1 2 3 456
1 2 3 4 56
1 2 3 4 5 6
In other words, I would like the space to not be counted towards the char limit. Something like [\d]{6 + but allow spaces you can eat}

The following will match 6 numbers, with any amount of space characters between them.
(?:\d\s*){5}\d
?: at the beginning there makes the group non-capturing. It's not necessary if all you wish to do is a simple match.
A live example:
https://regex101.com/r/PZJ8DO/2
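You can also try the pattern outside regex101 with Python's re module. This is only a sketch. It assumes the whole string should consist of the six digits and the spaces between them, so it uses re.fullmatch. The two extra strings at the end are made-up negatives.

```python
import re

# Six digits, with any amount of whitespace allowed between consecutive digits.
SIX_DIGITS = re.compile(r"(?:\d\s*){5}\d")

samples = [
    "1 23456",
    "1 2 3456",
    "1 2 3 456",
    "1 2 3 4 56",
    "1 2 3 4 5 6",
    "12345",          # made-up negative: only five digits
    "1 2 3 4 5 6 7",  # made-up negative: seven digits
]

for s in samples:
    # fullmatch anchors the pattern at both ends of the string.
    print(repr(s), bool(SIX_DIGITS.fullmatch(s)))
```

With re.search instead of re.fullmatch, a longer string such as 1 2 3 4 5 6 7 would still be found, because the pattern matches its first six digits.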

Just to put my two cents in: you could use the opposite of \d which is \D in most flavors:
^(?:\d\D*){6}$
See a demo on regex101.com.
Note that this would even allow something like
1a2b3c4d5e6
If this is not what you want (meaning you only want to allow spaces, nothing else), use \s* instead of \D*.
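Here is a small Python comparison of the two variants. The pattern names and the extra test string are made up for illustration.

```python
import re

ANY_SEPARATOR = re.compile(r"^(?:\d\D*){6}$")  # any non-digits between digits
SPACES_ONLY = re.compile(r"^(?:\d\s*){6}$")    # only whitespace between digits

for s in ["1 2 3 4 56", "1a2b3c4d5e6", "123456 "]:
    print(repr(s), bool(ANY_SEPARATOR.match(s)), bool(SPACES_ONLY.match(s)))
```

Both variants also accept trailing separators after the sixth digit (e.g. 123456 followed by a space), because every group ends with \D* or \s*.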

You can try to use
(?<=).*6.*
This will match any line that contains '6' even if there are some white spaces or other characters in the line.
(?<=) is a positive lookbehind. With nothing inside it, it always succeeds.
The . matches any character except line breaks.
The * matches 0 or more of the preceding token.
6 matches a literal "6" character.
You can test regular expressions here: RegExr
Note that lookbehind is not supported in all regex flavors.
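This small Python check shows what the pattern accepts. The test strings other than the question's are hypothetical. Since (?<=) has nothing inside it and .* can match anything, the pattern finds any line that contains a 6, whether or not the line has six digits.

```python
import re

# The pattern from this answer; the empty lookbehind (?<=) always succeeds.
PATTERN = re.compile(r"(?<=).*6.*")

for s in ["1 2 3 4 5 6", "6", "abc6", "1 2 3 4 5"]:
    print(repr(s), bool(PATTERN.search(s)))
```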

Related

How can I delete lines which have fewer than 11 numbers but more than 8 numbers in one line in Notepad++

How can I delete lines which have fewer than 11 numbers but more than 8 numbers in one line in Notepad++? The numbers are separated from each other by letters, spaces, etc.
Your requirement says to remove lines having 9 or 10 digits, but not more or less than this. You may try using lookaheads to handle this. In regex mode, try finding the following pattern:
^(?!.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d)(?=.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d).*
Then just replace that with empty string (nothing). Follow the demo below to see that the pattern correctly flags the appropriate lines.
Demo
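You can try the same lookahead pattern in Python with re.sub, which behaves much like Replace All. This sketch assumes Notepad++'s ". matches newline" option is unchecked, which matches Python's default. It uses re.MULTILINE so that ^ matches at every line start. The sample text is made up.

```python
import re

# Must NOT contain 11 digits (negative lookahead) but MUST contain 9 (positive lookahead).
# re.MULTILINE lets ^ match at the start of every line; "." does not match "\n".
NINE_OR_TEN = re.compile(
    r"^(?!.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d)"
    r"(?=.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d.*\d).*",
    re.MULTILINE,
)

text = "12345678\n123456789\na1b2c3d4e5f6g7h8i9j0\n12345678901"
result = NINE_OR_TEN.sub("", text)  # like "Replace All" with an empty replacement
print(result.split("\n"))
```

As with an empty replacement in Notepad++, the matching lines are blanked rather than removed, and their line breaks remain.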
Edit:
Here is another pattern you may use, without lookaheads, which is a bit easier on the eyes:
^\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d?\D*$
This again says to match any line which contains either 9 or 10 digits, but not more or less than this.
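Here is a Python sketch of the lookahead-free pattern, applied line by line. The sample lines are made up. Testing each line separately matters because \D matches any non-digit, including the newline character. Over a whole multi-line string, a match could therefore continue from one line into the next.

```python
import re

# Nine required digits and an optional tenth, with only non-digits around them.
NINE_OR_TEN_PLAIN = re.compile(
    r"^\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d\D*\d?\D*$"
)

lines = ["12345678", "123456789", "a1b2c3d4e5f6g7h8i9j0", "12345678901"]

# Test each line on its own so that \D* cannot run across a line break.
kept = [line for line in lines if not NINE_OR_TEN_PLAIN.match(line)]
print(kept)

# Without the per-line split, 5 + 4 digits on two lines would count as 9:
print(bool(NINE_OR_TEN_PLAIN.match("12345\n6789")))  # True: the match spans both lines
```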
Ctrl+H
Find what: ^(?:\D*\d){8}(?:\D*\d){0,3}(?:\R|$)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
^ # beginning of line
(?:\D*\d){8} # non-capture group, 0 or more NON-digits and 1 digit, must appear exactly 8 times
(?:\D*\d){0,3} # non-capture group, 0 or more NON-digits and 1 digit, may appear 0 up to 3 times
(?:\R|$) # non capture group, linebreak or end of file
Given:
1234567
12345678
123456789
1234567890
12345678901
123456789012
a1b2c3d4e5f6g7
a1b2c3d4e5f6g7h8
a1b2c3d4e5f6g7h8i9
a1b2c3d4e5f6g7h8i9j0k1l2
Result for given example:
1234567
123456789012
a1b2c3d4e5f6g7
a1b2c3d4e5f6g7h8i9j0k1l2
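Note that {8} followed by {0,3} accepts 8 to 11 digits. That is why the 8-digit and 11-digit lines are also removed in the result above. Python's re has no \R, so the sketch below adapts the pattern rather than copying it exactly. It checks each line on its own with re.fullmatch, which plays the role of (?:\R|$) by requiring the match to reach the end of the line. It reproduces the given/result lists above.

```python
import re

# Adaptation of ^(?:\D*\d){8}(?:\D*\d){0,3}(?:\R|$) for one line at a time:
# fullmatch requires the match to cover the whole line, ending on a digit.
EIGHT_TO_ELEVEN = re.compile(r"(?:\D*\d){8}(?:\D*\d){0,3}")

given = """1234567
12345678
123456789
1234567890
12345678901
123456789012
a1b2c3d4e5f6g7
a1b2c3d4e5f6g7h8
a1b2c3d4e5f6g7h8i9
a1b2c3d4e5f6g7h8i9j0k1l2"""

# Keep only the lines the pattern does NOT match (i.e. delete the matches).
result = [line for line in given.splitlines() if not EIGHT_TO_ELEVEN.fullmatch(line)]
print("\n".join(result))
```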

Is it possible to match this with a regex?

I have files with these filenames:
ZATR0008_2018.pdf
ZATR0018_2018.pdf
ZATR0218_2018.pdf
The 4 digits after ZATR are the issue number of the magazine.
With this regex:
([1-9][0-9]*)(?=_\d)
I can extract 8, 18 or 218, but I would like to keep a minimum of 2 digits and a maximum of 3 digits, so the result should be 08, 18 and 218.
How is it possible to do that?
You may use
0*(\d{2,3})_\d
and grab Group 1 value. See the regex demo.
Details
0* - zero or more 0 chars
(\d{2,3}) - Group 1: two or three digits
_\d - a _ followed with a digit.
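Here is an example in Python. The variable names are only for illustration. Python's built-in re module does not support \K, so only the Group 1 version is shown.

```python
import re

ISSUE = re.compile(r"0*(\d{2,3})_\d")

names = ["ZATR0008_2018.pdf", "ZATR0018_2018.pdf", "ZATR0218_2018.pdf"]

# search() finds the first match in each name; group(1) is the captured issue number.
issues = [ISSUE.search(name).group(1) for name in names]
print(issues)  # ['08', '18', '218']
```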
Here is a PCRE variation that grabs the value you need into a whole match:
0*\K\d{2,3}(?=_\d)
See another regex demo
Here, \K makes the regex engine omit the text matched so far (zeros) and then matches 2 to 3 digits that are followed with _ and a digit.
(?:[1-9][0-9]?)?[0-9]{2}(?=_[0-9])
or perhaps:
(?:[1-9][0-9]+|[0-9]{2})(?=_[0-9])
(https://www.freeformatter.com/regex-tester.html, which claims to use the XRegExp library that you mention in another answer, doesn't seem to backtrack into the (?:[1-9][0-9]?)? in my first suggestion where necessary. That makes it very different from any regex engine I've encountered before, and makes it prefer to match just the 18 of 218, even though that match starts later in the string. It does work with my second suggestion, though.)
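For comparison, Python's re (a conventional backtracking engine) returns the intended values for both suggestions. This small sketch uses the filenames from the question.

```python
import re

FIRST = re.compile(r"(?:[1-9][0-9]?)?[0-9]{2}(?=_[0-9])")
SECOND = re.compile(r"(?:[1-9][0-9]+|[0-9]{2})(?=_[0-9])")

names = ["ZATR0008_2018.pdf", "ZATR0018_2018.pdf", "ZATR0218_2018.pdf"]

for pattern in (FIRST, SECOND):
    # group() with no argument returns the whole match (no capturing groups here).
    print([pattern.search(name).group() for name in names])
```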
([1-9]\d{2,3})(?=_\d)
{x,y} matches the preceding token from x to y times; in this case, \d.
Edit: from your own regex, it looked as if you wanted the part of the number that starts with a non-zero digit. However, since your examples include leading 0s, maybe you really wanted:
(\d{2,3})(?=_\d)
This gives you the last 3 digits before the underscore, unless there are only 2 digits.
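This quick Python check runs both patterns on the question's filenames. The first pattern needs a non-zero digit followed by two or three more digits, so only 218 qualifies. The second returns the last three digits, leading zeros included. The helper first_group is made up for this sketch.

```python
import re

NONZERO_START = re.compile(r"([1-9]\d{2,3})(?=_\d)")
LAST_DIGITS = re.compile(r"(\d{2,3})(?=_\d)")

names = ["ZATR0008_2018.pdf", "ZATR0018_2018.pdf", "ZATR0218_2018.pdf"]


def first_group(pattern, text):
    """Return group 1 of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print([first_group(NONZERO_START, n) for n in names])  # [None, None, '218']
print([first_group(LAST_DIGITS, n) for n in names])    # ['008', '018', '218']
```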
I propose you:
^ZATR0*(\d{2,3})_\d+\.pdf$
demo code here. Result:
Match 1 Full match 0-17 ZATR0008_2018.pdf Group 1. 6-8 08
Match 2 Full match 18-35 ZATR0018_2018.pdf Group 1. 24-26 18
Match 3 Full match 36-53 ZATR0218_2018.pdf Group 1. 41-44 218
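You can reproduce the same result in Python. This assumes the three filenames are on separate lines of one string, as in the demo. re.MULTILINE makes ^ and $ work per line, and the printed offsets follow the same layout as the result above.

```python
import re

FILENAME = re.compile(r"^ZATR0*(\d{2,3})_\d+\.pdf$", re.MULTILINE)
text = "ZATR0008_2018.pdf\nZATR0018_2018.pdf\nZATR0218_2018.pdf"

for i, m in enumerate(FILENAME.finditer(text), start=1):
    print(f"Match {i} Full match {m.start()}-{m.end()} {m.group(0)} "
          f"Group 1. {m.start(1)}-{m.end(1)} {m.group(1)}")
```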

Adding mandatory characters to strings when building a regex

I am working on the following regex:
^((199)[0-9]|200[0-9]|201[0-8])(0[1-9]|1[0-2])(0[1-9]|1\d|2\d|3[01])\s[0-9]?$
So I have this regex. The first part (the first 8 characters, before the space) works OK. In the second portion (which will be optional), starting with a space, I would like to make the : or . characters mandatory (at least once).
So
19991019 will pass
19991019 1233 won't pass because does not include : or .
19991019 10:12:12 will pass
19991019 10.2.4 will pass
19991019123.1231.123 won't pass
19991019 aa.12.22 won't pass (because no letters are allowed)
You need to add an optional pattern like this:
^(199[0-9]|200\d|201[0-8])(0[1-9]|1[0-2])(0[1-9]|1\d|2\d|3[01])(?:\s+(\d+(?:[.:]\d+)+))?$
See the regex demo
The (?:\s+(\d+(?:[.:]\d+)+))? part matches 1 or 0 sequences of:
\s+ - 1 or more whitespaces
(\d+(?:[.:]\d+)+) - a capturing group matching
\d+ - 1 or more digits
(?:[.:]\d+)+ - 1 or more sequences of . or : followed with 1 or more digits
Note you may further tune this using {min,max} limiting quantifiers instead of +. Say, to match 1 to 3 digits, you can use \d{1,3}.
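This Python sketch runs the pattern against the examples from the question. The expected outcomes are the ones listed there. Group 4 holds the time part when it is present.

```python
import re

DATE_TIME = re.compile(
    r"^(199[0-9]|200\d|201[0-8])(0[1-9]|1[0-2])(0[1-9]|1\d|2\d|3[01])"
    r"(?:\s+(\d+(?:[.:]\d+)+))?$"
)

# Example strings and whether the question says they should pass.
cases = {
    "19991019": True,
    "19991019 1233": False,
    "19991019 10:12:12": True,
    "19991019 10.2.4": True,
    "19991019123.1231.123": False,
    "19991019 aa.12.22": False,
}

for text, expected in cases.items():
    m = DATE_TIME.match(text)
    print(f"{text!r}: {'pass' if m else 'fail'} (expected {'pass' if expected else 'fail'})")
```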
Try this regex:
((199)[0-9]|200[0-9]|201[0-8])(0[1-9]|1[0-2])(0[1-9]|1\d|2\d|3[01]) (?:(?:\d+(?:\.|:|$)){1,4})?
The last part will accept the pattern {digits}{dot/colon/end of string} 1 to 4 times (you can adjust that).
Also, I don't know your overall use cases (I tested only the ones mentioned), so it may need some tweaks.
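This Python check runs the pattern against the question's examples. The pattern has no ^/$ anchors, so this sketch assumes the whole string has to match and uses re.fullmatch. With re.search, any string containing a valid date followed by a space would match. Under the fullmatch assumption, the bare 19991019 is rejected because the space is mandatory. 19991019 1233 is accepted because $ counts as a terminator after the digits.

```python
import re

# Pattern from this answer; it has no ^/$ anchors of its own.
ANSWER_2 = re.compile(
    r"((199)[0-9]|200[0-9]|201[0-8])(0[1-9]|1[0-2])(0[1-9]|1\d|2\d|3[01]) "
    r"(?:(?:\d+(?:\.|:|$)){1,4})?"
)

examples = ["19991019", "19991019 1233", "19991019 10:12:12",
            "19991019 10.2.4", "19991019123.1231.123", "19991019 aa.12.22"]

for text in examples:
    # fullmatch treats the pattern as if it were anchored at both ends (an assumption).
    print(repr(text), bool(ANSWER_2.fullmatch(text)))
```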

Regex to match a 2-digit number or a 4-digit number

I need to be able to check if a string contains either a 2 digit or a 4 digit number before a . (period).
For example, 39. is good, and so is 3926., but 392. is not.
I originally had (^\\d{2,4}.$), but that allows anything from a 2-digit to a 4-digit number (3 digits included) before the period.
I also tried (^\\d{2}.|\\d{4}.$) but that didn't work.
You can use this regex:
^\d{2}(?:\d{2})?\.$
This regex makes the second \d{2} optional, so it matches 12. or 1234. but not 123. (three digits).
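Here is a quick Python check, using the examples from the question plus two made-up ones:

```python
import re

TWO_OR_FOUR = re.compile(r"^\d{2}(?:\d{2})?\.$")

for s in ["39.", "3926.", "392.", "3.", "39265."]:
    print(repr(s), bool(TWO_OR_FOUR.match(s)))
```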
In the expression (^\d{2}.|\d{4}.$), the dots match any character.
Try escaping them to make them match literal dots: (^\d{2}\.|\d{4}\.$)
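Keep in mind that ^ and $ bind to each alternative separately here. ^ only applies to the \d{2}\. branch, and $ only to the \d{4}\. branch. With a search-style match, strings such as 12345. or 39.x (made-up examples) are therefore accepted too. Wrapping the alternation in a group, as in ^(?:\d{2}|\d{4})\.$, applies both anchors to both branches. This Python sketch compares the two:

```python
import re

ESCAPED = re.compile(r"(^\d{2}\.|\d{4}\.$)")   # the pattern from this answer
ANCHORED = re.compile(r"^(?:\d{2}|\d{4})\.$")  # both anchors apply to both branches

for s in ["39.", "3926.", "392.", "12345.", "39.x"]:
    print(repr(s), bool(ESCAPED.search(s)), bool(ANCHORED.search(s)))
```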

How do I write a regex that won't match a certain amount of whitespace?

I'm trying to write a regex that won't match a certain number of white spaces, but it's not going the way I expected.
I have these strings:
123      99999   # has 6 white spaces
321      99999   # same
123   8888       # has 3 white spaces (want to match)
321   8888       # same (want to match)
1237777          # (want to match)
3217777          # (want to match)
I want to match the last four lines, i.e. lines that start with 123 or 321 and are not followed by 6 whitespace characters:
^(123|321)[^\ ]{6}.*
This doesn't seem to do the trick - this matches only the two last ones. What am I missing?
"   888"
If you line this up against [^\ ]{6}, it does not match, because that class says
[not a space][not a space][not a space][not a space][not a space][not a space]
In this case, the problem is that the first 3 characters are spaces, so it doesn't match.
You can use a negative lookahead: ^(123|321)(?!\s{6}). What I prefer, because it is more readable, is to write the regular expression to match what you don't want and then negate it (i.e., not, !, etc.). I don't know enough about your data, but I would use \s{6} and then negate it.
Try this:
^(123|321)(?!\s{6}).*
(uses a negative lookahead to check that the 6 characters after 123 or 321 are not all whitespace)
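This Python sketch runs the pattern over the sample lines. The lines are rebuilt with 6 and 3 spaces as the question's comments describe, so the exact spacing is an assumption.

```python
import re

NOT_SIX_SPACES = re.compile(r"^(123|321)(?!\s{6}).*")

lines = [
    "123      99999",  # 6 spaces
    "321      99999",
    "123   8888",      # 3 spaces
    "321   8888",
    "1237777",
    "3217777",
]

matched = [line for line in lines if NOT_SIX_SPACES.match(line)]
print(matched)
```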
What language are you doing this in? If it's Perl or something else that supports PCRE, you can simply use a negative lookahead assertion:
^(123|321)(?!\ {6}).*
You first need to say that there may be up to 3 whitespaces and then deny the existence of any more whitespace after them, like this:
^([0-9]+)(\s{0,3})([^ ]{3})([0-9]*)$
^([0-9]+) = Accepts one or more numbers in the beginning of your string.
(\s{0,3}) = Accepts zero or up to three spaces.
([^ ]{3}) = Requires the next 3 characters after the allowed spaces to be non-spaces.
([0-9]*) = Accepts any digits after that, until the end of your string.
Or:
^([0-9]+)(\s{0,3})(?!\s+)([0-9]*)$
The only change here is that after the three allowed spaces it won't accept any more spaces (I particularly like this second option more because it's more readable).
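This Python sketch checks both patterns against the sample lines. The lines are rebuilt with 6 and 3 spaces, as the question's comments describe.

```python
import re

THREE_THEN_NONSPACE = re.compile(r"^([0-9]+)(\s{0,3})([^ ]{3})([0-9]*)$")
NO_EXTRA_SPACE = re.compile(r"^([0-9]+)(\s{0,3})(?!\s+)([0-9]*)$")

lines = [
    "123      99999",  # 6 spaces: should not match
    "321      99999",
    "123   8888",      # 3 spaces: should match
    "321   8888",
    "1237777",
    "3217777",
]

for pattern in (THREE_THEN_NONSPACE, NO_EXTRA_SPACE):
    print([line for line in lines if pattern.match(line)])
```

Neither pattern checks for the 123/321 prefix. [0-9]+ accepts any leading digits, so a line such as 555   8888 would match too.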
Hope it helps.