Regex to remove strings in notepad++ that contains certain amount of numbers - regex

If I have the following lines, how to
leave only 10-11 digits long strings that starts with "04" or "05". Don't remove empty lines.
0409999999 012345678
012345678 0409999999
023456789 034566 0455555555
012345678 012345678
0299999999
so the lines above should then look like:
0409999999
0409999999
0455555555

I would suggest pattern
\b\d{0,9}|0[45]\d{8,9}\b
Explanation:
\b - word boundary
\d{0,9} - match up to 9 digits
| - alternation
0 - match 0 literally
[45] - match 4 or 5
Regex demo
EDIT
After update you can use [ \t]*(?!0[45]\d{8,9})\b\d+[ \t]*
The difference here is that it uses negative lookahead to assure that what is ahead is not 10-11 digit number starting with 04 or 05.
[ \t] are used to trim space and tabs.
Then you just need to replace it with empty string.

\b(?!04|05)\d{1,9}\b|\b\d{10,11}\b

Related

regex to extract housenumber plus addition

I'm looking for a regex that matches housenumbers combined with additions for all addresses below:
Breestraat 4
Breestraat 45
Breestraat 456
Dubbele Straat 4a
Dubbele Straat 4-a
5 meistraat 1a
5meistraat 12
5meistraat 12a
Teststraat 22-III
Now the following regex works, except in the first case. This is because the single digit housenummber is missed because of the first \d in the regex (which prevents a starting digit to be captured).
\d?.(\d+.+)$
regex to extract housenumber addition
I'm scratching my head how to get the housenumer '4' for the first line. so basically how to change the "skip starting digit" to "skip starting digit but let it have to result on the capturing group".
You can use
\d+\D*$
\d+\S*$
See the regex demo #1 and regex demo #2.
The pattern matches
\d+ - one or more digits
\D* - zero or more non-digit chars
\S* - zero or more non-whitespace chars
$ - end of string.
It's not perfectly clear what you are requesting precisely..
Anyway this is the pattern matching the house number at the end of the string:
\d+[-\da-zI]*$
https://regexr.com/6l0g7
Anyway I'm aware this is not a valid answer

Regex with wildcard search?

I created a Regex to check a string for the following situation:
first 4 chars are numbers
following by a point
following by 3 numbers
following by a point
following by 4 to 8 numbers or letters
ie: 1234.123.125B
My Regex: ^[0-9]{4}[.][0-9]{3}[.][0-9a-zA-Z]{4,8}$
But now I need a wildcard search: The Regex should also match if there is a '*' after the first 8 characters. For example:
1234.123.12* MATCH
1234.123* MATCH
1234.123.45B9* MATCH
1234.12* NO MATCH
1234.12345* NO MATCH
How can I add the wildcard search to my Regex?
Thank you
You may use this regex with alternation:
^\d{4}\.\d{3}(?:\*|\.[\da-zA-Z]{0,7}\*|\.[\da-zA-Z]{4,8})$
RegEx Demo
RegEx Details:
^: Start
\d{4}\.\d{3}: Match 4 digits + 1 dot + 3 digits
(?:\*|\.[\da-zA-Z]{0,7}\*|\.[\da-zA-Z]{4,8}): matches a single * OR a * after after a dot and 0 to 7 digits/letters OR match 4 to 8 digits/letters
$: End
My assumptions are that:
You don't allow wildcards to be mid-string
Nor do you want to allow wildcards after the full pattern (e.g.: 1234.123.12345678*).
So, alternatively you may possibily use something like:
^\d{4}\.\d{3}(?!.*\*.)(?![^*]{0,4}$)[.*][*\da-zA-Z]{0,8}$
See the online demo.
^ - Start string ancor.
\d{4}\.\d{3} - Four digits, a dot and another three digits.
(?!.*\*.) - Negative lookahead for zero or more characters followed by asterisk and another character other than newline.
(?![^*]{0,4}$) - Negative lookahead for zero to four characters other than asterisk before end string ancor.
[.*] - A literal dot or asterisk.
[*\da-zA-Z]{0,8} - Zero to eight characters from the character class.
$ - End string ancor.

Matching numbers with an optional delimiter in between

So i got this regex code /[0-3][0-9][0-1][1-9]\d{2}[-\s]\d{4}?[^0-9]*|[0-3][0-9][0-1][1-9]\d{2}\d{4}/
This regex code take this kind of numbers:
1002821187
100282 1187
100282-1187
But i found out i dont want the numbers: 1002821187
So is it possible to make 1 regex code that only finds:
100282 1187
100282-1187
Your regex contains an alternation that matches the numbers with and without the space or -. You need to require that space or hyphen:
^[0-3][0-9][0-1][1-9][0-9]{2}[-\s][0-9]{4}$
^^^^^
See the regex demo. If you do not need to check for any boundaries, remove ^ and $ anchors that make the pattern match the whole string and use [0-3][0-9][0-1][1-9][0-9]{2}[-\s][0-9]{4}. Or use word boundaries to find whole words, \b[0-3][0-9][0-1][1-9][0-9]{2}[-\s][0-9]{4}\b.
Details
^ - start of string
[0-3] - a digit from 0 to 3
[0-9] - any digit
[0-1] (=[01]) - 0 or 1
[1-9] - any digit other than 0
[0-9]{2} - 2 digits
[-\s] - a - or whitespace
[0-9]{4} - 4 digits
$ - end of string.
Though you did not specify exactly what you are trying to do, here you can try the following regex :
\b\d{6}-\d{4}\b|\b\d{4}\b|\b\d{6}\b
Hope it helps.

Match all type of numbers

I need regular expression which extracts all numbers with different delimiters (single whitespace, comma, dot). Each number can use none or all of them.
Example:
text: 'numbers: 3.14 2 544 345,345.55 506 test 120 100 100'
output: '3.14', '2 544', '345,345.55', '506', '120 100 100'
I created re: \d+[(.|,|\s)\d+]+, but it not works properly.
I assume the numbers you need to extract are separated with 2 or more whitespaces, else it would be impossible to differentiate between the end of the previous number and the start of a new one.
If you need to extract the numbers in the formats as shown above, XXX XXX.XXX or XXX,XXX,XXX.XX or XXX or XXX XXX XXX, you may use
\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b
See the regex demo
Details:
\b - leading word boundary
\d{1,3} - 1 to 3 digits
(?:[, ]\d{3})* - 0+ sequences of a comma or space ([, ]) and 3 digits (\d{3})
(?:\.\d+)? - an optional sequence of a dot followed with 1+ digits
\b - trailing word boundary
A less restrictive pattern would be the same as above, but with limiting quantifiers replaced with a +:
\b\d+(?:[, ]\d+)*(?:\.\d+)?\b
See this regex demo
It will also match numbers like 1234566 and 124354354.343344.

RegEx with counting digits and allow special chars

I've done some searching but cant find the right regex.
i would like a regex for a text that only contains digits, whitespaces and plus signs.
like: [0-9 +]
But with a min/max limit for only the digits in that text.
My suggestions ended up with something like this:
^[0-9 \+](?=(.*[0-9]){5,8})$
Should be OK:
"123 456 7"
"12345"
"+ 123 456 78"
Should not be ok:
"123456789"
"+ 124 578a"
"+123456789"
Anyone got a solution that might do the trick?
Edit:
I can see that i was to short on my explanation what i'm aiming for.
My regex conditions should be:
Must include between 5-8 digits
Allow whitespaces and plus signs
I'm guessing from your own regex that between 5 and 8 digits in a row without a whitespace in between are allowed. If that's true, than the following regex might do the trick (example written in Python). It allows single digit groups being between 5 and 8 digits long. If there is more than one group, it allows each group to have exactly 3 digits except for the last group which can be between 1 and 3 digits long. One single plus sign on the left is optional.
Are you parsing phone numbers? :)
In [176]: regex = re.compile(r"""
^ # start of string
(?: \+\s )? # optional plus sign followed by whitespace
(?:
(?: \d{3}\s )+ # one or more groups of three digits followed by whitespace
\d{1,3} # one group of between one and three digits
| # ALTERNATIVE
\d{5,8} # one group of between five and eight digits
)
$ # end of string
""", flags=re.X)
# --- MATCHES ---
In [177]: regex.findall('123 456 7')
Out[177]: ['123 456 7']
In [178]: regex.findall('12345')
Out[178]: ['12345']
In [179]: regex.findall('+ 123 456 78')
Out[179]: ['+ 123 456 78']
In [200]: regex.findall('12345678')
Out[200]: ['12345678']
# --- NON-MATCHES ---
In [180]: regex.findall('123456789')
Out[180]: []
In [181]: regex.findall('+ 124 578a')
Out[181]: []
In [182]: regex.findall('+123456789')
Out[182]: []
In [198]: regex.findall('123')
Out[198]: []
In [24]: regex.findall('1234 556')
Out[24]: []
You can do something like this:
^(?:[ +]*[0-9]){5}(?:(?:[ +]*[0-9])?){3}$
See it here on Regexr
The first group (?:[ +]*[0-9]){5} are the 5 minimum digits, with any amount of spaces and plus before, the second part (?:(?:[ +]*[0-9])?){3} matches the optional digits, with any amount of spaces and plus before.
You were very close - you need to anchor the lookahead to the start of input, and add a second negative lookahead for the upper bound of the quantity of digits:
^(?=(.*\d){5,8})(?!(.*\d){9,})[\d +]+$
Also, fyi you don't need to escape the plus sign within the character class, and [0-9] is \d