How to preg_replace all digits except specific numbers? - regex

I need to strip a string of everything except English letters, spaces and specific numbers (e.g. 18, 19 and 20 should be kept in the string).
Please help me adapt the regex /([^a-zA-Z\s])/ so that it keeps the specified numbers.

You can list the numbers that you want to keep between word boundaries, for example, and then use (*SKIP)(*F) so that those matches are skipped and everything else is removed:
\b(?:1[89]|20)\b(*SKIP)(*F)|[^a-zA-Z\s]+
Regex demo
$pattern = "/\b(?:1[89]|20)\b(*SKIP)(*F)|[^a-zA-Z\s]+/";
$s="test 18 119 19 50 20 ##$##%";
echo preg_replace($pattern, "", $s);
Output
test 18 19 20

Using the PREG_SPLIT_DELIM_CAPTURE option with preg_split:
$s="test 18 119 19 50 20 ##$##%";
echo implode('', preg_split('~\b(1[89]|20)\b|[^a-z\s]+~i', $s, -1, PREG_SPLIT_DELIM_CAPTURE));
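For readers outside PHP: Python's built-in re module does not support the (*SKIP)(*F) verbs, so a rough equivalent is to capture the numbers to keep and let a replacement callback put them back. This is only a sketch; the names KEEP_OR_STRIP and clean are made up for illustration.

```python
import re

# Group 1 captures the whole numbers to keep (18, 19, 20).
# The second alternative matches any run of characters that are
# neither English letters nor whitespace.
KEEP_OR_STRIP = re.compile(r"\b(1[89]|20)\b|[^a-zA-Z\s]+")


def clean(text: str) -> str:
    # Put group 1 back when it matched; otherwise drop the match.
    return KEEP_OR_STRIP.sub(lambda m: m.group(1) or "", text)


if __name__ == "__main__":
    print(clean("test 18 119 19 50 20 ##$##%"))
```

The spaces around removed tokens stay in place, so the result can contain doubled spaces; " ".join(clean(s).split()) would collapse them if that matters.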

Related

Regex match multiple numbers stop at string (word) despite more matches exist

Goal;
Match all variations of phone numbers with 8 digits + (optional) country code.
Stop match when "keyword" is found, even if more matches exist after the "keyword".
Need this in a one-liner and have tried a plethora of variations with lookahead/behind and negate [^keyword] but I am unable to understand how to achieve this.
Example of text;
abra 90998855
kadabra 04 94 84 54
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Ladida
keyword
I Want It To Stop Matching Here Or Right Before The "keyword"
more nice text with some matches
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Example of regex;
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})[^keyword]
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?!keyword)
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?=keyword)
-> This matches nothing
((\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?:(?!keyword))*)
-> This matches all numbers also below the keyword
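None of the attempts above can stop at the keyword, because each match is found independently and a lookahead placed at the end only inspects the text right after that one match. One possible workaround, sketched here in Python under the assumption that two steps are acceptable instead of a strict one-liner, is to cut the text at the keyword first and search only the part before it. The name phones_before is hypothetical; the pattern is the one from the question with [\s] shortened to \s.

```python
import re

# Phone pattern from the question ([\s] written as \s).
PHONE = re.compile(
    r"(\+\d{1,2})?\s?\(?\d{2,3}\)?\s?(\d{2})\s?(\d{2})?\s?(\d{2,3})"
)


def phones_before(text: str, keyword: str = "keyword") -> list[str]:
    # Keep only the text before the first occurrence of the keyword;
    # if the keyword is absent, partition returns the whole text as head.
    head, _, _ = text.partition(keyword)
    # Search the head only, so nothing after the keyword can match.
    return [m.group(0).strip() for m in PHONE.finditer(head)]


if __name__ == "__main__":
    sample = "abra 90998855\nkeyword\ncat 132 23 564"
    print(phones_before(sample))
```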

Match number ending in 1 except when ending in 11

I need to match any number ending in 1 except numbers ending in 11. I use awk. To illustrate, the correctly working lines are:
if ( max ~ /1$/ && max !~ /11$/ ) { print max }
or using regex:
if ( max ~ /[^1]1$|^1$/ ) { print max }
or a much slower variant of the same regex:
([^1]|^)1$
I actually suspect that just this one part (with a modification) should work somehow. It is nice, short and readable, does the job in far fewer steps than the combinations above, and works for all numbers with 2 or more digits, but it fails for 1 itself. I fixed that above, but would prefer a better solution (if there is one). I actually need it to work for 1 to 3 digit numbers, but would prefer not to limit it.
[^1]1$
As soon as I try quantifiers to fix it, it stops working correctly. It either starts picking up leading 1s (e.g. 1211 is matched and it should not be) or loses the single-digit number 1 as a match. Obviously, my problem lies in the fact that I must match the end of the number. How can I make a better regex?
Test cases:
Matching numbers are:
1
21
31
121
131
1021
Numbers ending in 11 must be skipped (not matched), for example:
11
111
211
1011
1211
Can't you just use arithmetic? I believe it is quicker than regex parsing.
If you know max is a number:
if ( max%10 == 1 && max%100 != 11 ) { print max }
If you do not know max is a number:
if ( max+0==max && max%10 == 1 && max%100 != 11 ) { print max }
If you want a regex, you can use ^[0-9]*[02-9]1$|^1$ but this is just an extension of RavinderSingh13's answer to make sure it is a number.
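To check that the arithmetic test and the regex agree on the test cases from the question, here is a small Python sketch (not awk). The helper names are invented for illustration.

```python
import re

# Regex from the answer above: any digits, then a digit other than 1,
# then a final 1; or the single digit 1 on its own.
ENDS_IN_1_NOT_11 = re.compile(r"^[0-9]*[02-9]1$|^1$")


def matches_by_modulo(n: int) -> bool:
    # Last digit is 1, but the last two digits are not 11.
    return n % 10 == 1 and n % 100 != 11


def matches_by_regex(n: int) -> bool:
    return ENDS_IN_1_NOT_11.search(str(n)) is not None


if __name__ == "__main__":
    for n in [1, 21, 31, 121, 131, 1021, 11, 111, 211, 1011, 1211]:
        print(n, matches_by_modulo(n), matches_by_regex(n))
```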
If your Input_file is the same as the sample shown, then the following awk may help you here.
awk '/[02-9]1$/||/^1$/' Input_file
Let's say the following is the sample Input_file.
cat Input_file
1
2001
21
31
121
131
1021
11
111
211
1011
1211
Then the following will be the output after running the code.
awk '/[02-9]1$/||/^1$/' Input_file
1
2001
21
31
121
131
1021

Regex split on white space except ones with a colon

I have the following string
s = "hiack: 18 seqno: 37 cwnd: 20.000 ssthresh: 200 dupacks: 0"
I would like to use a regex to split that line up so it becomes
s = ['hiack: 18' 'seqno: 37' 'cwnd: 20.000' 'ssthresh: 200' 'dupacks: 0']
What regex pattern should I use to achieve this?
Edit: I am using Python, in case that makes a difference.
% nodejs
> s = "hiack: 18 seqno: 37 cwnd: 20.000 ssthresh: 200 dupacks: 0"
'hiack: 18 seqno: 37 cwnd: 20.000 ssthresh: 200 dupacks: 0'
> s.split(/\b\s+\b/)
[ 'hiack: 18',
'seqno: 37',
'cwnd: 20.000',
'ssthresh: 200',
'dupacks: 0' ]
>
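Even if you use something other than JS, you can reuse the regex.
It works by using \b, i.e. word boundaries: it splits only on whitespace that has a word character on both sides, so the space after each colon is left alone.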
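Since the question mentions Python, the same pattern can be tried with re.split from the standard library. A minimal sketch; the function name split_pairs is made up here.

```python
import re


def split_pairs(line: str) -> list[str]:
    # Split only on whitespace that sits between two word characters.
    # The space after a colon has no word boundary before it,
    # so "hiack: 18" stays together.
    return re.split(r"\b\s+\b", line)


if __name__ == "__main__":
    s = "hiack: 18 seqno: 37 cwnd: 20.000 ssthresh: 200 dupacks: 0"
    print(split_pairs(s))
```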

Perl: difficulties with regular expression

I have a problem writing a regular expression in Perl; maybe someone can help me.
Input strings:
bss/216476/29/52/9___\000243477___agt-1319.jpg
bss/216476/29/52/9___\000243477___agt-1319_1.jpg
bss/216476/29/52/9___\000243477___agt-1319_2.jpg
What I expect to get:
29 52 9 1319 or 29 52 9 1319 0
29 52 9 1319 1
29 52 9 1319 2
My regex works only with the last 2 strings:
/\/(\d{2})\/(\d{2})\/(\d+).*-(\d+)_(\d{1})/
As you can see, the first line has no picture number such as _0.jpg, and that is the problem.
I tried to write a regex like
/\/(\d{2})\/(\d{2})\/(\d+).*-((\d+)_(\d{1}))|(\d+)/
but it looks like I'm wrong.
Thank you for your help.
Use a non capturing group (?:...) and a ? to make it optional:
/\/(\d{2})\/(\d{2})\/(\d+).*-(\d+)(?:_(\d{1}))?/
It can also clean up your regex somewhat if you use a different delimiter in cases where you need to include a slash. Additionally, you can use the /x modifier so that you can include spacing for readability:
use strict;
use warnings;
while (<DATA>) {
if (m{ / (\d{2}) / (\d{2}) / (\d+) .*- (\d+) (?:_(\d{1}))? }x) {
print join(" ", map {$_//''} ($1, $2, $3, $4, $5)), "\n";
}
}
__DATA__
bss/216476/29/52/9___\000243477___agt-1319.jpg
bss/216476/29/52/9___\000243477___agt-1319_1.jpg
bss/216476/29/52/9___\000243477___agt-1319_2.jpg
Outputs:
29 52 9 1319
29 52 9 1319 1
29 52 9 1319 2
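The same idea, an optional non-capturing group wrapped around the last capture, carries over to other regex flavours. A short Python sketch for comparison; the function name parse_path is hypothetical, and a missing picture number comes back as None rather than an empty string.

```python
import re

# Same structure as the Perl pattern: (?:_(\d))? makes the picture
# number optional while still capturing the digit when it is present.
PATH = re.compile(r"/(\d{2})/(\d{2})/(\d+).*-(\d+)(?:_(\d))?")


def parse_path(path: str):
    m = PATH.search(path)
    # groups() yields None for an optional group that did not take part.
    return m.groups() if m else None


if __name__ == "__main__":
    for p in [
        r"bss/216476/29/52/9___\000243477___agt-1319.jpg",
        r"bss/216476/29/52/9___\000243477___agt-1319_1.jpg",
    ]:
        print(parse_path(p))
```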

Regular expression for matching numbers and ranges of numbers

In an application I have the need to validate a string entered by the user.
One number
OR
a range (two numbers separated by a '-')
OR
a list of comma separated numbers and/or ranges
AND
any number must be between 1 and 999999.
A space is allowed before and after a comma and or '-'.
I thought the following regular expression would do it.
(\d{1,6}\040?(,|-)?\040?){1,}
This matches the following (which is excellent). (\040 in the regular expression is the character for space).
00001
12
20,21,22
100-200
1,2-9,11-12
20, 21, 22
100 - 200
1, 2 - 9, 11 - 12
However, I also get a match on:
!!!12
What am I missing here?
You need to anchor your regex:
^(\d{1,6}\040?(,|-)?\040?){1,}$
Otherwise you will get a partial match on "!!!12": the regex matches only the trailing digits and ignores the rest.
See it here on Regexr
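The difference between a partial match and an anchored match can be reproduced in Python, where re.search finds the pattern anywhere and re.fullmatch requires the whole string to match (roughly what ^...$ does here). This is only a sketch using the pattern from the question unchanged; the name is_valid is made up, and the check does not enforce the 1 to 999999 range.

```python
import re

# Pattern from the question; \040 is the octal escape for a space.
LIST_PATTERN = re.compile(r"(\d{1,6}\040?(,|-)?\040?){1,}")


def is_valid(text: str) -> bool:
    # fullmatch succeeds only if the entire string fits the pattern.
    return LIST_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    # Unanchored search still finds the trailing "12" in "!!!12".
    print(LIST_PATTERN.search("!!!12") is not None)
    print(is_valid("!!!12"))
    print(is_valid("1, 2 - 9, 11 - 12"))
```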
/\d*[-]?\d*/
I have tested this with Perl:
> cat temp
00001
12
20,21,22
100-200
1,2-9,11-12
20, 21, 22
100-200
1, 2-9, 11-12
> perl -lne 'push @a,/\d*[-]?\d*/g;END{print "@a"}' temp
00001 12 20 21 22 100-200 1 2-9 11-12 20 21 22 100-200 1 2-9 11-12
As the result above shows, all the regex matches are collected in an array, and the array elements are printed at the end.