Match number ending in 1 except when ending in 11 - regex

I need to match any number ending in 1 except numbers ending in 11. I use awk. To illustrate, the correctly working lines are:
if ( max ~ /1$/ && max !~ /11$/ ) { print max }
or using regex:
if ( max ~ /[^1]1$|^1$/ ) { print max }
or a much slower variant of the same regex:
([^1]|^)1$
I actually suspect that just this one part (with a modification) should work somehow. It is nice, short and readable, does the job in far fewer steps than the combos above, and works for all numbers with 2 or more digits, but it fails for 1 itself. I fixed that above, but would prefer a better regex (if there is one). I actually need it to work for 1- to 3-digit numbers, but would prefer not to limit it.
[^1]1$
As soon as I try quantifiers to fix it, it stops working correctly. It either starts picking up leading 1s (e.g. 1211 is matched and it should not be) or loses the single-digit number 1 as a match. Obviously, my problem lies in the fact that I must match the end of the number. How can I make a better regex?
Test cases:
Matching numbers are:
1
21
31
121
131
1021
skip (not match) numbers ending in 11 like:
11
111
211
1011
1211

Can't you just use arithmetic? I believe it is quicker than regex parsing:
If you know max is a number:
if ( max%10 == 1 && max%100 != 11 ) { print max }
If you do not know max is a number:
if ( max+0==max && max%10 == 1 && max%100 != 11 ) { print max }
If you want a regex, you can use ^[0-9]*[02-9]1$|^1$ but this is just an extension of RavinderSingh13's answer to make sure it is a number.
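To compare the approaches above side by side, here is a minimal sketch in Python (used here only for illustration; the question itself is about awk). The function names are hypothetical. The lookbehind variant `(?<!1)1$` is an assumption added for comparison: it works in Python's re module, but awk's extended regular expressions have no lookbehind, so it is not a drop-in answer for awk.

```python
import re

# Regex from the question: a non-1 character (or the start) before the final 1.
ALTERNATION = re.compile(r"(?:[^1]|^)1$")
# Assumption: negative lookbehind, available in Python's re but not in awk.
LOOKBEHIND = re.compile(r"(?<!1)1$")


def matches_alternation(text):
    return ALTERNATION.search(text) is not None


def matches_lookbehind(text):
    return LOOKBEHIND.search(text) is not None


def matches_arithmetic(number):
    # Same test as the awk one-liner: last digit is 1, last two digits are not 11.
    return number % 10 == 1 and number % 100 != 11


MATCHING = ["1", "21", "31", "121", "131", "1021"]
SKIPPED = ["11", "111", "211", "1011", "1211"]

for value in MATCHING + SKIPPED:
    print(value,
          matches_alternation(value),
          matches_lookbehind(value),
          matches_arithmetic(int(value)))
```

All three checks should agree on the test cases listed in the question.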

If your Input_file is the same as the sample shown, then the following awk may help you here.
awk '/[02-9]1$/||/^1$/' Input_file
Let's say the following is the sample Input_file.
cat Input_file
1
2001
21
31
121
131
1021
11
111
211
1011
1211
Then the following will be the output after running the code.
awk '/[02-9]1$/||/^1$/' Input_file
1
2001
21
31
121
131
1021

Related

How to preg_replace all digits except specific numbers?

It is necessary to clean a string from everything but English letters, spaces and specific numbers (eg 18,19,20 should be kept in the string).
Please help me with regex /([^a-zA-Z\s])/ to keep the specified numbers.
You can list the numbers that you want to keep between word boundaries for example and then make use of SKIP FAIL:
\b(?:1[89]|20)\b(*SKIP)(*F)|[^a-zA-Z\s]+
Regex demo
$pattern = "/\b(?:1[89]|20)\b(*SKIP)(*F)|[^a-zA-Z\s]+/";
$s="test 18 119 19 50 20 ##$##%";
echo preg_replace($pattern, "", $s);
Output
test 18 19 20
Using the PREG_SPLIT_DELIM_CAPTURE option with preg_split:
$s="test 18 119 19 50 20 ##$##%";
echo implode('', preg_split('~\b(1[89]|20)\b|[^a-z\s]+~', $s, -1, PREG_SPLIT_DELIM_CAPTURE));
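For engines without `(*SKIP)(*F)` (Python's standard re module is one), the same idea can be sketched by capturing what should be kept and returning it from the replacement. This is an illustrative adaptation, not the PHP answers' code; the function name is hypothetical.

```python
import re

# Alternative 1 captures a number we want to keep; alternative 2 matches
# any run of characters that are neither letters nor whitespace.
KEEP_OR_STRIP = re.compile(r"\b(1[89]|20)\b|[^a-zA-Z\s]+")


def strip_except_kept_numbers(text):
    # Put back group 1 when it matched; otherwise delete the match.
    return KEEP_OR_STRIP.sub(lambda m: m.group(1) or "", text)


print(repr(strip_except_kept_numbers("test 18 119 19 50 20 ##$##%")))
```

Only the matched runs are deleted, so the spaces that surrounded removed tokens stay in the result.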

Match phone numbers with lengths between 8-16 digits, ignoring ()+-

Consider the following:
+12 34 456 432
(12) 34 567 124
1234 56 78 90
(1234) 567 890
1234-567-890
1234 - 567 - 890
12 34 56 78
12-34-56-78
Assume these are all valid phone number structures
Can a regex be used to express: find at least 8 digits, but not more than 16, and ignore spaces, round brackets, the plus symbol (once) and the minus?
My current working sample is a mess:
^([\+|\(]{1,2})?+(\d{2,4})+([ |-|\)]{1,2})?+(\d{2,3})+([ |-]{1})?+(\d{2,3})+([ |-]{1})?+(\d{2,3})?$
Even if phone number validation is recommended against, is there not a simpler regex syntax for these things?
To just account for the number of digits and ignore the -, ), ( or spaces (allowing a + at the beginning), you can use the following regex:
^\+?(?:[ ()-]*\d){8,16}$
It matches
^ - start of string
\+? - one or zero +
(?:[ ()-]*\d){8,16} - 8 to 16 sequences of...
[ ()-]* - 0 or more -, ), ( or a space characters
\d - a digit
$ - end of string
See the regex demo
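As a quick check of the pattern explained above, here is a small Python sketch that runs it against the sample numbers from the question. The helper name and the two rejected inputs are made up for illustration.

```python
import re

# Optional leading +, then 8 to 16 digits, each optionally preceded by
# spaces, round brackets or hyphens.
PHONE = re.compile(r"^\+?(?:[ ()-]*\d){8,16}$")


def looks_like_phone(text):
    return PHONE.match(text) is not None


SAMPLES = [
    "+12 34 456 432",
    "(12) 34 567 124",
    "1234 56 78 90",
    "(1234) 567 890",
    "1234-567-890",
    "1234 - 567 - 890",
    "12 34 56 78",
    "12-34-56-78",
]

for sample in SAMPLES:
    print(sample, looks_like_phone(sample))

print(looks_like_phone("1234 567"))           # only 7 digits
print(looks_like_phone("12345678901234567"))  # 17 digits
```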
This may ease your task.
First, remove everything that is not a number:
myString = myString.replace(/\D/g,'');
You'll get this:
1234456432
1234567124
1234567890
1234567890
1234567890
1234567890
12345678
12345678
Then just check for length:
if(myString.length >= 8 && myString.length <=16)
// Do stuff
Using preg_replace, keep only the digits, then check for a valid length:
<?php
$ph = "(12) 34 567 124";
$len = strlen(preg_replace('/[^0-9]+/', '', $ph));
if($len >=8 && $len <=16)
echo "Valid";
else
echo "Invalid";
Don't even think about it. Phone numbers are complicated. They are hugely complicated. Google has a decent library to handle phone numbers named libphonenumber.
And excuse me, but ignoring the "+" makes whatever you are doing totally, absolutely wrong. A plus is followed by the country code of some country, followed by a local phone number within that country (which needs to be parsed according to the rules of that country, and there are about 200). Without the "+", you have a phone number according to the local rules, and you need to find out which local rules apply. Which means your number can start with a code for dialing a foreign exchange instead of the "+", otherwise it is formatted according to local rules.
As a result, a number may be valid with the "+" and invalid without it or vice versa, and most likely refers to a different actual phone in totally different countries with or without the "+".

regex - if the first digit is 1 return 1 but if it is 145 return 145 but if its 133 return 133

here is my regex demo
as the question states:
if the first digit is 1 return 1 but if it is 145 return 145 but if its 133 return 133
sample data:
K'8134567
K'81345678
K'6134516789
K'61345678
K'643456
K'646345678
K'1234567890
K'12345678901
K'1454567890 <<<--- want 145 returned and not 1
K'13345678901 <<<--- want 133 returned and not 1
K'3214567890123
K'32134567890123
K'3654567890123
K'8934567890123
K'6554567890123
regex expression:
K'(?|(?P<name1>81)\d+|(61)\d+|(64)\d+|(1)\d+|(44)\d+|(86)\d+|(678)\d+|(41)\d+|(49)\d+|(33)\d+|(685)\d+|(\d{1,3})\d+)
the regex explained:
I am interested in the digits after K'
I am looking to do this using regex but not sure if it can be done.
What I want is:
if the number starts with 81 return 81
if the number starts with 61 return 61
...
if the number starts with something i am not interested in return other(or its first digits of 1-3)
The above criteria works:
but my question is how do I do the following:
if the first digit is 1 then return 1 BUT
if the first digit is 1 and the 2nd and 3rd digits are 45 return 145 and don't return just 1
if the first digit is 1 and the 2nd and 3rd digits are 33 return 133 and don't return just 1
I presume I have to put something inside this part of the regex |(1)\d+|
Some other questions for my own reference:
Does regex sort the data first?
Is the order of the regex search important to how it is implemented? Ideally I do not want this.
You can use this regex:
K'(?P<name1>81|61|64|44|86|678|41|49|33|685|1(?:33|45)?|\d{2,3})\d+
Updated RegEx Demo
Try with:
K'(?|(?P<name1>81)\d+|(61)\d+|(64)\d+|(1(?:45|33)?)\d+|(44)\d+|(86)\d+|(678)\d+|(41)\d+|(49)\d+|(33)\d+|(685)\d+|(\d{1,3})\d+)
DEMO
Regex doesn't sort anything, but the order of the alternatives in your regex is important. The details depend on your regex engine, but since most regex engines use a traditional NFA to match the string, the order matters.
In this case you can simply use the following regex, or add it to your own:
(?<=K')1(?:45|33)?
See demo https://regex101.com/r/rT2yJ0/1
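To illustrate why the order of alternatives matters, here is a small Python sketch. The `NAIVE` pattern is a made-up counterexample; `ORDERED` is the single-group regex from the first answer above, and the helper name is hypothetical.

```python
import re

# Counterexample: "1" is tried first and succeeds, so 145 and 133 are never reached.
NAIVE = re.compile(r"K'(1|145|133)\d+")
# Regex from the first answer: the optional suffix lets 1 grow to 133 or 145.
ORDERED = re.compile(
    r"K'(?P<name1>81|61|64|44|86|678|41|49|33|685|1(?:33|45)?|\d{2,3})\d+"
)


def leading_code(pattern, text):
    match = pattern.match(text)
    return match.group(1) if match else None


for line in ["K'8134567", "K'1234567890", "K'1454567890",
             "K'13345678901", "K'3214567890123"]:
    print(line, leading_code(NAIVE, line), leading_code(ORDERED, line))
```

With a backtracking engine, the first alternative that lets the overall match succeed wins, which is why the longer prefixes have to be reachable before the bare 1 is accepted.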

Regular expression for matching numbers and ranges of numbers

In an application I have the need to validate a string entered by the user.
One number
OR
a range (two numbers separated by a '-')
OR
a list of comma separated numbers and/or ranges
AND
any number must be between 1 and 999999.
A space is allowed before and after a comma and or '-'.
I thought the following regular expression would do it.
(\d{1,6}\040?(,|-)?\040?){1,}
This matches the following (which is excellent). (\040 in the regular expression is the character for space).
00001
12
20,21,22
100-200
1,2-9,11-12
20, 21, 22
100 - 200
1, 2 - 9, 11 - 12
However, I also get a match on:
!!!12
What am I missing here?
You need to anchor your regex
^(\d{1,6}\040?(,|-)?\040?){1,}$
otherwise you will get a partial match on "!!!12", it matches only on the last digits.
See it here on Regexr
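The effect of anchoring can be sketched in Python as follows. A literal space stands in for `\040`, and the variable names are hypothetical.

```python
import re

# Body of the regex from the question, with a literal space instead of \040.
BODY = r"(\d{1,6} ?(,|-)? ?){1,}"
UNANCHORED = re.compile(BODY)
ANCHORED = re.compile("^" + BODY + "$")

VALID = [
    "00001", "12", "20,21,22", "100-200", "1,2-9,11-12",
    "20, 21, 22", "100 - 200", "1, 2 - 9, 11 - 12",
]

for text in VALID:
    print(text, ANCHORED.search(text) is not None)

# Without anchors the engine is free to skip the leading "!!!".
print(UNANCHORED.search("!!!12").group(0))
print(ANCHORED.search("!!!12"))
```

Note that anchoring alone does not enforce the 1 to 999999 range: because the group repeats, an input such as 1234567 still matches as 123456 followed by 7, so a separate numeric check may be needed.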
/\d*[-]?\d*/
i have tested this with perl:
> cat temp
00001
12
20,21,22
100-200
1,2-9,11-12
20, 21, 22
100-200
1, 2-9, 11-12
> perl -lne 'push @a,/\d*[-]?\d*/g;END{print "@a"}' temp
00001 12 20 21 22 100-200 1 2-9 11-12 20 21 22 100-200 1 2-9 11-12
The command above collects all the regex matches in an array and prints the array elements at the end.

Is it possible to increment numbers using regex substitution?

Is it possible to increment numbers using regex substitution? Not using evaluated/function-based substitution, of course.
This question was inspired by another one, where the asker wanted to increment numbers in a text editor. There are probably more text editors that support regex substitution than ones that support full-on scripting, so a regex might be convenient to float around, if one exists.
Also, often I've learned neat things from clever solutions to practically useless problems, so I'm curious.
Assume we're only talking about non-negative decimal integers, i.e. \d+.
Is it possible in a single substitution? Or, a finite number of substitutions?
If not, is it at least possible given an upper bound, e.g. numbers up to 9999?
Of course it's doable given a while-loop (substituting while matched), but we're going for a loopless solution here.
This question's topic amused me for one particular implementation I did earlier. My solution happens to be two substitutions so I'll post it.
My implementation environment is solaris, full example:
echo "0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909" |
perl -pe 's/\b([0-9]+)\b/0$1~01234567890/g' |
perl -pe 's/\b0(?!9*~)|([0-9])(?=9*~[0-9]*?\1([0-9]))|~[0-9]*/$2/g'
1 2 3 4 8 9 10 11 20 100 110 200 910 1000 1100 1910
Pulling it apart for explanation:
s/\b([0-9]+)\b/0$1~01234567890/g
For each number (#) replace it with 0#~01234567890. The first 0 is in case rounding 9 to 10 is needed. The 01234567890 block is for incrementing. The example text for "9 10" is:
09~01234567890 010~01234567890
The individual pieces of the next regex can be described separately; they are joined via pipes to reduce the substitution count:
s/\b0(?!9*~)/$2/g
Select the "0" digit in front of all numbers that do not need rounding and discard it.
s/([0-9])(?=9*~[0-9]*?\1([0-9]))/$2/g
(?=) is positive lookahead, \1 is match group #1. So this means match all digits that are followed by 9s until the '~' mark then go to the lookup table and find the digit following this number. Replace with the next digit in the lookup table. Thus "09~" becomes "19~" then "10~" as the regex engine parses the number.
s/~[0-9]*/$2/g
This regex deletes the ~ lookup table.
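For readers without Perl at hand, the two substitutions can be sketched in Python. This is an illustrative port, not the answer's original code: the function name is hypothetical, and a small lambda stands in for `$2` so that an unmatched group becomes an empty string.

```python
import re


def increment_two_pass(text):
    # Pass 1: wrap each number N as 0N~01234567890 (carry digit + lookup table).
    prepared = re.sub(r"\b([0-9]+)\b", r"0\g<1>~01234567890", text)
    # Pass 2: three alternatives joined with |
    #   - drop the leading 0 when no carry reaches it
    #   - replace a digit followed only by 9s with its successor from the table
    #   - delete the ~ and the lookup table
    pattern = r"\b0(?!9*~)|([0-9])(?=9*~[0-9]*?\1([0-9]))|~[0-9]*"
    return re.sub(pattern, lambda m: m.group(2) or "", prepared)


print(increment_two_pass("0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909"))
```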
Wow, turns out it is possible (albeit ugly)!
In case you do not have the time or cannot be bothered to read through the whole explanation, here is the code that does it:
$str = '0 1 2 3 4 5 6 7 8 9 10 11 12 13 19 20 29 99 100 139';
$str = preg_replace("/\d+/", "$0~", $str);
$str = preg_replace("/$/", "#123456789~0", $str);
do
{
$str = preg_replace(
"/(?|0~(.*#.*(1))|1~(.*#.*(2))|2~(.*#.*(3))|3~(.*#.*(4))|4~(.*#.*(5))|5~(.*#.*(6))|6~(.*#.*(7))|7~(.*#.*(8))|8~(.*#.*(9))|9~(.*#.*(~0))|~(.*#.*(1)))/s",
"$2$1",
$str, -1, $count);
} while($count);
$str = preg_replace("/#123456789~0$/", "", $str);
echo $str;
Now let's get started.
So first of all, as the others mentioned, it is not possible in a single replacement, even if you loop it (because how would you insert the corresponding increment to a single digit). But if you prepare the string first, there is a single replacement that can be looped. Here is my demo implementation using PHP.
I used this test string:
$str = '0 1 2 3 4 5 6 7 8 9 10 11 12 13 19 20 29 99 100 139';
First of all, let's mark all digits we want to increment by appending a marker character (I use ~, but you should probably use some crazy Unicode character or ASCII character sequence that definitely will not occur in your target string).
$str = preg_replace("/\d+/", "$0~", $str);
Since we will be replacing one digit per number at a time (from right to left), we will just add that marking character after every full number.
Now here comes the main hack. We add a little 'lookup' to the end of our string (also delimited with a unique character that does not occur in your string; for simplicity I used #).
$str = preg_replace("/$/", "#123456789~0", $str);
We will use this to replace digits by their corresponding successors.
Now comes the loop:
do
{
$str = preg_replace(
"/(?|0~(.*#.*(1))|1~(.*#.*(2))|2~(.*#.*(3))|3~(.*#.*(4))|4~(.*#.*(5))|5~(.*#.*(6))|6~(.*#.*(7))|7~(.*#.*(8))|8~(.*#.*(9))|9~(.*#.*(~0))|(?<!\d)~(.*#.*(1)))/s",
"$2$1",
$str, -1, $count);
} while($count);
Okay, what is going on? The matching pattern has one alternative for every possible digit. This maps digits to successors. Take the first alternative for example:
0~(.*#.*(1))
This will match any 0 followed by our increment marker ~, then it matches everything up to our cheat-delimiter and the corresponding successor (that is why we put every digit there). If you glance at the replacement, this will get replaced by $2$1 (which will then be 1 and then everything we matched after the ~ to put it back in place). Note that we drop the ~ in the process. Incrementing a digit from 0 to 1 is enough. The number was successfully incremented, there is no carry-over.
The next 8 alternatives are exactly the same for the digits 1 to 8. Then we take care of two special cases.
9~(.*#.*(~0))
When we replace the 9, we do not drop the increment marker, but place it to the left of the resulting 0 instead. This (combined with the surrounding loop) is enough to implement carry-over propagation. Now there is one special case left. For all numbers consisting solely of 9s we will end up with the ~ in front of the number. That is what the last alternative is for:
(?<!\d)~(.*#.*(1))
If we encounter a ~ that is not preceded by a digit (therefore the negative lookbehind), it must have been carried all the way through a number, and thus we simply replace it with a 1. I think we do not even need the negative lookbehind (because this is the last alternative that is checked), but it feels safer this way.
A short note on the (?|...) around the whole pattern. This makes sure that we always find the two matches of an alternative in the same references $1 and $2 (instead of ever larger numbers down the string).
Lastly, we add the DOTALL modifier (s), to make this work with strings that contain line breaks (otherwise, only numbers in the last line will be incremented).
That makes for a fairly simple replacement string. We simply first write $2 (in which we captured the successor, and possibly the carry-over marker), and then we put everything else we matched back in place with $1.
That's it! We just need to remove our hack from the end of the string, and we're done:
$str = preg_replace("/#123456789~0$/", "", $str);
echo $str;
> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 30 100 101 140
So we can do this entirely in regular expressions. And the only loop we have always uses the same regex. I believe this is as close as we can get without using preg_replace_callback().
Of course, this will do horrible things if we have numbers with decimal points in our string. But that could probably be taken care of by the very first preparation-replacement.
Update: I just realised, that this approach immediately extends to arbitrary increments (not just +1). Simply change the first replacement. The number of ~ you append equals the increment you apply to all numbers. So
$str = preg_replace("/\d+/", "$0~~~", $str);
would increment every integer in the string by 3.
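The looped replacement can also be sketched in Python, with two caveats that make this an adaptation, not a literal port. Python's re module has no branch-reset group `(?|...)`, so the eleven alternatives are generated in a loop, and a small callback emulates the `$2$1` replacement by picking the two groups that took part in the match. The names `build_pattern`, `swap` and `increment_looped` are hypothetical.

```python
import re

LOOKUP = "#123456789~0"


def build_pattern():
    # One alternative per digit 0-8, mapping it to its successor in the lookup.
    alternatives = [rf"{d}~(.*#.*({d + 1}))" for d in range(9)]
    # 9 becomes 0 and the marker moves to the left (carry-over).
    alternatives.append(r"9~(.*#.*(~0))")
    # A marker carried past the first digit becomes a leading 1.
    alternatives.append(r"(?<!\d)~(.*#.*(1))")
    return re.compile("|".join(alternatives), re.S)


PATTERN = build_pattern()


def swap(match):
    # Emulates "$2$1": exactly two groups are set, the tail and the successor.
    rest, successor = [g for g in match.groups() if g is not None]
    return successor + rest


def increment_looped(text, step=1):
    # Append one ~ per unit of increment, then the lookup table.
    marked = re.sub(r"\d+", lambda m: m.group(0) + "~" * step, text) + LOOKUP
    while True:
        marked, count = PATTERN.subn(swap, marked)
        if count == 0:
            break
    return marked[: -len(LOOKUP)]


print(increment_looped("0 1 2 3 4 5 6 7 8 9 10 11 12 13 19 20 29 99 100 139"))
print(increment_looped("8 99", step=3))
```

Because each match runs from the marked digit to the lookup table, every pass performs a single replacement, so the loop runs once per digit change.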
I managed to get it working in 3 substitutions (no loops).
tl;dr
s/$/ ~0123456789/
s/(?=\d)(?:([0-8])(?=.*\1(\d)\d*$)|(?=.*(1)))(?:(9+)(?=.*(~))|)(?!\d)/$2$3$4$5/g
s/9(?=9*~)(?=.*(0))|~| ~0123456789$/$1/g
Explanation
Let ~ be a special character not expected to appear anywhere in the text.
If a character is nowhere to be found in the text, then there's no way to make it appear magically. So first we insert the characters we care about at the very end.
s/$/ ~0123456789/
For example,
0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909
becomes:
0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909 ~0123456789
Next, for each number, we (1) increment the last non-9 (or prepend a 1 if all are 9s), and (2) "mark" each trailing group of 9s.
s/(?=\d)(?:([0-8])(?=.*\1(\d)\d*$)|(?=.*(1)))(?:(9+)(?=.*(~))|)(?!\d)/$2$3$4$5/g
For example, our example becomes:
1 2 3 4 8 9 19~ 11 29~ 199~ 119~ 299~ 919~ 1999~ 1199~ 1919~ ~0123456789
Finally, we (1) replace each "marked" group of 9s with 0s, (2) remove the ~s, and (3) remove the character set at the end.
s/9(?=9*~)(?=.*(0))|~| ~0123456789$/$1/g
For example, our example becomes:
1 2 3 4 8 9 10 11 20 100 110 200 910 1000 1100 1910
PHP Example
$str = '0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909';
echo $str . '<br/>';
$str = preg_replace('/$/', ' ~0123456789', $str);
echo $str . '<br/>';
$str = preg_replace('/(?=\d)(?:([0-8])(?=.*\1(\d)\d*$)|(?=.*(1)))(?:(9+)(?=.*(~))|)(?!\d)/', '$2$3$4$5', $str);
echo $str . '<br/>';
$str = preg_replace('/9(?=9*~)(?=.*(0))|~| ~0123456789$/', '$1', $str);
echo $str . '<br/>';
Output:
0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909
0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909 ~0123456789
1 2 3 4 8 9 19~ 11 29~ 199~ 119~ 299~ 919~ 1999~ 1199~ 1919~ ~0123456789
1 2 3 4 8 9 10 11 20 100 110 200 910 1000 1100 1910
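The same three steps can be sketched in Python. This is an illustrative port under a few assumptions: the helper names are hypothetical, the first step uses plain concatenation in place of `s/$/ ~0123456789/`, and a helper joins the listed groups so that unmatched ones count as empty strings.

```python
import re

LOOKUP = " ~0123456789"
# Step 2: bump the last non-9 digit (or prepend 1) and mark trailing 9s with ~.
BUMP = re.compile(
    r"(?=\d)(?:([0-8])(?=.*\1(\d)\d*$)|(?=.*(1)))(?:(9+)(?=.*(~))|)(?!\d)"
)
# Step 3: turn marked 9s into 0s, drop the markers, drop the lookup table.
CLEAN = re.compile(r"9(?=9*~)(?=.*(0))|~| ~0123456789$")


def join_groups(match, *indexes):
    # Unmatched groups contribute nothing, like $n in the answer's replacement.
    return "".join(match.group(i) or "" for i in indexes)


def increment_three_pass(text):
    step1 = text + LOOKUP
    step2 = BUMP.sub(lambda m: join_groups(m, 2, 3, 4, 5), step1)
    return CLEAN.sub(lambda m: join_groups(m, 1), step2)


print(increment_three_pass("0 1 2 3 7 8 9 10 19 99 109 199 909 999 1099 1909"))
```

As in the original, this assumes the input is a single line and does not contain ~.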
Is it possible in a single substitution?
No.
If not, is it at least possible in a single substitution given an upper bound, e.g. numbers up to 9999?
No.
You can't even replace the numbers between 0 and 8 with their respective successor. Once you have matched, and grouped this number:
/([0-8])/
you need to replace it. However, regex doesn't operate on numbers, but on strings. So you can replace the "number" (or better: digit) with twice this digit, but the regex engine does not know it is duplicating a string that holds a numerical value.
Even if you'd do something (silly) as this:
/(0)|(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)/
so that the regex engine "knows" that if group 1 is matched, the digit '0' is matched, it still cannot do a replacement. You can't instruct the regex engine to replace group 1 with the digit '1', group '2' with the digit '2', etc. Sure, some tools like PHP will let you define a couple of different patterns with corresponding replacement strings, but I get the impression that is not what you were thinking about.
It is not possible by regular expression search and substitution alone.
You have to use something else to help achieve that. You have to use the programming language at hand to increment the number.
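For contrast, this is what the evaluated, function-based substitution that the question rules out looks like in Python: the regex only locates the numbers and the host language does the arithmetic. The function name is hypothetical.

```python
import re


def increment_with_callback(text, step=1):
    # The callback converts each match to an int, adds the step, and
    # converts it back to a string.
    return re.sub(r"\d+", lambda m: str(int(m.group(0)) + step), text)


print(increment_with_callback("0 9 99 139"))
```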
Edit:
The regular expressions definition, as part of the Single Unix Specification, doesn't mention regular expressions supporting evaluation of arithmetic expressions or capabilities for performing arithmetic operations.
Nonetheless, I know some flavors (TextPad, an editor for Windows) allow you to use \i as a substitution term, which is an incremental counter of how many times the search string has been found, but it doesn't evaluate or parse found strings into a number, nor does it allow adding a number to it.
I have found a solution in two steps (Javascript) but it relies on indefinite lookaheads, which some regex engines reject:
const incrementAll = s =>
s.replaceAll(/(.+)/gm, "$1\n101234567890")
.replaceAll(/(?:([0-8]|(?<=\d)9)(?=9*[^\d])(?=.*\n\d*\1(\d)\d*$))|(?<!\d)9(?=9*[^\d])(?=(?:.|\n)*(10))|\n101234567890$/gm, "$2$3");
The key thing is to add a list of digits in order at the end of the string in the first step and, in the second, to find the location of the relevant digit and capture the digit to its right via a lookahead. There are two other branches in the second step, one for dealing with initial nines, and the other for removing the digit sequence.
Edit: I just tested it in Safari and it throws an error, but it definitely works in Firefox.
I needed to increment indices of output files by one from a pipeline I can't modify. After some searches I got a hit on this page. While the readings are meaningful, they really don't give a readable solution to the problem. Yes it is possible to do it with only regex; no it is not as comprehensible.
Here I would like to give a readable solution using Python, so that others don't need to reinvent the wheel. I can imagine many of you may have ended up with a similar solution.
The idea is to partition file name into three groups, and format your match string so that the incremented index is the middle group. Then it is possible to only increment the middle group, after which we piece the three groups together again.
import re
import sys
import argparse
from os import listdir
from os.path import isfile, join
def main():
    parser = argparse.ArgumentParser(description='index shift of input')
    parser.add_argument('-r', '--regex', type=str,
                        help='regex match string for the index to be shift')
    parser.add_argument('-i', '--indir', type=str,
                        help='input directory')
    parser.add_argument('-o', '--outdir', type=str,
                        help='output directory')
    args = parser.parse_args()
    # parse input regex string
    regex_str = args.regex
    regex = re.compile(regex_str)
    # target directories
    indir = args.indir
    outdir = args.outdir
    try:
        for input_fname in listdir(indir):
            input_fpath = join(indir, input_fname)
            if not isfile(input_fpath):  # not a file
                continue
            matched = regex.match(input_fname)
            if matched is None:  # not our target file
                continue
            # middle group is the index and we increment it
            index = int(matched.group(2)) + 1
            # reconstruct output
            output_fname = '{prev}{index}{after}'.format(**{
                'prev': matched.group(1),
                'index': str(index),
                'after': matched.group(3)
            })
            output_fpath = join(outdir, output_fname)
            # write the command required to stdout
            print('mv {i} {o}'.format(i=input_fpath, o=output_fpath))
    except BrokenPipeError:
        pass
if __name__ == '__main__':
    main()
I have this script named index_shift.py. To give an example of the usage, my files are named k0_run0.csv, for bootstrap runs of machine learning models using parameter k. The parameter k starts from zero, and the desired index map starts at one. First we prepare input and output directories to avoid overwriting files:
$ ls -1 test_in/ | head -n 5
k0_run0.csv
k0_run10.csv
k0_run11.csv
k0_run12.csv
k0_run13.csv
$ ls -1 test_out/
To see how the script works, just print its output:
$ python3 -u index_shift.py -r '(^k)(\d+?)(_run.+)' -i test_in -o test_out | head -n5
mv test_in/k6_run26.csv test_out/k7_run26.csv
mv test_in/k25_run11.csv test_out/k26_run11.csv
mv test_in/k7_run14.csv test_out/k8_run14.csv
mv test_in/k4_run25.csv test_out/k5_run25.csv
mv test_in/k1_run28.csv test_out/k2_run28.csv
It generates bash mv commands to rename the files. Now we pipe the lines directly into bash.
$ python3 -u index_shift.py -r '(^k)(\d+?)(_run.+)' -i test_in -o test_out | bash
Checking the output, we have successfully shifted the index by one.
$ ls test_out/k0_run0.csv
ls: cannot access 'test_out/k0_run0.csv': No such file or directory
$ ls test_out/k1_run0.csv
test_out/k1_run0.csv
You can also use cp instead of mv. My files are kinda big, so I wanted to avoid duplicating them. You can also refactor how many you shift as input argument. I didn't bother, cause shift by one is most of my use cases.
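The shift amount could be made a parameter along these lines. This is a minimal sketch of the three-group idea only, without the directory handling; the function name `shift_name` and its `step` argument are hypothetical.

```python
import re


def shift_name(filename, pattern, step=1):
    # The pattern must define three groups: prefix, index, suffix.
    matched = re.match(pattern, filename)
    if matched is None:  # not a target file
        return None
    index = int(matched.group(2)) + step
    return f"{matched.group(1)}{index}{matched.group(3)}"


PATTERN = r"(^k)(\d+?)(_run.+)"
print(shift_name("k0_run0.csv", PATTERN))
print(shift_name("k25_run11.csv", PATTERN, step=3))
```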