Bash regular expressions: Matching numbers 0-1000 [duplicate] - regex
I'm writing a script that executes scripts stored in a given directory, based on an array containing the filenames of the scripts.
Here's a section of my 'menu', just to clarify:
#######
Title: Test script 1
Description: Test script 1
To execute: 0
#######
Title: Test script 2
Description: Test script 2
To execute: 1
#######
I have an array named array that contains the names of the scripts with an index corresponding to the printed value under "to execute". Right now, I'm using a case statement to handle input and provide an exit option.
case $REPLY in
[Ee]) clear
exit;;
[[:digit:]] $scriptDirectory/${array[$REPLY]}
However, the [[:digit:]] expression only matches 0-9. I need a regex that works in the case statement that matches 0-999, or similar.
case only uses globs (aka filename expansion patterns), not regular expressions.
You can enable extended globbing with shopt -s extglob; then +(pattern) matches one or more occurrences of the pattern:
shopt -s extglob
case $REPLY in
[Ee]) clear
exit;;
+([[:digit:]])) $scriptDirectory/${array[$REPLY]};;
esac
Note: I added the missing ) after your second case pattern and the missing ;; at the end of that line, as well as the missing esac statement.
Update:
If you just want to match numbers between 0 and 999, try this:
[0-9]?([0-9])?([0-9])) $scriptDirectory/${array[$REPLY]};;
Character ranges are used here because I find them a bit more readable. The result is the same.
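For anyone who wants to try the 0-999 pattern outside the menu script, here is a minimal sketch. The helper name is_menu_index is made up for illustration, and extglob has to be switched on before bash parses the function that uses the ?(...) pattern:

```shell
#!/usr/bin/env bash
# Sketch only: is_menu_index is a hypothetical helper, not part of the original script.
shopt -s extglob   # must be enabled before the function below is parsed

is_menu_index() {
    case $1 in
        [0-9]?([0-9])?([0-9])) return 0 ;;   # one to three digits: 0-999
        *)                     return 1 ;;   # anything else, including the empty string
    esac
}

for candidate in 0 42 999 1000 12a; do
    if is_menu_index "$candidate"; then
        echo "'$candidate' accepted"
    else
        echo "'$candidate' rejected"
    fi
done
```

Wrapping the check in a function keeps the case statement in the menu itself short, but that is a matter of taste.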
The easiest way I've found in bash is:
^(1000|[0-9]{1,3})$
Using this regex with the =~ operator (which interprets the string on its right as an extended regular expression), you can construct a simple test (with your input as "$1"):
if [[ $1 =~ ^(1000|[0-9]{1,3})$ ]]; then
echo "0 <= $1 <= 1000"
else
echo "$1 - invalid number"
fi
Example Use/Output
$ for i in -10 -1 0 1 10 100 999 1000 1001; do ./zero1thousand.sh $i; done
-10 - invalid number
-1 - invalid number
0 <= 0 <= 1000
0 <= 1 <= 1000
0 <= 10 <= 1000
0 <= 100 <= 1000
0 <= 999 <= 1000
0 <= 1000 <= 1000
1001 - invalid number
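A possible variation, in case the upper bound changes later: check the shape with a regex and the range with arithmetic, instead of encoding the range in the pattern. This is only a sketch; in_range is a made-up name, and it assumes the input is short enough to fit in bash's integer type:

```shell
#!/usr/bin/env bash
# Sketch only: in_range is a hypothetical helper name.
in_range() {
    local n=$1
    [[ $n =~ ^[0-9]+$ ]] || return 1   # digits only, at least one
    (( 10#$n <= 1000 ))                # 10# forces base 10, so 0099 is not read as octal
}

for i in -1 0 0099 1000 1001; do
    if in_range "$i"; then
        echo "0 <= $i <= 1000"
    else
        echo "$i - invalid number"
    fi
done
```

Unlike the {1,3} pattern, this version accepts any number of leading zeros, which may or may not be what you want.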
In this simple case I suggest you use something like the following:
REPLY="$1" # I assumed there is an argument to the script
if [[ $1 =~ ^[[:digit:]]+$ ]]
then
    padded_REPLY=$(printf "%04d" "$REPLY")
    #echo "Padded reply : $padded_REPLY"
else
    padded_REPLY="$REPLY"
    echo "Padded reply : $padded_REPLY"
fi
regexp1="^[[:digit:]]{4}$" # the padded input must be exactly four digits (0000 to 9999)
# the input is padded
regexp2="^[eE]$" # a single e or E requests exit
if [[ "$padded_REPLY" =~ $regexp1 ]]
then
    if [ "$REPLY" -le 1000 ] # check that the number is within the menu range
    then
        echo "$REPLY" # I just echoed, you do the stuff below
        #"$scriptDirectory/${array[$REPLY]}"
    else
        echo "Scripts are numbered from 0 to 1000"
    fi
elif [[ "$padded_REPLY" =~ $regexp2 ]]
then
    clear
    exit
fi
However, getopts is recommended for smarter argument handling.
Give this a try:
case $REPLY in
[Ee])
clear
exit 0;;
[[:digit:]]|[[:digit:]][[:digit:]]|[[:digit:]][[:digit:]][[:digit:]]|1000)
$scriptDirectory/${array[$REPLY]}
exit 0;;
esac
The pattern list matches one digit, two digits, three digits, or the literal 1000.
The pattern used with case is described in Pattern Matching Notation from the Open Group.
Please note that this is not a regular expression.
At least one thing is borrowed from regular expressions: the [] construct (RE Bracket Expression), which matches a single character. So [[:digit:]] is valid; it matches a single character that can be any digit.
To match several digits, concatenate several single-character patterns, e.g. use [[:digit:]][[:digit:]] to match 2 digits.
| can be used to match any one of several patterns. To match a number between 0 and 99, i.e. a one-digit or two-digit number, use [[:digit:]]|[[:digit:]][[:digit:]]
Related
Bash regex overwrite line if multiple match
I have a bash script with 3 regular expressions. Using if conditionals, I would like to find the lines in the file that match the first pattern. If there is a match, I then want to look for a match of the second pattern, but only among the lines that matched the first pattern. Finally, I want to check the third pattern only against the lines that matched the second pattern (which are also the ones that had already matched the first pattern). I have the following code, but I don't know how to overwrite the "line" value when there is a match, so that the total number of lines is reduced to only the matching ones.
#!/bin/bash
pattern1= egrep '^([^,]*,){31}[1-9][0-9].*'
pattern2= egrep '^([^,]*,){16}[0-1].[3-9].*'
pattern3= egrep '^([^,]*,){32}[2-9][0-9].*'
while read line
do
    if [[$line == $pattern1]];then
        newline == $pattern1
        if [[$newline == $pattern2 ]];then
            newline2 == $pattern2
            if [[$newline2 == $pattern3 ]]; then
                echo $pattern3
            fi
done < mj1.csv #this is the input file
I will call this script like ./b1.sh <filename>.
Some input data:
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1,3,2,5,5,2,1,3,4,21,19.4
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1,2,2,4,5,3,1,6,5,25,14.7
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.75,3,2,5,5,1,1,2,4,17,13.2
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
1985,8,8,11/10/1984,21,267,21.73100616,CHI,0,IND,1,2,1,42,9,22,0.409,0,0,,9,12,0.75,2,7,9,4,2,5,3,4,27,21.2
1985,9,9,11/13/1984,21,270,21.73921971,CHI,1,SAS,1,3,1,43,18,27,0.667,1,1,1,8,11,0.727,2,8,10,4,3,2,4,4,45,37.5
1985,10,10,11/15/1984,21,272,21.74469541,CHI,1,BOS,0,-20,1,33,12,24,0.5,0,1,0,3,3,1,0,2,2,2,2,1,1,4,27,17.1
1985,11,11,11/17/1984,21,274,21.75017112,CHI,1,PHI,0,-9,1,44,4,17,0.235,0,0,,8,8,1,0,5,5,7,5,2,4,5,16,12.5
1985,12,12,11/19/1984,21,276,21.75564682,CHI,1,IND,0,-17,1,39,11,26,0.423,0,3,0,12,16,0.75,2,3,5,2,2,1,3,3,34,20.8
1985,13,13,11/21/1984,21,278,21.76112252,CHI,0,MIL,0,-10,1,42,11,22,0.5,0,0,,13,14,0.929,4,9,13,2,2,2,6,3,35,26.7
1985,14,14,11/23/1984,21,280,21.76659822,CHI,0,SEA,1,19,1,30,9,13,0.692,0,0,,5,6,0.833,0,4,4,3,4,1,4,4,23,19.5
1985,15,15,11/24/1984,21,281,21.76933607,CHI,0,POR,0,-10,1,41,10,24,0.417,0,1,0,10,10,1,3,3,6,8,3,1,4,4,30,23.9
1985,16,16,11/27/1984,21,284,21.77754962,CHI,0,GSW,0,-6,1,24,6,10,0.6,0,0,,1,1,1,0,2,2,3,3,2,4,1,13,11.1
1985,17,17,11/29/1984,21,286,21.78302533,CHI,0,PHO,0,-5,1,30,9,17,0.529,1,1,1,3,4,0.75,1,2,3,2,2,0,2,5,22,14
1985,18,18,11/30/1984,21,287,21.78576318,CHI,0,LAC,1,4,1,37,9,15,0.6,0,0,,2,4,0.5,2,3,5,5,3,0,4,4,20,15.5
1985,19,19,12/2/1984,21,289,21.79123888,CHI,0,LAL,1,1,1,42,7,13,0.538,0,0,,6,8,0.75,2,0,2,3,1,1,4,3,20,12.9
1985,20,20,12/4/1984,21,291,21.79671458,CHI,1,NJN,1,15,1,35,7,13,0.538,0,0,,6,6,1,1,2,3,6,1,0,3,3,20,16
1985,21,21,12/7/1984,21,294,21.80492813,CHI,1,NYK,1,2,1,43,8,16,0.5,0,1,0,5,7,0.714,1,1,2,3,2,0,6,5,21,9.3
1985,22,22,12/8/1984,21,295,21.80766598,CHI,1,DAL,1,2,1,35,10,23,0.435,0,0,,0,0,,4,3,7,2,0,2,2,3,20,11.2
1985,23,23,12/11/1984,21,298,21.81587953,CHI,1,DET,0,-7,1,37,13,28,0.464,0,1,0,1,3,0.333,1,7,8,6,2,0,3,4,27,16.2
1985,24,24,12/12/1984,21,299,21.81861739,CHI,0,DET,0,-7,1,30,6,17,0.353,0,2,0,9,10,0.9,0,1,1,2,2,1,1,5,21,12.5
1985,25,25,12/14/1984,21,301,21.82409309,CHI,0,NJN,0,-2,1,44,12,25,0.48,0,0,,10,10,1,2,6,8,8,1,0,0,4,34,29.5
1985,26,26,12/15/1984,21,302,21.82683094,CHI,1,PHI,0,-12,1,27,7,16,0.438,0,0,,0,0,,1,1,2,2,1,0,1,2,14,7.2
1985,27,27,12/18/1984,21,305,21.83504449,CHI,1,HOU,0,-8,1,45,8,20,0.4,0,1,0,2,4,0.5,1,2,3,8,3,0,1,2,18,14.5
1985,28,28,12/20/1984,21,307,21.84052019,CHI,0,ATL,1,3,1,41,12,22,0.545,0,0,,10,16,0.625,4,4,8,7,5,1,7,5,34,26.6
To make things easier, pattern1 matches all rows where column PTS is higher than 10, pattern2 matches the rows where column FG_PCT is higher than 0.3, and pattern3 matches all rows where column GmSc is higher than 19.
While an awk solution is going to be a bit faster ... we'll focus on a bash solution per OP's request.
The first issue is that regex matching uses the =~ operator, not the == operator.
The second issue is that keeping a row only if all 3 regexes match means we want to and (&&) the results of all 3 regex matches.
The third issue covers some basic syntax problems in OP's current code (e.g., a space is needed after [[ and before ]]; the regex patterns are improperly assigned to the pattern* variables).
One bash idea:
pattern1='^([^,]*,){31}[1-9][0-9].*'
pattern2='^([^,]*,){16}[0-1].[3-9].*'
pattern3='^([^,]*,){32}[2-9][0-9].*'
head -1 mj1.csv > mj1.new.csv
while read -r line
do
    if [[ "${line}" =~ $pattern1 && "${line}" =~ $pattern2 && "${line}" =~ $pattern3 ]]
    then
        # do whatever with $line, eg:
        echo "${line}"
    fi
done < mj1.csv >> mj1.new.csv
This generates:
$ cat mj1.new.csv
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
1985,8,8,11/10/1984,21,267,21.73100616,CHI,0,IND,1,2,1,42,9,22,0.409,0,0,,9,12,0.75,2,7,9,4,2,5,3,4,27,21.2
1985,9,9,11/13/1984,21,270,21.73921971,CHI,1,SAS,1,3,1,43,18,27,0.667,1,1,1,8,11,0.727,2,8,10,4,3,2,4,4,45,37.5
1985,12,12,11/19/1984,21,276,21.75564682,CHI,1,IND,0,-17,1,39,11,26,0.423,0,3,0,12,16,0.75,2,3,5,2,2,1,3,3,34,20.8
1985,13,13,11/21/1984,21,278,21.76112252,CHI,0,MIL,0,-10,1,42,11,22,0.5,0,0,,13,14,0.929,4,9,13,2,2,2,6,3,35,26.7
1985,15,15,11/24/1984,21,281,21.76933607,CHI,0,POR,0,-10,1,41,10,24,0.417,0,1,0,10,10,1,3,3,6,8,3,1,4,4,30,23.9
1985,25,25,12/14/1984,21,301,21.82409309,CHI,0,NJN,0,-2,1,44,12,25,0.48,0,0,,10,10,1,2,6,8,8,1,0,0,4,34,29.5
1985,28,28,12/20/1984,21,307,21.84052019,CHI,0,ATL,1,3,1,41,12,22,0.545,0,0,,10,16,0.625,4,4,8,7,5,1,7,5,34,26.6
NOTE: OP hasn't (yet) provided the expected output so at this point I have to assume OP's regexes are correct
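If more patterns get added later, the && chain grows with them. One possible alternative is a small helper that loops over any number of regexes. This is only a sketch: matches_all is a made-up name, and the two rows are copied from the sample data in the question (the first should be dropped, the second kept):

```shell
#!/usr/bin/env bash
# Sketch only: matches_all is a hypothetical helper, not part of the answer above.
# Usage: matches_all LINE REGEX...   (succeeds only if LINE matches every REGEX)
matches_all() {
    local line=$1 re
    shift
    for re in "$@"; do
        [[ $line =~ $re ]] || return 1
    done
    return 0
}

pattern1='^([^,]*,){31}[1-9][0-9].*'
pattern2='^([^,]*,){16}[0-1].[3-9].*'
pattern3='^([^,]*,){32}[2-9][0-9].*'

# Two rows copied from the sample data in the question.
rows=(
    '1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5'
    '1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9'
)

for row in "${rows[@]}"; do
    if matches_all "$row" "$pattern1" "$pattern2" "$pattern3"; then
        echo "keep: $row"
    fi
done
```

Like the answer above, this takes OP's regexes at face value; it only changes how they are combined.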
Regular expression Bash issue
I have to match a string composed only of lowercase characters repeated two times, for example ballball or printprint. The word ball, for example, is not accepted because it is not repeated twice. For this I have the following code:
read input
expr='^(([a-z]*){2})$'
if [[ $input =~ $expr ]]; then
    echo "OK string"
    exit 0
fi
exit 10
but it doesn't work: for example, if I enter ball, the script prints "OK string". What am I doing wrong?
Not all Bash versions support backreferences in regexes natively. If yours doesn't, you can use an external tool such as grep:
read input
re='^\([a-z]\+\)\1$'
if grep -q "$re" <<< "$input"; then
    echo "OK string"
    exit 0
fi
exit 1
grep -q is silent and has a successful exit status if there was a match. Notice how (, + and ) have to be escaped for grep. (grep -E would understand () without escaping.) Also, I've replaced your * with + so we don't match the empty string.
Alternatively: your requirement means that a matching string has two identical halves, so we can check for just that, without any regexes:
read input
half=$(( ${#input} / 2 ))
if (( half > 0 )) && [[ ${input:0:$half} = ${input:$half} ]]; then
    echo "OK string"
fi
This uses Substring Expansion; the first check is to make sure that the empty string doesn't match.
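The two-halves check on its own does not verify that the input consists of lowercase letters. A rough sketch that combines both checks in one function could look like this (is_doubled is a hypothetical name, and the right-hand side is quoted so that it is compared literally rather than as a glob pattern):

```shell
#!/usr/bin/env bash
# Sketch only: is_doubled is a hypothetical helper name.
is_doubled() {
    local s=$1
    local half=$(( ${#s} / 2 ))
    [[ $s =~ ^[[:lower:]]+$ ]] || return 1   # lowercase letters only, non-empty
    [[ ${s:0:half} == "${s:half}" ]]         # halves of unequal length can never be equal
}

for word in ballball printprint ball abcab; do
    if is_doubled "$word"; then
        echo "OK string: $word"
    else
        echo "rejected: $word"
    fi
done
```

Odd-length input needs no separate test here, because the two substrings then differ in length.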
Your requirement is to match strings made of two repeated words. This is easy to do by just checking if the first half of your string is equal to the remaining part. No need to use regexps...
$ var="byebye" && len=$((${#var}/2))
$ test ${var:0:$len} = ${var:$len} && { echo ok ; } || echo no
ok
$ var="abcdef" && len=$((${#var}/2))
$ test ${var:0:$len} = ${var:$len} && { echo ok ; } || echo no
no
The regex [a-z]* matches any string of lowercase letters, including the empty string. ([a-z]*){2} matches any two of those in a row. Ergo, ^(([a-z]*){2})$ matches any string of zero or more lowercase letters. Using the suggestion from @hwnd (replacing {2} with \1) will enforce a match on two identical strings. N.B.: You will need a fairly recent version of bash. Tested in bash 4.3.11.
Make reference to a file in a regular expression
I have two files. One is SALESORDERLIST, which goes like this:
ProductID;ProductDesc
1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.
4,bottles of beer 40 gal
The (ProductID;ProductDesc) header is not actually in the file, so disregard it. In another file, POSSIBLEUNITS, I have (you guessed it) the possible units and their equivalents:
u;u.;un;un.;unit
k;k.;kg;kg.,kilograms
This is my first day with regular expressions, and I would like to know how I can get the entries in SALESORDERLIST whose units appear in POSSIBLEUNITS. In my example, I would like to exclude entry 4, since 'gal' is not listed in the POSSIBLEUNITS file. I say regex because I have a further criterion that needs to be matched:
egrep "^[0-9]+;{1}[^; ][a-zA-Z ]+" SALESORDERLIST
From the resulting entries, I want to get those ending in valid units. Thanks!
One way of achieving what you want is:
cat SALESORDERLIST | egrep "\b(u|u\.|un|un\.|unit|k|k\.|kg|kg\.|kilograms)\b"
1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.
The metacharacter \b is an anchor that allows you to perform a "whole words only" search using a regular expression in the form of \bword\b.
http://www.regular-expressions.info/wordboundaries.html
One way would be to create a bash script, say called findunit.sh:
while read line
do
    match=$(egrep -E "^[0-9]+,{1}[^, ][a-zA-Z ]+" <<< $line)
    name=${match##* }
    # echo "$name..."
    found=$(egrep "$name" /pathtofile/units.txt)
    # echo "xxx$found"
    [ -n "$found" ] && echo $line
done < $1
Then run with:
findunit.sh SALESORDERLIST
My output from this is:
1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.
An example of doing it completely in bash:
declare -A units
while read line; do
    while [ -n "$line" ]; do
        i=`expr index $line ";"`
        if [[ $i == 0 ]]; then
            units[$line]=1
            break
        fi
        units[${line:0:$((i-1))}]=1
        line=${line#*;}
    done
done < POSSIBLEUNITS
while read line; do
    unit=${line##* }
    if [[ ${units[$unit]} == 1 ]]; then
        echo $line
    fi
done < SALESORDERLIST
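The same lookup-table idea can be written without expr by letting read split each line on ";". The following is a sketch under a few assumptions: it needs bash 4 or later for associative arrays, load_units and has_known_unit are made-up names, and the unit list is given inline with ";" as the only separator (the question's file has a "," before kilograms, which is assumed here to be a typo):

```shell
#!/usr/bin/env bash
# Sketch only; requires bash 4+ for associative arrays.
declare -A units=()

load_units() {            # hypothetical helper: reads ';'-separated unit lines from stdin
    local line unit
    local -a fields
    while IFS= read -r line; do
        IFS=';' read -ra fields <<< "$line"
        for unit in "${fields[@]}"; do
            if [[ -n $unit ]]; then
                units[$unit]=1
            fi
        done
    done
}

has_known_unit() {        # hypothetical helper: the last space-separated word must be a listed unit
    local last=${1##* }
    [[ -n $last && -n ${units[$last]:-} ]]
}

# Inline stand-in for POSSIBLEUNITS (assumption: ';' between all entries).
load_units <<'EOF'
u;u.;un;un.;unit
k;k.;kg;kg.;kilograms
EOF

# Inline stand-in for SALESORDERLIST.
while IFS= read -r entry; do
    if has_known_unit "$entry"; then
        echo "$entry"
    fi
done <<'EOF'
1,potatoes 1 kg.
2,tomatoes 2 k
3,bottles of whiskey 2 un.
4,bottles of beer 40 gal
EOF
```

To run it against the real files, replace the two here-documents with redirections from POSSIBLEUNITS and SALESORDERLIST.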
Get digit from filename immediately preceding file extension, with other digits in filename
I'm trying to extract the last number before a file extension in a bash script. The format varies, but it will be some combination of numbers and letters, and the last character will always be a digit. I need to pull those digits and store them in a variable. The format is generally:
sdflkej10_sdlkei450_sdlekr_1.txt
I want to store just the final digit 1 in a variable. I'll be using this to loop through a large number of files, and the last number will get into double and triple digits. So for this file:
kej10_sdlkei450_sdlekr_310.txt
I'd need to return 310. The number of alphanumeric characters and underscores varies with each file, but the number I want is always immediately before the .txt extension and immediately after an underscore. I tried:
bname=${f%%.*}
number=$(echo $bname | tr -cd '[[:digit:]]')
but this returns all digits. If I try number = $(echo $(bname -2) it changes the number it returns. The problem I'm having is mostly related to the variability, and the fact that I've been asked to do it in bash. Any help would really be appreciated.
regex='([0-9]+)\.[^.]*$'
[[ $file =~ $regex ]] && number=${BASH_REMATCH[1]}
This uses bash's underappreciated =~ regex operator, which stores matches in an array named BASH_REMATCH.
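Wrapped in a function, the same idea might look like the sketch below. trailing_number is a hypothetical name, and the pattern is tightened slightly to require the underscore the question mentions before the digits:

```shell
#!/usr/bin/env bash
# Sketch only: trailing_number is a hypothetical helper name.
trailing_number() {
    local re='_([0-9]+)\.[^.]*$'        # underscore, digits, then the final extension
    [[ $1 =~ $re ]] || return 1         # no match: fail without printing anything
    printf '%s\n' "${BASH_REMATCH[1]}"  # first capture group holds the digits
}

for f in sdflkej10_sdlkei450_sdlekr_1.txt kej10_sdlkei450_sdlekr_310.txt; do
    number=$(trailing_number "$f") && echo "$f -> $number"
done
```

Keeping the regex in a variable, as in the answer, avoids quoting surprises on the right-hand side of =~.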
You could do this using parameter substitution:
var=kej10_sdlkei450_sdlekr_310.txt
var=${var%.*}
var=${var##*_}
echo $var
310
Use a Series of Bash Shell Expansions
While not the most elegant solution, this one uses a sequence of shell parameter expansions to achieve the desired result without having to define a specific extension. For example, this function uses the length and offset expansions to find the digit after removing filename extensions:
extract_digit() {
    local basename=${1%%.*}
    echo "${basename:$(( ${#basename} - 1 ))}"
}
Capturing Function Output
You can capture the output in a variable with something like:
$ foo=$(extract_digit sdflkej10_sdlkei450_sdlekr_1.txt)
$ echo $foo
1
Sample Output from Function
$ extract_digit sdflkej10_sdlkei450_sdlekr_1.txt
1
$ extract_digit sdflkej10_sdlkei450_sdlekr_9.txt
9
$ extract_digit sdflkej10_sdlkei450_sdlekr_10.txt
0
This should take care of your situation:
INPUT="some6random7numbers_12345_moreletters_789.txt"
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)'`
echo $SUBSTRING
This will output 789.
No need for a regex here; you can use IFS:
var="kej10_sdlkei450_sdlekr_310.txt"
v=$(IFS=[_.] read -ra arr <<< "$var" && echo "${arr[@]:(-2):1}")
echo "$v"
310
Grep regular expression for digits in character string of variable length
I need some way to find words that contain any combination of characters and digits, but exactly 4 digits and at least one character. EXAMPLE:
a1a1a1a1 // Match
1234 // NO match (no characters)
a1a1a1a1a1 // NO match
ab2b2 // NO match
cd12 // NO match
z9989 // Match
1ab26a9 // Match
1ab1c1 // NO match
12345 // NO match
24 // NO match
a2b2c2d2 // Match
ab11cd22dd33 // NO match
To match a digit in grep you can use [0-9]. To match anything but a digit, you can use [^0-9]. Since there can be any number of non-digit characters, or none, you add a "*" (any number of the preceding). So what you want, logically, is
(anything not a digit or nothing)* (any single digit) (anything not a digit or nothing)* ....
until you have 4 "any single digit" groups, i.e. [^0-9]*[0-9]...
I find that with long grep patterns, especially ones with long strings of special chars that need to be escaped, it's best to build up slowly so you're sure you understand what's going on. For example:
#this will highlight your matches, and make it easier to understand
alias grep='grep --color=auto'
echo 'a1b2' | grep '[0-9]'
will show you how it's matching. You can then extend the pattern once you understand each part.
I'm not sure about all the other input you might take (i.e. is ax12ax12ax12ax12 valid?), but this will work based on what you posted:
%> grep -P "^(?:\w\d){4}$" fileWithInput
With grep:
grep -iE '^([a-z]*[0-9]){4}[a-z]*$' | grep -vE '^[0-9]{4}$'
Do it in one pattern with Perl:
perl -ne 'print if /^(?!\d{4}$)([^\W\d_]*\d){4}[^\W\d_]*$/'
The funky [^\W\d_] character class is a cosmopolitan way to spell [A-Za-z]: it catches all letters rather than only the English ones.
If you don't mind using a little shell as well, you could do something like this:
echo "a1a1a1a1" | grep -o '[0-9]' | wc -l
which would display the number of digits found in the string. If you like, you could then test for a given number of matches:
max_match=4
[ "$(echo "a1da4a3aaa4a4" | grep -o '[0-9]' | wc -l)" -le $max_match ] || echo "too many digits."
Assuming you only need ASCII, and you can only access the (fairly primitive) regexp constructs of grep, the following should be pretty close (the patterns are quoted so that the shell does not try to expand them as globs):
grep '^[a-zA-Z]*[0-9][a-zA-Z]*[a-zA-Z]*[0-9][a-zA-Z]*[a-zA-Z]*[0-9][a-zA-Z]*[a-zA-Z]*[0-9][a-zA-Z]*$' | grep '[a-zA-Z]'
You might try
[^0-9]*[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*
But this will match 1234. Why doesn't that match your criteria?
The regex for that is:
([A-Za-z]\d){4}
[A-Za-z] - a character class for letters
\d - a digit
You wrap them in () to group them, indicating the format: a character followed by a number
{4} - indicating that there must be 4 repetitions
You can use a normal shell script; there's no need for a complicated regex.
var=a1a1a1a1
alldigits=${var//[^0-9]/}
allletters=${var//[0-9]/}
case "${#alldigits}" in
    4)
        if [ "${#allletters}" -gt 0 ];then
            echo "ok: 4 digits and letters: $var"
        else
            echo "Invalid: all numbers and exactly 4: $var"
        fi
        ;;
    *) echo "Invalid: $var";;
esac
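To run that idea over the whole example list from the question, it can be wrapped in a function. This is just a sketch: has_four_digits_and_letters is a made-up name, and the extra alphanumeric check is an assumption about what counts as a "word":

```shell
#!/usr/bin/env bash
# Sketch only: has_four_digits_and_letters is a hypothetical helper name.
has_four_digits_and_letters() {
    local word=$1
    [[ $word =~ ^[[:alnum:]]+$ ]] || return 1   # assumption: words are purely alphanumeric
    local digits=${word//[^0-9]/}               # keep only the digits
    local letters=${word//[0-9]/}               # keep everything that is not a digit
    (( ${#digits} == 4 && ${#letters} > 0 ))
}

for w in a1a1a1a1 1234 a1a1a1a1a1 ab2b2 cd12 z9989 1ab26a9 1ab1c1 12345 24 a2b2c2d2 ab11cd22dd33; do
    if has_four_digits_and_letters "$w"; then
        echo "$w // Match"
    else
        echo "$w // NO match"
    fi
done
```

The function returns its result as an exit status, so it can be used directly in if or with && and ||.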
Thanks for your answers. Finally I wrote a script and it works perfectly:
./P ab2b2 cd12 z9989 1ab26a9 1ab1c1 1234 24 a2b2c2d2
#!/bin/bash
echo "$@" |tr -s " " "\n"s >> sorting
cat sorting | while read tostr
do
    l=$(echo $tostr|tr -d "\n"|wc -c)
    temp=$(echo $tostr|tr -d a-z|tr -d "\n" | wc -c)
    if [ $temp -eq 4 ]; then
        if [ $l -gt 4 ]; then
            printf "%s " "$tostr"
        fi
    fi
done
echo