While-Loop BASH-Regex not matching - regex

Why is this INCREDIBLY simple REGEX not matching?!!?
#!/bin/bash
while true
do
read -r -p $'What is the JIRA Ticket associated with this work?' JIRA
#Use a regular expression to verify that our reply stored in JIRA is only 4 digits; if not, loop and try again.
if [[ ! "$JIRA" =~ [0-9]{4} ]]
then
echo -en "The JIRA Ticket should only be 4 digits\nPlease try again."
continue
else
break 1
fi
done
When prompted, if you type "ffffff" it catches it, but if you type more than 4 digits ("444444") or even toss a letter in there ("4444444fffff") it catches nothing, hits the else block and quits. I think this is basic, and I'm dumbfounded as to why it's not catching the extra digits or characters.
I appreciate the help.

You need to change your regex test to:
if [[ ! "$JIRA" =~ ^[0-9]{4}$ ]]
This ensures that the entire string contains just four digits. ^ means beginning of string, $ means end of string.

The regular expression is open-ended, meaning it only has to match a substring of the left-hand argument, not the entire thing. Anchor your regular expression to force it to match the entire string:
if [[ ! "$JIRA" =~ ^[0-9]{4}$ ]]
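The difference between the open-ended and the anchored form can be seen in a minimal sketch. The function names matches_unanchored and matches_anchored are made up for illustration; the two patterns are the ones from the question and from the answers above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, named only for this illustration.

# Unanchored: succeeds if four consecutive digits appear anywhere in the input.
matches_unanchored() { [[ $1 =~ [0-9]{4} ]]; }

# Anchored: succeeds only if the whole input is exactly four digits.
matches_anchored() { [[ $1 =~ ^[0-9]{4}$ ]]; }

for input in ffffff 1234 444444 4444444fffff; do
    matches_unanchored "$input" && u="match" || u="no match"
    matches_anchored "$input" && a="match" || a="no match"
    printf '%-14s unanchored: %-9s anchored: %s\n' "$input" "$u" "$a"
done
```

Only 1234 passes the anchored test. 444444 and 4444444fffff pass the unanchored one because they contain four consecutive digits somewhere, which is why the original loop let them through.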

Maybe a simpler pattern (== instead of =~) may solve your issue:
#!/bin/bash
while true
do
read -r -p $'What is the JIRA Ticket associated with this work?' JIRA
[[ $JIRA == [0-9][0-9][0-9][0-9] ]] && break 1
echo -en "The JIRA Ticket should only be 4 digits\nPlease try again."
done

Related

Bash regex overwrite line if multiple match

I have a bash script with 3 regular expressions. I would like to use conditional ifs to find the lines in the file that match the first pattern.
If there is a match, then look for a match in the second pattern but only with the lines that have matched the first pattern.
Finally, to check the third pattern only with the lines that have matched the second pattern (which are also the ones that had already matched the first pattern).
I have the following code but I don't know how to tell that if there is a match to overwrite the "line" value to decrease the number of total lines to only the ones matching.
#!/bin/bash
pattern1= egrep '^([^,]*,){31}[1-9][0-9].*'
pattern2= egrep '^([^,]*,){16}[0-1].[3-9].*'
pattern3= egrep '^([^,]*,){32}[2-9][0-9].*'
while read line
do
if [[$line == $pattern1]];then
newline == $pattern1
if [[$newline == $pattern2 ]];then
newline2 == $pattern2
if [[$newline2 == $pattern3 ]]; then
echo $pattern3
fi
done < mj1.csv #this is the input file
I will call this script like ./b1.sh <filename>.
Some input data:
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5
1985,2,2,10/27/1984,21,253,21.69267625,CHI,0,MIL,0,-2,1,34,8,13,0.615,0,0,,5,5,1,3,2,5,5,2,1,3,4,21,19.4
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,4,4,10/30/1984,21,256,21.7008898,CHI,0,KCK,1,5,1,36,8,21,0.381,0,0,,9,9,1,2,2,4,5,3,1,6,5,25,14.7
1985,5,5,11/1/1984,21,258,21.7063655,CHI,0,DEN,0,-16,1,33,7,15,0.467,0,0,,3,4,0.75,3,2,5,5,1,1,2,4,17,13.2
1985,6,6,11/7/1984,21,264,21.72279261,CHI,0,DET,1,4,1,27,9,19,0.474,0,0,,7,9,0.778,1,3,4,3,3,1,5,5,25,14.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
1985,8,8,11/10/1984,21,267,21.73100616,CHI,0,IND,1,2,1,42,9,22,0.409,0,0,,9,12,0.75,2,7,9,4,2,5,3,4,27,21.2
1985,9,9,11/13/1984,21,270,21.73921971,CHI,1,SAS,1,3,1,43,18,27,0.667,1,1,1,8,11,0.727,2,8,10,4,3,2,4,4,45,37.5
1985,10,10,11/15/1984,21,272,21.74469541,CHI,1,BOS,0,-20,1,33,12,24,0.5,0,1,0,3,3,1,0,2,2,2,2,1,1,4,27,17.1
1985,11,11,11/17/1984,21,274,21.75017112,CHI,1,PHI,0,-9,1,44,4,17,0.235,0,0,,8,8,1,0,5,5,7,5,2,4,5,16,12.5
1985,12,12,11/19/1984,21,276,21.75564682,CHI,1,IND,0,-17,1,39,11,26,0.423,0,3,0,12,16,0.75,2,3,5,2,2,1,3,3,34,20.8
1985,13,13,11/21/1984,21,278,21.76112252,CHI,0,MIL,0,-10,1,42,11,22,0.5,0,0,,13,14,0.929,4,9,13,2,2,2,6,3,35,26.7
1985,14,14,11/23/1984,21,280,21.76659822,CHI,0,SEA,1,19,1,30,9,13,0.692,0,0,,5,6,0.833,0,4,4,3,4,1,4,4,23,19.5
1985,15,15,11/24/1984,21,281,21.76933607,CHI,0,POR,0,-10,1,41,10,24,0.417,0,1,0,10,10,1,3,3,6,8,3,1,4,4,30,23.9
1985,16,16,11/27/1984,21,284,21.77754962,CHI,0,GSW,0,-6,1,24,6,10,0.6,0,0,,1,1,1,0,2,2,3,3,2,4,1,13,11.1
1985,17,17,11/29/1984,21,286,21.78302533,CHI,0,PHO,0,-5,1,30,9,17,0.529,1,1,1,3,4,0.75,1,2,3,2,2,0,2,5,22,14
1985,18,18,11/30/1984,21,287,21.78576318,CHI,0,LAC,1,4,1,37,9,15,0.6,0,0,,2,4,0.5,2,3,5,5,3,0,4,4,20,15.5
1985,19,19,12/2/1984,21,289,21.79123888,CHI,0,LAL,1,1,1,42,7,13,0.538,0,0,,6,8,0.75,2,0,2,3,1,1,4,3,20,12.9
1985,20,20,12/4/1984,21,291,21.79671458,CHI,1,NJN,1,15,1,35,7,13,0.538,0,0,,6,6,1,1,2,3,6,1,0,3,3,20,16
1985,21,21,12/7/1984,21,294,21.80492813,CHI,1,NYK,1,2,1,43,8,16,0.5,0,1,0,5,7,0.714,1,1,2,3,2,0,6,5,21,9.3
1985,22,22,12/8/1984,21,295,21.80766598,CHI,1,DAL,1,2,1,35,10,23,0.435,0,0,,0,0,,4,3,7,2,0,2,2,3,20,11.2
1985,23,23,12/11/1984,21,298,21.81587953,CHI,1,DET,0,-7,1,37,13,28,0.464,0,1,0,1,3,0.333,1,7,8,6,2,0,3,4,27,16.2
1985,24,24,12/12/1984,21,299,21.81861739,CHI,0,DET,0,-7,1,30,6,17,0.353,0,2,0,9,10,0.9,0,1,1,2,2,1,1,5,21,12.5
1985,25,25,12/14/1984,21,301,21.82409309,CHI,0,NJN,0,-2,1,44,12,25,0.48,0,0,,10,10,1,2,6,8,8,1,0,0,4,34,29.5
1985,26,26,12/15/1984,21,302,21.82683094,CHI,1,PHI,0,-12,1,27,7,16,0.438,0,0,,0,0,,1,1,2,2,1,0,1,2,14,7.2
1985,27,27,12/18/1984,21,305,21.83504449,CHI,1,HOU,0,-8,1,45,8,20,0.4,0,1,0,2,4,0.5,1,2,3,8,3,0,1,2,18,14.5
1985,28,28,12/20/1984,21,307,21.84052019,CHI,0,ATL,1,3,1,41,12,22,0.545,0,0,,10,16,0.625,4,4,8,7,5,1,7,5,34,26.6
To make things easier, pattern1 matches all rows where column PTS is higher than 10, pattern 2 matches the rows where column FG_PCT is higher than 0.3, and pattern 3 matches all rows where column GmSc is higher than 19.
While an awk solution is going to be a bit faster, we'll focus on a bash solution per OP's request.
First issue: regex matching uses the =~ operator, not the == operator.
Second issue: to keep a row only if all 3 regexes match, we want to and (&&) the results of the 3 regex matches.
Third issue: OP's current code has some basic syntax problems (e.g., missing spaces after [[ and before ]]; improper assignments of the regex patterns to the pattern* variables).
One bash idea:
pattern1='^([^,]*,){31}[1-9][0-9].*'
pattern2='^([^,]*,){16}[0-1].[3-9].*'
pattern3='^([^,]*,){32}[2-9][0-9].*'
head -1 mj1.csv > mj1.new.csv
while read -r line
do
if [[ "${line}" =~ $pattern1 && "${line}" =~ $pattern2 && "${line}" =~ $pattern3 ]]
then
# do whatever with $line, eg:
echo "${line}"
fi
done < mj1.csv >> mj1.new.csv
This generates:
$ cat mj1.new.csv
EndYear,Rk,G,Date,Years,Days,Age,Tm,Home,Opp,Win,Diff,GS,MP,FG,FGA,FG_PCT,3P,3PA,3P_PCT,FT,FTA,FT_PCT,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc
1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9
1985,7,7,11/8/1984,21,265,21.72553046,CHI,0,NYK,1,15,1,33,15,22,0.682,0,0,,3,4,0.75,4,4,8,5,3,2,5,2,33,29.3
1985,8,8,11/10/1984,21,267,21.73100616,CHI,0,IND,1,2,1,42,9,22,0.409,0,0,,9,12,0.75,2,7,9,4,2,5,3,4,27,21.2
1985,9,9,11/13/1984,21,270,21.73921971,CHI,1,SAS,1,3,1,43,18,27,0.667,1,1,1,8,11,0.727,2,8,10,4,3,2,4,4,45,37.5
1985,12,12,11/19/1984,21,276,21.75564682,CHI,1,IND,0,-17,1,39,11,26,0.423,0,3,0,12,16,0.75,2,3,5,2,2,1,3,3,34,20.8
1985,13,13,11/21/1984,21,278,21.76112252,CHI,0,MIL,0,-10,1,42,11,22,0.5,0,0,,13,14,0.929,4,9,13,2,2,2,6,3,35,26.7
1985,15,15,11/24/1984,21,281,21.76933607,CHI,0,POR,0,-10,1,41,10,24,0.417,0,1,0,10,10,1,3,3,6,8,3,1,4,4,30,23.9
1985,25,25,12/14/1984,21,301,21.82409309,CHI,0,NJN,0,-2,1,44,12,25,0.48,0,0,,10,10,1,2,6,8,8,1,0,0,4,34,29.5
1985,28,28,12/20/1984,21,307,21.84052019,CHI,0,ATL,1,3,1,41,12,22,0.545,0,0,,10,16,0.625,4,4,8,7,5,1,7,5,34,26.6
NOTE: OP hasn't (yet) provided the expected output so at this point I have to assume OP's regexes are correct
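The same all-patterns-must-match test can be written for any number of patterns. This is only a minimal sketch: the function name matches_all is hypothetical, the three regexes are OP's (assumed correct, as noted above), and the two sample rows are copied from the input data.

```shell
#!/usr/bin/env bash
# matches_all LINE PATTERN... : succeed only if LINE matches every PATTERN.
# (Hypothetical helper, not part of the original answer.)
matches_all() {
    local line=$1 pattern
    shift
    for pattern in "$@"; do
        [[ $line =~ $pattern ]] || return 1   # stop at the first failed pattern
    done
    return 0
}

patterns=(
    '^([^,]*,){31}[1-9][0-9].*'    # OP's pattern1 (PTS column)
    '^([^,]*,){16}[0-1].[3-9].*'   # OP's pattern2 (FG_PCT column)
    '^([^,]*,){32}[2-9][0-9].*'    # OP's pattern3 (GmSc column)
)

row_kept='1985,3,3,10/29/1984,21,255,21.69815195,CHI,1,MIL,1,6,1,34,13,24,0.542,0,0,,11,13,0.846,2,2,4,5,6,2,3,4,37,32.9'
row_dropped='1985,1,1,10/26/1984,21,252,21.6899384,CHI,1,WSB,1,16,1,40,5,16,0.313,0,0,,6,7,0.857,1,5,6,7,2,4,5,2,16,12.5'

for row in "$row_kept" "$row_dropped"; do
    if matches_all "$row" "${patterns[@]}"; then
        echo "keep: $row"
    else
        echo "drop: $row"
    fi
done
```

Because the helper returns at the first pattern that fails, later patterns are only tried on lines that passed the earlier ones, which is the narrowing OP described.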

Regular expressions don't work as expected in bash if-else block's condition

My pattern defined to match in if-else block is :
pat="17[0-1][0-9][0-9][0-9].AUG"
nln=""
In my script, I take user input which needs to be matched against the pattern; if it doesn't match, appropriate error messages are to be shown. Pretty simple, but it's giving me a hard time. My code block from the script is this:
echo "How many days' AUDIT Logs need to be searched?"
read days
echo "Enter file name(s)[For multiple files, one file per line]: "
for(( c = 0 ; c < $days ; c++))
do
read elements
if [[ $elements =~ $pat ]];
then
array[$c]="$elements"
elif [[ $elements =~ $nln ]];
then
echo "No file entered.Run script again. Exiting"
exit;
else
echo "Invalid filename entered: $elements.Run script again. Exiting"
exit;
fi
done
The format I want from the user for filenames to be entered is this:
170402.AUG
So basically yymmdd.AUG (where y = year, m = month, d = day); trailing or leading spaces are fine. Anything other than that should throw the "Invalid filename entered: $elements.Run script again. Exiting" message. Also, I want to check whether it is a blank line (just "Enter" hit), in which case it should give an error saying "No file entered.Run script again. Exiting"
However my code, even if I enter something like "xxx" as filename, which should be throwing "Invalid filename entered: $elements.Run script again. Exiting", is actually checking true against a blank line, and throwing "No file entered.Run script again. Exiting"
Need some help with handling the regular expressions' check with user input, as otherwise rest of my script works just fine.
I think, as discussed in the comments, you are confusing a glob match with a regex match. What you have defined as pat is a glob pattern, which needs to be compared with the == operator:
pat="17[0-1][0-9][0-9][0-9].AUG"
string="170402.AUG"
[[ $string == $pat ]] && printf "Match success\n"
The equivalent =~ match would be something like
pat="^17[[:digit:]]{4}\.AUG$"
[[ $string =~ $pat ]] && printf "Match success\n"
As you can see, the . in the regex syntax has been escaped to strip it of its special meaning (match any character) so that it is used as a literal dot. The POSIX character class [[:digit:]] with a repetition count {4} matches 4 digits, followed by .AUG. The ^ and $ anchors make the regex cover the whole string, as the == glob match does.
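And for the empty-string check, do as suggested in the comments by Cyrus or by Benjamin.W (matching against the empty pattern in nln with =~ succeeded for every input here, which is why "xxx" ended up in that branch):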
[[ $elements == "" ]]
(or)
[[ -z $elements ]]
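Putting the two checks together, the order of the tests matters: emptiness is tested first with -z, and only then is the regex tried. A minimal sketch, where the function name classify_filename is hypothetical and the pattern is anchored on the assumption that nothing else on the line should be accepted:

```shell
#!/usr/bin/env bash
# classify_filename NAME : print "empty", "valid" or "invalid".
# Hypothetical helper; the pattern is the regex form of 17mmdd.AUG.
classify_filename() {
    local name=$1
    local pat='^17[[:digit:]]{4}\.AUG$'
    if [[ -z $name ]]; then
        echo empty                # test emptiness first, not with a regex
    elif [[ $name =~ $pat ]]; then
        echo valid
    else
        echo invalid
    fi
}

classify_filename ''            # prints: empty
classify_filename 170402.AUG    # prints: valid
classify_filename xxx           # prints: invalid
classify_filename 170402xAUG    # prints: invalid (the dot is escaped)
```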
I would not bug the user with how many days (who wants to count 15 days or the like?). Also, why only one file per line? You should help the users, not bug them like Microsoft...
For the start:
show_help() { cat <<'EOF'
bla bla....
EOF
}
show_files() { echo "${#files[@]} valid files entered: ${files[@]}"; }
while read -r -p 'files? (h-help)> ' line
do
case "$line" in
q) echo "quitting..." ; exit 0 ;;
h) show_help ; continue;;
'') (( ${#files[@]} )) && show_files; continue ;;
l) show_files ; continue ;;
p) (( ${#files[@]} )) && break || { echo "No files entered.. quitting" ; exit 1; } ;; # go to processing
esac
# select (grep) the valid patterns from the entered line
# and append them into the array
# using the -P (if your grep knows it) you can construct very complex regexes
files+=( $(grep -oP '17\d{4}\.\w{3}' <<< "$line") )
done
echo "processing files ${files[@]}"
Using such logic you can build a really powerful and user-friendly app. Also, you can use -e with read to enable the readline functions (cursor keys and the like)...
But :) consider just creating a simple script which accepts arguments, without any dialogs and such. Example:
myscript -h
same as above, or some longer help text
myscript 170402.AUG 170403.AUG 170404.AUG 170405.AUG
will do whatever it should do with the files. Main benefit, you could use globbing in the filenames, like
myscript 1704*
and so on...
And if you really want the dialog, it could show it when someone runs the script without any argument, e.g.:
myscript
will run in interactive mode...
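The argument-driven variant suggested above might start out like the following sketch. The names is_valid_name and main, the yymmdd.AUG pattern and the messages are assumptions for illustration only; the real processing and the interactive dialog are left as placeholders.

```shell
#!/usr/bin/env bash
# Sketch of "arguments first, dialog only as a fallback".
# is_valid_name, main and the pattern are assumptions for illustration.

# Succeed if the argument looks like 17mmdd.AUG.
is_valid_name() {
    local pat='^17[0-9]{4}\.AUG$'
    [[ $1 =~ $pat ]]
}

main() {
    local name files=()
    if (( $# == 0 )); then
        # Placeholder: a real script would start its dialog here.
        echo "no arguments: interactive mode would start here"
        return 0
    fi
    for name in "$@"; do
        is_valid_name "$name" && files+=("$name")   # silently skip anything else
    done
    echo "processing ${#files[@]} file(s): ${files[*]}"
}

main "$@"
```

Called as myscript 1704*, the shell expands the glob before the script starts, so main simply receives the matching file names as arguments.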

how to enforce a date format

I want to use the date command to output a day of week from user input.
I want to force the input to be of the format MM/DD/YYYY.
For example, at the command line I give
./programname MM/DD/YYYY MM/DD/YYYY
Snippets from the script itself
#!/bin/bash
DATE_FORMAT="^[0-9][0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]$" #MM/DD/YYYY
DATE1="$1"
DATE2="$2"
... followed by
if [ "$DATE1" != "$DATE_FORMAT" ] || [ "$DATE2" != "$DATE_FORMAT" ]; then
echo -e "Please follow the valid format MM/DD/YYYY.\n" 1>&2
exit 1
Now the problem is even when I enter correct date formats,
./programname 11/22/2014 11/23/2014
I still get that error message that I set up, which means that condition for if is evaluated true even when I input valid format... any suggestions why this is happening?
This script seems to work:
#!/bin/bash
DATE_FORMAT="^[01][0-9][/][0-3][0-9][/][0-9][0-9][0-9][0-9]$" #MM/DD/YYYY
DATE1="$1"
DATE2="$2"
if [[ "$DATE1" =~ $DATE_FORMAT ]] && [[ "$DATE2" =~ $DATE_FORMAT ]]
then echo "Both dates ($DATE1 and $DATE2) are OK"
else echo "Please follow the valid format MM/DD/YYYY ($DATE1 or $DATE2 is wrong)."
fi
It uses the =~ operator for a positive regex match inside Bash's [[ test command. The documentation doesn't mention a !~ operator for negative matching (though that's what Awk and Perl use). With the single-bracket [ test command, there is no regex matching. Note that the regex must not be enclosed in double quotes; as the Bash manual puts it:
Any part of the pattern may be quoted to force the quoted portion to be matched as a string. Bracket expressions in regular expressions must be treated carefully, since normal quoting characters lose their meanings between brackets. If the pattern is stored in a shell variable, quoting the variable expansion forces the entire pattern to be matched as a string.
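The effect of quoting the pattern variable can be checked directly. A small sketch, assuming bash 3.2 or later (where quoting the right-hand side of =~ forces a literal string match); the variable names are arbitrary:

```shell
#!/usr/bin/env bash
# Quoted vs. unquoted right-hand side of =~ (behaviour of bash 3.2 and later).
re='^[0-9]{4}$'
input=2014

# Unquoted expansion: the variable's contents are used as a regex.
if [[ $input =~ $re ]]; then unquoted="match"; else unquoted="no match"; fi

# Quoted expansion: the contents are compared as a literal string,
# so the input would have to contain the text ^[0-9]{4}$ itself.
if [[ $input =~ "$re" ]]; then quoted="match"; else quoted="no match"; fi

echo "unquoted: $unquoted"   # unquoted: match
echo "quoted:   $quoted"     # quoted:   no match
```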
The test is also more stringent, rejecting 23/45/2091, amongst other invalid date strings.
$ bash dt19.sh 11/22/2014 11/23/2014
Both dates (11/22/2014 and 11/23/2014) are OK
$ bash dt19.sh 31/22/2014 11/43/2014
Please follow the valid format MM/DD/YYYY (31/22/2014 or 11/43/2014 is wrong).
$
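The regex above still accepts strings such as 19/39/2014, because each digit position is checked on its own. One way to tighten this, sketched here under the assumption that a plain range check is enough, is to capture the parts with BASH_REMATCH and compare them numerically. The function name valid_date is hypothetical, and the sketch does not check days per month or leap years.

```shell
#!/usr/bin/env bash
# valid_date STRING : succeed if STRING is MM/DD/YYYY with month 01-12 and day 01-31.
# Hypothetical helper; it does not check days per month or leap years.
valid_date() {
    local re='^([0-9]{2})/([0-9]{2})/([0-9]{4})$'
    [[ $1 =~ $re ]] || return 1
    # 10# forces base 10 so that 08 and 09 are not read as invalid octal numbers.
    local month=$((10#${BASH_REMATCH[1]}))
    local day=$((10#${BASH_REMATCH[2]}))
    (( month >= 1 && month <= 12 && day >= 1 && day <= 31 ))
}

for d in 11/22/2014 19/39/2014 00/10/2014 1/2/2014; do
    if valid_date "$d"; then echo "$d: OK"; else echo "$d: invalid"; fi
done
```

Of the four sample strings only 11/22/2014 is reported as OK.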
Corrected code:
#!/bin/bash
DATE1="$1"
DATE2="$2"
# ^ and $ anchor the pattern so that extra characters around the date are rejected
if echo "$DATE1" | grep -q -E '^[0-9][0-9][/][0-9][0-9][/][0-9][0-9][0-9][0-9]$'
then
echo "Do whatever you want here"
else
echo "Invalid date" 1>&2
exit 1
fi

How to match string (with regular expression) that begins with a string

In a bash script I have to match strings that begin with exactly 3 repetitions of the string lo; so lololoba is good, loloba is bad, lololololoba is good, balololo is bad.
I tried with this pattern: "^$str1/{$n,}" but it doesn't work, how can I do it?
EDIT:
According to OP's comment, lololololoba is bad now.
This should work:
pat="^(lo){3}"
s="lolololoba"
[[ $s =~ $pat ]] && echo good || echo bad
EDIT (as per OP's comment):
If you want to match exactly 3 times (i.e., lolololoba and the like should not match),
change pat="^(lo){3}" to:
pat="^(lo){3}(l[^o]|[^l].)"
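Note that the pattern above needs at least two more characters after the third lo, so strings such as lololo or lololob are reported as bad. A possible variant, offered only as a sketch, spells out "not followed by another lo" with alternatives that also allow the end of the string. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# starts_with_exactly_three_lo STRING (hypothetical name)
# POSIX ERE has no lookahead, so "not followed by another lo" is spelled out:
#   end of string, or a character other than l, or an l not followed by o.
starts_with_exactly_three_lo() {
    local pat='^(lo){3}($|[^l]|l($|[^o]))'
    [[ $1 =~ $pat ]]
}

for s in lololoba loloba lolololoba balololo lololo lololob; do
    if starts_with_exactly_three_lo "$s"; then echo "$s: good"; else echo "$s: bad"; fi
done
```

With these inputs lololoba, lololo and lololob come out as good, and the other three as bad.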
You can use the following regex:
^(lo){3}.*$
Instead of lo you can put your variable.
See demo https://regex101.com/r/sI8zQ6/1
You can use this awk to match exactly 3 occurrences of lo at the beginning:
# input file
cat file
lololoba
balololo
loloba
lololololoba
lololo
# awk command to print only valid lines
awk -F '^(lo){3}' 'NF == 2 && !($2 ~ /^lo/)' file
lololoba
lololo
As per your comment:
... more than 3 is bad so "lolololoba" is not good!
You'll find that @Jahid's answer doesn't fit (his gives you "good" for that test string).
Bash's =~ uses POSIX extended regular expressions, which have no lookahead, so to use his approach with a lookahead regex, hand the test to a grep that supports -P (PCRE), such as GNU grep:
pat='^(lo){3}(?!lo)'
s="lolololoba"
grep -qP "$pat" <<< "$s" && echo good || echo bad
This verifies that there are three "lo"s at the beginning, and not another one immediately following the three.
Note that the pattern is single-quoted so that an interactive bash does not treat the ! as history expansion.

Checking a string to see if it contains numeric character in UNIX

I'm new to UNIX, having only started it at work today, but experienced with Java, and have the following code:
#!/bin/bash
echo "Please enter a word:"
read word
grep -i $word $1 | cut -d',' -f1,2 | tr "," "-"> output
This works fine, but what I now need to do is check, when word is read, that it contains nothing but letters and, if it has numeric characters in it, print an "Invalid input!" message and ask them to enter it again. I assumed regular expressions with an if statement would be the easy way to do this, but I cannot get my head around how to use them in UNIX as I am used to the Java application of them. Any help with this would be greatly appreciated; I couldn't find help when searching, as all the solutions with regular expressions in Linux I found only dealt with whether the input was all numeric or not.
Yet another approach. Grep exits with 0 if a match is found, so you can test the exit code:
echo "${word}" | grep -q '[0-9]'
if [ $? = 0 ]; then
echo 'Invalid input'
fi
This is /bin/sh compatible.
Incorporating Daenyth and John's suggestions, this becomes
if echo "${word}" | grep '[0-9]' >/dev/null; then
echo 'Invalid input'
fi
The double bracket operator is an extended version of the test command which supports regexes via the =~ operator:
#!/bin/bash
while true; do
read -p "Please enter a word: " word
if [[ $word =~ [0-9] ]]; then
echo 'Invalid input!' >&2
else
break
fi
done
This is a bash-specific feature. Bash is a newer shell that is not available on all flavors of UNIX--though by "newer" I mean "only recently developed in the post-vacuum tube era" and by "not all flavors of UNIX" I mean relics like old versions of Solaris and HP-UX.
In my opinion this is the simplest option and bash is plenty portable these days, but if being portable to old UNIXes is in fact important then you'll need to use the other posters' sh-compatible answers. sh is the most common and most widely supported shell, but the price you pay for portability is losing things like =~.
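The loop above rejects only digits, while the question asks for nothing but letters. A small sketch of the stricter test with =~ follows; the function name letters_only is hypothetical, and [[:alpha:]] follows the current locale, so accented letters may or may not count depending on the locale settings.

```shell
#!/usr/bin/env bash
# letters_only STRING : succeed if STRING is non-empty and consists of letters only.
# Hypothetical helper; [[:alpha:]] follows the current locale.
letters_only() {
    local re='^[[:alpha:]]+$'
    [[ $1 =~ $re ]]
}

for w in hello abc1def 'two words' ''; do
    if letters_only "$w"; then echo "'$w': ok"; else echo "'$w': Invalid input!"; fi
done
```

In the read loop, the condition would then become if ! letters_only "$word" in place of the digit test.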
If you're trying to write portable shell code, your options for string manipulation are limited. You can use shell globbing patterns (which are a lot less expressive than regexps) in the case construct:
export LC_COLLATE=C
read word
while
case "$word" in
*[!A-Za-z]*) echo >&2 "Invalid input, please enter letters only"; true;;
*) false;;
esac
do
read word
done
EDIT: setting LC_COLLATE is necessary because in most non-C locales, character ranges like A-Z don't have the “obvious” meaning. I assume you want only ASCII letters; if you also want letters with diacritics, don't change LC_COLLATE, and replace A-Za-z by [:alpha:] (so the whole pattern becomes *[![:alpha:]]*).
For full regexps, see the expr command. EDIT: Note that expr, like several other basic shell tools, has pitfalls with some special strings; the z characters below prevent $word from being interpreted as reserved words by expr.
export LC_COLLATE=C
read word
while ! expr "z$word" : 'z[A-Za-z]*$' >/dev/null; do
echo >&2 "Invalid input, please enter letters only"
read word
done
If you only target recent enough versions of bash, there are other options, such as the =~ operator of [[ ... ]] conditional commands.
Note that your last line has a bug, the first command should be
grep -i "$word" "$1"
The quotes are needed because, somewhat counter-intuitively, "$foo" means “the value of the variable called foo” whereas plain $foo means “take the value of foo, split it into separate words where it contains whitespace, and treat each word as a globbing pattern and try to expand it”. (In fact, if you've already checked that $word contains only letters, leaving out the quotes won't do any harm, but it takes more time to think of these special cases than to just put the quotes every time.)
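The word-splitting part of this is easy to observe. A tiny sketch, assuming the default IFS; count_args is a hypothetical helper that just reports how many arguments it received:

```shell
#!/usr/bin/env bash
# count_args prints how many arguments it received (hypothetical helper).
count_args() { echo "$#"; }

word='two words'

count_args "$word"   # quoted: one argument, prints 1
count_args $word     # unquoted: split on whitespace, prints 2
```

With grep -i $word $1, the same splitting would turn the second half of the input into an extra file name argument.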
Yet another (quite) portable way to do it ...
if test "$word" != "`printf "%s" "$word" | tr -dc '[:alpha:]'`"; then
echo invalid
fi
One portable (assuming bash >= 3) way to do this is to remove all numbers and test for length:
#!/bin/bash
read -p "Enter a number" var
if [[ -n ${var//[0-9]} ]]; then
echo "Contains non-numbers!"
else
echo "ok!"
fi
Coming from Java, it's important to note that bash has no real concept of objects or data types. Everything is a string, and complex data structures are painful at best.
For more info on what I did, and other related functions, google for bash string manipulation.
Playing around with Bash parameter expansion and character classes:
# cf. http://wiki.bash-hackers.org/syntax/pe
word="abc1def"
word="abc,def"
word=$'abc\177def'
# cf. http://mywiki.wooledge.org/BashFAQ/058 (no NUL byte in Bash variable)
word=$'abc\000def'
word="abcdef"
(
set -xv
[[ "${word}" != "${word/[[:digit:]]/}" ]] && echo invalid || echo valid
[[ -n "${word//[[:alpha:]]/}" ]] && echo invalid || echo valid
)
Everyone's answers seem to be based on the assumption that the only invalid characters are numbers. The initial question states that they need to check that the string contains "nothing but letters".
I think the best way to do it is
nonalpha=$(echo "$word" | sed 's/[[:alpha:]]//g')
if [[ ${#nonalpha} -gt 0 ]]; then
echo "Invalid character(s): $nonalpha"
fi
If you found this page looking for a way to detect non-numeric characters in your string (like I did!) replace [[:alpha:]] with [[:digit:]].