Implementing Regex In Bash

I am trying to throw an error if my text file lines have any combination of 5 [A-Z 0-9] chars followed by a comma and nothing else, like this:
WH3Y4,
H7UF5,
but my code shows the error even when the lines look like this, with a space and words after the comma:
WH3Y4, my test
H7UF5, your test
The regex I am using below should work, if I understand how this is done:
^ to indicate the beginning of the text line
[A-Z0-9]{5} to indicate 5 chars of either cap letter or numbers
, to indicate they are followed by a comma
$ to indicate the end of the text line
So in theory, when there is any text after the comma on the same line, it should not produce the error, yet that's exactly what happens:
if ! [[ $myText =~ ^[A-Z0-9]{5},$ ]]; then
    echo "Error"
    continue
fi
Similarly, if I want to raise an error when the text looks like this:
WH3Y4 test
H7UF5 test
this should work, but it doesn't either:
if ! [[ $myText =~ ^[A-Z0-9]{5} *[A-Za-z]$ ]]; then
    echo "Error"
    continue
fi
And when I try this as suggested in the comments:
[[ "$myText" =~ ^[A-Z0-9]{5},\$ ]]
it produces an error for this as it should:
WH3Y4,
H7UF5,
but also produces an error for this as it shouldn't:
WH3Y4, my test
H7UF5, your test
I thought the $ indicates the end of the line, so if the line continues with more characters, it should not match the error condition.

It seems that your code is not implementing what you want. You say
I am trying to throw an error if my text file lines have any combination of 5 [A-Z 0-9] chars followed by a comma and nothing else
but then your code says
if ! [[ $myText =~ ^[A-Z0-9]{5},$ ]]; then
    echo "Error"
    continue
fi
The ! (the negation operator) means that it prints "Error" if the string does not match the regex, so it accepts the two sample strings you gave (WH3Y4, and H7UF5,) and rejects anything else. I think what you wanted here is
if [[ $myText =~ ^[A-Z0-9]{5},$ ]] ; then
    echo "Error"
    continue
fi
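or, in other words, just get rid of the !. (The \$ variant suggested in the comments makes things worse: the backslash turns $ into a literal dollar sign instead of an end-of-line anchor, so no line matches and the negated test reports an error for every line.)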
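The snippet uses continue, so it presumably runs inside a loop over the lines of the file. Here is a minimal sketch of such a loop. It assumes the lines come from a file named on the command line. The helper name check_file and its message format are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag every line that is exactly five uppercase
# letters/digits followed by a comma and nothing else.
check_file() {
    local myText lineno=0
    # IFS= keeps leading/trailing spaces; -r keeps backslashes literal
    while IFS= read -r myText; do
        lineno=$((lineno + 1))
        if [[ $myText =~ ^[A-Z0-9]{5},$ ]]; then
            echo "Error on line $lineno: $myText"
            continue
        fi
        echo "OK: $myText"
    done < "$1"
}

# Usage: check_file input.txt
```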
In the second case
if ! [[ $myText =~ ^[A-Z0-9]{5} *[A-Za-z]$ ]]; then
    echo "Error"
    continue
fi
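the problem is that your regular expression doesn't match your data. [A-Za-z] matches exactly one letter, so ^[A-Z0-9]{5} *[A-Za-z]$ only accepts lines such as WH3Y4 t. I suggest that you use
if [[ $myText =~ ^[A-Z0-9]{5}\ +[A-Za-z]+$ ]]; then
    echo "Error"
    continue
fi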
Note that I dropped the ! in this case as well.
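To see which lines each test catches, here is a quick sketch that runs the sample lines from the question through both patterns. The helper name classify and its labels are hypothetical. The second pattern is written as ^[A-Z0-9]{5}\ +[A-Za-z]+$ so that it matches comma-free lines such as WH3Y4 test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name the pattern (if any) that a line matches.
classify() {
    local myText=$1
    if [[ $myText =~ ^[A-Z0-9]{5},$ ]]; then
        echo "code-comma-only"
    elif [[ $myText =~ ^[A-Z0-9]{5}\ +[A-Za-z]+$ ]]; then
        # \ + is one or more literal spaces; [A-Za-z]+ is a whole word
        echo "code-space-word"
    else
        echo "other"
    fi
}

for line in 'WH3Y4,' 'H7UF5,' 'WH3Y4, my test' 'WH3Y4 test' 'H7UF5 test'; do
    printf '%-16s -> %s\n' "$line" "$(classify "$line")"
done
```

WH3Y4, my test falls through to "other", because neither pattern allows text after the comma.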

Related

Bash only gets the first matched result when using regex

Here is an example string:
"j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"
I want to find the ones that match the format j2sdk/1.8.0_xxx, where xxx contains only digits. Here, I want these strings to match:
j2sdk/1.8.0_45
j2sdk/1.8.0_40
j2sdk/1.8.0_51
I wrote the code below, but when I run it, it only gets the first match, j2sdk/1.8.0_45. Is anything wrong with my code?
avail_versions="j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"
patern='j2sdk\/1\.8\.0_[0-9]+\s+'
if [[ $avail_versions =~ $patern ]];then
    echo matched
    echo ${BASH_REMATCH[0]}
    echo ${BASH_REMATCH[1]}
    echo ${BASH_REMATCH[2]}
fi
The result is that BASH_REMATCH[0] is j2sdk/1.8.0_45, and BASH_REMATCH[1] and [2] are empty.
I expected to get the matches in BASH_REMATCH[1], BASH_REMATCH[2] and BASH_REMATCH[3].
Is there another way in Bash to get the expected matches?
Thanks
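BASH_REMATCH only holds the whole match ([0]) and the parenthesized capture groups ([1], [2] and so on) of a single match. =~ never returns later matches, and your pattern has no capture groups, so [1] and [2] stay empty. Instead, I split the input at spaces and add back the space after each word, because the pattern requires trailing whitespace.
for s in $avail_versions ; do
    s="$s "
    if [[ $s =~ $patern ]];then
        echo ${BASH_REMATCH[0]}
    fi
done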
j2sdk/1.8.0_45
j2sdk/1.8.0_40
j2sdk/1.8.0_51
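Here is an alternative sketch that needs no re-added space. It splits the string into an array with read -ra, anchors the pattern at both ends, and collects the hits into an array. The names pattern, versions and matches are hypothetical.

```shell
#!/usr/bin/env bash
avail_versions="j2sdk/1.8.0_25-static j2sdk/1.8.0_45 j2sdk/1.8.0_p120 j2sdk/1.8.0_40 j2sdk/1.8.0_51"
# Anchored at both ends, so no trailing \s+ is needed
pattern='^j2sdk/1\.8\.0_[0-9]+$'

# Split on whitespace into an array (no glob expansion, unlike an unquoted for loop)
read -ra versions <<< "$avail_versions"

matches=()
for v in "${versions[@]}"; do
    [[ $v =~ $pattern ]] && matches+=("$v")
done

printf '%s\n' "${matches[@]}"
```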

preg_match_all equivalent for BASH?

I have a string like this:
foo:collection:indexation [options] [--] <text> <text_1> <text_2> <text_3> <text_4>
I want to use a Bash regex to get an array (or a string I can split) like this, so I can check whether the syntax is correct:
["text", "text_1", "text_2", "text_3", "text_4"]
I have tried this:
# get the output of the help
COMMAND_OUTPUT=$($COMMAND_HELP)
# regex
ARGUMENT_REGEX="<([^>]+)>"
GOOD_REGEX="[a-z-]"
# get all the arguments
while [[ $COMMAND_OUTPUT =~ $ARGUMENT_REGEX ]]; do
    ARGUMENT="${BASH_REMATCH[1]}"
    # bad syntax
    if [[ ! $ARGUMENT =~ $GOOD_REGEX ]]; then
        echo "Invalid argument '$ARGUMENT' for the command $FILE"
        echo "Must only use characters [a-z:-]"
        exit 5
    fi
done
But the while loop does not seem to be appropriate, since I always get the first match.
How can I get all the matches for this regex?
Thanks !
The loop doesn't work because every time you're just testing the same input string against the regexp. It doesn't know that it should start scanning after the match from the previous iteration. You'd need to remove the part of the string up to and including the previous match before doing the next test.
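A simpler way is to use grep to get all the matches. Three details matter here:
- Use grep -oE, because the regex uses extended syntax. In grep's default basic syntax, ( and + are literal characters.
- grep -o prints each whole match, angle brackets included, so strip them.
- Feed the loop through process substitution rather than a pipe, so that exit 5 leaves the script and not just a subshell.
while read -r ARGUMENT; do
    # strip the surrounding < and >
    ARGUMENT=${ARGUMENT#"<"}
    ARGUMENT=${ARGUMENT%">"}
    if [[ ! $ARGUMENT =~ $GOOD_REGEX ]]; then
        echo "Invalid argument '$ARGUMENT' for the command $FILE"
        echo "Must only use characters [a-z:-]"
        exit 5
    fi
done < <($COMMAND_HELP | grep -oE "$ARGUMENT_REGEX")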
Bash doesn't have this directly, but you can achieve a similar effect with a slight modification.
string='foo...'
re='<([^>]+)>'
while [[ $string =~ $re(.*) ]]; do
    string=${BASH_REMATCH[2]}
    # process ${BASH_REMATCH[1]} (the current argument) as before
done
This matches the regex we want and also everything in the string after it. On every iteration, we shorten $string by assigning only the part after the match to it. The loop ends once the remaining string contains no further match. In this example, ${BASH_REMATCH[2]} is empty after the last argument, so $string becomes empty and the test fails.
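The same technique can be wrapped in a small preg_match_all-style function. This is a sketch. The name match_all is hypothetical, and mapfile needs Bash 4 or later. It assumes the regex has exactly one capture group, so that the trailing (.*) becomes group 2. It also assumes the regex always consumes at least one character; otherwise the loop would never advance.

```shell
#!/usr/bin/env bash
# Hypothetical preg_match_all-style helper.
# Usage: match_all REGEX STRING -> prints capture group 1 of each match, one per line.
# Assumes REGEX has exactly one capture group and never matches the empty string.
match_all() {
    local re=$1 string=$2
    # (.*) captures everything after the current match as group 2
    while [[ $string =~ $re(.*) ]]; do
        printf '%s\n' "${BASH_REMATCH[1]}"
        string=${BASH_REMATCH[2]}
    done
}

help_line='foo:collection:indexation [options] [--] <text> <text_1> <text_2> <text_3> <text_4>'
mapfile -t args < <(match_all '<([^>]+)>' "$help_line")
printf '%s\n' "${args[@]}"
```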

Bash regex does not accept slash

I am pretty new to Bash shell scripting (and to Linux too). I am trying to write a simple script that applies some regexes to a string the user types at the keyboard.
clear
read -p "Insert e-mail > "
if [[ $REPLY =~ ^[.] ]]
then
    echo "ERROR (code 1): e-mail cannot start with \".\""
elif [[ $REPLY =~ .[.]$ ]]
then
    echo "ERROR (code 2): e-mail cannot end with \".\""
else
    if [[ $REPLY =~ ^[0-9][0-9a-zA-Z!#$%^\&\'*+-]+$ ]] #THIS IS WHERE I NEED HELP
    then
        echo "Good!"
    else
        echo "Bad!"
    fi
fi
So what I want is a regex that stops the user from starting or ending with . (I pretty much did that, and it's working).
Next, I wanted the string to start with a number, and I did that with ^[0-9] (I think this is correct).
After that, the string can contain digits 0-9, letters a-z and A-Z, or the following characters: !#$%^&'*+-/
When the user entered 1& (it starts with a number and the rest is among the accepted characters), it didn't work, because it needs to be \& in the regex.
The same problem occurred with the ' character. I added a backslash to the regex (\') and it worked.
Then I tried to do the same with the / character (slash). I added a backslash before the slash, but when the user entered 1/ (it starts with a number and the rest are accepted characters), it unfortunately printed "Bad!" when it should print "Good!".
Why is that happening?
I tried \/ and \\/, but it still fails, and I can't understand why.
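The problem is the ! in your character class. In an interactive shell, where history expansion is on, !# is expanded as a history reference before the regex is ever evaluated.
I suggest declaring your regex beforehand in a variable, which also removes the need to escape & and ' inside [[ ]], and including / in the class: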
re="^[0-9][0-9a-zA-Z\!#$%^&/*'+-]+$"
Then use it as:
s='1/'
[[ $s =~ $re ]] && echo "good" || echo "bad"
good
Actually, / works in character classes just fine:
$ [[ "1/" =~ ^[0-9][/]+$ ]]; echo $?
0
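Here is a sketch that combines both answers: the pattern lives in a variable, and / is in the character class. It assumes the code runs as a script, where history expansion is off by default, so the ! needs no backslash. The name check_local_part is hypothetical.

```shell
#!/usr/bin/env bash
# Assumes a non-interactive script (history expansion off), so the !
# inside double quotes needs no backslash.
# A digit first, then one or more digits, letters or any of: ! # $ % ^ & / * ' + -
re="^[0-9][0-9a-zA-Z!#\$%^&/*'+-]+\$"

# Hypothetical helper mirroring the Good!/Bad! branch of the question
check_local_part() {
    if [[ $1 =~ $re ]]; then
        echo 'Good!'
    else
        echo 'Bad!'
    fi
}

for s in '1/' '1&' "1'" '1a+b' 'a1' '1 2'; do
    printf '%-5s %s\n' "$s" "$(check_local_part "$s")"
done
```

Inside the brackets, ^ is literal because it is not the first character, and - is literal because it is the last.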

Bash need to test for alphanumeric string

Trying to verify that a string has only lowercase, uppercase, or numbers in it.
if ! [[ "$TITLE" =~ ^[a-zA-Z0-9]+$ ]]; then echo "INVALID"; fi
Thoughts?
* UPDATE *
The variable TITLE currently only has uppercase text, so it should pass and nothing should be output. If, however, I add a special character to TITLE, the if statement should catch it and echo INVALID. Currently it does not work: it always echoes INVALID. I think this is because my regex is wrong. I think the way I have written it, it's looking for a title that has all three in it.
Bash 4.2.25
The idea is, the user should be able to add any title as long as it only contains uppercase, lowercase or numbers. All other characters should fail.
* UPDATE *
If TITLE = ThisIsAValidTitle it echos invalid.
If TITLE = ThisIs#######InvalidTitle it also echos invalid.
* SOLUTION *
Weird, well it started working when I simplified it down to this:
TEST="Valid0"
if ! [[ "$TEST" =~ [^a-zA-Z0-9] ]]; then
    echo "VALID"
else
    echo "INVALID"
fi
* REAL SOLUTION *
My variable had spaces in it... DUH
Sorry for the trouble guys...
* FINAL SOLUTION *
This accounts for spaces in titles
if ! [[ "$TITLE" =~ [^a-zA-Z0-9\ ] ]]; then
    echo "VALID"
else
    echo "INVALID"
fi
I'd invert the logic. Test for invalid characters and echo a warning if at least one is present:
if [[ "$TITLE" =~ [^a-zA-Z0-9] ]]; then
    echo "INVALID"
fi
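Here is a sketch of the inverted test in action. It runs a few sample titles through the negated class, including the spaces that turned out to be the real cause. Inside brackets, a leading ^ negates the class rather than anchoring the string. So [^a-zA-Z0-9\ ] matches any single character that is not a letter, digit or space. The name title_status is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: a title is invalid if it contains at least one
# character outside letters, digits and space.
# Inside [ ], a leading ^ negates the class; it is not a start anchor.
title_status() {
    if [[ $1 =~ [^a-zA-Z0-9\ ] ]]; then
        echo "INVALID"
    else
        echo "VALID"
    fi
}

for t in 'ThisIsAValidTitle' 'This Is A Valid Title' 'ThisIs#######InvalidTitle'; do
    printf '%-28s %s\n' "$t" "$(title_status "$t")"
done
```

With this approach, an empty title also counts as VALID, since it contains no offending character. Add a separate -z check if empty titles should be rejected.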
With that said, your original check worked for me, so you probably need to provide more context (i.e. a larger portion of your script).
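Why can't we use [:alnum:]? POSIX classes only work inside a bracket expression, so the pattern needs [[:alnum:]], anchored at both ends:
[[ 'mystring123' =~ ^[[:alnum:]]+$ ]] && echo "ok" || echo "no"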
The nominated answer is wrong, because it doesn't check to the end of the string. It's also inverted, as the conditional says: "if the start of the string is valid characters, then echo invalid".
[[ $TITLE =~ ^[a-zA-Z0-9_-]{3,20}$ ]] && ret="VALID" || ret="INVALID"
echo $ret

In Bash, how do I match a string of the form [SOME_ALPHA_NUM_WORD]?

I have tried things like =~ "\[[A-Za-z0-9]+\]", which I would expect to work, but it doesn't. I also tried "[[A-Za-z0-9]+]" and "\[[:alnum:]+\]". What am I doing wrong? Sample line I want to match: [RTNUT18] (I am iterating through a file, and some lines are of this form.)
This is my code snippet:
while read line;
do
    if [[ $line =~ "^\[[A-Za-z0-9]+\]$" ]]; then
        echo match
    else
        echo no match
    fi
done < $1
This is a sample file:
[RBPAT7]
Whatever=foo,bla
Otherline
RRR
and I run:
./script.sh thefile.txt
I am not getting a hit on the [RBPAT7] line at all
A fragment like that isn't enough on its own; it has to be used inside [[ ]]:
$ [[ [foo] =~ ^\[[A-Za-z0-9]+\]$ ]] ; echo $?
0
EDIT:
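Unlike test, [[ does not need quotes around its arguments. Your code matches nothing because, since Bash 3.2, any quoted part of the pattern on the right of =~ is matched as a literal string rather than as a regex. Your test therefore looks for the literal text ^\[[A-Za-z0-9]+\]$. Remove the quotes.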
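Here is a minimal sketch that contrasts a quoted and an unquoted pattern, assuming Bash 3.2 or later. It also shows the common idiom of keeping the regex in a variable and expanding it unquoted. The names section_re and print_sections are hypothetical.

```shell
#!/usr/bin/env bash
line='[RBPAT7]'

# Quoted: matched as a literal string, so this fails
[[ $line =~ "^\[[A-Za-z0-9]+\]$" ]] && echo "quoted: match" || echo "quoted: no match"

# Unquoted: treated as an extended regular expression, so this succeeds
[[ $line =~ ^\[[A-Za-z0-9]+\]$ ]] && echo "unquoted: match" || echo "unquoted: no match"

# Idiom: store the regex in a variable and expand it unquoted
section_re='^\[[A-Za-z0-9]+\]$'

# Hypothetical helper: print only the section-header lines of a file
print_sections() {
    local line
    while IFS= read -r line; do
        [[ $line =~ $section_re ]] && printf '%s\n' "$line"
    done < "$1"
    return 0
}
```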