I'm trying to use the following if statement with regex but having some trouble:
if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[\ A-Za-z0-9]+$ ]]; then
echo "ERROR"
continue
fi
The objective is to allow YHG6D,test and YHG6D, test but not YHG6D,  test (two or more spaces).
I thought using the ? after [[:space:]] or " " would do the trick by limiting the space to either none or one as I want to do, but it doesn't work because I presume having 2 spaces also meets that match criterion. If so, how do I limit the match literally such that if there is no space or one space after the comma it runs the code but if there's more than one space after the comma it throws an error?
And also, I was advised to add the "\" in front of the [A-Za-z0-9] expression but have no idea what it does and if it is necessary.
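Your problem is the "\ " (a backslash followed by a space) in [\ A-Za-z0-9]+, which adds a literal space to the bracket expression, so any extra spaces after the comma are matched there. If you remove it, the regex allows only zero or one whitespace character between the comma and the word: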
^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$
As tested in https://regex101.com, this matches YHG6D,test and YHG6D, test, but it doesn't match YHG6D,  test or YHG6D,   test.
Also, you don't need the continue in your if statement:
if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$ ]]; then
echo "ERROR"
fi
Here are some tests:
$ bash
$ myText="YHG6D,test"; if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$ ]]; then echo "ERROR"; fi
$ myText="YHG6D, test"; if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$ ]]; then echo "ERROR"; fi
$ myText="YHG6D,  test"; if ! [[ $myText =~ ^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$ ]]; then echo "ERROR"; fi
ERROR
$
The $ at the beginning of each line is the bash prompt, so copy each command starting from myText=... and paste it into a bash terminal to test.
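As a rough sketch of the same check, the pattern can be stored in a variable and wrapped in a function, which keeps the regex in one place. The names is_valid_entry and entry_re are made up for this illustration and do not come from the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when the text is five uppercase letters or
# digits, a comma, at most one whitespace character, then letters/digits.
is_valid_entry() {
    local entry_re='^[A-Z0-9]{5},[[:space:]]?[A-Za-z0-9]+$'
    [[ $1 =~ $entry_re ]]
}

# The three sample values from the question (the last has two spaces).
for myText in "YHG6D,test" "YHG6D, test" "YHG6D,  test"; do
    if is_valid_entry "$myText"; then
        echo "OK:    '$myText'"
    else
        echo "ERROR: '$myText'"
    fi
done
```

Keeping the pattern in a variable and expanding it unquoted on the right-hand side of =~ avoids having to escape characters for the shell inside [[ ... ]].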
I've got the following text file which contains:
12.3-456, test
test test test
If the line contains xx.x-xxx, then I want to print the line out. (X's are numbers)
I think I have the correct regex and have tested it here:
http://regexr.com/3clu3
I have then used this in a bash script but the line containing the text is not printed out.
What have I messed up?
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
if [[ $line =~ /\d\d.\d-\d\d\d,/g ]]; then
echo $line
fi
done < input.txt
You need to use [0-9] instead of a \d in Bash regex. No regex delimiters are necessary, and the global flag is not necessary either. Also, you can contract it a bit using limiting quantifiers (like {3} that will match 3 occurrences of the pattern next to it). Besides, a dot matches any character in regex, so you need to escape it if you want to match a literal dot symbol.
Use
regex="[0-9]{2}\.[0-9]-[0-9]{3},"
if [[ $line =~ $regex ]]
...
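For completeness, here is one way the suggested pattern might be dropped into the loop from the question. This is only a sketch: the function name print_matching_lines is invented for the example, and it reads from standard input rather than from input.txt so that it can be tried without creating a file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads lines from standard input and prints those
# containing two digits, a literal dot, one digit, a hyphen, three digits
# and a comma.
print_matching_lines() {
    local regex='[0-9]{2}\.[0-9]-[0-9]{3},'
    local line
    while IFS='' read -r line || [[ -n "$line" ]]; do
        if [[ $line =~ $regex ]]; then
            printf '%s\n' "$line"
        fi
    done
}

# Sample input taken from the question.
printf '%s\n' '12.3-456, test' 'test test test' | print_matching_lines
```

To run it against the original file, something like print_matching_lines < input.txt should work.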
This works:
#!/bin/bash
#regex="/\d\d.\d-\d\d\d,/g"
# Inside a bracket expression '.' is literal, and so is a trailing '-'.
regex="[0-9.-]+, [A-Za-z]+"
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "$line"
if [[ $line =~ $regex ]]; then
echo "match"
fi
done
The regex is one or more characters from [0-9, '.', '-'], followed by ',' and a space, followed by one or more letters. This could be refined in a number of ways, e.g. by requiring an explicit number of digits before and after the '-'.
Testing indicates:
$ ./sqltrace2.sh < input.txt
12.3-456, test
match
123.3-456, test
match
12.3-456,
test test test
test test test
I'm trying to write a bash script that reads in a file and skips commented lines.
I have:
#!/bin/bash
### read file
IFS=$'\r\n'
while read line; do
match_pattern="^[:space:]*#"
if [[ "$line" =~ $match_pattern ]];
then
echo "#####"
continue
fi
#semicolons and commas are removed everywhere...
array+=($line)
done <list.txt
This skips lines that begin with a "#", but not lines that begin with spaces and then a "#", i.e. "^\s+#".
I get the same results using [:blank:].
How should this regular expression be written?
You are missing brackets in your pattern:
match_pattern="^[[:space:]]*#"
does what you want.
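A small sketch of the corrected pattern in a read loop might look like this. The function name skip_comment_lines is hypothetical, and IFS='' with read -r is used here (an assumption, not part of the question) so that leading whitespace actually reaches the regex.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copies standard input to standard output, dropping
# lines whose first non-whitespace character is '#'.
skip_comment_lines() {
    local comment_re='^[[:space:]]*#'
    local line
    while IFS='' read -r line || [[ -n "$line" ]]; do
        [[ $line =~ $comment_re ]] && continue
        printf '%s\n' "$line"
    done
}

# A '#' that is not the first non-whitespace character does not count.
printf '%s\n' 'One' '#Two' '   # Three' 'Four # trailing' | skip_comment_lines
```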
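This works for me (\s is not part of POSIX ERE, so whether it is accepted depends on the system's regex library; [[:space:]] is the portable spelling):
while read line; do
echo "$line"
match_pattern="^\s*#"
if [[ "$line" =~ $match_pattern ]]; then
echo "#####"
fi
done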
Input
One
#Two
#Three
# Four
# Five
####Six
Output
One
#Two
#####
#Three
#####
# Four
#####
# Five
#####
####Six
#####
Doesn't start with a hash
Doesn't start with any amount of whitespace followed by a hash
(?!^#|^\s+#)^.*$
Yields this result from the code in your question:
IFS=$'\r\n'
while read line; do
match_pattern="^[:space:]*#"
if [[ "$line" =~ $match_pattern ]];
then
echo "#####"
continue
fi
array+=($line)
done <list.txt
It will match lines which look like this though:
while read line; do #while loop
[:space:] is a bracket expression that will match any of the characters :, a, c, e, p, s.
[[:space:]] is a bracket expression containing a character class: it will match a whitespace character.
$ s=" # x"
$ [[ $s =~ ^[:blank:]*# ]] && echo match || echo no match
no match
$ [[ $s =~ ^[[:blank:]]*# ]] && echo match || echo no match
match
bash's extended patterns can handle this as well
$ shopt -s extglob
$ [[ $s == *([[:blank:]])#* ]] && echo match || echo no match
match
I'm sure this is a simple oversight, but I don't see it, and I'm not sure why this regex is matching more than it should:
#!/bin/bash
if [[ $1 =~ ([0-9]+,)+[0-9]+ ]]; then
{
echo "found list of jobs"
}
fi
This is with input that looks like "02,48,109,309,183". Matching that is fine
However, it is also matching input that has no final number and is instead "09,28,34,"
Should the [0-9]+ at the end dictate the final character be at least 1+ numbers?
You have to add markers for beginning (^) and end ($) of input:
#!/bin/bash
if [[ $1 =~ ^([0-9]+,)+[0-9]+$ ]]; then
echo "found list of jobs"
fi
Otherwise it matches 09,28,34, because it matches from 0 until 4, ignoring everything that follows.
Your regex only has to match somewhere in the string, not from start to end. To make it match the whole string, use the ^ and $ meta-characters:
#!/bin/bash
if [[ $1 =~ ^([0-9]+,)+[0-9]+$ ]]; then
echo "found list of jobs"
fi
(Incidentally, you don't need { and } to define a block in Bash; that's the job of then and fi.)
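To see what the unanchored pattern actually matched, BASH_REMATCH[0] can be printed. The sketch below uses two made-up helper names, show_unanchored_match and is_job_list, purely for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the portion of the argument that the
# unanchored pattern matched, or "no match".
show_unanchored_match() {
    local re='([0-9]+,)+[0-9]+'
    if [[ $1 =~ $re ]]; then
        echo "matched: '${BASH_REMATCH[0]}'"
    else
        echo "no match"
    fi
}

# Hypothetical helper: succeeds only when the whole argument is a
# comma-separated list of at least two numbers.
is_job_list() {
    local re='^([0-9]+,)+[0-9]+$'
    [[ $1 =~ $re ]]
}

show_unanchored_match "09,28,34,"
is_job_list "09,28,34," || echo "anchored pattern rejects '09,28,34,'"
is_job_list "02,48,109,309,183" && echo "anchored pattern accepts '02,48,109,309,183'"
```

The first call reports the match as '09,28,34', which shows that the trailing comma was simply left outside the match rather than accepted by the pattern.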
I'm a bit confused: how come a regular expression works perfectly well using grep from command line and as I use the exactly same regular expression in a bash conditional statement, it doesn't work at all?
I'd like to match all the strings containing letters only, therefore my regular expression is:
^[a-zA-Z]\+$.
Please will you help sort this out?
Here's the snippet from my bash code
if ! [[ "$1" =~ '^[a-zA-z]+$' ]] ; then
echo "Error: illegal input string." >&2
exit 1
fi
Don't escape the +.
This works for me:
$ [[ "Abc" =~ ^[a-zA-Z]+$ ]] && echo "it matches"
it matches
Also, drop the single quotes around the regex: since bash 3.2, a quoted pattern is matched as a literal string rather than as a regular expression. The following works for me:
if ! [[ "$1" =~ ^[a-zA-Z]+$ ]] ; then
echo "Error: illegal input string." >&2
exit 1
fi
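The effect of quoting can be checked with a short experiment. This is a sketch assuming bash 3.2 or later with default compatibility settings; the function name compare_quoting is invented for the example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reports whether the argument matches the unquoted
# pattern (treated as a regex) and the quoted pattern (treated as literal
# text).
compare_quoting() {
    local as_regex=no as_literal=no
    [[ $1 =~ ^[a-zA-Z]+$ ]] && as_regex=yes
    [[ $1 =~ '^[a-zA-Z]+$' ]] && as_literal=yes
    echo "regex=$as_regex literal=$as_literal"
}

compare_quoting 'Abc'            # letters only
compare_quoting '^[a-zA-Z]+$'    # the pattern text itself
```

Under those assumptions the first call prints regex=yes literal=no and the second prints regex=no literal=yes, because the quoted form only matches a string that contains the pattern text verbatim.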
if [[ "$len" -lt "$MINLEN" && "$line" =~ \[*\.\] ]]
This is from Advanced bash scripting guide "Example 10-1. Inserting a blank line between paragraphs in a text file"
As I understand this matches "any string or a dot character". Right ?
It matches zero or more open bracket characters (\[*), followed by a period and a close square bracket (\.\]). Note that it only requires that a match exist somewhere in "$line", not that the whole string match. Here's a demo:
$ showmatch() { [[ "$1" =~ \[*\.\] ]] && echo "matched: '${BASH_REMATCH[0]}'" || echo "no match"; }
$ showmatch "abc[.]def"
matched: '[.]'
$ showmatch "abc.]def"
matched: '.]'
$ showmatch "abc[[[[[[[.]def"
matched: '[[[[[[[.]'
$ showmatch "abc[[[[[[[xyz.]def"
matched: '.]'
$ showmatch "abc[[[[[[[.xyz]def"
no match
...and I'm pretty sure that's not what it's supposed to be doing in that example script.
It means any string ending with a dot inside brackets, for example: [.]
[abc.]
Update: +1 to Gordon Davisson, who has summed it up pretty well... so I've redacted my original post
In brief: You can test the result of a bash regex match like this:
[[ "[*.]" =~ \[*\.\] ]] ; echo ${BASH_REMATCH[0]}