I have two questions about regular expressions in bash.
1. Non-greedy mode
local temp_input='"a1b", "d" , "45"'
if [[ $temp_input =~ \".*?\" ]]
then
echo "${BASH_REMATCH[0]}"
fi
The result is
"a1b", "d" , "45"
In java
String str = "\"a1b\", \"d\" , \"45\"";
Matcher m = Pattern.compile("\".*?\"").matcher(str);
while (m.find()) {
System.out.println(m.group());
}
I can get the result below.
"a1b"
"d"
"45"
But how can I use non-greedy mode in bash?
I can understand why \"[^\"]*\" works.
But I don't understand why \".*?\" does not work.
2. Global matches
local temp_input='abcba'
if [[ $temp_input =~ b ]]
then
#I wanna echo 2 b here.
#How can I set the global flag?
fi
How can I get all the matches?
PS: I only want to use regex.
For the second question, sorry for the confusion.
I want to echo "b" and "b", not count the occurrences of "b".
Help!
For your first question, an alternative is this:
[[ $temp_input =~ \"[^\"]*\" ]]
For your second question, you can do this:
temp_input=abcba
t=${temp_input//b}
echo "$(( (${#temp_input} - ${#t}) / 1 )) b"
Or for convenience place it on a function:
function count_matches {
local -i c1=${#1} c2=${#2}
if [[ c2 -gt 0 && c1 -ge c2 ]]; then
local t=${1//"$2"}
echo "$(( (c1 - ${#t}) / c2 )) $2"
else
echo "0 $2"
fi
}
count_matches abcba b
Both produce the output:
2 b
Update:
If you want to see the matches you can use a function like this. You can also try other regular expressions not just literals.
function find_matches {
MATCHES=()
local STR=$1 RE="($2)(.*)"
while [[ -n $STR && $STR =~ $RE ]]; do
MATCHES+=("${BASH_REMATCH[1]}")
STR=${BASH_REMATCH[2]}
done
}
Example:
> find_matches abcba b
> echo "${MATCHES[@]}"
b b
> find_matches abcbaaccbad 'a.'
> echo "${MATCHES[@]}"
ab aa ad
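One caveat with find_matches: if the pattern can match the empty string (for example 'x*'), the remainder never gets shorter and the loop does not terminate. The variant below is a minimal sketch that guards against that. The name print_matches is made up for illustration, and it assumes the pattern has no capturing groups of its own (otherwise the index of the trailing group shifts).

```bash
# Hypothetical helper (not from the answer above): print each match on its own line.
# Assumes the pattern in $2 contains no capturing groups of its own.
print_matches() {
    local str=$1 re="($2)(.*)"
    while [[ -n $str && $str =~ $re ]]; do
        # An empty match would leave $str unchanged, so stop instead of looping forever.
        [[ -n ${BASH_REMATCH[1]} ]] || break
        printf '%s\n' "${BASH_REMATCH[1]}"
        str=${BASH_REMATCH[2]}
    done
}

# Combines both questions: every quoted field, one per line.
print_matches '"a1b", "d" , "45"' '"[^"]*"'
```

Printing one match per line rather than filling a global array keeps the helper free of side effects; the caller can still collect the lines into an array if needed.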
Your regular expression matches the string starting at the first quotation mark (before ab) and ending at the last quotation mark (after ef). This is greedy, even though your intention was a non-greedy match (*?). Bash appears to use POSIX.2 regular expressions (see man 7 regex), which do not support a non-greedy Kleene star.
If you want just "ab", I'd suggest a different regular expression:
if [[ $temp_input =~ \"[^\"]*\" ]]
which explicitly says that you don't want quotation marks inside your strings.
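To see the difference side by side, here is a small sketch using the input from the question. The variable names greedy and bounded are only illustrative; the patterns are kept in variables so that no shell escaping is needed inside [[ ]].

```bash
temp_input='"a1b", "d" , "45"'

greedy='".*"'        # .* always takes as much as it can in POSIX ERE
bounded='"[^"]*"'    # cannot run past the next quotation mark

[[ $temp_input =~ $greedy ]] && greedy_match=${BASH_REMATCH[0]}
[[ $temp_input =~ $bounded ]] && bounded_match=${BASH_REMATCH[0]}

printf '%s\n' "$greedy_match"    # the whole input string
printf '%s\n' "$bounded_match"   # "a1b"
```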
As for your second question, I don't understand what you mean. If you want to find all matches (and there are two occurrences of b here), I think you cannot do it with a single =~ match.
This is my first post, and I am very amateur at bash, so apologies if I haven't understood the question, but I wrote a function for non-greedy regex using entirely bash:
regex_non_greedy () {
local string="$1"
local regex="$2"
local replace="$3"
while [[ $string =~ $regex ]]; do
local search=${BASH_REMATCH[0]}
string=${string/"$search"/$replace}
done
printf "%s" "$string"
}
Example invocation:
regex_non_greedy "all cats are grey and green" "gre+." "white"
Which returns:
all cats are white and white
I really don't know what I'm doing.
In variable a, I want to find the first appearance of '$' after the first appearance of 'Bitcoin', and print everything after it until the first newline.
I have the following code:
a = 'something Bitcoin something againe $jjjkjk\n againe something'
if [[ $a =~ .*Bitcoin.*[\$](.*).* ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
In this example I would like to get 'jjjkjk'. All I get is 'no'.
This code might be really flawed, I have no experience in this. I think tho the problem might be with the '$' sign. Please help!
Properly handle newlines in bash with ANSI-C Quoting -- \n sequences become literal newlines.
a=$'something Bitcoin something againe $jjjkjk\n againe something'
regex=$'Bitcoin[^$]*[$]([^\n]+)'
[[ $a =~ $regex ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="Bitcoin something againe \$jjjkjk" [1]="jjjkjk")'
# .................................................................^^^^^^^^^^^^
To verify the contents contain newlines:
$ printf '%s' "$regex" | od -c
0000000 B i t c o i n [ ^ $ ] * [ $ ] (
0000020 [ ^ \n ] + )
0000026
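A quick way to see what ANSI-C quoting changes is to compare string lengths. This is only a sketch; the variable names are illustrative.

```bash
plain='a\nb'     # four characters: a, backslash, n, b
ansi=$'a\nb'     # three characters: a, newline, b

printf '%s\n' "${#plain}"   # 4
printf '%s\n' "${#ansi}"    # 3

# The same distinction decides what a bracket expression excludes.
text=$'first line\nsecond line'
re=$'^[^\n]*'               # everything up to the first real newline
[[ $text =~ $re ]] && first=${BASH_REMATCH[0]}
printf '%s\n' "$first"      # first line
```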
Here is a working version of your code:
a='something Bitcoin something againe $jjjkjk\n againe something'
r=".*Bitcoin.*[\$]([^\n]*).*"
if [[ $a =~ $r ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
You need to find 'Bitcoin' and then a '$' after it, no matter what is in between, so use .* between them. When you want to capture text up to a specific character, the best way is a negated bracket expression [^...]; in your case [^\n] captures everything up to the \n.
Your variable assignment also had a problem: a = "..." is not valid, because spaces are not allowed around the =. The correct form is a="..".
Using double quotes around the value is wrong too, because the shell would then expand $jjjkjk as an (empty) variable.
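One detail in this version is easy to miss. Because a is single-quoted and r is double-quoted, \n is the two characters backslash and n in both the string and the pattern. In a POSIX bracket expression the backslash is an ordinary character, so [^\n] means "anything except a backslash or the letter n". It works here only because the string really contains a backslash right after jjjkjk. A small sketch of the difference (variable names are illustrative):

```bash
word='banana'

literal='[^\n]+'      # excludes a backslash and the letter n
newline=$'[^\n]+'     # excludes only a real newline character

[[ $word =~ $literal ]] && with_literal=${BASH_REMATCH[0]}
[[ $word =~ $newline ]] && with_newline=${BASH_REMATCH[0]}

printf '%s\n' "$with_literal"   # ba
printf '%s\n' "$with_newline"   # banana
```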
Using bash, I can check to see if the value of a variable matches a regular expression. However, I cannot find a way of returning the part that matched. Is this possible?
For example take $test as test="123456-name-goes-here.1.2.3-something.zip" The part I'd like to return is 1.2.3-something.
With the code below, I can successfully match $test, but I don't know where to go from here.
[[ $test =~ ([0-9]\.[0-9](\.[0-9])?(\.[0-9])?)(-[a-z-]*)? ]] && echo "matched"
${BASH_REMATCH[0]} will contain the value you need:
test="123456-name-goes-here.1.2.3-something.zip"
reg="[0-9]\.[0-9](\.[0-9])?(\.[0-9])?(-[a-z-]*)?"
if [[ $test =~ $reg ]]; then
echo "${BASH_REMATCH[0]}"
fi
See this cheatsheet:
Regular expression captures will be available in $BASH_REMATCH, ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}, etc.
That means the whole match is stored in ${BASH_REMATCH} at index 0, and the subsequent items contain the submatches captured with (...) (capturing groups).
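As an illustration of the indexing, here is a sketch with a simpler pattern than the one above, so that each group is easy to follow. The pattern and the names version, major, minor and patch are assumptions made for the example.

```bash
test="123456-name-goes-here.1.2.3-something.zip"
re='([0-9]+)\.([0-9]+)\.([0-9]+)'

if [[ $test =~ $re ]]; then
    version=${BASH_REMATCH[0]}   # entire match
    major=${BASH_REMATCH[1]}     # first (...) group
    minor=${BASH_REMATCH[2]}     # second (...) group
    patch=${BASH_REMATCH[3]}     # third (...) group
    printf '%s\n' "$version" "$major" "$minor" "$patch"
fi
```

The leading 123456 is skipped because it is not followed by a dot, so the first place the pattern can match is 1.2.3.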
I am writing a script and I want to check that a variable has a certain format. This is the function I use:
check_non_numeric() {
#re='^\".*\"$'
re='\[^\]*\'
if ! [[ $1 =~ $re ]] ; then
echo "'$1' is not a valid format - \"[name]\" "
exit 1
fi
}
I want the regular expression to match a string with quotation marks around it and anything but a quotation mark inside ("a" or "string" or "dsfo!^$**#"). The problem is that the regular expressions I came up with don't work for me. I have used a very similar function to check whether a variable is an integer or a float, and it worked there. Could you please tell me what the regular expression should be?
Thank you very much
I'm assuming you mean you want to detect anything that is not a string surrounded by quotes. It's easier to write the regex to match the quoted form and let the bash test negate it, that is, use !. Here are a couple of ways to do it.
if [[ ! $(expr "$string" : '\".*\"') -gt 0 ]]; then echo "expr good"; fi
if [[ ! "$string" =~ \".*\" ]]; then echo "test good"; fi
Make sure you quote the variable you are testing with expr (which is shown here for illustration only).
As you want to match anything except string with quotation marks, you just target the quotation mark:
re='["]'
if [[ ! $1 =~ $re ]] ; then
Actually you don't need regex for this. Globbing will be enough:
if [[ ! $1 = *\"* ]]; then
...
fi
Your regex is very, very far off. \[ matches a literal left square bracket, and ^ (outside a character class) matches beginning of line.
Something like '^"[^"]*"' should work, if that's really what you want.
However, I kind of doubt that. In order to pass a value in literal double quotes, you would need something like
yourprogram '"value"'
or
yourprogram "\"value\""
which I would certainly want to avoid if I were you.
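If the quoted format really is what you want, a minimal sketch of the check could look like this. The function name is_quoted is hypothetical, and a trailing $ is added as an assumption, so that text after the closing quotation mark is rejected as well. It returns a status instead of calling exit, which leaves the decision to the caller.

```bash
# Hypothetical helper: succeeds only for "..." with no inner quotation marks.
is_quoted() {
    local re='^"[^"]*"$'
    [[ $1 =~ $re ]]
}

for candidate in '"name"' 'name' '"na"me"'; do
    if is_quoted "$candidate"; then
        printf '%s is valid\n' "$candidate"
    else
        printf '%s is not valid\n' "$candidate"
    fi
done
```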
I want to search within a string for a regular expression. I am doing this as follows:
Suppose target="string with 1234 in"
if [[ "$target" =~ "{4}\d" ]] then
val=...
fi
I want to capture the text that the regex matched, i.e. I want val=1234. What's the best way to do this?
If you're only looking for the 4-digit number, you'd use /\b\d{4}\b/
If you're wanting to capture the val= part as well, simply prefix that, so you'd have /\bval=\d{4}\b/
And lastly, if you have a case where you don't know what the val of the val= part is, replace val with \w+, then you'd have /\b\w+=\d{4}\b/
Do you mean something like val = s/\D(\d\d\d\d)\D/$1/ , or val = /\d{4}/ ?
The matched parts are held in the .sh.match array in ksh:
if [[ $target == *{4}(\d)* ]]; then
val=${.sh.match[1]}
# do something with $val
fi
Demo:
$ target="string with 1234 in"
$ [[ $target == *{4}(\d)* ]] && echo "${.sh.match[1]}"
1234
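For bash, which the [[ ... =~ ... ]] syntax in the question suggests, a sketch along the same lines would use BASH_REMATCH. POSIX extended regular expressions do not define \d, so [0-9] is used here instead, and the pattern is kept in a variable (the name re is just a convention).

```bash
target="string with 1234 in"
re='[0-9]{4}'    # four consecutive digits; also matches inside a longer run of digits

if [[ $target =~ $re ]]; then
    val=${BASH_REMATCH[0]}
    printf '%s\n' "$val"    # 1234
fi
```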
I'm trying to compare input against a file containing alert words:
read MYINPUT
alertWords=( `cat "AlertWordList" `)
for X in "${alertWords[@]}"
do
# the wildcards in my expression do not work
if [[ $MYINPUT =~ *$X* ]]
then
echo "#1 matched"
else
echo "#1 nope"
fi
done
The =~ operator deals with regular expressions, and so to do a wildcard match like you wanted, the syntax would look like:
if [[ $MYINPUT =~ .*$X.* ]]
However, since this is a regex, that's not needed: a match anywhere in the string is implied (unless the pattern is anchored with ^ and/or $), so this should suffice:
if [[ $MYINPUT =~ $X ]]
Be mindful that if your "words" happen to contain regex metacharacters, then this might do strange things.
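If the words should be taken literally, quoting the right-hand side makes bash match it as a plain string rather than as a regex (this is the behaviour in bash 3.2 and later). A small sketch with a made-up word that contains a metacharacter:

```bash
MYINPUT='abc'
X='a.c'

# Unquoted: $X is a regex, so the dot matches any character.
if [[ $MYINPUT =~ $X ]]; then as_regex=yes; else as_regex=no; fi

# Quoted: "$X" is matched as a literal string.
if [[ $MYINPUT =~ "$X" ]]; then as_literal=yes; else as_literal=no; fi

printf '%s %s\n' "$as_regex" "$as_literal"   # yes no
```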
I'd avoid =~ here because as FatalError points out, it will interpret $X as a regular expression and this can lead to surprising bugs (especially since it's an extended regular expression, so it has more special characters than standard grep syntax).
Instead, you can just use == because bash treats the RHS of == as a globbing pattern:
read MYINPUT
alertWords=($(<"AlertWordList"))
for X in "${alertWords[@]}"
do
# the wildcards in my expression do work :-)
if [[ $MYINPUT == *"$X"* ]]
then
echo "#1 matched"
else
echo "#1 nope"
fi
done
I've also removed the use of cat in your alertWords assignment, which keeps the file reading inside the shell instead of spawning another process to do it.
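One thing an unquoted $(<file) in an array assignment still does is split on whitespace and expand glob characters, so a word such as * in the list would be replaced by file names. Here is a sketch of an alternative, assuming bash 4 or later (for mapfile) and one word per line in the file. The function name is hypothetical and the temporary file merely stands in for AlertWordList.

```bash
# Hypothetical helper: succeed if $1 contains any line of the file $2 as a substring.
contains_alert_word() {
    local input=$1 file=$2 word
    local -a words
    mapfile -t words < "$file"      # one array element per line, no globbing
    for word in "${words[@]}"; do
        [[ -n $word && $input == *"$word"* ]] && return 0
    done
    return 1
}

list=$(mktemp)                      # stand-in for AlertWordList
printf '%s\n' 'fire' 'flood' '*' > "$list"

contains_alert_word 'there is a flood warning' "$list" && echo "matched"
contains_alert_word 'all quiet' "$list" || echo "nope"
rm -f "$list"
```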
If you want to use patterns, not regexes for matching, you can use case:
read MYINPUT
alertWords=( `cat "AlertWordList" `)
for X in "${alertWords[@]}"
do
# case patterns are globs, so the wildcards work here
case "$MYINPUT" in
*"$X"* ) echo "#1 matched" ;;
* ) echo "#1 nope" ;;
esac
done