Matching a regex expression and storing it - regex

I want to search within a string for a regex expression, I am doing this as follows:
Suppose target="string with 1234 in"
if [[ "$target" =~ "{4}\d" ]] then
val=...
fi
I want to capture the regex found i.e. I want val=1234. what's the best way to do this?

If you're only looking for the 4-digit number, you'd use /\b\d{4}\b/
If you're wanting to capture the val= part as well, simply prefix that, so you'd have /\bval=\d{4}\b/
And lastly, if you have a case where you don't know what the val of the val= part is, replace val with \w+, then you'd have /\b\w+=\d{4}\b/

Do you mean sth. like val = s/\D(\d\d\d\d)\D/$1/ , or val = /\d{4}/ ?

The matched parts are held in the .sh.match array in ksh:
if [[ $target == *{4}(\d)* ]]; then
val=${.sh.match[1]}
# do something with $val
fi
Demo:
$ target="string with 1234 in"
$ [[ $target == *{4}(\d)* ]] && echo "${.sh.match[1]}"
1234

Related

'$' in regexp in bash

I really don't know what I'm doing.
In variable a, I want to find the first appearance of '$' after the first appearance of 'Bitcoin', and print everything after it until the first newline.
I have the following code:
a = 'something Bitcoin something againe $jjjkjk\n againe something'
if [[ $a =~ .*Bitcoin.*[\$](.*).* ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
In this example I would like to get 'jjjkjk'. All I get is 'no'.
This code might be really flawed, I have no experience in this. I think tho the problem might be with the '$' sign. Please help!
Properly handle newlines in bash with ANSI-C Quoting -- \n sequences become literal newlines.
a=$'something Bitcoin something againe $jjjkjk\n againe something'
regex=$'Bitcoin[^$]*[$]([^\n]+)'
[[ $a =~ $regex ]] && declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="Bitcoin something againe \$jjjkjk" [1]="jjjkjk")'
# .................................................................^^^^^^^^^^^^
To verify the contents contain newlines:
$ printf '%s' "$regex" | od -c
0000000 B i t c o i n [ ^ $ ] * [ $ ] (
0000020 [ ^ \n ] + )
0000026
Here is a working version of your code:
a='something Bitcoin something againe $jjjkjk\n againe something'
r=".*Bitcoin.*[\$]([^\n]*).*"
if [[ $a =~ $r ]]; then
echo "${BASH_REMATCH[1]}"
else
echo "no"
fi
You need to find 'Bitcoin' then find a '$' after it, no matter what is between, so you should use .* operator, also when you want to capture some text until a specific char, the best way is using [^](not) operator, in your case: [^\n] this means capture everything until \n.
Also you had an issue with your variable declaration. a = "..." is not valid, the spaces are waste. so the correct one is 'a=".."`.
Using double quotation is wrong too, this will replaces dollar sign with an empty variable (evaluation)

Bash Regex comparison not working

keyFileName=$1;
for fileExt in "${validTypes[#]}"
do
echo $fileExt;
if [[ $keyFileName == *.$fileExt ]]; then
keyStatus="true";
fi
done;
I am trying to check the file extension of a file passed in against an array of multiple file extensions. However it doesn't seem to be working properly. Any help?
validTypes=(".txt" ".mp3")
keyFileName="$1"
for fileExt in "${validTypes[#]}"
do
echo $fileExt;
if [[ $keyFileName =~ ^.*$fileExt$ ]]; then
keyStatus="true";
echo "Yes"
fi
done;
Effectively, you could change your if statement to either:
if [[ $keyFileName == ?*$fileExt ]] # Glob pattern case, ? denotes single char
or:
if [[ $keyFileName =~ .*$fileExt ]] # Regex case, . denotes single char
Looping over the array to do a regex match on each element seems rather inefficient. You're using regex; it's easy to combine the expressions and avoid looping at all.
Mangling the array into a valid regex is not entirely trivial, though. Here's my attempt:
validTypes=('\.txt' '\.mp3')
fileExtRe=$(printf '|%s' "${validTypes[#]}"
# Trim off the first alternation, add parens and anchor
fileExtRe="(${fileExtRe#?})$"
if [[ $keyFileName =~ $fileExtRe ]]; then
:
Notice how the elements in validTypes are regular expressions now, with the dot escaped to only match a literal dot.

Return RegEx match in bash

Using bash, I can check to see if the value of a variable matches a regular expression. However, I cannot find a way of returning the part that matched. Is this possible?
For example take $test as test="123456-name-goes-here.1.2.3-something.zip" The part I'd like to return is 1.2.3-something.
With the code below, I can successfully match $test, but I don't know where to go from here.
[[ $test =~ ([0-9]\.[0-9](\.[0-9])?(\.[0-9])?)(-[a-z-]*)? ]] && echo "matched"
The $BASH_REMATCH[0] will contain the value you need:
test="123456-name-goes-here.1.2.3-something.zip"
reg="[0-9]\.[0-9](\.[0-9])?(\.[0-9])?(-[a-z-]*)?"
if [[ $test =~ $reg ]]; then
echo ${BASH_REMATCH[0]};
fi
See the IDEONE demo
See this cheatsheet:
Regular expression captures will be available in $BASH_REMATCH, ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}, etc.
That means that the whole match value is stored in ${BASH_REMATCH} with Index = 0, and the subsequent items cotnain submatches that were captured with (...) (capturing groups).

Bash regex to match substring with exact integer range

I need to match a string $str that contains any of
foo{77..93}
and capture the above substring in a variable.
So far I've got:
str=/random/string/containing/abc-foo78_efg/ # for example
if [[ $str =~ (foo[7-9][0-9]) ]]; then
id=${BASH_REMATCH[1]}
fi
echo $id # gives foo78
but this also captures ids outside of the target range (e.g. foo95).
Is there a way to restrict the regex to an exact integer range? (tried foo[77-93] but that doesn't work.
Thanks
If you want to use a regex, you're going to have to make it slightly more complex:
if [[ $str =~ foo(7[7-9]|8[0-9]|9[0-3]) ]]; then
id=${BASH_REMATCH[0]}
fi
Note that I have removed the capture group around the whole pattern and am now using the 0th element of the match array.
As an aside, for maximum compatibility with older versions of bash, I would recommend assigning the pattern to a variable and using in the test like this:
re='foo(7[7-9]|8[0-9]|9[0-3])'
if [[ $str =~ $re ]]; then
id=${BASH_REMATCH[0]}
fi
An alternative to using a regex would be to use an arithmetic context, like this:
if (( "${str#foo}" >= 77 && "${str#foo}" <= 93 )); then
id=$str
fi
This strips the "foo" part from the start of the variable so that the integer part can be compared numerically.
Sure is easy to do with Perl:
$ echo foo{1..100} | tr ' ' '\n' | perl -lne 'print $_ if m/foo(\d+)/ and $1>=77 and $1<=93'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93
Or awk even:
$ echo foo{1..100} | tr ' ' '\n' | awk -F 'foo' '$2>=77 && $2<=93
{print}'
foo77
foo78
foo79
foo80
foo81
foo82
foo83
foo84
foo85
foo86
foo87
foo88
foo89
foo90
foo91
foo92
foo93

regex in bash expression

I have 2 questions about regex in bash expression.
1.non-greedy mode
local temp_input='"a1b", "d" , "45"'
if [[ $temp_input =~ \".*?\" ]]
then
echo ${BASH_REMATCH[0]}
fi
The result is
"a1b", "d" , "45"
In java
String str = "\"a1b\", \"d\" , \"45\"";
Matcher m = Pattern.compile("\".*?\"").matcher(str);
while (m.find()) {
System.out.println(m.group());
}
I can get the result below.
"a1b"
"d"
"45"
But how can I use non-greedy mode in bash?
I can understand why the \"[^\"]\" works.
But I don't understand why does the \".?\" do not work.
2.global matches
local temp_input='abcba'
if [[ $temp_input =~ b ]]
then
#I wanna echo 2 b here.
#How can I set the global flag?
fi
How can I get all the matches?
ps:I only wanna use regex.
For the second question, sorry for the confusing.
I want to echo "b" and "b", not count "b".
Help!
For your first question, an alternative is this:
[[ $temp_input =~ \"[^\"]*\" ]]
For your second question, you can do this:
temp_input=abcba
t=${temp_input//b}
echo "$(( (${#temp_input} - ${#t}) / 1 )) b"
Or for convenience place it on a function:
function count_matches {
local -i c1=${#1} c2=${#2}
if [[ c2 -gt 0 && c1 -ge c2 ]]; then
local t=${1//"$2"}
echo "$(( (c1 - ${#t}) / c2 )) $2"
else
echo "0 $2"
fi
}
count_matches abcba b
Both produces output:
2 b
Update:
If you want to see the matches you can use a function like this. You can also try other regular expressions not just literals.
function find_matches {
MATCHES=()
local STR=$1 RE="($2)(.*)"
while [[ -n $STR && $STR =~ $RE ]]; do
MATCHES+=("${BASH_REMATCH[1]}")
STR=${BASH_REMATCH[2]}
done
}
Example:
> find_matches abcba b
> echo "${MATCHES[#]}"
b b
> find_matches abcbaaccbad 'a.'
> echo "${MATCHES[#]}"
ab aa ad
Your regular expression matches the string starting with the first quotation mark (before ab) and ending with the last quotation mark (after ef). This is greedy, even though your intention was to use a non-greedy match (*?). It seems that bash uses POSIX.2 regular expression (check your man 7 regex), which does not support a non-greedy Kleene star.
If you want just "ab", I'd suggest a different regular expression:
if [[ $temp_input =~ \"[^\"]*\" ]]
which explicitly says that you don't want quotation marks inside your strings.
I don't understand what you mean. If you want to find all matches (and there are two occurrences of b here), I think you cannot do it with a single ~= match.
This is my first post, and I am very amateur at bash, so apologies if I haven't understood the question, but I wrote a function for non-greedy regex using entirely bash:
regex_non_greedy () {
local string="$1"
local regex="$2"
local replace="$3"
while [[ $string =~ $regex ]]; do
local search=${BASH_REMATCH}
string=${string/$search/$replace}
done
printf "%s" "$string"
}
Example invocation:
regex_non_greedy "all cats are grey and green" "gre+." "white"
Which returns:
all cats are white and white