How to parse a commit message into variables using bash?

I am using bash and I have a string (a commit message):
:sparkles: feat(xxx): this is a commit
and I want to split it into variables:
emoji=:sparkles:
type=feat
scope=xxx
message=this is a commit
I tried to use grep, but the regex does not return what I need (for example, the "type"). And in any case, how do I store the results in variables?
echo ":sparkles: feat(xxx): this is a commit" | grep "[((.*))]"

With bash version >= 3, a regex and an array:
x=":sparkles: feat(xxx): this is a commit"
[[ "$x" =~ ^(:.*:)\ (.*)\((.*)\):\ (.*)$ ]]
echo "${BASH_REMATCH[1]}"
echo "${BASH_REMATCH[2]}"
echo "${BASH_REMATCH[3]}"
echo "${BASH_REMATCH[4]}"
Output:
:sparkles:
feat
xxx
this is a commit
From man bash:
BASH_REMATCH: An array variable whose members are assigned by the =~ binary operator to the [[ conditional command. The element
with index 0 is the portion of the string matching the entire regular expression. The element with index n is the
portion of the string matching the nth parenthesized subexpression. This variable is read-only.
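To get the pieces into named variables, as the question asks, the captures can be copied out of BASH_REMATCH after a successful match. The following is a minimal sketch, not the only way to do it: the pattern is kept in a variable (re, an illustrative name) and uses negated bracket expressions instead of .* so that each group stops at its own delimiter.

```bash
#!/usr/bin/env bash
# Sketch: copy the capture groups into named variables.
msg=":sparkles: feat(xxx): this is a commit"

# Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
# Negated bracket expressions keep each group from running past its delimiter.
re='^(:[^:]+:) ([^(]+)\(([^)]*)\): (.*)$'

if [[ $msg =~ $re ]]; then
    emoji=${BASH_REMATCH[1]}
    type=${BASH_REMATCH[2]}
    scope=${BASH_REMATCH[3]}
    message=${BASH_REMATCH[4]}
    printf '%s\n' "emoji=$emoji" "type=$type" "scope=$scope" "message=$message"
else
    echo "commit message does not match the expected layout" >&2
fi
```

Testing the result of [[ ... ]] before reading BASH_REMATCH matters, because after a failed match the array does not hold captures for the current string.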

Related

RegEx : How can I extract a certain part and modify it?

I'd like to extract a certain part of a string and modify it by using a regular expression.
A given string is TestcaseVzwPerformance_8_2_1_4_1_FDD2.
I'd like to extract the part 8_2_1_4_1 from the string and then replace the underscores _ with dots . So the expected result needs to be 8.2.1.4.1.
The numbers and length of the given string can be different.
For example,
Given string // Expected result
TestcaseVzwCqi_3_9_Test2 // 3.9
TestcaseVzwSvd1xRttAclr_6_6_2_3 // 6.6.2.3
TestcaseVzwCsiFading_9_4_1_1_1_FDD4 // 9.4.1.1.1
Here is my RegEx:
((?:\D{0,}_)(\d(_\d)*)(.*))
The numbered capturing group - $2 - contains 8_2_1_4_1 but with underscores.
Can I replace the underscores with dots?
It needs to be done in one RegEx and a Replace.
A regex by itself cannot modify text; you need a tool that performs the substitution. For example, with sed:
echo TestcaseVzwPerformance_8_2_1_4_1_FDD2 |
sed -E 's/[^_]*_(([_0-9])+)_.*/\1/;s/_/./g'
8.2.1.4.1
If you have a Bash string, you can use a Bash regex to capture and Bash parameter expansions to replace:
$ s="TestcaseVzwSvd1xRttAclr_6_6_2_3"
$ [[ $s =~ ^[^_]*_([[:digit:]_]+)_* ]] && tmp=${BASH_REMATCH[1]//_/.} && echo "${tmp%.}"
6.6.2.3
The same test can run in a loop:
while read -r line; do
if [[ $line =~ ^[^_]*_([[:digit:]_]+)_* ]]; then
tmp=${BASH_REMATCH[1]//_/.}
echo "\"$line\" => ${tmp%.}"
fi
done <<< 'Given string
TestcaseVzwCqi_3_9_Test2
TestcaseVzwSvd1xRttAclr_6_6_2_3
TestcaseVzwCsiFading_9_4_1_1_1_FDD4'
Prints:
"TestcaseVzwCqi_3_9_Test2" => 3.9
"TestcaseVzwSvd1xRttAclr_6_6_2_3" => 6.6.2.3
"TestcaseVzwCsiFading_9_4_1_1_1_FDD4" => 9.4.1.1.1
You can use the same loop to process a file.
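As a sketch of the file case, the loop can be wrapped in a function that reads from a path. The function name print_versions and the file name in the usage note are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: print the dotted number group for every matching line of a file.
# print_versions is a hypothetical helper name.
print_versions() {
    local file=$1 line tmp
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ ^[^_]*_([[:digit:]_]+)_* ]]; then
            tmp=${BASH_REMATCH[1]//_/.}   # underscores to dots
            echo "${tmp%.}"               # drop a trailing dot, if any
        fi
    done < "$file"
}
```

It would be called as print_versions testcases.txt, where testcases.txt is an assumed file with one test case name per line. Lines that do not match are skipped silently.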
If you have a file, you may as well use gawk:
$ awk 'BEGIN{FPAT="_[[:digit:]_]+"}
/_[[:digit:]]/ {sub(/^_/,"", $1); sub(/_$/,"",$1); gsub(/_/,".",$1); print $1}' file
3.9
6.6.2.3
9.4.1.1.1

preg_match_all equivalent for BASH?

I have a string like this
foo:collection:indexation [options] [--] <text> <text_1> <text_2> <text_3> <text_4>
I want to use a bash regex to get an array, or a string that I can split, like the one below, in order to check that the syntax is correct:
["text", "text_1", "text_2", "text_3", "text_4"]
I have tried this:
COMMAND_OUTPUT=$($COMMAND_HELP)
# get the output of the help
# regex
ARGUMENT_REGEX="<([^>]+)>"
GOOD_REGEX="[a-z-]"
# get all the arguments
while [[ $COMMAND_OUTPUT =~ $ARGUMENT_REGEX ]]; do
ARGUMENT="${BASH_REMATCH[1]}"
# bad syntax
if [[ ! $ARGUMENT =~ $GOOD_REGEX ]]; then
echo "Invalid argument '$ARGUMENT' for the command $FILE"
echo "Must only use characters [a-z:-]"
exit 5
fi
done
But the while loop does not seem to be appropriate, since I always get the first match.
How can I get all the matches for this regex?
Thanks !
The loop doesn't work because every time you're just testing the same input string against the regexp. It doesn't know that it should start scanning after the match from the previous iteration. You'd need to remove the part of the string up to and including the previous match before doing the next test.
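A minimal sketch of that idea, using the sample line from the question: after each match, a parameter expansion drops everything up to and including the matched text, so the next test starts after it. The variable names rest and matches are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: collect every <...> placeholder by consuming the string as we go.
rest='foo:collection:indexation [options] [--] <text> <text_1> <text_2> <text_3> <text_4>'
re='<([^>]+)>'
matches=()

while [[ $rest =~ $re ]]; do
    matches+=("${BASH_REMATCH[1]}")
    # Drop everything up to and including this match, so the next
    # iteration starts scanning after it. The quotes make the matched
    # text a literal, not a glob pattern.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${matches[@]}"
```

The names end up in the matches array, one element per placeholder, in the order they appear.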
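A simpler way is to use grep -o to get all the matches. The pattern uses extended syntax (unescaped parentheses and +), so grep needs -E, and each match it prints still includes the surrounding < and >. Feeding the loop through process substitution keeps it in the current shell, so exit 5 ends the script rather than only a pipeline subshell.
while read -r ARGUMENT; do
# strip the angle brackets that grep -o leaves around each match
ARGUMENT=${ARGUMENT#"<"}; ARGUMENT=${ARGUMENT%">"}
if [[ ! $ARGUMENT =~ $GOOD_REGEX ]]; then
echo "Invalid argument '$ARGUMENT' for the command $FILE"
echo "Must only use characters [a-z:-]"
exit 5
fi
done < <($COMMAND_HELP | grep -oE "$ARGUMENT_REGEX")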
Bash doesn't have this directly, but you can achieve a similar effect with a slight modification.
string='foo...'
re='<([^>]+)>'
while [[ $string =~ $re(.*) ]]; do
string=${BASH_REMATCH[2]}
# process as before
done
This matches the regex we want and also everything in the string after it. We keep shortening $string by assigning only the after-our-regex portion to it on every iteration. The loop terminates once the regex no longer matches what is left of $string (after the last match, the remainder contains no further <...> group).
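For the "process as before" step, note that the question's GOOD_REGEX is not anchored, so it accepts any argument that contains at least one allowed character. A sketch of a stricter check, assuming the intent is the [a-z:-] set named in the error message, is shown below. The input line and the names good and invalid are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: validate each placeholder name against an anchored character class.
string='cmd <text> <text_1> <bad name>'   # hypothetical help line
re='<([^>]+)>'
good='^[a-z:-]+$'    # anchored: every character must be in the class
invalid=()

while [[ $string =~ $re(.*) ]]; do
    # Save both captures first: the next =~ test overwrites BASH_REMATCH.
    argument=${BASH_REMATCH[1]}
    string=${BASH_REMATCH[2]}
    if [[ ! $argument =~ $good ]]; then
        invalid+=("$argument")
    fi
done

printf 'invalid: %s\n' "${invalid[@]}"
```

Here text passes, while text_1 (digit and underscore) and "bad name" (space) are collected in invalid. Whether digits and underscores should be allowed depends on the command's real naming rules, which the question does not settle.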

In Bash, how do you find and crop a string around a wildcard pattern

I want to crop a string around a wildcard (or a pattern using a wildcard) in Bash, preferably using parameter expansion or grep, anything but sed if possible, and then get the text matched by that wildcard into a variable.
Example of string:
DESERT=pie-cake_berry_cream-sirup
And I have a pattern with a wildcard:
_*_
The pattern will match with "_berry_" on my string. I want to run a bash command over my string, and return "berry" if I use this particular pattern.
Just use BASH_REMATCH to access the captured group:
if [[ $DESERT =~ _(.*)_ ]]; then
echo "${BASH_REMATCH[1]}"
fi
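This says: take the variable $DESERT and capture whatever sits between _ and _. If there is such a match, the captured text is stored in element 1 of the special array variable BASH_REMATCH. Note that .* is greedy, so with more than two underscores it captures everything between the first and the last one.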
So in your example:
$ DESERT=pie-cake_berry_cream-sirup
$ if [[ $DESERT =~ _(.*)_ ]]; then echo ${BASH_REMATCH[1]}; fi
Returns
berry
From man bash - Bash variables:
BASH_REMATCH
An array variable whose members are assigned by the ‘=~’ binary
operator to the [[ conditional command (see Conditional Constructs).
The element with index 0 is the portion of the string matching the
entire regular expression. The element with index n is the portion of
the string matching the nth parenthesized subexpression. This variable
is read-only.
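Since the question preferred parameter expansion, here is a sketch that does the same crop without a regex. The names tmp and middle are illustrative. Unlike the greedy regex, it always returns the text between the first two underscores.

```bash
#!/usr/bin/env bash
DESERT=pie-cake_berry_cream-sirup

# Remove the shortest prefix ending in "_", then the longest suffix
# starting with "_", leaving the text between the first two underscores.
tmp=${DESERT#*_}
middle=${tmp%%_*}
echo "$middle"
```

One caveat: if the string contains no underscore at all, both expansions leave it unchanged, so the whole string comes back. The [[ ... =~ ... ]] form is the safer choice when you need to know whether the pattern was present.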

is there any named regular expression capture for grep?

I'd like to know if it's possible to get named regular expression captures with grep -P (Linux, bash) from an unformatted string, or really from any string.
For example:
John Smith www.website.com john#website.com jan-01-2001
to capture as
$name
$website
$email
$date
But it seems I can't pass any variables out of grep's output:
echo "www.website.com" | grep -Po '^(www\.)?(?<domain>.+)$' | echo $domain
has no output
No. grep runs as a separate process, and a child process cannot set variables in its parent shell.
Instead, you can split the line into an array:
DATA=($your_line)
and then take name=${DATA[0]}, and so forth.
Another way is to use awk:
eval "`echo $your_line | awk '
function escape(s)
{
gsub(/'\''/,"'\''\"'\''\"'\''", s);
s = "'\''"s"'\''";
return s;
}
{
print "name="escape($1);
print "family_name="escape($2);
print "website="escape($3);
print "email="escape($4);
print "date="escape($5);
}'`"
The idea here is to pass the information back via stdout and eval it in the parent shell.
Note that the escape function single-quotes every value, so nothing in the data is interpreted by the shell (embedded quotes, for example).
The following is the output on my jessie machine:
name='John'
family_name='Smith'
website='www.website.com'
email='john#website.com'
date='jan-01-2001'
If the family name is O'Reilly, the eval result will still be correct:
name='John'
family_name='O'"'"'Reilly'
website='www.website.com'
email='john#website.com'
date='jan-01-2001'
Grep is an independent command-line utility; it does not run inside of bash. So it couldn't create bash variables even if it wanted to.
However, bash has a regular expression matcher built in. It's not a Perl-compatible regex matcher, so it doesn't implement named captures. (To be precise, it matches POSIX extended regular expressions, the same as grep -E.) But it does implement numbered captures.
You do regular expression matches with the =~ operator inside of the [[ ... ]] compound command syntax. If the regular expression matches, then the expression succeeds, and the captures are inserted into the array variable BASH_REMATCH. ${BASH_REMATCH[0]} will be the entire matched substring, and the remaining elements, starting with ${BASH_REMATCH[1]}, will be the individual captures in order.
For example:
$ url=www.example.com
$ [[ $url =~ ^(www\.)?(.*) ]]
$ echo "${BASH_REMATCH[1]}"
www.
$ echo "${BASH_REMATCH[2]}"
example.com
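Named captures can be approximated by copying the numbered captures into variables with the names you want. A sketch for the sample line from the question, assuming the five fields are separated by single spaces and contain no spaces themselves:

```bash
#!/usr/bin/env bash
line='John Smith www.website.com john#website.com jan-01-2001'
# Five space-separated fields; each group matches a run of non-space characters.
re='^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+)$'

if [[ $line =~ $re ]]; then
    name=${BASH_REMATCH[1]}
    family_name=${BASH_REMATCH[2]}
    website=${BASH_REMATCH[3]}
    email=${BASH_REMATCH[4]}
    date=${BASH_REMATCH[5]}
    echo "$name $family_name | $website | $email | $date"
fi
```

Because everything happens inside the current shell, the variables remain set after the test, which is exactly what piping through grep cannot provide.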

Regular expression in Bash filter

I have this string:
<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>
I need the number "10" after "of".
My regex is currently:
if [[ "$WARNING" =~ "of.([0-9]*)" ]]
then
echo "OK: $WARNING"
else
echo "NOK: $WARNING"
fi
Can anyone help me, please?
Don't quote the right-hand side of =~: since bash 3.2, a quoted pattern is matched as a literal string rather than as a regex.
You can use the BASH_REMATCH variable to get the desired value.
Try:
if [[ "$WARNING" =~ of.([0-9]*) ]]
then
echo "OK: $WARNING"
else
echo "NOK: $WARNING"
fi
echo "${BASH_REMATCH[1]}"
From the manual:
BASH_REMATCH
An array variable whose members are assigned by the =~ binary operator to the [[ conditional command (see Conditional Constructs).
The element with index 0 is the portion of the string matching the
entire regular expression. The element with index n is the portion of
the string matching the nth parenthesized subexpression. This variable
is read-only.
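A slightly more defensive sketch reads BASH_REMATCH only inside the branch where the match succeeded, and uses + instead of * so that at least one digit is required ([0-9]* also matches an empty string). The variable names re and total are illustrative.

```bash
#!/usr/bin/env bash
WARNING="<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>"
re='of ([0-9]+)'   # "+" requires at least one digit

if [[ $WARNING =~ $re ]]; then
    total=${BASH_REMATCH[1]}
    echo "OK: total=$total"
else
    echo "NOK: no count found" >&2
fi
```

With the number in total, a numeric comparison such as (( total > 10 )) can follow directly.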
You don't need regular expressions. Just use bash's built-in parameter expansions:
$ x="<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>"
$ x="${x##*of }"
$ echo "${x%% *}"
10
Here is another awk example, just for fun (gensub requires gawk). You can modify it to read the WARNING variable instead of a file:
$ cat log
<div style='text-align:center;padding-top:6px;'>Displaying Result 1 - 10 of 10 Matching Services</div>
$ awk '/of [0-9]*/{l=gensub(/^.*of ([0-9]*).*$/,"\\1",1); if(l > 10) print "greater"; else print "smaller"}' log
smaller