Regex for matching directory path depth

I'm trying to regex match a specific folder depth of varying path strings using bash scripts.
I want to match two levels down from packages eg. /packages/[any-folder-name]/[any-folder-name]/.
So for example for /packages/frontend/react-app/src/index.ts I want to match /packages/frontend/react-app/ and store it in an array
array=()
string="/packages/frontend/react-app/src/index.ts"
[[ $string =~ packages/.*/.*/ ]] && array+=("${BASH_REMATCH[0]}")
almost works, but it returns /packages/frontend/react-app/src/
I've been going round in circles on this for a few hours now.

Probably this:
#!/usr/bin/env bash
array=()
string="/packages/frontend/react-app/src/index.ts"
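[[ $string =~ /packages/([^/]+/){2} ]] && array+=("${BASH_REMATCH[0]}")
Explanation:
[^/]+ matches any non-empty string that does not contain a /, so ([^/]+/){2} matches exactly two directory levels. The leading / in the pattern keeps it in the match. The original .* is greedy and also matches slashes, which is why it ran on to the last / in the path.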
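To see how the [^/]+ pattern behaves on paths of different depth, here is a minimal sketch. The helper name package_root and the extra sample paths are made up for illustration; they are not part of the question.

```bash
#!/usr/bin/env bash
# package_root is a hypothetical helper: print the two-level prefix
# below /packages, or return 1 if the path is not that deep.
package_root() {
    local re='/packages/([^/]+/){2}'
    [[ $1 =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[0]}"
}

array=()
for path in \
    /packages/frontend/react-app/src/index.ts \
    /packages/backend/api/main.go \
    /packages/README.md
do
    # Only paths with at least two directories under /packages are added.
    if root=$(package_root "$path"); then
        array+=("$root")
    fi
done

printf '%s\n' "${array[@]}"
```

Keeping the pattern in a variable and expanding it unquoted on the right of =~ avoids quoting surprises, since quoted parts of the pattern are matched literally.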

$ echo '/packages/frontend/react-app/src/index.ts' | sed 's|^\(/packages/[^/]*/[^/]*/\).*$|\1|'
/packages/frontend/react-app/
Explanation:
use sed regex:
|^...$| - match the whole string, anchor at beginning and end
^\(...\) - capture the text matched inside the parentheses
/packages/ - expect this text
[^/]*/ - followed by anything non-slash, followed by a slash
[^/]*/ - rinse and repeat
.* - discard anything after the captured text
|\1| - replace matched text with the captured text
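The same substitution can be written with extended regular expressions, which avoids the backslashed parentheses. This is a sketch that assumes a sed supporting -E (GNU and BSD sed both do). It differs from the command above in two ways: it uses [^/]+ so empty folder names are rejected, and it uses -n with the p flag so lines that do not match print nothing. The function name extract_roots is invented for the example.

```bash
#!/usr/bin/env bash
# Print the two-level prefix below /packages for each input line that has one.
extract_roots() {
    sed -nE 's|^(/packages/[^/]+/[^/]+/).*$|\1|p'
}

printf '%s\n' \
    /packages/frontend/react-app/src/index.ts \
    /packages/README.md \
    /packages/backend/api/ | extract_roots
```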

If the directories actually exist on disk, a glob expression would be enough.
# enable nullglob to get an empty array if there is no match
shopt -s nullglob
array=(/packages/*/*/)
printf '%s\n' "${array[@]}"
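A glob expands against directories on the filesystem, not against a string, so it only helps when the packages are present on disk. The sketch below builds a throwaway tree to show this; the layout under $tmp is invented for the demo.

```bash
#!/usr/bin/env bash
# Build a throwaway tree so the glob has something to match.
tmp=$(mktemp -d)
mkdir -p "$tmp/packages/frontend/react-app/src" "$tmp/packages/backend/api"
touch "$tmp/packages/README.md"

shopt -s nullglob
# The trailing slash restricts matches to directories, so README.md is skipped.
array=("$tmp"/packages/*/*/)
# Nothing exists at this path, so nullglob leaves the array empty.
empty=("$tmp"/nothing-here/*/*/)
shopt -u nullglob

printf '%s\n' "${array[@]#"$tmp"}"   # strip the temporary prefix for display
echo "unmatched pattern gave ${#empty[@]} entries"
rm -rf "$tmp"
```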

Related

Which regexp would find my datetime format?

I created a regexp for a date with time, formatted like this:
25.06.19 / 16:30
I created the following regexp:
^[0-9]{2}\.[0-9]{2}\.[0-9]{4}\s.\/\s.[0-9]{2}\:[0-9]{2}$
The result should deliver the above match, but it doesn't. Can you help me fix my regexp?
Your pattern requires a four-digit year, but the sample has only two digits. To allow a year of either two or four digits, use
regex='^[0-9]{2}\.[0-9]{2}\.[0-9]{2}([0-9]{2})?[[:space:]]*/[[:space:]]*[0-9]{2}:[0-9]{2}$'
#                                   ^^^^^^^^^^^
Bash demo:
s="25.06.19 / 16:30"
regex='^[0-9]{2}\.[0-9]{2}\.[0-9]{2}([0-9]{2})?[[:space:]]*/[[:space:]]*[0-9]{2}:[0-9]{2}$'
if [[ "$s" =~ $regex ]]; then
    echo "Matched!"
fi
Note that I also replaced \s with [[:space:]], which should have wider support in Bash. The / does not need escaping, since there are no regex delimiters here and / is not a regex metacharacter. The dots in \s.\/\s. also look suspicious: I assume you wanted to match zero or more whitespace characters, so I replaced . with *.
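If the individual fields are needed as well, capture groups can be added to the same pattern. This is only a sketch: the function name parse_stamp and the output format are made up for illustration.

```bash
#!/usr/bin/env bash
# Same pattern as above, with capture groups added around each field.
regex='^([0-9]{2})\.([0-9]{2})\.([0-9]{2}([0-9]{2})?)[[:space:]]*/[[:space:]]*([0-9]{2}):([0-9]{2})$'

parse_stamp() {
    [[ $1 =~ $regex ]] || return 1
    # Group 4 is the optional second pair of year digits,
    # so hour and minute are groups 5 and 6.
    printf 'day=%s month=%s year=%s time=%s:%s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
        "${BASH_REMATCH[5]}" "${BASH_REMATCH[6]}"
}

parse_stamp "25.06.19 / 16:30"
parse_stamp "25.06.2019/16:30"
parse_stamp "25.06.019 / 16:30" || echo "no match"
```

Groups are numbered by the position of their opening parenthesis, which is why the nested year group takes index 4.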

Why doesn't this shell script work?

My script like this:
#!/bin/env bash
monitor_sock_raw1=socket,id=hmqmondev,port=55919,host=127.0.0.1,nodelay,server,nowait
msock=${monitor_sock_raw1##,port=}
msock=${msock%%,host=}
echo $msock
I expect to get '55919', but the result is:
socket,id=hmqmondev,port=55919,host=127.0.0.1,nodelay,server,nowait
Why and how to fix this bug?
For a simple requirement like this, bash supports a regex approach (see bash ERE support) using the =~ operator, which you can use to match the port string and capture the digits after it.
#!/bin/env bash
var='monitor_sock_raw1=socket,id=hmqmondev,port=55919,host=127.0.0.1,nodelay'
if [[ $var =~ ^.*port=([[:digit:]]+).*$ ]]; then
printf "%s\n" "${BASH_REMATCH[1]}"
fi
The captured group from the regex is stored in the array BASH_REMATCH from which the first element after index 0 i.e. index 1 contains the value of 1st captured group.
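The indexing generalizes to several groups. As a sketch, the hypothetical helper get_field below looks up any key in the comma-separated string; it assumes the key contains no regex metacharacters.

```bash
#!/usr/bin/env bash
# get_field is a hypothetical helper: print the value of key $1 in the
# comma-separated key=value string $2, or return 1 if the key is absent.
get_field() {
    local re="(^|,)$1=([^,]*)"
    [[ $2 =~ $re ]] || return 1
    # Index 0 is the whole match, 1 is the (^|,) delimiter, 2 is the value.
    printf '%s\n' "${BASH_REMATCH[2]}"
}

sock='socket,id=hmqmondev,port=55919,host=127.0.0.1,nodelay,server,nowait'
get_field port "$sock"
get_field host "$sock"
```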
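You need to add wildcards or the patterns won't match. ${var##pattern} removes a prefix and ${var%%pattern} removes a suffix, so the pattern has to match everything from the start of the text (or everything up to its end).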
msock=${monitor_sock_raw1##*,port=}
msock=${msock%%,host=*}
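A short sketch of the difference, using the string from the question. The variable name unchanged is added here only to show what the original expansion produced.

```bash
#!/usr/bin/env bash
monitor_sock_raw1='socket,id=hmqmondev,port=55919,host=127.0.0.1,nodelay,server,nowait'

# Without a wildcard the pattern must match at the very start of the value.
# The value starts with "socket", not ",port=", so nothing is removed.
unchanged=${monitor_sock_raw1##,port=}

# With a leading *, the pattern covers everything up to and including ",port=".
msock=${monitor_sock_raw1##*,port=}
# With a trailing *, the pattern covers ",host=" and everything after it.
msock=${msock%%,host=*}

echo "$msock"
```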
Script that solves your problem.
#!/bin/bash
monitor_sock_raw1="socket,id=hmqmondev,port=55919,host=127.0.0.1,nodelay,server,nowait"
msock=${monitor_sock_raw1##*port=}
echo "${msock%%,*}"

How to match a value starting with '/' or '//' in shell script using regex?

In my shell script I am trying to match a value using regex. I have two conditions 1) if the value starts with single forward slash and 2) when it starts with double forward slash.
In my hive hql script I use the following and it works for the conditions I mentioned above:
1) "^/{1}[^/]$"
2) "^/{2}"
But I am unable to get it working in shell script. Below is the code for the single forward slash match.
value=/ABCD222
RGX="^/{1}[^/]$"
if [[ $value =~ $RGX ]]; then
echo success
else
echo failure
fi
I even tried using slash twice but it doesn't work. Please help.
Let's examine your regex ^/{1}[^/]$:
^ start of line
/{1} slash exactly once
[^/] any character except slash exactly once
$ end of line
So your regex matches only lines with exactly 2 characters (excluding the newline).
To match all lines containing at least 2 characters and starting with a single slash, simply omit the $. The simplest way to write this is ^/[^/]. You can also omit {1}, since matching once is the default.
You can try the following code:
value='/ABCD222'
rgx='^/[^/]*$'
if [[ $value =~ $rgx ]]; then
    echo success
else
    echo failure
fi
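Both conditions from the question (one leading slash versus two) can be handled in one place. The sketch below is an assumption about what is wanted: the helper name classify is hypothetical, and a lone / is treated as a single-slash value.

```bash
#!/usr/bin/env bash
# classify is a hypothetical helper that reports how a value starts.
classify() {
    local double='^//'
    local single='^/([^/]|$)'   # one slash, then a non-slash or end of string
    if [[ $1 =~ $double ]]; then
        echo double
    elif [[ $1 =~ $single ]]; then
        echo single
    else
        echo none
    fi
}

classify /ABCD222
classify //ABCD222
classify ABCD222
```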

Regex doesn't match with the lines in txt file

I'm reading lines from a text file and checking whether each one matches a regex that I've created.
It always reports that the regex didn't match, even though a regex tool shows that the lines match my regular expression.
while read line
do
name=$line
BRANCH_REGEX="\d{10}\-[^_]*\_\d{13}"
if [[ $name =~ $BRANCH_REGEX ]];
then
echo "BRANCH '$name' matches BRANCH_REGEX '$BRANCH_REGEX'"
else
echo "BRANCH '$name' DOES NOT MATCH BRANCH_REGEX '$BRANCH_REGEX'"
fi
done < names.txt
names.txt includes lines for example :
9000999484-suchocka_1416578464908
9000989944-schubertk_1416582641605
9001026342-extbeerfelde_1416586904787
9000687045-sturmjo_1416573131629
9001059401-extburghartswieser_1416405627982
9000806302-PDPUPDATE_1357830207068
9000658783-PDPUPDATE_1360445087963
BRANCH_REGEX="/\d{10}\-[^_]*\_\d{13}"
              ↑
Remove the leading /, none of your lines begin with it.
Also note that _ doesn't need to be escaped, you can write _ instead of \_.
Change your regex to:
BRANCH_REGEX="[0-9]{10}-[^_]*_[0-9]{13}"
Or else:
BRANCH_REGEX="[[:digit:]]{10}-[^_]*_[[:digit:]]{13}"
Bash regexes (POSIX ERE) don't support the \d shorthand, so use [0-9] or [[:digit:]] instead. There is also no need to escape the hyphen.
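Putting the corrected pattern into the loop might look like the sketch below. The ^ and $ anchors and the capture groups are additions that the answers above do not use, and check_branch is a made-up helper name. The here-document stands in for names.txt.

```bash
#!/usr/bin/env bash
# Anchored version of the pattern, with groups for the three fields.
BRANCH_REGEX='^([0-9]{10})-([^_]*)_([0-9]{13})$'

check_branch() {
    if [[ $1 =~ $BRANCH_REGEX ]]; then
        echo "match: id=${BASH_REMATCH[1]} name=${BASH_REMATCH[2]} stamp=${BASH_REMATCH[3]}"
    else
        echo "no match: $1"
    fi
}

# IFS= and -r keep each line exactly as written
# (no whitespace trimming, no backslash processing).
while IFS= read -r line; do
    check_branch "$line"
done <<'EOF'
9000999484-suchocka_1416578464908
9000806302-PDPUPDATE_1357830207068
not-a-branch
EOF
```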

bash regex to parse text of the form +incdir+<dir1>+<dir2>

I have an input string of the form +incdir+<dir1>+<dir2>, where <dir1> and <dir2> are directory names. I want to parse this using a bash regex and have the values of the directories inside BASH_REMATCH[1], [2], ...
Here is what I tried:
function match {
if [[ "$1" =~ \+incdir(\+.*)+ ]]; then
for i in $(seq $(expr ${#BASH_REMATCH[@]} - 1)); do
echo $i ":" ${BASH_REMATCH[$i]}
done
else
echo "no match"
fi
}
This works for match +incdir+foo, but not for match +incdir+foo+bar: the matching is greedy, so it outputs +foo+bar. Bash regexes have no non-greedy matching (as the question "regex in bash expression" mentions), so I tried the following pattern instead: \+incdir(\+[^+]*)+ but this just gives me +bar.
The way I would interpret the regex is the following: find the beginning +incdir, then match me at least one group starting with a + followed by as many characters as you can find that are not +. When you hit a + this is the start of the next group. I guess my reasoning is incorrect.
Does anyone have any idea what I'm doing wrong?
Using only bash builtins (but NOT regular expressions, which are the wrong tool for this job):
match() {
    [[ $1 = *+incdir+* ]] || return              # no-op if no +incdir+ present
    IFS=+ read -r -a pieces <<<"${1#*+incdir+}"  # read everything after +incdir+
                                                 # into a +-separated array
    for idx in "${!pieces[@]}"; do               # iterate over keys in array
        echo "$idx: ${pieces[$idx]}"             # ...and emit key/value pairs
    done
}
$ match "yadda yadda +incdir+foo+bar+baz"
0: foo
1: bar
2: baz
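As for why the regex attempt only returned +bar: a group followed by + is still one group, so each repetition overwrites BASH_REMATCH[1] and only the last one remains. A small sketch comparing the two approaches (the variable names last_group and dirs are chosen for the example):

```bash
#!/usr/bin/env bash
input='+incdir+foo+bar'

# A repeated group is still a single group: BASH_REMATCH[1] holds only
# the text from its last repetition.
re='\+incdir(\+[^+]*)+'
[[ $input =~ $re ]] && last_group=${BASH_REMATCH[1]}
echo "group 1: $last_group"

# Splitting on + keeps every directory.
IFS=+ read -r -a dirs <<<"${input#*+incdir+}"
printf 'dir: %s\n' "${dirs[@]}"
```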