Which regexp would find my datetime format? - regex

I created a regexp for a date with time, formatted like this:
25.06.19 / 16:30
I created the following regexp:
^[0-9]{2}\.[0-9]{2}\.[0-9]{4}\s.\/\s.[0-9]{2}\:[0-9]{2}$
The result should deliver the above match, but it doesn't. Can you help me fix my regexp?

Your sample has a two-digit year, but your pattern requires four digits. To accept a year of either two or four digits, use
regex='^[0-9]{2}\.[0-9]{2}\.[0-9]{2}([0-9]{2})?[[:space:]]*/[[:space:]]*[0-9]{2}:[0-9]{2}$'
#                                   ^^^^^^^^^^^
Bash demo:
s="25.06.19 / 16:30"
regex='^[0-9]{2}\.[0-9]{2}\.[0-9]{2}([0-9]{2})?[[:space:]]*/[[:space:]]*[0-9]{2}:[0-9]{2}$'
if [[ "$s" =~ $regex ]]; then
    echo "Matched!"
fi
Note that I also replaced \s with [[:space:]], which should have wider support in Bash. The / does not need escaping because there are no regex delimiters here and / is not a special regex metacharacter. Finally, the dots in \s.\/\s. are suspicious: each . consumes exactly one extra character of any kind, which the sample string does not have. I assume you wanted to match zero or more whitespace characters, so I replaced . with *.
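As a quick check, the pattern can be wrapped in a small function and run against a few inputs. This is only a sketch: is_datetime is a hypothetical helper name, and the extra test strings are made up for illustration.

```shell
#!/usr/bin/env bash
# is_datetime is a hypothetical helper name, not part of the original answer.
# It succeeds when the argument looks like "DD.MM.YY / HH:MM" or "DD.MM.YYYY / HH:MM".
is_datetime() {
  local regex='^[0-9]{2}\.[0-9]{2}\.[0-9]{2}([0-9]{2})?[[:space:]]*/[[:space:]]*[0-9]{2}:[0-9]{2}$'
  [[ $1 =~ $regex ]]
}

# Made-up sample inputs: two-digit year, four-digit year, no spaces, three-digit year.
for s in "25.06.19 / 16:30" "25.06.2019 / 16:30" "25.06.19/16:30" "25.06.019 / 16:30"; do
  if is_datetime "$s"; then
    echo "match:    $s"
  else
    echo "no match: $s"
  fi
done
```

The pattern only checks the shape of the string, so something like 99.99.99 / 99:99 would still match.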

Related

Regex: Interpret groups with the same content as a single group

I have the following situation:
^ID[ \t]*=[ \t]*('(.*)'|"(.*)")
When a file contains:
ID = '01'
the content
01
is captured by the second group, whereas with:
ID = "01"
it is captured by the third.
This causes a problem with perl:
perl -lne "print \$2 if /^ID[ \t]*=[ \t]*('(.*)'|\"(.*)\")/" test.txt
If the single-quoted alternative matches, I get the output:
01
Otherwise I get an empty string.
How do I make both the single-quoted and the double-quoted case count as group two in the regex?
You can print both the groups, as they can never match at the same time:
perl -lne "print \$2.\$3 if /^ID[ \t]*=[ \t]*('(.*)'|\"(.*)\")/"
or remember the quotes in $2 and use $3 for the quoted string, followed by the remembered quote:
perl -lne "print \$3 if /^ID[ \t]*=[ \t]*((['\"])(.*)\2)/"
This looks like it's a good candidate for the branch reset operator, (?|...). Either capture in that alternation is $1, and the branch-reset construct takes care of the grouping without capturing anything:
use v5.10;
my @strings = qw( ID='01' ID="01" ID="01' );
foreach ( @strings ) {
    say $1 if m/^ID \h* = \h* (?|'(\d+)'|"(\d+)") /x
}
You need v5.10, and that allows you to use the \h to match horizontal whitespace.
But, you don't need to repeat the pattern. You can match the quote and match that same quote later. A relative backreference, \g{N}, can do that:
use v5.10;
my @strings = qw( ID='01' ID="01" ID="01' );
foreach ( @strings ) {
    say $2 if m/^ID \h* = \h* (['"])(\d+)\g{-2} /x
}
I prefer \g{-2} because I usually don't have to update the numbering if I change the pattern to include more captures before the thing it refers to.
And, since this is a one-liner, don't type out the literal quotes (as ikegami has already shown):
say $2 if m/^ID \h* = \h* ([\x22\x27])(\d+)\g{-2} /x
Only one of the two will be defined, so simply use the one that's defined.
perl -nle'print $1//$2 if /^ID\h*=\h*(?:\x27(.*)\x27|"(.*)")/' # \x27 is '
You could also use a backreference.
perl -nle'print $2 if /^ID\h*=\h*(["\x27])(.*)\1/'
Note that all the provided solutions including these two fail (leave the escape sequence in) if you have something like ID="abc\"def" or ID="abc\ndef", assuming those are supported.
Thank you @brian_d_foy:
perl -lne "print \$1 if /^ID\h*=\h*(?|'(.*)'|\"(.*)\")/" test.txt
Or better:
perl -lne "print \$2 if /^ID\h*=\h*(['\"])(.*)\1/" test.txt
I have decided to also accept
ID = 01 #Followed by one or more horizontal spaces.
In addition to:
ID = "01" #Followed by one or more horizontal spaces.
And:
ID = '01' #Followed by one or more horizontal spaces.
Therefore I have adopted a more complex solution:
perl -lne "print \$2 if /^ID\h*=\h*(?|(['\"])(.*)\1|(([^\h'\"]*)))\h*(?:#.*)?$/" test.txt
I have combined both of your solutions, @brian_d_foy. The doubled parentheses bring the second alternative's value into group two as well; with single parentheses it would be group one, and without the branch reset operator it would be group four.
I then wrapped the command in a function:
function parse-config {
command perl -pe "s/\R/\n/g" "$2" | command perl -lne "print \$2 if /^$1\h*=\h*(?|(['\"])(.*)\1|(([^\h'\"]*)))\h*(?:#.*)?$/"
return $?
}
parse-config "ID" "test.txt"
In this:
"s/\R/\n/g"
I replace every CRLF, CR, or LF with LF. \R is a very powerful escape sequence, available since perl v5.10. That version introduced several features that turned out to be essential for me: I needed all of \h, \R, and (?|...). Whoever made that update was brilliant.
I needed this because the $ anchor at the end of the pattern did not match: the lines had a "\r" before the Linux end of line "\n".
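For comparison, here is a rough pure-Bash sketch of the same idea, assuming only CRLF and LF line endings need to be handled. parse_value is a hypothetical name, and the pattern is simplified: values may not contain their own quote character, and the key is interpolated into the regex unescaped, as in the function above. It uses alternation rather than a backreference, so the value comes from whichever capture group is set.

```shell
#!/usr/bin/env bash
# parse_value is a hypothetical helper: it reads "KEY = value" lines on stdin
# and prints the value for the requested key, quoted or unquoted.
parse_value() {
  local key=$1 line
  local sq="'([^']*)'"                 # 'single quoted' -> group 2
  local dq='"([^"]*)"'                 # "double quoted" -> group 3
  local bare="([^[:space:]'\"#]*)"     # bare word       -> group 4
  local regex="^${key}[[:space:]]*=[[:space:]]*(${sq}|${dq}|${bare})[[:space:]]*(#.*)?\$"
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                 # drop a trailing CR left over from CRLF files
    if [[ $line =~ $regex ]]; then
      # Only the alternative that matched has a non-empty group.
      printf '%s\n' "${BASH_REMATCH[2]}${BASH_REMATCH[3]}${BASH_REMATCH[4]}"
    fi
  done
}

# Made-up input with CRLF line endings.
printf 'ID = "01"\r\nNAME = demo # comment\r\n' | parse_value ID
```

Stripping the carriage return before matching is what lets the trailing $ anchor succeed on files with Windows line endings.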

Regex for matching directory path depth

I'm trying to regex match a specific folder depth of varying path strings using bash scripts.
I want to match two levels down from packages eg. /packages/[any-folder-name]/[any-folder-name]/.
So for example for /packages/frontend/react-app/src/index.ts I want to match /packages/frontend/react-app/ and store it in an array
array=()
string="/packages/frontend/react-app/src/index.ts"
[[ $string =~ packages/.*/.*/ ]] && array+=(${BASH_REMATCH[0]})
almost works, but it returns /packages/frontend/react-app/src/
I've been going round in circles on this for a few hours now.
Probably this:
#!/usr/bin/env bash
array=()
string="/packages/frontend/react-app/src/index.ts"
[[ $string =~ packages/([^/]+/){2} ]] && array+=("${BASH_REMATCH[0]}")
Explanation:
[^/]+ matches any non-empty string that does not contain a /. ([^/]+/){2} therefore matches exactly two such path components, each followed by a /.
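A small sketch of how this could be used on several paths. package_root is a hypothetical helper name, the second path is made up, and the pattern here starts with a leading / so that the captured text includes it, as the question asks.

```shell
#!/usr/bin/env bash
# package_root is a hypothetical helper name. It prints the
# "/packages/<dir>/<dir>/" part of a path and fails if the path is too shallow.
package_root() {
  local regex='/packages/([^/]+/){2}'
  [[ $1 =~ $regex ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

array=()
for p in "/packages/frontend/react-app/src/index.ts" "/packages/frontend/file.ts"; do
  # Only paths with two directory levels below /packages are stored.
  root=$(package_root "$p") && array+=("$root")
done
printf '%s\n' "${array[@]}"
```

Keeping the regex in a variable avoids any quoting surprises on the right-hand side of =~.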
$ echo '/packages/frontend/react-app/src/index.ts' | sed 's|^\(/packages/[^/]*/[^/]*/\).*$|\1|'
/packages/frontend/react-app/
Explanation:
use sed regex:
|^...$| - match the whole string, anchor at beginning and end
^\(...\) - capture stuff inside parenthesis
/packages/ - expect this text
[^/]*/ - followed by anything non-slash, followed by a slash
[^/]*/ - rinse and repeat
.* - discard anything after the captured text
|\1| - replace matched text with the captured text
If the directories actually exist on disk, a glob expression would be enough.
# enable nullglob to get an empty array if there is no match
shopt -s nullglob
array=(/packages/*/*/)
echo ${array[*]}

Bash: Regex: Matching if a string is a remote host for rsync

I thought I had a good regex line below that works with tests I did in Regexbuddy, but doesn't seem to work in bash.
I need someone with much better knowledge of regex than me to help me out. ;)
The point is to do a basic test as to whether a string contains a remote host for rsync. So we're testing for something valid like username@host:/ or username@host:~/ (and I also assume ./ ?) ...
#!/bin/bash
test="foo@bar:/here/path/"
regex='^([\w-_.]*)@([\w-_.:]*):[~./]'
if [[ "${test}" =~ "${regex}" ]]; then
    echo "yes, remote host"
else
    echo "no, local"
fi
# filter for remote host by regex
# ^ begin at start of line, ( [ match underscore, word & number chars, dashes, fullstops ] in * repetition ) until first @ and then ( [ match underscore, word & number chars, dashes, fullstops, and colons] in * repetition ) until : and then at least [ ~ or . or / )
# so someone@host-whatever-123.com:/path/ will match
# someone_here123@192.168.0.1:~/path/ will match
# blah123.user@2001:db8:85a3:8d3:1319:8a2e:370:7348:./path/ will match
# user@wherever:path/ will not, and /anything@starting.com:with/a/slash will not match
# etc
Any ideas?
There are several issues:
The $regex variable should not be quoted after =~; quoting makes Bash treat it as a literal string instead of a regex.
\w should not be used inside a bracket expression; use the POSIX character class [:alnum:] instead, which matches letters and digits.
A hyphen (-) in a bracket expression should be the first or last character so that it is parsed as a literal hyphen.
I'd also use the + (1 or more) quantifier instead of * in the pattern to enforce at least one char before and after @.
You can use
test="foo@bar:/here/path/"
regex='^([[:alnum:]_.-]+)@([[:alnum:]_.:-]+):[~./]'
if [[ "$test" =~ $regex ]]; then
    echo "yes, remote host"
else
    echo "no, local"
fi
See Bash demo.
Bash's =~ uses POSIX extended regular expressions, which don't portably support shorthand classes like \w. Have a look at the POSIX Character Classes section of https://tldp.org/LDP/abs/html/x17129.html
In your case, try replacing \w with [:alnum:], and remove the quotes on the right side of =~.
I modified it a bit but this works for me:
[[ "foo@bar:/here/path/" =~ ^[-_\.[:alnum:]]+@[-_\.[:alnum:]]+:[~./] ]] && \
    echo "Remote" || \
    echo "Local"
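To check the corrected pattern against the examples listed in the question's comments, something like the following could be used. It is a sketch only: is_remote is a hypothetical helper name wrapping the regex from the first answer.

```shell
#!/usr/bin/env bash
# is_remote is a hypothetical helper name wrapping the corrected pattern.
is_remote() {
  local regex='^([[:alnum:]_.-]+)@([[:alnum:]_.:-]+):[~./]'
  [[ $1 =~ $regex ]]
}

# The five sample targets come from the comments in the question.
for target in \
  'someone@host-whatever-123.com:/path/' \
  'someone_here123@192.168.0.1:~/path/' \
  'blah123.user@2001:db8:85a3:8d3:1319:8a2e:370:7348:./path/' \
  'user@wherever:path/' \
  '/anything@starting.com:with/a/slash'
do
  if is_remote "$target"; then
    echo "remote: $target"
  else
    echo "local:  $target"
  fi
done
```

The first three targets should be reported as remote and the last two as local, which is the behaviour the question describes.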

Unexpected behavior in a regular expression in bash

I created this regular expression and tested it out successfully
https://regex101.com/r/a7qvuw/1
However the regular expression behaves differently in this bash code that I wrote
# Splitting by semicolon
IFS=';' read -ra statements <<< $contents
# Splitting by the = sign.
regex="\s*(.*?)\s*=\s*(.*)\b"
for i in "${statements[@]}"; do
    if [[ $i =~ $regex ]]; then
        key=${BASH_REMATCH[1]}
        params=${BASH_REMATCH[2]}
        echo "KEY: $key| PARAMS: $params"
    fi
done
The variable $contents has the text as is used in the link. The problem is that the $key has a space at its end, while the regular expression I tried matches the words without the space.
I get output like this:
KEY: vclock_spec | PARAMS: clk_i 1 1
As you can see there is a space between vclock_spec and the | which should not be there. What am I doing wrong?
As @Cyrus mentioned, lazy quantifiers are not supported in Bash regex. They act as greedy ones.
You may fix your pattern to work in Bash using
regex="\s*([^=]*\S)\s*=\s*(.*)\b"
           ^^^^^^^
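The [^=]* matches zero or more characters other than =, and \S matches any non-whitespace character, so the captured key always ends with a non-whitespace character. Note that \s is not portable inside a bracket expression, where POSIX treats the backslash literally. If you want the last character to exclude both whitespace and =, write [^[:space:]=], as in regex="\s*([^=]*[^[:space:]=])\s*=\s*(.*)\b".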
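A more portable variant can avoid \s, \S and \b entirely and rely on POSIX classes. The sketch below makes some assumptions of its own: split_statement is a hypothetical helper name, keys are assumed to contain no whitespace or =, and the sample $contents is made up because the original text is only available through the link.

```shell
#!/usr/bin/env bash
# split_statement is a hypothetical helper; it prints "KEY: ...| PARAMS: ..."
# for one "key = params" statement. Assumes keys contain no whitespace or "=".
split_statement() {
  local ns='[^[:space:]]'
  # Key: a run of non-space, non-"=" characters.
  # Params: start and end on a non-space character, so surrounding blanks are dropped.
  local regex="^[[:space:]]*([^[:space:]=]+)[[:space:]]*=[[:space:]]*(${ns}.*${ns}|${ns})?[[:space:]]*\$"
  [[ $1 =~ $regex ]] || return 1
  printf 'KEY: %s| PARAMS: %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

# Hypothetical sample input, not the text from the question.
contents='vclock_spec = clk_i 1 1; vreset_spec=rst_i 0 '
IFS=';' read -ra statements <<< "$contents"
for stmt in "${statements[@]}"; do
  split_statement "$stmt"
done
```

Anchoring both ends and forcing each group to start and end on a non-space character means the result does not depend on how the regex engine resolves greedy matches.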

Regex doesn't match with the lines in txt file

I'm reading lines from a text file and checking whether each one matches the regex I've created.
But it always reports that the regex didn't match, although the regex tool shows that the lines match my regular expression.
while read line
do
    name=$line
    BRANCH_REGEX="\d{10}\-[^_]*\_\d{13}"
    if [[ $name =~ $BRANCH_REGEX ]];
    then
        echo "BRANCH '$name' matches BRANCH_REGEX '$BRANCH_REGEX'"
    else
        echo "BRANCH '$name' DOES NOT MATCH BRANCH_REGEX '$BRANCH_REGEX'"
    fi
done < names.txt
names.txt includes lines such as:
9000999484-suchocka_1416578464908
9000989944-schubertk_1416582641605
9001026342-extbeerfelde_1416586904787
9000687045-sturmjo_1416573131629
9001059401-extburghartswieser_1416405627982
9000806302-PDPUPDATE_1357830207068
9000658783-PDPUPDATE_1360445087963
BRANCH_REGEX="/\d{10}\-[^_]*\_\d{13}"
              ↑
Remove the leading /, none of your lines begin with it.
Also note that _ doesn't need to be escaped, you can write _ instead of \_.
Change your regex to:
BRANCH_REGEX="[0-9]{10}-[^_]*_[0-9]{13}"
Or else:
BRANCH_REGEX="[[:digit:]]{10}-[^_]*_[[:digit:]]{13}"
Bash regex doesn't support \d, so use [0-9] or [[:digit:]] instead. There is also no need to escape the hyphen.
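Put together, the loop might look like the sketch below. is_branch is a hypothetical helper name, the third input line is made up to show a non-match, and the ^ and $ anchors are an addition so that the whole line has to fit the pattern.

```shell
#!/usr/bin/env bash
# is_branch is a hypothetical helper name. The anchors ^ and $ are an addition:
# without them the pattern may match anywhere inside a longer line.
is_branch() {
  local regex='^[0-9]{10}-[^_]*_[0-9]{13}$'
  [[ $1 =~ $regex ]]
}

# The first two lines come from names.txt in the question; the third is made up.
while IFS= read -r name; do
  if is_branch "$name"; then
    echo "BRANCH '$name' matches"
  else
    echo "BRANCH '$name' DOES NOT MATCH"
  fi
done <<'EOF'
9000999484-suchocka_1416578464908
9000806302-PDPUPDATE_1357830207068
900099948-tooshort_1416578464908
EOF
```

Using IFS= read -r keeps leading whitespace and backslashes in each line intact before the match is attempted.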