Regex/Shell - how to match all except those with specific pattern - regex

I need a regex in shell to match all strings except those with specific pattern.
My specific pattern can be variable, i.e. (i|I)[2 digits numbers](u|U)[2 digits numbers] in every string should not match.
For example :
Some.text.1234.text => should match
Some.text.1234.i10u20.text => shouldn't match
Some.text.1234.I01U02.text => shouldn't match
Some.text.1234.i83U23.text => shouldn't match

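You can try this (it needs a PCRE-capable tool such as GNU grep -P, because Bash's =~ and plain grep -E do not support lookaheads):
^(?!.*[iI]\d{2}[uU]\d{2}).*$
Explanation:
^ start of a line
(?!.*...) negative lookahead: the match fails if the pattern inside can be found anywhere in the line
[iI]\d{2}[uU]\d{2} i or I followed by two digits, then u or U followed by two digits
.*$ if the lookahead did not fail, match the entire string up to the end of string $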
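Where PCRE is not available, a minimal sketch of the same filter without a lookahead inverts an ordinary extended-regex match with grep -v. It assumes the strings arrive one per line; the function name keep_unmarked is only illustrative and not part of the answer.

```shell
#!/usr/bin/env bash
# keep_unmarked (hypothetical helper): print only the input lines that do NOT
# contain i/I + two digits + u/U + two digits.
keep_unmarked() {
  # -E: extended regex, -v: invert the match (print non-matching lines)
  grep -Ev '[iI][0-9]{2}[uU][0-9]{2}'
}

printf '%s\n' \
  'Some.text.1234.text' \
  'Some.text.1234.i10u20.text' \
  'Some.text.1234.I01U02.text' \
  'Some.text.1234.i83U23.text' | keep_unmarked
```

Of the four sample strings, only Some.text.1234.text passes through the filter.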

The Bash script checking if a string matches a regex or not can look like
f='It_is_your_string_to_check';
if [[ "${f^^}" =~ I[0-9]{2}U[0-9]{2} ]]; then
echo "$f is invalid";
else
echo "$f is valid"
fi;
Here, "${f^^}" turns the string into uppercase (so there is no need for (U|u) and (I|i)), and the =~ operator performs a regex match; the right-hand side is treated as a regex because it is not quoted. To play it safe, you may define the regex pattern in a separate single-quoted string variable and use
rx='I[0-9]{2}U[0-9]{2}'
if [[ "${f^^}" =~ $rx ]]; then ...
See a Bash demo online:
s='Some.text.1234.text
Some.text.1234.i10u20.text
Some.text.1234.I01U02.text
Some.text.1234.i83U23.text'
for f in $s; do
if [[ "${f^^}" =~ I[0-9]{2}U[0-9]{2} ]]; then
echo "$f is invalid";
else
echo "$f is valid"
fi;
done;
Output:
Some.text.1234.text is valid
Some.text.1234.i10u20.text is invalid
Some.text.1234.I01U02.text is invalid
Some.text.1234.i83U23.text is invalid
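The "${f^^}" expansion needs Bash 4 or newer. As a sketch for older shells (for example Bash 3.2), the case variants can be spelled out in bracket expressions instead; has_marker is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# has_marker (hypothetical helper): succeed if the argument contains
# i/I + two digits + u/U + two digits, without relying on ${var^^}.
has_marker() {
  local rx='[iI][0-9]{2}[uU][0-9]{2}'   # pattern kept in a variable, used unquoted
  [[ $1 =~ $rx ]]
}

for f in Some.text.1234.text Some.text.1234.i83U23.text; do
  if has_marker "$f"; then
    echo "$f is invalid"
  else
    echo "$f is valid"
  fi
done
```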

Related

bash IF not matching variable that contains regex numbers

DPHPV = /usr/local/nginx/conf/php81-remi.conf;
I am unable to figure out how to match a string that contains any 2 digits:
if [[ "$DPHPV" =~ *"php[:digit:][:digit:]-remi.conf"* ]]
You are not using the right regex here as * is a quantifier in regex, not a placeholder for any text.
Actually, you do not need a regex, you may use a mere glob pattern like
if [[ "$DPHPV" == *php[[:digit:]][[:digit:]]-remi.conf ]]
Note
== - enables glob matching
*php[[:digit:]][[:digit:]]-remi.conf - matches any text with *, then matches php, then two digits (note that the POSIX character classes must be used inside bracket expressions), and then -remi.conf at the end of string.
See the online demo:
#!/bin/bash
DPHPV='/usr/local/nginx/conf/php81-remi.conf'
if [[ "$DPHPV" == *php[[:digit:]][[:digit:]]-remi.conf ]]; then
echo yes;
else
echo no;
fi
Output: yes.
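For comparison, if a regex is preferred after all, a sketch of the equivalent =~ test could look like the following. It assumes the path must end in phpNN-remi.conf; the regex is stored in a variable (re is just an illustrative name) so that the backslash reaches the regex engine unchanged.

```shell
#!/usr/bin/env bash
DPHPV='/usr/local/nginx/conf/php81-remi.conf'

# No leading wildcard is needed: an unanchored regex matches anywhere in the string.
# [0-9]{2} means exactly two digits, \. a literal dot, $ the end of the string.
re='php[0-9]{2}-remi\.conf$'

if [[ $DPHPV =~ $re ]]; then
  echo yes
else
  echo no
fi
```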

Mix of regex and non-regex in bash if-statement

Inside of my $foo variable I have this data (please pay close attention to the .s and ,s):
,example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,
I am trying to write an if statement in bash that basically works like this:
if [[ "${foo}" =~ *','[a-z0-9]','* || "${foo}" =~ *','[a-z0-9]'.,'* ]]; then
echo "Invalid input detected"
else
echo "OK"
fi
It would echo Invalid input detected since reddit and amazon. are in $foo.
If I change the contents of $foo to be:
,example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,
Then it would echo OK.
I am using bash 3.2.57(1)-release on OS X 10.11.6 El Capitan.
Try:
if [[ $foo =~ ,[a-z0-9]*, || $foo =~ ,[a-z0-9]*\., ]]; then
echo "Invalid input detected"
else
echo "OK"
fi
Notes:
=~ is a regular expression operator. The right-hand-side needs to be a regular expression, not a glob.
, is not a shell-active character. Thus, it does not need any special quoting.
[a-z0-9] matches exactly one alphanumeric. Since we want to allow any number of them, use [a-z0-9]*
In regular expressions, ','* matches zero or more commas. This is not what you want. One might write ,.* which, because . is a wildcard, matches a comma followed by zero or more of anything. Since the regex is not anchored to the end, adding a final .* makes no difference.
Inside of [[...]] there is no word splitting. So shell variables do not need the double-quoting that they need elsewhere.
Note that, in [a-z0-9], the exact characters that match a-z or 0-9 depend on the collation order in the locale.
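Putting the notes together, the two alternatives can be folded into one regex with an optional dot. This is a sketch rather than the answer's exact code: check_hosts and rx are illustrative names, and the pattern is kept in a variable so it behaves the same on Bash 3.2 and later.

```shell
#!/usr/bin/env bash
# check_hosts (hypothetical helper): flag a comma-delimited list if any entry
# consists of alphanumerics only, optionally followed by one trailing dot
# (for example "reddit" or "amazon.").
check_hosts() {
  # comma, any run of alphanumerics, an optional literal dot, comma
  local rx=',[a-z0-9]*\.?,'
  if [[ $1 =~ $rx ]]; then
    echo "Invalid input detected"
  else
    echo "OK"
  fi
}

check_hosts ',example.com,de.wikipedia.org,reddit,stackoverflow.com.,amazon.,'
check_hosts ',example.com,de.wikipedia.org,www.reddit.com,stackoverflow.com.,amazon.com,'
```

Because [a-z0-9]* also matches the empty string, an empty entry (two adjacent commas) is reported as invalid as well.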

RegEx : How can I extract a certain part and modify it?

I'd like to extract a certain part of a string and modify it by using a regular expression.
A given string is TestcaseVzwPerformance_8_2_1_4_1_FDD2.
I'd like to extract the part 8_2_1_4_1 from the string and then replace the underscores _ with dots . So the expected result needs to be 8.2.1.4.1.
The numbers and length of the given string can be different.
For example,
Given string // Expected result
TestcaseVzwCqi_3_9_Test2 // 3.9
TestcaseVzwSvd1xRttAclr_6_6_2_3 // 6.6.2.3
TestcaseVzwCsiFading_9_4_1_1_1_FDD4 // 9.4.1.1.1
Here is my RegEx:
((?:\D{0,}_)(\d(_\d)*)(.*))
The numbered capturing group - $2 - contains 8_2_1_4_1 but with underscores.
Can I replace the underscores with dots?
It needs to be done in one RegEx and a Replace.
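A regex by itself cannot modify text; you need a tool that performs the replacement, for example sed (the first substitution keeps the run of underscore-separated numbers, the second turns the underscores into dots):
echo TestcaseVzwPerformance_8_2_1_4_1_FDD2 |
sed -E 's/[^_]*_([0-9]+(_[0-9]+)*).*/\1/;s/_/./g'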
8.2.1.4.1
If you have a Bash string, you can use a Bash regex to capture and Bash parameter expansions to replace:
$ s="TestcaseVzwSvd1xRttAclr_6_6_2_3"
$ [[ $s =~ ^[^_]*_([[:digit:]_]+)_* ]] && tmp=${BASH_REMATCH[1]//_/.} && echo "${tmp%.}"
6.6.2.3
Which can be in a loop:
while read -r line; do
if [[ $line =~ ^[^_]*_([[:digit:]_]+)_* ]]; then
tmp=${BASH_REMATCH[1]//_/.}
echo "\"$line\" => ${tmp%.}"
fi
done <<< 'Given string
TestcaseVzwCqi_3_9_Test2
TestcaseVzwSvd1xRttAclr_6_6_2_3
TestcaseVzwCsiFading_9_4_1_1_1_FDD4'
Prints:
"TestcaseVzwCqi_3_9_Test2" => 3.9
"TestcaseVzwSvd1xRttAclr_6_6_2_3" => 6.6.2.3
"TestcaseVzwCsiFading_9_4_1_1_1_FDD4" => 9.4.1.1.1
You can use the same loop to process a file.
If you have a file, you may as well use gawk:
$ awk 'BEGIN{FPAT="_[[:digit:]_]+"}
/_[[:digit:]]/ {sub(/^_/,"", $1); sub(/_$/,"",$1); gsub(/_/,".",$1); print $1}' file
3.9
6.6.2.3
9.4.1.1.1
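A slightly stricter pattern avoids the trailing-dot cleanup in the Bash version. The following is a sketch under the assumption that the wanted part is the first run of underscore-separated numbers in the string; to_dotted is a hypothetical function name.

```shell
#!/usr/bin/env bash
# to_dotted (hypothetical helper): print the first run of underscore-separated
# numbers in the argument, with the underscores replaced by dots.
to_dotted() {
  local rx='_([0-9]+(_[0-9]+)*)'   # "_", then digits, then zero or more "_digits"
  [[ $1 =~ $rx ]] || return 1      # no such run: fail without printing
  echo "${BASH_REMATCH[1]//_/.}"   # group 1 excludes the leading underscore
}

to_dotted 'TestcaseVzwPerformance_8_2_1_4_1_FDD2'
to_dotted 'TestcaseVzwSvd1xRttAclr_6_6_2_3'
```

The function returns a non-zero status for input without such a run, so it can be used directly in an if or a while read loop.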

Return RegEx match in bash

Using bash, I can check to see if the value of a variable matches a regular expression. However, I cannot find a way of returning the part that matched. Is this possible?
For example take $test as test="123456-name-goes-here.1.2.3-something.zip" The part I'd like to return is 1.2.3-something.
With the code below, I can successfully match $test, but I don't know where to go from here.
[[ $test =~ ([0-9]\.[0-9](\.[0-9])?(\.[0-9])?)(-[a-z-]*)? ]] && echo "matched"
${BASH_REMATCH[0]} will contain the value you need:
test="123456-name-goes-here.1.2.3-something.zip"
reg="[0-9]\.[0-9](\.[0-9])?(\.[0-9])?(-[a-z-]*)?"
if [[ $test =~ $reg ]]; then
echo "${BASH_REMATCH[0]}"
fi
See this cheatsheet:
Regular expression captures will be available in $BASH_REMATCH, ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}, etc.
That means that the whole match value is stored in ${BASH_REMATCH} at index 0, and the subsequent items contain the submatches that were captured with (...) (capturing groups).
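As a small illustration of those indices, the sketch below prints the whole match and the first capturing group for the sample string. The outer parentheses around the version part are taken from the question's pattern; the variable name re is just an illustrative choice.

```shell
#!/usr/bin/env bash
test="123456-name-goes-here.1.2.3-something.zip"
# Group 1 wraps the version number; the later groups are the optional parts.
re='([0-9]\.[0-9](\.[0-9])?(\.[0-9])?)(-[a-z-]*)?'

if [[ $test =~ $re ]]; then
  echo "whole match: ${BASH_REMATCH[0]}"   # index 0: everything the regex matched
  echo "group 1:     ${BASH_REMATCH[1]}"   # index 1: first (...) group
fi
```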

Getting exact match for pattern in bash

I am having trouble getting Bash to match a pattern exactly. For example, I only want to match the letters before my file extension, as in "test.bam", but when a number is included, as in "t1est.bam", I get this output: "est".
hello="t1est.bam"
re="([a-zA-Z]+)\.bam"
if [[ $hello =~ $re ]]; then
result=${BASH_REMATCH[1]}
else
echo "unable to parse string"
fi
echo "$result"
What I would like is for the pattern not to match at all if a non-alpha character is present, so that execution goes into the 'else' block. Thanks
If you want the match to start at the beginning of the string, add the ^ anchor:
re='^([a-zA-Z]+)\.bam'
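A runnable sketch of the anchored version, with the character class written as [a-zA-Z]. The trailing $ anchor is an addition that assumes nothing may follow .bam, and alpha_stem is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# alpha_stem (hypothetical helper): print the part before ".bam" only when it
# consists entirely of letters; otherwise report a parse failure.
alpha_stem() {
  local re='^([a-zA-Z]+)\.bam$'   # ^ and $ force the whole string to match
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "unable to parse string" >&2
    return 1
  fi
}

alpha_stem "test.bam"
alpha_stem "t1est.bam" || true   # digit in the name: takes the else branch
```

Without the ^ anchor the regex may start matching in the middle of the string, which is why the original attempt returned only the letters after the digit.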