expr: Regex not detecting valid expression

I'm trying to write a script that automatically looks for plugged-in devices and makes a compressed backup of each one. However, I'm having trouble working out how to use expr correctly:
#!/bin/bash
MountText=`mount`
# Show result of regex search
expr "$MountText" : '\/dev\/(sd[^a])\d on [\S]+\/[\s\S]+? type'
The expression by itself is \/dev\/(sd[^a])\d on [\S]+\/[\s\S]+? type, and captures the device name (sd*), while excluding mounts relating to sda.
I drafted the regex on Regexr (regex shared in link), and used what mount dumped (gist).
For some reason, it only gives this odd error:
0
I looked around, and I found this SO question. It didn't help me too much, because now it's implying that expr didn't recognize the parentheses I used to capture the device, and it also believed that the expression didn't capture anything!
I'm really confused. What am I doing wrong?

A few things to note with expr:
The regular expression is implicitly anchored at the beginning of the string (as if you started the expression with ^).
The capture group is indicated with escaped parentheses, \(...\).
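Only basic regular expressions are supported. Specifically, \d, \s, \S, and the lazy quantifier +? are not supported; use bracket expressions such as [[:digit:]] instead.
The following will match the one device (the leading .* is there because of the implicit anchor):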
expr "$MountText" : '.*/dev/\(sd[^a]\)[[:digit:]] on '
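To see the three rules above in isolation, here is a minimal sketch run against single lines rather than the whole output of mount. The two sample lines and the variable names are made up for illustration; only the pattern syntax comes from the answer.

```shell
# Hypothetical sample lines, in the format that `mount` prints.
usb_line='/dev/sdb1 on /media/usb type vfat (rw,nosuid,nodev)'
sda_line='/dev/sda1 on / type ext4 (rw,relatime)'

# \( ... \) is the capture group and [[:digit:]] stands in for \d.
# No leading .* is needed here because each sample starts with /dev/.
device=$(expr "$usb_line" : '/dev/\(sd[^a]\)[[:digit:]] on ')
echo "captured: $device"

# expr exits non-zero and prints an empty string when nothing matches,
# so guard the call in case the script runs under `set -e`.
skipped=$(expr "$sda_line" : '/dev/\(sd[^a]\)[[:digit:]] on ') || true
echo "captured from sda line: '$skipped'"
```

The first call should print sdb; the second captures nothing because [^a] rejects sda.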
Note that you don't need to use expr with bash, which has regular-expression matching built in.
regex='/dev/(sd[^a])[[:digit:]] on '
mount | while IFS= read -r line; do
    [[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]}"
done

BASH regexp matching - including brackets in a bracketed list of characters to match against?

I'm trying to do a tiny bash script that'll clean up the file and folder names of downloaded episodes of some tv shows I like. They often look like "[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE", and I basically just want to strip out that speedcd advertising bit.
It's easy enough to remove www.Speed.Cd, spaces, and dashes using regexp matching in BASH, but for the life of me, I cannot figure out how to include the brackets in a list of characters to be matched against. [- [] doesn't work, neither does [- \[], [- \\[], [- \\\[], or any number of escape characters preceding the bracket I want to remove.
Here's what I've got so far:
[[ "$newfile" =~ ^(.*)([- \[]*(www\.torrenting\.com|spastikustv|www\.speed\.cd|moviesp2p\.com)[- \]]*)(.*)$ ]] &&
    newfile="${BASH_REMATCH[1]}${BASH_REMATCH[4]}"
But it breaks on the brackets.
Any ideas?
TIA,
Daniel :)
EDIT: I should probably note that I'm using "shopt -s nocasematch" to ensure case insensitive matching, just in case you're wondering :)
EDIT 2: Thanks to all who contributed. I'm not 100% sure which answer was to be the "correct" one, as I had several problems with my statement. Actually, the most accurate answer was just a comment to my question posted by jw013, but I didn't get it at the time because I hadn't understood yet that spaces should be escaped. I've opted for aefxx's as that one basically says the same, but with explanations :) Would've liked to put a correct answer mark on ormaaj's answer, too, as he spotted more grave issues with my expression.
Anyway, the approach I was using above, trying to match and extract the parts to keep and leave behind the unwanted ones is really not very elegant, and won't catch all cases, not even something really simple like "Some.Show.S07E14.720p.HDTV.X264-SOMEONE - [ www.Speed.Cd ]". I've instead rewritten it to match and extract just the unwanted parts and then do string replacement of those on the original string, like so (loop is in case there's multiple brandings):
# Remove common torrent site brandings, including surrounding spaces, brackets, etc.:
while [[ "$newfile" =~ ([[\ {\(-]*(www\.)?(torrentday\.com|torrenting\.com|spastikustv|speed\.cd|moviesp2p\.com|publichd\.org|publichd|scenetime\.com|kingdom-release)[]\ }\)-]*) ]]; do
    newfile=${newfile//"${BASH_REMATCH[1]}"/}
done

Ok, this is the first time I've heard of the =~ operator but nevertheless here's what I found by trial and error:
if [[ $newfile =~ ^(.*)([-[:space:][]*(what|ever)[][:space:]-]*)(.*)$ ]]
The parts that matter are the two bracket expressions, [-[:space:][] and [][:space:]-].
It looks strange, but it actually does work (I just tested it).
EDIT
Quote from the Linux man pages regex(7):
To include a literal ] in the list, make it the first character (following a possible ^). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal '-' as the first endpoint of a range, enclose it in "[." and ".]" to make it a collating element (see below). With the exception of these and some combinations using '[' (see next paragraphs), all other special characters, including '\', lose their special significance within a bracket expression.
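As a rough illustration of the rule quoted above, the following sketch builds a bracket expression with ] first and - last and tries it on a few made-up strings. The pattern and the sample strings are assumptions for demonstration, not part of the original question.

```shell
#!/usr/bin/env bash
# ']' comes first, '-' comes last, and '[' plus a space sit in between,
# so no backslashes are needed inside the bracket expression.
# Keeping the pattern in a variable avoids shell quoting problems.
re='^[][ -]+$'

for sample in '[ - ]' ']-[' 'a-b'; do
    if [[ $sample =~ $re ]]; then
        echo "match:    $sample"
    else
        echo "no match: $sample"
    fi
done
```

The first two samples consist only of brackets, spaces and dashes, so they should match; 'a-b' should not.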

Whenever you use a regex with [[ ... =~ ... ]], it is most compatible across Bash versions to put the regex in a variable, even if you do manage to dodge all the pitfalls of writing it directly in the test expression. http://mywiki.wooledge.org/BashPitfalls#if_.5B.5B_.24foo_.3D.2BAH4_.27some_RE.27_.5D.5D
Your current regex looks like you're trying to optionally match anything preceding the opening bracket. I'd guess you're actually trying to keep, for example, your groups 3 and 4, as in something like this:
$ shopt -s nocasematch
$ newfile='[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE'
$ re='^.*[-[:space:][]*(www\.torrenting\.com|spastikustv|www\.speed\.cd|moviesp2p\.com)[][:space:]-]*(.*)$'
$ [[ $newfile =~ $re ]]
$ declare -p BASH_REMATCH
declare -ar BASH_REMATCH='([0]="[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE" [1]="www.Speed.Cd" [2]="Some.Show.S07E14.720p.HDTV.X264-SOMEONE")'

The basic issue is quite simple, if not obvious.
A Bash regex is completely unprotected from the shell, and cannot be protected by "double quotes" (since Bash 3.2, quoting the pattern makes it match as a literal string). This means that every literal space (and tab, etc.) must be escaped with a backslash \ ... end of story. The rest is just a matter of getting your regex to suit your needs.
One other thing; use [\ [] and []\ ] to match [ and ] respectively, within the range square-bracket construct (in this case along with a space).
example:
newfile="[ ]"
[[ "$newfile" =~ ^[\ []\ []\ ]$ ]] &&
    echo YES ||
    echo NO

You can try something like this (though you weren't 100% clear on which cases you are trying to filter):
newfile="[ www.Speed.Cd ] - Some.Show.S07E14.720p.HDTV.X264-SOMEONE"
if [[ $newfile =~ ^(.*)([^a-zA-Z0-9.]*\[.*\][^a-zA-Z0-9.]*)(.*)$ ]]; then
    newfile="${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
fi
echo "$newfile"
# Some.Show.S07E14.720p.HDTV.X264-SOMEONE
It just strips any non-alphanumeric (and non-dot) characters around the [...], and everything within the [...].

Perl Regex Command Line Issue

I'm trying to use a negative lookahead in perl in command line:
echo 1.41.1 | perl -pe "s/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g"
to get an incremented version that looks like this:
1.41.2
but it just returns:
![0-9]+\.[0-9]+\.: event not found
I've tried it on regex101 (PCRE) and it works fine, so I'm not sure why it doesn't work here.

In Bash, ! is the "history expansion character", except when escaped with a backslash or single-quotes. (Double-quotes do not disable this; that is, history expansion is supported inside double-quotes. See Difference between single and double quotes in Bash)
So, just change your double-quotes to single-quotes:
echo 1.41.1 | perl -pe 's/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g'
and voilà:
1.41.2

I'm guessing that this expression also might work:
([0-9.]+)\.([0-9]+)
Test
perl -e'
my $name = "1.41.1";
$name =~ s/([0-9.]+)\.([0-9]+)/$1\.2/;
print "$name\n";
'
Output
1.41.2
Please see the demo here.

If you want to "increment" a number, you can't hard-code the new value; you need to capture what is there and increment it:
echo "1.41.1" | perl -pe 's/[0-9]+\.[0-9]+\.\K([0-9]+)/$1+1/e'
Here the /e modifier makes the replacement side be evaluated as code, so we can add 1 to the captured number, and the result is what gets substituted. The \K discards everything matched before it from the match, so we don't need to put it back; see "Lookaround Assertions" in Extended Patterns in perlre.
Lookarounds are sometimes just the thing you want, but they increase the regex complexity (just by being there), can be tricky to get right, and hurt efficiency. They aren't needed here.
The strange output you get is because the double quotes around the Perl program "invite" the shell to look at what's inside, whereby it interprets the ! as history expansion and runs that, as explained in ruakh's post.
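The same capture-and-increment idea can also be sketched without Perl, using only shell parameter expansion and arithmetic. This is a minimal sketch, not a replacement for the Perl one-liner: it assumes the version is made of dot-separated numbers with no leading zeros, and the variable names are illustrative.

```shell
version='1.41.1'

# Split at the last dot: everything before it, and the final component.
prefix=${version%.*}
last=${version##*.}

# Increment the captured number instead of hard-coding the new value.
next="$prefix.$((last + 1))"
echo "$next"
```

With the input above this should print 1.41.2.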

As an alternative to a lookahead, we can use capture groups. For example, the following captures the version number in three groups:
(\d+)\.(\d+)\.(\d+)
To output the captured version number unchanged, the replacement would be:
\1.\2.\3
And to replace just the third part with the number 2:
\1.\2.2
Adapted to the OP's question (Perl accepts \1 in the replacement, although $1 is the preferred spelling):
$ echo 1.14.1 | perl -pe 's/(\d+)\.(\d+)\.(\d+)/\1.\2.2/'
1.14.2
$

Regular Expression: Capture character pattern zero or one positions from start of string

I have a series of entries, which can be represented by this string:
my_string="-D-K4_NNNN_M116_R1_001.gz _D-K4_NNNN_M56_R1_001.gz R-K4_NNNN_KQ9_R1_001.gz D-K4_NNNN_M987_R1_001.gz _R-K4_NNNN_M987_R1_001.gz"
For each entry, I need to return whether it starts with 'R' or 'D'. In order to do this, I need to ignore any character that comes before it. So, I wrote this regular expression:
for i in $my_string; do echo $i | grep -E -o "^*?[RD]"; done
However, this is only returning R or D for entries which are not preceded by a character.
How do I get this regex to return the R or D value in every case, whether there is a character in front of it or not? Keep in mind that the only thing which can be 'hard-coded' into the expression is the pattern to be matched.

It will be easy if you use sed:
sed -r 's/^.?([RD]).*$/\1/'
i.e.
for i in $my_string; do echo $i | sed -r 's/^.?([RD]).*$/\1/'; done
Update:
Here is what each part of the command means:
-r : use extended regular expressions, so that ( and ) form a capture group without backslashes (-E is the portable spelling of the same option)
The script can be read as:
s/XXXX/YYYY/ : substitute YYYY for whatever matches XXXX
The "from" pattern (XXXX) means:
^ : start of the line
.? : zero or one occurrence of any character
( : start of group
[RD] : either R or D
) : end of group (so the group contains either R or D)
.* : any number of any character
$ : up to the end of the line
The "to" pattern (YYYY):
\1 : the content of capture group 1 in the "from" pattern (the "R or D")
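To check the breakdown above against the entries from the question, here is a small sketch that prints each entry next to the letter sed extracts. It uses -E, which GNU and BSD sed accept as an equivalent of -r; the letters variable is only there to collect the results.

```shell
# Sample entries copied from the question.
my_string="-D-K4_NNNN_M116_R1_001.gz _D-K4_NNNN_M56_R1_001.gz R-K4_NNNN_KQ9_R1_001.gz D-K4_NNNN_M987_R1_001.gz _R-K4_NNNN_M987_R1_001.gz"

letters=""
for entry in $my_string; do
    # .? may match nothing, which is how entries with no prefix still work.
    letter=$(printf '%s\n' "$entry" | sed -E 's/^.?([RD]).*$/\1/')
    echo "$entry -> $letter"
    letters="$letters$letter"
done
echo "$letters"
```

The last line should read DDRDR, one letter per entry in order.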

Use a parameter expansion to remove the prefix before using grep:
for i in $my_string; do echo ${i#[^RD]} | grep -o "^[RD]" ; done
or use a simple test without grep (since you already know that each item starts with a R or a D):
for i in $my_string; do
    if [[ $i =~ ^[^D]?R ]] ; then
        echo 'R'
    else
        echo 'D'
    fi
done

This regex worked in my local tests. Please have a try:
^.?[RD]
I can't think of a way to ONLY return the letter you want. I'd have a command after to detect whether the returned string is greater than 1 character long, and if so, I'd return only the second character.

I'm not 100% sure what you are asking (I understood that you want to match only R or D at the beginning of a filename, whatever the character before it, if there is one), but I think you should use a lookbehind. In PHP you would do:
$re = "/(?<=^\S|\s\S|\s)[RD]/";
$str = "-D-K4_NNNN_M116_R1_001.gz _D-K4_NNNN_M56_R1_001.gz R-K4_NNNN_KQ9_R1_001.gz D-K4_NNNN_M987_R1_001.gz _R-K4_NNNN_M987_R1_001.gz";
preg_match_all($re, $str, $matches);
You can see the output here.
To use Perl syntax in bash you must enable it. https://unix.stackexchange.com/questions/84477/forcing-bash-to-use-perl-regex-engine
You can test your regexp here if you need https://regex101.com/r/vV3nS3/1

This does it when using the modifier 'g' for global: (^| ).?(R|D)
See the regex101 here

Bash string replacement with regex repetition

I have a file: filename_20130214_suffix.csv
I'd like replace the yyyymmdd part in bash. Here is what I intend to do:
file=`ls -t /path/filename_* | head -1`
file2=${file/20130214/20130215}
#this will not work
#file2=${file/[0-9]{8}/20130215/}

The problem is that parameter expansion does not use regular expressions, but patterns, or globs (compare the regular expression "filename_.*.csv" with the glob "filename_*.csv"). Globs have no counted-repetition operator like {8}.
However, you can enable extended patterns in bash, which should be close enough to what you want.
shopt -s extglob # Turn on extended pattern support
file2=${file/+([0-9])/20130215}
This doesn't require exactly 8 digits, but the +(...) lets you match one or more of the pattern inside the parentheses, which should be sufficient for your use case.
Since all you want to do in this case is replace everything between the _ characters, you could also simply use
file2=${file/_*_/_20130215_}
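If it does matter that exactly eight digits are replaced, one workaround that stays within plain glob syntax is to write [0-9] eight times, since a glob has no repetition count. A minimal sketch, using the file name from the question without its path; the second file name is a made-up counterexample:

```shell
#!/usr/bin/env bash
file='filename_20130214_suffix.csv'

# Eight [0-9] classes in a row match exactly eight consecutive digits.
file2=${file/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/20130215}
echo "$file2"

# A name with fewer than eight digits in a row is left untouched.
short='filename_2013_suffix.csv'
unchanged=${short/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/20130215}
echo "$unchanged"
```

This needs no extglob, at the cost of a longer pattern.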

[[ $file =~ ^([^_]+_)[0-9]{8}(_.*) ]] && file2="${BASH_REMATCH[1]}20130215${BASH_REMATCH[2]}"
