I am trying to use regex in my shell script to find a substring.
Original string:
"relative-to="jboss.server.base.dir" scan-enabled="true" scan-interval="0""
I am trying to find the following substring:
"scan-enabled="true""
Code:
str="relative-to=\"jboss.server.base.dir\" scan-enabled=\"true\" scan-interval=\"0\""
reg='scan-enabled.*"'
[[ "$str" =~ $reg ]] && echo $BASH_REMATCH
but it returns:
scan-enabled="true" scan-interval="0"
Can someone please help on how to search for a pattern involving double quotes using regex?
Bash version: 4.1.2(1)-release
If you want to match the entire expression scan-enabled="true" or scan-enabled="false" then you can try this:
reg='(scan-enabled="[^"]*")'
[[ "$str" =~ $reg ]] && echo "${BASH_REMATCH[1]}"
The variable ${BASH_REMATCH[1]} holds the text matched by the first capture group in the regular expression. Here the entire regular expression is wrapped in parentheses, so the first capture group is the whole match (the same text is also available as ${BASH_REMATCH[0]}). Inside single quotes the double quotes need no backslash.
You can explore this regex at this link:
Regex101
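To see why the original pattern over-matched, here is a minimal sketch that runs both patterns against the same string. The variable names greedy, bounded, greedy_match and bounded_match are illustrative and do not come from the question.

```shell
#!/usr/bin/env bash
str='relative-to="jboss.server.base.dir" scan-enabled="true" scan-interval="0"'

# .* is greedy, so the match runs on to the LAST double quote in the string.
greedy='scan-enabled.*"'
# [^"]* cannot cross a double quote, so the match stops at the value's closing quote.
bounded='scan-enabled="[^"]*"'

greedy_match=''
bounded_match=''
if [[ $str =~ $greedy ]]; then greedy_match=${BASH_REMATCH[0]}; fi
if [[ $str =~ $bounded ]]; then bounded_match=${BASH_REMATCH[0]}; fi

printf '%s\n' "$greedy_match"    # scan-enabled="true" scan-interval="0"
printf '%s\n' "$bounded_match"   # scan-enabled="true"
```

Keeping the pattern in a single-quoted variable means the double quotes need no escaping at all.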
I need to collect all instances of files that match a pattern in an array.
The following grep pattern matches the filenames I want.
[a-zA-Z0-9]\+\([-_]\?[a-zA-Z0-9]\)*-\([0-9][0-9]\?.[0-9][0-9]\?.[0-9][0-9]\?\)
I had to escape some characters.
The problem is that I would like to learn how to do this with bash alone, that is, with the [[ $string =~ $pattern ]] syntax.
How would the grep pattern from above have to be translated into $pattern in order for the [[ ... ]] to match the example string "ruby-gem2-2.1.13" ?
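Like this. In a regex, an unescaped dot (.) matches any single character, so it has to be escaped as \. to remove its special meaning. Bash uses POSIX ERE for =~, so the backslashes that grep's basic syntax needs in \+, \? and \( \) are dropped.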
#!/usr/bin/env bash
pattern='[a-zA-Z0-9]+([-_]?[a-zA-Z0-9])*-([0-9][0-9]?\.[0-9][0-9]?\.[0-9][0-9]?)'
string='ruby-gem2-2.1.13'
[[ "$string" =~ $pattern ]] && printf 'match\n'
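Since the question was about collecting matches in an array, here is a rough sketch of that step. The candidate names are made up for illustration (in practice they might come from a glob), and the ^ and $ anchors are an addition of this sketch so that the whole name has to match, not just part of it.

```shell
#!/usr/bin/env bash
# Same pattern as above, anchored at both ends (the anchors are an assumption of this sketch).
pattern='^[a-zA-Z0-9]+([-_]?[a-zA-Z0-9])*-([0-9][0-9]?\.[0-9][0-9]?\.[0-9][0-9]?)$'

# Hypothetical candidate names, used only for illustration.
candidates=('ruby-gem2-2.1.13' 'README' 'lib_foo-1.0.3' 'notes-2x1y13')

matches=()
for name in "${candidates[@]}"; do
    if [[ $name =~ $pattern ]]; then
        matches+=("$name")
    fi
done

printf '%s\n' "${matches[@]}"
# ruby-gem2-2.1.13
# lib_foo-1.0.3
```

The last candidate, notes-2x1y13, is rejected only because the dots are escaped; with unescaped dots the x and y would be accepted as separators.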
I'm trying to capture BAR_BAR in FOO_FOO_FOO_BAR_BAR using the following regex: (?:.*?_){3}(.*).
The regular expression works when using a validator such as RegExr or regex101, but Bash doesn't return anything when I run:
text="FOO_FOO_FOO_BAR_BAR"
regex="(?:.*?_){3}(.*)"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[1]}"
When I run the following example regex it works perfectly (returning b):
text="abcdef"
regex="(b)(.)(d)e"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[1]}"
I'm new to using regex in Bash, what am I missing here?
POSIX regex supports neither non-capturing groups nor lazy quantifiers. Bash uses POSIX ERE, so you can use a negated bracket expression instead:
text="FOO_FOO_FOO_BAR_BAR"
regex="([^_]*_){3}(.*)"
[[ $text =~ $regex ]] && echo "${BASH_REMATCH[2]}"
# => BAR_BAR
Here,
([^_]*_){3} - matches three occurrences (Group 1) of zero or more chars other than _ followed by a _ char
(.*) - the rest of the string (Group 2).
Because a capturing group has to stand in for the non-capturing group at the start, the required value ends up in "${BASH_REMATCH[2]}".
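To make the group numbering concrete, the following sketch prints every element of BASH_REMATCH for the same input. The variable names whole, last_field and rest are illustrative.

```shell
#!/usr/bin/env bash
text="FOO_FOO_FOO_BAR_BAR"
regex='([^_]*_){3}(.*)'

whole='' last_field='' rest=''
if [[ $text =~ $regex ]]; then
    whole=${BASH_REMATCH[0]}       # the entire match
    last_field=${BASH_REMATCH[1]}  # a repeated group keeps only its final iteration
    rest=${BASH_REMATCH[2]}        # the part the question asked for
fi

printf '0=%s 1=%s 2=%s\n' "$whole" "$last_field" "$rest"
# 0=FOO_FOO_FOO_BAR_BAR 1=FOO_ 2=BAR_BAR
```

Group 1 is a side effect of the grouping and can simply be ignored.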
I am getting some wrong results here in bash. I don't know why; can someone help me understand what's happening?
$ [[ example.com/something =~ .*\.mp4\?.* ]] && echo matched2
matched2
My regex ^.*\.mp4\?.* should only match something like example.com/file.mp4?size=large, so how come it matches here, where there is no such pattern?
I am using zsh
$ zsh --version
zsh 5.7.1 (x86_64-pc-linux-gnu)
The backslashes aren't part of the regular expression; the shell performs quote removal first, which produces the regular expression .*.mp4?.*. That matches any string containing at least one arbitrary character followed by mp and an optional 4 (here, the amp in example). You need to escape the backslashes as well.
[[ example.com/something =~ .*\\.mp4\\?.* ]] && echo matched2
This produces the desired regular expression .*\.mp4\?.*.
(Note that regular expressions aren't anchored to the beginning or end of the input string, so \\.mp4\\? or '\.mp4\?' would suffice.)
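One way to sidestep the escaping question is to keep the pattern in a single-quoted variable. The sketch below is written for bash; it should behave the same way in zsh with its default (non-PCRE) regex mode, but treat that as an assumption. The helper name check and the variables plain and video are illustrative.

```shell
#!/usr/bin/env bash
# Single quotes keep the backslashes intact, so the regex engine sees \.mp4\?
re='\.mp4\?'

# Hypothetical helper: reports whether its argument matches $re.
check() {
    if [[ $1 =~ $re ]]; then
        echo 'matched'
    else
        echo 'no match'
    fi
}

plain=$(check 'example.com/something')
video=$(check 'example.com/file.mp4?size=large')

printf '%s\n%s\n' "$plain" "$video"
# no match
# matched
```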
I want to extract the name from filenames like this:
abc-dirk-alt.avi
and I only want the part between the two dashes (dirk).
The normal regex is -(.*?)- but I don't know how to write this in a bash script.
How can I do this?
You may use a -([^-]*)- regex ([^-]* matches zero or more chars other than -) to avoid using lazy quantifiers and extract Group 1 value via ${BASH_REMATCH[1]} after a match is found:
s="abc-dirk-alt.avi"
rx="-([^-]*)-"
if [[ $s =~ $rx ]]; then
    echo "${BASH_REMATCH[1]}"
fi
See the online Bash demo.
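The same pattern can be applied to a batch of names. This is only a sketch: the file names in the files array are invented for illustration, and names without two dashes are simply skipped.

```shell
#!/usr/bin/env bash
rx='-([^-]*)-'

# Hypothetical file names, used only for illustration.
files=('abc-dirk-alt.avi' 'xyz-anna-neu.avi' 'nodashes.avi')

names=()
for f in "${files[@]}"; do
    if [[ $f =~ $rx ]]; then
        names+=("${BASH_REMATCH[1]}")   # text between the first pair of dashes
    fi
done

printf '%s\n' "${names[@]}"
# dirk
# anna
```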
I have the following bash script, which should produce the output TEST:
#!/bin/bash
test="TEST:THING - OBJECT_X"
if [[ $test =~ ^([a-zA-Z0-9]+)\:([a-zA-Z0-9]+)[A-Z\s\-_]+$ ]]; then
echo ${BASH_REMATCH[1]}
fi
In my regex tester the regular expression seems to be matching and capturing on the first and second groups:
https://regex101.com/r/kR1jM7/1
Any idea what's causing this?
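\s is a PCRE construct that is not meaningful in ERE. Use the [:space:] character class inside the bracket expression instead. Also, instead of escaping the dash as \-, move the - to the very end of the bracket expression, where it is taken literally. The colon needs no backslash either.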
The following works:
[[ $test =~ ^([a-zA-Z0-9]+):([a-zA-Z0-9]+)[A-Z[:space:]_-]+$ ]]
That said, for compatibility with a wider range of bash releases, move the regex into a variable:
re='^([a-zA-Z0-9]+):([a-zA-Z0-9]+)[A-Z[:space:]_-]+$'
[[ $test =~ $re ]]
To use POSIX character classes more aggressively (and thus make your code more likely to work correctly across languages and locales), also consider:
re='^([[:alnum:]]+):([[:alnum:]]+)[[:upper:][:space:]_-]+$'
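Put together, a complete script might look like the sketch below. The variable is renamed from test to test_str here purely to avoid confusion with the test builtin, and first and second are illustrative names; neither change comes from the question.

```shell
#!/usr/bin/env bash
test_str="TEST:THING - OBJECT_X"
re='^([[:alnum:]]+):([[:alnum:]]+)[[:upper:][:space:]_-]+$'

first='' second=''
if [[ $test_str =~ $re ]]; then
    first=${BASH_REMATCH[1]}    # text before the colon
    second=${BASH_REMATCH[2]}   # alphanumeric run right after the colon
    printf '%s\n%s\n' "$first" "$second"
fi
# TEST
# THING
```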