Can OR expressions be used in ${var//OLD/NEW} replacements? - regex

I was testing some string manipulation in a bash script and quickly realized it doesn't understand regular expressions (at least not with the syntax I'm using for string operations). I then tried some glob expressions, and it seems to understand some of them but not others. To be specific:
FINAL_STRING=${FINAL_STRING//<title>/$(get_title)}
is the main operation I'm trying to use, and the above line works, replacing all occurrences of <title> with the output of $(get_title) in $FINAL_STRING... and
local field=${1/#*:::/}
works, assigning to field the value of $1 with everything from the beginning up to the first occurrence of ::: removed. However, # does what I'd expect ^ to do. Plus, when I tried to use the {,,} glob expression here:
FINAL_STRING=${FINAL_STRING//{<suffix>,<extension>}/${SUFFIX}}
to replace any occurrence of <suffix> OR <extension> with ${SUFFIX}, it doesn't work.
So I see it doesn't take regexes and it also doesn't take glob patterns... so what does it take? Is there an exhaustive listing of which symbols/expressions are understood by plain bash string operations (particularly substring replacement)? Or are *, ?, #, ##, % and %% the only valid ones?
(I'm trying to rely only on plain bash, without calling sed or grep to do what I want)

The gory details can be found in the bash manual, Shell Expansions section. The complete picture is surprisingly complex.
What you're doing is described in the Shell Parameter Expansion section. You'll see that the pattern in
${parameter/pattern/string}
uses the Filename Expansion rules, and those don't include Brace Expansion - that is done earlier when processing the command line arguments. Filename expansion "only" does ?, * and [...] matching (unless extglob is set).
But parameter expansion does a bit more than just filename expansion, notably the anchoring you noticed with # or %.
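With extglob enabled, the pattern grammar gains ?(…), *(…), +(…), @(…) and !(…), which do give you alternation inside a substitution. A minimal sketch, using made-up placeholder values:

```shell
shopt -s extglob   # must be enabled for the extended pattern below

SUFFIX="v2"
FINAL_STRING="name<suffix>.<extension>"

# @(a|b) matches exactly one of the alternatives -- the OR from the question:
FINAL_STRING=${FINAL_STRING//@(<suffix>|<extension>)/${SUFFIX}}

echo "$FINAL_STRING"   # namev2.v2
```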

bash does in fact handle regexes; specifically, the [[ string =~ regex ]] operator, which stores its results in the special array variable BASH_REMATCH. It's funky, but it works.
See: http://www.linuxjournal.com/content/bash-regular-expressions
Note that this is a bash-only feature.
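A quick sketch of that operator, with a made-up input string:

```shell
str="id:42:::rest"

# =~ performs an extended-regex match; on success the whole match and any
# capture groups land in the BASH_REMATCH array.
if [[ $str =~ ^id:([0-9]+) ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # id:42
    echo "group 1:     ${BASH_REMATCH[1]}"   # 42
fi
```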
For code that works in shells besides bash as well, the old school way of doing something like this is indeed to use #/##/%/%% along with a loop around a case statement (which supports basic * glob matching).
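A portable sketch of that old-school pattern (the field value is made up), which should run in any POSIX shell:

```shell
field="key:::value"

# case supports glob alternation via |, and ${var#pattern} strips a
# matching prefix -- no bash-only features needed.
case $field in
    *:::*) field=${field#*:::} ;;   # remove everything up to the first :::
esac

echo "$field"   # value
```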

Related

Need to match similarly titled filenames present in a variable using regex

I need to find similarly named strings that are passed as bash variables into a regex pattern in an interpolated string used as a function argument. I'm new to regex, so I'm unsure what the best approach is.
Here's what I currently have:
bash_script.sh
findKeys(`grep --ignore-case ^${apiServiceName}$`)
However, some APIs have similar names, eg:
apiServiceNames = ['api-name', 'api-name-one', 'api-name-two']
The confusing bit is where to put \ (which characters to escape), as I need ${} for the variable, but ^ and $ open and close the string.
You don't need a regex match with grep or any third-party tools. The native bash shell provides strong enough pattern-matching features. For example, consider the construct below:
if [[ $apiServiceName == api-name?(?(-)+(one|two)) ]]; then
printf '%s - is allowed\n' "$apiServiceName"
fi
The construct api-name?(?(-)+(one|two)) is extended glob syntax provided by the shell, which is enabled by default when [[..]] is used for pattern matching with the == operator. See more on extglob.
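For illustration, here is the answer's test looped over the example names, plus one made-up name that should be rejected:

```shell
for apiServiceName in api-name api-name-one api-name-two api-name-three; do
    # ?(X) makes X optional; +(one|two) matches one or more of the alternatives
    if [[ $apiServiceName == api-name?(?(-)+(one|two)) ]]; then
        printf '%s - is allowed\n' "$apiServiceName"
    else
        printf '%s - is rejected\n' "$apiServiceName"
    fi
done
# api-name, api-name-one, api-name-two are allowed; api-name-three is rejected
```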

reg exp: "if" and single "="

I need a regular expression (grep -e "__") which matches all lines containing if and exactly one = (ignoring lines containing ==).
I tried this:
grep -e "if.*=[^=]"
but = is not a character class, so it doesn't work.
The problem is .* may contain an =.
I'd suggest
grep -e "if[^=]*=[^=]"
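To see the difference, both patterns run over a few made-up sample lines:

```shell
printf '%s\n' 'if (a = b)' 'if (a == b)' 'x = y' > /tmp/sample.txt

# The first attempt still matches the == line, because .* can swallow
# the first = and leave the second = followed by a space:
grep -e "if.*=[^=]"    /tmp/sample.txt   # prints both if-lines

# Forbidding = before the one we want fixes it:
grep -e "if[^=]*=[^=]" /tmp/sample.txt   # prints only: if (a = b)
```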
If your goal is to find lines of code with an if containing an erroneous assignment instead of a comparison, I'd suggest using a linter (which is based on a robust parser rather than just regexes). The linter to use depends on the language of the code, of course (for example, I use this one for JavaScript).

bash 2.0 string matching

I'm on GNU bash, version 2.05b.0(1)-release (2002). I'd like to determine whether the value of $1 is a path in one of those /path/*.log rules in, say, /etc/logrotate.conf. It's not my box so I can't upgrade it.
Edit: my real goal is given /path/actual.log answer whether it is already governed by logrotate or if all the current rules miss it. I wonder then if my script should just run logrotate -d /etc/logrotate.conf and see if /path/actual.log is in the output. This seems simpler and covers all the cases as opposed to this other approach.
But I still want to know how to approach string matching in Bash 2.0 in general...
the line itself can start with some white space or none
it's not a match if it is in a commented line (comments are lines where the first non white space char is #)
there can be one or more paths on the same line to the left of $1
like if $1 is /my/path/*.log and the line in question is
/other/path*.log /yet/another.log /my/path/*.log {
there can be one or more paths to the right as well
the line itself can end with { and even more white space or not
paths can be contained in double-quotes or not
it can be assumed that the file is a valid logrotate conf file.
I have something that seems to work in Bash 4 but not in Bash 2.05. Where can I go to read what Bash 2.0 supports? How would this matching be checked in Bash 2.0?
You can find a terse bash changelog here.
You'll see that =~, the regex-matching operator, didn't get introduced until version 3.0.
Thus, your best bet is to use a utility to perform the regex matching for you; e.g.:
if grep -Eq '<your-extended-regex>' <<<"$1"; then ...
grep -Eq '<your-extended-regex>' <<<"$1":
IS like [[ $1 =~ <your-extended-regex> ]] in Bash 3.0+ in that its exit code indicates whether the literal value of $1 matches the extended regex <your-extended-regex>
Note that Bash 3.1 changed the interpretation of the RHS to treat quoted (sub)strings as literals.
Also note that grep -E may support a slightly different regular-expression dialect.
is NOT like it in that the grep solution cannot return capture groups; by contrast, Bash 3.0+ provides the overall match and the capture groups via the special array variable ${BASH_REMATCH[@]}.
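As a concrete sketch of the grep fallback in a pre-3.0 shell (the regex and input line below are made up for illustration, not a complete logrotate parser):

```shell
line='  /other/path*.log /my/path/*.log {'

# The exit status stands in for =~; here we ask "non-comment line ending in {":
if printf '%s\n' "$line" | grep -Eq '^[[:space:]]*[^#].*\{[[:space:]]*$'; then
    echo "looks like a logrotate rule header"
fi
```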

Bash script - variable expansion within backtick/grep regex string

I'm trying to expand a variable in my bash script inside of the backticks, inside of the regex search string.
I want the $VAR to be substituted.
The lines I am matching are like:
start....some characters.....id:.....some characters.....[variable im searching for]....some characters....end
var=`grep -E '^.*id:.*$VAR.*$' ./input_file.txt`
Is this a possibility?
It doesn't seem to work. I know I can normally expand a variable with "$VAR", but won't this just search directly for those characters inside the regex? I am not sure what takes precedence here.
Variables do expand in backticks but they don't expand in single quotes.
So you need to either use double quotes for your regex string (I'm not sure what your concern about that is) or mix both kinds of quotes.
So either
var=`grep -E "^.*id:.*$VAR.*$" ./input_file.txt`
or
var=`grep -E '^.*id:.*'"$VAR"'.*$' ./input_file.txt`
Also, you might want to use $(grep ...) instead of backticks, since it is the more modern approach, has better syntactic properties, and can be nested.
You need to have the expression in double quotes (and, then, escape anything which needs to be escaped) in order for the variable to be interpolated.
var=$(grep -E "^.*id:.*$VAR.*\$" ./input_file.txt)
(The backslash is not strictly necessary here, but I put it in to give you an idea. Your real expression is perhaps more complex.)
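A toy run of both working forms against a made-up input file:

```shell
VAR=alpha
printf '%s\n' 'x id: stuff alpha end' 'y id: stuff beta end' > /tmp/input_file.txt

var=$(grep -E "^.*id:.*$VAR.*$" /tmp/input_file.txt)        # double quotes
echo "$var"                                                 # x id: stuff alpha end

var=$(grep -E '^.*id:.*'"$VAR"'.*$' /tmp/input_file.txt)    # mixed quoting
echo "$var"                                                 # x id: stuff alpha end
```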

Convention on how pass multiple regexps on command line

I'm writing a small command line utility that will need to take several arguments each of which can be a list of regular expressions. Is there a convention on how to do that?
Here is an example of what I have in mind
mycliutility -i regexp1,regexp2 -o regexp3,regexp4 somefilename
so I'm asking whether, for example, a comma is a good separator for the regular expressions, and what/how to escape if the separator needs to appear in a regexp.
I'm expecting/hoping that the need to use a comma (or whatever) in a regexp is rare, so I would like to use a syntax that is as lightweight as possible.
Pointers to existing CLI tools that take arguments like that are welcome.
EDIT
It is also possible that the regexps come from a Java Properties file, and for this reason it would be 'cleaner' if multiple regexps on the command line were treated as one argument (so the syntax would be the same on the CLI and in the properties file); see this example.properties file:
iexps = regexp1,regexp2
oexps = regexp3,regexp4
If the regexes are simple alternatives, a single regex of the form regex1|regex2 may well be the simplest solution.
If you need to parse comma-separated regexes out of the property file anyway, you'd better use the same syntax on the command line as well. Game over.
One thing I thought of, but don't really recommend, is to wrap the regex inside a pair of delimiters, outside of which a comma delimiter would be unambiguous. Slashes are popular as regex delimiters in sed, Awk, Perl, and PHP; but PHP should act as a warning example, because the preg_replace syntax has a pesky problem with double quoting ("/regex/" is a regex between slash delimiters inside a double-quoted string).
No, a comma is not a good separator, because it can validly occur inside a regular expression.
My recommendation would be to use an option parser which allows you to specify the same option name multiple times, so you can say
mycliutility -i regexp1 -i regexp2 -o regexp3 -o regexp4 somefilename
If your implementation language is Python and you are using optparse, for example, look at the action='append' behavior.
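Since most of this thread is bash, here is an assumed sketch of the same repeated-option pattern using plain bash getopts and arrays (mycliutility and its options are hypothetical):

```shell
# Simulate the command line:
set -- -i regexp1 -i regexp2 -o regexp3 -o regexp4 somefilename

iexps=() oexps=()
while getopts 'i:o:' opt; do
    case $opt in
        i) iexps+=("$OPTARG") ;;   # each -i appends, so commas never collide
        o) oexps+=("$OPTARG") ;;
    esac
done
shift $((OPTIND - 1))

printf 'i: %s\n' "${iexps[@]}"   # i: regexp1 / i: regexp2
printf 'file: %s\n' "$1"         # file: somefilename
```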