Bash string replacement with regex repetition

I have a file: filename_20130214_suffix.csv
I'd like to replace the yyyymmdd part in bash. Here is what I intend to do:
file=`ls -t /path/filename_* | head -1`
file2=${file/20130214/20130215}
#this will not work
#file2=${file/[0-9]{8}/20130215/}

The problem is that parameter expansion does not use regular expressions, but patterns, or globs (compare the regular expression "filename_.*.csv" with the glob "filename_*.csv"). Globs have no quantifier for matching a pattern a fixed number of times.
However, you can enable extended patterns in bash, which should be close enough to what you want.
shopt -s extglob # Turn on extended pattern support
file2=${file/+([0-9])/20130215}
There is no quantifier for matching exactly 8 digits, but +(...) lets you match one or more of the pattern inside the parentheses, which should be sufficient for your use case.
Since all you want to do in this case is replace everything between the _ characters, you could also simply use
file2=${file/_*_/_20130215_}

[[ $file =~ ^([^_]+_)[0-9]{8}(_.*) ]] && file2="${BASH_REMATCH[1]}20130215${BASH_REMATCH[2]}"
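As a rough sketch of the two ideas above: a glob has no {8} quantifier, but the bracket expression can simply be repeated eight times, while [[ =~ ]] accepts the quantifier directly. The variable names (new_date, eight_digits, glob_result, regex_result) are made up for illustration, and the filename is the one from the question rather than the output of ls.

```shell
#!/usr/bin/env bash
file=filename_20130214_suffix.csv   # sample name from the question
new_date=20130215                   # hypothetical replacement date

# Glob approach: repeat the bracket expression eight times in place of [0-9]{8}.
# The unquoted expansion inside ${file/.../...} is treated as a pattern.
d='[0-9]'
eight_digits="$d$d$d$d$d$d$d$d"
glob_result=${file/$eight_digits/$new_date}

# Regex approach: [[ =~ ]] uses POSIX extended regular expressions, so {8} works.
re='^([^_]+_)[0-9]{8}(_.*)$'
if [[ $file =~ $re ]]; then
    regex_result="${BASH_REMATCH[1]}${new_date}${BASH_REMATCH[2]}"
fi

echo "$glob_result"    # filename_20130215_suffix.csv
echo "$regex_result"   # filename_20130215_suffix.csv
```

Note that the glob version is not anchored: in a longer run of digits it would replace the first eight, so the regex version is the stricter of the two.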

Related

regex quantifiers in bash --simple vs extended matching {n} times

I'm using the bash shell and trying to list files in a directory whose names match regex patterns. Some of these patterns work, while others don't. For example, the * wildcard is fine:
$ls FILE_*
FILE_123.txt FILE_2345.txt FILE_789.txt
And the range pattern captures the first two of these with the following:
$ls FILE_[1-3]*.txt
FILE_123.txt FILE_2345.txt
but not the filename with the "7" character after "FILE_", as expected. Great. But now I want to count digits:
$ls FILE_[0-9]{3}.txt
ls: FILE_[0-9]{3}.txt: No such file or directory
Shouldn't this give me the filenames with three numeric digits following "FILE_" (i.e. FILE_123.txt and FILE_789.txt, but not FILE_2345.txt)? Can someone tell me how I should be using the {n} quantifier (i.e. "match this pattern n times")?
The pattern passed to ls is a glob expanded by the shell, so you cannot use {3}. You have to use FILE_[0-9][0-9][0-9].txt. Or, you could use the following command.
ls | grep -E '^FILE_[0-9]{3}\.txt$'
Edit:
Or, you can also use the find command.
find . -regextype egrep -regex '.*/FILE_[0-9]{3}\.txt'
The .*/ prefix is needed because -regex is matched against the complete path. On Mac OS X:
find -E . -regex ".*/FILE_[0-9]{3}\.txt"
Bash filename expansion does not use regular expressions. It uses glob pattern matching, which is distinctly different, and what you're trying with FILE_[0-9]{3}.txt does brace expansion followed by filename expansion. Even bash's extended globbing feature doesn't have an equivalent to regular expression's {N}, so as already mentioned you have to use FILE_[0-9][0-9][0-9].txt
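A small sketch of both suggestions, run against throwaway files in a temporary directory so it does not depend on the asker's data. The array names (matches, regex_matches) are illustrative only; the loop shows one way to apply a real {3} quantifier without leaving bash.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
touch "$tmpdir"/FILE_123.txt "$tmpdir"/FILE_2345.txt "$tmpdir"/FILE_789.txt

# Glob: one bracket expression per digit, since globs have no {n} quantifier.
matches=("$tmpdir"/FILE_[0-9][0-9][0-9].txt)
matches=("${matches[@]##*/}")        # strip the directory part

# Regex: list candidates with a loose glob, then filter with [[ =~ ]].
re='^FILE_[0-9]{3}\.txt$'
regex_matches=()
for path in "$tmpdir"/FILE_*; do
    name=${path##*/}
    if [[ $name =~ $re ]]; then
        regex_matches+=("$name")
    fi
done

printf '%s\n' "${matches[@]}"        # FILE_123.txt and FILE_789.txt
printf '%s\n' "${regex_matches[@]}"  # the same two names

rm -rf "$tmpdir"
```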

Perl Regex Command Line Issue

I'm trying to use a negative lookahead in perl in command line:
echo 1.41.1 | perl -pe "s/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g"
to get an incremented version that looks like this:
1.41.2
but it's just returning:
![0-9]+\.[0-9]+\.: event not found
I've tried it in regex101 (PCRE) and it works fine, so I'm not sure why it doesn't work here.
In Bash, ! is the "history expansion character", except when escaped with a backslash or single-quotes. (Double-quotes do not disable this; that is, history expansion is supported inside double-quotes. See Difference between single and double quotes in Bash)
So, just change your double-quotes to single-quotes:
echo 1.41.1 | perl -pe 's/(?![0-9]+\.[0-9]+\.)[0-9]$/2/g'
and voilà:
1.41.2
I'm guessing that this expression also might work:
([0-9.]+)\.([0-9]+)
Test
perl -e'
my $name = "1.41.1";
$name =~ s/([0-9.]+)\.([0-9]+)/$1\.2/;
print "$name\n";
'
Output
1.41.2
If you want to "increment" a number then you can't hard-code the new value but need to capture what is there and increment that
echo "1.41.1" | perl -pe's/[0-9]+\.[0-9]+\.\K([0-9]+)/$1+1/e'
Here the /e modifier causes the replacement side to be evaluated as code, so we can add 1 to the captured number, and the result is then substituted. The \K drops everything matched before it from the match, so we don't need to put it back; see "Lookaround Assertions" in Extended Patterns in perlre.
The lookarounds are sometimes just the thing you want, but they increase the regex complexity (just by being there), can be tricky to get right, and hurt efficiency. They aren't needed here.
The strange output you get is because the double quotes used around the Perl program "invite" the shell to look at what's inside whereby it interprets the ! as history expansion and runs that, as explained in ruakh's post.
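For comparison, the same capture-and-increment idea can be sketched in bash alone, which sidesteps the quoting question entirely. This is not what the answers above use; it is an illustration that assumes a three-part numeric version string, and the variable names (version, next) are made up.

```shell
#!/usr/bin/env bash
version=1.41.1

# Capture the "major.minor." prefix and the last component separately.
re='^([0-9]+\.[0-9]+\.)([0-9]+)$'
if [[ $version =~ $re ]]; then
    # 10# forces base 10 so a component such as 08 is not read as octal.
    next="${BASH_REMATCH[1]}$(( 10#${BASH_REMATCH[2]} + 1 ))"
fi

echo "$next"   # prints 1.41.2
```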
As an alternative to the lookahead, we can use capture groups; e.g. the following will capture the version number into 3 capture groups.
(\d+)\.(\d+)\.(\d+)
If you wanted to output the captured version number as is, it would be:
\1.\2.\3
And to just replace the 3rd part with the number "2" would be:
\1.\2.2
To adapt this to the OP's question, it would be:
$ echo 1.14.1 | perl -pe 's/(\d+)\.(\d+)\.(\d+)/\1.\2.2/'
1.14.2

How do I reference a shell variable and arbitrary digits inside a grep regex?

I am looking to translate this regular expression into grep flavour:
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
Example of line that should match, assuming that VAR=285900
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
Current code:
VAR=285900
grep 'refs/changes/\d+/$VAR/' sample.txt
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
That would be
grep "refs/changes/[[:digit:]]\{1,\}/$VAR/"
or
grep -E "refs/changes/[[:digit:]]+/$VAR/"
Note that the \d+ notation is a perl thing. Some overfeatured greps might support it with an option, but I don't recommend it for portability reasons.
inside simple quotes I cannot use variable expansion
You can mix and match quotes:
foo=not; echo 'single quotes '"$foo"' here'
with double quotes it doesn't match anything.
It's not clear what you're doing, so we can't say why it doesn't work. It should work. There is no need to escape forward slashes for grep, they don't have any special meaning.
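A minimal sketch of the difference the quoting makes, using a temporary file in place of sample.txt. The first line of the sample is the one from the question; the second is a made-up non-matching line, and the variable names (sample, matched, unexpanded_count) are illustrative.

```shell
#!/usr/bin/env bash
VAR=285900
sample=$(mktemp)
cat > "$sample" <<'EOF'
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
0000000000000000000000000000000000000000 refs/changes/00/285901/1
EOF

# Double quotes: the shell expands $VAR before grep sees the pattern.
matched=$(grep -E "refs/changes/[[:digit:]]+/$VAR/" "$sample")

# Single quotes: grep receives the literal text $VAR, so nothing matches.
# grep -c prints 0 and exits non-zero, hence the || true.
unexpanded_count=$(grep -cE 'refs/changes/[[:digit:]]+/$VAR/' "$sample" || true)

echo "$matched"
echo "$unexpanded_count"   # 0

rm -f "$sample"
```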

bash regular expression different formats

I have used regular expression in my code like this: .*[^0-9].*
But recently I have seen some functions implemented like this: *[!0-9]* for the same purpose as the first example, that is, detecting non-integer values.
So I am confused about which is the true form of a regex and what the difference between them is.
Can anybody help me with this issue?
There is only one regular expression - the first one. The second one is a glob pattern.
See regex(7) for the description of POSIX extended regular expressions supported by Bash:
http://man7.org/linux/man-pages/man7/regex.7.html
See Bash manual for the description of glob patterns: http://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html
Bash uses regular expressions only with the =~ operator of the [[…]] command: http://www.gnu.org/software/bash/manual/html_node/Conditional-Constructs.html
Bash uses glob patterns for everything else.
POSIX defines:
1) two types of regular expressions: BREs and EREs. These are used by utilities / built-ins.
BREs are more restricted and exist for backwards compatibility and for typing less in an interactive session. Avoid them if possible and use EREs instead, which are more flexible and Perl-like.
Some utilities allow you to choose between both types of regular expressions.
For example, grep matches BREs by default (backwards compatibility...), but you can make it match EREs with -E.
You usually must quote those before passing them to utilities, or the shell will filename-expand them.
.*[^0-9].* could be either a BRE or an ERE. In both cases it means the same as the Perl regex, which is equivalent to the glob *[!0-9]*.
The main difference between BRE and ERE is that EREs add more useful Perl-like special characters such as (a|b), a{m,n}, a+, a?. Examples:
echo a | grep '(a|b)'
# output:
echo a | grep -E '(a|b)'
# output: a
echo a | grep 'a{1,2}'
# output:
echo a | grep -E 'a{1,2}'
# output: a
2) Patterns Used for Filename Expansion, also known as globs (used by the POSIX glob C function). These are usually expanded by the shell to the matching filenames before the utility runs. If you quote them, they are no longer expanded.
*[!0-9]* must be a glob, since BREs and EREs use ^ instead of !.
echo *[!0-9]*
# output: filenames that contain at least one non-digit character
echo '*[!0-9]*'
# output: *[!0-9]*
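To make the glob versus regex distinction concrete, here is a sketch of the same "is this string all digits?" check written both ways. The function names is_integer_glob and is_integer_regex are hypothetical, chosen only for this example.

```shell
#!/usr/bin/env bash

# Glob form: case patterns are globs, so "contains a non-digit" is *[!0-9]*.
is_integer_glob() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or at least one non-digit
        *)           return 0 ;;
    esac
}

# Regex form: [[ =~ ]] takes a POSIX ERE, so anchors and + are available.
is_integer_regex() {
    [[ $1 =~ ^[0-9]+$ ]]
}

for value in 123 12a3 1.5; do
    if is_integer_glob "$value"; then glob=yes; else glob=no; fi
    if is_integer_regex "$value"; then regex=yes; else regex=no; fi
    echo "$value glob=$glob regex=$regex"
done
```

Both functions agree on every input; the glob form also works in plain POSIX sh, while the regex form needs bash.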

Regular expression: replace one character set with another

I have a string ( e.g. 3122323123123) and want to replace any 1->ax, 2->by and 3->cz.
How do I do that in bash?
I started with the character set [123] and tried with "sed", but didn't know how to write the replacement expression ?
Regex is not the tool for you here. There's nothing in your question that requires any regex.
You didn't specify your language, but if you're working in PHP, you could use the function strtr() which does exactly what you are looking for.
And good old str_replace() can probably also do what you want too, as it can accept arrays for the search/replacement arguments.
Most other languages should have similar capabilities that mean you shouldn't need regex for this.
Look at the standard tr utility.
% echo "3122323123123" | tr "123" "abc"
cabbcbcabcabc
If you want to replace a character with multiple characters, you can use sed for every replacement:
% echo "3122323123123" | sed -e "s/1/ax/g" -e "s/2/by/g" -e "s/3/cz/g"
czaxbybyczbyczaxbyczaxbycz
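Since the question asks for bash, the same chain of replacements can also be sketched with parameter expansion alone, with no external process. The variable name s is arbitrary. The order of the steps does not matter here only because none of the replacement strings contains a digit that a later step would rewrite.

```shell
#!/usr/bin/env bash
s=3122323123123

# ${var//pattern/replacement} replaces every occurrence, like sed's /g flag.
s=${s//1/ax}
s=${s//2/by}
s=${s//3/cz}

echo "$s"   # czaxbybyczbyczaxbyczaxbycz
```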
In C#:
string input = "3122323123123";
string output = input.Replace('1','a').Replace('2','b').Replace('3','c');
Using Perl tr/// for example:
$ echo "3122323123123" | perl -pe "tr/123/abc/"
cabbcbcabcabc