Increase an integer value by 1 with regex

I want to increase a value. For example:
Jerry1
Jerry2
Jerry3
Jerry4
I want to change that to:
Jerry2
Jerry3
Jerry4
Jerry5
How can I do this?

Don't try to abuse regular expressions for everything.
By design, regular expressions are not meant to support counting. The reason is simple: to have this, you need at least a type-2 language, and processing those is significantly more complex than processing type-3 ("regular") languages.
See Wikipedia for details: https://en.wikipedia.org/wiki/Chomsky_hierarchy
So by definition, once you fully support counting, it is probably no longer a regular language.
There are extensions around, for example Perl extended regular expressions, that do allow solving this particular problem. But essentially, they are no longer regular expressions; they invoke an external function to do the work.
The following Perl extended regular expression should do what you want:
s/(-?\d+)/$1 + 1/eg
but essentially, only the matching part is a regular expression; the substitution is Perl code, and therefore Turing complete. The e flag indicates that the right-hand side should be evaluated as Perl code, not treated as a regexp substitution string.
You can of course do this trick in pretty much any other regular expression engine. Match, then compute the increment, then substitute the match with the new value.
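As a rough illustration of those three steps (match, compute, substitute) outside Perl, here is a minimal sketch in plain bash. The function name increment_numbers is made up for this example, and the sketch assumes the numbers have no leading zeros, since bash arithmetic would read those as octal.

```shell
#!/usr/bin/env bash
# increment_numbers is a hypothetical helper, not part of the original answer.
# Assumption: integers in the input have no leading zeros.
increment_numbers() {
  local rest=$1 out='' num before
  local re='-?[0-9]+'
  # Match: find the leftmost (optionally negative) integer in the remaining text.
  while [[ $rest =~ $re ]]; do
    num=${BASH_REMATCH[0]}
    before=${rest%%"$num"*}        # text in front of that integer
    # Compute and substitute: append the prefix and the incremented value.
    out+=$before$((num + 1))
    rest=${rest#"$before$num"}     # continue after the integer just handled
  done
  printf '%s\n' "$out$rest"
}

increment_numbers 'Jerry1'
increment_numbers 'Test 123 test 0 Banana9 -17 3 route66'
```

Only the matching step uses a regular expression here; the increment is ordinary shell arithmetic, which mirrors the point made above.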
Full Perl filter demo:
> echo 'Test 123 test 0 Banana9 -17 3 route66' | perl -pe 's/(-?\d+)/$1+1/eg'
Test 124 test 1 Banana10 -16 4 route67
The p flag makes Perl read standard input and apply the program to each line, then output the result. That is why the actual script consists of the substitution only. This is what makes Perl so popular for Unix scripting. You can even mass-apply this filter to a whole set of files (see -i for in-place modification, and the perlrun manual page). So in order to modify a whole set of files in place (backups will be postfixed with .bak; note that the extension must follow -i without a space):
perl -p -i.bak -e 's/(-?\d+)/$1+1/eg' <filenames>

Related

How to comprehend expression/pattern in find/grep/rsync?

I have to use find, grep and rsync commands for my program. Generally, I rarely used all of these in a single script so didn't notice earlier. Is there a category of regular-expression that fit these commands like:
find command: follows regex type1
grep command: follows regex type2
rsync command: follows regex type3
For example, for finding all the paths which lead to my program log file, we can do:
find -type f -name "foo.log*"
Here, in the above command, the star is not acting like a regular-expression star. In a regex, the star stands for zero, one, or multiple instances of the immediately preceding expression, which is the character 'g' in this case. So if the pattern actually followed regex rules, it would match filenames like:
foo.lo
foo.log
foo.logg
foo.loggg
and so on...
rsync behaves similarly to find when given an expression for its source and destination paths. On the other hand, I noticed that the grep command does follow regular expressions.
So, in total:
Do all of these commands follow a different kind of regular expression?
Or do some of them follow regex while others do not, and if not, what pattern do they follow? Basically, I'm looking for a generalisation of the patterns used by all these tools.
I'm new to Linux tools. Please guide!
There is a big difference between wildcards and regular expressions.
Wildcards:
special characters that define a simple search pattern
used by shells (bash, old MS-DOS, ...), and by many unix commands (find, ...)
limited set of wildcards, typically just:
* - zero or more chars (any combination)
? - exactly one char (any char)
[...] - exactly one char out of a set or range of chars, such as [0-9a-f] for a hex digit
see tutorial: https://linuxhint.com/bash_wildcard_tutorial/
Regular Expression:
a sequence of characters that define a search pattern
think of regular expressions (regex for short) as wildcards on steroids
regex patterns are used to find or find and replace strings
powerful language, natively supported by most programming languages
there are different flavors of regular expressions, typically grouped into these categories:
POSIX Basic (BRE - Basic Regular Expressions)
POSIX Extended (ERE - Extended Regular Expressions)
Perl and PCRE (Perl Compatible Regular Expressions)
JavaScript
many more flavors, see https://en.wikipedia.org/wiki/Comparison_of_regular-expression_engines
some unix commands allow you to select one regex flavor or another; for example:
grep uses POSIX Basic by default
grep -E or egrep uses POSIX Extended
grep -P uses Perl (PCRE)
Wikipedia article: https://en.wikipedia.org/wiki/Regular_expression
tutorial: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
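To make the difference concrete, here is a small bash sketch that tests the file names from the question against a wildcard and against a regex. The helper names matches_glob and matches_regex are invented for this example; the regex version is anchored and escapes the dot so that only the meaning of the star differs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing wildcard and regex semantics.
matches_glob() {
  [[ $1 == foo.log* ]]          # wildcard: * means "any string, possibly empty"
}
matches_regex() {
  local re='^foo\.log*$'        # regex: * means "zero or more of the preceding g"
  [[ $1 =~ $re ]]
}

for name in foo.lo foo.log foo.logg foo.log.1; do
  if matches_glob "$name"; then g=yes; else g=no; fi
  if matches_regex "$name"; then r=yes; else r=no; fi
  printf '%-10s glob=%-3s regex=%s\n' "$name" "$g" "$r"
done
```

foo.lo matches only the regex, while foo.log.1 matches only the wildcard, which is the behaviour find -name shows.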

Bash variable substitution with a regex not working as expected

Given a bash variable holding the following string:
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"
Why is the substitution ${INPUT/cf_clearance=[^;]*;/} producing the output: Cookie: instead of what I'd expect: Cookie: __cfduid=bar;
Testing the same regex in online regex validators confirms that cf_clearance=[^;]*; should match cf_clearance=foo; only, and not the rest of the string.
What am I doing wrong here?
Use the actual regular-expression matching features instead of parameter expansion, which works with patterns.
[[ $INPUT =~ (.*)(cf_clearance=[^;]*;)(.*) ]]
ans=${BASH_REMATCH[1]}${BASH_REMATCH[3]}
You can also use an extended pattern, which is equivalent to a regular expression in power:
shopt -s extglob
echo "${INPUT/cf_clearance=*([^;]);/}"
Use sed:
INPUT=$(sed 's/cf_clearance=[^;]*;//' <<< "$INPUT")
Like you have been told in comments, bash parameter substitution only supports glob patterns, not regular expressions. So the problem is really with your expectation, not with your code per se.
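A short sketch of why the glob reading removes so much: in a glob, [^;] is a single non-semicolon character and * is "any string", and the longest match wins. The variable names glob_result and regex_result are only for this illustration.

```shell
#!/usr/bin/env bash
INPUT="Cookie: cf_clearance=foo; __cfduid=bar;"

# Glob semantics: one non-';' character, then any string, then ';'.
# The longest match runs up to the LAST semicolon.
glob_result=${INPUT/cf_clearance=[^;]*;/}

# Regex semantics via [[ =~ ]]: [^;]* is "zero or more non-';' characters".
re='(.*)cf_clearance=[^;]*;(.*)'
if [[ $INPUT =~ $re ]]; then
  regex_result=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
fi

printf 'glob:  [%s]\n' "$glob_result"
printf 'regex: [%s]\n' "$regex_result"
```

The brackets in the output make the leftover whitespace visible; the regex result keeps both spaces that surrounded the removed cookie.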
If you know that the expression can be anchored to the beginning of the string, you can use the ${INPUT#prefix} parameter substitution to grab the shortest possible match, and add back the Cookie: in front:
echo "Cookie: ${INPUT#Cookie: cf_clearance=*;}"
If you don't have this guarantee, something very similar can be approximated with a pair of parameter substitutions. Find which part precedes cf_clearance, find which part follows after the semicolon after cf_clearance; glue them together.
head=${INPUT%cf_clearance=*}
tail=${INPUT#*cf_clearance=*;}
echo "$head$tail"
(If you are not scared of complex substitutions, the temporary variables aren't really necessary or useful.
echo "${INPUT%cf_clearance=*}${INPUT#*cf_clearance=*;}"
This is a little dense even for my sophisticated taste, though.)

reg exp: "if" and single "="

I need a regular expression (grep -e "__") that matches all lines containing if and just one = (ignoring lines containing ==).
I tried this:
grep -e "if.*=[^=]"
but = is not a character class, so it doesn't work.
The problem is .* may contain an =.
I'd suggest
grep -e "if[^=]*=[^=]"
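A quick way to try the suggested pattern is to feed it a few sample lines. The lines below are invented for this check, and the variable name matched is only for illustration.

```shell
#!/bin/sh
# Sample input lines are made up for this demonstration.
matched=$(
  printf '%s\n' \
    'if (a = b) {' \
    'if (a == b) {' \
    'if (a != b) {' \
    'x = 1;' |
    grep -e 'if[^=]*=[^=]'
)
printf '%s\n' "$matched"
```

The assignment line is found and the == line is skipped, but the != line is reported too, since the pattern only looks at what follows the first =.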
If your goal is to find lines of code with an if containing an erroneous assignment instead of a comparison, I'd suggest using a linter (which would be based on a robust parser instead of just regexes). The linter to use depends on the language of the code, of course (for example I use this one in JavaScript).

Can OR expressions be used in ${var//OLD/NEW} replacements?

I was testing some string manipulation stuff in a bash script and I've quickly realized it doesn't understand regular expressions (at least not with the syntax I'm using for string operations), then I've tried some glob expressions and it seems to understand some of them, some not. To be specific:
FINAL_STRING=${FINAL_STRING//<title>/$(get_title)}
is the main operation I'm trying to use and the above line works, replacing all occurrences of <title> with $(get_title) on $FINAL_STRING... and
local field=${1/#*:::/}
works, assigning $1 with everything from the beginning to the first occurrence of ::: replaced by nothing (removed). However, # does what I'd expect ^ to do. Also, when I tried to use the {,,} glob expression here:
FINAL_STRING=${FINAL_STRING//{<suffix>,<extension>}/${SUFFIX}}
to replace any occurrence of <suffix> OR <extension> by ${SUFFIX}, it does not work.
So I see it doesn't take regex and it also doesn't take glob patterns... so what does it take? Is there an exhaustive listing of the symbols/expressions understood by plain bash string operations (particularly substring replacement)? Or are *, ?, #, ##, % and %% the only valid ones?
(I'm trying to rely only on plain bash, without calling sed or grep to do what I want)
The gory details can be found in the bash manual, Shell Expansions section. The complete picture is surprisingly complex.
What you're doing is described in the Shell Parameter Expansion section. You'll see that the pattern in
${parameter/pattern/string}
uses the Filename Expansion rules, and those don't include Brace Expansion - that is done earlier when processing the command line arguments. Filename expansion "only" does ?, * and [...] matching (unless extglob is set).
But parameter expansion does a bit more than just filename expansion, notably the anchoring you noticed with # or %.
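With extglob set, an OR can be written as @(a|b) inside the pattern. The following is a minimal sketch of that idea; the sample values for SUFFIX and FINAL_STRING are invented, and the pattern is kept in a variable (named pattern here) purely to keep the expansion easy to read.

```shell
#!/usr/bin/env bash
shopt -s extglob    # enables @(a|b), *(a), +(a), ?(a) and !(a) in patterns

SUFFIX=txt                                  # sample value
FINAL_STRING='report<suffix>.<extension>'   # sample value
pattern='@(<suffix>|<extension>)'           # "either of these" as an extended glob

# The unquoted $pattern is treated as a pattern, not as literal text.
FINAL_STRING=${FINAL_STRING//$pattern/$SUFFIX}
printf '%s\n' "$FINAL_STRING"
```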
bash does in fact handle regex, specifically through the [[ =~ ]] operator; the match and its capture groups end up in the magic array variable BASH_REMATCH, from which you can then assign them to a variable. It's funky, but it works.
See: http://www.linuxjournal.com/content/bash-regular-expressions
Note this is a bash-only hack feature.
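A minimal sketch of how that looks in practice, using the ::: separator from the question. The sample record and the variable names key and field are assumptions made for this example.

```shell
#!/usr/bin/env bash
record='name:::Jerry'        # sample input, invented for this example
re='^([^:]*):::(.*)$'        # keeping the regex in a variable avoids quoting surprises

if [[ $record =~ $re ]]; then
  key=${BASH_REMATCH[1]}     # first capture group: text before :::
  field=${BASH_REMATCH[2]}   # second capture group: text after :::
fi
printf 'key=%s field=%s\n' "$key" "$field"
```

BASH_REMATCH[0] holds the whole match, and the numbered entries after it hold the parenthesised groups.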
For code that works in shells besides bash as well, the old school way of doing something like this is indeed to use #/##/%/%% along with a loop around a case statement (which supports basic * glob matching).
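One possible shape of that old-school approach is sketched below. The function name replace_all is hypothetical, the search text is taken literally rather than as a pattern, and the helper variables are global because POSIX sh has no local.

```shell
#!/bin/sh
# replace_all STRING OLD NEW: replace every literal OLD in STRING with NEW.
# Hypothetical helper; uses only case, % and # expansions.
replace_all() {
  rest=$1
  out=
  while :; do
    case $rest in
      *"$2"*)
        out=$out${rest%%"$2"*}$3   # text before the first OLD, then NEW
        rest=${rest#*"$2"}         # drop everything up to and including that OLD
        ;;
      *)
        break                      # no occurrence left
        ;;
    esac
  done
  printf '%s\n' "$out$rest"
}

step=$(replace_all 'file<suffix>.<extension>' '<suffix>' 'txt')
replace_all "$step" '<extension>' 'txt'
```

An OR over two placeholders is handled here by simply calling the function once per placeholder.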

Regular expression for odd number of a's

I have a problem in solving the following exercise and I'd appreciate any help.
Let Σ = {a,b}. I need to give a regular expression for all strings containing an odd number of a.
Thank you for your time
b*(ab*ab*)*ab*
The main part of it is (ab*ab*)*, which enumerates all possibilities for an even number of a's. Then, at the end, an extra a has to exist to make the count odd.
notice that this regular expression is equivalent to:
b*a(b*ab*a)*b*
these two constructs are in the form defined by pumping lemma:
http://en.wikipedia.org/wiki/Pumping_lemma
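As a sanity check rather than a proof, both expressions can be compared against a plain count of a's for every string over {a,b} up to length 5. This bash sketch anchors the expressions with ^ and $ so that the whole string must match; the variable names are made up for the example.

```shell
#!/usr/bin/env bash
re1='^b*(ab*ab*)*ab*$'
re2='^b*a(b*ab*a)*b*$'
mismatches=0

# Brace expansion enumerates all strings of length 1 to 5 over {a,b}.
for s in {a,b} {a,b}{a,b} {a,b}{a,b}{a,b} {a,b}{a,b}{a,b}{a,b} {a,b}{a,b}{a,b}{a,b}{a,b}; do
  only_a=${s//b/}                        # delete the b's, keep the a's
  if (( ${#only_a} % 2 == 1 )); then expected=yes; else expected=no; fi
  for re in "$re1" "$re2"; do
    if [[ $s =~ $re ]]; then actual=yes; else actual=no; fi
    if [[ $actual != "$expected" ]]; then
      printf 'mismatch: %s against %s\n' "$s" "$re"
      mismatches=$((mismatches + 1))
    fi
  done
done
printf 'mismatches: %d\n' "$mismatches"
```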
UPDATE:
@MahanteshMAmbi raised a concern that the regular expression matches the case aaabaaa. In fact, it doesn't. If we run grep, we can see clearly what is matched.
$ echo aaabaaa | grep -P -o 'b*(ab*ab*)*ab*'
aaabaa
a
The -o option of grep prints each match on its own line. In this case, as we can see, the regular expression matches twice: one match contains 5 a's, the other contains 1 a. The seeming error in my comment below is caused by an improper test case, rather than by an error in the regular expression.
If we want to make it rigorous to use in real life, it's probably better to use anchors in the expression to force a complete string match:
^b*(ab*ab*)*ab*$
therefore:
$ echo aaabaaa | grep -P -q '^b*(ab*ab*)*ab*$'
$ echo $?
1
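^[^a]*a(?=(?:[^a]*a[^a]*a)*[^a]*$).*$
This matches only strings with an odd number of a's, for any generic string (the [^a]* sits inside the repeated group so that other characters may appear between pairs of a's). See demo.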
https://regex101.com/r/eS7gD7/22