Regular expression: replace one character set with another

I have a string (e.g. 3122323123123) and want to replace every 1->ax, 2->by and 3->cz.
How do I do that in bash?
I started with the character set [123] and tried sed, but didn't know how to write the replacement expression.

Regex is not the tool for you here. There's nothing in your question that requires any regex.
You didn't specify your language, but if you're working in PHP, you could use the function strtr() which does exactly what you are looking for.
And good old str_replace() can probably also do what you want too, as it can accept arrays for the search/replacement arguments.
Most other languages should have similar capabilities that mean you shouldn't need regex for this.

Look at the standard tr utility:
% echo "3122323123123" | tr "123" "abc"
cabbcbcabcabc
If you want to replace a character with multiple characters, you can use sed for every replacement:
% echo "3122323123123" | sed -e "s/1/ax/g" -e "s/2/by/g" -e "s/3/cz/g"
czaxbybyczbyczaxbyczaxbycz
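Chained s///g expressions are safe here only because none of the replacements (ax, by, cz) contains a character that a later expression matches. When that cannot be guaranteed, a single pass over the string avoids re-replacing earlier output. A minimal sketch in plain shell, assuming the mapping from the question; map_digits is a hypothetical helper name, not a standard utility:

```shell
# map_digits is a hypothetical helper: it walks the string once, left to right,
# so text produced by one replacement is never scanned again.
map_digits() (
  rest=$1
  out=
  while [ -n "$rest" ]; do
    tail=${rest#?}          # everything after the first character
    ch=${rest%"$tail"}      # the first character itself
    case $ch in
      1) out=${out}ax ;;
      2) out=${out}by ;;
      3) out=${out}cz ;;
      *) out=$out$ch ;;     # anything else is copied unchanged
    esac
    rest=$tail
  done
  printf '%s\n' "$out"
)

map_digits 3122323123123    # prints czaxbybyczbyczaxbyczaxbycz
```

The function body runs in a subshell, so rest, out, tail and ch do not leak into the calling shell.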

In C#:
string input = "3122323123123";
string output = input.Replace('1','a').Replace('2','b').Replace('3','c');

Using Perl tr/// for example:
$ echo "3122323123123" | perl -pe "tr/123/abc/"
cabbcbcabcabc

Related

Select a single character in an alphanumeric string in bash

I have an issue with string manipulation in bash. I have a list of names, each name being composed of two parts, chars and numbers: for example
abcdef01234
I want to cut the last character before the numeric part starts, in this case
f
I think there is a regular expression to help me with this but just can't figure it out. AWK/sed solutions are accepted too. Hope someone can help.
Thank you.
In bash it can be done with parameter expansion with substring removal and string indexes, e.g.,
a=abcdef01234 # your string
tmp=${a%%[0-9]*} # remove everything from the first digit onward
echo "${tmp:(-1)}" # output the last of the remaining chars
Output: f
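Since the question mentions a list of names, the same two-step idea can be wrapped in a function and applied in a loop. A small sketch: last_letter is a hypothetical helper name, and the second name in the loop is made up for illustration. It uses ${prefix%?} rather than the ${tmp:(-1)} substring syntax so that it also runs in shells without bash's substring expansion:

```shell
last_letter() (
  prefix=${1%%[0-9]*}                 # drop everything from the first digit onward
  head=${prefix%?}                    # the prefix without its last character
  printf '%s\n' "${prefix#"$head"}"   # what remains is that last character
)

for name in abcdef01234 xyz789; do
  last_letter "$name"
done
# prints f, then z
```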
You can use a regexp like [a-zA-Z]+([a-zA-Z])[0-9]+. If you know how to use sed, it's pretty easy.
Check https://regex101.com/r/XCkKM5/1
The first capture group will be the letter you want.
^\w+([a-zA-Z])\d+$
As a sed command (on OSX) this will be:
echo "abcdef12345" | sed -E "s#^[a-zA-Z]+([a-zA-Z])[0-9]+\$#\1#"
You could also try the following:
echo "abcdef01234" | awk '{match($0,/[a-zA-Z]+/);print substr($0,RLENGTH,1)}'
I assume "I have a list of names" means the names are in a file, here called file. Using grep's PCRE and (positive) lookahead:
$ grep -oP "[a-z](?=[^a-z])" file
f
It prints each lowercase letter that is immediately followed by a character that is not a lowercase letter; for abcdef01234 that is only f.

Conditional in perl regex replacement

I'm trying to return different replacement results with a perl regex one-liner if it matches a group. So far I've got this:
echo abcd | perl -pe "s/(ab)(cd)?/defined($2)?\1\2:''/e"
But I get
Backslash found where operator expected at -e line 1, near "1\"
(Missing operator before \?)
syntax error at -e line 1, near "1\"
Execution of -e aborted due to compilation errors.
If the input is abcd I want to get abcd out, if it's ab I want to get an empty string. Where am I going wrong here?
You used regex atoms \1 and \2 (match what the first or second capture captured) outside of a regex pattern. You meant to use $1 and $2 (as you did in another spot).
Furthermore, dollar signs inside double-quoted strings have meaning to your shell. It's best to use single quotes around your program[1].
echo abcd | perl -pe's/(ab)(cd)?/defined($2)?$1.$2:""/e'
Simpler:
echo abcd | perl -pe's/(ab(cd)?)/defined($2)?$1:""/e'
Simpler:
echo abcd | perl -pe's/ab(?!cd)//'
[1] Either avoid single-quotes in your program[2], or use '\'' to "escape" them.
[2] You can usually use q{} instead of single-quotes. You can also switch to using double-quotes. Inside of double-quotes, you can use \x27 for an apostrophe.
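The quoting point can be checked without running perl at all. The sketch below assumes nothing beyond a POSIX shell: it sets the shell's own positional parameters and shows what each quoting style would hand to perl -e. The values first and second are placeholders:

```shell
# Give the shell its own $1 and $2 so the substitution is visible.
set -- first second

dq="s/(ab(cd)?)/defined($2)?$1:''/e"   # double quotes: the shell expands $1 and $2
sq='s/(ab(cd)?)/defined($2)?$1:""/e'   # single quotes: passed through untouched

printf '%s\n' "$dq"   # s/(ab(cd)?)/defined(second)?first:''/e
printf '%s\n' "$sq"   # s/(ab(cd)?)/defined($2)?$1:""/e
```

Only the single-quoted form reaches perl with $1 and $2 intact.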
Why torture yourself, just use a branch reset.
Find (?|(abcd)|ab())
Replace $1
And a couple of even better ways
Find abcd(*SKIP)(*FAIL)|ab
Replace ""
Find (?:abcd)*\Kab
Replace ""
These use regex wisely.
There is really no need nowadays to have to use the eval form of the regex substitution construct s///e in conjunction with defined().
This is especially true when using the perl command line.
Good luck...

How do I reference a shell variable and arbitrary digits inside a grep regex?

I am looking to translate this regular expression into grep flavour:
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
Example of line that should match, assuming that VAR=285900
b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9
Current code:
VAR=285900
grep 'refs/changes/\d+/$VAR/' sample.txt
I am trying to filter all lines that contain refs/changes/\d+/$VAR/
That would be
grep "refs/changes/[[:digit:]]\{1,\}/$VAR/"
or
grep -E "refs/changes/[[:digit:]]+/$VAR/"
Note that the \d+ notation is a perl thing. Some overfeatured greps might support it with an option, but I don't recommend it for portability reasons.
inside simple quotes I cannot use variable expansion
You can mix and match quotes:
foo=not; echo 'single quotes '"$foo"' here'
with double quotes it doesn't match anything.
It's not clear what you're doing, so we can't say why it doesn't work. It should work. There is no need to escape forward slashes for grep, they don't have any special meaning.
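To check the -E form end to end, the pattern can be wrapped in a small function and fed sample lines on standard input. A sketch: filter_changes is a hypothetical helper name, and the second input line is invented so there is something to reject. It assumes the variable holds only digits; a value containing regex metacharacters would need escaping first.

```shell
filter_changes() {
  # $1 is expanded by the shell inside the double quotes before grep sees it.
  grep -E "refs/changes/[[:digit:]]+/$1/"
}

VAR=285900
printf '%s\n' \
  'b3fb1e501749b98c69c623b8345a512b8e01c611 refs/changes/00/285900/9' \
  '0000000000000000000000000000000000000000 refs/changes/00/111111/1' |
  filter_changes "$VAR"
# prints only the first line
```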

How to retain the first instance of a match with sed

I have a set of tokens in data and wish to strip off the trailing ".[0-9]", however I cannot figure out how to quote the regexp properly. The first match should be everything up to the . and the second the . and a number. I intend the first match to be retained.
data="thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5"
data=`echo $data | sed s/\([a-zA-Z0-9_]+\)\(\.[0-9]\)/\1/g`
echo $data
Actual output:
thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5
Desired output:
thing thing__aaa thing__bbb thing__ccc other_aaa other_bbb other_ccc
The idea is that the unquoted ([a-zA-Z0-9_]+) is the first matching group, and the (\.[0-9]) matches the .number. the \1 should replace both groups with the first group.
How about just
echo $data | sed 's/\.[0-9]//g'
or if number may contain more digits, then
echo $data | sed 's/\.[0-9]\+//g'
It looks like you just want to delete all strings of the form \.[0-9]+. So why not just do:
sed 's/\.[0-9]\+\b//g'
(This relies on GNU sed's \b and \+ extensions. For other seds you can do:
sed -e 's/\.[0-9][0-9]* / /g' -e 's/\.[0-9][0-9]*$//'
I normally don't encourage the use of shell specific extensions, but if you are using bash you might be happy using an array:
bash$ data=(thing thing__aaa.0 thing__bbb.3)
bash$ echo "${data[@]%.[0-9]*}"
Note that this will also delete extensions that are not all digits (e.g. foo.34bb), but perhaps it is adequate for your needs.)
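The command in the question fails for two separate reasons: the sed script is unquoted, so the shell removes the backslashes before sed sees them, and without -E (or GNU's \+) the + is a literal plus sign. A sketch that keeps the asker's approach of retaining the first group, assuming a sed that accepts -E (GNU and BSD sed both do); strip_numeric_suffix is a hypothetical helper name:

```shell
strip_numeric_suffix() {
  # -E enables extended syntax: bare ( ) form a group and + means "one or more".
  # \1 puts back the token name and drops the ".digits" that followed it.
  sed -E 's/([a-zA-Z0-9_]+)\.[0-9]+/\1/g'
}

data="thing thing__aaa.0 thing__bbb.3 thing__ccc.5 other_aaa other_bbb other_ccc.5"
data=$(printf '%s\n' "$data" | strip_numeric_suffix)
printf '%s\n' "$data"
# prints: thing thing__aaa thing__bbb thing__ccc other_aaa other_bbb other_ccc
```

The single quotes around the script are what keep the shell from touching the backslashes and parentheses.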

How to escape slashes in Perl text used in a regular expression?

end_date=$(date +"%m/%d/%Y")
/usr/bin/perl -pi -e "s/_end_date_/${end_date}/g" filename
I want to replace string '_end_date_' with the current date. Since the current date has slashes in it (yes, I want the slashes), I need to escape them. How can I do this?
I've tried several ways, like replacing slashes with "\/" using sed and Perl itself, but it didn't work. Finally I used 'cut' to break the date into 3 parts and escaped the slashes, but this solution doesn't look good. Is there a better solution?
In Perl you can choose which character to use to separate parts of a regular expression. The following code will work fine.
end_date=$(date +"%m/%d/%Y")
/usr/bin/perl -pi -e "s#_end_date_#${end_date}#g" filename
This avoids 'leaning toothpick syndrome', where escaped slashes (\/) pile up next to the / delimiters.
Use a different s delimiter: s{all/the/slashes/you/want/}{replacement}.
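The delimiter trick is not specific to Perl; sed's s command accepts an alternative delimiter as well. A sketch with sed, assuming the date never contains the chosen delimiter (#) or characters that are special in a sed replacement (& and \). fill_end_date is a hypothetical helper name, and the date is fixed so the result is reproducible:

```shell
fill_end_date() {
  # Double quotes let the shell expand $1; # replaces / as the s delimiter,
  # so the slashes in the date need no escaping.
  sed "s#_end_date_#$1#g"
}

printf '%s\n' 'report ends on _end_date_' | fill_end_date '03/14/2024'
# prints: report ends on 03/14/2024
```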
I would recommend changing the delimiter, but you can almost always get by with quotemeta:
/usr/bin/perl -pi -e "my \$ed=quotemeta('${end_date}');s/_end_date_/\$ed/g" filename
But you also have this route:
/usr/bin/perl -pi -e 'BEGIN { use POSIX qw<strftime>; $ed=quotemeta(strftime( q[%m/%d/%Y], localtime())); } s/_end_date_/$ed/'
which does the same thing as your two lines.
Building on Axeman's answer, the following works for me:
perl -MPOSIX=strftime -p -e'$ed=strftime( q[%m/%d/%Y], localtime()) ;s/_end_date_/$ed/'
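A few things to note:
The quotemeta isn't needed: the s/// delimiters are found when the code is compiled, before $ed is interpolated, so a / inside the variable is harmless. In the replacement part, quotemeta would actually put literal backslashes into the output.
I have used single quotes ' rather than " because otherwise you end up having to escape every $.
I prefer using -MPOSIX=strftime to BEGIN { use POSIX qw<strftime> }.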