how to replace a string inside perl regex - regex

Is there a way to replace characters from inside the regex?
like so:
find x | xargs perl -pi -e 's/(as dasd asd)/replace(" ","",$1)/'
From OP's comment
code find x | xargs perl -pi -e 's/work_search=1\/ttype=2\/tag=(.*?)">(.*?)<\/a>/work\/\L$1\E\" rel=\"follow\">$2<\/a>/g'
in this case i want $1's spaces be replaced with _

You can use a nested substitution:
$ echo 'foo bar baz' | perl -wpE's/(\w+ \w+ \w+)/ $1 =~ s# ##gr /e'
foobarbaz
Note that the /r modifier requires perl v5.14. For earlier versions, use:
$ echo 'foo bar baz' | perl -wpE's/(\w+ \w+ \w+)/my $x=$1; $x=~s# ##g; $x/e'
foobarbaz
Note also that you need to use a different delimiter for the inner substitution. I used #, as you can see.

As far as I understand, you want to remove the spaces. Is it correct?
You can do:
s/(as) (dasd) (asd)/$1$2$3/

Related

Printing only text from group

I have working example of substitution in online regex tester https://regex101.com/r/3FKdLL/1 and I want to use it as a substitution in sed editor.
echo "repo-2019-12-31-14-30-11.gz" | sed -r 's/^([\w-]+)-\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}.gz$.*/\1/p'
It always prints whole string: repo-2019-12-31-14-30-11.gz, but not matched group [\w-]+.
I expect to get only text from group which is repo string in this example.
Try this:
echo "repo-2019-12-31-14-30-11.gz" |
sed -rn 's/^([A-Za-z]+)-[[:alnum:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}-[[:digit:]]{2}.gz.*$/\1/p'
Explanations:
\w will work (not [\w] wich matches either backslash or w), but you should use [[:alnum:]] which is POSIX
For sed, \d isn't a regex class, but an escaped character representing a non-printable character
Add -n to mute sed, with /p to explicitly print matched lines
Additionaly, you could refactor your regex by removing duplication:
echo "repo-2019-12-31-14-30-11.gz" |
sed -rn 's/^([[:alnum:]]+)-[[:digit:]]{4}(-[[:digit:]]{2}){5}.gz.*$/\1/p'
Looks like a job for GNU grep :
echo "repo-2019-12-31-14-30-11.gz" | grep -oP '^\K[[:alpha:]-]+'
Displays :
repo-
On this example :
echo "repo-repo-2019-12-31-14-30-11.gz" | grep -oP '^\K[[:alpha:]-]+'
Displays :
repo-repo-
Which I think is what you want because you tried with [\w-]+ on your regex.
If I'm wrong, just replace the grep command with : grep -oP '^\K\w+'

Replace Random Characters in a Variable

I want to replace value of a variable (can contain a number, a character, a string of characters).
$ echo $VAR
http://some-random-string.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352........
So far, I've tried this command, however it's not working, so I'm thinking some of these might need a regex.
$ echo $VAR | sed -e "s/\(http[^^]*\).*\(.watch\)/\1"mystring"\2/g"
$ echo $VAR | sed -e "s/\(https\?:\/\/\).*\(.watch\)/\1"mystring"\2/g"
$ echo $VAR | sed -e "s/\(http[s]\?:\/\/\).*\(.watch\)/\1"mystring"\2/g"
I'm aware that there are questions that answer similar queries, but they have not been of help.
echo $VAR | sed 's|\(http[x]*://\)[^.]*\(.*\)|\1mystring\2|'
explanation
s| # substitute
\(http[x]*://\) # save first part in arg1 (\1)
[^.]* # all without '.'
\(.*\) # save the rest in arg2 (\2)
|\1 # print arg1
mystring # print your replacement
\2 # print arg2
|
In sed you have to escape any control characters like forward slash before matching:
echo $VAR | sed 's/http.\/\/.*\.watch\.film\.tv/http:\/\/mystring.watch.film.tv/'
You don't need sed. This task can be done in just shell:
$ var='http://some-random-string.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352'
$ echo "${var%%//*}//mystring.watch${var#*.watch}"
http://mystring.watch.film.tv/nfvsere/watch/skrz1j8exe/chunks.m3u8?nimblesessionid=30931574352
How it works:
${var%%//*} returns the value of $var with the first // and everything after it removed.
//mystring.watch adds the string that we want.
${var#*.watch}" returns the value of $var with everything up to and including the first occurrence of .watch removed.
Because this approach does not require pipelines or subshells, it will be more efficient.
gnu sed
$ echo $VAR | sed -E 's/^(http:\/\/)\S+(\.watch\.film\.tv\/)/\1"mystring"\2/i'

Get substring using either perl or sed

I can't seem to get a substring correctly.
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g')
That still returns bugfix/US3280841-something-duh.
If I try an use perl instead:
declare BRANCH_NAME="bugfix/US3280841-something-duh";
# Trim it down to "US3280841"
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9]|[A-Z0-9])+/; print $1');
That outputs nothing.
What am I doing wrong?
Using bash parameter expansion only:
$: # don't use caps; see below.
$: declare branch="bugfix/US3280841-something-duh"
$: tmp="${branch##*/}"
$: echo "$tmp"
US3280841-something-duh
$: trimmed="${tmp%%-*}"
$: echo "$trimmed"
US3280841
Which means:
$: tmp="${branch_name##*/}"
$: trimmed="${tmp%%-*}"
does the job in two steps without spawning extra processes.
In sed,
$: sed -E 's#^.*/([^/-]+)-.*$#\1#' <<< "$branch"
This says "after any or no characters followed by a slash, remember one or more that are not slashes or dashes, followed by a not-remembered dash and then any or no characters, then replace the whole input with the remembered part."
Your original pattern was
's/\(^.*\)\/[a-z0-9]\|[A-Z0-9]\+/\1/g'
This says "remember any number of anything followed by a slash, then a lowercase letter or a digit, then a pipe character (because those only work with -E), then a capital letter or digit, then a literal plus sign, and then replace it all with what you remembered."
GNU's manual is your friend. I look stuff up all the time to make sure I'm doing it right. Sometimes it still takes me a few tries, lol.
An aside - try not to use all-capital variable names. That is a convention that indicates it's special to the OS, like RANDOM or IFS.
You may use this sed:
sed -E 's~^.*/|-.*$~~g' <<< "$BRANCH_NAME"
US3280841
Ot this awk:
awk -F '[/-]' '{print $2}' <<< "$BRANCH_NAME"
US3280841
sed 's:[^/]*/\([^-]*\)-.*:\1:'<<<"bugfix/US3280841-something-duh"
Perl version just has + in wrong place. It should be inside the capture brackets:
TRIMMED=$(echo $BRANCH_NAME | perl -nle 'm/^.*\/([a-z0-9A-Z]+)/; print $1');
Just use a ^ before A-Z0-9
TRIMMED=$(echo $BRANCH_NAME | sed -e 's/\(^.*\)\/[a-z0-9]\|[^A-Z0-9]\+/\1/g')
in your sed case.
Alternatively and briefly, you can use
TRIMMED=$(echo $BRANCH_NAME | sed "s/[a-z\/\-]//g" )
too.
type on shell terminal
$ BRANCH_NAME="bugfix/US3280841-something-duh"
$ echo $BRANCH_NAME| perl -pe 's/.*\/(\w\w[0-9]+).+/\1/'
use s (substitute) command instead of m (match)
perl is a superset of sed so it'd be identical 'sed -E' instead of 'perl -pe'
Another variant using Perl Regular Expression Character Classes (see perldoc perlrecharclass).
echo $BRANCH_NAME | perl -nE 'say m/^.*\/([[:alnum:]]+)/;'

sed Back-references used to replace

there is a string a_b_c_d. I want to replace _ with - in the string between a_ and _d. Below is processing.
echo "a_b_c_d" | sed -E 's/(.+)_(.+)_(.+)/\1`s/_/-/g \2`\3/g'
But it does not work. how can I reuse the \2 to replace its content?
Perl allows to use code in replacement section with e modifier
$ echo 'a_b_c_d' | perl -pe 's/a_\K.*(?=_d)/$&=~tr|_|-|r/e'
a_b-c_d
$ echo 'x_a_b_c_y' | perl -pe 's/x_\K.*(?=_y)/$&=~tr|_|-|r/e'
x_a-b-c_y
$&=~tr|_|-|r here $& is the matched portion, and tr is applied on that to replace _ to -
a_\K this will match a_ but won't be part of matched portion
(?=_d) positive lookahead to match _d but won't be part of matched portion
With sed (tested on GNU sed 4.2.2, not sure of syntax for other versions)
$ echo 'a_b_c_d' | sed -E ':a s/(a_.*)_(.*_d)/\1-\2/; ta'
a_b-c_d
$ echo 'x_a_b_c_y' | sed -E ':a s/(x_.*)_(.*_y)/\1-\2/; ta'
x_a-b-c_y
:a label a
s/(a_.*)_(.*_d)/\1-\2/ substitute one _ with - between a_ and _d
ta go to label a as long as the substitution succeeds
gnu sed:
$ sed -r 's/_/-/g;s/(^[^-]+)-/\1_/;s/-([^-]+$)/_\1/' <<<'x_a_b_c_y'
x_a-b-c_y
The idea is, replacing all _ by -, then restoring the ones you want to keep.
update
if the fields separated by _ contains -, we can make use ge of gnu sed:
sed -r 's/(^[^_]+_)(.*)(_[^_]+$)/echo "\1"$(echo "\2"\|sed "s|_|-|g")"\3"/ge'
For example we want ----_f-o-o_b-a-r_---- to be ----_f-o-o-b-a-r_----:
sed -r 's/(^[^_]+_)(.*)(_[^_]+$)/echo "\1"$(echo "\2"\|sed "s|_|-|g")"\3"/ge' <<<'----_f-o-o_b-a-r_----'
----_f-o-o-b-a-r_----
Following Kent's suggestion, and if you do not need a general solution, this works:
$ echo 'a_b_c+d_x' | tr '_' '-' | sed -E 's/^([a-z]+)-(.+)-([a-z]+)$/\1_\2_\3/g'
$ a_b-c+d_x
The character classes should be adjusted to match the leading and trailing parts of your input string. Fails, of course, if a or x contain the '-' character.

How to seek forward and replace selected characters with sed

Can I use sed to replace selected characters, for example H => X, 1 => 2, but first seek forward so that characters in first groups are not replaced.
Sample data:
"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";
How it should be after sed:
"Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
What I have tried:
Nothing really, I would try but everything I know about sed expressions seems to be wrong.
Ok, I have tried to capture ([^;]+) and "skip" (get em back using ´\1\2´...) first groups separated by ;, this is working fine but then comes problem, if I use capturing I need to select whole group and if I don't use capturing I'll lose data.
This is possible with sed, but is kinda tedious. To do the translation if field number $FIELD you can use the following:
sed 's/\(\([^;]*;\)\{'$((FIELD-1))'\}\)\([^;]*;\)/\1\n\3\n/;h;s/[^\n]*\n\([^\n]*\).*/\1/;y/H1/X2/;G;s/\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)\n\([^\n]*\)/\2\1\4/'
Or, reducing the number of brackets with GNU sed:
sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
Example:
$ FIELD=3
$ echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
"Hello World";"Number 1 is there";"tX2s-Xas,2,XXunKnownData";
$ FIELD=2
$ echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' | sed -r 's/(([^;]*;){'$((FIELD-1))'})([^;]*;)/\1\n\3\n/;h;s/[^\n]*\n([^\n]*).*/\1/;y/H1/X2/;G;s/([^\n]*)\n([^\n]*)\n([^\n]*)\n([^\n]*)/\2\1\4/'
"Hello World";"Number 2 is there";"tH1s-Has,1,HHunKnownData";
There may be a simpler way that I didn't think of, though.
If awk is ok for you:
awk -F";" '{gsub("H","X",$3);gsub("1","2",$3);}1' OFS=";" file
Using -F, the file is split with semi-colon as delimiter, and hence now the 3rd field($3) is of our interest. gsub function substitutes all occurences of H with X in the 3rd field, and again 1 to 2.
1 is to print every line.
[UPDATE]
(I just realized that it could be shorter. Perl has an auto-split mode):
$F[2] =~ s/H/X/g; $F[2] =~ s/1/2/g; $_=join(";",#F)
Perl is not known for being particularly readable, but in this case I suspect the best you can get with sed might not be as clear as with Perl:
echo '"Hello World";"Number 1 is there";"tH1s-Has,1,HHunKnownData";' |
perl -F';' -ape '$F[2] =~ s/H/X/g; $F[2] =~ s/1/2/g; $_=join(";",#F)'
Taking apart the Perl code:
# your groups are in #F, accessed as $F[$i]
$F[2] =~ s/H/X/g; # Do whatever you want with your chosen (Nth) group.
$F[2] =~ s/1/2/g;
$_ = join(";", #F) # Put them back together.
perl -pe is like sed. (sort of.)
and perl -F';' -ape means use auto-splitting (-a) and set the field separator to ';'. Then your groups are accessible via $F[i] - so it works slightly like awk, too.
So it would also work like perl -F';' -ape '/*your code*/' < inputfile
I know you asked for a sed solution - I often find myself switching to Perl (though I do still like sed) for one-liners.
awk -F";" '{gsub("H","X",$3);gsub("1","2",$3);}1' Your_file
This might work for you (GNU sed):
sed 's/H/X/2g;s/1/2/2g' file
This changes all but the first occurrence of H or 1 to X or 2 respectively
If it's by fields separated by ;'s, use:
sed 's/H[^;]*;/&\n/;h;y/H/X/;H;g;s/\n.*\n//;s/1[^;]*;/&\n/;h;y/1/2/;H;g;s/\n.*\n//' file
This can be mutated to cater for many values, so:
echo -e "H=X\n1=2"|
sed -r 's|(.*)=(.*)|s/\1[^;]*;/\&\\n/;h;y/\1/\2/;H;g;s/\\n.*\\n//|' |
sed -f - file