How can I replace all occurrences of a character within matched string in sed - regex

I want to replace all occurrences of a character, say ,, in a matched string in sed. The matched string looks like:
[cite!abc,cde]
I want to replace it with:
[!cite](abc, cde)
The command to replace the outer format is:
sed 's/\[cite\!\([^]]+\)\]/\[\!cite\]\(\1\)/g' file
which gives
[!cite](abc,cde)
However, I want to put a space after each , and there may be an arbitrary number of comma-delimited entries, e.g.
[cite!abc,cde,def,fgh]
Is there an elegant way of doing this in sed or do I need to resort to perl scripts?

If you're guaranteed no spaces after commas in the input string:
sed 's/,/, /g' file
If you do have spaces after the commas in the input string, you'll get extra spaces in the output.
EDIT:
If there may be spaces after the commas in some of the elements, you can avoid adding more with:
sed 's/,\([^ ]\)/, \1/g' file
$ echo '1, 2, 3,4,5,6' | sed -e 's/,\([^ ]\)/, \1/g'
1, 2, 3, 4, 5, 6
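Both commands above act on every comma in the file, not only on commas inside a [cite!...] block. Here is a minimal sketch of one way to restrict the change, assuming a sed that supports -E and entries that never contain ]. A loop first spaces out the commas inside each block, then the outer format is rewritten. The sample line is made up for illustration.

```shell
# Hypothetical sample line: the comma in "a,b" is outside the block and must stay as-is.
line='[cite!abc,cde,def,fgh] and a,b'

result=$(printf '%s\n' "$line" | sed -E \
  -e ':a' \
  -e 's/(\[cite![^]]*),([^] ])/\1, \2/' \
  -e 'ta' \
  -e 's/\[cite!([^]]+)\]/[!cite](\1)/g')

printf '%s\n' "$result"
# prints: [!cite](abc, cde, def, fgh) and a,b
```

Each pass of the loop fixes one comma that is not yet followed by a space, so the loop stops once every comma inside the block has its space.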

This might work for you (GNU sed):
sed -E ':a;/\[([^]!]+)!([^]]+)\]/{s//\n[!\1](\2)\n/;h;s/,/, /g;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/;ta}' file
The solution comes in two parts:
The strings to be amended are identified, and a first pass rearranges each one into the required format, minus the comma-space separated list.
The string is surrounded by newlines and a copy is made in the hold space. Then all commas in the current pattern space are given a trailing space, the copy and the current line are joined and, using pattern matching, formatted to the final required state.
The process is then repeated on the current line so as to effect a global replacement.
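To make the hold-space steps easier to follow, here is the same GNU sed script split over several lines with comments, run on a made-up input line (the input is an assumption, not taken from the question). It needs GNU sed, because it relies on \n in the replacement and on . matching a newline.

```shell
# Hypothetical input: two cite blocks, plus commas outside them that must not change.
line='x,y [cite!a,b] z,w [cite!c,d]'

result=$(printf '%s\n' "$line" | sed -E ':a
/\[([^]!]+)!([^]]+)\]/{
  # Rewrite the first block and fence it with newlines.
  s//\n[!\1](\2)\n/
  # Keep an untouched copy in the hold space.
  h
  # Space out every comma in the working copy.
  s/,/, /g
  # Append the working copy to the hold space, then fetch both.
  H
  g
  # Keep prefix and suffix from the untouched copy, the block from the working copy.
  s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/
  ta
}')

printf '%s\n' "$result"
# prints: x,y [!cite](a, b) z,w [!cite](c, d)
```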

Related

Regex to get match on entire string

How to match a word before a specific character using sed in bash?
In my scenario I would need to match the metrics names in the entire string which occurs only before {.
The below is the string I am working on.
sum(rate(nginx_ingress_controller_request_duration_seconds_sum{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sum(rate(nginx_ingress_controller_request_duration_seconds_count{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))
What I would need the output is the below.
nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count
I am not a Regex expert and I would be very thankful.
With GNU grep:
grep -oP '\(\K[^({]+(?={)'
This will print the results on separate lines. \(\K checks for the presence of a ( character and resets the start of the matching portion (since ( isn't needed in the output). [^({]+ matches one or more characters other than ( and {. (?={) makes sure that the matched portion is followed by a { character (which is not part of the output).
If you know that the required portion can have only word characters, you can also use:
grep -oP '\w+(?={)'
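If grep was built without -P support, a rough equivalent can be sketched with extended regular expressions. This is only an assumption-laden alternative: it assumes metric names consist of letters, digits and underscores, and that grep supports -o. The brace is matched as part of the result and then stripped with tr.

```shell
# The expression from the question; single quotes keep $ and \ literal.
expr='sum(rate(nginx_ingress_controller_request_duration_seconds_sum{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sum(rate(nginx_ingress_controller_request_duration_seconds_count{namespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))'

# Match "word characters followed by {", then drop the trailing brace.
metrics=$(printf '%s\n' "$expr" | grep -oE '[A-Za-z0-9_]+[{]' | tr -d '{')

printf '%s\n' "$metrics"
# prints:
# nginx_ingress_controller_request_duration_seconds_sum
# nginx_ingress_controller_request_duration_seconds_count
```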
This will look for two occurrences on the line and write each one onto a separate line in new_file
(with GNU sed):
sed 's/.*(\(.*\){.*(\(.*\){.*/\1\n\2/' your_file > new_file
Contents of new_file:
nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count
The way it works is as follows:
/.*(: Match everything from the start of the line up to a (
\(.*\): Remember the text in between \( and \) (this is called a capture group)
{.*(: Match everything after a { up to a (
\(.*\): Remember a second piece of text using a second capture group
{.*: Match the rest of the line
/\1\n\2/: Write the two remembered pieces back, with a newline \n between them.
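The breakdown above can be tried on a shortened, made-up string with the same shape as the real expression. The input below is an assumption for illustration only, and \n in the replacement requires GNU sed.

```shell
# Made-up input: two "name{...}" parts, each preceded by a (.
short='f(g(aa{x}))/f(g(bb{y}))'

# \1 and \2 hold the text captured just before each {.
out=$(printf '%s\n' "$short" | sed 's/.*(\(.*\){.*(\(.*\){.*/\1\n\2/')

printf '%s\n' "$out"
# prints:
# aa
# bb
```

Because .* is greedy, the pattern only works when the line contains exactly the two occurrences it was written for.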
Edit
Another approach that would work for multiple occurrences would be to
create newlines and a unique pattern at the points before and after the part of the string that
you're interested in, and then grep away those lines:
sed 's/(/BADLINES\n/g; s/{/\nBADLINES/g' your_file | grep -v BADLINES
The first part (sed 's/(/BADLINES\n/g; s/{/\nBADLINES/g' your_file) produces:
sumBADLINES
rateBADLINES
nginx_ingress_controller_request_duration_seconds_sum
BADLINESnamespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))/sumBADLINES
rateBADLINES
nginx_ingress_controller_request_duration_seconds_count
BADLINESnamespace=\"$namespace\",ingress=~\"$ingress\"}[3m]))
and the | grep -v BADLINES produces:
nginx_ingress_controller_request_duration_seconds_sum
nginx_ingress_controller_request_duration_seconds_count
This might work for you (GNU sed):
sed -E '/^(\w+)\{/{s//\1\n/;P;D};s/^\w*\W/\n/;D' file
If the start of the line is a valid string followed by a {, replace the { by a newline, print/delete the first line in the pattern space and repeat.
Otherwise, reduce the pattern space and repeat until all strings are matched.
N.B. A valid string in this case is a word i.e. alphanumeric or an underscore.

How to use sed to search and replace a pattern who appears multiple times in the same line?

Because the question can be misleading, here is a little example. I have this kind of file:
some text
some text ##some-text-KEY-some-other-text##
text again ##some-text-KEY-some-other-text## ##some-text-KEY-some-other-text##
again ##some-text-KEY-some-other-text-KEY-text##
some text with KEY ##KEY-some-text##
blabla ##KEY##
In this example, I want to replace each occurrence of KEY- inside a pair of ## by VALUE-. I started with this sed command:
sed -i 's/\(##[^#]*\)KEY-\([^#]*##\)/\1VALUE-\2/g'
Here is how it works:
\(##[^#]*\): create a first group composed of two # and any characters except # ...
KEY-: ... until the last occurrence of KEY- on that line
\([^#]*##\): and create a second group with all the characters except # until the next pair of #.
The problem is my command can't handle correctly the following line because there are multiple KEY- inside my pair of ##:
again ##some-text-KEY-some-other-text-KEY-text##
Indeed, I get this result:
again ##some-text-KEY-some-other-text-VALUE-text##
If I want to replace all the occurrences of KEY- in that line, I have to run my command multiple times and I prefer to avoid that. I also tried with lazy operators but the problem is the same.
How can I create a regex and a sed command that can correctly handle my whole file?
The problem is rather complex: you need to replace all occurrences of some multicharacter text inside blocks of text between identical multicharacter delimiters.
The easiest and safest way to solve the task is using Perl:
perl -i -pe 's/(##)(.*?)(##)/$end_delim=$3; "$1" . $2=~s|KEY-|VALUE-|gr . "$end_delim"/ge' file
See the online demo.
The (##)(.*?)(##) pattern will match strings between two adjacent ## substrings capturing the start delimiter into Group 1, end delimiter in Group 3, and all text in between into Group 2. Since the regex substitution re-sets all placeholders, the temporary variable is used to keep the value of the end delimiter ($end_delim=$3), then, "$1" . $2=~s|KEY-|VALUE-|gr . "$end_delim" replaces the match with the value in the Group 1 of the first match (the first ##), then the Group 2 value with all KEY- replaced with VALUE-, and then the end delimiter.
If there are no KEY-s in between matches on the same line you may use a branch with sed by enclosing your command with :A and tA:
sed -i ':A; s/\(##[^#]*\)KEY-\([^#]*##\)/\1VALUE-\2/g; tA' file
Note you missed the first placeholder in \VALUE-\2, it should be \1VALUE-\2.
See the online demo:
s="some KEY- text
some text ##some-text-KEY-some-other-text##
text again ##some-text-KEY-some-other-text## ##some-text-KEY-some-other-text##
again ##some-text-KEY-some-other-text-KEY-text##
some text with KEY ##KEY-some-text##
blabla ##KEY##"
sed ':A; s/\(##[^#]*\)KEY-\([^#]*##\)/\1VALUE-\2/g; tA' <<< "$s"
Output:
some KEY- text
some text ##some-text-VALUE-some-other-text##
text again ##some-text-VALUE-some-other-text## ##some-text-VALUE-some-other-text##
again ##some-text-VALUE-some-other-text-VALUE-text##
some text with KEY ##VALUE-some-text##
blabla ##KEY##
More details:
sed allows the usage of loops and branches. The :A in the code above is a label, a special location marker that can be "jumped to" using the appropriate command. t is used to create a conditional branch: this "command jumps to the label only if the previous substitute command was successful". So, once the pattern has matched and the replacement has occurred, sed goes back to the label and retries the substitution. If no replacement was made, sed does not jump and simply carries on with the next command, which here means it is done with the line. So, tA means go back to the location marked with A if there was a successful search-and-replace operation.
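As a small illustration of what the label and tA add, the sketch below runs the same substitution once and then inside the loop, on the problematic line from the question. The variable names are made up for the example.

```shell
line='again ##some-text-KEY-some-other-text-KEY-text##'
subst='s/\(##[^#]*\)KEY-\([^#]*##\)/\1VALUE-\2/g'

# Single pass: only the last KEY- inside the ## pair is replaced.
once=$(printf '%s\n' "$line" | sed -e "$subst")

# Loop: tA jumps back to :A for as long as a substitution succeeded.
looped=$(printf '%s\n' "$line" | sed -e ':A' -e "$subst" -e 'tA')

printf '%s\n' "$once"
# prints: again ##some-text-KEY-some-other-text-VALUE-text##
printf '%s\n' "$looped"
# prints: again ##some-text-VALUE-some-other-text-VALUE-text##
```

Passing the label, the substitution and the branch as separate -e options avoids relying on ; as a label terminator, which not every sed accepts.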
This might work for you (GNU sed):
sed -E 's/##/\n/g;:a;s/^([^\n]*(\n[^\n]*\n[^\n]*)*\n[^\n]*)KEY-/\1VALUE-/;ta;s/\n/##/g' file
Convert ##'s to newlines. Using a loop, replace each KEY- between matched newlines with VALUE-. When all done, replace the newlines with ##'s.
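A quick check of the newline approach on a made-up line (the input is an assumption) shows that a KEY- outside a ## pair, or between two pairs, is left alone. It relies on GNU sed for \n in the replacement and inside bracket expressions.

```shell
# Hypothetical input: KEY- appears outside, inside and between ## pairs.
line='KEY- outside ##x-KEY-y-KEY-z## KEY- between ##KEY-##'

out=$(printf '%s\n' "$line" |
  sed -E 's/##/\n/g;:a;s/^([^\n]*(\n[^\n]*\n[^\n]*)*\n[^\n]*)KEY-/\1VALUE-/;ta;s/\n/##/g')

printf '%s\n' "$out"
# prints: KEY- outside ##x-VALUE-y-VALUE-z## KEY- between ##VALUE-##
```

A KEY- is only replaced when an odd number of delimiters precedes it, which is what "inside a pair" means here.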

How to match and replace string following the match via sed or awk

I have a file which I want to modify into a new file using cat.
So the file contains lines like:
name "myName"
place "xyz"
and so on....
I want these lines to be changed to
name "Jon"
place "paris"
I tried to do it like this but its not working:
cat originalFile | sed 's/^name\*/name "Jon"/' > tempFile
I tried using all sorts of special characters and it did not work. I am unable to recognize the space characters after name and then "myName".
You may match the rest of the line using .*, and you may match a space with a space, or [[:blank:]] or [[:space:]]:
sed 's/^\(name[[:space:]]\).*/\1"Jon"/;s/^\(place[[:space:]]\).*/\1"paris"/' originalFile > tempFile
Note there are two replace commands here joined with a semicolon. The first part of each is wrapped in a capturing group. This is necessary because the space POSIX character class is not a literal: to keep the matched whitespace after replacing, the \1 backreference should be used (it inserts the text captured with Group 1).
See the online demo:
s='name "myName"
place "xyz"'
sed 's/^\(name[[:space:]]\).*/\1"Jon"/;s/^\(place[[:space:]]\).*/\1"paris"/' <<< "$s"
Output:
name "Jon"
place "paris"
An awk alternative:
awk '$1=="name"{$0="name \"Jon\""} $1=="place"{$0="place \"paris\""} 1' originalFile
It will also work when there are spaces before name or place.
This is not a regex match, just a string comparison.
By default, awk separates fields on runs of whitespace, which includes spaces and tabs.
Append > tempFile to it once the results look correct to you.
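A short run of the awk command on made-up input (the indented and extra lines are assumptions for illustration) shows both points: leading whitespace does not prevent the match, and unrelated lines pass through unchanged.

```shell
# Hypothetical input: "place" is indented, and one line must be left alone.
input='name "myName"
  place "xyz"
other "keep me"'

out=$(printf '%s\n' "$input" |
  awk '$1=="name"{$0="name \"Jon\""} $1=="place"{$0="place \"paris\""} 1')

printf '%s\n' "$out"
# prints:
# name "Jon"
# place "paris"
# other "keep me"
```

Since the whole record is replaced, any indentation on a matched line is dropped from the output.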

Finding and replacing a numeric string between colons, before a space, using sed?

I am attempting to change all coordinate information in a fastq file to zeros. My input file is composed of millions of entries in the following repeating 4-line structure:
#HWI-SV007:140:C173GACXX:6:2215:16030:89299 1:N:0:CAGATC
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAG
+
###FFFDFHGGDHIIHGIJJJJJJJJJJJGIJJJJJJJIIIDHGHIGIJJIIIJJIJ
I would like to replace the two numeric strings in the first line 16030:89299 with zeros in a generic way, such that any numeric string between the colons, before the space, is replaced. I would like the output to appear as follows, replacing the two strings globally throughout the file with zeros:
#HWI-SV007:140:C173GACXX:6:2215:0:0 1:N:0:CAGATC
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAG
+
###FFFDFHGGDHIIHGIJJJJJJJJJJJGIJJJJJJJIIIDHGHIGIJJIIIJJIJ
I am attempting to do this using the following sed:
sed 's/:^[0-9]+$:^[0-9]+$\s/:0:0 /g'
However, this does not behave as expected.
I think you will need to use sed -r option.
Also, ^ matches beginning of the line and $ matches end of the line.
Thus this is the command line that works against your sample.
sed -r 's/:[0-9]+:[0-9]+\s/:0:0 /g'
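As a quick check, the sketch below applies the same idea to the header line from the question. It uses -E, which GNU sed accepts as a synonym for -r, and [[:space:]] in place of \s. Both are substitutions made here for portability, not part of the answer above.

```shell
header='#HWI-SV007:140:C173GACXX:6:2215:16030:89299 1:N:0:CAGATC'

# Only the pair of numeric fields directly before the whitespace is matched.
out=$(printf '%s\n' "$header" | sed -E 's/:[0-9]+:[0-9]+[[:space:]]/:0:0 /g')

printf '%s\n' "$out"
# prints: #HWI-SV007:140:C173GACXX:6:2215:0:0 1:N:0:CAGATC
```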
Some alternatives:
awk -F ":" 'BEGIN{ OFS = ":" }{ if ( NF > 1 ) {$6 = 0; sub( /^[0-9]*/, 0, $7)}; print $0 }' YourFile
This uses : as the column separator.
sed 's/^\(\([^:]*:\)\{5\}\)[^[:blank:]]*/\10:0/' YourFile
This skips the first 5 elements separated by :, then uses the space as the delimiter.
As for your sed:
sed 's/:[0-9]\+:[0-9]\+\(\s\)/:0:0\1/'
^ and $ are relative to the whole line, not to the current word.
Without -r, + has to be written as \+ (GNU sed).
The captured \(\s\) keeps the original whitespace instead of replacing it with a single blank (useful if there are several, or another kind such as \t).
g is not needed (and is better left out here) because there is normally only 1 occurrence per line.
You need to be sure that the pattern cannot occur somewhere else (never a space after an earlier pair of numbers), because it is a short one.
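To see that the field-counting sed command only touches the header, here it is run on one complete record. The sequence and quality lines are shortened, made-up stand-ins for the ones in the question.

```shell
# One 4-line record; lines 2 and 4 are shortened for the example.
record='#HWI-SV007:140:C173GACXX:6:2215:16030:89299 1:N:0:CAGATC
GATTACAGATTACAG
+
###FFFDFHGGDHII'

# Skip five colon-terminated fields, then replace everything up to the first blank.
out=$(printf '%s\n' "$record" | sed 's/^\(\([^:]*:\)\{5\}\)[^[:blank:]]*/\10:0/')

printf '%s\n' "$out"
# prints:
# #HWI-SV007:140:C173GACXX:6:2215:0:0 1:N:0:CAGATC
# GATTACAGATTACAG
# +
# ###FFFDFHGGDHII
```

In the replacement, \10 is read as backreference \1 followed by a literal 0, since sed backreferences are a single digit.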

How to use sed to replace partial of a find

some of the lines in a file look like this:
LOB ("VALUE") STORE AS SECUREFILE "L_MS_WRKNPY_VALUE_0000000011"(
LOB ("INDEX_XML") STORE AS SECUREFILE "L_HRRPTRY_INDX_L_0000000011"(
What I can assume is that in the "*" the string starts with an L_ and ends in 10 chars number.
I want for each line that:
start with LOB (white-space before the LOB)
inside "" the first two letters are L_
line always ends with "(
replace the last 10 chars in the "" with variable.
all I manage to do is:
cat /tmp/out.log | sed 's/_[0-9_]*/$NUM/g' > /tmp/newout.log
to find the required rows I run:
grep "^ LOB" create_tables_clean.sql | grep "\"L_"
However, I don't know how to combine the two and get what I want.
sed -r 's/(\sLOB.*"L_.+_)([0-9]{10})("\()/\1'$myVar'\3/'
Replace $myVar with your variable, obviously.
I made three capturing groups:
(\sLOB.*"L_.+_) #catches everything until the 10 numbers
([0-9]{10}) #catches the 10 numbers
("\() #catches the last "(
The first capturing group matches only if your line contains LOB with a preceding whitespace character and, after it, "L_.
Then you simply substitute the second capturing group (containing only the 10 numbers) with your variable while keeping the first and third capturing group (\1'$myVar'\3).
Your whole call would look like
cat /tmp/out.log | sed -r 's/(\sLOB.*"L_.+_)([0-9]{10})("\()/\1'$NUM'\3/g' > /tmp/newout.log
(notice I added the g-modifier to the regex, so it will match every occurrence)
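One detail worth sketching: leaving the variable unquoted between the two single-quoted parts lets the shell split it if it ever contains whitespace. The variant below wraps it in double quotes and uses only two capture groups. It assumes a sed with -E, and the value of NUM is a made-up example. A value containing / or & would still need escaping.

```shell
NUM=0000000042   # hypothetical replacement value
line=' LOB ("VALUE") STORE AS SECUREFILE "L_MS_WRKNPY_VALUE_0000000011"('

# '..."$NUM"...' closes the single quotes, expands NUM safely, then reopens them.
out=$(printf '%s\n' "$line" |
  sed -E 's/([[:space:]]LOB.*"L_.+_)[0-9]{10}("\()/\1'"$NUM"'\2/')

printf '%s\n' "$out"
# prints:  LOB ("VALUE") STORE AS SECUREFILE "L_MS_WRKNPY_VALUE_0000000042"(
```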
The accepted sed command contains several unnecessary characters; here is a shorter one.
sed -r 's/(LOB.*"L_.*)([0-9]{10})("\()/\1'$myVar'\3/' file
Second, @Basti M, when you need to echo a string containing double quotes, use single quotes; then you don't need to escape the double quotes.
echo 'LOB ("VALUE") STORE AS SECUREFILE "L_MS_WRKNPY_VALUE_0000000011"('