How to use sed to replace part of a match - regex

some of the lines in a file look like this:
LOB ("VALUE") STORE AS SECUREFILE "L_MS_WRKNPY_VALUE_0000000011"(
LOB ("INDEX_XML") STORE AS SECUREFILE "L_HRRPTRY_INDX_L_0000000011"(
What I can assume is that the last quoted string starts with L_ and ends with a 10-digit number.
For each line that:
starts with LOB (with whitespace before the LOB),
has L_ as the first two characters inside the "",
always ends with "(
I want to replace the last 10 characters inside the "" with a variable.
All I have managed to do is:
cat /tmp/out.log | sed 's/_[0-9_]*/$NUM/g' > /tmp/newout.log
To find the required rows I run:
grep "^ LOB" create_tables_clean.sql | grep "\"L_"
However, I don't know how to combine the two to get what I want.

sed -r 's/(\sLOB.*"L_.+_)([0-9]{10})("\()/\1'"$myVar"'\3/'
Replace $myVar with your variable, obviously. It is double-quoted outside the single-quoted sed script so that the shell expands it as one word.
I made three capturing groups:
(\sLOB.*"L_.+_) #catches everything until the 10 numbers
([0-9]{10}) #catches the 10 numbers
("\() #catches the last "(
The first capturing group matches only if LOB is preceded by a whitespace character and the line contains "L_ after it. The pattern is not anchored, so add ^ in front if LOB must be at the start of the line.
Then you simply substitute the second capturing group (containing only the 10 numbers) with your variable while keeping the first and third capturing groups (\1'"$myVar"'\3).
Your whole call would look like
cat /tmp/out.log | sed -r 's/(\sLOB.*"L_.+_)([0-9]{10})("\()/\1'"$NUM"'\3/g' > /tmp/newout.log
(notice I added the g modifier, so it will replace every occurrence within a line)
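As a minimal sketch of the substitution above, the following script runs it on one matching line and one non-matching line. The value of NUM and the second input line are made up for illustration. The pattern is anchored with ^ and $ (an addition that is not in the answer), and [[:space:]] stands in for \s so the expression does not depend on GNU extensions.

```shell
#!/bin/sh
# NUM is a hypothetical replacement value; it must not contain '/', '&' or '\',
# which are special in the replacement part of the s command.
NUM=0000000042

# One line that should change and one (made-up) line that should not.
input=' LOB ("VALUE") STORE AS SECUREFILE "L_MS_WRKNPY_VALUE_0000000011"(
CREATE TABLE "L_FOO_0000000011"('

# Group 1: everything up to the 10 digits; group 2: the closing "(.
# "$NUM" is double-quoted so the shell expands it as a single word.
result=$(printf '%s\n' "$input" |
  sed -E 's/^([[:space:]]+LOB .*"L_.*_)[0-9]{10}("\()$/\1'"$NUM"'\2/')

printf '%s\n' "$result"
```

Only the line that starts with whitespace followed by LOB is rewritten; the other line passes through unchanged.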

Several characters in the accepted sed command are unnecessary; here is a shorter one. Note that it also drops the check for whitespace before LOB.
sed -r 's/(LOB.*"L_.*)([0-9]{10})("\()/\1'"$myVar"'\3/' file
Second, @Basti M, when you need to echo a string that contains double quotes, use single quotes; then you don't need to escape the double quotes.
echo 'LOB ("VALUE") STORE AS SECUREFILE "L_MS_WRKNPY_VALUE_0000000011"('

Related

Convert regex positive look ahead to sed operation

I would like to use sed to find and replace every occurrence of - with _ but only before the first occurrence of = on every line.
Here is a dataset to work with:
ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"
In the end the dataset should look like this:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
I found this regex will match properly.
\-(?=.*=)
However, the regex uses a positive lookahead, and it appears that sed (even with -E, -e or -r) does not know how to work with positive lookaheads.
I tried the following but keep getting Invalid preceding regular expression
cat dataset.txt | sed -r "s/-(?=.*=)/_/g"
Is it possible to convert this in a usable way with sed?
Note, I do not want to use perl. However I am open to awk.
You can use
sed ':a;s/^\([^=]*\)-/\1_/;ta' file
See the online demo:
#!/bin/bash
s='ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"'
sed ':a; s/^\([^=]*\)-/\1_/;ta' <<< "$s"
Output:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
Details:
:a - setting a label named a
s/^\([^=]*\)-/\1_/ - from the start of the line, matches zero or more characters other than = (captured into Group 1, \1) followed by a - character, and replaces the match with the Group 1 value (\1) and a _ (which replaces the matched -)
ta - jump to label a if the substitution succeeded. Otherwise, continue to the end of the script.
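To see why the loop is needed, the sketch below runs the substitution once without the loop and then with it. Because [^=]* is greedy, each pass replaces only the last - before the first =. The separate -e arguments are a portability choice so that the label ends cleanly; the variable names and the sample line are illustrative.

```shell
#!/bin/sh
line='ke-y_0-1="foo-bar"'

# Single pass: only the last '-' before the first '=' is replaced.
once=$(printf '%s\n' "$line" | sed 's/^\([^=]*\)-/\1_/')

# Looping until the substitution stops matching replaces them all.
all=$(printf '%s\n' "$line" | sed -e ':a' -e 's/^\([^=]*\)-/\1_/' -e 'ta')

printf '%s\n%s\n' "$once" "$all"
```

The hyphen after the = is never touched, because [^=]* cannot cross the first =.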
You might also use awk setting the field separator to = and replace all - with _ for the first field.
To print only the replaced lines:
awk 'BEGIN{FS=OFS="="}gsub("-", "_", $1)' file
Output
ke_y_0_1="foo"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
If you want to print all lines:
awk 'BEGIN{FS=OFS="="}{gsub("-", "_", $1);print}' file
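Note that with FS set to =, a line without any = is a single field, so every - on it would be replaced. If such lines should be left alone (an assumption; the question does not say), a guard on NF is one way to sketch it. The second and third input lines are made up for illustration.

```shell
#!/bin/sh
input='key-05-five="craz-"
k-1=a=b-c
no-equals-here'

# NF > 1 is true only when the line contains at least one '='.
# Only then are the hyphens in the first field rewritten; "1" prints every line.
result=$(printf '%s\n' "$input" |
  awk 'BEGIN { FS = OFS = "=" } NF > 1 { gsub(/-/, "_", $1) } 1')

printf '%s\n' "$result"
```

Because FS and OFS are both =, rebuilding the record after the change to $1 leaves the remaining fields as they were.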

How to use sed to search and replace a pattern that appears multiple times in the same line?

Because the question can be misleading, here is a little example. I have this kind of file:
some text
some text ##some-text-KEY-some-other-text##
text again ##some-text-KEY-some-other-text## ##some-text-KEY-some-other-text##
again ##some-text-KEY-some-other-text-KEY-text##
some text with KEY ##KEY-some-text##
blabla ##KEY##
In this example, I want to replace each occurrence of KEY- inside a pair of ## by VALUE-. I started with this sed command:
sed -i 's/\(##[^#]*\)KEY-\([^#]*##\)/\1VALUE-\2/g'
Here is how it works:
\(##[^#]*\): create a first group composed of two # and any characters except # ...
KEY-: ... until the last occurrence of KEY- on that line
\([^#]*##\): and create a second group with all the characters except # until the next pair of #.
The problem is my command can't handle correctly the following line because there are multiple KEY- inside my pair of ##:
again ##some-text-KEY-some-other-text-KEY-text##
Indeed, I get this result:
again ##some-text-KEY-some-other-text-VALUE-text##
If I want to replace all the occurrences of KEY- in that line, I have to run my command multiple times and I prefer to avoid that. I also tried with lazy operators but the problem is the same.
How can I create a regex and a sed command that correctly handle my whole file?
The problem is rather complex: you need to replace all occurrences of some multicharacter text inside blocks of text between identical multicharacter delimiters.
The easiest and safest way to solve the task is using Perl:
perl -i -pe 's/(##)(.*?)(##)/$end_delim=$3; "$1" . $2=~s|KEY-|VALUE-|gr . "$end_delim"/ge' file
The (##)(.*?)(##) pattern matches the text between two adjacent ## substrings, capturing the start delimiter into Group 1, the end delimiter into Group 3, and all text in between into Group 2. Because the nested substitution resets the capture variables, a temporary variable keeps the end delimiter ($end_delim=$3). Then "$1" . $2=~s|KEY-|VALUE-|gr . "$end_delim" builds the replacement: the Group 1 value (the opening ##), the Group 2 value with every KEY- replaced by VALUE-, and the end delimiter.
If KEY- never occurs between two ##...## blocks on the same line, you may use a loop in sed by enclosing your command with :A and tA:
sed -i ':A; s/\(##[^#]*\)KEY-\([^#]*##\)/\1VALUE-\2/g; tA' file
Note you missed the first placeholder in \VALUE-\2, it should be \1VALUE-\2.
See the online demo:
s="some KEY- text
some text ##some-text-KEY-some-other-text##
text again ##some-text-KEY-some-other-text## ##some-text-KEY-some-other-text##
again ##some-text-KEY-some-other-text-KEY-text##
some text with KEY ##KEY-some-text##
blabla ##KEY##"
sed ':A; s/\(##[^#]*\)KEY-\([^#]*##\)/\1VALUE-\2/g; tA' <<< "$s"
Output:
some KEY- text
some text ##some-text-VALUE-some-other-text##
text again ##some-text-VALUE-some-other-text## ##some-text-VALUE-some-other-text##
again ##some-text-VALUE-some-other-text-VALUE-text##
some text with KEY ##VALUE-some-text##
blabla ##KEY##
More details:
sed allows the usage of loops and branches. The :A in the code above is a label, a location marker that can be jumped to with the appropriate command. t creates a conditional branch: this "command jumps to the label only if the previous substitute command was successful". So, once the pattern has matched and the replacement has occurred, sed goes back to the label and retries the substitution on the modified line. If nothing was replaced, sed does not jump; it continues with the rest of the script and prints the line. So, tA means go back to the location marked with A if there was a successful search-and-replace operation.
This might work for you (GNU sed):
sed -E 's/##/\n/g;:a;s/^([^\n]*(\n[^\n]*\n[^\n]*)*\n[^\n]*)KEY-/\1VALUE-/;ta;s/\n/##/g' file
Convert the ##'s to newlines. Using a loop, replace KEY- with VALUE- wherever it sits between a matched pair of newlines (that is, after an odd number of them). When all is done, replace the newlines with ##'s.
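To compare the simple :A/tA loop with the newline-swap approach, the sketch below feeds both a made-up line in which KEY- also appears between two ## blocks. It assumes GNU sed, since \n in the replacement and inside [^\n] is a GNU extension.

```shell
#!/bin/sh
# Assumes GNU sed: \n in the replacement and inside [^\n] is a GNU extension.
line='##a## KEY-x ##b-KEY-c##'

# Simple loop: also rewrites the KEY- that sits between the two blocks.
loop=$(printf '%s\n' "$line" |
  sed -e ':A' -e 's/\(##[^#]*\)KEY-\([^#]*##\)/\1VALUE-\2/g' -e 'tA')

# Newline swap: only rewrites KEY- that follows an odd number of delimiters.
swap=$(printf '%s\n' "$line" |
  sed -E -e 's/##/\n/g' -e ':a' \
      -e 's/^([^\n]*(\n[^\n]*\n[^\n]*)*\n[^\n]*)KEY-/\1VALUE-/' -e 'ta' \
      -e 's/\n/##/g')

printf '%s\n%s\n' "$loop" "$swap"
```

The simple loop treats the closing ## of one block and the opening ## of the next as a pair, which is why it is only safe when KEY- never appears between blocks.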

How to match and replace string following the match via sed or awk

I have a file which I want to modify into a new file using cat.
So the file contains lines like:
name "myName"
place "xyz"
and so on....
I want these lines to be changed to
name "Jon"
place "paris"
I tried to do it like this but it's not working:
cat originalFile | sed 's/^name\*/name "Jon"/' > tempFile
I tried using all sorts of special characters and it did not work. I am unable to recognize the space characters after name and then "myName".
You may match the rest of the line using .*, and you may match a space with a space, or [[:blank:]] or [[:space:]]:
sed 's/^\(name[[:space:]]\).*/\1"Jon"/;s/^\(place[[:space:]]\).*/\1"paris"/' originalFile > tempFile
Note there are two replace commands here, joined with a semicolon. The first part of each pattern is wrapped in a capturing group. This is necessary because the [[:space:]] POSIX character class cannot be written literally in the replacement, so to keep the matched whitespace the \1 backreference is used (it inserts the text captured by Group 1).
See the online demo:
s='name "myName"
place "xyz"'
sed 's/^\(name[[:space:]]\).*/\1"Jon"/;s/^\(place[[:space:]]\).*/\1"paris"/' <<< "$s"
Output:
name "Jon"
place "paris"
An awk alternative:
awk '$1=="name"{$0="name \"Jon\""} $1=="place"{$0="place \"paris\""} 1' originalFile
It also works when there are spaces before name or place.
This is not a regex match, just a string comparison.
By default, awk splits fields on runs of whitespace (spaces, tabs and newlines).
Append > tempFile to the command once the results look correct to you.
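The sed command above only matches name or place at the very start of a line, while the awk version tolerates leading whitespace but rewrites the whole line, dropping the indentation. If the indentation should be preserved (an assumption about the input), a sketch along these lines keeps it inside the capture group. The third input line is made up to show that longer keywords are not touched.

```shell
#!/bin/sh
input='  name "myName"
place "xyz"
namespace "x"'

# Group 1 = optional indentation + keyword + at least one whitespace character;
# the rest of the line is replaced by the new quoted value.
result=$(printf '%s\n' "$input" | sed \
  -e 's/^\([[:space:]]*name[[:space:]]\{1,\}\).*/\1"Jon"/' \
  -e 's/^\([[:space:]]*place[[:space:]]\{1,\}\).*/\1"paris"/')

printf '%s\n' "$result"
```

Requiring at least one whitespace character after the keyword is what keeps namespace from matching the name rule.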

How to match until the last occurrence of a character in bash shell

I am using curl and cut on a output like below.
var=$(curl https://avc.com/actuator/info | tr '"' '\n' | grep - | head -n1 | cut -d'-' -f -1, -3)
The variable var gets one of two kinds of values (one at a time):
HIX_MAIN-7ae526629f6939f717165c526dad3b7f0819d85b
HIX-R1-1-3b5126629f67892110165c524gbc5d5g1808c9b5
I am actually trying to get everything until the last '-', i.e. HIX_MAIN or HIX-R1-1.
The command shown works fine to get HIX-R1-1.
But I figured out this is the wrong approach when there is only one - in the variable; it gives me the entire variable value (e.g. HIX_MAIN-7ae526629f6939f717165c526dad3b7f0819d85b).
How do I go about getting everything up to the last '-' into the variable var?
This removes everything from the last - to the end:
sed 's/\(.*\)-.*/\1/'
As examples:
$ echo HIX_MAIN-7ae52 | sed 's/\(.*\)-.*/\1/'
HIX_MAIN
$ echo HIX-R1-1-3b5126629f67 | sed 's/\(.*\)-.*/\1/'
HIX-R1-1
How it works
The sed substitute command has the form s/old/new/ where old is a regular expression. In this case, the regex is \(.*\)-.*. This works because \(.*\)- is greedy: it will match everything up to the last -. Because of the escaped parens,\(...\), everything before the last - will be saved in group 1 which we can refer to as \1. The final .* matches everything after the last -. Thus, as long as the line contains a -, this regex matches the whole line and the substitute command replaces the whole line with \1.
You can use bash string manipulation:
$ foo=a-b-c-def-ghi
$ echo "${foo%-*}"
a-b-c-def
The operators, # and % are on either side of $ on a QWERTY keyboard, which helps to remember how they modify the variable:
#pattern trims off the shortest prefix matching "pattern".
##pattern trims off the longest prefix matching "pattern".
%pattern trims off the shortest suffix matching "pattern".
%%pattern trims off the longest suffix matching "pattern".
where pattern matches the bash pattern matching rules, including ? (one character) and * (zero or more characters).
Here, we're trimming off the shortest suffix matching the pattern -*, so ${foo%-*} will get you what you want.
Of course, there are many ways to do this using awk or sed, possibly reusing the sed command you're already running. Variable manipulation, however, can be done natively in bash without launching another process.
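As a small sketch, here are the four operators applied to the two sample values from the question. The variable names are illustrative; these expansions are part of POSIX sh, so they are not specific to bash.

```shell
#!/bin/sh
v1=HIX_MAIN-7ae526629f6939f717165c526dad3b7f0819d85b
v2=HIX-R1-1-3b5126629f67892110165c524gbc5d5g1808c9b5

# %-* removes the shortest suffix that starts with '-', i.e. the last '-' onward.
r1=${v1%-*}
r2=${v2%-*}

# For comparison, the other three operators applied to the second value.
first=${v2%%-*}   # longest suffix removed
hash=${v2##*-}    # longest prefix removed
rest=${v2#*-}     # shortest prefix removed

printf '%s\n' "$r1" "$r2" "$first" "$hash" "$rest"
```

If the value contains no - at all, the pattern does not match and the expansion returns the value unchanged.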
You can reverse the string with rev, cut from the second field and then rev again:
rev <<< "$VARIABLE" | cut -d"-" -f2- | rev
For HIX-R1-1----3b5126629f67892110165c524gbc5d5g1808c9b5, prints:
HIX-R1-1---
I think you should be using sed, at least after the tr:
var=$(curl https://avc.com/actuator/info | tr '"' '\n' | sed -n '/-/{s/-[^-]*$//;p;q}')
The -n means "don't print by default". The /-/ looks for a line containing a dash; it then executes s/-[^-]*$// to delete the last dash and everything after it, followed by p to print and q to quit (so it only prints the first such line).
I'm assuming that the output from curl intrinsically contains multiple lines, some of them with unwanted double quotes in them, and that you need to match only the first line that contains a dash at all (which might very well not be the first line). Once you've whittled the input down to the sole interesting line, you could use pure shell techniques to get the result that's desired, but getting the sole interesting line is not as trivial as some of the answers seem to be assuming.
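Here is a sketch of that pipeline with curl replaced by a stand-in function so it can be run offline. The JSON shape is invented for illustration; the real endpoint's output is not shown in the question.

```shell
#!/bin/sh
# Hypothetical stand-in for: curl https://avc.com/actuator/info
fake_curl() {
  printf '%s\n' '{"git":{"branch":"HIX-R1-1-3b5126629f67"},"build":{"name":"hix"}}'
}

# tr puts every quoted token on its own line; sed then takes the first line
# containing a dash, strips the last dash and what follows, prints it, and quits.
var=$(fake_curl | tr '"' '\n' | sed -n '/-/{s/-[^-]*$//;p;q;}')

printf '%s\n' "$var"
```

The extra ; before the closing } is there because some sed implementations require it; GNU sed accepts both forms.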

Finding and replacing a numeric string between colons, before a space, using sed?

I am attempting to change all coordinate information in a fastq file to zeros. My input file is composed of millions of entries in the following repeating 4-line structure:
#HWI-SV007:140:C173GACXX:6:2215:16030:89299 1:N:0:CAGATC
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAG
+
###FFFDFHGGDHIIHGIJJJJJJJJJJJGIJJJJJJJIIIDHGHIGIJJIIIJJIJ
I would like to replace the two numeric strings in the first line 16030:89299 with zeros in a generic way, such that any numeric string between the colons, before the space, is replaced. I would like the output to appear as follows, replacing the two strings globally throughout the file with zeros:
#HWI-SV007:140:C173GACXX:6:2215:0:0 1:N:0:CAGATC
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAG
+
###FFFDFHGGDHIIHGIJJJJJJJJJJJGIJJJJJJJIIIDHGHIGIJJIIIJJIJ
I am attempting to do this using the following sed:
sed 's/:^[0-9]+$:^[0-9]+$\s/:0:0 /g'
However, this does not behave as expected.
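I think you will need sed's -r option (extended regular expressions), so that + acts as a quantifier.
Also, ^ matches the beginning of the line and $ matches the end of the line, so they do not belong in the middle of the pattern.
Thus this is the command line that works against your sample: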
sed -r 's/:[0-9]+:[0-9]+\s/:0:0 /g'
Some alternatives.
Using awk, with : as the column separator:
awk -F ":" 'BEGIN{ OFS = ":" }{ if ( NF > 1 ) {$6 = 0; sub( /^[0-9]*/, 0, $7)}; print $0 }' YourFile
Using sed, keeping the first 5 elements separated by : and then using the blank as the delimiter:
sed 's/^\(\([^:]*:\)\{5\}\)[^[:blank:]]*/\10:0/' YourFile
As for your own sed (GNU sed, where \+ is the quantifier in basic regular expressions; a bare + would be a literal plus sign):
sed 's/:[0-9]\+:[0-9]\+\(\s\)/:0:0\1/'
^ and $ are relative to the whole line, not to the current word.
The \(\s\) group keeps the original whitespace instead of replacing it with a single space (useful when there are several spaces, or another character such as \t).
g is not needed (and is better left out here) because normally there is only one occurrence per line.
Because the pattern is short, you need to be sure that it cannot match anywhere else (that is, there is never a space after an earlier pair of numbers).
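As a minimal sketch, the substitution can be checked against the sample record from the question. It uses -E with [[:space:]] in place of \s so that it does not rely on GNU-only escapes; the variable names are illustrative.

```shell
#!/bin/sh
record='#HWI-SV007:140:C173GACXX:6:2215:16030:89299 1:N:0:CAGATC
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAG
+
###FFFDFHGGDHIIHGIJJJJJJJJJJJGIJJJJJJJIIIDHGHIGIJJIIIJJIJ'

# -E: + is a quantifier and ( ) form a group without backslashes.
# [[:space:]] is the POSIX spelling of GNU's \s; \1 puts the matched blank back.
result=$(printf '%s\n' "$record" |
  sed -E 's/:[0-9]+:[0-9]+([[:space:]])/:0:0\1/')

printf '%s\n' "$result"
```

Only the header line changes, because it is the only line where two colon-separated numbers are followed by whitespace.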