Find and replace text with all-inclusive wild card - replace

I have a file like this:
foo and more
stuff
various stuff
variable number of lines
with a bar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
I want to replace everything between foo and bar so that I get:
foo testtext bar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
As recommended in another thread, I tried:
sed -e '/^foo/,/^bar/{/^foo/b;/^bar/{i testtext' -e 'b};d}' file.txt
Is there a more general-purpose solution to find and replace everything between foo and bar, absolutely no matter what it is?

You can use the following sed script:
replace.sed:
# Check for "foo"
/\bfoo\b/ {
# Define a label "a"
:a
# If the line does not contain "bar"
/\bbar\b/!{
# Get the next line of input and append
# it to the pattern buffer
N
# Branch back to label "a"
ba
}
# Replace everything between foo and bar
s/\(\bfoo\)\b.*\b\(bar\b\)/\1TEST DATA\2/
}
Call it like this:
sed -f replace.sed input.file
Output:
fooTEST DATAbar
Stuff I want to keep
More stuff I want to Keep
These line breaks are important
If you want to pass the begin and ending delimiter using a shell script you can do it like this (comments removed for brevity):
#!/bin/bash
begin="foo"
end="bar"
replacement=" Hello world "
sed -r '/\b'"$begin"'\b/{
:a;/\b'"$end"'\b/!{
N;ba
}
s/(\b'"$begin"')\b.*\b('"$end"'\b)/\1'"$replacement"'\2/
}' input.file
The above works as long as $begin and $end don't contain regex special characters. To escape them properly, use the following code:
#!/bin/bash
begin="foo"
end="bar"
replace=" Hello\1world "
# Escape variables to be used in regex
beginEsc=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$begin")
endEsc=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$end")
replaceEsc=$(sed 's/[&/\]/\\&/g' <<<"$replace")
sed -r '/\b'"$beginEsc"'\b/{
:a;/\b'"$endEsc"'\b/!{
N;ba
}
s/(\b'"$beginEsc"')\b.*\b('"$endEsc"'\b)/\1'"$replaceEsc"'\2/
}' input.file
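For a quick check of the escaping approach, here is a minimal sketch that wraps the same loop in a function. It assumes GNU sed; the name replace_between and its argument order are made up for illustration, and the \b anchors are left out on purpose so that delimiters made of punctuation can still match.

```shell
# Hypothetical helper (not from the original answer):
#   replace_between BEGIN END REPLACEMENT FILE
# Assumes GNU sed and that BEGIN, END and REPLACEMENT contain no newlines.
replace_between() {
  local beginEsc endEsc replaceEsc
  # Wrap every character in a bracket expression so it is matched literally;
  # "^" cannot sit alone in brackets, so it is backslash-escaped instead.
  beginEsc=$(printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g')
  endEsc=$(printf '%s\n' "$2" | sed 's/[^^]/[&]/g; s/\^/\\^/g')
  # In the replacement only "&", "/" and "\" are special.
  replaceEsc=$(printf '%s\n' "$3" | sed 's/[&/\]/\\&/g')
  sed -E '/'"$beginEsc"'/{
:a
/'"$endEsc"'/!{
N
ba
}
s/('"$beginEsc"').*('"$endEsc"')/\1'"$replaceEsc"'\2/
}' "$4"
}
```

For example, replace_between '[start]' '(end)' ' a/b & c ' input.file should leave both delimiters in place and swap only what lies between them. If the end delimiter never appears, what happens at end of input depends on the sed implementation.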

Related

Insert text into line if that line doesn't contain another string using sed

I am merging a number of text files on a linux server but the lines in some differ slightly and I need to unify them.
For example some files will have line like
id='1244' group='american' name='fred',american
Other files will be like
id='2345' name='frank', english
finally others will be like
id='7897' group='' name='maria',scottish
What I need to do is this: if group='' or group is missing from the line entirely, I need to add it somewhere before the comma, setting it to the text after the comma. So in the 2nd example above the line would become:
id='2345' name='frank' group='english',english
and the same in the last example which would become
id='7897' name='maria' group='scottish',scottish
This is going into a bash script. I can't actually delete the line and add to the end of the file as it relates to the following line.
I've used the following:
sed -i.bak 's#group=""##' file
which deletes the group="" string, so the lines will either contain group='something' or won't contain it at all, and that works.
Then I tried to add the group if it doesn't exist using the following:
sed -i.bak '/group/! s#,(.*$)#group="\1",\1#' file
but that throws up the error
sed: -e expression #1, char 38: invalid reference \1 on `s' command's RHS
EDIT by Ed Morton to create a single sample input file and expected output:
Sample Input:
id='1244' group='american' name='fred',american
foo
id='2345' name='frank', english
bar
id='7897' group='' name='maria',scottish
Expected Output:
id='1244' group='american' name='fred',american
foo
id='2345' name='frank' group='english',english
bar
id='7897' name='maria' group='scottish',scottish
sed -r "
/group=''/ s/// # group is empty, remove it
/group=/! s/,[[:blank:]]*(.+)/ group='\\1',\\1/ # group is missing, add it
" file
id='1244' group='american' name='fred',american
foo
id='2345' name='frank' group='english',english
bar
id='7897' name='maria' group='scottish',scottish
The foo and bar lines are untouched because the s/// command did not match a comma followed by characters.
Something like:
sed '
/^[^,]*group[^,]*,/ ! {
s/, *\(.*\)/ group='\''\1'\'', \1/
}
/^[^,]*group='\'\''/ {
s/group='\'\''\([^,]*\), *\(.*\)/group='\''\2'\''\1, \2/
}
' file
This GNU awk may help:
awk -v sq="'" '
BEGIN{RS="[ ,\n]+"; FS="="; found=0}
$1=="group"{
if($2==sq sq)
{next}
else
{found=1}
}
NF>1{
printf "%s=%s ",$1,$2
}
NF==1{
if(!found)
{printf "group=%s",$1}
print ","$1
found=0
}
' file
The script relies on the record separator RS which is set to get all key='value' pairs.
If the key group isn't found or is empty, it is printed when reaching a record with only one field.
Note that the variable sq holds the single quote character and is used to detect empty group field.
Sed can be pretty ugly. And your data format appears to be somewhat inconsistent. This MIGHT work for you:
$ sed -e "/group='[a-z]/b e" -e "s/group='' *//" -e "s/, *\([a-z]*\)$/ group='\1',\1/" -e ':e' input.txt
Broken out for easier reading, here's what we're doing:
/group='[a-z]/b e - If the line contains a valid group, branch to the end.
s/group='' *// - Remove any empty group.
s/, *\([a-z]*\)$/ group='\1',\1/ - Add a new group built from the text after the comma, keeping that text in place.
:e - Branch label for the first command.
And then the default action is to print the line.
I really don't like manipulating data this way. It's prone to error, and you'll be further ahead reading this data into something that accurately stores its data structure, then prints the data according to a new structure. A more robust solution would likely be tied directly to whatever is producing or consuming this data, and would not sit in the middle like this.
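As a rough illustration of that idea, the sketch below splits each line into its key='value' pairs and the text after the comma, then prints them back in a uniform layout. It is only a sketch under stated assumptions: values contain no spaces or commas, lines without a comma are passed through unchanged, and the function name normalize_groups is hypothetical.

```shell
# Hypothetical helper: reads files (or stdin) and prints normalized lines.
# Assumes key='value' pairs are space separated and hold no spaces or commas.
normalize_groups() {
  awk -v sq="'" '
    # Lines without a comma (such as "foo" and "bar") pass through untouched.
    index($0, ",") == 0 { print; next }
    {
      comma = index($0, ",")
      head = substr($0, 1, comma - 1)   # the key=value pairs
      tail = substr($0, comma + 1)      # the text after the comma
      sub(/^[ \t]+/, "", tail)
      n = split(head, pairs, " ")
      out = ""
      has_group = 0
      for (i = 1; i <= n; i++) {
        if (pairs[i] == "group=" sq sq) continue   # drop an empty group
        if (pairs[i] ~ /^group=/) has_group = 1
        out = out (out == "" ? "" : " ") pairs[i]
      }
      # Add the group from the trailing text when none was present.
      if (!has_group) out = out " group=" sq tail sq
      print out "," tail
    }
  ' "$@"
}
```

Calling normalize_groups file on the sample input should give the expected output from the question, with the group appended after the existing pairs.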

How to change a new line not started by a (") character to another string

I need to replace every newline that is not followed by a " (quote) with another printable string, like \n or <br>.
I tried this, but it does not work:
cat file.csv | sed 's/^[^\"]/\<br\>/g'
An example of an input file:
cat file.csv
"a","bcde","fgh
ijk
mnopq
asd"
The output I need:
cat file.csv
"a","bcde","fgh<br>ijk<br> mnopq<br>asd"
I don't think targeting a newline that isn't followed by a double quote is a reliable way to do what you want. For instance, it doesn't handle cases like this one:
"abc","def
"
A more reliable approach is to check whether a line contains an odd number of double quotes, append the following lines until the count becomes even, and only then do the replacement:
sed -E '/^("[^"]*"[^"]*)*"[^"]*$/{:a;N;/^("[^"]*"[^"]*)*$/{s/\n/<br>/g;bb};ba;};:b;' file
-E switches the regex syntax to ERE (Extended Regular Expression)
-i changes the file content in-place (When you are sure, add this switch)
command details:
/^("[^"]*"[^"]*)*"[^"]*$/ # check if the line has an odd number of quotes
{ # when the match succeeds:
:a; # define a label "a"
N; # append the next line to the pattern space
/^("[^"]*"[^"]*)*$/ # check if the pattern space contains an even number of quotes
{ # in this case:
s/\n/<br>/g; # proceed to the replacement
bb; # go to label "b"
};
ba; # go to label "a"
};
:b; # define the label "b"
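The same quote-counting idea can be sketched in awk, which some readers may find easier to follow than the sed labels. This is an illustrative alternative, not the answer's method; the function name join_quoted_lines is hypothetical, and it assumes a double quote only ever opens or closes a field.

```shell
# Hypothetical helper: joins physical lines with <br> until the number of
# double quotes seen in the accumulated record is even.
join_quoted_lines() {
  awk '
    {
      # Start a new record, or extend the pending one with <br>.
      buf = (pending ? buf "<br>" $0 : $0)
      tmp = buf
      # gsub returns the number of substitutions, i.e. the quote count.
      pending = (gsub(/"/, "", tmp) % 2 == 1)
      if (!pending) print buf
    }
    # Flush an unterminated record at end of input.
    END { if (pending) print buf }
  ' "$@"
}
```

Usage would be join_quoted_lines file.csv, which writes the joined records to standard output and leaves the file itself alone.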
You can also use a simple branching loop in sed:
sed -i -E ':a;N;s~\n([^"])~<br\>\1~;ba' file.csv
# check results
cat file.csv
"a","bcde","fgh<br>ijk<br> mnopq<br>asd"

Search for text between two patterns with multiple lines in between

I have a simple question. I have a file containing:
more random text
*foo*
there
is
random
text
here
*foo*
foo
even
more
random
text
here
foo
more random text
(To clarify which parts I want the result from, I added the *'s next to foo. The *'s are not in the file.)
I only want to print out the multiple lines between the first 2 instances of foo.
I tried searching for ways to let "foo" occur only once and then remove it, but I didn't get that far. However, I did find a way to remove all the "more random text" using: sed '/foo/,/foo/p' but I couldn't find a way, using sed or awk, to match only once and print the output.
Can anyone help me out?
With sed:
$ sed -n '/foo/{:a;n;/foo/q;p;ba}' infile
there
is
random
text
here
Explained:
/foo/ { # If we match "foo"
:a # Label to branch to
n # Discard current line, read next line (does not print because of -n)
/foo/q # If we match the closing "foo", then quit
p # Print line (is a line between two "foo"s)
ba # Branch to :a
}
Some seds complain about braces in one-liners; in those cases, this should work:
sed -n '/foo/ {
:a
n
/foo/q
p
ba
}' infile
$ awk '/foo/{++c;next} c==1' file
there
is
random
text
here
$ awk '/foo/{++c;next} c==3' file
even
more
random
text
here
or with GNU awk for multi-char RS you COULD do:
$ awk -v RS='(^|\n)[^\n]*foo[^\n]*(\n|$)' 'NR==2' file
there
is
random
text
here
$ awk -v RS='(^|\n)[^\n]*foo[^\n]*(\n|$)' 'NR==4' file
even
more
random
text
here
See https://stackoverflow.com/a/17914105/1745001 for other ways of printing after a condition is true.
Since checking for "foo" (using /foo/) is relatively expensive, the following skips that check on every line after the second foo, and will work with all awks worthy of the name:
awk 'c==2 {next} /foo/{++c;next} c==1' file
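The counter trick generalizes: if the markers pair up, the n-th block is printed while the counter equals 2n-1. Below is a minimal sketch of that, with a hypothetical function name print_block; it assumes the marker is given as a regex without backslashes, since awk -v processes escape sequences.

```shell
# Hypothetical helper: print_block MARKER N [FILE...]
# Prints the lines strictly between the N-th pair of lines matching MARKER.
print_block() {
  local marker n
  marker=$1
  n=$2
  shift 2
  awk -v marker="$marker" -v n="$n" '
    c > 2 * n - 1 { exit }        # past the closing marker, stop reading
    $0 ~ marker   { ++c; next }   # count marker lines, never print them
    c == 2 * n - 1                # inside the wanted block, print the line
  ' "$@"
}
```

With the sample file, print_block foo 1 file should print the first block and print_block foo 2 file the second.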

Bash how to replace comment backslash with nothing

I have a string in bash with the following format:
// comment.
I want to obtain a new variable with the comment alone (no slashes), and I don't want to depend on the // being the first two characters in the string. How can I do this?
I have tried this:
nline=${line/%/////}
echo $nline
using string substitution, but it doesn't work.
Perhaps you want the # substitution?
$ a='// this is a comment'
$ printf "%s\n" "${a#// }"
this is a comment
$ a='not a comment'
$ printf "%s\n" "${a#// }"
not a comment
And as SergA pointed out, a slightly better pattern in the parameter expansion saves us from needing the sed solution below:
$ a="first //a comment"
$ printf "%s" "${a##*//}"
If you just want to get the comment part of a line anywhere it is you could use sed like so:
$ a="first //a comment"
$ printf "%s\n" "$a" | sed -e 's,^.*// \?,,'
a comment
which of course you could store in another variable:
nline=$(printf "%s" "$a" | sed -e 's,^.*// \?,,')
(note also that I removed the \n from the printf)
Remove first two characters:
echo ${nline:2}
% matches the end of the string. # matches the beginning of the string.
Since you said you wanted neither of those you don't want either % or # in there.
Also you need to escape / in a /-delimited pattern.
nline=${line/\/\/}
echo "$nline"
This will remove the first // from the string no matter where it is or what comes before it. So foo // comment will become foo comment, etc.
If you want to also remove any surrounding spaces from the // string then you need to do a bit more work and can't so easily use string substitution for it.
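One way that extra work could look, using only parameter expansion: split the string at the first //, trim the whitespace on both sides of the split, and glue the pieces back together with a single space. This is a sketch rather than a definitive answer, and the function name strip_comment_marker is hypothetical.

```shell
# Hypothetical helper: removes the first "//" and the whitespace around it.
strip_comment_marker() {
  local line before after
  line=$1
  case $line in
    *//*) ;;                                # marker present, carry on
    *) printf '%s\n' "$line"; return 0 ;;   # no marker, print unchanged
  esac
  before=${line%%//*}   # text before the first //
  after=${line#*//}     # text after the first //
  # Trim trailing whitespace from "before" and leading whitespace from "after".
  before=${before%"${before##*[![:space:]]}"}
  after=${after#"${after%%[![:space:]]*}"}
  if [ -n "$before" ] && [ -n "$after" ]; then
    printf '%s %s\n' "$before" "$after"
  else
    printf '%s%s\n' "$before" "$after"
  fi
}
```

The result can be captured as usual, for example nline=$(strip_comment_marker "$line").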

How can I use perl/awk/sed to search for all occurrences of text wrapped in quotes within a file and then delete them?

How can I use perl, awk, or sed to search for all occurrences of text wrapped in quotes within a file, and print the result of deleting those occurrences from the file? I do not want to actually alter the file, but simply print the result of altering the file like sed does.
For example, say the file contains the following :
data|more data|"not important"|"more unimportant stuff"
I need it to print out:
data|more data||
But I want to leave the file intact. I tried using sed but I could not get it to accept my regexes.
I have tried something like this:
sed -e 's/\<["]+[^"]*["]+\>//g' file.txt
but it does nothing and prints the original file.
Any Thoughts?
Using a perl one-liner:
perl -pe 's/".*?"//g' file
Explanation:
Switches:
-p: Creates a while(<>){...; print} loop for each line in your input file.
-e: Tells perl to execute the code on command line.
You seem to have a few extra characters in your sed command.
sed -e 's/"[^"]*"//g' file.txt
Input:
"quoted text is here" but not quoted there
never more
"hello world" foo bar
data|more data|"not important"|"more unimportant stuff"
Output:
but not quoted there
never more
foo bar
data|more data||
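Since the question also mentions awk, the same negated character class can be used with gsub. A minimal sketch, wrapped in a hypothetical function named strip_quoted; like the sed version it only prints the result and never modifies the file.

```shell
# Hypothetical helper: prints its input with every "quoted" run removed.
strip_quoted() {
  # A quote, then any run of non-quote characters, then the closing quote.
  awk '{ gsub(/"[^"]*"/, ""); print }' "$@"
}
```

It can be called as strip_quoted file.txt, or fed from a pipe.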
echo 'data|more data|"not important"|"more unimportant stuff"' | sed -E 's/"[^"]*"//g'
You don't need to declare a character class (brackets) for only one character...
my $cnt=qq(data|more data|"not important"|"more unimportant stuff");
my @arr = $cnt =~ m{(?:^|\|)([^"][^\|]*[^"])(?=\||$)}ig;
print "@arr";
This code might help you..