sed: remove strings between two patterns leaving the 2nd pattern intact (half inclusive) - regex

I am trying to filter out text between two patterns. I've seen a dozen examples but haven't managed to get exactly what I want.
Sample input:
START LEAVEMEBE text
data
START DELETEME text
data
more data
even more
START LEAVEMEBE text
data
more data
START DELETEME text
data
more
SOMETHING that doesn't start with START
# sometimes it starts with characters that needs to be escaped...
I want to end up with:
START LEAVEMEBE text
data
START LEAVEMEBE text
data
more data
SOMETHING that doesn't start with START
# sometimes it starts with characters that needs to be escaped...
I tried running sed with:
sed '/^START DELETEME/,/^[^ ]/d'
And got an inclusive removal. I tried adding "exclusions" (not sure if I really understand this syntax well):
sed '/^START DELETEME/,/^[^ ]/{/^[^ ]/!d}'
But my "START DELETEME" line is still there (yes, I can grep it out, but that's ugly :) ). Besides, it also removes the empty line in this sample, and I'd like empty lines to be left intact when they are my end pattern.
I am wondering if there is a way to do it with a single sed command.
I have an awk script that does this well:
BEGIN { flag = 0 }
{
if ($0 ~ "^START DELETEME")
flag=1
else if ($0 !~ "^ ")
flag=0
if (flag != 1)
print $0
}
But as you know "A is for awk which runs like a snail". It takes forever.
Thanks in advance.
Dave.

Using a loop in sed:
sed -n '/^START DELETEME/{:l n; /^[ ]/bl};p' input
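A worked run of the same looping idea may make the control flow easier to follow. This is only a sketch: it assumes the data lines are indented with a space (as the asker's awk test !~ "^ " suggests, even though the sample above shows no indentation), and it writes the loop one command per line so that it does not depend on how a particular sed parses ":l n" or "bl}".

```shell
# Sample input; the leading spaces on data lines are an assumption.
input='START LEAVEMEBE text
 data
START DELETEME text
 data
 more data
START LEAVEMEBE text
 data
START DELETEME text
 more

SOMETHING else'

# On a DELETEME header: keep fetching the next line (n) while it is indented.
# The first non-indented line (even an empty one) leaves the loop and is printed.
result=$(printf '%s\n' "$input" | sed -n '
/^START DELETEME/{
  :l
  n
  /^ /bl
}
p')
printf '%s\n' "$result"
```

One limitation of this sketch: if a DELETEME block is immediately followed by another DELETEME header, the second header ends the loop and is printed, so that second block is not removed.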

GNU sed
sed '/LEAVEMEBE/,/DELETEME/!d;{/DELETEME/d}' file

I would stick with awk:
awk '
/LEAVE|SOMETHING/{flag=1}
/DELETE/{flag=0}
flag' file
But if you still prefer sed, here's another way:
sed -n '
/LEAVE/,/DELETE/{
/DELETE/b
p
}
' file

Related

How to use 'sed' to add dynamic prefix to each number in integer list?

How can I use sed to add a dynamic prefix to each number in an integer list?
For example:
I have a string "A-1,2,3,4,5", I want to transform it to string "A-1,A-2,A-3,A-4,A-5" - which means I want to add prefix of first integer i.e. "A-" to each number of the list.
If I have string like "B-1,20,300" then I want to transform it to string "B-1,B-20,B-300".
I am not able to use RegEx Capturing Groups because for global match they do not retain their value in subsequent matches.
When it comes to looping constructs in sed, I like to use newlines as markers for the places I have yet to process. This makes matching much simpler, and I know they're not in the input because my input is a text line.
For example:
$ echo A-1,2,3,4,5 | sed 's/,/\n/g;:a s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/; ta'
A-1,A-2,A-3,A-4,A-5
This works as follows:
s/,/\n/g # replace all commas with newlines (insert markers)
:a # label for looping
s/^\([^0-9]*\)\([^\n]*\)\n/\1\2,\1/ # replace the next marker with a comma followed
# by the prefix
ta # loop unless there's nothing more to do.
The approach is similar to @potong's, but I find the regex much more readable -- \([^0-9]*\) captures the prefix, \([^\n]*\) captures everything up to the next marker (i.e. everything that's already been processed), and then it's just a matter of reassembling it in the substitution.
Don't use sed, just use the other standard UNIX text manipulation tool, awk:
$ echo 'A-1,2,3,4,5' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
A-1,A-2,A-3,A-4,A-5
$ echo 'B-1,20,300' | awk '{p=substr($0,1,2); gsub(/,/,"&"p)}1'
B-1,B-20,B-300
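The substr($0,1,2) call above takes the prefix to be exactly two characters. If the prefix length can vary, one possible variation (a sketch, not part of the answer above) is to take everything before the first digit instead. The function name add_prefix is hypothetical, and the sketch assumes the prefix contains no "&" or backslash, since those are special in the gsub replacement string.

```shell
# add_prefix is a hypothetical helper name, not part of any answer above.
add_prefix() {
  awk '{
    p = $0
    sub(/[0-9].*/, "", p)   # keep only the text before the first digit
    gsub(/,/, "&" p)        # "&" is the matched comma, so each comma becomes ",<prefix>"
    print
  }'
}

result=$(echo 'B-1,20,300' | add_prefix)
printf '%s\n' "$result"
long=$(echo 'AB-1,2,3' | add_prefix)
printf '%s\n' "$long"
```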
This might work for you (GNU sed):
sed -E ':a;s/^((([^-]+-)[^,]+,)+)([0-9])/\1\3\4/;ta' file
Uses pattern matching and a loop to replace a number following a comma by the first column prefix and that number.
Assuming this is for shell scripting, you can do so with 2 seds:
set string = "A1,2,3,4,5"
set prefix = `echo $string | sed 's/^\([A-Z]\).*/\1/'`
echo $string | sed 's/,\([0-9]\)/,'$prefix'-\1/g'
Output is
A1,A-2,A-3,A-4,A-5
With
set string = "B-1,20,300"
Output is
B-1,B-20,B-300
Could you please try the following (if you are OK with awk).
awk '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i !~ /^A/&&$i !~ /\"A/){
$i="A-"$i
}
}
}
1' Input_file
If your data is in a file named 'd' (tried on GNU sed):
sed -E 'h;s/^(\w-).+/\1/;x;G;:s s/,([0-9]+)(.*\n(.+))/,\3\1\2/;ts; s/\n.+//' d

Sed - replace value in file with regex match in another file

I am trying to code a bash script in a build process where we only have a few tools (like grep, sed, awk) and I am trying to replace a value in an ini file with a value from a regular expression match in another.
So, I am matching something like "^export ADDRESS=VALUE" in file export_vars.h and putting VALUE into an ini file called config.ini in a line with "ADDRESS=[REPLACE]". So, I am trying to replace [REPLACE] with VALUE with one command in bash.
I have come across that sed can take an entire file and insert it into another with a command like
sed -i -e "/[REPLACE]/r export_vars.h" config.ini
I need to somehow refine this command to only read the pattern match from export_vars.h. Does anyone know how to do this?
sed is for simple substitutions on individual lines, that is all. You need to be looking at awk for what you're trying to do. Something like:
awk '
BEGIN { FS=OFS="=" }
NR==FNR {
if ( $1 == "export ADDRESS" ) {
value = $2
}
next
}
{ sub(/\[REPLACE\]/,value); print }
' export_vars.h config.ini
Untested, of course, since you didn't provide testable sample input/output.
Another in awk:
$ awk '/ADDRESS/{if(a!="")$0=a;else a=$NF}NR>FNR' export_vars.h config.ini
ADDRESS=VALUE
Explained:
$ awk '
/ADDRESS/ { # when ADDRESS is found in record
if(a!="") $0=a # if a is set (from first file), use it
else a=$NF } # otherwise set a with the last field
NR>FNR # print all record of the last file
' export_vars.h config.ini # mind the order
This solution does not tolerate space around = since $0 is replaced with $NF from the other file.
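For comparison, the same replacement can be sketched in two sed steps joined by a shell variable. The file contents below are invented for illustration, and the sketch assumes the value contains no "|", "&" or backslash, which would be interpreted by the s command. It also shows why the /[REPLACE]/ address in the question does not do what was intended: unescaped, [REPLACE] is a bracket expression that matches any single one of those letters.

```shell
# File contents below are made up for illustration.
dir=$(mktemp -d)
printf 'export OTHER=x\nexport ADDRESS=10.0.0.1\n' > "$dir/export_vars.h"
printf '[net]\nADDRESS=[REPLACE]\n' > "$dir/config.ini"

# Step 1: print only the text after "export ADDRESS=".
value=$(sed -n 's/^export ADDRESS=//p' "$dir/export_vars.h")

# Step 2: substitute the literal token [REPLACE]; the brackets are escaped
# because an unescaped [REPLACE] is a bracket expression matching one character.
result=$(sed "s|\[REPLACE\]|$value|" "$dir/config.ini")
printf '%s\n' "$result"
rm -r "$dir"
```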

Find and append to Text Between Two Strings or Words using sed or awk

I am looking for a sed in which I can recognize all of the text in between two indicators and then replace it with a place holder.
For instance, the 1st indicator is a list of words
(no|noone|haven't)
and the 2nd indicator is a list of punctuation
(.|,|!)
From an input text such as
"Noone understands the plot. There is no storyline. I haven't
recommended this movie to my friends! Did you understand it?"
The desired result would be:
"Noone understands_AFFIX the_AFFIX plot_AFFIX. There is no storyline_AFFIX. I
haven't recommended_AFFIX this_AFFIX movie_AFFIX to_AFFIX my_AFFIX
friends_AFFIX! Did you understand it?"
I know that there is the following sed:
sed -n '/WORD1/,/WORD2/p' /path/to/file
which recognizes the content between two indicators. I have also found a lot of great information and resources here. However, I still cannot find a way to append the affix to each token of text that occurs between the two indicators.
I have also considered to use awk, such as
awk '{sub(/.*indic1 /,"");sub(/ indic2.*/,"");print;}' < infile
yet still, it does not allow me to append the affix.
Does anyone have a suggestion to do so, either with awk or sed?
Little more compact awk
$ awk 'BEGIN{RS=ORS=" ";s="_AFFIX"}
/[.,!]$/{f=0; $0=gensub(/(.)$/,s"\\1","g")}
f{$0=$0s}
/Noone|no|haven'\''t/{f=1}1' story
Noone understands_AFFIX the_AFFIX plot_AFFIX. There is no storyline_AFFIX. I haven't recommended_AFFIX this_AFFIX movie_AFFIX to_AFFIX my_AFFIX friends_AFFIX! Did you understand it?
Perl to the rescue!
perl -pe 's/(?:no(?:one)?|haven'\''t)\s*\K([^.,!]+)/
join " ", map "${_}_AFFIX", split " ", $1/egi
' infile > outfile
\K matches what's on its left, but excludes it from the replacement. In this case, it verifies the 1st indicator. (\K needs Perl 5.10+.)
/e evaluates the replacement part as code. In this case, the code splits $1 on whitespace, map adds _AFFIX to each of the members, and join joins them back into a string.
Here is one verbose awk command for the same:
s="Noone understands the plot. There is no storyline. I haven't recommended this movie to my friends! Did you understand it?"
awk -v IGNORECASE=1 -v kw="no|noone|haven't" -v pct='\\.|,|!' '{
a=0
for (i=2; i<=NF; i++) {
if ($(i-1) ~ "\\y" kw "\\y")
a=1
if (a && $i ~ pct "$") {
p = substr($i, length($i), 1)
$i = substr($i, 1, length($i)-1)
}
if (a)
$i=$i "_AFFIX" p
if(p) {
p=""
a=0
}
}
} 1' <<< "$s"
Output:
Noone understands_AFFIX the_AFFIX plot_AFFIX. There is no storyline_AFFIX. I haven't recommended_AFFIX this_AFFIX movie_AFFIX to_AFFIX my_AFFIX friends_AFFIX! Did you understand it?
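The answers above rely on gawk features (gensub, IGNORECASE, \y). For an awk without those, the same idea can be sketched with only POSIX functions. This is a simplified variant, not a drop-in replacement: it assumes a trigger word stands alone as a whitespace-separated field (so "no," would not start tagging), and because it assigns to fields, runs of whitespace in the output collapse to single spaces.

```shell
text="Noone understands the plot. There is no storyline. I haven't recommended this movie to my friends! Did you understand it?"

result=$(printf '%s\n' "$text" | awk '
{
  active = 0
  for (i = 1; i <= NF; i++) {
    w = $i
    if (active) {
      if (w ~ /[.,!]$/) {
        # Closing punctuation: put the affix before it and stop tagging.
        $i = substr(w, 1, length(w) - 1) "_AFFIX" substr(w, length(w))
        active = 0
      } else {
        $i = w "_AFFIX"
      }
    } else if (tolower(w) ~ /^(no|noone|haven'\''t)$/) {
      active = 1    # start tagging from the next word
    }
  }
  print
}')
printf '%s\n' "$result"
```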

how to replace the next string after match (every) two blank lines?

is there a way to do this kind of substitution in Awk, sed, ...?
I have a text file with sections separated by two blank lines:
section1_name_x
dklfjsdklfjsldfjsl
section2_name_x
dlskfjsdklfjsldkjflkj
section_name_X
dfsdjfksdfsdf
I would like to replace every "section_name_x" with "#section_name_x". That is, how do I replace the first string after every run of two blank lines?
Thanks,
Steve,
awk '
(NR==1 || blank==2) && $1 ~ /^section/ {sub(/section/, "#&")}
{
print
if (length)
blank = 0
else
blank ++
}
' file
#section1_name_x
dklfjsdklfjsldfjsl
#section2_name_x
dlskfjsdklfjsldkjflkj
#section_name_X
dfsdjfksdfsdf
hm....
Given your example data why not just
sed 's/^section[0-9]*_name.*/#&/' file > newFile && mv newFile file
some seds support sed -i OR sed -i"" to overwrite the existing file, avoiding the && mv ... shown above.
The regex says that section must be at the beginning of the line and may optionally be followed by digits. In the replacement, & stands for the whole matched text, so the line is kept and # is put in front of it.
IHTH
In gawk you can use the RT builtin variable:
gawk '{$1="#"$1; print $0 RT}' RS='\n\n' file
* Update *
Thanks to @EdMorton I realized that my first version was incorrect.
What happens:
Assigning to $1 causes the record to be rebuilt, which is not good in this case, since any sequence of white space between fields is replaced by a single space, and white space at the beginning and end of the record is removed.
Using print adds an additional newline to the output.
The correct version:
gawk '{printf "%s", "#" $0 RT}' RS='\n\n\n' file
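Without gawk's RT, the blank-line counting approach from the first answer also works in any POSIX awk. The run below is a sketch on made-up input that contains the blank lines explicitly. Unlike the first answer it does not require the line to start with "section", and it treats only exactly two blank lines as a separator.

```shell
# Made-up input: sections separated by exactly two blank lines,
# plus one line that follows only a single blank line.
result=$(printf 'section1_name_x\nbody one\n\n\nsection2_name_x\nbody two\n\nnot a section start\n' | awk '
  # Prefix a non-empty line if it is the first line or follows two blank lines.
  (NR == 1 || blank == 2) && length($0) { $0 = "#" $0 }
  { print; blank = (length($0) ? 0 : blank + 1) }')
printf '%s\n' "$result"
```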

AWK to match strings beginning with a number

I want to print all the lines of a file where the first element of each line begins with a number using awk. Below are the details on the data contained in the file and command used:
filename contents:
12.44.4444goad ABCDEF/END
LMNOP/START joker
98.0 kites
command used:
awk '{ $1 ~ /^\d[a-zA-Z0-9]*/ }' filename
After running the above command, no results are displayed on the prompt.
Please let me know if there is any correction that needs to be made to the above command.
To print the lines starting with a digit, you can try the following:
awk '/^[[:digit:]]+/' file
As pointed out by @HenkLangeveld, your syntax is incorrect. Also, the regex \d is not available in awk.
If you only need to match at least one digit at the start of the line, all you need is ^ to match the start of a line and [0-9] to match a digit.
You can use curly brackets with an if statement:
awk '{if($1 ~ /^[0-9]/) print $0}' filename
But that would just be longhand for this:
awk '$1 ~ /^[0-9]/' filename
From your attempted solution, it looks like you want:
awk 'NF>1 && $1 ~ /^[0-9.]*$/' filename
You need to explicitly match the . if you want to include the decimal point, and you need the $ anchor to make the * meaningful. This will miss lines in which the first column looks like 5e39 or -2.3. You can try to catch those cases with:
awk 'NF>1 && $1 ~ /^-?[0-9.]*(e[0-9]*)?$/' filename
but at this point I would tell you to use perl and stop trying to be more robust with awk.
Perhaps (this will print blank lines...not sure which behavior you want):
perl -lane 'use POSIX qw(strtod); my ($num, $end) = strtod($F[0]);
print unless $end;' filename
This uses strtod to parse the number and tells you the number of characters at the end of the string that are not part of it.
Drop the braces and the \d, like this:
awk ' $1 ~ /^[0-9]/ ' filename
Awk programs come in chunks. A chunk is a pattern-block pair, where the block defaults to { print }. (An empty pattern defaults to true.)
The /\d/ is a Perl-ism and might work in some versions of awk, but not in those that I tried (GNU and AST). You need either the traditional /^[0-9]/ or the POSIX /^[[:digit:]]/ notation.
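As a quick check of the pattern-only form, here is a small run on the three sample lines from the question. It is just an illustration of the default { print } action; the variable names are arbitrary.

```shell
# The three sample lines from the question.
sample='12.44.4444goad ABCDEF/END
LMNOP/START joker
98.0 kites'

# Pattern only: the default action { print } prints matching lines.
result=$(printf '%s\n' "$sample" | awk '$1 ~ /^[0-9]/')

# Same test with the action written out.
explicit=$(printf '%s\n' "$sample" | awk '$1 ~ /^[0-9]/ { print }')
printf '%s\n' "$result"
```

Both forms select the first and third lines, since only those have a first field that begins with a digit.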