Regex with sed, search across multiple lines

I'd like to concatenate a few lines, perform a regex match on them and print them.
I tried to do that with sed.
Namely, I used:
cat add | sed -rn '/FIRST_LINE_REGEX/,/LAST_LINE_REGEX/s/SOME_REGEX/&/p'
It prints only the lines that match SOME_REGEX, while I expect it to concatenate the lines in the range between FIRST_LINE and LAST_LINE and print the concatenation if it matches SOME_REGEX.

When using '/FIRST_LINE_REGEX/,/LAST_LINE_REGEX/', each line is still processed separately. To concatenate lines you need to use the hold space or the N command, which appends the next line to the pattern space. Here is one option:
cat add | sed -rn '/FIRST_LINE_REGEX/{:a;N;/LAST_LINE_REGEX/{/SOME_REGEX/p;d};ba}'
Commented version:
cat add | sed -rn '/FIRST_LINE_REGEX/ {  # if the line matches /FIRST_LINE_REGEX/
  :a                                     # define label a
  N                                      # append the next line to the pattern space
  /LAST_LINE_REGEX/ {                    # if the pattern space now contains /LAST_LINE_REGEX/
    /SOME_REGEX/p                        # print the whole block if it matches /SOME_REGEX/
    d                                    # delete the pattern space and start the next cycle
  }
  ba                                     # otherwise branch back to label a
}'
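For example, with hypothetical BEGIN/END markers and an ERROR pattern (placeholders of my own, none of them from the question), only the block whose concatenation contains ERROR should be printed:
$ cat add       # hypothetical sample input
BEGIN
foo
ERROR bar
END
BEGIN
baz
END
$ sed -rn '/BEGIN/{:a;N;/END/{/ERROR/p;d};ba}' add
BEGIN
foo
ERROR bar
END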

sed -n '/FIRST_LINE_REGEX/,/LAST_LINE_REGEX/p' add | sed -n '/FIRST_LINE_REGEX/ b check; H; $ b check; b; :check; x; /SOME_REGEX/p'
The idea for the second part of the pipeline comes from here: https://stackoverflow.com/a/6287105/992834
Edit: amended to handle the case where SOME_REGEX appears on one of the lines in between.


How to change a new line not started by a (") character to another string

I need to change each newline that is not followed by a " (quote) character into another printable word, like \n or <br>.
I tried this, but it does not work:
cat file.csv | sed 's/^[^\"]/\<br\>/g'
An example of an input file:
cat file.csv
"a","bcde","fgh
ijk
mnopq
asd"
The output I need:
cat file.csv
"a","bcde","fgh<br>ijk<br> mnopq<br>asd"
I don't think targeting a newline that isn't followed by a double quote is a reliable way to do what you want. For instance, it doesn't handle cases like this one:
"abc","def
"
A more reliable approach is to check whether a line contains an odd number of double quotes and, if so, to append the following lines until the count becomes even; then you can perform the replacement:
sed -E '/^("[^"]*"[^"]*)*"[^"]*$/{:a;N;/^("[^"]*"[^"]*)*$/{s/\n/<br>/g;bb};ba;};:b;' file
-E switches the regex syntax to ERE (Extended Regular Expression)
-i changes the file content in-place (add this switch once you are sure of the result)
command details:
/^("[^"]*"[^"]*)*"[^"]*$/ { # check if the line has an odd number of quotes; when the match succeeds:
  :a;                       # define a label "a"
  N;                        # append the next line to the pattern space
  /^("[^"]*"[^"]*)*$/ {     # check if the pattern space contains an even number of quotes; in this case:
    s/\n/<br>/g;            # proceed to the replacement
    bb;                     # go to label "b"
  };
  ba;                       # go to label "a"
};
:b;                         # define the label "b"
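For reference, running the command (GNU sed assumed, using the question's file name file.csv) against the sample above should produce the expected single line:
$ sed -E '/^("[^"]*"[^"]*)*"[^"]*$/{:a;N;/^("[^"]*"[^"]*)*$/{s/\n/<br>/g;bb};ba;};:b;' file.csv
"a","bcde","fgh<br>ijk<br>mnopq<br>asd"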
You can use conditional branching in sed:
sed -i -E ':a;N;s~\n([^"])~<br>\1~;ba' file.csv
# check results
cat file.csv
"a","bcde","fgh<br>ijk<br> mnopq<br>asd"

Bash one liner - Get line group from file if matched by a string

I have a log file like:
A
some lines
some lines
Z
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
more lines
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
I only need the lines between A and Z if the block contains the word IMPORTANT. So the desired output is:
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
The line count between A and Z is variable. I tried many commands, like:
grep 'IMPORTANT' -A 3 -B 3 x.log | sed -n '/^A$/,/^Z$/p'
grep 'IMPORTANT' -A 3 -B 3 x.log | grep -E '^Z$' -B 5 | grep -E '^A$' -A 5
Some printed unneeded lines from another group, others printed lines without the starting or ending markers... All of them failed.
Is there any way to do this with a one-liner?
Using gnu-awk you can do:
awk 'BEGIN{RS=ORS="\nZ\n"} /^A/ && /IMPORTANT/' file
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
BEGIN{RS=ORS="\nZ\n"} sets the input and output record separators to Z with newlines on either side.
/^A/ && /IMPORTANT/ ensures that each record starts with A and has IMPORTANT in it.
Each matching record is printed, since that is the default action in awk.
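If gawk's multi-character RS is not available, a rough sketch of the same idea using a flag and a buffer (my own illustration, not part of the original answer) could look like:
awk '
    /^A$/ { buf = ""; inblock = 1 }      # an "A" line starts a new block
    inblock { buf = buf $0 ORS }         # accumulate the block, including the A and Z lines
    /^Z$/ && inblock {                   # a "Z" line ends the block
        if (buf ~ /IMPORTANT/) printf "%s", buf
        inblock = 0
    }
' x.log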
With sed:
sed -n '/^A$/{:a;N;/\nZ$/!ba;/IMPORTANT/p}' x.log
Explained:
/^A$/ { # If line matches ^A$...
:a # Label to branch to
N # Append next line to pattern space
/\nZ$/!ba # Branch to :a if pattern space doesn't end with \nZ
/IMPORTANT/p # Print if pattern space contains IMPORTANT
}
This basically appends lines until we have a complete block in the pattern space, then prints it if it matches IMPORTANT and just discards it otherwise.
The -n option suppresses the automatic printing of the pattern space at the end of each cycle.
Some seds don't support one-liners with command grouping ({} with ;) or inline comments. For some seds, having p; instead of p works, and for others, this (basically the above minus comments) should work (and is POSIX compliant):
sed -n '/^A$/{
:a
N
/\nZ$/!ba
/IMPORTANT/p
}' x.log

Remove matching and previous line

I need to remove a line containing "not a dynamic executable" and the previous line from a stream using grep, awk, sed or something else. My current working solution would be to tr the entire stream to strip off newlines, then replace the newline preceding my match with something else using sed, then use tr to add the newlines back in, and then use grep -v. I'm somewhat wary of artifacts with this approach, but I don't see how else I can do it at the moment:
tr '\n' '|' | sed 's/|\tnot a dynamic executable/__MY_REMOVE/g' | tr '|' '\n'
EDIT:
Input is a list of mixed files piped to xargs ldd; basically I want to ignore all output about non-library files since that has nothing to do with what I'm doing next. I didn't want to use a lib*.so mask since that could conceivably be different.
Most simply with pcregrep in multi-line mode:
pcregrep -vM '\n\tnot a dynamic executable' filename
If pcregrep is not available to you, then awk or sed can also do this by reading one line ahead and skipping the printing of previous lines when a marker line appears.
You could be boring (and sensible) with awk:
awk '/^\tnot a dynamic executable/ { flag = 1; next } !flag && NR > 1 { print lastline; } { flag = 0; lastline = $0 } END { if(!flag) print }' filename
That is:
/^\tnot a dynamic executable/ {   # in lines that start with the marker
    flag = 1                      # set a flag
    next                          # and do nothing (do not print the saved previous line)
}
!flag && NR > 1 {                 # if the previous line was not flagged and
                                  # this is not the first line
    print lastline                # print it
}
{                                 # and if you got this far,
    flag = 0                      # unset the flag
    lastline = $0                 # and remember the line to be possibly
                                  # printed.
}
END {                             # in the end
    if (!flag) print              # print the last line if it was not flagged
}
But sed is fun:
sed ':a; $! { N; /\n\tnot a dynamic executable/ d; P; s/.*\n//; ba }' filename
Explanation:
:a                                    # jump label
$! {                                  # unless we reached the end of the input:
    N                                 # fetch the next line, append it
    /\n\tnot a dynamic executable/ d  # if the result contains a newline followed
                                      # by "\tnot a dynamic executable", discard
                                      # the pattern space and start at the top
                                      # with the next line. This effectively
                                      # removes the matching line and the one
                                      # before it from the output.
                                      # Otherwise:
    P                                 # print the pattern space up to the newline
    s/.*\n//                          # remove the stuff we just printed from
                                      # the pattern space, so that only the
                                      # second line is in it
    ba                                # and go to a
}
                                      # and at the end, drop off here to print
                                      # the last line (unless it was discarded).
Or, if the file is small enough to be completely stored in memory:
sed ':a $!{N;ba}; s/[^\n]*\n\tnot a dynamic executable[^\n]*\n//g' filename
Where
:a $!{ N; ba }                                   # read the whole file into the pattern space
s/[^\n]*\n\tnot a dynamic executable[^\n]*\n//g  # and cut out the offending bit.
This might work for you (GNU sed):
sed 'N;/\n.*not a dynamic executable/d;P;D' file
This keeps a moving window of two lines and deletes them both if the desired string is found in the second. If not, the first line is printed and then deleted, the next line is appended, and the process repeats.
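For example, on hypothetical ldd-style input (file and library names invented; the indented lines start with a tab, as ldd prints them), the result should look like this:
$ cat sample.txt
/usr/bin/foo:
	not a dynamic executable
/usr/lib/libbar.so:
	linux-vdso.so.1 (0x00007ffc12345000)
$ sed 'N;/\n.*not a dynamic executable/d;P;D' sample.txt
/usr/lib/libbar.so:
	linux-vdso.so.1 (0x00007ffc12345000)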
Always keep in mind that while grep and sed are line-oriented, awk is record-oriented and so can easily handle problems that span multiple lines.
It's a guess given you didn't post any sample input and expected output, but it sounds like all you need is (using GNU awk for multi-char RS):
awk -v RS='^$' -v ORS= '{gsub(/[^\n]+\n\tnot a dynamic executable/,"")}1' file
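The same one-liner spread out with comments (my annotation; GNU awk assumed, as the answer states):
awk -v RS='^$' -v ORS= '   # RS="^$" is a gawk idiom: the whole input becomes one record
    {
        # delete every occurrence of "some line, newline, marker text" in the record
        gsub(/[^\n]+\n\tnot a dynamic executable/, "")
    }
    1                      # print the modified record
' file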

How to exclude patterns in regex conditionally in bash?

This is the content of input.txt:
hello=123
1234
stack=(23(4))
12341234
overflow=345
=
friends=(987)
Then I'm trying to match all the lines containing an equals sign, removing the outer parentheses (if the line has them).
To be clear, this is the result I'm looking for:
hello=123
stack=23(4)
overflow=345
friends=987
I thought of something like this:
cat input.txt | grep -Poh '.+=(?=\()?.+(?=\))?'
But it returns nothing. What am I doing wrong? Do you have any idea how to do this?
Using awk:
awk 'BEGIN{FS=OFS="="} NF==2 && $1!=""{gsub(/^\(|\)$/, "", $2); print}' file
hello=123
stack=23(4)
overflow=345
friends=987
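The same awk program spread out, with comments (my annotation):
awk 'BEGIN { FS = OFS = "=" }        # split (and rejoin) fields on "="
     NF == 2 && $1 != "" {           # exactly one "=" and a non-empty key
         gsub(/^\(|\)$/, "", $2)     # drop a leading "(" and a trailing ")" from the value
         print                       # print the (possibly rebuilt) line
     }' file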
Here is an alternate way with sed:
sed -nr ' # Use n to disable default printing and r for extended regex
/.+=.+/ { # Look for lines with key=value pairs separated by =
  /[(]/!ba; # If the line does not contain a paren, branch out to label a
  s/\((.*)\)/\1/; # Otherwise strip the outermost parentheses, keeping their content
  :a # Our label
  p # print the line
}' file
$ sed -nr '/.+=.+/{/[(]/!ba;s/\((.*)\)/\1/;:a;p}' file
hello=123
stack=23(4)
overflow=345
friends=987

AWK end of line sign in regular expressions

I have a simple awk script named "script.awk" that contains:
/\/some_simple_string/ { print $0;}
I'm using it to parse some file that contains:
(by using: cat file | awk -f script.awk)
14 catcat one_two/some_thing
15 catcat one_three/one_more_some_simple_string
16 dogdog one_two/some_simple_string_again
17 dogdog one_four/some_simple_string
18 qweqwe firefire/ppp
I want the script to print only the line that ends exactly with "/some_simple_string" (end of line), but not the 2nd or 3rd lines of the sample.
Is there any simple way to do it?
I think the most appropriate way is to add an end-of-line sign to the regular expression.
So it will match only lines that contain "/some.." and have the end of the line right after "..string".
Desired output:
17 dogdog one_four/some_simple_string
Sorry for the confusion; I was asking about the end-of-line sign in regular expressions.
The correct answer is:
/\/some_simple_string$/ { print $0;}
You can always use:
/\/some_simple_string$/ { print $0 }
I.e. match not only "some_simple_string" but match "/some_simple_string" followed by the end of the line ($ is end of line in regex)
grep '/some_simple_string$' file | tail -n 1 should do the trick.
Or if you really want to use awk do awk '/\/some_simple_string/{x = $0}END{print x}'
To return just the last of a group of matches, store the line in a variable and print it in the END block.
/some_simple_string/ { x = $0 }
END{ print x }
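Applied to the sample input from the question (assuming it is saved as file), this prints only the final match:
$ awk '/some_simple_string/ { x = $0 } END { print x }' file
17 dogdog one_four/some_simple_string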
To print all the matches that end with the string /some_simple_string using a regular expression, you need to anchor to the end of the line using $. The most suitable tool for this job is grep:
$ grep '/some_simple_string$' file
In awk the command is much the same:
$ awk '/[/]some_simple_string$/' file
To print the line after each matching line you would do:
$ awk 'print_flag{print;print_flag=0} /[/]some_simple_string$/{print_flag=1}' file
Or just combine grep and tail if it makes it clearer, using the context option -A to print the following line:
$ grep -A1 '/some_simple_string$' file | tail -n 1
I sometimes find that the input records can have a trailing carriage return (\r).
Yes, I deal with both Windows and Linux text files.
So I add the following 'pre-processor' to my awk scripts:
1 == 1 {   # preprocess all records
    res = gsub("\r", "")   # remove unwanted trailing char
    if (res > 0 && NR < 100) { print "(removed stuff)" > "/dev/stderr" }   # optional
}
More optimally, let FS do the work instead of making awk perform unnecessary and unrelated field splitting (the [\r]? part is added for Windows/DOS completeness):
mawk '!_<NF' FS='[/]some_simple_string[\r]?$'
17 dogdog one_four/some_simple_string
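If the !_<NF condition looks cryptic: _ is an unset variable, so !_ evaluates to 1 and the pattern is simply NF > 1, i.e. the FS regex matched and split the line. A roughly equivalent, more explicit spelling (my rewording; mawk or gawk and the question's file name file are assumed) would be:
$ awk 'NF > 1' FS='[/]some_simple_string[\r]?$' file
17 dogdog one_four/some_simple_string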