Matching blocks with conditions - regex

I am in need of some regexp guru help.
I am trying to make a small config system for a home project, but it seems that I need a bit more regexp code than my regexp skills can come up with.
I need to be able to extract some info inside blocks based on conditions and actions. For example:
action1 [condition1 condition2 !condition3] {
Line 1
Line 2
Line 3
}
The conditions are stored in simple variables separated by spaces. I use these variables to create the regexp used to extract the block info from the file. Most of this is working fine, except that I have no idea how to make the "not matching" part, which basically means that a "word" is not available in the condition variable.
VAR1="condition1 condition2"
VAR2="condition1 condition2 condition3"
When matched against the above, it should match VAR1 but not VAR2.
This is what I have so far
PARAMS="con1 con2 con3"
INPUT_PARAMS="[^!]\\?\\<$(echo $PARAMS | sed 's/ /\\>\\|[^!]\\?\\</g')\\>"
sed -n "/^$ACTION[ \t]*\(\[\($INPUT_PARAMS\)*\]\)\?[ \t]*{/,/}$/p" default.cfg | sed '/^[^{]\+{/d' | sed '/}/d'
Not sure how pretty this is, but it does work, except for not-matching.
EDIT:
Okay I will try to elaborate a bit.
Let's say that I have the below text/config file
action1 [con1 con2 con3] {
Line A
Line B
}
action2 [con1 con2 !con3] {
Line C
}
action3 [con1 con2] {
Line D
}
action4 {
Line E
}
and I have the following conditions to match against
ARG1="con1 con2 con3"
ARG2="con1 con2"
ARG3="con1"
ARG4="con1 con4"
# Matching against ARG1 should print Line A, B, D and E
# Matching against ARG2 should print Line C, D and E
# Matching against ARG3 should print Line E
# Matching against ARG4 should print Line E
Below is a Java-like example of action2 using a normal conditional check. It gives a better idea of what I am trying to do:
if (ARG2.contains("con1") && ARG2.contains("con2") && !ARG2.contains("con3")) {
// Print all lines in this block
}

The logic of how you're selecting which records to print lines from isn't clear to me so here's how to create sets of positive and negative conditions using awk:
$ cat tst.awk
BEGIN {
    RS = ""; FS = "\n"
    # create the set of the positive conditions in the "conds" variable.
    n = split(conds,tmp," ")
    for (i=1; i<=n; i++)
        wanted[tmp[i]]
}
{
    # create sets of the positive and negative conditions
    # present in the first line of the current record.
    delete negPresent # use split("",negPresent) in non-gawk
    delete posPresent
    n = split($1,tmp,/[][ {]+/)
    for (i=2; i<n; i++) {
        cond = tmp[i]
        sub(/^!/,"",cond) ? negPresent[cond] : posPresent[cond]
    }
    allPosInWanted = 1
    for (cond in posPresent)
        if ( !(cond in wanted) )
            allPosInWanted = 0
    someNegInWanted = 0
    for (cond in negPresent)
        if (cond in wanted)
            someNegInWanted = 1
    if (allPosInWanted && !someNegInWanted)
        for (i=2; i<NF; i++)
            print $i
}
$ awk -v conds='con1 con2 con3' -f tst.awk file
Line A
Line B
Line D
Line E
$
$ awk -v conds='con1 con2' -f tst.awk file
Line C
Line D
Line E
$
$ awk -v conds='con1' -f tst.awk file
Line E
$
$ awk -v conds='con1 con4' -f tst.awk file
Line E
$
and now you just have to code whatever logic you like in that final block where the printing is being done to compare the conditions in each of the sets.


awk sub with a capturing group into the replacement

I am writing an awk oneliner for this purpose:
file1:
1 apple
2 orange
4 pear
file2:
1/4/2/1
desired output: apple/pear/orange/apple
addendum: Missing numbers are best kept unchanged, e.g. 1/4/2/3 = apple/pear/orange/3, to prevent loss of info.
Methodology:
Build an associative array key[$1] = $2 for file1
capture all characters between the slashes and replace them by matching against the keys of the associative array, e.g. key[4] = pear
Tried:
gawk 'NR==FNR { key[$1] = $2 }; NR>FNR { r = gensub(/(\w+)/, "key[\\1]" , "g"); print r}' file1.txt file2.txt
#gawk because need to use \w+ regex
#gensub used because need to use a capturing group
Unfortunately, results are
1/4/2/1
key[1]/key[4]/key[2]/key[1]
Any suggestions? Thank you.
You may use this awk:
awk -v OFS='/' 'NR==FNR {key[$1] = $2; next}
{for (i=1; i<=NF; ++i) if ($i in key) $i = key[$i]} 1' file1 FS='/' file2
apple/pear/orange/apple
Note that if a number from file2 doesn't exist in the key array, then that field will be made empty.
file1 FS='/' file2 will keep default field separators for file1 but will use / as field separator while reading file2.
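A quick self-contained run (with the question's sample data written to temporary files) confirms this per-file FS behavior:

```shell
# file1 is read with the default whitespace FS; the FS='/' assignment
# placed between the two file names only takes effect for file2.
tmp=$(mktemp -d)
printf '1 apple\n2 orange\n4 pear\n' > "$tmp/file1"
printf '1/4/2/1\n' > "$tmp/file2"
out=$(awk -v OFS='/' 'NR==FNR {key[$1] = $2; next}
  {for (i=1; i<=NF; ++i) if ($i in key) $i = key[$i]} 1' \
  "$tmp/file1" FS='/' "$tmp/file2")
echo "$out"
rm -rf "$tmp"
```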
EDIT: In case a number in file2 has no match in file1 and you want to keep the original value as it is, then try the following:
awk '
FNR==NR{
    arr[$1]=$2
    next
}
{
    val=""
    for(i=1;i<=NF;i++){
        val=(val=="" ? "" : val FS) (($i in arr)?arr[$i]:$i)
    }
    print val
}
' file1 FS="/" file2
With your shown samples please try following.
awk '
FNR==NR{
    arr[$1]=$2
    next
}
{
    val=""
    for(i=1;i<=NF;i++){
        val = (val=="" ? "" : val FS) arr[$i]
    }
    print val
}
' file1 FS="/" file2
Explanation: Reading Input_file1 first and creating array arr indexed by the 1st field with the value of the 2nd field; then setting the field separator to / and traversing each field of file2, saving its value in val and printing it at the end of each line.
As @Sundeep notes in the comments, you can't use a backreference as an array index. You could mix match and gensub (well, I'm using sub below). Not that this would be a recommended method, but just as an example:
$ awk '
NR==FNR {
    k[$1]=$2                                         # hash them
    next
}
{
    while(match($0,/[0-9]+/))                        # keep doing it while it lasts
        sub(/[0-9]+/,k[substr($0,RSTART,RLENGTH)])   # replace here
}1' file1 file2
Output:
apple/pear/orange/apple
And of course, if you have k[1]="word1", you'll end up with a never-ending loop.
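A loop-safe variant (my own sketch, not part of the answer above) consumes the matched portion as it goes, so replacement text is never rescanned:

```shell
tmp=$(mktemp -d)
printf '1 apple\n2 orange\n4 pear\n' > "$tmp/file1"
printf '1/4/2/1\n' > "$tmp/file2"
out=$(awk 'NR==FNR { k[$1]=$2; next }
{
    line = ""
    while (match($0, /[0-9]+/)) {            # leftmost number in the remainder
        line = line substr($0,1,RSTART-1) k[substr($0,RSTART,RLENGTH)]
        $0 = substr($0, RSTART+RLENGTH)      # drop what was just handled
    }
    print line $0
}' "$tmp/file1" "$tmp/file2")
echo "$out"
rm -rf "$tmp"
```

Since each match is cut away from $0 before the next search, a value like k[1]="word1" can no longer cause an endless loop (though unknown numbers are still replaced by the empty string here).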
With perl (assuming key is always found):
$ perl -lane 'if(!$#ARGV){ $h{$F[0]}=$F[1] }
else{ s|[^/]+|$h{$&}|g; print }' f1 f2
apple/pear/orange/apple
if(!$#ARGV) to determine first file (assuming exactly two files passed)
$h{$F[0]}=$F[1] create hash based on first field as key and second field as value
[^/]+ match non / characters
$h{$&} get the value based on matched portion from the hash
If some keys aren't found, leave it as is:
$ cat f2
1/4/2/1/5
$ perl -lane 'if(!$#ARGV){ $h{$F[0]}=$F[1] }
else{ s|[^/]+|exists $h{$&} ? $h{$&} : $&|ge; print }' f1 f2
apple/pear/orange/apple/5
exists $h{$&} checks if the matched portion exists as key.
Another approach using awk without a loop:
awk 'FNR==NR{
    a[$1]=$2
    next
}
$1 in a{
    printf("%s%s", FNR>1 ? RS : "", a[$1])
}
END{
    print ""
}' f1 RS='/' f2
$ cat f1
1 apple
2 orange
4 pear
$ cat f2
1/4/2/1
$ awk 'FNR==NR{a[$1]=$2;next}$1 in a{printf("%s%s",FNR>1?RS:"",a[$1])}END{print ""}' f1 RS='/' f2
apple/pear/orange/apple

How can I output the number of repeats of a pattern in regex?

I would like to output the number of repeats of a pattern with regex. For example, convert "aaad" to "3xad", "bCCCCC" to "b5xC". I want to do this in sed or awk.
I know I can match it by (.)\1+ or even capture it by ((.)\1+). But how can I obtain the times of repeating and insert that value back to string in regex or sed or awk?
Perl to the rescue!
perl -pe 's/((.)\2+)/length($1) . "x$2"/ge'
-p reads the input line by line and prints it after processing
s/// is the substitution similar to sed
/e makes the replacement evaluated as code
e.g.
aaadbCCCCCxx -> 3xadb5xC2xx
In GNU awk:
$ echo aaadbCCCCCxx | awk -F '' '{
    for(i=1;i<=NF;i+=RLENGTH) {
        c=$i
        match(substr($0,i),c"+")
        b=b (RLENGTH>1?RLENGTH "x":"") c
    }
    print b
}'
3xadb5xC2xx
If the regex metacharacters should be read as literal characters, as noted in the comments, one could try to detect and escape them (the solution below is only directional):
$ echo \\\\\\..**aaadbCCCCC++xx |
awk -F '' '{
    for(i=1;i<=NF;i+=RLENGTH) {
        c=$i
        # print i,c                 # for debugging
        if(c~/[*.\\]/)              # if c is a regex metachar (not complete)
            c="\\"c                 # escape it
        match(substr($0,i),c"+")    # find all c:s
        b=b (RLENGTH>1?RLENGTH "x":"") $i  # buffer to b
    }
    print b
}'
3x\2x.2x*3xadb5xC2x+2xx
Just for fun.
With sed it is cumbersome but doable. Note this example relies on GNU sed.
parse.sed
/(.)\1+/ {
  : nextrepetition
  /((.)\2+)/ s//\n\1\n/       # delimit the repetition with new-lines
  h                           # and store the delimited version
  s/^[^\n]*\n|\n[^\n]*$//g    # now remove prefix and suffix
  b charcount                 # count repetitions
  : aftercharcount            # return here after counting
  G                           # append the new-line delimited version
  # Reorganize pattern space to the desired format
  s/^([^\n]+)\n([^\n]*)\n(.)[^\n]+\n/\2\1x\3/
  # Run again if more repetitions exist
  /(.)\1+/b nextrepetition
}
b
# Adapted from the wc -c example in the sed manual
# Ref: https://www.gnu.org/software/sed/manual/sed.html#wc-_002dc
: charcount
s/./a/g
# Do the carry. The t's and b's are not necessary,
# but they do speed up the thing
t a
: a; s/aaaaaaaaaa/b/g; t b; b done
: b; s/bbbbbbbbbb/c/g; t c; b done
: c; s/cccccccccc/d/g; t d; b done
: d; s/dddddddddd/e/g; t e; b done
: e; s/eeeeeeeeee/f/g; t f; b done
: f; s/ffffffffff/g/g; t g; b done
: g; s/gggggggggg/h/g; t h; b done
: h; s/hhhhhhhhhh//g
: done
# On the last line, convert back to decimal
: loop
/a/! s/[b-h]*/&0/
s/aaaaaaaaa/9/
s/aaaaaaaa/8/
s/aaaaaaa/7/
s/aaaaaa/6/
s/aaaaa/5/
s/aaaa/4/
s/aaa/3/
s/aa/2/
s/a/1/
y/bcdefgh/abcdefg/
/[a-h]/ b loop
b aftercharcount
Run it like this:
sed -Ef parse.sed infile
With an infile like this:
aaad
daaadaaa
fsdfjs
bCCCCC
aaadaaa
The output is:
3xad
d3xad3xa
fsdfjs
b5xC
3xad3xa
I was hoping we'd have an MCVE by now, but we don't, so what the heck - here is my best guess at what you're trying to do:
$ cat tst.awk
{
    out = ""
    for (pos=1; pos<=length($0); pos+=reps) {
        char = substr($0,pos,1)
        for (reps=1; char == substr($0,pos+reps,1); reps++);
        out = out (reps > 1 ? reps "x" : "") char
    }
    print out
}
$ awk -f tst.awk file
3xad
d3xad3xa
fsdfjs
b5xC
3xad3xa
The above was run against the sample input that @Thor kindly provided:
$ cat file
aaad
daaadaaa
fsdfjs
bCCCCC
aaadaaa
The above will work for any input characters using any awk in any shell on any UNIX box. If you need to make it case-insensitive just throw a tolower() around each side of the comparison in the innermost for loop. If you need it to work on multi-character strings then you'll have to tell us how to identify where the substrings you're interested in start/end.
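For instance, the case-insensitive variant mentioned above could look like this (a sketch against a made-up mixed-case input):

```shell
# wrap both sides of the run comparison in tolower(); the first
# character of each run decides the case that gets printed
out=$(printf 'aAAdBccCCC\n' | awk '{
    out = ""
    for (pos=1; pos<=length($0); pos+=reps) {
        char = substr($0,pos,1)
        for (reps=1; tolower(char) == tolower(substr($0,pos+reps,1)); reps++);
        out = out (reps > 1 ? reps "x" : "") char
    }
    print out
}')
echo "$out"
```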

copying first string into second line

I have a text file in this format:
abacası Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
abacı Abaç[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375 Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
abacılarla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375 aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375 abacı[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625
Here I call the first string before the first space the word (for example abacası)
The string which starts after the first space and ends with a number is a definition (for example Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875)
I want to do this: If a line includes more than one definition (first line has one, second line has two, third line has three), apply newline and put the first string (word) into the beginning of the new line. Expected output:
abacası Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
abacı Abaç[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375
abacı Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
abacılarla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375
abacılarla aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375
abacılarla abacı[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625
I have almost 1,500,000 lines in my text file and the number of definitions per line is not fixed. It can be 1 to 5.
A small Python script does the job. Input is expected in input.txt, output goes to output.txt.
import re

rf = re.compile('([^\s]+\s).+')
r = re.compile('([^\s]+\s\:\s\d+\.\d+)')

with open("input.txt", "r") as f:
    text = f.read()

with open("output.txt", "w") as f:
    for l in text.split('\n'):
        offset = 0
        first = ""
        match = re.search(rf, l[offset:])
        if match:
            first = match.group(1)
            offset = len(first)
        while True:
            match = re.search(r, l[offset:])
            if not match:
                break
            s = match.group(1)
            offset += len(s)
            f.write(first + " " + s + "\n")
I am assuming the following format:
word definitionkey : definitionvalue [definitionkey : definitionvalue …]
None of those elements may contain a space and they are always delimited by a single space.
The following code should work:
awk '{ for (i=2; i<=NF; i+=3) print $1, $i, $(i+1), $(i+2) }' file
Explanation (this is the same code but with comments and more spaces):
awk '
# match any line
{
# iterate over each "key : value"
for (i=2; i<=NF; i+=3)
print $1, $i, $(i+1), $(i+2) # prints each "word key : value"
}
' file
awk has some tricks that you may not be familiar with. It works on a line-by-line basis. Each stanza has an optional conditional before it (awk 'NF >=4 {…}' would make sense here since we'll have an error given fewer than four fields). NF is the number of fields and a dollar sign ($) indicates we want the value of the given field, so $1 is the value of the first field, $NF is the value of the last field, and $(i+1) is the value of the third field (assuming i=2). print will default to using spaces between its arguments and adds a line break at the end (otherwise, we'd need printf "%s %s %s %s\n", $1, $i, $(i+1), $(i+2), which is a bit harder to read).
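A tiny illustration of those field variables, using a made-up line in the same word key : value shape:

```shell
# NF is 7 here, so the loop runs for i=2 and i=5,
# printing $1 followed by each "key : value" triple
out=$(echo 'w k1 : 1.5 k2 : 2.5' \
  | awk '{ for (i=2; i<=NF; i+=3) print $1, $i, $(i+1), $(i+2) }')
echo "$out"
```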
With perl:
perl -a -F'[^]:]\K\h' -ne 'chomp(@F);$p=shift(@F);print "$p ",shift(@F),"\n" while(@F);' yourfile.txt
With bash:
while read -r line
do
pre=${line%% *}
echo "$line" | sed 's/\([0-9]\) /\1\n'$pre' /g'
done < "yourfile.txt"
This script reads the file line by line. For each line, the prefix is extracted with a parameter expansion (everything up to the first space) and spaces preceded by a digit are replaced with a newline and the prefix, using sed.
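The parameter expansion can be seen in isolation (the line content here is made up):

```shell
line='word def1 : 1.0 def2 : 2.0'
pre=${line%% *}   # %% deletes the longest suffix starting at the first space
echo "$pre"
```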
edit: as tripleee suggested, it's much faster to do it all with sed:
sed -i.bak ':a;s/^\(\([^ ]*\).*[0-9]\) /\1\n\2 /;ta' yourfile.txt
Assuming there are always 4 space-separated words for each definition:
awk '{for (i=1; i<NF; i+=4) print $i, $(i+1), $(i+2), $(i+3)}' file
Or if the split should occur after that floating point number
perl -pe 's/\b\d+\.\d+\K\s+(?=\S)/\n/g' file
(This is the perl equivalent of Avinash's answer)
Bash and grep:
#!/bin/bash
while IFS=' ' read -r in1 in2 in3 in4; do
if [[ -n $in4 ]]; then
prepend="$in1"
echo "$in1 $in2 $in3 $in4"
else
echo "$prepend $in1 $in2 $in3"
fi
done < <(grep -o '[[:alnum:]][^:]\+ : [[:digit:].]\+' "$1")
The output of grep -o is putting all definitions on a separate line, but definitions originating from the same line are missing the "word" at the beginning:
abacası Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
abacı Abaç[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375
Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
abacılarla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375
aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375
abacı[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625
The for loop now loops over this, using a space as the input file separator. If in4 is a zero length string, we're on a line where the "word" is missing, so we prepend it.
The script takes the input file name as its argument, and saving output to an output file can be done with simple redirection:
./script inputfile > outputfile
Using perl:
$ perl -nE 'm/([^ ]*) (.*)/; my $word=$1; $_=$2; say $word . " " . $_ for / *(.*?[0-9]+\.[0-9]+)/g;' < input.log
Output:
abacası Abaca[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 20.1748046875
abacı Abaç[Noun]+[Prop]+[A3sg]+SH[P3sg]+[Nom] : 16.3037109375
abacı Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+[A3sg]+[Pnon]+[Nom] : 23.0185546875
abacılarla Aba[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 27.8974609375
abacılarla aba[Noun]+[A3sg]+[Pnon]+[Nom]-CH[Noun+Agt]+lAr[A3pl]+[Pnon]+YlA[Ins] : 23.3427734375
abacılarla abacı[Noun]+lAr[A3pl]+[Pnon]+YlA[Ins] : 19.556640625
Explanation:
Split the line to separate first field as word.
Then split the remaining line using the regex .*?[0-9]+\.[0-9]+.
Print word concatenated with every match of above regex.
I would approach this with one of the excellent Awk answers here; but I'm posting a Python solution to point out some oddities and problems with the currently accepted answer:
It reads the entire input file into memory before processing it. This is harmless for small inputs, but the OP mentions that the real-world input is kind of big.
It needlessly uses re when simple whitespace tokenization appears to be sufficient.
I would also prefer a tool which prints to standard output, so that I can redirect it where I want it from the shell; but to keep this compatible with the earlier solution, this hard-codes output.txt as the destination file.
with open('input.txt', 'r') as infile:
    with open('output.txt', 'w') as output:
        for line in infile:
            tokens = line.rstrip().split()
            word = tokens[0]
            for idx in range(1, len(tokens), 3):
                print(word, ' '.join(tokens[idx:idx+3]), file=output)
If you really, really wanted to do this in pure Bash, I suppose you could:
while read -r word analyses; do
    set -- $analyses
    while [ $# -gt 0 ]; do
        printf "%s %s %s %s\n" "$word" "$1" "$2" "$3"
        shift; shift; shift
    done
done <input.txt >output.txt
Please find the following bash code
#!/bin/bash
# read.sh
while read variable
do
    for i in "$variable"
    do
        var=`echo "$i" | wc -w`
        array_1=( $i )
        counter=0
        for (( j=1; j < $var; j++ ))
        do
            if [ $counter = 0 ] #1
            then
                echo -ne ${array_1[0]}' '
            fi #1
            echo -ne ${array_1[$j]}' '
            counter=$(expr $counter + 1)
            if [ $counter = 3 ] #2
            then
                counter=0
                echo
            fi #2
        done
    done
done
I have tested and it is working.
To test
On bash shell prompt give the following command
$ ./read.sh < input.txt > output.txt
where read.sh is script , input.txt is input file and output.txt is where output is generated
Here is sed in action:
sed -r '/^indirger(ken|di)/{s/([0-9]+[.][0-9]+ )(indirge)/\1\n\2/g}' my_file
output
indirgerdi indirge[Verb]+[Pos]+Hr[Aor]+[A3sg]+YDH[Past] : 22.2626953125
indirge[Verb]+[Pos]+Hr[Aor]+YDH[Past]+[A3sg] : 18.720703125
indirgerken indirge[Verb]+[Pos]+Hr[Aor]+[A3sg]-Yken[Adv+While] : 19.6201171875

AWK script to check first line of a file and then print the rest

I am trying to write an AWK script to parse a file of the form
> field1 - field2 field3 ...
lineoftext
anotherlineoftext
anotherlineoftext
and I am checking using regex if the first line is correct (begins with a > and then has something after it) and then print all the other lines. This is the script I wrote but it only verifies that the file is in a correct format and then doesn't print anything.
#!/bin/bash
# FASTA parser
awk ' BEGIN { x = 0; }
{ if ($1 !~ />.*/ && x == 0)
{ print "Not a FASTA file"; exit; }
else { x = 1; next; }
print $0 }
END { print " - DONE - "; }'
Basically you can use the following awk command:
awk 'NR==1 && /^>./ {p=1} p' file
On the first row (NR==1) it checks whether the line starts with a > followed by "something" (/^>./). If that condition is true, the variable p will be set to one. The trailing p checks whether p evaluates to true and prints the line in that case.
If you want to print the error message, you need to revert the logic a bit:
awk 'NR==1 && !/^>./ {print "Not a FASTA file"; exit 1} 1' file
In this case the program prints the error message and exits if the first line does not start with a >. Otherwise all lines get printed because 1 always evaluates to true.
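Both commands can be exercised on tiny made-up files:

```shell
tmp=$(mktemp -d)
printf '>seq1 - a b\nACGT\nTTAA\n' > "$tmp/fasta"
printf 'plain text\nACGT\n' > "$tmp/notfasta"
# valid header: everything is printed, header included
good=$(awk 'NR==1 && /^>./ {p=1} p' "$tmp/fasta")
# invalid header: only the error message comes out
bad=$(awk 'NR==1 && !/^>./ {print "Not a FASTA file"; exit 1} 1' "$tmp/notfasta")
echo "$good"
echo "$bad"
rm -rf "$tmp"
```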
For this OP literally
awk 'NR==1{p=$0~/^>/}p' YourFile
# shorter version with info of #EdMorton
awk 'NR==1{p=/^>/}p' YourFile
for line after > (including)
awk '!p{p=$0~/^>/}p' YourFile
# shorter version with info of #EdMorton
awk '!p{p=/^>/}p' YourFile
Since all you care about is the first line, you can just check that, then exit.
awk 'NR > 1 { exit (0) }
! /^>/ { print "Not a FASTA file" >"/dev/stderr"; exit (1) }' file
As noted in comments, the >"/dev/stderr" is a nonportable hack which may not work for you. Regard it as a placeholder for something slightly more sophisticated if you want a tool which behaves as one would expect from a standard Unix tool (run silently if no problems; report problems to standard error).
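As a sketch of such a slightly more sophisticated replacement (my suggestion, not part of the answer above): have awk pipe the message to a shell command that writes to file descriptor 2, which avoids depending on a /dev/stderr device:

```shell
# The 2>&1 at the end is only so the message can be captured and shown here;
# normally it would go straight to the terminal's standard error.
msg=$(printf 'plain text\n' \
  | awk 'NR > 1 { exit 0 }
         ! /^>/ { print "Not a FASTA file" | "cat 1>&2"; exit 1 }' 2>&1)
echo "$msg"
```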

Bash script to print pattern 1, search and print all lines from pattern 2 through Pattern 3, and print pattern 4

Please Help - I'm very rusty with my sed/awk/grep and I'm attempting to process a file (an export of a PDF that is around 4700 pages long).
Here is what I'm attempting to do: search/print line matching pattern 1, search for line matching pattern 2 and print that line and all lines from it until pattern 3 (if it includes/prints the line with pattern 3, I'm ok with it at this point), and search/print lines matching pattern 4.
All of the above patterns should occur in order (pattern 1,2,3,4) several hundred times in the file and I need to keep them in order.
Pattern 1: lines beginning with 1-5 and a whitespace (this is specific enough despite it seeming vague)
Pattern 2: lines beginning with (all caps) SOLUTION:
Pattern 3: lines beginning with (all caps) COMPLIANCE:
Pattern 4: lines beginning with an IP Addresses
Here's what I've cobbled together, but it's clearly not working:
#!/bin/bash
#
sed '
/^[1-5]\s/p {
/^SOLUTION/,/^COMPLIANCE/p {
/^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/p }
}' sample.txt
To use p in sed you need to use -n as well, and also add -r for extended regex.
Here is how it should look:
sed -r -n '{
/^[1-5] /p
/^SOLUTION/,/^COMPLIANCE/p
/^([0-9]{1,3}[\.]){3}[0-9]{1,3}/p
}' sample.txt
You probably want something like this, untested since you didn't provide any sample input or expected output:
awk '
BEGIN { state = 0 }
/^[1-5] / { if (state ~ /[01]/) { block = $0; state = 1 } }
/^SOLUTION/ { state = (state ~ /[12]/ ? 2 : 0) }
state == 2 { block = block ORS $0 }
/^COMPLIANCE/ { state = (state == 2 ? 3 : state) }
/^([0-9]{1,3}\.){3}[0-9]{1,3}/ { if (state == 3) { print block ORS $0; state = 0 } }
' file