grep: how to find ALL the lines between two expressions - regex

We have a HUGE file of numbers, and we want to get ALL the lines between two expressions, e.g.,
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
.
.
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
and the expressions are -9998.01 and -8000.0, so I tried:
$ grep -A100000 '[0-9] -9998.[0-9]' mf.in | grep -B100000 '[0-9] -8000.[0-9]' > mfile.out
This works: we get ALL the lines in between. Of course, 100000 was chosen to be big enough to keep ALL of them, but what if we are wrong, i.e., if there are more than 100000 lines in between? How can we take ALL the lines in between without giving a number after -A and -B?
PS: I was unable to use sed with similar "[ ...]" expressions.
PS2: the real columns have more digits (only 4 columns are shown here):
-1931076.0 -9998.96235 1.0002741998076021 0.0191476198569163
-1931075.0 -9998.95962 1.0000742544770280 0.0192495084654059
-1931074.0 -9998.95688 0.9998778097258081 0.0193725608470694

With awk:
awk '$2 ~ /^-9998\.01$/{p=1} p{print} $2 ~ /^-8000\.0$/{p=0}' file
Test:
$ cat file
232445 -9998.00 xxxxxxxxxx
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
234566 -9998.03 xxxxxxxxx
234566 -9998.05 xxxxxxxxx
....
....
324444 -8000.011 xxxxxxx
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
344444 -8000.1 xxxx
$ awk '$2 ~ /^-9998\.01$/{p=1} p{print} $2 ~ /^-8000\.0$/{p=0}' file
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
234566 -9998.03 xxxxxxxxx
234566 -9998.05 xxxxxxxxx
....
....
324444 -8000.011 xxxxxxx
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
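For comparison, the same flag-based idea can be sketched in Python. This is only an illustration and not part of the answer above: the function name lines_between and the choice to compare the second whitespace-separated field as an exact string are assumptions.

```python
def lines_between(lines, start, end):
    """Yield lines from the first one whose 2nd field equals `start`
    through the next one whose 2nd field equals `end` (both inclusive)."""
    printing = False
    for line in lines:
        fields = line.split()
        second = fields[1] if len(fields) > 1 else None
        if second == start:
            printing = True   # start marker seen: begin printing
        if printing:
            yield line
        if second == end:
            printing = False  # end marker seen: stop after this line


sample = [
    "232445 -9998.00 xxxxxxxxxx",
    "232445 -9998.01 xxxxxxxxxx",
    "234566 -9998.02 xxxxxxxxx",
    "324444 -8000.012 xxxxxxx",
    "344444 -8000.0 xxxx",
    "344444 -8000.1 xxxx",
]
selected = list(lines_between(sample, "-9998.01", "-8000.0"))
```

Because the function only iterates, it should also work on an open file object, so the whole file never has to be held in memory.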

sed already has this functionality built in, via a range address:
/regex1/,/regex2/ p
The p command prints every line from the first line matching regex1 through the next line matching regex2 (both inclusive in the output).
Here is an example using your file format:
$ cat file
124235 -69768.77 xxx
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
12345 -124.66 xxxx
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
344444 -7000.0 xxxx
$ sed -nr '/^[0-9]+\s-9998\.01\s/,/^[0-9]+\s-8000\.0\s/ p' file
232445 -9998.01 xxxxxxxxxx
234566 -9998.02 xxxxxxxxx
12345 -124.66 xxxx
324444 -8000.012 xxxxxxx
344444 -8000.0 xxxx
$

It might not be the best answer, but the easy fix for your command would be to pass the file's number of lines as the argument to -A and -B, so you're sure you cannot miss any lines:
NB_LINES=$(wc -l < mf.in)
grep -A"$NB_LINES" '[0-9] -9998.[0-9]' mf.in | grep -B"$NB_LINES" '[0-9] -8000.[0-9]' > mfile.out
Though, to be honest, in pure shell it's very likely I'd do something similar. Or I'd write a small Python script that would look like:
import re

# Second column: a negative decimal number such as -9998.01.
LINE_RE = re.compile(r'[^ ]+ (-[0-9]+\.[0-9]+) .*')

# Assumes the second column is sorted in ascending order, as in the sample.
with open('mf.in', 'r') as fin, open('mf.out', 'w') as fout:
    for line in fin:
        match = LINE_RE.match(line)
        if match:
            value = float(match.group(1))
            if value > -8000.0:
                break  # past the end of the range, stop reading
            if value >= -9998.01:
                fout.write(line)
N.B.: this script is only meant to show the algorithmic idea; it is untested, so it might need some tweaking to actually work.
HTH

Related

How to replace a sub-string in a larger-string that matches to an egrep?

How can I replace /32 with /128 for the IPv6 addresses only in the text below?
52.222.128.45/32
172.22.187.101/32
52.222.128.248/32
2600:2000:2046:2000:3:db06:4200:23a1/32
2600:2000:2046:5800:3:db06:4200:23a1/32
2600:2000:2046:7800:3:db06:4200:23a1/32
desired output:
52.222.128.45/32
172.22.187.101/32
52.222.128.248/32
2600:2000:2046:2000:3:db06:4200:23a1/128
2600:2000:2046:5800:3:db06:4200:23a1/128
2600:2000:2046:7800:3:db06:4200:23a1/128
egrep '\:[0-9a-f]{1,4}/32' matches the last four hex characters plus /32, but how can I keep those four hex characters as they are and change only /32 to /128?
thanks!
$ cat /tmp/ips
52.222.128.45/32
172.22.187.101/32
52.222.128.248/32
2600:2000:2046:2000:3:db06:4200:23a1/32
2600:2000:2046:5800:3:db06:4200:23a1/32
2600:2000:2046:7800:3:db06:4200:23a1/32
$ cat /tmp/ips | sed 's%\(:[0-9a-f]\{1,4\}\)/32%\1/128%'
52.222.128.45/32
172.22.187.101/32
52.222.128.248/32
2600:2000:2046:2000:3:db06:4200:23a1/128
2600:2000:2046:5800:3:db06:4200:23a1/128
2600:2000:2046:7800:3:db06:4200:23a1/128
A description of the regex is here: https://regex101.com/r/FDjUct/1 Note that % is being used as a delimiter instead of / to avoid having to escape the / characters in the regex.
Depending on how familiar you are with sed, it may be clearer to use:
$ cat /tmp/ips | sed '/:/s/32$/128/'
52.222.128.45/32
172.22.187.101/32
52.222.128.248/32
2600:2000:2046:2000:3:db06:4200:23a1/128
2600:2000:2046:5800:3:db06:4200:23a1/128
2600:2000:2046:7800:3:db06:4200:23a1/128
The above command uses sed addresses to only apply the substitution of 32 with 128 on lines that contain a :.
The following awk may help here. (As per Ed's comment, I changed [0-9a-fA-F] to [[:xdigit:]] in the solution.)
awk -v value="128" 'match($0,/(([[:xdigit:]]){1,4}:){1,4}[0-9](:([[:xdigit:]]){1,4}){1,3}\//){$0=substr($0,RSTART,RLENGTH) value}1' Input_file
I tested it with awk 4.1; if you have an older version, run it as awk --re-interval so that the {1,4} interval expressions are recognized.
Here is a non-one-liner form of the solution too:
awk -v value="128" '
match($0,/(([[:xdigit:]]){1,4}:){1,4}[0-9](:([[:xdigit:]]){1,4}){1,3}\//){
  $0=substr($0,RSTART,RLENGTH) value
}
1
' Input_file
You can use the following regex:
^(([[:xdigit:]]{1,4}:){7}[[:xdigit:]]{1,4}\/)32$
and embed it in your sed command:
sed -E 's#^(([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\/)32$#\1128#g'
where the -E option enables extended regex; if you don't specify this option you will have to escape the parentheses and braces, which is a pain.
Uppercase letters in your IPv6 address will also be taken into account!
The ^ and $ anchors constrain the regex to match the whole line.
# is used as the separator in the substitute command. The replacement is a backreference to the IPv6 part followed by 128; since 32 is outside the group, the effect is that 32 is replaced by 128.
INPUT:
52.222.128.45/32
172.22.187.101/32
52.222.128.248/32
2600:2000:2046:2000:3:db06:4200:23a1/32
2600:2000:2046:5800:3:DB06:4200:23a1/32
2600:2000:2046:7800:3:dB06:4200:23a1/32
OUTPUT:
52.222.128.45/32
172.22.187.101/32
52.222.128.248/32
2600:2000:2046:2000:3:db06:4200:23a1/128
2600:2000:2046:5800:3:DB06:4200:23a1/128
2600:2000:2046:7800:3:dB06:4200:23a1/128
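The same substitution can be sketched in Python, which may help when the replacement has to live inside a larger script. This is an illustration under assumptions: the names IPV6_SLASH_32 and widen_ipv6_prefix are made up here, and like the regex above the pattern only covers the full eight-group form (no "::" compression).

```python
import re

# Eight groups of 1-4 hex digits, then "/32" at the end of the line.
IPV6_SLASH_32 = re.compile(r'^((?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}/)32$')


def widen_ipv6_prefix(line):
    # \g<1> is used instead of \1 so that the following "128" is not
    # read as part of the group number.
    return IPV6_SLASH_32.sub(r'\g<1>128', line)


addresses = [
    "52.222.128.45/32",
    "172.22.187.101/32",
    "2600:2000:2046:2000:3:db06:4200:23a1/32",
    "2600:2000:2046:5800:3:DB06:4200:23a1/32",
]
converted = [widen_ipv6_prefix(a) for a in addresses]
```

IPv4 lines contain no colon-separated hex groups, so they pass through unchanged.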
With awk you can reach a similar result:
$ cat testip | awk -F'/' '/^([[:xdigit:]]{1,4}:){7}[[:xdigit:]]{1,4}\/32$/{print $1"/"128; next}1'
52.222.128.45/32
172.22.187.101/32
52.222.128.248/32
2600:2000:2046:2000:3:db06:4200:23a1/128
2600:2000:2046:5800:3:db06:4200:23a1/128
2600:2000:2046:7800:3:db06:4200:23a1/128
Here / is used as the field separator together with a similar regex: when a line matches the pattern, you print the 1st field (the IPv6 address) followed by /128 and jump to the next line; all other lines are printed unchanged by the trailing 1.

Identify & replace 2nd instance of search term in string... VBA RegEx doesn't have lookbehind functionality

I have a list of strings in the format below:
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 250PS xxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 350PS xxxxxxxxxxxxx xxxxxxxxx xxxx
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxx 100PS xxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 200PS xxxxxxxxxxxxxxxx 200PS xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 250PS xxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 350PS xxxxxxxxxxxxx xxxxxxxxx xxxx
They are in Excel/VBA, and I am trying to remove duplicate values from each string, i.e. 100PS and 200PS where they appear twice. Using VBA and regex I've come up with:
(?<=\d\d\dPS\s.*)(\d\d\dPS\s)
This seems to work when testing it online and in other languages, but lookbehind is not supported in VBA, and this is absolutely wracking my brain.
The value always consists of \d\d\d (3 digits) followed by PS and ends with \s, but all the xxxxxx text around it can differ every time, have different lengths, etc.
How could I select the duplicate PS value with regex?
I have looked through Stack Overflow and found a couple of regex examples, but they don't seem to work in VBA.
Any help is greatly appreciated,
Thanks
Have you considered a worksheet formula?
=SUBSTITUTE(A1,MID(A1,SEARCH("???PS",A1),6),"",2)
The following regex avoids lookbehind:
(\s(\d{3}PS)\s.*\s)\2\s
(\s(\d{3}PS)\s.*\s) Capture the following into capture group 1
  \s Matches a single whitespace character
  (\d{3}PS) Capture the following into capture group 2
    \d{3} Matches any 3 digits
    PS Match this literally
  \s Matches a single whitespace character
  .* Matches any character (except \n) any number of times
  \s Matches a single whitespace character
\2 Matches the text that was most recently captured by capture group 2
\s Matches a single whitespace character
Replacement: $1 (puts capture group 1 back into the string)
Result:
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 250PS xxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 350PS xxxxxxxxxxxxx xxxxxxxxx xxxx
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxx xxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 200PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 100PS xxxxxxxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 250PS xxxxxxxxxxx xxxxxxxxx xxxxxx
xxxxxxxxxx xxxxxxxxxxxxx 350PS xxxxxxxxxxxxx xxxxxxxxx xxxx
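To see the capture-and-backreference replacement in action outside Excel, here is a small Python sketch. It uses Python's re module rather than the VBA regex engine, so it only illustrates the pattern; the names DUPLICATE_PS and drop_second_ps are hypothetical.

```python
import re

# Group 1 keeps everything up to the repeated value; \2 is the repeat itself.
DUPLICATE_PS = re.compile(r'(\s(\d{3}PS)\s.*\s)\2\s')


def drop_second_ps(text):
    # Put group 1 back and drop the second occurrence plus its trailing space.
    return DUPLICATE_PS.sub(r'\1', text)


with_duplicate = drop_second_ps("aaa bbb 100PS ccc 100PS ddd eee")
without_duplicate = drop_second_ps("aaa bbb 250PS ccc ddd")
```

A line with two different values (say 100PS and 200PS) is left alone, because \2 must repeat exactly what group 2 captured.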

RegEx: Do not match - please check my solution [duplicate]

I'm trying to use diff's --ignore-matching-lines= to ignore lines that contain certain patterns. I have a bash script that uses HPE's RESTful API to check/patch BIOS settings on the hosts. I use --ignore-matching-lines= to skip BIOS settings where the regex matches both the value on the host and a basic template for the settings that I have stored in a variable. If diff finds that the settings do not match the "golden configuration", it shows the differences on the terminal and prompts to apply the correct config. The settings I'm using the regex for are the UEFI boot order and the Intel SGX settings.
This particular regex question is about the SGX settings. HPE has occasionally set the Epoch to a default of all 0's. I need to ensure that the SgxEpoch value is not all 0's but rather a random 32-character string, as in the following instance; otherwise we have a problem securing our enclave secrets.
Broken:
"SgxEpoch": "00000000000000000000000000000000"
OK:
"SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI"
I did quite a bit of looking and found Wiktor Stribiżew's extremely helpful explanation of how to use POSIX regex to "match everything but" (I found out that diff doesn't support lookaheads, as it uses BRE): Regex: match everything but
So I came up with the following ERE version, which also matches "SgxEpoch": "" since that is how my comparison template is defined: https://regex101.com/r/Upd1KL/1
SgxEpochRegEx='"SgxEpoch":\s*(\"([^0].{31}|.[^0].{30}|.{2}[^0].{29}|.{3}[^0].{28}|.{4}[^0].{27}|.{5}[^0].{26}|.{6}[^0].{25}|.{7}[^0].{24}|.{8}[^0].{23}|.{9}[^0].{22}|.{10}[^0].{21}|.{11}[^0].{20}|.{12}[^0].{19}|.{13}[^0].{18}|.{14}[^0].{17}|.{15}[^0].{16}|.{16}[^0].{15}|.{17}[^0].{14}|.{18}[^0].{13}|.{19}[^0].{12}|.{20}[^0].{11}|.{21}[^0].{10}|.{22}[^0].{9}|.{23}[^0].{8}|.{24}[^0].{7}|.{25}[^0].{6}|.{26}[^0].{5}|.{27}[^0].{4}|.{28}[^0].{3}|.{29}[^0].{2}|.{30}[^0].|.{31}[^0])\"|"")'
And then converted it to BRE so it would work with diff:
SgxEpochRegEx='"SgxEpoch":\s*\("\([^0].\{31\}\|.[^0].\{30\}\|.\{2\}[^0].\{29\}\|.\{3\}[^0].\{28\}\|.\{4\}[^0].\{27\}\|.\{5\}[^0].\{26\}\|.\{6\}[^0].\{25\}\|.\{7\}[^0].\{24\}\|.\{8\}[^0].\{23\}\|.\{9\}[^0].\{22\}\|.\{10\}[^0].\{21\}\|.\{11\}[^0].\{20\}\|.\{12\}[^0].\{19\}\|.\{13\}[^0].\{18\}\|.\{14\}[^0].\{17\}\|.\{15\}[^0].\{16\}\|.\{16\}[^0].\{15\}\|.\{17\}[^0].\{14\}\|.\{18\}[^0].\{13\}\|.\{19\}[^0].\{12\}\|.\{20\}[^0].\{11\}\|.\{21\}[^0].\{10\}\|.\{22\}[^0].\{9\}\|.\{23\}[^0].\{8\}\|.\{24\}[^0].\{7\}\|.\{25\}[^0].\{6\}\|.\{26\}[^0].\{5\}\|.\{27\}[^0].\{4\}\|.\{28\}[^0].\{3\}\|.\{29\}[^0].\{2\}\|.\{30\}[^0].\|.\{31\}[^0]\)"\|""\)'
Here's an example of the BRE version matching a non-zero or a blank Epoch, and not matching an all-zero one.
[~]$ echo '"SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI"' | grep '"SgxEpoch":\s*\("\([^0].\{31\}\|.[^0].\{30\}\|.\{2\}[^0].\{29\}\|.\{3\}[^0].\{28\}\|.\{4\}[^0].\{27\}\|.\{5\}[^0].\{26\}\|.\{6\}[^0].\{25\}\|.\{7\}[^0].\{24\}\|.\{8\}[^0].\{23\}\|.\{9\}[^0].\{22\}\|.\{10\}[^0].\{21\}\|.\{11\}[^0].\{20\}\|.\{12\}[^0].\{19\}\|.\{13\}[^0].\{18\}\|.\{14\}[^0].\{17\}\|.\{15\}[^0].\{16\}\|.\{16\}[^0].\{15\}\|.\{17\}[^0].\{14\}\|.\{18\}[^0].\{13\}\|.\{19\}[^0].\{12\}\|.\{20\}[^0].\{11\}\|.\{21\}[^0].\{10\}\|.\{22\}[^0].\{9\}\|.\{23\}[^0].\{8\}\|.\{24\}[^0].\{7\}\|.\{25\}[^0].\{6\}\|.\{26\}[^0].\{5\}\|.\{27\}[^0].\{4\}\|.\{28\}[^0].\{3\}\|.\{29\}[^0].\{2\}\|.\{30\}[^0].\|.\{31\}[^0]\)"\|""\)'
"SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI"
[~]$ echo '"SgxEpoch": ""' | grep '"SgxEpoch":\s*\("\([^0].\{31\}\|.[^0].\{30\}\|.\{2\}[^0].\{29\}\|.\{3\}[^0].\{28\}\|.\{4\}[^0].\{27\}\|.\{5\}[^0].\{26\}\|.\{6\}[^0].\{25\}\|.\{7\}[^0].\{24\}\|.\{8\}[^0].\{23\}\|.\{9\}[^0].\{22\}\|.\{10\}[^0].\{21\}\|.\{11\}[^0].\{20\}\|.\{12\}[^0].\{19\}\|.\{13\}[^0].\{18\}\|.\{14\}[^0].\{17\}\|.\{15\}[^0].\{16\}\|.\{16\}[^0].\{15\}\|.\{17\}[^0].\{14\}\|.\{18\}[^0].\{13\}\|.\{19\}[^0].\{12\}\|.\{20\}[^0].\{11\}\|.\{21\}[^0].\{10\}\|.\{22\}[^0].\{9\}\|.\{23\}[^0].\{8\}\|.\{24\}[^0].\{7\}\|.\{25\}[^0].\{6\}\|.\{26\}[^0].\{5\}\|.\{27\}[^0].\{4\}\|.\{28\}[^0].\{3\}\|.\{29\}[^0].\{2\}\|.\{30\}[^0].\|.\{31\}[^0]\)"\|""\)'
"SgxEpoch": ""
[~]$ echo '"SgxEpoch": "00000000000000010000000000000000"' | grep '"SgxEpoch":\s*\("\([^0].\{31\}\|.[^0].\{30\}\|.\{2\}[^0].\{29\}\|.\{3\}[^0].\{28\}\|.\{4\}[^0].\{27\}\|.\{5\}[^0].\{26\}\|.\{6\}[^0].\{25\}\|.\{7\}[^0].\{24\}\|.\{8\}[^0].\{23\}\|.\{9\}[^0].\{22\}\|.\{10\}[^0].\{21\}\|.\{11\}[^0].\{20\}\|.\{12\}[^0].\{19\}\|.\{13\}[^0].\{18\}\|.\{14\}[^0].\{17\}\|.\{15\}[^0].\{16\}\|.\{16\}[^0].\{15\}\|.\{17\}[^0].\{14\}\|.\{18\}[^0].\{13\}\|.\{19\}[^0].\{12\}\|.\{20\}[^0].\{11\}\|.\{21\}[^0].\{10\}\|.\{22\}[^0].\{9\}\|.\{23\}[^0].\{8\}\|.\{24\}[^0].\{7\}\|.\{25\}[^0].\{6\}\|.\{26\}[^0].\{5\}\|.\{27\}[^0].\{4\}\|.\{28\}[^0].\{3\}\|.\{29\}[^0].\{2\}\|.\{30\}[^0].\|.\{31\}[^0]\)"\|""\)'
"SgxEpoch": "00000000000000010000000000000000"
[~]$ echo '"SgxEpoch": "00000000000000000000000000000000"' | grep '"SgxEpoch":\s*\("\([^0].\{31\}\|.[^0].\{30\}\|.\{2\}[^0].\{29\}\|.\{3\}[^0].\{28\}\|.\{4\}[^0].\{27\}\|.\{5\}[^0].\{26\}\|.\{6\}[^0].\{25\}\|.\{7\}[^0].\{24\}\|.\{8\}[^0].\{23\}\|.\{9\}[^0].\{22\}\|.\{10\}[^0].\{21\}\|.\{11\}[^0].\{20\}\|.\{12\}[^0].\{19\}\|.\{13\}[^0].\{18\}\|.\{14\}[^0].\{17\}\|.\{15\}[^0].\{16\}\|.\{16\}[^0].\{15\}\|.\{17\}[^0].\{14\}\|.\{18\}[^0].\{13\}\|.\{19\}[^0].\{12\}\|.\{20\}[^0].\{11\}\|.\{21\}[^0].\{10\}\|.\{22\}[^0].\{9\}\|.\{23\}[^0].\{8\}\|.\{24\}[^0].\{7\}\|.\{25\}[^0].\{6\}\|.\{26\}[^0].\{5\}\|.\{27\}[^0].\{4\}\|.\{28\}[^0].\{3\}\|.\{29\}[^0].\{2\}\|.\{30\}[^0].\|.\{31\}[^0]\)"\|""\)'
[~]$
The problem is that I don't understand how the above regex works. It appears to me to be just a bunch of alternatives joined by the metacharacter |, each of which says: x of "any character except newline" (via .{x}), followed by a 0, and then again some number of any character except newline (via .{x}). It seems to me that this regex should match the following examples but it doesn't... and I don't understand why.
"SgxEpoch": "00000000000000010000000000000000"
"SgxEpoch": "00000000000000000000000001000000"
"SgxEpoch": "00000100000000000000000000000000"
Here it is working very much like it does in the bash script. The first input fd, <(curl ...), pulls all BIOS settings from the host in JSON format. For the second fd, <(echo ...), I echo a variable with the desired BIOS settings in generic form. Both are then fed to | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges' to alphabetize both inputs to diff and to pretty-print them, so that the JSON settings are separated by \n and --suppress-common-lines displays only the discrepancies to the user. perl removes the \n characters inside array-type values, because I could not get diff to match across newlines as "one chunk" with something like ..
[awilk00#nvdejb-dc-2p ~]$ SgxEpochRegEx='"SgxEpoch":\s*\("\([^0].\{31\}\|.[^0].\{30\}\|.\{2\}[^0].\{29\}\|.\{3\}[^0].\{28\}\|.\{4\}[^0].\{27\}\|.\{5\}[^0].\{26\}\|.\{6\}[^0].\{25\}\|.\{7\}[^0].\{24\}\|.\{8\}[^0].\{23\}\|.\{9\}[^0].\{22\}\|.\{10\}[^0].\{21\}\|.\{11\}[^0].\{20\}\|.\{12\}[^0].\{19\}\|.\{13\}[^0].\{18\}\|.\{14\}[^0].\{17\}\|.\{15\}[^0].\{16\}\|.\{16\}[^0].\{15\}\|.\{17\}[^0].\{14\}\|.\{18\}[^0].\{13\}\|.\{19\}[^0].\{12\}\|.\{20\}[^0].\{11\}\|.\{21\}[^0].\{10\}\|.\{22\}[^0].\{9\}\|.\{23\}[^0].\{8\}\|.\{24\}[^0].\{7\}\|.\{25\}[^0].\{6\}\|.\{26\}[^0].\{5\}\|.\{27\}[^0].\{4\}\|.\{28\}[^0].\{3\}\|.\{29\}[^0].\{2\}\|.\{30\}[^0].\|.\{31\}[^0]\)"\|""\),'
[awilk00#nvdejb-dc-2p ~]$ hostsettings='{"ServicePhone":"","SgxEpoch": "00000000000000000000000000000000","SgxEpochControl":"SgxEpochNoChange","DefaultBootOrder":["PcieSlotNic","EmbeddedFlexLOM","EmbeddedStorage","PcieSlotStorage","Usb","Cd","UefiShell","Floppy"]}'
[awilk00#nvdejb-dc-2p ~]$ desiredsettings='{"ServicePhone":"","SgxEpoch": "","SgxEpochControl":"SgxEpochNoChange","DefaultBootOrder":["Floppy","Cd","Usb","EmbeddedStorage","PcieSlotStorage","EmbeddedFlexLOM","PcieSlotNic","UefiShell"]}'
[awilk00#nvdejb-dc-2p ~]$ echo "${hostsettings}" | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges'
{
"DefaultBootOrder": ["PcieSlotNic","EmbeddedFlexLOM","EmbeddedStorage","PcieSlotStorage","Usb","Cd","UefiShell","Floppy"],
"ServicePhone": "",
"SgxEpoch": "00000000000000000000000000000000",
"SgxEpochControl": "SgxEpochNoChange"
}
[awilk00#nvdejb-dc-2p ~]$ echo "${desiredsettings}" | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges'
{
"DefaultBootOrder": ["Floppy","Cd","Usb","EmbeddedStorage","PcieSlotStorage","EmbeddedFlexLOM","PcieSlotNic","UefiShell"],
"ServicePhone": "",
"SgxEpoch": "",
"SgxEpochControl": "SgxEpochNoChange"
}
[awilk00#nvdejb-dc-2p ~]$ diff --report-identical-files --suppress-common-lines --side-by-side --ignore-matching-lines="${SgxEpochRegEx}" <(echo "${hostsettings}" | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges') <(echo "${desiredsettings}" | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges')
"DefaultBootOrder": ["PcieSlotNic","EmbeddedFlexLOM","Emb | "DefaultBootOrder": ["Floppy","Cd","Usb","EmbeddedStorage
"SgxEpoch": "00000000000000000000000000000000", | "SgxEpoch": "",
[awilk00#nvdejb-dc-2p ~]$
[awilk00#nvdejb-dc-2p ~]$
[awilk00#nvdejb-dc-2p ~]$ hostsettings='{"ServicePhone":"","SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI","SgxEpochControl":"SgxEpochNoChange","DefaultBootOrder":["PcieSlotNic","EmbeddedFlexLOM","EmbeddedStorage","PcieSlotStorage","Usb","Cd","UefiShell","Floppy"]}'
[awilk00#nvdejb-dc-2p ~]$ echo "${hostsettings}" | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges'
{
"DefaultBootOrder": ["PcieSlotNic","EmbeddedFlexLOM","EmbeddedStorage","PcieSlotStorage","Usb","Cd","UefiShell","Floppy"],
"ServicePhone": "",
"SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI",
"SgxEpochControl": "SgxEpochNoChange"
}
[awilk00#nvdejb-dc-2p ~]$ echo "${desiredsettings}" | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges'
{
"DefaultBootOrder": ["Floppy","Cd","Usb","EmbeddedStorage","PcieSlotStorage","EmbeddedFlexLOM","PcieSlotNic","UefiShell"],
"ServicePhone": "",
"SgxEpoch": "",
"SgxEpochControl": "SgxEpochNoChange"
}
[awilk00#nvdejb-dc-2p ~]$ diff --report-identical-files --suppress-common-lines --side-by-side --ignore-matching-lines="${SgxEpochRegEx}" <(echo "${hostsettings}" | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges') <(echo "${desiredsettings}" | python -m json.tool | perl -00pe 's:\[.*?\]:($x=$&)=~s/\s//gs;$x:ges')
"DefaultBootOrder": ["PcieSlotNic","EmbeddedFlexLOM","Emb | "DefaultBootOrder": ["Floppy","Cd","Usb","EmbeddedStorage
[awilk00#nvdejb-dc-2p ~]$
NOTE: I'm almost certain I found a thread on Stack Overflow where someone reviewed diff's source code and found that it compares on a line-by-line basis, but I cannot find that thread at the moment.
NOTE2: I noticed that when diff finds differing lines immediately before a line that would have matched the --ignore-matching-lines regex, it does not drop the two lines that match the regex; it shows them as a difference right after the preceding non-matching difference. Hope I didn't butcher that too badly. For example:
[~]$ SgxEpochRegEx='"SgxEpoch":\s*\("\([^0].\{31\}\|.[^0].\{30\}\|.\{2\}[^0].\{29\}\|.\{3\}[^0].\{28\}\|.\{4\}[^0].\{27\}\|.\{5\}[^0].\{26\}\|.\{6\}[^0].\{25\}\|.\{7\}[^0].\{24\}\|.\{8\}[^0].\{23\}\|.\{9\}[^0].\{22\}\|.\{10\}[^0].\{21\}\|.\{11\}[^0].\{20\}\|.\{12\}[^0].\{19\}\|.\{13\}[^0].\{18\}\|.\{14\}[^0].\{17\}\|.\{15\}[^0].\{16\}\|.\{16\}[^0].\{15\}\|.\{17\}[^0].\{14\}\|.\{18\}[^0].\{13\}\|.\{19\}[^0].\{12\}\|.\{20\}[^0].\{11\}\|.\{21\}[^0].\{10\}\|.\{22\}[^0].\{9\}\|.\{23\}[^0].\{8\}\|.\{24\}[^0].\{7\}\|.\{25\}[^0].\{6\}\|.\{26\}[^0].\{5\}\|.\{27\}[^0].\{4\}\|.\{28\}[^0].\{3\}\|.\{29\}[^0].\{2\}\|.\{30\}[^0].\|.\{31\}[^0]\)"\|""\),'
[~]$ desiredsettings='{"SgxEpoch": "","SgxEpochControl":"SgxEpochNoChange","DefaultBootOrder":["Floppy"]}'
[~]$ hostsettings='{"SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI","SgxEpochControl":"SgxEpochNoChange","DefaultBootOrder":["PcieSlotNic"]}'
[~]$ diff --report-identical-files --side-by-side --ignore-matching-lines="${SgxEpochRegEx}" <(echo "${hostsettings}" | python -m json.tool) <(echo "${desiredsettings}" | python -m json.tool)
{ {
"DefaultBootOrder": [ "DefaultBootOrder": [
"PcieSlotNic" | "Floppy"
], ],
"SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI", "SgxEpoch": "",
"SgxEpochControl": "SgxEpochNoChange" "SgxEpochControl": "SgxEpochNoChange"
} }
[~]$ diff --report-identical-files --side-by-side --ignore-matching-lines="${SgxEpochRegEx}" <(echo "${hostsettings}" | python -m json.tool | grep -v '[]],') <(echo "${desiredsettings}" | python -m json.tool | grep -v '[]],')
{ {
"DefaultBootOrder": [ "DefaultBootOrder": [
"PcieSlotNic" | "Floppy"
"SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI", | "SgxEpoch": "",
"SgxEpochControl": "SgxEpochNoChange" "SgxEpochControl": "SgxEpochNoChange"
} }
[~]$
Because of the sensitivity of the data, I wanted to be sure I had a clear understanding of how this RegEx worked.
I would also greatly appreciate any syntax verification that you can offer around the huge regex line since I don't really understand how it works.
The ask here is for someone to explain/verify the regex. I thought it would be best that I explain my end goal due to the XY Problem, but really I can't afford the time investment currently to re-write everything. I am open to that going forward but I have a date to meet.
Aaron
Your regex seems fine. Maybe the problem is diff's --ignore-matching-lines option (shorthand -I), which works a bit differently than one might expect.
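As a rough sanity check of the alternation, the 32 branches can be generated programmatically and run against the examples from the question. This sketch uses Python's re module, not diff's BRE engine, on the assumption that the constructs involved behave the same in both; the names SGX_EPOCH and epoch_is_ignorable are made up for the illustration.

```python
import re

# Build the 32 alternatives .{i}[^0].{31-i}: "a non-0 character at position i".
alternatives = "|".join(f".{{{i}}}[^0].{{{31 - i}}}" for i in range(32))
SGX_EPOCH = re.compile(r'"SgxEpoch":\s*("(?:' + alternatives + r')"|"")')


def epoch_is_ignorable(line):
    """True for a 32-character value with at least one non-0, or for ""."""
    return SGX_EPOCH.search(line) is not None


expected = {
    '"SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI"': True,
    '"SgxEpoch": ""': True,
    '"SgxEpoch": "' + "0" * 15 + "1" + "0" * 16 + '"': True,
    '"SgxEpoch": "' + "0" * 32 + '"': False,
}
results = {line: epoch_is_ignorable(line) for line in expected}
```

Each branch pins a character other than 0 at one position, so taken together they say "at least one of the 32 characters is not 0", which is why only the all-zero value is rejected.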
How does diff -I regex work?
Let a and b be two files.
If I hadn't read the documentation, I would expect that the following two commands are equivalent:
diff -I regex a b
diff <(grep -v regex a) <(grep -v regex b)
This is wrong in two ways:
1. diff always considers pairs of lines from a and b. Such a pair is ignored only if both lines (the line from a and the line from b) match the regex.
2. Even if both lines of a pair match, it can happen that they are not ignored. Not only do both lines of the pair have to match, but all lines of the hunk have to match!
Example for the second point (-y is a shorthand for --side-by-side):
diff --suppress-common-lines -yI '1\|2' <(printf '1\n') <(printf '2\n')
diff --suppress-common-lines -yI '1\|2' <(printf '1\n1\n') <(printf '2\nX\n')
1 | 2
1 | X
The first command printed nothing, as expected, but the second printed both lines. Instead of considering only the line pair (1,2), diff tried to match all lines of the hunk ((1,1),(2,X)). One line of that hunk (X) did not match, therefore the whole hunk was printed.
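The hunk rule can be modelled in a few lines of Python. This is a deliberately simplified model of the behaviour described here, not diff's actual implementation, and the function name hunk_is_ignored is hypothetical.

```python
import re


def hunk_is_ignored(changed_lines, pattern):
    """Simplified model: a hunk is dropped only if every changed line in it
    (from either file) matches the regex."""
    regex = re.compile(pattern)
    return all(regex.search(line) for line in changed_lines)


# First command: the hunk consists of "1" (file a) and "2" (file b).
first = hunk_is_ignored(["1", "2"], r"1|2")
# Second command: the hunk consists of "1", "1" (file a) and "2", "X" (file b).
second = hunk_is_ignored(["1", "1", "2", "X"], r"1|2")
```

In this model first is True (the hunk is suppressed) and second is False (the single non-matching X keeps the whole hunk visible).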
Alternative to diff -I
I'm not entirely sure what your typical input and expected output is. What I guessed:
1. You have the file original that might contain the line
   "SgxEpoch": "00000000000000000000000000000000"
2. You generate the file generated, in which the 0-lines are fixed to something like
   "SgxEpoch": "5FSQUWEED6XPC8PJ2CWZGQIS4WWKLKUI"
3. (Step where you need help)
   You want to compare the two files original and generated to make sure that your script from the second step did the right thing. To make the comparison easier, you only want to see the differences when the corresponding line from original was either
   "SgxEpoch": "00000000000000000000000000000000"
   or
   "SgxEpoch": ""
There is an easy solution for this. Simply grep the output of diff:
diff --suppress-common-lines -y original generated |
grep -E '^\s*"SgxEpoch"\s*:\s*"0*"'

Using regex with sed search and replace

I want to remove some dynamic text from a log file. I am able to extract it using regex and grep -oP; however, the same regex is not working with the sed command.
Sample data (for reading convenience, the data of interest is only the part between ABCDEF and LMNOP):
XXX 2 13:53:35 XXXX0-0-0 XXXXXXXX[3513]: ABCDEF[XXXX]: 1472846015.555671: LMNOP(79): XXXXXXXXXXXXX - XXXXXX XX XXX XXX XXXXX XX XXXXX XXXX XXX XXXX XXX
Following is the data I want to remove from the log file. I am able to extract it using regex + grep :
grep -Po ']: [0-9]{10}\.[0-9]{6}:' sample
]: 1472846015.555671:
Now, if I use the same regex with the sed command, it doesn't help. Any suggestions?
I used the following command with sed, and it returned the file unchanged.
sed "s/]: [0-9]{10}\.[0-9]{6}://" input
or
awk '{gsub(/]: [0-9]{10}\.[0-9]{6}:/,"")}1' input
I need following output:
XXX 2 13:53:35 XXXX0-0-0 XXXXXXXX[3513]: ABCDEF[XXXX LMNOP(79): XXXXXXXXXXXXX - XXXXXX XX XXX XXX XXXXX XX XXXXX XXXX XXX XXXX XXX
OR even better :
XXX 2 13:53:35 XXXX0-0-0 XXXXXXXX[3513]: ABCDEF[XXXX]::LMNOP(79): XXXXXXXXXXXXX - XXXXXX XX XXX XXX XXXXX XX XXXXX XXXX XXX XXXX XXX
In sed, use:
sed "s/]: [0-9]\{10\}\.[0-9]\{6\}: /]::/" input
In the "s/#1/#2/" instruction, #1 is the search pattern; in sed's default (basic) regex syntax you need to escape the curly braces (\{ and \}). The match is then replaced with #2, which puts ]: back because it is part of the search pattern. If you need ::, add it to the replacement, as above.
But maybe you don't need to search for and replace ]: at all; just replace the digits and dot with : (this works for your example):
sed "s/ [0-9]\{10\}\.[0-9]\{6\}: /:/" input
You can choose to use sed with extended regex. Note, however, that the -r option is a GNU extension and so may not be portable. Here is the same sed command as suggested by @Konstantin Morenko, but without the backslashes before { and }. The extended regex option is -r or --regexp-extended:
sed -r "s/ [0-9]{10}\.[0-9]{6}: /:/" input
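For readers who want the same substitution in a script, here is a Python sketch. Python's re syntax resembles extended regex in that { and } are interval operators without backslashes; the names TIMESTAMP and strip_timestamp are assumptions made for this example.

```python
import re

# "]: ", ten digits, a dot, six digits, then ": " (as in the log sample).
TIMESTAMP = re.compile(r'\]: [0-9]{10}\.[0-9]{6}: ')


def strip_timestamp(line):
    # count=1 mirrors sed's s/// without the g flag (first match only).
    return TIMESTAMP.sub(']::', line, count=1)


log_line = ("XXX 2 13:53:35 XXXX0-0-0 XXXXXXXX[3513]: ABCDEF[XXXX]: "
            "1472846015.555671: LMNOP(79): XXXXXXXXXXXXX")
cleaned = strip_timestamp(log_line)
```

The earlier "]: " after [3513] is not touched, because it is not followed by the ten-digit timestamp.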

Help with SED syntax : unterminated `s' command

Edit: I'm using CYGWIN/GNU sed version 4.1.5 on windows Vista and I want a case insensitive search
I want to use sed to replace inline, the following:
c:\DEV\Suite\anything here --- blah 12 334 xxx zzzzz etc\Modules etc
Edit: anything here --- blah 12 334 xxx zzzzz etc means anything could appear here. Sorry for omitting that.
In a file with lines like
FileName="c:\DEV\Suite\anything here --- blah 12 334 xxx zzzzz etc\Modules\.... snipped ...."
with a value I supply, say :
Project X - Version 99.98
So the file ends up with:
FileName="c:\DEV\Suite\Project X - Version 99.98\Modules\.... snipped ...."
My attempt:
c:\temp>sed -r -b s/Dev\\Suite\\.*\\Modules/dev\\suite\\simple\\/g test.txt
However I get the following error:
sed: -e expression #1, char 42: unterminated `s' command
Thanks.
Edit:
I've already tried adding quotes.
It's the '\\' before the '/'. Apparently you need 4 backslashes.
sed -r -b "s/Dev\\\\Suite\\\\.*\\\\Modules/dev\\\\suite\\\\simple\\\\/g" test.txt
I think the shell is interpreting the '\\' into a '\' before passing it to sed, and then sed is doing the same thing on what it gets.
Single quotes would work, so:
sed -r -b 's/Dev\\Suite\\.*\\Modules/dev\\suite\\simple\\/g' test.txt
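The two layers of backslash processing described above also exist in other tools. As an illustration only, here is how the same replacement might look in Python, where a raw string removes the outer layer so that only the regex engine interprets the backslashes. The names SUITE_DIR and set_project are hypothetical.

```python
import re

# In a raw string, \\ reaches the regex engine as written and means
# "one literal backslash". IGNORECASE lets Dev match DEV.
SUITE_DIR = re.compile(r'(Dev\\Suite\\).*(\\Modules)', re.IGNORECASE)


def set_project(line, project):
    # A function replacement is inserted literally, so backslashes in the
    # surrounding path are not reinterpreted.
    return SUITE_DIR.sub(lambda m: m.group(1) + project + m.group(2), line)


line = r'FileName="c:\DEV\Suite\anything here --- blah 12 334 xxx zzzzz etc\Modules\.... snipped ...."'
updated = set_project(line, "Project X - Version 99.98")
```

Capturing the prefix and suffix and putting them back keeps the original casing of the path, unlike a replacement that retypes dev\suite.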
If I use "\\\" where you have "\\", it works for me. With the double backslashes, the way it gets parsed evidently has a backslash escaping the terminating "/" of the substitution expression. (I still get the error if I replace ".*" with ".+".)
(Amusingly, I had to add more backslashes to get this to post properly -- SO ate a few of them!)
Got it: Replace the .* with .+
sed -r -b s/Dev\\Suite\\.+\\Modules/dev\\suite\\simple\\/g test.txt
I don't know what version of sed you're using. I'm not familiar with the -b option.
First, I'd suggest using the i regex flag to make the match case insensitive. Your example of DEV won't match your regex of Dev.
I suspect the problem you're running into is how your version of sed interprets backslash characters.
I'd suggest using the sed bundled with Cygwin. With single quotes, it seems to work for me.
echo 'c:\DEV\Suite\anything here --- blah 12 334 xxx zzzzz etc\Modules\' | sed -r 's/Dev\\Suite\\.*\\Modules/dev\\suite\\simple\\/gi'
c:\dev\suite\simple\\
well...
sed -e s/"anything here --- blah 12 334 xxx zzzzz etc"/"Project X - Version 99.98"/g test.txt
worked fine
(The complaint about the unterminated 's' was because of the unescaped '/')
Funny, I was having the same issue in one directory, but the same command worked in other directories on the same machine. This is the command I was working with:
export version=`grep "version.*SNAPSHOT.*version" pom.xml |sed -e 's|<version>||g'|sed -e 's|</version>||g'|sed -e "s|\t* *||g"`; cat sonar-project.properties.template |sed -e "s/BUILDVERSION/$version/g">sonar-project.properties
When I changed the * to + it worked.
Thanks :-)
Will rename:
TV Show - 376 [720p].mkv
TV Show - 377 [720p].mkv
to
376.mkv
377.mkv
works under cygwin.
#!/bin/bash
for i in *; do
    # Quote "$i" so that file names containing spaces survive word splitting.
    mv "$i" "$(echo "$i" | sed -r -b 's/^.*[ ]([0-9]*)[ ].*$/\1.mkv/')"
done
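The name transformation itself can be tried out without touching any files. The Python sketch below only computes the new name; the names EPISODE and episode_name are hypothetical, and it uses [0-9]+ instead of [0-9]* so that at least one digit is required, which is a deliberate difference from the sed pattern.

```python
import re

# Greedy ".* " backtracks to the last space that is followed by digits
# and another space, so the episode number is captured.
EPISODE = re.compile(r'^.* ([0-9]+) .*$')


def episode_name(filename):
    match = EPISODE.match(filename)
    # Leave names without an isolated number unchanged.
    return f"{match.group(1)}.mkv" if match else filename


renamed = [episode_name(n) for n in
           ["TV Show - 376 [720p].mkv", "TV Show - 377 [720p].mkv", "notes.txt"]]
```

Returning non-matching names unchanged avoids renaming unrelated files when the result is later passed to something like os.rename.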