Exclude lines containing a string from the capturing group

My logs contain entries like the following:
2018-10-31 14:14:39; dcv0000088; 192.168.48.200;
Variable Bindings
vmwVpxdNewStatus:= Green
vmwVpxdObjValue:= alarm.FanHealthAlarm - Event: Hardware Health Changed (3131155);
--ENDOFTRAP--
2018-10-31 10:41:49; sb02; 192.168.41.252;
Variable Bindings
sysUpTime:= 2 days 20 hours 18 minutes 24.23 seconds (24590423)
snmpTrapOID:= FSC-RTP-MIB:iandcAdmin.55.1.3.4.5 (1.3.6.1.4.1.4329.2.55.1.3.4.5)
iandcAdmin.55.1.1.3.0:= SIP Server not running
iandcAdmin.55.1.1.7.0:= SIP Server;
--ENDOFTRAP--
I would like to capture all the text after Variable Bindings and before the ;, but exclude the line containing sysUpTime from the capture.
I use this regex:
Variable\sBindings\s+(?P<varBind>[^;]+(?!sysUpTime\:=.*))
but it is still not working. The expected result is:
varBind=
vmwVpxdNewStatus:= Green
vmwVpxdObjValue:= alarm.FanHealthAlarm - Event: Hardware Health Changed (3131155)
varBind=
snmpTrapOID:= FSC-RTP-MIB:iandcAdmin.55.1.3.4.5 (1.3.6.1.4.1.4329.2.55.1.3.4.5)
iandcAdmin.55.1.1.3.0:= SIP Server not running
iandcAdmin.55.1.1.7.0:= SIP Server
Please advise. Thank you.

You can make an optional (non-capturing) group that will match the sysUpTime line if it's there, ensuring that it won't be included in the subsequent varBind group:
Variable\sBindings\s+(?:sysUpTime.+\s+)?(?P<varBind>[^;]+)
                     ^^^^^^^^^^^^^^^^^^^
https://regex101.com/r/n5zPcr/2
If sysUpTime can appear somewhere other than the first line after Variable Bindings, then note that any group (or full match) must contain contiguous characters from the input - leaving out part of them is not possible without some other method, such as capturing the initial substring, matching the sysUpTime line, and then capturing the later substring.
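As a rough illustration, here is how the pattern above behaves in Python, which accepts the same (?P<name>...) group syntax. The second helper is only one possible way to handle a sysUpTime line that is not the first line: capture the whole block, then drop the unwanted lines afterwards. The function names are made up for this sketch.

```python
import re

LOG = """2018-10-31 14:14:39; dcv0000088; 192.168.48.200;
Variable Bindings
vmwVpxdNewStatus:= Green
vmwVpxdObjValue:= alarm.FanHealthAlarm - Event: Hardware Health Changed (3131155);
--ENDOFTRAP--
2018-10-31 10:41:49; sb02; 192.168.41.252;
Variable Bindings
sysUpTime:= 2 days 20 hours 18 minutes 24.23 seconds (24590423)
snmpTrapOID:= FSC-RTP-MIB:iandcAdmin.55.1.3.4.5 (1.3.6.1.4.1.4329.2.55.1.3.4.5)
iandcAdmin.55.1.1.3.0:= SIP Server not running
iandcAdmin.55.1.1.7.0:= SIP Server;
--ENDOFTRAP--
"""

# Pattern from the answer: an optional non-capturing group swallows a
# leading sysUpTime line so it never reaches the varBind group.
SKIP_LEADING = re.compile(r"Variable\sBindings\s+(?:sysUpTime.+\s+)?(?P<varBind>[^;]+)")

# Assumption for this sketch: capture everything, then filter lines afterwards.
CAPTURE_ALL = re.compile(r"Variable\sBindings\s+(?P<varBind>[^;]+)")


def var_binds(text):
    """Return each varBind block, skipping a sysUpTime line if it comes first."""
    return [m.group("varBind") for m in SKIP_LEADING.finditer(text)]


def var_binds_filtered(text):
    """Return each varBind block with sysUpTime lines removed wherever they occur."""
    blocks = []
    for m in CAPTURE_ALL.finditer(text):
        kept = [ln for ln in m.group("varBind").splitlines()
                if not ln.startswith("sysUpTime:=")]
        blocks.append("\n".join(kept))
    return blocks


if __name__ == "__main__":
    for block in var_binds(LOG):
        print("varBind=")
        print(block)
```

On this sample both helpers return the same two blocks. They would differ only if sysUpTime appeared later in a block.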

Related

Regex to match the last line of a JCL job card or the whole card

This is for anyone familiar with mainframe JCL.
I'm trying to match the last line of the job card:
basically, the first line that starts with // and does not end with a comma.
In the example below, I need either the 3rd line matched, or everything up to and including the 3rd line.
I'm using Ansible's lineinfile to dynamically insert a route card after the job card.
For example:
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0, <--- start of job card
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID <--- end of job card
//STEPNAME EXEC PGM=BPXBATCH
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDPARM DD *
SH cat /dev/urandom
So far I have this, which matches the leading // and anything after it, but I can't figure out the last part:
^(\Q//\E(.)*)

Parsing JCL in the general case is hard. As noted in the comments, the rules are full of caveats.
I have an MIT-licensed ANTLR4 grammar for JCL, which may be of use. It reflects the beauty of JCL.

To match the whole job card (in this case 3 lines):
(?sm)\A.*?\/\/[^*]((?!\/\*)[^\n])*[^,]$
See live demo.
Breaking this down:
(?sm)
s enables the DOTALL flag (meaning . matches newlines too)
m enables the MULTILINE flag (meaning ^ and $ match the start and end of each line)
\A means start of input (so it only matches at the very start)
.*? means anything, but as little as possible
//[^*] means a literal // followed by any character other than * (so //* comment lines are skipped)
((?!\/\*)[^\n])* means non-newline characters, except the sequence /* (so don't match when a comment is put in line)
[^,] not a comma
$ end of line
In English: "match from the start until there's a non-comma at the end of a line that is not a comment, or does not end with a comment"
You would then replace with $0 (group zero is the entire match) followed by your injected content:
$0\\n*ROUTE statement
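For anyone wanting to try the whole-card pattern outside a regex tester, here is a minimal Python sketch. It assumes the JCL is held in a single string and that the trailing "<---" annotations from the question are not part of the real file. The helper name and the inserted //*ROUTE line are placeholders.

```python
import re

JCL = (
    "//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,\n"
    "// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),\n"
    "// NOTIFY=&SYSUID\n"
    "//STEPNAME EXEC PGM=BPXBATCH\n"
    "//STDERR DD SYSOUT=*\n"
)

# Same pattern as in the answer; (?sm) turns on DOTALL and MULTILINE.
JOB_CARD = re.compile(r"(?sm)\A.*?\/\/[^*]((?!\/\*)[^\n])*[^,]$")


def insert_after_job_card(jcl, new_line):
    """Append new_line right after the matched job card (group 0)."""
    # A function replacement avoids any escaping issues in new_line.
    return JOB_CARD.sub(lambda m: m.group(0) + "\n" + new_line, jcl, count=1)


if __name__ == "__main__":
    print(insert_after_job_card(JCL, "//*ROUTE statement"))
```

Because the pattern is anchored with \A, it can match at most once, so count=1 is only a safeguard.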

You can use a negative lookbehind for this: (?<!,).
But you'll also need to insert after the firstmatch and use backrefs.
Given the task:
- lineinfile:
    path: file.jcl
    regexp: '^(\/\/.*)(?<!,)$'
    line: "\\1\\n//*ROUTE statement"
    firstmatch: true
    backrefs: true
You would end up, from your example, with:
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID
//*ROUTE statement
//STEPNAME EXEC PGM=BPXBATCH
//STDERR DD SYSOUT=*
//STDOUT DD SYSOUT=*
//STDPARM DD *
SH cat /dev/urandom
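The same negative-lookbehind idea can be checked outside Ansible. The sketch below is not what lineinfile does internally. It is a plain Python loop, assumed for illustration, that tests each line against the pattern and inserts after the first match only.

```python
import re

# Line starts with // and the character just before end-of-line is not a comma.
NO_TRAILING_COMMA = re.compile(r"^(//.*)(?<!,)$")


def insert_after_first_match(lines, new_line):
    """Return a copy of lines with new_line inserted after the first matching line."""
    out = []
    inserted = False
    for line in lines:
        out.append(line)
        if not inserted and NO_TRAILING_COMMA.match(line):
            out.append(new_line)
            inserted = True
    return out


if __name__ == "__main__":
    sample = [
        "//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,",
        "// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),",
        "// NOTIFY=&SYSUID",
        "//STEPNAME EXEC PGM=BPXBATCH",
    ]
    print("\n".join(insert_after_first_match(sample, "//*ROUTE statement")))
```

Every later // line without a trailing comma also matches the pattern, which is why stopping at the first match matters.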

For the general case this is tougher than you think because of comments allowed within the scope of the JOB card.
//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0, <--- start of job card
// REGION=0M,MSGCLASS=R,TIME=5, LINES=(999999,WARNING),
// NOTIFY=&SYSUID <--- end of job card
The strings you show:
<--- start of job card
LINES=(999999,WARNING),
<--- end of job card
are all valid as comments in JCL because they follow a space.
You can even have whole comment lines within the JOB card. For example:
//name JOB (accounting info),'data capture ___',
//* TYPRUN=SCAN,
// NOTIFY=&SYSUID,
// CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),TIME=(5,00),
// REGION=5M
So you're not necessarily looking for the first card that doesn't end in a comma unless you can restrict the JCL you're looking at.
Your JOB card starts with //name JOB and ends just before the next //name card. Edit: as was correctly pointed out, the JOB card could be followed by a card that does not require a name field, such as // SET. See https://www.ibm.com/docs/en/zos/2.4.0?topic=statements-jcl-statement-fields
It starts with ^(\Q//\E)[A-Z0-9]+\s+\QJOB\E.+
and ends just before the next named card ^(\Q//\E)[A-Z0-9]+\s+
But I don't know regular expressions well enough to find the "just before" point to insert your new line. Hopefully someone else can add that.
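One possible way to find that "just before" point is to walk the lines instead of using a single regex. The Python sketch below uses the two patterns from this answer (without \Q...\E, which Python's re module does not support). It inherits the limitation noted above: an unnamed statement such as // SET after the JOB card is not detected. The function name and inserted line are placeholders.

```python
import re

JOB_START = re.compile(r"^//[A-Z0-9]+\s+JOB\b")   # //name JOB ...
NAMED_CARD = re.compile(r"^//[A-Z0-9]+\s+")       # any later //name card


def insert_before_next_named_card(lines, new_line):
    """Insert new_line just before the first named card that follows the JOB card."""
    out = []
    in_job = False
    inserted = False
    for line in lines:
        if not inserted:
            if not in_job:
                in_job = bool(JOB_START.match(line))
            elif NAMED_CARD.match(line):
                # Comment lines (//*) and continuations (// ...) never match
                # NAMED_CARD, so this is the first statement after the JOB card.
                out.append(new_line)
                inserted = True
        out.append(line)
    if in_job and not inserted:
        out.append(new_line)  # JOB card ran to the end of the input
    return out


if __name__ == "__main__":
    sample = [
        "//SPOOL1 JOB (UU999999999,1103),'Programmer',CLASS=0,",
        "//* TYPRUN=SCAN,",
        "// REGION=0M,MSGCLASS=R,TIME=5,",
        "// NOTIFY=&SYSUID",
        "//STEPNAME EXEC PGM=BPXBATCH",
        "//STDERR DD SYSOUT=*",
    ]
    print("\n".join(insert_before_next_named_card(sample, "//*ROUTE statement")))
```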

Regular Expression to match groups that may not exist

I'm trying to capture some data from logs in an application. The logs look like so:
*junk* [{count=240.0, state=STATE1}, {count=1.0, state=STATE2}, {count=93.0, state=STATE3}, {count=1.0, state=STATE4}, {count=1147.0, state=STATE5}, etc. ] *junk*
If the count for a particular state is ever 0, it actually won't be in the log at all, so I can't guarantee the ordering of the objects in the log (The only ordering is that they are sorted alphabetically by state name)
So, this is also a potential log:
*junk* [{count=240.0, state=STATE1}, {count=1.0, state=STATE4}, {count=1147.0, state=STATE5}, etc. ] *junk*
I'm somewhat new to using regular expressions, and I think I'm overdoing it, but this is what I've tried.
^[^=\n]*=(?:(?P<STATE1>\d+)(?=\.0,\s+\w+=STATE1))*.*?=(?P<STATE2>\d+)(?=\.0,\s+\w+=STATE2)*.*?=(?P<STATE3>\d+)(?=\.0,\s+\w+=STATE3)
The idea being that I'll look for the '=' and then look ahead to see if this is for the state that I want, which may or may not be there. Then skip all the junk after the count until the next state that I'm interested in (this is the part I believe I'm having issues with). Sometimes it matches too far and skips the state I'm interested in, giving me a bad value. If I use the lazy operator (as above), sometimes it doesn't go far enough and gets the count for a state that comes before the one I want in the log.

See if this approach works for you:
Regex: (?<=count=)\d+(?:\.\d+)?(?=, state=(STATE\d+))
Demo
Group 1 will be your state name, and the full match will be the count value.

You might use 2 capturing groups to capture the count and the state.
To capture for example STATE1, STATE2, STATE3 and STATE5, you could specify the numbers using a character class with ranges and / or an alternation.
{count=(\d+(?:\.\d+)?), state=(STATE(?:[123]|5))}
Explanation
{count= Match literally
( Capture group 1
\d+(?:\.\d+)? Match 1+ digits with an optional decimal part
) Close group
, state= Match literally
( Capture group 2
STATE(?:[123]|5) Match STATE and specify the allowed numbers
)} Close group and match }
Regex demo
If you want to match all states and digits:
{count=(\d+(?:\.\d+)?), state=(STATE\d+)}
Regex demo
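To show how the two capture groups might be consumed, here is a small Python sketch. It collects every (count, state) pair into a dictionary and fills in 0.0 for states that are absent from the log line, which is how the question describes a zero count. The helper name and the list of expected states are assumptions for illustration.

```python
import re

# Braces are escaped here to keep them unambiguous in Python's re syntax.
PAIR = re.compile(r"\{count=(\d+(?:\.\d+)?), state=(STATE\d+)\}")


def counts_by_state(line, expected_states):
    """Map each expected state to its count, using 0.0 when it is missing."""
    found = {state: float(count) for count, state in PAIR.findall(line)}
    return {state: found.get(state, 0.0) for state in expected_states}


if __name__ == "__main__":
    log = ("*junk* [{count=240.0, state=STATE1}, {count=1.0, state=STATE4}, "
           "{count=1147.0, state=STATE5}] *junk*")
    states = ["STATE1", "STATE2", "STATE3", "STATE4", "STATE5"]
    print(counts_by_state(log, states))
```

Matching each object independently sidesteps the ordering problem, since no single pattern has to span several states.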

After some experimentation, this is what I've come up with:
The answers provided here, although good answers, don't quite work if your state names don't end with a number (mine don't, I just changed them to make the question easier to read and to remove business information from the question).
Here's a completely tile-able regex where you can add on as many matches as needed
count=(?P<GROUP_NAME_HERE>\d+(?=\.0, state=STATE_NAME_HERE))?
This can be copied and appended with the new state name and group name.
Additionally, if any of the states do not appear in the string, it will still match the following states. For example:
count=(?P<G1>\d+(?=\.0, state=STATE_ONE))?(?P<G2>\d+(?=\.0, state=STATE_TWO))?(?P<G3>\d+(?=\.0, state=STATE_THREE))?
will match states STATE_ONE and STATE_THREE with named groups G1 & G3 in the following string even though STATE_TWO is missing:
[{count=55.0, state=STATE_ONE}, {count=10.0, state=STATE_THREE}]
I'm sure this could be improved, but it's fast enough for me, and with 11 groups, regex101 shows 803 steps with a time of ~1ms
Here's a regex101 playground to mess with: https://regex101.com/r/3a3iQf/1
Notice how groups 1,2,3,4,5,6,7,9, & 11 match. 8 & 10 are missing and the following groups still match.
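The tiled pattern matches once per count= occurrence, so the named groups are spread over several matches. A hedged Python sketch of merging them, assuming global matching with finditer and the three example groups above (the helper name is made up):

```python
import re

TILED = re.compile(
    r"count=(?P<G1>\d+(?=\.0, state=STATE_ONE))?"
    r"(?P<G2>\d+(?=\.0, state=STATE_TWO))?"
    r"(?P<G3>\d+(?=\.0, state=STATE_THREE))?"
)


def collect_counts(line):
    """Merge the named groups from every match into a single dict."""
    counts = {}
    for match in TILED.finditer(line):
        for name, value in match.groupdict().items():
            if value is not None:  # at most one group participates per match
                counts[name] = int(value)
    return counts


if __name__ == "__main__":
    print(collect_counts("[{count=55.0, state=STATE_ONE}, {count=10.0, state=STATE_THREE}]"))
```

Groups for states that never appear are simply absent from the result.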

would like to get a regex expression for my multi line log

I am trying to form the correct regex to capture strings out of a multi line log like -
AMQ9206: Error sending data to host hic4 (10.254.101.168)(1414).
or
AMQ9999: Channel 'TO.MQH4' to host 'HIC4(1414)' ended abnormally.
Multi-line log excerpts follow:
06/17/16 22:45:14 - Process(509640.1) User(mqsystem) Program(runmqchl)
Host(mqah103p) Installation(MQAppliance)
VRMF(8.0.0.4) QMgr(PRDCDE3A)
AMQ9206: Error sending data to host hic4 (10.254.101.168)(1414).
--------------------------- amqccita.c : 3166 ----------------------------------
06/17/16 22:45:14 - Process(509640.1) User(mqsystem) Program(runmqchl)
Host(mqah103p) Installation(MQAppliance)
VRMF(8.0.0.4) QMgr(PRDCDE3A)
AMQ9999: Channel 'TO.MQH4' to host 'HIC4(1414)' ended abnormally.

Depending on the programming language, this will be expressed slightly differently, but the main trick is to enable multi-line mode in your regex. This will allow special characters like ^ and $ to match the beginning and end of a line instead of the beginning and end of the string.
Assuming your log always has this general format of AMQ followed by four digits, the regex would be something like:
/^AMQ\d{4}: .*$/gm
Regex101 Demo
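In Python, for example, the /m flag corresponds to re.MULTILINE, and findall plays the role of /g. A minimal sketch using the excerpt from the question:

```python
import re

LOG = """06/17/16 22:45:14 - Process(509640.1) User(mqsystem) Program(runmqchl)
Host(mqah103p) Installation(MQAppliance)
VRMF(8.0.0.4) QMgr(PRDCDE3A)
AMQ9206: Error sending data to host hic4 (10.254.101.168)(1414).
--------------------------- amqccita.c : 3166 ----------------------------------
06/17/16 22:45:14 - Process(509640.1) User(mqsystem) Program(runmqchl)
Host(mqah103p) Installation(MQAppliance)
VRMF(8.0.0.4) QMgr(PRDCDE3A)
AMQ9999: Channel 'TO.MQH4' to host 'HIC4(1414)' ended abnormally.
"""

# With MULTILINE, ^ and $ anchor at every line rather than the whole string.
AMQ_LINE = re.compile(r"^AMQ\d{4}: .*$", re.MULTILINE)


def amq_messages(text):
    """Return every line that starts with an AMQnnnn: message code."""
    return AMQ_LINE.findall(text)


if __name__ == "__main__":
    for message in amq_messages(LOG):
        print(message)
```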

How to parse csv output requiring multiple matches using one-liner?

I have a scenario where I have to post-process / filter values taken out of a DB. I'm using perl -ple for the task. All works well until I come across extracted output (CSV) which contains multiple text tags. See sample here. The code (the extraction regex) works correctly if there is just one text tag, but in my DB there are instances with more than one text tag (i.e. rule conditions).
The code is
echo "COPY (SELECT rule_data FROM custom_rule) TO STDOUT with CSV HEADER" | psql -U qradar -o /tmp/Rules.csv qradar;
perl -ple '
($enabled) = /(?<=enabled="").*?(?="")/g;
($group) = /(?<=group="").*?(?="")/g;
($name) = /(?<=<name>).*?(?=<\/name>)/g;
($text) = /(?<=<text>).*?(?=<\/text>)/g;
$_= "$enabled;$group;$name;$text";
s/<.*?>//g;
' Rules.csv > rules_revised.csv
Just running the code on the sample output, I get the following content in the rules_revised.csv file:
true;Flow Property Tests;DoS: Local Flood (Other);when the flow bias
is any of the following outbound
The line is truncated after outbound; it should in fact carry on with information similar to this:
when at least 3 flows are seen with the same Source IP,
Destination IP in 5 minutes and when the IP protocol is one of the
following IPSec, Uncommon and when the source packets is greater than
60000
I tried to fix this by making the regex greedy (removing the ? in the $text pattern), but then it swallows everything between the first <text> and the last </text>, and the final s/<.*?>//g then mangles the rest, because the capture now includes all the tag characters (HTML elements) that I originally intended to exclude.

The reason you are getting a truncated result with multiple matches is that you only store the first one.
($text) = /(?<=<text>).*?(?=<\/text>)/g;
This only stores the first match. If you change that scalar to an array, you will capture all matches:
(@text) = /(?<=<text>).*?(?=<\/text>)/g;
When you interpolate the array, it will insert spaces (the value of $") between the elements. If you do not want that, you can change the value of $" to an acceptable delimiter. To be clear, you would change two characters to get the following lines:
(@text) = /(?<=<text>).*?(?=<\/text>)/g;
...
$_= "$enabled;$group;$name;@text";
If I run your code on your sample with these changes the output looks like this:
false;Flow Property Tests;DoS: Local Flood (Other);when the flow bias is any of the following outbound when at least 3 flows are seen with the same Source IP, Destination IP in 5 minutes when the IP protocol is one of the following IPSec, Uncommon when the source packets is greater than 60000
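The difference between keeping the first match and keeping all of them can also be seen in a short Python sketch. The ROW string below is invented for illustration, since the real sample is only linked from the question. It mimics a CSV cell with doubled quotes and two text tags.

```python
import re

# Hypothetical row: the real rule_data sample is not shown in the question.
ROW = (
    '<rule enabled=""true"" group=""Flow Property Tests"">'
    "<name>DoS: Local Flood (Other)</name>"
    "<text>when the flow bias is any of the following <a>outbound</a></text>"
    "<text>when the source packets is greater than <a>60000</a></text>"
    "</rule>"
)


def first_match(pattern, text):
    """Return the first match of pattern, or an empty string."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""


def summarize(row):
    """Build 'enabled;group;name;text...' and keep every <text> element."""
    enabled = first_match(r'(?<=enabled="").*?(?="")', row)
    group = first_match(r'(?<=group="").*?(?="")', row)
    name = first_match(r"(?<=<name>).*?(?=</name>)", row)
    # findall returns all matches, like assigning to an array in Perl;
    # DOTALL lets a <text> body span newlines.
    texts = re.findall(r"(?<=<text>).*?(?=</text>)", row, flags=re.DOTALL)
    line = ";".join([enabled, group, name, " ".join(texts)])
    return re.sub(r"<.*?>", "", line)  # strip leftover tags, as s/<.*?>//g does


if __name__ == "__main__":
    print(summarize(ROW))
```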

Have you tried the s modifier? It makes the dot match newlines:
perl -ple '
($enabled) = /(?<=enabled="").*?(?="")/g;
($group) = /(?<=group="").*?(?="")/g;
($name) = /(?<=<name>).*?(?=<\/name>)/g;
($text) = /(?<=<text>).*?(?=<\/text>)/gs;
#                                      ^ here (the added s modifier)
$_= "$enabled;$group;$name;$text";
s/<.*?>//g;
' Rules.csv > rules_revised.csv

Find and verify path strings in text file using PowerShell, RegEx search

First time posting here, and I'll try to be clear and detailed, but be gentle if I missed an existing answer when I searched these boards.
First, the issues:
How to exclude a RegEx response that contains a specific keyword ("fastcopy")
How to include path results that do not end in a file name/wildcard
I am working with a set of text files that are very similar to batch files. They are plain text, and contain header lines, lines containing paths to files on a server, and comment lines. Commented lines begin with a semicolon (;), so that is simple enough to rule out. The paths should all start with a variable %INSTDIR%, but they may or may not have quotes surrounding the path, and they may or may not have execution options following the path. One last note... the company uses FastCopy.exe to dump files/folders down from the network, and in such a line I would like to return the folder/file being copied instead of the path containing fastcopy.exe.
Here is a sample (kind of large to show potential issues):
[Installing .NET 3.5 Hotfix KB943326 for App1]
; *** Added NET 3.5 SP1 hotfix KB943326: resolves App1 hidden menus force laptop re-booting
1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\.NET_3.5_Hotfix_KB943326\WindowsXP-KB943326-x86-ENU.exe /quiet /norestart
[Installing Agent 5.3.1]
1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\AGenT_531_2.0\w7wxp_ze_20\install.exe
[Installing APR Manager 2.1]
1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\APRManager_21_Updated_2.0\wviwxp_ze_20\install.exe
[Installing Scope Simulator]
1 = MD "C:\Temp\scope_simulator_10"
2 = start /wait /high %INSTDIR%\ToolShare$\Site_Toolbox\Custom_Scripts\Source\fastcopy.exe /auto_close /no_confirm_del /no_confirm_stop /log=FALSE /open_window /force_start /force_close /stream=FALSE /cmd=diff "%INSTDIR%\ToolShare$\Sample_Toolbox\applications\scope_simulator_10" /to="C:\Temp\scope_simulator_10"
3 = "C:\Temp\scope_simulator_10\w7wxp_ze_10\Install.exe"
4 = RD "C:\temp\scope_simulator_10" /q /s
[Installing Log Analyzer Offline 2.6.1]
1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\Log_Analyzer_Offline_261\wxp_ze_10\install.exe
[Installing Data Migration Script]
1 = MD "C:\Temp\Data Migration"
2 = xcopy "%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\*.*" "C:\Temp\Data Migration" /y /e
3 = xcopy "%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\Data Migration.lnk" C:\DOCUME~1\ALLUSE~1\Desktop\ /Y
I have it set to pull a 'dir \\UNCPath\*.ini' and then loop through that doing a ForEach ($INI in $Results) bit. The line that I have been using inside the loop to try and pull the paths from each line is:
gc $ini|?{!($_ -match "^;") -and ($_ -match "%INST[^`"]*?\\.*(\.\w{3}|\.\*)(?=`"|\s|\Z)")}|%{$TestPath = $Matches[0].replace("%INSTDIR%","\\ServerName1");if(test-path $testpath){write-host " [OK] " -foregroundcolor Green -NoNewline}else{write-host "[Missing] " -ForegroundColor red -NoNewline};write-host "$testpath"}
This gets me almost everything I could want. What it doesn't do is get anything that does not end in either a .* or a standard 3-character extension (.exe, .cmd, .jar etc.). Plus, it kicks back the fastcopy path instead of the path that is being copied.
What I would like for results:
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\.NET_3.5_Hotfix_KB943326\WindowsXP-KB943326-x86-ENU.exe
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\AGenT_531_2.0\w7wxp_ze_20\install.exe
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\APRManager_21_Updated_2.0\wviwxp_ze_20\install.exe
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\scope_simulator_10
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\Log_Analyzer_Offline_261\wxp_ze_10\install.exe
%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\*.*
%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\Data Migration.lnk
I do not get the second result (instead I get the FastCopy path, but even if I strip Fastcopy from the line and only have the desired path it won't return it). Any suggestions are welcome.

The following script should work just fine.
$paths = Get-Content $ini | Foreach {
if ($_ -match "^(?=[^;]).*?(?<delimiter>[""' ])(?<path>%INSTDIR%(?!.*?fastcopy.exe).*?)(?:\1|$)")
{
Write-Output $Matches["path"]
}
}
The $paths variable will now contain all the paths requested. Observe that if any string contains the "fastcopy.exe" literal string anywhere in the path it will not be found by this regular expression.
An attempt at explaining the regular expression:
^ - match the start of the line
(?=[^;]) - positive lookahead verifying that the line does not start with a semicolon
.*? - any character, as few as possible (to remove all characters before the path we want to match)
(?<delimiter>["' ]) - named group capturing the character immediately before the path: a space, a quotation mark or an apostrophe.
(?<path> - start a named capturing group for capturing the "path"
%INSTDIR% - matches the literal string '%INSTDIR%'
(?!.*?fastcopy.exe) - negative lookahead verifying that the part of the line we're trying to match (which has started with %INSTDIR%) doesn't contain the word fastcopy.exe anywhere later in the string (the second time the %INSTDIR% occurs on the fastcopy line, the rest of the line does not contain the fastcopy.exe literal string).
.*? - matches any character, as few as possible, to make sure that we stop as soon as we find a matching delimiter character below
) - ends the named capturing group "path"
(?:\1|$) - matches (in a non-capturing group) the character found by the delimiter group above (to match a quotation character, apostrophe or space, depending on what character was immediately before the %INSTDIR% literal string), or the end of the line.
If anything is unclear, please add a comment below asking for clarifications.
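For readers who want to test the pattern outside PowerShell, here is a rough Python port. Python writes named groups as (?P<name>...) and refers back to them with (?P=name) instead of \1. re.IGNORECASE is added because PowerShell's -match is case-insensitive by default, and the dot in fastcopy.exe is escaped here. The fastcopy sample line is shortened, and the function name is made up.

```python
import re

PATH_RE = re.compile(
    r"""^(?=[^;]).*?(?P<delimiter>["' ])"""
    r"""(?P<path>%INSTDIR%(?!.*?fastcopy\.exe).*?)(?:(?P=delimiter)|$)""",
    re.IGNORECASE,
)


def extract_paths(lines):
    """Return the %INSTDIR% path from each non-comment line that has one."""
    paths = []
    for line in lines:
        match = PATH_RE.search(line)
        if match:
            paths.append(match.group("path"))
    return paths


if __name__ == "__main__":
    sample = [
        r"; *** Added NET 3.5 SP1 hotfix KB943326",
        r"1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\AGenT_531_2.0\w7wxp_ze_20\install.exe",
        r'1 = MD "C:\Temp\scope_simulator_10"',
        r'2 = start /wait /high %INSTDIR%\ToolShare$\Site_Toolbox\Custom_Scripts\Source\fastcopy.exe /auto_close /cmd=diff "%INSTDIR%\ToolShare$\Sample_Toolbox\applications\scope_simulator_10" /to="C:\Temp\scope_simulator_10"',
        r'2 = xcopy "%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\*.*" "C:\Temp\Data Migration" /y /e',
    ]
    for path in extract_paths(sample):
        print(path)
```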
