Match Multi line Events Using Only Starting Value - regex

I'm attempting to match events where the only way to tell where an event starts and ends is the header, i.e. the first value in the multi-line event (e.g. START--). Basically, I want to use the next header as an ending anchor to get the whole event. Also, the last event ends at the end of the file, so there's no anchor for that one. I'm not quite sure how to make this work.
Event example (there are no blank lines between the lines):
START--random stuff here
more random stuff on this new line
more stuff and things
START--some random things
additional random things
blah blah
START--data data more data
START--things
blah data
$FileContent | select-string '^START--(.*?)^START--' -AllMatches | Foreach {$_.Matches} | Foreach {$_.Value}

You may read the file into a single variable (for example, by passing the -Raw option to Get-Content) and split it at the start of every line that begins with START--, except the first one:
$contents = Get-Content 'your_file_path' -Raw
$contents -split '(?m)^(?!\A)(?=START--)'
It will yield one string per event, each beginning with START--.
Regex details
(?m) - the multiline option is ON
^ - now, it matches start of lines due to (?m)
(?!\A) - not the start of the whole string/text
(?=START--) - a location that is immediately followed by the START-- substring.
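As a minimal sketch of the split described above, the snippet below uses an inline sample string in place of the file. The sample text and the $events variable name are assumptions for illustration only; with a real file, $contents would come from Get-Content -Raw as shown earlier.

```powershell
# Hypothetical sample standing in for the file contents
$contents = @(
    'START--random stuff here'
    'more random stuff on this new line'
    'START--some random things'
    'blah blah'
    'START--data data more data'
) -join "`n"

# Split before every line that begins with START--, but not at the very start of the text
$events = $contents -split '(?m)^(?!\A)(?=START--)'

$events.Count                               # 3
$events | ForEach-Object { $_.TrimEnd() }   # each event, without its trailing newline
```

Because the split point is a zero-width position, no text is consumed: every event keeps its START-- header, and the last event simply runs to the end of the string.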

Related

How to replace lines depending on the remaining text in file using PowerShell

I need to edit txt file using PowerShell. The problem is that I need to apply changes for the string only if the remaining part of the string matches some pattern. For example, I need to change 'specific_text' to 'other_text' only if the line ends with 'pattern':
'specific_text and pattern' -> changes to 'other_text and pattern'
But if the line doesn't end with pattern, I don't need to change it:
'specific_text and something else' -> no changes
I know about Replace function in PowerShell, but as far as I know it makes simple change for all matches of the regex. There is also Select-String function, but I couldn't combine them properly. My idea was to make it this way:
((get-content myfile.txt | select-string -pattern "pattern") -Replace "specific_text", "other_text") | Out-File myfile.txt
But this call rewrites the whole file and leaves only changed lines.
You may use
(get-content myfile.txt) -replace 'specific_text(?=.*pattern$)', "other_text" | Out-File myfile.txt
The specific_text(?=.*pattern$) pattern matches
specific_text - some specific_text...
(?=.*pattern$) - a positive lookahead that requires specific_text to be followed by any 0 or more chars other than a newline, as many as possible, and then pattern at the end of the string ($).
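A small sketch of the lookahead in action, using two in-memory lines instead of myfile.txt (the $lines and $result names are assumptions for illustration):

```powershell
# Two hypothetical lines: only the first one ends with "pattern"
$lines = @(
    'specific_text and pattern'
    'specific_text and something else'
)

# -replace is applied to each array element; the lookahead only checks, it consumes nothing
$result = $lines -replace 'specific_text(?=.*pattern$)', 'other_text'
$result
# other_text and pattern
# specific_text and something else
```

Lines that do not match pass through unchanged, which is why the whole file survives. In the one-liner above, the parentheses around Get-Content make PowerShell read the entire file before Out-File opens it for writing.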

Extracting match from text file if subsequent lines contain specific strings

I'm trying to pull certain lines of data from multiple text files using a certain match of data. I have that part working (it matches on the strings that I have and pulls back the entire line). That's what I want, but I also need a certain line of data that occurs before the match (only when it matches). I also have that working, but it's not 100% right.
I have tried to accomplish pulling the line above my match by using the -Context parameter. It seems to work, but in some cases it is merging data together from multiple matches and not pulling the line above my matches. Below is a sample of one of the files that I'm searching in:
TRN*2*0000012016120500397~
STC*A3:0x9210019*20170103*U*18535********String of data here
STC*A3:0x810049*20170103*U*0********String of Data here
STC*A3:0x39393b5*20170103*U*0********String of data here
STC*A3:0x810048*20170103*U*0********String of data here
STC*A3:0x3938edc*20170103*U*0********String of data here
STC*A3:0x3938edd*20170103*U*0********String of data here
STC*A3:0x9210019*20170103*U*0********String of data here
TRN*2*0000012016120500874~
STC*A3:0x9210019*20170103*U*18535********String of data here
STC*A3:0x39393b5*20170103*U*0********String of data here
STC*A3:0x3938edc*20170103*U*0********String of data here
STC*A3:0x3938edd*20170103*U*0********String of data here
STC*A3:0x9210019*20170103*U*0********String of data here
TRN*2*0000012016120500128~
STC*A3:0x810049*20170103*U*0********String of Data here
STC*A3:0x39393b5*20170103*U*0********String of data here
STC*A3:0x810024*20170103*U*0********String of data here
STC*A3:0x9210019*20170103*U*0********String of data here
TRN*2*0000012016120500345~
STC*A3:0x9210019*20170103*U*18535********String of data here
STC*A3:0x810049*20170103*U*0********String of Data here
STC*A3:0x39393b5*20170103*U*0********String of data here
STC*A3:0x3938edc*20170103*U*0********String of data here
TRN*2*0000012016120500500~
STC*A3:0x810048*20170103*U*18535********String of data here
TRN*2*0000012016120500345~
STC*A3:0x810049*20170103*U*18535********String of data here
I'm trying to pull the TRN*2 line only when the lines below each TRN*2 have STC*A3:0x810024 and STC*A3:0x810048 in them, but again getting inconsistent results.
Is there a way that I could search for the TRN*2 line and pull the TRN*2 and the lines below it that contain STC*A3:0x810024 and STC*A3:0x810048? If the lines below the TRN*2 line do not contain STC*A3:0x810024 and STC*A3:0x810048, then don't pull anything.
Here is my code so far:
$FilePath = "C:\Data\2017"
$files = Get-ChildItem -Path $FilePath -Recurse -Include *.277CA_UNWRAPPED
foreach ($file in $files) {
(Get-Content $file) |
Select-String -Pattern "STC*A3:0x810024","STC*A3:0x810048" -SimpleMatch -Context 1,0 |
Out-File -Append -Width 512 $FilePath\Output\test_results.txt
}
Your approach won't work, because you're selecting lines that contain STC*A3:0x810024 or STC*A3:0x810048 and the line before them. However, the preceding lines don't necessarily start with TRN. Even if they did, the statement would still produce TRN lines that are followed by any of the STC strings, not just lines that are followed by both STC strings.
What you actually want is to split each file before the lines starting with TRN, and then check whether each fragment contains both STC strings.
(Get-Content $file | Out-String) -split '(?m)^(?=TRN\*2)' | Where-Object {
    $_.Contains('STC*A3:0x810024') -and
    $_.Contains('STC*A3:0x810048')
} | ForEach-Object {
    ($_ -split '\r?\n')[0]   # get just the 1st line from each fragment
} | Out-File -Append "$FilePath\Output\test_results.txt"
(?m)^(?=TRN\*2) is a regular expression matching the beginning of a line followed by the string "TRN*2". The (?=...) is a so-called positive lookahead assertion. It ensures that the "TRN*2" is not removed when splitting the string. (?m) is a modifier that makes ^ match the beginning of a line inside a multiline string rather than just the beginning of the string.
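To see the split-then-filter idea on something small, here is a sketch with a shortened, made-up sample in the same shape as the file from the question. The sample data and the $text and $matching names are assumptions; nothing is written to disk.

```powershell
# Hypothetical sample: three TRN fragments, only the 1st and 3rd contain both STC codes
$text = @(
    'TRN*2*0000012016120500397~'
    'STC*A3:0x810024*20170103*U*0'
    'STC*A3:0x810048*20170103*U*0'
    'TRN*2*0000012016120500874~'
    'STC*A3:0x810024*20170103*U*0'
    'TRN*2*0000012016120500128~'
    'STC*A3:0x810048*20170103*U*0'
    'STC*A3:0x9210019*20170103*U*0'
    'STC*A3:0x810024*20170103*U*0'
) -join "`r`n"

$matching = $text -split '(?m)^(?=TRN\*2)' |
    Where-Object { $_.Contains('STC*A3:0x810024') -and $_.Contains('STC*A3:0x810048') } |
    ForEach-Object { ($_ -split '\r?\n')[0] }   # keep only the TRN line of each fragment

$matching
# TRN*2*0000012016120500397~
# TRN*2*0000012016120500128~
```

The second fragment is dropped because it contains only one of the two codes. The order of the STC lines inside a fragment does not matter, since Contains() is checked against the whole fragment.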

Parse log file for lines containing 2 strings and the lines in between

I am trying to parse some large log files to detect occurrences of a coding bug. Identifying the defect is finding a sequence of strings on different lines with a date in between. I am terrible at describing things so posting an example:
<Result xmlns="">
<Failure exceptionClass="processing" exceptionDetail="State_Open::Buffer Failed - none">
<SystemID>ffds[sid=EPS_FFDS, 50] Version:01.00.00</SystemID>
<Description>Lo
ck Server failed </Description>
</Failure>
</Result>
</BufferReply>
7/22/2017 8:41:15 AM | SomeServer | Information | ResponseProcessing.TreatEPSResponse() is going to process a response or event. Response.ServiceID [Server_06] Response.Response [com.schema.fcc.ffds.BufferReply]
I will be searching for multiple instances of this sequence through multiple logs: Buffer Failed on followed by Server_#.
The Server_# can be any 2-digit number and will never be on the same line.
Buffer failed will never repeat prior to Server_# being found.
The date and time are in between; I'm guessing that, if this is possible, they would be captured as well.
Ideally, I would pipe something like this to another file
Buffer Failed - none" 7/22/2017 8:41:15 AM [Server_06]
I have attempted a few things like
Select-String 'Failed - none(.*?)Response.Response' -AllMatches
but it doesn't seem to work across lines.
Select-String can only match text spanning multiple lines if it receives the input as a single string. Plus, . normally matches any character except line feeds (\n). If you want it to match line feeds as well you must prefix your regular expression with the modifier (?s). Otherwise you need an expression that does include line feeds, e.g. [\s\S] or (.|\n).
It might also be advisable to anchor the match at exceptionDetail rather than the actual detail text, because that makes the match more flexible.
Something like this should give you the result you're looking for:
$re = '(?s)exceptionDetail="(.*?)".*?(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M).*?\[(.*?)\] Response\.Response'
... | Out-String |
Select-String -Pattern $re -AllMatches |
Select -Expand Matches |
ForEach-Object { '{0} {1} [{2}]' -f $_.Groups[1..3] }
The expression uses non-greedy matches and 3 capturing groups for extracting exception detail, timestamp and servername.
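The following sketch runs that expression against a shortened, hypothetical log excerpt that has already been joined into one string (the $log and $found names and the trimmed sample are assumptions). It reads each group's Value explicitly, which should behave the same as formatting the Group objects directly.

```powershell
# Hypothetical excerpt, joined into ONE string so the regex can span lines
$log = @(
    '<Failure exceptionClass="processing" exceptionDetail="State_Open::Buffer Failed - none">'
    '<Description>Lock Server failed </Description>'
    '</Failure>'
    '7/22/2017 8:41:15 AM | SomeServer | Information | Response.ServiceID [Server_06] Response.Response [com.schema.fcc.ffds.BufferReply]'
) -join "`n"

$re = '(?s)exceptionDetail="(.*?)".*?(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M).*?\[(.*?)\] Response\.Response'

$found = $log | Select-String -Pattern $re -AllMatches |
    Select-Object -ExpandProperty Matches |
    ForEach-Object { '{0} {1} [{2}]' -f $_.Groups[1].Value, $_.Groups[2].Value, $_.Groups[3].Value }

$found
# State_Open::Buffer Failed - none 7/22/2017 8:41:15 AM [Server_06]
```

If the same text were piped in line by line (for example, straight from Get-Content without Out-String or -Raw), no single input string would contain both the exceptionDetail attribute and the timestamp, so nothing would match.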

Match content between IF code block

I'm trying to ensure that some stored procedures do not have a RETURN statement, except for the last one. For this task I'm trying to use regular expressions in my PowerShell scripts.
My strategy is to check every IF @@ERROR<>0 block and perform another search inside of them. How can I match content of every IF block of this SQL query?
--Some code here...
IF @@ERROR<>0
BEGIN
RAISERROR('MY ERROR HERE. %s',1,16,@STORE_PROCEDURE_NAME)
GOTO ROLL
END
--More code here...
IF @@ERROR<>0
BEGIN
RAISERROR('ANOTHER ERROR HERE. %s',16,1,@STORE_PROCEDURE_NAME)
GOTO ROLL
END
--more code here...
IF @@ERROR<>0
BEGIN
RAISERROR('MAIN ERROR %s',16,1,@STORE_PROCEDURE_NAME)
ROLL:
ROLLBACK TRANSACTION
RETURN
END
COMMIT TRANSACTION
Use Select-String with a regular expression like this:
IF .*\s+BEGIN([\s\S]*?)END
and select just the groups from the result:
... | Select-String 'IF .*\s+BEGIN([\s\S]*?)END' -AllMatches |
Select-Object -Expand Matches |
Select-Object -Expand Groups |
Where-Object { -not $_.Groups } |
Select-Object -Expand Value
The regular expression matches the keyword IF followed by a space and optional other text on the same line, one or more whitespace characters, the keyword BEGIN, and the shortest amount of text (non-greedy match) up to the next occurrence of the keyword END. The subexpression between BEGIN and END is grouped with parentheses so it can be extracted from the full match.
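As a rough sketch of how the extracted blocks could then be checked for RETURN, the snippet below runs the same pattern over a shortened, hypothetical SQL string. The sample text and the $sql and $blocks names are assumptions, and it reads capture group 1 directly through Groups[1].Value instead of filtering the Groups collection as above.

```powershell
# Hypothetical, shortened SQL text; a real script would be read as one string first
$sql = @(
    'IF @@ERROR<>0'
    'BEGIN'
    "    RAISERROR('FIRST ERROR',16,1)"
    '    GOTO ROLL'
    'END'
    'IF @@ERROR<>0'
    'BEGIN'
    "    RAISERROR('MAIN ERROR',16,1)"
    '    RETURN'
    'END'
) -join "`n"

# Body of every IF ... BEGIN ... END block (capture group 1 of each match)
$blocks = $sql | Select-String 'IF .*\s+BEGIN([\s\S]*?)END' -AllMatches |
    Select-Object -ExpandProperty Matches |
    ForEach-Object { $_.Groups[1].Value }

$blocks.Count                                                # 2
@($blocks | Where-Object { $_ -match '\bRETURN\b' }).Count   # 1
```

Select-String is case-insensitive by default, so the lazy match stops at the first "end" it meets, including one inside a longer word; adding word boundaries (\bEND\b) is one way to tighten that.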
With the following regex, /(If @@ERROR<>0\s+BEGIN)(.*?)(end)/sig, I can get the content between the IF and the END. Please note that it only matches the nearest END, so for some blocks it actually matches the END of a nested BEGIN.

PowerShell - Replace contents of every tag using REGEX

I am trying to write a PowerShell script to replace the contents of tags I have put into an XML file. The tags appear multiple times within the XML, which results in everything between the first and last tag being replaced, because the match does not stop the first time the end tag is found.
I am using this:
$NewFile = (Get-Content .\myXML_file.xml) -join "" | foreach{$_ -replace "<!--MyCustom-StartTag-->(.*)<!--MyCustom-EndTag-->","<!--MyCustom-StartTag-->New Contents of Tag<!--MyCustom-EndTag-->"};
Set-Content .\newXMLfile.xml $newfile;
The file has contents like:
<!--MyCustom-StartTag-->
Lots of content
<!--MyCustom-EndTag-->
More stuff here
<!--MyCustom-StartTag-->
Lots of content
<!--MyCustom-EndTag-->
And I am ending up with:
<!--MyCustom-StartTag-->
New Content Here
<!--MyCustom-EndTag-->
Instead of:
<!--MyCustom-StartTag-->
New content
<!--MyCustom-EndTag-->
More stuff here
<!--MyCustom-StartTag-->
New content
<!--MyCustom-EndTag-->
I have tried using (?!MyCustom-StartTag), but that doesn't work either.
Any ideas on what I should do to get this to work?
Thanks,
Richard
You should use the non-greedy version of *, namely *?. For more info, see: http://www.dijksterhuis.org/csharp-regular-expression-operator-cheat-sheet/ (PowerShell uses the same regex engine as C#).
$NewFile = (Get-Content .\myXML_file.xml) -join "" | foreach{$_ -replace "<!--MyCustom-StartTag-->(.*?)<!--MyCustom-EndTag-->","<!--MyCustom-StartTag-->New Contents of Tag<!--MyCustom-EndTag-->"};
Set-Content .\newXMLfile.xml $newfile;
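The difference between the greedy and the non-greedy quantifier can be seen on a short in-memory string. This is only a sketch: the $open, $close, $text, $greedy and $lazy names and the sample content are assumptions, and the text is a single line, as it would be after -join "".

```powershell
$open  = '<!--MyCustom-StartTag-->'
$close = '<!--MyCustom-EndTag-->'

# Hypothetical single-line content with two tagged sections
$text = "${open}old A${close}More stuff here${open}old B${close}"

# Greedy: .* runs to the LAST end tag, so both sections collapse into one
$greedy = $text -replace "$open(.*)$close", "${open}New${close}"

# Non-greedy: .*? stops at the NEAREST end tag, so each section is replaced separately
$lazy = $text -replace "$open(.*?)$close", "${open}New${close}"

$greedy   # <!--MyCustom-StartTag-->New<!--MyCustom-EndTag-->
$lazy     # <!--MyCustom-StartTag-->New<!--MyCustom-EndTag-->More stuff here<!--MyCustom-StartTag-->New<!--MyCustom-EndTag-->
```

Joining the lines with "" removes the newlines, which is why . can cross what used to be line boundaries here. If the file were read with its newlines intact, the pattern would also need (?s) or [\s\S] for the content between the tags.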
I think the reason that you are left with just a single pair of start and end tags is that your pattern can pair the tags in three ways:
The first pair of start and end.
The second pair of start and end.
The start tag from the first pair and the end tag from the second pair. Because .* is greedy, this longest candidate is the one that actually matches, so everything between the first and last tag is replaced with the new value.
So in your "(.*)" you might have to exclude any other start and end tags?