PowerShell Regex Pattern for EDI

New to PowerShell/Regex/EDI. Please, no comments on why this shouldn't be done with regular expressions for EDI; I've seen the warnings, but have no choice.
What I need is most likely basic, but I need some help. I have to find all instances of a segment, and retrieve a specific element value from it. The text being searched will be read as one long string, with no CR/LF/etc.
Example data:
~SV1*HC:V2020*35*UN*1***1~DTP*472*D8*20120807~REF*6R*
~SV1*HC:V2100:LT*28.98*UN*1***1~DTP*472*D8*20120807~REF*6R*
~SV1*HC:92014*165*UN*1***1~DTP*472*D8*20120716~REF*6R*
I'm using the following command on another segment and it works just as I want, but it doesn't have to account for non-word characters:
Select-String -pattern '~svd\*\w+\*(\d+|\d+\.\d+)\*' -input $string -AllMatches | %{$_.Matches} | %{$_.Groups[1]} | %{$_.Value}
Ideally, I would like to find an instance of "~SV1*", skip to the next asterisk, then read everything through the next asterisk. That way it doesn't matter what letter/number/character is in there, it skips it. In the data example above, I would want a return of 35, 28.98, 165. If not, then I can work with what I have, but matching on combinations of word/non-word characters is throwing me, since I don't know what order they may exist in. Everything else I've played with has continued on pulling the rest of the string, and not stopping properly.
If I can get it to do this, I'd be very happy:
~SV1*<skip this>*<get this>*<skip to next SV1>~SV1*<skip this>*<get this>*<skip to next SV1>
Lastly, the data being pulled is a money field, so it may or may not have a decimal present. If there is a cleaner way than (\d+|\d+\.\d+), I'm all for it.
Thanks

A starting point, but you need to test it:
Select-String -pattern '(?<=~sv\d\*.*\*)(\d*\.?\d+)(?=\*un)' -input $string -AllMatches | %{$_.Matches} | % {$_.Groups[1]} | %{$_.Value}
Using your example data returns 35, 28.98, 165.

Use a pattern like this:
~sv\d\*[^*]*\*([^*]*)\*
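As a minimal sketch of how that pattern plugs into the pipeline from the question: the sample string below is the question's three example rows joined into one string (an assumption about how the real data arrives), and $amounts is a hypothetical variable name. Because [^*]* can never run past an asterisk, each capture stops at the end of its own element.

```powershell
# The question's example data, joined into one long string with no line breaks.
$string = '~SV1*HC:V2020*35*UN*1***1~DTP*472*D8*20120807~REF*6R*' +
          '~SV1*HC:V2100:LT*28.98*UN*1***1~DTP*472*D8*20120807~REF*6R*' +
          '~SV1*HC:92014*165*UN*1***1~DTP*472*D8*20120716~REF*6R*'

# Skip the first element after the segment ID, capture the second one.
$amounts = Select-String -Pattern '~sv\d\*[^*]*\*([^*]*)\*' -InputObject $string -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

$amounts
```

If the captured element should be validated as a money value rather than taken as-is, \d+(?:\.\d+)? is one more compact way to write "digits with an optional decimal part".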

Related

Understanding access of object attributes in Powershell scripting

Firstly I'm trying to understand this. Second I would like to use it.
# test string
$pgNumString = 'C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt'
# Regex with capture group for number '1' ONLY from $pgNumString
# In other use cases it may be page 10 or any page in 100s
$pgNumRegex = "(?s)_(\d+)\."
# Simplest - not using -SimpleMatch because this example uses regex (Select-String docs)
$pgNum = $pgNumString | Select-String -Pattern $pgNumRegex -AllMatches
The match is not assigned to $pgNum. No capture grouping means no good anyway. A slightly more sophisticated attempt:
$pgNum = $pgNumString | Select-String -Pattern $pgNumRegex -AllMatches | Select-Object {$_.Matches.Groups[1].Value}
Output:
$_.Matches.Groups[1].Value
--------------------------
1
The match is still not assigned to $pgNum. But the output shows I'm on the right track. What am I doing wrong?
Especially if you're dealing with strings already in memory, but often also with files (except if they're exceptionally large), use of Select-String isn't necessary and both slows down and complicates the solution, as your example shows.
While -match works in principle too - to focus on matching only what should be extracted - it is limited to one match, whose results are reflected in the automatic $Matches variable.
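For the single-path case, a minimal sketch of the -match route could look like this. It reuses the question's test string with a simplified version of its regex; $path and $pgNum are illustrative names.

```powershell
# The test string from the question.
$path = 'C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt'

# -match returns $true or $false; on success, the capture groups of that
# one match are available in the automatic $Matches variable.
if ($path -match '_(\d+)\.') {
    $pgNum = $Matches[1]
}

$pgNum
```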
However, you can make direct use of an underlying .NET API, namely [regex]::Matches().
# Sample input.
$pgNumString = @'
C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt
C:\test\test6\AALTONEN-ALLAN_PENCARROW_PAGE_42.txt
'@
# -> '1', '42'
# Note: To match PowerShell's case-*insensitive* behavior (not relevant here), use:
# [regex]::Matches($pgNumString, '(?<=_)\d+(?=\.)', 'IgnoreCase').Value
[regex]::Matches($pgNumString, '(?<=_)\d+(?=\.)').Value
As an aside:
Bringing the functionality of [regex]::Matches() natively to PowerShell in the future, in the form of a -matchall operator, is the subject of GitHub issue #7867.
Note that I've modified your regex to use look-around assertions so that what it captures consists solely of the substring to extract, reflected in the .Value property.
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Using your original approach requires extra work to extract the capture-group values, with the help of the intrinsic .ForEach() method:
[regex]::Matches($pgNumString, '_(\d+)\.').ForEach({ $_.Groups[1].Value })
As for what you tried:
As Santiago notes, you need to use ForEach-Object instead of Select-Object, but there's an additional requirement:
Given your use of -AllMatches, you need to access .Groups[1].Value on each of the matches reported in .Matches, otherwise you'll only get the first match's capture-group value:
$pgNumString |
Select-String -Pattern $pgNumRegex -AllMatches |
ForEach-Object { $_.Matches.ForEach({ $_.Groups[1].Value }) }
As an aside:
Making Select-String only return the matching parts of the input lines / strings, via an -OnlyMatching switch is a green-lit future enhancement - see GitHub issue #7712
While this wouldn't directly help with capture groups, it is usually possible to reformulate regexes with look-around assertions, as shown with [regex]::Matches() above.

Searching for a matching map name in Powershell

Sorry, I really don't know how to ask this question properly.
I would like to parse CS:GO demo files in PowerShell, and I would like to retrieve the map name from them.
I'm opening the .dem files like this:
Get-Content $demo | Select -First 1 | Select-String -Pattern 'de_'
And i get this as response:
HL2DEMO đ5 MatchServer I.
GOTV Demo
de_mirage
csgo
##A g uÔ ~ř˙˙
ą Vđk (8wEÄü€ŢMĐhZăU X#`śh u <zcsgo‚ de_mirageŠ ’sky_dustšGOTV¨ ° ¸  ( 0 ž
I would like to get only the de_mirage as a variable. So if a map changes, then it will be de_dust2 or de_inferno and so on. Does anybody know a solution for this?
Thank you!
When using Get-Content, each line is passed down the pipeline one at a time, unless specifying the -Raw switch. The reason I bring this up is due to your Select cmdlet that you're piping to. When you specified the parameter of -First, with a value of 1, you're only grabbing the first line, and then trying to find the pattern in the first line.
Here's my poor attempt at RegEx:
Get-Content -Path $demo | Where-Object -FilterScript { $_ -match 'de_\w+' }
$Matches[0]
. . .where the $Matches automatic variable holds the result of the most recent successful -match as a hashtable: index 0 is the full matched text, and any capture groups follow under 1, 2, and so on. Because Where-Object tests each line in turn, $Matches[0] reflects the last line that matched. Piping to Select-String with a pattern, as you had done, would also work.
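One thing to watch with binary input is that \w also matches non-ASCII letters, so in the output shown in the question it could pick up the stray character right after de_mirage. A rough sketch of a narrower match follows, assuming map names consist only of lowercase ASCII letters, digits and underscores (that character set is an assumption, not something the demo format is confirmed to guarantee). $header is a hypothetical stand-in for the file content; with a real file it would come from Get-Content -Raw $demo.

```powershell
# Hypothetical stand-in for part of the demo header.
# [char]0x0160 is the stray letter seen directly after the map name in the question.
$header = 'zcsgo de_mirage' + [char]0x0160 + ' sky_dust'

# [regex]::Match is case-sensitive by default and returns only the first match.
# Assumption: map names are lowercase ASCII letters, digits and underscores.
$mapName = [regex]::Match($header, 'de_[a-z0-9_]+').Value

$mapName
```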

Parse log file for lines containing 2 strings and the lines inbetween

I am trying to parse some large log files to detect occurrences of a coding bug. Identifying the defect is finding a sequence of strings on different lines with a date in between. I am terrible at describing things so posting an example:
<Result xmlns="">
<Failure exceptionClass="processing" exceptionDetail="State_Open::Buffer Failed - none">
<SystemID>ffds[sid=EPS_FFDS, 50] Version:01.00.00</SystemID>
<Description>Lo
ck Server failed </Description>
</Failure>
</Result>
</BufferReply>
7/22/2017 8:41:15 AM | SomeServer | Information | ResponseProcessing.TreatEPSResponse() is going to process a response or event. Response.ServiceID [Server_06] Response.Response [com.schema.fcc.ffds.BufferReply]
I will be searching for multiple instances of this sequence through multiple logs: Buffer Failed on followed by Server_#.
The Server_# can be any 2-digit number and will never be on the same line.
Buffer failed will never repeat prior to Server_# being found.
The date and time are in between; I'm guessing that, if this is possible, they would be captured too.
Ideally, I would pipe something like this to another file
Buffer Failed - none" 7/22/2017 8:41:15 AM [Server_06]
I have attempted a few things like
Select-String 'Failed - none(.*?)Response.Response' -AllMatches
but it doesn't seem to work across lines.
Select-String can only match text spanning multiple lines if it receives the input as a single string. Plus, . normally matches any character except line feeds (\n). If you want it to match line feeds as well you must prefix your regular expression with the modifier (?s). Otherwise you need an expression that does include line feeds, e.g. [\s\S] or (.|\n).
It might also be advisable to anchor the match at exceptionDetail rather than the actual detail, because that makes the match more flexible.
Something like this should give you the result you're looking for:
$re = '(?s)exceptionDetail="(.*?)".*?(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M).*?\[(.*?)\] Response\.Response'
... | Out-String |
Select-String -Pattern $re -AllMatches |
Select -Expand Matches |
ForEach-Object { '{0} {1} [{2}]' -f $_.Groups[1..3] }
The expression uses non-greedy matches and 3 capturing groups for extracting exception detail, timestamp and servername.
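To try the expression without a log file at hand, here is a small self-contained sketch. $log is a hypothetical sample trimmed down from the question's excerpt and held in a single string (which is what Out-String or Get-Content -Raw would provide), and the groups are passed to -f one by one purely for readability.

```powershell
# Hypothetical sample, shortened from the log excerpt in the question.
$log = @'
<Failure exceptionClass="processing" exceptionDetail="State_Open::Buffer Failed - none">
</Failure>
7/22/2017 8:41:15 AM | SomeServer | Information | Response.ServiceID [Server_06] Response.Response [com.schema.fcc.ffds.BufferReply]
'@

$re = '(?s)exceptionDetail="(.*?)".*?(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M).*?\[(.*?)\] Response\.Response'

# One output line per occurrence: detail, timestamp, [server].
$result = $log |
    Select-String -Pattern $re -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { '{0} {1} [{2}]' -f $_.Groups[1].Value, $_.Groups[2].Value, $_.Groups[3].Value }

$result
```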

Update a line in the AD info field

I have a slight problem.
We have a PowerShell script that sets an expiration date in the 'Notes:' field in AD.
What I want to do is to be able to remove/update this without removing other data in the field.
Example of 'Notes:' field (for ie. user X):
GR1234567890 expires on 20251125
END
I use the following code to try to isolate everything except the line starting with GR:
$UserName = Get-ADUser -Filter {SAMAccountName -eq "X"} -Properties Info
$UserName.Info | Select-String -Pattern 'GR[\s\S].+' -NotMatch
I get a "full match" and no output at all.
And if i remove -NotMatch i get a full match and full output of 'Notes:' field.
I've tried the RegEx in some of the online RegEx testers out there, and there it works as expected. It is as if there are no LF/CR characters, or some weird encoding, in the output when traversing the pipeline...
I could match GR, a date and everything in between, I guess... but for knowledge's sake I'd like to know whether the above approach is impossible or just wrong (RegEx is not my strongest suit).
The problem was indeed the code itself, as pointed out by Wiktor. Hats off to him.
$UserName.Info -Replace '^GR.+'
Will remove the line i want removed.
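Note that without further options ^ only matches at the very start of the string, so the command above relies on the GR line being the first line of the field. A hedged variation for the case where it sits further down is sketched here: $info is a hypothetical stand-in for $UserName.Info, and the (?m) option makes ^ match at the start of every line.

```powershell
# Hypothetical field content with the GR line in the middle.
$info = "Some other note`r`nGR1234567890 expires on 20251125`r`nEND"

# (?m): ^ matches at each line start; the trailing \r?\n? also removes
# the line break so that no empty line is left behind.
$cleaned = $info -replace '(?m)^GR[^\r\n]*\r?\n?'

$cleaned
```

The result could then be written back with Set-ADUser, which is not shown here.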

PowerShell Select-String regular expression to locate two strings on the same line

How do I use Select-String cmdlet to search a text file for a string which starts with a specific string, then contains random text and has another specific string towards the end of the line? I'm only interested in matches across a single line in the text file, not across the entire file.
For example I am searching to match both 'Set-QADUser' and 'WhatIf' on the same line in the file. And my example file contains the following line:
Set-QADUser -Identity $($c.ObjectGUID) -ObjectAttributes @{extensionattribute7=$ekdvalue} -WhatIf | Out-Null
How do I use Select-String along with a Regular Expression to locate the pattern in question? I tried using the following and it does work but it also matches other instances of either 'Set-QADUser' or 'WhatIf' found elsewhere in the text file and I only want to match instances when both search strings are found on the same line.
Select-String -path "test.ps1" -Pattern "Set-QADUser.*WhatIf" | Select Matches,LineNumber
To make this more complicated I actually want to perform this search from within the script file that is being searched. Effectively this is used to warn the user that the script being run is currently set to 'WhatIf' mode for testing. But of course the regEx matches the text from the actual Select-String cmd within the script when it's run - so it finds multiple matches and I can't figure out a very good way to overcome that issue. So far this is what I've got:
#Warn user about 'WhatIf' if detected
$line=Select-String -path $myinvocation.mycommand.name -Pattern "Set-QADUser.*WhatIf" | Select Matches,LineNumber
If ($line.Count -gt 1)
{
Write-Host "******* Warning ******"
Write-Host "Script is currently in 'WhatIf' mode; to make changes please remove '-WhatIf' parameter at line no. $($line[1].LineNumber)"
}
I'm sure there must be a better way to do this. Hope somebody can help.
Thanks
If you use the -Quiet switch on Select-String it will just return a boolean True/False, depending on whether it found a match or not.
-Quiet <SwitchParameter>
    Returns a Boolean value (true or false), instead of a MatchInfo object. The value is "true" if the pattern is found; otherwise, the value is "false".

    Required?                    false
    Position?                    named
    Default value                Returns matches
    Accept pipeline input?       false
    Accept wildcard characters?  false
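A rough sketch of how -Quiet might be combined with a pattern that does not match the detection line itself follows. The idea, which is an assumption layered on top of this answer, is to require whitespace directly before -WhatIf: in the pattern's own source text the characters before -WhatIf are \s, not whitespace, so the Select-String line is not counted. The two temporary files and all variable names are hypothetical stand-ins for the real script. Depending on the PowerShell version, a non-match may come back as $false or as nothing at all, so the result is cast to [bool].

```powershell
$pattern  = 'Set-QADUser.*\s-WhatIf'

# Hypothetical script lines: the detection line itself and the real call.
$detector = 'if (Select-String -Path $PSCommandPath -Pattern ''Set-QADUser.*\s-WhatIf'' -Quiet) { Write-Warning "WhatIf mode" }'
$call     = 'Set-QADUser -Identity $id -ObjectAttributes @{extensionattribute7=$v} -WhatIf | Out-Null'

# Two stand-in scripts: one still in WhatIf mode, one with the switch removed.
$withWhatIf    = (New-TemporaryFile).FullName
$withoutWhatIf = (New-TemporaryFile).FullName
Set-Content -Path $withWhatIf    -Value $detector, $call
Set-Content -Path $withoutWhatIf -Value $detector, ($call -replace ' -WhatIf')

$foundWith    = [bool](Select-String -Path $withWhatIf    -Pattern $pattern -Quiet)
$foundWithout = [bool](Select-String -Path $withoutWhatIf -Pattern $pattern -Quiet)

Remove-Item -Path $withWhatIf, $withoutWhatIf

"WhatIf present: $foundWith; WhatIf removed: $foundWithout"
```

Since -Quiet does not report a line number, the warning message would either have to drop it or fall back to the MatchInfo objects for that detail.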