Regular Expression Remove large portions of string


I am looking to extract the name value from the output returned by this line:
Gwmi win32_groupuser -computer $env:computername | ? {$_.groupcomponent -like '*"Administrators"'} | Select -Property PartComponent
I assume a regular expression is the right tool for trimming the string, but I am unfamiliar with how they work and have not yet found instructions sufficient to get this done.
For reference, the output is something like...
\\My_Machine\root\cimv2:Win32_UserAccount.Domain="My_Machine",Name="Administrator"
And I would like to extract 'Administrator' from that output.
Right now I'm trying...
$Report = Gwmi win32_groupuser -computer $env:computername | ? {$_.groupcomponent -like '*"Administrators"'} | Select -Property PartComponent
$Report | ForEach-Object {$_.PartComponent -match '(?<=Name=")[^"]+[^"]'
[PSCustomObject]@{Resultant_String=$Matches.Values}}
and I'm getting...
Resultant_String
{Administrator}
True
{admin}
True
{GroupName}
True
{UserName}
True
{CiscoHistRprtUsr}

Try something like this: (?<=Name=")[^"]+[^"]. It uses a positive lookbehind for Name=", then matches one or more characters that are not ", followed by one more character that is not ". This is a little more robust, since it still works if Name= is not the last element in your string. If Name= is, say, the first value returned, it should still capture only what is in the quotes directly following it, and not everything else up to the last ".
Using your test example, I did some testing here.

If Name is always at the end of the string, you can simply use
([^"]*)"$
Explanation here : http://regex101.com/r/yV3uD6

Two solutions:
(?<=Name=")[^"]*?(?=")
Note that the ? after the * is important here. It makes the quantifier non-greedy (a fancy way of saying that it captures as few characters as possible and never goes beyond the closing ").
(?:Name=")([^"]*?)(?:")
Since the first and the last groups are non-capturing, you just have to retrieve the value of the second group (which is actually the first and only one in terms of capture) with something like \1 or $1.
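As a rough PowerShell sketch of both solutions, using the sample string from the question (the variable names are made up for illustration): the look-around version returns the name as the whole match, while the capture-group version reads it from group 1 in the automatic $Matches variable.

```powershell
# Sample value copied from the question.
$sample = '\\My_Machine\root\cimv2:Win32_UserAccount.Domain="My_Machine",Name="Administrator"'

# Solution 1: look-around assertions, so the whole match is just the name.
$viaLookaround = [regex]::Match($sample, '(?<=Name=")[^"]*?(?=")').Value

# Solution 2: one capture group; after a successful -match, $Matches[1] holds it.
$viaGroup = if ($sample -match 'Name="([^"]*?)"') { $Matches[1] }

$viaLookaround
$viaGroup
```

Both variables should end up holding Administrator. Note that -match is case-insensitive by default, whereas [regex]::Match() is case-sensitive unless told otherwise.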

Ultimately I ended up with this.
$FileServer = "My_File_Server"
$LocalHostName = $env:computername
$OutPutPath = "\\$FileServer\system_information$\"
$GetAdmins = Gwmi win32_groupuser -computer $LocalHostName | ? {$_.groupcomponent -like '*"Administrators"'} | Select -Property PartComponent
# $ReportName is not set in this snippet; it is assumed to be defined elsewhere.
$GetAdmins | % {if ($_.PartComponent -match 'Name="(.+)"') {[PSCustomObject]@{Name=$Matches[1]}}} | Export-Csv -Path "$OutPutPath\$LocalHostName\$ReportName.csv" -NoTypeInformation
It produces a table with a name header that has the username of all the local admins extracted from the returned string.

Related

Understanding access of object attributes in Powershell scripting

Firstly I'm trying to understand this. Second I would like to use it.
# test string
$pgNumString = 'C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt'
# Regex with capture group for number '1' ONLY from $pgNumString
# In other use cases it may be page 10 or any page in 100s
$pgNumRegex = "(?s)_(\d+)\."
# Simplest - not using -SimpleMatch because this example uses regex (Select-String docs)
$pgNum = $pgNumString | Select-String -Pattern $pgNumRegex -AllMatches
The match is not assigned to $pgNum. No capture grouping means no good anyway. A slightly more sophisticated attempt:
$pgNum = $pgNumString | Select-String -Pattern $pgNumRegex -AllMatches | Select-Object {$_.Matches.Groups[1].Value}
Output:
$_.Matches.Groups[1].Value
--------------------------
1
The match is still not assigned to $pgNum. But the output shows I'm on the right track. What am I doing wrong?

Especially if you're dealing with strings already in memory, but often also with files (except if they're exceptionally large), use of Select-String isn't necessary and both slows down and complicates the solution, as your example shows.
While -match works in principle too - to focus on matching only what should be extracted - it is limited to one match, whose results are reflected in the automatic $Matches variable.
However, you can make direct use of an underlying .NET API, namely [regex]::Matches().
# Sample input.
$pgNumString = @'
C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt
C:\test\test6\AALTONEN-ALLAN_PENCARROW_PAGE_42.txt
'@
# -> '1', '42'
# Note: To match PowerShell's case-*insensitive* behavior (not relevant here), use:
# [regex]::Matches($pgNumString, '(?<=_)\d+(?=\.)', 'IgnoreCase').Value
[regex]::Matches($pgNumString, '(?<=_)\d+(?=\.)').Value
As an aside:
Bringing the functionality of [regex]::Matches() natively to PowerShell in the future, in the form of a -matchall operator, is the subject of GitHub issue #7867.
Note that I've modified your regex to use look-around assertions so that what it captures consists solely of the substring to extract, reflected in the .Value property.
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Using your original approach requires extra work to extract the capture-group values, with the help of the intrinsic .ForEach() method:
[regex]::Matches($pgNumString, '_(\d+)\.').ForEach({ $_.Groups[1].Value })
As for what you tried:
As Santiago notes, you need to use ForEach-Object instead of Select-Object, but there's an additional requirement:
Given your use of -AllMatches, you need to access .Groups[1].Value on each of the matches reported in .Matches, otherwise you'll only get the first match's capture-group value:
$pgNumString |
Select-String -Pattern $pgNumRegex -AllMatches |
ForEach-Object { $_.Matches.ForEach({ $_.Groups[1].Value }) }
As an aside:
Making Select-String only return the matching parts of the input lines / strings, via an -OnlyMatching switch is a green-lit future enhancement - see GitHub issue #7712
While this wouldn't directly help with capture groups, it is usually possible to reformulate regexes with look-around assertions, as shown with [regex]::Matches() above.
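For the single-path case from the question, the -match route mentioned above could be sketched like this. It keeps the question's capture-group regex (minus the unneeded (?s) option); the [int] cast is an extra assumption, in case a number rather than a string is wanted.

```powershell
# Test string copied from the question.
$pgNumString = 'C:\test\test5\AALTONEN-ALLAN_PENCARROW_PAGE_1.txt'

# -match returns $true/$false and fills the automatic $Matches variable;
# capture group 1 holds the digits between the last '_' and the '.'.
$pgNum = if ($pgNumString -match '_(\d+)\.') { [int] $Matches[1] }

$pgNum
```

Because the result of the if statement is assigned directly, $pgNum stays $null when the pattern does not match.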

Searching for a matching map name in Powershell

Sorry, I really don't know how to ask this question properly.
I would like to parse CS:GO demo files in PowerShell, and I would like to retrieve the map name from them.
I am opening the .dem files like this:
Get-Content $demo | Select -First 1 | Select-String -Pattern 'de_'
And i get this as response:
HL2DEMO đ5 MatchServer I.
GOTV Demo
de_mirage
csgo
##A g uÔ ~ř˙˙
ą Vđk (8wEÄü€ŢMĐhZăU X#`śh u <zcsgo‚ de_mirageŠ ’sky_dustšGOTV¨ ° ¸  ( 0 ž
I would like to get only the de_mirage as a variable. So if a map changes, then it will be de_dust2 or de_inferno and so on. Does anybody know a solution for this?
Thank you!

When using Get-Content, each line is passed down the pipeline one at a time, unless specifying the -Raw switch. The reason I bring this up is due to your Select cmdlet that you're piping to. When you specified the parameter of -First, with a value of 1, you're only grabbing the first line, and then trying to find the pattern in the first line.
Here's my poor attempt at RegEx:
Get-Content -Path $demo | Where-Object -FilterScript { $_ -match 'de_\w+' }
$Matches[0]
. . .where the automatic $Matches variable holds the result of the most recent successful -match, and index 0 is the full matched text. Since Where-Object runs -match once per line, $Matches[0] afterwards reflects the last line that matched. Piping to Select-String with the same pattern, just as you had done, would also work.
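A minimal sketch of getting just the map name into a variable, run against an in-memory string rather than a real demo file. $headerText is a made-up stand-in for the readable part of the header, and the character class assumes map names consist only of ASCII letters, digits and underscores; neither comes from the original post.

```powershell
# Hypothetical stand-in for the readable part of a demo header.
$headerText = 'HL2DEMO MatchServer GOTV Demo de_mirage csgo'

# Take the first 'de_' token; the explicit character class stops at anything
# that is not an ASCII letter, digit or underscore.
$mapName = [regex]::Match($headerText, 'de_[A-Za-z0-9_]+').Value

$mapName
```

The explicit class is used instead of \w because \w in .NET also matches non-ASCII letters, which can appear right after the map name in binary data.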

How to convert a string containing 2 numbers to currency with powershell?

I have text files that contain 2 numbers separated by a '+' sign. I'm trying to figure out how to replace them with their currency equivalent.
Example Strings:
20+2 would be converted to $0.20+$0.02 USD
1379+121 would be $13.79+$1.21 USD
400+20 would be $4.00+$0.20 USD
and so on.
I have tried using a few angles but they do not work or provide odd results.
I tried to do it here by attempting to find by all patterns I think would come up .
.\Replace-FileString.ps1 "100+10" '$1.00+$0.10' $path1\*.txt -Overwrite
.\Replace-FileString.ps1 "1000+100" '$10.00+$1.00' $path1\*.txt -Overwrite
.\Replace-FileString.ps1 "300+30" '$3.00+$0.30' $path1\*.txt -Overwrite
.\Replace-FileString.ps1 "400+20" '$4.00+$0.20' $path1\*.txt -Overwrite
or this which just doesn't work.
Select-String -Path .\*txt -Pattern '[0-9][0-9]?[0-9]?[0-9]?[0-9]?\+[0-9][0-9]?[0-9]?[0-9]?[0-9]?' | ForEach-Object {$_ -replace ", ", $"} {$_ -replace "+", "+$"}

I tried to do it here by attempting to find by all patterns I think would come up
Don't try this - we're humans, and we won't think of all edge cases and even if we did, the amount of code we needed to write (or generate) would be ridiculous.
We need a more general solution here, and regex might indeed be helpful with this.
The pattern you describe could be expressed as three distinct parts:
1 or more consecutive digits
1 plus sign (+)
1 or more consecutive digits
With this in mind, let's start by simplifying the regex pattern to use:
\b\d+\+\d+\b
or, written out with explanations:
\b # a word boundary
\d+ # 1 or more digits
\+ # 1 literal plus sign
\d+ # 1 or more digits
\b # a word boundary
Now, in order to transform an absolute value of cents into dollars, we'll need to capture the digits on either side of the +, so let's add capture groups:
\b(\d+)\+(\d+)\b
Now, in order to do anything interesting with the captured groups, we can utilize the Regex.Replace() method - it can take a scriptblock as its substitution argument:
$InputString = '1000+10'
$RegexPattern = '\b(\d+)\+(\d+)\b'
$Substitution = {
    param($Match)
    $Results = foreach($Amount in $Match.Groups[1,2].Value){
        $Dollars = [Math]::Floor(($Amount / 100))
        $Cents = $Amount % 100
        '${0:0}.{1:00}' -f $Dollars,$Cents
    }
    return $Results -join '+'
}
In the scriptblock above, we expect the two capture groups ($Match.Groups[1,2]), calculate the amount of dollars and cents, and then finally use the -f string format operator to make sure that the cents value is always two digits wide.
To do the substitution, invoke the Replace() method:
[regex]::Replace($InputString,$RegexPattern,$Substitution)
And there you go!
Applying this to a bunch of files is as easy as:
$RegexPattern = '\b(\d+)\+(\d+)\b'
$Substitution = {
    param($Match)
    $Results = foreach($Amount in $Match.Groups[1,2].Value){
        $Dollars = [Math]::Floor(($Amount / 100))
        $Cents = $Amount % 100
        '${0:0}.{1:00}' -f $Dollars,$Cents
    }
    return $Results -join '+'
}
foreach($file in Get-ChildItem $path *.txt){
    $Lines = Get-Content $file.FullName
    $Lines |ForEach-Object {
        [regex]::Replace($_, $RegexPattern, $Substitution)
    } |Set-Content $file.FullName
}

This regular expression works too:
\b\d{3,4}(?=\+)|\d{2,3}(?=\")
https://regex101.com/

Do you want something like this output?
$20+$2 would be converted to $0.20+$0.02 USD
$1379+$121 would be $13.79+$1.21 USD
$400+$20 would be $4.00+$0.20 USD
Then, you may try this command in powershell.
(gc test.txt) -replace '\b(\d+)\+(\d+)\b','$$$1+$$$2' | sc test.txt
gc, sc : aliases for the Get-Content and Set-Content commands respectively
\b(\d+)\+(\d+)\b : matches the target string (numbers+numbers) and captures the numbers into $1 and $2, in order
$$ : $ must be escaped to produce a literal $ dollar character (which you want to place in front of the numbers)
$1, $2 : back-references to the captured values
test.txt : contains your sample text
Of course, this can be applied to multiple files as follows:
gci '*.txt' -recurse | foreach-object{(gc $_) -replace '\b(\d+)\+(\d+)\b','$$$1+$$$2' | sc $_ }
gci : alias for the Get-ChildItem command. By default, it lists the present directory. If you want a different directory, you must use the -path option and the -include option.
-recurse option : enables to search sub-directory
Edited
If you want capturing & dividing values & replacing old value with new one like follows
$0.2+$0.02 would be converted to $0.20+$0.02 USD
$13.79+$1.21 would be $13.79+$1.21 USD
$4+$0.2 would be $4.00+$0.20 USD
then, you may try this.
gci *.txt -recurse | % {(gc $_) | % { $_ -match "\b(\d+)\+(\d+)\b" > $null; $num1=[int]$matches[1]/100; $num2=[int]$matches[2]/100; $dol='$$'; $_ -replace "\b(\d+)\+(\d+)\b","$dol$num1+$dol$num2"}|sc $_}
This command search files in the present directory and sub-directory. If you don't want to search in sub-directory, then remove -recurse option. And if you want another path, then use -path option and -include option like follows.
gci -path "your_path" -include *.txt | % {(gc $_) ...

Other solutions seem excessively complicated, first turning the string to values and then back to strings. Looking at the examples, it is just chopping up a string and re-assembling it while ensuring that the different parts (dollars and cents) have the correct lengths:
('20+2','1379+121','400+20') -replace
'(\d+)\+(\d+)','00$1+00$2' -replace
'0*(\d+)(\d\d)\+0*(\d+)(\d\d)','$$$1.$2+$$$3.$4 USD'
$0.20+$0.02 USD
$13.79+$1.21 USD
$4.00+$0.20 USD
Explanation:
Substitute all the + separated cent values with 0 padded values so there is a minimum of three digits, i.e. at least one digit in the dollars and exactly 2 for the cents.
Collect the individual dollars and cents for each value into distinct capture groups while simultaneously discarding any extraneous leading zeroes.
Re-substitute the (just padded) strings with the appropriately formatted versions.
It is interesting to note how the second substitution relies on the greedy nature of *. The 0* will match just as many leading zeroes as will still leave enough for the remainder of the pattern.
You can put in the word boundary anchor (\b), at one or both ends of the patterns, if you have parts of a line where there are digits separated by + which are directly adjacent to other text and you want them to be NOT processed, otherwise it is unnecessary.
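To make the padding step visible, here is a small sketch that runs the two substitutions separately on one of the sample values. The variable names are made up for illustration; the patterns are the ones used above.

```powershell
# Step 1: pad both cent values with two leading zeroes.
$padded = '20+2' -replace '(\d+)\+(\d+)', '00$1+00$2'

# Step 2: 0* drops surplus zeroes, the last two digits of each value become the
# cents, and whatever is left in between becomes the dollars.
$final = $padded -replace '0*(\d+)(\d\d)\+0*(\d+)(\d\d)', '$$$1.$2+$$$3.$4 USD'

$padded   # intermediate value: 0020+002
$final    # final value: $0.20+$0.02 USD
```

In the second step, the 0* in front of the first value can keep only one of the zeroes in 0020, because (\d+)(\d\d) still needs three digits before the +.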
Note: the example above shows an array of String as input and producing an array of String (each element displayed on a separate line). When -Replace is applied to an array, it enumerates the array, applies the replace to each element and collects each (possibly replaced) element into a result array. The output of Get-Content is an array of String (enumerated by PowerShell when supplying a pipeline). Similarly, the 'input' to Set-Content is an array of String (possibly converted from a general Object[] and/or collected from pipeline input). Thus, to convert a file just use:
(gc somefile) -replace ... -replace ... | sc newfile
# or even
sc newfile ((gc somefile) -replace ... -replace ...)
# Set-Content [-Path] String[] [-Value] Object[]
In the above, newfile and somefile can be the same due to a nice feature of Set-Content whereby it does not even open/create its output file(s) until it has something to write. Thus,
@() | sc existingfile
does not destroy existingfile. Note, however, that
sc existingfile @()
does destroy existingfile. This is because the first example sends nothing to Set-Content while the second example gives Set-Content something (an empty array). Since the output from Get-Content is collected into an (anonymous) array before -Replace is applied, there is no conflict between Get-Content and Set-Content over accessing the same file. The functionally equivalent version
gc somefile | foreach { $_ -replace ... -replace ... } | sc newfile
does not work if newfile is somefile since Set-Content receives each (possibly substituted) line from Get-Content before the next one is read meaning Set-Content can't open the file because Get-Content still has it open.
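The in-place variant can be tried safely on a scratch file. The sketch below creates a temporary file (an assumption made purely for the demo), rewrites it with the parenthesised Get-Content form, and reads it back.

```powershell
# Scratch file for the demo; not part of the original answer.
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value '20+2', '400+20'

# The parentheses make Get-Content read the whole file and close it before
# -replace runs, so Set-Content can safely write to the same path.
(Get-Content -Path $tmp) -replace '\b(\d+)\+(\d+)\b', '$$$1+$$$2' | Set-Content -Path $tmp

$result = Get-Content -Path $tmp
Remove-Item -Path $tmp
$result
```

Dropping the parentheses and streaming line by line into Set-Content on the same path is the form described above as failing.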

This is a separate answer because it doesn't explain how to achieve the desired result (already did that) but explains why the listed attempts do not work (an educational motive).
If you're using Replace-FileString.ps1 from GitHub then not only are the examples not a general solution, it won't work as listed above because Replace-FileString.ps1 uses the Replace method of a [regex] object so "400+20" matches "40" then 1 or more "0" then "20". Similarly for other attempts. Note, no "+" is matched in the patterns so all fail (unless you have lines like "40020+125" which matches on the 40020). Just as well, the replacement includes the capture group specifier "$0" (as part of '$1.00+$0.10') and other specifiers. There are no capture groups specified in the pattern so all the group specifiers would be taken literally, except "$0" being the entire match (if found). Thus, "40020+125" would be replaced by substituting '$4.00+$0.20' giving "$4.00+40020.20" ($4='$4' and $0='40020'). Probably, no matches are found. Result -> files not changed. (Phew!)
As for the Select-String attempt, Select-String would probably have matched the required data since the pattern matched up to 5 digits on either side of a +. This would send the matching lines (and ignored the rest, if any) into the ForEach-Object as [Microsoft.PowerShell.Commands.MatchInfo] objects (not strings). (Aside: this is a common mistake by a lot of PowerShell, um, novices. They assume that what they see on the screen is the same as what is churning about inside PowerShell. This is far from the truth and probably leads to most of the confusion amongst new users. PowerShell processes entire objects and typically displays only a summary of the most useful bits.) Anyway, I am unsure what the ForEach-Object is trying to achieve, not least due to the apparent typo. There is at least one " missing in the first script block and possibly a comma also. The best I can interpret it is
{ $_ -replace ", ",", $" }
i.e. change every ", " into ", $". This assumes that the strings to be substituted are all preceded by ", ". Note: lone $ is not an error because it cannot be interpreted as a variable substitution (no following name or {) or capture reference (no following group specifier [0-9`+'_&]). The next script block is clearer, change every "+" into "+$". Unfortunately, again, the first string is interpreted as a regular expression and, unlike the lone $, a lone + here is an error. It needs to be escaped with \. However, even with these errors corrected, there are two big problems:
The default output from Select-String is a collection of [MatchInfo] objects which when (implicitly) converted to String for use as the LHS of -replace include the file name and line number, thereby corrupting the lines from the file. To use just the line itself, specify $_.Line.
A completely incorrect usage of the scriptblock parameters to ForEach-Object. While it would seem that the intent was to perform two replace operations, placing them in individual scriptblocks is an error. Even if it worked, it would output 2 separate partial replacements instead of one completed replacement since $_ is not updated between the two expressions. ($_ is writable!)
ForEach-Object has 3 basic scriptblock groups, 1 -Begin block, 1 -End block and all the rest collectively as the -Process blocks. (The -Parallel block is not relevant here.) The documentation mentions a group called -RemainingScripts but this is actually just an implementation construct to allow the -Process scriptblocks to be specified as individual parameters rather than collected into an array (similar to parameter arrays in C# and VB). I suspect this was done so that users could simply drop the parameter names (-Begin, -Process and -End) and treat the scriptblocks as if they were positional parameters even though, strictly speaking, only -Process is positional and expects an array of scriptblocks (i.e. separated by commas). The introduction of -RemainingScripts in PS3.0 (with attribute ValueFromRemainingArguments so it behaves like a parameter array) was probably done to tidy up what might have been a nasty kludge to get the user friendly behaviour prior to PS3.0. Or maybe it was just formalising what was already going on.
Anyway, back on topic. By specifying multiple scriptblocks, the first is treated as -Begin and, if there are more than 2, the last is treated as -End. Thus, for two scriptblocks, the first is -Begin and the other is -Process. Therefore, even if the first scriptblock were syntactically correct, it would only run once and then still do nothing since $_ is not assigned (=$null) in -Begin. The correct way would be to place both replacements, joined into a single expression, in one scriptblock:
{ $_.Line -replace ", ",", $" -replace "\+","+$" }
Of course, this is just describing how to get it to "work". It is not the correct solution to the problem in the original post (see other answer).
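The positional binding of multiple scriptblocks described above can be observed with a tiny experiment. This is only an illustrative sketch; the values and variable names are arbitrary.

```powershell
# Two positional scriptblocks: the first runs once as -Begin,
# the second runs once per input object as -Process.
$twoBlocks = 1..3 | ForEach-Object { 'begin' } { $_ * 2 }

# Three positional scriptblocks: the last one runs once as -End.
$threeBlocks = 1..3 | ForEach-Object { 'begin' } { $_ * 2 } { 'end' }

$twoBlocks -join ','
$threeBlocks -join ','
```

The first pipeline yields begin,2,4,6 and the second yields begin,2,4,6,end, which shows that a replacement placed in the first of two scriptblocks would never see the pipeline input.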

Match content between IF code block

I'm trying to ensure that some stored procedures do not have a RETURN statement, except for the last one. For this task I'm trying to use regular expressions in my PowerShell scripts.
My strategy is to check every IF @@ERROR<>0 block and perform another search inside each of them. How can I match the content of every IF block of this SQL query?
--Some code here...
IF @@ERROR<>0
BEGIN
    RAISERROR('MY ERROR HERE. %s',1,16,@STORE_PROCEDURE_NAME)
    GOTO ROLL
END
--More code here...
IF @@ERROR<>0
BEGIN
    RAISERROR('ANOTHER ERROR HERE. %s',16,1,@STORE_PROCEDURE_NAME)
    GOTO ROLL
END
--more code here...
IF @@ERROR<>0
BEGIN
    RAISERROR('MAIN ERROR %s',16,1,@STORE_PROCEDURE_NAME)
    ROLL:
    ROLLBACK TRANSACTION
    RETURN
END
COMMIT TRANSACTION

Use Select-String with a regular expression like this:
IF .*\s+BEGIN([\s\S]*?)END
and select just the groups from the result:
... | Select-String 'IF .*\s+BEGIN([\s\S]*?)END' -AllMatches |
Select-Object -Expand Matches |
Select-Object -Expand Groups |
Where-Object { -not $_.Groups } |
Select-Object -Expand Value
The regular expression matches the keyword IF followed by a space and optional other text on the same line, one or more whitespace character, the keyword BEGIN, and the shortest amount of text (non-greedy match) up to the next occurrence of the keyword END. The subexpression between BEGIN and END is grouped with parentheses so it can be extracted from the full match.
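As a self-contained sketch of the follow-up check, the same pattern can be applied with [regex]::Matches() to a shortened, made-up version of the query, and each extracted block body can then be searched for RETURN. The variable names and the trimmed SQL are assumptions for illustration only.

```powershell
# Shortened stand-in for the stored procedure text.
$sql = @'
IF @@ERROR<>0
BEGIN
    GOTO ROLL
END
IF @@ERROR<>0
BEGIN
    ROLLBACK TRANSACTION
    RETURN
END
'@

# Group 1 is the text between BEGIN and the nearest following END.
$blocks = [regex]::Matches($sql, 'IF .*\s+BEGIN([\s\S]*?)END').ForEach({ $_.Groups[1].Value })

# Keep only the block bodies that contain a RETURN statement.
$withReturn = @($blocks | Where-Object { $_ -match '\bRETURN\b' })

'{0} blocks, {1} with RETURN' -f $blocks.Count, $withReturn.Count
```

Because the body match is non-greedy, a block that contains a nested BEGIN ... END would be cut off at the inner END, so this only suits flat blocks like the ones in the question.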

With the following regex, /(If @@ERROR<>0\s+BEGIN)(.*?)(end)/sig, I can get the content between the IF and the END. Please note that it only matches up to the immediate (nearest) END, so in some blocks it actually stops at the END that belongs to a different BEGIN.

Capturing group not working at end of -Pattern for Select-String

I've recently started working with regex in Powershell and have come across an unexpected response from the Select-String cmdlet.
If you enter something like the following:
$thing = "135" | Select-String -Pattern "(.*?)5"
$thing.Matches
You receive the expected result from the Match-Info object:
Groups : {135, 13}
Success : True
Captures : {135}
Index : 0
Length : 3
Value : 135
But if you place the capturing group at the end of the -Pattern:
$thing = "135" | Select-String -Pattern "(.*?)"
$thing.Matches
The Match-Info doesn't seem to find anything, although one is created:
Groups : {, }
Success : True
Captures : {}
Index : 0
Length : 0
Value :
As I said, I'm quite new to Powershell, so I expect this behavior is operator error.
But what is the work around? This behavior hasn't caused me problems yet, but considering the files I'm working with (electronic manuals contained in XML files), I expect it will eventually.
...
With regards,
Schwert
...
Clarification:
I made my example very simple to illustrate the behavior, but my original issue was with this pattern:
$linkname = $line | Select-String -Pattern "`"na`"><!--(?<linkname>.*?)"
The file is one of our indices for the links between manuals, and the name of the link is contained within a comment block located on each line of the file.
The pattern is actually a typo, as the name and the comment don't go all the way to the end of the line. I found it when the program began giving errors when it couldn't find "linkname" in the Match-Info object.
Once I gave it the characters which occur after the link name (::), then it worked correctly. Putting it into the example:
$linkname = $line | Select-String -Pattern "`"na`"><!--(?<linkname>.*?)::"

I'm no regex expert but I believe your pattern "(.*?)" is the problem. If you remove the ?, for example, you get the groups as expected.
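One way to see what is going on: .*? is a lazy quantifier, so it matches as few characters as possible, and with nothing after it in the pattern, zero characters already satisfy it. The sketch below compares the three cases with [regex]::Match(); the variable names are made up for illustration.

```powershell
# Lazy group with nothing after it: succeeds immediately with an empty match.
$lazyAlone = [regex]::Match('135', '(.*?)').Groups[1].Value

# Lazy group followed by a literal 5: forced to grow until the 5 can match.
$lazyBefore5 = [regex]::Match('135', '(.*?)5').Groups[1].Value

# Greedy group with nothing after it: takes the whole input.
$greedyAlone = [regex]::Match('135', '(.*)').Groups[1].Value

"lazy alone: '$lazyAlone', lazy before 5: '$lazyBefore5', greedy: '$greedyAlone'"
```

This is consistent with the fix found in the question: once the pattern names the characters that follow the link name (::), the lazy group has to expand up to them.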
Also, PLEASE don't use regex to parse XML. :) There are much easier ways to do that, such as:
[xml]$Manual = Get-Content -Path C:\manual.xml
or
$xdoc = New-Object System.Xml.XmlDocument
$file = Resolve-Path C:\manual.xml
$xdoc.Load($file)
Once you've got it in a structured format you can then use dot notation or XPath to navigate the nodes and attributes.
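A minimal sketch of both navigation styles, using an in-memory document instead of C:\manual.xml. The element and attribute names (manual, link, id) are invented for the example and are not taken from the real manuals.

```powershell
# Hypothetical document; the real manuals will have a different structure.
[xml] $manual = '<manual><link id="na">Intro::chapter1</link><link id="x2">Setup::chapter2</link></manual>'

# Dot notation: child elements and attributes are exposed as properties.
$firstId   = $manual.manual.link[0].id
$firstText = $manual.manual.link[0].InnerText

# XPath: select a node by attribute value.
$viaXPath = $manual.SelectSingleNode('//link[@id="x2"]').InnerText

$firstId
$firstText
$viaXPath
```

With the link text isolated this way, a simple string operation such as -split '::' may be enough to separate the name from what follows it.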
