I don't understand why Select-String isn't matching this - regex

$lines = '<string>D:\home\bob\utility.mdb</string>'
[String[]]$varr = $lines | Select-String -AllMatches -Pattern "<string>*.mdb" |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value
This is returning null.
Ultimately I want the entire line, from the <string> to the </string>.
But apparently I don't know how to express this in powershell.

In your initial example:
<string>[\s]*.mdb[\s]*
the first [\s]* will not match anything. You perhaps intended:
<string>[\S]*.mdb[\s]*
But then, I think that the property Matches will take in the whole string from start to end, meaning you'll have to put everything, and the dot needs to be escaped since you can call it a wildcard:
<string>[\S]*\.mdb[\s]*<\/string>
And I think you can remove some unneeded parts (I'm not too familiar with powershell's regex, but I haven't seen any where you have to put character classes written like \S within square brackets):
<string>\S*\.mdb<\/string>
Little explanation:
\s matches a space character, and often also matches a newline, or a tab (\n and \t respectively).
\S will match everything that \s doesn't match.

Related

Replacing a Matched Character in Powershell

I have a text file of 3 name entries:
# dot_test.txt
001 AALTON, Alan .....25 Every Street
006 JOHNS, Jason .... 3 Steep Street
002 BROWN. James .... 101 Browns Road
My task is to find instances of NAME. when it should be NAME, using the following:
Select-String -AllMatches -Path $input_path -Pattern '(?s)[A-Z]{3}.*?\D(?=\s|$)' -CaseSensitive |
ForEach-Object { if($_.Matches.Value -match '\.$'){$_.Matches.Value -replace '\,$'} }
The output is:
BROWN.
The conclusion is this script block identifies the instance of NAME. but fails to make the replacement.
Any suggestions on how to achieve this would be appreciated.
$_.Matches.Value -replace '\,$'
This attempts to replace a , (which you needn't escape as \,) at the end of ($) your match with the empty string (due to the absence of a second, replacement operand), i.e. it would effectively remove a trailing ,.
However, given that your match contains no , and that you instead want to replace its trailing . with ,, use the following:
$_.Matches.Value -replace '\.$', ',' # -> 'BROWN,'
You can use -replace directly, and if you need to replace both a comma and dot at the end of the string, use [.,]$ regex:
Select-String -AllMatches -Path $input_path -Pattern '(?s)[A-Z]{3}.*?\D(?=\s|$)' -CaseSensitive | % {$_.Matches.Value -replace '\.$', ','}
Details:
(?s)[A-Z]{3}.*?\D(?=\s|$) - matches
(?s) - RegexOptions.Singleline mode on and . can match line breaks
[A-Z]{3} - three uppercase ASCII letters
.*? - any zero or more chars as few as possible
\D - any non-digit char
(?=\s|$) - a positive lookahead that matches a location either immediately followed with a whitespace or end of string.
The \.$ pattern matches a . at the end of string.

Regular Expressions in powershell split

I need to strip out a UNC fqdn name down to just the name or IP depending on the input.
My examples would be
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
I want to end up with just tom or 123.43.234.23
I have the following code in my array which is striping out the domain name perfect, but Im still left with \\tom
-Split '\.(?!\d)')[0]
Your regex succeeds in splitting off the tokens of interest in principle, but it doesn't account for the leading \\ in the input strings.
You can use regex alternation (|) to include the leading \\ at the start as an additional -split separator.
Given that matching a separator at the very start of the input creates an empty element with index 0, you then need to access index 1 to get the substring of interest.
In short: The regex passed to -split should be '^\\\\|\.(?!\d)' instead of '\.(?!\d)', and the index used to access the resulting array should be [1] instead of [0]:
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' |
ForEach-Object { ($_ -Split '^\\\\|\.(?!\d)')[1] }
The above yields:
tom
123.43.234.23
Alternatively, you could remove the leading \\ in a separate step, using -replace:
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' |
ForEach-Object { ($_ -Split '\.(?!\d)')[0] -replace '^\\\\' }
Yet another alternative is to use a single -replace operation, which does not require a ForEach-Object call (doesn't require explicit iteration):
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' -replace
'?(x) ^\\\\ (.+?) \.\D .+', '$1'
Inline option (?x) (IgnoreWhiteSpace) allows you to make regexes more readable with insignificant whitespace: any unescaped whitespace can be used for visual formatting.
^\\\\ matches the \\ (escaped with \) at the start (^) of each string.
(.+?) matches one or more characters lazily.
\.\D matches a literal . followed by something other than a digit (\d matches a digit, \D is the negation of that).
.+ matches one or more remaining characters, i.e., the rest of the input.
$1 as the replacement operand refers to what the 1st capture group ((...)) in the regex matched, and, given that the regex was designed to consume the entire string, replaces it with just that.
I'm stealing Lee_Daileys $InSTuff
but appending a RegEx I used recently
$InStuff = -split #'
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
'#
$InStuff |ForEach-Object {($_.Trim('\\') -split '\.(?!\d{1,3}(\.|$))')[0]}
Sample Output:
tom
123.43.234.23
As you can see here on RegEx101 the dots between the numbers are not matched
The Select-String function uses regex and populates a MatchInfo object with the matches (which can then be queried).
The regex "(\.?\d+)+|\w+" works for your particular example.
"\\tom.overflow.corp.com", "\\123.43.234.23.overflow.corp.com" |
Select-String "(\.?\d+)+|\w+" | % { $_.Matches.Value }
while this is NOT regex, it does work. [grin] i suspect that if you have a really large number of such items, then you will want a regex. they do tend to be faster than simple text operators.
this will get rid of the leading \\ and then replace the domain name with .
# fake reading in a text file
# in real life, use Get-Content
$InStuff = -split #'
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
'#
$DomainName = '.overflow.corp.com'
$InStuff.ForEach({
$_.TrimStart('\\').Replace($DomainName, '')
})
output ...
tom
123.43.234.23

Regex in Powershell fails to check for newlines

I'm trying to get the first block of releasenotes...
(See sample content in the code)
Whenever I use something simple it works, it only breaks when I try to
search across multiple lines (\n). I'm using (Get-Content $changelog | Out-String) because that gives back 1 string instead of an array from each line.
$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
(Get-Content $changelog | Out-String) | Select-String -Pattern $regex -AllMatches
<#
SAMPLE:
------
v1.0.23
- Adds an IContainer API.
- Bugfixes.
v1.0.22
- Hotfix: Language operators.
v1.0.21
- Support duplicate query parameters.
v1.0.20
- Splitting up the ICommand interface.
- Fixing the referrer header empty field value.
#>
The result I need is:
v1.0.23
- Adds an IContainer API.
- Bugfixes.
Update:
Using options..
$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
Get-Content -Path $changelog -Raw | Select-String -Pattern $regex -AllMatches
I also get nothing.. (no matter if I use \n or \r\n)
Unless you're stuck with PowerShell v2, it's simpler and more efficient to use Get-Content -Raw to read an entire file as a single string; besides, Out-String adds an extra newline to the string.[1]
Since you're only looking for the first match, you can use the -match operator - no need for Select-String's -AllMatches switch.
Note: While you could use Select-String without it, it is more efficient to use the -match operator, given that you've read the entire file into memory already.
Regex matching is by default always case-insensitive in PowerShell, consistent with PowerShell's overall case-insensitivity.
Thus, the following returns the first block, if any:
if ((Get-Content -Raw $changelog) -match '(?m)^v\d+\.\d+\.\d+.*(\r?\n-\s?.*)+') {
# Match found - output it.
$Matches[0]
}
* (?m) turns on inline regex option m (multi-line), which causes anchors ^ and $ to match the beginning and end of individual lines rather than the overall string's.
\r?\n matches both CRLF and LF-only newlines.
You could make the regex slightly more efficient by making the (...) subexpression non-capturing, given that you're not interested in what it captured: (?:...).
Note that -match itself returns a Boolean (with a scalar LHS), but information about the match is recorded in the automatic $Matches hashtable variables, whose 0 entry contains the overall match.
As for what you tried:
'([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
doesn't work, because by default $ only matches at the very end of the input string, at the end of the last line (though possibly before a final newline).
To make $ to match the end of each line, you'd have to turn on the multiline regex option (which you did in your 2nd attempt).
As a result, nothing matches.
'(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
doesn't work as intended, because by using option s (single-line) you've made . match newlines too, so that a greedy subexpression such as .* will match the remainder of the string, across lines.
As a result, everything from the first block on matches.
[1] This problematic behavior is discussed in GitHub issue #14444.

Replacing part of a matched regular expression

I'm trying to parse some very badly delimited files and ultimately change them to a CSV. I'm looking to match on number, whitespace, and then letter, and replace the whitespace with a comma.
For example If I had the line '08:34:45 home' I'd like it to recognize the '5 h' and make it '08:34:45,home'. I understand why what I have below isn't working correctly, but can someone explain how to tell Powershell that I want to keep the \d and the \D?
Get-Content -path C:\file.txt |
ForEach-Object {$_ -replace "(\d\s\D)",','}
What you put in parentheses ( ) is captured by the regular expression, and you can access it with $1 for the first set, $2 for the second set, etc.
so you could try two capture groups:
$_ -replace '(\d)\s(\D)', '$1,$2'
See also: http://www.powershelladmin.com/wiki/Powershell_regular_expressions#Example_-_Replace_With_Captures for more details.

Trying to match this using regular expressions in PowerShell

I am trying to use regular expressions to match certain lines in a file, but I am having some trouble.
The file contains text like this:
Mario, 123456789
Luigi, 234-567-890
Nancy, 345 5666 77533
Bowser, 348759823745908732589
Peach, 534785
Daisy, 123-456-7890
I'm trying to match just the numbers as either XXX-XXX-XXX or XXX XXX XXX pattern.
I've tried a few different ways, but it always expects something I don't want it to or it tell me everything is false.
I'm using PowerShell to do this.
At first I tried:
{$match = $i -match "\d{3}\-\d{3}\-\d{3}|\d{3}\ \d{3}\ \d{3}"
Write-Host $match}
But when I do that it matches the long strong of numbers and XXX-XXX-XXXXX.
I read something saying that n would match the exact quantity, so I tried that...
{$match = $i -match "\d{n3}\-\d{n3}\-\d{n3}|\d{n3}\ \d{n3}\ \{n3}"
Write-Host $match}
That made everything false...
So I tried
{$match = $i -match "\d\n{3}\-\d\n{3}\-\d\n{3}|\d\n{3}\ \d\n{3}\ \d\n{3}"
I also tried the lazy quantifier, ?:
{$match = $i -match "\d{3?}\-\d{3?}\-\d{3?}|\d{3?}\ \{3?}\ \{3?}"
Write-Host $match}
Still false...
The final thing I tried was this...
{$match = $i -match "\d[0-9\{3\}\-\d[0-9]\{3\}\-\d[0-9]{3\}|\d[0-9]\{3\}\ \d[0-9]\{3}\ \d[0-9]\{3\}"<br>
Write-Host $match}
Still no luck...
The following pattern gives two matches:
Get-Content .\test.txt | Where-Object {$_ -match '\d{3}[-|\s]\d{3}[-|\s]\d{3}'}
Luigi, 234-567-890
Daisy,
123-456-7890
If you want to exclude the last match, add the '$' anchor (represents the end of the string:
Get-Content .\test.txt | Where-Object {$_ -match '\d{3}[-|\s]\d{3}[-|\s]\d{3}$'}
Luigi, 234-567-890
If you want to be very specific and match lines from start to end (use the ^ anchor, denotes the start of the string):
Get-Content .\test.txt | Where-Object {$_ -match '^\w+,\s+\d{3}[-|\s]\d{3}[-|\s]\d{3}$'}
Luigi, 234-567-890
Your first answer is the closest. The {3} matches exactly 3 characters. I think the n you saw was supposed to represent any number, not an actual n character. The reason it matches the long strings is that you only specified that the match must find 3 digits, dash or space, 3 digits, dash or space, then 3 more digits. You did not specify that it doesn't count if there are more digits after that.
To not match when there is a number after, you can use a negative lookahead.
(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})(?!\d)
Alternatively, if you want to only match at the end of the line, possibly with trailing space
(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})\s*$
As Gideon said, your first is the best place to start.
"\b\d{3}\-\d{3}\-\d{3}\b|\b\d{3}\ \d{3}\ \d{3}\b"
The \b special character added before and after each statement is a word boundary - basically a space or newline or punctuation like a period or comma. This ensures that 9999 doesn't match, but 999. does.
Try this:
/(\d+[- ])+\d+/
It's better not to have so rigid regular expressions, unless you are absolutely sure there that your input will not change.
So this regex matches at least a digit, then greedily searches for more digits followed by a space or a dash. This is also repeated as much as possible then followed by at least another digit.
When manipulating data in PowerShell, it usually is a good idea to create objects representing the data (after all, PowerShell is all about objects). Filtering based on object properties is usually easier and more robust. Your problem is a good example.
Here is what we are after:
the persons: $persons
where: where
the number of that person: $_.number
matches: -match
the pattern
starting with three digits: ^\d{3}
followed by three digits between dashes or spaces: (-\d{3}-|\ \d{3}\ )
ending on three digits: \d{3}$
Below is the entire script:
$persons = import-csv -Header "name", "number" -delimiter "," data.csv
$persons | where {$_.number -match "^\d{3}(\-\d{3}\-|\ \d{3}\ )\d{3}$"}
You can also use Select-String:
Select-String '(\d{3}[ -]){2}\d{3}$' .\file.txt | % {$_.Line}