The following code returns only one match.
$s = 'x.a,
x.b,
x.c
'
$s -match 'x\.(.*?)[,$]'
$Matches.Count # returns 2
$Matches[1] # returns a only
Expected to return a, b, c.
The -match operator only finds the first match. Select-String with the -AllMatches switch will fetch all matches in the input. Also, [,$] matches a literal , or $ character: inside a character class, $ is not an end-of-string/line anchor.
A possible solution may look like
Select-String 'x\.([^,]+)' -input $s -AllMatches | Foreach {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
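The pattern is x\.([^,]+): it matches x. and then captures into Group 1 one or more characters other than a comma. Note that [^,] also matches line breaks, so the last value can carry a trailing newline; use [^,\s]+ if that matters.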
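As an alternative to Select-String, the same extraction can be sketched with the .NET [regex]::Matches method. This is only an illustration: the sample string is rebuilt with explicit newlines, and the pattern is changed to [^,\s]+ (an assumption, not the original answer's pattern) so that line breaks are not captured.

```powershell
# Sample input, equivalent to the multi-line string in the question
$s = "x.a,`nx.b,`nx.c`n"

# [regex]::Matches returns every match; Groups[1] holds the captured value.
# [^,\s]+ stops at a comma or at any whitespace, including line breaks.
$values = [regex]::Matches($s, 'x\.([^,\s]+)') |
    ForEach-Object { $_.Groups[1].Value }

$values -join ','   # a,b,c
```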
Related
I would like to conditionally replace a character sequence from strings in a tab delimited file.
In the example below, I want to replace 'apple' with 'orange' when the character sequence starts with 'DEF'. 'xxx' can be any characters of any length (but unlikely to be 'DEF' or 'apple').
ie:
xxxDEFxxxapplexxx<tab>DEFxxxapplexxx<tab>xxxDEFxxxapplexxx
to:
xxxDEFxxxapplexxx<tab>DEFxxxorangexxx<tab>xxxDEFxxxapplexxx
Powershell script:
$fileName = "tabfile.txt"
(Get-Content -Path $fileName -Encoding UTF8) |
Foreach-Object { if ($_ -match "^DEF") { $_ -replace "apple", "orange"} else { $_ } } |
Set-Content -Path $fileName
It works fine when each string is separated by a new line (rather than a tab).
Output:
xxxDEFxxxapplexxx
DEFxxxorangexxx
xxxDEFxxxapplexxx
but doesn't work when the strings are separated by tabs (or spaces):
Output:
xxxDEFxxxapplexxx<tab>DEFxxxapplexxx<tab>xxxDEFxxxapplexxx
Thanks.
With help from the comments by iRon and Thomas, I figured out something that works:
Split the string at the tabs to create an array:
Get-Content with -Delimiter "`t" parameter.
Conditional match and replace text on each element:
Foreach-Object { if ($_ -match "^DEF") { $_ -replace "apple", "orange"} else { $_ } } |
Recreate original string by joining each element of the array with a tab character:
Join-String -Separator "`t"
Complete code:
(Get-Content -Path "tabfile.txt" -Delimiter "`t") |
Foreach-Object { if ($_ -match "^DEF") { $_ -replace "apple", "orange"} else { $_ } } |
Join-String -Separator "`t" |
Out-File "tabfile.txt"
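Join-String is only available in PowerShell 6.2 and later. For older versions, the same split / replace / join steps can be sketched with the -split and -join operators. The sample line below is the one from the question, held in a variable rather than read from a file.

```powershell
# One tab-delimited line, as in the question
$line = "xxxDEFxxxapplexxx`tDEFxxxapplexxx`txxxDEFxxxapplexxx"

# Split at tabs, replace only in fields that start with DEF
$fields = $line -split "`t" | ForEach-Object {
    if ($_ -match '^DEF') { $_ -replace 'apple', 'orange' } else { $_ }
}

# Rebuild the line with tab separators
$result = $fields -join "`t"
$result
```

To process a file, the same logic could be applied to each line returned by (Get-Content -Path $fileName) before passing the results to Set-Content.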
You do not need any conditional logic here because -replace does it for you implicitly: if there is no match, the string input is returned as is.
The regex you can use is
(?<=(?:^|\t)DEF_)apple
Add a \b word boundary after apple if it should not be followed by _, a letter or a digit, or add (?![^\W_]) if it cannot be followed by a digit or letter, but can be followed by _.
Details:
(?<=(?:^|\t)DEF_) - a positive lookbehind that matches a location that is immediately preceded with start of string (^) or (|) a tab (\t) and DEF_
apple - an apple string.
In PowerShell, you could use
(Get-Content -Path $fileName -Encoding UTF8) -replace "(?<=(?:^|\t)DEF_)apple", "orange" | Set-Content -Path $fileName
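The pattern above expects DEF_ immediately before apple. For the sample in the question, where arbitrary characters sit between DEF and apple, a variant could rely on the fact that .NET supports variable-length lookbehinds. The following is a sketch under that assumption; the [^\t]* part is an addition, not part of the original answer.

```powershell
$line = "xxxDEFxxxapplexxx`tDEFxxxapplexxx`txxxDEFxxxapplexxx"

# (?:^|\t)DEF  - the field must start with DEF
# [^\t]*       - any characters inside the same field (cannot cross a tab)
$pattern = '(?<=(?:^|\t)DEF[^\t]*)apple'

$result = $line -replace $pattern, 'orange'
$result
```

Only the second field is changed, because it is the only one in which DEF directly follows the start of the string or a tab.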
I have a script that I am working on to parse each line in the log. My issue is the regex I use matches from src= until space.
I only want the ip address not the src= part. But I do still need to match from src= up to space but in the result only store digits. Below is what I use but it sucks really badly. So any help would be helpful
#example text
$destination="src=192.168.96.112 dst=192.168.5.22"
$destination -match 'src=[^\s]+'
$result = $matches.Values
#turn it into string since trim doesn’t work
$result=echo $result
$result=$result.trim("src=")
You can use a lookbehind here, and since -match only returns the first match, you will be able to access the matched value using $matches[0]:
$destination -match '(?<=src=)\S+' | Out-Null
$matches[0]
# => 192.168.96.112
(?<=src=) - matches a location immediately preceded with src=
\S+ - one or more non-whitespace chars.
To extract all these values, use
Select-String '(?<=src=)\S+' -input $destination -AllMatches | Foreach {$_.Matches} | Foreach-Object {$_.Value}
or
Select-String '(?<=src=)\S+' -input $destination -AllMatches | % {$_.Matches} | % {$_.Value}
Another way could be using a capturing group:
src=(\S+)
For example
$destination="src=192.168.96.112 dst=192.168.5.22"
$pattern = 'src=(\S+)'
Select-String $pattern -input $destination -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value}
Output
192.168.96.112
Or, to be a bit more specific, match the digits and the dots explicitly (or see this page for an even more specific match for an IP address)
src=(\d{1,3}(?:\.\d{1,3}){3})
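If both addresses are needed, the more specific pattern can be combined with named groups and the -match operator. This is a minimal sketch; the group names src and dst and the $ipPattern variable are chosen here for illustration only.

```powershell
$destination = 'src=192.168.96.112 dst=192.168.5.22'

# Four dot-separated runs of 1 to 3 digits (does not check the 0-255 range)
$ipPattern = '\d{1,3}(?:\.\d{1,3}){3}'

if ($destination -match "src=(?<src>$ipPattern)\s+dst=(?<dst>$ipPattern)") {
    # Named groups are available as keys of the automatic $Matches table
    $src = $Matches['src']
    $dst = $Matches['dst']
}

"$src -> $dst"
```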
I have a file like:
abc WANT THIS
def NOT THIS
ghijk WANT THIS
lmno DO NOT LIKE
pqr WANT THIS
...
From which I want to extract:
abc
ghijk
pqr
When I apply the following:
(Select-String -Path $infile -Pattern "([^ ]+) WANT THIS").Matches.Groups[1].Value >$outfile
It only returns the match for the first line:
abc
(adding -AllMatches did not change the behaviour)
You may use
Select-String -path $infile -Pattern '^\s*(\S+) WANT THIS' -AllMatches | Foreach-Object {$_.Matches} | Foreach-Object {$_.Groups[1].Value} > $outfile
The ^\s*(\S+) WANT THIS pattern will match
^ - start of a line
\s* - 0+ whitespaces
(\S+) - Group 1: one or more non-whitespace chars
WANT THIS - a literal substring.
Select-String already emits one result per matching line; -AllMatches additionally returns every match within a line. Either way, you need to iterate over the matches with Foreach-Object {$_.Matches} and access the Group 1 value of each one (with Foreach-Object {$_.Groups[1].Value}) before saving the results to the output file.
Re-reading the code, it's matching them all, but only writing the value of the first match (doh!):
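The per-line iteration can be tried without a file by piping sample lines to Select-String. The $lines array below simply reproduces the example from the question.

```powershell
$lines = @(
    'abc WANT THIS'
    'def NOT THIS'
    'ghijk WANT THIS'
    'lmno DO NOT LIKE'
    'pqr WANT THIS'
)

# Each matching line yields one MatchInfo; take Group 1 of its first match
$names = $lines |
    Select-String -Pattern '^\s*(\S+) WANT THIS' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$names -join ','   # abc,ghijk,pqr
```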
(Select-String -Path $scriptfile -Pattern "([^ ]+) WANT THIS").Matches.Groups.Value >$tmpfile
OTOH, it appears that the "captures" in the pattern output object also contain the "non-captured" content!!!!
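A likely explanation for the "non-captured" content: in .NET, Groups[0] always holds the entire match, and the capture groups start at index 1. A short sketch on a single line from the example shows the difference.

```powershell
$m = [regex]::Match('abc WANT THIS', '([^ ]+) WANT THIS')

$whole = $m.Groups[0].Value   # entire match: 'abc WANT THIS'
$first = $m.Groups[1].Value   # capture group 1: 'abc'
$count = $m.Groups.Count      # 2 (whole match plus one capture group)
```

So .Matches.Groups.Value lists the whole match and the captured text for every line, and indexing with [1] picks just one element out of that combined list.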
I am trying to identify errors in a log file. The application uses five uppercase letters followed by three digits followed by 'E' as an error code. The error code is followed by a non-word character. I was identifying cases with:
$errors=Select-string -Path "logfile.txt" -Pattern "[A-Z]{5}[0-9]{3}E\W"
However the remainder of the content now includes
ab1bea8a-a00e-4211-b1db-2facecfd725e.
Which is being matched by the regex and flagged as an error. I changed the regex to
\p{Lu}{5}[0-9]{3}E\W
(which I expected to match five upper case characters), but why does it still match the non-error lower case pattern?
The "case-insensitive" regex flag is set by Select-String, which makes \p{Lu} case-insensitive, just as it does with [A-Z].
Try adding the -CaseSensitive parameter to the command.
You can confirm this by running some .NET code, for example in LINQPad:
(new Regex(@"\p{Lu}", RegexOptions.IgnoreCase)).IsMatch("a")
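The same check can be sketched directly in PowerShell with the static [regex]::IsMatch method, comparing the result with and without the IgnoreCase option.

```powershell
$options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

# With IgnoreCase, \p{Lu} also accepts a lowercase letter
$insensitive = [regex]::IsMatch('a', '\p{Lu}', $options)

# Without the option, \p{Lu} only accepts uppercase letters
$sensitive = [regex]::IsMatch('a', '\p{Lu}')

"IgnoreCase: $insensitive, default: $sensitive"
```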
PowerShell regular expression matching is case-insensitive by default. There are several ways for making matches case-sensitive, though.
Add the -CaseSensitive switch when using the Select-String cmdlet:
-CaseSensitive
Makes matches case-sensitive. By default, matches are not case-sensitive.
C:\> 'abc' | Select-String -Pattern 'A'
abc
C:\> 'ABC' | Select-String -Pattern 'A'
ABC
C:\> 'abc' | Select-String -Pattern 'A' -CaseSensitive # ← no match here
C:\> 'ABC' | Select-String -Pattern 'A' -CaseSensitive
ABC
Use the case-sensitive version of the regular expression matching operators:
By default, all comparison operators are case-insensitive. To make a comparison operator case-sensitive, precede the operator name with a c. For example, the case-sensitive version of -eq is -ceq. To make the case-insensitivity explicit, precede the operator with an i. For example, the explicitly case-insensitive version of -eq is -ieq.
C:\> 'abc' -match 'A'
True
C:\> 'ABC' -match 'A'
True
C:\> 'abc' -cmatch 'A' # ← no match here
False
C:\> 'ABC' -cmatch 'A'
True
Force a case-sensitive match by adding a miscellaneous construct ((?...), not to be confused with non-capturing groups (?:...)) with the inverted "case-insensitive" regex option to the regular expression (this works with both Select-String cmdlet and -match operator):
C:\> 'abc' | Select-String -Pattern '(?-i)A' # ← no match here
C:\> 'ABC' | Select-String -Pattern '(?-i)A'
ABC
C:\> 'abc' -match '(?-i)A' # ← no match here
False
C:\> 'ABC' -match '(?-i)A'
True
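Applied to the error-code question above, the three approaches could look like the following sketch. The two sample lines are hypothetical stand-ins for log content: one carries an error code in the described format, the other contains the lowercase GUID from the question.

```powershell
$lines = @(
    'ABCDE123E connection lost'                                  # hypothetical error line
    'request ab1bea8a-a00e-4211-b1db-2facecfd725e. completed'    # GUID, not an error
)
$pattern = '[A-Z]{5}[0-9]{3}E\W'

# Default: case-insensitive, so the GUID line is flagged as well
$byDefault  = @($lines | Select-String -Pattern $pattern)

# Three ways to make the match case-sensitive
$viaSwitch   = @($lines | Select-String -Pattern $pattern -CaseSensitive)
$viaInline   = @($lines | Select-String -Pattern "(?-i)$pattern")
$viaOperator = @($lines | Where-Object { $_ -cmatch $pattern })

"{0} {1} {2} {3}" -f $byDefault.Count, $viaSwitch.Count, $viaInline.Count, $viaOperator.Count
```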
I want to replace some text in every script file in folder, and I'm trying to use this PS code:
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
Get-ChildItem -Path 'D:\Scripts' -Recurse -Include *.sql | ForEach-Object { (Get-Content $_.fullname) -replace $pattern, 'replace text' | Set-Content $_.fullname }
But I have no idea how to keep first part of expression, and just replace the second one. Any idea how can I do this? Thanks.
I am not sure the provided regex for table names is correct, but in any case you can refer to captured groups in the replacement string with $1, $2 and so on, using the following syntax: 'Doe, John' -ireplace '(\w+), (\w+)', '$2 $1'
Note that the replacement pattern either needs to be in single quotes ('') or have the $ signs of the replacement group specifiers escaped ("`$2 `$1").
# A simpler pattern may work better: $pattern = '(FROM) (?<replacement_place>[a-zA-Z0-9_.]{1,7})'
$pattern = '(FROM [a-zA-Z0-9_.]{1,100})(?<replacement_place>[a-zA-Z0-9_.]{1,7})'
Get-ChildItem -Path 'D:\Scripts' -Recurse -Include *.sql | ForEach-Object {
    # $1 re-inserts the first capture group; only the text after it is replaced
    (Get-Content $_.FullName) |
        ForEach-Object { $_ -replace $pattern, '$1 replace text' } |
        Set-Content $_.FullName -Force
}
If you need to reference other variables in your replacement expression (as you may), you can use a double-quoted string and escape the capture dollars with a backtick
{ $_-replace $pattern, "`$1 replacement text with $somePoshVariable" } |
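Both quoting styles can be compared on a single line of SQL. This is a sketch only: the sample statement, the group names keyword and table, and the $newTable variable are made up for illustration, and the pattern is a simplified stand-in for the one in the question.

```powershell
$sql      = 'SELECT * FROM dbo.Orders WHERE id = 1'   # hypothetical input line
$newTable = 'dbo.Archive'                             # hypothetical replacement

# Named groups: the part to keep (keyword) and the part to replace (table)
$pattern = '(?<keyword>FROM\s+)(?<table>[a-zA-Z0-9_.]+)'

# Single quotes: ${keyword} is passed to the regex engine untouched
$single = $sql -replace $pattern, '${keyword}dbo.Archive'

# Double quotes: escape the group reference with a backtick,
# while $newTable is expanded by PowerShell
$double = $sql -replace $pattern, "`${keyword}$newTable"

$single
$double
```

The ${name} form avoids ambiguity when the replacement text directly follows the group reference, which a bare $1 followed by a digit or letter would not.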