Select-string regex


I'm searching a ton of logs using a foreach loop for a string ($text) and currently outputting the entire line to an output file ($logfile)
Get-ChildItem "\\$server\$Path" -Filter "*.log" |select-string -pattern $text |select -expandproperty line |out-file $logfile -append
A sample line of one of the log files might look like this
May 25 04:08:36.640 2016 AUDITOF GUID 1312.2657.11075.54819.13021094807.198 opened by USER
where $text = "opened by USER"
All of this works fine and it spits out every line of every log file that includes $text which is great.
But what I think I'd like to do is get an output of the date/time and the GUID. The GUID can change formats, lengths, etc., but it will always have dots and will always follow GUID (space) and precede (space) opened.
In short, I'm trying to use a regex with a lookbehind (or lookahead) or a match that would return something like this to the $logfile:
May 25 04:08:36.640 2016,1312.2657.11075.54819.13021094807.198
Any help appreciated. I'm lousy with Regex.

One way would be to do this
$result = Get-ChildItem "\\$server\$Path" -Filter "*.log" -File |
Select-String -Pattern $text -SimpleMatch |
Select-Object -ExpandProperty Line |
ForEach-Object {
if ($_ -match '([a-z]{3,}\s*\d{2}\s*\d{2}:\d{2}:\d{2}\.\d{3}\s*\d{4}).*GUID ([\d.]+)') {
'{0},{1}' -f $matches[1], $matches[2]
}
}
$result | Out-File $logfile -Append
Explanation:
I added the -SimpleMatch switch to Select-String because it seems you want to match $text exactly; with that switch the pattern is treated as a literal string rather than a regex.
Select-Object -ExpandProperty Line can return an array of matching lines, so I pipe this to ForEach-Object to loop through them.
The if (..) uses the regex -match and if that condition is $true, we do whatever is inside the curly brackets.
Also, a successful -match automatically populates the $matches hashtable, and we use its entries 1 and 2 (the two capture groups) to output a comma-separated line, which is then collected in the variable $result.
Finally, we simply write $result to a file.
Regex details:
( Match the regular expression below and capture its match into backreference number 1
[a-z] Match a single character in the range between “a” and “z”
{3,} Between 3 and unlimited times, as many times as possible, giving back as needed (greedy)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d Match a single digit 0..9
{2} Exactly 2 times
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d Match a single digit 0..9
{2} Exactly 2 times
: Match the character “:” literally
\d Match a single digit 0..9
{2} Exactly 2 times
: Match the character “:” literally
\d Match a single digit 0..9
{2} Exactly 2 times
\. Match the character “.” literally
\d Match a single digit 0..9
{3} Exactly 3 times
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d Match a single digit 0..9
{4} Exactly 4 times
)
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
GUID\ Match the characters “GUID ” literally
( Match the regular expression below and capture its match into backreference number 2
[\d.] Match a single character present in the list below
A single digit 0..9
The character “.”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
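The same extraction can be sketched with named capture groups, which some readers find easier to follow than $matches[1] and $matches[2]. This is only an illustration: the $lines array stands in for the output of Select-String, the group names date and guid are my own, and the pattern assumes the GUID never contains a space (the question says it always sits between "GUID " and " opened").

```powershell
# Sample lines; the first is taken from the question, the second is made up.
$lines = @(
    'May 25 04:08:36.640 2016 AUDITOF GUID 1312.2657.11075.54819.13021094807.198 opened by USER'
    'May 25 04:09:01.002 2016 AUDITOF something unrelated'
)

# (?<date>...) and (?<guid>...) are named capture groups.
# \S+ takes every non-space character between "GUID " and " opened".
$pattern = '^(?<date>[a-z]{3,}\s+\d{1,2}\s+[\d:.]+\s+\d{4}).*?\bGUID\s(?<guid>\S+)\sopened by USER'

$result = foreach ($line in $lines) {
    if ($line -match $pattern) {
        # $Matches is only read after a successful -match on this line
        '{0},{1}' -f $Matches['date'], $Matches['guid']
    }
}
$result
```

Because -match is case-insensitive by default, [a-z] also matches the capital "M" in "May".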

Related

How to replace only a portion of string using PowerShell Get-Content -replace

I have a file in which I need to modify a URL, not knowing what that URL contains, just like in the following example:
In file file.txt I have to replace the URL, so that whether it is "https://SomeDomain/Release/SomeText" or "https://SomeDomain/Staging/SomeText", it becomes "https://SomeDomain/Deploy/SomeText". In other words, whatever is written between SomeDomain and SomeText should be replaced with a known string. Is there a regex that can help me achieve this?
I used to do it with the following command:
((Get-Content -path "file.txt" -Raw) -replace '"https://SomeDomain/Release/SomeText");','"https://SomeDomain/Staging/SomeText");') | Set-Content -Path "file.txt"
This works fine, but I have to know if in file.txt the URL contains Release or Staging before executing the command.
Thanks!

You can do this with a regex -replace, where you capture the parts you wish to keep and use the backreferences to recreate the new string
$fileName = 'Path\To\The\File.txt'
$newText = 'BLAHBLAH'
# read the file as single multilined string
(Get-Content -Path $fileName -Raw) -replace '(https?://\w+/)[^/]+(/.*)', "`$1$newText`$2" | Set-Content -Path $fileName
Regex details:
( Match the regular expression below and capture its match into backreference number 1
http Match the characters “http” literally
s Match the character “s” literally
? Between zero and one times, as many times as possible, giving back as needed (greedy)
:// Match the characters “://” literally
\w Match a single character that is a “word character” (letters, digits, etc.)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
/ Match the character “/” literally
)
[^/] Match any character that is NOT a “/”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 2
/ Match the character “/” literally
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
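A small sketch of the same replacement on an in-memory string, so it can be tried without touching a file. The $content value and the digit-leading $newText are hypothetical. When the new text can start with a digit, writing the group references as ${1} and ${2} keeps them unambiguous; as far as I know, a replacement such as $12024Deploy would be read as a reference to a nonexistent group 12024 and end up in the output as literal text.

```powershell
# Hypothetical file content containing the URL
$content = 'open("https://SomeDomain/Release/SomeText");'
$newText = '2024Deploy'   # hypothetical replacement that starts with a digit

# Single quotes keep PowerShell from expanding ${1} and ${2};
# the regex engine then resolves them as capture groups 1 and 2.
$updated = $content -replace '(https?://\w+/)[^/]+(/.*)', ('${1}' + $newText + '${2}')
$updated
```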

Remove all following characters after the second occurrence of a string in a filename with Powershell

In my music library I have filenames like this:
Artist - Song (feat. OtherArtist) (feat. OtherArtist).mp4
Artist - Song (feat. OtherArtist) (Radio Edit) (feat. OtherArtist).mp4
What I want is to remove the duplicate feature mention at the end. This is what I came up with so far:
Get-ChildItem -Path "path" -Recurse -Filter *feat*feat* | ForEach-Object { $_ | Rename-Item -NewName $_.Name.SubString(0,$_.Name.Length -10) }
This gets all the files with duplicate features and then just removes the last 10 characters (including the file extension, unfortunately), which clearly won't work if the song features multiple artists or artists with longer names.
I think I need regular expressions for this, but I'm still at most a beginner in using Powershell, so I would be really thankful for some help.

RegEx can indeed do what you want. You just need to do something very similar to what you have in your first filter. Here's the magic string:
(.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)
You can use it like this (skipping the ForEach loop entirely):
Get-ChildItem -Path .\* -Recurse -Filter *feat*feat* | Rename-Item -NewName {$_.Name -replace '(.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)','$1$3'}
And here's how that string breaks down, and what all it does:
(.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)
1st Capturing Group (.*feat.*?)
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
feat matches the characters feat literally (case sensitive)
. matches any character (except for line terminators)
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
2nd Capturing Group (\s*\(?feat.*\)\s*)
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\( matches the character ( literally (case sensitive)
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
feat matches the characters feat literally (case sensitive)
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\) matches the character ) literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
3rd Capturing Group (\.[^\.]+)
\. matches the character . literally (case sensitive)
[^\.] matches a single character not present in the list (anything except a literal .)
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
See it work here (where I got the string break down from, but much better formatted): https://regex101.com/r/FWlNLi/1
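Before renaming anything, the pattern can be tried on plain strings. A minimal sketch, using the two filenames from the question as test input (the variable names are mine):

```powershell
# The two example filenames from the question
$names = @(
    'Artist - Song (feat. OtherArtist) (feat. OtherArtist).mp4'
    'Artist - Song (feat. OtherArtist) (Radio Edit) (feat. OtherArtist).mp4'
)

# Group 1: everything up to the second "feat"; group 2: the duplicate mention;
# group 3: the file extension. Only groups 1 and 3 are kept.
$pattern = '(.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)'

$cleaned = $names | ForEach-Object { $_ -replace $pattern, '$1$3' }
$cleaned
```

Rename-Item also accepts -WhatIf, which prints the renames it would perform without changing any file, so the real command can be previewed the same way.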

RegEx targeted replace with Named Captures

Given
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
I can use
$line -match '^\{(?<type>[a-z]+)(-\[(?<target>(C|F|CF))\])?(\[(?<tab>\d+)\])?\}_(?<string>.*)'
And $matches['tab'] will correctly have a value of 3. However, if I then want to increment that value without also affecting the [3] in the string section, things get more complicated. I can use $tabIndex = $line.indexOf("[$tab]") to get the index of the first occurrence, and I can also use $newLine = ([regex]"\[$tab\]").Replace($line, '[4]', 1) to replace only the first occurrence. But I wonder, is there a way to get at this more directly? It's not strictly necessary, as I will only ever want to replace things within the initial {}_, which has a very consistent form, so replacing the first instance works. I'm just wondering if I am missing out on a more elegant solution, which also might be needed in a different situation.

I would change the regex a bit, because mixing Named captures with Numbered captures is not recommended, so it becomes this:
'^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)'
You could then use it like below to replace the tab value:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$newTabValue = 12345
$line -replace '^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)', "{`${type}-[`${target}][$newTabValue]}_`${string}"
The result of this will be:
{initError-[cf][12345]}_Invalid nodes(s): [3]
Regex details:
^ Assert position at the beginning of the string
\{ Match the character “{” literally
(?<type> Match the regular expression below and capture its match into backreference with name “type”
[a-z] Match a single character in the range between “a” and “z”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?: Match the regular expression below
- Match the character “-” literally
\[ Match the character “[” literally
(?<target> Match the regular expression below and capture its match into backreference with name “target”
[CF] Match a single character present in the list “CF”
{1,2} Between one and 2 times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
\[ Match the character “[” literally
(?<tab> Match the regular expression below and capture its match into backreference with name “tab”
\d Match a single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\] Match the character “]” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
\} Match the character “}” literally
_ Match the character “_” literally
(?<string> Match the regular expression below and capture its match into backreference with name “string”
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
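To get at the tab value "more directly", one option is to ask the [regex] class where the named group matched and splice the new number in at that position. This is a sketch rather than the answer's method: it relies on the Index and Length properties of the matched group, and it adds (?i) because [regex] is case-sensitive by default, unlike -match and -replace.

```powershell
$line  = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$regex = [regex]'(?i)^\{(?<type>[a-z]+)(?:-\[(?<target>[CF]{1,2})\])?(?:\[(?<tab>\d+)\])?\}_(?<string>.*)'

$m   = $regex.Match($line)
$tab = $m.Groups['tab']          # the named group, including its position in $line

if ($tab.Success) {
    $newTab  = [int]$tab.Value + 1
    # Cut out the old digits at the group's own index and insert the new ones there
    $newLine = $line.Remove($tab.Index, $tab.Length).Insert($tab.Index, [string]$newTab)
} else {
    $newLine = $line             # no tab part present: leave the line unchanged
}
$newLine
```

Because only the characters of the tab group are touched, the [3] in the string section stays as it is, and a line without a target or tab part is not reshaped.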

An alternative way of increasing the first number in the brackets is using the -Split operator to access the number you want to change:
$line = '{initError-[cf][3]}_Invalid nodes(s): [3]'
$NewLine = $line -split "(\d+)"
$NewLine[1] = [int]$newLine[1] + 1
-join $NewLine
Output:
{initError-[cf][4]}_Invalid nodes(s): [3]

RegEx after certain string

I have a manifest file
Bundle-ManifestVersion: 2
Bundle-Name: BundleSample
Bundle-Version: 4
I want to change the value of Bundle-Name using -replace in Powershell.
I used this pattern Bundle-Name:(.*)
But it returns including the Bundle-Name. What would be the pattern if I want to change only the value of the Bundle-Name?

You could capture both the Bundle-Name: and its value in two separate capture groups.
Then replace like this:
$manifest = @"
Bundle-ManifestVersion: 2
Bundle-Name: BundleSample
Bundle-Version: 4
"@
$newBundleName = 'BundleTest'
$manifest -replace '(Bundle-Name:\s*)(.*)', ('$1{0}' -f $newBundleName)
# or
# $manifest -replace '(Bundle-Name:\s*)(.*)', "`$1$newBundleName"
The above will result in
Bundle-ManifestVersion: 2
Bundle-Name: BundleTest
Bundle-Version: 4
Regex details:
( Match the regex below and capture its match into backreference number 1
Bundle-Name: Match the character string “Bundle-Name:” literally (case sensitive)
\s Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
( Match the regex below and capture its match into backreference number 2
. Match any single character that is NOT a line break character (line feed)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
Thanks to LotPings, there is even an easier regex that can be used:
$manifest -replace '(?<=Bundle-Name:\s*)\S.*', $newBundleName
This uses a positive lookbehind. The \S makes the match start at the first character of the value; without it, the match would begin right after the colon and the replacement would swallow the space that follows it.
The regex details for that are:
(?<= Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
Bundle-Name: Match the characters “Bundle-Name:” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\S Match a single character that is a “non-whitespace character”
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
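A variation on the lookbehind approach, offered as a sketch. It assumes the manifest uses Windows (CRLF) line endings, which I build by hand here. In .NET regexes the dot also matches a carriage return, so .* can take the CR along with the value. The [ \t]* and \S[^\r\n]* parts are my substitutions, not part of the answer above.

```powershell
# Manifest text with explicit CRLF line endings, built for the demo
$manifest = "Bundle-ManifestVersion: 2`r`nBundle-Name: BundleSample`r`nBundle-Version: 4"
$newBundleName = 'BundleTest'

# [ \t]*    : only spaces and tabs may follow the colon, so the lookbehind never crosses a line break
# \S        : the match starts at the first non-whitespace character of the value
# [^\r\n]*  : and stops before a carriage return or line feed
$updated = $manifest -replace '(?<=Bundle-Name:[ \t]*)\S[^\r\n]*', $newBundleName
$updated
```

With this pattern a line whose value is empty is simply left alone instead of the match running on into the next line.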

Powershell adding CR at the end of regex match group

I'm getting a CR between the regex match and the ','. What's going on?
$r_date ='ExposeDateTime=([\w /:]{18,23})'
$v2 = (Select-String -InputObject $_ -Pattern $r_date | ForEach-Object {$_.Matches.Groups[1].Value}) + ',';
Example of output:
9/25/2018 8:45:19 AM[CR],
Original String:
ExposeDateTime=9/25/2018 8:45:19 AM
Error=Dap
PostKvp=106
PostMa=400
PostTime=7.2
PostMas=2.88
PostDap=0

Try this:
$original = @"
ExposeDateTime=9/25/2018 8:45:19 AM
Error=Dap
PostKvp=106
PostMa=400
PostTime=7.2
PostMas=2.88
PostDap=0
"@
$r_date ='ExposeDateTime=([\d\s/:]+(?:(?:A|P)M)?)'
$v2 = (Select-String -InputObject $original -Pattern $r_date | ForEach-Object {$_.Matches.Groups[1].Value}) -join ','
Regex details:
ExposeDateTime= Match the characters “ExposeDateTime=” literally
( Match the regular expression below and capture its match into backreference number 1
[\d\s/:] Match a single character present in the list below
A single digit 0..9
A whitespace character (spaces, tabs, line breaks, etc.)
One of the characters “/:”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
(?: Match the regular expression below
Match either the regular expression below (attempting the next alternative only if this one fails)
A Match the character “A” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
P Match the character “P” literally
)
M Match the character “M” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
)

If your input is a multiline string stored in $Original, then this rather simpler regex seems to do the job. [grin] It uses a named capture group and the multiline regex flag to capture the string after ExposeDateTime= and before the next line ending.
$Original -match '(?m)ExposeDateTime=(?<Date>.+)$'
$Matches.Date
output ...
9/25/2018 8:45:19 AM
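One thing worth checking, sketched here under the assumption that the input has CRLF line endings: in .NET regexes the dot matches a carriage return, and with (?m) the $ anchor matches just before the line feed, so .+$ can still capture a trailing CR. Excluding both line-ending characters from the capture avoids that. The shortened sample string is built by hand for the demo.

```powershell
# First lines of the sample data, with explicit CRLF line endings
$original = "ExposeDateTime=9/25/2018 8:45:19 AM`r`nError=Dap`r`nPostKvp=106"

# [^\r\n]+ cannot run into the line ending, so no trailing CR is captured
if ($original -match '(?m)^ExposeDateTime=(?<Date>[^\r\n]+)') {
    $v2 = $Matches['Date'] + ','
}
$v2
```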
