Powershell Drop the last part of a string with multiple "." - regex

I'm trying to write a regex in PowerShell to get only a specific part of a string. I know a way to do this without regex, but I think it can be done more efficiently with one. I have a string that looks like this:
Some/Stuff/Here/Then.drop.last
Ideally, I want to write a regex that gets me just:
Then.drop

PS> 'Some/Stuff/Here/Then.drop.last' -replace '.*/(.+)\..*', '$1'
Then.drop
.*/ greedily matches everything up to the last /
(.+)\. greedily matches everything up to the last literal . and captures everything before that . in the first capture group ($1) - which is your string of interest.
.* matches the remaining part of the string.
Using $1 as the replacement string then replaces the overall match - the entire input string - with what the first capture group matched.
For more information about PowerShell's -replace operator, see this answer.
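As a quick check of the greedy matching described above, the sketch below runs the same -replace against a few inputs. Only the first string comes from the question; the other two sample strings are made up for illustration.

```powershell
# Pattern from the answer above: skip everything up to the last '/',
# capture up to (but not including) the last '.', then match the rest.
$pattern = '.*/(.+)\..*'

$samples = @(
    'Some/Stuff/Here/Then.drop.last'   # from the question
    'a/b/one.two.three.four'           # hypothetical: more than two dots
    'no/dots/here'                     # hypothetical: no dot after the last '/'
)

$results = foreach ($s in $samples) {
    # -replace returns the input unchanged when the regex does not match
    $s -replace $pattern, '$1'
}
$results
```

This should print Then.drop, one.two.three and, because the pattern finds no literal dot to anchor on, the unchanged no/dots/here.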

Related

Powershell regex for string between two special characters

A file name as below
$inpFiledev = "abc_XYZ.bak"
I need only XYZ in a variable to do a compare with other file name.
i tried below:
[String]$findev = [regex]::match($inpFiledev ,'_*.').Value
Write-Host $findev
Asterisks in regex don't behave in the same way as they do in filesystem listing commands. As it stands your regex is looking for underscore, repeated zero or more times, followed by any character (represented in regex by a period). So the regex finds zero underscores right at the start of the string, then it finds 'a', and that's the match it returns.
First, correct that bit:
'_*.'
This needs to become "underscore, followed by any number of characters, followed by a literal period". A literal period has to be escaped in the regex as \., because an unescaped period means any character:
'_.*\.'
_ underscore
.* any number of characters
\. a literal period
That returns:
_XYZ.
So, not far off.
If you're looking to return something from between characters, you'll need to use capturing groups. Put parentheses around the bit you want to keep:
'_(.*)\.'
Then you'll need to use PowerShell regex groups to get the value:
[regex]::match($inpFiledev ,'_(.*)\.').Groups[1].Value
Which returns: XYZ
The number 1 in Groups[1] just means the first capturing group. You can add as many groups as you like to the expression by using more parentheses, but you only need one in this case.
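The same capture group can also be read with PowerShell's -match operator, which fills the automatic $Matches variable. A minimal sketch, reusing the file name from the question; checking Success before reading the group is an added precaution, not part of the answer above, and the variable names $viaRegexClass and $viaOperator are made up for illustration.

```powershell
$inpFiledev = 'abc_XYZ.bak'

# .NET route: inspect the Match object before reading the group
$m = [regex]::Match($inpFiledev, '_(.*)\.')
$viaRegexClass = if ($m.Success) { $m.Groups[1].Value } else { $null }

# Operator route: -match returns $true/$false and populates $Matches
$viaOperator = if ($inpFiledev -match '_(.*)\.') { $Matches[1] } else { $null }

$viaRegexClass   # XYZ
$viaOperator     # XYZ
```

Both routes use the same regex; which one reads better is mostly a matter of taste.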
To complement mjsqu's helpful answer with two PowerShell-idiomatic alternatives:
For an overview of how regexes (regular expressions) are used in PowerShell, see Get-Help about_regular_expressions.
Using -split to split by _ and ., extracting the resulting 3-element array's middle element:
PS> ("abc_XYZ.bak" -split '[_.]')[1]
XYZ
-split's (first) RHS operand is a regex; regex [_.] is a character set ([...]) that matches a single char. that is either a literal _ or a literal . Therefore, input abc_XYZ.bak is broken into an array containing the strings abc, XYZ, and bak. Applying index [1] therefore extracts the middle token, XYZ.
Using -replace to extract the token of interest via a capture group ((...), referred to in the replacement operand as $1):
PS> "abc_XYZ.bak" -replace '^.+_([^.]+).+$', '$1'
XYZ
-replace too operates on a regex as the first RHS operand - what to replace - whereas the second operand specifies what to replace the matched (sub)string with.
Regex ^.+_([^.]+).+$:
^.+_ matches one or more (+) characters (.) at the start of the input (^) - note how . - used outside of a character set ([...]) - is a regex metacharacter that represents any character (in a single-line input string).
([^.]+) is a capture group ((...)) that matches a negated character set ([^...]): [^.] matches any literal char. that isn't a literal ., one or more times (+).
Whatever matched the sub-expression inside (...) can be referenced in the replacement operand as $<n>, where <n> represents the 1-based index of the capture group in the regex; in this case, $1 can be used to refer to this first (and only) capture group.
.+$ matches one or more (+) remaining characters (.) until the end of the input is reached ($).
Replacement operand $1 simply refers to what the first capture group matched; in this case: XYZ.
For a comprehensive overview of the syntax of -replace replacement operands, see this answer.
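How the two approaches behave when the name contains more than one underscore is worth checking before relying on a fixed index. The sketch below uses a made-up file name, abc_def_XYZ.bak, which is not in the question.

```powershell
$name = 'abc_def_XYZ.bak'   # hypothetical input with two underscores

$tokens    = $name -split '[_.]'   # abc, def, XYZ, bak
$second    = $tokens[1]            # def (fixed index counted from the left)
$beforeExt = $tokens[-2]           # XYZ (token just before the extension)

# The greedy ^.+_ runs up to the LAST underscore, so -replace yields XYZ too
$viaReplace = $name -replace '^.+_([^.]+).+$', '$1'
```

With -split, indexing from the end ([-2]) keeps pointing at the token before the extension, while [1] does not; the -replace form needs no change because of its greedy prefix.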
Because you're using the [regex] accelerator, you need the backslash to escape your final . (if you want to match it), and you need a dot before your asterisk to match any characters after your underscore. If the characters in between are all letters, then use \w+ instead.
$findev = [regex]::match($inpFiledev ,'_.*\.').Value
$findev
_XYZ.
this demos two other ways to get the desired info from the sample string. the 1st uses the basic .Split() string method on the raw string. the 2nd presumes you are dealing with file objects and starts off by getting the .BaseName for the file. that already removes the extension, so you need not bother doing it yourself.
if you are dealing with a large number of strings, and not file objects, then the previous regex answers will likely be faster. [grin]
$inpFiledev = 'abc_XYZ.bak'
$findev = $inpFiledev.Split('.')[0].Split('_')[-1]
# fake reading in a file with Get-Item or Get-ChildItem
$File = [System.IO.FileInfo]'c:\temp\testing\abc_XYZ.bak'
$WantedPart = $File.BaseName.Split('_')[-1]
'split on a string = {0}' -f $findev
'split on BaseName of file = {0}' -f $WantedPart
output ...
split on a string = XYZ
split on BaseName of file = XYZ

Match exactly two backslashes

I'm trying to match exactly two \ characters (the first ones encountered going from the left) in a string via PowerShell's regex -replace operator, to replace them with /. Using \\{2} doesn't work, as it only matches two adjacent backslashes. I've tried \\.+?\\, but that matches the whole substring between them.
I'm new to regexp, and nothing I found on various sites has helped me with this issue. And I know I can do that with a for loop that runs twice, but I'd first like to know if it could be done with regexp better.
EDIT: I'm looking to do something like this:
IN: \aaa\bbb(d\c)
OUT: /aaa/bbb(d\c)
You may use
$s -replace '\\([^\\]+)\\','/$1/'
Here, \\([^\\]+)\\ matches a \, then matches and captures any 1+ chars other than \ into Group 1 (later access with $1 from the replacement pattern) and then matches \, and replaces the match with /, the value in Group 1 and /.
To only replace the first occurrence, you may use
$s -replace '(?s)\\([^\\]+)\\(.*)','/$1/$2'
where the trailing (.*) will capture the rest of the string (if any) into Group 2 and the $2 replacement backreference will paste that part of the string back into the result. (?s) will allow . to match line break chars that it does not match by default.
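The difference between the two patterns only shows up when the input contains more than one backslash pair. The sketch below compares them on the question's input and on a made-up string, \a\b\c\d. The last line is an alternative that is not part of the answer above: the .NET Regex.Replace overload that takes a maximum replacement count.

```powershell
$s1 = '\aaa\bbb(d\c)'   # input from the question
$s2 = '\a\b\c\d'        # hypothetical input with two backslash pairs

# Pattern 1: replaces every \text\ pair it finds
$all1 = $s1 -replace '\\([^\\]+)\\', '/$1/'               # /aaa/bbb(d\c)
$all2 = $s2 -replace '\\([^\\]+)\\', '/$1/'               # /a/b/c/d

# Pattern 2: the trailing (.*) swallows the rest, so only the first pair changes
$first2 = $s2 -replace '(?s)\\([^\\]+)\\(.*)', '/$1/$2'   # /a/b\c\d

# Alternative (not from the answer): replace at most 2 single backslashes
$counted = ([regex]'\\').Replace($s2, '/', 2)             # /a/b\c\d
```

The counted variant replaces the first two backslashes wherever they are, including adjacent ones, whereas the answer's patterns require at least one character between them.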

regex preserve whitespace in replace

Using REGEX (in PowerShell) I would like to find a pattern in a text file that is over two lines and replace it with new text and preserve the whitespace. Example text:
ObjectType=Page
ObjectID=70000
My match string is
RunObjectType=Page;\s+RunObjectID=70000
The result I want is
ObjectType=Page
ObjectID=88888
The problem is my replacement string
RunObjectType=Page;`n+RunObjectID=88888
returns
ObjectType=Page
ObjectID=88888
And I need it to keep the original spacing. To complicate matters the amount of spacing may change.
Suggestions?
Leverage a capturing group and a backreference to that group in the replacement pattern:
$s -replace 'RunObjectType=Page;(\s+)RunObjectID=70000', 'RunObjectType=Page;$1RunObjectID=88888'
With (\s+), you capture all of the whitespace into Group 1, and the $1 backreference then inserts that captured value into the result.
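A minimal sketch of that replacement on a two-line string. The CRLF line break and the eight spaces of indentation are assumptions made for the example; the question does not show the exact whitespace.

```powershell
# Hypothetical sample: line break and indentation are made up
$s = "RunObjectType=Page;`r`n        RunObjectID=70000"

# Single quotes keep PowerShell from expanding $1 before the regex engine sees it
$pattern     = 'RunObjectType=Page;(\s+)RunObjectID=70000'
$replacement = 'RunObjectType=Page;$1RunObjectID=88888'

# (\s+) captures the line break and indentation; $1 writes them back unchanged
$result = $s -replace $pattern, $replacement

$result -ceq "RunObjectType=Page;`r`n        RunObjectID=88888"   # True
```

If the text right after the group reference began with a digit, writing the reference as ${1} would avoid ambiguity about which group number is meant.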

PCRE Regular expression : only one matching

I want to capture the strings in a subject string that match a pattern.
Patterns examples: ##name##, ##address##, ##bankAccount##, ...
Subject example: This is the template with patterns : ##name##Your bank account is : ##bankAccount##Your address is : ##address##
With the following regex: .*(#{2}[a-zA-Z]*#{2}).*, only the last pattern is matched.
How to capture all the patterns, not just the last or first ?
Now that I've formatted your regex properly, the problem shows. A * in your regex was hidden since markdown took it to make the text italics.
Your opening .* matches greedily as much as it can, only backing up enough to let (#{2}[a-zA-Z]*#{2}) match. This matches the last pattern found in the line, everything before it having been matched by the .*.
You need to remove .* as I mentioned in my comment, and use preg_match_all:
$re = '~#{2}[a-zA-Z]*#{2}~';
preg_match_all($re, "##name##, ##address##, ##bankAccount##", $m);
print_r($m);
The .*#{2}[a-zA-Z]*#{2}.* matched 0 or more characters other than a newline at first, grabbing the whole line, and then backtracked until the last occurrence of #{2}[a-zA-Z]*#{2} pattern, and the last .* only grabbed the rest of the line. Removing the .* and using preg_match_all, all substrings matching the #{2}[a-zA-Z]*#{2} pattern can be extracted.
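For readers following the PowerShell examples elsewhere on this page, the same contrast can be sketched with .NET's [regex]::Matches, which plays the role that preg_match_all plays in PHP. This is a translation for illustration, not part of the PHP answer; the subject string is the one from the question, and the variable names are made up.

```powershell
$subject = 'This is the template with patterns : ##name##Your bank account is : ##bankAccount##Your address is : ##address##'

# With the surrounding .*, the greedy prefix leaves only the last placeholder
$lastOnly = $subject -replace '.*(#{2}[a-zA-Z]*#{2}).*', '$1'

# Without .*, Matches() walks the string and returns every placeholder
$all = [regex]::Matches($subject, '#{2}[a-zA-Z]*#{2}') | ForEach-Object { $_.Value }

$lastOnly   # ##address##
$all        # ##name##, ##bankAccount##, ##address##
```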

Powershell regex

Is there a PowerShell regex command I could use to replace the last of the leading consecutive zeros in a text string with an "M"? For example:
$Pattern = @("000123456", "012345678", "000000001", "000120000")
Final result:
00M123456
M12345678
0000000M1
00M120000
Thanks.
Search for the following regex:
"^(0*)0"
The regex searches for a run of consecutive 0s at the beginning (^) of the string. It captures all of the 0s except the one to be replaced. "^0(0*)" also works, since we only need to preserve the number of 0s that we don't touch.
With the replacement string:
'$1M'
Note that $1 denotes the text captured by the first capturing group, which is (0*) in the regex.
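Applied to all four sample strings from the question, the search and replacement above can be sketched as follows. Using single-quoted strings and passing the whole array to -replace are choices made here for brevity; the variable name $result is made up.

```powershell
$Pattern = @('000123456', '012345678', '000000001', '000120000')

# -replace applied to an array processes each element and returns an array
$result = $Pattern -replace '^(0*)0', '$1M'
$result
# 00M123456
# M12345678
# 0000000M1
# 00M120000
```

A string with no leading zero does not match the pattern and comes back unchanged.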
Example by @SegFault:
"000120000" -replace "^(0*)0", '$1M'