Split string on character, except that doubled up character is an escape - regex

Input:
$string = "this-is-just-an--example"
Output:
this
is
just
an-example
I tried various things centered around Regex.Split and "-[^-]" or "-([^-])".
Example of things that did not work:
[regex]::Split( $string, "-[^-]" )
[regex]::Split( $string, "-([^-])" )
Of course I can use String.Split and iterate, treating an empty string as a sign that I ran into an escaped character, but that makes for ugly code.
P.S. I searched for duplicates for a few minutes and didn't find any.

Use the lookahead and lookbehind assertions, and then do a replace to eliminate the remaining doubled characters:
$string = "this-is-just-an--example"
$string -split '(?<!-)-(?!-)' -replace '--','-'
this
is
just
an-example
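To see why the attempts from the question fail while the lookaround version works, here is a small sketch using the question's sample string (the variable names $consuming and $lookaround are only illustrative):

```powershell
$string = 'this-is-just-an--example'

# "-[^-]" consumes the character after the hyphen, so that character is lost:
$consuming = $string -split '-[^-]'
$consuming -join '|'      # this|s|ust|n-|xample

# Lookarounds only assert what surrounds the hyphen without consuming it:
$lookaround = $string -split '(?<!-)-(?!-)'
$lookaround -join '|'     # this|is|just|an--example
```

The doubled hyphen survives the split intact, which is why the trailing -replace '--','-' is still needed.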

Replacing the escape sequence with another value first eliminates the need for a complex split condition. All that is needed is a unique replacement value that does not already appear in the string.
For example, using # as the replacement value for $string = 'this-is-just-an--example', the following line will get the desired result:
$string -replace '--','#' -split '-' -replace '#','-'
-replace '--','#' eliminates the escape sequence (giving this-is-just-an#example),
-split '-' then separates the result (giving an array containing this, is, just, and an#example),
and finally -replace '#','-' restores the escaped value (giving this, is, just, an-example).
Both -split and -replace are built-in PowerShell operators that work on strings using regular expressions (equivalent to the Regex.Split and Regex.Replace methods in .NET).
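If a fixed placeholder such as # might already occur in the input, the same three steps can be wrapped in a helper that picks an unused character first. This is only a sketch: the function name Split-EscapedString, its parameters, and the choice of the Unicode private-use range for placeholders are assumptions, not part of the answer above.

```powershell
# Hypothetical helper; name and parameters are illustrative only.
function Split-EscapedString {
    param(
        [Parameter(Mandatory)] [string] $InputString,
        [string] $Delimiter = '-'
    )

    # Find a placeholder that does not occur in the input
    # (assumption: a private-use code point is a reasonable candidate).
    $placeholder = $null
    foreach ($code in 0xE000..0xE0FF) {
        $candidate = [string][char]$code
        if (-not $InputString.Contains($candidate)) {
            $placeholder = $candidate
            break
        }
    }
    if ($null -eq $placeholder) { throw 'No unused placeholder character found.' }

    # Escape the delimiter so regex metacharacters are treated literally.
    $delimPattern = [regex]::Escape($Delimiter)
    $pairPattern  = [regex]::Escape($Delimiter + $Delimiter)

    # 1. Hide doubled delimiters, 2. split on the remaining single ones.
    $hidden = $InputString -replace $pairPattern, $placeholder
    $tokens = $hidden -split $delimPattern

    # 3. Restore the escaped delimiter; String.Replace is a literal replace,
    #    so no regex substitution rules apply here.
    foreach ($token in $tokens) { $token.Replace($placeholder, $Delimiter) }
}

$parts = Split-EscapedString 'this-is-just-an--example'
$parts
```

Calling it with the question's input yields the four expected elements, the last being an-example.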

Related

PowerShell regex does not match near newline

I have an exe output in form
Compression : CCITT Group 4
Width : 3180
and try to extract CCITT Group 4 to $var with PowerShell script
$var = [regex]::match($exeoutput,'Compression\s+:\s+([\w\s]+)(?=\n)').Groups[1].Value
The tester at http://regexstorm.net/tester says the regexp Compression\s+:\s+([\w\s]+)(?=\n) is correct, but in PowerShell it does not match. How can I write the regexp correctly?
You want to get all text from some specific pattern till the end of the line. So, you do not even need the lookahead (?=\n), just use .+, because . matches any char but a newline (LF) char:
$var = [regex]::match($exeoutput,'Compression\s+:\s+(.+)').Groups[1].Value
Or you may use the -match operator and, once a match is found, access the captured value using $matches[1]:
$exeoutput -match 'Compression\s*:\s*(.+)'
$var = $matches[1]
Wiktor Stribiżew's helpful answer simplifies your regex and shows you how to use PowerShell's -match operator as an alternative.
Your follow-up comment about piping to Out-String fixing your problem implies that your problem was that $exeOutput contained an array of lines rather than a single, multiline string.
This is indeed what happens when you capture the output from a call to an external program (*.exe): PowerShell captures the stdout output lines as an array of strings (the lines without their trailing newline).
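The effect can be reproduced without the external program. The sketch below simulates the captured output with a two-element array (sample data taken from the question) and assumes the default $OFS, under which an array converted to a string is joined with spaces:

```powershell
# Simulated captured output: an array of lines without trailing newlines.
$exeOutput = 'Compression : CCITT Group 4', 'Width : 3180'

# Passing the array where a [string] is expected joins the elements with
# spaces, so the text contains no newline and (?=\n) can never succeed.
$original = [regex]::Match($exeOutput, 'Compression\s+:\s+([\w\s]+)(?=\n)')
$original.Success     # False

# Joining with newlines first gives the multiline string the regex expects.
$joined = $exeOutput -join "`n"
$fixed  = [regex]::Match($joined, 'Compression\s+:\s+([\w\s]+)(?=\n)').Groups[1].Value
$fixed                # CCITT Group 4
```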
As an alternative to converting array $exeOutput to a single, multiline string with Out-String (which, incidentally, is slow[1]), you can use a switch statement to operate on the array directly:
# Stores 'CCITT Group 4' in $var
$var = switch -regex ($exeOutput) { 'Compression\s+:\s+(.+)' { $Matches[1]; break } }
Alternatively, given the specific format of the lines in $exeOutput, you could leverage the ConvertFrom-StringData cmdlet, which can parse the lines into key-value pairs for you once the : separator has been replaced with =:
$var = ($exeoutput -replace ':', '=' | ConvertFrom-StringData).Compression
[1] Use of a cmdlet is generally slower than using an expression; with a string array $array as input, you can achieve what $array | Out-String does more efficiently with $array -join "`n", though note that Out-String also appends a trailing newline.

Regular Expressions in powershell split

I need to strip a UNC FQDN down to just the name or IP, depending on the input.
My examples would be
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
I want to end up with just tom or 123.43.234.23.
I have the following code, which strips out the domain name perfectly, but I'm still left with \\tom:
-Split '\.(?!\d)')[0]
Your regex succeeds in splitting off the tokens of interest in principle, but it doesn't account for the leading \\ in the input strings.
You can use regex alternation (|) to include the leading \\ at the start as an additional -split separator.
Given that matching a separator at the very start of the input creates an empty element with index 0, you then need to access index 1 to get the substring of interest.
In short: The regex passed to -split should be '^\\\\|\.(?!\d)' instead of '\.(?!\d)', and the index used to access the resulting array should be [1] instead of [0]:
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' |
ForEach-Object { ($_ -Split '^\\\\|\.(?!\d)')[1] }
The above yields:
tom
123.43.234.23
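To make the empty first element visible, here is a short sketch that inspects the whole array produced by the split for one of the sample inputs ($parts is just an illustrative variable name):

```powershell
$parts = '\\tom.overflow.corp.com' -split '^\\\\|\.(?!\d)'

$parts.Count          # 5: '', tom, overflow, corp, com
$parts[0] -eq ''      # True: the separator matched at the very start
$parts[1]             # tom
```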
Alternatively, you could remove the leading \\ in a separate step, using -replace:
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' |
ForEach-Object { ($_ -Split '\.(?!\d)')[0] -replace '^\\\\' }
Yet another alternative is to use a single -replace operation, which does not require a ForEach-Object call (doesn't require explicit iteration):
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' -replace
'(?x) ^\\\\ (.+?) \.\D .+', '$1'
Inline option (?x) (IgnorePatternWhitespace) allows you to make regexes more readable with insignificant whitespace: any unescaped whitespace can be used for visual formatting.
^\\\\ matches the \\ (escaped with \) at the start (^) of each string.
(.+?) matches one or more characters lazily.
\.\D matches a literal . followed by something other than a digit (\d matches a digit, \D is the negation of that).
.+ matches one or more remaining characters, i.e., the rest of the input.
$1 as the replacement operand refers to what the 1st capture group ((...)) in the regex matched, and, given that the regex was designed to consume the entire string, replaces it with just that.
I'm borrowing Lee_Dailey's $InStuff
and applying a regex I used recently:
$InStuff = -split @'
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
'@
$InStuff |ForEach-Object {($_.Trim('\\') -split '\.(?!\d{1,3}(\.|$))')[0]}
Sample Output:
tom
123.43.234.23
As you can see here on RegEx101 the dots between the numbers are not matched
The Select-String cmdlet uses regex and populates a MatchInfo object with the matches (which can then be queried).
The regex "(\.?\d+)+|\w+" works for your particular example.
"\\tom.overflow.corp.com", "\\123.43.234.23.overflow.corp.com" |
Select-String "(\.?\d+)+|\w+" | % { $_.Matches.Value }
while this is NOT regex, it does work. [grin] i suspect that if you have a really large number of such items, then you will want a regex. they do tend to be faster than simple text operators.
this will get rid of the leading \\ and then replace the domain name with an empty string.
# fake reading in a text file
# in real life, use Get-Content
$InStuff = -split @'
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
'@
$DomainName = '.overflow.corp.com'
$InStuff.ForEach({
$_.TrimStart('\\').Replace($DomainName, '')
})
output ...
tom
123.43.234.23

Replace text after special character

I have a string in which the numeric part should be replaced with text. In my case the variable is:
$string = '18.3.0-31290741.41742-1'
I want to replace everything after the first '-' with "SNAPSHOT", so that echo $string shows the result below. I tried LastIndexOf(), Trim() and other things, but I can't work out how to do it.
Expected result:
PS> echo $string
18.3.0-SNAPSHOT
Maybe the following is a step in the right direction, but when there are two '-' characters it replaces after the last one, not the first:
$string = "18.3.0-31290741.41742-1" -replace '(.*)-(.*)', '$1-SNAPSHOT'
.* is a greedy match, meaning it will produce the longest matching (sub)string. In your case that would be everything up to the last hyphen. You need either a non-greedy match (.*?) or a pattern that won't match hyphens (^[^-]*).
Demonstration:
PS C:\> '18.3.0-31290741.41742-1' -replace '(^.*?)-.*', '$1-SNAPSHOT'
18.3.0-SNAPSHOT
PS C:\> '18.3.0-31290741.41742-1' -replace '(^[^-]*)-.*', '$1-SNAPSHOT'
18.3.0-SNAPSHOT
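The difference between the greedy and non-greedy quantifiers can be inspected directly by capturing what each one consumes. A small sketch, with illustrative variable names:

```powershell
$version = '18.3.0-31290741.41742-1'

# Greedy: (.*) takes as much as possible, so it stops at the LAST hyphen.
$null = $version -match '^(.*)-'
$greedy = $Matches[1]
$greedy     # 18.3.0-31290741.41742

# Lazy: (.*?) takes as little as possible, so it stops at the FIRST hyphen.
$null = $version -match '^(.*?)-'
$lazy = $Matches[1]
$lazy       # 18.3.0
```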
By using a positive lookbehind assertion ((?<=...)) you could eliminate the need for a capturing group and backreference:
PS C:\> "18.3.0-31290741.41742-1" -replace '(?<=^.*?-).*', 'SNAPSHOT'
18.3.0-SNAPSHOT
You could use Select-String and a regular expression to match everything up to and including the first hyphen, then pass the match to ForEach-Object (commonly abbreviated with the alias %) to construct the final string:
$string = "18.3.0-31290741.41742-1" | Select-String -Pattern '^[^-]*-' | %{ "$($_.Matches.Value)SNAPSHOT" }
$string

Replace xBD character to blank value

In PowerShell, how can I replace the xBD character with a blank space in a text file?
Assuming that xBD refers to the character with code 0xBD (decimal 189, the ½ character), it should be as easy as:
$string.Replace("$(0xBD -as [char])","")
For multiple lines (e.g. an entire text file), use the -replace regex operator.
The syntax for the -replace operator is:
"string(s)" -replace "regex pattern","replacement"
If you omit the "replacement" argument, the characters matched by the regex pattern will simply be removed.
The proper regex pattern in .NET to match that character would be \xBD:
(Get-Content .\myfile.txt) -replace '\xBD' | Set-Content .\mynewfile.txt
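The pattern can be tried on an in-memory string before touching a file. In this sketch the sample string is made up, and the second call passes ' ' as the replacement to get the blank space the question asks for instead of removing the character:

```powershell
# Build a sample string containing the character with code 0xBD.
$sample = 'a' + [char]0xBD + 'b'

$removed = $sample -replace '\xBD'        # ab  (character removed)
$blanked = $sample -replace '\xBD', ' '   # a b (character replaced by a space)
```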

Powershell - Replacing a string with a variable ending with a dollar sign

I'm a bit lost with this one. For whatever reason the replace function in powershell doesn't play well with variables ending with a $ sign.
Command:
$var='A#$A#$'
$line=('$var='+"'"+"'")
$line -replace '^.+$',('$line='+"'"+$var+"'")
Expected output:
$line='A#$A#$'
Actual output:
$line='A#$A#
It looks like you're getting hit with a regex substitution that you don't want. The regex special variable $' represents everything after your match. Since your regex matches the entire string, $' is effectively empty. During the replace operation, the .NET regex engine sees $' in your replacement string (the trailing $ of $var followed by the closing quote) and substitutes in that empty string.
One way to avoid this is to replace all instances of $ in your $var string with $$:
$line -replace '^.+$',('$line='+"'"+($var.Replace('$','$$'))+"'")
You can see more information about regex substitution in .NET here:
Substitutions
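The effect is easy to isolate. The following sketch uses a throwaway input string ('x') and only the quoted value as the replacement, so the only difference between the two calls is whether the $ characters are doubled:

```powershell
$var = 'A#$A#$'

# Unescaped: the trailing $ plus the closing quote form $', which the regex
# engine replaces with the (empty) text after the match.
$unescaped = 'x' -replace '^.+$', ("'" + $var + "'")
$unescaped    # 'A#$A#

# Escaped: each $$ in the replacement produces one literal $.
$escaped = 'x' -replace '^.+$', ("'" + $var.Replace('$', '$$') + "'")
$escaped      # 'A#$A#$'
```

Note that $var.Replace() is the literal String.Replace method, so the doubling itself is not subject to substitution rules.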
I was able to find a band-aid of sorts by replacing $ with a special character and then reverting it back after the change. Preferably you would choose a character that doesn't have a key on your keyboard. For me I chose "¤".
$var='A#$A#$'
$var=$var -replace '\$','¤'
$line=("`$var=''")
$line -replace '^.+$',("`$line='$var'") -replace '¤','$'
I don't really understand the purpose of your posted lines, it seems to me that it would just make more sense to do $line='$line='''+$var+"'", BUT if you insist on your way, just do two replace calls, like this:
$line -replace '^.+$',('$line=''LOL''') -replace 'LOL',$var