Negative Lookbehind Works in Editor But Not in Powershell Script - regex

I am attempting to replace spaces with comma-space for all instances in a string, while avoiding repeating commas already present in the string.
Test string:
'186 ATKINS, Cindy Maria 25 Every Street Smalltown, Student'
Using the following code:
Get-Content -Path $filePath |
ForEach-Object {
$match = ($_ | Select-String $regexPlus).Matches.Value
$c = ($_ | Get-Content)
$c = $c -replace $match,', '
$c
}
The output is:
'186, ATKINS,, Cindy, Maria, 25, Every, Street, Smalltown,, Student'
My $regexPlus value is:
$regexPlus = '(?s)(?<!,)\s'
I have tested the negative lookbehind assertion in my editor and it works. Why does it not work in this PowerShell script? The regex101 online editor produces this curious mention of case sensitivity:
Negative Lookbehind (?<!,)
Assert that the Regex below does not match
, matches the character , with index 44₁₀ (2C₁₆ or 54₈) literally (case sensitive)
I have tried editing to:
$match = ($_ | Select-String $regexPlus -CaseSensitive).Matches.Value
But it is still not working. Any ideas are welcome.

Part of the problem here is that you are running the regex yourself and then feeding its result into the replacement, when, as @WiktorStribiżew mentions, you can simply use -replace the way it is meant to be used: -replace does all the hard work for you.
When you do this:
$match = ($_ | Select-String $regexPlus).Matches.Value
That part works: it finds the regex match, which is a single space character. But when you do this:
$c = $c -replace $match,', '
$match holds the matched text, not the pattern, so the line is equivalent to:
$c = $c -replace ' ',', '
The pattern being replaced is just a space, not the regular expression you were expecting. That's why the negative lookbehind for the commas never comes into play: all it is searching for are spaces, and it dutifully replaces every space with a comma-space.
The solution is simple: pass the regex itself as the -replace pattern:
$regexPlus = '(?s)(?<!,)\s'
$c = $c -replace $regexPlus,', '
e.g. The negative lookbehind working as advertised:
PS C:\> $str = '186 ATKINS, Cindy Maria 25 Every Street Smalltown, Student'
PS C:\> $regexPlus = '(?s)(?<!,)\s'
PS C:\> $str -replace $regexPlus,', '
186, ATKINS, Cindy, Maria, 25, Every, Street, Smalltown, Student
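To check the difference outside PowerShell, here is a minimal Python sketch of the same two replacements. Python's re module is swapped in for the .NET regex engine purely for illustration, and the variable names are made up; the pattern is the one from the question with the (?s) flag dropped, since it has no effect here.

```python
import re

text = "186 ATKINS, Cindy Maria 25 Every Street Smalltown, Student"

# What the original script effectively did: the matched text (a single space)
# was reused as the pattern, so the lookbehind was lost.
matched_text = re.search(r"(?<!,)\s", text).group(0)
wrong = re.sub(matched_text, ", ", text)

# Passing the pattern itself keeps the negative lookbehind in play.
right = re.sub(r"(?<!,)\s", ", ", text)

print(wrong)
print(right)
```

The first line printed has the doubled commas from the question; the second matches the PowerShell output above.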

You can use
(Get-Content -Path $filePath) -replace ',*\s+', ', '
This code replaces zero or more commas, together with the one or more whitespace characters that follow them, with a single comma + space.
More details:
,* - zero or more commas
\s+ - one or more whitespace chars.
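A rough Python sketch of the same idea, in case it helps to see the pattern applied to a couple of inputs. The function name comma_separate and the second test string are made up for illustration; only the pattern comes from the answer.

```python
import re

def comma_separate(line):
    # Swallow any commas already present plus the whitespace run after them,
    # then emit exactly one ", " in their place.
    return re.sub(r",*\s+", ", ", line)

print(comma_separate("186 ATKINS, Cindy Maria"))
print(comma_separate("a,,  b   c"))
```

Unlike the lookbehind version, this also collapses repeated commas and repeated spaces, which may or may not be what you want. A comma that is not followed by whitespace (as in "a,b") is left alone.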

Related

how to use regex for each line in a file in powershell

In a file I have lines like this:
I    have  lot   of spaces    in me.
and I replace every run of spaces with a single space using this PowerShell code:
$String = "I    have  lot   of spaces    in me."
while ($String.Contains("  "))
{
$String = $String -replace "  "," "}
The result is:
I have lot of spaces in me.
I would like to do that for each line in a txt file. Could you give me the best way to do that?
Part two:
How can I replace whitespace with e.g. ; only where there is more than one whitespace character in a row?
The response will be:
;4828;toto;toto;Ticket;0112APPT
and not :
;4828;toto toto;Ticket;0112APPT
To be clear, I would like to replace only two whitespace characters in a row with the character ;
As I said in the comments, this should do it for you (at least in my test):
Get-Content yourfile.txt | % {$_ -replace '\s+', ' '}
Explanation:
Get-Content - Gets Content from given File
| % - foreach line of the content given from Get-Content
$_ -replace '\s+', ' ' - '\s+' stands for one or more whitespaces
If you want to change the content of the File with the replaced strings you can also pipe it to Set-Content and save it in another file:
Get-Content yourfile.txt | % {$_ -replace '\s+', ' '} | Set-Content yourOutputFile.txt
If you want to write to the same file in the pipeline, take a look at: Why you don't do it!
As for your second question about ignoring single whitespace characters, this is how you would replace runs of more than one whitespace character with ;.
This will not replace spots with a single whitespace:
Get-Content yourfile.txt | % {$_ -replace '\s\s+', ';'}
You can do it like this:
(Get-Content '.\TextDocument.txt' -Raw) -replace ' +', ' '
Note that using \s instead of an actual space in the RegEx is an option, but it will remove not just spaces, but such things as tabs and, more crucially, end of line characters.
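The difference between ' +', '\s+' and '\s\s+' is easy to see on a two-line sample. This is a small Python sketch rather than PowerShell, with made-up function names and made-up spacing in the sample text; \s{2,} is just another way of writing \s\s+.

```python
import re

def squeeze_spaces(text):
    # ' +' only touches space characters, so line breaks survive.
    return re.sub(r" +", " ", text)

def squeeze_whitespace(text):
    # '\s+' also matches tabs and newlines, so separate lines get merged.
    return re.sub(r"\s+", " ", text)

def runs_to_semicolon(line):
    # Two or more whitespace characters become ';'; single spaces stay.
    return re.sub(r"\s{2,}", ";", line)

sample = "I   have  lot of    spaces\n  4828  toto toto  Ticket"
print(repr(squeeze_spaces(sample)))
print(repr(squeeze_whitespace(sample)))
for line in sample.splitlines():
    print(runs_to_semicolon(line))
```

Processing line by line (as Get-Content does without -Raw) sidesteps the newline issue, because each line no longer contains its line ending.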

Replace text after special character

I have a string in which the numbers should be replaced with text. In my case the variable is:
$string = '18.3.0-31290741.41742-1'
I want to replace everything after the first '-' with "-SNAPSHOT", so that echo $string shows the output below. I tried LastIndexOf(), Trim() and other things, but I can't work out how to do it.
Expected result:
PS> echo $string
18.3.0-SNAPSHOT
This may be a step in the right direction, but when there are two '-' characters it replaces from the last one rather than the first, as you can see:
$string = "18.3.0-31290741.41742-1" -replace '(.*)-(.*)', '$1-SNAPSHOT'
.* is a greedy match, meaning it will produce the longest matching (sub)string. In your case that would be everything up to the last hyphen. You need either a non-greedy match (.*?) or a pattern that won't match hyphens (^[^-]*).
Demonstration:
PS C:\> '18.3.0-31290741.41742-1' -replace '(^.*?)-.*', '$1-SNAPSHOT'
18.3.0-SNAPSHOT
PS C:\> '18.3.0-31290741.41742-1' -replace '(^[^-]*)-.*', '$1-SNAPSHOT'
18.3.0-SNAPSHOT
By using a positive lookbehind assertion ((?<=...)) you could eliminate the need for a capturing group and backreference:
PS C:\> "18.3.0-31290741.41742-1" -replace '(?<=^.*?-).*', 'SNAPSHOT'
18.3.0-SNAPSHOT
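For comparison, here is a short Python sketch of the greedy, non-greedy and negated-class variants side by side (variable names are made up). The lookbehind variant is left out on purpose: it relies on .NET's support for variable-length lookbehind, which Python's re module does not accept.

```python
import re

version = "18.3.0-31290741.41742-1"

# Greedy: (.*) runs to the LAST hyphen, so only the final "-1" is replaced.
greedy = re.sub(r"(.*)-(.*)", r"\1-SNAPSHOT", version)

# Non-greedy: (.*?) stops at the FIRST hyphen.
lazy = re.sub(r"^(.*?)-.*", r"\1-SNAPSHOT", version)

# Negated class: [^-]* cannot cross a hyphen at all.
no_hyphen = re.sub(r"^([^-]*)-.*", r"\1-SNAPSHOT", version)

print(greedy)
print(lazy)
print(no_hyphen)
```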
You could use Select-String and a regular expression to match everything up to and including the first hyphen, then pass the match to ForEach-Object (commonly shortened to its alias %) to construct the final string:
$string = "18.3.0-31290741.41742-1" | Select-String -Pattern '^[^-]*-' | %{ "$($_.Matches.Value)SNAPSHOT" }
$string

Regex for multiple app versions

I'm trying to get a list of versions from my custom attribute in a PowerShell script. The attribute looks like this:
[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]
I ended up with a regex like this, but it doesn't work at all:
'\(\"([^\",?]*)\"+\)'
You should do this as a two-step process: first you parse out the CompatibleVersions attribute, and then you split out the version numbers. Otherwise you will have difficulty finding the version numbers individually without also picking up other version-like numbers.
$s = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
$versions = ($s | Select-String -Pattern 'CompatibleVersions\(([^)]+)\)' | % { $_.Matches }).Groups[1].Value
$versions.Split(',') | % { $_.Trim('"') } | Write-Host
# 1.7.1.0
# 1.7.1.1
# 1.2.2.3
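The same two-step idea as a minimal Python sketch, for anyone who wants to try it outside PowerShell. The function name compatible_versions is hypothetical; the pattern is the one used above.

```python
import re

def compatible_versions(source):
    # Step 1: isolate the argument list of the CompatibleVersions attribute.
    m = re.search(r'CompatibleVersions\(([^)]+)\)', source)
    if m is None:
        return []
    # Step 2: split on commas and strip the surrounding quotes.
    return [part.strip().strip('"') for part in m.group(1).split(",")]

line = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
print(compatible_versions(line))
```

Because step 1 is anchored on the attribute name, other version-like numbers elsewhere in the file (an AssemblyVersion attribute, say) are not picked up.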
Start by grabbing the parentheses pair and everything inside:
$string = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
if($string -match '\(([^)]+)\)'){
# Remove the parentheses themselves, split by comma and then trim the "
$versionList = $Matches[0].Trim("()") -split ',' |ForEach-Object Trim '"'
}
You may use
$s | select-string -pattern "\d+(?:\.\d+)+" -AllMatches | Foreach {$_.Matches} | ForEach-Object {$_.Value}
The \d+(?:\.\d+)+ pattern will match:
\d+ - 1 or more digits
(?:\.\d+)+ - 1 or more sequences of a . and 1+ digits.
'"([.\d]+)"' will match any substring composed of dots and digits (\d) and enclosed in double quotes (")
Each number between the dots can be 0, but cannot be 00, 01 or similar.
Pay attention to the leading [, which has to be escaped.
This is a regex for the check:
^\[assembly: CompatibleVersions\("(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*\)]$
But if you want to read the list itself, you should use this instead:
^\[assembly: CompatibleVersions\("((?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*)\)]$
With it you will extract the "...","..."... sequence from inside the parentheses.
After that, split the resulting string by '","' into a list and strip any remaining " from the first and the last element. Now you have a list of valid version strings.
Alas, the regex on its own cannot produce the list; you still need a split() function.
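A sketch of the strict variant in Python, assuming the capturing regex exactly as written above (the group opens just after the first quote). The names NUM, VERSION, STRICT and parse_strict are made up, and the pattern is assembled from pieces only to keep it readable.

```python
import re

NUM = r"(?:[1-9]\d*|0)"               # 0, or a number without leading zeros
VERSION = rf"{NUM}(?:\.{NUM}){{3}}"   # four dot-separated numbers
STRICT = re.compile(
    rf'^\[assembly: CompatibleVersions\("({VERSION}"(?:,"{VERSION}")*)\)]$'
)

def parse_strict(line):
    m = STRICT.match(line)
    if m is None:
        return None  # the line is not a well-formed attribute
    # group(1) looks like: 1.7.1.0","1.7.1.1","1.2.2.3"
    return [v.strip('"') for v in m.group(1).split('","')]

print(parse_strict('[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'))
print(parse_strict('[assembly: CompatibleVersions("01.7.1.0")]'))
```

The second call prints None because 01 has a leading zero, which the pattern rejects.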

Trim More than 20 Characters

I am working on a script that will generate AD usernames based off of a csv file. Right now I have the following line working.
Select-Object @{n='Username';e={$_.FirstName.ToLower() + $_.LastName.ToLower() -replace "[^a-zA-Z]" }}
As of right now this takes the name and combines it into an AD-friendly name. However, I need the name to be shortened to no more than 20 characters. I have tried a few different methods to shorten the username but I haven't had any luck.
Any ideas on how I can get the username shortened?
Probably the most elegant approach is to use a positive lookbehind in your replacement:
... -replace '(?<=^.{20}).*'
This expression matches the remainder of the string only if it is preceded by 20 characters at the beginning of the string (^.{20}).
Another option would be a replacement with a capturing group on the first 20 characters:
... -replace '^(.{20}).*', '$1'
This captures the first 20 characters at the beginning of the string and replaces the whole string with just the captured group ($1). A string shorter than 20 characters does not match and is left unchanged.
$str[0..19] -join ''
e.g.
PS C:\> 'ab'[0..19] -join ''
ab
PS C:\> 'abcdefghijklmnopqrstuvwxyz'[0..19] -join ''
abcdefghijklmnopqrst
Which I would try in your line as:
Select-Object @{n='Username';e={(($_.FirstName + $_.LastName) -replace "[^a-z]").ToLower()[0..19] -join '' }}
([a-z] because PowerShell regex matches are case-insensitive, and moving .ToLower() so you only need to call it once).
And if you are using strict mode (Set-StrictMode), then why not check the length to avoid going outside the bounds of the array with the delightful:
$str[0..([math]::Min($str.Length - 1, 19))] -join ''
To truncate a string in PowerShell, you can use the .NET String::Substring method. The following line will return the first $targetLength characters of $str, or the whole string if $str is shorter than that.
if ($str.Length -gt $targetLength) { $str.Substring(0, $targetLength) } else { $str }
If you prefer a regex solution, the following works (thanks to @PetSerAl):
$str -replace "(?<=.{$targetLength}).*"
A quick measurement shows the regex method to be about 70% slower than the substring method (942ms versus 557ms on a 200,000 line logfile)
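The same clean-then-truncate logic as a minimal Python sketch, mainly to show that both the slicing and the capturing-group approach leave short names alone. The function names and the max_len parameter are made up for illustration.

```python
import re

def make_username(first, last, max_len=20):
    # Lower-case once, drop everything that is not a-z, then cut to length.
    letters = re.sub(r"[^a-z]", "", (first + last).lower())
    return letters[:max_len]  # slicing never fails on short strings

def truncate_with_regex(s, max_len=20):
    # Capturing-group variant: keep the first max_len characters and drop the
    # rest. Shorter strings do not match and come back unchanged.
    return re.sub(rf"^(.{{{max_len}}}).*", r"\1", s)

print(make_username("Ann", "Lee"))
print(truncate_with_regex("abcdefghijklmnopqrstuvwxyz"))
```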

Regular expression to match any character being repeated more than 10 times

I'm looking for a simple regular expression to match the same character being repeated more than 10 or so times. So for example, if I have a document littered with horizontal lines:
=================================================
It will match the line of = characters because it is repeated more than 10 times. Note that I'd like this to work for any character.
The regex you need is /(.)\1{9,}/.
Test:
#!perl
use warnings;
use strict;
my $regex = qr/(.)\1{9,}/;
print "NO" if "abcdefghijklmno" =~ $regex;
print "YES" if "------------------------" =~ $regex;
print "YES" if "========================" =~ $regex;
Here the \1 is called a backreference. It references what is captured by the dot . between the brackets (.) and then the {9,} asks for nine or more of the same character. Thus this matches ten or more of any single character.
Although the above test script is in Perl, this is very standard regex syntax and should work in any language. In some variants you might need to use more backslashes, e.g. Emacs would make you write \(.\)\1\{9,\} here.
If a whole string should consist of 10 or more identical characters, add anchors around the pattern:
my $regex = qr/^(.)\1{9,}$/;
In Python you can use (.)\1{9,}
(.) makes a group from one char (any char)
\1{9,} matches nine or more repetitions of the char captured by the 1st group
example:
import re

txt = """1. aaaaaaaaaaaaaaa
2. bb
3. cccccccccccccccccccc
4. dd
5. eeeeeeeeeeee"""
rx = re.compile(r'(.)\1{9,}')
lines = txt.split('\n')
for line in lines:
    # search() finds the run anywhere in the line
    rxx = rx.search(line)
    if rxx:
        print(line)
Output:
1. aaaaaaaaaaaaaaa
3. cccccccccccccccccccc
5. eeeeeeeeeeee
. matches any character. Used in conjunction with the curly braces already mentioned:
$: cat > test
========
============================
oo
ooooooooooooooooooooooo
$: grep -E '(.)\1{10}' test
============================
ooooooooooooooooooooooo
={10,}
matches = that is repeated 10 or more times.
Use the {10,} quantifier:
$: cat > testre
============================
==
==============
$: grep -E '={10,}' testre
============================
==============
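The quantifiers in these answers differ by one in what they require, which is easy to miss. A small Python sketch to make the counts explicit (the compiled pattern names are made up):

```python
import re

ten_or_more = re.compile(r"(.)\1{9,}")    # 1 captured char + at least 9 repeats
exactly_eleven = re.compile(r"(.)\1{10}")  # 1 captured char + 10 repeats = 11
equals_only = re.compile(r"={10,}")        # 10 or more '=' specifically

for n in (9, 10, 11):
    line = "=" * n
    print(n,
          bool(ten_or_more.search(line)),
          bool(exactly_eleven.search(line)),
          bool(equals_only.search(line)))
```

So the grep example with (.)\1{10} needs at least 11 identical characters, while (.)\1{9,} and ={10,} both start matching at 10.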
You can also use PowerShell to quickly replace words or character repetitions. PowerShell ships with Windows; the current version at the time of writing is 3.0.
$oldfile = "$env:windir\WindowsUpdate.log"
$newfile = "$env:temp\newfile.txt"
$text = (Get-Content -Path $oldfile -ReadCount 0) -join "`n"
# No /.../ delimiters here: in a .NET regex the slashes would be matched literally
$text -replace '(.)\1{9,}', ' ' | Set-Content -Path $newfile
PHP's preg_replace example:
$str = "motttherbb fffaaattther";
$str = preg_replace("/([a-z])\\1/", "", $str);
echo $str;
Here [a-z] matches the character, and the () lets it be referenced by the \\1 backreference, which tries to match the same character again (note that this already targets 2 consecutive characters), thus:
mother father
If you did:
$str = preg_replace("/([a-z])\\1{2}/", "", $str);
that would be erasing 3 consecutive repeated characters, outputting:
moherbb her
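The same two replacements as a Python sketch, as a cross-check of the outputs quoted above (variable names are made up; Python raw strings need only a single backslash before the 1):

```python
import re

s = "motttherbb fffaaattther"

# Remove each pair of identical consecutive letters.
pairs_removed = re.sub(r"([a-z])\1", "", s)

# Remove each run of exactly three identical consecutive letters.
triples_removed = re.sub(r"([a-z])\1{2}", "", s)

print(pairs_removed)
print(triples_removed)
```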
A slightly more generic PowerShell example. In PowerShell 7, the match is highlighted, including the last space (can you highlight in stack?).
'a b c d e f ' | select-string '([a-f] ){6,}'
a b c d e f