Replace or substring first set of numbers with regex

I am struggling to find a way to get only the first set of numbers in a file name in PowerShell. The file names can be similar to the ones below but I only want to get the first string of numbers and nothing else.
Example file names:
123456 (12).csv
123456abc.csv
123456(Copy 1).csv
123456 (Copy 1).csv
What I am currently attempting:
$test = "123456 (12).csv"
$POPieces = $test -match "^[0-9\s]+$"
Write-Host $POPieces
What I'd expect from above:
123456

The -match operator returns a Boolean and stores the matched text in the automatic variable $Matches. Your regular expression also allows whitespace (\s) and is anchored to the end of the string with $, so it cannot give you just the number. Change the expression to ^\d+ to match only the digits at the beginning of the string. Use Get-ChildItem to enumerate the files, as Martin Brandl suggested.
$POPieces = Get-ChildItem 'C:\root\folder' -Filter '*.csv' |
Where-Object { $_.Name -match '^\d+' } |
ForEach-Object { $matches[0] }
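To check the pattern without touching the file system, here is a minimal sketch that runs ^\d+ against the sample names from the question. The $names and $numbers variables are made up for illustration.

```powershell
# Sample file names from the question (no real files are read).
$names = '123456 (12).csv', '123456abc.csv', '123456(Copy 1).csv', '123456 (Copy 1).csv'

$numbers = foreach ($name in $names) {
    # -match returns $true/$false; the matched text ends up in $Matches[0].
    if ($name -match '^\d+') { $Matches[0] }
}

$numbers   # 123456 for each of the four names
```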

Related

PowerShell Replace Syntax: Regex Substitution and Environmental Variable

I am trying to substitute both regex and an environment variable and can't find the correct syntax (because of the mismatch of single and double quotes). The short script I am developing will rename files. Here is what my setup looks like, along with a few of the ways I tried.
# Original File Name: (BRP-01-001-06K48b-SC-CC-01).tif
# Desired File Name: (BRP-12-345-06K48b-SC-CC-01).tif
# Variables defined by user:
PS ..\user> $pic,$change="01-001","12-345"
# The problem is with the "-replace" near the end of the command
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", '$1$change$3'); echo $new}
PS ..\user> (BRP-$change-06K48b-SC-CC-01).tif
# Also tried:
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", "`$1$change`$3"); echo $new}
PS ..\user> $112-345-06K48b-SC-CC-01).tif
# If I put a space before $change:
PS ..\user> get-childitem .\* -recurse | where-object {$_.BaseName -match ".*\(BRP-$pic-.{15}\).*"} | foreach-object {$orig=($_.FullName); $new=($orig -replace "(\(BRP-)($pic)(-.{15}\))", "`$1 $change`$3"); echo $new}
PS ..\user> (BRP- 12-345-06K48b-SC-CC-01).tif
In the last example it "works" if I add space before $change ... but I do not want the space. I realize I could do another replace operation to fix the space but I would like to do this all in one command if possible.
What syntax do I need to replace using both environment variables and regex substitutions?
Also, out of curiosity, once working, will this replace all occurrences within a file name or just the first. For instance, will the file:
"Text (BRP-01-001-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)"
change to:
"Text (BRP-12-345-06K48b-SC-CC-01) Text (BRP-12-345-06K48b-SC-OR-01)"
or only the first match, like:
"Text (BRP-12-345-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)"

Best practice is to surround the capture group name with {} or to use named capture groups in your substitution string. Using {} with your second example should work out nicely.
"Text (BRP-01-001-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)" -replace "(\(BRP-)($pic)(-.{15}\))", "`${1}$change`${3}"
When PowerShell variables, capture groups, and string literals are in the replacement string, you can't use surrounding single quotes. Using surrounding double quotes allows inner expansion to happen. As a result, you will need to backtick escape $ used to identify capture groups.
Your second example has the proper syntax, typically, but because $change begins with digits, it creates unintended consequences. You are escaping $ in the substitution string to use capture groups 1 and 3. Since $change evaluates to 12-345, the intended capture group 1 is actually capture group 112, which doesn't exist. See below for an illustration of your second attempt:
"(\(BRP-)($pic)(-.{15}\))":
Capture Group 1: (BRP-
Capture Group 2: 01-001
Capture Group 3: -06K48b-SC-CC-01)
"`$1$change`$3" at runtime becomes $112-345$3 and then becomes $112-345-06K48b-SC-CC-01). Notice that $112 has been interpolated before the capture groups are substituted. Then capture group 112 is checked. Since it does not exist, $112 is just assumed to be a string.
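As a rough side-by-side, the sketch below runs the unbraced and braced replacement strings against the two-occurrence string from the question. The variable names $name, $ambiguous and $braced are only for illustration.

```powershell
$pic, $change = '01-001', '12-345'
$pattern = "(\(BRP-)($pic)(-.{15}\))"
$name = 'Text (BRP-01-001-06K48b-SC-CC-01) Text (BRP-01-001-06K48b-SC-OR-01)'

# Unbraced: "$1" followed by "12-345" is read as group 112, which does not exist,
# so "$112" is left in the output as literal text.
$ambiguous = $name -replace $pattern, "`$1$change`$3"

# Braced: ${1} and ${3} cannot run into the digits that follow them.
$braced = $name -replace $pattern, "`${1}$change`${3}"

$ambiguous
$braced
```

Because -replace substitutes every match rather than only the first, both occurrences in $braced come out as BRP-12-345.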

The below might be what you are after:
$pic = "01-001"
$change = "12-345"
$RenameFiles_FilterStr = "*BRP-$pic*.tif"
gci $RenameFiles_FilterStr -recurse | % { $_.BaseName -replace $pic,$change }
# The above returns renamed strings (files not renamed yet). If the expected result matches the returned ones, then uncomment the below and run to rename the files
# gci $RenameFiles_FilterStr -recurse | Rename-Item -NewName { $_.Name -replace $pic,$change }

PowerShell to split a text file on specific string

I am trying to split a large text file into several files based on a specific string. Every time I see the string ABCDE - 3, I want to cut and paste the content up to that string into a new text file. I also want to extract the last 4 digits of the social, the last name and the first name. The new text file needs to be saved as first name, last name and last 4 of the social.
See text file example and a bit of initial code. I would feel much more comfortable doing it in Python, but PowerShell is the only option.
$my_text = Get-Content .\ab.txt
$ssn_pattern = '([0-8]\d{2})-(\d{2})-(\d{4})'
ForEach ($file in $my_text)

To get the firstname, lastname and the last 4 digits of the social, you could make use of capturing groups and use those groups when assembling the filename.
From your pattern, only the last 4 digits should be grouped.
You could use a pattern to start the match with TO: and from the next line get the values for the names and the number.
Then match all lines that do not start with ABCDE - 3 using a negative lookahead (?!
You can adjust the pattern and the code to match your exact text.
(?m)^[^\S\r\n]+TO:.*\r?\n\s*ATTN:\s*[A-Z]{3} ([^,\r\n]+),[^\S\r\n]*(.+?)[^\S\r\n]*[0-8]\d{2}-\d{2}-(\d{4})(?:\r?\n(?![^\S\r\n]+ABCDE - 3).*)*\r?\n[^\S\r\n]+ABCDE - 3.*
I constructed a code snippet using Stack Overflow postings, so this might be improved. It basically comes down to loading the file as a raw string and getting all the matches.
Then loop over all the matches and use the groups to assemble a filename, and save the full match as the content.
If there are names which contain spaces and you don't want those to be in the filename, you could replace those with an empty string.
Example code:
$my_text = Get-Content -Raw ./Documents/stack-overflow/powershell/ab.txt
$pattern = "(?m)^[^\S\r\n]+TO:.*\r?\n\s*ATTN:\s*[A-Z]{3} ([^,\r\n]+),[^\S\r\n]*(.+?)[^\S\r\n]*[0-8]\d{2}-\d{2}-(\d{4})(?:\r?\n(?![^\S\r\n]+ABCDE - 3).*)*\r?\n[^\S\r\n]+ABCDE - 3.*"
Select-String $pattern -input $my_text -AllMatches |
ForEach-Object { $_.Matches } |
ForEach-Object {
$fileName = -join ($_.groups[2].Value, $_.groups[1].Value, $_.groups[3].Value)
Write-Host $fileName
Set-Content -Path "your-path-here/$fileName.txt" -Value $_.Value
}
When I run this, I get 2 files with the content for each match:
MIOTTISAREMO2222.txt
MIOTTSANREMO1111.txt
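To isolate the "consume lines until the delimiter" part of that pattern, here is a stripped-down sketch. The input text is hypothetical (the original file is not shown) and the pattern is a simplification, not the one used above.

```powershell
# Hypothetical input: two records, each ending with the delimiter line "ABCDE - 3".
$text = "TO: one`nline a`nABCDE - 3`nTO: two`nline b`nline c`nABCDE - 3"

# Start at a "TO:" line, consume the following lines as long as they do not
# start with the delimiter, then consume the delimiter line itself.
$pattern = '(?m)^TO: (\w+).*(?:\r?\n(?!ABCDE - 3).*)*\r?\nABCDE - 3'

$records = [regex]::Matches($text, $pattern)

$records.Count                 # 2
$records[1].Groups[1].Value    # two
```

Each element of $records holds one whole record in .Value, which is what the code above writes to a file.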

Select-String: match a string only if it isn't preceded by a specific character

I have a list of files that contain either of the two strings:
"stuff" or ";stuff"
I'm trying to write a PowerShell Script that will return only the files that contain "stuff". The script below currently returns all the files because obviously "stuff" is a substring of ";stuff"
For the life of me, I cannot figure out how to match only files that contain "stuff" without a preceding ;
Get-Content "C:\temp\list\list.txt" |
Where-Object { Select-String -Quiet -Pattern "stuff" -SimpleMatch $_ }
Note: C:\temp\list\list.txt contains a list of file paths that are each passed to Select-String.
Thanks for the help.

You cannot perform the desired matching with literal substring searches (-SimpleMatch).
Instead, use a regex with a negative look-behind assertion ((?<!..)) to rule out stuff substrings preceded by a ; char.: (?<!;)stuff
Applied to your command:
Get-Content "C:\temp\list\list.txt" |
Where-Object { Select-String -Quiet -Pattern '(?<!;)stuff' -LiteralPath $_ }
Regex pitfalls:
It is tempting to use [^;]stuff instead, using a negated (^) character set ([...]) (see this answer); however, this will not work as expected if stuff appears at the very start of a line, because a character set - whether negated or not - only matches an actual character, not the start-of-the-line position.
It is then tempting to apply ? to the negated character set (for an optional match - 0 or 1 occurrence): [^;]?stuff. However, that would match a string containing ;stuff again, given that stuff is technically preceded by a "0-repeat occurrence" of the negated character set; thus, ';stuff' -match '[^;]?stuff' yields $true.
Only a look-behind assertion works properly in this case - see regular-expressions.info.
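The three patterns discussed above can be compared directly on plain strings. This is only a quick sketch, and the variable names are illustrative.

```powershell
$lookBehind  = '(?<!;)stuff'
$negatedSet  = '[^;]stuff'
$optionalSet = '[^;]?stuff'

$r1 = 'stuff'  -match $lookBehind    # True:  no ; before "stuff"
$r2 = ';stuff' -match $lookBehind    # False: "stuff" is preceded by ;
$r3 = 'stuff'  -match $negatedSet    # False: the set needs an actual character before "stuff"
$r4 = ';stuff' -match $optionalSet   # True:  the set is allowed to match zero times

$r1, $r2, $r3, $r4
```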

To complement @mklement0's answer, I suggest an alternative approach to make your code easier to read and understand:
#requires -Version 4
@(Get-Content -Path 'C:\Temp\list\list.txt').
ForEach([IO.FileInfo]).
Where({ $PSItem | Select-String -Pattern '(?<!;)stuff' -Quiet })
This turns your strings into objects (System.IO.FileInfo) and uses the array methods ForEach and Where for brevity. It also lets you pipe the paths as objects, which Select-String accepts for its -Path parameter, making the call easier to follow (I find long lists of parameters difficult to read).

The example code posted won't actually run, as it will look at each line as the -Path value.
What you need is to get the content, select the string you're after, then filter the results with Where-Object
Get-Content "C:\temp\list\list.txt" | Select-String -Pattern "stuff" | Where-Object {$_ -notmatch ";stuff"}
You could create a more complex regex if needed, but that depends on what the data in your files looks like.

Keep first regex match and discard others

Yep another regex question... I am using PowerShell to extract a simple number from a filename when looping through a folder like so:
# sample string "ABCD - (123) Sample Text Here"
Get-ChildItem $processingFolder -filter *.xls | Where-Object {
$name = $_.Name
$pattern = '(\d{2,3})'
$metric = ([regex]$pattern).Matches($name) | { $_.Groups[1].Value }
}
All I am looking for is the number surrounded by brackets. This is successful, but it appears the $_.Name actually grabs more than just the name of the file, and the regex ends up picking up some other bits I don't want.
I understand why, as it's going through each regex match as an object and taking the value out of each and putting in $metric. I need some help editing the code so it only bothers with the first object.
I would just use -match etc if I wasn't bothered with the actual contents of the match, but it needs to be kept.

I don't see a cmdlet call before $_.Groups[1].Value which should be ForEach-Object but that is a minor thing. We need to make a small improvement on your regex pattern as well to account for the brackets but not include them in the return.
$processingFolder = "C:\temp"
$pattern = '\((\d+)\)'
Get-ChildItem $processingFolder -filter "*.xls" | ForEach-Object{
$details = ""
if($_.Name -match $pattern){$details = $matches[1]}
$_ | Add-Member -MemberType NoteProperty -Name Details -Value $details -PassThru
} | select name, details
This will loop all the files and try and match numbers in brackets. If there is more than one match it should only take the first one. We use a capture group in order to ignore the brackets in the results. Next we use Add-Member to make a new property called Details which will contain the matched value.
Currently this will return all files in the $processingFolder but a simple Where-Object{$_.Details} would return just the ones that have the property populated. If you have other properties that you need to make you can chain the Add-Members together. Just don't forget the -passthru.
You could also just make your own new object if you need to go that route with multiple custom parameters. It certainly would be more terse. That last question I answered has an example of that.
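As a rough idea of the "make your own object" route, the sketch below builds [pscustomobject] instances instead of using Add-Member. The $names array is a hypothetical stand-in for the Get-ChildItem output.

```powershell
# Hypothetical file names standing in for Get-ChildItem output.
$names = 'ABCD - (123) Sample Text Here.xls', 'No number here.xls'

$results = foreach ($name in $names) {
    # Reset per item so a failed match cannot reuse a stale $Matches value.
    $details = ''
    if ($name -match '\((\d+)\)') { $details = $Matches[1] }

    [pscustomobject]@{
        Name    = $name
        Details = $details
    }
}

# Keep only the entries where a bracketed number was found.
$results | Where-Object { $_.Details }
```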

After doing some research in to the data being returned itself (System.Text.RegularExpressions.MatchCollection) I found the Item method, so called that on $metric like so:
$name = '(111) 123 456 789 Name of Report Here 123'
$pattern = '(\d{2,3})'
$metric = ([regex]$pattern).Matches($name)
Write-Host $metric.Item(1)
Whilst probably not the best approach, it returns what I'm expecting for now.
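Another option, sketched here with the bracket-aware pattern from the other answer, is [regex]::Match, which returns only the first match, so there is no collection to index into.

```powershell
$name = '(111) 123 456 789 Name of Report Here 123'

# Match (singular) stops at the first hit; Groups[1] is the number without the brackets.
$first = [regex]::Match($name, '\((\d+)\)')
if ($first.Success) { $metric = $first.Groups[1].Value }

$metric   # 111
```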

Trying to match this using regular expressions in PowerShell

I am trying to use regular expressions to match certain lines in a file, but I am having some trouble.
The file contains text like this:
Mario, 123456789
Luigi, 234-567-890
Nancy, 345 5666 77533
Bowser, 348759823745908732589
Peach, 534785
Daisy, 123-456-7890
I'm trying to match just the numbers as either XXX-XXX-XXX or XXX XXX XXX pattern.
I've tried a few different ways, but it always accepts something I don't want it to, or it tells me everything is false.
I'm using PowerShell to do this.
At first I tried:
{$match = $i -match "\d{3}\-\d{3}\-\d{3}|\d{3}\ \d{3}\ \d{3}"
Write-Host $match}
But when I do that it matches the long string of numbers and XXX-XXX-XXXXX.
I read something saying that n would match the exact quantity, so I tried that...
{$match = $i -match "\d{n3}\-\d{n3}\-\d{n3}|\d{n3}\ \d{n3}\ \{n3}"
Write-Host $match}
That made everything false...
So I tried
{$match = $i -match "\d\n{3}\-\d\n{3}\-\d\n{3}|\d\n{3}\ \d\n{3}\ \d\n{3}"
I also tried the lazy quantifier, ?:
{$match = $i -match "\d{3?}\-\d{3?}\-\d{3?}|\d{3?}\ \{3?}\ \{3?}"
Write-Host $match}
Still false...
The final thing I tried was this...
{$match = $i -match "\d[0-9\{3\}\-\d[0-9]\{3\}\-\d[0-9]{3\}|\d[0-9]\{3\}\ \d[0-9]\{3}\ \d[0-9]\{3\}"
Write-Host $match}
Still no luck...

The following pattern gives two matches:
Get-Content .\test.txt | Where-Object {$_ -match '\d{3}[-\s]\d{3}[-\s]\d{3}'}
Luigi, 234-567-890
Daisy, 123-456-7890
If you want to exclude the last match, add the '$' anchor (represents the end of the string):
Get-Content .\test.txt | Where-Object {$_ -match '\d{3}[-\s]\d{3}[-\s]\d{3}$'}
Luigi, 234-567-890
If you want to be very specific and match lines from start to end, also use the ^ anchor (denotes the start of the string):
Get-Content .\test.txt | Where-Object {$_ -match '^\w+,\s+\d{3}[-\s]\d{3}[-\s]\d{3}$'}
Luigi, 234-567-890

Your first attempt is the closest. The {3} matches exactly 3 of the preceding token (here, digits). I think the n you saw was supposed to represent any number, not an actual n character. The reason it matches the longer numbers is that you only specified that the match must find 3 digits, dash or space, 3 digits, dash or space, then 3 more digits. You did not specify that it doesn't count if there are more digits after that.
To not match when there is a number after, you can use a negative lookahead.
(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})(?!\d)
Alternatively, if you want to only match at the end of the line, possibly with trailing space
(\d{3}-\d{3}-\d{3}|\d{3}\ \d{3}\ \d{3})\s*$
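For a quick check, the lookahead version can be run against the sample lines from the question. This is a sketch with the data held in an array (the $lines and $hits names are illustrative) rather than read from a file.

```powershell
$lines = 'Mario, 123456789', 'Luigi, 234-567-890', 'Nancy, 345 5666 77533',
         'Bowser, 348759823745908732589', 'Peach, 534785', 'Daisy, 123-456-7890'

# (?!\d) rejects a match that is immediately followed by another digit.
$pattern = '(\d{3}-\d{3}-\d{3}|\d{3} \d{3} \d{3})(?!\d)'

$hits = $lines | Where-Object { $_ -match $pattern }
$hits   # Luigi, 234-567-890
```

Note that the lookahead only guards the end of the number. A line with extra digits in front of the pattern would still match unless the start is constrained as well.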

As Gideon said, your first is the best place to start.
"\b\d{3}\-\d{3}\-\d{3}\b|\b\d{3}\ \d{3}\ \d{3}\b"
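The \b special character added before and after each alternative is a word boundary: the position between a word character (letter, digit or underscore) and a non-word character such as a space, newline or punctuation, or the start or end of the string. This ensures that 9999 doesn't match, but 999. does.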

Try this:
/(\d+[- ])+\d+/
It's better not to have such rigid regular expressions, unless you are absolutely sure that your input will not change.
This regex matches one or more digits followed by a space or a dash, repeats that group as many times as possible, and then requires at least one more digit.
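To see what the looser pattern accepts, here is a small sketch against the sample lines from the question. The / delimiters are dropped because PowerShell takes the pattern as a plain string; $lines and $loose are illustrative names.

```powershell
$lines = 'Mario, 123456789', 'Luigi, 234-567-890', 'Nancy, 345 5666 77533',
         'Bowser, 348759823745908732589', 'Peach, 534785', 'Daisy, 123-456-7890'

# Any run of digit groups separated by a single space or dash.
$loose = $lines | Where-Object { $_ -match '(\d+[- ])+\d+' }

$loose   # the Luigi, Nancy and Daisy lines
```

This accepts group lengths other than three, so it suits input whose format may vary but not the strict XXX-XXX-XXX requirement.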

When manipulating data in PowerShell, it usually is a good idea to create objects representing the data (after all, PowerShell is all about objects). Filtering based on object properties is usually easier and more robust. Your problem is a good example.
Here is what we are after:
the persons: $persons
where: where
the number of that person: $_.number
matches: -match
the pattern
starting with three digits: ^\d{3}
followed by three digits between dashes or spaces: (-\d{3}-|\ \d{3}\ )
ending on three digits: \d{3}$
Below is the entire script:
$persons = import-csv -Header "name", "number" -delimiter "," data.csv
$persons | where {$_.number -match "^\d{3}(\-\d{3}\-|\ \d{3}\ )\d{3}$"}

You can also use Select-String:
Select-String '(\d{3}[ -]){2}\d{3}$' .\file.txt | % {$_.Line}
