Incorporating Regex in Get-ChildItem - regex

I want PowerShell to return the files in a folder whose names start with America and have the csv extension. The folder contains files for different countries in csv, Excel and txt formats.
The following command returns the files that are .csv, but how can I also add a starts-with filter to it so that I only get the America items?
Get-ChildItem ("C:\Users\Documents\Sales") -Filter *.csv

The -Filter parameter cannot use regex, but it does support the * wildcard.
As RetiredGeek already commented, the easiest way to get the files that start with America and have the .csv extension is to use
Get-ChildItem -Path 'C:\Users\Documents\Sales' -File -Filter 'America*.csv'
The -File switch makes sure you only receive files, not folders as well.
If you do need to filter using regex, you can only do that by piping the results through to a Where-Object clause.
Suppose you want all files starting with 'America' or 'Mexico' and do not want files 'Canada' or any other country in your results. Then you could do
Get-ChildItem -Path 'C:\Users\Documents\Sales' -File -Filter '*.csv' | # first return only .csv files
Where-Object { $_.Name -match '^(America|Mexico)' } # then filter out all that do not start with America or Mexico
Regex details:
^ # Assert position at the beginning of the string
( # Match the regex below and capture its match into backreference number 1
# Match this alternative (attempting the next alternative only if this one fails)
America # Match the character string “America” literally (case insensitive)
|
# Or match this alternative (the entire group fails if this one fails to match)
Mexico # Match the character string “Mexico” literally (case insensitive)
)
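To see what the anchored alternation keeps and drops, here is a minimal sketch that runs the same -match test against a few made-up file names instead of a real folder (the names are assumptions for illustration only):

```powershell
# Hypothetical file names standing in for the output of Get-ChildItem
$names = 'America_2020.csv', 'Mexico_Q1.csv', 'Canada_2020.csv', 'SouthAmerica.csv'

# Keep only names that START with America or Mexico
# (-match is case-insensitive by default)
$kept = $names | Where-Object { $_ -match '^(America|Mexico)' }
$kept
```

With these inputs only America_2020.csv and Mexico_Q1.csv come through; SouthAmerica.csv is dropped because ^ anchors the match to the start of the name.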

Related

Rename file name with Regex and get rid of everything after with powershell

Hi, I'm trying to rename files with regex and PowerShell. I got it to rename certain files with the following code, but I want to remove everything after the underscore (_) in every file name. I have over a hundred files I need to rename.
get-childitem *.pdf | foreach { rename-item $_ $_.Name.Replace("_R12B", "") }
Here are example files i'm trying to change;
YTER-01-0B-B-PD-00003-2_R12B
YTER-01-0A-B-PZ-00001-2_R9B
YTER-01-0U-B-PG-00003-1_R1B
YTER-01-0G-B-PP-00005-1_R4B
You may use
get-childitem *.pdf | foreach { rename-item $_ ($_.Name -replace '_[^_]+(?=\.pdf$)') }
The _[^_]+(?=\.pdf$) regex matches
_ - an underscore
[^_]+ - 1+ chars other than an underscore
(?=\.pdf$) - followed with .pdf at the end of string.
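As a quick check that does not touch any files, the pattern can be tried on plain strings first. This is only a sketch: the names come from the question, and the .pdf extension on them is an assumption based on the *.pdf wildcard used there.

```powershell
# Names from the question, with the .pdf extension assumed
$names = 'YTER-01-0B-B-PD-00003-2_R12B.pdf', 'YTER-01-0A-B-PZ-00001-2_R9B.pdf'

# -replace without a replacement operand removes the match;
# applied to an array, it processes each element
$newNames = $names -replace '_[^_]+(?=\.pdf$)'
$newNames
```

The lookahead only checks that .pdf follows, so the extension itself is not consumed and stays in the new name.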

PowerShell to split a text file on specific string

I am trying to split a large text file into several files based on a specific string. Every time I see the string ABCDE - 3 I want to cut and paste the content up to that string in a new text file. I also want to extract the last 4 of the social, last name and first name. The new text file needs to be saved as first_name,last_name and last 4 of social.
See text file example and a bit of initial code. I would feel much more comfortable doing it in Python but PowerShell is the only option.
$my_text = Get-Content .\ab.txt
$ssn_pattern = '([0-8]\d{2})-(\d{2})-(\d{4})'
ForEach ($file in my_text)
To get the firstname, lastname and the last 4 digits of the social, you could make use of capturing groups and use those groups when assembling the filename.
From your pattern, only the last 4 digits should be grouped.
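A minimal sketch of that grouping, using a made-up number rather than anything from the real input file: only the last block of digits is wrapped in parentheses, so it is the only capture group.

```powershell
# Made-up sample value; not taken from the real input file
$line = '123-45-6789'

# Only the final \d{4} is wrapped in (...), so it becomes group 1
if ($line -match '[0-8]\d{2}-\d{2}-(\d{4})') {
    $lastFour = $Matches[1]
}
$lastFour
```

After a successful -match, the automatic $Matches variable holds the whole match at index 0 and the captured groups from index 1 onwards.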
You could use a pattern to start the match with TO: and from the next line get the values for the names and the number.
Then match all lines that do not start with ABCDE - 3 using a negative lookahead, (?![^\S\r\n]+ABCDE - 3).
You can adjust the pattern and the code to match your exact text.
(?m)^[^\S\r\n]+TO:.*\r?\n\s*ATTN:\s*[A-Z]{3} ([^,\r\n]+),[^\S\r\n]*(.+?)[^\S\r\n]*[0-8]\d{2}-\d{2}-(\d{4})(?:\r?\n(?![^\S\r\n]+ABCDE - 3).*)*\r?\n[^\S\r\n]+ABCDE - 3.*
I constructed a code snippet using Stack Overflow postings, so this might be improved. It basically comes down to loading a raw string and getting all the matches.
Then loop over all the matches and use the groups to assemble a filename, and save the full match as the content.
If there are names which contain spaces and you don't want those to be in the filename, you could replace those with an empty string.
Example code:
$my_text = Get-Content -Raw ./Documents/stack-overflow/powershell/ab.txt
$pattern = "(?m)^[^\S\r\n]+TO:.*\r?\n\s*ATTN:\s*[A-Z]{3} ([^,\r\n]+),[^\S\r\n]*(.+?)[^\S\r\n]*[0-8]\d{2}-\d{2}-(\d{4})(?:\r?\n(?![^\S\r\n]+ABCDE - 3).*)*\r?\n[^\S\r\n]+ABCDE - 3.*"
Select-String $pattern -input $my_text -AllMatches |
ForEach-Object { $_.Matches } |
ForEach-Object {
$fileName = -join ($_.groups[2].Value, $_.groups[1].Value, $_.groups[3].Value)
Write-Host $fileName
Set-Content -Path "your-path-here/$fileName.txt" -Value $_.Value
}
When I run this, I get 2 files with the content for each match:
MIOTTISAREMO2222.txt
MIOTTSANREMO1111.txt

How to move first 7 characters of a file name to the end using Powershell

My company has millions of old reports in pdf form. They are typically named in the format: 2018-09-18 - ReportName.pdf
The organization we need to submit these to is now requiring that we name the files in this format: Report Name - 2018-09.pdf
I need to move the first 7 characters of the file name to the end. I'm thinking there is probably an easy code to perform this task, but I cannot figure it out. Can anyone help me.
Thanks!
Caveat:
As jazzdelightsme points out, the desired renaming operation can result in name collisions, given that you're removing the day component from your dates; e.g., 2018-09-18 - ReportName.pdf and 2018-09-19 - ReportName.pdf would result in the same filename, Report Name - 2018-09.pdf.
Either way, I'm assuming that the renaming operation is performed on copies of the original files. Alternatively, you can create copies with new names elsewhere with Copy-Item while enumerating the originals, but the advantage of Rename-Item is that it will report an error in case of a name collision.
Get-ChildItem -Filter *.pdf | Rename-Item -NewName {
$_.Name -replace '^(\d{4}-\d{2})-\d{2} - (.*?)\.pdf$', '$2 - $1.pdf'
} -WhatIf
-WhatIf previews the renaming operation; remove it to perform actual renaming.
Add -Recurse to the Get-ChildItem call to process an entire directory subtree.
The use of -Filter is optional, but it speeds up processing.
A script block ({ ... }) is passed to Rename-Item's -NewName parameter, which enables dynamic renaming of each input file ($_) received from Get-ChildItem using a string-transformation (replacement) expression.
The -replace operator uses a regex (regular expression) as its first operand to perform string replacements based on patterns; here, the regex breaks down as follows:
^(\d{4}-\d{2}) matches something like 2018-09 at the start (^) of the name and - by virtue of being enclosed in (...) - captures that match in a so-called capture group, which can be referenced in the replacement string by its index, namely $1, because it is the first capture group.
(.*?) captures the rest of the filename excluding the extension in capture group $2.
The ? after .* makes the sub-expression non-greedy, meaning that it will give subsequent sub-expressions a chance to match too, as opposed to trying to match as many characters as possible (which is the default behavior, termed greedy).
\.pdf$ matches the filename extension (.pdf) at the end ($) - note that case doesn't matter. . is escaped as \., because it is meant to be matched literally here (without escaping, . matches any single character in a single-line string).
$2 - $1.pdf is the replacement string, which arranges what the capture groups captured in the desired form.
Note that any file whose name doesn't match the regex is quietly left alone, because the -replace operator passes the input string through if there is no match, and Rename-Item does nothing if the new name is the same as the old one.
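The replacement and the collision caveat can both be checked on plain strings before any file is renamed. The following is only a sketch: the first two names are taken from the caveat above, and notes.pdf is a hypothetical name that does not match the pattern.

```powershell
# Two names from the caveat plus one hypothetical non-matching name
$names = '2018-09-18 - ReportName.pdf', '2018-09-19 - ReportName.pdf', 'notes.pdf'

$pattern  = '^(\d{4}-\d{2})-\d{2} - (.*?)\.pdf$'
$newNames = $names -replace $pattern, '$2 - $1.pdf'

# New names that would be produced more than once, i.e. collisions
$collisions = $newNames |
    Group-Object |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Name }

$newNames
$collisions
```

Here notes.pdf passes through unchanged, and both dated names map to ReportName - 2018-09.pdf, which is exactly the kind of clash worth detecting before removing -WhatIf.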
Get-ChildItem with some RegEx and Rename-Item can do it:
Get-ChildItem -Path "C:\temp" | foreach {
$newName = $_.Name -replace '(^.{7}).*?-\s(.*?)\.(.*$)','$2 - $1.$3'
$_ | Rename-Item -NewName $newName
}
The RegEx
'(^.{7}).*?-\s(.*?)\.(.*$)' / $2 - $1.$3
(^.{7}) matches the first 7 characters
.*?-\s matches any characters up to and including the first dash that is followed by a whitespace character (here, the " - " separator)
(.*?)\. matches anything until the first found dot ( . )
(.*$) matches the file extension in this case
$2 - $1.$3 puts it all together in the changed order
This won't work properly if there are filenames with multiple dots ( . ) in them.
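To illustrate that limitation, here is a small sketch that applies the same pattern to one regular name and one name with two dots (both names appear as test data elsewhere in this thread):

```powershell
$pattern = '(^.{7}).*?-\s(.*?)\.(.*$)'

# Single dot: behaves as intended
$ok  = '2018-09-18 - ReportName.pdf' -replace $pattern, '$2 - $1.$3'

# Two dots: the lazy (.*?) stops at the FIRST dot, so the date lands mid-name
$odd = '2018-09-18 - double.extension.pdf' -replace $pattern, '$2 - $1.$3'

$ok
$odd
```

The second result is double - 2018-09.extension.pdf, because everything after the first dot is treated as the extension.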
This should work (added some test data):
$test = '2018-09-18 - ReportName.pdf','2018-09-18 - Other name.pdf','other pattern.pdf','2018-09-18 - double.extension.pdf'
$test | % {
$match = [Regex]::Match($_, '(?<Date>\d{4}-\d\d)-\d\d - (?<Name>.+)\.pdf')
if ($match.Success) {
"$($match.Groups['Name'].Value) - $($match.Groups['Date'].Value).pdf"
} else {
$_
}
}
Something like this -
Get-ChildItem -path $path | Rename-Item -NewName {$_.BaseName.Split(' - ')[-1] + ' - ' + $_.BaseName.SubString(0,7) + $_.Extension} -WhatIf
Explanation -
Split will segregate the name of the file based on the parameter - and [-1] tells PowerShell to select the last of the segregated values.
SubString(0,7) will select 7 characters starting from the first character of the BaseName of the file.
Remove -WhatIf to apply the rename.
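One thing to watch: as far as I know, the .Split(' - ') method call treats its argument as a set of separator characters in Windows PowerShell but as one whole separator string in PowerShell 7, so names that contain spaces or dashes can give different results. A sketch of the same idea using the -split operator, which splits on the full ' - ' separator in both, could look like this (the base name and extension are hypothetical values standing in for $_.BaseName and $_.Extension):

```powershell
$baseName  = '2018-09-18 - Report Name'   # hypothetical base name containing a space
$extension = '.pdf'                       # hypothetical extension

# Split on the ' - ' separator into at most two parts
$parts = $baseName -split ' - ', 2

# Report name first, then the first 7 characters (yyyy-MM) of the original name
$newName = $parts[1] + ' - ' + $baseName.Substring(0, 7) + $extension
$newName
```

Limiting the split to two parts keeps any later " - " inside the report name intact.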

How to match a string unless it contains something?

I have a PowerShell script that will get a list of all files within a folder and then (based on regex matches within a Switch statement) will move each file to a specified folder (depending on the regex match).
I'm having an issue with a particular list. A group of files (PDF files named after their part number) that begin with "40" get moved to a specified folder.
The regex itself for just THAT is easy enough for me, the problem I am having is that, IF the file contains _ol OR _ol_ then it cannot be a match.
For example, the file names below should all match:
401234567.pdf
401234567a.pdf
401234567_a.pdf
401234567a_something.pdf
Those below should NOT match:
401234567_ol.pdf
401234567_ol_something.pdf
Using a ^(?i)40\w+[^_ol].pdf$ regex is the closest it seems I can get.
It will negate the 401234567_ol.pdf as being a match; however, it accepts the 401234567_ol_something.pdf. Does anybody know how I can negate that as being a match as well?
You can use a negative lookahead in your regex. The following regex will match any string that doesn't contain _ol:
^((?!_ol).)*$
Note that you need to use modifier m (multiline) for multiline string.
Use a negative look-ahead:
^(?i)(?!.*_ol)40\w+\.pdf$
The look-ahead (?!.*_ol) at the very beginning of the pattern checks that _ol does not appear later in the string. If it is present, we have no match. The dot must be escaped to match a literal dot.
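A minimal sketch that runs this pattern against the six file names from the question, without moving any files:

```powershell
# The example names from the question
$names = '401234567.pdf', '401234567a.pdf', '401234567_a.pdf', '401234567a_something.pdf',
         '401234567_ol.pdf', '401234567_ol_something.pdf'

# Keep names that start with 40, end in .pdf and contain no _ol anywhere
$matched = $names | Where-Object { $_ -match '^(?i)(?!.*_ol)40\w+\.pdf$' }
$matched
```

Only the first four names are returned; both _ol names are rejected by the lookahead.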
Simply use the -notmatch operator with a pattern that matches what you want to exclude:
Get-ChildItem 'C:\source' -Filter '*.pdf' |
? { $_.BaseName -notmatch '_ol(_|$)' } |
Move-Item -Destination 'C:\destination'
or the -notlike operator (for better performance):
Get-ChildItem 'C:\source' -Filter '*.pdf' |
? { $_.BaseName -notlike '*_ol' -and $_.BaseName -notlike '*_ol_*' } |
Move-Item -Destination 'C:\destination'
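Both filters can be compared on plain base names first. This is a sketch only; 401234567_old is a hypothetical name added to show how a name that merely begins with _ol after the underscore is handled.

```powershell
# Base names (no extension); the last one is hypothetical
$baseNames = '401234567_a', '401234567_ol', '401234567_ol_something', '401234567_old'

# Regex version: reject _ol only when followed by _ or the end of the name
$viaNotMatch = $baseNames | Where-Object { $_ -notmatch '_ol(_|$)' }

# Wildcard version of the same rule
$viaNotLike = $baseNames | Where-Object { $_ -notlike '*_ol' -and $_ -notlike '*_ol_*' }

$viaNotMatch
$viaNotLike
```

Both filters keep 401234567_a and 401234567_old, whereas a pattern that rejects any _ol substring would also drop the latter.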

remove date from filename programmatically

I'm trying to find a solution to strip some dates out of filenames programmatically. My files have the following format:
net_20110909_servercleanup.pdf
or
net_servercleanup_20110909.pdf
I've used the solution posted below (found on Stack Overflow also) to update some of the filenames but I would ideally have one solution that could update all files in my directories. I'd like to strip the date and one of the underscores out so the final file looks like this:
net_servercleanup.pdf
I'd like to do this from a batch file or PowerShell. I've seen some solutions that accomplish something like this using RegEx but I don't know enough about them to create something that will work.
Any suggestions on how to accomplish this?
$filelist = (get-childitem c:\folder | Where-Object {$_.mode -match "a"} | foreach-object {$_.name})
foreach ($file in $filelist)
{
$len = $file.length
$newname = $file.substring(0,$len -13)
$newname = $newname + '.txt'
Rename-Item C:\folder\$file $newname
clear-variable newname, len
}
PowerShell, untested but should work:
$filelist = Get-ChildItem C:\folder | Where-Object {$_.Mode -match "a"} `
| Foreach-Object {$_.FullName}
foreach ($fullpath in $filelist)
{
$newpath = $fullpath -replace "_(19|20)[0-9]{6}"
Rename-Item -Path $fullpath -NewName $newpath -WhatIf
}
The _(19|20)[0-9]{6} regular expression matches the following pattern: leading "_" followed by "19" or "20" and then any six digits. If you have file names where date does not strictly match your example, you may need to modify the regex to catch them all.
The -WhatIf switch allows you to do a "dry run" i.e. test cmdlets like Remove-Item without actually performing any file operations. Remove it when everything looks OK and you are ready to proceed with actual renaming.
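Before running the loop, the regex can be tried on the two example names from the question as plain strings (a small sketch, nothing is renamed):

```powershell
# The two name formats from the question
$names = 'net_20110909_servercleanup.pdf', 'net_servercleanup_20110909.pdf'

# No replacement operand, so the matched "_" plus date is simply removed
$newNames = $names -replace '_(19|20)[0-9]{6}'
$newNames
```

Both inputs become net_servercleanup.pdf, which also shows that two such files in the same folder would collide on rename.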
I don't know what that language(?) is, but in C++ I'd do it by splitting the name into pieces on your separator (in this case, an underscore). Basically, I'd take the substring from the start up to the character before the first underscore and store it in a stream (a stringstream, to be exact), then take the substring from the character after the first underscore up to the character before the second underscore, and so on. Then I'd read the pieces from the stream one by one and check whether each is an integer: if it is, I discard it; otherwise I append it to a string, adding a separator (an underscore) first if the string is not empty.
I could write the code in c++ but I'm not sure if that would help
If you know that your filenames will always be of the form you mentioned you can just remove the underscore and 8 digits. Try this:
get-childitem c:\folder | Where-Object {$_.mode -match "a"} | foreach-object {
rename-item $_.FullName ($_.FullName -replace '_\d{8}') -WhatIF
}
Remove the -WhatIf to actually perform the rename. The -replace operator takes a regex that matches an underscore followed by 8 digits. Since you do not specify what to replace the match with, it is replaced with an empty string.
Note that files whose names differ only in the date end up with the same new filename, which causes Rename-Item to error if that file already exists. If the files are in nested subfolders and you want to iterate through them all, you need to add the -Recurse parameter to Get-ChildItem.
try this regex:
_\d{8}
and replace with an empty string. This matches _20110909 in
net_20110909_servercleanup.pdf or net_servercleanup_20110909.pdf
and result is net_servercleanup.pdf.
As this is also tagged as batch,
This code uses a for /f command to remove the numbers and underscores from the filename, keeping the first and second remaining elements joined with an underscore and then renames the file.
@echo off
setlocal enableextensions disabledelayedexpansion
for /r "c:\some\folder" %%f in ("net_*.pdf"
) do for /f "tokens=1,2 delims=_0123456789" %%a in ("%%~nf"
) do echo ren "%%~ff" "%%a_%%b%%~xf"
For testing, the ren command is prefixed with an echo command. If the output is correct, remove the echo.
Of course, if more than one matching file in a folder ends up with the same new name, the rename operation will fail for the second and later files, since two files in the same folder cannot have the same name.