Finding directories with a certain regex

I think the regex is right for powershell, but I think my logic is wrong.
What I want to do is get a list of all the directories that start with 4-to-6 digits. What I get so far is the child items in directories that start with 4-6 digits:
get-childitem -path \\server\share -recurse |
where { ($_.psiscontainer) -and ($_.name -match "^\d{4,6}") }
Can I somehow pipe into a write for 'current object' rather than child?

So in the end I just piped into | ft, making the total command:
get-childitem -path \\path\path -recurse | where {($_.psiscontainer) -and
($_.name -match "^\d{4,6}" )} | Select-Object Name, FullName | ft
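On PowerShell 3.0 and later, the -Directory switch of Get-ChildItem can stand in for the psiscontainer test. Below is a minimal sketch, assuming a throwaway folder tree under the temp directory (the folder names are made up for illustration). It also adds a (?!\d) lookahead, because ^\d{4,6} on its own still matches names that start with seven or more digits:

```powershell
# Build a small throwaway tree (hypothetical folder names, for illustration only).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$samples = '1234 alpha', '123456', '1234567 too long', '123 short', (Join-Path 'misc' '98765 nested')
$samples | ForEach-Object {
    New-Item -ItemType Directory -Path (Join-Path $root $_) -Force | Out-Null
}

# -Directory returns only folders; the lookahead rejects names with a 7th leading digit.
$found = Get-ChildItem -Path $root -Recurse -Directory |
    Where-Object { $_.Name -match '^\d{4,6}(?!\d)' } |
    Select-Object Name, FullName

$found | Format-Table -AutoSize

# Clean up the sample tree.
Remove-Item -Path $root -Recurse -Force
```

Because the filter runs on each directory object itself, the output lists the matching directories rather than their children.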

Related

Convert date format in file names to yyyy-mm-dd (ISO Format) - Currently in multiple other formats, including yyyy.mm.dd, mm.dd.yyyy, and mm-dd-yyyy

I have a file system that has files at various folder levels with the date in the filename in the wrong date format.
I'm looking to use powershell to go through and update them to the correct date format (ISO Date).
This is what I have figured out so far (for switching yyyy.mm.dd to yyyy-mm-dd), but it's not quite right:
Get-ChildItem -File -Recurse | % { Rename-Item -Path $_.PSPath -NewName $_.Name.replace("201[0-9]\.[0-9][0-9]\.[ -~].","201[0-9]-[0-9][0-9]-[ -~].")}
The following script is successful in changing dots to dashes, but it changes all dots to dashes, and I want to be careful not to change any dots after the 10th character, and only on files that match the formatting mentioned (e.g. 201[0-9].[0-9][0-9].[0-9][0-9]).
Get-ChildItem -Path $_.PSPath -Filter "*.pdf" | Rename-Item -NewName { $_.BaseName.Replace(".","-") + $_.Extension }
I know I'm getting close with that first one, but I'm not quite there. Does anyone have any suggestions for changes to make to it?
Thank you for the help.
Another way to do it with a regex. When you are confident that the files will be renamed correctly, remove the -WhatIf from the Rename-Item cmdlet.
Get-ChildItem -Path . -File |
ForEach-Object {
if ($_.Name -match '(.*)(\d{4})\.(\d{2})\.(\d{2})(.*)') {
Rename-Item -Path $_.FullName -NewName $($_.Name -replace '(.*)(\d{4})\.(\d{2})\.(\d{2})(.*)','$1$2-$3-$4$5') -Whatif
}
}
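To see what the -replace pattern does before touching any files, it can be run against plain strings. This is a small sketch with made-up file names. It drops the leading and trailing (.*) groups, which -replace does not need because it leaves unmatched text alone, and it only handles the yyyy.mm.dd form:

```powershell
# Hypothetical file names, for illustration only.
$names = 'report 2015.03.07.pdf', 'notes.v2.txt', '2019.12.31 summary.final.docx'

# Only dots inside a dddd.dd.dd run are rewritten; other dots are left alone.
$renamed = $names | ForEach-Object { $_ -replace '(\d{4})\.(\d{2})\.(\d{2})', '$1-$2-$3' }

$renamed
```

Note the single quotes around the replacement string: in double quotes PowerShell would try to expand $1, $2 and $3 as variables before the regex engine sees them.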
When working with dates I suggest parsing them into datetime objects so you can take advantage of the built in features.
Since you are working with a number of formats where the days and months are easily transposed, I suggest individual finely tuned filters. You could also use a switch and match on basename. Edit: I rewrote using a switch instead.
You can also use -WhatIf to confirm that the rename-item works properly.
Get-ChildItem -path $path -recurse -include *.pdf | Where-Object BaseName -match ".*[\d\.\-]{10}$" | Foreach-Object {
$null = $PSItem.BaseName -match "(.*)([\d\.\-]{10})$"
# Save the captures now: Switch -regex overwrites $matches
$prefix, $dateText, $date = $matches[1], $matches[2], $null
Switch -regex ($dateText){
"^\d{2}\-\d{2}\-\d{4}$" {$date = [datetime]::ParseExact($dateText,"MM-dd-yyyy",$null)}
"^\d{2}\.\d{2}\.\d{4}$" {$date = [datetime]::ParseExact($dateText,"MM.dd.yyyy",$null)}
"^\d{4}\.\d{2}\.\d{2}$" {$date = [datetime]::ParseExact($dateText,"yyyy.MM.dd",$null)}
}
# Skip names whose last 10 characters are not in one of the three formats
if ($date) { $PSItem | rename-Item -NewName "$prefix$($date.ToString('yyyy-MM-dd'))$($PSItem.Extension)" }
}
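[datetime]::ParseExact also has an overload that takes an array of formats, which can replace the switch when the formats cannot be confused with one another. This is a sketch under the assumption that only the three formats named in the question occur (yyyy.MM.dd, MM.dd.yyyy, MM-dd-yyyy). Convert-DateText is a hypothetical helper name, not an existing cmdlet:

```powershell
# Hypothetical helper: turns a 10-character date string into ISO format.
function Convert-DateText {
    param([string]$Text)

    # Assumed input formats, taken from the question.
    $formats = [string[]]('yyyy.MM.dd', 'MM.dd.yyyy', 'MM-dd-yyyy')
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $style   = [System.Globalization.DateTimeStyles]::None

    # ParseExact tries each format in turn and throws if none fits exactly.
    $date = [datetime]::ParseExact($Text, $formats, $culture, $style)
    $date.ToString('yyyy-MM-dd', $culture)
}

$converted = '2015.03.07', '03.07.2015', '12-31-2019' | ForEach-Object { Convert-DateText $_ }
$converted
```

A string that fits none of the formats raises an exception instead of being silently misread, which is the safer failure mode when renaming files.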

match multi-line string

I am using a PowerShell command to find all *.vue files (it's a simple text format) in a directory, where I need to match this:
7,Id
6,Default
So, these are 2 consecutive lines. With Notepad++ I see CRLF at the end of the line. Following Google searches, this must be close:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse |
Select-String -Pattern "7,Id\r\n6,Default" -CaseSensitive |
Out-File C:\test.txt
But it does not find the files. I checked that I can find the first part (7,Id) correctly, and also the second part (6,Default), but the combination with the newline is not working.
Any ideas please? Maybe an alternative?
I can have a workaround but it's inefficient and a lot of coding. For example, I could use PowerShell to provide a list of only the first sentence, then process these files to see if it matches the second sentence as well. I want to avoid that.
You need to pass the content of the file as a single string, otherwise Select-String will apply the pattern to each line separately.
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse | ForEach-Object {
Get-Content $_.FullName | Out-String |
Select-String -Pattern "7,Id\r\n6,Default" -CaseSensitive |
Select-Object -Expand Matches |
Select-Object -Expand Groups |
Select-Object -Expand Value
} | Out-File C:\test.txt
On PowerShell v3 and newer you can use Get-Content -Raw instead of Get-Content | Out-String.
As an alternative to Select-String you could use the -cmatch operator in a Where-Object filter:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse | ForEach-Object {
Get-Content $_.FullName | Out-String | Where-Object {
$_ -cmatch "7,Id\r\n6,Default"
} | ForEach-Object {
$matches[0]
}
} | Out-File C:\test.txt
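The -Raw variant mentioned above can be sketched as follows for PowerShell 3.0 and later. It assumes the goal is a list of the files that contain the two consecutive lines, and it uses two made-up sample files in a temp folder. The \r?\n in the pattern is a small deviation from the original so that files with LF-only line endings match too:

```powershell
# Create two hypothetical sample files with CRLF line endings.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
[System.IO.File]::WriteAllText((Join-Path $dir 'a.vue'), "565,x`r`n7,Id`r`n6,Default`r`n")
[System.IO.File]::WriteAllText((Join-Path $dir 'b.vue'), "7,Id`r`n5,Other`r`n6,Default`r`n")

# -Raw reads each file as one string, so the pattern can span a line break.
$hits = Get-ChildItem -Path $dir -Filter *.vue -Recurse |
    Where-Object { (Get-Content -Path $_.FullName -Raw) -cmatch '7,Id\r?\n6,Default' } |
    Select-Object -ExpandProperty Name

$hits

# Clean up the sample files.
Remove-Item -Path $dir -Recurse -Force
```

Only a.vue is reported: b.vue contains both lines, but not next to each other.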
With Select-String, the -Pattern parameter is regex capable, so try this:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse |
Select-String -Pattern "7,Id|6,Default" -CaseSensitive |
Out-File C:\test.txt
The vertical bar (|) is the regex alternation operator, in other words an "or". With this pattern, Select-String matches any line that contains either string, so it does not require the two lines to be consecutive.

Replace all but last instance of a character

I am writing a quick PowerShell script to replace all periods except the last instance.
EG:
hello. this. is a file.name.doc → hello this is a filename.doc
So far from another post I was able to get this regexp, but it does not work with PowerShell:
\.(?=[^.]*\.)
As per https://www.regex101.com/, it only matches the first occurrence of a period.
EDIT:
Basically I need to apply this match and replace to a directory with sub directories. So far I have this:
Get-ChildItem -Filter "*.*" | ForEach {
$_.BaseName.Replace('.','') + $_.Extension
}
But it does not actually replace the items, and I do not think it is recursive.
I tried a few variations:
Get-Item -Filter "*.*" -Recurse |
Rename-Item -NewName {$_.BaseName.Replace(".","")}
but I get the error message
source and destination path must be different
I had the PowerShell side of things working but was stuck on the RegEx part. I was able to match either all the "." or only the last "." which was part of the file extension. Then I found this post with the missing link: \.(?=[^.]*\.)
I added that to the rest of the PowerShell command and it worked perfectly.
Get-ChildItem -Recurse | Rename-Item -NewName {$_.Name -replace '\.(?=[^.]*\.)',' ' }
Exclude files that don't have a period in their basename from being renamed:
Get-ChildItem -File -Recurse | Where-Object { $_.BaseName -like '*.*' } |
Rename-Item -NewName {$_.BaseName.Replace('.', '') + $_.Extension}
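The lookahead pattern can be checked on plain strings before it is used for renaming. A short sketch with made-up names follows. One thing to keep in mind is that the .Replace() string method treats its argument as literal text, while the -replace operator interprets it as a regex, so the pattern only works with the latter:

```powershell
# Matches a period only if another period follows later in the string.
$pattern = '\.(?=[^.]*\.)'

# Hypothetical file names, for illustration only.
$cleaned = 'hello. this. is a file.name.doc', 'plain.txt', 'archive.tar.gz' |
    ForEach-Object { $_ -replace $pattern, '' }

$cleaned
```

Names with a single period come back unchanged, which is also why Rename-Item complains that source and destination must differ unless such files are filtered out first.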

List All Text Files in a Directory with more than 100 Lines using PowerShell

Hello, I'm a PowerShell beginner. I'm looking for a script that finds and lists all text files (*.TXT) in a directory with more than 100 lines.
This code shows the max and min number of lines in a directory, but it doesn't list the files that have the min or max number of lines.
dir . -filter "*.txt" -Recurse -name | foreach{(GC $_).Count} | measure-object -sum -max -min
You were very close. You had the correct cmdlet (Measure-Object); you simply needed to use the -Line parameter, and then use Where-Object. Note the use of Select-Object with -ExpandProperty. That is what turns a collection of TextMeasureInfo objects into a collection of integers. This worked for me:
Get-ChildItem . -Filter "*.txt" -Recurse |
Where-Object {
(Get-Content $_.FullName |
Measure-Object -Line |
Select-Object -ExpandProperty Lines) -gt 100
}
If you want to fit it on one line and use aliases, this is the equivalent:
dir . -filter "*.txt" -Recurse | ? {(gc $_.FullName | Measure -Line | Select -Expand Lines) -gt 100 }
And you also asked about finding files that have a minimum and maximum number of lines. I wouldn't recommend writing that in one line, since it becomes unreadable. To do it, you need an intermediate variable inside the Where-Object ScriptBlock:
$minLines = 10
$maxLines = 200
Get-ChildItem . -Filter "*.txt" -Recurse |
Where-Object {
$numLines = Get-Content $_.FullName |
Measure-Object -Line |
Select-Object -ExpandProperty Lines
($numLines -gt $minLines) -and ($numLines -lt $maxLines)
}
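If the line count itself should appear in the output, a calculated property can carry it along with the file name. This is a sketch rather than part of the answer above; it uses two generated sample files in a temp folder, and the property name Lines is chosen here for illustration:

```powershell
# Generate two hypothetical sample files: one with 150 lines, one with 50.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
1..150 | Set-Content -Path (Join-Path $dir 'long.txt')
1..50  | Set-Content -Path (Join-Path $dir 'short.txt')

# Attach the line count to each file, then keep files with more than 100 lines.
$linesProperty = @{
    Name       = 'Lines'
    Expression = { (Get-Content -Path $_.FullName | Measure-Object -Line).Lines }
}
$report = Get-ChildItem -Path $dir -Filter '*.txt' -Recurse |
    Select-Object FullName, $linesProperty |
    Where-Object { $_.Lines -gt 100 }

$report | Format-Table -AutoSize

# Clean up the sample files.
Remove-Item -Path $dir -Recurse -Force
```

Counting once in Select-Object and filtering afterwards avoids reading each file a second time when the count is needed for display.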

How to find all regular expression matches in the file

I have a list of regular expressions (about 2000) and over a million HTML files. I want to check whether each regular expression matches each file. How can I do this in PowerShell?
Performance is important, so I don't want to loop through the regular expressions.
I tried
$text | Select-String -Pattern pattern1, pattern2,...
It returns all matches, but I also want to find out which patterns matched and which did not. I need to build a list of the matching regular expressions for each file.
You could try something like this:
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
Get-ChildItem -Filter *.txt | Select-String -Pattern $regex | ForEach-Object {
$ht[$_.Path] += @($_ | Select-Object -ExpandProperty Pattern)
}
Test-output:
$ht | Format-Table -AutoSize
Name Value
---- -----
C:\Users\graimer\Desktop\New Text Document (2).txt {e2$}
C:\Users\graimer\Desktop\New Text Document.txt {^test, e2$}
You didn't specify how you wanted the output.
UPDATE: To match multiple patterns on a single line, try this (mjolinor's answer is probably faster than this).
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
$regex | ForEach-Object {
$pattern = $_
Get-ChildItem -Filter *.txt | Select-String -Pattern $pattern | ForEach-Object {
$ht[$_.Path] += @($_ | Select-Object -ExpandProperty Pattern)
}
}
UPDATE2: I don't have enough samples to try it, but since you have such a large number of files, you might want to try reading each file into memory before looping through the patterns. It may be faster.
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
Get-ChildItem -Filter *.txt | ForEach-Object {
$text = $_ | Get-Content
$filename = $_.FullName
$regex | ForEach-Object {
$text | Select-String -Pattern $_ | ForEach-Object {
$ht[$filename] += @($_ | Select-Object -ExpandProperty Pattern)
}
}
}
I don't see any way around doing a foreach through the regex collection.
This is the best I could come up with performance-wise:
$regexes = 'pattern1','pattern2'
$files = get-childitem -Path <file path> |
select -ExpandProperty fullname
$ht = @{}
foreach ($file in $files)
{
$ht[$file] = New-Object collections.arraylist
foreach ($regex in $regexes)
{
if (select-string $regex $file -Quiet)
{
[void]$ht[$file].add($regex)
}
}
}
$ht
You could speed up the process by using background jobs and dividing up the file collection among the jobs.
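Another variation on the same loop, offered as an untested-at-scale sketch: build the Regex objects once, read each file once as a single string, and ask each regex only whether it matches. Whether this is faster than Select-String -Quiet on a million files is an assumption that would need measuring. The sample files and patterns are made up, and the Multiline option is used so that ^ and $ apply per line:

```powershell
# Two hypothetical sample files with LF line endings.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
[System.IO.File]::WriteAllText((Join-Path $dir 'a.txt'), "test one`nline e2`n")
[System.IO.File]::WriteAllText((Join-Path $dir 'b.txt'), "nothing`nline e2`n")

# Construct each Regex once instead of once per file.
# Note: with Multiline, .NET's $ matches only before "`n", so CRLF files need \r?$.
$options  = [System.Text.RegularExpressions.RegexOptions]::Multiline
$compiled = '^test', 'e2$' | ForEach-Object {
    New-Object System.Text.RegularExpressions.Regex -ArgumentList $_, $options
}

# Map each file name to the list of patterns that matched it.
$result = @{}
foreach ($file in Get-ChildItem -Path $dir -Filter '*.txt') {
    $text = [System.IO.File]::ReadAllText($file.FullName)
    $result[$file.Name] = @($compiled |
        Where-Object { $_.IsMatch($text) } |
        ForEach-Object { $_.ToString() })
}

$result.GetEnumerator() | ForEach-Object { '{0}: {1}' -f $_.Name, ($_.Value -join ', ') }

# Clean up the sample files.
Remove-Item -Path $dir -Recurse -Force
```

IsMatch stops at the first hit, which is all that is needed to record whether a pattern succeeded for a file.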