Replacing filenames by ID - compare

For a website, I'm trying to rename about 5000 files using VBA/PowerShell and/or CSV files.
These files currently have filenames like blabla_45359_blabla.jpg and 45359--blabla.jpg, where 45359 is the ID of the item in our webshop.
I'd like to rename them so that only the ID remains as the filename, e.g. 45359.jpg
I tried making a list in Excel of all current filenames and tried to filter out the ID part automatically, but without any success.
Any tips?
EDIT: Example
It really depends on the file name, for example I have a file named: Futterklappe_PrimoLux_60_schwarz_93412_f.jpg
Goal is to automatically rename this file to: 93412.jpg
I have a list of all IDs as well; a file should be renamed to one of these IDs only if that ID appears in the file name. There are also some exceptions to the above: some filenames contain multiple IDs, so I would have to make copies of these files and rename them for each ID as well.

A regex-based User Defined Function or UDF seems the best worksheet approach.
Option Explicit
Function fixFilename(str As String, ndx As Integer) As String
    Dim tmp As String, ext As String
    Static cmat As Object, regex As Object
    If regex Is Nothing Then
        Set regex = CreateObject("VBScript.RegExp")
        With regex
            .Global = True
            .MultiLine = False
            .IgnoreCase = True
        End With
    Else
        Set cmat = Nothing
    End If
    With regex
        .Pattern = "\d{5}"
        If regex.Test(str) Then
            Set cmat = .Execute(str)
            If ndx <= cmat.Count Then _
                tmp = cmat.Item(ndx - 1)
        End If
    End With
    ext = Mid(str, InStrRev(str, Chr(46)))
    If CBool(Len(tmp)) Then _
        fixFilename = tmp & ext
End Function
In the following sample image, B5 has been filled right to C5 to collect the second ID.

In PowerShell, I would just use a regex with a capturing group:
Get-ChildItem -Path $Path -File |
    ForEach-Object {
        if ($_.Name -match '(?<ID>\d{5})') {
            $NewName = $Matches.ID + $_.Extension
            $_ | Rename-Item -NewName $NewName
        }
    }
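The question also mentions filenames that contain several IDs. A minimal sketch of how that case might be handled, assuming every run of exactly five digits is an ID. The helper name Get-IdFileName is made up for illustration, and nothing is renamed or copied here; the function only computes the target names:

```powershell
# Hypothetical helper: returns one target filename per 5-digit ID found in the name.
function Get-IdFileName {
    param([string]$FileName)

    $extension = [System.IO.Path]::GetExtension($FileName)
    $baseName  = [System.IO.Path]::GetFileNameWithoutExtension($FileName)

    # (?<!\d) and (?!\d) keep longer digit runs from being cut into 5-digit pieces.
    foreach ($match in [regex]::Matches($baseName, '(?<!\d)\d{5}(?!\d)')) {
        $match.Value + $extension
    }
}

Get-IdFileName 'Futterklappe_PrimoLux_60_schwarz_93412_f.jpg'   # 93412.jpg
Get-IdFileName 'blabla_45359_93412.jpg'                         # 45359.jpg, 93412.jpg
```

Each returned name could then be passed to Copy-Item as the destination, leaving the original file in place until all copies exist.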

Powershell regex to get DC from Distinguished Name

I'm trying to get the first DC that appears for a given Distinguished name as below:
CN=blah1,CN=Computers,DC=blah2,DC=blah3
So in plain English I wish to "replace all strings up to 'DC=', and return any value from 'DC=' up to the next ','".
I've tried working it out using online calculators, but somehow it doesn't work.
Have a look at this:
$str = "CN=blah1,CN=Computers,DC=blah2,DC=blah3"
$str -match '^.*?(DC=.*?),'
$Matches[1] # DC=blah2
It finds the first DC=* where * is whatever follows the = until the next comma.
Here is a second easy way to grab certain parts of your string: use -split:
$string = "CN=blah1,CN=Computers,DC=blah2,DC=blah3"
# Separate the string at ",DC=" and get the second part
($string -split ',DC=')[1]
Returns: blah2
I think the best (by "best" I mean the "most robust") answer is not to do string parsing or a regex in the first place and use the IADSPathname interface to retrieve the part of the name you're interested in. This is available (albeit initially somewhat complex) in PowerShell via the Pathname COM object. The Pathname object handles all escaping of characters automatically.
Example:
$ADS_SETTYPE_DN = 4
$ADS_DISPLAY_VALUE_ONLY = 2
$Pathname = New-Object -ComObject Pathname
$distinguishedName = "CN=blah1,CN=Computers,DC=blah2,DC=blah3"
# Set the AD path in the object
$Pathname.GetType().InvokeMember("Set", "InvokeMethod", $null, $Pathname, @($distinguishedName,$ADS_SETTYPE_DN))
# Get the number of name elements
$numElements = $Pathname.GetType().InvokeMember("GetNumElements", "InvokeMethod", $null, $Pathname, $null)
# Retrieve the second-to-last name element (outputs "DC=blah2")
$Pathname.GetType().InvokeMember("GetElement", "InvokeMethod", $null, $Pathname, $numElements - 2)
# Set the display type to values only
$Pathname.GetType().InvokeMember("SetDisplayType", "InvokeMethod", $null, $Pathname, $ADS_DISPLAY_VALUE_ONLY)
# Retrieve the second-to-last name element (outputs "blah2")
$Pathname.GetType().InvokeMember("GetElement", "InvokeMethod", $null, $Pathname, $numElements - 2)
Admittedly, the Pathname COM object is not easy to use in PowerShell because you have to call it indirectly. This can be alleviated somewhat by using a "wrapper" function to invoke the object's methods. Example:
$ADS_SETTYPE_DN = 4
$ADS_DISPLAY_VALUE_ONLY = 2
$Pathname = New-Object -ComObject Pathname
function Invoke-Method {
    param(
        [__ComObject] $object,
        [String] $method,
        $parameters
    )
    $output = $object.GetType().InvokeMember($method, "InvokeMethod", $null, $object, $parameters)
    if ( $output ) { $output }
}
$distinguishedName = "CN=blah1,CN=Computers,DC=blah2,DC=blah3"
# Set the AD path in the object
Invoke-Method $Pathname "Set" @($distinguishedName,$ADS_SETTYPE_DN)
# Get the number of name elements
$numElements = Invoke-Method $Pathname "GetNumElements"
# Retrieve the second-to-last name element (outputs "DC=blah2")
Invoke-Method $Pathname "GetElement" ($numElements - 2)
# Set the display type to values only
Invoke-Method $Pathname "SetDisplayType" $ADS_DISPLAY_VALUE_ONLY
# Retrieve the second-to-last name element (outputs "blah2")
Invoke-Method $Pathname "GetElement" ($numElements - 2)
For a complete solution, I wrote a PowerShell module called ADName that provides an easy-to-use interface for both the Pathname and the NameTranslate objects.
In the ADName module, the Get-ADName cmdlet is a wrapper for the Pathname object, and the Convert-ADName cmdlet is a wrapper for the NameTranslate object. Example:
# Get elements of name as an array
$nameElements = Get-ADName "CN=blah1,CN=Computers,DC=blah2,DC=blah3" -Split
# Output second-to-last element (e.g., "DC=blah2")
$nameElements[-2]
# Get name elements (values only)
$nameElements = Get-ADName "CN=blah1,CN=Computers,DC=blah2,DC=blah3" -Split -ValuesOnly
# Output second-to-last element (e.g., "blah2")
$nameElements[-2]
I've found that the Get-ADName and Convert-ADName cmdlets are extremely useful in a variety of scenarios. One example:
$name = "CN=blah1,CN=Computers,DC=blah2,DC=blah3"
# Output canonical name of parent path; e.g.: "blah2.blah3/Computers"
$name | Get-ADName -Format Parent | Convert-ADName Canonical
Split is enough for this, i.e:
$s = "CN=blah1,CN=Computers,DC=blah2,DC=blah3"
$s.Split(",")[2].Split("=")[1]
# blah2
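To illustrate the earlier point about escaping, here is a small sketch of how positional splitting can go wrong. The distinguished name below is a made-up example in which the CN value contains an escaped comma:

```powershell
# Made-up DN whose CN value contains an escaped comma ("Doe, John").
$dn = 'CN=Doe\, John,CN=Users,DC=blah2,DC=blah3'

# Positional split: the escaped comma shifts every index by one.
$dn.Split(',')[2]                                              # CN=Users

# Split only on commas that are not preceded by a backslash.
# Still a simplification: an escaped backslash right before a comma is not handled.
$parts = $dn -split '(?<!\\),'
@($parts | Where-Object { $_ -like 'DC=*' })[0].Substring(3)   # blah2
```

For names that can contain arbitrary escaped characters, the Pathname object described above remains the safer choice.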

match regex and replace bug with special characters

I've built a script to read all Active Directory Group Memberships and save them to a file.
Problem is, the Get-ADPrincipalGroupMembership cmdlet outputs all groups like this:
CN=Group_Name,OU=Example Mail,OU=Example Management, DC=domain,DC=de
So I need to do a bit of a regex and/or replacement magic here to replace the whole line with just the first string beginning from "CN=" to the first ",".
The result would be like this:
Group_Name
So, there is one AD group that's not gonna be replaced. I already got an idea why tho, but I don't know how to work around this. In our AD there is a group with a special character, something like this:
CN=AD_Group_Name+up,OU=Example Mail,OU=Example Management, DC=domain,DC=de
So, because of the little "+" sign, the whole line doesn't even get touched.
Does anyone know why this is happening?
Import-Module ActiveDirectory
# Get Username
Write-Host "Please enter the Username you want to export the AD-Groups from."
$UserName = Read-Host "Username"
# Set Working-Dir and Output-File Block:
$WorkingDir = "C:\Users\USER\Desktop"
Write-Host "Working directory is set to $WorkingDir"
$OutputFile = $WorkingDir + "\" + $UserName + ".txt"
# Save Results to File
Get-ADPrincipalGroupMembership $UserName |
select -Property distinguishedName |
Out-File $OutputFile -Encoding UTF8
# RegEx-Block to find every AD-Group in Raw Output File and delete all
# unnecessary information:
[regex]$RegEx_mark_whole_Line = "^.*"
# The ^ matches the start of a line (in Ruby) and .* will match zero or more
# characters other than a newline
[regex]$RegEx_mark_ADGroup_Name = "(?<=CN=).*?(?=,)"
# This regex matches everything behind the first "CN=" in line and stops at
# the first "," in the line. Then it should jump to the next line.
# Replace-Block (line by line): Replace whole line with just the AD group
# name (distinguishedName) of this line.
foreach ($line in Get-Content $OutputFile) {
    if ($line -like "CN=*") {
        $separator = "CN=",","
        $option = [System.StringSplitOptions]::RemoveEmptyEntries
        $ADGroup = $line.Split($separator, $option)
        (Get-Content $OutputFile) -replace $line, $ADGroup[0] |
            Set-Content $OutputFile -Encoding UTF8
    }
}
Your group name contains a character (+) that has a special meaning in a regular expression (one or more times the preceding expression). To disable special characters escape the search string in your replace operation:
... -replace [regex]::Escape($line), $ADGroup[0]
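A short sketch of the difference, using a shortened version of the line from the question:

```powershell
$line = 'CN=AD_Group_Name+up,OU=Example Mail'

# Unescaped: "e+" means "one or more e", so the literal "+" in the text is never matched.
$unescaped = $line -replace $line, 'AD_Group_Name+up'

# Escaped: every character of $line is treated literally.
$escaped = $line -replace [regex]::Escape($line), 'AD_Group_Name+up'

$unescaped   # CN=AD_Group_Name+up,OU=Example Mail (unchanged)
$escaped     # AD_Group_Name+up
```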
However, I fail to see what you need that replacement for in the first place. Basically you're replacing a line in the output file with a substring from that line that you already extracted before. Just write that substring to the output file and you're done.
$separator = 'CN=', ','
$option = [StringSplitOptions]::RemoveEmptyEntries
(Get-Content $OutputFile) | ForEach-Object {
    $_.Split($separator, $option)[0]
} | Set-Content $OutputFile
Better yet, use the Get-ADObject cmdlet to expand the names of the group members:
Get-ADPrincipalGroupMembership $UserName |
Get-ADObject |
Select-Object -Expand Name
First off, depending on what you're doing here this might or might not be a good idea. The CN is /not/ immutable so if you're storing it somewhere as a key you're likely to run into problems down the road. The objectGUID property of the group is a good primary key, though.
As far as getting this value, I think you can simplify this a lot. The name property that the cmdlet outputs will always have your desired value:
Get-ADPrincipalGroupMembership <username> | select name
Ansgar's answer is much better in terms of using the regex, but I believe that in this case you could do a dirty workaround with the IndexOf function. In your if-statement you could do the following:
if ($line -like "CN=*") {
    $ADGroup = $line.Substring(3, $line.IndexOf(',')-3)
}
The reason this works here is that you know the output will begin with CN=YourGroupName meaning that you know that the string you want begins at the 4th character. Secondly, you know that the group name will not contain any comma, meaning that the IndexOf(',') will always find the end of that string so you don't need to worry about the nth occurrence of a string in a string.

Append character after each instance of a regex match in PowerShell

I was wondering if it was possible to append a character (to be used as a delimiter later on) to each instance of a regex match in a string.
I'm parsing text for a string between < >, and have a working regex pattern -- though this collapses each instance of the match.
What I would like to do is append each instance of a match with a , so I can call the .split(',') method later on and have a collection of string I can loop through.
$testString = "<blah@gmail.com><blah1@gmail.com>"
$testpattern = [regex]::Match($testString, '(?<=<)(.*)(?=>)').Value
$testPattern will now be "blah@gmail.comblah1@gmail.com"
What I would like to do is add a delimiter between each instance of the match, so I can call the .split() method and work with a collection after the fact.
Note that $testpattern is actually blah@gmail.com><blah1@gmail.com, because the greedy (.*) runs from the first < to the last >.
You should use <(.*)><(.*)> to capture both email addresses and then concatenate the two strings: $testpattern = $testpattern[0] + "the string you want in between" + $testpattern[1]
Not sure about 0 and 1; that depends on the language.
Another point: be careful, if there are spaces or characters that are invalid in an email address, this will still capture them. You should use something like <([a-zA-Z0-9\-#\._]*\@[a-zA-Z0-9-]*\.[a-z-A-Z]*)><([a-zA-Z0-9\-#\._]*\@[a-zA-Z0-9-]*\.[a-z-A-Z]*)>
I know this isn't the only way to handle the problem above, and definitely not the most efficient -- but I ended up doing the following.
So to restate the question, I need to parse email headers (to line), for all the smtp addresses (value between '<' and '>'), and store all the addresses in a collection after the fact.
$EMLToCol = @()
$parseMe = $CDOMessage.to
# select just '<emailAddress>'
$parsed = Select-String -Pattern '(<.*?>)+' -InputObject $parseMe -AllMatches | ForEach-Object { $_.matches }
# remove this guy '<', and this guy '>'
$parsed = $parsed.Value | ForEach-Object {$_ -replace '<' -replace '>'}
# add to EMLToCol array
$parsed | ForEach-Object {$EMLToCol += $_}
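For comparison, a more compact sketch that collects the addresses in one step. It assumes an address never contains a '>' character, and it uses the sample string from the question in place of $CDOMessage.to:

```powershell
$to = '<blah@gmail.com><blah1@gmail.com>'

# [^>]+ stops at the first '>', so each <...> pair yields its own match.
$addresses = @([regex]::Matches($to, '(?<=<)[^>]+(?=>)') | ForEach-Object { $_.Value })

$addresses          # blah@gmail.com and blah1@gmail.com, one per line
$addresses.Count    # 2
```

Because the result is already an array, no delimiter or later .Split() call is needed.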

Powershell Find String Between Characters and Replace

In a PowerShell script, I have a hashtable that contains personal information. The hashtable looks like
{first = "James", last = "Brown", phone = "12345"...}
Using this hashtable, I would like to replace strings in a template text file. For each string that matches the #key# format, I want to replace it with the value that corresponds to that key in the hashtable. Here is a sample input and output:
input.txt
My first name is #first# and last name is #last#.
Call me at #phone#
output.txt
My first name is James and last name is Brown.
Call me at 12345
Could you advise me how to return the "key" string between the "#"s so I can look up its value for the string replacement? Any other ideas for this problem are welcome.
You could do this with pure regex, but for the sake of readability, I like doing this as more code than regex:
$tmpl = 'My first name is #first# and last name is #last#.
Call me at #phone#'
$h = @{
    first = "James"
    last = "Brown"
    phone = "12345"
}
$new = $tmpl
foreach ($key in $h.Keys) {
    $escKey = [Regex]::Escape($key)
    $new = $new -replace "#$escKey#", $h[$key]
}
$new
Explanation
$tmpl contains the template string.
$h is the hashtable.
$new will contain the replaced string.
We enumerate through each of the keys in the hash.
We store a regex escaped version of the key in $escKey.
We replace $escKey surrounded by # characters with the hashtable lookup for the particular key.
One of the nice things about doing this is that you can change your hashtable and your template, and never have to update the regex. It will also gracefully handle the cases where a key has no corresponding replacable section in the template (and vice-versa).
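The "pure regex" variant mentioned at the top of this answer could look roughly like the sketch below. It assumes keys consist of word characters only, and it passes a script block to [regex]::Replace as the match evaluator. The #fax# placeholder is an extra, made-up token that shows what happens when the hashtable has no matching key:

```powershell
$h = @{ first = 'James'; last = 'Brown'; phone = '12345' }
$tmpl = 'My first name is #first# and last name is #last#. Call me at #phone# or #fax#'

$new = [regex]::Replace($tmpl, '#(\w+)#', {
    param($match)
    $key = $match.Groups[1].Value
    # Keep the placeholder as-is when the hashtable has no such key.
    if ($h.ContainsKey($key)) { $h[$key] } else { $match.Value }
})

$new   # My first name is James and last name is Brown. Call me at 12345 or #fax#
```

This walks the template once instead of once per key, at the cost of being a little harder to read than the loop.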
You can create a template using an expandable (double-quoted) here-string:
# $hash must be defined first: the here-string is expanded when it is assigned
$hash = @{first = "James"; last = "Brown"; phone = "12345"}
$Template = @"
My first name is $($hash.first) and last name is $($hash.last).
Call me at $($hash.phone)
"@
$Template
My first name is James and last name is Brown.
Call me at 12345

Powershell: Replacing regex named groups with variables

Say I have a regular expression like the following, but I loaded it from a file into a variable $regex, and so have no idea at design time what its contents are, but at runtime I can discover that it includes the "version1", "version2", "version3" and "version4" named groups:
"Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)"
...and I have these variables:
$version1 = "3"
$version2 = "2"
$version3 = "1"
$version4 = "0"
...and I come across the following string in a file:
Version 7,7,0,0
...which is stored in a variable $input, so that ($input -match $regex) evaluates to $true.
How can I replace the named groups from $regex in the string $input with the values of $version1, $version2, $version3, $version4 if I do not know the order in which they appear in $regex (I only know that $regex includes these named groups)?
I can't find any references describing the syntax for replacing a named group with the value of a variable by using the group name as an index to the match - is this even supported?
EDIT:
To clarify - the goal is to replace templated version strings in any kind of text file where the version string in a given file requires replacement of a variable number of version fields (could be 2, 3, or all 4 fields). For example, the text in a file could look like any of these (but is not restricted to these):
#define SOME_MACRO(4, 1, 0, 0)
Version "1.2.3.4"
SomeStruct vs = { 99,99,99,99 }
Users can specify a file set and a regular expression to match the line containing the fields, with the original idea being that the individual fields would be captured by named groups. The utility has the individual version field values that should be substituted in the file, but has to preserve the original format of the line that will contain the substitutions, and substitute only the requested fields.
EDIT-2:
I think I can get the result I need with substring calculations based on the position and extent of each of the matches, but was hoping Powershell's replace operation was going to save me some work.
EDIT-3:
So, as Ansgar correctly and succinctly describes below, there isn't a way (using only the original input string, a regular expression about which you only know the named groups, and the resulting matches) to use the "-replace" operation (or other regex operations) to perform substitutions of the captures of the named groups, while leaving the rest of the original string intact. For this problem, if anybody's curious, I ended up using the solution below. YMMV, other solutions possible. Many thanks to Ansgar for his feedback and options provided.
In the following code block:
$input is a line of text on which substitution is to be performed
$regex is a regular expression (of type [string]) read from a file that has been verified to contain at least one of the supported named groups
$regexToGroupName is a hash table that maps a regex string to an array of group names ordered according to the order of the array returned by [regex]::GetGroupNames(), which matches the left-to-right order in which they appear in the expression
$groupNameToVersionNumber is a hash table that maps a group name to a version number.
Constraints on the named groups within $regex are only (I think) that the expression within the named groups cannot be nested, and should match at most once within the input string.
# This will give us the index and extent of each substring
# that we will be replacing (the parts that we will not keep)
$matchResults = ([regex]$regex).match($input)
# This will hold substrings from $input that were not captured
# by any of the supported named groups, as well as the replacement
# version strings, properly ordered, but will omit substrings captured
# by the named groups
$lineParts = @()
$startingIndex = 0
foreach ($groupName in $regexToGroupName.$regex)
{
    # Excise the substring leading up to the match for this group...
    $lineParts = $lineParts + $input.Substring($startingIndex, $matchResults.groups[$groupName].Index - $startingIndex)
    # Instead of the matched substring, we'll use the substitution
    $lineParts = $lineParts + $groupNameToVersionNumber.$groupName
    # Set the starting index of the next substring that we will keep...
    $startingIndex = $matchResults.groups[$groupName].Index + $matchResults.groups[$groupName].Length
}
# Keep the end of the original string (if there's anything left)
$lineParts = $lineParts + $input.Substring($startingIndex, $input.Length - $startingIndex)
$newLine = ""
foreach ($part in $lineParts)
{
    $newLine = $newLine + $part
}
$input = $newLine
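A variation on the same idea, sketched under the same constraints (no nested or overlapping named groups). Instead of relying on the groups being listed in left-to-right order, it sorts the captures by position and edits the string from right to left. The function name and its parameters are made up for illustration:

```powershell
# Hypothetical helper: replaces the text captured by each named group with a new value.
function Set-NamedGroupValue {
    param(
        [string]    $Text,
        [string]    $Pattern,
        [hashtable] $Values      # group name -> replacement text
    )

    $match = [regex]::Match($Text, $Pattern)
    if (-not $match.Success) { return $Text }

    # Collect only the groups that actually took part in the match.
    $targets = @(foreach ($name in $Values.Keys) {
        $group = $match.Groups[[string]$name]
        if ($group.Success) {
            [pscustomobject]@{ Index = $group.Index; Length = $group.Length; Value = [string]$Values[$name] }
        }
    })

    # Work from right to left so earlier indexes stay valid after each edit.
    foreach ($t in $targets | Sort-Object Index -Descending) {
        $Text = $Text.Remove($t.Index, $t.Length).Insert($t.Index, $t.Value)
    }
    $Text
}

$values = @{ version1 = '3'; version2 = '2'; version3 = '1'; version4 = '0' }
Set-NamedGroupValue 'Version 7,7,0,0' 'Version (?<version1>\d),(?<version2>\d),(?<version3>\d),(?<version4>\d)' $values
# Version 3,2,1,0
```

Groups that are named in the hashtable but absent from the pattern are simply skipped, so a pattern with only two or three of the version groups works the same way.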
Simple Solution
In the scenario where you simply want to replace a version number found somewhere in your $input text, you could simply do this:
# ${1} rather than $1, so that the digits of $Version1 are not read as part of the group number
$input -replace '(Version\s+)\d+,\d+,\d+,\d+',"`${1}$Version1,$Version2,$Version3,$Version4"
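One detail to watch in replacement strings like this: when a numbered group reference is immediately followed by a digit, .NET reads the digits together as one group number. A small sketch with the sample line from the question (the variable names $line, $pattern, $ambiguous and $delimited are only for illustration):

```powershell
$line     = 'Version 7,7,0,0'
$pattern  = '(Version\s+)\d+,\d+,\d+,\d+'
$version1 = '3'

# "$1" followed by "3" is read as "$13"; there is no group 13, so the text stays literal.
$ambiguous = $line -replace $pattern, "`$1$version1,2,1,0"

# Braces delimit the group number unambiguously.
$delimited = $line -replace $pattern, "`${1}$version1,2,1,0"

$ambiguous   # $13,2,1,0
$delimited   # Version 3,2,1,0
```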
Using Named Captures in PowerShell
Regarding your question about named captures, that can be done by using curly brackets. i.e.
'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}. '
Gives:
I have a pet dog. I have a pet cat. cher
Issue with multiple captures & solution
You can't replace multiple values in the same replace statement, since the replacement string is used for everything. i.e. if you did this:
'dogcatcher' -replace '(?<pet>dog|cat)|(?<singer>cher)','I have a pet ${pet}. I like ${singer}''s songs. '
You'd get:
I have a pet dog. I like 's songs. I have a pet cat. I like 's songs. I have a pet . I like cher's songs.
...which is probably not what you're hoping for.
Rather, you'd have to do a match per item:
'dogcatcher' -replace '(?<pet>dog|cat)','I have a pet ${pet}. ' -replace '(?<singer>cher)', 'I like ${singer}''s songs. '
...to get:
I have a pet dog. I have a pet cat. I like cher's songs.
More Complex Solution
Bringing this back to your scenario, you're not actually using the captured values; rather you're hoping to replace the spaces they were in with new values. For that, you'd simply want this:
$input = 'I''m running Programmer''s Notepad version 2.4.2.1440, and am a big fan. I also have Chrome v 56.0.2924.87 (64-bit).'
$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7
$v1Pattern = '(?<=\bv(?:ersion)?\s+)\d+(?=\.\d+\.\d+\.\d+)'
$v2Pattern = '(?<=\bv(?:ersion)?\s+\d+\.)\d+(?=\.\d+\.\d+)'
$v3Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.)\d+(?=\.\d+)'
$v4Pattern = '(?<=\bv(?:ersion)?\s+\d+\.\d+\.\d+\.)\d+'
$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4
Which would give:
I'm running Programmer's Notepad version 1.3.5.7, and am a big fan. I also have Chrome v 1.3.5.7 (64-bit).
NB: The above could be written as a 1 liner, but I've broken it down to make it simpler to read.
This takes advantage of regex lookarounds; a way of checking the content before and after the string you're capturing, without including those in the match. i.e. so when we select what to replace we can say "match the number that appears after the word version" without saying "replace the word version".
More info on those here: http://www.regular-expressions.info/lookaround.html
Your Example
Adapting the above to work for your example (i.e. where versions may be separated by commas or dots, and there's no consistency to their format beyond being 4 sets of numbers):
$input = @'
#define SOME_MACRO(4, 1, 0, 0)
Version "1.2.3.4"
SomeStruct vs = { 99,99,99,99 }
'@
$version1 = 1
$version2 = 3
$version3 = 5
$version4 = 7
$v1Pattern = '(?<=\b)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v2Pattern = '(?<=\b\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\s*[\.,]\s*\d+\b)'
$v3Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+(?=\s*[\.,]\s*\d+\b)'
$v4Pattern = '(?<=\b\d+\s*[\.,]\s*\d+\s*[\.,]\s*\d+\s*[\.,]\s*)\d+\b'
$input -replace $v1Pattern, $version1 -replace $v2Pattern, $version2 -replace $v3Pattern,$version3 -replace $v4Pattern,$version4
Gives:
#define SOME_MACRO(1, 3, 5, 7)
Version "1.3.5.7"
SomeStruct vs = { 1,3,5,7 }
Regular expressions don't work that way, so you can't. Not directly, that is. What you can do (short of using a more appropriate regular expression that groups the parts you want to keep) is to extract the version string and then in a second step replace that substring with the new version string:
$oldver = $input -replace $regexp, '$1,$2,$3,$4'
$newver = $input -replace $oldver, "$Version1,$Version2,$Version3,$Version4"
Edit:
If you don't even know the structure, you must extract that from the regular expression as well.
$version = @($version1, $version2, $version3, $version4)
$input -match $regexp
$oldver = $regexp
$newver = $regexp
for ($i = 1; $i -le 4; $i++) {
    $oldver = $oldver -replace "\(\?<version$i>\\d\)", $matches["version$i"]
    $newver = $newver -replace "\(\?<version$i>\\d\)", $version[$i-1]
}
$input -replace $oldver, $newver