Remove Substring with Regex Characters in Filename

I'm trying to organize and clean up how I download music. One of the ways I do this is with a plugin that converts videos into MP3, though this usually leaves a watermark in the filename, which I'd like to remove via a PowerShell script.
So I essentially would have this: "artist - songname[watermark.com].mp3"
I've looked into it and tried to just get the brackets removed, since they are regex metacharacters, and this is what I have so far:
$removeMe = "[watermark.com]"
$list = Get-ChildItem *.mp3 | -replace '[[\]]',''
That's as far as I got before getting lost. It removes the brackets, like so:
Artist - Songnamewatermark.com.mp3
So I tried the same with
-replace 'watermark.com', yet that brings back the brackets:
Artist - Songname[watermark.com].mp3
I'm struggling here; regex is not my forte, so any help would be appreciated.

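You can use the following pattern. It escapes the brackets so they are matched literally, and matches one or more letters or dots between them: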
-replace '\[[A-Za-z.]+\]',''
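To show how that pattern could be applied to actual files, here is a minimal sketch. The sample filename comes from the question. The use of [regex]::Escape on $removeMe and the rename loop are assumptions about what the asker wants, not part of the original answer. -WhatIf is left on so nothing is renamed until it is removed.

```powershell
# Sample name from the question.
$name = 'artist - songname[watermark.com].mp3'

# Option 1: the pattern from the answer (any bracketed run of letters and dots).
$byPattern = $name -replace '\[[A-Za-z.]+\]', ''

# Option 2: escape the exact watermark so its brackets and dots are literal.
$removeMe = '[watermark.com]'
$byEscape = $name -replace [regex]::Escape($removeMe), ''

$byPattern   # artist - songname.mp3
$byEscape    # artist - songname.mp3

# Assumed usage: rename every .mp3 in the current directory.
# -LiteralPath avoids the brackets being treated as wildcards.
Get-ChildItem -Filter *.mp3 | ForEach-Object {
    $newName = $_.Name -replace [regex]::Escape($removeMe), ''
    if ($newName -ne $_.Name) {
        Rename-Item -LiteralPath $_.FullName -NewName $newName -WhatIf
    }
}
```

Option 2 only removes the one known watermark, which may be safer if other filenames legitimately contain bracketed text.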

Related

Can't seem to get RegEx to match

I am trying to extract the Get-Help comment headers from a PowerShell script...using PowerShell. The file I'm reading looks something like this:
<#
.SYNOPSIS
Synopsis goes here.
It could span multiple lines.
Like this.
.DESCRIPTION
A description.
It could also span multiple lines.
.PARAMETER MyParam
Purpose of MyParam
.PARAMETER MySecondParam
Purpose of MySecondParam.
Notice that this section also starts with '.PARAMETER'.
This one should not be captured.
...and many many more lines like this...
#>
# Rest of the script...
I would like to get all the text below .DESCRIPTION, up to the first instance of .PARAMETER. So the desired output would be:
A description.
It could also span multiple lines.
Here's what I've tried:
$script = Get-Content -Path "C:\path\to\the\script.ps1" -Raw
$pattern = '\.DESCRIPTION(.*?)\.PARAMETER'
$description = $script | Select-String -Pattern $pattern
Write-Host $description
When I run that, $description is empty. If I change $pattern to .*, I get the entire contents of the file, as expected, so there must be something wrong with my regex pattern, but I can't figure out what.
Any ideas?

(get-help get-date).description
The `Get-Date` cmdlet gets a DateTime object that represents the current date
or a date that you specify. It can format the date and time in several Windows
and UNIX formats. You can use `Get-Date` to generate a date or time character
string, and then send the string to other cmdlets or programs.
(get-help .\script.ps1).description

the Select-String cmdlet works on entire strings and you have given it ONE string. [grin]
so, instead of fighting with that, i went with the -match operator. the following presumes you have loaded the entire file into $InStuff as one multiline string with -Raw.
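the (?ms) stuff is two regex flags - multiline & singleline. singleline is the one that matters here: it lets . match newlines, which it does not do by default.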
$InStuff -match '(?ms)(DESCRIPTION.*?)\.PARAMETER'
$Matches.1
output ...
DESCRIPTION
A description.
It could also span multiple lines.
note that there is a blank line at the end. you likely will want to trim that away.
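As a variation on the above, the question's original pattern should also work once the singleline flag is added, and capturing only what follows .DESCRIPTION avoids the heading in the output. This is a sketch that uses an inline here-string as a stand-in for the script file; the variable names are illustrative.

```powershell
# Stand-in for: Get-Content -Path <script> -Raw
$script = @'
<#
.SYNOPSIS
Synopsis goes here.
.DESCRIPTION
A description.
It could also span multiple lines.
.PARAMETER MyParam
Purpose of MyParam
.PARAMETER MySecondParam
Purpose of MySecondParam.
#>
'@

# (?s) lets . match newlines; .*? stops at the first .PARAMETER.
$pattern = '(?s)\.DESCRIPTION(.*?)\.PARAMETER'

$description = $null
if ($script -match $pattern) {
    # Group 1 is the text between the two headings; Trim() drops the
    # surrounding line breaks.
    $description = $Matches[1].Trim()
}
$description

# Select-String can use the same pattern on the single raw string.
$viaSelectString = ($script | Select-String -Pattern $pattern).Matches[0].Groups[1].Value.Trim()
```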

In the words of @Mathias R. Jessen:
Don't use regex to parse PowerShell code in PowerShell
Use the PowerShell parser instead!
So, let's use PowerShell to parse PowerShell:
$ScriptFile = "C:\path\to\the\script.ps1"
$ScriptAST = [System.Management.Automation.Language.Parser]::ParseFile($ScriptFile, [ref]$null, [ref]$null)
$ScriptAST.GetHelpContent().Description
We use [System.Management.Automation.Language.Parser]::ParseFile() to parse our file and output an Abstract Syntax Tree (AST).
Once we have the Abstract Syntax Tree, we can then use the GetHelpContent() method (exactly what Get-Help uses) to get our parsed help content.
Since we are only interested in the Description portion, we can simply access it directly with .GetHelpContent().Description

how to delete one or more spaces at the end of a filename using windows powershell regex?

I have a directory containing a lot of files with misformatted filenames. Some of them have spaces right at the end of the filename. Others have keywords fused onto the last word of the filename, for example "xxx xxx xxx somewordEng .txt".
I'm trying to get rid of them using this script, but it doesn't work yet. The spaces at the end of the filename (BaseName) are still there, and so is the "Eng" keyword that is somehow attached to the word before it:
dir | Rename-Item -NewName { $_.BaseName.replace("Eng$","").replace(" {2,}"," ").replace("\s$","") + $_.Extension }
.replace("Eng$","") is supposed to remove the "Eng" keyword if it appears at the END of the filename (basename), but it does not seem to work.
.replace(" {2,}"," ") is supposed to replace two or more consecutive spaces with just ONE space within the filename, but it does not seem to work.
.replace("\s$","") is supposed to remove spaces at the end of the filename, but it does not work either.
I searched for PowerShell regex examples, but nothing has worked for me so far. :( I can't see the problem yet.

The issue here is that the string method .Replace() does not support regular expressions, which is what you are trying to use. You should be using the replace operator -replace instead. The differences between the two options are covered a little more in this answer.
The following two examples show the difference:
PS C:\Users\mcameron> "Te.t".Replace(".","s")
Test
PS C:\Users\mcameron> "Te.t" -Replace ".","s"
ssss
In your case
$_.BaseName -replace "Eng$" -replace " {2,}"," " -replace "\s$"
We use the correct operator, and you can still "chain" the calls as you see above. That removes a trailing "Eng" and a single trailing whitespace character, and reduces any run of two or more spaces to a single space. If you are replacing with nothing, you can omit the second parameter.
However, you can tighten those together a little if you want:
$_.BaseName -replace "(Eng|\s+)$" -replace "\s{2,}"," "
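One caveat, as far as I can tell: in the question's example the basename ends in "Eng " (keyword, then a space), so an "Eng$" test that runs before the trailing space is removed cannot match. The sketch below trims trailing whitespace first. It also uses -creplace, the case-sensitive form of the operator, on the assumption that only a capitalised "Eng" should be stripped. The sample name is made up from the question's example.

```powershell
# Basename modelled on the question's example, with a doubled space added.
$baseName = 'xxx  xxx xxx somewordEng '

# 1. drop trailing whitespace, 2. drop a trailing "Eng" (case-sensitive),
# 3. collapse runs of whitespace, 4. trim anything step 2 left behind.
$clean = ($baseName -replace '\s+$' -creplace 'Eng$' -replace '\s{2,}', ' ').TrimEnd()

$clean   # xxx xxx xxx someword
```

Inside the Rename-Item script block the same expression would be applied to $_.BaseName before appending $_.Extension.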

How to trim the file modification value from SVN log output with PowerShell

I have an SVN log being captured in PowerShell which I am then trying to modify and strip off everything except the file URL. The problem I am having is getting a regex to remove everything before the file URL. My entry is matched as:
  M /trunk/project/application/myFile.cs
There are two spaces at the beginning, which I originally tried to replace with a regex, but that did not seem to work, so I use a trim and end up with:
M /trunk/project/application/myFile.cs
Now I want to get rid of the File status indicator so I have a regular expression like:
$entry = $entry.Replace("^[ADMR]\s+","")
Where $entry is the matched file URL but this doesn't seem to do anything, even removing the caret to just look for the value and space did not do anything. I know that $entry is a string, I originally thought Replace was not working as $entry was not a string, but running Get-Member during the script shows I have a string type. Is there something special about the svn file indicator or is the regex somehow off?

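Given your example string, you can split on the first run of whitespace and keep the second part. (Splitting on ' /' would also work, but it strips the leading slash from the path.)
$entry = 'M /trunk/project/application/myFile.cs'
$fileURL = ($entry -split '\s+', 2)[1]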

Your regex doesn't work because string.Replace just does a literal string replacement and doesn't know about regexes. You'd probably want [Regex]::Replace or just the -replace operator.
But when using SVN with PowerShell, I'd always go with the XML format. SVN allows a --xml option to all commands which then will output XML (albeit invalid if it dies in between).
E.g.:
$x = [xml](svn log -l 3 --verbose --xml)
$x.log.logentry|%{$_.paths}|%{$_.path}|%{$_.'#text'}
will give you all paths.
But if you need a regex:
$entry -replace '^.*?\s+'
which will remove everything up to (and including) the first sequence of spaces which has the added benefit that you don't need to remember what characters may appear there, too.
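For comparison, here is a small sketch of the three calls side by side, applied to the untrimmed line so no separate Trim() step is needed. The raw line, including its leading spaces, is reconstructed from the question and is an assumption about the real svn output.

```powershell
# Raw entry as described in the question (two leading spaces assumed).
$rawEntry = '  M /trunk/project/application/myFile.cs'

# String.Replace treats its first argument as literal text, so nothing changes.
$unchanged = $rawEntry.Replace('^\s*[ADMR]\s+', '')

# -creplace is the case-sensitive regex operator; the replacement is omitted,
# which means "replace with nothing".
$viaOperator = $rawEntry -creplace '^\s*[ADMR]\s+'

# The static .NET method does the same and is case-sensitive by default.
$viaClass = [regex]::Replace($rawEntry, '^\s*[ADMR]\s+', '')

$viaOperator   # /trunk/project/application/myFile.cs
```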

Batch rename screen shots on Mac OS X

Custom batch rename files
Hello, Mac OS X saves screenshots with a very long filename format. I would like to rename any of them that sit at the path /Users/me/desktop.
Here are some examples of the filenames:
Screen Shot 2012-08-02 at 1.15.29 AM.png
Screen Shot 2012-08-02 at 1.22.12 AM.png
Screen Shot 2012-08-02 at 1.22.14 PM.png
Screen Shot 2012-08-02 at 1.22.16 PM.png
I was once told not to do a for loop against ls, so I am trying globbing this time around. So far this is all I can come up with, but I don't know how to paren-wrap the expression into groups and then turn that into a file rename in the format I desire:
for i in *; do
screen_name=$(echo $i | grep --only-matching --extended-regexp '(Screen\ Shot)\ [0-9]+-[0-9]+-[0-9]+\ at\ [0-9]+\.[0-9]+.[0-9]+.[AP]M\.png');
echo $screen_name;
done
I am not sure about the hour of the time; it may be safest to assume up to 2 digits in every chunk of the time, so both 1.14.29 and 01.15.29. The desired names are:
ss.08-02-12-01.15.29-AM.png
ss.08-02-12-01.22.12-AM.png
ss.08-02-12-01.22.14-PM.png
ss.08-02-12-01.22.16-PM.png
The end goal, is a bash script that when run will rename ALL files at the above mentioned path to the new format listed.
Thank you for any help.

for i in "Screen Shot"*.png; do
# $3 is the date, $5 the time, $6 the AM/PM marker plus ".png"
new=$(echo "$i" | awk '
{
split($3,a,"-")
split($5,b,".")
printf("ss.%s-%s-%s-%02d.%02d.%02d-%s",a[2],a[3],a[1],b[1],b[2],b[3],$6)
}
')
mv "$i" "$new"
done
Before:
Screen Shot 2012-08-02 at 1.22.16 PM.png
Screen Shot 2012-09-02 at 13.42.06 PM.png
After:
ss.08-02-2012-01.22.16-PM.png
ss.09-02-2012-13.42.06-PM.png
EDIT:
as suggested by steve
printf("ss.%s-%s-%s-%02d.%02d.%02d-%s",a[2],a[3],substr(a[1],3,2),b[1],b[2],b[3],$6)
which yields
ss.08-02-12-01.22.16-PM.png
ss.09-02-12-13.42.06-PM.png

You can use stream editor sed to match and substitute using regular expressions. You would do something like this
echo $i | sed "s/PATTERN/REPLACE/"
to generate the filename from $i. sed will read from stdin, search (s command) for PATTERN and replace it with REPLACE.
In your regexp pattern you can mark separate groups by surrounding them with parentheses (). In most situations you will have to escape them as \( and \). You access these parts in the replacement pattern with \#, where # is the number of the subgroup, starting from 1. Here's a simple example:
echo "ScreenShotXYZ.png" | sed "s/ScreenShot\(.*\)\.png/\1.png/"
Here, the XYZ is matched by the expression in parentheses and can be accessed using \1 in the replacement string. The whole pattern is thus replaced by XYZ.png.
So use your regexp for matching, put brackets around the relevant blocks and do something like
ss.\1.\2.(and so on)
for your replacement pattern. There's still some way to optimize the process by first using sed to replace dashes by dots, then grouping the whole time block in just one pattern but for a start it's easier to code like that.

PowerShell - Replace contents of every tag using REGEX

I am trying to write a PowerShell script to replace the contents of tags i have put into an XML file. The tags appear multiple times within the XML, this is resulting in everything between the first and last tag being replaced as it is not stopping the first time the end tag is found.
I am using this:
$NewFile = (Get-Content .\myXML_file.xml) -join "" | foreach{$_ -replace "<!--MyCustom-StartTag-->(.*)<!--MyCustom-EndTag-->","<!--MyCustom-StartTag-->New Contents of Tag<!--MyCustom-EndTag-->"};
Set-Content .\newXMLfile.xml $newfile;
The file has contents like:
<!--MyCustom-StartTag-->
Lots of content
<!--MyCustom-EndTag-->
More stuff here
<!--MyCustom-StartTag-->
Lots of content
<!--MyCustom-EndTag-->
And i am ending up with:
<!--MyCustom-StartTag-->
New Content Here
<!--MyCustom-EndTag-->
Instead of:
<!--MyCustom-StartTag-->
New content
<!--MyCustom-EndTag-->
More stuff here
<!--MyCustom-StartTag-->
New content
<!--MyCustom-EndTag-->
I have tried using: (?!MyCustom-StartTag) but that does not work either.
Any ideas of what i should do to get this to work.
Thanks,
Richard

You should use the non-greedy version of *, namely *?. For more info, see: http://www.dijksterhuis.org/csharp-regular-expression-operator-cheat-sheet/ (Powershell uses same regex engine as C#).
$NewFile = (Get-Content .\myXML_file.xml) -join "" | foreach{$_ -replace "<!--MyCustom-StartTag-->(.*?)<!--MyCustom-EndTag-->","<!--MyCustom-StartTag-->New Contents of Tag<!--MyCustom-EndTag-->"};
Set-Content .\newXMLfile.xml $newfile;
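The difference between the greedy and the lazy quantifier can be checked on a short string. This is a minimal sketch: the sample text is a one-line version of the file from the question, and the variable names are illustrative. Note that it only works as written because the lines were joined. If the file were read with -Raw instead, the line breaks would remain, and I believe the pattern would also need the (?s) flag so that . can match them.

```powershell
$startTag = '<!--MyCustom-StartTag-->'
$endTag   = '<!--MyCustom-EndTag-->'

# One-line stand-in for the joined file contents.
$text = $startTag + 'Lots of content' + $endTag + 'More stuff here' +
        $startTag + 'Lots of content' + $endTag

$greedyPattern = [regex]::Escape($startTag) + '(.*)'  + [regex]::Escape($endTag)
$lazyPattern   = [regex]::Escape($startTag) + '(.*?)' + [regex]::Escape($endTag)
$replacement   = $startTag + 'New content' + $endTag

# Greedy: one match spanning from the first start tag to the last end tag.
$greedyCount = [regex]::Matches($text, $greedyPattern).Count   # 1

# Lazy: one match per tag pair, so "More stuff here" survives.
$lazyCount  = [regex]::Matches($text, $lazyPattern).Count      # 2
$lazyResult = $text -replace $lazyPattern, $replacement
```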

I think the reason you are left with just a single pair of start and end tags is that there are three spans in the search string that your pattern could cover:
The first pair of start and end.
The second pair of start and end.
The start tag from the first pair and the end tag from the second pair.
Because .* is greedy, the engine takes the longest of these, the third, and so it replaces everything between the first start tag and the last end tag with the new value.
So in your "(.*)" you either have to make the quantifier lazy or exclude any other start and end tags.
