Remove Unused Functions in AutoIt Script with PowerShell - regex

Alrighty..
So I am editing an AutoIt script that has a lot of unused functions in it. The original author saw fit to add all the functions from his/her include files.
At first I tried to use the tools within AutoIt/SciTE to remove unused functions; however, for some freakish reason this rendered the script/compiled file useless. So now I am thinking it would be best to write a function remover.
Here is what I have so far:
Search for lines containing "Func _", then count how many times each function name appears in the file. If it appears only once (its own definition), select it with Select-String:
$FileName=".\FILENAME.au3"
$File=Get-Content $FileName
$Funcs=$File|Select-String "Func _"
foreach ($Func in $Funcs) {
$FuncName=$Func.ToString().Split('( ')[1]
$Count=($File|Select-String $FuncName | Measure-Object).Count
if ($count -eq 1) {
$File|Select-String "Func $FuncName"
}
}
What I would like to do is remove the function, likely with regex. So something like:
REMOVE "Func _"$func * "EndFunc"
The trouble has been that this is a search that spans multiple lines, from Func _NAMEOFFUNCTION to EndFunc. It's unclear to me if regex in PowerShell can even do this. Not all regex implementations seem to be able to span a search across lines. Is regex even the answer? I don't know.

When you use Get-Content in PowerShell 1.0 or 2.0 you can only get back an array of strings - one for each line. This isn't going to work when you need a regex to span multiple lines. Use this approach to read the file as a single string:
$FileContent = [io.file]::ReadAllText($FileName)
If you are on PowerShell V3 you can use the -Raw parameter to read the file as a single string:
$FileContent = Get-Content $FileName -Raw
Then when you use Select-String you will need to modify the regex to enable singleline s (and probably multiline m) mode e.g.:
$FileContent | Select-String "(?smi)$FuncName" -AllMatches
Note the i is there to be case-insensitive. Use the -AllMatches parameter to match multiple function definitions within a file.

Here's a regex that should match an AutoIt function definition. It assumes the Func and EndFunc keywords are always placed at the beginning of a line and that they're case sensitive. The function name is captured in the group named FuncName (in C# you would access it via Groups["FuncName"]);
"(?m)^Func\s+\b(?<FuncName>_\w+\b).*\n(?:(?!EndFunc\b).*\n)*EndFunc.*\n?"
For the function names alone you can use "\b_\w+\b" (or maybe "\b_[A-Za-z]+\b"; I don't know how strict you need to be). Having almost zero experience with PowerShell, I would probably use [regex]::Matches and [regex]::Replace to do the work. I don't know if PS offers a better way.
I'm assuming you've read the whole file into a string as @Keith suggested, not line by line as you were doing originally.
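For anyone who wants to see the whole idea end to end, here is a minimal sketch in Python rather than PowerShell (my choice, only because it is easy to run anywhere). It reuses the regex above, with Python's `(?P<FuncName>...)` spelling for the named group. The helper name `remove_unused_functions` and the sample script are hypothetical, and it keeps the same assumption that Func and EndFunc start at the beginning of a line.

```python
import re

# Same shape as the .NET pattern above; Python spells the named group (?P<name>...).
FUNC_DEF = re.compile(
    r"^Func\s+(?P<FuncName>_\w+)\b.*\n(?:(?!EndFunc\b).*\n)*EndFunc.*\n?",
    re.MULTILINE,
)


def remove_unused_functions(source: str) -> str:
    """Drop every Func..EndFunc block whose name appears only once in the file."""

    def keep_or_drop(match: re.Match) -> str:
        name = match.group("FuncName")
        # Count whole-word occurrences; the definition itself counts as one.
        uses = len(re.findall(r"\b" + re.escape(name) + r"\b", source))
        return match.group(0) if uses > 1 else ""

    return FUNC_DEF.sub(keep_or_drop, source)


# Hypothetical AutoIt snippet: _Used is called once, _Unused is never called.
sample = (
    "Func _Used()\n"
    "  Return 1\n"
    "EndFunc\n"
    "\n"
    "Func _Unused()\n"
    "  Return 2\n"
    "EndFunc\n"
    "\n"
    "_Used()\n"
)
cleaned = remove_unused_functions(sample)
```

This is a single pass, so a function that is only called from another unused function survives the first run; running it repeatedly until nothing changes would catch those. It also counts names that appear in comments or strings as uses, which errs on the side of keeping code.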

Related

Can't seem to get RegEx to match

I am trying to extract the Get-Help comment headers from a PowerShell script...using PowerShell. The file I'm reading looks something like this:
<#
.SYNOPSIS
Synopsis goes here.
It could span multiple lines.
Like this.
.DESCRIPTION
A description.
It could also span multiple lines.
.PARAMETER MyParam
Purpose of MyParam
.PARAMETER MySecondParam
Purpose of MySecondParam.
Notice that this section also starts with '.PARAMETER'.
This one should not be captured.
...and many many more lines like this...
#>
# Rest of the script...
I would like to get all the text below .DESCRIPTION, up to the first instance of .PARAMETER. So the desired output would be:
A description.
It could also span multiple lines.
Here's what I've tried:
$script = Get-Content -Path "C:\path\to\the\script.ps1" -Raw
$pattern = '\.DESCRIPTION(.*?)\.PARAMETER'
$description = $script | Select-String -Pattern $pattern
Write-Host $description
When I run that, $description is empty. If I change $pattern to .*, I get the entire contents of the file, as expected, so there must be something wrong with my RegEx pattern, but I can't seem to figure it out.
Any ideas?
(get-help get-date).description
The `Get-Date` cmdlet gets a DateTime object that represents the current date
or a date that you specify. It can format the date and time in several Windows
and UNIX formats. You can use `Get-Date` to generate a date or time character
string, and then send the string to other cmdlets or programs.
(get-help .\script.ps1).description
the Select-String cmdlet works on entire strings and you have given it ONE string. [grin]
so, instead of fighting with that, i went with the -match operator. the following presumes you have loaded the entire file into $InStuff as one multiline string with -Raw.
the (?ms) stuff is two regex flags - multiline & singleline.
$InStuff -match '(?ms)(DESCRIPTION.*?)\.PARAMETER'
$Matches.1
output ...
DESCRIPTION
A description.
It could also span multiple lines.
note that there is a blank line at the end. you likely will want to trim that away.
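To make the role of the singleline flag concrete, here is a small sketch in Python (not PowerShell; the regex behaviour in question is the same in both engines). By default `.` does not match a newline, so the lazy `.*?` can never get from .DESCRIPTION to .PARAMETER when the text between them spans more than one line. The sample text is a shortened copy of the header from the question, and the `\s*` on both sides of the group is my own addition to trim the surrounding line breaks.

```python
import re

# Shortened copy of the comment header from the question.
script = """<#
.SYNOPSIS
Synopsis goes here.
.DESCRIPTION
A description.
It could also span multiple lines.
.PARAMETER MyParam
Purpose of MyParam
.PARAMETER MySecondParam
Purpose of MySecondParam.
#>
"""

pattern = r"\.DESCRIPTION\s*(.*?)\s*\.PARAMETER"

# Without the flag, '.' stops at the end of each line, so nothing matches here.
no_flag = re.search(pattern, script)

# re.DOTALL is Python's name for the (?s) singleline flag.
with_flag = re.search(pattern, script, re.DOTALL)
description = with_flag.group(1)
```

Because the quantifier is lazy, the match stops at the first .PARAMETER, which is what the question asked for.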
In the words of @Mathias R. Jessen:
Don't use regex to parse PowerShell code in PowerShell
Use the PowerShell parser instead!
So, let's use PowerShell to parse PowerShell:
$ScriptFile = "C:\path\to\the\script.ps1"
$ScriptAST = [System.Management.Automation.Language.Parser]::ParseFile($ScriptFile, [ref]$null, [ref]$null)
$ScriptAST.GetHelpContent().Description
We use the [System.Management.Automation.Language.Parser]::ParseFile() method to parse our file and output an Abstract Syntax Tree (AST).
Once we have the Abstract Syntax Tree, we can then use the GetHelpContent() method (exactly what Get-Help uses) to get our parsed help content.
Since we are only interested in the Description portion, we can simply access it directly with .GetHelpContent().Description

Edit within multi-line sed match

I have a very large file, containing the following blocks of lines throughout:
start :234
modify 123 directory1/directory2/file.txt
delete directory3/file2.txt
modify 899 directory4/file3.txt
Each block starts with the pattern "start : #" and ends with a blank line. Within the block, every line starts with "modify # " or "delete ".
I need to modify the path in each line, specifically appending a directory to the front. I would just use a general regex to cover the entire file for "modify #" or "delete ", but due to the enormous amount of other data in that file, there will likely be other matches to this somewhat vague pattern. So I need to use multi-line matching to find the entire block, and then perform edits within that block. This will likely result in >10,000 modifications in a single pass, so I'm also trying to keep the execution down to less than 30 minutes.
My current attempt is a sed one-liner:
sed '/^start :[0-9]\+$/ { :a /^[modify|delete] .*$/ { N; ba }; s/modify [0-9]\+ /&Appended_DIR\//g; s/delete /&Appended_DIR\//g }' file_to_edit
Which is intended to find the "start" line, loop while the lines either start with a "modify" or a "delete," and then apply the sed replacements.
However, when I execute this command, no changes are made, and the output is the same as the original file.
Is there an issue with the command I have formed? Would this be easier/more efficient to do in perl? Any help would be greatly appreciated, and I will clarify where I can.
I think you would be better off with perl
Specifically because you can work 'per record' by setting $/ - if your records are delimited by blank lines, set it to \n\n.
Something like this:
#!/usr/bin/env perl
use strict;
use warnings;
local $/ = "\n\n";
while (<>) {
#multi-lines of text one at a time here.
if (m/^start :\d+/) {
s/(modify \d+) /$1 Appended_DIR\//g;
s/(delete) /$1 Appended_DIR\//g;
}
print;
}
Each iteration of the loop will pick out a blank line delimited chunk, check if it starts with a pattern, and if it does, apply some transforms.
It'll take data from STDIN via a pipe, or myscript.pl somefile.
Output is to STDOUT and you can redirect that in the normal way.
Your limiting factors when processing files in this way are typically:
Data transfer from disk
pattern complexity
The more complex a pattern, and especially if it has variable matching going on, the more backtracking the regex engine has to do, which can get expensive. Your transforms are simple, so packaging them doesn't make very much difference, and your limiting factor will be likely disk IO.
(If you want to do an in place edit, you can with this approach)
If - as noted - you can't rely on a record separator, then what you can use instead is Perl's range operator (other answers already do this, I'm just expanding it out a bit):
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
if ( /^start :/ .. /^$/ ) {
s/(modify \d+) /$1 Appended_DIR\//g;
s/(delete) /$1 Appended_DIR\//g;
}
print;
}
We don't change $/ any more, so it keeps its default of 'each line'. What we add instead is a range operator that tests "am I currently between these two regular expressions?". It is toggled true when you hit a "start" line and false when you hit a blank line (assuming that's where you would want to stop).
It applies the pattern transformation if this condition is true, and it ... ignores and carries on printing if it is not.
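For readers who don't know Perl, the range operator is essentially a flag that flips on at a "start" line and off at a blank line. A minimal sketch of that state machine in Python (used here purely as illustration; the function name `prepend_dir` and the sample lines are hypothetical):

```python
import re

START = re.compile(r"^start :\d+$")
EDIT = re.compile(r"^(modify \d+ |delete )")


def prepend_dir(lines, new_dir="Appended_DIR"):
    """Prefix paths with new_dir, but only inside start..blank-line blocks."""
    in_block = False
    out = []
    for line in lines:
        if START.match(line):
            in_block = True            # block opens on the "start :N" line
        elif line.strip() == "":
            in_block = False           # block closes on a blank line
        elif in_block:
            # Keep the matched prefix and insert the directory right after it.
            line = EDIT.sub(lambda m: m.group(1) + new_dir + "/", line)
        out.append(line)
    return out


sample = [
    "modify 1 other/file.txt",
    "",
    "start :234",
    "modify 123 directory1/directory2/file.txt",
    "delete directory3/file2.txt",
    "",
    "delete untouched.txt",
]
result = prepend_dir(sample)
```

Only the two lines between "start :234" and the following blank line change; the look-alike lines outside the block are passed through untouched, which is the point of restricting the substitution to a range.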
sed's pattern ranges are your friend here:
sed -r '/^start :[0-9]+$/,/^$/ s/^(delete |modify [0-9]+ )/&prepended_dir\//' filename
The core of this trick is /^start :[0-9]+$/,/^$/, which is to be read as a condition under which the s command that follows it is executed. The condition is true if sed currently finds itself in a range of lines of which the first matches the opening pattern ^start :[0-9]+$ and the last matches the closing pattern ^$ (an empty line). -r is for extended regex syntax (-E for old BSD seds), which makes the regex more pleasant to write.
I would also suggest using perl. Although I would try to keep it in one-liner form:
perl -i -pe 'if ( /^start :/ .. /^$/){s/(modify [0-9]+ )/$1Append_DIR\//;s/(delete )/$1Append_DIR\//; }' file_to_edit
Or you can use redirection of stdout:
perl -pe 'if ( /^start :/ .. /^$/){s/(modify [0-9]+ )/$1Append_DIR\//;s/(delete )/$1Append_DIR\//; }' file_to_edit > new_file
with gnu sed (with BRE syntax):
sed '/^start :[0-9][0-9]*$/{:a;n;/./{s/^\(modify [0-9][0-9]* \|delete \)/\1NewDir\//;ba}}' file.txt
The approach here is not to store the whole block and to proceed to the replacements. Here, when the start of the block is found the next line is loaded in pattern space, if the line is not empty, replacements are performed and the next line is loaded, etc. until the end of the block.
Note: gnu sed has the alternation feature | available, it may not be the case for some other sed versions.
a way with awk:
awk '/^start :[0-9]+$/,/^$/{if ($1=="modify"){$3="newdirMod/"$3;} else if ($1=="delete"){$2="newdirDel/"$2};}{print}' file.txt
This is very simple in Perl, and probably much faster than the sed equivalent
This one-line program inserts Appended_DIR/ after any occurrence of modify 999 or delete at the start of a line. It uses the range operator to restrict those changes to blocks of text starting with start :999 and ending with a line containing no printable characters
perl -pe"s<^(?:modify\s+\d+|delete)\s+\K><Appended_DIR/> if /^start\s+:\d+$/ .. not /\S/" file_to_edit
Good grief. sed is for simple substitutions on individual lines, that is all. Once you start using constructs other than s, g, and p (with -n) you are using the wrong tool. Just use awk:
awk '
/^start :[0-9]+$/ { inBlock=1 }
inBlock { sub(/^(modify [0-9]+|delete) /,"&Appended_DIR/") }
/^$/ { inBlock=0 }
{ print }
' file
start :234
modify 123 Appended_DIR/directory1/directory2/file.txt
delete Appended_DIR/directory3/file2.txt
modify 899 Appended_DIR/directory4/file3.txt
There are various ways you can do the above in awk, but I wrote it in this style for clarity over brevity. I assume you aren't familiar with awk, but you should have no trouble following it since it reuses your own sed script's regexps and replacement text.

Powershell replace function has escape characters

I am writing a batch script in which I am trying to replace a value in a prop file. I am using PowerShell for the replacement code as I couldn't find any comparable way to do in batch script.
powershell -Command "(gc %PROPFILEPATH%) -replace '%FTPoldfilepath%', '%FTPnewfile%' | Set-Content %PROPFILEPATH%"
The variables %PROPFILEPATH%, %FTPoldfilepath% and %FTPnewfile% contain double backslashes (Eg: C:\\testing\\feed)
I realize that backslashes need to be escaped, can anyone guide me how to implement the escape function here.
Use double backslashes. Does not hurt if they come through doubled, or even tripled.
You will need to use $ENV:PROPFILEPATH, $ENV:FTPoldfilepath, and $ENV:FTPnewfile in place of %PROPFILEPATH%, '%FTPoldfilepath%', and '%FTPnewfile%'
If your goal is to load the current path, replace the old path with the new one and save the new path, consider doing so with a full script instead of a single command:
$oldftppath = 'c:\some\path'
$newftppath = 'c:\new\path'
$newpath = $ENV:PROPFILEPATH.replace($oldftppath,$newftppath)
But then it gets tricky. If you need a persistent environment variable, you need to use the .NET Framework to set it. https://technet.microsoft.com/en-us/library/ff730964.aspx
[Environment]::SetEnvironmentVariable("TestVariable", "Test value.", "User")
So, using this syntax:
[Environment]::SetEnvironmentVariable("PROPFILEPATH", "$newpath", "User")
Or it could be "machine" for the context.
For one thing, as @Xalorous mentioned, you'll have to use PowerShell syntax for accessing environment variables:
powershell -Command "(gc $env:PROPFILEPATH) -replace $env:FTPoldfilepath, $env:FTPnewfile | Set-Content $env:PROPFILEPATH"
Also, only the search string needs to be escaped, not the replacement string. You can use the Escape() method of the regex class for that:
powershell -Command "(gc $env:PROPFILEPATH) -replace [regex]::Escape($env:FTPoldfilepath), $env:FTPnewfile | Set-Content $env:PROPFILEPATH"
Escaping is required here, because the -replace operator treats the search string as a regular expression.
However, since you apparently want just a simple string replacement, not a regular expression match, you could also use the Replace() method of the source string:
powershell -Command "(gc $env:PROPFILEPATH) | % { $_.Replace($env:FTPoldfilepath, $env:FTPnewfile) } | Set-Content $env:PROPFILEPATH"
As a side note, since you're using PowerShell anyway, you should seriously consider writing the whole script in PowerShell. It usually makes things a lot easier.
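The difference between a regex replace and a plain string replace is easy to see in a small experiment. The sketch below uses Python rather than PowerShell: `re.escape` plays the role of [regex]::Escape() and `str.replace` the role of the .Replace() method. The property line and the new path are made-up values; only C:\\testing\\feed comes from the question.

```python
import re

# A made-up line from the prop file; paths contain doubled backslashes.
line = r"feed.path=C:\\testing\\feed"
old = r"C:\\testing\\feed"
new = r"D:\\archive\\feed"

# Used as a regex, each "\\" in `old` means ONE literal backslash,
# so the pattern does not match the doubled backslashes in the file.
unescaped = re.search(old, line)

# Escaping the search string makes the regex engine treat it literally.
# The replacement is supplied through a function so that its backslashes
# are not interpreted as escape sequences by re.sub.
escaped = re.sub(re.escape(old), lambda m: new, line)

# A plain string replacement needs no escaping at all.
literal = line.replace(old, new)
```

Both `escaped` and `literal` end up identical, which is why the plain string method is the simpler choice when no pattern matching is actually needed.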

PowerShell Select-String regular expression to locate two strings on the same line

How do I use Select-String cmdlet to search a text file for a string which starts with a specific string, then contains random text and has another specific string towards the end of the line? I'm only interested in matches across a single line in the text file, not across the entire file.
For example I am searching to match both 'Set-QADUser' and 'WhatIf' on the same line in the file. And my example file contains the following line:
Set-QADUser -Identity $($c.ObjectGUID) -ObjectAttributes @{extensionattribute7=$ekdvalue} -WhatIf | Out-Null
How do I use Select-String along with a Regular Expression to locate the pattern in question? I tried using the following and it does work but it also matches other instances of either 'Set-QADUser' or 'WhatIf' found elsewhere in the text file and I only want to match instances when both search strings are found on the same line.
Select-String -path "test.ps1" -Pattern "Set-QADUser.*WhatIf" | Select Matches,LineNumber
To make this more complicated I actually want to perform this search from within the script file that is being searched. Effectively this is used to warn the user that the script being run is currently set to 'WhatIf' mode for testing. But of course the regEx matches the text from the actual Select-String cmd within the script when it's run - so it finds multiple matches and I can't figure out a very good way to overcome that issue. So far this is what I've got:
#Warn user about 'WhatIf' if detected
$line=Select-String -path $myinvocation.mycommand.name -Pattern "Set-QADUser.*WhatIf" | Select Matches,LineNumber
If ($line.Count -gt 1)
{
Write-Host "******* Warning ******"
Write-Host "Script is currently in 'WhatIf' mode; to make changes please remove '-WhatIf' parameter at line no. $($line[1].LineNumber)"
}
I'm sure there must be a better way to do this. Hope somebody can help.
Thanks
If you use the -Quiet switch on Select-String it will just return a boolean True/False, depending on whether it found a match or not.
-Quiet <SwitchParameter>
Returns a Boolean value (true or false), instead of a MatchInfo object. The value is "true" if the pattern is found; otherwise, the value is "false".
Required? false
Position? named
Default value Returns matches
Accept pipeline input? false
Accept wildcard characters? false
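The pattern "Set-QADUser.*WhatIf" is already restricted to a single line, because `.` does not match a line break unless singleline mode is switched on. A rough sketch of the same per-line check in Python (not PowerShell; the function names and the sample text are hypothetical), returning either the matching line numbers or just a true/false answer in the spirit of -Quiet:

```python
import re

# '.' never crosses a newline here, so both words must share one line.
PATTERN = re.compile(r"Set-QADUser.*WhatIf")


def find_whatif_lines(text):
    """Return 1-based numbers of lines containing both strings, in order."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


def is_whatif_mode(text):
    """Boolean answer only, comparable to Select-String -Quiet."""
    return bool(find_whatif_lines(text))


sample = "\n".join(
    [
        "Set-QADUser -Identity $id",
        "Remove-Item $path -WhatIf",
        "Set-QADUser -Identity $id -WhatIf | Out-Null",
    ]
)
matches = find_whatif_lines(sample)
```

Lines that contain only one of the two strings are ignored. Note that this does not solve the self-match problem from the question: a script that searches its own source will still find the line holding the pattern itself.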

Powershell v2. Delete or Remove lines of text. Regex Issue

I've been through so many posts today that offer Powershell examples of how to remove entire lines from a file by using line numbers. Unfortunately none of them do quite what I need, or they have some 'but' type clauses in them.
The closest example I found uses the -replace operator. This managed to do the job, until I noticed that not all of the lines I was trying to remove were removed. After some more research I found that -replace relies on regex, and regex does not like certain special characters which happen to be in some of the lines I wish to delete/remove.
The example I'm using is not my own work. user2233949 made it (cheers buddy) I found it very simple though, so I made it into a Function:
Function DeleteLineFromFile ($Path, $File, $LineNumber){
$Contents = Get-Content $Path\$File
$Contents -replace $Contents[$LineNumber -1],"" | Set-Content $Path\$File
}
A great example I reckon. Only regex won't work with the special chars.
The goal is this: To make a function that doesn't care about what is in the line. I want to feed it a path, file and line, then have the function delete that line and save the file.
This is fairly easy and does not need regex at all.
Read the file:
$lines = Get-Content $Path\$File
We then have an array that contains the lines in the file. When we have an array we can use indexes to get elements from the array back, e.g.
$lines[4]
would be the fifth line. You can also pass an array into the index to get multiple lines back:
$lines[0,1,5] # returns the first, second and 6th line
$lines[0..5] # returns the first 6 lines
We can make use of that with a little trick. PowerShell's comparison operators, e.g. -eq work differently with an array on the left side, in that they don't return $true or $false, but rather all elements from the array matching the comparison:
1..5 -ge 3 # returns 3,4,5
0..8 -ne 7 # returns 0 through 8, *except* 7
You probably can see where this is going ...
$filteredLines = $lines[0..($lines.Length-1) -ne $LineNumber - 1]
Technically you can ignore the - 1 after $lines.Length because indexing outside of an array simply does nothing. This does actually remove the line you want to remove, though. If you just want it replaced by an empty line (which your code seems to be doing, but it doesn't sound like that's what you want), then this approach won't work.
There are other options, though, e.g. with a ForEach-Object:
Get-Content $Path\$File |
ForEach-Object { $n = 1 } {
if ($n -ne $LineNumber) { $_ } else { '' }
$n++
}
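The two behaviours discussed above (dropping the line entirely versus leaving an empty line in its place) differ only in what is emitted for the target line number. A minimal sketch of both in Python, as an illustration of the index-based approach and not a PowerShell replacement; the function names are hypothetical and line numbers are 1-based as in the question:

```python
def delete_line(lines, line_number):
    """Return the lines with the 1-based line_number removed."""
    return [
        line
        for number, line in enumerate(lines, start=1)
        if number != line_number
    ]


def blank_line(lines, line_number):
    """Return the lines with the 1-based line_number replaced by ''."""
    return [
        "" if number == line_number else line
        for number, line in enumerate(lines, start=1)
    ]


# The content is never inspected, so regex special characters are harmless.
sample = ["first", "a.*[weird](line)?", "third"]
deleted = delete_line(sample, 2)
blanked = blank_line(sample, 2)
```

Since neither function looks at the text of the line, there is nothing to escape, and a line number past the end of the file simply leaves the input unchanged.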
A word of advice on writing functions: Usually you don't have separate $Path and $File parameters. They serve no real useful purpose. Every cmdlet uses only a $Path parameter that points to a file if needed. If you need the folder that file resides in, you can always use Split-Path -Parent $Path to get it.