I have a text file with thousands of lines, containing both directory paths and file paths.
I would like to loop through each line of the text file, remove any lines containing a directory path, and keep all lines containing a file path. An example of two lines from the text file (one directory path, one file path):
exampleDirectoryPath/tags/10.0.0.8/tools/
exampleFilePath/tags/10.0.0.8/tools/hello.txt
So far, to loop through the text file, I have:
foreach ($line in [System.IO.File]::ReadLines("file.txt")) {
    if ($line -match ".*/.*$") {
        $line
    }
}
Goal output:
exampleFilePath/tags/10.0.0.8/tools/hello.txt
Note: I do not want to hardcode file extensions. There are thousands of files to traverse and I don't know what extensions are present, so I would like to return all of them.
So, the basic logic here is easy:
Get-Content "file.txt" | where { $_ is a file path... }
It kind of depends on how you want to determine whether it's a file path.
If all of your directory paths end in "/", you could simply do:
where { -not $_.EndsWith("/") }
or:
where { [system.io.Path]::GetFileName($_) -eq "" }
If not, but all of your file paths definitely have an extension, you could do:
where { [system.io.Path]::GetExtension($_) -ne "" }
If all of the paths actually exist, you could also do this:
where { Test-Path $_ -PathType Leaf }
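Putting one of those checks into a complete pipeline, a minimal sketch might look like this (assuming, as in your example, that directory lines always end in "/"; the output file name is just an illustration):

# keep only lines that do not end in a trailing slash, i.e. file paths
Get-Content "file.txt" |
    Where-Object { -not $_.EndsWith("/") } |
    Set-Content "files-only.txt"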
To provide a concise solution that also performs well:
(Get-Content -ReadCount 0 file.txt) -notmatch '/$'
Using -ReadCount 0 with Get-Content is a performance optimization that returns all lines in the input file as a single array object rather than collecting the lines one by one.
Additionally, -ReadCount 0 ensures that an array is output even if the input file happens to have just one line.
-notmatch, the negated form of the regex-based -match operator, acts as a filter with an array-valued LHS, returning the (non)matching elements (lines) (as a new array).
Regex /$ matches a literal / at the end ($) of each input string (line).
Note: As your question suggests, the solution above assumes that directories can be distinguished from files formally, based on whether the lines in the input file end in / or not.
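As a quick illustration with the two sample lines from the question:

$lines = 'exampleDirectoryPath/tags/10.0.0.8/tools/',
         'exampleFilePath/tags/10.0.0.8/tools/hello.txt'
$lines -notmatch '/$'   # -> exampleFilePath/tags/10.0.0.8/tools/hello.txt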
I personally would not use regex for this, for the simple reason that even though you may be able to check whether a path's pattern looks like a file or a folder, a regex cannot verify that the path actually exists. Building on your code, I would use the following:
$result = foreach ($line in [System.IO.File]::ReadLines("file.txt"))
{
    # Note: this check requires each path to actually exist on disk; a plain existing file
    # typically reports the 'Archive' attribute, while directories and missing paths do not
    if (([System.IO.DirectoryInfo]$line).Attributes -eq 'Archive')
    {
        $line
    }
}
I am trying to extract the Get-Help comment headers from a PowerShell script...using PowerShell. The file I'm reading looks something like this:
<#
.SYNOPSIS
Synopsis goes here.
It could span multiple lines.
Like this.
.DESCRIPTION
A description.
It could also span multiple lines.
.PARAMETER MyParam
Purpose of MyParam
.PARAMETER MySecondParam
Purpose of MySecondParam.
Notice that this section also starts with '.PARAMETER'.
This one should not be captured.
...and many many more lines like this...
#>
# Rest of the script...
I would like to get all the text below .DESCRIPTION, up to the first instance of .PARAMETER. So the desired output would be:
A description.
It could also span multiple lines.
Here's what I've tried:
$script = Get-Content -Path "C:\path\to\the\script.ps1" -Raw
$pattern = '\.DESCRIPTION(.*?)\.PARAMETER'
$description = $script | Select-String -Pattern $pattern
Write-Host $description
When I run that, $description is empty. If I change $pattern to .*, I get the entire contents of the file, as expected, so there must be something wrong with my regex pattern, but I can't seem to figure it out.
Any ideas?
(get-help get-date).description
The `Get-Date` cmdlet gets a DateTime object that represents the current date
or a date that you specify. It can format the date and time in several Windows
and UNIX formats. You can use `Get-Date` to generate a date or time character
string, and then send the string to other cmdlets or programs.
(get-help .\script.ps1).description
the Select-String cmdlet works on entire strings and you have given it ONE string. [grin]
so, instead of fighting with that, i went with the -match operator. the following presumes you have loaded the entire file into $InStuff as one multiline string with -Raw.
the (?ms) stuff is two regex flags - multiline & singleline.
$InStuff -match '(?ms)(DESCRIPTION.*?)\.PARAMETER'
$Matches.1
output ...
DESCRIPTION
A description.
It could also span multiple lines.
note that there is a blank line at the end. you likely will want to trim that away.
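if you want just the description text, a small illustrative follow-up (not part of the original) would be to strip the keyword, the per-line indentation, and the trailing blank line:

if ($InStuff -match '(?ms)(DESCRIPTION.*?)\.PARAMETER') {
    # remove the DESCRIPTION keyword, leading whitespace on each line, and trailing whitespace
    ($Matches.1 -replace 'DESCRIPTION' -replace '(?m)^[ \t]+').Trim()
}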
In the words of @Mathias R. Jessen:
Don't use regex to parse PowerShell code in PowerShell
Use the PowerShell parser instead!
So, let's use PowerShell to parse PowerShell:
$ScriptFile = "C:\path\to\the\script.ps1"
$ScriptAST = [System.Management.Automation.Language.Parser]::ParseFile($ScriptFile, [ref]$null, [ref]$null)
$ScriptAST.GetHelpContent().Description
We use the [System.Management.Automation.Language.Parser]::ParseFile() method to parse our file and output an Abstract Syntax Tree (AST).
Once we have the Abstract Syntax Tree, we can then use the GetHelpContent() method (exactly what Get-Help uses) to get our parsed help content.
Since we are only interested in the Description portion, we can simply access it directly with .GetHelpContent().Description
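GetHelpContent() returns the parsed comment block as a CommentHelpInfo object, so the other sections are available the same way; a couple of illustrative examples:

$help = $ScriptAST.GetHelpContent()
$help.Synopsis      # the .SYNOPSIS text
$help.Parameters    # a dictionary of per-parameter help text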
I have text files that contain 2 numbers separated by a '+' sign, and I'm trying to figure out how to replace them with their currency equivalents.
Example Strings:
20+2 would be converted to $0.20+$0.02 USD
1379+121 would be $13.79+$1.21 USD
400+20 would be $4.00+$0.20 USD
and so on.
I have tried using a few angles but they do not work or provide odd results.
I tried to do it here by attempting to cover all the patterns I think would come up.
.\Replace-FileString.ps1 "100+10" '$1.00+$0.10' $path1\*.txt -Overwrite
.\Replace-FileString.ps1 "1000+100" '$10.00+$1.00' $path1\*.txt -Overwrite
.\Replace-FileString.ps1 "300+30" '$3.00+$0.30' $path1\*.txt -Overwrite
.\Replace-FileString.ps1 "400+20" '$4.00+$0.20' $path1\*.txt -Overwrite
or this, which just doesn't work:
Select-String -Path .\*txt -Pattern '[0-9][0-9]?[0-9]?[0-9]?[0-9]?\+[0-9][0-9]?[0-9]?[0-9]?[0-9]?' | ForEach-Object {$_ -replace ", ", $"} {$_ -replace "+", "+$"}
I tried to do it here by attempting to cover all the patterns I think would come up
Don't try this - we're humans, and we won't think of all the edge cases; and even if we did, the amount of code we would need to write (or generate) would be ridiculous.
We need a more general solution here, and regex might indeed be helpful with this.
The pattern you describe could be expressed as three distinct parts:
1 or more consecutive digits
1 plus sign (+)
1 or more consecutive digits
With this in mind, let's start putting together the regex pattern to use:
\b\d+\+\d+\b
or, written out with explanations:
\b # a word boundary
\d+ # 1 or more digits
\+ # 1 literal plus sign
\d+ # 1 or more digits
\b # a word boundary
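A quick sanity check of that pattern against one of the sample strings (purely illustrative):

'1379+121' -match '\b\d+\+\d+\b'    # True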
Now, in order to transform an absolute value of cents into dollars, we'll need to capture the digits on either side of the +, so let's add capture groups:
\b(\d+)\+(\d+)\b
Now, in order to do anything interesting with the captured groups, we can utilize the Regex.Replace() method - it can take a scriptblock as its substitution argument:
$InputString = '1000+10'
$RegexPattern = '\b(\d+)\+(\d+)\b'
$Substitution = {
    param($Match)
    $Results = foreach ($Amount in $Match.Groups[1,2].Value) {
        $Dollars = [Math]::Floor(($Amount / 100))
        $Cents   = $Amount % 100
        '${0:0}.{1:00}' -f $Dollars, $Cents
    }
    return $Results -join '+'
}
In the scriptblock above, we take the values of the two capture groups ($Match.Groups[1,2]), calculate the dollars and cents, and then finally use the -f string format operator to make sure that the cents value is always two digits wide.
To do the substitution, invoke the Replace() method:
[regex]::Replace($InputString,$RegexPattern,$Substitution)
And there you go!
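For the sample input '1000+10' above, that call should return something like:

$10.00+$0.10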
Applying it to a bunch of files is as easy as:
$RegexPattern = '\b(\d+)\+(\d+)\b'
$Substitution = {
    param($Match)
    $Results = foreach ($Amount in $Match.Groups[1,2].Value) {
        $Dollars = [Math]::Floor(($Amount / 100))
        $Cents   = $Amount % 100
        '${0:0}.{1:00}' -f $Dollars, $Cents
    }
    return $Results -join '+'
}
foreach ($file in Get-ChildItem $path *.txt) {
    $Lines = Get-Content $file.FullName
    $Lines | ForEach-Object {
        [regex]::Replace($_, $RegexPattern, $Substitution)
    } | Set-Content $file.FullName
}
This regular expression works too:
\b\d{3,4}(?=\+)|\d{2,3}(?=\")
https://regex101.com/
Do you want something like this output?
$20+$2 would be converted to $0.20+$0.02 USD
$1379+$121 would be $13.79+$1.21 USD
$400+$20 would be $4.00+$0.20 USD
Then, you may try this command in PowerShell.
(gc test.txt) -replace '\b(\d+)\+(\d+)\b','$$$1+$$$2' | sc test.txt
gc, sc : aliases for the Get-Content and Set-Content commands, respectively
\b(\d+)\+(\d+)\b : matches the target string (numbers+numbers) and captures the numbers into $1, $2 in order
$$ : the $ must be escaped to indicate a literal $ (dollar) character (what you want to place in front of the numbers)
$1, $2 : back-references to the captured values
test.txt : contains your sample text
Of course, this is applicable to multiple files as follows:
gci '*.txt' -recurse | foreach-object{ (gc $_) -replace '\b(\d+)\+(\d+)\b','$$$1+$$$2' | sc $_ }
gci : alias for the Get-ChildItem command. By default, it returns the items in the current directory. If you want to target another directory, you must use the -path and -include options.
-recurse option : searches sub-directories as well
Edited
If you want to capture the values, divide them, and replace the old values with new ones like this:
$0.2+$0.02 would be converted to $0.20+$0.02 USD
$13.79+$1.21 would be $13.79+$1.21 USD
$4+$0.2 would be $4.00+$0.20 USD
then, you may try this.
gci *.txt -recurse | % {(gc $_) | % { $_ -match "\b(\d+)\+(\d+)\b" > $null; $num1=[int]$matches[1]/100; $num2=[int]$matches[2]/100; $dol='$$'; $_ -replace "\b(\d+)\+(\d+)\b","$dol$num1+$dol$num2"}|sc $_}
This command searches files in the current directory and its sub-directories. If you don't want to search sub-directories, remove the -recurse option. And if you want another path, use the -path and -include options like this:
gci -path "your_path" -include *.txt | % {(gc $_) ...
Other solutions seem excessively complicated, first turning the string to values and then back to strings. Looking at the examples, it is just chopping up a string and re-assembling it while ensuring that the different parts (dollars and cents) have the correct lengths:
('20+2','1379+121','400+20') -replace
'(\d+)\+(\d+)','00$1+00$2' -replace
'0*(\d+)(\d\d)\+0*(\d+)(\d\d)','$$$1.$2+$$$3.$4 USD'
$0.20+$0.02 USD
$13.79+$1.21 USD
$4.00+$0.20 USD
Explanation:
Substitute all the + separated cent values with 0 padded values so there is a minimum of three digits, i.e. at least one digit in the dollars and exactly 2 for the cents.
Collect the individual dollars and cents for each value into distinct capture groups while simultaneously discarding any extraneous leading zeroes.
Re-substitute the (just padded) strings with the appropriately formatted versions.
It is interesting to note how the second substitution relies on the greedy nature of *. The 0* will match just as many leading zeroes as will still leave enough for the remainder of the pattern.
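To make the backtracking concrete, here is the intermediate string for one of the sample values (illustrative):

'400+20' -replace '(\d+)\+(\d+)','00$1+00$2'
# -> 00400+0020
# In the second substitution, 0* gives back zeroes as needed, so the capture
# groups end up as ('4','00') and ('0','20'), producing $4.00+$0.20 USD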
You can put the word boundary anchor (\b) at one or both ends of the patterns if parts of a line contain digits separated by + that are directly adjacent to other text and you want those NOT to be processed; otherwise it is unnecessary.
Note: the example above shows an array of String as input and producing an array of String (each element displayed on a separate line). When -Replace is applied to an array, it enumerates the array, applies the replace to each element and collects each (possibly replaced) element into a result array. The output of Get-Content is an array of String (enumerated by PowerShell when supplying a pipeline). Similarly, the 'input' to Set-Content is an array of String (possibly converted from a general Object[] and/or collected from pipeline input). Thus, to convert a file just use:
(gc somefile) -replace ... -replace ... | sc newfile
# or even
sc newfile ((gc somefile) -replace ... -replace ...)
# Set-Content [-Path] String[] [-Value] Object[]
In the above, newfile and somefile can be the same due to a nice feature of Set-Content whereby it does not even open/create its output file(s) until it has something to write. Thus,
@() | sc existingfile
does not destroy existingfile. Note, however, that
sc existingfile @()
does destroy existingfile. This is because the first example sends nothing to Set-Content while the second example gives Set-Content something (an empty array). Since the output from Get-Content is collected into an (anonymous) array before -Replace is applied, there is no conflict between Get-Content and Set-Content over accessing the same file. The functionally equivalent version
gc somefile | foreach { $_ -replace ... -replace ... } | sc newfile
does not work if newfile is somefile since Set-Content receives each (possibly substituted) line from Get-Content before the next one is read meaning Set-Content can't open the file because Get-Content still has it open.
This is a separate answer because it doesn't explain how to achieve the desired result (already did that) but explains why the listed attempts do not work (an educational motive).
If you're using Replace-FileString.ps1 from GitHub then not only are the examples not a general solution, it won't work as listed above because Replace-FileString.ps1 uses the Replace method of a [regex] object so "400+20" matches "40" then 1 or more "0" then "20". Similarly for other attempts. Note, no "+" is matched in the patterns so all fail (unless you have lines like "40020+125" which matches on the 40020). Just as well, the replacement includes the capture group specifier "$0" (as part of '$1.00+$0.10') and other specifiers. There are no capture groups specified in the pattern so all the group specifiers would be taken literally, except "$0" being the entire match (if found). Thus, "40020+125" would be replaced by substituting '$4.00+$0.20' giving "$4.00+40020.20" ($4='$4' and $0='40020'). Probably, no matches are found. Result -> files not changed. (Phew!)
As for the Select-String attempt, Select-String would probably have matched the required data since the pattern matched up to 5 digits on either side of a +. This would send the matching lines (and ignore the rest, if any) into the ForEach-Object as [Microsoft.PowerShell.Commands.MatchInfo] objects (not strings). (Aside: this is a common mistake by a lot of PowerShell, um, novices. They assume that what they see on the screen is the same as what is churning about inside PowerShell. This is far from the truth and probably leads to most of the confusion amongst new users. PowerShell processes entire objects and typically displays only a summary of the most useful bits.) Anyway, I am unsure what the ForEach-Object is trying to achieve, not least due to the apparent typo. There is at least one " missing in the first script block and possibly a comma also. The best I can interpret it is
{ $_ -replace ", ",", $" }
i.e. change every ", " into ", $". This assumes that the strings to be substituted are all preceded by ", ". Note: lone $ is not an error because it cannot be interpreted as a variable substitution (no following name or {) or capture reference (no following group specifier [0-9`+'_&]). The next script block is clearer, change every "+" into "+$". Unfortunately, again, the first string is interpreted as a regular expression and, unlike the lone $, a lone + here is an error. It needs to be escaped with \. However, even with these errors corrected, there are two big problems:
The default output from Select-String is a collection of [MatchInfo] objects which when (implicitly) converted to String for use as the LHS of -replace include the file name and line number, thereby corrupting the lines from the file. To use just the line itself, specify $_.Line.
A completely incorrect usage of the scriptblock parameters to ForEach-Object. While it would seem that the intent was to perform two replace operations, placing them in individual scriptblocks is an error. Even if it worked, it would output 2 separate partial replacements instead of one completed replacement since $_ is not updated between the two expressions. ($_ is writable!)
ForEach-Object has 3 basic scriptblock groups, 1 -Begin block, 1 -End block and all the rest collectively as the -Process blocks. (The -Parallel block is not relevant here.) The documentation mentions a group called -RemainingScripts but this is actually just an implementation construct to allow the -Process scriptblocks to be specified as individual parameters rather than collected into an array (similar to parameter arrays in C# and VB). I suspect this was done so that users could simply drop the parameter names (-Begin, -Process and -End) and treat the scriptblocks as if they were positional parameters even though, strictly speaking, only -Process is positional and expects an array of scriptblocks (i.e. separated by commas). The introduction of -RemainingScripts in PS3.0 (with attribute ValueFromRemainingArguments so it behaves like a parameter array) was probably done to tidy up what might have been a nasty kludge to get the user friendly behaviour prior to PS3.0. Or maybe it was just formalising what was already going on.
Anyway, back on topic. By specifying multiple scriptblocks, the first is treated as -Begin and, if there are more than 2, the last is treated as -End. Thus, for two scriptblocks, the first is -Begin and the other is -Process. Therefore, even if the first scriptblock were syntactically correct, it would only run once and then still do nothing since $_ is not assigned (=$null) in -Begin. The correct way would be to place both replacements, joined into a single expression, in one scriptblock:
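A quick way to see that binding behaviour (illustrative):

1..3 | ForEach-Object { 'ran as -Begin' } { "ran as -Process for $_" }
# ran as -Begin
# ran as -Process for 1
# ran as -Process for 2
# ran as -Process for 3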
{ $_.Line -replace ", ",", $" -replace "\+","+$" }
Of course, this is just describing how to get it to "work". It is not the correct solution to the problem in the original post (see other answer).
I am new to scripting and PowerShell. I have been doing some study lately and am trying to build a script to find/replace text in a bunch of text files (each text file containing code, no more than 4000 lines). However, I would like to keep the FindString and ReplaceString as variables, for there are multiple values, which can in turn be read from a separate csv file.
I have come up with this code, which is functional, but I would like to know if this is the optimal solution for the aforementioned requirement. I would like to keep the FindString and ReplaceString regular-expression compatible in the script, as I would also like to find/replace patterns. (I have yet to test it with a regular expression pattern.)
Sample contents of Input.csv: (Number of objects in csv may vary from 50 to 500)
FindString ReplaceString
AA1A 171PIT9931A
BB1B 171PIT9931B
CC1C 171PIT9931E
DD1D 171PIT9932A
EE1E 171PIT9932B
FF1F 171PIT9932E
GG1G 171PIT9933A
The Code
$Iteration = 0
$FDPATH = 'D:\opt\HMI\Gfilefind_rep'
#& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
$GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
$FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
foreach ($Graphic in $GraphicsList) {
    Write-Host "Processing Find Replace on : $Graphic"
    foreach ($item in $FindReplaceList) {
        Get-Content $Graphic | ForEach-Object { $_ -replace "$($item.FindString)", "$($item.ReplaceString)" } | Set-Content ($Graphic + ".tmp")
        Remove-Item $Graphic
        Rename-Item ($Graphic + ".tmp") $Graphic
        $Iteration = $Iteration + 1
        Write-Host "String Replace Completed for $($item.ReplaceString)"
    }
}
I have gone through other posts here in Stackoverflow, and gathered valuable inputs, based on which the code was built. This post from Ivo Bosticky came pretty close to my requirement, but I had to perform the same on a nested foreach loop with Find/Replace Strings as Variables reading from an external source.
To summarize:
1. I would like to know if the above code can be optimized for execution, since I feel it takes a long time to run. (I prefer not to use aliases for now, as I am just starting out, and am fine with a long, functional script rather than a concise one which is hard to understand.)
2. I would like to add the number of iterations being carried out in the loop. I was able to write the current iteration number to the console, but couldn't figure out how to capture the output of Measure-Command in a variable that could be used in a Write-Host command. I would also like to display the time taken for code execution, on completion.
Thanks for the time taken to read this Query. Much appreciate your support!
First of all, unless your replacement string is going to contain newlines (which would change the line boundaries), I would advise getting and setting each $Graphic file's contents only once, and doing all replacements in a single pass. This will also result in fewer file renames and deletions.
Second, it would be (probably marginally) faster to pass $item.FindString and $item.ReplaceString directly to the -replace operator rather than invoking the templating engine to inject the values into string literals.
Third, unless you truly need the output to go directly to the console instead of going to the normal output stream, I would avoid Write-Host. See Write-Host Considered Harmful.
And fourth, you might actually want to remove the Write-Host that gets called for every find and replace, as it may have a fair bit of effect on the overall execution time, depending on how many replacements there are.
You'd end up with something like this:
$timeTaken = (Measure-Command {
    $Iteration = 0
    $FDPATH = 'D:\opt\HMI\Gfilefind_rep'
    #& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
    $GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
    $FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
    foreach ($Graphic in $GraphicsList) {
        Write-Output "Processing Find Replace on : $Graphic"
        Get-Content $Graphic | ForEach-Object {
            foreach ($item in $FindReplaceList) {
                $_ = $_ -replace $item.FindString, $item.ReplaceString
            }
            $Iteration += 1
            $_
        } | Set-Content ($Graphic + ".tmp")
        Remove-Item $Graphic
        Rename-Item ($Graphic + ".tmp") $Graphic
    }
}).TotalMilliseconds
I haven't tested it but it should run a fair bit faster, plus it will save the elapsed time to a variable.
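To report the elapsed time once the block completes (a small illustrative addition, not part of the original code):

Write-Output ("Find/replace completed in {0:N0} ms" -f $timeTaken)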
Alrighty..
So I am editing an AutoIt script that has a lot of unused functions in it. The original author saw fit to add all the functions from his/her include files.
At first I tried to use the tools within AutoIt/SciTe to remove unused functions however for some freakish reason this rendered the script/compiled file useless. So now I am thinking it would be best to write a function remover.
Here is what I have so far:
Search for lines with "Func _", count the number of times that function appears in the file, and if it appears only once, select the string:
$FileName=".\FILENAME.au3"
$File=Get-Content $FileName
$Funcs=$File|Select-String "Func _"
foreach ($Func in $Funcs) {
    $FuncName = $Func.ToString().Split('( ')[1]
    $Count = ($File | Select-String $FuncName | Measure-Object).Count
    if ($Count -eq 1) {
        # the function name appears only once (its own definition), so show that definition line
        $File | Select-String "Func $FuncName"
    }
}
What I would like to do is remove the function, likely with regex. So something like:
REMOVE "Func _"$func * "EndFunc"
The trouble has been that this is a search that spans multiple lines, from Func _NAMEOFFUNCTION to EndFunc. It's unclear to me whether regex in PowerShell can even do this. Not all regex implementations seem to be able to span a search across lines. Is regex even the answer? I don't know.
When you use Get-Content in PowerShell 1.0 or 2.0 you can only get back an array of strings - one for each line. This isn't going to work when you need a regex to span multiple lines. Use this approach to read the file as a single string:
$FileContent = [io.file]::ReadAllText($FileName)
If you are on PowerShell V3 you can use the -Raw parameter to read the file as a single string:
$FileContent = Get-Content $FileName -Raw
Then when you use Select-String you will need to modify the regex to enable singleline (s) and probably multiline (m) mode, e.g.:
$FileContent | Select-String "(?smi)$FuncName" -AllMatches
Note the i is there to be case-insensitive. Use the -AllMatches parameter to match multiple function definitions within a file.
Here's a regex that should match an AutoIt function definition. It assumes the Func and EndFunc keywords are always placed at the beginning of a line and that they're case sensitive. The function name is captured in the group named FuncName (in C# you would access it via Groups["FuncName"]):
"(?m)^Func\s+\b(?<FuncName>_\w+\b).*\n(?:(?!EndFunc\b).*\n)*EndFunc.*\n?"
For the function names alone you can use "\b_\w+\b" or maybe "\b_[A-Za-z]+\b" (I don't know how strict you need to be). Having almost zero experience with PowerShell, I would probably use [regex]::Matches and [regex]::Replace to do the work. I don't know if PS offers a better way.
I'm assuming you've read the whole file into a string as @Keith suggested, not line by line as you were doing originally.
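For what it's worth, a minimal PowerShell sketch of that idea (the usage-count heuristic and the output file name are illustrative assumptions, and it presumes the whole file has been read into a single string as described above):

$FileName    = ".\FILENAME.au3"
$FileContent = [System.IO.File]::ReadAllText($FileName)
$FuncPattern = '(?m)^Func\s+\b(?<FuncName>_\w+\b).*\n(?:(?!EndFunc\b).*\n)*EndFunc.*\n?'

# Replace each function definition with nothing if its name occurs only once in the
# file (i.e. it is defined but never called); otherwise keep the definition as-is.
$Cleaned = [regex]::Replace($FileContent, $FuncPattern, {
    param($m)
    $name = $m.Groups['FuncName'].Value
    $uses = [regex]::Matches($FileContent, [regex]::Escape($name) + '\b').Count
    if ($uses -le 1) { '' } else { $m.Value }
})
Set-Content -Path ($FileName + ".cleaned") -Value $Cleaned   # hypothetical output file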