Extracting string with regex retrieves 'True 111499'

I have a function, which should retrieve the revision number in a text file. The text file looks like this:
componentname1:123456
componentname2:234567
The second part is a number, which represents the subversion revision number. The powershell script reads this file line by line and every line is then processed with this function:
function getRevision($line) {
$line -match ":(?<revision>[0-9]*)"
$result = $Matches['revision']
Write-Host "Found component revision '$($result)'"
return $result
}
When the function getRevision is processed, then it gives the output
Found component revision '111499'
This function is called by another function like this:
$rev = getRevision($Line)
# ....
someOtherFunction -Revision "$($rev)"
In this someOtherFunction, I now get this output:
Handle component with revision 'True 111499'
Edit
I add the basic code of the function someOtherFunction:
function someOtherFunction {
param(
[Parameter(Mandatory=$True)]
[String]$Revision
)
Write-Host "Handle component with revision '$($Revision)'"
}
Now, the question occurs: Why is there this True stuff? Where is my mistake?

If that is how your source data is formed, i.e. colon-delimited key-value pairs, I think ConvertFrom-StringData removes all of the "complexity" from this issue. It expects "key=value" lines, however, so we need to make a small change; in the end, you get the whole file as a hashtable.
If that sample data was in a file called c:\temp\file.txt:
$revisions = (Get-Content -Raw C:\temp\file.txt) -replace ":","=" | ConvertFrom-StringData
$revisions['componentname1']
If there was a risk of bad data you could do some basic filtering by only working with lines that have colons:
$data = (Get-Content C:\temp\file.txt | Where-Object{$_ -match ":"}) -replace ":","=" | Out-String | ConvertFrom-StringData
$data['componentname1']
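To try this without a file, here is a minimal sketch that feeds the same sample lines from an in-memory string (the $text variable is made up for illustration). Note that -replace swaps every colon, so this assumes neither the keys nor the values contain one:

```powershell
# Hypothetical in-memory stand-in for the contents of C:\temp\file.txt
$text = "componentname1:123456`ncomponentname2:234567"

# ConvertFrom-StringData expects key=value lines, so swap the delimiter first
$revisions = $text -replace ':', '=' | ConvertFrom-StringData

# The result is a hashtable keyed by component name; the values are strings
$revisions['componentname2']
```

If you need the revision as a number, cast it afterwards, e.g. [int]$revisions['componentname2'].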

The True value comes from the -match operator in the first line of your function:
$line -match ":(?<revision>[0-9]*)"
When -match is executed inside the function without its result being captured, its output (a Boolean value) is automatically added to the output of your function, and $result is then added as a second element. Therefore, if you check the value of $rev, you'll see:
PowerShell> $rev
True
123456
If you want to get rid of the first element you just need to pipe it to Out-Null:
$line -match ":(?<revision>[0-9]*)" | Out-Null
Edit: as pointed out by Matt, if $line doesn't match the pattern, $Matches is not updated, so the function may return an unexpected result (the last successful match). To prevent this you could use:
if ($line -match ":(?<revision>[0-9]*)") {
$result = $Matches['revision']
Write-Host "Found component revision '$($result)'"
return $result
}
else {
return $false
}
In this case Out-Null is not needed, because the result of -match inside an if condition is consumed by the condition and is not added to the output.
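To see the effect in isolation, here is a small sketch with two made-up functions (Get-RevisionLeaky and Get-RevisionClean are illustrative names, not from the question). It uses [0-9]+ rather than [0-9]* so that an empty revision does not count as a match; that is my assumption about what is wanted:

```powershell
# Reproduces the issue: the Boolean from -match leaks into the function's output.
function Get-RevisionLeaky($line) {
    $line -match ':(?<revision>[0-9]+)'   # uncaptured result goes to the output stream
    return $Matches['revision']
}

# Avoids the issue: the Boolean is consumed by the if condition.
function Get-RevisionClean($line) {
    if ($line -match ':(?<revision>[0-9]+)') {
        return $Matches['revision']
    }
    # no match: nothing is returned
}

$leaky = Get-RevisionLeaky 'componentname1:123456'
$clean = Get-RevisionClean 'componentname1:123456'

# An array inside a double-quoted string is joined with spaces,
# which is where the 'True 123456' text comes from.
"Leaky returns $(@($leaky).Count) item(s): $leaky"
"Clean returns $(@($clean).Count) item(s): $clean"
```

The first line reports 2 items (True 123456), the second 1 item (123456).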

Related

How can I replace all lines in a file with a pattern using PowerShell?

I have a file with lines that I wish to remove, like the following:
key="Id" value=123"
key="FirstName" value=Name1"
key="LastName" value=Name2"
<!--key="FirstName" value=Name3"
key="LastName" value=Name4"-->
key="Address" value=Address1"
<!--key="Address" value=Address2"
key="FirstName" value=Name1"
key="LastName" value=Name2"-->
key="ReferenceNo" value=765
I have tried the following:
$values = @('key="FirstName"','key="Lastname"', 'add key="Address"');
$regexValues = [string]::Join('|',$values)
$lineprod = Get-Content "D:\test\testfile.txt" | Select-String $regexValues | Select-Object -ExpandProperty Line
if ($null -ne $lineprod)
{
foreach ($value in $lineprod)
{
$prod = $value.Trim()
$contentProd | ForEach-Object {$_ -replace $prod,""} |Set-Content "D:\test\testfile.txt"
}
}
The issue is that only some of the lines get replaced and/or removed, while others remain.
The output should be
key="Id" value=123"
key="ReferenceNo" value=765
But I seem to get
key="Id" value=123"
key="ReferenceNo" value=765
<!--key="Address" value=Address2"
key="FirstName" value=Name1"
key="LastName" value=Name2"-->
Any ideas as to why this is happening, or changes to the code above?
Based on your comment, the token 'add key="Address"' should be changed to just 'key="Address"'; with that, the logic that joins the values into a regex looks good. You need to use the -NotMatch switch so that Select-String returns only the lines that match none of those values. Also, Select-String can read files itself, so Get-Content can be removed.
Note that the use of (...) in this case is important, because you're reading from and writing to the same file in the same pipeline. Wrapping the statement in parentheses ensures that all output from Select-String is collected before it is passed through the pipeline. Otherwise, you would end up with an empty file.
$values = 'key="FirstName"', 'key="Lastname"', 'key="Address"'
$regexValues = [string]::Join('|', $values)
(Select-String D:\test\testfile.txt -Pattern $regexValues -NotMatch) |
ForEach-Object Line | Set-Content D:\test\testfile.txt
Outputs:
key="Id" value=123"
key="ReferenceNo" value=765
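For comparison, here is a hedged sketch of the same filtering done with the -notmatch operator on an array of lines instead of Select-String. It runs against a throwaway temp file with a few of the sample lines; the variable names and the use of [regex]::Escape() are my additions, not part of the answer above:

```powershell
# Throwaway file holding a few lines from the question's sample data
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value @(
    'key="Id" value=123"'
    '<!--key="FirstName" value=Name3"'
    'key="LastName" value=Name4"-->'
    'key="Address" value=Address1"'
    'key="ReferenceNo" value=765'
)

$values = 'key="FirstName"', 'key="LastName"', 'key="Address"'

# Escape each literal so regex metacharacters in a value cannot change the pattern
$pattern = ($values | ForEach-Object { [regex]::Escape($_) }) -join '|'

# @(...) reads the whole file first; -notmatch on an array acts as a filter
$kept = @(Get-Content -Path $path) -notmatch $pattern
Set-Content -Path $path -Value $kept

$result = Get-Content -Path $path
Remove-Item -Path $path
$result
```

Because the file is read completely inside @(...) before Set-Content runs, the same-file read/write problem described above does not arise here either.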

Issues finding and replacing strings in PowerShell

I'm rather new to PowerShell and I'm trying to write a PowerShell script to convert some statements in VBScript to Microsoft JScript. Here is my code:
$vbs = 'C:\infile.vbs'
$js = 'C:\outfile.js'
(Get-Content $vbs | Set-Content $js)
(Get-Content $js) |
Foreach-Object { $_ -match "Sub " } | Foreach-Object { "$_()`n`{" } | Foreach-Object { $_ -replace "Sub", "function" } | Out-File $js
Foreach-Object { $_ -match "End Sub" } | Foreach-Object { $_ -replace "End Sub", "`}" } | Out-File $js
Foreach-Object { $_ -match "Function " } | Foreach-Object { "$_()`n`{" } | Foreach-Object { $_ -replace "Function", "function" } | Out-File $js
Foreach-Object { $_ -match "End Function" } | Foreach-Object { $_ -replace "End Function", "`}" } | Out-File $js
What I want is for my PowerShell program to take the code from the VBScript input file infile.vbs, convert it, and output it to the JScript output file outfile.js. Here is an example of what I want it to do:
Input file:
Sub HelloWorld
(Code Here)
End Sub
Output File:
function HelloWorld()
{
(Code Here)
}
Something similar would happen with regard to functions. From there, I would tweak the code manually to convert it. When I run my program in PowerShell v5.1, it does not show any errors. However, when I open outfile.js, I see only one line:
False
So really, I have two questions. 1. Why is this happening? 2. How can I fix this program so that it behaves how I want it to (as detailed above)?
Thanks,
Gabe
You could also do this with the switch statement. Like so:
$vbs = 'C:\infile.vbs'
$js = 'C:\outfile.js'
Get-Content $vbs | ForEach-Object {
switch -Regex ($_) {
'Sub '{
'function {0}(){1}{2}' -f $_.Remove($_.IndexOf('Sub '),4).Trim(),[Environment]::NewLine,'{'
}
'End Sub'{
'}'
}
'Function ' {
'function {0}(){1}{2}' -f $_.Remove($_.IndexOf('Function '),9).Trim(),[Environment]::NewLine,'{'
}
'End Function' {
'}'
}
default {
$_
}
}
} | Out-File $js
As for question #2 (How can I fix this program [...]?):
Kirill Pashkov's helpful answer offers an elegant solution based on the switch statement.
Note, however, that his solution:
is predicated on Sub <name> / Function <name> statement parts not being on the same line as the matching End Sub / End Function parts - while this is typically the case, it isn't a syntactical requirement; e.g., Sub Foo() WScript.Echo("hi") End Sub - on a single line - works too.
in line with your own solution attempt, blindly appends () to Sub / Function definitions, which won't work with input procedures / functions that already have parameter declarations (e.g., Sub Foo (bar, baz)).
The following solution:
also works with single-line Sub / Function definitions
correctly preserves parameter declarations
Get-Content $vbs | ForEach-Object {
$_ -replace '\b(?:sub|function)\s+(\w+)\s*(\(.*?\))', 'function $1$2 {' `
-replace '\bend\s+(?:sub|function)\b', '}'
} | Out-File $js
The above relies heavily on regexes (regular expressions) to transform the input; for specifics on how regex matching results can be referred to in the -replace operator's replacement-string operand, see this answer.
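One thing to be aware of: as written, the first regex only matches when a parameter list in parentheses is present, so a bare Sub HelloWorld line like the one in the question would pass through unchanged. The following is a sketch of one possible variant (my own extension, not the answer's code) that adds a third -replace for the parameterless case and runs against in-memory sample lines:

```powershell
# Sample VBScript lines (made up for illustration)
$lines = 'Sub HelloWorld', '    (Code Here)', 'End Sub', 'Function Add(a, b)', 'End Function'

$converted = $lines | ForEach-Object {
    # 1) End Sub / End Function  -> }
    # 2) definitions that already have a parameter list keep it
    # 3) definitions without parentheses get an empty () appended
    $_ -replace '\bend\s+(?:sub|function)\b', '}' `
       -replace '\b(?:sub|function)\s+(\w+)\s*(\(.*?\))', 'function $1$2 {' `
       -replace '\b(?:sub|function)\s+(\w+)\s*$', 'function $1() {'
}

$converted
```

The third pattern is anchored with $ so it cannot fire on a line the second one has already rewritten, since that line now continues with a parenthesis after the name.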
Caveat: There are many other syntax differences between VBScript and JScript that your approach doesn't cover, notably that VBScript has no return statement and instead uses <funcName> = ... to return values from functions.
As for question #1:
However, when I open outfile.js, I see only one line:
False
[...]
1. Why is this happening?
All but the first ForEach-Object cmdlet call run in separate statements, because the initial pipeline ends with the first call to Out-File $js.
The subsequent ForEach-Object calls each start a new pipeline, and since each pipeline ends with Out-File $js, each such pipeline writes to file $js - and thereby overwrites whatever the previous one wrote.
Therefore, it is the last pipeline that determines the ultimate contents of file $js.
A ForEach-Object that starts a pipeline receives no input. However, its associated script block ({...}) is still entered once in this case, with $_ being $null[1]:
The last pipeline starts with Foreach-Object { $_ -match "End Function" }, so its output is the equivalent of $null -match "End Function", which yields $False, because -match with a scalar LHS (a single input object) outputs a Boolean value that indicates whether a match was found or not.
Therefore, given that the middle pipeline segment (Foreach-Object { $_ -replace "End Function", "}" }) is an effective no-op ($False is stringified to 'False', and the -replace operator therefore finds no match to replace and passes the stringified input out unmodified), Out-File $js receives string 'False' and writes just that to output file $js.
Even if you transformed your separate commands into a single pipeline with a single Out-File $js segment at the very end, your command wouldn't work, however:
Given that Get-Content sends the input file's lines through the pipeline one by one, something like $_ -match "Sub " will again produce a Boolean result - indicating whether the line at hand ($_) matched string "Sub " - and pass that on.
While you could turn -match into a filter by making the LHS an array - by enclosing it in the array-subexpression operator @(...); e.g., @($_) -match "Sub " - that would:
pass lines that contain substring Sub through as a whole, and
omit lines that don't.
In other words: This wouldn't work as intended, because:
lines that do not contain a matching substring would be omitted from the output, and
the lines that do match are reflected in full in $_ in the next pipeline segment - not just the matched part.
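The difference between the scalar and the array behavior of -match can be checked with a short sketch (the sample strings are made up):

```powershell
$line  = 'Sub HelloWorld'
$lines = 'Sub HelloWorld', '(Code Here)', 'End Sub'

# Scalar LHS: the result is a Boolean
$scalarResult = $line -match 'Sub '

# Array LHS: the operator acts as a filter and returns the matching elements.
# 'End Sub' is not returned, because the pattern 'Sub ' requires a trailing space.
$arrayResult = $lines -match 'Sub '

# $null as the LHS, which is what a ForEach-Object without pipeline input ends up testing
$nullResult = $null -match 'End Function'

"Scalar: $scalarResult"
"Array:  $arrayResult"
"Null:   $nullResult"
```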
[1] Strictly speaking, $_ will retain whatever value it had in the current scope, but that will only be non-$null if you explicitly assigned a value to $_ - given that $_ is an automatic variable that is normally controlled by PowerShell itself, however, doing so is ill-advised - see this GitHub discussion.
OK, there are a few things wrong with this script.
ForEach-Object, also known by its alias %, iterates over every item in a pipeline.
An example is
@(1..10) | %{ "This is Array Item $_"}
This will output 10 lines, one for each array item. In your current script you are using it where a Where-Object, also known as ?, should be.
@(1..10) | ?{ $_ -gt 5 }
This will output all numbers greater than 5.
An example of what you are trying to go for is something like
function ConvertTo-JS([string]$InputFilePath,[string]$SaveAs){
Get-Content $InputFilePath |
%{$_ -replace "End Sub", "}"} |
%{$_ -replace "End Function", "}"} |
%{$_ -replace "\bSub\b", "function"} |
%{$_ -replace "\bFunction\b", "function"} |
Out-File $SaveAs
}
ConvertTo-JS -InputFilePath "C:\TEST\TEST.vbs" -SaveAs "C:\TEST\TEST.JS"
This doesn't take into account adding a { at the beginning of a function or adding the () either. But with the information provided, hopefully that puts you on the right track.

Parsing Data in powershell, with the format of Label:Data

I am doing an Invoke-WebRequest in PowerShell against a URL that does not return any HTML, just text. I need to pick out a specific part of this data, which is in the format Label:Data. Each piece of data is on its own separate line. I'm looking for some ideas on how to accomplish this. Here is a sample of the $Response.Content data below. I am looking to isolate the speed-over-ground:0.0 line.
rate-of-turn:0.0
course-over-ground:293.0
speed-over-ground:0.0
heading-true:243.0
hdop:1.0
active-waypoint-name:
bearing-to-waypoint:
distance-to-waypoint:
cross-track-error:0
cross-track-error-limit:
cross-track-error-scale:0
lateral-speed-bow:0.09
lateral-speed-stern:-0.05
longitudinal-speed:-0.05
I guess it's a single string, rather than an array of lines. So, split it into lines:
$Response.Content -split "`r?`n"
Find the one which says speed-over-ground
$line = $Response.Content -split "`r?`n" | Where-Object { $_ -match 'speed-over-ground' }
Split the text from the number, using the : separator, and take the second item, converted from text to a number if appropriate:
[decimal]$speedOverGround = $line.Split(':')[1]
Although, I might try to turn all of them into an object in a bulk transform. Complexity varies with the exact possible inputs, but this tries to convert numbers to numbers and leave empty ones as nulls:
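$data = New-Object -TypeName PSCustomObject
# Skip lines without a colon (e.g. a trailing empty line)
$Response.Content -split "`r?`n" | Where-Object { $_ -match ':' } |
ForEach-Object {
# Split on the first colon only, so a value that contains ':' stays intact
$name, $value = $_.Split(':', 2).Trim()
$decimalValue = 0
if ($value -eq '')
{
# An empty value becomes a real $null
$value = $null
}
elseif ([decimal]::TryParse($value, [ref]$decimalValue))
{
$value = $decimalValue
}
$data | Add-Member -NotePropertyName $name -NotePropertyValue $value
}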
# Then you can do:
$data.'speed-over-ground'
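If you only need lookups by label, a plain hashtable is a lighter alternative to building an object. This is just a sketch: $content stands in for $Response.Content, and the values are kept as strings rather than converted to numbers:

```powershell
# Hypothetical stand-in for $Response.Content
$content = "rate-of-turn:0.0`r`ncourse-over-ground:293.0`r`nspeed-over-ground:0.0`r`nactive-waypoint-name:`r`nheading-true:243.0"

$table = @{}
foreach ($entry in ($content -split "`r?`n")) {
    # Split into at most two parts: the label and everything after the first colon
    $name, $value = $entry -split ':', 2
    if ($name) { $table[$name] = $value }
}

$table['course-over-ground']
$table['speed-over-ground']
```

Labels without data, such as active-waypoint-name, end up with an empty string as their value.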

Is there a way to optimise my Powershell function for removing pattern matches from a large file?

I've got a large text file (~20K lines, ~80 characters per line).
I've also got a largish array (~1500 items) of objects containing patterns I wish to remove from the large text file. Note, if the pattern from the array appears on a line in the input file, I wish to remove the entire line, not just the pattern.
The input file is CSVish with lines similar to:
A;AAA-BBB;XXX;XX000029;WORD;WORD-WORD-1;00001;STRING;2015-07-01;;010;
The patterns in the array, which I search for in each line of the input file, resemble the
XX000029
part of the line above.
My somewhat naïve function to achieve this goal looks like this currently:
function Remove-IdsFromFile {
param(
[Parameter(Mandatory=$true,Position=0)]
[string]$BigFile,
[Parameter(Mandatory=$true,Position=1)]
[Object[]]$IgnorePatterns
)
try{
$FileContent = Get-Content $BigFile
}catch{
Write-Error $_
}
$IgnorePatterns | ForEach-Object {
$IgnoreId = $_.IgnoreId
$FileContent = $FileContent | Where-Object { $_ -notmatch $IgnoreId }
Write-Host $FileContent.count
}
$FileContent | Set-Content "CleansedBigFile.txt"
}
This works, but is slow.
How can I make it quicker?
function Remove-IdsFromFile {
param(
[Parameter(Mandatory=$true,Position=0)]
[string]$BigFile,
[Parameter(Mandatory=$true,Position=1)]
[Object[]]$IgnorePatterns
)
# Create the pattern matches
$regex = ($IgnorePatterns | ForEach-Object{[regex]::Escape($_)}) -join "|"
If(Test-Path $BigFile){
$reader = New-Object System.IO.StreamReader($BigFile)
$line=$reader.ReadLine()
while ($line -ne $null)
{
# Check if the line should be output to file
If($line -notmatch $regex){$line | Add-Content "CleansedBigFile.txt"}
# Attempt to read the next line.
$line=$reader.ReadLine()
}
$reader.close()
} Else {
Write-Error "Cannot locate: $BigFile"
}
}
StreamReader is one of the preferred methods for reading large text files. We also combine all of the patterns into a single regex alternation, so each line is tested only once. When building the pattern string we use [regex]::Escape() as a precaution in case regex control characters are present; I have to guess here, since we only see one pattern string.
If $IgnorePatterns can easily be cast to strings, this should work in place just fine. A small sample of what $regex looks like would be:
XX000029|XX000028|XX000027
If $IgnorePatterns is populated from a database you might have less control over this but since we are using regex you might be able to reduce that pattern set by actually using regex (instead of just a big alternative match) like in my example above. You could reduce that to XX00002[7-9] for instance.
I don't know if the regex itself will provide a performance boost with 1500 alternatives. The StreamReader is supposed to be the focus here. However, I did muddy the waters by using Add-Content for the output, which does not win any awards for being fast either (you could use a StreamWriter in its place).
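If the regex alternation turns out to be the slow part, one option worth trying is an exact lookup instead of a pattern match. The sketch below assumes, and this is only a guess based on the single sample line, that the ID always sits in the fourth semicolon-separated field; if it can appear elsewhere, this approach does not apply. All names here are illustrative:

```powershell
# Set of IDs to ignore; a HashSet gives constant-time lookups
$ignoreIds = New-Object 'System.Collections.Generic.HashSet[string]'
foreach ($id in 'XX000029', 'XX000028') { [void]$ignoreIds.Add($id) }

# Stand-in for the lines of the big file
$lines = @(
    'A;AAA-BBB;XXX;XX000029;WORD;WORD-WORD-1;00001;STRING;2015-07-01;;010;'
    'A;AAA-BBB;XXX;XX000100;WORD;WORD-WORD-1;00002;STRING;2015-07-01;;010;'
)

$kept = @(foreach ($line in $lines) {
    # Assumption: the ID is the fourth field (index 3)
    $id = $line.Split(';')[3]
    if (-not $ignoreIds.Contains($id)) { $line }
})

$kept
```

This does one set lookup per line regardless of how many IDs are in the ignore list. I have not measured it against the regex version.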
Reader and Writer
I still have to test this to be sure it works, but this version uses just a StreamReader and a StreamWriter. If it does work better, I am just going to replace the above code.
function Remove-IdsFromFile {
param(
[Parameter(Mandatory=$true,Position=0)]
[string]$BigFile,
[Parameter(Mandatory=$true,Position=1)]
[Object[]]$IgnorePatterns
)
# Create the pattern matches
$regex = ($IgnorePatterns | ForEach-Object{[regex]::Escape($_)}) -join "|"
If(Test-Path $BigFile){
# Prepare the StreamReader
$reader = New-Object System.IO.StreamReader($BigFile)
#Prepare the StreamWriter
$writer = New-Object System.IO.StreamWriter("CleansedBigFile.txt")
$line=$reader.ReadLine()
while ($line -ne $null)
{
# Check if the line should be output to file
If($line -notmatch $regex){$writer.WriteLine($line)}
# Attempt to read the next line.
$line=$reader.ReadLine()
}
# Don't cross the streams!
$reader.Close()
$writer.Close()
} Else {
Write-Error "Cannot locate: $BigFile"
}
}
You might need some error prevention in there for the streams but it does appear to work in place.
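As for the error prevention, a minimal sketch would be to wrap the loop in try / finally so that both streams are closed even if something throws halfway through. The temp files and the single-ID $regex are placeholders for illustration. The sketch uses absolute paths because .NET resolves relative paths against the process working directory, which is not necessarily PowerShell's current location:

```powershell
# Placeholder input and output files
$inPath  = [System.IO.Path]::GetTempFileName()
$outPath = [System.IO.Path]::GetTempFileName()
Set-Content -Path $inPath -Value 'A;XX000029;drop', 'B;XX000100;keep'

$regex  = [regex]::Escape('XX000029')
$reader = $null
$writer = $null
try {
    $reader = New-Object System.IO.StreamReader($inPath)
    $writer = New-Object System.IO.StreamWriter($outPath)
    # ReadLine() returns $null at the end of the file
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line -notmatch $regex) { $writer.WriteLine($line) }
    }
}
finally {
    # Runs on success and on failure; Dispose() flushes and closes the stream
    if ($writer) { $writer.Dispose() }
    if ($reader) { $reader.Dispose() }
}

$result = Get-Content -Path $outPath
Remove-Item -Path $inPath, $outPath
$result
```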

Retain carriage returns in text filtered through a regular expression

I need to search through a folder of logs and retrieve the most recent logs. Then I need to filter each log, pull out the relevant information and save to another file.
The problem is the regular expression I use to filter the log is dropping the carriage return and the line feed so the new file just contains a jumble of text.
$Reg = "(?ms)\*{6}\sBEGIN(.|\n){98}13.06.2015(.|\n){104}00000003.*(?!\*\*)+"
get-childitem "logfolder" -filter *.log |
where-object {$_.LastAccessTime -gt [datetime]$Test.StartTime} |
foreach {
$a=get-content $_;
[regex]::matches($a,$reg) | foreach {$_.groups[0].value > "MyOutFile"}
}
Log structure:
******* BEGIN MESSAGE *******
<Info line 1>
Date 18.03.2010 15:07:37 18.03.2010
<Info line 2>
File Number: 00000003
<Info line 3>
*Variable number of lines*
******* END MESSAGE *******
Basically capture everything between the BEGIN and END where the dates and file numbers are a certain value. Does anyone know how I can do this without losing the line feeds? I also tried using Out-File | Select-String -Pattern $reg, but I've never had success with using Select-String on a multiline record.
As @Matt pointed out, you need to read the entire file as a single string if you want to do multiline matches. Otherwise your (multiline) regular expression would be applied to single lines one after the other. There are several ways to get the content of a file as a single string:
(Get-Content 'C:\path\to\file.txt') -join "`r`n"
Get-Content 'C:\path\to\file.txt' | Out-String
Get-Content 'C:\path\to\file.txt' -Raw (requires PowerShell v3 or newer)
[IO.File]::ReadAllText('C:\path\to\file.txt')
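A quick sketch of the difference, using a throwaway temp file (the file content and variable names are made up for illustration):

```powershell
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'line one', 'line two', 'line three'

$asLines  = Get-Content -Path $path        # array with one string per line
$asString = Get-Content -Path $path -Raw   # one string, line breaks included

# Passing the array where a string is expected joins the lines with spaces,
# which is how the line breaks get lost.
$joined = "$asLines"

$lineCount    = $asLines.Count
$joinedMatch  = $joined   -match "one\r?\ntwo"
$rawMatch     = $asString -match "one\r?\ntwo"

Remove-Item -Path $path

"Lines read: $lineCount"
"Pattern spanning a line break matches the joined array: $joinedMatch"
"Pattern spanning a line break matches the raw string:   $rawMatch"
```

Only the -Raw string still contains the line break, so only there can a pattern that spans two lines match.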
Also, I'd modify the regular expression a little. Most of the time log messages may vary in length, so matching fixed lengths may fail if the log message changes. It's better to match on invariant parts of the string and leave the rest as variable length matches. And personally I find it a lot easier to do this kind of content extraction in several steps (makes for simpler regular expressions). In your case I would first separate the log entries from each other, and then filter the content:
$date = [regex]::Escape('13.06.2015')
$fnum = '00000003'
$re1 = "(?ms)\*{7} BEGIN MESSAGE \*{7}\s*([\s\S]*?)\*{7} END MESSAGE \*{7}"
$re2 = "(?ms)[\s\S]*?Date\s+$date[\s\S]*?File Number:\s+$fnum[\s\S]*"
Get-ChildItem 'C:\log\folder' -Filter '*.log' | ? {
$_.LastAccessTime -gt [DateTime]$Test.StartTime
} | % {
Get-Content $_.FullName -Raw |
Select-String -Pattern $re1 -AllMatches |
select -Expand Matches |
% {
$_.Groups[1].Value |
Select-String -Pattern $re2 |
select -Expand Matches |
select -Expand Groups |
select -Expand Value
}
} | Set-Content 'C:\path\to\output.txt'
BTW, don't use the redirection operator (>) inside a loop. It would overwrite the output file's content with each iteration. If you must write to a file inside a loop use the append redirection operator instead (>>). However, performance-wise it's usually better to put writing to output files at the end of the pipeline (see above).
I wanted to see if I could make that regex better, but for now: if you are using those regex modes, you should be reading your text file in as a single string, which helps a lot.
$a = Get-Content $_.FullName -Raw
or, if you don't have PowerShell 3.0:
$a = (Get-Content $_.FullName) -join "`r`n"
I had to solve the problem of disappearing newlines in a completely different context. What you get when you do a get-content of a text file is an array of records, where each record is a line of text.
The only way I found to put the newline back in after some transformation was to use the automatic variable $OFS (output field separator). The default value is space, but if you set it to carriage return line feed, then you get separate records on separate lines.
So try this (it might work):
$OFS = "`r`n"
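Here is a small sketch of what that setting changes. $OFS only affects how an array is turned into a string, for example inside double quotes; the sample array is made up:

```powershell
$lines = 'first', 'second', 'third'

# With $OFS unset, the elements are joined with a single space
$default = "$lines"

# With $OFS set to CRLF, each element lands on its own line
$OFS = "`r`n"
$joined = "$lines"

# Remove the variable again so later code gets the default behavior back
Remove-Variable -Name OFS

$default
$joined
```

Using -join "`r`n" gives the same result without touching a variable that affects every later array-to-string conversion in the scope.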