PowerShell Get-Content ignore newline - regex

When you use Set-Content, e.g. Set-Content C:\test.txt "test","test1", the two strings are separated by a newline by default, but there is also a newline at the end of the file.
How do you ignore this trailing newline (or a trailing line containing only spaces) when using Get-Content?

You can remove empty lines like this:
Set-Content C:\test.txt "test",'',"test1"
Get-Content c:\test.txt | ? { $_ }
However, this will remove the empty string in the middle as well.
Edit: Actually, when I tried the example, I noticed that Get-Content ignores the trailing empty line added by Set-Content.
I think your problem lies with Set-Content. If you use a workaround with WriteAllText, it works fine:
[io.file]::WriteAllText('c:\test.txt', ("test",'',"test1" -join "`n"))
The method takes a single string as its second parameter. That's why I first joined the strings with -join and then passed the result to the method.
Note: this is not recommended for large files, because building one big string by concatenation is not efficient.

Get-Content C:\test.txt | Where-Object {$_ -match '\S'}
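The difference between this filter and the earlier ? { $_ } can be checked in memory. A minimal sketch, assuming a hypothetical array $lines in place of Get-Content output: ? { $_ } keeps a whitespace-only line because any non-empty string counts as $true, while -match '\S' keeps only lines with at least one non-whitespace character.

```powershell
# Hypothetical sample standing in for Get-Content output:
# a real line, an empty line, a whitespace-only line, and another real line.
$lines = 'test', '', "  `t ", 'test1'

# A non-empty string is truthy, so the whitespace-only line survives this filter.
$truthy = $lines | Where-Object { $_ }

# '\S' requires at least one non-whitespace character, so both blank lines are dropped.
$nonBlank = $lines | Where-Object { $_ -match '\S' }

$nonBlank
```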

It is the default behavior of Set-Content to add a newline after each string, because that lets you write an array of strings and read them back one per line. In any case, Get-Content ignores the last newline (as long as no spaces follow it).
Workaround for Set-Content:
([byte[]][char[]] "test"), ([byte]13), ([byte]10) ,([byte[]][char[]] "test1") |
Set-Content c:\test.txt -Encoding Byte
(In PowerShell 6 and later, -Encoding Byte is no longer available; use -AsByteStream instead.)
Or use the much simpler [io.file]::WriteAllText.
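To see what actually ends up on disk, a small comparison helps. This sketch uses a hypothetical temp file name and compares against [Environment]::NewLine, because Set-Content writes the platform newline (CRLF on Windows) after every string.

```powershell
# Hypothetical temp file used only for this comparison.
$path = Join-Path ([IO.Path]::GetTempPath()) 'setcontent-demo.txt'

# Set-Content writes a newline after every string, including the last one.
Set-Content -Path $path -Value 'test', 'test1'
$viaSetContent = [IO.File]::ReadAllText($path)

# WriteAllText writes exactly the string it is given, so there is no trailing newline.
[IO.File]::WriteAllText($path, ('test', 'test1' -join [Environment]::NewLine))
$viaWriteAllText = [IO.File]::ReadAllText($path)

Remove-Item -Path $path
```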
Can you specify the exact situation (or show the code)?
For example, if you want to ignore the last line when getting the content, it would look like this:
$content = Get-Content c:\test.txt
$length = ($content | measure).Count
$content = $content | Select-Object -first ($length - 1)
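On PowerShell 5.0 and later, Select-Object can drop the last line directly with -SkipLast, which avoids computing the count separately. A minimal sketch with a hypothetical in-memory $content array:

```powershell
# Hypothetical sample standing in for Get-Content output.
$content = 'line1', 'line2', 'line3'

# Drop the last element without measuring the array first
# (Select-Object -SkipLast is available in PowerShell 5.0 and later).
$trimmed = $content | Select-Object -SkipLast 1

$trimmed
```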
But if you just do this:
"test","test1" | Set-Content C:\test.txt
$content = Get-Content C:\test.txt
then the $content variable contains two items: "test" and "test1".

Related

How can I replace all lines in a file with a pattern using PowerShell?

I have a file with lines that I wish to remove, like the following:
key="Id" value=123"
key="FirstName" value=Name1"
key="LastName" value=Name2"
<!--key="FirstName" value=Name3"
key="LastName" value=Name4"-->
key="Address" value=Address1"
<!--key="Address" value=Address2"
key="FirstName" value=Name1"
key="LastName" value=Name2"-->
key="ReferenceNo" value=765
I have tried the following:
$values = @('key="FirstName"','key="Lastname"', 'add key="Address"');
$regexValues = [string]::Join('|',$values)
$lineprod = Get-Content "D:\test\testfile.txt" | Select-String $regexValues | Select-Object -ExpandProperty Line
if ($null -ne $lineprod)
{
foreach ($value in $lineprod)
{
$prod = $value.Trim()
$contentProd | ForEach-Object {$_ -replace $prod,""} |Set-Content "D:\test\testfile.txt"
}
}
The issue is that only some of the lines get replaced and/or removed, and some remain.
The output should be:
key="Id" value=123"
key="ReferenceNo" value=765
But I seem to get:
key="Id" value=123"
key="ReferenceNo" value=765
<!--key="Address" value=Address2"
key="FirstName" value=Name1"
key="LastName" value=Name2"-->
Any ideas as to why this is happening, or what to change in the code above?
Based on your comment, the token 'add key="Address"' should be changed to just 'key="Address"'; with that change, the logic that concatenates your regex looks good. You need to use the -NotMatch switch so that only lines matching none of those values are output. Also, Select-String can read files itself, so Get-Content can be removed.
Note that the use of (...) is important here because you're reading from and writing to the same file in the same pipeline. Wrapping the statement in parentheses ensures that all output from Select-String is collected before it is passed down the pipeline. Otherwise, you would end up with an empty file.
$values = 'key="FirstName"', 'key="Lastname"', 'key="Address"'
$regexValues = [string]::Join('|', $values)
(Select-String D:\test\testfile.txt -Pattern $regexValues -NotMatch) |
ForEach-Object Line | Set-Content D:\test\testfile.txt
Outputs:
key="Id" value=123"
key="ReferenceNo" value=765
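The same -NotMatch filter can be tried without touching the file. This sketch holds the question's sample lines in an array and escapes each token with [regex]::Escape (an addition, not part of the answer) so that regex metacharacters in a token would be matched literally. Because Select-String is case-insensitive by default, 'key="Lastname"' still removes the key="LastName" lines.

```powershell
# Sample lines from the question, held in memory instead of D:\test\testfile.txt.
$lines = @(
    'key="Id" value=123"'
    'key="FirstName" value=Name1"'
    'key="LastName" value=Name2"'
    '<!--key="FirstName" value=Name3"'
    'key="LastName" value=Name4"-->'
    'key="Address" value=Address1"'
    '<!--key="Address" value=Address2"'
    'key="FirstName" value=Name1"'
    'key="LastName" value=Name2"-->'
    'key="ReferenceNo" value=765'
)

$values = 'key="FirstName"', 'key="Lastname"', 'key="Address"'
# Escape each token, then join them into one alternation pattern.
$regexValues = ($values | ForEach-Object { [regex]::Escape($_) }) -join '|'

# -NotMatch keeps only the lines that match none of the tokens;
# ForEach-Object Line unwraps each MatchInfo back into its text.
$kept = $lines | Select-String -Pattern $regexValues -NotMatch | ForEach-Object Line

$kept
```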

Regular expression seems not to work in Where-Object cmdlet

I am trying to add quote characters around two fields in a file of comma separated lines. Here is one line of data:
1/22/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
which I would like to become this:
1/22/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
I began developing my regular expression in a simple PowerShell script, and soon had the following:
$strData = '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0'
$strNew = $strData -replace "([^,]*),([^,]*),([^,]*),(.*)",'$1,"$2","$3",$4'
$strNew
which gives me this output:
1/29/2018 0:00:00,"0000000","001B9706BE",1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
Great! I'm all set. So I extended this example to the general case of a file of similar lines of data:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4'
}
This is a listing of test_data.csv:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
This is the output of my script:
1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1,0,0,0,0,0,0,0,0,0,0,13,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1,0,0,0,0,0,0,0,0,0,0,35,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104943875,0016C4B0BC,1,31,0,1,0,0,0,0,0,0,0,0,0,0,25,0,1,0,0,0,0,0,0,0,0,0,0
1/29/2018 0:00:00,104948067,0016C4834D,1,33,0,1,0,0,0,0,0,0,0,0,0,0,23,0,1,0,0,0,0,0,0,0,0,0,0
I have also tried this version of the script:
Get-Content test_data.csv | Where-Object -FilterScript {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",$4"
}
and obtained the same results.
My simple test script has convinced me that the regex is correct, but something happens when I use that regex inside a filter script in the Where-Object cmdlet.
What simple, yet critical, detail am I overlooking here?
Here is my PSVersion:
Major  Minor  Build  Revision
-----  -----  -----  --------
5      0      10586  117
You're misunderstanding how Where-Object works. The cmdlet outputs those input lines for which the -FilterScript expression evaluates to $true. It does NOT output whatever you do inside that scriptblock (you'd use ForEach-Object for that).
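A minimal in-memory comparison of the two cmdlets, using one shortened record from the question: the Where-Object version passes the original line through untouched, while the ForEach-Object version emits the rewritten line.

```powershell
$line = '1/29/2018 0:00:00,0000000,001B9706BE,1,21'
$pattern = '^([^,]*),([^,]*),([^,]*)'
$replacement = '$1,"$2","$3"'

# Where-Object only uses the scriptblock result as a true/false test; a non-empty
# string counts as $true, so the ORIGINAL line passes through unchanged.
$viaWhere = $line | Where-Object { $_ -replace $pattern, $replacement }

# ForEach-Object emits whatever the scriptblock produces, i.e. the rewritten line.
$viaForEach = $line | ForEach-Object { $_ -replace $pattern, $replacement }
```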
You don't need either Where-Object or ForEach-Object, though. Just put Get-Content in parentheses and use that as the first operand for the -replace operator. You also don't need the 4th capturing group. I would recommend anchoring the expression at the beginning of the string, though.
(Get-Content test_data.csv) -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'
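The parenthesized form works because -replace, given an array on its left, applies the pattern to each element and returns a new array. A sketch with two shortened sample records standing in for (Get-Content test_data.csv):

```powershell
# Two shortened records from the question, standing in for (Get-Content test_data.csv).
$records = @(
    '1/29/2018 0:00:00,0000000,001B9706BE,1,21,0,1'
    '1/29/2018 0:00:00,104938428,0016C4C483,1,45,0,1'
)

# With an array on the left, -replace runs against every element.
$quoted = $records -replace '^([^,]*),([^,]*),([^,]*)', '$1,"$2","$3"'

$quoted
```

To write the result back to the same file, piping it to Set-Content test_data.csv should be safe here, since the parentheses make Get-Content finish reading before anything is written.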
This seems to work here. I used ForEach-Object to process each record.
Get-Content test_data.csv |
ForEach-Object { $_ -replace "([^,]*),([^,]*),([^,]*),(.*)", '$1,"$2","$3",$4' }
This also seems to work. It uses ? to make the quantifiers reluctant (lazy).
Get-Content test_data.csv |
ForEach-Object { $_ -replace '(.*?),(.*?),(.*?),(.*)', '$1,"$2","$3",$4' }
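What the ? changes is easiest to see with -match and $Matches. In this sketch on one shortened record, the greedy group runs to the last comma, while the lazy group stops at the first one, which is why the lazy pattern can stand in for [^,]*.

```powershell
$record = '1/29/2018 0:00:00,0000000,001B9706BE,1,21'

# Greedy: .* runs to the end of the string and backtracks to the LAST comma.
$null = $record -match '^(.*),'
$greedy = $Matches[1]

# Lazy: .*? expands one character at a time and stops at the FIRST comma.
$null = $record -match '^(.*?),'
$lazy = $Matches[1]
```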
I would just make a small change to what you have in order for this to work. Simply change the script to the following, noting that I changed the Where-Object -FilterScript to a ForEach-Object and fixed a minor typo in the last item of the replacement string: $4 was missing its escaping backtick, so inside the double quotes PowerShell expanded it as an (empty) variable instead of passing it to the regex engine:
Get-Content c:\temp\test_data.csv | ForEach-Object {
$_ -replace "([^,]*),([^,]*),([^,]*),(.*)", "`$1,`"`$2`",`"`$3`",`$4"
}
I tested this with the data you provided and it adds the quotes to the correct columns.
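The reason $4 needs a backtick inside double quotes is that PowerShell expands variables in the string before -replace ever sees it. A small sketch, assuming the default non-strict mode and that no variable named 2 exists:

```powershell
$pattern = '^([^,]*),(.*)$'

# Single quotes: PowerShell passes $1 and $2 through to the regex engine untouched.
$single = 'a,b' -replace $pattern, '$1|$2'

# Double quotes with backticks: `$ stops PowerShell from expanding the variable.
$escaped = 'a,b' -replace $pattern, "`$1|`$2"

# Double quotes WITHOUT a backtick on $2: PowerShell expands $2 itself before
# -replace runs; with no such variable it becomes empty, and the group is lost.
$expanded = 'a,b' -replace $pattern, "`$1|$2"
```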

PowerShell replace lines starting with given pattern

Background:
I am trying to read a config file and place it in a string variable for further use. For now I've managed to remove the newlines, which was fairly simple. However, I'm having a bit of trouble with removing comments (lines starting with a #).
What I have so far
$var = (Get-Content $HOME/path/to/config.txt -Raw).Replace("`r`n","")
What I've tried
Most of my attempts were something along the lines of
$var = (Get-Content $HOME/path/to/config.txt -Raw).Replace("#.*?`r`n","").Replace("`r`n","")
( .Replace("(?<=#).*?(?=`r`n)","") , .Replace('^[^#].*?`r`n','') etc)
Many of the resources I've found cover how to iteratively read from a file and write back to it (or to a new one), but I need the result to stay in a variable and the original file not to be altered in any way (I'd also rather avoid temp files, or even extra variables, if possible). I think there is something fundamental I'm missing about the input to Replace. (I also found this semi-relevant question for when you're using ConvertFrom-Csv: "Using Import-CSV in Powershell, ignoring commented lines".)
Sample input:
text;
weird text;
other-sort-of-text;
#commented out possibility;
more-input/with-comment;#comment
Sample output:
text;weird text;other-sort-of-text;more-input/with-comment;
Additional info:
I am going to run this on current builds of Windows 10; locally I seem to have PowerShell version 5.1.14393.693.
Split the string at semicolons, remove elements starting with a #, then join the result back into a string.
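The attempts above likely fail because .Replace(...) is the .NET String.Replace method, which treats its arguments as literal text, not as regular expressions; the -replace operator is the regex-based one. A minimal sketch on the sample input, held in a string instead of read from the file, and assuming that every # starts a comment that runs to the end of its line:

```powershell
# Sample config text from the question, held in a string instead of read from
# $HOME/path/to/config.txt (in a script you would use Get-Content ... -Raw).
$text = "text;`r`nweird text;`r`nother-sort-of-text;`r`n#commented out possibility;`r`nmore-input/with-comment;#comment`r`n"

# String.Replace looks for the literal characters '#.*', which never occur.
$literal = $text.Replace('#.*', '')

# -replace uses regular expressions. (?m) makes $ match at the end of each line,
# so '#.*$' removes each comment up to its line break; '\r?\n' then removes
# the line breaks themselves.
$var = $text -replace '(?m)#.*$', '' -replace '\r?\n', ''

$var
```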
((Get-Content $HOME/path/to/config.txt -Raw).Replace("`r`n","") -split ';' |
Where-Object { $_.Trim() -notlike '#*' }) -join ';'
This might be identical to some of the other responses, but here is how I would do it:
$Data = Get-Content $HOME/path/to/config.txt -Raw
(($Data -split(';')).replace("`r`n",'') | Where-Object { $_ -notlike '#*' }) -join(';')
However you do it, remember that `r`n needs to be expanded, so it has to be enclosed in double quotes, unlike the rest of your characters.
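Two details from this answer, checked in isolation: single quotes keep the backtick escapes literal, and -like/-notlike use wildcard patterns, where ^ has no special meaning.

```powershell
# Inside single quotes the backtick is literal, so '`r`n' is four characters;
# inside double quotes it becomes carriage return + line feed (two characters).
$literalLength = '`r`n'.Length
$expandedLength = "`r`n".Length

# -like uses wildcards, not regex: '^' is an ordinary character there,
# so '#*' (not '^#*') is the pattern for "starts with #".
$startsWithHash = '#comment' -like '#*'
$caretPattern = '#comment' -like '^#*'
```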
#############solution 1 with convertfrom-string####################
#short version
(gc "$HOME/path/to/config.txt" | ? {$_ -notlike "#*"} | cfs -D ";").P1 -join ";"
#verbose version
(Get-Content "$HOME/path/to/config.txt" | where {$_ -notlike "#*"} | ConvertFrom-String -Delimiter ";").P1 -join ";"
#############Solution 2 with convertfrom-csv#######################
(Get-Content "C:\temp\test\config.txt" | where {$_ -notlike "#*"} | ConvertFrom-csv -Delimiter ";" -Header "P1").P1 -join ";"
#############Solution 3 with split #######################
(Get-Content "C:\temp\test\config.txt" | where {$_ -notlike "#*"} | %{$_.Split(';')[0]}) -join ";"

Retain carriage returns in text filtered through a regular expression

I need to search through a folder of logs and retrieve the most recent ones. Then I need to filter each log, pull out the relevant information and save it to another file.
The problem is that the regular expression I use to filter the log drops the carriage returns and line feeds, so the new file just contains a jumble of text.
$Reg = "(?ms)\*{6}\sBEGIN(.|\n){98}13.06.2015(.|\n){104}00000003.*(?!\*\*)+"
get-childitem "logfolder" -filter *.log |
where-object {$_.LastAccessTime -gt [datetime]$Test.StartTime} |
foreach {
$a=get-content $_;
[regex]::matches($a,$reg) | foreach {$_.groups[0].value > "MyOutFile"}
}
Log structure:
******* BEGIN MESSAGE *******
<Info line 1>
Date 18.03.2010 15:07:37 18.03.2010
<Info line 2>
File Number: 00000003
<Info line 3>
*Variable number of lines*
******* END MESSAGE *******
Basically, I want to capture everything between BEGIN and END where the date and file number have certain values. Does anyone know how I can do this without losing the line feeds? I also tried using Out-File | Select-String -Pattern $reg, but I've never had success using Select-String on a multiline record.
As @Matt pointed out, you need to read the entire file as a single string if you want to do multiline matches. Otherwise your (multiline) regular expression is applied to one line after the other. There are several ways to get the content of a file as a single string:
(Get-Content 'C:\path\to\file.txt') -join "`r`n"
Get-Content 'C:\path\to\file.txt' | Out-String
Get-Content 'C:\path\to\file.txt' -Raw (requires PowerShell v3 or newer)
[IO.File]::ReadAllText('C:\path\to\file.txt')
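The four approaches are not quite interchangeable at the end of the text. This sketch writes a hypothetical temp file with two CRLF-terminated lines and compares the results: the -join version drops the final line break, Out-String re-joins the lines with the platform newline, and -Raw and ReadAllText return the stored text unchanged.

```powershell
# Hypothetical temp file holding two CRLF-terminated lines.
$path = Join-Path ([IO.Path]::GetTempPath()) 'read-demo.txt'
[IO.File]::WriteAllText($path, "a`r`nb`r`n")

$joined    = (Get-Content $path) -join "`r`n"   # breaks rebuilt between lines; the final one is lost
$outString = Get-Content $path | Out-String      # lines re-joined with the platform newline
$raw       = Get-Content $path -Raw              # content as stored (PowerShell 3 or newer)
$readAll   = [IO.File]::ReadAllText($path)       # same, via .NET

Remove-Item -Path $path
```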
Also, I'd modify the regular expression a little. Log messages often vary in length, so matching fixed lengths may fail as soon as a message changes. It's better to match on the invariant parts of the string and leave the rest as variable-length matches. Personally, I also find it a lot easier to do this kind of content extraction in several steps (which makes for simpler regular expressions). In your case, I would first separate the log entries from each other and then filter their content:
$date = [regex]::Escape('13.06.2015')
$fnum = '00000003'
$re1 = "(?ms)\*{7} BEGIN MESSAGE \*{7}\s*([\s\S]*?)\*{7} END MESSAGE \*{7}"
$re2 = "(?ms)[\s\S]*?Date\s+$date[\s\S]*?File Number:\s+$fnum[\s\S]*"
Get-ChildItem 'C:\log\folder' -Filter '*.log' | ? {
$_.LastAccessTime -gt [DateTime]$Test.StartTime
} | % {
Get-Content $_.FullName -Raw |
Select-String -Pattern $re1 -AllMatches |
select -Expand Matches |
% {
$_.Groups[1].Value |
Select-String -Pattern $re2 |
select -Expand Matches |
select -Expand Groups |
select -Expand Value
}
} | Set-Content 'C:\path\to\output.txt'
BTW, don't use the redirection operator (>) inside a loop. It would overwrite the output file's content with each iteration. If you must write to a file inside a loop use the append redirection operator instead (>>). However, performance-wise it's usually better to put writing to output files at the end of the pipeline (see above).
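The two-step idea can be tried on an in-memory sample before pointing it at the log folder. This sketch uses two hypothetical log messages and substitutes [regex]::Matches and -match for the Select-String calls in the answer; $re1 and $re2 are the answer's patterns. The extracted body keeps its line breaks because the whole text is matched as one string.

```powershell
# Two hypothetical log messages; only the first has the wanted date and file number.
$log = @(
    '******* BEGIN MESSAGE *******'
    '<Info line 1>'
    'Date 13.06.2015 15:07:37 13.06.2015'
    '<Info line 2>'
    'File Number: 00000003'
    'payload line'
    '******* END MESSAGE *******'
    '******* BEGIN MESSAGE *******'
    'Date 18.03.2010 15:07:37 18.03.2010'
    'File Number: 00000007'
    '******* END MESSAGE *******'
) -join "`r`n"

$date = [regex]::Escape('13.06.2015')
$fnum = '00000003'
$re1 = '(?ms)\*{7} BEGIN MESSAGE \*{7}\s*([\s\S]*?)\*{7} END MESSAGE \*{7}'
$re2 = "(?ms)[\s\S]*?Date\s+$date[\s\S]*?File Number:\s+$fnum[\s\S]*"

# Step 1: split the text into message bodies (group 1 of $re1).
$bodies = [regex]::Matches($log, $re1) | ForEach-Object { $_.Groups[1].Value }

# Step 2: keep only the bodies that contain the wanted date and file number.
$wanted = $bodies | Where-Object { $_ -match $re2 }
```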
I wanted to see if I could make that regex better, but for now: if you are using those regex modes, you should read your text file in as a single string, which helps a lot.
$a=get-content $_ -Raw
or if you don't have PowerShell 3.0
$a=(get-content $_) -join "`r`n"
I had to solve the problem of disappearing newlines in a completely different context. When you run Get-Content on a text file, what you get is an array of records, where each record is one line of text.
The only way I found to put the newlines back after some transformation was to use the preference variable $OFS (output field separator). Its default value is a space, but if you set it to carriage return line feed, then when the array is converted to a single string the records end up on separate lines.
So try this (it might work):
$OFS = "`r`n"
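A short sketch of the $OFS behavior, plus the explicit -join alternative, which produces the same string without changing a session-wide setting:

```powershell
$lines = 'first record', 'second record'

# With $OFS unset, an array expanded inside a string is joined with single spaces.
$default = "$lines"

# Setting $OFS changes the separator used for that conversion.
$OFS = "`r`n"
$separate = "$lines"

# Remove the variable again so later conversions go back to the default space.
Remove-Variable -Name OFS

# An explicit -join gives the same result without touching $OFS.
$joined = $lines -join "`r`n"
```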

Using PowerShell, in a CSV doc, need to iterate and insert a character

So my csv file looks something like:
J|T|W
J|T|W
J|T|W
I'd like to iterate through it, most likely using a regex, so that after the two pipes and their content (\|.+{2}) a tab character (`t) is inserted.
I'm assuming I'd use Get-Content to loop through the file, but I'm unsure where to go from there.
Also, I just thought of this: it is possible that a line will overrun onto the next line, so the two pipes would be on different lines, which I'm pretty sure makes a difference.
-Thanks
Ok, I'll move the comment discussion to an answer since it seems like it is a potentially valid solution:
Import-csv .\test.csv -Delimiter '|' -Header 'One', 'two', 'three' | %{$_.Three = "`t$($_.Three)"; $_} | Export-CSV .\test_result.cs -Delimiter '|' -NoTypeInformation
This works for a file that is known to have 3 fields. For a more generic solution, if you have the ability to determine the number of fields initially being exported to CSV, then:
Import-csv .\test.csv -Delimiter '|' -Header (1..$fieldCount) | %{$_.$fieldCount = "`t$($_.$fieldCount)"; $_} | Export-CSV .\test_result.cs -Delimiter '|' -NoTypeInformation
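The Import-Csv approach can be tried in memory with ConvertFrom-Csv and ConvertTo-Csv, which take and return lines of text instead of files. A sketch using the sample rows from the question and assuming exactly three fields:

```powershell
# Sample rows from the question, held in memory instead of .\test.csv.
$lines = 'J|T|W', 'J|T|W', 'J|T|W'

# Parse with named headers, prefix the third field with a tab, and emit each row.
$rows = $lines |
    ConvertFrom-Csv -Delimiter '|' -Header 'One', 'Two', 'Three' |
    ForEach-Object { $_.Three = "`t$($_.Three)"; $_ }

# Convert back to pipe-delimited text; -NoTypeInformation suppresses the
# "#TYPE ..." line that Windows PowerShell 5.1 adds by default.
$csv = $rows | ConvertTo-Csv -Delimiter '|' -NoTypeInformation

$csv
```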
In PowerShell you can use the -replace operator with a regex e.g.:
$c = Get-Content foo.csv | Foreach {$_ -replace '<regex_here>','new_string'}
$c | Out-File foo.csv -encoding ascii
Note that in new_string you can refer to capture groups using $1 but you'll want to put that string in single quotes so PowerShell won't try to interpret $1 as a variable reference.
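Applied to the question, a pattern that captures everything up to the second pipe could look like this. It is a sketch that assumes each record sits on a single line (records split across lines would need the whole file read as one string). Because the replacement also needs a tab, it is double-quoted, so $1 has to be escaped as `$1:

```powershell
$line = 'J|T|W'

# Capture everything up to and including the second pipe, then write it back
# followed by a tab. In the double-quoted replacement, `t becomes a tab and
# `$1 stays a literal $1 for the regex engine.
$withTab = $line -replace '^([^|]*\|[^|]*\|)', "`$1`t"

$withTab
```

Concatenating a single-quoted and a double-quoted string, '$1' + "`t", is an equivalent way to build the same replacement.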