I am very new to PowerShell. I have a CSV file that I want to find and replace some text in. After some searching, this seems simple to do, but I still seem to be having problems with the code:
$csv = get-content .\test.csv
$csv = $csv -replace "|", "$"
$csv | out-file .\test.csv
My file is located here: C:\Users\CB1\test.csv
How do I specify that location in powershell?
I've tried this but it doesn't work:
$csv = get-content C:\Users\CB1\test.csv
$csv = $csv -replace "|", "$"
$csv | out-file C:\Users\CB1\test.csv
The problem isn't whether you're using relative or absolute paths (assuming your relative paths are relative to the right directory).
Rather, the problem is that the -replace operator is regex-based, and that | is therefore interpreted as a regex metacharacter (representing alternation).
Therefore, you need to escape such metacharacters, using \ (or, if you were to do this programmatically, you could use the [regex]::Escape() method).
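As a quick illustration (a made-up snippet, not from the question), both of these forms escape the pipe and produce the same result:
'a|b|c' -replace '\|', '$'                  # -> a$b$c
'a|b|c' -replace [regex]::Escape('|'), '$'  # -> a$b$c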
Additionally, since your replacement operation isn't line-specific, you can speed up your operation by reading the file into memory as a whole, using the -Raw switch.
That, in turn, requires that you use the -NoNewLine switch when (re)writing the file.
Also, with text input, Set-Content is preferable to Out-File for performance reasons.
To put it all together:
(Get-Content -Raw .\test.csv) -replace '\|', '$' | Set-Content -NoNewLine .\test.csv
Note: Use the -Encoding parameter as needed, as the input file's encoding will not be honored:
In Windows PowerShell, Out-File produces UTF-16LE ("Unicode") files by default, whereas Set-Content uses the system's ANSI code page.
In PowerShell (Core) 7+, BOM-less UTF-8 is the consistently applied default.
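For example, assuming the input file is UTF-8 and should stay UTF-8 (substitute your file's actual encoding):
(Get-Content -Raw .\test.csv) -replace '\|', '$' | Set-Content -NoNewLine -Encoding utf8 .\test.csv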
I have a huge CSV file with data, and some of the lines are incorrect and contain line breaks. When the file is imported into Excel, I then need to correct hundreds of lines manually. I have a regex that works in Notepad++ and removes the line breaks from lines that do not start with a specific string, in this case ";". However, the same regex is not working in my PowerShell script.
Example of input
;BP;7165378;XX_RAW;200SSS952;EU-PL;PL02;PL02;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
15:00:00;;;;Jhon Name;;;;;;;;9444253;;;;;;;;;;;;;"Jhon Name";;;;;;;;;;Jhon Name;;;;;;;;Final Check Approved;;;;;;;;;09.01.2023;;;;;Approve;;;;;;12077;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
How it should look:
;BP;7165378;XX_RAW;200SSS952;EU-PL;PL02;PL02;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;15:00:00;;;;Jhon Name;;;;;;;;9444253;;;;;;;;;;;;;"Jhon Name";;;;;;;;;;Jhon Name;;;;;;;;Final Check Approved;;;;;;;;;09.01.2023;;;;;Approve;;;;;;12077;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Code:
$content = Get-Content -path "C:\Users\TUF17\Desktop\File\Fix\xx_fix_temp.csv"
$content -Replace '"\R(?!;)"', ' ' | Out-File "C:\Users\TUF17\Desktop\File\Fix\xx_noenters.csv"
It has to do with the \R line-break escape in your PS script's regex, which isn't supported there.
I would also suggest adding -Raw if you want to get the content of the file as a single string, rather than an array of strings, for easier replacing.
I'm assuming it's a .csv file you are using.
$content = Get-Content -Path "C:\Users\TUF17\Desktop\File\Fix\xx_fix_temp.csv" -Raw
$content -Replace '(?m)(^[^;].*)\r?\n(?!;)', '$1 ' | Out-File "C:\Users\TUF17\Desktop\File\Fix\xx_noenters.csv"
Building on the helpful comments on the question:
In order to perform replacements across lines of a text file, you need to either read the file in full - with Get-Content -Raw - or perform stateful line-by-line processing, such as with the -File parameter of a switch statement.
Note: While you could also do stateful line-by-line processing by combining Get-Content (without -Raw) with a ForEach-Object call, such a solution would be much slower - see this answer.
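Purely as an illustration of that alternative (a rough sketch, not part of the original answer), a stateful switch -File version could look like this, assuming every proper record starts with ";" and every other line is a continuation of the previous record:
$joined = [System.Collections.Generic.List[string]]::new()
switch -File $inFile {
    { $_.StartsWith(';') } { $joined.Add($_) }                         # start of a new record
    default {
        if ($joined.Count -gt 0) { $joined[$joined.Count - 1] += $_ }  # append continuation to the previous record
        else                     { $joined.Add($_) }
    }
}
$joined | Set-Content -Encoding utf8 -LiteralPath $outFile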
Your regex, '"\R(?!;)"', has two problems:
It accidentally uses embedded " quoting. Use only '...' quoting. PowerShell has no special syntax for regex literals - it simply uses strings.
To avoid confusion with PowerShell's own up-front string interpolation, it is better to use verbatim '...' strings rather than expandable (interpolating) "..." strings - see the conceptual about_Quoting_Rules help topic.
\R is an unsupported regex escape sequence; you presumably meant \r, i.e., a CR character (CARRIAGE RETURN, U+000D).
If you instead want to match CRLF, a Windows-format newline sequence, use \r\n
If you want to match LF (LINE FEED, U+000A) alone (a Unix-format newline), use \n
If you want to match both newline formats, use \r?\n
As an aside: While use of CR alone is rare in practice, PowerShell treats stand-alone CR characters as newlines as well, which is why Get-Content without -Raw, which reads line by line (as you've tried), wouldn't work.
Get-Content -Raw solution (easier and faster than switch -File, but requires the whole file to fit into memory twice):
# Adjust the '\r' part as needed (see above).
(Get-Content -Raw -LiteralPath $inFile) -replace '\r(?!;)' |
Set-Content -NoNewLine -Encoding utf8 -LiteralPath $outFile
Note:
By not specifying a substitution operand to -replace, the command removes all newlines not followed by a ; (the (?!;) lookahead), effectively joining the line that follows the CR directly to the previous line, which is the desired behavior based on your sample output.
For saving text, Set-Content is a bit faster than Out-File (it'll make no appreciable difference here, given that only a single, large string is written).
-NoNewLine prevents a(n additional) trailing newline from getting appended to the file.
-Encoding utf8 specifies the output character encoding. Note that PowerShell never preserves the input character encoding, so unless you use -Encoding on output, you'll get the respective cmdlet's default character encoding, which in Windows PowerShell varies from cmdlet to cmdlet; in PowerShell (Core) 7+, the consistent default is now BOM-less UTF-8. Note that in Windows PowerShell, -Encoding utf8 always creates a file with a BOM; see this answer for background information and workarounds.
The solution was adding (?ms) to the front of my regex query.
I am trying to search for chunks of text within a file, and preserving the line breaks in a chunk.
When I define my variable as $variable = get-content $fromfile,
my function (below) is able to find the text I'm looking for, but it is difficult to parse further due to a lack of line breaks.
function FindBetween($first, $second, $importing){
$pattern = "$first(.*?)$second"
$result = [regex]::Match($importing, $pattern).Groups[1].Value
return $result
}
When I define my variable as $variable = get-content $fromfile -raw, the output of my query is blank. I'm able to print the variable, and it does preserve the line breaks.
I run into the same issue regardless of whether I add \r\n to the end of my pattern, use @() around my variable definition, use -Delimiter \n, or any combination of those.
Whole code is here:
param($fromfile)
$working = get-content $fromfile -raw
function FindBetween($first, $second, $importing){
$pattern = "(?ms)$first(.*?)$second"
$result = [regex]::Match($importing, $pattern).Groups[1].Value
#$result = select-string -InputObject $importing -Pattern $pattern
return $result
}
FindBetween -first "host ####" -second "#### flag 2" -importing $working | Out-File "testresult.txt"
The file I'm testing it against looks like:
#### flag 1 host ####
stuff in between
#### flag 2 server ####
#### process manager ####
As to why I'm doing this:
I'm trying to automate taking a file that has defined sections with titles and outputting the content of those separate sections into a .csv (each section is formatted drastically differently from the others). These files are all uniform with one another, containing the same sections and general content.
If you're using -Raw, you probably need to change your regex to "(?ms)$first(.*?)$second" so that . will match newlines.
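A minimal illustration of why that matters, using a trimmed-down, in-memory version of the sample file (results shown as comments):
$text = "#### flag 1 host ####`nstuff in between`n#### flag 2 server ####"
[regex]::Match($text, 'host ####(.*?)#### flag 2').Groups[1].Value        # empty: . does not cross line breaks
[regex]::Match($text, '(?ms)host ####(.*?)#### flag 2').Groups[1].Value   # captures the lines in between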
I've been through other similar questions and tried their advice, but it didn't help.
I'm trying to delete a specific line of text in a text file.
My code, which works:
(Get-Content -Path "MyPath.txt" -Raw).Replace('this is the line', '') | Set-Content "MyPath.txt" -Encoding UTF8
Now this works, but it leaves an ugly empty line in the text file. I wanted to also replace an optional newline character by adding this regex at the end of the pattern:
\n?
and this wouldn't work. The other threads made other recommendations and I've tried all combinations, but I just can't get a match. I'm using Windows-style line endings (CRLF).
Both with -Raw and without it, I've tried:
\n
\r\n
`n
`r`n
I haven't even added the regex question mark at the end (or a non-capturing group, in case it needs the \r\n syntax).
The [string] type's .Replace() method doesn't support regexes (regular expressions), whereas PowerShell's -replace operator does.
However, the simplest solution in this case is to take advantage of the fact that the -ne operator acts as a filter with an array-valued LHS (as other comparison operators do):
@(Get-Content -Path MyPath.txt) -ne 'this is the line' |
Set-Content MyPath.txt -Encoding UTF8
Note how Get-Content is called without -Raw in order to return an array of lines, from which -ne then filters out the line of (non-)interest; @(...), the array-subexpression operator, ensures that the output is an array even if the file happens to contain just one line.
The assumption is that string 'this is the line' matches the whole line (case-insensitively).
If that is not the case, instead of -ne you could use -notlike with a wildcard expression or -notmatch with a regex (e.g., -notmatch 'this is the line' or -notlike '*this is the line*').
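If you prefer to keep the -Raw approach from the question, a rough sketch (assuming the unwanted line is exactly 'this is the line' and the file uses CRLF or LF endings) is to switch from the .Replace() method to the regex-based -replace operator and consume the trailing newline too:
(Get-Content -Path MyPath.txt -Raw) -replace '(?m)^this is the line\r?\n?' |
    Set-Content MyPath.txt -Encoding UTF8 -NoNewLine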
I wanted to extract some strings from some text files. After some research into those files, I found a pattern in how the strings appear in a text file.
I composed a short PowerShell script with the help of Google searches. The script receives two parameters (the text file's path and the keyword to extract) and extracts the matching strings from the text file.
After finding and extracting the target strings from the file $tpath\temp.txt, the script saves them to another file, $tpath\tmpVI.txt.
Set-PSDebug -Trace 2 -step
$txtpath=$args[0]
$exkey=$args[1]
$tfile=gc "$tpath\temp.txt"
$savextracted="$tpath\tmpVI.txt"
$tfile -replace '&amp;', '&' -replace '^.*$exkey', '' -replace '\s.*$', '' -replace '\\.*$','' | out-file "$savextracted" -encoding ascii
But so far, the extracted and saved results have been wrong, never the strings I wanted.
From PowerShell debugging, it seems the regular expressions in the last line are causing trouble, and so is the variable $exkey inside the single-quoted replace pattern. But I don't know how to fix this. What should I do?
If you're looking to capture lines that have your match, here's a snippet that solves that problem:
Function Get-Matches
{
    Param(
        [Parameter(Mandatory,Position=0)]
        [String] $Path,
        [Parameter(Mandatory,Position=1)]
        [String] $Regex
    )
    # Return only the lines of the file that match the regex
    @(Get-Content -Path $Path) -match $Regex
}
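A hypothetical usage example, reusing the question's file paths (you would define $tpath and $exkey yourself; [regex]::Escape guards against metacharacters in the keyword):
Get-Matches -Path "$tpath\temp.txt" -Regex ([regex]::Escape($exkey)) |
    Out-File "$tpath\tmpVI.txt" -encoding ascii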
I'm running into problems trying to pull the thousands separators out of some currency values in a set of files. The "bad" values contain thousands-separator commas and are wrapped in double quotes. There are other values in there that are under $1,000 and present no issue.
Example of existing file:
"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"
Example of desired file (thousands separators removed):
"12345.67",12.34,"123456.78",1.00,"123456789.12"
I found a regex for matching the numbers with separators that works great, but I'm having trouble with the -replace operator. The replacement value is confusing me. I read about $& and I'm wondering if I should use it here. I tried $_, but that pulls out ALL my commas. Do I have to use $matches somehow?
Here's my code:
$Files = Get-ChildItem *input.csv
foreach ($file in $Files)
{
    $file |
        Get-Content | #assume that I can't use -raw
        % {$_ -replace '"[\d]{1,3}(,[\d]{3})*(\.[\d]+)?"', ("$&" -replace ',','')} | #this is my problem
        out-file output.csv -append -encoding ascii
}
Tony Hinkle's comment is the answer: don't use regex for this (at least not directly on the CSV file).
Your CSV is valid, so you should parse it as such, work on the objects (change the text if you want), then write a new CSV.
Import-Csv -Path .\my.csv | ForEach-Object {
    $_ | ForEach-Object {
        $_ -replace ',',''
    }
} | Export-Csv -Path .\my_new.csv
(this code needs work, specifically the middle, as the row will have each column as a property, not an array, but a more complete version of your CSV would make that easier to demonstrate)
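Purely as a sketch of that more complete version (the -Header column names below are assumptions, since the sample CSV has no header row), one way to flesh out that middle part:
Import-Csv -Path .\my.csv -Header Col1, Col2, Col3, Col4, Col5 | ForEach-Object {
    foreach ($prop in $_.PSObject.Properties) {
        $prop.Value = $prop.Value -replace ','   # strip the thousands separators from every column
    }
    $_                                           # emit the modified row object
} | Export-Csv -Path .\my_new.csv -NoTypeInformation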
You can try with this regex:
,(?=(\d{3},?)+(?:\.\d{1,3})?")
See the live demo, or in PowerShell:
% {$_ -replace ',(?=(\d{3},?)+(?:\.\d{1,3})?")','' }
But it's more about the challenge that regex can bring. For proper work, use @briantist's answer, which is the clean way to do this.
I would use a simpler regex, and use capture groups instead of the entire capture.
I have tested the following regular expression with your input and found no issues.
% {$_ -replace '([\d]),([\d])','$1$2' }
E.g., find all commas with a digit before and after (so that the weird mix of quoted and unquoted values doesn't matter) and remove the comma entirely.
This would have problems if your input had a scenario outside that odd mix of quoted and unquoted values, for example a field-separating comma sitting directly between two digits.
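A quick sanity check against the sample line from the question (result shown as a comment):
'"12,345.67",12.34,"123,456.78",1.00,"123,456,789.12"' -replace '([\d]),([\d])', '$1$2'
# -> "12345.67",12.34,"123456.78",1.00,"123456789.12"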