Replace an entire line of text using powershell and regexp? - regex

I have a programming background, but I am fairly new to both powershell scripting and regexp. Regexp has always eluded me, and my prior projects have never 'forced' me to learn it.
With that in mind, I have a file with a line of text that I need to replace. I cannot depend on knowing where the line is, whether it has whitespace in front of it, or what the ACTUAL text being replaced IS. I DO KNOW what will precede and follow the text being replaced.
AGAIN, I will not KNOW the value of "Replace This Text". I will only know what precedes it, "<find-this-text>", and what follows it, "</find-this-text>". Edited OP to clarify. Thanks!
LINE OF TEXT I NEED TO REPLACE
<find-this-text>Replace This Text</find-this-text>
POTENTIAL CODE
(gc $file) | % { $_ -replace "", "" } | sc $file
Get the content of the file; enclosing this in parentheses ensures the file is read completely and then closed, so it doesn't throw an error when trying to save the file.
Iterate through each line and issue the replace statement. THIS IS WHERE I COULD USE HELP.
Save the file using Set-Content. My understanding is that this method is preferable because it takes encoding, like UTF8, into consideration.

XML is not a line oriented format (nodes may span several lines, just as well as a line may contain several nodes), so it shouldn't be edited as if it were. Use a proper XML parser instead.
$xmlfile = 'C:\path\to\your.xml'
[xml]$xml = Get-Content $xmlfile
$node = $xml.SelectSingleNode('//find-this-text')
$node.'#text' = 'replacement text'
For saving the XML in "UTF-8 without BOM" format you can call the Save() method with a StreamWriter doing The Right Thing™:
$UTF8withoutBOM = New-Object Text.UTF8Encoding($false)
$writer = New-Object IO.StreamWriter ($xmlfile, $false, $UTF8withoutBOM)
$xml.Save($writer)
$writer.Close()

The .* in the regular expression would be considered "greedy" and dangerous by many. If the line that contains this tag and its data contains nothing else, then there really isn't any significant risk, as far as I understand.
$file = "c:\temp\sms.txt"
$OpenTag = "<find-this-text>"
$CloseTag = "</find-this-text>"
$NewText = $OpenTag + "New text" + $CloseTag
(Get-Content $file) | Foreach-Object {$_ -replace "$OpenTag.*$CloseTag", $NewText} | Set-Content $file
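The greedy risk mentioned above shows up on a line that holds the element more than once. Here is a minimal sketch, assuming a made-up sample line; the names $line, $open, $close, $greedy and $lazy are hypothetical and only used for illustration. [regex]::Escape is applied so the tags are matched literally.

```powershell
# Hypothetical sample line that contains the element twice.
$line     = '<find-this-text>A</find-this-text><find-this-text>B</find-this-text>'
$OpenTag  = '<find-this-text>'
$CloseTag = '</find-this-text>'
$NewText  = $OpenTag + 'New text' + $CloseTag

# Escape the tags so any regex metacharacters in them are matched literally.
$open  = [regex]::Escape($OpenTag)
$close = [regex]::Escape($CloseTag)

# Greedy: .* runs to the LAST closing tag, so both elements collapse into one.
$greedy = $line -replace "$open.*$close", $NewText

# Lazy: .*? stops at the FIRST closing tag, so each element is replaced separately.
$lazy = $line -replace "$open.*?$close", $NewText

$greedy
$lazy
```

If each line really holds at most one element, both forms give the same result, which matches the reasoning above.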

Related

Using Regex to replace multiple lines of text in file

I have a .bas file that I am looking to update. The script requires some manual configuration, and I don't want my team to need to reconfigure the script every time they run it. What I would like to do is have a tagged block like this
<BEGINREPLACEMENT>
'MsgBox ("Loaded")
ReDim Preserve STIGArray(i - 1)
ReDim Preserve SVID(i - 1)
STIGArray = RemoveDupes(STIGArray)
SVID = RemoveDupes(SVID)
<ENDREPLACEMENT>
I am somewhat familiar with PowerShell, so what I was trying to do is create an update file and replace what is in between the tags with the update. What I tried is:
$temp = Get-Content C:\Temp\file.bas
$update = Get-Content C:\Temp\update
$regex = "<BEGINREPLACEMENT>(.*?)<ENDREPLACEMENT>"
$temp -replace $regex, $update
$temp | Out-File C:\Temp\file.bas
The issue is that it isn't replacing the block of text. I can get it to replace either tag on its own, but I can't get it to pull in everything in between.
Does anyone have any thoughts as to how I can do this?
You need to make sure you read the whole files in with newlines, which is possible with the -Raw option passed to Get-Content.
Then, . does not match a newline char by default, hence you need to use a (?s) inline DOTALL (or "singleline") option.
Also, if your dynamic content contains something like $0 or $1, it will not be inserted literally: the regex engine reads it as a reference to the match or to a capture group and substitutes that text instead. You need to process the replacement string by doubling each $ in it.
$temp = Get-Content C:\Temp\file.bas -Raw
$update = Get-Content C:\Temp\update -Raw
$regex = "(?s)<BEGINREPLACEMENT>.*?<ENDREPLACEMENT>"
$temp -replace $regex, $update.Replace('$', '$$')
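To see the effect of both points, here is a small sketch that uses in-memory strings instead of the two files. The sample contents and the names $noDotAll, $broken and $fixed are made up for illustration.

```powershell
# Stand-ins for the two files; in practice both would be read with Get-Content -Raw.
$temp   = "Sub Demo()`n<BEGINREPLACEMENT>`nold line 1`nold line 2`n<ENDREPLACEMENT>`nEnd Sub"
$update = 'Total = "$0.50"'

# Without (?s), . stops at each newline, so the multi-line block never matches.
$noDotAll = $temp -replace '<BEGINREPLACEMENT>.*?<ENDREPLACEMENT>', $update

$regex = '(?s)<BEGINREPLACEMENT>.*?<ENDREPLACEMENT>'

# Unescaped: $0 in the replacement expands to the whole match (the old lines come back).
$broken = $temp -replace $regex, $update

# Escaped: every $ is doubled, so the update text is inserted literally.
$fixed = $temp -replace $regex, $update.Replace('$', '$$')
$fixed
```

Note that the two marker lines are part of the match, so they only survive if the update text itself contains them.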

Replace text between two string powershell

I have a question which I'm pretty much stuck on.
I have a file called xml_data.txt and another file called entry.txt
I want to replace everything between <core:topics> and </core:topics>
I have written the below script
$test = Get-Content -Path ./xml_data.txt
$newtest = Get-Content -Path ./entry.txt
$pattern = "<core:topics>(.*?)</core:topics>"
$result0 = [regex]::match($test, $pattern).Groups[1].Value
$result1 = [regex]::match($newtest, $pattern).Groups[1].Value
$test -replace $result0, $result1
When I run the script, it prints output to the console, but it doesn't look like it made any change.
Can someone please help me out?
Note: Typo error fixed
There are three main issues here:
You read the file line by line, but the blocks of text are multiline strings.
Your regex does not match across newlines, as . does not match a newline by default.
Also, literal text used as the regex pattern must be escaped, and when replacing with a dynamic replacement pattern, you must always dollar-escape the $ symbol. Or use the simple string .Replace method.
So, you need to
Read the whole file in to a single variable, $test = Get-Content -Path ./xml_data.txt -Raw
Use the $pattern = "(?s)<core:topics>(.*?)</core:topics>" regex (if it turns out to be too slow, it can be unrolled to <core:topics>([^<]*(?:<(?!/?core:topics>)[^<]*)*)</core:topics>)
Use $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$') to "protect" $ chars in the replacement, or $test.Replace($result0, $result1).
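Put together, the three steps could look roughly like this. It is only a sketch: the two strings stand in for the contents of xml_data.txt and entry.txt, their contents are invented, and $viaRegex and $viaString are hypothetical names.

```powershell
# Stand-ins for Get-Content -Raw on xml_data.txt and entry.txt (contents are made up).
$test    = "<doc>`n<core:topics>`n  old topic`n</core:topics>`n</doc>"
$newtest = "<core:topics>`n  new topic costing `$5`n</core:topics>"

# (?s) lets . match newlines, so the capture group can span several lines.
$pattern = '(?s)<core:topics>(.*?)</core:topics>'
$result0 = [regex]::Match($test, $pattern).Groups[1].Value
$result1 = [regex]::Match($newtest, $pattern).Groups[1].Value

# Option 1: regex replace, escaping both the pattern and the replacement.
$viaRegex = $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$')

# Option 2: plain string replace, where no escaping is needed.
$viaString = $test.Replace($result0, $result1)

$viaRegex
```

Both options replace every occurrence of the captured text, so they assume the old block content does not also appear elsewhere in the file.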

How can I make this PowerShell script more efficient?

I am trying to make a script that takes an XML file and looks for a matching condition; where it finds one, it adds a new line of asterisks. When it is done going through the file, it strips all the XML tags and leaves the data in a plain text file.
The script has been tested on a small input XML file and works fine, but when I pass it a large XML file it takes forever (not actually sure how long, as I ran it for over an hour with still no result, so I just stopped it).
I'm guessing I must be performing the work in an extremely inefficient manner, and I'm hoping you guys can help me make it fast and efficient.
Here is the script below:
# Takes input XML File, cleans up XML elements, outputs plain text file
$FileName = "C:\Users\someguy\Desktop\input.xml"
$Pattern = "ProcessSpecifier = ""true"""
$FileOriginal = Get-Content $FileName
[String[]] $FileModified = @()
Foreach ($Line in $FileOriginal)
{
    $FileModified += $Line
    if ($Line -match $Pattern)
    {
        # Add lines after the selected pattern
        $FileModified += "*************isActive=true*****************"
    }
}
$FileModified -replace "<[^>]+>", "" | Out-File C:\Users\someguy\Desktop\Output.txt
Let's go with a look behind and a bunch of regex to speed things up here. Also, I'm not going to store the whole thing in memory, I'm just going to pass it down the pipeline, which should help. I remove whitespace from the beginning and ends of lines, and filter out blank lines, but you can remove that bit if you want.
# Takes input XML File, cleans up XML elements, outputs plain text file
$FileName = "C:\Users\someguy\Desktop\input.xml"
$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
(Get-Content $FileName) -replace $Pattern, "`n*************isActive=true*****************" -replace '<[^>]+?>' -replace '^\s+|\s+$' | ?{$_} | Set-Content C:\Users\someguy\Desktop\Output.txt
So, the main thing here is that I use a look behind to find your pattern text, and then add a new line and the asterisk line to that line. So that the line
<SomeTag>ProcessSpecifier = "true"</SomeTag>
becomes:
<SomeTag>ProcessSpecifier = "true"</SomeTag>`n*************isActive=true*****************
When used inside double quotes, a backtick ` followed by n creates a new line, so the '*************isActive=true*****************' is on its own line immediately following your search pattern line. Past that I remove the XML tags, and then any leading or trailing whitespace from any line.
After the RegEx replacements I pass the result to a Where statement that removes blank lines, and then pass the remaining lines to Set-Content which I've seen better performance out of than Out-File.
Variation of TheMadTechnician's answer:
# Takes input XML File, cleans up XML elements, outputs plain text file
$FileName = "C:\Users\someguy\Desktop\input.xml"
$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
Set-Content -Path C:\Users\someguy\Desktop\Output.txt -Value (((Get-Content $FileName) -replace $Pattern, "`n*************isActive=true*****************" -replace '<[^>]+?>' -replace '^\s+|\s+$').Where{$_})
I actually try to avoid the pipeline; it is rather slow afaik. Of course, you will run into problems with memory consumption if the files are very large.
The "().Where" construct doesn't work on all powershell versions (Version 4+ iirc).
This is a guess, I am not sure whether this is actually faster than TheMadTechnician's. I'd be curious about the result :)
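For anyone who wants to check, a rough timing harness could look like the sketch below. It assumes generated in-memory sample lines rather than the real file, and $lines, $marker, $viaPipeline, $viaMethod and $same are hypothetical names. It needs PowerShell 4 or later because of .Where, and the timings will vary by machine and version.

```powershell
# Hypothetical in-memory sample; a real test would read the large XML file instead.
$lines = 1..2000 | ForEach-Object {
    if ($_ % 10 -eq 0) { '  <SomeTag>ProcessSpecifier = "true"</SomeTag>  ' } else { "  <Other>value $_</Other>" }
}
$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
$marker  = "`n*************isActive=true*****************"

# Variant 1: filter blank lines through the pipeline.
$viaPipeline = { $lines -replace $Pattern, $marker -replace '<[^>]+?>' -replace '^\s+|\s+$' | Where-Object { $_ } }

# Variant 2: filter blank lines with the .Where() method instead.
$viaMethod = { ($lines -replace $Pattern, $marker -replace '<[^>]+?>' -replace '^\s+|\s+$').Where{ $_ } }

# Both variants should produce the same text before the timings are compared.
$same = ((& $viaPipeline) -join "`n") -eq ((& $viaMethod) -join "`n")

"Same output: $same"
"Pipeline: {0:N0} ms" -f (Measure-Command $viaPipeline).TotalMilliseconds
"Method:   {0:N0} ms" -f (Measure-Command $viaMethod).TotalMilliseconds
```

This only times the in-memory part; reading and writing the file are left out on purpose so the two filtering styles can be compared on their own.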

How can I extract strings from some text file with powershell script?

I want to extract some strings from some text files. After studying those files, I found a pattern in where the strings appear.
I put together a short PowerShell script with the help of a Google search. The script receives two parameters (the text file path and the extraction keyword) and extracts the strings from the text file.
After finding and extracting the target strings from the file $tpath\temp.txt, the script saves them to another file, $tpath\tmpVI.txt.
Set-PSDebug -Trace 2 -step
$txtpath=$args[0]
$exkey=$args[1]
$tfile=gc "$tpath\temp.txt"
$savextracted="$tpath\tmpVI.txt"
$tfile -replace '&amp;', '&' -replace '^.*$exkey', '' -replace '\s.*$', '' -replace '\\.*$','' | out-file "$savextracted" -encoding ascii
But so far the extracted and saved result has been wrong; it never contains the strings I want.
From PS debugging, it seems the regular expressions in the last line are causing trouble, as is the variable $exkey inside the quoted replace pattern. But I don't know how to fix this. What should I do?
If you're looking to capture lines that have your match, here's a snippet that solves that problem:
Function Get-Matches
{
    Param(
        [Parameter(Mandatory,Position=0)]
        [String] $Path,
        [Parameter(Mandatory,Position=1)]
        [String] $Regex
    )
    # Wrap in @() so -match always filters an array, even for a one-line file
    @(Get-Content -Path $Path) -match $Regex
}
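As for the $exkey problem in the question: variables are not expanded inside single-quoted strings, so '^.*$exkey' is the literal pattern ^.*$exkey, which cannot match. Below is a hedged sketch of one way around it, using a made-up keyword and made-up sample lines in place of $args[1] and temp.txt; $literal, $expanded, $hits and $extracted are hypothetical names.

```powershell
# Hypothetical keyword and sample lines standing in for $args[1] and temp.txt.
$exkey = 'VI='
$tfile = 'prefix VI=alpha rest of line', 'no keyword here', 'x VI=beta\tail'

# Single quotes: the pattern is the literal text ^.*$exkey, so nothing is expanded.
$literal = '^.*$exkey'

# Double quotes expand the subexpression; [regex]::Escape guards metacharacters in the keyword.
$expanded = "^.*$([regex]::Escape($exkey))"

# Keep only the lines that contain the keyword, then strip everything around the value.
$hits      = @($tfile) -match $expanded
$extracted = $hits -replace $expanded, '' -replace '\s.*$', '' -replace '\\.*$', ''
$extracted
```

Filtering with -match first means lines without the keyword are dropped instead of being passed through the later replacements.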

Powershell - How to UpperCase a string found with a Regex [duplicate]

This question already has answers here:
Lambda Expression in Powershell
I am writing a PowerShell script to parse an HTM file. I need to find all the file links in the file and then uppercase the file path, file name and extension (there could be 30 or 40 links in any file). The part I'm having trouble with is the second argument of the -replace statement below (the 'XXXX' part). The regex WILL find the strings I'm looking for, but I can't figure out how to replace each string with an uppercase version and then update the existing file with the new links.
I hope I'm explaining this correctly. Appreciate any assistance that anyone can provide.
This is the code I have so far...
$FilePath = 'C:\WebDev'
$FileName = 'Class.htm'
[regex]$regex='(href="[^.]+\.htm")'
#Will Match the string href="filepath/file.htm"
( Get-Content "$FilePath\$FileName") -replace $regex , 'XXXX' | Set-Content "$FilePath\$FileName";
Final string that gets updated in the existing file should look like this HREF="FILEPATH/FILE.HTM"
Both beatcracker and briantist refer you to this answer, which shows the correct approach. Regex expressions cannot convert to uppercase, so you need to hook into the .NET String.ToUpper() function.
Instead of using -replace, use the .Replace() method on your $regex object (as described in the other answer). You also need the ForEach-Object construct so it gets called for each string in the pipeline. I've split up the last line for readability, but you can keep it on one line if you must.
$FilePath = 'C:\WebDev'
$FileName = 'Class.htm'
[regex]$regex='(href="[^.]+\.htm")'
(Get-Content "$FilePath\$FileName") |
    ForEach-Object { $regex.Replace($_, {$args[0].Value.ToUpper()}) } |
    Set-Content "$FilePath\$FileName"
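To try the replacement without touching a file, the same call can be run on an in-memory string. This is just a sketch; the sample HTML and the names $html and $upper are invented for illustration.

```powershell
[regex]$regex = '(href="[^.]+\.htm")'

# Hypothetical sample line with two links.
$html = '<a href="docs/intro.htm">Intro</a> <a href="docs/setup.htm">Setup</a>'

# The script block is converted to a MatchEvaluator; $args[0] is the current Match object.
$upper = $regex.Replace($html, { $args[0].Value.ToUpper() })
$upper
```

Only the matched href="..." portions are uppercased; the link text and the rest of the line are left as they were.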