Replace text between two strings in PowerShell - regex

I have a question which I'm pretty much stuck on.
I have a file called xml_data.txt and another file called entry.txt
I want to replace everything between <core:topics> and </core:topics>
I have written the below script
$test = Get-Content -Path ./xml_data.txt
$newtest = Get-Content -Path ./entry.txt
$pattern = "<core:topics>(.*?)</core:topics>"
$result0 = [regex]::match($test, $pattern).Groups[1].Value
$result1 = [regex]::match($newtest, $pattern).Groups[1].Value
$test -replace $result0, $result1
When I run the script, it outputs to the console, but it doesn't look like it made any change.
Can someone please help me out?

There are three main issues here:
You read the file line by line, but the blocks of text are multiline strings
Your regex does not match newlines, as . does not match a newline by default
Also, the literal text you extract must be regex-escaped when used as a pattern, and when replacing with a dynamic replacement string you must dollar-escape the $ symbols in it. Or use simple string .Replace.
So, you need to
Read the whole file into a single variable, $test = Get-Content -Path ./xml_data.txt -Raw
Use the $pattern = "(?s)<core:topics>(.*?)</core:topics>" regex (it can be enhanced in case it works too slow by unrolling it to <core:topics>([^<]*(?:<(?!</?core:topics>).*)*)</core:topics>)
Use $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$') to "protect" $ chars in the replacement, or $test.Replace($result0, $result1).
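Putting it all together, a minimal sketch of the corrected script (assuming each file contains exactly one <core:topics> block; the output file name xml_data_updated.txt is just an example):
# Read both files as single multiline strings
$test    = Get-Content -Path ./xml_data.txt -Raw
$newtest = Get-Content -Path ./entry.txt -Raw
# (?s) lets . match newlines so the block can span multiple lines
$pattern = "(?s)<core:topics>(.*?)</core:topics>"
# Extract the inner text of each block
$result0 = [regex]::Match($test, $pattern).Groups[1].Value
$result1 = [regex]::Match($newtest, $pattern).Groups[1].Value
# Regex-escape the search text, dollar-escape the replacement, then write the result out
$test -replace [regex]::Escape($result0), $result1.Replace('$', '$$') |
    Set-Content -Path ./xml_data_updated.txt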

Related

Regex involving newline for PowerShell script

I have a long text file that looks like this:
("B3501870","U00357"),
INSERT INTO [dbo].[Bnumbers] VALUES
("B3501871","U11019"),
("B3501899","U28503"),
I want every line before INSERT to end not with , but with ; instead.
So the end result should look like this:
("B3613522","U00357");
INSERT INTO [dbo].[Bnumbers] VALUES
("B3615871","U11019"),
("B3621899","U28503"),
I tried multiple ways to achieve this but it does not appear to work with multiple lines.
One way I tried was like this:
(Get-Content -path C:\temp\bnr\list.sql -Raw) -replace ",\nINSERT", ";\nINSERT" | Add-Content -Path C:\temp\bnr\test.sql
Tried with
[io.file]::ReadAllText("C:\temp\bnr\list.sql")
hoping it would treat the file as one giant string, but to no avail.
Any way to tell PS to find comma+newline+INSERT and do changes to it?
,\nINSERT
works in Sublime Text with regex, but not in PS.
You can use
(Get-Content -path C:\temp\bnr\list.sql -Raw) -replace ',(\r?\nINSERT)', ';$1'
Or,
(Get-Content -path C:\temp\bnr\list.sql -Raw) -replace ',(?=\r?\nINSERT)', ';'
The ,(?=\r?\nINSERT) regex matches a comma that is immediately followed by an optional CR char, then an LF char, then the text INSERT. The ,(\r?\nINSERT) variation captures the CRLF/LF line ending plus the INSERT string into Group 1, hence the $1 backreference in the replacement pattern that puts this text back into the result.
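For completeness, a minimal end-to-end sketch using the paths from the question (Set-Content is used rather than Add-Content so repeated runs don't append to the output file):
# Read the file as one string so the pattern can see the line breaks
$sql = Get-Content -Path C:\temp\bnr\list.sql -Raw
# Replace the comma only when it is immediately followed by a newline and INSERT
$sql -replace ',(?=\r?\nINSERT)', ';' | Set-Content -Path C:\temp\bnr\test.sql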

PowerShell script to replace link:lalala.html[lalala] with xref:lalala.adoc[lalala]: capture pattern and replace recursively

I have a folder full of text documents in .adoc format that have some text in them. The text is following: link:lalala.html[lalala]. I want to replace this text with xref:lalala.adoc[lalala]. So, basically, just replace link: with xref:, .html with .adoc, leave all the rest unchanged.
But the problem is that lalala can be anything from a word to ../topics/halva.html.
I definitely know that I need to use regex patterns; I previously used a similar script, with the replace directives wrapped in an object:
Get-ChildItem -Path *.adoc -file -recurse | ForEach-Object {
    $lines = Get-Content -Path $PSItem.FullName -Encoding UTF8 -Raw
    $patterns = @{
        '(\[\.dfn \.term])#(.*?)#' = '$1_$2_' ;
    }
    $option = [System.Text.RegularExpressions.RegexOptions]::Singleline
    foreach($k in $patterns.Keys){
        $pat = [regex]::new($k, $option)
        $lines = $pat.Replace($lines, $patterns.$k)
    }
    $lines | Set-Content -Path $PSItem.FullName -Encoding UTF8 -Force
}
It looks like I need a different script, since the new task cannot be added as just another object. I could have just replaced each part separately, using two objects: replace link: with xref:, then replace .html with .adoc.
But this can interfere with other links that end with .html and don't start with link:. In the text, absolute links usually don't have link: in the beginning. They always start with http:// or https://. And they still may or may not end with .html. So the best idea is to take the whole string link:lalala.html[lalala] and try to replace it with xref:lalala.adoc[lalala].
I need the help of someone who knows regex and PowerShell; this would really save me.
As a pattern, you might use
\blink:(.+?)\.html(?=\[[^][]*])
\blink: Match link:
(.+?) Capture 1+ chars, as few as possible, in group 1
\.html match .html
(?=\[[^][]*]) Assert an opening square bracket, any chars other than square brackets, and a closing square bracket directly to the right
In the replacement, refer to group 1 using $1:
xref:$1.adoc
Example
$Strings = @("link:lalala.html[lalala]", "link:../topics/halva.html[../topics/halva.html]")
$Strings -replace "\blink:(.+?)\.html(?=\[[^][]*])",'xref:$1.adoc'
Output
xref:lalala.adoc[lalala]
xref:../topics/halva.adoc[../topics/halva.html]
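If it helps, here is an untested sketch of how the pattern could be folded back into the recursive loop from the question, simply by adding it to the hashtable (the Singleline option is kept from the original script and only changes how . treats newlines):
Get-ChildItem -Path *.adoc -File -Recurse | ForEach-Object {
    $lines = Get-Content -Path $PSItem.FullName -Encoding UTF8 -Raw
    $patterns = @{
        '(\[\.dfn \.term])#(.*?)#'        = '$1_$2_'
        '\blink:(.+?)\.html(?=\[[^][]*])' = 'xref:$1.adoc'
    }
    $option = [System.Text.RegularExpressions.RegexOptions]::Singleline
    foreach ($k in $patterns.Keys) {
        $pat = [regex]::new($k, $option)
        $lines = $pat.Replace($lines, $patterns.$k)
    }
    $lines | Set-Content -Path $PSItem.FullName -Encoding UTF8 -Force
}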

Using Regex to replace multiple lines of text in file

Basically, I have a .bas file that I am looking to update. The script requires some manual configuration, and I don't want my team to have to reconfigure it every time they run it. What I would like to do is have a tag like this:
<BEGINREPLACEMENT>
'MsgBox ("Loaded")
ReDim Preserve STIGArray(i - 1)
ReDim Preserve SVID(i - 1)
STIGArray = RemoveDupes(STIGArray)
SVID = RemoveDupes(SVID)
<ENDREPLACEMENT>
I am kind of familiar with PowerShell, so my idea was to create an update file and replace what is in between the tags with the contents of that update. What I was trying to do is:
$temp = Get-Content C:\Temp\file.bas
$update = Get-Content C:\Temp\update
$regex = "<BEGINREPLACEMENT>(.*?)<ENDREPLACEMENT>"
$temp -replace $regex, $update
$temp | Out-File C:\Temp\file.bas
The issue is that it isn't replacing the block of text. I can get it to replace one tag or the other, but I can't get it to pull in everything in between.
Does anyone have any thoughts as to how I can do this?
You need to make sure you read the whole files in with newlines, which is possible with the -Raw option passed to Get-Content.
Then, . does not match a newline char by default, hence you need to use a (?s) inline DOTALL (or "singleline") option.
Also, if your dynamic content contains something like $2 you may get an exception since this is a backreference to Group 2 that is missing from your pattern. You need to process the replacement string by doubling each $ in it.
$temp = Get-Content C:\Temp\file.bas -Raw
$update = Get-Content C:\Temp\update -Raw
$regex = "(?s)<BEGINREPLACEMENT>.*?<ENDREPLACEMENT>"
$temp -replace $regex, $update.Replace('$', '$$')
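Note that -replace returns a new string rather than modifying $temp in place, so to persist the change the result still has to be captured and written back, for example (a sketch, reusing the paths from the question):
$result = $temp -replace $regex, $update.Replace('$', '$$')
$result | Set-Content C:\Temp\file.bas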

How to use regex to remove everything except a string containing a certain "key"

Running my code gives me this output in a txt file:
19:27:28.636 ASSOS\032AB5601\0223-\032312DEEE8EB423._http._tcp.local. can
be reached at ASSOS-032DEEE8EB423.local.:80 (interface 1)
So I just want to parse out the string "ASSOS-032DEEE8EB423.local" and remove everything else from the txt file. I can't figure out how to write a regex that removes everything except the string containing ASSOS-. The string will always contain ASSOS-, but the rest changes to different numbers each time, so I'm trying to always be able to get ASSOS-XXXXXXXXXXX.local.
This is how I'm trying to do:
$string = 'Get-Content C:\MyFile.Txt'
$pattern = ''
$string -replace $pattern, ' '
It's just that I don't know much about regex or how to write a pattern that parses out the string containing "ASSOS-" and removes everything after ASSOS-XXXXXXXXXXX.local.
I would pipe the file content to Select-String and return the values of matches for a string starting with "ASSOS-", ending with "local" and having whatever non-whitespace characters in between:
Get-Content test.txt | Select-String -Pattern "ASSOS-\S*local" | ForEach-Object {$_.Matches.Value}
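Since the goal is to leave only the extracted value(s) in a text file, the matches can be piped straight on to Set-Content, for example (a sketch; output.txt is an assumed file name):
Get-Content test.txt |
    Select-String -Pattern "ASSOS-\S*local" |
    ForEach-Object { $_.Matches.Value } |
    Set-Content output.txt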
A possible solution:
$str = "19:27:28.636 ASSOS\032AB5601\0223-\032312DEEE8EB423._http._tcp.local. can
be reached at **ASSOS-032DEEE8EB423.local**.:80 (interface 1)"
$str -replace '.*\*\*(.*?)\*\*.*', '$1'
The RegEx .*\*\*(.*?)\*\*.* captures all characters within **...**. The * have to be escaped by a \ to make it work.

How can I make this PowerShell script more efficient?

I am trying to make a script that takes an XML file, looks for a matching condition, and, if it finds it, adds a new line of asterisks; then, once it has gone through the file, it strips all the XML tags and leaves the data in a plain text file.
The script has been tested on a small input XML file and works fine, but when I pass it a large XML file it takes forever (I'm not actually sure how long, as I ran it for over an hour with no result and just stopped it).
I'm guessing I must be performing the work in an extremely inefficient manner, hoping you guys can help me make it fast and efficient.
Here is the script below:
# Takes input XML File, cleans up XML elements, outputs plain text file
$FileName = "C:\Users\someguy\Desktop\input.xml"
$Pattern = "ProcessSpecifier = ""true"""
$FileOriginal = Get-Content $FileName
[String[]] $FileModified = @()
Foreach ($Line in $FileOriginal)
{
    $FileModified += $Line
    if ($Line -match $Pattern)
    {
        # Add lines after the selected pattern
        $FileModified += "*************isActive=true*****************"
    }
}
$FileModified -replace "<[^>]+>", "" | Out-File C:\Users\someguy\Desktop\Output.txt
Let's go with a lookbehind and a bunch of regex to speed things up here. Also, I'm not going to store the whole thing in memory; I'm just going to pass it down the pipeline, which should help. I remove whitespace from the beginning and end of lines and filter out blank lines, but you can remove that bit if you want.
# Takes input XML File, cleans up XML elements, outputs plain text file
$FileName = "C:\Users\someguy\Desktop\input.xml"
$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
(Get-Content $FileName) -replace $Pattern, "`n*************isActive=true*****************" -replace '<[^>]+?>' -replace '^\s*|\s$' | ?{$_} | Set-Content C:\Users\someguy\Desktop\Output.txt
So, the main thing here is that I use a lookbehind to find your pattern text and then add a newline plus the asterisk line to that line, so that the line
<SomeTag>ProcessSpecifier = "true"</SomeTag>
becomes:
<SomeTag>ProcessSpecifier = "true"</SomeTag>`n*************isActive=true*****************
When used inside double quotes, a backtick ` followed by n creates a newline, so the '*************isActive=true*****************' ends up on its own line immediately following your search-pattern line. Past that, I remove the XML tags and then any leading or trailing whitespace from each line.
After the regex replacements I pass the result to a Where statement that removes blank lines, and then pass the remaining lines to Set-Content, which I've seen better performance from than Out-File.
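As a quick sanity check of just the insertion step, you can run the lookbehind replace against a single sample line (a sketch, independent of the full pipeline above):
$line = '<SomeTag>ProcessSpecifier = "true"</SomeTag>'
$line -replace '(?<=^.*ProcessSpecifier = "true".*$)', "`n*************isActive=true*****************"
# Expected console output:
# <SomeTag>ProcessSpecifier = "true"</SomeTag>
# *************isActive=true*****************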
Variation of TheMadTechnician's answer:
# Takes input XML File, cleans up XML elements, outputs plain text file
$FileName = "C:\Users\someguy\Desktop\input.xml"
$Pattern = '(?<=^.*ProcessSpecifier = "true".*$)'
Set-Content -Path C:\Users\someguy\Desktop\Output.txt -Value (((Get-Content $FileName) -replace $Pattern, "`n*************isActive=true*****************" -replace '<[^>]+?>' -replace '^\s*|\s$').Where{$_})
I actually try to avoid the pipeline; it is rather slow, AFAIK. Of course, you will run into problems with memory consumption if the files are very large.
The (...).Where construct doesn't work on all PowerShell versions (it requires version 4+, IIRC).
This is a guess, I am not sure whether this is actually faster than TheMadTechnician's. I'd be curious about the result :)
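For what it's worth, here is one more untested sketch that sidesteps the memory question entirely by streaming the lines with [System.IO.File]::ReadLines and writing through a StreamWriter; it follows the same logic as the answers above (strip tags, trim, drop blank lines, add the marker line after the trigger line):
$FileName = "C:\Users\someguy\Desktop\input.xml"
$OutFile  = "C:\Users\someguy\Desktop\Output.txt"
$writer = New-Object System.IO.StreamWriter $OutFile
try {
    foreach ($line in [System.IO.File]::ReadLines($FileName)) {
        # Strip the XML tags and surrounding whitespace from the current line
        $clean = ($line -replace '<[^>]+?>').Trim()
        if ($clean) { $writer.WriteLine($clean) }
        # Add the marker line right after any line containing the trigger text
        if ($line -match 'ProcessSpecifier = "true"') {
            $writer.WriteLine('*************isActive=true*****************')
        }
    }
}
finally {
    $writer.Dispose()
}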