How to use regex to remove everything except certain "key"/"character containing" - regex

Running my code gives me this output in a txt file:
19:27:28.636 ASSOS\032AB5601\0223-\032312DEEE8EB423._http._tcp.local. can
be reached at ASSOS-032DEEE8EB423.local.:80 (interface 1)
I just want to extract the string "ASSOS-032DEEE8EB423.local" and remove everything else from the txt file, but I can't figure out how to write a regex that keeps only the string containing ASSOS-. The string always contains ASSOS-, but the rest changes to different numbers each time, so I need to reliably get ASSOS-XXXXXXXXXXX.local.
This is what I'm trying:
$string = 'Get-Content C:\MyFile.Txt'
$pattern = ''
$string -replace $pattern, ' '
I don't know much about regex, so I'm not sure how to write a pattern that extracts the string containing "ASSOS-" and drops everything after ASSOS-XXXXXXXXXXX.local.

I would pipe the file content to Select-String and return the values of matches for a string starting with "ASSOS-", ending with "local" and having whatever non-whitespace characters in between:
Get-Content test.txt | Select-String -Pattern "ASSOS-\S*local" | ForEach-Object {$_.Matches.Value}
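As a minimal sketch of the same pipeline, the snippet below feeds the two sample lines from the question in as strings instead of reading test.txt. The variable names $lines and $names are illustrative only and not part of the original answer.

```powershell
# Sample lines from the question, used in place of Get-Content test.txt.
$lines = @(
    '19:27:28.636 ASSOS\032AB5601\0223-\032312DEEE8EB423._http._tcp.local. can'
    'be reached at ASSOS-032DEEE8EB423.local.:80 (interface 1)'
)

# Keep only the matched text: "ASSOS-", any run of non-whitespace, then "local".
$names = $lines |
    Select-String -Pattern 'ASSOS-\S*local' |
    ForEach-Object { $_.Matches.Value }

$names   # ASSOS-032DEEE8EB423.local
```

The first line does not contain "ASSOS-" (only "ASSOS\032..."), so it produces no match and only the host name from the second line is returned.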

A possible solution:
$str = "19:27:28.636 ASSOS\032AB5601\0223-\032312DEEE8EB423._http._tcp.local. can
be reached at **ASSOS-032DEEE8EB423.local**.:80 (interface 1)"
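$str -replace '(?s).*\*\*(.*?)\*\*.*', '$1'
The regex .*\*\*(.*?)\*\*.* captures all characters within **...**. Each * has to be escaped with a \ to be matched literally, and the (?s) option lets . match the line break in the sample string, so the first line is removed as well.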

Related

Reg ex involving new line for Powershell script

I have a long text file that looks like this:
("B3501870","U00357"),
INSERT INTO [dbo].[Bnumbers] VALUES
("B3501871","U11019"),
("B3501899","U28503"),
I want every line before INSERT to end not with , but with ; instead.
So the end result should look like this:
("B3613522","U00357");
INSERT INTO [dbo].[Bnumbers] VALUES
("B3615871","U11019"),
("B3621899","U28503"),
I tried multiple ways to achieve this but it does not appear to work with multiple lines.
One way I tried was like this:
(Get-Content -path C:\temp\bnr\list.sql -Raw) -replace ",\nINSERT", ";\nINSERT" | Add-Content -Path C:\temp\bnr\test.sql
Tried with
[io.file]::ReadAllText("C:\temp\bnr\list.sql")
hoping it would treat the file as one giant string, but to no avail.
Is there any way to tell PS to find comma+newline+INSERT and change it?
,\nINSERT
works in Sublime Text with regex, but not in PS.
You can use
(Get-Content -path C:\temp\bnr\list.sql -Raw) -replace ',(\r?\nINSERT)', ';$1'
Or,
(Get-Content -path C:\temp\bnr\list.sql -Raw) -replace ',(?=\r?\nINSERT)', ';'
The ,(?=\r?\nINSERT) regex matches a comma that is immediately followed by an optional CR char, then an LF char, then the text INSERT. The ,(\r?\nINSERT) variation captures the CRLF/LF line ending plus the INSERT string into Group 1, hence the $1 backreference in the replacement pattern, which puts this text back into the result.
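To see both variants in action without touching a file, here is a small sketch that builds a CRLF-terminated sample in memory. The variable names are illustrative, and the sample assumes Windows line endings, which is a likely reason the plain ,\nINSERT pattern found nothing.

```powershell
# Three sample lines joined with CRLF, standing in for Get-Content -Raw.
$sql = @(
    '("B3501870","U00357"),'
    'INSERT INTO [dbo].[Bnumbers] VALUES'
    '("B3501871","U11019"),'
) -join "`r`n"

# Variant 1: capture the line break + INSERT and put it back with $1.
$viaGroup = $sql -replace ',(\r?\nINSERT)', ';$1'

# Variant 2: a lookahead only asserts what follows, so nothing needs restoring.
$viaLookahead = $sql -replace ',(?=\r?\nINSERT)', ';'

$viaLookahead
```

Both variants change only the comma directly before the INSERT line; the trailing comma on the last line is left alone.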

Powershell: append text after string in file

Problem: I am trying to append a string after a tag. I have a large text file, and I only need to append some text after the tag <xxxxxx> (including the text xxxxxx), but I cannot figure it out yet.
Currently I'm trying this regex: <[(xxxxxx)]+>, which according to regex101.com matches the exact tag <xxxxxx>, but when I use it in PowerShell it returns a lot of other stuff.
How can I make sure that PowerShell only matches <xxxxxx>? And how do I append some string after <xxxxxx>?
Sample snippet from the text file: PredefinedSettings=<xxxxxx><abc test123 /abc></xxxxxx>
Sample PS command: Get-Content .\samplefile.ini | Select-String -Pattern "<[(xxxxxx)]+>"
Which returns the entire line PredefinedSettings=<xxxxxx><abc test123 /abc></xxxxxx> instead of just <xxxxxx>.
If you want to output just the matched text, you can do the following:
Select-String -Path sample.ini -Pattern '<(/?xxxxxx)>' -AllMatches | Foreach-Object {
    $_.Matches.Groups[1].Value # Outputs matched text between `<>`
    $_.Matches.Value # Outputs all matched text
}
The -AllMatches switch will allow matching beyond the first match. So it would return <xxxxxx> and </xxxxxx>.
If you want to replace text in a file, you can do the following:
(Get-Content .\samplefile.ini) -replace '<(/?xxxxxx)>','<$1Text>' |
    Set-Content .\samplefile.ini
If your replacement text is in a variable, you need a double-quoted string, so you must escape the $ of the capture group with a backtick.
$Text = 'replacement Text'
(Get-Content .\samplefile.ini) -replace '<(/?xxxxxx)>',"<`$1$Text>" |
    Set-Content .\samplefile.ini
$1 is the capture group 1 data matched within the first (). Depending on your Text, it may be wise to name your capture group. If Text is 23OtherText, <$123OtherText> will attempt to substitute capture group 123. Using a named capture group, you can do the following:
(Get-Content .\samplefile.ini) -replace '<(?<Tag>/?xxxxxx)>','<${Tag}Text>' |
    Set-Content .\samplefile.ini
/? matches zero or one / character.
-replace will return all text not matched and all text replaced by the operator.
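The named-group advice can be tried on the sample line from the question. This is only a sketch: $line, $text and $named are illustrative names, and the replacement text 23OtherText is the hypothetical value mentioned above.

```powershell
$line = 'PredefinedSettings=<xxxxxx><abc test123 /abc></xxxxxx>'
$text = '23OtherText'   # starts with digits, so "$1" followed by it is ambiguous

# Numbered group: the replacement string becomes <$123OtherText>,
# which refers to group 123 rather than group 1.
$numbered = $line -replace '<(/?xxxxxx)>', "<`$1$text>"

# Named group: ${Tag} is delimited by braces, so the digits cannot merge with it.
$named = $line -replace '<(?<Tag>/?xxxxxx)>', "<`${Tag}$text>"

$named   # PredefinedSettings=<xxxxxx23OtherText><abc test123 /abc></xxxxxx23OtherText>
```

The backtick before $ keeps PowerShell from expanding ${Tag} as a variable, so the regex engine receives it unchanged.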
I hope I understood your question correctly.
In regex, quantifiers are greedy by default, so a pattern can match from the first opening tag to the last closing tag; you can change that by appending a ? to the quantifier.
So your regex would be <[(xxxxxx)]+?>.

Replace text between two string powershell

I have a question that I'm pretty much stuck on.
I have a file called xml_data.txt and another file called entry.txt
I want to replace everything between <core:topics> and </core:topics>
I have written the below script
$test = Get-Content -Path ./xml_data.txt
$newtest = Get-Content -Path ./entry.txt
$pattern = "<core:topics>(.*?)</core:topics>"
$result0 = [regex]::match($test, $pattern).Groups[1].Value
$result1 = [regex]::match($newtest, $pattern).Groups[1].Value
$test -replace $result0, $result1
When I run the script, it prints output to the console, but it doesn't look like any change was made.
Can someone please help me out?
Note: Typo error fixed
There are three main issues here:
You read the file line by line, but the blocks of text are multiline strings.
Your regex does not match across lines, as . does not match a newline by default.
The literal text used as the regex pattern must be escaped, and when replacing with a dynamic replacement pattern, you must always dollar-escape the $ symbol. Alternatively, use the simple string .Replace method.
So, you need to
Read the whole file into a single variable: $test = Get-Content -Path ./xml_data.txt -Raw
Use the $pattern = "(?s)<core:topics>(.*?)</core:topics>" regex (if it turns out to be too slow, it can be unrolled to <core:topics>([^<]*(?:<(?!/?core:topics>)[^<]*)*)</core:topics>)
Use $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$') to "protect" $ chars in the replacement, or $test.Replace($result0, $result1).
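Put together, the three fixes might look like the sketch below. It uses in-memory strings in place of xml_data.txt and entry.txt (which would be read with Get-Content -Raw), and the sample contents, including the $ characters, are made up to show why the escaping matters.

```powershell
# Stand-ins for the two files, each held as one multiline string.
$test = @(
    '<root>'
    '<core:topics>'
    'old $1 topic'
    '</core:topics>'
    '</root>'
) -join "`n"
$newtest = @(
    '<core:topics>'
    'new topic costing $5'
    '</core:topics>'
) -join "`n"

# (?s) lets . match newlines, so the block can span several lines.
$pattern = '(?s)<core:topics>(.*?)</core:topics>'
$result0 = [regex]::Match($test, $pattern).Groups[1].Value
$result1 = [regex]::Match($newtest, $pattern).Groups[1].Value

# Regex route: escape the search text and double every $ in the replacement.
$viaRegex = $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$')

# Plain-string route: no escaping needed at all.
$viaString = $test.Replace($result0, $result1)

$viaString
```

Both routes give the same text here; the plain .Replace call is the simpler choice when no regex features are needed.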

Powershell regex match sequence doesn't work although it matches in Sublime Text find and replace

I am trying to create a Powershell regex statement to remove the top five lines of this output from a git diff file that has already been modified with Powershell regex.
[1mdiff --git a/uk1.adoc b/uk2.adoc</span>+++
[1mindex b5d3bf7..90299b8 100644</span>+++
[1m--- a/uk1.adoc</span>+++
[1m+++ b/uk2.adoc</span>+++
[36m## -1,9 +1,9 ##</span>+++
= Heading
Body text
Image shown because binary code doesn't show in the text
The following statement matches the text so the '= Heading' line is placed at the top of the page if I replace with nothing.
^[^=]*.[+][\n]
But in Powershell, it isn't matching the text.
Get-Content "result2.adoc" | % { $_ -Replace '^[^=]*.[+][\n]', '' } | Out-File "result3.adoc";
Any ideas about why it doesn't work in Powershell?
My overall goal is to create a diff file of two versions of an AsciiDoc file and then replace the ASCII codes with HTML/CSS code to display the resulting AsciiDoc file with green/red track changes.
The simplest - and faster - approach is to read the input file as a single, multiline string with Get-Content -Raw and let the regex passed to -replace operate across multiple lines:
(Get-Content -Raw result2.adoc) -replace '(?s)^.+?\n(?==)' |
Set-Content result3.adoc
(?s) activates in-line option s which makes . match newline (\n) characters too.
^.+?\n(?==) matches from the start of the string (^) one or more characters (including newlines) (.+), non-greedily (?), until a newline (\n) followed by a = is found.
(?=...) is a look-ahead assertion, which matches = without consuming it, i.e., without considering it part of the substring that matched.
Since no replacement operand is passed to -replace, the entire match is replaced with the implied empty string, i.e., what was matched is effectively removed.
As for what you tried:
The -replace operator passes its LHS through if no match is found, so you cannot use it to filter out non-matching lines.
Even if you match an undesired line in full and replace it with '' (the empty string), it will show up as an empty line in the output when sent to Set-Content or Out-File (>).
As for your specific regex, ^[^=]*.[+][\n] (whether or not the first ^ is followed by an ESC (0x1b) char.):
[\n] (just \n would suffice) tries to match a newline char. after a literal + ([+]), yet lines read individually with Get-Content (without -Raw) by definition are stripped of their trailing newline, so the \n will never match; instead, use $ to match the end of a line.
Instead of % (the built-in alias for the ForEach-Object cmdlet) you could have used ? (the built-in alias for the Where-Object cmdlet) to perform the desired filtering:
Get-Content result2.adoc | ? { $_ -notmatch '^\e\[' }
$_ -notmatch '^\e\[' returns $True only for lines that don't start (^) with an ESC character (\e, whose code point is 0x1b) followed by a literal [ (escaped as \[), thereby effectively filtering out the lines before the = Heading line.
However, the multi-line -replace command at the top is a more direct and faster expression of your intent.
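Both approaches can be compared on a small in-memory sample. The sketch below assumes the header lines start with a real ESC character (0x1b), as in actual git diff --color-words output; the sample lines are shortened from the question and the variable names are illustrative.

```powershell
$esc = [char]0x1b
$lines = @(
    "${esc}[1mdiff --git a/uk1.adoc b/uk2.adoc</span>+++"
    "${esc}[1mindex b5d3bf7..90299b8 100644</span>+++"
    "${esc}[36m## -1,9 +1,9 ##</span>+++"
    '= Heading'
    'Body text'
)

# Multiline approach: one string, strip everything before the first line starting with "=".
$raw = $lines -join "`n"
$viaReplace = $raw -replace '(?s)^.+?\n(?==)'

# Line-by-line approach: drop every line that starts with ESC followed by "[".
$viaFilter = $lines | Where-Object { $_ -notmatch '^\e\[' }

$viaReplace
```

Note that the filter also drops any later line beginning with an escape sequence, whereas the multiline replace only removes the leading block.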
Here is the code I ended up with after help from @mklement0. This PowerShell script creates MS Word-style track changes for two versions of an AsciiDoc file. It creates the Diff file, uses regex to replace ASCII codes with HTML/CSS tags, removes the Diff header (thank you!), uses AsciiDoctor to create an HTML file and then PrinceXML to create a PDF file of the output that I can send to document reviewers.
git diff --color-words file1.adoc file2.adoc > result.adoc;
Get-Content "result.adoc" | % {
    $_ -Replace '(=+ ?)([A-Za-z\s]+)(\[m)', '$1$2' `
       -Replace '\[32m', '+++<span style="color: #00cd00;">' `
       -Replace '\[31m', '+++<span style="color: #cd0000; text-decoration: line-through;">' `
       -Replace '\[m', '</span>+++' } | Out-File -encoding utf8 "result2.adoc" ;
(Get-Content -Raw result2.adoc) -replace '(?s)^.+?\n(?==)', '' | Out-File -encoding utf8 "result3.adoc" ;
asciidoctor result3.adoc -o result3.html;
prince result3.html --javascript -o result3.pdf;
Read-Host -Prompt "Press Enter to exit"
Here's a screenshot of the result using some text from Wikipedia:

Trim More than 20 Characters

I am working on a script that will generate AD usernames based off of a csv file. Right now I have the following line working.
Select-Object @{n='Username';e={$_.FirstName.ToLower() + $_.LastName.ToLower() -replace "[^a-zA-Z]" }}
Right now this takes the name and combines it into an AD-friendly name. However, I need the name to be shortened to no more than 20 characters. I have tried a few different methods to shorten the username, but I haven't had any luck.
Any ideas on how I can get the username shortened?
Probably the most elegant approach is to use a positive lookbehind in your replacement:
... -replace '(?<=^.{20}).*'
This expression matches the remainder of the string only if it is preceded by 20 characters at the beginning of the string (^.{20}).
Another option would be a replacement with a capturing group on the first 20 characters:
... -replace '^(.{20}).*', '$1'
This captures exactly the first 20 characters of the string and replaces the whole string with just the captured group ($1). Strings shorter than 20 characters don't match and are returned unchanged.
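A quick check of both expressions on a long and a short input, as a sketch with illustrative variable names:

```powershell
$long  = 'abcdefghijklmnopqrstuvwxyz'   # 26 characters
$short = 'ab'                           # fewer than 20 characters

# Lookbehind variant: removes whatever follows the first 20 characters.
$long1  = $long  -replace '(?<=^.{20}).*'
$short1 = $short -replace '(?<=^.{20}).*'

# Capturing-group variant: keeps group 1 (the first 20 characters).
$long2  = $long  -replace '^(.{20}).*', '$1'
$short2 = $short -replace '^(.{20}).*', '$1'

$long1    # abcdefghijklmnopqrst
$short1   # ab
```

In both variants a string shorter than 20 characters simply fails to match, so -replace passes it through untouched.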
$str[0..19] -join ''
e.g.
PS C:\> 'ab'[0..19] -join ''
ab
PS C:\> 'abcdefghijklmnopqrstuvwxyz'[0..19] -join ''
abcdefghijklmnopqrst
Which I would try in your line as:
Select-Object @{n='Username';e={(($_.FirstName + $_.LastName) -replace "[^a-z]").ToLower()[0..19] -join '' }}
([a-z] because PowerShell regex matches are case-insensitive, and moving .ToLower() so you only need to call it once).
And if you are using strict mode (Set-StrictMode), then why not check the length to avoid going outside the bounds of the array with the delightful:
$str[0..([math]::Min($str.Length - 1, 19))] -join ''
To truncate a string in PowerShell, you can use the .NET String::Substring method. The following line will return the first $targetLength characters of $str, or the whole string if $str is shorter than that.
if ($str.Length -gt $targetLength) { $str.Substring(0, $targetLength) } else { $str }
If you prefer a regex solution, the following works (thanks to @PetSerAl):
$str -replace "(?<=.{$targetLength}).*"
A quick measurement shows the regex method to be about 70% slower than the substring method (942ms versus 557ms on a 200,000 line logfile)
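For repeated use, the Substring check above could be wrapped in a small helper. This is only a sketch: the function name Get-TruncatedString and its parameters are hypothetical and not part of any built-in module.

```powershell
# Hypothetical helper: returns at most $MaxLength leading characters of $Value.
function Get-TruncatedString {
    param(
        [string] $Value,
        [int] $MaxLength = 20
    )
    if ($Value.Length -gt $MaxLength) {
        $Value.Substring(0, $MaxLength)
    } else {
        $Value
    }
}

Get-TruncatedString 'abcdefghijklmnopqrstuvwxyz'   # abcdefghijklmnopqrst
Get-TruncatedString 'ab'                           # ab
```

The length check matters because Substring throws when the requested length runs past the end of the string.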