Powershell advanced regex to select from file


I would like to search for a pattern in a file which I can do easily with something like:
gc $filename | select-string $pattern
However, once I have found this first pattern, I would like to use the location (line) of that match as a starting point and search from there for a second pattern. Once the second pattern has been matched, I would like to return all lines between the first and second matches, discarding the matched lines themselves.

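Let's say your first pattern is pattern1 and the second pattern is pattern2; then the expression would be (?<=pattern1)(.*?)(?=pattern2)
(?<=pattern1) - a lookbehind: the prefix pattern must match, but it is excluded from the match
(?=pattern2) - a lookahead: the suffix pattern must match, but it is excluded from the match
Because . does not match a newline by default and Select-String tests one line at a time, this only works across lines when the whole file is matched as a single string (Get-Content -Raw) with Singleline mode turned on, e.g. (?s)(?<=pattern1)(.*?)(?=pattern2).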
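A minimal sketch of that pattern applied to a multi-line string, assuming the file has already been read with Get-Content -Raw (here the text is inlined, and pattern1/pattern2 are placeholder markers). The \r?\n inside the lookarounds also drops the line breaks next to the markers, so only the lines in between remain:

```powershell
# Inline stand-in for: $text = Get-Content -Path $path -Raw
$text = "header`npattern1`nline A`nline B`npattern2`nfooter"

# (?s) = Singleline: lets .*? run across line breaks.
# The lookbehind/lookahead require the markers but leave them out of the match.
$m = [regex]::Match($text, '(?s)(?<=pattern1\r?\n)(.*?)(?=\r?\npattern2)')

# Split the matched block back into individual lines
$lines = $m.Value -split '\r?\n'
$lines
```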

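There may be a more elegant way, but this works:
function ParseFile
{
    param(
        [string] $FileName,
        [string] $FirstPattern = '1000',
        [string] $SecondPattern = '1075'
    )
    # @() keeps $s an array even when the file has a single line
    $s = @(Get-Content $FileName)
    $first = $null
    $second = $null
    for ($x = 0; $x -lt $s.Count; $x++)
    {
        if ($null -eq $first) {
            # Still looking for the first pattern (index 0 is a valid match)
            if ($s[$x] -match $FirstPattern) { $first = $x }
        }
        elseif ($s[$x] -match $SecondPattern) {
            # Second pattern found after the first one
            $second = $x
            break
        }
    }
    # Nothing to return unless both were found with at least one line in between
    if ($null -eq $first -or $null -eq $second -or ($second - $first) -lt 2) { return }
    # Return the lines strictly between the two matches, as an array
    return $s[($first + 1)..($second - 1)]
}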

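I've used foreach with $foreach.MoveNext():
foreach ($line in (Get-Content $file))
{
    if ($line -match $firstTag)
    {
        # $foreach is the automatic enumerator of the enclosing foreach loop:
        # advance it manually and emit lines until $secondTag (or end of file)
        while ($foreach.MoveNext() -and $foreach.Current -notmatch $secondTag)
        {
            $foreach.Current
        }
    }
}
This returns each line between the two tags one by one (the tag lines themselves are skipped), but you can do what you like inside the inner loop if you need to process the result in some way.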

Here is mine (a bit of French bricolage ;o) ). Imagine the file c:\temp\gorille.txt:
C'est à travers de larges grilles,
Que les femelles du canton,
Contemplaient un puissant gorille,
Sans souci du qu'en-dira-t-on.
Avec impudeur, ces commères
Lorgnaient même un endroit précis
Que, rigoureusement ma mère
M'a défendu de nommer ici...
Gare au gorille !...
Here is the text from "canton" to "endroit" (both words included):
PS > (((Get-Content -Path C:\temp\gorille.txt) -join "£" | Select-String -Pattern "(?=canton)(.*)(?<=endroit)").matches[0].groups[0].value) -split "£"
canton,
Contemplaient un puissant gorille,
Sans souci du qu'en-dira-t-on.
Avec impudeur, ces commères
Lorgnaient même un endroit
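I join all the lines with a special character, "£" (choose another one if it occurs in the file), apply @Alex Aza's pattern with the Select-String cmdlet, then split again. Note that the lookarounds are reversed compared with that pattern ((?=canton) at the start and (?<=endroit) at the end), so both marker words stay in the result.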
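The placeholder character can be avoided by reading the file as one string with Get-Content -Raw (PowerShell 3.0 or later) and turning on Singleline mode with (?s), so that . also matches line breaks. A sketch using a small hypothetical file written to the temp folder; note that the lazy .*? stops at the first "endroit", whereas the greedy (.*) above runs to the last one:

```powershell
# Hypothetical sample file in the temp folder (ASCII excerpt of the text above)
$path = Join-Path ([IO.Path]::GetTempPath()) 'gorille-demo.txt'
Set-Content -Path $path -Value 'larges grilles,', 'les femelles du canton,', 'un puissant gorille,', 'un endroit precis'

# -Raw returns the whole file as a single string, line breaks included
$raw = Get-Content -Path $path -Raw

# (?s) lets . match newlines, so no join/split placeholder is needed
$m = [regex]::Match($raw, '(?s)canton.*?endroit')
$result = $m.Value -split '\r?\n'
$result
```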

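$start = Select-String -Path $path -Pattern $pattern1 -List |
    Select-Object -ExpandProperty LineNumber
# First match of $pattern2 after the line containing $pattern1
$end = Select-String -Path $path -Pattern $pattern2 |
    Where-Object { $_.LineNumber -gt $start } |
    Select-Object -First 1 -ExpandProperty LineNumber
# LineNumber is 1-based, array indexes are 0-based: index $start is the
# line after the first match, index ($end - 2) the line before the second
if ($end -gt $start + 1) { (Get-Content $path)[$start..($end - 2)] }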

Related

Powershell Regex Function Deleting CRLFs

I am using the below function to create a JSON file from a SQL file. Unfortunately it is deleting the CRLF at the end of each line of the SQL file. I want it to keep them instead.
function GetStringBetweenTwoStrings($firstString, $secondString, $importPath){

    #Get content from file
    $file = Get-Content $importPath

    #Regex pattern to compare two strings
    $pattern = "$firstString(.*?)$secondString"

    #Perform the operation
    $result = [regex]::Match($file,$pattern).Groups[1].Value

    #Return result
    return "{""sql"":"""+$result+"""}"

}
I have tried using -Raw, but it does not seem to work.
Thanks,
John

Interesting question.
My first attempts with `[regex]::Match` lost the CRLF characters. The reason is that, without -Raw, Get-Content returns an array of lines, and when that array is passed where a string is expected PowerShell joins it with spaces, so the line breaks are gone before the regex even runs. With -Raw the newlines are kept, but . does not match them unless the Singleline option is set.
Thanks to people much smarter than me, the following way with [regex]::Match works:
function Get-StringBetweenTwoStrings {
    [cmdletBinding()]
    param (
        $firstString,
        $secondString,
        $fullString
    )
    # Get content from file WITH -RAW
    $file = Get-Content -Path $fullString -Raw
    Write-Verbose $file -Verbose
    # Regex pattern to compare two strings
    $pattern = '{0}(.*?){1}' -f $firstString, $secondString
    Write-Verbose $pattern -Verbose
    # Perform the operation
    $result = [regex]::Match($file, $pattern, 'SingleLine, MultiLine, IgnoreCase').Value
    # Result
    "{""sql"":""$result""}"
}
Test the code
Get-StringBetweenTwoStrings -firstString '(?<=GO)' -secondString '(?=GO)' -fullString .\Downloads\test.txt
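To see why the CRLFs disappear in the question's version, and how to build the JSON safely, here is a small sketch with hypothetical file content inlined as an array. Casting an array to a string joins it with $OFS (a space by default), which is what happens when the output of Get-Content without -Raw is passed to [regex]::Match. Building the JSON with ConvertTo-Json instead of string concatenation escapes the line breaks and any double quotes in the SQL:

```powershell
# Hypothetical file content, one element per line (what Get-Content returns without -Raw)
$lines = @('GO', 'SELECT TOP (1) *', 'FROM dbo.Users', 'GO')

# Casting the array to a string joins it with $OFS (a space by default): the line breaks are lost
$joined = [string]$lines

# Joining with CRLF instead mimics what Get-Content -Raw returns for a Windows file
$raw = $lines -join "`r`n"

# Singleline lets .*? cross line breaks; the lookarounds keep the GO markers out of the match
$sql = [regex]::Match($raw, '(?s)(?<=GO\r?\n).*?(?=\r?\nGO)').Value

# ConvertTo-Json escapes CR/LF (and any double quotes), so the JSON stays valid
$json = @{ sql = $sql } | ConvertTo-Json
$json
```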
Workaround
When all else fails, I go back to brute force.
Start capturing when we see our $firstString, and keep capturing until we find our $secondString or reach the end.
Sample Data
$s = @'
# This is a random comment
GOSELECT TOP (1)
*
FROM dbo.Users
WHERE CaffeineLevel = 'Low';
# Can we get a cafGOfeine drip?
GO
# Why isn't this easier
'@ -split '\r?\n'
Code
$capture = [Text.StringBuilder]::new()
$capturing = $false
$firstString = 'GO'
$secondString = 'GO'
foreach ($line in $s) {
    if ($line -match $secondString -and $capturing) {
        Write-Verbose "Stopping...$line" -Verbose
        <#
        In case we want to capture a partial line
        look for everything UNTIL our second string
        #>
        $splitLine = ($line | Select-String -Pattern ".*(?=$secondString)").Matches.Value
        Write-Verbose "Capturing: [$splitLine]" -Verbose
        $null = $capture.AppendLine($splitLine)
        $capturing = $false
        <# second string found, stop altogether #>
        break
    }
    if ($capturing) {
        Write-Verbose "Capturing: [$line]" -Verbose
        $null = $capture.AppendLine($line)
    }
    if ($line -match $firstString) {
        Write-Verbose "Starting...$line" -Verbose
        <#
        In case we want to capture a partial line,
        look for everything AFTER our first string
        #>
        $splitLine = ($line | Select-String -Pattern "(?<=$firstString).*").Matches.Value
        Write-Verbose "Capturing: [$splitLine]" -Verbose
        $null = $capture.AppendLine($splitLine)
        $capturing = $true
    }
}
$capture.ToString()

PowerShell Regex with csv file

I'm currently trying to match a pattern of IDs and replace with 0 or 1.
For example, pc0045601234 should become "01234": the last 4 digits (1234) with the 3rd character (0) added in front.
I tried the code below, but the output only filled the userid column with "No Matching Employee".
$reportPath = '.\report.csv'
$csvPath = '.\output.csv'
$data = Import-Csv -Path $reportPath
$output = @()
foreach ($row in $data) {
    $table = "" | Select ID,FirstName,LastName,userid
    $table.ID = $row.ID
    $table.FirstName = $row.FirstName
    $table.LastName = $row.LastName
    switch -Wildcard ($row.ID)
    {
        {$row.ID -match 'P\d\d\d\d\d\D\D\D'} {$table.userid = "Contractor"; continue}
        {$row.ID -match 'SEC\d\d\d\D\D\D\D'} {$table.userid = "Contractor"; continue}
        {$row.ID.StartsWith("P005700477")} {$table.userid = $row.ID -replace "P005700477","0477"; continue}
        {$row.ID.StartsWith("P00570")} {$table.userid = $row.ID -replace "P00570","0"; continue}
        default {$table.userid = "No Matching Employee"}
    }
    $output += $table
}
$output | Export-csv -NoTypeInformation -Path $csvPath

Here are three different ways to achieve the desired result. The first two use the same technique, just written differently.
First we put the sample data in a variable as a multiline string array. This is equivalent to $text = Get-Content $somefile
$text = @'
PC05601234
PC15601234
'@ -split '\r?\n'
Option 1: convert to a character array, then select the 3rd character and the last 4 digits.
$text | foreach {-join ($_.ToChararray()| select -Skip 2 -First 1 -Last 4)}
Option 2: same as above, requiring an extra -join to avoid spaces.
$text | foreach {(-join $_.ToChararray()| foreach{$_[2]+(-join $_[-4..-1])})}
Option 3: my preference, regex. Capture the desired digits and replace the entire string with those two captured values.
$text -replace '^\D+(\d)\w+(\d{4})$','$1$2'
All of these output
01234
11234
Further testing with different char/digit combinations and lengths.
$text = @'
PC05601234
PC15601234
PC0ABC124321
PC1DE4321
PC0A5678
PC1ABCD215678
'@ -split '\r?\n'
Running the new sample data through each option produces this output:
01234
11234
04321
14321
05678
15678
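To fill the userid column of the CSV from the question, Option 3 can go straight into a calculated property. A sketch, assuming the IDs follow the shape of the sample data above; the inline rows (and names) are hypothetical stand-ins for Import-Csv -Path $reportPath, and any ID that doesn't match the pattern is passed through unchanged by -replace:

```powershell
# Hypothetical inline rows standing in for: $data = Import-Csv -Path $reportPath
$data = 'ID,FirstName,LastName', 'PC05601234,Ann,Lee', 'PC1DE4321,Bob,Ray' | ConvertFrom-Csv

# Calculated property: userid = 3rd character + last 4 digits of ID
$output = $data | Select-Object ID, FirstName, LastName,
    @{ Name = 'userid'; Expression = { $_.ID -replace '^\D+(\d)\w+(\d{4})$', '$1$2' } }

$output
# Then: $output | Export-Csv -NoTypeInformation -Path $csvPath
```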

Get the numbers after ":" and count them with the help of powershell

Could someone please help me with extracting and counting the numbers from a text file with PowerShell?
Example: c:\temp\1.txt contains some text with colons and numbers after them. I need to sum all of these numbers.
blablabl:5 dzfdsfdsfsdfsf:10
sdfsdfsdfdffs:8sdfsfsfdsfdsf:111
5+10+8+111...
What I've tried so far:
$LogText = "C:\temp\1.txt"
[regex]$Regex = "\. (\d+):[1]"
$Matches = $Regex.Matches($LogText)
$Matches | ForEach-Object {
Write-Host $Matches
}
#$array = @()
#$array = new-object collections.arraylist
$array = while ($Matches.Success) {
Write-Host $array[i++]
}
# -------------------------------------------------------------------
$text = Get-Content "C:\temp\1.txt"
[regex]$Regex = "\d"
$Matches = $Regex.Matches($text)
# -------------------------------------------------------------------
$pos = $text.IndexOf(":")
$rightPart = $text.Substring($pos+1)
Write-Host $rightPart

Use Select-String to extract the matches from the file and Measure-Object to do the calculation.
Select-String -Path 'C:\temp\1.txt' -Pattern '(?<=:)\d+' -AllMatches |
Select-Object -Expand Matches |
Select-Object -Expand Value |
Measure-Object -Sum |
Select-Object -Expand Sum
(?<=:) is a positive lookbehind assertion to match the colon preceding the number without making it part of the match.
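Select-String also accepts strings from the pipeline, which makes it easy to try the same pipeline without a file. A sketch using the two sample lines from the question:

```powershell
# Sample lines from the question, piped instead of read with -Path
$lines = 'blablabl:5 dzfdsfdsfsdfsf:10', 'sdfsdfsdfdffs:8sdfsfsfdsfdsf:111'

$sum = $lines |
    Select-String -Pattern '(?<=:)\d+' -AllMatches |
    Select-Object -ExpandProperty Matches |
    Select-Object -ExpandProperty Value |
    Measure-Object -Sum |
    Select-Object -ExpandProperty Sum

$sum   # 5 + 10 + 8 + 111 = 134
```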

Try it like that:
$txt = @"
blablabl:5 dzfdsfdsfsdfsf:10
sdfsdfsdfdffs:8sdfsfsfdsfdsf:111
"@
[regex]$Regex = '\d+'
$sum=0;
$Regex.Matches($txt) | ForEach-Object {
$val = [int]$_.Value
$val
$sum+=$val
}
$sum
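One difference between the two answers: '\d+' sums every run of digits in the text, including digits that are not preceded by a colon, while '(?<=:)\d+' only picks numbers that directly follow one. A small sketch with a hypothetical line containing such a digit:

```powershell
# Hypothetical line: the 2 in "abc2" is not preceded by a colon
$txt = 'abc2:5 def:10'

# Every run of digits: 2 + 5 + 10
$all = ([regex]::Matches($txt, '\d+') | ForEach-Object { [int]$_.Value } | Measure-Object -Sum).Sum

# Only numbers right after a colon: 5 + 10
$afterColon = ([regex]::Matches($txt, '(?<=:)\d+') | ForEach-Object { [int]$_.Value } | Measure-Object -Sum).Sum

"$all vs $afterColon"
```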

PowerShell -split on Pipe Character

Consider the ASCII text file test1.txt:
a,b,c
d,e,f
And the following Powershell Script test1.ps1:
$input -split "`n" | ForEach-Object {
$row = $_ -split ","
$row[0]
}
The output is, as expected:
a
d
However, if we change the separator to |, as in test2.txt, everything fails:
a|b|c
d|e|f
And the following Powershell Script test2.ps1:
$input -split "`n" | ForEach-Object {
$row = $_ -split "|"
$row[0]
}
The output is empty. Why does the -split fail?

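-split treats the delimiter as a regular expression, and | is the alternation operator (here between two empty patterns, so it matches the empty string everywhere). You need to escape the pipe, as in: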
$row = $_ -split "\|"
Or specify the SimpleMatch option to split on the literal string or character:
$row = $_ -split "|", 0, "SimpleMatch"
The 0 stands for MaxSubstrings: "The maximum number of substrings, by default all (0)."
Source: http://ss64.com/ps/split.html
Also: Get-Help about_Split
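A quick way to see the difference, plus a general option when the delimiter comes from a variable: [regex]::Escape escapes any regex metacharacters in a string. The first element of the unescaped split is empty because the pattern matches the empty string before the first character, which is why $row[0] printed nothing:

```powershell
$line = 'a|b|c'

# Unescaped: | matches the empty string, so the line is split around every character
$broken = $line -split '|'

# Escaped with [regex]::Escape (handy when the delimiter is in a variable)
$delimiter = '|'
$escaped = $line -split [regex]::Escape($delimiter)

# Literal split with the SimpleMatch option
$simple = $line -split '|', 0, 'SimpleMatch'

"broken[0] = '$($broken[0])'; escaped = $($escaped -join ','); simple = $($simple -join ',')"
```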

Regex replace using a substring of the regex result value

I've been reading a ton of material and thought I had found my solution, but no luck. I need to find apostrophes contained in a name and then replace each of them with two apostrophes. I am loading a file into an array and then looping through it, looking for the apostrophes. The catch is that each row can have several apostrophes, so it's not a simple find and replace.
Here is a sample of the file:
create(xxxxxxx)using(xxxxxxx)name('O'Doe, John')
replace(xxxxxxx)instdata('ab 1234 ')
create(xxxxxxx)using(xxxxxxx)name('Doe, O'Jane')
replace(xxxxxxx)instdata('ab 5678 ')
There are other lines in between, but they don't contain apostrophes.
Here is what I have so far:
$Pattern = "[A-Z]'[A-Z]"
$user = gc C:\Temp\mfnewuser.ins
for ($i = 0; $i -lt $user.count; $i++) {
if ($user[$i] -match $Pattern) {
$user[$i] = [regex]::replace($strText, $Pattern.substring(2,1), "''")
$user | out-file C:\Temp\mfnewuser.ins
}
}
I'm looking for a capital letter, followed by an apostrophe, followed by another capital. Because of the other apostrophes (the quotes around each value), I can't just do a global replace. I know my pattern matching is working, but I can't seem to manipulate the match with Substring: it works on $Pattern as a string instead of on the result of the regex. If I could save the regex result to a variable, I think the replace would be easy.
Tried this as well but no luck either:
$Pattern = "[A-Z]'[A-Z]"
$NewPattern = "[A-Z]''[A-Z]"
$f = Get-Content C:\Temp\mfnewuser.ins
$f = $f -replace $Pattern, $NewPattern
$f | out-file C:\Temp\mfnewuser.ins
I may be approaching this all wrong and there is an easier way but I haven't seen anything yet.
EDIT:
Based on Bill_Stewart's example below, I've got this to work on the first name but not yet on the last name:
$Pattern = "[A-Z]'[A-Z]"
$user = gc C:\Temp\mfnewuser.ins
for ($i = 0; $i -lt $user.count; $i++) {
if ($user[$i] -match $Pattern) {
$user[$i] = $user[$i] -replace "(.*[A-Z])'([A-Z]+.*)", "`$1''`$2"
$user | out-file C:\Temp\mfnewuser.ins
}
}

Perhaps something like this?
get-content "test.txt" | foreach-object {
$_ -replace "([A-Z])'([A-Z])", "`$1''`$2"
}
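Regular expressions can be grouped using ( ), and the -replace operator supports references to the captured groups in the replacement string ($1 and $2). The backticks keep PowerShell from expanding $1 and $2 as variables inside the double-quoted string.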
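A self-contained check with lines in the style of the question's sample (the D'Arcy line is a hypothetical addition with two inner apostrophes). -replace replaces every match in a line, so several apostrophes per row are not a problem, and the quotes around each value are left alone because they are not between two letters. Note that -replace is case-insensitive, so [A-Z] also matches lowercase letters; use -creplace if that matters:

```powershell
$lines = @(
    "create(xxxxxxx)using(xxxxxxx)name('O'Doe, John')",
    "create(xxxxxxx)using(xxxxxxx)name('O'Doe, D'Arcy')",
    "replace(xxxxxxx)instdata('ab 1234 ')"
)

# `$1 and `$2 are escaped so they reach the regex engine as group references;
# the '' between them inserts two literal apostrophes
$fixed = $lines -replace "([A-Z])'([A-Z])", "`$1''`$2"
$fixed
```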

Replace your line with the following:
$user[$i] = $user[$i] -replace "([A-Z])'([A-Z])", "`$1`''`$2"
Or try one of the following. This should suffice.
get-content "mfnewuser.ins" | foreach-object {
$_ -replace "([A-Z])'([A-Z])", "`$1`''`$2"
} | set-content "mfnewuser.ins"
...
get-content "mfnewuser.ins" | foreach-object {
$_ -replace "([a-zA-Z', ]+)'([a-zA-Z', ]+)", "`$1`''`$2"
} | set-content "mfnewuser.ins"
