PowerShell regex match index is always the same

I'm using the following code snippet to search for several system names in a text file and save them in an array.
Now I need to save the position of each match, but I always get only the position of the first match.
$pattern_sysname = '(?<=Computername).+?($)'
Get-Content $path | Foreach {if ([Regex]::IsMatch($_, $pattern_sysname)) {
$arr_sysname += [Regex]::Match($_, $pattern_sysname)
}
}
$arr_sysname.index
I need the position of every single match.

See this demo:
# demo data
@'
Computername12
This is Computername1
ComputernameABC
NotMatched
'@ | Out-File regex.test
$pattern_sysname = '(?<=Computername).+?$'
Select-String -Path regex.test -Pattern $pattern_sysname -AllMatches |
    Select-Object LineNumber, @{N='OffsetInLine'; E={$_.Matches[0].Index}}
Result:
LineNumber OffsetInLine
---------- ------------
         1           12
         2           20
         3           12
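If the value of each match is needed alongside its position, the same idea can be sketched with [regex]::Matches, looping over the lines and recording one object per match. This is a minimal sketch, not the answerer's code: the $lines array is a hypothetical stand-in for the contents of $path, and [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Hypothetical sample lines standing in for Get-Content $path
$lines = 'Computername12', 'This is Computername1', 'ComputernameABC', 'NotMatched'
$pattern_sysname = '(?<=Computername).+?$'

$lineNumber = 0
$results = foreach ($line in $lines) {
    $lineNumber++
    # [regex]::Matches returns every match in the line, each with its own Index
    foreach ($m in [regex]::Matches($line, $pattern_sysname)) {
        [pscustomobject]@{
            LineNumber   = $lineNumber
            OffsetInLine = $m.Index
            Value        = $m.Value
        }
    }
}
$results
```

Each Index is relative to the start of its own line, which is why identical line layouts report the same offset.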

Related

How to address "Select-String -pattern" return object via pipeline without using foreach loop?

Let's say I have a file with a following content:
1,first_string,somevalue
2,second_string,someothervalue
n,n_nd_string,somemorevalue
I need to read this file with Get-Content, take the last line, and get the number before the first "," (n in this case). (I'll then increment it and append line n+1 to the file, but that does not matter right now.) I want to do all of this in a single pipeline.
I have come to this solution so far:
[int]$number = Get-Content .\1.mpcpl | Select-Object -Last 1 | Select-String -Pattern '^(\d)' | ForEach-Object {$_.matches.value}
It actually works, but I wonder whether there is a way to access the object returned by Select-String -Pattern '^(\d)' without using the ForEach-Object loop, because I know the returned collection will contain only one element (I'm selecting the single last line of the file, so I get only one match).
You may use
$num = [int]((Get-Content .\1.mpcpl | Select-Object -Last 1) -replace '^(\d).*', '$1')
Notes
(Get-Content .\1.mpcpl | Select-Object -Last 1) - reads the file and gets the last line
(... -replace '^(\d).*', '$1') - gets the first digit from the returned line (NOTE: if the line does not start with a digit, it will fail as the output will be the whole line)
[int] casts the string value to an int.
Another way can be getting a match and then retrieving it from the default $matches variable:
[IO.File]::ReadAllText($filepath) -match '(?m)^(\d).*\z' | Out-Null
[int]$matches[1]
The (?m)^(\d).*\z pattern gets the digit at the start of the last line into Group 1, hence $matches[1].
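Because Select-String returns a MatchInfo object, its Matches property can also be read directly, without ForEach-Object. The sketch below assumes the last line starts with a number; the $lines array and its third entry are hypothetical stand-ins for the file contents, and '^(\d+)' is used instead of '^(\d)' so that multi-digit numbers are captured too.

```powershell
# Hypothetical stand-in for Get-Content .\1.mpcpl
$lines = '1,first_string,somevalue', '2,second_string,someothervalue', '3,third_string,somemorevalue'

# Select-String emits one MatchInfo for the single input line
$match = $lines | Select-Object -Last 1 | Select-String -Pattern '^(\d+)'

# Read the first capture group straight off the MatchInfo object
[int]$number = $match.Matches[0].Groups[1].Value
$number + 1
```

If the last line does not start with a digit, $match is $null and the property access fails, so a real script would need to check for that case.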
It looks like a csv to me...
import-csv 1.mpcpl -header field1,field2,field3 | select -last 1 | select -expand field1
output:
n

Export data to CSV different row using regex

I used a regular expression to extract a string from a file and export it to CSV. I could not figure out how to put each match value in a different row; all the results end up in a single cell:
{ 69630e4574ec6798, 78630e4574ec6798, 68630e4574ec6798}
I need it to be in different rows in CSV as below:
69630e4574ec6798
78630e4574ec6798
68630e4574ec6798
$Regex = [regex]"\s[a-f0-9]{16}"
Select-Object @{Name="Identity";Expression={$Regex.Matches($_.Textbody)}} |
Format-Table -Wrap |
Export-Csv -Path c:\temp\Inbox.csv -NoTypeInformation -Append
Edit:
I have been trying to split the data in my CSV, but I am having difficulty splitting the output data "id" onto separate lines, as the values all come in one cell: "{56415465456489944,564544654564654,46565465}".
In the screenshot below the first couple lines are the source input and the highlighted lines in the second group is the output that I am trying to get.
Change your regular expression so that it has the hexadecimal substrings in a capturing group (to exclude the leading whitespace):
$Regex = [regex]"\s([a-f0-9]{16})"
then extract the first group from each match:
$Regex.Matches($_.Textbody) | ForEach-Object {
$_.Groups[1].Value
} | Set-Content 'C:\temp\a.txt'
Use Set-Content rather than Out-File: in Windows PowerShell, Out-File creates the output file in Unicode (UTF-16) format by default, whereas Set-Content defaults to a single-byte ANSI encoding, which is plain ASCII for text like these IDs. Both cmdlets let you override the default with the -Encoding parameter.
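If the goal is a CSV with one row per ID rather than a plain text file, one option is to wrap each captured value in an object before exporting. This is a sketch under assumptions: $textBody is a hypothetical sample string standing in for $_.Textbody, and ConvertTo-Csv is used only so the result can be inspected without writing a file.

```powershell
$Regex = [regex]'\s([a-f0-9]{16})'

# Hypothetical sample text standing in for $_.Textbody
$textBody = 'ids: 69630e4574ec6798 78630e4574ec6798 68630e4574ec6798'

# One object per match, so each ID becomes its own CSV row
$rows = $Regex.Matches($textBody) | ForEach-Object {
    [pscustomobject]@{ Identity = $_.Groups[1].Value }
}

$rows | ConvertTo-Csv -NoTypeInformation
```

Replacing ConvertTo-Csv with Export-Csv -Path ... -NoTypeInformation would write the same rows to a file.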
Edit:
To split the data from your id column and create an individual row for each ID, you could do something like this:
Import-Csv 'C:\path\to\input.csv' | ForEach-Object {
    $row = $_
    $row.id -replace '[{}]' -split ',' | ForEach-Object {
        # Capture the current ID: inside the calculated property, $_ refers to $row
        $id = $_
        $row | Select-Object -Property *,@{n='id';e={$id}} -ExcludeProperty id
    }
} | Export-Csv 'C:\path\to\output.csv' -NoType

Capturing multiple points of data from a text file using Regex in powershell

So I got this regex expression to work in Regex101 and it captures exactly what I want to capture. https://regex101.com/r/aJ1bZ4/3
But when I try the same thing in powershell all I get is the first set of matches. I've tried using the (?s:), the (?m:) but none of these modifiers seem to do the job. Here is my powershell script.
$reportTitleList = type ReportExecution.log | Out-String |
where {$_ -match "(?<date>\d{4}\/\d{2}\/\d{2}).*ID=(?<reportID>.*):.*Started.*Title=(?<reportName>.*)\[.*\n.*Begin ....... (?<reportHash>.*)"} |
foreach {
new-object PSObject -prop @{
Date=$matches['date']
ReportID=$matches['reportID']
ReportName=$matches['reportName']
ReportHash=$matches['reportHash']
}
}
$reportTitleList > reportTitleList.txt
What am I doing wrong? Why am I not getting all the matches as the regex101 example?
-match only finds the first match. To search globally, use [regex]::Matches() or Select-String with the -AllMatches switch. For example:
# In PowerShell 3.0+ you can replace `Get-Content | Out-String` with `Get-Content -Raw`
$reportlist = Get-Content -Path ReportExecution.log | Out-String |
    Select-String -Pattern $pattern -AllMatches |
    Select-Object -ExpandProperty Matches |
    Select-Object @{n="Date";e={$_.Groups["date"]}},
        @{n="ReportID";e={$_.Groups["reportID"]}},
        @{n="ReportName";e={$_.Groups["reportName"]}},
        @{n="ReportHash";e={$_.Groups["reportHash"]}}
#Show output
$reportlist
Output:
Date       ReportID ReportName                                                            ReportHash
----       -------- ----------                                                            ----------
2015/03/23 578      Calendar Day Activity/Calendar Day Activity                           38C19F4E790446709B8C7A32FF97BC...
2015/03/23 861      Program Format Report/Program Format Report                           3C9CB2150AF14B15A1B361729C007B...
2015/03/23 1077     Multi-Station Program Availability/Multi-Station Program Availability 52526430EE4E401BA4376B38A2D88B...
2015/03/23 1299     Program Audit Trail/Program Audit Trail                               FDD1B7D9F34E46549A377A17B9A7A1...
2015/03/23 1541     Program Availability/Program Availability                             843B44F4475C4950A7784C8961B642...
2015/03/23 1756     Program Description Export/Program Description Export                 E5800A76C68E4D5281B8D680DB2E93...
-match returns as soon as it finds a match (they should have a -matches operator, right?). If you want multiple matches, use:
# $text holds the string to search ($input is a reserved automatic variable, so avoid that name)
$mymatches = [regex]::Matches($text, $pattern)
The output differs from what -match gives you, however, so you'll have to massage it a bit, something like:
$mymatches | ForEach-Object { if ($_.Success) { echo $_.Value } }
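The difference between the two approaches can be seen on a small input. The sketch below uses a deliberately simplified, hypothetical string and pattern (not the log format from the question) to show that -match fills $Matches with the first hit only, while [regex]::Matches yields every hit with its named groups.

```powershell
# Hypothetical, simplified data; the real log format is not reproduced here
$text = 'ID=578 Title=Alpha; ID=861 Title=Beta'
$pattern = 'ID=(?<id>\d+) Title=(?<name>\w+)'

# -match stops at the first hit and stores it in the automatic $Matches variable
$null = $text -match $pattern
$first = $Matches['id']

# [regex]::Matches returns all hits; named groups are read via Groups['name']
$all = [regex]::Matches($text, $pattern) | ForEach-Object {
    [pscustomobject]@{
        ReportID   = $_.Groups['id'].Value
        ReportName = $_.Groups['name'].Value
    }
}
$all
```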

Powershell Return the highest 4 Digit Number Found in a String Pattern - Search Word Documents

I am trying to return the highest 4 digit number found in string pattern, in a set of documents.
String Pattern: 3 Letters dash 4 Digits
The word documents contain within them a document identifier code such as below.
Sample Files:
Car Parts.docx > CPW - 2345
CarHandles.docx > CPW - 8723
CarList.docx > CPA - 9083
I have referenced sample code that I am trying to adapt. I am not a VBA or powershell programmer - so I may be wrong in what I am trying to do?
I am happy to look at alternatives - on a Windows platform.
I have referenced this to get me started
http://chris-nullpayload.rhcloud.com/2012/07/find-and-replace-string-in-all-docx-files-recursively/
PowerShell: return the number of instances find in a file for a search pattern
Powershell: return filename with highest number
$list = gci "C:\Users\WP\Desktop\SearchFiles" -Include *.docx -Force -recurse
foreach ($foo in $list) {
$objWord = New-Object -ComObject word.application
$objWord.Visible = $False
$objDoc = $objWord.Documents.Open("$foo")
$objSelection = $objWord.Selection
$Pat1 = [regex]'[A-Z]{3}-[0-9]{4}' # Find the regex match 3 letters followed by 4 numbers eg HGW - 1024
$findtext= "$Pat1"
$highestNumber =
# Find the highest occurrence of this pattern found in the documents searched - output to text file or on screen
Sort-Object | # This may also be wrong -I added it for when I find the pattern
Select-Object -Last 1 -ExpandProperty Name
<# The below may not be needed - ?
$ReplaceText = ""
$ReplaceAll = 2
$FindContinue = 1
$MatchFuzzy = $False
$MatchCase = $False
$MatchPhrase = $false
$MatchWholeWord = $True
$MatchWildcards = $True
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $FindContinue
$Format = $False
$objSelection.Find.execute(
$FindText,
$MatchCase,
$MatchWholeWord,
$MatchWildcards,
$MatchSoundsLike,
$MatchAllWordForms,
$Forward,
$Wrap,
$Format,
$ReplaceText,
$ReplaceAll
}
}
#>
I appreciate any advice on how to proceed -
Try this:
# This library is needed to extract zip archives. A .docx is a zip archive
# .NET 4.5 or later is required
Add-Type -AssemblyName System.IO.Compression.FileSystem
# This function gets plain text from a word document
# adapted from http://stackoverflow.com/a/19503654/284111
# It is not ideal, but good enough
function Extract-Text([string]$fileName) {
#Generate random temporary file name for text extraction from .docx
$tempFileName = [Guid]::NewGuid().Guid
#Extract document xml into a variable ($text)
$entry = [System.IO.Compression.ZipFile]::OpenRead($fileName).GetEntry("word/document.xml")
[System.IO.Compression.ZipFileExtensions]::ExtractToFile($entry,$tempFileName)
$text = [System.IO.File]::ReadAllText($tempFileName)
Remove-Item $tempFileName
#Remove actual xml tags and leave the text behind
$text = $text -replace '</w:r></w:p></w:tc><w:tc>', " "
$text = $text -replace '</w:r></w:p>', "`r`n"
$text = $text -replace "<[^>]*>",""
return $text
}
$fileList = Get-ChildItem "C:\Users\WP\Desktop\SearchFiles" -Include *.docx -Force -recurse
# Adapted from http://stackoverflow.com/a/36023783/284111
$fileList |
Foreach-Object {[regex]::matches((Extract-Text $_), '(?<=[A-Za-z]{3}\s*(?:-|–)\s*)\d{4}')} |
Select-Object -ExpandProperty captures |
Sort-Object value -Descending |
Select-Object -First 1 -ExpandProperty value
The main idea behind this is not to monkey around the COM api for Word, but instead just try and extract the text information from the document manually.
The way to get the highest number is first isolate it using a regex and then sort and select the first item. Something like this:
[regex]::matches($objSelection, '(?<=[A-Z]{3}\s*-\s*)\d{4}') `
| Select -ExpandProperty captures `
| sort value -Descending `
| Select -First 1 -ExpandProperty value `
| Add-Content outfile.txt
I think the problem you are having with your regex is that your example data contains spaces around the dash in the code, which you haven't allowed for in your pattern.
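The isolate-then-sort idea can be tried on the three sample codes from the question without opening any documents. This is a minimal sketch: the codes are joined into one hypothetical text block, and the values are cast to [int] so the sort is numeric rather than alphabetical.

```powershell
# Sample codes from the question, joined into one block of text
$text = 'CPW - 2345', 'CPW - 8723', 'CPA - 9083' -join "`n"

# The lookbehind allows optional whitespace around the dash
$highest = [regex]::Matches($text, '(?<=[A-Z]{3}\s*-\s*)\d{4}') |
    ForEach-Object { [int]$_.Value } |
    Sort-Object -Descending |
    Select-Object -First 1

$highest
```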

Regex to match only words without _ or -

I am trying to extract words from a text file that contains exactly one word per line. But I only want to match a word if it contains no "_" (underscore) or "-" (dash):
File might look like :
< someword
< SomeOtherword
< wordwith-dash-anotherd
< wordwith_under_anotheru
I only want to extract line 1 & 2 and ignore line 3 & 4
(i.e. result when regex match each line should be: someword SomeOtherword without "<" and space for each line)
I have been trying with "[\w-]+" which matches words with both _ & -
I am using PowerShell regex engine.
I am processing a file with close to 100000 lines. I don't want to loop through each line, as I need the processing to be very quick. The code I am using:
$rx = '[\w-]+'
Get-Content $filename | Select-String -Pattern $rx -AllMatches | select -ExpandProperty Matches | select -ExpandProperty Value | out-file $outputfile
If you are performance sensitive, this approach is measurably faster (2.6 secs vs. 80 millisecs):
(Select-String '^[a-zA-Z]+$' file.txt -AllMatches).Matches.Value
This does require a feature that is new to PowerShell v3. You don't say which version you are using.
To do a regex match in PowerShell you can use either the -match operator or Select-String. There is also a -notmatch operator and a -NotMatch switch for Select-String. Both filter for the absence of a match.
So one option is
gc 'file.txt' | where { $_ -notmatch '-|_' } | foreach { $_.Trim('<', ' ') }
and another is
gc 'file.txt' | select-string -NotMatch '-|_' | foreach { $_.Line.Trim('<', ' ') }
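A single pattern can also do the filtering and the trimming at once, by anchoring on the "< " prefix and capturing only letters and digits. The sketch below assumes every line has the "< " prefix shown in the question; the $lines array stands in for the file contents.

```powershell
# Sample lines from the question, standing in for Get-Content $filename
$lines = '< someword', '< SomeOtherword', '< wordwith-dash-anotherd', '< wordwith_under_anotheru'

# The capture group excludes "_" and "-", so lines 3 and 4 fail to match
$words = $lines |
    Select-String -Pattern '^<\s*([A-Za-z0-9]+)$' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$words
```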