PowerShell match regex and replace

I have a large file that I am searching through to locate and replace invalid dates. I'm using a regular expression to locate the dates and then determining whether they are valid. If the script finds an invalid date, it needs to replace it with the current date. For audit purposes I need to record the invalid string and the line number on which it was found. So far (with some prior help from SO) I have been able to locate the invalid dates, but I have not been able to change them.
This is the code I’m using to locate the invalid dates. How can I locate and change the date in a single pass?
$matchInfos = @(Select-String -Pattern $regex -AllMatches -Path $file)
foreach ($minfo in $matchInfos)
{
#"LineNumber $($minfo.LineNumber)"
foreach ($match in @($minfo.Matches | Foreach {$_.Groups[0].value}))
{
if (([Boolean]($match -as [DateTime]) -eq $false ) -or ([DateTime]::parseexact($match,"MM-dd-yyyy",$null).Year -lt "1800")) {
Write-host "Invalid date on line $($minfo.LineNumber) - $match"
#Add-Content -Path $LOGFILE -Value "Invalid date on line $($minfo.LineNumber) - $match"
# Replace the invalid date with a corrected one
Write-Host "Replacing $match with $(Get-Date -Format "MM-dd-yyyy")"
#Add-Content -Path $LOGFILE -Value "Replacing $match with $(Get-Date -Format "MM-dd-yyyy")"
}
}
}

You have to write out a temporary file with the changes and replace the file with the temporary. Here's one I wrote that will do that part for you:
Windows IT Pro: Replacing Strings in Files Using PowerShell
Example of use:
replace-filestring -pattern 'find' -replacement 'replace' -path myfile.txt -overwrite
With this command, the script will read myfile.txt, replace 'find' with 'replace', write the output to a temporary file, and then replace myfile.txt with the temporary file. (Without the -Overwrite parameter, the script will only output the contents of myfile.txt with the changes.)
Bill

$lines = get-content $file
$len = $lines.count
$bad = @{}
for($i=0;$i-lt$len;$i++){
if($lines[$i] -match ""){
$bad_date = $lines[$i].substring(10) #Get the bad date
$good_date = Get-Date -Format G
$bad["$i"] += @($lines[$i])
$lines[$i] = $lines[$i].Replace($bad_date,$good_date)
}
}
$lines > $NewFile
$bad > $bad_date_file
Here is some pseudo code of how I would combat this problem. Not sure how big your file is. Reading and writing could be slow.
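For locating and changing the dates in one pass, one option is `[regex]::Replace` with a script block as the match evaluator. What follows is only a minimal sketch under stated assumptions: the dates are in MM-dd-yyyy form, `$pattern` is a stand-in for your own regex, and the sample lines are invented for illustration. In practice `$lines` would come from `Get-Content $file`, and `$fixed` and `$audit` would be written out with `Set-Content`.

```powershell
# Hypothetical pattern (stand-in for your own $regex): MM-dd-yyyy
$pattern = '\b\d{2}-\d{2}-\d{4}\b'
$today   = Get-Date -Format 'MM-dd-yyyy'
$culture = [Globalization.CultureInfo]::InvariantCulture
$audit   = New-Object 'System.Collections.Generic.List[string]'

# Invented sample input; in practice: $lines = Get-Content $file
$lines = @(
    'order 1 shipped 02-30-2015 to A',
    'order 2 shipped 12-25-2014 to B',
    'order 3 shipped 01-01-1700 to C'
)

$lineNumber = 0
$fixed = foreach ($line in $lines) {
    $lineNumber++
    # The script block runs once per regex match on the current line
    [regex]::Replace($line, $pattern, {
        param($m)
        $parsed = [datetime]::MinValue
        $ok = [datetime]::TryParseExact($m.Value, 'MM-dd-yyyy', $culture, [Globalization.DateTimeStyles]::None, [ref]$parsed)
        # Valid dates from 1800 onwards are kept unchanged
        if ($ok -and $parsed.Year -ge 1800) { return $m.Value }
        # Anything else is recorded for the audit log and replaced with today's date
        $audit.Add("Invalid date on line $lineNumber - $($m.Value)")
        return $today
    })
}
```

`TryParseExact` is used here instead of `-as [DateTime]` so that a malformed date never throws. Every line passes through once, so line numbers stay aligned with the input.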


PowerShell to match multiple lines with regex pattern

I wrote a PowerShell script with a regex to search two config text files for Management VLAN entries. For example, each text file has two Management VLANs configured as below:
Config1.txt
123 MGMT_123_VLAN
234 MGMT_VLAN_234
Config2.txt
890 MGMT_VLAN_890
125 MGMT_VLAN_USERS
Below is my script. It has several problems.
First, if I run the script with $Mgmt_vlan = Select-String -Path $File -Pattern $String -AllMatches, the screen output shows the expected four (4) Mgmt VLANs, but the CSV file shows the following:
Filename Mgmt_vlan
Config1.txt System.Object[]
Config2.txt System.Object[]
The console output shows exactly the four (4) Management VLANs that I expected, but the CSV file does not; it shows only the rows above.
Second, if I ran the script with $Mgmt_vlan = Select-String -Path $File -Pattern $String | Select -First 1
Then the CSV shows as follows:
Filename Mgmt_vlan
Config1.txt 123 MGMT_123_VLAN
Config2.txt 890 MGMT_VLAN_890
The second method Select -First 1 appears to select only the first match in the file. I tried to change it to Select -First 2 and then CSV shows column Mgmt_Vlan as System.Object[].
The result output to the screen shows exactly four(4) Mgmt Vlans as expected.
$folder = "c:\config_folder"
$files = Get-childitem $folder\*.txt
Function find_management_vlan($Text)
{
$Vlan = @()
foreach($file in files) {
Mgmt_Vlan = Select-String -Path $File -Pattern $Text -AllMatches
if($Mgmt_Vlan) # if there is a match
{
$Vlan += New-Object -PSObject -Property @{'Filename' = $File; 'Mgmt_vlan' = $Mgmt_vlan}
$Vlan | Select 'Filename', 'Mgmt_vlan' | export-csv C:\documents\Mgmt_vlan.csv
$Mgmt_Vlan # test to see if it shows correct matches on screen and yes it did
}
else
{
$Vlan += New-Object -PSObject -Property @{'Filename' = $File; 'Mgmt_vlan' = "Mgmt Vlan Not Found"}
$Vlan | Select 'Filename', 'Mgmt_vlan' | Export-CSV C:\Documents\Mgmt_vlan.csv
}
}
}
find_management_vlan "^\d{1,3}\s.MGMT_"
Regex correction
First of all, there are a lot of mistakes in this code.
So this is probably not code that you actually used.
Secondly, that pattern will not match your strings. "^\d{1,3}\s.MGMT_" matches 1-3 digits, one whitespace character (equivalent to [\r\n\t\f\v ]), any single character (except line terminators), and then the literal MGMT_. The extra . is not what you want. In your case you can use ^\d{1,3}\sMGMT_, or \s+ instead of \s to allow one or more whitespace characters.
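A quick way to see the difference between the two patterns is to run both against a few sample lines. This is just an illustrative sketch: the first three lines are taken from the question, while the last one (with two spaces) is a made-up variation added to show when the original pattern does match.

```powershell
$lines = @(
    '123 MGMT_123_VLAN',
    '234 MGMT_VLAN_234',
    '890 MGMT_VLAN_890',
    '125  MGMT_VLAN_USERS'   # hypothetical line with two spaces
)

$original  = '^\d{1,3}\s.MGMT_'   # \s then "any character" then MGMT_
$corrected = '^\d{1,3}\s+MGMT_'   # one or more whitespace characters then MGMT_

# -match against an array returns the elements that match
$hitsOriginal  = @($lines -match $original)
$hitsCorrected = @($lines -match $corrected)

"Original pattern matched  : $($hitsOriginal.Count)"
"Corrected pattern matched : $($hitsCorrected.Count)"
```

With single-space lines the `.` swallows the `M` of `MGMT_`, so the original pattern only matches the double-spaced line, whereas the `\s+` version matches all four.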
Code Correction
Now back to your code... You create array $Vlan, that's ok.
After that, you collect all matches (in your case two per file in your directory) and create a PSObject with two complex properties. One is a FileInfo from System.IO and the other is an array of match objects (Object[]). Export-Csv calls .ToString() on every property of the object being processed. If you call .ToString() on an array (i.e. Mgmt_vlan) you get "System.Object[]", as per the default implementation. So you need a collection of "flat" objects if you want to make a CSV from it.
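The flattening problem is easy to reproduce without any files. A small sketch with made-up values: the first conversion shows the array property turning into "System.Object[]", the second joins the array into one string with a calculated property (the "; " separator is an arbitrary choice).

```powershell
# One object whose Mgmt_vlan property is an array (sample values only)
$row = New-Object PSObject -Property @{
    Filename  = 'Config1.txt'
    Mgmt_vlan = @('123 MGMT_123_VLAN', '234 MGMT_VLAN_234')
}

# Array property: the CSV cell becomes the type name
$raw = $row | Select-Object Filename, Mgmt_vlan | ConvertTo-Csv -NoTypeInformation

# Flattened property: join the array into a single string first
$flat = $row |
    Select-Object Filename, @{ Name = 'Mgmt_vlan'; Expression = { $_.Mgmt_vlan -join '; ' } } |
    ConvertTo-Csv -NoTypeInformation

$raw[1]
$flat[1]
```

Emitting one object per match, as in the function below, is the other way to keep every property flat.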
The second big mistake is creating a function with more than one responsibility. In your case the function is responsible for gathering data and then for exporting it. That's a big no-no. So repair your code and move the export somewhere else. You can use, for example, something like this (I used Get-Content because I like it more, but you can use whatever you want to get your string collection):
function Get-ManagementVlans($pattern, $files)
{
$Vlans = @()
foreach ($file in $files)
{
$found = (Get-Content $file.FullName -Encoding UTF8).Where({$_ -imatch $pattern}) # $found avoids overwriting the automatic $Matches variable
if ($found)
{
$Vlans += $found | % { New-Object -TypeName PSObject -Property @{'Filename' = $File; 'Mgmt_vlan' = $_.Trim()} }
}
else
{
$Vlans += New-Object -TypeName PSObject -Property @{'Filename' = $File; 'Mgmt_vlan' = "Mgmt Vlan Not Found"}
}
}
return $Vlans
}
function Export-ManagementVlans($path, $data)
{
#do something...
$data | Select Filename,Mgmt_vlan | Export-Csv "$path\Mgmt_vlan.csv" -Encoding UTF8 -NoTypeInformation
}
$folder = "C:\temp\soHelp"
$files = dir "$folder\*.txt"
$Vlans = Get-ManagementVlans -pattern "^\d{1,3}\sMGMT_" -files $files
$Vlans
Export-ManagementVlans -path $folder -data $Vlans
Summary
But in my opinion, building something like this is overprogramming in this case. You can easily do it in a one-liner (though you lose the information about files that contain no match). The power of PowerShell is this:
$pattern = "^\d{1,3}\s+MGMT_"
$path = "C:\temp\soHelp\"
dir $path -Filter *.txt -File | Get-Content -Encoding UTF8 | ? {$_ -imatch $pattern} | select @{l="FileName";e={$_.PSChildName}},@{l="Mgmt_vlan";e={$_}} | Export-Csv -Path "$path\Report.csv" -Encoding UTF8 -NoTypeInformation
or with Select-String:
dir $path -Filter *.txt -File | Select-String -Pattern $pattern -AllMatches | select FileName,@{l="Mgmt_vlan";e={$_.Line}} | Export-Csv -Path "$path\Report.csv" -Encoding UTF8 -NoTypeInformation

Extract "Keywords" from a pdf plus the next 200 characters from the keyword in Windows Powershell

I have a PowerShell script that searches PDF documents for a keyword; however, what I require is the keyword plus the next 200 characters.
The keyword in the script below is "Address", and a regex is used to find it. I tried several ways, but I am by no means an expert in this.
Also, the script below currently writes its output to the PowerShell console; is there a way to get the output in CSV format?
Below is the code:
$pdflist = Get-ChildItem -Path "C:\Users\U6013303\Desktop\Muni Refresh\DOC\old\4295479598" -Filter "*.pdf"
foreach ($pdff in $pdflist){
Add-Type -Path "C:\Users\U6013303\Desktop\Muni Refresh\Archives\itextsharp.dll"
$pdffile = $pdff.Name
$reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList "C:\Users\U6013303\Desktop\Muni Refresh\DOC\old\4295479598\$pdffile"
Write-Host "Reading file $pdffile" -BackgroundColor Black -ForegroundColor Green
for ($page = 1; $page -le $reader.NumberOfPages; $page++)
{
$strategy = new-object 'iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy'
$currentText = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page, $strategy);
[string[]]$Text += [system.text.Encoding]::UTF8.GetString([System.Text.ASCIIEncoding]::Convert( [system.text.encoding]::default , [system.text.encoding]::UTF8, [system.text.Encoding]::Default.GetBytes($currentText)));
}
$Text
[regex]::matches( $text, '(Address)' ) | select *
$reader.Close()
}
Thanks,
Garry
Simple change to your regex:
$results = [regex]::matches($text, 'Address.{200}')
Export to CSV:
$results | ConvertTo-Csv
# or
$results | Export-Csv -Path "c:\your-path\results.csv"
Or if you want just the actual values:
$results | select -ExpandProperty Value
Take a look at this to see the change you need to make.
$Data = "
Match Sequence using RegEx After a Specified Character ...
https://stackoverflow.com/questions/10768924/match...
You have the correct regex only the tool you're using is highlighting the entire match and not just your capture group. Hover over the match.
"
[regex]::matches( $Data, 'You').value
# Results
<#
You
#>
[regex]::matches( $Data, 'You.{50}').value
# Results
<#
You have the correct regex only the tool you're using
#>
[regex]::matches( $Data, 'You.{100}').value
# Results
<#
You have the correct regex only the tool you're using is highlighting the entire match and not just you
#>
Notice the '.Value' property: "[regex]::Matches" does not return a plain string but a collection of Match objects, so you must read the Value property to get the matched text.
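One caveat with a fixed `.{200}`: it finds nothing when fewer than 200 characters follow the keyword, and `.` stops at line breaks by default. A hedged sketch of a more forgiving variant follows; the sample text and the file name `sample.pdf` are invented stand-ins for the text extracted from one PDF.

```powershell
# Invented stand-in for text extracted from a single PDF
$text     = "Name: ACME`nAddress: 1 Main Street`nSpringfield`nPhone: 555-0100"
$fileName = 'sample.pdf'   # hypothetical file name

# (?s) lets '.' cross line breaks; {0,200} takes up to 200 characters,
# so a keyword near the end of the text is still captured
$rows = [regex]::Matches($text, '(?s)Address.{0,200}') | ForEach-Object {
    New-Object PSObject -Property @{ File = $fileName; Excerpt = $_.Value }
}

# One CSV row per hit; use Export-Csv -Path ... to write a file instead
$csv = $rows | Select-Object File, Excerpt | ConvertTo-Csv -NoTypeInformation
$csv
```

Keeping the file name next to each excerpt makes the CSV usable when the loop runs over many PDFs.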

How to change values in text files?

I am simply trying to create a powershell script that will change number values in a set of text files. The data in the text files are separated by semi-colons. The values I want to change are always the 2nd and 3rd tokens on each line of the text file.
An example of a line in one of the files:
"Bridge_Asphalt_F";202498.396728;1104.362183;9.721280;0.000000;0.000000;1.000000;-1.299559;
I want to allow the user of the script to enter values to be added to(or subtracted from) the 2nd and 3rd values in all the lines of all the text files in the current directory.
I have a very basic understanding of scripting, but I've been searching around for hours trying to wrap my head around how this would be accomplished.
This is what I have so far but I'm sure I'm getting a few things wrong:
$east = Read-Host 'Easting?'
$north = Read-Host 'Northing?'
Get-ChildItem *.txt |
Foreach-Object {
$c = ($_ | Get-Content)
$c = $c -replace $regexB,$regexB+$east
$c = $c -replace $regexC,$regexC+$north
[IO.File]::WriteAllText($_.FullName, ($c -join "`r`n"))
}
The values determine an object's location on a map (for a game) and I want to be able to move all objects on the entire map by a certain distance on both x and y axis.
Assuming that each line in the file has the same format as your example, then you can treat the file as a CSV and update it like this:
$offset2 = 100
$offset3 = 100
Import-Csv .\data.txt -Delimiter ';' -Header (1 .. 9) |
ForEach-Object {
$_.2 = ([double]$_.2) + $offset2
$_.3 = ([double]$_.3) + $offset3
$_
} | ConvertTo-Csv -NoTypeInformation -Delimiter ';' |
Select-Object -Skip 1 |
Add-Content .\updated.txt
Note:
ConvertTo-Csv surrounds each item with quotes, so you end up with something like this:
"Bridge_Asphalt_F";"202598.396728";"1204.362183";"9.721280";"0.000000";"0.000000";"1.000000";"-1.299559"
This may cause problems if this isn't expected by your game. If so, then some more processing on the pipeline could be done to strip it out.
Also, I've had issues in the past with trying to import and export to the same CSV file, hence my code outputs to a different file. Test it yourself and if it works with the same file, great, otherwise, copy my example, then add a line to replace the existing file with the new one (e.g. using Move-Item).
I guess that's what you need:
cls
cd C:\Users\dandraka\Desktop\test #or wherever
$eastStr = Read-Host 'Easting?'
$northStr = Read-Host 'Northing?'
# convert input to number
$east = [decimal]::Parse($eastStr)
$north = [decimal]::Parse($northStr)
# loop through files
$files = Get-ChildItem *.txt
$files | Foreach-Object {
$fileName = $_.FullName # just for clarity
Write-Host $fileName
$newLines = New-Object System.Collections.ArrayList
# loop through lines of each file
$lines = Get-Content -Path $fileName
$lines | ForEach-Object {
$line = $_.ToString() # just for clarity
$lineItems = $line -split ';'
$pointName = $lineItems[0]
$latitudeStr = $lineItems[1]
$longitudeStr = $lineItems[2]
# convert to number
$latitude = [decimal]::Parse($latitudeStr)
$longitude = [decimal]::Parse($longitudeStr)
Write-Host "$pointName latitude $latitude , longitude $longitude"
# do the math
$newLatitude = $latitude + $north
$newLongitude = $longitude + $east
Write-Host "$pointName new latitude $newLatitude , new longitude $newLongitude"
# reconstruct the line
$newLine = ""
for($i=0; $i -lt $lineItems.Count; $i++) {
if ($i -eq 1) {
$newLine += "$newLatitude;"
continue
}
if ($i -eq 2) {
$newLine += "$newLongitude;"
continue
}
# this if fixes a small bug, without it there are two ; at the end of each line
if ($lineItems[$i].Length -gt 0) {
$newLine += "$($lineItems[$i]);"
}
}
Write-Host "Old line $line"
Write-Host "New line $newLine"
$newLines.Add($newLine) | Out-Null
}
# write file
$newFilename = $fileName.Replace(".txt", ".dat")
[System.IO.File]::WriteAllLines($newFilename, $newLines)
Write-Host "File $newFilename written"
}
A few things to note here:
As you mention that you're starting with powershell, I've written the code more verbose than I would for, say, a seasoned developer. But that actually doesn't hurt.
For the same reason, the code is sub-optimal on purpose (makes for easier to read code). But for better performance and large files (say, a few 10s of MB or more) you need to do things differently, e.g. avoid strings and use string builder instead.
Obviously you can comment out all the Write-Host statements, they're there just to help you make sure the code is working properly.
Hope that helps!
Jim
If your game cannot handle the quoted coordinate values you get when using ConvertTo-Csv or Export-Csv, this should update the values while leaving the quotes off:
$eastOffset = 100
$northOffset = -200
(Get-Content 'D:\coordinates.txt') | ForEach-Object {
$fields = $_ -split ';'
[double]$fields[1] += $eastOffset
[double]$fields[2] += $northOffset
# write the updated stuff to file
Add-Content -Path 'D:\newcoordinates.txt' -Value ($fields -join ';')
}
this content
"Bridge_Asphalt_F";202498.396728;1104.362183;9.721280;0.000000;0.000000;1.000000;-1.299559;
"Road_Asphalt_F";202123.396728;1104.362456;9.721280;0.000000;0.000000;1.000000;-1.299559;
would become
"Bridge_Asphalt_F";202598.396728;904.362183;9.721280;0.000000;0.000000;1.000000;-1.299559;
"Road_Asphalt_F";202223.396728;904.362456;9.721280;0.000000;0.000000;1.000000;-1.299559;
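If the game is picky about number formatting, a variation that parses and formats with the invariant culture and a fixed six decimals may be safer, since default double-to-string conversion drops trailing zeros and parsing can depend on regional settings. This is only a sketch; `Move-Coordinate` is a hypothetical helper name, not something from the answers above.

```powershell
$inv = [Globalization.CultureInfo]::InvariantCulture

# Hypothetical helper: shifts the 2nd and 3rd fields of one semicolon-separated line
function Move-Coordinate {
    param([string]$Line, [double]$East, [double]$North)
    $fields = $Line -split ';'
    # 'F6' keeps six decimals, matching the format of the input file
    $fields[1] = ([double]::Parse($fields[1], $inv) + $East).ToString('F6', $inv)
    $fields[2] = ([double]::Parse($fields[2], $inv) + $North).ToString('F6', $inv)
    $fields -join ';'
}

$sample = '"Bridge_Asphalt_F";202498.396728;1104.362183;9.721280;0.000000;0.000000;1.000000;-1.299559;'
$moved  = Move-Coordinate -Line $sample -East 100 -North (-200)
$moved
```

The trailing semicolon survives because `-split` yields an empty last element that `-join` puts back.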

Search multiple words using regular expression in PowerShell

I am new to PowerShell and would highly appreciate any help with the following. I have a PowerShell script, but I have not been able to complete it to get all the data fields from the text file.
I have a file 1.txt as below.
I am trying to extract the values of "pid" and "ctl00_lblOurPrice" from the file in the table format below so that I can open it in Excel. Column headings are not important:
pid ctl00_lblOurPrice
0070362408 $6.70
008854787666 $50.70
Currently I am only able to get pid as below. I would also like to get the price for each pid.
0070362408
008854787666
c:\scan\1.txt:
This is sentence 1.. This is sentence 1.1... This is sentence A1...
fghfdkgjdfhgfkjghfdkghfdgh gifdgjkfdghdfjghfdg
gkjfdhgfdhgfdgh
ghfghfjgh
...
href='http://example.com/viewdetails.aspx?pid=0070362408'>
This is sentence B1.. This is sentence B2... This is sentence B3...
GFGFGHHGH
HHGHGFHG
<p class="price" style="display:inline;">
ctl00_lblOurPrice=$6.70
This is sentence 1.. This is sentence 1.1... This is sentence A1...
fghfdkgjdfhgfkjghfdkghfdgh gifdgjkfdghdfjghfdg
gkjfdhgfdhgfdgh
ghfghfjgh
...
href='http://example.com/viewdetails.aspx?pid=008854787666'>
This is sentence B1.. This is sentence B2... This is sentence B3...
6GBNGH;L
887656HGFHG
<p class="price" style="display:inline;">
ctl00_lblOurPrice=$50.70
...
...
Current powershell script:
$files=Get-ChildItem c:\scan -recurse
$output_file = 'c:\output\outdata.txt'
foreach ($file in $files) {
$input_path = $file
$regex = 'num=\d{1,13}'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % {
($_.Value) -replace "num=","" } | Out-File $output_file -Append }
Thanks in advance for your help
I'm going to assume that you either mean pid=\d{1,13} in your code, or that your sample text should have read num= instead of pid=. We will go with the assumption that it is in fact supposed to be pid.
In that case we will turn the entire file into one long string with -Join "", and then split it on "href" to create records for each site to parse against. Then we match for pid= and ending when it comes across a non-numeric character, and then we look for a dollar amount (a $ followed by numbers, followed by a period, and then two more numbers).
When we have a pair of PID/Price matches we can create an object with two properties, PID and Price, and output that. For this I will assign it to an array, to be used later. If you do not have PSv3 or higher you will have to change [PSCustomObject][ordered] into New-Object PSObject -Property but that loses the order of properties, so I like the former better and use it in my example here.
$files=Get-ChildItem C:\scan -recurse
$output_file = 'c:\output\outdata.csv'
$Results = @()
foreach ($file in $files) {
$Results += ((gc $File) -join "") -split "href" |?{$_ -match "pid=(\d+?)[^\d].*?(\$\d*?\.\d{2})"}|%{[PSCustomObject][ordered]@{"PID"=$Matches[1];"Price"=$Matches[2]}}
}
$Results | Select PID,Price | Export-Csv $output_file -NoTypeInformation
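To see the join/split/match steps in isolation, here is a small sketch that runs them on a few lines copied from the sample file instead of reading from disk. The regex is a slightly simplified, single-quoted variant of the one above (an assumption on my part that the two behave the same on this input), and `New-Object PSObject` is used so it also runs on older PowerShell versions.

```powershell
# A few lines from the sample file, held in memory
$sample = @(
    "href='http://example.com/viewdetails.aspx?pid=0070362408'>"
    'GFGFGHHGH'
    'ctl00_lblOurPrice=$6.70'
    "href='http://example.com/viewdetails.aspx?pid=008854787666'>"
    '887656HGFHG'
    'ctl00_lblOurPrice=$50.70'
)

# Join into one string, split into one record per href, then capture pid and price
$pairs = ($sample -join '') -split 'href' |
    Where-Object { $_ -match 'pid=(\d+)\D.*?(\$\d+\.\d{2})' } |
    ForEach-Object { New-Object PSObject -Property @{ PID = $Matches[1]; Price = $Matches[2] } }

$pairs | Select-Object PID, Price
```

The empty first record produced by the split is dropped by `Where-Object`, because it contains no `pid=`.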

Replace part of text in a file using batch

The file 'MyFile.txt' has a line in it and a part of that line I need replaced.
Example:
The line in the file is like this
53544THOIN91111160000000
I want to replace '111116' in the existing line in 'MyFile.txt'. The catch is that '111116' is variable and keeps changing. It's basically a date in the format YYMMDD; I want to read the modified date from another file and replace these digits in 'MyFile.txt'.
Here is the code I tried.
set b=MyFile.txt
for /f "tokens= 1" %%c in (%b%) do (set line=%%c)
Set OLDDate=%line:~11,6%
SET filename="AnotherFile.txt"
FOR %%f IN (%filename%) DO SET filedatetime=%%~tf
SET Month=%filedatetime:~0,2%
SET Date=%filedatetime:~3,2%
SET Year=%filedatetime:~8,2%
SET NEWDate=%Year%%Month%%date%
ECHO OLD DATE = %OLDDate%
ECHO NEW DATE = %NEWDate%
I need %OLDDate% to be replaced by %NEWDate% in 'MyFile.txt' in the position ~11,6
Any reason why powershell couldn't do it?
# Example of PowerShell -replace parameter
clear-Host
$file = Get-ChildItem "D:\powershell\snippets\g*.txt"
foreach ($str in $file)
{
$content = Get-Content -path $str
$content | foreach {$_ -replace "the the", "the"} | Set-Content $str
}
write-Host "After replace `n"
$file
In the foreach loop you can replace any string with another.
Date logic can be used as follows:
$anotherfile = gi anotherfile.txt
Retrieving date info
$year = $anotherfile.LastWriteTime.Year
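Putting the two pieces together for the fixed-position case in the question, a rough sketch could look like this. The timestamp is a hard-coded, made-up value so the example is reproducible; in practice it would be `(Get-Item AnotherFile.txt).LastWriteTime`, and the line would come from `Get-Content MyFile.txt`.

```powershell
# The sample line from the question; the date sits at offset 11, length 6
$line = '53544THOIN91111160000000'

# Made-up timestamp standing in for (Get-Item AnotherFile.txt).LastWriteTime
$stamp = [datetime]'2017-03-05'

# Format as YYMMDD, independent of regional settings
$newDate = $stamp.ToString('yyMMdd', [Globalization.CultureInfo]::InvariantCulture)

# Cut out the old six digits and insert the new ones at the same position
$updated = $line.Remove(11, 6).Insert(11, $newDate)
$updated
```

Working by position rather than by value avoids accidentally replacing another run of the same digits elsewhere in the line.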