I am pretty new to PowerShell scripting. I need to replace the first occurrence of a string with one value and the second occurrence with a different value.
So far, I have this :
$dbS = Select-String $repoPath\AcceptanceTests\sample.config -Pattern([regex]'dbServer = "#DB_SERVER#"')
write-output $dbS[0]
write-output $dbS[1]
This gives the following output:
D:\hg\default\AcceptanceTests\sample.config:5: dbServer = "#DB_SERVER#"
D:\hg\default\AcceptanceTests\sample.config:12: dbServer = "#DB_SERVER#"
I can see that both occurrences are correct, and this returns a MatchInfo object. Now I need to replace the contents. I tried:
Get-Content $file | ForEach-Object { $_ -replace "dbserver",$dbS[0] } | Set-Content ($file+".tmp")
Remove-Item $file
Rename-Item ($file+".tmp") $file
But this replaces every occurrence, and with the entire path at that. Please help.
Here is what i have come up with:
$dbs = Select-String .\test.config -pattern([regex]'dbServer = "Test1"')
$file = Get-Content .\test.config
$dbs | % {$file[$_.linenumber-1] = $file[$_.linenumber-1] -replace "Test1", "Test3" }
set-content .\test.config $file
It loops over every Select-String result and uses its LineNumber property minus 1 as the array index (LineNumber starts at 1, array indices start at 0), so the replacement happens only on that line. Finally, Set-Content writes the modified lines back to the file.
If you want to assign different values to occurrences 1 and 2, you can do this:
#replace first occurrence
$file[$dbs[0].LineNumber-1] = $file[$dbs[0].LineNumber-1] -replace "Test1", "Test2"
#replace second occurrence
$file[$dbs[1].LineNumber-1] = $file[$dbs[1].LineNumber-1] -replace "Test1", "Test3"
This approach obviously only works if you know how many occurrences there will be and which of them you want to replace.
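If you would rather not depend on line numbers (for example, if both occurrences could sit on the same line), one alternative is the .NET Regex.Replace(input, replacement, count) instance overload, which replaces at most count matches. This is a sketch, assuming the file is small enough to read in one go with Get-Content -Raw; the sample content and the values 'server-one' and 'server-two' are placeholders:

```powershell
# Hypothetical stand-in for the contents of sample.config (in practice: Get-Content -Raw $file)
$text = @'
dbServer = "#DB_SERVER#"
timeout = 30
dbServer = "#DB_SERVER#"
'@

# Escape the placeholder so '#' and any other special characters are matched literally
$re = [regex]([regex]::Escape('#DB_SERVER#'))

# With count = 1 only the first remaining match is replaced on each call,
# so the second call hits what was originally the second occurrence
$text = $re.Replace($text, 'server-one', 1)
$text = $re.Replace($text, 'server-two', 1)
$text
```

Two caveats: a $ in the replacement string starts a substitution such as $1, and the replacement values must not themselves contain #DB_SERVER#. To write the result back, something like Set-Content -LiteralPath $file -Value $text -NoNewline should work on PowerShell 5.0 or later.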
I need to read 10K+ files and search each one, line by line, for the string of characters that follows the word SUFFIX. Once I capture that string, I need to remove every occurrence of it from the file and then save the file.
In the example below, I would capture -4541 and then replace all occurrences of -4541 with nothing.
Once all the occurrences are replaced, I save the changes.
Here is my data:
ABSDOMN VER 1 D SUFFIX -4541
05 ST-CTY-CDE-FMHA-4541
10 ST-CDE-FMHA-4541 9(2)
10 CTY-CDE-FMHA-4541 9(3)
05 NME-CTY-4541 X(20)
05 LST-UPDTE-DTE-4541 9(06)
05 FILLER X
Here is a starting script. I can display the line that contains the word SUFFIX, but I cannot capture the string after it (in this case, -4541).
$CBLFileList = Get-ChildItem -Path "C:\IDMS" -File -Recurse
$regex = "\bSUFFIX\b"
$treat = $false
ForEach($CBLFile in $CBLFileList) {
Write-Host "Processing .... $CBLFile" -foregroundcolor green
Get-content -Path $CBLFile.FullName |
ForEach-Object {
if ($_ -match $regex) {
Write-Host "Found Match - $_" -foregroundcolor green
$treat=$true
}
}
}
Try the following:
Note: Be sure to make backup copies of the input files first, as they will be updated in place. Use -Encoding with Set-Content to specify the desired encoding, if it should be different from Set-Content's default.
$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
$regex = '(?<=SUFFIX) -\d+'
ForEach ($CBLFile in $CBLFileList) {
$firstLine, $remainingLines = $CBLFile | Get-Content
if ($firstLine -cmatch $regex) {
$toRemove = $Matches[0].Trim()
& { $firstLine -creplace $regex; $remainingLines -creplace [regex]::Escape($toRemove) } |
Set-Content -LiteralPath $CBLFile.FullName
}
}
Based on your feedback, the regex that worked for you in the end was (?<=SUFFIX).*$ (which could be simplified to (?<=SUFFIX).+ in this case), i.e., one that matches whatever follows the substring SUFFIX, rather than only a space followed by a - and one or more digits (\d+).
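Here is a sketch of the same logic run against an in-memory copy of the sample data, using the (?<=SUFFIX).+ variant. Escaping the captured value with [regex]::Escape is an addition here: with .+ the captured text could contain characters such as . or ( that -creplace would otherwise treat as regex syntax.

```powershell
# Hypothetical in-memory copy of the sample data from the question
$lines = @(
    'ABSDOMN VER 1 D SUFFIX -4541'
    '05 ST-CTY-CDE-FMHA-4541'
    '10 ST-CDE-FMHA-4541 9(2)'
    '10 CTY-CDE-FMHA-4541 9(3)'
    '05 NME-CTY-4541 X(20)'
    '05 LST-UPDTE-DTE-4541 9(06)'
    '05 FILLER X'
)

# Matches everything after SUFFIX on the first line (here ' -4541')
$regex = '(?<=SUFFIX).+'

$firstLine, $remainingLines = $lines
if ($firstLine -cmatch $regex) {
    # Drop the leading space so only the suffix itself is removed elsewhere
    $toRemove = $Matches[0].Trim()
    # Treat the captured text literally, not as a pattern
    $escaped = [regex]::Escape($toRemove)
    $result = @($firstLine -creplace $regex, '') + @($remainingLines -creplace $escaped, '')
}
$result
```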
I am working on a PowerShell script, and I've got several text files in which I need to replace the backslashes in lines that match this pattern: .. >\\%name% .. < .. (where .. could be anything)
Example string from one of the files where the backslashes should match:
<Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>
Example string from one of the files where the backslashes should not match:
<Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>
So far I've managed to select every character between >\\%name% and < with this expression (Regex101):
(?<=>\\\\%name%)(.*)(?=<)
but I failed to select only the backslashes.
Is there a solution which I could not yet find?
I'd recommend selecting the relevant tags with an XPath expression and then doing the replacement on the text body of the selected nodes.
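[xml]$xml = Get-Content -LiteralPath $path -Raw   #$path: full path of the XML file to process (placeholder)
$xml.SelectNodes('//Tag[substring(., 1, 8) = "\\%name%"]') | ForEach-Object {
#'\\' as a pattern matches one backslash; in the replacement a backslash is literal, so adjust '\\' to what you need
$_.'#text' = $_.'#text' -replace '\\', '\\'
}
$xml.Save($path)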
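A self-contained sketch of the XPath idea on an in-memory document. The <Root> wrapper is an assumption (XPath needs well-formed XML, so this only applies if the files are XML); starts-with() is used as an equivalent of the substring() comparison; and, unlike the snippet above, it keeps the leading \\%name% and replaces only the later backslashes, with '/' chosen for illustration.

```powershell
# Hypothetical in-memory document holding the two example lines from the question
[xml]$xml = @'
<Root>
<Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>
<Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>
</Root>
'@

# The 8-character prefix that marks the tags to change
$marker = '\\%name%'

# @() collects the matching nodes first, so the document is not modified
# while the XPath result is still being enumerated
@($xml.SelectNodes('//Tag[starts-with(., "\\%name%")]')) | ForEach-Object {
    # Keep the prefix as-is and turn every later backslash into '/'
    $_.InnerText = $marker + ($_.InnerText.Substring($marker.Length) -replace '\\', '/')
}

$xml.Root.Tag
```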
So here's my solution:
$original_file = $Filepath
$destination_file = $Filepath + ".new"
Get-Content -Path $original_file | ForEach-Object {
$line = $_
if ($line -match '(?<=>\\\\%name%)(.*)(?=<)'){
$line = $line -replace '\\','/'
}
$line
} | Set-Content -Path $destination_file
Remove-Item $original_file
Rename-Item $destination_file.ToString() $original_file.ToString()
This replaces every \ with / on the lines that match the pattern, though not in the way my question asked about (selecting only the backslashes with the regex itself).
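As for the regex itself: -replace uses the .NET regex engine, which supports variable-length lookbehind, so the backslashes themselves can be selected. A sketch, shown against the two example lines from the question and using '/' as the replacement as in the solution above:

```powershell
# The two example lines from the question
$lines = @(
    '<Tag>\\%name%\TST$\Program\1.0\000\Program.msi</Tag>'
    '<Tag>/i /L*V "%TST%\filename.log" /quiet /norestart</Tag>'
)

# (?<=>\\\\%name%[^<]*) : the backslash must come after >\\%name% with no '<' in between
# \\                    : the single backslash that is actually matched and replaced
# (?=[^<]*<)            : a closing '<' must still follow
$pattern = '(?<=>\\\\%name%[^<]*)\\(?=[^<]*<)'

$fixed = $lines -replace $pattern, '/'
$fixed
```

The two backslashes in >\\%name% are not replaced, because they are part of the required prefix rather than preceded by it. For files, the same pattern can go into the existing Get-Content | ForEach-Object loop in place of the -match/-replace pair.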
I am trying to return the highest 4-digit number found in a string pattern across a set of documents.
String Pattern: 3 Letters dash 4 Digits
The Word documents contain a document identifier code such as the ones below.
Sample Files:
Car Parts.docx > CPW - 2345
CarHandles.docx > CPW - 8723
CarList.docx > CPA - 9083
I have referenced sample code that I am trying to adapt. I am not a VBA or PowerShell programmer, so I may be going about this the wrong way.
I am happy to look at alternatives - on a Windows platform.
I used these references to get started:
http://chris-nullpayload.rhcloud.com/2012/07/find-and-replace-string-in-all-docx-files-recursively/
PowerShell: return the number of instances find in a file for a search pattern
Powershell: return filename with highest number
$list = gci "C:\Users\WP\Desktop\SearchFiles" -Include *.docx -Force -recurse
foreach ($foo in $list) {
$objWord = New-Object -ComObject word.application
$objWord.Visible = $False
$objDoc = $objWord.Documents.Open("$foo")
$objSelection = $objWord.Selection
$Pat1 = [regex]'[A-Z]{3}-[0-9]{4}' # Find the regex match 3 letters followed by 4 numbers eg HGW - 1024
$findtext= "$Pat1"
$highestNumber =
# Find the highest occurrence of this pattern found in the documents searched - output to text file or on screen
Sort-Object | # This may also be wrong -I added it for when I find the pattern
Select-Object -Last 1 -ExpandProperty Name
<# The below may not be needed - ?
$ReplaceText = ""
$ReplaceAll = 2
$FindContinue = 1
$MatchFuzzy = $False
$MatchCase = $False
$MatchPhrase = $false
$MatchWholeWord = $True
$MatchWildcards = $True
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $FindContinue
$Format = $False
$objSelection.Find.execute(
$FindText,
$MatchCase,
$MatchWholeWord,
$MatchWildcards,
$MatchSoundsLike,
$MatchAllWordForms,
$Forward,
$Wrap,
$Format,
$ReplaceText,
$ReplaceAll
}
}
#>
I would appreciate any advice on how to proceed.
Try this:
# This library is needed to extract zip archives. A .docx is a zip archive
# .NET 4.5 or later is required
Add-Type -AssemblyName System.IO.Compression.FileSystem
# This function gets plain text from a word document
# adapted from http://stackoverflow.com/a/19503654/284111
# It is not ideal, but good enough
function Extract-Text([string]$fileName) {
#Open the .docx (a zip archive) and read word/document.xml straight into a variable ($text)
$zip = [System.IO.Compression.ZipFile]::OpenRead($fileName)
try {
$reader = New-Object System.IO.StreamReader($zip.GetEntry("word/document.xml").Open())
$text = $reader.ReadToEnd()
$reader.Dispose()
} finally {
#Always release the archive so the file is not left locked
$zip.Dispose()
}
#Remove actual xml tags and leave the text behind
$text = $text -replace '</w:r></w:p></w:tc><w:tc>', " "
$text = $text -replace '</w:r></w:p>', "`r`n"
$text = $text -replace "<[^>]*>",""
return $text
}
$fileList = Get-ChildItem "C:\Users\WP\Desktop\SearchFiles" -Include *.docx -Force -recurse
# Adapted from http://stackoverflow.com/a/36023783/284111
$fileList |
Foreach-Object {[regex]::matches((Extract-Text $_.FullName), '(?<=[A-Za-z]{3}\s*(?:-|–)\s*)\d{4}')} |
Select-Object -ExpandProperty captures |
Sort-Object value -Descending |
Select-Object -First 1 -ExpandProperty value
The main idea behind this is not to monkey around with the COM API for Word, but instead to extract the text from the document directly (a .docx is a zip archive of XML files).
The way to get the highest number is to first isolate it using a regex and then sort and select the first item. Something like this, assuming $objDoc is the document opened in your script:
[regex]::matches($objDoc.Content.Text, '(?<=[A-Z]{3}\s*-\s*)\d{4}') |
Select -ExpandProperty captures |
sort value -Descending |
Select -First 1 -ExpandProperty value |
Add-Content outfile.txt
I think the problem you are having with your regex is that your example data contains spaces around the dash in the code, which you haven't allowed for in your pattern.
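To tie the two answers together, here is a sketch on hypothetical sample text (the identifiers from the question). It allows spaces around the dash, accepts an en dash written as \u2013 (so the script file's encoding does not matter), and converts each match to [int] before sorting. With exactly four digits a string sort gives the same order, but a numeric sort stays correct if the pattern ever changes.

```powershell
# Hypothetical sample text standing in for the text extracted from the documents
$text = @'
Car Parts.docx > CPW - 2345
CarHandles.docx > CPW - 8723
CarList.docx > CPA - 9083
'@

# Three letters, optional spaces, a hyphen or an en dash (\u2013), optional spaces, then four digits
$pattern = '(?<=[A-Za-z]{3}\s*[-\u2013]\s*)\d{4}'

# Convert each match to [int] so the comparison is numeric, then keep the largest
$highest = [regex]::Matches($text, $pattern) |
    ForEach-Object { [int]$_.Value } |
    Sort-Object -Descending |
    Select-Object -First 1
$highest
```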
Is there any easy way to do this?
input: 123215-85_01_test
expected output: 01_test
Another example
input: 12154_02_test
expected output: 02_test
The string "test" will always be there, but the numbering before it varies.
For example, with this code:
$path = "c:\tmp\*.sql"
get-childitem $path | forEach-object {
$name = $_.Name
$result = $name -replace "","" # I don't know how to write this regex..
$extension = $_.Extension
$newName = $prefix+"_"+ $result -f, $extension
Rename-Item -Path $_.FullName -NewName $newName
}
There are two ways you could go at this: a simple split and join, or one of many possible regexes.
Split on the underscore and rejoin the last two elements:
$split = "123215-85_01_test" -split "_"
$split[-2..-1] -join "_" # $split[-2,-1] would also work.
Use a regex to keep the number before the last underscore and everything after it:
"123215-85_01_test" -replace "^.*_(\d+)_(.*)$", '$1_$2'
Note that the split version only works if the text after the number contains no further underscores, because it keeps just the last two parts.
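A sketch that plugs the regex into the renaming loop from the question. Get-StrippedName is a hypothetical helper; it works on BaseName so the .sql extension is kept unchanged, the undefined $prefix from the question is left out, and -WhatIf makes the loop a dry run.

```powershell
# Hypothetical helper: drop everything up to the last "_<digits>_" group,
# e.g. '123215-85_01_test' -> '01_test' and '12154_02_test' -> '02_test'
function Get-StrippedName([string]$BaseName) {
    $BaseName -replace '^.*_(\d+)_(.*)$', '$1_$2'
}

# Same folder and filter as in the question
Get-ChildItem -Path 'c:\tmp\*.sql' -File -ErrorAction SilentlyContinue | ForEach-Object {
    # BaseName excludes the extension, so the extension is re-appended as-is
    $newBase = Get-StrippedName $_.BaseName
    if ($newBase -ne $_.BaseName) {
        # -WhatIf only reports what would be renamed; remove it to rename for real
        Rename-Item -LiteralPath $_.FullName -NewName ($newBase + $_.Extension) -WhatIf
    }
}
```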
I am new to PowerShell and would appreciate any help with the following. I have a PowerShell script, but I have not been able to complete it so that it gets all the data fields from the text file.
I have a file, 1.txt, as below.
I am trying to extract the "pid" and "ctl00_lblOurPrice" values from the file into a table like the one below, so that I can open it in Excel. Column headings are not important:
pid ctl00_lblOurPrice
0070362408 $6.70
008854787666 $50.70
Currently I am only able to get the pid, as below. I would also like to get the price for each pid:
0070362408
008854787666
c:\scan\1.txt:
This is sentence 1.. This is sentence 1.1... This is sentence A1...
fghfdkgjdfhgfkjghfdkghfdgh gifdgjkfdghdfjghfdg
gkjfdhgfdhgfdgh
ghfghfjgh
...
href='http://example.com/viewdetails.aspx?pid=0070362408'>
This is sentence B1.. This is sentence B2... This is sentence B3...
GFGFGHHGH
HHGHGFHG
<p class="price" style="display:inline;">
ctl00_lblOurPrice=$6.70
This is sentence 1.. This is sentence 1.1... This is sentence A1...
fghfdkgjdfhgfkjghfdkghfdgh gifdgjkfdghdfjghfdg
gkjfdhgfdhgfdgh
ghfghfjgh
...
href='http://example.com/viewdetails.aspx?pid=008854787666'>
This is sentence B1.. This is sentence B2... This is sentence B3...
6GBNGH;L
887656HGFHG
<p class="price" style="display:inline;">
ctl00_lblOurPrice=$50.70
...
...
Current PowerShell script:
$files=Get-ChildItem c:\scan -recurse
$output_file = 'c:\output\outdata.txt'
foreach ($file in $files) {
$input_path = $file
$regex = 'num=\d{1,13}'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % {
($_.Value) -replace "num=","" } | Out-File $output_file -Append }
Thanks in advance for your help
I'm going to assume that you either mean pid=\d{1,13} in your code, or that your sample text should have read num= instead of pid=. We will go with the assumption that it is in fact supposed to be pid.
In that case, we will turn the entire file into one long string with -join "", and then split it on "href" to create one record per link to parse. Then we match pid= followed by digits, ending at the first non-numeric character, and then look for a dollar amount (a $ followed by digits, a period, and then two more digits).
When we have a pair of PID/Price matches we can create an object with two properties, PID and Price, and output that. For this I will assign it to an array, to be used later. If you do not have PSv3 or higher you will have to change [PSCustomObject][ordered] into New-Object PSObject -Property but that loses the order of properties, so I like the former better and use it in my example here.
$files = Get-ChildItem C:\scan -Recurse -File
$output_file = 'c:\output\outdata.csv'
$Results = @()
foreach ($file in $files) {
#Join the file into one string, split it into one record per "href", then pull the PID and the price out of each record
$Results += ((gc $file.FullName) -join "") -split "href" |?{$_ -match 'pid=(\d+?)[^\d].*?(\$\d*?\.\d{2})'}|%{[PSCustomObject][ordered]@{"PID"=$Matches[1];"Price"=$Matches[2]}}
}
$Results | Select PID,Price | Export-Csv $output_file -NoTypeInformation
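A variant of this answer, sketched on an in-memory copy of part of the sample rather than the real 1.txt. It joins the lines with newlines and uses the (?s) inline option so . can cross line breaks, and it anchors the price on the literal ctl00_lblOurPrice= label instead of the first dollar amount, which assumes every price line looks like the ones shown. The named groups pid and price are only there for readability.

```powershell
# Hypothetical in-memory sample standing in for the lines of 1.txt
$lines = @(
    "href='http://example.com/viewdetails.aspx?pid=0070362408'>"
    'This is sentence B1.. This is sentence B2... This is sentence B3...'
    '<p class="price" style="display:inline;">'
    'ctl00_lblOurPrice=$6.70'
    "href='http://example.com/viewdetails.aspx?pid=008854787666'>"
    '<p class="price" style="display:inline;">'
    'ctl00_lblOurPrice=$50.70'
)

# One record per "href"; the text before the first href simply does not match
$records = ($lines -join "`n") -split 'href' | ForEach-Object {
    if ($_ -match '(?s)pid=(?<pid>\d+).*?ctl00_lblOurPrice=(?<price>\$\d+\.\d{2})') {
        [PSCustomObject]@{ PID = $Matches['pid']; Price = $Matches['price'] }
    }
}
$records | Format-Table
```

For Excel, the objects can go straight to Export-Csv, e.g. $records | Export-Csv 'c:\output\outdata.csv' -NoTypeInformation.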