Requirement: I need a way to handle special characters such as % and &. The code below needs to be tweaked so that special characters coming from the $control file are used as they are.
For example, one of the entries in the $control file is 25% Dextrose(25ml). I need $ie.Navigate to simply navigate to https://www.xxxy.com/search/all?name=25% Dextrose(25ml). Currently it gets routed to https://www.xxxy.com/search/all?name=25%% Dextrose(25ml) (note the extra % in the URL) and so the page is not found.
**A few examples of special characters that need to be handled:**
'/' - 32care Mouth/Throat
'%' - 3d1% Gel(30g)
'&' - Accustix Glucose & Protein
'/' - Ace Revelol(25/(2.5mg)
function getStringMatch
{
# Loop through all 2 digit combinations in the $path directory
foreach ($control In $controls)
{
$ie = New-Object -COMObject InternetExplorer.Application
$ie.visible = $true
$site = $ie.Navigate("https://www.xxxy.com/search/all?name=$control")
$ie.ReadyState
while ($ie.Busy -and $ie.ReadyState -ne 4){ sleep -Milliseconds 100 }
$link = $null
$link = $ie.Document.get_links() | where-object {$_.innerText -eq "$control"}
$link.click()
while ($ie.Busy -and $ie.ReadyState -ne 4){ sleep -Milliseconds 100 }
$ie2 = (New-Object -COM 'Shell.Application').Windows() | ? {
$_.Name -eq 'Windows Internet Explorer' -and $_.LocationName -match "^$control"
}
# NEED outerHTML of new page. CURRENTLY it is working for some.
$ie.Document.body.outerHTML > d:\med$control.txt
}
}
$controls = "Sporanox"
getStringMatch
You want to URL encode the URI. Add this at the very start:
Add-Type -AssemblyName 'System.Web'
And then encode the URL like this:
$controlUri = [System.Web.HttpUtility]::UrlEncode($control)
$site = $ie.Navigate("https://www.xxxy.com/search/all?name=$controlUri")
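As a hedged alternative that needs no extra assembly, [uri]::EscapeDataString can percent-encode a query-string value. The sketch below only builds the URLs for a few of the product names from the question; the variable names $names and $urls are illustrative and not part of the original script.

```powershell
# Product names taken from the question; $names and $urls are illustrative names.
$names = 'Accustix Glucose & Protein', '32care Mouth/Throat', '3d1% Gel(30g)'

$urls = foreach ($name in $names) {
    # EscapeDataString percent-encodes reserved characters such as %, & and /
    'https://www.xxxy.com/search/all?name=' + [uri]::EscapeDataString($name)
}

$urls

# Decoding the encoded value gives back the original product name
[uri]::UnescapeDataString([uri]::EscapeDataString($names[0]))
```

Unlike UrlEncode, this encodes a space as %20 rather than +, so check which form the target site expects.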
As Biffen pointed out, Web servers will treat special characters as codes. So in your case, $control needs to be modified so that the Web server understands where you want to go.
One way to fix it is to look for specific characters in the original product name and replace them with something the server will understand:
Here is the entire code:
function getStringMatch
{
# Loop through every product name in $controls
foreach ($control In $controls)
{
$ie = New-Object -COMObject InternetExplorer.Application
$ie.visible = $true
$s = $control -replace '%','%25'
$s = $s -replace ' ','+'
$s = $s -replace '&','%26'
$s = $s -replace '/','%2F'
$site = $ie.Navigate("https://www.xxxy.com/search/all?name=$s")
while ($ie.Busy -or $ie.ReadyState -ne 4){ Start-Sleep -Milliseconds 100 }
$link = $null
$link = $ie.Document.get_links() | where-object {if ($_.innerText){$_.innerText.contains($control)}}
$link.click()
while ($ie.Busy){ sleep -Milliseconds 100 }
$ie.Document.body.outerHTML > d:\TEMP\med$control.txt
}
}
$controls = "Accustix Glucose & Protein"
getStringMatch
I tried with the following strings:
"3d1% Gel(30g)"
"Ace Revelol(25/2mg)"
"Accustix Glucose & Protein"
"32care Mouth/Throat"
I have the following beginning of a PowerShell script, in which I would like to replace the values of variables for a different environment.
$SomeVar1 = "C:\path\to\file\a"
$SomeVar1 = "C:\path\to\file\a" # Copy for test - Should not be rewritten
$SomeVar2 = "C:\path\to\file\b"
# Note $SomeVar1 = "C:\path\to\file\a" - Should not be rewritten
When I run the rewrite script, the result should look like this:
$SomeVar1 = "F:\different\path\to\file\a"
$SomeVar1 = "C:\path\to\file\a" # Copy for test - Should not be rewritten
$SomeVar2 = "F:\different\path\to\file\b"
# Note $SomeVar1 = "C:\path\to\file\a" - Should not be rewritten
Current script that does(n't) rewrite:
$arr = @(
[PSCustomObject]@{Regex = '$SomeVar1 = "'; Replace = '$SomeVar1 = "F:\different\path\to\file\a"'}
[PSCustomObject]@{Regex = '$SomeVar2 = "'; Replace = '$SomeVar2 = "F:\different\path\to\file\b"'}
)
for ($i = 0; $i -lt $arr.Length; $i++) {
$ArrRegex = [Regex]::Escape($arr[$i].Regex)
$ArrReplace = $arr[$i].Replace
# Get full line for replacement
$Line = Get-Content $Workfile | Select-String $ArrRegex | Select-Object -First 1 -ExpandProperty Line
# Rewrite part
$Line = [Regex]::Escape($Line)
$Content = Get-Content $Workfile
$Content -replace "^$Line",$ArrReplace | Set-Content $Workfile
}
This replaces every occurrence at the start of a line (I need only the first one) and does not replace the one in the Note, which is fine.
Then I found "Powershell: Replace last occurence of a line in a file", which does the exact opposite of what I need: it rewrites only the last occurrence of the string, and it does so in the Note as well. I would like to change it to do the opposite: replace only the first occurrence at the beginning of a line (so it won't target the Note).
Code in my case looks like this:
# Rewrite part
$Line = [Regex]::Escape($Line)
$Content = Get-Content $Workfile -Raw
$Line = "(?s)(.*)$Line"
$ArrReplace = "`$1$ArrReplace"
$Content -replace $Line,$ArrReplace | Set-Content $Workfile
Do you have any recommendations on how to achieve my goal, or is there a more sophisticated way to replace variables in PowerShell scripts like this?
Thanks in advance.
So I finally figured it out. I had to add Select-String "^$ArrRegex" when building $Line, so that only matches at the beginning of a line are selected, and then use this regex to do the job: ^(?s)(.*?\n)$Line
In my case it does the following: it selects only the first occurrence at the beginning of a line and replaces it. It ignores everything else and, when re-run, does not rewrite the others. The copies of the variables will not really exist in the final version; each will be set once, like $Var1 = "Value", and never changed during the script, but I wanted to be sure that I won't replace something by mistake.
The final replacing part looks like this:
for ($i = 0; $i -lt $arr.Length; $i++) {
$ArrRegex = [Regex]::Escape($arr[$i].Regex)
$ArrReplace = $arr[$i].Replace
# Select only the first line that starts with the pattern
$Line = Get-Content $Workfile | Select-String "^$ArrRegex" | Select-Object -First 1 -ExpandProperty Line
$Line = [Regex]::Escape($Line)
# Read the whole file as a single string so that (?s) can span lines
$Content = Get-Content $Workfile -Raw
$Line = "^(?s)(.*?\n)$Line"
$ArrReplace = "`$1$ArrReplace"
$Content -replace $Line, $ArrReplace | Set-Content $Workfile
}
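Another way to get "first occurrence only, at the start of a line" is the count argument of the instance method Regex.Replace combined with the Multiline option. The following is a minimal sketch rather than the poster's script: it works on an inline sample instead of $Workfile, and the names $content, $pattern and $result are illustrative.

```powershell
# Inline sample standing in for the file content (assumption for the demo)
$content = @'
$SomeVar1 = "C:\path\to\file\a"
$SomeVar1 = "C:\path\to\file\a" # Copy for test
# Note $SomeVar1 = "C:\path\to\file\a"
'@

# ^ anchors at the start of each line because of the Multiline option
$pattern = '^' + [regex]::Escape('$SomeVar1 = "') + '.*?"'
$regex = [regex]::new($pattern, [System.Text.RegularExpressions.RegexOptions]::Multiline)

# A literal $ in a .NET replacement string must be written as $$
$replacement = '$SomeVar1 = "F:\different\path\to\file\a"'.Replace('$', '$$')

# The third argument limits the number of replacements to one
$result = $regex.Replace($content, $replacement, 1)
$result
```

Because the count is 1 and the pattern is anchored with ^, the copy on the second line and the mention inside the comment stay untouched. [regex]::new requires PowerShell 5 or later.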
You could possibly use flag variables like below to only do the first replacement for each of your regex patterns.
$Altered = Get-Content -Path $Workfile |
Foreach-Object {
if(-not $a) { #If replacement hasn't been done, replace
$_ = $_ -replace 'YOUR_REGEX1','YOUR_REPLACEMENT1'
if($_ -match 'YOUR_REPLACEMENT1') { $a = 'replacement done' } #Set Flag
}
if(-not $b) { #If replacement hasn't been done, replace
$_ = $_ -replace 'YOUR_REGEX2','YOUR_REPLACEMENT2'
if($_ -match 'YOUR_REPLACEMENT2') { $b = 'replacement done' } #Set Flag
}
$_ # Pipe back to $Altered
}
$Altered | Set-Content -Path $WorkFile
Just reverse the RegEx, if that is what you are after:
Clear-Host
@'
abc
abc
abc
'@ -replace '^(.*?)\babc\b', '$1HelloWorld'
# Results
<#
HelloWorld
abc
abc
#>
I am using the below function to create a JSON file from a SQL file. Unfortunately it is deleting the CRLF at the end of each line of the SQL file. I want it to keep them instead.
function GetStringBetweenTwoStrings($firstString, $secondString, $importPath){
#Get content from file
$file = Get-Content $importPath
#Regex pattern to compare two strings
$pattern = "$firstString(.*?)$secondString"
#Perform the operation
$result = [regex]::Match($file,$pattern).Groups[1].Value
#Return result
return "{""sql"":"""+$result+"""}"
}
I have tried using -raw but it does not seem to work
Thanks,
John
Interesting question
Unfortunately, I couldn't figure out a way to keep CRLF characters from `[regex]::Match` command.
It captures them fine but seems to return them as a single string by default.
If someone can figure that out, I'd be glad to see it.
Thanks to people much smarter than me, the following way with [regex]::match seems to work
function Get-StringBetweenTwoStrings {
[cmdletBinding()]
param (
$firstString,
$secondString,
$fullString
)
# Get content from file WITH -RAW
$file = Get-Content -Path $fullString -Raw
Write-Verbose $file -Verbose
# Regex pattern to compare two strings
$pattern = '{0}(.*?){1}' -f $firstString, $secondString
Write-Verbose $pattern -Verbose
# Perform the operation
$result = [regex]::Match($file, $pattern, 'SingleLine, MultiLine, IgnoreCase').Value
# Result
"{""sql"":""$result""}"
}
Test the code
Get-StringBetweenTwoStrings -firstString '(?<=GO)' -secondString '(?=GO)' -fullString .\Downloads\test.txt
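To illustrate why the line breaks disappear without -Raw, here is a small sketch on an inline string instead of a file. Without -Raw, Get-Content returns an array of lines, and PowerShell joins that array with spaces when it is passed where a single string is expected. The sample text and the variable names are assumptions made for the demo; ConvertTo-Json is used here only to show one way of getting properly escaped JSON.

```powershell
# Sample text with CRLF line endings (stand-in for the SQL file)
$text  = "GO`r`nSELECT 1`r`nFROM dbo.Users`r`nGO"

# What Get-Content yields without -Raw: one string per line
$lines = $text -split "`r`n"

# Converting the array to a single string joins the lines with spaces
$joined = [string]$lines

$pattern = '(?s)(?<=GO)(.*?)(?=GO)'
$fromArray = [regex]::Match($joined, $pattern).Value   # line breaks already lost
$fromRaw   = [regex]::Match($text,   $pattern).Value   # line breaks preserved

# ConvertTo-Json escapes CR and LF as \r and \n inside the JSON string
$json = [pscustomobject]@{ sql = $fromRaw } | ConvertTo-Json -Compress
$json
```

Building the JSON by string concatenation would leave raw line breaks inside the quoted value, which strict JSON parsers reject, so letting ConvertTo-Json do the escaping may be the safer choice.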
Workaround
When all else fails, I go back to brute force.
Start capturing when we see our $firstString, and keep capturing until we find our $secondString or reach the end.
Sample Data
$s = @'
# This is a random comment
GOSELECT TOP (1)
*
FROM dbo.Users
WHERE CaffeineLevel = 'Low';
# Can we get a cafGOfeine drip?
GO
# Why isn't this easier
'@ -split '\r?\n'
Code
$capture = [Text.StringBuilder]::new()
$capturing = $false
$firstString = 'GO'
$secondString = 'GO'
foreach ($line in $s) {
if ($line -match $secondString -and $capturing) {
Write-Verbose "Stopping...$line" -Verbose
<#
In case we want to capture a partial line
look for everything UNTIL our second string
#>
$splitLine = ($line | Select-String -Pattern ".*(?=$secondString)").Matches.Value
Write-Verbose "Capturing: [$splitLine]" -Verbose
$null = $capture.AppendLine($splitLine)
$capturing = $false
<# second string found, stop altogether #>
break
}
if ($capturing) {
Write-Verbose "Capturing: [$line]" -Verbose
$null = $capture.AppendLine($line)
}
if ($line -match $firstString) {
Write-Verbose "Starting...$line" -Verbose
<#
In case we want to capture a partial line,
look for everything AFTER our first string
#>
$splitLine = ($line | Select-String -Pattern "(?<=$firstString).*").Matches.Value
Write-Verbose "Capturing: [$splitLine]" -Verbose
$null = $capture.AppendLine($splitLine)
$capturing = $true
}
}
$capture.ToString()
I'm currently working on a script that should, based on the user's choice, replace two lines in a file after a matching string.
The file I want to edit looks like this:
[default]
string_a=sadasdasdas
string_b=dasdasdasdas
[profile1]
string_a=xxxxxx
string_b=xsaassaasas
[profile2]
string_a=yyyyyyy
string_b=yaayayayaya
I always want to overwrite string_a and string_b after [default].
Note that [default] could also be at the very bottom of the file, so I cannot just count lines and do it statically.
The user can pick between (in this case) profile1 and profile2. After picking e.g. profile2, string_a and string_b of default should be replaced with string_a and string_b of profile2.
My current code looks like this:
$filePath = './credentials'
$fileContent = Get-Content $filePath
$profiles = [regex]::Matches($fileContent, '\[(.*?)\]') |ForEach-Object { $_.Groups[1].Value }
Write-Host "Following profiles found: "
for ($i=0; $i -lt $profiles.length; $i++){
Write-Host $i"." $profiles[$i]
}
$userInput = Read-Host "Which profile set to default? "
Write-Host $profiles[$userInput]
$fileContent | Select-String $profiles[$userInput] -Context 1,2 | ForEach-Object {
$stringA = $_.Context.PostContext[0]
$stringB = $_.Context.PostContext[1]
#At this point I have access to the both string´s I want to replace the string´s of the default profile
# I could do this, but then I still have the old lines in the file...
# So the following code is not an option.
$NewContent = Get-Content -Path $filePath |
ForEach-Object {
# Output the existing line to pipeline in any case
$_
# If line matches regex
if($_ -match ('^' + [regex]::Escape('[default]')))
{
# Add output additional line
$stringA
$stringB
}
}
# Write content of $NewContent varibale back to file
$NewContent | Out-File -FilePath $filePath -Encoding Default -Force
}
Example output file, in case the user picked profile1 as the new default
[default]
string_a=xxxxxx
string_b=xsaassaasas
[profile1]
string_a=xxxxxx
string_b=xsaassaasas
[profile2]
string_a=yyyyyyy
string_b=yaayayayaya
I hope this is not too obvious, but as this is my first real PowerShell script I have not been able to find a solution yet.
Any help would be great!
Thanks
Example:
# This is sample data
$lines = @(@'
[default]
string_a=sadasdasdas
string_b=dasdasdasdas
[profile1]
string_a=xxxxxx
string_b=xsaassaasas
[profile2]
string_a=yyyyyyy
string_b=yaayayayaya
'@ -split "`r`n")
# In real world use:
# $encoding = [System.Text.Encoding]::ASCII
# $lines = [System.IO.File]::ReadAllLines($path, $encoding)
#Read file
$lines = $lines | ForEach-Object { $_.Trim()} # Trim spaces
$sections = @{}
$currentSection = $null
$hasErrors = $false
$lines | ForEach-Object {
if ([String]::IsNullOrWhiteSpace($_)) {
#ignore
} elseif ($_.StartsWith('[') -and $_.EndsWith(']') ) {
$currentSection = $_.Substring($_.IndexOf('[') + 1, $_.LastIndexOf(']') - 1)
$sections[$currentSection] = @{}
} elseif ($sections.ContainsKey($currentSection)) {
$PVPair = [String[]]$_.Split('=',2)
if ($PVPair.Count -eq 2) {
$sections[$currentSection][$PVPair[0]] = $PVPair[1]
} else {
Write-Warning -Message "Wrong line format [$($_)]"
$hasErrors = $true
}
} else {
Write-Warning -Message "Unexpected behaviour on section $currentSection, line $($_)"
$hasErrors = $true
}
}
if ($hasErrors) {
Write-Error -Message 'Errors occurred'
return
}
# Choice
$choice = $null
$choiceVariants = @($sections.Keys | Where-Object { $_ -ne 'default' })
while ($choiceVariants -notcontains $choice) {
Write-Host "Choose between $($choiceVariants -join ',')"
$choice = $choiceVariants | Out-GridView -Title 'Choose variant' -OutputMode Single
#Alternative: $choice = Read-Host -Prompt "Your choice"
}
Write-Host -ForegroundColor Yellow "You chose $($choice)"
# Change
$sections[$choice]['string_a'] = $sections['default']['string_a']
$sections[$choice]['string_b'] = 'newXSAA'
# Output
$outputLines = $sections.Keys | ForEach-Object {
$sectionName = $_
Write-Output "[$sectionName]"
$sections[$sectionName].Keys | ForEach-Object {
Write-Output "$_=$($sections[$sectionName][$_])"
}
}
# This is sample output
$outputLines | % { Write-Host $_ -f Magenta }
# In Real world:
# [System.IO.File]::WriteAllLines($path, $outputLines, $encoding)
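The question asks for the values of the chosen profile to be copied into [default], with the sections staying in their original order. A plain hashtable does not guarantee key order, so the following minimal sketch uses [ordered] dictionaries instead. It is an illustration under assumptions, not a drop-in replacement: the data is inline, and $choice is hard-coded where the real script would ask the user.

```powershell
# Inline sample data (the real script would read the credentials file)
$lines = @'
[default]
string_a=sadasdasdas
string_b=dasdasdasdas
[profile1]
string_a=xxxxxx
string_b=xsaassaasas
[profile2]
string_a=yyyyyyy
string_b=yaayayayaya
'@ -split '\r?\n'

# Parse into ordered dictionaries so section and key order are preserved
$sections = [ordered]@{}
$current = $null
foreach ($line in $lines) {
    $line = $line.Trim()
    if ($line -match '^\[(.+)\]$') {
        $current = $Matches[1]
        $sections[$current] = [ordered]@{}
    } elseif ($current -and $line -match '^([^=]+)=(.*)$') {
        $sections[$current][$Matches[1]] = $Matches[2]
    }
}

$choice = 'profile1'   # hypothetical; would come from Read-Host or Out-GridView

# Copy every key of the chosen profile into [default]
foreach ($key in @($sections[$choice].Keys)) {
    $sections['default'][$key] = $sections[$choice][$key]
}

# Rebuild the file content in the original order
$output = foreach ($name in $sections.Keys) {
    "[$name]"
    foreach ($key in $sections[$name].Keys) { "$key=$($sections[$name][$key])" }
}
$output
```

With the sample data this yields the layout shown in the question's example output: [default] now carries the profile1 values and the other sections are unchanged.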
I have to replace multiple strings with the same pattern, and several strings are on the same line. The replacement value should be incremental. I need to match and replace only the pattern as in the example, not requestId, nor messageId.
Input:
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv4d-zxcv56</requestId>
<requestId>1234qw-12qw9x-123456</requestId> Stevie Wonder <messageId>1234qw-12qw9x-123456</msg
reportId>plmkjh8765FGH4rt6As</msg:reportId> something <keyID>qwer1234asdf5678zxcv0987bnml65gh</msgdc
The desired output should be:
<requestId>Request-1</requestId>Ace of Base Order: Request-2<something else...
<requestId>Request-3</requestId>
<requestId>Request-4</requestId> Stevie Wonder <messageId>Request-4</msg
reportId>ReportId-1</msg:reportId> something <keyId>KeyId-1</msg
The regex finds all matching values but I cannot make the loop and replace these values. The code I am trying to make work is:
@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
reportId>plmkjh8765FGH4rt6As</msg:reportId> something <keyID>qwer1234asdf5678zxcv0987bnml65gh</msgdc
'@ | Set-Content $log -Encoding UTF8
$requestId = @{
Count = 1
Matches = @()
}
$tmp = Get-Content $log | foreach { $n = [regex]::matches((Get-Content $log),'\w{6}-\w{6}-\w{6}').value
if ($n)
{
$_ -replace "$n", "Request-$($requestId.count)"
$requestId.count++
} $_ }
$tmp | Set-Content $log
You want Regex.Replace():
$requestId = 1
$tmp = Get-Content $log |ForEach-Object {
[regex]::Replace($_, '\w{6}-\w{6}-\w{6}', { 'Request-{0}' -f ($script:requestId++) })
}
$tmp |Set-Content $log
The script block runs once per match to calculate the substitute value, which lets us read and increment the $requestId variable and produces the consecutive numbering you need.
You can do this for multiple patterns in succession if necessary, although you may want to use an array or hashtable for the individual counters:
$counters = @{ requestId = 1; keyId = 1 }
$tmp = Get-Content $log |ForEach-Object {
$_ = [regex]::Replace($_, '\w{6}-\w{6}-\w{6}', { 'Request-{0}' -f ($counters['requestId']++) })
[regex]::Replace($_, '\b\w{32}\b', { 'Key-{0}' -f ($counters['keyId']++) })
}
$tmp |Set-Content $log
If you want to capture the mapping between the original and the new value, do that inside the substitution block:
$translations = @{}
# ...
[regex]::Replace($_, '\w{6}-\w{6}-\w{6}', {
# capture value we matched
$original = $args[0].Value
# generate new value
$substitute = 'Request-{0}' -f ($counters['requestId']++)
# remember it
$translations[$substitute] = $original
return $substitute
})
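The desired output in the question maps the value that appears twice on one line to the same Request number both times. A minimal sketch of that behaviour, assuming identical originals should share one substitute, keeps a lookup table keyed by the original value. The names $seen and $counter and the two sample lines are illustrative only.

```powershell
$seen    = @{}                # original value -> substitute (hypothetical helper)
$counter = @{ Request = 1 }

# Two sample lines based on the question's input
$lines = @(
    '<requestId>zxcvbn-zxcv4d-zxcv56</requestId>'
    '<requestId>1234qw-12qw9x-123456</requestId> Stevie Wonder <messageId>1234qw-12qw9x-123456</msg'
)

$result = foreach ($line in $lines) {
    [regex]::Replace($line, '\w{6}-\w{6}-\w{6}', {
        $original = $args[0].Value
        # Only allocate a new number the first time a value is seen
        if (-not $seen.ContainsKey($original)) {
            $seen[$original] = 'Request-{0}' -f ($counter['Request']++)
        }
        $seen[$original]
    })
}
$result
```

Note that a PowerShell hashtable literal compares keys case-insensitively, so two IDs that differ only in letter case would share a substitute in this sketch.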
In PowerShell 6.1 and newer versions, you can also do this directly with the -replace operator:
$requestId = 1
$tmp = Get-Content $log |ForEach-Object {
$_ -replace '\w{6}-\w{6}-\w{6}', { 'Request-{0}' -f ($requestId++) }
}
$tmp |Set-Content $log
I have a simple requirement. I need to search a string in Word document and as result I need to get matching line / some words around in document.
So far, I could successfully search a string in folder containing Word documents but it returns True / False based on whether it could find search string or not.
#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path = "c:\MORLAB"
$files = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output = "c:\wordfiletry.txt"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "CRHPCD01"
Function getStringMatch
{
# Loop through all *.doc files in the $path directory
Foreach ($file In $files)
{
$document = $application.documents.open($file.FullName,$false,$true)
$range = $document.content
$wordFound = $range.find.execute($findText)
if($wordFound)
{
"$file.fullname has $wordfound" | Out-File $output -Append
}
}
$document.close()
$application.quit()
}
getStringMatch
#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path = "c:\Temp"
$files = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output = "c:\temp\wordfiletry.csv"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "First"
$charactersAround = 30
$results = @()
Function getStringMatch
{
# Loop through all *.doc files in the $path directory
Foreach ($file In $files)
{
$document = $application.documents.open($file.FullName,$false,$true)
$range = $document.content
If($range.Text -match ".{$($charactersAround)}$($findtext).{$($charactersAround)}"){
$properties = @{
File = $file.FullName
Match = $findtext
TextAround = $Matches[0]
}
$results += New-Object -TypeName PsCustomObject -Property $properties
}
}
If($results){
$results | Export-Csv $output -NoTypeInformation
}
$document.close()
$application.quit()
}
getStringMatch
import-csv $output
There are a couple of ways to get what you want. A simple approach: since you already have the text of the document, perform a regex match on it and return the match together with its surroundings. This addresses the requirement of getting some words around the search string.
The variable $charactersAround sets the number of characters to match on either side of $findtext. I also thought the output was a better fit for a CSV file, so I used $results to collect objects built from a hashtable of properties that, in the end, are written to a CSV file.
Be sure to change the variables for your own testing. Now that we are using regex to locate the matches this opens up a world of possibilities.
Sample Output
Match TextAround File
----- ---------- ----
First dley Air Services Limited dba First Air meets or exceeds all term C:\Temp\20120315132117214.docx
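The context-matching idea can be tried without Word on a plain string. The sketch below is a hedged variation rather than the answer's exact pattern: it uses .{0,N} instead of .{N} so that hits near the beginning or end of the text still match, escapes the search text with [regex]::Escape, and collects every hit with [regex]::Matches. The sample sentence and variable names are made up for the demo.

```powershell
# Made-up sample text standing in for $range.Text
$text = 'Bradley Air Services Limited dba First Air meets or exceeds all terms. First come, first served.'
$findText = 'First'
$charactersAround = 10

# Up to $charactersAround characters on either side of the escaped search text
$pattern = '.{0,' + $charactersAround + '}' + [regex]::Escape($findText) + '.{0,' + $charactersAround + '}'

# [regex]::Matches is case-sensitive by default, unlike the -match operator
$hits = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }
$hits
```

With -match only the first hit ends up in $Matches, so [regex]::Matches is the more natural fit when a document can contain the search text several times.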
Thanks! You provided a great solution to use PowerShell regex expressions to look for information in a Word document. I needed to modify it to meet my needs. Maybe, it will help someone else. It reads each line of the word document, and then uses the regex expression to determine if the line is a match. The output could easily be modified or dumped to a log file.
Set-StrictMode -Version latest
$path = "c:\Temp\pii"
$files = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "[0-9]" #regex
Function getStringMatch
{
# Loop through all *.doc files in the $path directory
Foreach ($file In $files) {
$document = $application.documents.open($file.FullName,$false,$true)
$arrContents = $document.content.text.split()
$varCounter = 0
ForEach ($line in $arrContents) {
$varCounter++
If($line -match $findtext) {
"File: $file Found: $line Line: $varCounter"
}
}
$document.close()
}
$application.quit()
}
getStringMatch
Good answer from @Matt.
I improved it a little: newer PowerShell versions have problems with the given array, and when searching a large number of documents it runs out of memory.
Here is my improved version:
#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path = "c:\Temp"
$files = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output = "c:\temp\wordfiletry.csv"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "First"
$charactersAround = 30
$results = @()
Function getStringMatch
{
# Loop through all *.doc files in the $path directory
Foreach ($file In $files)
{
$document = $application.documents.open($file.FullName,$false,$true)
$range = $document.content
If($range.Text -match ".{$($charactersAround)}$($findtext).{$($charactersAround)}"){
$properties = @{
File = $file.FullName
Match = $findtext
TextAround = $Matches[0]
}
$results += @(New-Object -TypeName PsCustomObject -Property $properties)
}
$document.close()
}
If($results){
$results | Export-Csv $output -NoTypeInformation
}
$application.quit()
}
getStringMatch
import-csv $output
Use the function like this:
PS> WordGrep -File ./Myfile.docx -Grep one, two, three
function WordGrep{
param(
[string]$File,
[string[]]$Grep,
[switch]$WordMode,
[switch]$EscapeMode
)
$WordApp = New-Object -comobject word.application
$WordApp.visible = $False
try {
$document = $WordApp.documents.open($File, $false, $true)
$arrContents = $document.content.text.split()
$found = $false
foreach ($line in $arrContents) {
foreach ($pattern in $Grep) {
if ($EscapeMode) {
$pattern = [Regex]::Escape($pattern)
}
if ($WordMode) {
$pattern = "\b${pattern}\b"
}
if ($line -imatch $pattern) {
write-host -ForegroundColor Cyan -NoNewLine "$file`:"
write-host " $line"
break;
}
}
}
$document.close()
}
finally {
$WordApp.quit()
}
}