Replacing substrings with regex in PowerShell

I have the following regex patterns in my PowerShell script to identify the URLs that I need to update:
'href[\s]?=[\s]?\"[^"]*(https:\/\/oursite.org\/[^"]*News and Articles[^"]*)+\"'
'href[\s]?=[\s]?\"[^"]*(https:\/\/oursite.org\/[^"]*en\/News-and-Articles[^"]*)+\"'
These patterns find the values I need to update. Now I need to know how to replace "News and Articles" with "news-and-articles" and "en" with "news-and-articles".
I have some code that has a replacement url like so:
$newUrl = 'href="https://oursite.org/"' #replaced value
So, for example, the original URL:
https://www.oursite.org/en/News-and-Articles/2017/11/article-name
to be replaced with
https://www.oursite.org/news-and-articles/2017/11/article-name
Here is the function that is going through all the articles and doing a replacement:
function SearchItemForMatch
{
param(
[Data.Items.Item]$item
)
Write-Host "------------------------------------item: " $item.Name
foreach($field in $item.Fields) {
#Write-Host $field.Name
if($field.Type -eq "Rich Text") {
#Write-Host $field.Name
if($field.Value -match $pattern) {
ReplaceFieldValue -field $field -needle $pattern -replacement $newUrl
}
#if($field.Value -match $registrationPattern) {
# ReplaceFieldValue -field $field -needle $registrationPattern -replacement $newRegistrationUrl
#}
if($field.Value -match $noenpattern){
ReplaceFieldValue -field $field -needle $noenpattern -replacement $newnoenpattern
}
}
}
}
Here is the replacement method:
Function ReplaceFieldValue
{
param (
[Data.Fields.Field]$field,
[string]$needle,
[string]$replacement
)
Write-Host $field.ID
$replaceValue = $field.Value -replace $needle, $replacement
$item = $field.Item
$item.Editing.BeginEdit()
$field.Value = $replaceValue
$item.Editing.EndEdit()
Publish-Item -item $item -PublishMode Smart
$info = [PSCustomObject]@{
"ID"=$item.ID
"PageName"=$item.Name
"TemplateName"=$item.TemplateName
"FieldName"=$field.Name
"Replacement"=$replacement
}
[void]$list.Add($info)
}

Forgive me if I'm missing something, but it seems to me that all you really want to achieve is to get rid of the /en part and then convert the whole URL to lowercase.
Given your example URL, this could be as easy as:
$url = 'https://www.oursite.org/en/News-and-Articles/2017/11/article-name'
$replaceValue = ($url -replace '/en/', '/').ToLower()
Result:
https://www.oursite.org/news-and-articles/2017/11/article-name
If it involves more elaborate replacements, then please edit your question and give us more examples and desired output.
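If the URLs sit inside a rich-text field (as in the question's ReplaceFieldValue), calling ToLower() on the whole field value would also lowercase unrelated text. A minimal sketch, assuming the links appear as href="https://www.oursite.org/..." attributes, applies the same /en/ removal and lowercasing only to those attribute values. The sample $html string is hypothetical.

```powershell
# Hypothetical rich-text field value containing one link to update
$html = 'See <a href="https://www.oursite.org/en/News-and-Articles/2017/11/article-name">This Article</a>.'

# Match only href values pointing at oursite.org; the script block receives each
# Match object in $args[0] and returns the replacement text for that match
$replaceValue = [regex]::Replace($html, 'href="https://www\.oursite\.org/[^"]*"', {
    ($args[0].Value -replace '/en/', '/').ToLower()
})
$replaceValue
```

The link text ("This Article") keeps its casing because it is outside the matched href value.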

Try Regex: (?<=oursite\.org\/)(?:en\/)?News-and-Articles(?=\/)
Replace with news-and-articles
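As an illustration, a minimal sketch applying that pattern with PowerShell's -replace operator (case-insensitive by default) to the URL from the question and to a hypothetical variant without the /en/ segment:

```powershell
$pattern = '(?<=oursite\.org\/)(?:en\/)?News-and-Articles(?=\/)'
$urls = @(
    'https://www.oursite.org/en/News-and-Articles/2017/11/article-name'
    'https://www.oursite.org/News-and-Articles/2017/11/article-name'   # hypothetical: no /en/
)
# The lookbehind and lookahead only assert context, so only "en/News-and-Articles"
# (or "News-and-Articles") is consumed and replaced; the rest of the URL is untouched
$fixed = $urls | ForEach-Object { $_ -replace $pattern, 'news-and-articles' }
$fixed
```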

Related

I want to split a string from : to \n in Powershell script

I am using a config file that contains some information as shown below.
User1:xyz@gmail.com
User1_Role:Admin
NAME:sdfdsfu4343-234324-ffsdf-34324d-dsfhdjhfd943
ID:xyz@abc-demo-test-abc-mssql
Password:rewrfsdv34354*fds*vdfg435434
I want to take everything from the ":" to the newline as each value in my PowerShell script.
I am using -split '[: *\n]', and it works fine as long as there is no '*' in the value. If there is a '*', it only fetches up to that character. For example, for Password it matches only rewrfsdv34354. Here is my code:
$i = 0
foreach ($keyOrValue in $Contents -split '[: *\n]') {
if ($i++ % 2 -eq 0) {
$varName = $keyOrValue
}
else {
Set-Variable $varName $keyOrValue
}
}
I need to match all the chars after : to \n. Please share your ideas.
It's probably best to perform two separate splits here; it makes it easier to work out what is going wrong if the code misbehaves, although the $i % 2 -eq 0 part is a neat way to pick up keys and values.
I would go for this:
# Split the Contents variable by newline first (\r? also strips Windows CR characters)
foreach ($line in $Contents -split '\r?\n') {
# Now split each line by colon
$keyOrValue = $line -split ':'
# Then set the variables based on the parts of the colon-split
Set-Variable $keyOrValue[0] $keyOrValue[1]
}
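A self-contained sketch of the same two-split approach, assuming the config text is held in a here-string instead of being read from a file. Limiting the colon split to two parts (-split ':', 2) keeps any later colons inside the value, and blank lines are skipped so Set-Variable never gets an empty name.

```powershell
# Sample config from the question (stands in for $Contents)
$Contents = @'
User1:xyz@gmail.com
User1_Role:Admin
NAME:sdfdsfu4343-234324-ffsdf-34324d-dsfhdjhfd943
ID:xyz@abc-demo-test-abc-mssql
Password:rewrfsdv34354*fds*vdfg435434
'@

# Split into lines first (handles both LF and CRLF line endings)
foreach ($line in $Contents -split '\r?\n') {
    if ([string]::IsNullOrWhiteSpace($line)) { continue }
    # Split on the first colon only: the key is before it, everything after it is the value
    $key, $value = $line -split ':', 2
    Set-Variable -Name $key.Trim() -Value $value.Trim()
}
$Password
```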
You could also convert to a hashtable and go from there, e.g.:
$h = @{}
gc config.txt | % { $key, $value = $_ -split ' *: *'; $h[$key] = $value }
Or with ConvertFrom-StringData:
$h = (gc -raw config.txt) -replace ':','=' | ConvertFrom-StringData
Now you have convenient access to keys and values, e.g.:
$h
Output:
Name       Value
----       -----
Password   rewrfsdv34354*fds*vdfg435434
User1      xyz@gmail.com
ID         xyz@abc-demo-test-abc-mssql
NAME       sdfdsfu4343-234324-ffsdf-34324d-dsfhdjhfd943
User1_Role Admin
Or only keys:
$h.keys
Output:
Password
User1
ID
NAME
User1_Role
Or only values:
$h.values
Output:
rewrfsdv34354*fds*vdfg435434
xyz@gmail.com
xyz@abc-demo-test-abc-mssql
sdfdsfu4343-234324-ffsdf-34324d-dsfhdjhfd943
Admin
Or specific values:
$h['user1'] + ", " + $h['user1_role']
Output:
xyz@gmail.com, Admin
etc.
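A self-contained sketch of the ConvertFrom-StringData variant, assuming the text comes from a here-string instead of a file. It turns only the first colon on each line into '=', so a value that itself contains ':' survives. Note that ConvertFrom-StringData treats backslashes in values as escape characters, so values containing '\' would need them doubled.

```powershell
# Sample lines from the question's config file (stands in for gc -raw config.txt)
$text = @'
User1:xyz@gmail.com
User1_Role:Admin
Password:rewrfsdv34354*fds*vdfg435434
'@

# (?m) makes ^ match at every line start; [^:\r\n]+ stops at the first colon on that line
$h = $text -replace '(?m)^([^:\r\n]+):', '$1=' | ConvertFrom-StringData

$h['User1'] + ", " + $h['User1_Role']
```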

Powershell use ForEach to match and replace string with regex and replace with incremental value

I have to replace multiple strings that share the same pattern, and several of them are on the same line. The replacement value should be incremental. I need to match and replace only the pattern shown in the example, not requestId nor messageId.
Input:
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv4d-zxcv56</requestId>
<requestId>1234qw-12qw9x-123456</requestId> Stevie Wonder <messageId>1234qw-12qw9x-123456</msg
reportId>plmkjh8765FGH4rt6As</msg:reportId> something <keyID>qwer1234asdf5678zxcv0987bnml65gh</msgdc
The desired output should be:
<requestId>Request-1</requestId>Ace of Base Order: Request-2<something else...
<requestId>Request-3</requestId>
<requestId>Request-4</requestId> Stevie Wonder <messageId>Request-4</msg
reportId>ReportId-1</msg:reportId> something <keyId>KeyId-1</msg
The regex finds all the matching values, but I cannot get the loop to replace them. The code I am trying to make work is:
@'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv12-zxcv56</requestId>
<requestId>1234qw-12qw12-123456</requestId> Stevie Wonder <messageId>1234qw-12qw12-123456</msg
reportId>plmkjh8765FGH4rt6As</msg:reportId> something <keyID>qwer1234asdf5678zxcv0987bnml65gh</msgdc
'@ | Set-Content $log -Encoding UTF8
$requestId = @{
Count = 1
Matches = @()
}
$tmp = Get-Content $log | foreach { $n = [regex]::matches((Get-Content $log),'\w{6}-\w{6}-\w{6}').value
if ($n)
{
$_ -replace "$n", "Request-$($requestId.count)"
$requestId.count++
} $_ }
$tmp | Set-Content $log
You want Regex.Replace():
$requestId = 1
$tmp = Get-Content $log |ForEach-Object {
[regex]::Replace($_, '\w{6}-\w{6}-\w{6}', { 'Request-{0}' -f ($script:requestId++) })
}
$tmp |Set-Content $log
The script block runs once per match to calculate the substitute value, which lets us resolve and increment the $requestId variable, resulting in the consecutive numbering you need.
You can do this for multiple patterns in succession if necessary, although you may want to use an array or hashtable for the individual counters:
$counters = @{ requestId = 1; keyId = 1 }
$tmp = Get-Content $log |ForEach-Object {
$_ = [regex]::Replace($_, '\w{6}-\w{6}-\w{6}', { 'Request-{0}' -f ($counters['requestId']++) })
[regex]::Replace($_, '\b\w{32}\b', { 'Key-{0}' -f ($counters['keyId']++) })
}
$tmp |Set-Content $log
If you want to capture the mapping between the original and the new value, do that inside the substitution block:
$translations = @{}
# ...
[regex]::Replace($_, '\w{6}-\w{6}-\w{6}', {
# capture value we matched
$original = $args[0].Value
# generate new value
$substitute = 'Request-{0}' -f ($counters['requestId']++)
# remember it
$translations[$substitute] = $original
return $substitute
})
In PowerShell 6.1 and newer versions, you can also do this directly with the -replace operator:
$requestId = 1
$tmp = Get-Content $log |ForEach-Object {
$_ -replace '\w{6}-\w{6}-\w{6}', { 'Request-{0}' -f ($requestId++) }
}
$tmp |Set-Content $log
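The desired output in the question reuses Request-4 when the same ID appears twice on one line, which a plain counter does not do. A minimal sketch combining the counter with the mapping idea: each matched value is looked up first and only gets a new number the first time it is seen. The here-string stands in for the contents of $log, and the case-sensitive Dictionary is a deliberate choice, since @{} hashtables compare keys case-insensitively.

```powershell
# Sample lines from the question (in practice: $lines = Get-Content $log)
$lines = @'
<requestId>qwerty-qwer12-qwer56</requestId>Ace of Base Order: Q2we45-Uj87f6-gh65De<something else...
<requestId>zxcvbn-zxcv4d-zxcv56</requestId>
<requestId>1234qw-12qw9x-123456</requestId> Stevie Wonder <messageId>1234qw-12qw9x-123456</msg
'@ -split '\r?\n'

# original value -> substitute (case-sensitive keys)
$seen = New-Object 'System.Collections.Generic.Dictionary[string,string]'
# counter kept in a hashtable so the script block can update it without scope modifiers
$counter = @{ Next = 1 }

$result = $lines | ForEach-Object {
    [regex]::Replace($_, '\w{6}-\w{6}-\w{6}', {
        $original = $args[0].Value
        if (-not $seen.ContainsKey($original)) {
            # first time this value appears: assign the next number
            $seen[$original] = 'Request-{0}' -f $counter['Next']
            $counter['Next']++
        }
        # the script block's output becomes the replacement text
        $seen[$original]
    })
}
$result
```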

Matching Something Against Array List Using Where Object

I've found multiple examples of what I'm trying here, but for some reason it's not working.
I have a list of regular expressions that I'm checking against a single value and I can't seem to get a match.
I'm attempting to match domains. e.g. gmail.com, yahoo.com, live.com, etc.
I am importing a csv to get the domains and have debugged this code to make sure the values are what I expect. e.g. "gmail.com"
Regular expression examples AKA $FinalWhiteListArray
(?i)gmail\.com
(?i)yahoo\.com
(?i)live\.com
Code
Function CheckDirectoryForCSVFilesToSearch {
$global:CSVFiles = Get-ChildItem $Global:Directory -recurse -Include *.csv | % {$_.FullName} #removed -recurse
}
Function ImportCSVReports {
Foreach ($CurrentChangeReport in $global:CSVFiles) {
$global:ImportedChangeReport = Import-csv $CurrentChangeReport
}
}
Function CreateWhiteListArrayNOREGEX {
$Global:FinalWhiteListArray = New-Object System.Collections.ArrayList
$WhiteListPath = $Global:ScriptRootDir + "\" + "WhiteList.txt"
$Global:FinalWhiteListArray= Get-Content $WhiteListPath
}
$Global:ScriptRootDir = Split-Path -Path $psISE.CurrentFile.FullPath
$Global:Directory = $Global:ScriptRootDir + "\" + "Reports to Search" + "\" #Where to search for CSV files
CheckDirectoryForCSVFilesToSearch
ImportCSVReports
CreateWhiteListArrayNOREGEX
Foreach ($Global:Change in $global:ImportedChangeReport){
If (-not ([string]::IsNullOrEmpty($Global:Change.Previous_Provider_Contact_Email))){
$pos = $Global:Change.Provider_Contact_Email.IndexOf("@")
$leftPart = $Global:Change.Provider_Contact_Email.Substring(0, $pos)
$Global:Domain = $Global:Change.Provider_Contact_Email.Substring($pos+1)
$results = $Global:FinalWhiteListArray | Where-Object { $_ -match $global:Domain}
}
}
Thanks in advance for any help with this.
The problem with your current code is that you put the regex on the left side of the -match operator. [grin] Swap that and your code ought to work.
Taking into account what LotPings pointed out about case sensitivity, and using a regex OR symbol to make one test per URL, here's a demo of some of that. The \b is for word boundaries and the | is the regex OR symbol. The $RegexURL_WhiteList section builds that regex pattern from the 1st array. If I haven't made something clear, please ask ...
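A minimal sketch of that swap, using the whitelist patterns from the question and a hypothetical domain value: the input string goes on the left of -match and each whitelist regex on the right. (-match is already case-insensitive by default, so the (?i) prefix is redundant but harmless.)

```powershell
# Whitelist regexes as in the question's WhiteList.txt
$FinalWhiteListArray = @('(?i)gmail\.com', '(?i)yahoo\.com', '(?i)live\.com')
$Domain = 'GMail.com'   # hypothetical value taken from an email address

# Input on the left, pattern on the right
$results = $FinalWhiteListArray | Where-Object { $Domain -match $_ }
$results
```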
$URL_WhiteList = @(
'gmail.com'
'yahoo.com'
'live.com'
)
$RegexURL_WhiteList = -join @('\b' ,(@($URL_WhiteList |
ForEach-Object {
[regex]::Escape($_)
}) -join '|\b'))
$NeedFiltering = @(
'example.com/this/that'
'GMail.com'
'gmailstuff.org/NothingElse'
'NotReallyYahoo.com'
'www.yahoo.com'
'SomewhereFarAway.net/maybe/not/yet'
'live.net'
'Live.com/other/another'
)
foreach ($NF_Item in $NeedFiltering)
{
if ($NF_Item -match $RegexURL_WhiteList)
{
'[ {0} ] matched one of the test URLs.' -f $NF_Item
}
}
output ...
[ GMail.com ] matched one of the test URLs.
[ www.yahoo.com ] matched one of the test URLs.
[ Live.com/other/another ] matched one of the test URLs.

How can I get 2 matches in a regex?

I have xml files formatted like this:
<User>
<FirstName>Foo Bar</FirstName>
<CompanyName>Foo</CompanyName>
<EmailAddress>bar@foo.com</EmailAddress>
</User>
<User>
...
I want to read through all xml files, creating as output <CompanyName>,<EmailAddress>, so:
Foo,bar@foo.com
User2,user@email.com
Blah,blah@blah.com
I am using the following snippet:
$directory = "\\PC001\Blah"
Function GetFiles ($path) {
foreach ($item in Get-ChildItem $path) {
if ( Test-Path $item.FullName -PathType Container) {
GetFiles ($item.FullName)
} else {
$item
}
}
}
Foreach ($file in GetFiles($directory)) {
If ($file.extension -eq '.test') {
$content = Get-Content $file.fullname
$pattern = '(?si)<CompanyName>(.*?)</CompanyName>\n<EmailAddress>(.*?)</EmailAddress>'
$matches = [regex]::matches($content, $pattern)
foreach ($match in $matches) {
$matches[0].Value -replace "<.*?>"
}
}
}
However, $matches is empty so there's something wrong with my regex. If I leave out \n<EmailAddress>(.*?)</EmailAddress>, it works. What am I doing wrong?
$pattern = '(?si)<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>'
Try this. \s matches any whitespace character, so spaces, tabs, and newlines are all covered.
There is a chance that a \r character is present in that file, so change your regex like this:
$pattern = '(?si)<CompanyName>(.*?)</CompanyName>[\n\r]+<EmailAddress>(.*?)</EmailAddress>'
OR
$pattern = '(?si)<CompanyName>(.*?)</CompanyName>.*?<EmailAddress>(.*?)</EmailAddress>'
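A likely reason the original \n never matched: Get-Content without -Raw returns an array of lines, and when that array is passed to [regex]::Matches() PowerShell joins it into one string separated by spaces, so no newlines are left to match (which is also why \s* works). A sketch that reads the text as a single string and emits the two capture groups as CSV lines; the here-string content (including the second user) is hypothetical, and the loop variable avoids the name $matches because that is a PowerShell automatic variable.

```powershell
# Hypothetical file content (in practice: $content = Get-Content -Raw $file.FullName)
$content = @'
<User>
<FirstName>Foo Bar</FirstName>
<CompanyName>Foo</CompanyName>
<EmailAddress>bar@foo.com</EmailAddress>
</User>
<User>
<FirstName>User Two</FirstName>
<CompanyName>User2</CompanyName>
<EmailAddress>user@email.com</EmailAddress>
</User>
'@

$pattern = '(?si)<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>'
# Groups[1] is the company name, Groups[2] the email address
$rows = foreach ($m in [regex]::Matches($content, $pattern)) {
    '{0},{1}' -f $m.Groups[1].Value, $m.Groups[2].Value
}
$rows
```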

I would like to color highlight sections of a pipeline string according to a regex in Powershell

This is a question of technique, but as an exercise my intention is to write a PowerShell function that accepts piped input, takes a regex as a parameter, and highlights any text matching the regex.
The part I can't find any information on: it's easy to match text, capture it to a buffer, or replace it, but I need to replace the matched text with a color change, the original text, and then a return to the previous color. I can't find any way to produce colored output other than with Write-Host, and a single Write-Host call can't use more than one color, which would mean:
- matching the regex
- Write-Host all text before the match in the default color, with -NoNewline
- Write-Host the match, with -NoNewline
- Write-Host the remainder
This seems messy, and it gets even messier if we want to support multiple matches.
Is there a more elegant way to do this?
Write-Host is the right way to do this. Use the .Index and .Length properties of the resulting Match object to determine where exactly the matched text is. You just need to be a bit careful keeping track of indices :)
This works for multiple matches, and is not terribly untidy IMO:
function ColorMatch
{
param(
[Parameter(Mandatory = $true, ValueFromPipeline = $true)]
[string] $InputObject,
[Parameter(Mandatory = $true, Position = 0)]
[string] $Pattern
)
begin{ $r = [regex]$Pattern }
process
{
$ms = $r.Matches($inputObject)
$startIndex = 0
foreach($m in $ms)
{
$nonMatchLength = $m.Index - $startIndex
Write-Host $inputObject.Substring($startIndex, $nonMatchLength) -NoNew
Write-Host $m.Value -Back DarkRed -NoNew
$startIndex = $m.Index + $m.Length
}
if($startIndex -lt $inputObject.Length)
{
Write-Host $inputObject.Substring($startIndex) -NoNew
}
Write-Host
}
}
This is an extension of latkin's answer. Here I'm extending the Match object such that it can be processed for this purpose - and others - more easily.
function Split-Match {
param([Parameter(Mandatory = $true)]
$match
)
$sections = @()
$start = 0
$text = $match.Line
foreach ($m in $match.Matches) {
$i = $m.Index
$l = $m.Length
$sections += $false, $text.Substring($start, $i - $start)
$sections += $true, $text.Substring($i, $l)
$start = $i + $l
}
$sections += $false, $text.Substring($start)
$match | Add-Member -Force Sections $sections
$match
}
function Write-Match {
param([Parameter(Mandatory = $true)]
$match
)
$fg = "White"
$bg = "Black"
foreach($s in $match.Sections) {
if ($s.GetType() -eq [bool]) {
if ($s) {
$fg = "White"
$bg = "Red"
} else {
$fg = "White"
$bg = "Black"
}
} else {
Write-Host -NoNewline -ForegroundColor $fg -BackgroundColor $bg $s
}
}
Write-Host
}
$string = @'
Match this A
Not this B
Not this C
But this A
'@
$m = $string | select-string -CaseSensitive -AllMatches "A"
$m = Split-Match $m
Write-Match $m
Alternatively, I found ANSI/VT100 formatting simpler, and it does exactly what I needed with a much larger range of colors:
$esc=[char]27
$fileContents="abc455315testing123455315abc"
$keywordSearch="testing123"
$fileContents -replace $keywordSearch,"$esc[38;2;0;200;255m$keywordSearch$esc[0m"
Note that this only works in a PowerShell console window, not in the PowerShell ISE. This Wikipedia page was also helpful, specifically this line about choosing a color:
ESC[ 38;2;⟨r⟩;⟨g⟩;⟨b⟩ m Select RGB foreground color
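A small generalization of that snippet, as a sketch: using $0 in the replacement string re-inserts whatever the regex actually matched, so the pattern can be a real regex (or match case-insensitively) rather than a literal keyword that is written back verbatim. The function name Format-AnsiHighlight and its color parameters are hypothetical.

```powershell
$esc = [char]27

function Format-AnsiHighlight {
    param(
        [string] $Text,
        [string] $Pattern,
        [int] $R = 0, [int] $G = 200, [int] $B = 255
    )
    # ESC[38;2;r;g;bm selects an RGB foreground color; ESC[0m resets all attributes
    $on  = "$esc[38;2;$R;$G;${B}m"
    $off = "$esc[0m"
    # '$0' is single-quoted so PowerShell leaves it alone; the regex engine
    # substitutes the entire match for it
    $Text -replace $Pattern, ($on + '$0' + $off)
}

Format-AnsiHighlight -Text 'abc455315testing123455315abc' -Pattern '455315'
```

Every occurrence is wrapped, so both copies of 455315 above come out highlighted.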