Powershell - How to get data within delimiters and manipulate the data - regex

I'm using PowerShell to write a script that splits an input file into sections and manipulates the data in each section.
In practical terms, I have a TXT file which is formatted like so:
-------Group_Name_1-------
member: Name:Host1 Handle:123
member: Name:Host2 Handle:213
member: Name:Host3 Handle:321
-------Group_Name_2-------
member: Name:Host10 Handle:1230
member: Name:Host20 Handle:2130
Group (member of): Firewall subnet
etc...
So far, I've come up with the following script:
$filepath = 'C:\test.txt'
$getfile = Get-Content $filepath -Raw
$regex_group_name = '.*-------'
foreach($object in $splitfile){
$lines = $object.Split([Environment]::NewLine)
$data_group_name = @($lines) -match $regex_group_name
$data_group_value = $data_group_name -replace '-------' , ''
Write-Host $data_group_value
}
I'm trying to achieve the following output for each group (for example):
add group name Group_Name_1 members.1 "Host1" members.2 "Host2" members.3 "Host3" groups.1 "Firewall Subnet"
And I'm stuck in 2 parts:
1) The script above, obviously won't work, because the "split" function will basically remove the group name: how can I approach this problem in order to get the desired output?
2) I have no idea on how to change the output in order to reflect the effective number of members (if the members are 2, it will only output members.1 and members.2; alternatively members.1, members.2, members.3 etc...).
Thanks in advance for your help.

I would solve this in two steps.
First, parse the file into a data structure (in this case, an array of PSCustomObjects)
Then, convert those into the command strings.
Assuming this $fileContent
$fileContent = "-------Group_Name_1-------
member: Name:Host1 Handle:123
member: Name:Host2 Handle:213
member: Name:Host3 Handle:321
-------Group_Name_2-------
member: Name:Host10 Handle:1230
member: Name:Host20 Handle:2130
Group (member of): Firewall subnet
etc..."
We can build step 1 like this:
$sections = $fileContent -split "(?m)^-------"
$groups = foreach ($section in $sections) {
    $group = [pscustomobject]@{
        Name = ([regex]"(.*)-------").Match($section).Groups[1].Value
        Members = @(([regex]"member: Name:(\S+)").Matches($section) | ForEach-Object {
            $_.Groups[1].Value
        })
        MemberOf = @(([regex]"Group \(member of\): (.+)").Matches($section) | ForEach-Object {
            $_.Groups[1].Value
        })
    }
    if ($group.Name) {
        $group
    }
}
# intermediate results (parsed group structure)
$groups
This prints
Name Members MemberOf
---- ------- --------
Group_Name_1 {Host1, Host2, Host3} {}
Group_Name_2 {Host10, Host20} {Firewall subnet}
Now it's easy to turn this into a different format:
$commands = foreach ($group in $groups) {
$str = "add group "
$str += "name `"$( $group.Name )`" "
$i = 1
foreach ($member in $group.Members) {
$str += "members.$i `"$member`" "
$i++
}
$i = 1
foreach ($memberOf in $group.MemberOf) {
$str += "groups.$i `"$memberOf`" "
$i++
}
$str.Trim()
}
# final results (command strings)
$commands
Which prints:
add group name "Group_Name_1" members.1 "Host1" members.2 "Host2" members.3 "Host3"
add group name "Group_Name_2" members.1 "Host10" members.2 "Host20" groups.1 "Firewall subnet"
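As an alternative to splitting on the delimiter, the same two-step idea can be sketched with a line-by-line switch -Regex, which keeps the group name because nothing is thrown away by a split. This is only a sketch: the sample lines, $parsedGroups, $current and $sampleCommands are illustrative names that do not appear in the question, and it assumes every header line looks like -------Name-------.

```powershell
# Sample input mirroring the question's layout (assumed data, not read from disk).
$sampleLines = @(
    '-------Group_Name_1-------'
    'member: Name:Host1 Handle:123'
    'member: Name:Host2 Handle:213'
    '-------Group_Name_2-------'
    'member: Name:Host10 Handle:1230'
    'Group (member of): Firewall subnet'
)

$parsedGroups = [System.Collections.Generic.List[object]]::new()
$current = $null

# switch -Regex walks the array one line at a time and fills $Matches per line.
switch -Regex ($sampleLines) {
    '^-------(.+?)-------\s*$' {
        # A header line starts a new group object.
        $current = [pscustomobject]@{
            Name     = $Matches[1]
            Members  = [System.Collections.Generic.List[string]]::new()
            MemberOf = [System.Collections.Generic.List[string]]::new()
        }
        $parsedGroups.Add($current)
        continue
    }
    '^member: Name:(\S+)' {
        if ($current) { $current.Members.Add($Matches[1]) }
        continue
    }
    '^Group \(member of\): (.+)$' {
        if ($current) { $current.MemberOf.Add($Matches[1].Trim()) }
        continue
    }
}

# Build one command string per group; the counters follow the actual item counts.
$sampleCommands = foreach ($g in $parsedGroups) {
    $parts = @('add group', "name `"$($g.Name)`"")
    for ($n = 0; $n -lt $g.Members.Count; $n++) {
        $parts += "members.$($n + 1) `"$($g.Members[$n])`""
    }
    for ($n = 0; $n -lt $g.MemberOf.Count; $n++) {
        $parts += "groups.$($n + 1) `"$($g.MemberOf[$n])`""
    }
    $parts -join ' '
}
$sampleCommands
```

Because the members.N and groups.N indexes are derived from the list counts, a group with two members produces only members.1 and members.2.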

Related

I want to split a string from : to \n in Powershell script

I am using a config file that contains some information as shown below.
User1:xyz@gmail.com
User1_Role:Admin
NAME:sdfdsfu4343-234324-ffsdf-34324d-dsfhdjhfd943
ID:xyz@abc-demo-test-abc-mssql
Password:rewrfsdv34354*fds*vdfg435434
I want to split each value from : to newline in my PowerShell script.
I am using -split '[: *\n]', which works perfectly as long as there is no '*' in the value. If there is a '*', the value is cut off there. For example, for Password, it returns only rewrfsdv34354. Here is my code:
$i = 0
foreach ($keyOrValue in $Contents -split '[: *\n]') {
if ($i++ % 2 -eq 0) {
$varName = $keyOrValue
}
else {
Set-Variable $varName $keyOrValue
}
}
I need to match all the chars after : to \n. Please share your ideas.
It's probably best to perform two separate splits here. That makes it easier to work out where the code is going wrong, although the $i % 2 -eq 0 part is a neat way to pick up key/value.
I would go for this:
# Split the Contents variable into lines first (tolerating Windows line endings)
foreach ($line in $Contents -split '\r?\n') {
    # Now split each line at the first colon only, so a value may itself contain colons
    $keyOrValue = $line -split ':', 2
    # Then set the variables based on the two parts of the colon-split
    Set-Variable $keyOrValue[0] $keyOrValue[1]
}
You could also convert to a hashmap and go from there, e.g.:
$h = @{}
gc config.txt | % { $key, $value = $_ -split ' *: *'; $h[$key] = $value }
Or with ConvertFrom-StringData:
$h = (gc -raw config.txt) -replace ':','=' | ConvertFrom-StringData
Now you have convenient access to keys and values, e.g.:
$h
Output:
Name Value
---- -----
Password rewrfsdv34354*fds*vdfg435434
User1 xyz@gmail.com
ID xyz@abc-demo-test-abc-mssql
NAME sdfdsfu4343-234324-ffsdf-34324d-dsfhdjhfd943
User1_Role Admin
Or only keys:
$h.keys
Output:
Password
User1
ID
NAME
User1_Role
Or only values:
$h.values
Output:
rewrfsdv34354*fds*vdfg435434
xyz@gmail.com
xyz@abc-demo-test-abc-mssql
sdfdsfu4343-234324-ffsdf-34324d-dsfhdjhfd943
Admin
Or specific values:
$h['user1'] + ", " + $h['user1_role']
Output:
xyz@gmail.com, Admin
etc.
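A minimal sketch of the hashtable approach that keeps everything after the first colon, so values containing ':' or '*' survive intact. The sample text, the Endpoint key and the names $sampleConfig and $settings are made up for illustration. The key point is the limit argument of -split.

```powershell
# Sample config text (assumed); the Endpoint value deliberately contains extra colons.
$sampleConfig = 'User1_Role:Admin', 'Password:rewrfsdv34354*fds*vdfg435434', 'Endpoint:https://example.com:8443' -join "`n"

$settings = [ordered]@{}
foreach ($line in $sampleConfig -split '\r?\n') {
    # Skip blank lines or lines without a key/value separator.
    if ($line -notmatch ':') { continue }

    # The limit 2 stops splitting after the first colon.
    $key, $value = $line -split ':', 2
    $settings[$key.Trim()] = $value.Trim()
}

$settings['Password']
$settings['Endpoint']
```

Using [ordered] keeps the keys in file order, which a plain hashtable does not guarantee.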

Capturing Regex in Powershell

I'm stuck on this. I want to get all the URLs in the text that match my pattern. Each result should include the first parameter of the URL, but not the second one.
Two issues:
It's not getting the first URL
I'm missing how the capture works.
In Method 1, I see the matches, but I don't see the capture text of what I put in parentheses. In Method 2, I see my captures on some outputs, but getting extra outputs that contain more than my capture. I like Method 2 style, but did Method 1 to try to understand what's happening, but just dug my self a deeper hole.
$fileContents = 'Misc Text < a href="http://example.com/Test.aspx?u=a1">blah blah</a> More Stuff blah blah Closing Text'
#Sample URL http://example.com/Test.aspx?u=a1&parm=123
$pattern = '<a href="(http://example.com/Test.aspx\?u=.*?)[&"]'
Write-Host "RegEx Pattern=$pattern"
Write-Host "----------- Method 1 --------------"
$groups = [regex]::Matches($fileContents, $pattern)
$groupnum = 0
foreach ($group in $groups)
{
Write-Host "Group=$groupnum URL=$group "
$capturenum = 0
foreach ($capture in $group.Captures)
{
Write-Host "Group=$groupnum Capture=$capturenum URL=$capture.value index=$($capture.index)"
$capturenum = $capturenum + 1
}
$groupnum = $groupnum + 1
}
Write-Host "----------- Method 2 --------------"
$urls = [regex]::Matches($fileContents, $pattern).Groups.Captures.Value
#$urls = $urls | select -Unique
Write-Host "Number of Matches = $($urls.Count)"
foreach ($url in $urls)
{
Write-Host "URL: $url "
}
Write-Host " "
Output:
----------- Method 1 --------------
Group=0 URL=<a href="http://example.com/Test.aspx?u=b2&
Group=0 Capture=0 URL=<a href="http://example.com/Test.aspx?u=b2&.value index=81
----------- Method 2 --------------
Number of Matches = 2
URL: <a href="http://example.com/Test.aspx?u=b2&
URL: http://example.com/Test.aspx?u=b2
Powershell Version 5.1.17763.592
I'm missing how the capture works.
Capture group 0 is always the entire match. Unnamed capture groups are numbered from 1 upward, in the order of their opening parentheses, so you'll want group 1.
I've renamed the variables to make their meaning a little more clear:
$MatchList = [regex]::Matches($fileContents, $pattern)
foreach($Match in $MatchList){
for($i = 0; $i -lt $Match.Groups.Count; $i++){
"Group $i is: $($Match.Groups[$i].Value)"
}
}
If you want to collect all the captured URLs, just do:
$urls = foreach($Match in $MatchList){
    $Match.Groups[1].Value
}
If you only need the first match you don't need to invoke [regex]::Matches() manually though - PowerShell will automatically inject the string value of any captured groups into the automatic $Matches variable when you use the -match operator, so if you do:
if($fileContents -match $pattern){
"Group 1 is $($Matches[1])"
}
# or
if($fileContents -match $pattern){
$url = $Matches[1]
}
... you'll get the expected result:
Group 1 is http://example.com/Test.aspx?u=b2
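To make the difference between group 0 and a capture group concrete, here is a small sketch using a named group. The input string is a hypothetical two-link sample rather than the asker's exact text, and $sampleHtml, $samplePattern and $sampleUrls are illustrative names.

```powershell
# Hypothetical sample text with two links; the second has an extra parameter.
$sampleHtml = '<a href="http://example.com/Test.aspx?u=a1">one</a> <a href="http://example.com/Test.aspx?u=b2&parm=123">two</a>'

# (?<url>...) names the capture; [^&"]* stops at the first '&' or closing quote.
$samplePattern = '<a href="(?<url>http://example\.com/Test\.aspx\?u=[^&"]*)'

$sampleUrls = foreach ($m in [regex]::Matches($sampleHtml, $samplePattern)) {
    # $m.Groups[0] is the whole match including '<a href="';
    # the named group holds only the captured URL.
    $m.Groups['url'].Value
}
$sampleUrls
```

Reading one specific group per match avoids the extra whole-match entries that .Groups.Captures.Value returns in Method 2.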
Use Select-String with the parameter -AllMatches to get all matches from your input string. Your regular expression should look like this: (?<=a href=")[^"]*. That will match any character that is not a double quote after the string a href=" (with that last string not being included in the match). Now you just need to expand the value of the matches and you're done.
$re = '(?<=a href=")[^"]*'
$fileContents |
Select-String -Pattern $re -AllMatches |
Select-Object -Expand Matches |
Select-Object -Expand Value

Matching Something Against Array List Using Where Object

I've found multiple examples of what I'm trying here, but for some reason it's not working.
I have a list of regular expressions that I'm checking against a single value and I can't seem to get a match.
I'm attempting to match domains. e.g. gmail.com, yahoo.com, live.com, etc.
I am importing a csv to get the domains and have debugged this code to make sure the values are what I expect. e.g. "gmail.com"
Regular expression examples AKA $FinalWhiteListArray
(?i)gmail\.com
(?i)yahoo\.com
(?i)live\.com
Code
Function CheckDirectoryForCSVFilesToSearch {
$global:CSVFiles = Get-ChildItem $Global:Directory -recurse -Include *.csv | % {$_.FullName} #removed -recurse
}
Function ImportCSVReports {
Foreach ($CurrentChangeReport in $global:CSVFiles) {
$global:ImportedChangeReport = Import-csv $CurrentChangeReport
}
}
Function CreateWhiteListArrayNOREGEX {
$Global:FinalWhiteListArray = New-Object System.Collections.ArrayList
$WhiteListPath = $Global:ScriptRootDir + "\" + "WhiteList.txt"
$Global:FinalWhiteListArray= Get-Content $WhiteListPath
}
$Global:ScriptRootDir = Split-Path -Path $psISE.CurrentFile.FullPath
$Global:Directory = $Global:ScriptRootDir + "\" + "Reports to Search" + "\" #Where to search for CSV files
CheckDirectoryForCSVFilesToSearch
ImportCSVReports
CreateWhiteListArrayNOREGEX
Foreach ($Global:Change in $global:ImportedChangeReport){
If (-not ([string]::IsNullOrEmpty($Global:Change.Previous_Provider_Contact_Email))){
$pos = $Global:Change.Provider_Contact_Email.IndexOf("@")
$leftPart = $Global:Change.Provider_Contact_Email.Substring(0, $pos)
$Global:Domain = $Global:Change.Provider_Contact_Email.Substring($pos+1)
$results = $Global:FinalWhiteListArray | Where-Object { $_ -match $global:Domain}
}
}
Thanks in advance for any help with this.
the problem with your current code is that you put the regex on the left side of the -match operator. [grin] swap that and your code otta work.
taking into account what LotPings pointed out about case sensitivity and using a regex OR symbol to make one test per URL, here's a demo of some of that. the \b is for word boundaries, the | is the regex OR symbol. the $RegexURL_WhiteList section builds that regex pattern from the 1st array. if i haven't made something clear, please ask ...
$URL_WhiteList = @(
'gmail.com'
'yahoo.com'
'live.com'
)
$RegexURL_WhiteList = -join @('\b' ,(@($URL_WhiteList |
ForEach-Object {
[regex]::Escape($_)
}) -join '|\b'))
$NeedFiltering = @(
'example.com/this/that'
'GMail.com'
'gmailstuff.org/NothingElse'
'NotReallyYahoo.com'
'www.yahoo.com'
'SomewhereFarAway.net/maybe/not/yet'
'live.net'
'Live.com/other/another'
)
foreach ($NF_Item in $NeedFiltering)
{
if ($NF_Item -match $RegexURL_WhiteList)
{
'[ {0} ] matched one of the test URLs.' -f $NF_Item
}
}
output ...
[ GMail.com ] matched one of the test URLs.
[ www.yahoo.com ] matched one of the test URLs.
[ Live.com/other/another ] matched one of the test URLs.
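Applied to the original scenario (one email domain checked against a list of regex patterns), the operand swap might look like the sketch below. The address is hypothetical, the patterns are the sample values from the question, and $samplePatterns, $sampleEmail, $sampleDomain and $hits are illustrative names.

```powershell
# Patterns as they might appear in WhiteList.txt (sample values from the question).
$samplePatterns = @('(?i)gmail\.com', '(?i)yahoo\.com', '(?i)live\.com')
$sampleEmail    = 'someone@GMail.com'   # hypothetical address

# Everything after the '@' is the domain.
$sampleDomain = $sampleEmail.Substring($sampleEmail.IndexOf('@') + 1)

# The string under test goes on the left of -match, the pattern on the right.
$hits = @($samplePatterns | Where-Object { $sampleDomain -match $_ })

if ($hits.Count -gt 0) {
    '[ {0} ] is whitelisted by pattern {1}' -f $sampleDomain, $hits[0]
}
```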

Powershell to use regx to find character position

I want to add " after the third comma and " before the fifth comma. How can this be done in PowerShell?
My idea is to use a regex to find the positions of the third and fifth commas, then insert " at those positions with something like
$s.Insert(4,'-') (in case the regex returns position 4)
example data
04642583,3,HC Mobile,O213,Inc,SIS Services,KR,Non Payroll Relevant,KR50
Output
04642583,3,HC Mobile,"O213,Inc",SIS Services,KR,Non Payroll Relevant,KR50
This is the code I tried, but it failed with 'An empty pipe element is not allowed'. How can I fix it?
$source = "D:\Output\MoreComma.csv"
$FinalFile = "D:\Output\MoreComma_Corrected.csv"
$content = Get-Content $source
foreach ($line in $content)
{
$items = $line.split(',');
$items[3] = '"'+$items[3]
$items[4] = $items[4]+'"';
$items -join ','
} | Set-Content $FinalFile
If you know the format (e.g. you know that it's always in this comma-separated fashion) and you're only trying to achieve this, you can simply split the line, add the quotes and join the line again.
Example:
$data = "04642583,3,HC Mobile,O213,Inc,SIS Services,KR,Non Payroll Relevant,KR50";
$items = $data.split(',');
$items[3] = '"'+$items[3]
$items[4] = $items[4]+'"';
$items -join ','
This will produce the line:
04642583,3,HC Mobile,"O213,Inc",SIS Services,KR,Non Payroll Relevant,KR50
Given you've stored this in a CSV file:
$file = "C:\tmp\test.csv";
$lines = (get-content $file);
$newLines=($lines|foreach-object {
$items = $_.split(',');
$items[3] = '"'+$items[3]
$items[4] = $items[4]+'"';
$items -join ','
})
You can then output the result in a new file if you want
$newLines|Set-content C:\tmp\test2.csv
This will "mess up" your CSV file though (the two fields will be treated as one merged column), but I'm guessing this is what you're trying to achieve?
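As for the 'An empty pipe element is not allowed' error itself: foreach (...) { } is a language statement, not a pipeline command, so its closing brace cannot be followed directly by a pipe. A minimal sketch of one way around it is to wrap the statement in $( ). The row data comes from the question, while $sampleRows, $tempFile and $fixedRows are illustrative names and the temporary file stands in for the real output path.

```powershell
$sampleRows = @(
    '04642583,3,HC Mobile,O213,Inc,SIS Services,KR,Non Payroll Relevant,KR50'
)

# A temporary file stands in for the real destination.
$tempFile = [System.IO.Path]::GetTempFileName()

# $( ) turns the foreach statement's output into an expression that can be piped.
$(
    foreach ($row in $sampleRows) {
        $fields = $row.Split(',')
        $fields[3] = '"' + $fields[3]
        $fields[4] = $fields[4] + '"'
        $fields -join ','
    }
) | Set-Content -Path $tempFile

$fixedRows = @(Get-Content -Path $tempFile)
Remove-Item -Path $tempFile
$fixedRows
```

Piping from ForEach-Object, as in the answer above, avoids the problem in the same way because ForEach-Object is a cmdlet.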

Get index of regex in filename in powershell

I'm trying to get the starting position for a regexmatch in a folder name.
dir c:\test | where {$_.fullname.psiscontainer} | foreach {
$indexx = $_.fullname.Indexofany("[Ss]+[0-9]+[0-9]+[Ee]+[0-9]+[0-9]")
$thingsbeforeregexmatch.substring(0,$indexx)
}
Ideally, this should work but since indexofany doesn't handle regex like that I'm stuck.
You can use the Regex.Match() method to perform a regex match. It'll return a Match object that has an Index property you can use:
Get-ChildItem c:\test | Where-Object {$_.PSIsContainer} | ForEach-Object {
# Test if folder's Name matches pattern
$match = [regex]::Match($_.Name, '[Ss]+[0-9]+[0-9]+[Ee]+[0-9]+[0-9]')
if($match.Success)
{
# Grab Index from the [regex]::Match() result
$Index = $Match.Index
# Substring using the index we obtained above
$ThingsBeforeMatch = $_.Name.Substring(0, $Index)
Write-Host $ThingsBeforeMatch
}
}
Alternatively, use the -match operator and the $Matches variable to grab the matched string and use that as an argument to IndexOf() (using RedLaser's sweet regex optimization):
if($_.Name -match 's+\d{2,}e+\d{2,}')
{
$Index = $_.Name.IndexOf($Matches[0])
$ThingsBeforeMatch = $_.Name.Substring(0,$Index)
}
You can use the Index property of the Match object. Example:
# Used regex from @RedLaser's comment
$regEx = [regex]'(?i)[s]+\d{2}[e]+\d{2}'
$testString = 'abcS00E00b'
$match = $regEx.Match($testString)
if ($match.Success)
{
$startingIndex = $match.Index
Write-Host "Match. Start index = $startingIndex"
}
else
{
Write-Host 'No match found'
}
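Putting the pieces together, the index lookup and the substring can be wrapped in a small helper. This is only a sketch: the function name Get-TextBeforeEpisodeTag and the sample folder names are hypothetical, and the pattern assumes tags of the form S01E02.

```powershell
function Get-TextBeforeEpisodeTag {
    param([string]$Name)

    # (?i) makes the match case-insensitive; \d{2} expects two digits each.
    $m = [regex]::Match($Name, '(?i)s\d{2}e\d{2}')
    if ($m.Success) {
        # Index is the zero-based position where the match starts.
        $Name.Substring(0, $m.Index)
    }
    else {
        $null
    }
}

$prefix  = Get-TextBeforeEpisodeTag 'My.Show.S01E02.720p'   # hypothetical folder name
$noMatch = Get-TextBeforeEpisodeTag 'NoTagHere'
$prefix
```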