PowerShell Regex Grouping from Select-String

I am scraping a web request response to pull out information held in HTML that repeats a few times, so I am using Select-String with -AllMatches rather than -match. My code looks like this:
$regexname = '<h2 class="ener">(\w{0,11}).{1,10}</h2>'
$energenie.RawContent | select-string $regexname -AllMatches | % {$_.matches}
The return looks something like:
Groups : {<h2 class="ener">TV </h2>, TV}
Success : True
Captures : {<h2 class="ener">TV </h2>}
Index : 1822
Length : 33
Value : <h2 class="ener">TV </h2>
Groups : {<h2 class="ener">PS3 </h2>, PS3}
Success : True
Captures : {<h2 class="ener">PS3 </h2>}
Index : 1864
Length : 33
Value : <h2 class="ener">PS3 </h2>
I can't work out a way to grab the second element of Groups (e.g. TV or PS3), because:
$energenie.RawContent | select-string $regexname -AllMatches | % {$_.matches.groups}
gives a strange output.
Paul

This should work:
$energenie.RawContent | select-string $regexname -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object { Write-Host $_.Groups[1].Value }

To get the second item in a collection, use the array index operator: [n], where n is the zero-based index of the value you want, so the second item is [1].
For each entry in Matches, you want the second entry in the Groups property, so that would be:
$MyMatches = $energenie.RawContent | select-string $regexname -AllMatches | % {$_.Matches}
$SecondGroups = $MyMatches | % {$_.Groups[1]}
To get just the captured value, use the Value property:
$MyMatches | % { $_.Groups[1].Value }
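For anyone who wants to try this without the web request, here is a minimal self-contained sketch. The $html string is a made-up stand-in for $energenie.RawContent, and the pattern assumes the headings look like the ones in the output above.

```powershell
# Hypothetical sample input standing in for $energenie.RawContent
$html = '<h2 class="ener">TV </h2><p>x</p><h2 class="ener">PS3 </h2>'

# Assumed pattern: group 1 captures the device name inside the heading
$pattern = '<h2 class="ener">(\w{0,11}).{1,10}</h2>'

# Expand every Match first, then index into Groups on each single Match
$names = $html | Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }

$names   # TV, PS3
```

Indexing Groups on each individual Match matters: $_.Matches.Groups flattens the groups of all matches into one list, so [1] on that list only ever returns the first capture.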

Related

Keep the $character in regular expression replace

I have two problems with a regex replace:
1. The leading $ expression at the start of each line needs to be kept in the replacement result.
2. Skipping the first two lines and the last line does not work.
Code:
$str = @'
#$start1 Random characters
#$start2 Random characters
$p1.AppendBreak($BreakType.LineBreak)
$doc.Protect($ProtectionType.AllowOnlyRevisions, "123")
$footerPara.AppendField("page", $FieldType.FieldPage)
$footerParagraph.AppendField("number of pages", $FieldType.FieldSectionPages)
$txtWatermark.Layout = $WatermarkLayout.Diagonal
$tr1.CharacterFormat.Border.BorderType = $BorderStyle.DashDotStroker
$stri.CharacterFormat.TextBackgroundColor = $Color.LightGray
$document.LoadFromFile(".\Template_HtmlFile.html", $FileFormat.Html, $XHTMLValidationType.None)
$docObject.DocumentObjectType -eq $DocumentObjectType.Picture
$document.Sections[0].Paragraphs[0].InsertSectionBreak($SectionBreakType.NoBreak)
$footerParagraph.Format.HorizontalAlignment = $Spire.Doc.Documents.HorizontalAlignment.Right
#end Random characters
'@
$str | Foreach-Object {
$_ -replace '\$\w+\.(\w+)', '"$1"'
} | Set-Content .\ok.txt
<# -Skip -SkipLast not valid
$str | Foreach-Object {
$_ -replace '\$\w+\.(\w+)', '"$1"'
} | Select-Object -Skip 2 | Select-Object -SkipLast 1 | Set-Content .\ok.txt
#>
Expected results:
At least for your example here-string, you need to break it into a string array first; as a single string it is one pipeline object, so -Skip and -SkipLast have nothing to skip. For the replacement, I was only successful when capturing both the beginning of the line and the text to change.
$str -split '\r?\n' | Select-Object -Skip 2 |
Select-Object -SkipLast 1 | Foreach-Object {
$_ -replace '(^.+?)\$.+\.(\w+)', '$1"$2"'
} | Set-Content .\ok.txt
Contents of ok.txt
$p1.AppendBreak("LineBreak")
$doc.Protect("AllowOnlyRevisions", "123")
$footerPara.AppendField("page", "FieldPage")
$footerParagraph.AppendField("number of pages", "FieldSectionPages")
$txtWatermark.Layout = "Diagonal"
$tr1.CharacterFormat.Border.BorderType = "DashDotStroker"
$stri.CharacterFormat.TextBackgroundColor = "LightGray"
$document.LoadFromFile(".\Template_HtmlFile.html", "None")
$docObject.DocumentObjectType -eq "Picture"
$document.Sections[0].Paragraphs[0].InsertSectionBreak("NoBreak")
$footerParagraph.Format.HorizontalAlignment = "Right"
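Note that in the LoadFromFile line the greedy .+ swallows $FileFormat.Html, so only "None" survives. If every $Type.Member argument should be converted (an assumption on my part, since the expected results are not shown in the question), a negative lookahead that skips only the expression at the start of the line is one possible variation. A rough sketch on three of the sample lines:

```powershell
# Three lines taken from the question's sample
$lines = @(
    '$doc.Protect($ProtectionType.AllowOnlyRevisions, "123")'
    '$document.LoadFromFile(".\Template_HtmlFile.html", $FileFormat.Html, $XHTMLValidationType.None)'
    '$footerParagraph.Format.HorizontalAlignment = $Spire.Doc.Documents.HorizontalAlignment.Right'
)

# (?!^) refuses a match that starts at the beginning of the line, which keeps
# the leading $variable; every later $A.B...Z becomes "Z"
$converted = $lines | ForEach-Object {
    $_ -replace '(?!^)\$\w+(?:\.\w+)*\.(\w+)', '"$1"'
}

$converted
# $doc.Protect("AllowOnlyRevisions", "123")
# $document.LoadFromFile(".\Template_HtmlFile.html", "Html", "None")
# $footerParagraph.Format.HorizontalAlignment = "Right"
```

This relies on each line being processed separately, so ^ only ever means the start of that line.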

How to use a capture variable as a field name in JQ?

I'm trying to use jq to automate converting i18n string files from the format used by one library to the format used by another.
I have a JSON file which looks like this:
{
"some_label": {
"message": "a string in English with a $VARIABLE$",
"description": "directions to translators",
"placeholders": {
"VARIABLE": {
"content": "{variable}"
}
}
},
// more of the same...
}
And I need that to turn into "some-label": "a string in English with a {variable}"
I am pretty close to getting it. Currently, I'm using
jq '[.
| to_entries
| .[]
| .key |= (gsub("_";"-"))
| .value.placeholders as $p
| .value.message |= (sub("\\$KEY_NAME\\$";$p.KEY_NAME.content))
| .value = .value.message
] | from_entries'
The next step is to use a capture group in the sub call so I can programmatically get variables with different names, but I'm not sure how to use the capture group to index into $p.
I've tried sub("\\$(?<id>VARIABLE)\\$";$p.(.id).content) which gave a compiler error, and I'm pretty much stuck on what to try next.
Here is one way of achieving the desired result. It could be simplified further too. At the top level it removes the usage of to_entries/from_entries by enclosing the whole filter under with_entries() and modifying the .value field as required
with_entries(
.key |= ( gsub("_";"-") ) |
.value.placeholders as $p |
.value.message as $m |
( $m | match(".*\\$(.*)\\$") | .captures[0].string ) as $c |
( $p | .[$c].content ) as $v |
( "\\$" + $c + "\\$" ) as $t |
.value = ( $m | sub($t; $v) )
)
The key parts of the expression, as I see them:
The part $m | match(".*\\$(.*)\\$") | .captures[0].string makes a regex match to extract the text between the $..$ in .message
The part $p | .[$c].content does a generic object index lookup using the dynamic value of $c
Since the first argument of sub()/gsub() is a regex, the captured value $c needs to be wrapped as \\$VARIABLE\\$ so that the $ characters are matched literally
Here's a basic jq filter. I haven't tried it with complex inputs, and I haven't accounted for other uses of $ in the message. I guess you can build on top of this:
to_entries | map(. as $kv | { "\($kv.key)": $kv.value.placeholders | to_entries | map(. as $p | $kv.value.message | sub("\\$\($p.key)\\$"; $p.value.content))[0]}) | add
output -
{
"some_label": "a string in English with a {variable}"
}

Reading list style text file into powershell array

I am provided a list of string blocks in a text file, and I need this to be in an array in PowerShell.
The list looks like this:
a:1
b:2
c:3
d:
e:5
[blank line]
a:10
b:20
c:30
d:
e:50
[blank line]
...
and I want this in a PowerShell array so I can work with it further.
I'm using
$output = @()
Get-Content ".\Input.txt" | ForEach-Object {
$splitline = ($_).Split(":")
if($splitline.Count -eq 2) {
if($splitline[0] -eq "a") {
#Write-Output "New Block starting"
$output += ($string)
$string = "$($splitline[1])"
} else {
$string += ",$($splitline[1])"
}
}
}
Write-Host $output -ForegroundColor Green
$output | Export-Csv ".\Output.csv" -NoTypeInformation
$output | Out-File ".\Output.txt"
But this whole thing feels quite cumbersome, and the output is not a proper CSV file, which I think is because of the way I use the array. Out-File does produce a file whose rows contain comma-separated values.
Maybe someone can give me a push in the right direction.
Thx
x
One solution is to convert your data to an array of hash tables that can be read into a custom object. Then the output array object can be exported, formatted, or read as required.
$hashtables = (Get-Content Input.txt) -replace '(.*?):','$1=' | ConvertFrom-StringData
$ObjectShell = "" | Select-Object ($hashtables.keys | Select-Object -Unique)
$output = foreach ($hashtable in $hashtables) {
$obj = $ObjectShell.psobject.Copy()
foreach ($n in $hashtable.GetEnumerator()) {
$obj.($n.key) = $n.value
}
$obj
}
$output
$output | Export-Csv Output.csv -NoTypeInformation
Explanation:
The first colons (:) on each line are replaced with =. That enables ConvertFrom-StringData to create an array of hash tables with values on the LHS of the = being the keys and values on the RHS of the = being the values. If you know there is only one : on each line, you can make the -replace operation simpler.
$ObjectShell is just an object with all of the properties your data presents. You need all of your properties present for each line of data whether or not you assign values to them. Otherwise, your CSV output or table view within the console will have issues.
The first foreach iterates through the $hashtables array. Then we need to enumerate each hash table to find its keys and values, which is what the second foreach loop does. Each key/value pair is assigned to a copy of $ObjectShell. The .psobject.Copy() method is used so that each object is independent of the original; updating a reference instead would update the data of the original object.
$output contains the array of objects of all processed data.
Usability of output:
# Console Output
$output | format-table
a  b  c  d e
-  -  -  - -
1
   2
      3

           5

10
   20
      30

           50
# Convert to CSV
$output | ConvertTo-Csv -NoTypeInformation
"a","b","c","d","e"
"1",,,,
,"2",,,
,,"3",,
,,,"",
,,,,"5"
,,,,
"10",,,,
,"20",,,
,,"30",,
,,,"",
,,,,"50"
# Accessing Properties
$output.b
2
20
$output[0],$output[1]
a : 1
b :
c :
d :
e :
a :
b : 2
c :
d :
e :
Alternative Conversion:
$output = ((Get-Content Input.txt -raw) -split "(?m)^\r?\n") | Foreach-Object {
$data = $_ -replace "(.*?):(.*?)(\r?\n)",'"$1":"$2",$3'
$data = $data.Remove($data.LastIndexOf(','),1)
("{1}`r`n{0}`r`n{2}" -f $data,'{','}') | ConvertFrom-Json
}
$output | ConvertTo-Csv -NoType
Alternative Explanation:
Since ConvertFrom-StringData does not guarantee hash table key order, this alternative readies the file for a JSON conversion. This will maintain the property order listed in the file provided each group's order is the same. Otherwise, the property order of the first group will be respected.
All properties and their respective values are divided by the first : character on each line. The property and value are each surrounded by double quotes. Each property line is separated by a ,. Then finally the opening { and closing } are added. The resulting JSON-formatted string is converted to a custom object.
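Both conversions above are shown on the asker's data. If what you are after is one object (one CSV row) per blank-line-separated block, the following is a rough sketch of another way to get there. It assumes blocks are separated by at least one empty line and that each line is key:value; the $raw string is a made-up stand-in for Get-Content .\Input.txt -Raw.

```powershell
# Hypothetical sample standing in for: Get-Content .\Input.txt -Raw
$raw = "a:1`nb:2`nc:3`nd:`ne:5`n`na:10`nb:20`nc:30`nd:`ne:50`n"

# Split into blocks on blank lines, then build one ordered property bag per block
$records = $raw -split '(?:\r?\n){2,}' | Where-Object { $_.Trim() } | ForEach-Object {
    $props = [ordered]@{}
    foreach ($line in ($_ -split '\r?\n')) {
        # Split each line at the first colon only; the value may be empty
        if ($line -match '^([^:]+):(.*)$') {
            $props[$Matches[1]] = $Matches[2]
        }
    }
    [pscustomobject]$props
}

$records | ConvertTo-Csv -NoTypeInformation
```

The ordered dictionary keeps the property order as it appears in the file, and $records can go straight to Export-Csv.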
You can split by \n newline, see example:
$text = @"
a:1
b:2
c:3
d:
e:5
a:10
b:20
c:30
d:
e:50
e:50
e:50
e:50
"@
$Array = $text -split '\n' | ? {$_}
$Array.Count
15
The trailing ? {$_} filter is what excludes the empty lines; leave it off if you want to keep them.
With your example:
$Array = (Get-Content ".\Input.txt") -split '\n' | ? {$_}

Powershell: apply multiple queries to a foreach on logs

I need to get the total data transferred from multiple log files. The value is shown under the Total column in the Bytes row.
It is followed by a g, m, t or k suffix to indicate GB/MB/TB/KB.
Total Copied Skipped Mismatch FAILED Extras
Bytes : 54.414 g 54.414 g 0 0 0 0
Currently I have this script, which goes through all the files and extracts the value between Bytes : and the g, but I need to add more queries to the foreach over the files and sum everything into one consistent value.
This is what I have currently, but it only outputs the kb value.
$pattern = "(?<=.*Bytes :.*)\w.+?(?= g.*)"
$pattern1 = "(?<=.*Bytes :.*)\w.+?(?= m.*)"
$pattern2 = "(?<=.*Bytes :.*)\w.+?(?= k.*)"
Get-ChildItem "C:\Users\logs" -Filter "BFR*" | ForEach-Object {
Get-Content "C:\Users\logs\*.log" | where-Object {$_ -match $pattern } | ForEach-Object {
[double] ($matches[0])
} | Measure-Object -Sum | Select-Object -ExpandProperty sum
} | ForEach-Object {
Get-Content "C:\Users\logs\*.log" | where-Object {$_ -match $pattern1 } | ForEach-Object {
[double] ($matches[0])
} | Measure-Object -Sum | Select-Object -ExpandProperty sum
} | ForEach-Object {
Get-Content "C:\Users\logs\*.log" | where-Object {$_ -match $pattern2 } | ForEach-Object {
[double] ($matches[0])
} | Measure-Object -Sum | Select-Object -ExpandProperty sum
}
Here is one possibility. (Sorry @JamesC, it's a 1-liner :-)):
Get-ChildItem .\LogFolder\*.log |
ForEach-Object {$totalBytes = 0}{
Get-Content $_ | Select-String -Pattern "^Bytes\s+:\s+(?<size>\d+\.\d+) (?<units>[tgmk]).*$" |
Foreach-Object {
$size = $_.Matches.Groups[1].Value
switch ($_.Matches.Groups[2].Value)
{
t {$totalbytes += (1tb * $size)}
g {$totalbytes += (1gb * $size)}
m {$totalbytes += (1mb * $size)}
k {$totalbytes += (1kb * $size)}
}
}
} {"Total Bytes: $totalBytes"}
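The unit handling can also be pulled out into a small helper, which makes it easier to test on its own. This is only a sketch: ConvertTo-ByteCount is a name I made up, the sample lines are invented, and the pattern assumes the Bytes row may be indented and that the first number is the Total column. Rows without a unit suffix are not handled and return 0.

```powershell
function ConvertTo-ByteCount {
    param([string]$Line)

    # First number after "Bytes :" plus its unit letter (t, g, m or k)
    if ($Line -match '^\s*Bytes\s*:\s*(?<size>\d+(?:\.\d+)?)\s+(?<units>[tgmk])\b') {
        $multiplier = switch ($Matches.units) {
            't' { 1tb }
            'g' { 1gb }
            'm' { 1mb }
            'k' { 1kb }
        }
        return [double]$Matches.size * $multiplier
    }
    return 0   # not a Bytes row, or no unit suffix
}

# Hypothetical log lines for illustration
$sample = @(
    '   Bytes :  54.414 g  54.414 g         0         0         0         0'
    '   Bytes :   1.500 m   1.500 m         0         0         0         0'
)
$total = ($sample | ForEach-Object { ConvertTo-ByteCount $_ } | Measure-Object -Sum).Sum
"Total Bytes: $total"
```

With real logs, Get-ChildItem piped to Get-Content would replace $sample.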

Get the numbers after ":" and count them with the help of powershell

Could someone please help me with extracting and counting the numbers from a text file with PowerShell?
Example: c:\temp\1.txt contains some text with colons followed by numbers. I need to sum all of these numbers.
blablabl:5 dzfdsfdsfsdfsf:10
sdfsdfsdfdffs:8sdfsfsfdsfdsf:111
5+10+8+111...
What I've tried so far:
$LogText = "C:\temp\1.txt"
[regex]$Regex = "\. (\d+):[1]"
$Matches = $Regex.Matches($LogText)
$Matches | ForEach-Object {
Write-Host $Matches
}
#$array = @()
#$array = new-object collections.arraylist
$array = while ($Matches.Success) {
Write-Host $array[i++]
}
# -------------------------------------------------------------------
$text = Get-Content "C:\temp\1.txt"
[regex]$Regex = "\d"
$Matches = $Regex.Matches($text)
# -------------------------------------------------------------------
$pos = $text.IndexOf(":")
$rightPart = $text.Substring($pos+1)
Write-Host $rightPart
Use Select-String to extract the matches from the file and Measure-Object to do the calculation.
Select-String -Path 'C:\temp\1.txt' -Pattern '(?<=:)\d+' -AllMatches |
Select-Object -Expand Matches |
Select-Object -Expand Value |
Measure-Object -Sum |
Select-Object -Expand Sum
(?<=:) is a positive lookbehind assertion to match the colon preceding the number without making it part of the match.
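To see the lookbehind at work without a file, here is a small sketch on an in-memory string (the text is the sample from the question; [regex]::Matches is used here instead of Select-String only because there is no file to read):

```powershell
# Sample text from the question, as a single string
$text = "blablabl:5 dzfdsfdsfsdfsf:10`nsdfsdfsdfdffs:8sdfsfsfdsfdsf:111"

# (?<=:) requires a colon right before the digits but keeps it out of the match
$numbers = [regex]::Matches($text, '(?<=:)\d+') | ForEach-Object { [int]$_.Value }
$sum = ($numbers | Measure-Object -Sum).Sum

$numbers   # 5, 10, 8, 111
$sum       # 134
```

Unlike a bare \d+, this ignores any digits that are not directly preceded by a colon.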
Try it like this:
$txt = @"
blablabl:5 dzfdsfdsfsdfsf:10
sdfsdfsdfdffs:8sdfsfsfdsfdsf:111
"@
[regex]$Regex = '\d+'
$sum=0;
$Regex.Matches($txt) | ForEach-Object {
$val = [int]$_.Value
$val
$sum+=$val
}
$sum