The page http://www.shazam.com/charts/top-100/australia displays a list of songs, and I want to capture them using regex and PowerShell. The PowerShell code below is what I have so far:
$ie = New-Object -comObject InternetExplorer.Application
$ie.navigate('http://www.shazam.com/charts/top-100/australia')
Start-Sleep -Seconds 10
$null = $ie.Document.body.innerhtml -match 'data-chart-position="1"(.|\n)*data-track-title=.*content="(.*)"><a href(.|\n)*data-track-artist=\W\W>(.|\n)*<meta\scontent="(.*)"\sitemprop';$shazam01artist = $matches[5];$shazam01title = $matches[2]
data-chart-position
data-track-title
data-track-artist
Each song listed has the three attributes above associated with it. I want to capture the artist and title for each song based on its chart position (number), so I need a regular expression that finds the chart position and then the artist and title that follow it.
If I run the regex separately for artist and title (code below), it finds them, but only the first artist and title. I need to find the artist and title for every song, based on the different chart positions.
$null = $ie.Document.body.innerhtml -match 'data-track-artist=\W\W>(.|\n)*<meta\scontent="(.*)"\sitemprop';$shazam01artist = $matches[2]
$null = $ie.Document.body.innerhtml -match 'data-track-title=.*content="(.*)"><a href';$shazam01title = $matches[1]
$shazam01artist
$shazam01title
Using regex to parse partial HTML is an absolute nightmare; you might want to reconsider that approach.
Invoke-WebRequest returns a property called ParsedHtml, that contains a reference to a pre-parsed HTMLDocument object. Use that instead:
# Fetch the document
$Top100Response = Invoke-WebRequest -Uri 'http://www.shazam.com/charts/top-100/australia'
# Select all the "article" elements that contain charted tracks
$Top100Entries = $Top100Response.ParsedHtml.getElementsByTagName("article") |Where-Object {$_.className -eq 'ti__container'}
# Iterate over each article
$Top100 = foreach($Entry in $Top100Entries){
    $Properties = @{
        # Collect the chart position from the article element
        Position = $Entry.getAttribute('data-chart-position',0)
    }
    # Iterate over the inner paragraphs containing the remaining details
    $Entry.getElementsByTagName('p') |ForEach-Object {
        if($_.className -eq 'ti__artist') {
            # the ti__artist paragraph contains a META element that holds the artist name
            $Properties['Artist'] = $_.getElementsByTagName('META').item(0).getAttribute('content',0)
        } elseif ($_.className -eq 'ti__title') {
            # the ti__title paragraph stores the title name directly in the content attribute
            $Properties['Title'] = $_.getAttribute('content',0)
        }
    }
    # Create a psobject based on the details we just collected
    New-Object -TypeName psobject -Property $Properties
}
Now, let's see how Tay-Tay's doing down under:
PS C:\> $Top100 |Where-Object { $_.Artist -match "Taylor Swift" }
Position Title Artist
-------- ----- ------
42 Bad Blood Taylor Swift Feat. Kendrick Lamar
Sweet!
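If you do want to stay with regex, the reason only the first song comes back is that -match stops at the first match. Here is a minimal sketch using [regex]::Matches, which returns every match. The HTML below is a made-up sample, so the attribute names and their order are assumptions, and the pattern would need adjusting to the real page:

```powershell
# Hypothetical markup: the real page's attributes and nesting may differ.
$html = @'
<article data-chart-position="1"><p class="ti__title" content="Song A"></p><meta content="Artist A"></article>
<article data-chart-position="2"><p class="ti__title" content="Song B"></p><meta content="Artist B"></article>
'@

# (?s) lets . span newlines; .*? is lazy, so each match stops at the nearest following attribute.
$pattern = '(?s)data-chart-position="(\d+)".*?class="ti__title" content="([^"]*)".*?<meta content="([^"]*)"'

# One object per match instead of only the first hit
$songs = foreach ($m in [regex]::Matches($html, $pattern)) {
    [pscustomobject]@{
        Position = [int]$m.Groups[1].Value
        Title    = $m.Groups[2].Value
        Artist   = $m.Groups[3].Value
    }
}
$songs
```

The lazy quantifiers matter here: the greedy (.|\n)* in the original pattern runs to the last possible occurrence, which is why groups from different songs can get mixed together.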
I want to copy all the values from one column to a new column.
As a solution, I prepared a workflow with a "Set Field to Value" action that starts when an item is updated.
But I have 16,000+ rows, and manually updating each one to trigger the workflow is not feasible right now.
I also tried Microsoft Flow, without success.
Could anyone suggest a way to achieve this?
I would suggest PowerShell for this kind of 'migration' work. The script below needs to be run on the SharePoint server.
Add-PSSnapin Microsoft.SharePoint.Powershell -ErrorAction SilentlyContinue
#Parameters
$SiteURL = "http://siteurl/"
$listName = "list"
$web = Get-SPweb $SiteURL
#Use the Display Names
$CopyFromColumnName = "Description" #column copy source
$CopyToColumnName = "Desc" #destination column
#Get the List
$list = $web.lists[$ListName]
#Get all Items
$Items = $list.Items
ForEach ($Item in $items)
{
#copy data from one column to another
$item[$copyToColumnName] = $item[$copyFromColumnName]
#Do a system update to avoid Version and to Keep same metadata
$item.SystemUpdate($false)
}
For SharePoint Online, refer to the linked thread and replace the iteration logic with paging:
$Query = New-Object Microsoft.SharePoint.Client.CamlQuery
$Query.ViewXml = "<View Scope='RecursiveAll'><Query><OrderBy><FieldRef Name='ID' Ascending='TRUE'/></OrderBy></Query><RowLimit Paged='TRUE'>$BatchSize</RowLimit></View>"
$Counter = 0
#Batch process list items - to mitigate list threshold issue on larger lists
Do {
#Get items from the list
$ListItems = $List.GetItems($Query)
$Ctx.Load($ListItems)
$Ctx.ExecuteQuery()
$Query.ListItemCollectionPosition = $ListItems.ListItemCollectionPosition
#Loop through each List item
ForEach($ListItem in $ListItems)
{
# TODO: copy the field value here
$Counter++
Write-Progress -PercentComplete ($Counter / ($List.ItemCount) * 100) -Activity "Processing Items $Counter of $($List.ItemCount)" -Status "Searching Unique Permissions in List Items of '$($List.Title)'"
}
} While ($Query.ListItemCollectionPosition -ne $null)
I have a list of switches in a CSV file and a list of the data spaces where these switches are located. In my list of data spaces, each DataSpace_Name has an associated DataSpace_ID field.
My list of switches has Host_Name and IP_Address fields. Using PowerShell and regex (wildcard-style) matching, I want to match a DataSpace_Name value, for example "ABC-COM", against the switch Host_Name, which would be ABC-COM-3750-SW1. I only want to match up to ABC-COM...
Then, for each match found, I want the output to include the associated DataSpace_ID value alongside the switch listing.
Let's say I match ABC-COM = DATASPACE_ID 1 and DEF-COM = DataSpace_ID 2, and my switch data is:
Host_Name IP_Address
ABC-COM-3750-SW1 IP 192.168.1.2
ABC-COM-3750-SW2 IP: 192.168.1.3
DEF-COM-3750-SW1 IP: 192.168.3.5
DEF-COM-3750-SW2 IP: 192.168.3.6
So, in the end, the switch listing would gain a DataSpace_ID column taken from the matching data space name. The switch listing output would look like this:
DataSpace_ID Host_Name IP_Address
1 ABC-COM-3750-SW1 IP 192.168.1.2
1 ABC-COM-3750-SW2 IP: 192.168.1.3
2 DEF-COM-3750-SW1 IP: 192.168.3.5
2 DEF-COM-3750-SW2 IP: 192.168.3.6
Here is my latest code, revised based on some of your input. I am no longer getting errors, but the output is not returning any results either.
clear-host
$hash.clear()
$dataSpacesExport = Import-Csv -Path .\DataSpaces_Export.csv -Header 'DataSpace_ID', 'DataSpace_Name' -Delimiter ","
$accessSwitchesForExport = Import-Csv -Path .\AccessSwitchesForExport.csv -Header 'Host_Name', 'IP_Address' -Delimiter ","
# create hashtable
$hash = @{}
# Create Regex criteria
$re = [regex]".+(?=-\d+)"
$dataSpacesExport | ConvertFrom-Csv | % { $hash.Add($_,"$_") }
# output
$accessSwitchesForExport | ConvertFrom-Csv |
Select-Object @{ n = "DataSpace_ID"; e = { $hash[$re.Match($_.Host_Name).Value] } },* |
Where-Object { $_.DataSpace_ID -ne $null }
Some of you asked about my CSV files. Example data for both the data spaces and the switches is shown in this post: the DataSpaces CSV contains DataSpace_ID and DataSpace_Name fields, and the switches CSV contains Host_Name and IP_Address fields.
The output, based on comparing the two CSVs, should show each matching DataSpace_ID alongside the matching Host_Name and its associated IP address in the final table.
This is a solution using a hash table.
$dataSpacesExport = @"
DataSpace_ID,DataSpace_Name
1,ABC-COM
2,DEF-COM
"@
$accessSwitchesForExport = @"
Host_Name,IP_Address
ABC-COM-3750-SW1,IP: 192.168.1.2
ABC-COM-3750-SW2,IP: 192.168.1.3
DEF-COM-3750-SW1,IP: 192.168.3.5
DEF-COM-3750-SW2,IP: 192.168.3.6
GHI-COM-3750-SW2,IP: 192.168.3.6
"@
$re = [regex]".+(?=-\d+)"
# create hashtable
$id = @{}
$dataSpacesExport | ConvertFrom-Csv | ForEach-Object { $id[$_.DataSpace_Name] = $_.DataSpace_ID }
# output
$accessSwitchesForExport | ConvertFrom-Csv |
Select-Object @{ n = "DataSpace_ID"; e = { $id[$re.Match($_.Host_Name).Value] } },* |
Where-Object { $_.DataSpace_ID -ne $null }
The output is as follows.
DataSpace_ID Host_Name IP_Address
------------ --------- ----------
1 ABC-COM-3750-SW1 IP: 192.168.1.2
1 ABC-COM-3750-SW2 IP: 192.168.1.3
2 DEF-COM-3750-SW1 IP: 192.168.3.5
2 DEF-COM-3750-SW2 IP: 192.168.3.6
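To see what the lookahead in that pattern actually captures, here is a small sketch that runs it against a few host names. The second and third names are hypothetical and only there to show the edge cases:

```powershell
$re = [regex]'.+(?=-\d+)'

# Everything up to the last "-<digits>" boundary is captured
$prefix = $re.Match('ABC-COM-3750-SW1').Value

# No hyphen followed by a digit anywhere, so there is no match at all
$hasMatch = $re.Match('ABC-COM-SW1').Success

# Because .+ is greedy, a second "-<digits>" group moves the boundary to the right
$greedy = $re.Match('ABC-COM-3750-2').Value

# A lazy variant stops at the first "-<digits>" boundary instead
$lazy = ([regex]'.+?(?=-\d+)').Match('ABC-COM-3750-2').Value

"$prefix | $hasMatch | $greedy | $lazy"
```

If host names can contain more than one hyphen-plus-digits segment, the lazy form is probably the safer choice, since the hash table key must equal the DataSpace_Name exactly.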
The following code is another solution. In this case, you do not need a regular expression.
$dataSpaces = $dataSpacesExport | ConvertFrom-Csv
$accessSwitchesForExport | ConvertFrom-Csv | ForEach-Object {
    foreach ($ds in $dataSpaces) {
        if (!$_.Host_Name.StartsWith($ds.DataSpace_Name)) { continue }
        [pscustomobject]@{
            DataSpace_ID = $ds.DataSpace_ID
            Host_Name = $_.Host_Name
            IP_Address = $_.IP_Address
        }
        break
    }
}
Thank you everyone for your help! I used bits and pieces of the recommendations above to come up with the following result which works perfectly and generates that data needed.
#Set Present Working Directory for path to save data to.
#Clear any Hash Table Data prior to start of script //
$id.clear()
#Import current listing of Data Spaces and Access switches from CSV format //
$dataSpacesExport = import-csv -Header DataSpace_ID, DataSpace_Name -Path ".\DataSpaces_Export.csv"
$accessSwitchesForExport = import-csv -Header Host_Name, Device_IP -Delimiter "," -Path ".\AccessSwitchesForExport.csv"
#Regex text matching criteria //
$re = [regex]".+(?=-\d+)"
# create hashtable to store output //
$id=@{}
# Inject DataSpaces listing into Script for processing via hash table $id //
$dataSpacesExport | % {$id[$_.DataSpace_Name] = $_.DataSpace_ID}
# output - Compare Access Switch listing to DataSpaces Hashtable information, produce output to out-file sw_names.txt //
$accessSwitchesForExport |
Select-Object @{ n = "DataSpace_ID"; e = { $id[$re.Match($_.Host_Name).Value] } },* |
Where-Object { $_.DataSpace_ID -ne $null } | Out-File ./sw_names.txt -Force
Output is as expected and is now working.
This is what I wrote to get output with powercli;
Get-VM -name SERVERX | Get-Annotation -CustomAttribute "Last EMC vProxy Backup"|select @{N='VM';E={$_.AnnotatedEntity}},Value
This is the output
VM Value
-- -----
SERVERX Backup Server=networker01, Policy=vmbackup, Workflow=Linux_Test_Production, Action=Linux_Test_Production, JobId=1039978, StartTime=2018-10-31T00:00:27Z, EndTime=2018-10-31T00:12:45Z
SERVERX1 Backup Server=networker01, Policy=vmbackup, Workflow=Linux_Test_Production, Action=Linux_Test_Production, JobId=1226232, StartTime=2018-12-06T00:00:29Z, EndTime=2018-12-06T00:0...
SERVERX2 Backup Server=networker01, Policy=vmbackup, Workflow=Linux_Test_Production, Action=Linux_Test_Production, JobId=1226239, StartTime=2018-12-05T23:58:27Z, EndTime=2018-12-06T00:0...
But I would like to retrieve only the "StartTime" and "EndTime" values.
Desired output is;
VM Value
-- -----
SERVERX StartTime=2018-10-31T00:00:27Z, EndTime=2018-10-31T00:12:45Z
SERVERX1 StartTime=2018-12-06T00:00:29Z, EndTime=2018-12-06T00:11:14Z
SERVERX2 StartTime=2018-12-05T23:58:27Z, EndTime=2018-12-06T00:11:20Z
How can I get this output?
This would be better suited in Powershell forum as this is just data manipulation.
Provided your output always has the same number of commas, something like this should work:
$myannotation = Get-VM -Name SERVERX | Get-Annotation -CustomAttribute "Last EMC vProxy Backup" | select @{N='VM';E={$_.AnnotatedEntity}},Value
$table1 = @()
foreach($a in $myannotation)
{
    # Value is a comma-separated list; StartTime and EndTime are the 6th and 7th items
    $splitter = $a.Value -split ','
    $splitbackupstart = $splitter[5].Trim()
    $splitbackupend = $splitter[6].Trim()
    $row = '' | select vmname, backupstart, backupend
    $row.vmname = $a.VM # the select above renamed AnnotatedEntity to VM
    $row.backupstart = $splitbackupstart
    $row.backupend = $splitbackupend
    $table1 += $row
}
$table1
Untested. If the format of the string is going to change over time, a regex that searches for StartTime will be better.
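A rough sketch of that regex variant, which does not depend on the position of the fields. Get-BackupWindow is a hypothetical helper name, and the sample string is copied from the output in the question:

```powershell
# Sample value copied from the output shown in the question
$value = 'Backup Server=networker01, Policy=vmbackup, Workflow=Linux_Test_Production, Action=Linux_Test_Production, JobId=1039978, StartTime=2018-10-31T00:00:27Z, EndTime=2018-10-31T00:12:45Z'

# Hypothetical helper: pulls StartTime and EndTime out of the annotation text by name
function Get-BackupWindow {
    param([string]$Value)
    # [^,\s]+ captures everything up to the next comma or whitespace
    $start = if ($Value -match 'StartTime=([^,\s]+)') { $Matches[1] }
    $end   = if ($Value -match 'EndTime=([^,\s]+)')   { $Matches[1] }
    [pscustomobject]@{ StartTime = $start; EndTime = $end }
}

$window = Get-BackupWindow -Value $value
$window

# With PowerCLI loaded, the same helper could feed calculated properties (untested):
# Get-VM -Name SERVERX | Get-Annotation -CustomAttribute "Last EMC vProxy Backup" |
#     select @{N='VM';E={$_.AnnotatedEntity}},
#            @{N='StartTime';E={(Get-BackupWindow $_.Value).StartTime}},
#            @{N='EndTime';E={(Get-BackupWindow $_.Value).EndTime}}
```

If a field is missing from the string, the corresponding property is simply empty rather than shifted, which is the main advantage over splitting on commas.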
I have an issue with a bit of code to create a word document, fill this with some lines of text, creating a list (numbering, 1., 1.1, 1.1.1, etc) and then creating an index. ($i is part of a for loop)
This works amazingly well when I just use the following line of code:
$paragraphs[0].Item($i).range.ListFormat.ApplyNumberDefault(1)
The output is then:
1., a., i.
For some reason it defaults to 'single level' lists if I put down:
$paragraphs[0].Item($i).range.ListFormat.ApplyNumberDefault(0)
Resulting in the output:
1., 2., 3.
However, the code below obviously doesn't work, because I need a ListTemplate object to apply to the format, and I can't find any specific way to create that object in PowerShell. There are some VBA examples, but I seem unable to translate them to PowerShell.
$paragraphs[0].Item($i).range.ListFormat.ApplyListTemplate('wdStyleListBullet2')
The intended end-result has to be 1., 1.1., 1.1.1. ...
(Obviously the bullet2 style is just an example, the question is how do I create the ListTemplate object in Powershell).
#Function to create a or multiple paragraphs, to prevent absurd paragraph clutter
function CreateParagraph($Selection, $count)
{
for ($i = 0;$i -lt $count;$i++){
$Selection.TypeParagraph()
}
}
#Function to create numbered lists based on a selected range of paragraphs
function NumberParagraphs($Selection, $paragraphs, $countstart, $countend, $indent)
{
$x = $false
$template = $word.ListGalleries[[Microsoft.Office.Interop.Word.WdListGalleryType]::WdBuiltinStyle].ListTemplates(2)
$template
for ($i = $countstart;$i -le $countend;$i++)
{
if (($paragraphs[0].Item($i).range.text -ne $null) -and ($paragraphs[0].Item($i).range.text -ne "") -and ($paragraphs[0].Item($i).range.text.length -gt 1))
{
#Set the listtemplate style here
#$paragraphs[0].Item($i).range.ListFormat.ApplyNumberDefault(1)
$paragraphs[0].Item($i).range.ListFormat.ApplyListTemplate($template)
}
if ($x -eq $false)
{
$indent
if ($indent -eq -1)
{
$paragraphs[0].Item($i).range.ListFormat.ListLevelNumber = 1
}
else
{
$paragraphs[0].Item($i).range.ListFormat.ListLevelNumber = $indent
}
}
$x = $true
}
}
#create Word object, create a new Word document
$Word = New-Object -ComObject Word.Application
$Word.Visible = $True
$Document = $Word.Documents.Add()
$Selection = $Word.Selection
$Range = $Selection.Range
#Add table of content
$Toc = $Document.TablesOfContents.Add($range)
#Create sample headers (Office language must be US or EN(?))
CreateParagraph $Selection 1
$Selection.Style = 'Heading 1'
$Selection.TypeText("Hello")
CreateParagraph $Selection 1
$Selection.Style = "Heading 2"
$Selection.TypeText("Report compiled at $(Get-Date).")
CreateParagraph $Selection 1
$Selection.Style = 'Heading 2'
$Selection.TypeText("Report compiled at $(Get-Date).")
CreateParagraph $Selection 1
$Selection.Style = 'Heading 2'
$Selection.TypeText("Report compiled at $(Get-Date).")
CreateParagraph $Selection 1
$Selection.Style = 'Heading 2'
$Selection.TypeText("Report compiled at $(Get-Date).")
CreateParagraph $Selection 1
$Selection.Style = 'Heading 2'
$Selection.TypeText("Report compiled at $(Get-Date).")
$Paragraphs = $Document.Range().Paragraphs
#create numbered lists.
NumberParagraphs $Selection $Paragraphs 2 2 1
NumberParagraphs $Selection $Paragraphs 3 3 2
NumberParagraphs $Selection $Paragraphs 4 5 -1
NumberParagraphs $Selection $Paragraphs 6 7 2
#Refresh table of content
$toc.Update()
After spending most of the day questioning my own sanity, I decided to basically reverse engineer my own actions. One would expect the $word object to contain all the references required, and it does. I had tested this earlier myself: it contains the full range of templates under its list galleries.
So I went back, revisiting what I had already attempted and what I had not and it turns out I had somehow ignored one obvious answer:
$paragraphs[0].Item($i).range.ListFormat.ApplyListTemplate($Word.ListGalleries::ListTemplates[15])
The only thing that might be an issue is, as Cindy says, that the order or count of templates may differ from one workstation to another. I might have to build a solution for that, but that's a later concern.
You have a working Powershell script that automated Word. You'd like to use the following snippet in that script:
$paragraphs[0].Item($i).range.ListFormat.ApplyListTemplate('wdStyleListBullet2')
But, you can't quite get it to work?
I cooked up the following:
$word = New-Object -ComObject word.application
$word.Visible = $false
$doc = $word.documents.add()
$doc.paragraphs.add()
$template = $word.ListGalleries[[Microsoft.Office.Interop.Word.WdListGalleryType]::WdBuiltinStyle].ListTemplates(2)
$doc.paragraphs(1).range.ListFormat.ApplyListTemplate($template)
It's kind of what you want. I just don't know which parameter to provide to ListTemplates(). It takes a number, and I'm not sure which number corresponds to 'wdStyleListBullet2'. You'll have to figure that out. Unfortunately, COM objects don't provide the same reflective abilities as .NET objects. :-(
But, to your question, that's how you'd call the ApplyListTemplate() function.
Sorry for previous confusion...
I've spent several hours today trying to write a powershell script that will pull a client ID off a PDF from system #1 (example, Smith,John_H123_20171012.pdf where the client ID is the H#### value), then look it up in an Excel spreadsheet that contains the client ID in system 1 and system 2, then rename the file to the format needed for system 2 (xxx_0000000123_yyy.pdf).
One gotcha is that the client # is 2-4 digits in system 2 and is always preceded by zeros.
Using Powershell and regular expressions.
This is the first part I am trying to use for my initial rename:
Get-ChildItem -Filter *.pdf | Foreach-Object{
$pattern = "_H(.*?)_2"
$OrionID = [regex]::Match($file, $pattern).Groups[1].value
Rename-Item -NewName $OrionID
}
It is not accepting "NewName" because it states it is an empty string. I have run:
Get-Variable | select name,value,Description
And new name shows up as a name but with no value. How can I pass the output from the Regex into the rename?
Run this code line by line in debugger, you will understand how this works.
#Starts an Excel process, you can see Excel.exe as background process
$processExcel = New-Object -com Excel.Application
#If you set it to $False you won't see what's going on in the Excel app
$processExcel.visible = $True
$filePath="C:\somePath\file.xls"
#Open $filePath file
$Workbook=$processExcel.Workbooks.Open($filePath)
#Select sheet 1
$sheet = $Workbook.Worksheets.Item(1)
#Select sheet with name "Name of some sheet"
$sheetTwo = $Workbook.Worksheets.Item("Name of some sheet")
#This will store the text of cell A3 (row 3, column 1) in the variable
$cellString = $sheet.cells.item(3,1).text
#This will set cell D1 (row 1, column 4) to the variable's value
$sheet.cells.item(1,4) = $cellString
#Iterate through all the sheet
$lastUsedRow = $sheet.UsedRange.Rows.count
$LastUsedColumn = $sheet.UsedRange.Columns.count
for ($i = 1;$i -le $lastUsedRow; $i++){
for ($j = 1;$j -le $LastUsedColumn; $j++){
$otherString = $sheet.cells.item($i,$j).text
}
}
#Create new Workbook and add sheet to it
$newWorkBook = $processExcel.Workbooks.Add()
$newWorkBook.worksheets.add()
$newSheet = $newWorkBook.worksheets.item(1)
$newSheet.name="SomeName"
#Close the workbook; if you pass $False it won't save any changes, same as closing without saving
$Workbook.close($True)
#$Workbook.SaveAs("C:\newPath\newFile.xls",56) #You can also save under a new name; 56 is the file format code (xlExcel8, the .xls format)
$newWorkBook.close($False)
#Closes Excel app
$processExcel.Quit()
#This code is to remove the Excel process from the OS, this does not always work.
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($processExcel)
Remove-Variable processExcel
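As for the rename itself, a likely cause of the empty -NewName is that $file is never assigned inside ForEach-Object, where the current item is $_. Below is a minimal sketch of the name-building step, assuming the spreadsheet values have already been read into a hash table. $idMap, its contents, and Get-NewPdfName are hypothetical; xxx and yyy are the placeholders from the question:

```powershell
# Hypothetical lookup: system 1 ID -> system 2 client number.
# In practice this would be filled from the spreadsheet, e.g. with the Excel loop above.
$idMap = @{ 'H123' = 123; 'H4567' = 89 }

# Hypothetical helper: builds the system 2 file name, or returns $null if it cannot
function Get-NewPdfName {
    param([string]$FileName, [hashtable]$Map)
    # Capture the H#### token between underscores, e.g. Smith,John_H123_20171012.pdf
    if ($FileName -match '_(H\d+)_\d{8}\.pdf$') {
        $sourceId = $Matches[1]
        if ($Map.ContainsKey($sourceId)) {
            # D10 left-pads the number with zeros to ten digits
            return ('xxx_{0:D10}_yyy.pdf' -f [int]$Map[$sourceId])
        }
    }
    return $null
}

$newName = Get-NewPdfName -FileName 'Smith,John_H123_20171012.pdf' -Map $idMap
$newName

# Inside the pipeline, use $_ for the current file; -WhatIf previews the rename:
# Get-ChildItem -Filter *.pdf | ForEach-Object {
#     $n = Get-NewPdfName -FileName $_.Name -Map $idMap
#     if ($n) { Rename-Item -LiteralPath $_.FullName -NewName $n -WhatIf }
# }
```

Returning $null for unknown IDs means files without a match in the spreadsheet are skipped rather than renamed to something wrong.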
I ended up using a utility called "Bulk Rename Utility" and Excel. I can run the various renaming regex's through BRU and add the reference .txt file after some Excel formatting.