PowerShell capture website data using regex and Invoke-WebRequest

I am trying to capture the now-playing song that this radio station displays on its website. I'm at the beginning of writing the script; so far I have the following code:
$webpage = (Invoke-WebRequest http://www.2dayfm.com.au).Content
$regex = [regex]"(.*nowPlayingInfo.*span)"
$regex.Match($webpage).Value.Split(">")[4].Replace("</span","")
This fetches the website listed in the code, but there are two issues.
First, when this code is run, it comes back with Loading... To see why, look at the result of this:
(Invoke-WebRequest http://www.2dayfm.com.au).Content | clip
If I paste this into Notepad and search for "Playing:", I find this line:
<p><span class="listenHeading">Playing:</span> <span id="nowPlayingInfo">Loading...</span></p>
Invoke-WebRequest captures the website at that point in time. To see this live, navigate in a browser to http://www.2dayfm.com.au/ and look right at the top where the Now Playing song is: it says Loading... for a short time before the song actually loads.
Second, I was hoping to get rid of the Split/Replace line and clean up the regex itself, so I don't need as many Split and Replace calls.
The other way I tried to get this to work was by copying the XPath from Chrome's Inspect Element and then using something like
(Invoke-WebRequest -Uri 'http://www.2dayfm.com.au').Content | Select-Xml -XPath '//*[@id="nowPlayingInfo"]'
But this doesn't seem to work either; it's as if the XPath that Chrome produces is different from what PowerShell expects.

Using a scraper isn't going to work because you get just the initial HTML content that is downloaded. The page uses JavaScript/Ajax to render the song/artist info by manipulating the DOM after the initial download. However, you can use the InternetExplorer.Application COM object to do this:
$ie = New-Object -comObject InternetExplorer.Application
$ie.navigate('http://www.2dayfm.com.au/')
while ($ie.ReadyState -ne 4) { Start-Sleep -Seconds 1 } # need timeout here
$null = $ie.Document.body.innerhtml -match '\s+id\s*=\s*"nowPlayingInfo"\s*>(.*)</span'
$ie.Quit()
$matches[1]
Outputs:
Little Mix, Black Magic
The $null = bit is to just get rid of the True output that the -match operator generates (assuming the regex matches).
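The "# need timeout here" comment and the question about trimming the Split/Replace chain can both be sketched roughly as follows. Wait-Until is a hypothetical helper name (not a built-in cmdlet), and the HTML string is the static line quoted in the question, so the extracted value is the placeholder text rather than a song.

```powershell
# Hypothetical helper (not a built-in): poll a condition until it returns
# true, or give up once the timeout has elapsed.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 30,
        [int] $PollMilliseconds = 250
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while (-not (& $Condition)) {
        if ((Get-Date) -ge $deadline) { return $false }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    return $true
}

# One capture group replaces the chain of Split and Replace calls.
# The sample HTML is the static line from the question.
$html  = '<p><span class="listenHeading">Playing:</span> <span id="nowPlayingInfo">Loading...</span></p>'
$match = [regex]::Match($html, 'id\s*=\s*"nowPlayingInfo"\s*>(.*?)</span')
$nowPlaying = if ($match.Success) { $match.Groups[1].Value } else { $null }
$nowPlaying   # Loading...
```

With the COM approach, the wait loop could then become something like Wait-Until -Condition { $ie.ReadyState -eq 4 } -TimeoutSeconds 20. Since the page fills in the song via Ajax, it may also be worth polling until the span text is no longer "Loading...".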

Related

Can't seem to get RegEx to match

I am trying to extract the Get-Help comment headers from a PowerShell script...using PowerShell. The file I'm reading looks something like this:
<#
.SYNOPSIS
Synopsis goes here.
It could span multiple lines.
Like this.
.DESCRIPTION
A description.
It could also span multiple lines.
.PARAMETER MyParam
Purpose of MyParam
.PARAMETER MySecondParam
Purpose of MySecondParam.
Notice that this section also starts with '.PARAMETER'.
This one should not be captured.
...and many many more lines like this...
#>
# Rest of the script...
I would like to get all the text below .DESCRIPTION, up to the first instance of .PARAMETER. So the desired output would be:
A description.
It could also span multiple lines.
Here's what I've tried:
$script = Get-Content -Path "C:\path\to\the\script.ps1" -Raw
$pattern = '\.DESCRIPTION(.*?)\.PARAMETER'
$description = $script | Select-String -Pattern $pattern
Write-Host $description
When I run that, $description is empty. If I change $pattern to .*, I get the entire contents of the file, as expected, so there must be something wrong with my RegEx pattern, but I can't seem to figure it out.
Any ideas?
(get-help get-date).description
The `Get-Date` cmdlet gets a DateTime object that represents the current date
or a date that you specify. It can format the date and time in several Windows
and UNIX formats. You can use `Get-Date` to generate a date or time character
string, and then send the string to other cmdlets or programs.
(get-help .\script.ps1).description
the Select-String cmdlet works on entire strings and you have given it ONE string. [grin]
so, instead of fighting with that, i went with the -match operator. the following presumes you have loaded the entire file into $InStuff as one multiline string with -Raw.
the (?ms) stuff is two regex flags - multiline & singleline.
$InStuff -match '(?ms)(DESCRIPTION.*?)\.PARAMETER'
$Matches.1
output ...
DESCRIPTION
A description.
It could also span multiple lines.
note that there is a blank line at the end. you likely will want to trim that away.
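For reference, the original pattern fails because "." does not match a newline unless the singleline flag (?s) is set. A minimal sketch that captures only the text under .DESCRIPTION and trims the surrounding whitespace inside the regex; the here-string is a shortened stand-in for the real file content loaded with -Raw:

```powershell
# Shortened stand-in for the script file loaded with Get-Content -Raw.
$sample = @'
<#
.SYNOPSIS
Synopsis goes here.
.DESCRIPTION
A description.
It could also span multiple lines.
.PARAMETER MyParam
Purpose of MyParam
.PARAMETER MySecondParam
Purpose of MySecondParam.
#>
'@

# (?s) lets "." match newlines; the lazy (.*?) stops at the first .PARAMETER,
# and the \s* on either side keeps leading/trailing line breaks out of the capture.
$description = $null
if ($sample -match '(?s)\.DESCRIPTION\s*(.*?)\s*\.PARAMETER') {
    $description = $Matches[1]
}
$description
```

Because the whitespace is consumed outside the capture group, no separate Trim() call should be needed here.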
In the words of @Mathias R. Jessen:
Don't use regex to parse PowerShell code in PowerShell
Use the PowerShell parser instead!
So, let's use PowerShell to parse PowerShell:
$ScriptFile = "C:\path\to\the\script.ps1"
$ScriptAST = [System.Management.Automation.Language.Parser]::ParseFile($ScriptFile, [ref]$null, [ref]$null)
$ScriptAST.GetHelpContent().Description
We use [System.Management.Automation.Language.Parser]::ParseFile() to parse our file and output an Abstract Syntax Tree (AST).
Once we have the Abstract Syntax Tree, we can then use the GetHelpContent() method (exactly what Get-Help uses) to get our parsed help content.
Since we are only interested in the Description portion, we can simply access it directly with .GetHelpContent().Description

Remove Substring with Regex Characters in Filename

I'm trying to organize and clean up how I download music. One way I do this is with a plugin that converts videos to MP3, but it usually leaves a watermark in the filename, which I'd like to remove with a PowerShell script.
So I essentially have this: "artist - songname[watermark.com].mp3"
I've looked into it and tried to just get the brackets removed, since they are regex metacharacters. This is what I have so far:
$removeMe = "[watermark.com]"
$list = (Get-ChildItem *.mp3).Name -replace '[[\]]',''
That's as far as I got before getting lost. It removes the brackets like so:
Artist - Songnamewatermark.com.mp3
So I tried the same with
-replace 'watermark.com' yet it brings back the brackets.
Artist - Songname[watermark.com].mp3
I'm kind of struggling here; RegEx is not my forte. Any help would be appreciated.
You can use this pattern:
-replace '\[[A-Za-z.]+\]',''
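If the watermark text is known up front, an alternative is to let [regex]::Escape build the pattern, so the brackets and the dot are treated literally. This is only a sketch: the file name is the example from the question, and the rename at the end uses -WhatIf so it only previews what would happen in the current directory.

```powershell
# The literal text to strip, exactly as it appears in the file names.
$removeMe = '[watermark.com]'

# Escape regex metacharacters ([, ], .) so -replace matches the text literally.
$pattern = [regex]::Escape($removeMe)

$name    = 'artist - songname[watermark.com].mp3'
$cleaned = $name -replace $pattern, ''
$cleaned   # artist - songname.mp3

# Preview the renames; drop -WhatIf to actually rename the files.
Get-ChildItem -Filter *.mp3 |
    Rename-Item -NewName { $_.Name -replace $pattern, '' } -WhatIf
```

This also suggests why the second attempt seemed to "bring back" the brackets: each -replace ran against the original name, so removing only 'watermark.com' never touched the brackets.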

Update a line in the AD info field

I have a slight problem.
We have a PowerShell script that sets an expiration date in the 'Notes:' field in AD.
What I want to do is remove or update this line without removing other data in the field.
Example of the 'Notes:' field (e.g. for user X):
GR1234567890 expires on 20251125
END
I use the following code to try to isolate everything except the line starting with GR:
$UserName = Get-ADUser -Filter {SAMAccountName -eq "X"} -Properties Info
$UserName.Info | Select-String -Pattern 'GR[\s\S].+' -NotMatch
I get a "full match" and no output at all.
And if I remove -NotMatch, I get a full match and the full output of the 'Notes:' field.
I've tried the RegEx in some of the online RegEx testers and there it works as expected. It is as if there are no LF/CR characters, or some weird encoding, in the output when it traverses the pipeline...
I could match GR, a date and everything in between, I guess... but for the sake of knowledge I'd like to know whether the approach above is impossible or just totally wrong (RegEx is not my strongest suit).
The problem was indeed the code itself, as pointed out by Wiktor. Hats off to him.
$UserName.Info -Replace '^GR.+'
Will remove the line i want removed.
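Some background on why the Select-String attempt behaves that way: the Info property is a single multi-line string, so Select-String treats it as one input item. The item matches as a whole, and -NotMatch therefore filters it out entirely. A rough sketch of removing or updating the GR line wherever it sits in the field; the sample text, the extra first line, and the new date 20261231 are made up for illustration:

```powershell
# Made-up sample of a Notes/Info value with Windows line endings.
$info = "Some other note`r`nGR1234567890 expires on 20251125`r`nEND"

# (?m) makes ^ match at the start of every line, not just the start of the string.
# Update only the date, keeping the rest of the GR line via capture group 1.
$updated = $info -replace '(?m)^(GR\d+ expires on )\d{8}', '${1}20261231'

# Remove the whole GR line, including its line break.
$removed = $info -replace '(?m)^GR.*(\r?\n)?', ''

# Line-by-line alternative: split first, then filter.
$kept = $info -split '\r?\n' | Where-Object { $_ -notmatch '^GR' }
```

Writing the result back would presumably go through Set-ADUser, which is left out here because it needs a live directory.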

how to remove all image tags based on string

I am trying to remove the images from the HTML of a few hundred pages from a folder. The common string is "/stream/image.axd". I have tried using RegEx but I cannot seem to figure out how to get to the start and end parts of the tags.
An example would look like the below.
The new gears look like <img src="/stream/image.axd?picture=planetary.gif" width="600" height="237">
First: you shouldn't parse HTML using regular expressions; have a look here. If you still want to do it, you can use something like
Get-Content 'file.html' | ForEach-Object { $_ -replace '<img[^>]*/stream/image\.axd[^>]*>' }
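One thing to watch with the regex route is greediness: a pattern that starts with "<.*" can swallow every tag that precedes the image on the same line. A small sketch with a pattern restricted to a single img tag; the sample HTML is invented, based on the example line in the question:

```powershell
# Invented sample line: one unrelated image plus one image served from /stream/image.axd.
$html = '<p><img src="/logo.png"> gears <img src="/stream/image.axd?picture=planetary.gif" width="600"></p>'

# [^>]* cannot cross a tag boundary, so only the matching <img ...> tag is removed.
$pattern = '<img\b[^>]*/stream/image\.axd[^>]*>'
$clean   = $html -replace $pattern, ''
$clean   # <p><img src="/logo.png"> gears </p>
```

This still assumes each img tag sits on a single line and contains no ">" inside an attribute value, which is one reason a real HTML parser is the safer choice.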
More advanced, you can use the method in this thread in order to set up a .NET-parsed version of a local html file:
$html = New-Object -ComObject "HTMLFile";
$source = Get-Content -Path "file.html" -Raw;
$html.IHTMLDocument2_write($source);
Once you have it you can identify the images by tag name and then use the method removeNode in order to remove the image-tags.

How to Find Replace Multiple strings in multiple text files using Powershell

I am new to scripting, and Powershell. I have been doing some study lately and trying to build a script to find/replace text in a bunch of text files (Each text file having code, not more than 4000 lines). However, I would like to keep the FindString and ReplaceString as variables, for there are multiple values, which can in turn be read from a separate csv file.
I have come up with this code, which is functional, but I would like to know if this is the optimal solution for the aforementioned requirement. I would like to keep the FindString and ReplaceString as regular expression compatible in the script, as I would also like to Find/Replace patterns. (I am yet to test it with Regular Expression Pattern)
Sample contents of Input.csv: (Number of objects in csv may vary from 50 to 500)
FindString ReplaceString
AA1A 171PIT9931A
BB1B 171PIT9931B
CC1C 171PIT9931E
DD1D 171PIT9932A
EE1E 171PIT9932B
FF1F 171PIT9932E
GG1G 171PIT9933A
The Code
$Iteration = 0
$FDPATH = 'D:\opt\HMI\Gfilefind_rep'
#& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
$GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
$FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
foreach ($Graphic in $GraphicsList) {
    Write-Host "Processing Find Replace on : $Graphic"
    foreach ($item in $FindReplaceList) {
        Get-Content $Graphic | ForEach-Object { $_ -replace "$($item.FindString)", "$($item.ReplaceString)" } | Set-Content ($Graphic+".tmp")
        Remove-Item $Graphic
        Rename-Item ($Graphic+".tmp") $Graphic
        $Iteration = $Iteration +1
        Write-Host "String Replace Completed for $($item.ReplaceString)"
    }
}
I have gone through other posts here on Stack Overflow and gathered valuable inputs, based on which the code was built. This post from Ivo Bosticky came pretty close to my requirement, but I had to perform the same in a nested foreach loop with the Find/Replace strings as variables read from an external source.
To summarize:
1. I would like to know if the above code can be optimized for execution, since I feel it takes a long time to execute. (I prefer not using aliases for now, as I am just starting out, and am fine with a long and functional script rather than a concise one which is hard to understand.)
2. I would like to add the number of iterations being carried out in the loop. I was able to add the current iteration number to the console, but couldn't figure out how to capture the output of Measure-Command in a variable that could be used in a Write-Host command. I would also like to display the time taken for code execution on completion.
Thanks for the time taken to read this Query. Much appreciate your support!
First of all, unless your replacement string is going to contain newlines (which would change the line boundaries), I would advise getting and setting each $Graphic file's contents only once, and doing all replacements in a single pass. This will also result in fewer file renames and deletions.
Second, it would be (probably marginally) faster to pass $item.FindString and $item.ReplaceString directly to the -replace operator rather than invoking the templating engine to inject the values into string literals.
Third, unless you truly need the output to go directly to the console instead of going to the normal output stream, I would avoid Write-Host. See Write-Host Considered Harmful.
And fourth, you might actually want to remove the Write-Host that gets called for every find and replace, as it may have a fair bit of effect on the overall execution time, depending on how many replacements there are.
You'd end up with something like this:
$timeTaken = (Measure-Command {
    $Iteration = 0
    $FDPATH = 'D:\opt\HMI\Gfilefind_rep'
    #& 'D:\usr\fox\wp\bin\tools\fdf_g.exe' $FDPATH\*.fdf
    $GraphicsList = Get-ChildItem -Path $FDPATH\*.g | ForEach-Object FullName
    $FindReplaceList = Import-Csv -Path $FDPATH\Input.csv
    foreach ($Graphic in $GraphicsList) {
        # Note: Measure-Command discards the script block's normal output,
        # so this message is not displayed while timing.
        Write-Output "Processing Find Replace on : $Graphic"
        Get-Content $Graphic | ForEach-Object {
            # Work on a copy rather than assigning to the automatic variable $_.
            $line = $_
            foreach ($item in $FindReplaceList) {
                $line = $line -replace $item.FindString, $item.ReplaceString
            }
            $Iteration += 1   # counts lines processed
            $line
        } | Set-Content ($Graphic + ".tmp")
        Remove-Item $Graphic
        Rename-Item ($Graphic + ".tmp") $Graphic
    }
}).TotalMilliseconds
I haven't tested it but it should run a fair bit faster, plus it will save the elapsed time to a variable.
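To try the single-pass idea without touching any files, here is a small in-memory sketch. Invoke-Replacement is a hypothetical function name, the rules mirror two rows of the sample Input.csv, and the input lines are invented. It times the work with a Stopwatch rather than Measure-Command, so the function's output can still be captured normally.

```powershell
# Hypothetical helper: apply every find/replace rule to each line in one pass.
function Invoke-Replacement {
    param(
        [string[]] $Line,
        [object[]] $Rule   # objects with FindString and ReplaceString properties
    )
    foreach ($current in $Line) {
        foreach ($item in $Rule) {
            # FindString is treated as a regex, as with -replace in the original script.
            $current = $current -replace $item.FindString, $item.ReplaceString
        }
        $current
    }
}

# Stand-ins for Import-Csv output and Get-Content output (invented data).
$rules = @(
    [pscustomobject]@{ FindString = 'AA1A'; ReplaceString = '171PIT9931A' }
    [pscustomobject]@{ FindString = 'BB1B'; ReplaceString = '171PIT9931B' }
)
$lines = 'tag AA1A value', 'tag BB1B and AA1A', 'untouched'

$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
$result = Invoke-Replacement -Line $lines -Rule $rules
$stopwatch.Stop()

$result
"Processed $($lines.Count) lines in $($stopwatch.Elapsed.TotalMilliseconds) ms"
```

In the real script, $lines would come from Get-Content and $result would be piped to Set-Content once per file.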