PowerShell Regex: How to back-reference more than 10 groups

I am trying to back reference more than 10 groups in PowerShell regex but cannot find the correct syntax. Here is the relevant snippet of the script.
ForEach-Object {$folder=($_.Name -replace '([0-9]{3})( \()([A-Z]{3}-[0-9]{2})(-...-)(.)(.)(......)(......)(..)(.)(..)(.*)',"`$3-`$5-`$9-`$10"); $folder}
Please advise on how to make group "`$10" work. Thanks.
I also tried "`${10}" and it did not work.
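For what it's worth, .NET replacement strings (which -replace uses) accept the braced form ${10}, and putting the replacement in single quotes keeps PowerShell from touching the $ at all. Below is a minimal sketch, assuming a made-up 12-letter input and a 12-group pattern that stand in for the real file name and regex. It only shows the syntax and does not diagnose why the original pattern fails.

```powershell
# Hypothetical input and pattern: twelve groups, each capturing one letter.
$sample  = 'abcdefghijkl'
$pattern = '(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)'

# Single quotes pass ${10} to the regex engine verbatim, so no backticks are needed.
# The braces make it unambiguous that group 10 is meant, not group 1 followed by "0".
$result = $sample -replace $pattern, '${3}-${5}-${9}-${10}'
$result   # c-e-i-j
```

If a replacement like this works on a small input but not on the real file names, the pattern itself may not be matching, in which case -replace returns the input unchanged.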

Related

Why does my regex not work in PowerShell? [duplicate]

I was under the impression that .Replace and -replace were exactly the same thing, but I found that I could not accomplish some regex tasks with .Replace that I could with -replace. Could someone please point out what I'm missing?
Broken Regex replace:
$a=$a.Replace('.:\\LOGROOT\\', "\\$env:computername\logroot\")
Working Regex replace:
$a=$a -Replace('.:\\LOGROOT\\', "\\$env:computername\logroot\")
P.S.
The following URL leads me to think there are .Replace options I am unfamiliar with, but I can't seem to find any additional information on how to use them or how to access the help for these options: http://www.computerperformance.co.uk/powershell/powershell_regex.htm
Regex.Replace(String, String, String, RegexOptions) and also:
Regex.Replace(String, String, MatchEvaluator, RegexOptions) methods.
Thank you
While @Keith Hill's answer explains the difference between the Replace method and the -replace operator, the reason you do not see the same result is that you are using the String.Replace method, which does a literal string replace, whereas the -replace operator does a regex replace. You can use the Regex.Replace method for this purpose and you should see the same effect:
[regex]::replace($a,'.:\\LOGROOT\\', "\\$env:computername\logroot\")
In short, the -replace operator is the same as Regex.Replace (the particular overload linked above), but in general Replace() can be an instance or static method that does something completely different from -replace.
They are not the same thing. .Replace is a .NET method, either on System.String or on any other type with an instance method named Replace. -replace is a PowerShell operator that uses regular expressions. Run man about_operators to see more info on the -replace operator.
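To make the difference concrete, here is a small sketch that runs the same arguments through all three. The sample path and the server name SERVER01 are made up for illustration and stand in for $a and $env:computername.

```powershell
# Hypothetical sample path and server name, chosen only for illustration.
$a = 'D:\LOGROOT\app.log'
$replacement = '\\SERVER01\logroot\'

# String.Replace looks for the literal text ".:\\LOGROOT\\", which is not in $a.
$literal = $a.Replace('.:\\LOGROOT\\', $replacement)

# Both of these treat the first argument as a regex: any character, a colon,
# a backslash, LOGROOT, and another backslash.
$viaOperator = $a -replace '.:\\LOGROOT\\', $replacement
$viaStatic   = [regex]::Replace($a, '.:\\LOGROOT\\', $replacement)

$literal       # D:\LOGROOT\app.log (unchanged)
$viaOperator   # \\SERVER01\logroot\app.log
$viaStatic     # \\SERVER01\logroot\app.log
```

One difference to keep in mind: -replace is case-insensitive by default (use -creplace for a case-sensitive match), while [regex]::Replace is case-sensitive unless you pass RegexOptions.IgnoreCase.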

How to use Regex to replace a tag in a word document with Powershell

First post on stackoverflow for me so sorry if something is out of norm or similar ^^
Currently I'm trying to find a way to read vouchers out of a .csv that I get from my pfsense.
Plan is to read it out of the .csv and write it down in a Word document so that secretaries can print it out and give them out to coworkers.
So far I have no problems replacing names and room numbers, all I gotta do now is to find a way to replace the voucher codes, but since they obviously always change I tried to use regex, here's the current state of that part of my code:
if ($Vouchers -match '((\d|\w){11})*') {
$matches.0 }
ReplaceTag -Document $Doc -FindText '<Vouchers>' -replacewithtext $matches
The regex itself is working perfectly fine (already tested it on regex101) so I guess it's the code.
I'm assuming that it's trying to literally match "((\d|\w){11})*" instead of using the pattern :\
Any kinda help would be welcomed!
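Two things may be worth checking here. Both are assumptions, since the contents of $Vouchers and the ReplaceTag function aren't shown. First, the trailing * allows the pattern to match zero times, so -match can succeed with an empty string. Second, $matches is a hashtable, so the matched text is $matches[0]. A minimal sketch with a made-up input line:

```powershell
# Hypothetical CSV line; the real pfSense export may look different.
$Vouchers = 'roll 3: aB3dE5gH7jK'

# The trailing * lets the whole group repeat zero times, so the match
# succeeds at position 0 with an empty string.
$null = $Vouchers -match '((\d|\w){11})*'
$emptyMatch = $Matches[0]

# Dropping the * and adding word boundaries finds the 11-character code.
# \w already includes digits, so (\d|\w) can be written as \w.
$null = $Vouchers -match '\b\w{11}\b'
$code = $Matches[0]
$code   # aB3dE5gH7jK
```

The string in $code, rather than the whole $matches table, is then what you would pass as the replacement text.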

How do I escape double quote and other troublesome characters in Powershell regex

I've hit a snag in a script I'm putting together to download the latest installation packages without needing to use Chocolatey or Ketarin. Unfortunately a few utilities aren't provided at a direct download link and are hidden behind redirecting URLs, with the download URL expiring after 15 minutes. To complicate things a bit further, I'm doing this in PowerShell 2 as we have a few Vista machines in our office.
After researching similar scenarios, it seems I can invoke the .NET WebClient to handle the download, though there isn't a progress bar. As I haven't found sample code that uses a .NET WebClient to download files behind redirects that expire after a certain amount of time, I decided to use a WebClient request to load the page, extract the current direct download URL from the page using the following regex, and then use that URL to download the file. I've checked with regexr.com to verify that the regex catches the sample URL below.
Sample URL
CF DL here
Regex
<a(?: [^>]*?)? href=(["'])([^\1]*?ProgramName*?)\1(?: .*?)?>.*?<\/a>
Unfortunately PowerShell red-flags this, as it seems to think the double quotes need to be terminated. After attempting to escape any red-flagged characters using backticks, I've wound up with the following, which throws an error saying that '?:' is not recognized as a term, cmdlet, etc.
$downloadLinkRegex = New-Object System.Text.RegularExpressions.Regex (<a(?: [^>]*?)? href=(`[`"`'])(`[`^\1]*?ProgramName.exe*?)\1(?: .*?)?>.*?</a>)
if ("https://www.example.com/randomstring003ejdjd38/dl/ProgramName.exe" -match $downloadLinkRegex){
write-host "yay"
} else{
write-host "nope"}
Attempts to escape the ? using backticks fail as well. Regexes are incredibly difficult for me, so at this point I'm out of ideas on how to make the ISE recognize that this is a valid regex that doesn't need to be validated and can be stored in a variable to be used later on the contents of a web request.
If anyone could point out where I've gone wrong, or how to resolve the issue, I would be immensely grateful.
The easiest way I can think of is by using the @" bla "@ block in PowerShell (I don't know the official name).
For example:
$regex = @"
Insert regex here
"@
Everything between the @" "@ block will be treated as a string value.
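These blocks are called here-strings. For a regex, the single-quoted variant (@' ... '@) is probably the safer choice, because nothing inside it is expanded and neither kind of quote needs escaping. A minimal sketch, assuming a simplified pattern and a made-up anchor tag built around the example URL from the question:

```powershell
# Single-quoted here-string: the closing '@ must start its own line,
# and the text in between is taken literally.
$pattern = @'
<a\s[^>]*?href=(["'])(.*?)\1[^>]*>
'@

# Hypothetical page fragment; the class attribute and link text are invented.
$html = '<a class="dl" href="https://www.example.com/randomstring003ejdjd38/dl/ProgramName.exe">Download</a>'

if ($html -match $pattern) {
    # Group 1 is the opening quote, group 2 is the URL between the quotes.
    $link = $Matches[2]
}
$link
```

The same pattern in an ordinary single-quoted string would need its inner ' doubled, and in a double-quoted string its inner " escaped with a backtick, which is what the here-string avoids.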
I just removed the items PowerShell flags. I had to test several different approaches to confirm this was the only way PowerShell would let me print to HTML. Even ConvertTo-Html won't bypass PowerShell's issues. It is like a hybrid of HTML. I also noticed that PowerShell doesn't pay attention to blank space when you type, so my real code has lots of spaces and empty lines to break up my script.
$My_HTML_table = "<!DOCTYPE html>
<head><title> My Excellent Page </title></head>
<H2> Table 1 </H2>
<text></text>
<table border=1;border-style:solid>
<tr>
<td colspan=1 style=color:blue;background-color:#CCCCCC;font-size:18;padding:5px> Cute Header </td>
</tr>"
$My_HTML_table > C:\File_Path\My_Excellent_HTML.html
But it doesn't match on regexr.com ...? It fails because it thinks the </a> is the end of the regex. It also fails because it's trying to match ProgramNam(zero or unlimited 'e') and ignoring the .exe bit. (And "must not match octal number 1"? That's probably not what you want in there. No, I didn't know that either; I just saw it while scratching my head trying to decipher this on regex101.com.)
Anyway, to your question: PowerShell doesn't have regex literals, so you can't just write <a(?: [^>]*?... into the shell and have it work. They have to be strings.
But they don't have to be run through New-Object System.Text.RegularExpressions.Regex.
e.g.
$url = 'CF DL here'
$pattern = "<a.*?href=[`"'](.*?)[`"'][^>]*>.*?</a>"
$url -match $pattern
$Matches[1]
I've quoted the string in double quotes around the outside. And then I've used a backtick to escape the double quotes inside the pattern.
Where the regex pattern is explained much more helpfully here
I actually reworked the regex into something simpler to resolve the issue. While the URL continually changes the file name doesn't, so I focused on the filename, rather than the whole URL, and was able to grab the URL I needed.
Looks good
$a='CF DL here'
$a -match '(?<=ef=")[^"]+?(\w+)\.(exe|pdf)'
Invoke-WebRequest $matches[0] -OutFile "$($matches[1]).$($matches[2])"

How do I extract data between HTML tags using Regex?

I've been assigned some sed homework in my class and am one step away from finishing the assignment. I've racked my head trying to come up with a solution and nothing's worked to the point where I'm about to give up.
Basically, in the file I've got...I'm supposed to replace this:
<b>Some text here...each bold tag has different content...</b>
with
Some text here...each bold tag has different content...
I've got it partially completed, but what I can't figure out is how to "echo" the extracted content using sed (regexp).
I manage to substitute the content out just fine, but it's when I'm trying to actually OUTPUT the content that's between the HTML tags that it goes wrong.
If that's confusing, I truly apologize. I've been at this project a couple hours now and am getting a bit frustrated. Basically, why does this not work?
s/<b>.*<\/b>/.*/g
I simply want to output the content WITHOUT the bold tags.
Thanks a bunch!
If you want to reference a part of your regex match in the replacement, you need to place that portion of the regex into a capturing group, and then refer to it using the group number preceded by a backslash. Try the following:
s/<b>\(.*\)<\/b>/\1/g
You need to use a capturing group, which is written with parentheses ().
So, it's just this:
s/<b>(.*)<\/b>/\1/g
Capturing groups are numbered from left to right, starting at one.
This is the standard regular expression syntax; sed's syntax is slightly different. The sed command is
sed 's/<b>\(.*\)<\/b>/\1/g' [file]
or
sed -r 's/<b>(.*)<\/b>/\1/g' [file]
Of course, if you just want to remove the bold tags, the other solution would be to just replace the HTML tags with blanks like so
sed 's/<\([^>]\|\(\"[^\"]\"\)\)*>//g' [file]
(I dislike sed's need to escape everything)
s/<([^>]|(\"[^\"]\"))*>//g
I think this question is best answered by the sed documentation, for example: http://www.grymoire.com/Unix/Sed.html#uh-4
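For comparison with the other threads on this page, the same capture-group idea can be sketched in PowerShell, where the group is referenced as $1 instead of \1. The input line below is made up. Note that the lazy .*? used here is a .NET feature; sed has no lazy quantifier, so the sed commands above capture from the first <b> to the last </b> on a line.

```powershell
# Hypothetical line containing two bold tags.
$line = 'one <b>two</b> three <b>four</b>'

# (.*?) captures the text inside each tag; '$1' puts it back without the tag.
# Single quotes keep PowerShell from expanding $1 as a variable.
$plain = $line -replace '<b>(.*?)</b>', '$1'
$plain   # one two three four
```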

Perl regex problem -- &amp; in Yahoo Finance page

I found an old perl hack on the O'Reilly site http://oreilly.com/pub/h/1041 and decided to check it out. After a little fiddling around it started to run but the regex are out of date.
Here is the question: with this
/<a href="\/q\/op\?s=(.*?)\&m=(.*?)">/
as the first line of regex, what needs to be modified to make the regex function again? The following are snippets from
http://finance.yahoo.com/q/op?s=FISV
<a href="/q/op?s=FISV&amp;k=55.000000">
and
<a href="/q/os?s=FISV&amp;m=2011-04-15">
.
The original hack is dated 2004 and option symbols looked like this (FQVAH or FQVFF) back then instead of fisv110416c00060000 for a call option and fisv110416p00090000 for a put option. First thing I did to get it going was to modify all instances of $url to $curl because until the name was changed the symbol was not being passed to yahoo for lookup. The &amp is giving me the most trouble. If this is found to run without modification I would be very surprised and would very much like to know what system and perl -V is installed. SLES 10 and perl 5.8.0 is what I am currently using.
Any suggestions would be helpful. It could be a useful script to anyone who is serious about protecting themselves from a falling equity market.
Thanks,
robm
I'm not /100%/ sure what you're asking, but if I'm understanding, you want a regex that will capture "fisv110416c00060000" and tell you the first few letters, whether it's a call or a put, and the amount?
If so, you're looking for something like:
/([a-z]+)(\d+)([cp])(\d+)/
That should capture the following for the first example
$1 = "fisv"
$2 = "110416"
$3 = "c"
$4 = "00060000"
The original regex was very specific to that html string. You can include the beginning bits of it if you need to use it to check that the entire string is there as well. Of course, make your regex as tight as possible to avoid over-matches and wasted time pattern matching. I'm just not sure the exact pattern you're trying to match (ie: is it always "fisv"?).
You should either unescape the HTML first, which would turn the &amp; into a &, or just change the regex, like this:
/<a href="\/q\/os\?s=(.*?)\&(?:amp;)?m=(.*?)">/
To match both types of urls:
/<a href="\/q\/o[ps]\?s=(.*?)\&(?:amp;)?[mk]=(.*?)">/
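As a quick check of that last pattern, here is a sketch using the .NET regex engine from PowerShell rather than Perl (an assumption made for consistency with the other examples on this page; the pattern itself carries over apart from the / delimiters and their escapes). The two input tags are based on the snippets in the question, one with a bare & and one with &amp;.

```powershell
# Same pattern without Perl's / delimiters, so the slashes need no escaping.
$pattern = '<a href="/q/o[ps]\?s=(.*?)&(?:amp;)?[mk]=(.*?)">'

# One tag with a bare ampersand, one with the HTML-escaped form.
$raw     = '<a href="/q/op?s=FISV&k=55.000000">'
$escaped = '<a href="/q/os?s=FISV&amp;m=2011-04-15">'

# Collect "symbol value" for every tag the pattern matches.
$results = foreach ($tag in $raw, $escaped) {
    if ($tag -match $pattern) { '{0} {1}' -f $Matches[1], $Matches[2] }
}
$results
```

Both tags match, because (?:amp;)? makes the escaped part optional without adding a capture group, so $1 and $2 keep their meaning.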