Regex -replace in Powershell - regex

I am trying to read a .sln file and extract the strings that contain the path to the .csproj within my solution.
The lines that contain the information that I am looking for look like this:
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Project", "Project\Project.csproj", "{0DB516E6-4358-499D-BFBF-408F50A44E14}"
So, this is what I am trying:
$projectsInFile = Select-String "$slnFile" -pattern '^Project'
$csprojectsNames = $projectsInFile -replace ".+= `"(\S*) `""
Now, $csprojectsNames contains the information that I am looking for, but also the rest of the string.
Just like this:
Project\Project.csproj", "{0DB516E6-4358-499D-BFBF-408F50A44E14}"
What is the best way to retrieve the name of the .csproj file without needing to manually cut the rest of the string?
Thank you

What you can do is capture the entire string and use a capture group in your replacement string thereby dropping the unneeded parts.
$csprojectsNames = $projectsInFile -replace '.+= "(\S*) "(.*?)",.*"','$2'
The second capture group is the data in between the quotes that follow = "Project", ".....". Since it is the second capture group, we replace the entire match with that group: '$2'. Using single quotes ensures that PowerShell does not try to expand $2 as a variable.
Better approach
You might just be able to use [^"]*?\.csproj in Select-String directly without having to do a secondary parse. That will match everything before .csproj that is not a quote, so it won't gobble up too much.
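A minimal sketch of that direct approach, assuming the sample line from the question stands in for the .sln contents (the variable names $sample and $csprojPaths are made up for illustration). With a real file you would pass -Path $slnFile to Select-String instead of piping a string:

```powershell
# Sample line copied from the question; stands in for the contents of the .sln file.
$sample = 'Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Project", "Project\Project.csproj", "{0DB516E6-4358-499D-BFBF-408F50A44E14}"'

# -AllMatches keeps every hit on a line; Matches.Value is only the matched text,
# so no second -replace pass is needed.
$csprojPaths = $sample |
    Select-String -Pattern '[^"]*?\.csproj' -AllMatches |
    ForEach-Object { $_.Matches.Value }

$csprojPaths
```

Because the character class excludes quotes, the match cannot start before the opening quote of the path.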

You can use a group to capture the file path and then use the value of that group as the replacement value. For instance:
$csprojectsNames = $projectsInFile -replace '.*Project\(.*?\) = ".*?", "(.*?)".*', '$1'

Related

Complex named match group RegEx review

From this example string
$logLine = '{header[3]}_Pragmatic Praxis Initialization Log'
I am trying to extract three pieces of data
header as type
3 as an (optional) tab value
everything after that _ as a string
What I have now is
$logLine = '{header[3]}_Pragmatic Praxis Initialization Log'
if ($logLine -match '^\{(?<type>[a-z]+)(?:\[?(?<tab>\d?)\]?)\}_(?<string>.+)$') {
Write-Host "$($matches['type'])"
Write-Host "$($matches['tab'])"
Write-Host "$($matches['string'])"
}
And it's working well. But I am so unskilled in RegEx, and this is by far the most complex RegEx I have ever cobbled together from scratch, that I am wondering if anyone sees a gotcha in this approach that I am not seeing?
Or do I need to open some wine and celebrate reaching some sort of RegEx comprehension milestone?
EDIT:
So my success made me overconfident. I decided to make Tab required, but add an optional Target which can be either 'console' or 'file'. So I did this
$logLine = '{header[3]}_Pragmatic Praxis Initialization Log'
if ($logLine -match '^\{(?<type>[a-z]+)(?:-(?<target>(console|file)))\[(?<tab>\d*)\]\}_(?<string>.+)$') {
Write-Host "$($matches['type'])"
Write-Host "$($matches['target'])"
Write-Host "$($matches['tab'])"
Write-Host "$($matches['string'])"
}
Which works a treat when target is present, but fails when it is not. So, looks like I get to learn something, rather than celebrate. ;)
EDIT #2:
Per @Ansgar Wiechers, I was indeed misunderstanding (?:...), specifically confusing it with (...)?. Based on that, this is my revised pattern, which seems to be doing what I want. I may still make both target and tab required, since I think that makes the code more readable while also simplifying the RegEx pattern, but it's still good to have it working as I initially intended.
if ($logLine -match '^\{(?<type>[a-z]+)(-(?<target>(console|file)))?(\[(?<tab>\d+)\])?\}_(?<string>.+)') {
Write-Host "$($matches['type'])"
Write-Host "$($matches['target'])"
Write-Host "$($matches['tab'])"
Write-Host "$($matches['string'])"
}
Looks to me like you're misunderstanding what (?:...) does. That construct does not define an optional match, but a non-capturing group. A (sub)expression (?:-(?<target>console|file)) will require the string to contain either -console or -file and return console or file (without the leading hyphen) as a named match "target". To make the group optional you need to add another ? after the group.
^\{(?<type>[a-z]+)(?:-(?<target>console|file))?\[(?<tab>\d*)\]\}_(?<string>.+)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
Note that a trailing expression .+ or .* makes anchoring the expression at the end of the string ($) pointless, so just remove the $ from the end of your expression.
You also don't need the nested (unnamed) capturing group around console|file. The named capturing group is sufficient.
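To see the optional group in action, here is a small sketch that runs the pattern above against two made-up log lines, one without and one with a target. The sample strings and the $results variable are assumptions for illustration only:

```powershell
$pattern = '^\{(?<type>[a-z]+)(?:-(?<target>console|file))?\[(?<tab>\d*)\]\}_(?<string>.+)'

$results = foreach ($line in '{header[3]}_Log text', '{header-file[3]}_Log text') {
    if ($line -match $pattern) {
        # $Matches.target is $null when the optional group did not take part in the match,
        # so it formats as an empty string.
        'type={0} target={1} tab={2} string={3}' -f $Matches.type, $Matches.target, $Matches.tab, $Matches.string
    }
}

$results
```

Both lines match; only the second one yields a value for target.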

REGEX Pattern for Username inside a longer string

MAC OSX, PowerShell 6.1 Core
I'm struggling with creating the correct REGEX pattern to find a username string in the middle of a url. In short, I'm working in Powershell Core 6.1 and pulling down a webpage and scraping out the "li" elements. I write this to a file so I have a bunch of lines like this:
<LI>Smith, Jimmy
The string I need is the "jimmysmith" part, and every line will have a different username, no longer than eight alpha characters. My current pattern is this:
(<(.|\n)+?>)|( )
and I can use a "-replace $pattern" in my code to grab the "Smith, Jimmy" part. I have no idea what I'm doing, and any success in getting what I did get was face-roll-luck.
After using several online regex helpers I'm still stuck on how to get just the string after the third "/", up to but not including the last quote.
Thank you for any assistance you can give me.
You could go super-simple,
expand-user/([^"]+)
Find expand-user, then capture until a quotation.
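A rough sketch of how that could look in PowerShell. The markup in $line is hypothetical, since the actual link text is not shown in the question; only the expand-user/ segment is taken from the pattern above:

```powershell
# Hypothetical line; the real URL and markup are assumptions.
$line = '<LI><a href="/people/expand-user/jimmysmith">Smith, Jimmy</a>'

$username = $null
if ($line -match 'expand-user/(?<username>[^"]+)') {
    # The named group holds everything between "expand-user/" and the next quote.
    $username = $Matches.username
}

$username
```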
(?:\/.*){2}\/(?<username>.*)"
(?:\/.*) Matches a literal / followed by any number of characters
{2} do the previous match two times
\/ match another /
(?<username>.*)" match everything up until the next " and put it in the username group.
https://regex101.com/r/0gj7yG/1
Although, since each line is presumably identical up until the username:
$line = ("<LI>Smith, Jimmy ")
$line = $line.Substring(36, $line.LastIndexOf('"') - 36)
The answer is what was posted by Dave. I saved my scraped details to a file (the lines with "li") by doing:
get-content .\list.txt -ReadCount 1000| foreach-object { $_ -match "<li>"} |out-file .\transform.txt
I then used the method proposed by Dave as follows:
$a = get-content .\transform.txt |select-string -pattern '(?:\/.*){2}\/(?<username>.*)"' | % {"$($_.matches.groups[1])"} |out-file .\final.txt
I had to look up how to pull the group name out, and I used this reference to figure that out: How to get the captured groups from Select-String?

Powershell with regex: Unable to find and replace ALL occurences of specified string in a set of data

I am new to regular expressions and stackoverflow. Any help would be greatly appreciated.
I am trying to remove unwanted data from a data set. The data is contained in a .csv file column with multiple cells, each cell containing data similar to this:
OSVDB #109124,OSVDB #109125,OSVDB #109126,OSVDB #109127,OSVDB #109128,OSVDB #109129,OSVDB #109130,OSVDB #109131,OSVDB #109132,OSVDB #109133,OSVDB #109134,OSVDB #109135,OSVDB #109136,OSVDB #109137,OSVDB #109138,OSVDB #109139,OSVDB #109140,OSVDB #109141,OSVDB #109142,OSVDB #109143,VMSA #2014-0012,OSVDB #102715,OSVDB #104972,OSVDB #106710,OSVDB #115364,IAVA #2014-A-0191,IAVB #2014-B-0160,IAVB #2014-B-0162,IAVB #2015-B-0007
I want to replace the above data with each occurrence of the strings beginning "IAV...". So, the above cell would read:
IAVA #2014-A-0191,IAVB #2014-B-0160,IAVB #2014-B-0162,IAVB #2015-B-0007
Below is a snippet of the script that imports the .csv and gets the column containing the data.
My regex, within powershell is:
$reg1 = '$1'
$reg2 = '(IAV[A|B]\s#[0-9]{4}-[A|B]-[0-9]{4}){1,}'
ForEach-Object {$_.IAVM = [regex]::replace($_.IAVM,$reg2,$reg1); $_}
The result is:
The entire cell contents posted above.
From my understanding {1,} at the end of the regex should return each occurrence of the string pattern, but I'm returning all contents of every cell containing my regex string.
Maybe instead of trying to pick out your string you just delete the stuff you don't want? Try something like:
$reg1=''
$reg2='((OSVDB|VMSA)\s#[M-S0-9-]{6,9}[,]?)'
You have .* in that regex at the very beginning. This will capture everything up to the last match of the pattern that follows it. In your case I don't think you need that part anyway.
Also note that PowerShell has a handy -replace operator, so there's often no reason to use the static methods on the Regex type.
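Another option, not taken from the answers above, is to collect the IAV entries instead of deleting everything else. A minimal sketch, assuming a shortened copy of the cell from the question ($cell, $iav and $result are illustrative names):

```powershell
# Shortened version of the cell contents shown in the question.
$cell = 'OSVDB #109143,VMSA #2014-0012,OSVDB #102715,IAVA #2014-A-0191,IAVB #2014-B-0160,IAVB #2015-B-0007'

# [regex]::Matches returns every occurrence; keep just the matched text.
# Note [AB] rather than [A|B]: inside a character class, | is a literal pipe.
$iav = [regex]::Matches($cell, 'IAV[AB]\s#\d{4}-[AB]-\d{4}') | ForEach-Object { $_.Value }

$result = $iav -join ','
$result
```

The joined string can then be assigned back to the column, e.g. $_.IAVM = $result inside the ForEach-Object block.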

PowerShell and RegEx formatting

::EDIT::
After much goofing off, I was able to find a solution that appears to work in all cases... Consider the following:
$subject = '"LaunchPermission"=hex:01,00,14,80,64,00,00,00,74,00,00,00,14,00,00,00,30,00,00,00,02,00,1C,00,01,00,00,00,11,00,14,00,04,00,00,00,01,01,00,00,00,00,00,10,00,10,00,00,02,00,34,00,02,00,00,00,00,00,18,00,0B,00,00,00,01,02,00,00,00,00,00,0F,02,00,00,00,01,00,00,00,00,00,14,00,0B,00,00,00,01,01,00,00,00,00,00,01,00,00,00,00,01,02,00,00,00,00,00,05,20,00,00,00,20,02,00,00,01,02,00,00,00,00,00,05,20,00,00,00,20,02,00,00'
$result = $subject -creplace '(?ism)(.{1,76},)(.{1,75})', @'
$1\
$2\
'@
Write-Host $result
Note - Line 6 contains 2 spaces to get the indenting correctly.
This outputs exactly how I need! Thanks Fede for putting me on the right track!
Preface: I do realize there are other ways of achieving the end goal here, but within my current, larger scope, I need to format $subject in a specific way.
Good evening! Regex noob here. I'm attempting to find a way to format $subject in such a way that it is valid for dropping into a Windows .Reg file. At this point with the code below, I am able to return the first line exactly as I need it, but I'm struggling with trying to figure out how to create a second capture group that returns the values immediately after the first capture group.
Below is my current PowerShell code.
$subject = '"LaunchPermission"=hex:01,00,14,80,64,00,00,00,74,00,00,00,14,00,00,00,30,00,00,00,02,00,1C,00,01,00,00,00,11,00,14,00,04,00,00,00,01,01,00,00,00,00,00,10,00,10,00,00,02,00,34,00,02,00,00,00,00,00,18,00,0B,00,00,00,01,02,00,00,00,00,00,0F,02,00,00,00,01,00,00,00,00,00,14,00,0B,00,00,00,01,01,00,00,00,00,00,01,00,00,00,00,01,02,00,00,00,00,00,05,20,00,00,00,20,02,00,00,01,02,00,00,00,00,00,05,20,00,00,00,20,02,00,00'
$result = $subject -creplace '(?ism)(.{1,78},).*', '$1\'
Write-Host $result
This returns a $result of:
"LaunchPermission"=hex:01,00,14,80,64,00,00,00,74,00,00,00,14,00,00,00,30,00,\
From that point, I need to figure out how to create a second capture group so that it contains the remainder of the hex pairs that I can then apply additional formatting to.
The end goal is to have $result returned like this (for any similar value fed in via $subject):
"LaunchPermission"=hex:01,00,14,80,64,00,00,00,74,00,00,00,14,00,00,00,30,00,\
00,00,02,00,1c,00,01,00,00,00,11,00,14,00,04,00,00,00,01,01,00,00,00,00,00,\
10,00,10,00,00,02,00,34,00,02,00,00,00,00,00,14,00,0b,00,00,00,01,01,00,00,\
00,00,00,01,00,00,00,00,00,00,18,00,0b,00,00,00,01,02,00,00,00,00,00,0f,03,\
00,00,00,00,10,00,00,01,02,00,00,00,00,00,05,20,00,00,00,20,02,00,00,01,02,\
00,00,00,00,00,05,20,00,00,00,20,02,00,00
Any thoughts?
I'm not fully sure if this is what you want.
Using this regex:
(.{1,78},)(.{1,78})
You can check this working demo
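For reference, a simplified sketch of the same idea using a single capture group. It assumes a shortened sample value and a 40-character chunk width so the wrapping is visible (the real data would use a width closer to 76), and it does not try to reproduce the exact column widths of regedit output:

```powershell
# Shortened sample; the real value from the question is much longer.
$subject = '"LaunchPermission"=hex:01,00,14,80,64,00,00,00,74,00,00,00,14,00,00,00,30,00,00,00,02,00,1C,00'

# Each match is up to 40 characters followed by a comma. The replacement re-emits the
# chunk, appends a backslash, and starts a new line indented by two spaces.
# "`$1" keeps PowerShell from expanding $1; "`n" is a newline.
$wrapped = $subject -replace '(.{1,40},)', "`$1\`n  "

$wrapped
```

Since every chunk must end in a comma, the final hex pair (which has no trailing comma) is left on the last line without a backslash.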

Regex Assistance for a url filepath

Can someone assist in creating a Regex for the following situation:
I have about 2000 records in which I need to do a search/replace on a known item in each record that looks like this:
<li>View Product Information</li>
The FILEPATH and FILE are variable, but the surrounding HTML is always the same. Can someone assist with what kind of Regex I would substitute for the "FILEPATH/FILE" part of the search?
You may match the constant parts and use grouping to put them back:
(<li>View Product Information</li>)
Then you should replace the string with $1your_replacement$2, where $1 is the first matching group and $2 the second (if using Python, for instance, you should call Match.group(1) and Match.group(2)).
You would have to escape \ chars if you're using Java instead.
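A sketch of the grouping idea in PowerShell. The anchor markup in $record is hypothetical, because the actual HTML around FILEPATH/FILE is not visible in the question, and new/path/file.pdf is a placeholder replacement:

```powershell
# Hypothetical record; the real surrounding HTML is an assumption.
$record = '<li><a href="docs/old/sheet.pdf">View Product Information</a></li>'

# Group 1 and group 2 capture the constant text on either side of the path;
# [^"]* matches whatever FILEPATH/FILE currently is.
# ${1} and ${2} are used instead of $1 and $2 so a replacement that starts with
# a digit cannot be misread as part of the group number.
$updated = $record -replace '(<li><a href=")[^"]*(">View Product Information</a></li>)', '${1}new/path/file.pdf${2}'

$updated
```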