Get the next string after validating patterns using powershell - regex

I have a text file and the contents can be:
debug --configuration "Release" \p corebuild
Or:
-c "Dev" debug
And now I have to validate the file to see if it has any pattern that matches --configuration or -c and print the string next to it
Pattern 1 - It should be Release
Pattern 2 - It should be Dev
How to achieve this in single command?
I tried the following, but I'm not sure how to extract only Release from the text, and I only tried one pattern at a time:
PS Z:\> $text = Get-Content 'your_file_path' -raw
PS Z:\> $Regex = [Regex]::new("(?<=\-\-configuration)(.*)")
PS Z:\> $Match = $Regex.Match($text)
PS Z:\> $Match.Value
**Release /p net**
Any help would be appreciated

If I understand correctly and you only care about extracting the argument to the parameters and not which parameter was used, this might do the trick:
$content = Get-Content 'your_file_path' -Raw
$re = [regex] '(?i)(?<=(?:--configuration|-c)\s")[^"]+'
$re.Matches($content).Value
See https://regex101.com/r/d2th35/3 for details.
From feedback in comments --configuration and -c can appear together, hence Regex.Matches is needed to find all occurrences.
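To see this behavior without touching a file, the same regex can be run against an in-memory string. This is only a sketch: the sample text is a hypothetical combination of both lines from the question, and $sample and $found are illustrative names.

```powershell
# Hypothetical sample text: both lines from the question in one string.
$sample = @'
debug --configuration "Release" \p corebuild
-c "Dev" debug
'@

# Same regex as above: the lookbehind anchors on the parameter name,
# and the match itself is everything up to the closing double quote.
$re = [regex] '(?i)(?<=(?:--configuration|-c)\s")[^"]+'

# Matches() returns every occurrence; .Value is collected from each match.
$found = $re.Matches($sample).Value
$found
```

With this sample, $found should hold the two strings Release and Dev, in that order.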

To complement Santiago's helpful answer with a PowerShell-only alternative:
Assuming that a matching line only ever contains --configuration OR -c, you can avoid the need for .NET API calls with the help of the -match operator, which outputs a Boolean ($true or $false) to indicate whether the input string matches, and also reports the match it captures in the automatic $Matches variable:
# Note: Omitting -Raw makes Get-Content read the file *line by line*.
Get-Content 'your_file_path' |
ForEach-Object { # Look for a match on each line
# Look for the pattern of interest and capture the
# substring of interest in a capture group - (...) -
# which is later reflected in $Matches by its positional index, 1.
if ($_ -match '(?:--configuration|-c) "(.*?)"') { $Matches[1] }
}
Note:
-match only ever looks for one match per input string, and only populates $Matches if the input is a single string (if it is an array of strings, -match acts as a filter and returns the subarray of matching elements).
GitHub issue #7867 proposes introducing a -matchall operator that looks for all matches in the input string.
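The one-match limitation can be illustrated with a line that uses both parameter forms. This is a minimal sketch under that assumption: $line is a made-up input, and [regex]::Matches() is used as a stand-in for the proposed operator.

```powershell
# Hypothetical line that uses both parameter forms at once.
$line = '--configuration "Release" -c "Dev"'
$pattern = '(?:--configuration|-c) "(.*?)"'

# -match stops at the first match; capture group 1 holds the value.
$first = if ($line -match $pattern) { $Matches[1] }

# [regex]::Matches() reports every match; read group 1 from each.
# Note: unlike -match, this .NET call is case-sensitive by default.
$all = [regex]::Matches($line, $pattern) | ForEach-Object { $_.Groups[1].Value }

$first
$all
```

Here $first contains only Release, while $all contains both Release and Dev.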
See this regex101.com page for an explanation of the regex.

Related

Using Regex to replace multiple lines of text in file

I have a .bas file that I am looking to update. The script requires some manual configuration, and I don't want my team to have to reconfigure it every time they run it. What I would like to do is have a tag like this:
<BEGINREPLACEMENT>
'MsgBox ("Loaded")
ReDim Preserve STIGArray(i - 1)
ReDim Preserve SVID(i - 1)
STIGArray = RemoveDupes(STIGArray)
SVID = RemoveDupes(SVID)
<ENDREPLACEMENT>
I am somewhat familiar with PowerShell, so what I was trying to do is create an update file and replace what is between the tags with the update. What I tried is:
$temp = Get-Content C:\Temp\file.bas
$update = Get-Content C:\Temp\update
$regex = "<BEGINREPLACEMENT>(.*?)<ENDREPLACEMENT>"
$temp -replace $regex, $update
$temp | Out-File C:\Temp\file.bas
The issue is that it isn't replacing the block of text. I can get it to replace either tag, but I can't get it to match everything in between.
Does anyone have any thoughts as to how I can do this?
You need to make sure you read each file as a single string, newlines included, which is possible with the -Raw option passed to Get-Content.
Then, . does not match a newline char by default, hence you need to use a (?s) inline DOTALL (or "singleline") option.
Also, if your dynamic content contains a $ followed by a digit or another substitution character (such as $0, $1 or $&), the regex engine treats it as a substitution token rather than as literal text, so the inserted text can come out corrupted. You need to process the replacement string by doubling each $ in it.
$temp = Get-Content C:\Temp\file.bas -Raw
$update = Get-Content C:\Temp\update -Raw
$regex = "(?s)<BEGINREPLACEMENT>.*?<ENDREPLACEMENT>"
$temp -replace $regex, $update.Replace('$', '$$')
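The effect of doubling the $ can be checked with in-memory strings. The sketch below is not the asker's real data: $temp and $update are hypothetical stand-ins for the two files, and it assumes the update text carries its own pair of tags, since the regex replaces the tags as well.

```powershell
# Hypothetical stand-ins for the contents of file.bas and the update file.
$temp = @(
    'Sub Demo()'
    '<BEGINREPLACEMENT>'
    'old line 1'
    'old line 2'
    '<ENDREPLACEMENT>'
    'End Sub'
) -join "`r`n"

$update = @(
    '<BEGINREPLACEMENT>'
    'MsgBox "Cost: $0"'
    '<ENDREPLACEMENT>'
) -join "`r`n"

$regex = '(?s)<BEGINREPLACEMENT>.*?<ENDREPLACEMENT>'

# With doubled $ signs, "$0" is inserted literally.
$escaped = $temp -replace $regex, $update.Replace('$', '$$')

# Without escaping, "$0" expands to the whole matched block.
$unescaped = $temp -replace $regex, $update

$escaped
```

Note that -replace returns a new string and leaves $temp unchanged, so the result has to be captured in a variable or piped on (for example to Set-Content) to be saved.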

How Do I change a string in a specific line contained in a file preserving all other lines?

I have a file that contains this information:
Type=OleDll
Reference=*\G{00020430-0000-0000-C000-000000000046}#2.0#0#..\..\..\..\..\..\..\Windows\SysWOW64\stdole2.tlb#OLE Automation
Reference=*\G{7C0FFAB0-CD84-11D0-949A-00A0C91110ED}#1.0#0#..\..\..\..\..\..\..\Windows\SysWOW64\msdatsrc.tlb#Microsoft Data Source Interfaces for ActiveX Data Binding Type Library
Reference=*\G{26C4A893-1B44-4616-8684-8AC2FA6B0610}#1.0#0#..\..\..\..\..\..\..\Windows\SysWow64\Conexion_NF.dll#Zeus Data Access Library 1.0 (NF)
Reference=*\G{9668818B-3228-49FD-A809-8229CC8AA40F}#1.0#0#..\packages\ZeusMaestrosContabilidad.19.3.0\lib\native\ZeusMaestrosContabilidad190300.dll#Zeus Maestros Contables Des (Contabilidad)
I need to change the data between {} characters on line 5 using powershell and save the change preserving all other information in the file.
You can use the -replace operator to perform a regex match and string replacement.
If there is only one pair of {} per line, you can do the following, where .*? matches as few non-newline characters as possible. Since Get-Content by default returns an array of lines, you can access each line by index, with [4] being line 5.
$content = Get-Content File.txt
$content[4] = $content[4] -replace '{.*?}','{new data}'
$content | Set-Content File.txt
If there could be multiple {} pairs per line, you will need to be more specific with your regex. A positive lookbehind assertion (?<=) will do.
$content = Get-Content File.txt
$content[4] = $content[4] -replace '(?<=Reference=\*\\G){.*?}','{newest data}'
$content | Set-Content File.txt
For the case when you don't know which line contains the data you want to replace, you will need to be more specific about the data you are replacing.
(Get-Content File.txt) -replace '{9668818B-3228-49FD-A809-8229CC8AA40F}','{New Data}' | Set-Content File.txt
If there are any encoding requirements, consider using the -Encoding parameter on the Get-Content and Set-Content commands.
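The index-based approach can be tried end to end on a throwaway file. This is a sketch only: the file name, the GUIDs and the descriptions are placeholders, not the asker's data.

```powershell
# Hypothetical sample file in the temp directory; the GUIDs are placeholders.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'replace-line-demo.txt'
Set-Content -Path $path -Value @(
    'Type=OleDll'
    'Reference=*\G{11111111-1111-1111-1111-111111111111}#2.0#0#a.tlb#First'
    'Reference=*\G{22222222-2222-2222-2222-222222222222}#1.0#0#b.tlb#Second'
    'Reference=*\G{33333333-3333-3333-3333-333333333333}#1.0#0#c.dll#Third'
    'Reference=*\G{44444444-4444-4444-4444-444444444444}#1.0#0#d.dll#Fourth'
)

# Read as an array of lines, change only index 4 (line 5), write everything back.
$content = Get-Content -Path $path
$content[4] = $content[4] -replace '(?<=Reference=\*\\G)\{.*?\}', '{new data}'
$content | Set-Content -Path $path

# Re-read to confirm that only line 5 changed, then clean up.
$after = Get-Content -Path $path
$after[4]
Remove-Item -Path $path
```

Reading the whole file into $content before writing is what makes it safe to write back to the same path.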
Try Regex: (?<=(?:.*\n){4}Reference=\*\\G\{)[\w-]+
If the content of the {} is always the same you can do this:
(Get-Content $yourfile) -replace $regex, ('{9668818B-3228-49FD-A809-8229CC8AA40F}') | Set-Content $newValue;
One solution :
$Content=Get-Content "C:\temp\test.txt"
$Row5Splited=$Content[4].Split("{}".ToCharArray())
$Content[4]="{0}{1}{2}" -f $Row5Splited[0], "{YOURNEWVALUE}", $Row5Splited[2]
$Content | Out-File "C:\temp\test2.txt"
One approach would be to find,
(.*Reference=\*\\G{)[^\r\n}]+
and replace with,
$1any_thing_you_like_to_replace_with
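In PowerShell, this find-and-replace pair maps onto the -replace operator. The sketch below assumes a made-up input line and replacement GUID; it writes the group reference as ${1} rather than $1 so that it stays unambiguous when the new text starts with a digit.

```powershell
# Hypothetical input line and replacement value.
$line = 'Reference=*\G{9668818B-3228-49FD-A809-8229CC8AA40F}#1.0#0#lib.dll#Sample'
$newGuid = '00000000-0000-0000-0000-000000000000'

# Group 1 captures everything up to and including the opening brace;
# [^\r\n}]+ then consumes the old value up to the closing brace.
# Single quotes keep PowerShell from expanding ${1} itself.
$updated = $line -replace '(.*Reference=\*\\G\{)[^\r\n}]+', ('${1}' + $newGuid)
$updated
```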
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.

Replace text between two string powershell

I have a problem that I'm pretty much stuck on.
I have a file called xml_data.txt and another file called entry.txt
I want to replace everything between <core:topics> and </core:topics>
I have written the below script
$test = Get-Content -Path ./xml_data.txt
$newtest = Get-Content -Path ./entry.txt
$pattern = "<core:topics>(.*?)</core:topics>"
$result0 = [regex]::match($test, $pattern).Groups[1].Value
$result1 = [regex]::match($newtest, $pattern).Groups[1].Value
$test -replace $result0, $result1
When I run the script, it prints the text to the console, but it doesn't look like it made any change.
Can someone please help me out?
Note: Typo error fixed
There are three main issues here:
You read the file line by line, but the blocks of texts are multiline strings
Your regex does not match newlines as . does not match a newline by default
Also, the literal regex pattern must be escaped, and when replacing with a dynamic replacement pattern, you must always dollar-escape the $ symbol. Or use a simple string .Replace.
So, you need to
Read the whole file in to a single variable, $test = Get-Content -Path ./xml_data.txt -Raw
Use the $pattern = "(?s)<core:topics>(.*?)</core:topics>" regex (if it proves too slow, it can be sped up by unrolling it to <core:topics>([^<]*(?:<(?!/?core:topics>)[^<]*)*)</core:topics>)
Use $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$') to "protect" $ chars in the replacement, or $test.Replace($result0, $result1).
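Put together, the three steps might look like the sketch below. The XML fragments are hypothetical stand-ins for xml_data.txt and entry.txt (with -Raw, Get-Content would deliver each file as one such string), and $viaRegex and $viaString are illustrative names.

```powershell
# Hypothetical stand-ins for the raw contents of xml_data.txt and entry.txt.
$test = @(
    '<doc>'
    '<core:topics>'
    '  <topic>old (draft)</topic>'
    '</core:topics>'
    '</doc>'
) -join "`n"

$newtest = @(
    '<core:topics>'
    '  <topic>costs $0</topic>'
    '</core:topics>'
) -join "`n"

# (?s) lets . match newlines, so the block can span several lines.
$pattern = '(?s)<core:topics>(.*?)</core:topics>'
$result0 = [regex]::Match($test, $pattern).Groups[1].Value
$result1 = [regex]::Match($newtest, $pattern).Groups[1].Value

# Option 1: regex replace, escaping both the pattern and the replacement.
$viaRegex = $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$')

# Option 2: plain string replace, which needs no escaping at all.
$viaString = $test.Replace($result0, $result1)

$viaRegex
```

Both options should produce the same text here; the string .Replace variant is the simpler choice when no regex features are needed for the substitution itself.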

Powershell: Pull URL out of String

I am pulling a string from a text file that looks like:
C:\Users\users\Documents\Firefox\tools\Install.ps1:37: Url = "https://somewebsite.com"
I need to somehow remove everything except the URL, so it should look like:
https://somewebsite.com
Here is what I have tried:
$Urlselect = Select-String -Path "$zipPath\tools\chocolateyInstall.ps1" -Pattern "url","Url"-List # Selects URL download path
$Urlselect = $Urlselect -replace ".*" ","" -replace ""*.","" # remove everything but the download link
but this didn't seem to do anything. I am thinking that it's going to have to do with regex, but I am not sure how to write it. Any help is appreciated. Thanks
I suggest using the switch statement with the -Regex and -File options:
$url = switch -regex -file "$zipPath\tools\chocolateyInstall.ps1" {
' Url = "(.*?)"' { $Matches[1]; break }
}
-file makes switch loop over all lines of the specified file.
-regex interprets the branch conditionals as regular expressions, and the automatic $Matches variable can be used in the associated script block ({ ... }) to access the results of the match, notably, what the 1st (and only) capture group in the regex ((...)) captured - the URL of interest.
break stops processing once the 1st match is found. (To continue matching, use continue).
If you do want to use Select-String:
$url = Select-String -List ' Url = "(.*?)"' "$zipPath\tools\chocolateyInstall.ps1" |
ForEach-Object { $_.Matches.Groups[1].Value }
Note that the switch solution will perform much better.
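Both variants can be tried against a throwaway file. In this sketch the file name and its contents are hypothetical; the second Url line is only there to show that processing stops at the first match.

```powershell
# Hypothetical sample script file in the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'chocolateyInstall-demo.ps1'
Set-Content -Path $path -Value @(
    '$packageName = "sample"'
    ' Url = "https://somewebsite.com"'
    ' Url = "https://second.example"'
)

# switch reads the file line by line; break stops after the first hit.
$viaSwitch = switch -Regex -File $path {
    ' Url = "(.*?)"' { $Matches[1]; break }
}

# Select-String -List also stops at the first matching line per file.
$viaSelect = Select-String -List -Pattern ' Url = "(.*?)"' -Path $path |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$viaSwitch
$viaSelect
Remove-Item -Path $path
```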
As for what you tried:
Select-String -Path "$zipPath\tools\chocolateyInstall.ps1" -Pattern "url","Url"
Select-String is case-insensitive by default, so there's no need to specify case variations of the same string. (Conversely, you must use the -CaseSensitive switch to force case-sensitive matching).
Also note that Select-String doesn't output the matching line directly, as a string, but as match-information objects; to get the matching line, access the .Line property[1].
$Urlselect -replace ".*" ","" -replace ""*.",""
".*" " and ""*." result in syntax errors, because you forgot to escape the embedded " as `".
Alternatively, use '...' (single-quoted literal strings), which allows you to embed " as-is and is generally preferable for regexes and replacement operands, because there's no confusion over what parts PowerShell may interpret up front (string expansion).
Even with the escaping problem solved, however, your -replace operations wouldn't have worked, because .*" matches greedily and therefore up to the last "; here's a corrected solution with non-greedy matching, and with the replacement operand omitted (which makes it default to the empty string):
PS> 'C:\...ps1:37: Url = "https://somewebsite.com"' -replace '^.*?"' -replace '"$'
https://somewebsite.com
^.*?" non-greedily replaces everything up to the first ".
"$ replaces a " at the end of the string.
However, you can do it with a single -replace operation, using the same regex as with the switch solution at the top:
PS> 'C:\...ps1:37: Url = "https://somewebsite.com"' -replace '^.*?"(.*?)"', '$1'
https://somewebsite.com
$1 in the replacement operand refers to what the 1st capture group ((...)) captured, i.e. the bare URL; for more information, see this answer.
[1] Note that there's a green-lit feature suggestion - not yet implemented as of PowerShell Core 6.2.0 - to allow Select-String to emit strings directly, using the proposed -Raw switch - see https://github.com/PowerShell/PowerShell/issues/7713

Regex in Powershell fails to check for newlines

I'm trying to get the first block of release notes...
(See sample content in the code)
Whenever I use something simple it works; it only breaks when I try to search across multiple lines (\n). I'm using (Get-Content $changelog | Out-String) because that gives back 1 string instead of an array from each line.
$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
(Get-Content $changelog | Out-String) | Select-String -Pattern $regex -AllMatches
<#
SAMPLE:
------
v1.0.23
- Adds an IContainer API.
- Bugfixes.
v1.0.22
- Hotfix: Language operators.
v1.0.21
- Support duplicate query parameters.
v1.0.20
- Splitting up the ICommand interface.
- Fixing the referrer header empty field value.
#>
The result I need is:
v1.0.23
- Adds an IContainer API.
- Bugfixes.
Update:
Using options..
$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
Get-Content -Path $changelog -Raw | Select-String -Pattern $regex -AllMatches
I also get nothing.. (no matter if I use \n or \r\n)
Unless you're stuck with PowerShell v2, it's simpler and more efficient to use Get-Content -Raw to read an entire file as a single string; besides, Out-String adds an extra newline to the string.[1]
Since you're only looking for the first match, you can use the -match operator - no need for Select-String's -AllMatches switch.
Note: While you could use Select-String without it, it is more efficient to use the -match operator, given that you've read the entire file into memory already.
Regex matching is by default always case-insensitive in PowerShell, consistent with PowerShell's overall case-insensitivity.
Thus, the following returns the first block, if any:
if ((Get-Content -Raw $changelog) -match '(?m)^v\d+\.\d+\.\d+.*(\r?\n-\s?.*)+') {
# Match found - output it.
$Matches[0]
}
(?m) turns on inline regex option m (multi-line), which causes anchors ^ and $ to match the beginning and end of individual lines rather than the overall string's.
\r?\n matches both CRLF and LF-only newlines.
You could make the regex slightly more efficient by making the (...) subexpression non-capturing, given that you're not interested in what it captured: (?:...).
Note that -match itself returns a Boolean (with a scalar LHS), but information about the match is recorded in the automatic $Matches hashtable variable, whose 0 entry contains the overall match.
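The regex can be exercised without the changelog file by using an in-memory string. This sketch assumes LF-only newlines and a shortened copy of the sample; it also uses the non-capturing (?:...) form mentioned above.

```powershell
# Hypothetical, shortened changelog with LF-only newlines.
$text = @(
    'v1.0.23'
    '- Adds an IContainer API.'
    '- Bugfixes.'
    'v1.0.22'
    '- Hotfix: Language operators.'
) -join "`n"

# Version line, followed by one or more lines that start with "-".
$firstBlock = if ($text -match '(?m)^v\d+\.\d+\.\d+.*(?:\r?\n-\s?.*)+') { $Matches[0] }
$firstBlock
```

With CRLF newlines, . also matches the carriage return, so the match may end in a trailing CR; calling .TrimEnd() on $Matches[0] is one way to drop it.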
As for what you tried:
'([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
doesn't work, because by default $ only matches at the very end of the input string, at the end of the last line (though possibly before a final newline).
To make $ match the end of each line, you'd have to turn on the multiline regex option (which you did in your 2nd attempt).
As a result, nothing matches.
'(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
doesn't work as intended, because by using option s (single-line) you've made . match newlines too, so that a greedy subexpression such as .* will match the remainder of the string, across lines.
As a result, everything from the first block on matches.
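Both outcomes can be reproduced on a small in-memory sample. This is a sketch assuming LF-only newlines; $attempt1 and $attempt2 are the two regexes from the question, and the remaining names are illustrative.

```powershell
# Hypothetical, shortened changelog with LF-only newlines and a final newline.
$text = (@(
    'v1.0.23'
    '- Adds an IContainer API.'
    '- Bugfixes.'
    'v1.0.22'
    '- Hotfix: Language operators.'
) -join "`n") + "`n"

$attempt1 = '([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
$attempt2 = '(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'

# 1st attempt: ^ can only match at the very start of the string, so nothing matches.
$matched1 = $text -match $attempt1

# 2nd attempt: with option s, the greedy .* runs across lines into later blocks.
$matched2 = $text -match $attempt2
$span2 = $Matches[0]

$matched1
$matched2
$span2
```

Here $matched1 is False, while $span2 extends past the first block into the v1.0.22 entry.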
[1] This problematic behavior is discussed in GitHub issue #14444.