Problem
I am working on an automated version control script in PowerShell, and I hit a snag while trying to pull a line of text from an AssemblyInfo.cs file. I can't seem to get it to work as expected despite all my efforts.
After much trial and error, the following is the closest I have come to what I'm trying to achieve:
# Constants
$Assembly = Get-Item "C:\generalfilepath\*\Assembly.cs"
$regex = '(?<!\/\/ \[assembly: AssemblyVersion\(")(?<=\[assembly: AssemblyVersion\(")[^"]*'
$CurrentVersion = GC $Assembly
$CurrentVersion = $CurrentVersion -match $regex
Write-Host "CurrentVersion = $CurrentVersion"
I am expecting to see: CurrentVersion = 1.0.0.0 or something similar, but what I get is: CurrentVersion = [assembly: AssemblyVersion("1.0.0.0")]
I have been searching all over for examples on how to properly utilize regex with PowerShell, but I have not really found anything that contains more than the regex itself... As such, I was hoping someone here could assist me in either steering me in the right direction or point out what I am doing wrong.
Question
Despite the use of the regex filter, I'm getting the entire line instead of just the value I want. How do I ensure the variable stores ONLY the target value using the lookbehind and lookahead regex filters?
You can try:
C:\> ([string] (Get-Content \AssemblyInfo.cs)) -match 'AssemblyVersion\("([0-9]+(\.([0-9]+|\*)){1,3}){1}"\)'
True
Afterwards, the result is stored in $Matches[1]:
C:\> $Matches[1]
6.589.0.123
Be sure to cast the result of Get-Content to [string]; otherwise -match is applied to an array of lines, acts as a filter, and does not populate $Matches. See this answer for a complete description.
If you want to use a stricter regex like ^\[assembly: AssemblyVersion\("([0-9]+(\.([0-9]+|\*)){1,3}){1}"\), which also checks (via ^\[assembly:) that the line starts with [assembly and therefore skips commented-out attributes, you have to walk through the [string[]] array that Get-Content returns:
> gc .\AssemblyInfo.cs | % { $_ -match '^\[assembly: AssemblyVersion\("([0-9]+(\.([0-9]+|\*)){1,3}){1}"\)' }
> $Matches[1]
6.589.0.*
Hope that helps.
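As a minimal sketch of the line-by-line approach above, the following stores only the version string in the variable. The sample lines are hypothetical stand-ins for the content of an AssemblyInfo.cs file, and the simplified capture group ([^"]+) is an assumption that accepts anything between the quotes rather than validating the version format.

```powershell
# Hypothetical sample standing in for: Get-Content .\AssemblyInfo.cs
$lines = @(
    '// [assembly: AssemblyVersion("1.0.*")]'
    '[assembly: AssemblyVersion("1.0.0.0")]'
    '[assembly: AssemblyFileVersion("1.0.0.0")]'
)

# ^ anchors the match to the start of the line, so the commented-out
# attribute is skipped; the capture group holds only the quoted value.
$pattern = '^\[assembly: AssemblyVersion\("([^"]+)"\)'

# -match is applied to one string at a time, so $Matches is populated.
$CurrentVersion = $lines |
    ForEach-Object { if ($_ -match $pattern) { $Matches[1] } } |
    Select-Object -First 1

Write-Host "CurrentVersion = $CurrentVersion"
```

Applying -match to each line separately is what makes the difference: with the whole array on the left-hand side, the operator returns the matching lines instead of a Boolean.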
Related
I have a text file and the contents can be:
debug --configuration "Release" \p corebuild
Or:
-c "Dev" debug
And now I have to validate the file to see if it has any pattern that matches --configuration or -c and print the string next to it
Pattern 1 - It should be Release
Pattern 2 - It should be Dev
How to achieve this in single command?
I tried the code below, but I am not sure how to extract only Release from the text. I have only tried one pattern at a time:
PS Z:\> $text = Get-Content 'your_file_path' -raw
PS Z:\> $Regex = [Regex]::new("(?<=\-\-configuration)(.*)")
PS Z:\> $Match = $Regex.Match($text)
PS Z:\> $Match.Value
**Release /p net**
Any help would be appreciated
If I understand correctly and you only care about extracting the argument to the parameters and not which parameter was used, this might do the trick:
$content = Get-Content 'your_file_path' -Raw
$re = [regex] '(?i)(?<=(?:--configuration|-c)\s")[^"]+'
$re.Matches($content).Value
See https://regex101.com/r/d2th35/3 for details.
From feedback in the comments, --configuration and -c can appear together, hence Regex.Matches is needed to find all occurrences.
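To illustrate why Regex.Matches is used, here is a small sketch with a hypothetical input string in which both parameters appear. The string is made up for the example; only the regex comes from the answer.

```powershell
# Hypothetical content in which both --configuration and -c appear
$content = 'debug --configuration "Release" \p corebuild -c "Dev"'

# The lookbehind requires the parameter name, whitespace, and an opening
# quote before the match; [^"]+ then takes everything up to the closing quote.
$re = [regex] '(?i)(?<=(?:--configuration|-c)\s")[^"]+'

# Matches() returns every occurrence; .Value is read from each match in turn.
$values = $re.Matches($content).Value
$values
```

With a single occurrence in the input, $values is just one string, so wrapping the call in @( ) is a reasonable precaution if the caller always expects an array.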
To complement Santiago's helpful answer with a PowerShell-only alternative:
Assuming that a matching line only ever contains --configuration OR -c, you can avoid the need for .NET API calls with the help of the -match operator, which outputs a Boolean ($true or $false) to indicate whether the input string matches, and also reports the match it captures in the automatic $Matches variable:
# Note: Omitting -Raw makes Get-Content read the file *line by line*.
Get-Content 'your_file_path' |
ForEach-Object { # Look for a match on each line
# Look for the pattern of interest and capture the
# substring of interest in a capture group - (...) -
# which is later reflected in $Matches by its positional index, 1.
if ($_ -match '(?:--configuration|-c) "(.*?)"') { $Matches[1] }
}
Note:
-match only ever looks for one match per input string, and only populates $Matches if the input is a single string (if it is an array of strings, -match acts as a filter and returns the subarray of matching elements).
GitHub issue #7867 proposes introducing a -matchall operator that looks for all matches in the input string.
See this regex101.com page for an explanation of the regex.
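The difference between array and single-string input described in the note can be seen in a short sketch. The two sample lines are hypothetical.

```powershell
# Hypothetical sample lines
$lines = @(
    'debug --configuration "Release" \p corebuild'
    'run tests'
)

# Array on the left-hand side: -match filters and returns matching elements.
$filtered = $lines -match '--configuration'

# Single string on the left-hand side: -match returns a Boolean
# and fills the automatic $Matches variable.
$isMatch = $lines[0] -match '(?:--configuration|-c) "(.*?)"'
$value = $Matches[1]

"Filtered lines: $(@($filtered).Count); isMatch: $isMatch; value: $value"
```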
I have a .bas file that I am looking to update. The script requires some manual configuration, and I don't want my team to need to reconfigure it every time they run it. What I would like to do is have a tag like this:
<BEGINREPLACEMENT>
'MsgBox ("Loaded")
ReDim Preserve STIGArray(i - 1)
ReDim Preserve SVID(i - 1)
STIGArray = RemoveDupes(STIGArray)
SVID = RemoveDupes(SVID)
<ENDREPLACEMENT>
I am somewhat familiar with PowerShell, so what I was trying to do is create an update file and replace what is between the tags with the update. What I tried is:
$temp = Get-Content C:\Temp\file.bas
$update = Get-Content C:\Temp\update
$regex = "<BEGINREPLACEMENT>(.*?)<ENDREPLACEMENT>"
$temp -replace $regex, $update
$temp | Out-File C:\Temp\file.bas
The issue is that it isn't replacing the block of text. I can get it to replace either tag, but I can't get it to pull in everything in between.
Does anyone have any thoughts as to how I can do this?
You need to make sure you read each file in as a single string, newlines included, which is possible with the -Raw option passed to Get-Content.
Then, . does not match a newline char by default, hence you need to use a (?s) inline DOTALL (or "singleline") option.
Also, if your dynamic content contains a $ followed by certain characters (for example $& or $0, which insert the whole match), it is treated as a substitution rather than literal text. You need to process the replacement string by doubling each $ in it.
$temp = Get-Content C:\Temp\file.bas -Raw
$update = Get-Content C:\Temp\update -Raw
$regex = "(?s)<BEGINREPLACEMENT>.*?<ENDREPLACEMENT>"
$temp -replace $regex, $update.Replace('$', '$$')
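A self-contained sketch of the same steps, using in-memory strings instead of files, shows what doubling the $ prevents. The sample content is hypothetical, and it assumes the update text carries the tags itself, since the pattern replaces the tags along with the block between them.

```powershell
# Hypothetical stand-in for: Get-Content C:\Temp\file.bas -Raw
$temp = @(
    'Sub Demo()'
    '<BEGINREPLACEMENT>'
    'old line 1'
    'old line 2'
    '<ENDREPLACEMENT>'
    'End Sub'
) -join "`n"

# Hypothetical stand-in for the update file; note the literal $0 inside it
$update = @(
    '<BEGINREPLACEMENT>'
    'cost = "$0.50"'
    '<ENDREPLACEMENT>'
) -join "`n"

# (?s) lets . match newlines, so .*? can span the whole block
$regex = '(?s)<BEGINREPLACEMENT>.*?<ENDREPLACEMENT>'

# Without escaping, $0 is expanded to the whole matched block
$naive = $temp -replace $regex, $update

# With each $ doubled, the update text is inserted literally
$result = $temp -replace $regex, $update.Replace('$', '$$')
$result
```

Note that -replace returns the new string and leaves $temp unchanged, so the result has to be assigned or piped on (for example to Set-Content) for the change to persist.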
I have a question that I'm pretty much stuck on.
I have a file called xml_data.txt and another file called entry.txt
I want to replace everything between <core:topics> and </core:topics>
I have written the below script
$test = Get-Content -Path ./xml_data.txt
$newtest = Get-Content -Path ./entry.txt
$pattern = "<core:topics>(.*?)</core:topics>"
$result0 = [regex]::match($test, $pattern).Groups[1].Value
$result1 = [regex]::match($newtest, $pattern).Groups[1].Value
$test -replace $result0, $result1
When I run the script, it writes output to the console, but it doesn't look like it made any change.
Can someone please help me out?
Note: Typo error fixed
There are three main issues here:
You read the file line by line, but the blocks of texts are multiline strings
Your regex does not match newlines as . does not match a newline by default
Also, literal text used as a regex pattern must be escaped, and when replacing with a dynamic replacement pattern, you must always dollar-escape the $ symbol. Alternatively, use the simple string .Replace method.
So, you need to
Read the whole file into a single variable, $test = Get-Content -Path ./xml_data.txt -Raw
Use the $pattern = "(?s)<core:topics>(.*?)</core:topics>" regex (if it turns out to be too slow, it can be enhanced by unrolling it to <core:topics>([^<]*(?:<(?!/?core:topics>)[^<]*)*)</core:topics>)
Use $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$') to "protect" $ chars in the replacement, or $test.Replace($result0, $result1).
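The steps above can be sketched with in-memory strings as follows. The XML snippets are hypothetical; they were chosen so that the old block contains regex metacharacters and the new block contains a $, which are the two cases the escaping is meant to handle.

```powershell
# Hypothetical stand-ins for xml_data.txt and entry.txt read with -Raw
$test    = "<doc>`n<core:topics>`nold (topic) a+b`n</core:topics>`n</doc>"
$newtest = "<core:topics>`nnew topic costs `$5`n</core:topics>"

# (?s) lets . match newlines so the group can span several lines
$pattern = '(?s)<core:topics>(.*?)</core:topics>'

$result0 = [regex]::Match($test, $pattern).Groups[1].Value
$result1 = [regex]::Match($newtest, $pattern).Groups[1].Value

# Option 1: regex replace, escaping the pattern and the replacement
$viaRegex = $test -replace [regex]::Escape($result0), $result1.Replace('$', '$$')

# Option 2: plain string replace, no escaping needed
$viaString = $test.Replace($result0, $result1)

$viaRegex
```

Both options should give the same text here; the plain string .Replace is the simpler choice when no pattern matching is needed for the replacement step itself.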
I'm new to PowerShell, and there seem to be a few differences in the way regexes are handled. I am currently iterating through a large number of txt files and want the start of each one of them (which is a URL) up to the | character.
The start of every file is a url ending in a slash. This was my umpteenth attempt with no luck:
$FirstUrl = '.*/\|$'
This is pushed through a ForEach-Object loop, from which every other piece of information I'm trying to grab is coming out as expected:
Foreach-Object {
$FileContent = Get-Content $_.FullName
$Pos = Select-String -InputObject $FileContent -Pattern $FirstURL
Any tips on how to phrase the regex correctly in $FirstURL? I'm generally 'ok' at regex and have googled my face off trying to find the proper documentation for PowerShell.
If each file has the URL on the first line, followed by a pipe, then you do not need a regex in this case. You can split the line directly:
$FileContent = Get-Content $_.FullName
$FileContent.Split('|')[0]
Split breaks the string into an array of substrings. The part before the first pipe is the first array element, at index 0, so you can take it from there.
Hope it helps.
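For comparison, here is a small sketch of both routes on a hypothetical first line. It also shows a regex that does work: the attempted pattern '.*/\|$' ends in $, which requires the pipe to be the last character of the line, whereas ^[^|]* simply takes everything from the start up to the first pipe.

```powershell
# Hypothetical first line of a file; in the real script this could come from
# Get-Content $_.FullName -TotalCount 1
$firstLine = 'https://example.com/path/|field2|field3'

# Split on the pipe and keep the first piece
$viaSplit = $firstLine.Split('|')[0]

# Regex alternative: from the start of the line, take every non-pipe character
$viaRegex = [regex]::Match($firstLine, '^[^|]*').Value

$viaSplit
$viaRegex
```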
I have a script that goes through an HTTP access log, filters out some lines based on a regex pattern, and copies them into another file:
param($workingdate=(get-date).ToString("yyMMdd"))
Get-Content "access-$workingdate.log" |
Select-string -pattern $pattern |
Add-Content "D:\webStatistics\log\filtered-$workingdate.log"
My logs can be quite large (up to 2GB), and the script takes up to 15 minutes to run. Is there anything I can do to improve the performance of the statement above?
Thank you for your thoughts!
See if this isn't faster than your current solution:
param($workingdate=(get-date).ToString("yyMMdd"))
Get-Content "access-$workingdate.log" -ReadCount 2000 |
foreach { $_ -match $pattern |
Add-Content "D:\webStatistics\log\filtered-$workingdate.log"
}
You don't show your patterns, but I suspect they are a large part of the problem.
You will want to look for a new question here (I am sure it has been asked) or elsewhere for detailed advice on building fast regular expression patterns.
But I find the best advice is to anchor your patterns and avoid runs of unknown length that match any character (such as .*).
So instead of a pattern like path/.*/.*\.js use one with a $ on the end to anchor it to the end of the string. That way the regex engine can tell immediately that index.html is not a match. Otherwise it has to do some rather complicated scans with path/ and .js possibly showing up anywhere in the string. This example of course assumes the file name is at the end of the log line.
Anchors work well with start-of-line patterns as well. A pattern might look like ^[^"]*"GET /myfile". That has an unknown run length, but at least the engine knows that it doesn't have to restart the search for more quotes after finding the first one. The [^"] character class allows the regex engine to stop because the pattern can't match after the first quote.
You could also try seeing if using streams would speed it up. Something like this might help, although I couldn't test it because, as mentioned above, I'm not sure what pattern you are using.
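A brief sketch of what the end anchor changes in terms of matching, using hypothetical log lines in which the requested path is the last thing on the line. It only demonstrates which lines match, not the speed difference, which depends on the real data and patterns.

```powershell
# Hypothetical log lines; the request path is assumed to end the line
$logLines = @(
    'GET /path/app/main.js'
    'GET /path/app/main.js.map'
    'GET /index.html'
)

$unanchored = 'path/.*/.*\.js'
$anchored   = 'path/.*/.*\.js$'

# Without the anchor, .js may appear anywhere after path/
$loose = @($logLines -match $unanchored)

# With the anchor, the line has to end in .js
$strict = @($logLines -match $anchored)

"Unanchored matches: $($loose.Count); anchored matches: $($strict.Count)"
```

Besides giving the engine an early exit, the anchor also makes the pattern more precise, so it is worth checking that the stricter match is actually what the filter should select.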
param($workingdate=(get-date).ToString("yyMMdd"))
$file = New-Object System.IO.StreamReader -Arg "access-$workingdate.log"
$stream = New-Object System.IO.StreamWriter -Arg "D:\webStatistics\log\filtered-$workingdate.log"
# Compare against $null so that a blank line does not end the loop early
while ($null -ne ($line = $file.ReadLine())) {
if($line -match $pattern){
$stream.WriteLine($line)
}
}
$file.close()
$stream.Close()
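A slightly more defensive variant of the stream approach could look like the sketch below. It is an untested-against-real-logs illustration: the temporary files, the sample lines, and the '\.js$' pattern are all assumptions made so the example can run on its own. It also assumes PowerShell 5 or later for the ::new() syntax.

```powershell
# Hypothetical input and output files in the temp folder (full paths, because
# .NET resolves relative paths against the process working directory, which
# can differ from PowerShell's current location)
$inPath  = [System.IO.Path]::GetTempFileName()
$outPath = [System.IO.Path]::GetTempFileName()
Set-Content -Path $inPath -Value 'GET /a.js', '', 'GET /index.html', 'GET /b.js'

# Build the regex once; Compiled trades start-up cost for faster matching
$options = [System.Text.RegularExpressions.RegexOptions]::Compiled
$regex = [regex]::new('\.js$', $options)

$reader = [System.IO.StreamReader]::new($inPath)
$writer = [System.IO.StreamWriter]::new($outPath)
try {
    # ReadLine() returns $null at end of file; blank lines return ''
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($regex.IsMatch($line)) { $writer.WriteLine($line) }
    }
}
finally {
    # Always release the file handles, even if matching throws
    $writer.Dispose()
    $reader.Dispose()
}

$filtered = Get-Content -Path $outPath
$filtered
```

The try/finally block is there so that the files are closed even when an error interrupts the loop; whether the compiled regex helps in practice would need to be measured on the actual logs.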