Print function name in which a pattern of string occurs - regex

I am looking for a way to automate a manual task. I'm not sure if it's even possible.
I have to find a string pattern in all files in a project folder. It's a C#/.NET project (if that matters at all). I also have to record the function name and file name where the pattern occurs, along with the full string that matches it. So far I've done the following in PowerShell:
PS C:\trunk> Get-ChildItem "C:\trunk" -recurse | Select-String -pattern "AlertMessage" | group path | select name
This prints the names of the files where the pattern matches.
PS C:\trunk> Select-String -pattern "AlertMessage" -path "C:\trunk\VATScan.Web\Areas\Administration\Controllers\HomeController.cs"
This prints the line number and the matching string for a given file.
Any pointers on how I can achieve my goal?

By no means perfect, but at least this may fall under the category of a pointer:
$text = @"
Public Sub Bitchin()
Dim AlertMe
End Sub
Private Sub Function() As something
End Function
"@
[void]($text -match "(?smi)((public|private)\W(sub|function)\W(.+?)\(.*?Alertme)")
$Matches[4]
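This looks for a Function or Sub declaration with a single non-word character (\W, typically a space) between the keywords, followed by the next occurrence of the word AlertMe. Because of the s option, .*? can cross line boundaries, so the search continues past the declaration line.
You need item 4 from $Matches because the pattern contains several capture groups; group 4 holds the routine name.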
A more concise explanation of the regex used can be found here
Hopefully this will get you started, or at least thinking. I am not familiar with C# declarations, so $text is more of a VBA example, but you should get the idea.
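For C# sources, the same idea can be sketched line by line: remember the most recent line that looks like a method declaration, and report it whenever the search pattern matches. This is only a rough sketch. Find-PatternWithMethod, its parameters, and the sample lines are hypothetical, and the declaration regex is a heuristic that will miss many valid C# signatures (no access modifier, multi-line signatures, and so on).

```powershell
# Hypothetical helper: for each line matching $Pattern, report the name of the
# most recently seen method declaration (heuristic, not a C# parser).
function Find-PatternWithMethod {
    param(
        [string[]] $Lines,
        [string]   $Pattern
    )
    # Access modifier, one or more further tokens (static, return type, ...),
    # then the method name directly followed by '('.
    $methodRegex = '^\s*(?:public|private|protected|internal)\s+(?:[\w<>\[\],]+\s+)+(?<name>\w+)\s*\('
    $current = $null
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        if ($Lines[$i] -match $methodRegex) { $current = $Matches['name'] }
        if ($Lines[$i] -match $Pattern) {
            [pscustomobject]@{
                Method     = $current
                LineNumber = $i + 1
                Line       = $Lines[$i].Trim()
            }
        }
    }
}

# Made-up sample standing in for the content of one .cs file.
$sample = @(
    'public class HomeController {',
    '    public ActionResult Index() {',
    '        var m = AlertMessage.Create("hi");',
    '    }',
    '    private static void Save(int id) {',
    '        AlertMessage.Show(id);',
    '    }',
    '}'
)
Find-PatternWithMethod -Lines $sample -Pattern 'AlertMessage'
```

To run it over a project, you could feed it (Get-Content $file.FullName) for each file returned by Get-ChildItem -Recurse -Filter *.cs and add the file name to the output object.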

Related

PowerShell to split a text file on a specific string

I am trying to split a large text file into several files based on a specific string. Every time I see the string ABCDE - 3, I want to cut the content up to that string and paste it into a new text file. I also want to extract the last 4 digits of the social, the last name and the first name. The new text file needs to be saved as first name, last name and last 4 of the social.
See the text file example and a bit of initial code. I would feel much more comfortable doing it in Python, but PowerShell is the only option.
$my_text = Get-Content .\ab.txt
$ssn_pattern = '([0-8]\d{2})-(\d{2})-(\d{4})'
ForEach ($file in $my_text)
To get the firstname, lastname and the last 4 digits of the social, you could make use of capturing groups and use those groups when assembling the filename.
From your pattern, only the last 4 digits should be grouped.
You could use a pattern to start the match with TO: and from the next line get the values for the names and the number.
Then match all lines that do not start with ABCDE - 3 using a negative lookahead (?!...).
You can adjust the pattern and the code to match your exact text.
(?m)^[^\S\r\n]+TO:.*\r?\n\s*ATTN:\s*[A-Z]{3} ([^,\r\n]+),[^\S\r\n]*(.+?)[^\S\r\n]*[0-8]\d{2}-\d{2}-(\d{4})(?:\r?\n(?![^\S\r\n]+ABCDE - 3).*)*\r?\n[^\S\r\n]+ABCDE - 3.*
Regex demo
I put together a code snippet from Stack Overflow posts, so it can probably be improved. It basically comes down to loading the file as a single raw string and getting all the matches.
Then loop over the matches, use the groups to assemble a filename, and save the full match as the content.
If there are names which contain spaces and you don't want those to be in the filename, you could replace those with an empty string.
Example code:
$my_text = Get-Content -Raw ./Documents/stack-overflow/powershell/ab.txt
$pattern = "(?m)^[^\S\r\n]+TO:.*\r?\n\s*ATTN:\s*[A-Z]{3} ([^,\r\n]+),[^\S\r\n]*(.+?)[^\S\r\n]*[0-8]\d{2}-\d{2}-(\d{4})(?:\r?\n(?![^\S\r\n]+ABCDE - 3).*)*\r?\n[^\S\r\n]+ABCDE - 3.*"
Select-String $pattern -input $my_text -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object {
        $fileName = -join ($_.groups[2].Value, $_.groups[1].Value, $_.groups[3].Value)
        Write-Host $fileName
        Set-Content -Path "your-path-here/$fileName.txt" -Value $_.Value
    }
When I run this, I get 2 files with the content for each match:
MIOTTISAREMO2222.txt
MIOTTSANREMO1111.txt
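To make the technique easier to follow, here is a cut-down sketch of the same idea: a header line with three capture groups, then any number of lines that do not start with the terminator (the negative lookahead), then the terminator line. The input text and the simplified pattern are made up for illustration and do not reflect the real file layout. It also strips whitespace from the assembled name, as suggested above.

```powershell
# Hypothetical input: two records, each ended by a line starting with 'ABCDE - 3'.
$text = @'
ATTN: Jane Doe 123-45-6789
line a
ABCDE - 3
ATTN: John Roe 234-56-7890
line b
line c
ABCDE - 3
'@

# Header with three captures, then lines that are NOT the terminator
# (negative lookahead), then the terminator itself.
$pattern = '(?m)^ATTN: (\w+) (\w+) \d{3}-\d{2}-(\d{4})(?:\r?\n(?!ABCDE - 3).*)*\r?\nABCDE - 3'

$records = foreach ($m in [regex]::Matches($text, $pattern)) {
    $name = -join ($m.Groups[1].Value, $m.Groups[2].Value, $m.Groups[3].Value)
    [pscustomobject]@{
        # Remove any whitespace so it does not end up in the file name.
        FileName = ($name -replace '\s', '') + '.txt'
        Content  = $m.Value
    }
}
$records.FileName
```

Each object's Content could then be written with Set-Content, as in the code above.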

Select-String: match a string only if it isn't preceded by a specific character

I have a list of files that contain either of the two strings:
"stuff" or ";stuff"
I'm trying to write a PowerShell Script that will return only the files that contain "stuff". The script below currently returns all the files because obviously "stuff" is a substring of ";stuff"
For the life of me, I cannot figure out how to match only the files that contain "stuff" without a preceding ;
Get-Content "C:\temp\list\list.txt" |
Where-Object { Select-String -Quiet -Pattern "stuff" -SimpleMatch $_ }
Note: C:\temp\list\list.txt contains a list of file paths that are each passed to Select-String.
Thanks for the help.
You cannot perform the desired matching with literal substring searches (-SimpleMatch).
Instead, use a regex with a negative look-behind assertion ((?<!...)) to rule out stuff substrings preceded by a ; character: (?<!;)stuff
Applied to your command:
Get-Content "C:\temp\list\list.txt" |
Where-Object { Select-String -Quiet -Pattern '(?<!;)stuff' -LiteralPath $_ }
Regex pitfalls:
It is tempting to use [^;]stuff instead, using a negated (^) character set ([...]) (see this answer); however, this will not work as expected if stuff appears at the very start of a line, because a character set - whether negated or not - only matches an actual character, not the start-of-the-line position.
It is then tempting to apply ? to the negated character set (for an optional match - 0 or 1 occurrence): [^;]?stuff. However, that would match a string containing ;stuff again, given that stuff is technically preceded by a "0-repeat occurrence" of the negated character set; thus, ';stuff' -match '[^;]?stuff' yields $true.
Only a look-behind assertion works properly in this case - see regular-expressions.info.
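The difference between the three patterns can be checked directly with the -match operator. A small sketch; the sample strings are made up:

```powershell
# Hypothetical sample strings covering the interesting cases.
$samples = 'stuff', ';stuff', 'x stuff', 'a;stuff'

$results = foreach ($s in $samples) {
    [pscustomobject]@{
        Input       = $s
        NegatedSet  = $s -match '[^;]stuff'    # needs a real character before stuff
        OptionalSet = $s -match '[^;]?stuff'   # matches every string containing stuff
        LookBehind  = $s -match '(?<!;)stuff'  # only stuff not preceded by ;
    }
}
$results | Format-Table -AutoSize
```

Only the LookBehind column is true for 'stuff' and 'x stuff' and false for ';stuff' and 'a;stuff'; NegatedSet misses the bare 'stuff', and OptionalSet is true for all four.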
To complement @mklement0's answer, I suggest an alternative approach to make your code easier to read and understand:
#requires -Version 4
@(Get-Content -Path 'C:\Temp\list\list.txt').
ForEach([IO.FileInfo]).
Where({ $PSItem | Select-String -Pattern '(?<!;)stuff' -Quiet })
This turns your strings into objects (System.IO.FileInfo) and uses the array methods ForEach and Where for brevity. It also lets you pipe the paths into Select-String as objects instead of spelling out a path parameter, which makes the command easier to follow (I find long parameter lists difficult to read).
The example code posted won't actually run, as it will look at each line as the -Path value.
What you need is to get the content, select the string you're after, then filter the results with Where-Object:
Get-Content "C:\temp\list\list.txt" | Select-String -Pattern "stuff" | Where-Object {$_ -notmatch ";stuff"}
You could create a more complex regex if needed, but that depends on what the data in your files looks like.

Regex in Powershell fails to check for newlines

I'm trying to get the first block of release notes...
(See sample content in the code)
Whenever I use something simple it works; it only breaks when I try to search across multiple lines (\n). I'm using (Get-Content $changelog | Out-String) because that gives back 1 string instead of an array with one element per line.
$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
(Get-Content $changelog | Out-String) | Select-String -Pattern $regex -AllMatches
<#
SAMPLE:
------
v1.0.23
- Adds an IContainer API.
- Bugfixes.
v1.0.22
- Hotfix: Language operators.
v1.0.21
- Support duplicate query parameters.
v1.0.20
- Splitting up the ICommand interface.
- Fixing the referrer header empty field value.
#>
The result I need is:
v1.0.23
- Adds an IContainer API.
- Bugfixes.
Update:
Using options..
$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
Get-Content -Path $changelog -Raw | Select-String -Pattern $regex -AllMatches
I also get nothing.. (no matter if I use \n or \r\n)
Unless you're stuck with PowerShell v2, it's simpler and more efficient to use Get-Content -Raw to read an entire file as a single string; besides, Out-String adds an extra newline to the string.[1]
Since you're only looking for the first match, you can use the -match operator - no need for Select-String's -AllMatches switch.
Note: While you could use Select-String without it, it is more efficient to use the -match operator, given that you've read the entire file into memory already.
Regex matching is by default always case-insensitive in PowerShell, consistent with PowerShell's overall case-insensitivity.
Thus, the following returns the first block, if any:
if ((Get-Content -Raw $changelog) -match '(?m)^v\d+\.\d+\.\d+.*(\r?\n-\s?.*)+') {
# Match found - output it.
$Matches[0]
}
* (?m) turns on inline regex option m (multi-line), which causes anchors ^ and $ to match the beginning and end of individual lines rather than the overall string's.
* \r?\n matches both CRLF and LF-only newlines.
* You could make the regex slightly more efficient by making the (...) subexpression non-capturing, given that you're not interested in what it captured: (?:...).
Note that -match itself returns a Boolean (with a scalar LHS), but information about the match is recorded in the automatic $Matches hashtable variable, whose 0 entry contains the overall match.
As for what you tried:
'([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
doesn't work, because by default $ only matches at the very end of the input string, at the end of the last line (though possibly before a final newline).
To make $ match the end of each line, you'd have to turn on the multiline regex option (which you did in your 2nd attempt).
As a result, nothing matches.
'(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
doesn't work as intended, because by using option s (single-line) you've made . match newlines too, so that a greedy subexpression such as .* will match the remainder of the string, across lines.
As a result, everything from the first block on matches.
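The three behaviors can be reproduced on a small in-memory string. This is a sketch with a shortened, made-up changelog that uses LF-only newlines (written as `n) so the result does not depend on how a file was saved:

```powershell
# Shortened sample changelog, LF-only newlines.
$changelog = "v1.0.23`n- Adds an IContainer API.`n- Bugfixes.`nv1.0.22`n- Hotfix: Language operators.`n"

$noOptions     = '([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
$withSmi       = '(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
$multilineOnly = '(?m)^v\d+\.\d+\.\d+.*(\r?\n-\s?.*)+'

# 1st attempt: without option m, ^ cannot match after the version line.
$firstAttempt = $changelog -match $noOptions

# 2nd attempt: option s lets the greedy .* run across lines.
$null = $changelog -match $withSmi
$secondAttempt = $Matches[0]

# Option m only: the match stops at the end of the first block.
$null = $changelog -match $multilineOnly
$firstBlock = $Matches[0]

"1st attempt matched: $firstAttempt"
"2nd attempt matched the whole input: $($secondAttempt -eq $changelog)"
$firstBlock
```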
[1] This problematic behavior is discussed in GitHub issue #14444.

Powershell regex match first string ending in pipe character

I'm new to PowerShell, and there seem to be a few differences in the way regexes are handled. I'm currently iterating through a large number of txt files and want the start of each one (which is a URL) up to the | character.
The start of every file is a url ending in a slash. This was my umpteenth attempt with no luck:
$FirstUrl = '.*/\|$'
This is pushed through a ForEach-Object loop, from which every other piece of information I'm trying to grab comes out as expected:
Foreach-Object {
$FileContent = Get-Content $_.FullName
$Pos = Select-String -InputObject $FileContent -Pattern $FirstURL
Any tips on how to phrase the regex in $FirstURL? I'm generally 'ok' at regex and have googled my face off trying to find the proper documentation for PowerShell.
If each file has the URL on the first line, followed by a pipe, then you do not need a regex in this case. You can split the line directly:
$FileContent = Get-Content $_.FullName
$FileContent.Split('|')[0]
Split breaks the string into an array of parts. The first part is at index 0, so you can take it from there.
Hope it helps.
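If you would rather keep a regex: the attempted pattern '.*/\|$' fails because $ requires the /| pair to sit at the very end of the line, whereas here more text follows the pipe. A sketch that compares both approaches on a single made-up line ($firstLine is a hypothetical stand-in for the first line of a file):

```powershell
# Hypothetical first line of one of the files.
$firstLine = 'http://example.com/path/|second field|third field'

# Original attempt: anchored to the end of the line, so it does not match here.
$originalMatches = $firstLine -match '.*/\|$'

# Everything from the start up to the slash that precedes the first pipe.
$viaRegex = if ($firstLine -match '^[^|]*/(?=\|)') { $Matches[0] }

# The Split approach from above gives the same result.
$viaSplit = $firstLine.Split('|')[0]

"Original pattern matched: $originalMatches"
$viaRegex
$viaSplit
```

Both $viaRegex and $viaSplit contain http://example.com/path/ for this input.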

Improving performance on PowerShell filtering statement

I have a script that goes through an HTTP access log, filters out some lines based on a regex pattern and copies them into another file:
param($workingdate=(get-date).ToString("yyMMdd"))
Get-Content "access-$workingdate.log" |
Select-string -pattern $pattern |
Add-Content "D:\webStatistics\log\filtered-$workingdate.log"
My logs can be quite large (up to 2GB), and the script takes up to 15 minutes to run. Is there anything I can do to improve the performance of the statement above?
Thank you for your thoughts!
See if this isn't faster than your current solution:
param($workingdate=(get-date).ToString("yyMMdd"))
Get-Content "access-$workingdate.log" -ReadCount 2000 |
foreach { $_ -match $pattern |
Add-Content "D:\webStatistics\log\filtered-$workingdate.log"
}
You don't show your patterns, but I suspect they are a large part of the problem.
You will want to look for a new question here (I am sure it has been asked) or elsewhere for detailed advice on building fast regular expression patterns.
But I find the best advice is to anchor your patterns and avoid runs of unknown length of all characters.
So instead of a pattern like path/.*/.*\.js use one with a $ on the end to anchor it to the end of the string. That way the regex engine can tell immediately that index.html is not a match. Otherwise it has to do some rather complicated scans with path/ and .js possibly showing up anywhere in the string. This example of course assumes the file name is at the end of the log line.
Anchors work well with start-of-line patterns as well. A pattern might look like ^[^"]*"GET /myfile". That has an unknown run length, but at least the engine knows that it doesn't have to restart the search for more quotes after finding the first one. The [^"] character class allows the regex engine to stop, because the pattern can't match after the first quote.
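As a small illustration of what the two anchored patterns accept, here is a sketch with made-up log lines (the line format is an assumption, and it shows matching behavior only, not timing):

```powershell
# Hypothetical log lines; the real access log format is not shown in the question.
$lines = @(
    '10.0.0.1 "GET /myfile" 200 /path/js/app/main.js',
    '10.0.0.2 "GET /other" 200 /path/js/app/index.html',
    '10.0.0.3 "POST /x" "GET /myfile" 200 /other/site.js'
)

# Anchored to the end of the line: the file name must be the last thing on it.
$endAnchored = 'path/.*/.*\.js$'
# Anchored to the start: "GET /myfile" must be the first quoted field.
$startAnchored = '^[^"]*"GET /myfile"'

# With an array on the left-hand side, -match returns the matching elements.
$endHits   = @($lines -match $endAnchored)
$startHits = @($lines -match $startAnchored)

"End-anchored hits:   $($endHits.Count)"
"Start-anchored hits: $($startHits.Count)"
```

In both cases only the first line qualifies; the third line contains "GET /myfile" but not as its first quoted field, so the start-anchored pattern rejects it.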
You could also see whether using streams speeds it up. Something like this might help, although I couldn't test it because, as mentioned above, I'm not sure what pattern you are using.
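param($workingdate=(get-date).ToString("yyMMdd"))
# .NET resolves relative paths against the process directory, not the current
# PowerShell location, so convert the input path to a full path first.
$file = New-Object System.IO.StreamReader -Arg (Convert-Path "access-$workingdate.log")
$stream = New-Object System.IO.StreamWriter -Arg "D:\webStatistics\log\filtered-$workingdate.log"
try {
    # Compare against $null so that an empty line does not end the loop early.
    while ($null -ne ($line = $file.ReadLine())) {
        if ($line -match $pattern) {
            $stream.WriteLine($line)
        }
    }
}
finally {
    # Close both files even if an error occurs part-way through.
    $file.Close()
    $stream.Close()
}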