PowerShell - Search for email addresses and exclude some addresses - regex

Get-ChildItem 'D:\failed log' -Recurse |
Select-String -AllMatches '\w+@\w+\.\w+' |
Select-String -NotMatch '\w+@repoinfotec.\w+'|
Select-Object FileName
Explanation
The above pipeline finds all email addresses excluding @repoinfotec.com.
Problem
It should find all email addresses excluding both @repoinfotec.com and @bnymellon.com. How can I exclude two domains?
Should I put them in a for loop or something like that?

You don't need the Get-ChildItem cmdlet when searching a single file or a wildcard path; just pass -Path to Select-String (keep Get-ChildItem -Recurse if you need subdirectories). To exclude two domain names, use regex alternation (|):
Select-String -Path 'your_file' -Pattern '\w+@\w+\.\w+' |
Where-Object Line -NotMatch '\w+@(repoinfotec|bnymellon)\.\w+'
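
A minimal sketch of the same idea that filters each extracted address instead of the whole line, so a line holding both an excluded and a non-excluded address still reports the latter. The temporary file, its contents, and the variable names are made up for illustration.

```powershell
# Hypothetical sample data written to a temporary file (not from the original post).
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'mail-sample.log'
Set-Content -Path $sample -Value @(
    'failed: alice@example.com, bob@repoinfotec.com'
    'failed: carol@bnymellon.com'
    'failed: dave@example.org'
)

# Extract every address on every line, then drop the excluded domains.
$addresses = Select-String -Path $sample -Pattern '\w+@\w+\.\w+' -AllMatches |
    ForEach-Object { $_.Matches.Value } |
    Where-Object { $_ -notmatch '@(repoinfotec|bnymellon)\.' }

$addresses
Remove-Item $sample
```

With this sample data, $addresses should hold alice@example.com and dave@example.org. Adding a third domain only means extending the alternation group.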

Related

Regex Powershell shows too much

I am new to powershell. I am trying to automate my work a bit and need simple extraction of following pattern from all filetypes:
([0-9A-Z]{2,4}.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4})
Example:
*lots of text*
X-xdaemon-transaction-id: string=9971.0A67341C.6147B834.0043,ee=3,shh,rec=0.0,recu=0.0,reid=0.0,cu=3,cld=1
X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0,recu=0.0,reip=0.0,cu=3,cld=1
*lots of text*
Unfortunately, I am receiving output like this:
1mAAAA-0005nG-TN-H:220:
X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0,recu=0.0,reip=0.0,cu=3,cld=1
My code is as follows:
Select-String -Path C:\Samples\* -Pattern "(0001.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4})" -CaseSensitive
I'd like to receive only the matched values, e.g. AA71.0A67341C.6147B442.0043, without anything added.
Thanks for any help!
You can use
$rx = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'
Select-String -AllMatches -Pattern $rx -Path 'C:\Samples\*' -CaseSensitive | % { $_.matches.value }
That is:
Add word boundaries to match the expected strings as whole words, and escape the literal . characters.
Use -AllMatches to get multiple matches per line (if any), and access each match value with $_.Matches.Value.
PS test:
PS C:\Users\admin> $B = Select-String -AllMatches -Pattern '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b' -Path 'C:\Samples\*' -CaseSensitive | % { $_.matches.value }
PS C:\Users\admin> $B
9971.0A67341C.6147B834.0043
AA71.0A67341C.6147B442.0043
PS C:\Users\admin>
Try:
$find = Get-ChildItem *.txt | Select-String -Pattern '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b' -CaseSensitive
$find.Matches.Value
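
To see why escaping the dots matters, here is a small check on a made-up string in which the separators are the letter X instead of dots. The names $loose, $strict and $candidate are illustrative only.

```powershell
# Unescaped dots match ANY character; escaped dots match only a literal '.'.
$loose  = '\b[0-9A-Z]{2,4}.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4}\b'
$strict = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'

# Hypothetical string that is NOT a valid transaction id (X instead of dots).
$candidate = 'AA71X0A67341CX6147B442X0043'

$looseHit  = $candidate -cmatch $loose    # unescaped pattern accepts it
$strictHit = $candidate -cmatch $strict   # escaped pattern rejects it

"loose: $looseHit, strict: $strictHit"
```

The unescaped pattern accepts the bogus string, while the escaped one rejects it and still matches real ids such as AA71.0A67341C.6147B442.0043.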

Search pattern in directory and extract string from files using PowerShell

I have almost 400 .sql files in which I need to search for a specific pattern and output the results.
For example:
*file1.sql
select * from mydb.ops1_tbl from something1 <other n lines>
*file2.sql
select * from mydb.ops2_tbl from something2 <other n lines>
*file3.sql
select * from mydb.ops3_tbl ,mydb.ops4_tbl where a = b <other n lines>
Expected result
file1.sql mydb.ops1_tbl
file2.sql mydb.ops2_tbl
file3.sql mydb.ops3_tbl mydb.ops4_tbl
The PowerShell script below fetches the file name:
Get-ChildItem -Recurse -Filter *.sql|Select-String -pattern "mydb."|group path|select name
The PowerShell script below fetches the matching line:
Get-ChildItem -Recurse -Filter *.sql | Select-String -pattern "mydb." |select line
I need the output in the format shown above. Does anyone have any pointers?
You need to escape the dot with a backslash (\.) so the regex matches a literal dot.
To get all matches on a line, use the -AllMatches parameter.
You need a more specific regex that matches the mydb string up to the next space.
Iterate over the Select-String results with ForEach-Object.
As a one-liner:
Get-ChildItem -Recurse -Filter *.sql|Select-String -pattern "mydb\.[^ ]+" -Allmatches|%{$_.path+" "+($_.Matches|%{$_.value})}
Broken up:
Get-ChildItem -Recurse -Filter *.sql|
Select-String -Pattern "mydb\.[^ ]+" -Allmatches | ForEach-Object{
$_.path+" "+($_.Matches|ForEach-Object{$_.value})
}
Sample output:
Q:\Test\2019\01\24\file1.sql mydb.ops1_tbl
Q:\Test\2019\01\24\file2.sql mydb.ops2_tbl
Q:\Test\2019\01\24\file3.sql mydb.ops3_tbl mydb.ops4_tbl
If you want only the file name rather than the full path, as in your expected result (even though you are recursing), replace $_.path with (Split-Path $_.path -Leaf).
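
A self-contained sketch of the file-name variant, run against a scratch directory. The directory name, the two sample files, and the slightly stricter pattern (which also stops at a comma) are assumptions made for this example.

```powershell
# Hypothetical scratch directory with two sample files mirroring the question.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'sql-sample'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content (Join-Path $dir 'file1.sql') 'select * from mydb.ops1_tbl from something1'
Set-Content (Join-Path $dir 'file3.sql') 'select * from mydb.ops3_tbl ,mydb.ops4_tbl where a = b'

$report = Get-ChildItem -Path $dir -Recurse -Filter *.sql | Sort-Object Name |
    Select-String -Pattern 'mydb\.[^ ,]+' -AllMatches |
    ForEach-Object {
        # File name without its directory, followed by every match on the line.
        (Split-Path $_.Path -Leaf) + ' ' + ($_.Matches.Value -join ' ')
    }

$report
Remove-Item $dir -Recurse
```

With these two files, $report should contain "file1.sql mydb.ops1_tbl" and "file3.sql mydb.ops3_tbl mydb.ops4_tbl". Note that Select-String emits one result per matching line, so a file with matches on several lines produces several output rows.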
First, fetch the result of your file query into an array, then iterate over it and extract the matches from each file's contents:
$files = Get-ChildItem -Recurse -Filter *.sql | Select-String -Pattern "mydb\." | Group-Object Path | Select-Object Name
foreach ($file in $files)
{
    # Name is the grouping key, i.e. the full path of the file
    $str = Get-Content -Path $file.Name
    # Use a name other than the automatic variable $matches
    $found = ($str | Select-String -Pattern "mydb\.\w+" -AllMatches).Matches.Value
    [console]::WriteLine("{0} {1}", $file.Name, [string]::Join(' ', $found))
}
I used the .NET WriteLine method to output the result for demonstration purposes only.

match multi-line string

I am using a PowerShell command to find all *.vue files (it's a simple text format) in a directory, where I need to match this:
7,Id
6,Default
So, these are 2 consecutive lines. With Notepad++ I see CRLF at the end of the line. Following Google searches, this must be close:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse |
Select-String -Pattern "7,Id\r\n6,Default" -CaseSensitive |
Out-File C:\test.txt
But it does not find the files. I checked that I can find the first part (7,Id) correctly, and also the second part (6,Default), but the combination with the newline is not working.
Any ideas please? Maybe an alternative?
I can have a workaround but it's inefficient and a lot of coding. For example, I could use PowerShell to provide a list of only the first sentence, then process these files to see if it matches the second sentence as well. I want to avoid that.
You need to pass the content of the file as a single string, otherwise Select-String will apply the pattern to each line separately.
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse | ForEach-Object {
Get-Content $_.FullName | Out-String |
Select-String -Pattern "7,Id\r\n6,Default" -CaseSensitive |
Select-Object -Expand Matches |
Select-Object -Expand Groups |
Select-Object -Expand Value
} | Out-File C:\test.txt
On PowerShell v3 and newer you can use Get-Content -Raw instead of Get-Content | Out-String.
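
A short sketch of the -Raw variant on a made-up file. The file name and contents are hypothetical, and the pattern uses \r?\n (an assumption beyond the original answer) so that it also tolerates LF-only line endings.

```powershell
# Hypothetical sample file written with explicit CRLF line endings.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.vue'
[System.IO.File]::WriteAllText($file, "5,Name`r`n7,Id`r`n6,Default`r`n")

# -Raw returns the whole file as ONE string, so the pattern can span lines.
$found = Get-Content -Path $file -Raw |
    Select-String -Pattern '7,Id\r?\n6,Default' -CaseSensitive

# $found is a MatchInfo object on success, $null otherwise.
[bool]$found
Remove-Item $file
```

Without -Raw, Get-Content emits one string per line and the same pattern can never match, because no single line contains the line break.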
As an alternative to Select-String you could use the -cmatch operator in a Where-Object filter:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse | ForEach-Object {
Get-Content $_.FullName | Out-String | Where-Object {
$_ -cmatch "7,Id\r\n6,Default"
} | ForEach-Object {
$matches[0]
}
} | Out-File C:\test.txt
With Select-String, the -Pattern parameter is regex capable, so try this:
Get-ChildItem "D:\Wim\TM1\TI processes" -Filter *.vue -Recurse |
Select-String -Pattern "7,Id|6,Default" -CaseSensitive |
Out-File C:\test.txt
The vertical bar (|) acts as an alternation separator, in other words an "or" operator, so this pattern matches any line containing either string. Note that it does not require the two lines to be consecutive.

Powershell to find specific pattern

I am trying to extract only my JIRA issue numbers from a text file, eliminating duplicates. This works in a shell script:
cat /tmp/jira.txt | grep -oE '^[A-Z]+-[0-9]+' | sort -u
But I want to use PowerShell and tried this:
$Jira_Num=Get-Content /tmp/jira.txt | Select-String -Pattern '^[A-Z]+-[0-9]+' > "$outputDir\numbers.txt"
But this returns the entire line and does not eliminate the duplicates. I tried regex, but I am new to PowerShell and don't know exactly how to use it. Can someone help, please?
Sample Jira.txt file
PRJ-2303 Modified the artifactName
PRJ-2303 Modified comment
JIRA-1034 changed url to tag the prj projects
JIRA-1000 for release 1.1
JIRA-1000 Content modification
Expected output
PRJ-2303
JIRA-1034
JIRA-1000
Should work with something like this:
$Jira_Num = Get-Content /tmp/jira.txt | ForEach-Object {
if ($_ -match '^([A-Z]+-[0-9]+)') {
$Matches[1]
}
} | Select-Object -Unique
Get-Content reads a file line by line, so we can pipe it to other cmdlets to process each line.
ForEach-Object runs a command block for each item in the pipeline. So here we're using the -match operator to perform a regex match against the line, with a capturing group. If the match succeeds, we send the matched group (the JIRA issue key) down the pipeline.
Select-Object -Unique will compare the objects and return only the unique ones.
Select-String can still work! The problem comes from a misconception about the returned object. It returns a [Microsoft.PowerShell.Commands.MatchInfo] object, and its ToString() representation appears to be the whole matching line. I don't know what version of PowerShell you have, but this should do the trick:
$Jira_Num = Get-Content /tmp/jira.txt |
Select-String -Pattern '^[A-Z]+-[0-9]+' |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value -Unique
Also, you can get odd results when you write to an output stream and assign to a variable at the same time. It is generally better to use Tee-Object in cases like that:
Select-String /tmp/jira.txt -Pattern '^[A-Z]+-[0-9]+' |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value -Unique |
Tee-Object -Variable Jira_Num |
Set-Content "$outputDir\numbers.txt"
Now both the file $outputDir\numbers.txt and the variable $Jira_Num contain the unique list. The $ is omitted in Tee-Object -Variable Jira_Num on purpose: the parameter expects the variable's name, not its value.
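
For comparison with the shell one-liner, here is a sketch using the sample lines from the question as an in-memory array (no file is needed for the demonstration). Select-Object -Unique keeps first-seen order, as in the expected output, while Sort-Object -Unique is the closer analogue of sort -u. The variable names are illustrative.

```powershell
# Sample lines copied from the question.
$lines = @(
    'PRJ-2303 Modified the artifactName'
    'PRJ-2303 Modified comment'
    'JIRA-1034 changed url to tag the prj projects'
    'JIRA-1000 for release 1.1'
    'JIRA-1000 Content modification'
)

# Keep only the matched text, like grep -o.
$keys = $lines |
    Select-String -Pattern '^[A-Z]+-[0-9]+' -CaseSensitive |
    ForEach-Object { $_.Matches.Value }

$firstSeen = $keys | Select-Object -Unique   # order of first appearance
$sorted    = $keys | Sort-Object -Unique     # sorted, like sort -u

$firstSeen
$sorted
```

Here $firstSeen should be PRJ-2303, JIRA-1034, JIRA-1000 and $sorted should be JIRA-1000, JIRA-1034, PRJ-2303.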

PowerShell RegEx with Select-String doesn't return any rows

Using this reg-ex tester: http://myregextester.com/index.php
Indicates my regex should work:
Regex:
{name:"(\w*?)", rank:([\d]+)},
Sample Data to capture:
{name:"AARON", rank:77},
{name:"ABBEY", rank:1583},
Here's the PowerShell script I'm attempting to run to parse JSON-like data into a PowerShell grid:
$regex = '{name:"(\w*?)", rank:([\d]+)},'
(Select-String -Path EmailDomains.as -Pattern $regex -AllMatches).matches |foreach {
$obj = New-Object psobject
$obj |Add-Member -MemberType NoteProperty -Name Rank -Value $_.groups[1].value
$obj |Add-Member -MemberType NoteProperty -Name Name -Value $_.groups[0].value
$obj
} |Out-GridView -Title "Test"
The regex never seems to return values (I'm guessing it's an MS regex versus Perl regex mix-up, but I can't identify it), so I'm not sure what the issue could be. Any help is appreciated!
The question mark often has different functionality in different environments (in this one, I think it means "match the preceding character 0 or 1 times"). I doubt that it is the same as Perl's. Instead of
"(\w*?)"
Try:
"([^"]*)"
Your expression:
(Select-String -Path EmailDomains.as -Pattern $regex -AllMatches)
returns an array of MatchInfo objects. The array itself does not have a Matches property.
What you have to do is expand the Matches property using the Select-Object cmdlet, then pass that along your pipeline:
Select-String -Path EmailDomains.as -Pattern $regex -AllMatches | select-object -expand Matches | foreach {
I don't think your regex is the problem. Matches is a property on each of the objects returned by Select-String, not on the collection of objects returned.
$regex = '{name:"(\w*?)", rank:([\d]+)},'
$matches = (Select-String -Path .\a.txt -Pattern $regex)
$matches | Select -ExpandProperty Matches | Select @{n="Name";e={$_.Groups[1].Value}}, @{n="Rank";e={$_.Groups[2].Value}}
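
A variant sketch using named capture groups and [pscustomobject] (PowerShell v3 and newer), run on the two sample lines from the question as an in-memory array. The group names, the escaped braces, and the variable names are choices made for this example, not part of the original answers.

```powershell
# Named groups make the group-to-property mapping explicit.
$regex = '\{name:"(?<Name>\w*?)", rank:(?<Rank>\d+)\},'

# Sample data copied from the question.
$data = @(
    '{name:"AARON", rank:77},'
    '{name:"ABBEY", rank:1583},'
)

$rows = $data |
    Select-String -Pattern $regex -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object {
        [pscustomobject]@{
            Name = $_.Groups['Name'].Value
            Rank = [int]$_.Groups['Rank'].Value
        }
    }

$rows
```

The resulting objects can be piped to Out-GridView just like the ones built with Add-Member in the question. Note that Groups[0] is always the entire match, so the captured name and rank are groups 1 and 2 when indexing by number.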