Regex Powershell shows too much - regex

I am new to PowerShell. I am trying to automate my work a bit and need to extract the following pattern from files of all types:
([0-9A-Z]{2,4}.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4})
Example:
*lots of text*
X-xdaemon-transaction-id: string=9971.0A67341C.6147B834.0043,ee=3,shh,rec=0.0,recu=0.0,reid=0.0,cu=3,cld=1
X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0,recu=0.0,reip=0.0,cu=3,cld=1
*lots of text*
Unfortunately, I am receiving output like this:
1mAAAA-0005nG-TN-H:220:
X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0,recu=0.0,reip=0.0,cu=3,cld=1
My code is as follows:
Select-String -Path C:\Samples\* -Pattern "(0001.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4})" -CaseSensitive
I'd like to get only the matched values, e.g. AA71.0A67341C.6147B442.0043, with nothing else added.
Thanks for any help!

You can use
$rx = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'
Select-String -AllMatches -Pattern $rx -Path 'C:\Samples\*' -CaseSensitive | % { $_.matches.value }
That is,
Add word boundaries to match your expected strings as whole words, and escape the literal . chars.
Use -AllMatches (to get multiple matches per line, if any) and access each resulting match value with $_.matches.value.
PS test:
PS C:\Users\admin> $B = Select-String -AllMatches -Pattern '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b' -Path 'C:\Samples\*' -CaseSensitive | % { $_.matches.value }
PS C:\Users\admin> $B
9971.0A67341C.6147B834.0043
AA71.0A67341C.6147B442.0043
PS C:\Users\admin>
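Why the original command "shows too much": Select-String emits MatchInfo objects, and their default display is the whole matching line (prefixed with file name and line number when searching files). The matched text itself lives in the Matches property. A minimal sketch on an in-memory string, using a shortened sample line from the question (the variable names are only for illustration):

```powershell
# Shortened sample line from the question; nothing is read from disk here.
$line = 'X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh'
$rx   = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'

# Select-String returns a MatchInfo object, not the matched text.
$info = $line | Select-String -Pattern $rx -AllMatches -CaseSensitive

# .Line is the full input line (what the default output shows).
$wholeLine = $info.Line

# .Matches holds one Match object per hit; .Value is the matched text only.
$ids = $info.Matches | ForEach-Object { $_.Value }
$ids
```

Here $wholeLine is the complete header line, while $ids contains only the transaction id.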

Try:
$find = Get-ChildItem *.txt | Select-String -AllMatches -Pattern '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b' -CaseSensitive
$find.Matches.Value

Related

Search pattern in directory and extract string from files using PowerShell

I have almost 400 .sql files where i need to search for a specific pattern and output the results.
e.g
*file1.sql
select * from mydb.ops1_tbl from something1 <other n lines>
*file2.sql
select * from mydb.ops2_tbl from something2 <other n lines>
*file3.sql
select * from mydb.ops3_tbl ,mydb.ops4_tbl where a = b <other n lines>
Expected result
file1.sql mydb.ops1_tbl
file2.sql mydb.ops2_tbl
file3.sql mydb.ops3_tbl mydb.ops4_tbl
Below script in powershell - able to fetch the filename
Get-ChildItem -Recurse -Filter *.sql|Select-String -pattern "mydb."|group path|select name
Below script in powershell - able to fetch the line
Get-ChildItem -Recurse -Filter *.sql | Select-String -pattern "mydb." |select line
I need in the above format, someone has any pointers regarding this?
You need to escape the dot with a backslash (\.) so that the regex matches a literal dot.
To get all matches on a line, use the parameter -AllMatches.
You need a better regex that matches the mydb string up to the next space.
Iterate over the Select-String results with ForEach-Object.
A one liner:
Get-ChildItem -Recurse -Filter *.sql|Select-String -pattern "mydb\.[^ ]+" -Allmatches|%{$_.path+" "+($_.Matches|%{$_.value})}
broken up
Get-ChildItem -Recurse -Filter *.sql|
Select-String -Pattern "mydb\.[^ ]+" -Allmatches | ForEach-Object{
$_.path+" "+($_.Matches|ForEach-Object{$_.value})
}
Sample output:
Q:\Test\2019\01\24\file1.sql mydb.ops1_tbl
Q:\Test\2019\01\24\file2.sql mydb.ops2_tbl
Q:\Test\2019\01\24\file3.sql mydb.ops3_tbl mydb.ops4_tbl
If you don't want the full path (even though you are recursing), as in your expected result,
replace $_.path with (Split-Path $_.path -Leaf)
First, fetch the result of your file query into an array, then iterate over it and extract the file contents using regex matching:
$files = Get-ChildItem -Recurse -Filter *.sql|Select-String -pattern "mydb\."|group path|select name
foreach ($file in $files)
{
$str = Get-Content -Path $file.Name
# use $found rather than $matches, which would clash with the automatic $Matches variable
$found = ($str | select-string -pattern "mydb\.\w+" -AllMatches).Matches.Value
[console]::writeline("{0} {1}", $file.Name, [string]::Join(' ', $found) )
}
I used the .NET WriteLine function to output the result for demonstration purposes only.
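The two answers use slightly different patterns: "mydb\.[^ ]+" runs up to the next space, while "mydb\.\w+" stops at the first non-word character. A small sketch of the difference, using a hypothetical SQL line (not from the question) where a comma directly follows a table name:

```powershell
# Hypothetical input line, chosen so that a comma directly follows a table name.
$sql = 'select a from mydb.ops1_tbl, mydb.ops2_tbl where a = b'

# Pattern from the first answer: everything up to the next space.
$bySpace = @($sql | Select-String -Pattern 'mydb\.[^ ]+' -AllMatches |
    ForEach-Object { $_.Matches } | ForEach-Object { $_.Value })

# Pattern from the second answer: word characters only ([A-Za-z0-9_]).
$byWord = @($sql | Select-String -Pattern 'mydb\.\w+' -AllMatches |
    ForEach-Object { $_.Matches } | ForEach-Object { $_.Value })

$bySpace   # first element keeps the trailing comma
$byWord    # table names only
```

Which one fits depends on how the table names are delimited in the actual files.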

powershell regex Pattern

I get different results.
In Powershell using:
$Matches = Select-String -InputObject (Get-Content "StevenBlackhosts-urls.txt") `
-Pattern "(^|\.)ad[sxvkdz]\-" -AllMatches #`
$Matches.Matches.Count
I get 12 matches and this is incorrect.
In Notepad++, find and count
"(^|\.)ad[sxvkdz]\-"
I have 62 matches and this is correct.
I do not know what's wrong?
the txt "StevenBlackhosts-urls.txt" contains 65106 lines ...
zeus.ad.intl.xiaomi.com
api.ad.intl.xiaomi.com
sdkconfig.ad.intl.xiaomi.com
adv.sec.intl.miui.com
zeus.ad.xiaomi.com
www.api.ad.intl.xiaomi.com
ampmetrics.engadget.com
c.adskeeper.co.uk
events3.adcolony.com
metrics.adage.com
ads.feedly.com
lepodownload.mediatek.com
ads.aerserv.com
ads.mp.mydas.mobi
ads.nexage.com
sdk.adincube.com
dasdada.fu.ck
i1.dl-ad.com
ad.api.kaffnet.com
ad.click.kaffnet.com
api.ad.snappea.com
etc..
Testing it this way, I get the same result as Notepad++. Why does this happen?
$Matches = Select-String -InputObject (Get-Content "StevenBlackhosts-urls.txt") -Pattern "( |\.)ad[sxvkdz]\-" -AllMatches
$Matches.Matches.Count
It also works well like this, giving me the 62 lines
Get-Content "StevenBlackhosts-urls.txt" | Select-String -Pattern "(^|\.)ad[sxvkdz]\-" -AllMatches | set-content "test.txt"
Get-Content "StevenBlackhosts-urls.txt" | Select-String -Pattern "(^|\.)ad[sxvkdz]\-" -AllMatches | Measure-Object -Line | Select-Object -ExpandProperty Lines
this works right for me ..
thank you all for your suggestions . .
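A likely explanation for the difference: when a collection is passed through -InputObject, Select-String treats it as one combined string (the lines joined with spaces), so ^ can only match once, at the very start. Piping the lines instead makes Select-String test each line separately. A minimal sketch with three made-up host names (hypothetical data, not taken from the hosts file):

```powershell
# Three hypothetical host names; each one should match the pattern on its own line.
$lines   = 'ads-one.example', 'x.adv-two.example', 'adx-three.example'
$pattern = '(^|\.)ad[sxvkdz]-'

# -InputObject with an array: searched as a single space-joined string,
# so ^ matches only before the first host name.
$combined = (Select-String -InputObject $lines -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } | Measure-Object).Count

# Pipeline input: every line is searched separately, so ^ applies per line.
$perLine = ($lines | Select-String -Pattern $pattern -AllMatches |
    ForEach-Object { $_.Matches } | Measure-Object).Count

"InputObject: $combined match(es), pipeline: $perLine match(es)"
```

This would also explain why replacing ^ with a space in the pattern gave the expected count with -InputObject.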

How to recursively scrape email addresses from files with Powershell?

I am trying to scrape email addresses with PowerShell from a directory that has subdirectories containing .txt files. So I have this code:
$input_path = 'C:\Users\Me\Documents\toscrape'
$output_file = 'C:\Users\Me\Documents\toscrape\output.txt'
$regex = '\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file
But when I execute it, it gives me an error
select-string : The file C:\Users\Me\Documents\toscrape\ can not be read: Could not
path 'C:\Users\Me\Documents\toscrape\'.
At line:1 char:1
+ select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Select-String], ArgumentException
+ FullyQualifiedErrorId : ProcessingFile,Microsoft.PowerShell.Commands.SelectStringCommand
I've tried variations to the $input_path, with Get-Item, Get-ChildItem, -Recurse, but nothing seems to work. Can anyone figure out how I need to scrape my location and all its subdirectories and files for the regex pattern?
The error is because Select-String assumes the -Path points to a file or is a wildcard pattern, and $input_path is pointing to a folder. You could use:
$input_path = 'C:\Users\Me\Documents\toscrape\*.txt'
Select-String $input_path ....
However, since you want to recurse through subdirectories, you'll need to use Get-ChildItem to do that.
$input_path = 'C:\Users\Me\Documents\toscrape'
$output_file = 'C:\Users\Me\Documents\toscrape\output.txt'
$regex = '\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b'
Get-ChildItem $input_path -Include *.txt -Recurse |
Select-String -Pattern $regex -AllMatches |
Select-Object -ExpandProperty Matches |
Select-Object -ExpandProperty Value |
Set-Content $output_file
Note that your regex may cause problems here. You're using \b for word boundary, but period ., hyphen -, and percent sign % are all non-word (\W) characters. The word characters (\w) are [A-Za-z0-9_].
For example:
PS C:\> '%username@example.com' -match '\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b'
True
PS C:\> $Matches.Values
username@example.com
If that's what you want the pattern to do, that's great, but it is something to be aware of. Regex for an email address is notoriously difficult.
Your correction didn't work but gave me another error, @Bacon Bits. However, you put me on the right track. I adapted it a bit and this seems to work for me.
$input_path = 'C:\Users\Me\Documents\toscrape'
$output_file = 'C:\Users\Me\Documents\toscrape\output.txt'
$regex = '\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b'
Get-ChildItem $input_path -Recurse | Select-String -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file

Print Powershell Regex captures to an output file

I have a file, input.txt, containing text like this:
GRP123456789
123456789012
GRP234567890
234567890123
GRP456789012
"A lot of text. More text. Blah blah blah: Foobar." (Source Error) (Blah blah blah)
GRP567890123
Source Error
GRP678901234
Source Error
GRP789012345
345678901234
456789012345
I'm attempting to capture all occurrences of "GRP#########" on the condition that at least one number is on the next line.
So GRP123456789 is valid, but GRP456789012 and GRP678901234 are not.
The RegEx pattern I came up with on http://regexstorm.net/tester is: (GRP[0-9]{9})\s\n\s+[0-9]
The PowerShell script I have so far, based off this site http://techtalk.gfi.com/windows-powershell-extracting-strings-using-regular-expressions/, is:
$input_path = 'C:\Users\rtaite\Desktop\input.txt'
$output_file = 'C:\Users\rtaite\Desktop\output.txt'
$regex = '(GRP[0-9]{9})\s\n\s+[0-9]'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Values } > $output_file
I'm not getting any output, and I'm not sure why.
Any help with this would be appreciated as I'm just trying to understand this better.
You need to turn the text input into a single string before passing it to Select-String, otherwise the cmdlet will operate on each line individually and thus never find a match.
Get-Content $input_path | Out-String |
Select-String $regex -AllMatches |
Select-Object -Expand Matches |
ForEach-Object { $_.Groups[1].Value } |
Set-Content $output_file
If you're using PowerShell v3 or newer you can replace Get-Content | Out-String with Get-Content -Raw.
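To see the effect without touching a file, the same idea can be tried on a single multi-line string, which is what Get-Content -Raw would return. This is a sketch under assumptions: the sample text is abbreviated from the question, and the pattern is simplified to '\r?\n\s*' so it tolerates both line-ending styles and does not require leading whitespace on the next line:

```powershell
# One string with embedded newlines, standing in for Get-Content -Raw output.
$text = "GRP123456789`n123456789012`nGRP456789012`nSource Error`nGRP789012345`n345678901234`n"

# Simplified variant of the asker's regex (an assumption, not the original pattern):
# a GRP id, a line break, optional whitespace, then a digit.
$regex = '(GRP[0-9]{9})\r?\n\s*[0-9]'

$ids = @($text | Select-String -Pattern $regex -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value })   # capture group 1 = the GRP id only

$ids
```

Because the whole text arrives as one string, the regex can see the line break and the following line; with line-by-line input it never could.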
To extract strings from a text file using a pattern, the best tool for the job is Select-String. It also has a parameter called -Context, which lets you capture lines before or after the matched line, ideal for just this problem.
So my solution would be something like this:
Select-String 'input.txt' -Pattern '^GRP[0-9]{9}' -Context 0, 1 | ? {
$_.Context.PostContext -match '\d'
} | Select -ExpandProperty line | Set-Content 'output_file.txt'
Using
[regex]::Matches($(Get-Content '.\Desktop\new 1.txt'), "GRP\d+(?=\s+\d)") |
% { $_.value | Out-File .\Desktop\new-1-matches.txt -Append }
I achieved the following output from your sample file:
GRP123456789
GRP234567890
GRP789012345

Remove lines which start with * (asterisk) in PowerShell Select-String output

I am working on code that finds lines which contain $control, but it should remove lines which start with * in the first column.
I am using the following, but it doesn't seem to work:
$result = Get-Content $file.fullName | Select-String $control | Select-String -pattern "\^*" -notmatch
Thanks in advance
You're escaping the wrong character. You do not want to escape ^, as that's your anchor for "starting with". You'll want to escape the asterisk, so try this:
$result = Get-Content $file.fullName | Select-String $control | select-string -pattern "^\*" -notmatch
Also, if all you want is the lines, you could also use this:
Get-Content $file.fullName | ? { $_ -match $control -and $_ -notmatch '^\*'}
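A quick way to check the filter is to run it over a few in-memory lines. The sample lines and the value of $control below are hypothetical. Note that -match treats $control as a regex; if it is meant as literal text, wrapping it in [regex]::Escape() is a reasonable precaution:

```powershell
# Hypothetical search term and sample lines.
$control = 'CTRL'
$lines = 'CTRL one', '* CTRL commented out', 'other text', '  CTRL * two'

# Keep lines that contain $control but do not start with a literal asterisk.
# '^\*' = anchor at start of line, followed by an escaped (literal) *.
$result = @($lines | Where-Object {
    $_ -match [regex]::Escape($control) -and $_ -notmatch '^\*'
})

$result
```

Only the asterisk in column one excludes a line; an asterisk later in the line does not.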