I have almost 400 .sql files in which I need to search for a specific pattern and output the results.
For example:
*file1.sql
select * from mydb.ops1_tbl from something1 <other n lines>
*file2.sql
select * from mydb.ops2_tbl from something2 <other n lines>
*file3.sql
select * from mydb.ops3_tbl ,mydb.ops4_tbl where a = b <other n lines>
Expected result
file1.sql mydb.ops1_tbl
file2.sql mydb.ops2_tbl
file3.sql mydb.ops3_tbl mydb.ops4_tbl
The PowerShell script below fetches the file names:
Get-ChildItem -Recurse -Filter *.sql|Select-String -pattern "mydb."|group path|select name
The PowerShell script below fetches the matching lines:
Get-ChildItem -Recurse -Filter *.sql | Select-String -pattern "mydb." |select line
I need the output in the format shown above. Does anyone have any pointers?
You need to escape the dot with a backslash (\.) so that the regex matches a literal dot.
To get all matches on a line, use the -AllMatches parameter.
You need a more specific regex that matches mydb. and everything up to the next space.
Iterate over the Select-String results with ForEach-Object.
As a one-liner:
Get-ChildItem -Recurse -Filter *.sql|Select-String -pattern "mydb\.[^ ]+" -Allmatches|%{$_.path+" "+($_.Matches|%{$_.value})}
Broken up over several lines:
Get-ChildItem -Recurse -Filter *.sql|
    Select-String -Pattern "mydb\.[^ ]+" -Allmatches | ForEach-Object{
        $_.path+" "+($_.Matches|ForEach-Object{$_.value})
    }
Sample output:
Q:\Test\2019\01\24\file1.sql mydb.ops1_tbl
Q:\Test\2019\01\24\file2.sql mydb.ops2_tbl
Q:\Test\2019\01\24\file3.sql mydb.ops3_tbl mydb.ops4_tbl
If you want only the file name, as in your expected result, rather than the full path (bear in mind that you are recursing, so the same name may occur in several folders),
replace $_.path with (Split-Path $_.path -Leaf)
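A minimal sketch of the leaf-name variant, run against throwaway files so it can be tried without touching real data. The scratch folder and the two sample files are assumptions made up for the demo; the pattern is the one from the answer above, and -join is used only to make the space-separated output explicit.

```powershell
# Build a scratch folder with two sample .sql files (hypothetical test data).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("sqldemo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'file1.sql') -Value 'select * from mydb.ops1_tbl from something1'
Set-Content -Path (Join-Path $dir 'file3.sql') -Value 'select * from mydb.ops3_tbl ,mydb.ops4_tbl where a = b'

# One output line per matching line: "<file name> <match> <match> ..."
$result = Get-ChildItem -Path $dir -Recurse -Filter *.sql |
    Sort-Object Name |
    Select-String -Pattern 'mydb\.[^ ]+' -AllMatches |
    ForEach-Object {
        (Split-Path $_.Path -Leaf) + ' ' + ($_.Matches.Value -join ' ')
    }

$result

# Clean up the scratch folder.
Remove-Item -Path $dir -Recurse -Force
```

Note that a file with several matching lines produces several output lines, one per line; group on the path first if you need exactly one line per file.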
First, fetch the result of your file query into an array, then iterate over it and extract the matches from each file's contents with a regex:
$files = Get-ChildItem -Recurse -Filter *.sql | Select-String -Pattern "mydb\." | Group-Object Path | Select-Object Name
foreach ($file in $files)
{
    $str = Get-Content -Path $file.Name
    # $matches is an automatic variable in PowerShell, so use a different name
    $found = ($str | Select-String -Pattern "mydb\.\w+" -AllMatches).Matches.Value
    [console]::WriteLine("{0} {1}", $file.Name, [string]::Join(' ', $found))
}
I used the .NET WriteLine function to output the result for demonstration purposes only.
I am new to PowerShell. I am trying to automate my work a bit and need to extract the following pattern from files of any type:
([0-9A-Z]{2,4}.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4})
Example:
*lots of text*
X-xdaemon-transaction-id: string=9971.0A67341C.6147B834.0043,ee=3,shh,rec=0.0,recu=0.0,reid=0.0,cu=3,cld=1
X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0,recu=0.0,reip=0.0,cu=3,cld=1
*lots of text*
Unfortunately, I am receiving output like this:
1mAAAA-0005nG-TN-H:220:
X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0,recu=0.0,reip=0.0,cu=3,cld=1
My code is as follows:
Select-String -Path C:\Samples\* -Pattern "(0001.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4})" -CaseSensitive
I'd like to receive only the matched strings, e.g. AA71.0A67341C.6147B442.0043, with nothing else added.
Thanks for any help!
You can use
$rx = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'
Select-String -AllMatches -Pattern $rx -Path 'C:\Samples\*' -CaseSensitive | % { $_.matches.value }
That is,
Add word boundaries (\b) to match your expected strings as whole words, and escape the literal . characters.
Use -AllMatches (to get multiple matches per line, if any) and read each match's value from the resulting objects with $_.matches.value.
PS test:
PS C:\Users\admin> $B = Select-String -AllMatches -Pattern '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b' -Path 'C:\Samples\*' -CaseSensitive | % { $_.matches.value }
PS C:\Users\admin> $B
9971.0A67341C.6147B834.0043
AA71.0A67341C.6147B442.0043
PS C:\Users\admin>
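The same pattern can be checked without any files by running it over an in-memory string. This is only a sketch: the sample text is shortened from the question, and $text, $loose and $decoy are names invented for the demo. It also shows why the dots need escaping.

```powershell
# Sample text shortened from the question; the variable names are illustrative.
$text = @'
X-xdaemon-transaction-id: string=9971.0A67341C.6147B834.0043,ee=3,shh,rec=0.0
X-xdaemon-transaction-id: string=AA71.0A67341C.6147B442.0043,ee=3,shh,rec=0.0
'@

$rx = '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b'

# [regex]::Matches is case-sensitive by default, like Select-String -CaseSensitive.
$ids = [regex]::Matches($text, $rx) | ForEach-Object { $_.Value }
$ids

# Same pattern with unescaped dots: each . now matches any character.
$loose = '\b[0-9A-Z]{2,4}.[0-9A-Z]{8}.[0-9A-Z]{8}.[0-9A-Z]{4}\b'
$decoy = 'AA71-0A67341C-6147B442-0043'
$looseHit  = $decoy -cmatch $loose   # True: the dashes satisfy the unescaped dots
$strictHit = $decoy -cmatch $rx      # False: a literal . is required
$looseHit, $strictHit
```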
Try:
$find = Get-ChildItem *.txt | Select-String -Pattern '\b[0-9A-Z]{2,4}\.[0-9A-Z]{8}\.[0-9A-Z]{8}\.[0-9A-Z]{4}\b' -CaseSensitive -AllMatches
$find.Matches.Value
Get-ChildItem 'D:\failed log' -Recurse |
    Select-String -AllMatches '\w+@\w+\.\w+' |
    Select-String -NotMatch '\w+@repoinfotec.\w+'|
    Select-Object FileName
Explanation
The pipeline above finds all email addresses except those at @repoinfotec.com.
Problem
It should find all email addresses except those at @repoinfotec.com and @bnymellon.com. How do I exclude two domains?
Should I put them in a for loop or something like that?
You don't need the Get-ChildItem cmdlet here; just pass the -Path to the Select-String cmdlet. To exclude two domain names, use regex alternation (|):
select-string -Path 'your_file' -Pattern '\w+@\w+\.\w+' |
    where Line -NotMatch '\w+@(repoinfotec|bnymellon)\.\w+'
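A small sketch of the alternation filter on in-memory lines, so the effect is visible without a log file. The sample lines and the names $lines and $kept are made up for the demo; only the two excluded domains come from the question.

```powershell
# Hypothetical log lines; only the two domain names come from the question.
$lines = @(
    'sent to alice@example.org',
    'sent to bob@repoinfotec.com',
    'sent to carol@bnymellon.com',
    'no address here'
)

$kept = $lines |
    Select-String -Pattern '\w+@\w+\.\w+' |                                      # lines with an address
    Where-Object { $_.Line -notmatch '\w+@(repoinfotec|bnymellon)\.\w+' } |      # drop both domains
    ForEach-Object { $_.Line }

$kept
```

Be aware that the filter works on whole lines: a line containing both an excluded and a non-excluded address is dropped entirely.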
I want to search through the files in a folder, find those that contain two given strings, and write the results to a file. I want to find the combination of the 2 strings no matter how it is written in the file, even if a carriage return sits between them.
Here's the code I have so far:
$Path = "C:\Promotion\Scripts"
$txt_string1 = "CREATE"
$txt_string2 = "PROC"
$PathArray = @()
$Results = "C:\Promotion\Errors\Deployment_Errors.txt"
# This code snippet gets all the files in $Path that end in ".sql".
Get-ChildItem $Path -Filter "*.sql" |
Where-Object { $_.Attributes -ne "Directory"} |
ForEach-Object {
If (Get-Content $_.FullName | Select-String -Pattern $txt_string2) {
$PathArray += $_.FullName
}
}
$PathArray | ForEach-Object {$_} | Out-File $Results
To find more than one string in a text file, use a pattern like this:
"hello","guy","hello guy" | Select-String -Pattern '(hello.*guy)|(guy.*hello)'
The result:
hello guy
Once you have found the strings you want, pipe them to Out-File,
like this:
"hello","guy","hello guy" | Select-String -Pattern '(hello.*guy)|(guy.*hello)' | Out-File -FilePath c:\test.txt
Now test.txt contains:
PS C:\> Get-Content test.txt
hello guy
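You can do this without loops. Define the combinations of your two search terms as alternatives in a regular expression with the multiline and single-line options enabled ((?ms)); the s option is what lets . match line breaks, and Get-Content -Raw hands the whole file to the regex as one string.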
$basepath = 'C:\Promotion\Scripts'
$results = 'C:\Promotion\Errors\Deployment_Errors.txt'
$term1 = 'CREATE'
$term2 = 'PROC'
$pattern = "(?ms)($term1.*$term2|$term2.*$term1)"
Get-ChildItem "$basepath\*.sql" |
? { Get-Content $_.FullName -Raw | Select-String -Pattern $pattern } |
select -Unique -Expand FullName |
Out-File $results
Note that this will report any file that contains both terms anywhere in it, no matter what other text is between them. If you want to find only files that contain combinations of the two terms either not separated (CREATEPROC or PROCCREATE) or separated by nothing but whitespace, change the pattern to this:
$pattern = "(?ms)($term1\s*$term2|$term2\s*$term1)"
Depending on your search terms it may also be a good idea to escape them before building the regular expression, so that you don't get unwanted meta characters (not likely with the two string literals you have, but just to be on the safe side):
$term1 = [regex]::Escape('CREATE')
$term2 = [regex]::Escape('PROC')
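To see how the two patterns differ, here is a sketch that runs them against in-memory strings instead of files. The sample strings and the names $anyGap, $spaceGap, $split and $spread are assumptions for the demo; the patterns follow the ones above, keeping only the (?s) option since that is the one that affects the dot.

```powershell
# Escaping turns regex metacharacters into literals; plain words pass through unchanged.
[regex]::Escape('a.b')      # a\.b
$term1 = [regex]::Escape('CREATE')
$term2 = [regex]::Escape('PROC')

$anyGap   = "(?s)($term1.*$term2|$term2.*$term1)"      # anything in between, line breaks included
$spaceGap = "(?s)($term1\s*$term2|$term2\s*$term1)"    # only whitespace in between

# Hypothetical file contents, as Get-Content -Raw would return them.
$split  = "CREATE`r`nPROC dbo.demo"
$spread = "CREATE`nOR ALTER`nPROC dbo.demo"

$r1 = $split  -match $spaceGap          # True
$r2 = $spread -match $spaceGap          # False: 'OR ALTER' is not whitespace
$r3 = $spread -match $anyGap            # True
$r4 = $spread -match 'CREATE.*PROC'     # False: without (?s), . stops at the line break
$r1, $r2, $r3, $r4
```

Keep in mind that -match is case-insensitive by default; use -cmatch if the terms must match in upper case only.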
I need to search the file names in a directory for characters at certain positions. I am looking for files whose names have parentheses within parentheses,
like this:
# 2262281102-03_Cutting_Plate_Lower_Stop_(Anschlag_Cutting_Frame_(Schnittgestell)_unten)_400kN
GET-CHILDITEM C:\BU\p -recurse | WHERE-OBJECT {$_.nAME -MATCH "(?!)((?!)((!?))(!?))(!?)"}
I also need to match any file with 4+ letters and no parentheses, i.e.:
# 2277131504-03_Haltebolzen_platte
GET-CHILDITEM C:\BU\p -EXCLUDE "*)*" -recurse | WHERE-OBJECT {$_.nAME -MATCH "\W\.[^\W]"}
I've got this:
$tests = @(
    '2262281102-03_Cutting_Plate_Lower_Stop_(Anschlag_Cutting_Frame_(Schnittgestell)_unten)_400kN',
    '2277131504-03_Haltebolzen_platte'
)
$regex = '^.*\(.*\(.*\).*\).*$|^[^()]*[a-z]{4}[^()]*$'
$tests -match $regex
2262281102-03_Cutting_Plate_Lower_Stop_(Anschlag_Cutting_Frame_(Schnittgestell)_unten)_400kN
2277131504-03_Haltebolzen_platte
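The regex above is two alternatives joined with |. A sketch that tests each half separately may make it easier to adapt; the shortened file names and the names $nested, $noParens and $report are invented for the demo.

```powershell
$nested   = '^.*\(.*\(.*\).*\).*$'     # "(" ... "(" ... ")" ... ")" in that order
$noParens = '^[^()]*[a-z]{4}[^()]*$'   # no parentheses at all, and 4 letters in a row

# Hypothetical names: nested parentheses, no parentheses, a single pair.
$names = @(
    'Stop_(Anschlag_(Schnittgestell)_unten)_400kN',
    '2277131504-03_Haltebolzen_platte',
    '123_(ab)'
)

$report = foreach ($n in $names) {
    [pscustomobject]@{
        Name     = $n
        Nested   = $n -match $nested
        NoParens = $n -match $noParens
    }
}
$report | Format-Table -AutoSize
```

The first name matches only $nested, the second only $noParens, and the third neither. The first half checks the order of the parentheses rather than true nesting, so a name like (a)(b)(c) also matches it; [a-z] matches upper-case letters too because -match ignores case.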
I have a list of regular expressions (about 2000) and over a million HTML files. I want to check, for every file, whether each regular expression matches or not. How can I do this in PowerShell?
Performance is important, so I don't want to loop through the regular expressions.
I tried
$text | Select-String -Pattern pattern1, pattern2,...
and it returns all matches, but I also want to find out which patterns matched and which did not. I need to build a list of the matching regular expressions for each file.
You could try something like this:
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
Get-ChildItem -Filter *.txt | Select-String -Pattern $regex | ForEach-Object {
    $ht[$_.Path] += @($_ | Select-Object -ExpandProperty Pattern)
}
Test-output:
$ht | Format-Table -AutoSize
Name                                                 Value
----                                                 -----
C:\Users\graimer\Desktop\New Text Document (2).txt   {e2$}
C:\Users\graimer\Desktop\New Text Document.txt       {^test, e2$}
You didn't specify how you wanted the output.
UPDATE: To match multiple patterns on a single line, try this (mjolinor's answer is probably faster than this).
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
$regex | ForEach-Object {
    $pattern = $_
    Get-ChildItem -Filter *.txt | Select-String -Pattern $pattern | ForEach-Object {
        $ht[$_.Path] += @($_ | Select-Object -ExpandProperty Pattern)
    }
}
UPDATE2: I don't have enough samples to try it, but since you have such a huge number of files, you might want to try reading each file into memory before looping through the patterns. It may be faster.
$regex = "^test","e2$" #Or use (Get-Content <path to your regex file>)
$ht = @{}
#Modify Get-ChildItem to your criteria (filter, path, recurse etc.)
Get-ChildItem -Filter *.txt | ForEach-Object {
    $text = $_ | Get-Content
    $filename = $_.FullName
    $regex | ForEach-Object {
        $text | Select-String -Pattern $_ | ForEach-Object {
            $ht[$filename] += @($_ | Select-Object -ExpandProperty Pattern)
        }
    }
}
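A variation on the read-once idea, sketched with precompiled [regex] objects in place of Select-String. This is a swapped-in technique rather than what the code above does, and it is untested at the scale in the question. Assumptions: PowerShell 5 or later (for [regex]::new), the Multiline option so that ^ and $ still apply per line, IgnoreCase to mimic Select-String's default, and scratch files plus the names $regexes and $hits invented for the demo. The file name is used as the key here only to keep the output short.

```powershell
# Scratch files for the demo (hypothetical data).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("rxdemo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a.txt') -Value 'test line', 'ends with e2'
Set-Content -Path (Join-Path $dir 'b.txt') -Value 'nothing', 'ends with e2'

# Compile each pattern once instead of once per file.
$options = [System.Text.RegularExpressions.RegexOptions]'Multiline, IgnoreCase'
$regexes = '^test', 'e2$' | ForEach-Object { [regex]::new($_, $options) }

$ht = @{}
Get-ChildItem -Path $dir -Filter *.txt | ForEach-Object {
    $file = $_
    # Join lines with "`n" only, because $ in Multiline mode does not match before "`r`n".
    $text = [string]::Join("`n", [System.IO.File]::ReadAllLines($file.FullName))
    # Keep the pattern text of every regex that matches somewhere in the file.
    $hits = @($regexes | Where-Object { $_.IsMatch($text) } | ForEach-Object { $_.ToString() })
    if ($hits.Count -gt 0) { $ht[$file.Name] = $hits }
}

$ht
Remove-Item -Path $dir -Recurse -Force
```

IsMatch stops at the first hit, so each pattern costs at most one scan of the file; whether that beats Select-String for 2000 patterns is something to measure on your own data.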
I don't see any way around doing a foreach through the regex collection.
This is the best I could come up with performance-wise:
$regexes = 'pattern1','pattern2'
$files = get-childitem -Path <file path> |
select -ExpandProperty fullname
$ht = @{}
foreach ($file in $files)
{
$ht[$file] = New-Object collections.arraylist
foreach ($regex in $regexes)
{
if (select-string $regex $file -Quiet)
{
[void]$ht[$file].add($regex)
}
}
}
$ht
You could speed up the process by using background jobs and dividing up the file collection among the jobs.