search for certain words in a string - regex

I'm struggling with a simple report script.
For example:
$report.FullFormattedMessage = "This is the deployment test for Servername for Stackoverflow in datacenter onTV"
$report.FullFormattedMessage.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
Now I want to pick out certain words, for example:
$dc should be 'onTV'
$srv should be 'Servername'
Something like $srv = $report.FullFormattedMessage.Contains(...), or -match?
The .Split()[] trick does not work for me because the format of $report varies from time to time. How could I do that?

Oh OK, I found a solution. I don't know if it is the best one, but here it is:
$lines = $report.FullFormattedMessage.Split()
# -like uses wildcards, so 'onTV*' means "starts with onTV";
# with -match, '*' would be a regex quantifier instead of a wildcard
$dc = $lines | Where-Object { $_ -like 'onTV*' }
$srv = $lines | Where-Object { $_ -like 'Server*' }

I'd probably do something like
$report.FullFormattedMessage = "This is the deployment test for Servername for Stackoverflow in datacenter onTV"
$dc = if ($report.FullFormattedMessage -match '\b(onTV)\b') { $Matches[1] } # see details 1
$srv = if ($report.FullFormattedMessage -match '\b(Server[\w]*)') { $Matches[1] } # see details 2
# $dc --> "onTV"
# $srv --> "Servername"
Regex details 1
\b Assert position at a word boundary
( Match the regular expression below and capture its match into backreference number 1
onTV Match the characters “onTV” literally
)
\b Assert position at a word boundary
Regex details 2
\b Assert position at a word boundary
( Match the regular expression below and capture its match into backreference number 1
Server Match the characters “Server” literally
[\w] Match a single character that is a “word character” (letters, digits, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
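If you would rather not rely on the automatic $Matches variable, the same two patterns can be run through the .NET [regex] class directly. This is a minimal alternative sketch, not part of the answer above; note that, unlike PowerShell's -match, [regex]::Match is case-sensitive by default, so it assumes the casing in the message is known:

```powershell
$message = "This is the deployment test for Servername for Stackoverflow in datacenter onTV"

# [regex]::Match returns a Match object instead of setting $Matches.
# Unlike -match, it is case-sensitive unless RegexOptions.IgnoreCase is passed.
$dcMatch  = [regex]::Match($message, '\b(onTV)\b')
$srvMatch = [regex]::Match($message, '\b(Server\w*)')

# Groups[1] is the first capture group; fall back to $null when there is no match.
$dc  = if ($dcMatch.Success)  { $dcMatch.Groups[1].Value }  else { $null }
$srv = if ($srvMatch.Success) { $srvMatch.Groups[1].Value } else { $null }

"dc = $dc, srv = $srv"
```

Checking .Success first means a message that lacks one of the words yields $null instead of a stale value left in $Matches by an earlier match.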

Related

Select-String pattern finds only partial string match of -cmatch

I am trying to put together a string replacement routine. I have got as far as isolating the substring matches for two strings stored in an array of strings, $lines. Except there is a problem:
[string[]]$lines = "160 FROG Kermit 164 Big Bird_Road, Wellsville Singer","161 PIGGY Miss Pretty 1640 Really Long Main_Road, Whathellville Prima Donna"
# match string from last number to comma
foreach ($line in $lines) {
    if ($line -cmatch '\d\s\w[a-z]*\s.*,') {
        Write-Host "Found match!"
        $line | Select-String -Pattern '\d\s\w[a-z]*\s.' -AllMatches |
            ForEach-Object {
                # second match on the line
                $x = $_.Matches[1].Value
                Write-Host "x is:" $x
            }
    }
}
The first regex, in $line -cmatch '\d\s\w[a-z]*\s.*,', is correct according to my testing in Expresso. I want the address part of the string, from the last street number to the comma. I am looking to replace the spaces in the street base name with underscores, e.g. Big Bird_Road with Big_Bird_Road and Really Long Main_Road with Really_Long_Main_Road.
The problem is that the second regex, in $line | Select-String -Pattern '\d\s\w[a-z]*\s.' -AllMatches |, cannot be completed as it is here. The output is:
Found match!
x is: 4 Big B
Found match!
x is: 0 Really L
The substring has not been fully captured yet! And if I add the remaining *, nothing at all is printed after "x is:".
Why doesn't the first regex (used with -cmatch) work in the same way when used as a Select-String pattern?
If you want to do a replacement for that format in the strings, you might use -replace with a pattern that matches only the spaces and replaces them with an underscore:
(?<=\d\s+\w[a-zA-Z\s_]*)\s(?=[^\d,]*,)
Explanation
(?<= Positive lookbehind to assert what to the left is
\d\s+\w[a-zA-Z\s_]* Match a digit, 1+ whitespace chars, a word char and optionally repeat the listed characters in the character class
) Close the lookbehind
\s Match a whitespace char (or \s+ to match 1 or more)
(?=[^\d,]*,) Assert a comma to the right after matching optional chars other than a digit or comma
[string[]]$lines = "160 FROG Kermit 164 Big Bird_Road, Wellsville Singer","161 PIGGY Miss Pretty 1640 Really Long Main_Road, Whathellville Prima Donna"
foreach ($line in $lines) {
$line -replace "(?<=\d\s+\w[a-zA-Z\s_]*)\s(?=[^\d,]*,)","_"
}
Output
160 FROG Kermit 164 Big_Bird_Road, Wellsville Singer
161 PIGGY Miss Pretty 1640 Really_Long_Main_Road, Whathellville Prima Donna
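As for why the Select-String call behaves differently from -cmatch, two things appear to be involved (this explanation is added here and is not from the answer above): Select-String is case-insensitive unless you pass -CaseSensitive, so [a-z] also matches uppercase letters; and with the greedy .*, pattern there is only one match per line, so it is at Matches[0] and Matches[1] is empty, which is why nothing was printed. A sketch comparing both modes:

```powershell
[string[]]$lines = "160 FROG Kermit 164 Big Bird_Road, Wellsville Singer",
                   "161 PIGGY Miss Pretty 1640 Really Long Main_Road, Whathellville Prima Donna"
$pattern = '\d\s\w[a-z]*\s.*,'

# Default (case-insensitive): [a-z]* also consumes "ROG", so the match starts at "0 FROG".
$insensitive = foreach ($line in $lines) {
    ($line | Select-String -Pattern $pattern -AllMatches).Matches[0].Value
}

# -CaseSensitive behaves like -cmatch; the greedy .*, still gives one match per line, at index 0.
$sensitive = foreach ($line in $lines) {
    $info = $line | Select-String -Pattern $pattern -CaseSensitive -AllMatches
    if ($info) { $info.Matches[0].Value }
}

$insensitive
$sensitive
```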

Regex returns whole lines. I want to make a PS script to find all password patterns on a file server

I am trying to write PowerShell scripts that search all documents and .txt files for password patterns. My code actually runs fine, and my regex does capture the password, but it returns the whole line.
Regex Pattern
.*(?=.{8,20})(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!^#+$%&{()=}?*\-|¨#.,:;,]).*
My regex returns the whole line. How can I fix this?
PS Code
$Path = "c:\users\XXXXXXX"
$output_file = 'C:\Users\XXXXXXXX\Desktop\Result.txt'
$regex = '\b.*(?=.{8,20})(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!^#+$%&{()=}?*\-|¨#.,:;,]).*\b'
# -File skips folders, which Word cannot open
$ALLWORDS = Get-ChildItem $Path -Recurse -File -ErrorAction SilentlyContinue
# Start Word once instead of once per file
$Word = New-Object -ComObject Word.Application
$Word.Visible = $false
foreach ($WORDS in $ALLWORDS) {
    # Open(FileName, ConfirmConversions, ReadOnly): open read-only
    $Doc = $Word.Documents.Open($WORDS.FullName, $false, $true)
    $paras = $Doc.Paragraphs
    echo $paras.Count
    foreach ($para in $paras) {
        if ($para.Range.Text -match $regex) {
            $matches.Values, $WORDS.Name | Out-File $output_file -Append
            Write-Host $matches.Values, $WORDS.Name
        }
    }
    $Doc.Close()
}
$Word.Quit()
$Word = $null
You can use
(?<!\S)(?=[^a-z\s]*[a-z])(?=[^A-Z\s]*[A-Z])(?=[^0-9\s]*[0-9])(?=\S*[!^#+$%&{()=}?*|¨#.,:;,-])\S{8,20}(?!\S)
Note that you need to use the -cmatch operator instead of -match in PowerShell; otherwise the match is case-insensitive.
Details:
(?<!\S) - no whitespace on the left allowed
(?=[^a-z\s]*[a-z]) - after any zero or more chars other than whitespace and lowercase ASCII letters, there must be a lowercase ASCII letter
(?=[^A-Z\s]*[A-Z]) - after any zero or more chars other than whitespace and uppercase ASCII letters, there must be an uppercase ASCII letter
(?=[^0-9\s]*[0-9]) - after any zero or more chars other than whitespace and ASCII digits, there must be an ASCII digit
(?=\S*[!^#+$%&{()=}?*|¨#.,:;,-]) - after any zero or more non-whitespaces, there must be a symbol from [!^#+$%&{()=}?*|¨#.,:;,-] set (replace \S* with [^\s!^#+$%&{()=}?*|¨#.,:;,-]* for better performance, I just wanted to keep the pattern shorter here)
\S{8,20} - eight to twenty non-whitespace chars
(?!\S) - a right-hand whitespace boundary.
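To see that this pattern returns only the password-like token rather than the whole paragraph, here is a minimal sketch on a made-up sample string (the text and the two tokens in it are invented for illustration):

```powershell
$pattern = '(?<!\S)(?=[^a-z\s]*[a-z])(?=[^A-Z\s]*[A-Z])(?=[^0-9\s]*[0-9])(?=\S*[!^#+$%&{()=}?*|¨#.,:;,-])\S{8,20}(?!\S)'

# Made-up sample paragraph: only the two mixed tokens satisfy every lookahead.
$text = 'login admin Secr3t!pass backup Xy9#abcdEF end'

# -cmatch is case-sensitive; $Matches[0] holds only the matched token, not the whole line.
$first = if ($text -cmatch $pattern) { $Matches[0] }

# [regex]::Matches finds every token; .NET regexes are case-sensitive by default.
$all = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }

$first
$all
```

In the question's script, writing $Matches[0] (or the values from [regex]::Matches) to the output file instead of the paragraph text would then record just the candidate passwords.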

Regular expression to locate one string appearing anywhere after another but before something

I have an EDI file. This is the piece in question:
N1*ST*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*TEST#GMAIL.COM
N1*BY*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*TEST2#GMAIL.COM
I am using PowerShell:
Get-ChildItem 'C:\Temp\*.edi' | Where-Object {(Select-String -InputObject $_ -Pattern 'PER\*EM\*\w+#\w+\.\w+' -List)}
I want to find the email address that appears after N1*ST but before N1*BY. I have an expression that works for an email address, but I am stuck on how to get only that one value. The real issue is that sometimes the email is there and other times it is not, so I really do want to ignore the second email after N1*BY.
Thanks in advance for the help.
You can use
(?s)(?<=N1\*ST.*)PER\*EM\*\w+#\w+\.\w+(?=.*N1\*BY)
Details
(?s) - a DOTALL (RegexOptions.Singleline in .NET) regex inline modifier making . match newline chars, too
(?<=N1\*ST.*) - a positive lookbehind that matches a location preceded, anywhere earlier in the text, by N1*ST
PER\*EM\* - a PER*EM* string
\w+#\w+ - 1+ word chars, #, and 1+ word chars
\. - a dot
\w+ - 1+ word chars
(?=.*N1\*BY) - a positive lookahead that matches a location followed, anywhere later in the text, by the literal string N1*BY.
NOTE: You need to read in the file contents with Get-Content $filepath -Raw in order to find the proper match.
Something like:
Get-ChildItem 'C:\Temp\*.edi' | % { Get-Content $_.FullName -Raw | Select-String -Pattern '(?s)(?<=N1\*ST.*)PER\*EM\*\w+#\w+\.\w+(?=.*N1\*BY)' } | % { $_.Matches.Value }
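A sketch that runs the pattern against an in-memory copy of the EDI snippet from the question, plus a hypothetical variant ($withoutEmail, made up for illustration) in which the N1*ST block has no email, to show that the email after N1*BY is not picked up:

```powershell
$pattern = '(?s)(?<=N1\*ST.*)PER\*EM\*\w+#\w+\.\w+(?=.*N1\*BY)'

# Sample from the question, as one string (the same shape Get-Content -Raw returns).
$withEmail = @'
N1*ST*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*TEST#GMAIL.COM
N1*BY*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*TEST2#GMAIL.COM
'@

# Hypothetical file where the N1*ST block has no email.
$withoutEmail = @'
N1*ST*TEST
N3*ADDRESS
N1*BY*TEST
PER*EM*TEST2#GMAIL.COM
'@

$m1 = [regex]::Match($withEmail, $pattern)
$m2 = [regex]::Match($withoutEmail, $pattern)

if ($m1.Success) { $m1.Value }                       # the email between N1*ST and N1*BY
if (-not $m2.Success) { 'No N1*ST email found' }     # the email after N1*BY is ignored
```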

Remove formatting from US phone number and their extension number

Hi, I need help getting the phone number and its extension using either -replace or a regex:
phone
(123) 455-6789 --> 1234556789
(123) 577-2145 ext81245 --> 1235772145
extension
(123) 455-6789 -->
(123) 577-2145 ext81245 --> 81245
"(123) 455-6789" -replace "[()\s\s-]+|Ext\S+", ""
"(123) 455-6789 Ext 2445" -replace "[()\s\s-]+|Ext\S+", ""
This handles the phone number but not the extension.
You may try:
^\((\d{3})\)\s*(\d{3})-(\d{4})(?: ext(\d{5}))?$
Explanation of the above regex:
^, $ - Represents start and end of the line respectively.
\((\d{3})\) - Represents first capturing group matching the digits inside ().
\s* - Matches a white-space character zero or more times.
(\d{3})- - Represents second capturing group capturing exactly 3 digits followed by a -.
(\d{4}) - Represents third capturing group matching the digits exactly 4 times.
(?: ext(\d{5}))? -
(?: Represents a non capturing group
ext - A space followed by the literal ext.
(\d{5}) - Represents the fourth capturing group, matching exactly 5 digits.
) - Closing of the non-captured group.
? - Represents the quantifier making the whole non-captured group optional.
Powershell Commands:
PS C:\Path\To\MyDesktop> $input_path='C:\Path\To\MyDesktop\InputFile.txt'
PS C:\Path\To\MyDesktop> $output_path='C:\Path\To\MyDesktop\outFile.txt'
PS C:\Path\To\MyDesktop> $regex='^\((\d{3})\)\s*(\d{3})-(\d{4})(?: ext(\d{5}))?$'
PS C:\Path\To\MyDesktop> select-string -Path $input_path -Pattern $regex -AllMatches | % { "Phone Number: $($_.matches.groups[1])$($_.matches.groups[2])$($_.matches.groups[3]) Extension: $($_.matches.groups[4])" } > $output_path
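If the numbers are already in memory rather than in a file, the same regex can be applied with -match and the automatic $Matches variable. A sketch using the two sample numbers from the question:

```powershell
$regex = '^\((\d{3})\)\s*(\d{3})-(\d{4})(?: ext(\d{5}))?$'
$samples = '(123) 455-6789', '(123) 577-2145 ext81245'

$results = foreach ($s in $samples) {
    if ($s -match $regex) {
        [PSCustomObject]@{
            # Groups 1-3 are the digit blocks; joining them drops (, ), the space and -.
            Phone     = $Matches[1] + $Matches[2] + $Matches[3]
            # Group 4 is the optional extension; it is empty when there is none.
            Extension = $Matches[4]
        }
    }
}

$results | Format-Table
```

Because -match is case-insensitive, a line written with "Ext" instead of "ext" would also match.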
After you've removed all the formatting characters, you could split the result on ext to get the two numbers.
Applied to your example:
@(
    '(123) 455-6789'
    , '(123) 577-2145 ext81245'
) | % {
    $elements = $_ -replace '[()\s\s-]+' -split 'ext'
    [PSCustomObject]@{
        phone = $elements[0]
        extension = $elements[1]
    }
}
returns
phone extension
------ ---------
1234556789
1235772145 81245
Try out this pattern. It will match phone numbers with and without parentheses, spaces and hyphens.
((?:\(?)(\d{3})(?:\)?\s?)(\d{3})(?:-?)(\d{4}))
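The pattern above only matches; the answer does not say how to use its groups. One possible way, assumed here rather than taken from the answer, is to reference groups 2 to 4 in a -replace, so the formatting is dropped and anything after the number (such as an extension) is left in place. The extra sample formats are made up for illustration:

```powershell
$pattern = '((?:\(?)(\d{3})(?:\)?\s?)(\d{3})(?:-?)(\d{4}))'
$samples = '(123) 455-6789', '123 455-6789', '1234556789', '(123) 577-2145 ext81245'

# Group 1 is the whole number; groups 2-4 are the three digit blocks.
# Single quotes keep PowerShell from expanding $2$3$4 before the regex engine sees it.
$normalized = $samples | ForEach-Object { $_ -replace $pattern, '$2$3$4' }
$normalized
```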
Alternatively, you could chain two -replace operations in a single statement. Say your original data is in File1.txt and you want to write the output to File2.txt; then you could use:
$content = Get-Content -Path 'C:\File1.txt'
$newContent = $content -replace '[^\d\n]', '' -replace '^(.{10})(.*)', 'Phone: $1 Extension: $2'
$newContent | Set-Content -Path 'C:\File2.txt'

Program-Name Detection

This is what the lines look like:
//| Vegas.c |
I would like to get the name, here Vegas.c.
This works in PowerShell's regex:
$found = $body -match '.+?\s+(\w.+?\.c[\+]+)[\s\|]+'
But what if the name does not start with a-zA-Z0-9 (=\w) but with, e.g., ~ or another non-word character?
The first character of the name only has to be something other than a blank, so I tried:
$found = $body -match '.+?\s+(\S+.+?\.c[\+]+)[\s\|]+'
$found = $body -match '.+?\s+([^\ ]+.+?\.c[\+]+)[\s\|]+'
$found = $body -match '.+?\s+([^\s]+.+?\.c[\+]+)[\s\|]+'
None of them work, and neither did a few other attempts. In most cases the match is the whole line!
Any ideas?
How about this?
\/\/\| *([^ ]*)
\/ matches the character /
\/ matches the character /
\| matches the character |
A space followed by * matches zero or more spaces
round brackets ( ) are the first capture group
[^ ]* matches a run of characters that are ^(not) a space (so long as none of your file names contain spaces, this should work)
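A quick way to try this pattern, including a name that starts with ~ as in the question (the ~Vegas.c++ line is a made-up example):

```powershell
$lines = '//| Vegas.c |', '//| ~Vegas.c++ |'

$names = foreach ($line in $lines) {
    # Group 1 is the run of non-space characters after "//|" and any spaces.
    if ($line -match '\/\/\| *([^ ]*)') { $Matches[1] }
}

$names
```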
From what I see in your comments, I think you made your question more basic than you needed to, but this worked with your test string:
$string = @"
//| Vegas.c |
"@
Just look for the data between the pipes, excluding the whitespace that borders the pipes. I'm not sure how it will perform with your real data, but it should work even if the program names contain spaces.
[void]($string -match "\|\s+(.+)\s+\|")
$Matches[1]
Vegas.c
You could also use named captures in PowerShell:
[void]($string -match "\|\s+(?<Program>.+)\s+\|")
$Matches.Program
Vegas.c
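To check the claim that the pipe-based pattern copes with spaces in program names, a short sketch; the second input line is made up for illustration:

```powershell
$lines = '//| Vegas.c |', '//| My Program.c |'

$programs = foreach ($line in $lines) {
    # (?<Program>.+) is greedy, but \s+\| makes it give back the trailing space(s) and pipe.
    if ($line -match '\|\s+(?<Program>.+)\s+\|') { $Matches.Program }
}

$programs
```

If a line could have several spaces before the closing pipe, the greedy .+ would keep all but the last one; trimming the result with .Trim() would be one way to handle that.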