Program-Name Detection - regex

This is what the lines look like:
//| Vegas.c |
and I would like to get the name, here Vegas.c.
This works in PS' regex:
$found = $body -match '.+?\s+(\w.+?\.c[\+]+)[\s\|]+'
But what if the name does not start with a-zA-Z0-9 (=\w) but with, e.g., ~ or another non-word character?
The first character of the name must be something other than a blank, so I tried:
$found = $body -match '.+?\s+(\S+.+?\.c[\+]+)[\s\|]+'
$found = $body -match '.+?\s+([^\ ]+.+?\.c[\+]+)[\s\|]+'
$found = $body -match '.+?\s+([^\s]+.+?\.c[\+]+)[\s\|]+'
None of them work, even after some more tinkering. In most cases they capture the whole line!
Any ideas?

How about this?
\/\/\| *([^ ]*)
\/ matches the character /
\/ matches the character /
\| matches the character |
 * matches zero or more space characters
round brackets ( ) form the first capture group
[^ ]* captures every character that is not (^) a space (as long as your file names do not contain spaces, this should work)

I think you made your question more basic than you needed to, from what I see in your comments, but this worked with your test string.
$string = @"
//| Vegas.c |
"@
Just look for the data between the pipes, excluding the whitespace that borders them. I'm not sure how it will perform with your real data, but it should still work if there are spaces in the program names.
[void]($string -match "\|\s+(.+)\s+\|")
$Matches[1]
Vegas.c
You could also use named captures in PowerShell:
[void]($string -match "\|\s+(?<Program>.+)\s+\|")
$Matches.Program
Vegas.c
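As for names that begin with a non-word character, here is a minimal sketch. It assumes every line starts with //| and that the name is the text between the pipes; the sample names ~Vegas.c and My Prog.c++ are made up for illustration. \S forces the first captured character to be a non-blank, and the lazy .*? stops at the whitespace in front of the closing pipe.

```powershell
# Hypothetical sample lines; only the first one comes from the question.
$lines = '//| Vegas.c |', '//| ~Vegas.c |', '//| My Prog.c++ |'

$names = foreach ($line in $lines) {
    # \S     : first character of the name must not be whitespace
    # .*?    : lazily take the rest of the name
    # \s*\|  : optional padding, then the closing pipe
    if ($line -match '^//\|\s*(?<Program>\S.*?)\s*\|') {
        $Matches.Program
    }
}

$names
```

Because the capture is anchored to the pipes rather than to a file extension, it returns whatever sits between them, so an extra check on the extension may still be needed for real data.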

Related

Search for certain words in a string

I'm struggling with a simple report script.
For example:
$report.FullFormattedMessage = "This is the deployment test for Servername for Stackoverflow in datacenter onTV"
$report.FullFormattedMessage.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
Now I want to pick out certain words, like...
$dc = should be the 'onTV'
$srv = should be the 'Servername'
$srv = $report.FullFormattedMessage.contains... or -match ?? something like this?
The trick with .split()[] does not work for me because the $report sometimes looks different. How could I do that?
OK, I found a solution. I don't know if it is the best one, but here it is:
$lines = $report.FullFormattedMessage.split()
$dc= ForEach ($line in $lines){
$line | Where-Object {$_ -match "onTV*"}
}
$srv= ForEach ($line in $lines){
$line | Where-Object {$_ -match "Server*"}
}
I'd probably do something like
$report.FullFormattedMessage = "This is the deployment test for Servername for Stackoverflow in datacenter onTV"
$dc = if ($report.FullFormattedMessage -match '\b(onTV)\b') { $Matches[1] } # see details 1
$srv = if ($report.FullFormattedMessage -match '\b(Server[\w]*)') { $Matches[1] } # see details 2
# $dc --> "onTV"
# $srv --> "Servername"
Regex details 1
\b Assert position at a word boundary
( Match the regular expression below and capture its match into backreference number 1
onTV Match the characters “onTV” literally
)
\b Assert position at a word boundary
Regex details 2
\b Assert position at a word boundary
( Match the regular expression below and capture its match into backreference number 1
Server Match the characters “Server” literally
[\w] Match a single character that is a “word character” (letters, digits, etc.)
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
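If the datacenter name is not known in advance, a variation is to anchor on the word in front of it. This is only a sketch, and it assumes the message always contains the literal word "datacenter" immediately before the name; the group names dc and srv are chosen here for illustration.

```powershell
$message = 'This is the deployment test for Servername for Stackoverflow in datacenter onTV'

# Capture the word that follows "datacenter" (assumed to always precede the name).
$dc  = if ($message -match '\bdatacenter\s+(?<dc>\w+)') { $Matches.dc }

# Capture any word that starts with "Server".
$srv = if ($message -match '\b(?<srv>Server\w*)') { $Matches.srv }

"$srv / $dc"
```

Both variables stay $null when the pattern does not match, which makes a missing value easy to detect.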

Need a regex where two different substrings must not be included in a string

I have the following strings:
$string = @(
'Get-WindowsDevel'
'Put-WindowsDevel'
'Get-LinuxDevel'
'Put-LinuxDevel'
)
Now I need one regex with the following two rules:
$string must not start with "Get-"
$string must not contain "Linux"
This exclude the "Get-" at the beginning:
PS C:\> $string | Where-Object { $_ -match "^(?!Get-).*" }
Put-WindowsDevel Put-LinuxDevel
I would expect that the following command does not match "Put-LinuxDevel" but it does:
PS C:\> $string | Where-Object { $_ -match "^(?!Get-).*(?!Linux)" }
Put-WindowsDevel Put-LinuxDevel
So, what I need is a regex that is valid for this string only:
Put-WindowsDevel
Use -notmatch (or, if case-sensitive matching is needed, -cnotmatch) - i.e., a negated match - in combination with alternation (|):
PS> $string -notmatch '(^Get-|Linux)'
Put-WindowsDevel
The -match operator and its variations (as well as many other operators) can directly act on arrays as the LHS, in which case the operator acts as a filter and returns only matching elements.
Using -match on an array is much faster than using the Where-Object cmdlet in a pipeline for filtering.
Regex (^Get-|Linux) matches either Get- at the start of the string (^) or (|) substring Linux anywhere in the string.
Therefore, this regex matches strings that you don't want, and by using the negated form of -match - -notmatch - you therefore exclude those strings, as desired.
If you really want to express your regex as a positive match:
PS> $string -match '^(?!Get-)((?!Linux).)*$'
Put-WindowsDevel
Note, however, that not only is this regex much more complex, it will also perform worse (albeit only slightly).
As for what you tried:
The .*(?!Linux) part of your regex - involving a negative lookahead assertion ((?!...)) - is not effective at excluding strings that contain substring Linux; e.g.:
PS> 'Linux' -match '.*(?!Linux)'
True # !!
The reason is that .* matches the entire string and then looks ahead to see if Linux isn't there - which is obviously true at the end of the string.
To effectively rule out a substring, the assertion must be applied around each character of the entire string:
PS> '', 'inux', 'Linux', 'a Linux', 'aLinuxb' -match '^((?!Linux).)*$'
# '' (empty string) matched
inux # 'inux' matched
Note how 'Linux', 'a Linux', and 'aLinuxb' were correctly excluded.
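A shorter way to write the same positive match, offered as a sketch rather than as part of the answer above, is to put the .* inside the lookahead and anchor it at the start of the string. The lookahead then scans the whole string once from position 0 instead of being re-checked before every character.

```powershell
$string = @(
    'Get-WindowsDevel'
    'Put-WindowsDevel'
    'Get-LinuxDevel'
    'Put-LinuxDevel'
)

# ^(?!Get-)     : the string must not start with "Get-"
# (?!.*Linux)   : from the start, "Linux" must not appear anywhere ahead
$result = $string -match '^(?!Get-)(?!.*Linux)'

$result
```

Here the two lookaheads both apply at the start of the string, so only strings that pass both checks are returned.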
this seems to do what you seek [grin] ...
$StringList = @(
'Get-WindowsDevel'
'Put-WindowsDevel'
'Get-LinuxDevel'
'Put-LinuxDevel'
)
$ExcludeList = @(
'^get'
'linux'
)
$RegexExcludeList = $ExcludeList -join '|'
$StringList -notmatch $RegexExcludeList
output ...
Put-WindowsDevel

Regular Expressions in powershell split

I need to strip a UNC FQDN down to just the name or IP, depending on the input.
My examples would be
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
I want to end up with just tom or 123.43.234.23
I have the following code in my array, which strips out the domain name perfectly, but I'm still left with \\tom
-Split '\.(?!\d)')[0]
Your regex succeeds in splitting off the tokens of interest in principle, but it doesn't account for the leading \\ in the input strings.
You can use regex alternation (|) to include the leading \\ at the start as an additional -split separator.
Given that matching a separator at the very start of the input creates an empty element with index 0, you then need to access index 1 to get the substring of interest.
In short: The regex passed to -split should be '^\\\\|\.(?!\d)' instead of '\.(?!\d)', and the index used to access the resulting array should be [1] instead of [0]:
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' |
ForEach-Object { ($_ -Split '^\\\\|\.(?!\d)')[1] }
The above yields:
tom
123.43.234.23
Alternatively, you could remove the leading \\ in a separate step, using -replace:
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' |
ForEach-Object { ($_ -Split '\.(?!\d)')[0] -replace '^\\\\' }
Yet another alternative is to use a single -replace operation, which does not require a ForEach-Object call (doesn't require explicit iteration):
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' -replace
'(?x) ^\\\\ (.+?) \.\D .+', '$1'
Inline option (?x) (IgnoreWhiteSpace) allows you to make regexes more readable with insignificant whitespace: any unescaped whitespace can be used for visual formatting.
^\\\\ matches the \\ (escaped with \) at the start (^) of each string.
(.+?) matches one or more characters lazily.
\.\D matches a literal . followed by something other than a digit (\d matches a digit, \D is the negation of that).
.+ matches one or more remaining characters, i.e., the rest of the input.
$1 as the replacement operand refers to what the 1st capture group ((...)) in the regex matched, and, given that the regex was designed to consume the entire string, replaces it with just that.
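One property worth keeping in mind with the single -replace approach: when the regex does not match, -replace returns the input string unchanged. The sketch below illustrates this; the third input, no-unc-prefix, is a made-up value that lacks the leading \\.

```powershell
$pattern = '(?x) ^\\\\ (.+?) \.\D .+'

# The last element is a hypothetical input without a UNC prefix.
$inputs = '\\tom.overflow.corp.com',
          '\\123.43.234.23.overflow.corp.com',
          'no-unc-prefix'

# -replace acts on each array element; non-matching elements pass through as-is.
$hosts = $inputs -replace $pattern, '$1'

$hosts
```

If unexpected input has to be rejected rather than passed through, a -match test before the replacement is one option.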
I'm borrowing Lee_Dailey's $InStuff
and combining it with a regex I used recently:
$InStuff = -split @'
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
'@
$InStuff |ForEach-Object {($_.Trim('\\') -split '\.(?!\d{1,3}(\.|$))')[0]}
Sample Output:
tom
123.43.234.23
As you can see here on RegEx101 the dots between the numbers are not matched
The Select-String cmdlet uses regex and populates a MatchInfo object with the matches (which can then be queried).
The regex "(\.?\d+)+|\w+" works for your particular example.
"\\tom.overflow.corp.com", "\\123.43.234.23.overflow.corp.com" |
Select-String "(\.?\d+)+|\w+" | % { $_.Matches.Value }
while this is NOT regex, it does work. [grin] i suspect that if you have a really large number of such items, then you will want a regex. they do tend to be faster than simple text operators.
this will get rid of the leading \\ and then replace the domain name with an empty string.
# fake reading in a text file
# in real life, use Get-Content
$InStuff = -split @'
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
'@
$DomainName = '.overflow.corp.com'
$InStuff.ForEach({
$_.TrimStart('\\').Replace($DomainName, '')
})
output ...
tom
123.43.234.23

Replace text after special character

I have a string in which the numeric suffix should be replaced with text. In my case the variable is:
$string = '18.3.0-31290741.41742-1'
I want to replace everything from the first '-' onward with "-SNAPSHOT", so that echo $string shows the result below. I tried LastIndexOf(), Trim(), and other things, but I could not work out how to do it.
Expected result:
PS> echo $string
18.3.0-SNAPSHOT
Maybe this points in the right direction, but when there are two '-' characters it replaces from the last one rather than the first, as you can see:
$string = "18.3.0-31290741.41742-1" -replace '(.*)-(.*)', '$1-SNAPSHOT'
.* is a greedy match, meaning it will produce the longest matching (sub)string. In your case that would be everything up to the last hyphen. You need either a non-greedy match (.*?) or a pattern that won't match hyphens (^[^-]*).
Demonstration:
PS C:\> '18.3.0-31290741.41742-1' -replace '(^.*?)-.*', '$1-SNAPSHOT'
18.3.0-SNAPSHOT
PS C:\> '18.3.0-31290741.41742-1' -replace '(^[^-]*)-.*', '$1-SNAPSHOT'
18.3.0-SNAPSHOT
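To see the greedy and lazy behaviour side by side, here is a small sketch. The third variant, which uses -split with a limit of 2 substrings, is an alternative not mentioned above; it avoids capture groups altogether. The variable names are illustrative.

```powershell
$version = '18.3.0-31290741.41742-1'

# Greedy: (.*) runs up to the LAST hyphen.
$greedy = $version -replace '(.*)-.*', '$1-SNAPSHOT'

# Lazy: (.*?) stops at the FIRST hyphen.
$lazy = $version -replace '^(.*?)-.*', '$1-SNAPSHOT'

# Alternative: split on the first hyphen only (limit of 2 substrings).
$split = ($version -split '-', 2)[0] + '-SNAPSHOT'

$greedy   # 18.3.0-31290741.41742-SNAPSHOT
$lazy     # 18.3.0-SNAPSHOT
$split    # 18.3.0-SNAPSHOT
```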
By using a positive lookbehind assertion ((?<=...)) you could eliminate the need for a capturing group and backreference:
PS C:\> "18.3.0-31290741.41742-1" -replace '(?<=^.*?-).*', 'SNAPSHOT'
18.3.0-SNAPSHOT
You could use Select-String with a regular expression to match everything up to and including the first hyphen, then pass the match to ForEach-Object (commonly abbreviated with the alias %) to construct the final string:
$string = "18.3.0-31290741.41742-1" | Select-String -Pattern '^[^-]*-' | %{ "$($_.Matches.Value)SNAPSHOT" }
$string

Regex for multiple app versions

I'm trying to get a list of versions from my custom attribute in a PowerShell script. The attribute looks like this:
[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]
I ended up with a regex like this, but it doesn't work at all:
'\(\"([^\",?]*)\"+\)'
You should do this as a two-step process: first parse out the CompatibleVersions attribute, and then split out the version numbers. Otherwise you will have difficulty finding the version numbers individually without also picking up other version-like numbers.
$s = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
$versions = ($s | Select-String -Pattern 'CompatibleVersions\(([^)]+)\)' | % { $_.Matches }).Groups[1].Value
$versions.Split(',') | % { $_.Trim('"') } | Write-Host
# 1.7.1.0
# 1.7.1.1
# 1.2.2.3
Start by grabbing the parentheses pair and everything inside:
$string = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'
if($string -match '\(([^)]+)\)'){
# Remove the parentheses themselves, split by comma and then trim the "
$versionList = $Matches[0].Trim("()") -split ',' |ForEach-Object Trim '"'
}
You may use
$s | select-string -pattern "\d+(?:\.\d+)+" -AllMatches | Foreach {$_.Matches} | ForEach-Object {$_.Value}
The \d+(?:\.\d+)+ pattern will match:
\d+ - 1 or more digits
(?:\.\d+)+ - 1 or more sequences of a . and 1+ digits.
See the regex demo on RegexStorm.
'"([.\d]+)"' will match any substring composed of dots and digits (\d) and enclosed in double quotes (")
Try it here
A number between two dots can be 0, but cannot be 00, 01, or similar.
Pay attention to the leading [
This is a regex for validating the line:
^\[assembly: CompatibleVersions\("(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*\)]$
Here is the regex with tests.
But if you are reading the list, you should use this instead:
^\[assembly: CompatibleVersions\("((?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}"(?:,"(?:[1-9]\d*|0)(?:\.(?:[1-9]\d*|0)){3}")*)\)]$
With it you extract the "...","..."... sequence from inside the parentheses.
After that, split the resulting string by '","' into a list, then remove the trailing " from the last element and the leading " from the first element. Now you have a list of valid version strings.
Alas, a single regex match cannot produce a list without a split() call.
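In PowerShell specifically, the .NET method [regex]::Matches returns every match in one call, so the list can also be built without an explicit split. The sketch below assumes each version has exactly four numeric parts and casts the captures to [version] so they sort numerically; the variable names are illustrative.

```powershell
$s = '[assembly: CompatibleVersions("1.7.1.0","1.7.1.1","1.2.2.3")]'

# One Match object per quoted four-part version; group 1 is the text without quotes.
$versions = [regex]::Matches($s, '"(\d+(?:\.\d+){3})"') |
    ForEach-Object { [version]$_.Groups[1].Value }

# [version] objects compare part by part rather than as strings.
$sorted = $versions | Sort-Object

$sorted | ForEach-Object { $_.ToString() }
```

As with the \d+(?:\.\d+)+ answer above, this does not check that the numbers belong to the CompatibleVersions attribute, so on a full source file it is safer to isolate the attribute first.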