String split in Windows PowerShell - regex

Can you please help me get the desired output? SIT is the environment and the file type is properties; I need to remove the environment and the extension from the string.
# $string = "<ENV>.<can have multiple periods>.properties"
$string = "SIT.com.local.test.stack.properties"
$b = $string.Split('.')
$b[0].Substring(1)
Required output: com.local.test.stack (the middle part can contain multiple periods)

This should do it:
$string = "SIT.com.local.test.stack.properties"
# capture everything up to the first period (the environment), and everything
# between that period and the final ".properties" extension
if ($string -match '^(.+?)\.(.+)\.properties$') {
    $environment = $Matches[1]
    $properties = $Matches[2]
    # ...
}
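To try the same pattern outside PowerShell, here is a rough port to Python's re module. This is an assumption-laden sketch: Python's engine is not .NET's, but the lazy group, greedy group and anchors behave the same way for this pattern. The helper name split_config_name is hypothetical.

```python
import re

def split_config_name(name):
    """Return (environment, middle) for names like ENV.<parts>.properties, else None."""
    # Group 1 is lazy, so it stops at the first period.
    # Group 2 is greedy, so it runs up to the final ".properties".
    m = re.match(r'^(.+?)\.(.+)\.properties$', name)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(split_config_name("SIT.com.local.test.stack.properties"))  # ('SIT', 'com.local.test.stack')
```

A name with only one period, such as SIT.properties, returns None, because group 2 needs at least one character between two periods.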

You may use
$string -replace '^[^.]+\.|\.[^.]+$'
This removes the leading run of one or more non-dot characters together with the dot that follows it, and the last dot together with the one or more non-dot characters after it.
Details
^ - start of string
[^.]+ - 1+ chars other than .
\. - a dot
| - or
\. - a dot
[^.]+ - 1+ chars other than .
$ - end of string.
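The same alternation can be checked with Python's re.sub. This is a sketch under the assumption that Python's re treats this pattern like .NET does (it uses no engine-specific features). The function name strip_env_and_ext is hypothetical.

```python
import re

# Strip "ENV." at the very start and ".ext" at the very end.
PATTERN = r'^[^.]+\.|\.[^.]+$'

def strip_env_and_ext(name):
    # An empty replacement deletes each match, like PowerShell's -replace
    # used without a replacement operand.
    return re.sub(PATTERN, '', name)

print(strip_env_and_ext("SIT.com.local.test.stack.properties"))  # com.local.test.stack
```

Unlike the -match version above, this removes whatever the last extension is, not only .properties.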

You can use -match with a capturing group to extract the desired output:
$string ="SIT.com.local.test.stack.properties"
$string -match "^.*?\.(.+)\.[^.]+$"
$Matches.1
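A Python sketch of the same capture, assuming Python's re as a stand-in for the .NET engine (lazy and greedy quantifiers behave the same here). It shows how the two quantifiers split the work.

```python
import re

# Lazy ^.*? stops at the first dot; greedy (.+) then backs off to the last dot.
m = re.match(r'^.*?\.(.+)\.[^.]+$', "SIT.com.local.test.stack.properties")
middle = m.group(1) if m else None
print(middle)  # com.local.test.stack
```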

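You can also do part of this with the -split operator:
($string -split "\.",2)[1]
Explanation:
You split on the literal . character with the regex \. (the backslash escapes the dot). The ,2 argument limits the result to at most 2 substrings, so only the first dot is used as a split point. The [1] index selects the second element of the returned array; [0] is the first substring (SIT in this case). Note that this removes only the environment: the result is com.local.test.stack.properties, so the extension still has to be stripped, for example with ($string -split '\.',2)[1] -replace '\.properties$'.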
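For comparison, Python's str.split takes a maxsplit argument that plays the same role as ,2 here (maxsplit=1 yields at most 2 pieces). A minimal sketch, assuming plain string methods rather than regex are acceptable:

```python
name = "SIT.com.local.test.stack.properties"

# Like -split "\.",2 : split only at the first dot and keep the rest intact.
rest = name.split('.', 1)[1]
print(rest)  # com.local.test.stack.properties (extension still present)

# Dropping the extension needs a second step, e.g. one split from the right.
middle = rest.rsplit('.', 1)[0]
print(middle)  # com.local.test.stack
```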

Related

Exclude a substring after a pattern is matched using regex

I want to write a regex that splits a string so that only a few elements are selected. For example:
M:\Shares\Profiles\Server\Profiles\abcd.contoso.V2.01
The result I am aiming for is abcd.V2.01, i.e. the domain name 'contoso' is dropped.
However, I am unable to exclude part of the string after a match is found. I tried:
$original = 'M:\Shares\Profiles\Server\Profiles\abcd.contoso.V2.01'
$modified = $original -replace '.*\\([^\\.]+.contoso.V2)[^\\]*$', '$1'
but that sets $modified to 'abcd.contoso.V2'.
You can use two capturing groups:
$original = 'M:\Shares\Profiles\Server\Profiles\abcd.contoso.V2.01'
$original -replace '.*\\([^\\.]*)\.contoso(\.V2[^\\]*)$', '$1$2'
# => abcd.V2.01
Do not forget to escape literal dots in the regex pattern: an unescaped . matches any character. Details:
.* - any zero or more chars other than LF chars
\\ - a \ char
([^\\.]*) - Group 1 ($1): any zero or more chars other than \ and .
\.contoso - a .contoso string
(\.V2[^\\]*) - Group 2 ($2): .V2 string and then any zero or more chars other than \
$ - end of string.
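The two-group replacement ports directly to Python's re.sub, where \1\2 stands in for '$1$2'. This is a sketch assuming Python's re handles the pattern the same way as .NET, which holds here since it uses only basic constructs.

```python
import re

original = r'M:\Shares\Profiles\Server\Profiles\abcd.contoso.V2.01'

# Group 1 keeps the part before ".contoso"; group 2 keeps ".V2..." to the end.
# The whole path is matched, so the result is just the two groups joined.
modified = re.sub(r'.*\\([^\\.]*)\.contoso(\.V2[^\\]*)$', r'\1\2', original)
print(modified)  # abcd.V2.01
```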

Bash regex matching "0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."

In a Bash script I'm writing, I need to capture /path/to/my/file.c and 93 (and the corresponding parts of similar lines) from lines like these:
0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).
0xffffffc0006e0584 is in another_function(char *arg1, int arg2) (/path/to/my/other_file.c:94).
With the help of regex101.com, I've managed to create this Perl regex:
^(?:\S+\s){1,5}\((\S+):(\d+)\)
but I hear that Bash doesn't understand \d or ?:, so I came up with this:
^([:alpha:]+[:space:]){1,5}\(([:alpha:]+):([0-9]+)\)
But when I try it out:
line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
regex="^([:alpha:]+[:space:]){1,5}\(([:alpha:]+):([0-9]+)\)"
[[ $line1 =~ $regex ]]
echo ${BASH_REMATCH[0]}
I don't get any match. What am I doing wrong? How can I write a Bash-compatible regex to do this?
You are right: Bash uses POSIX ERE, which supports neither the \d shorthand character class nor non-capturing groups.
Use
.*\((.+):([0-9]+)\)
Or even (if you need to grab the first (...) substring in a string):
\(([^()]+):([0-9]+)\)
Details
.* - any 0+ chars, as many as possible (may be omitted, only necessary if there are other (...) substrings and you only need to grab the last one)
\( - a ( char
(.+) - Group 1 (${BASH_REMATCH[1]}): any 1+ chars as many as possible
: - a colon
([0-9]+) - Group 2 (${BASH_REMATCH[2]}): 1+ digits
\) - a ) char.
For example:
test='0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).'
reg='.*\((.+):([0-9]+)\)'
# reg='\(([^()]+):([0-9]+)\)' # This also works for the current scenario
if [[ $test =~ $reg ]]; then
    echo "${BASH_REMATCH[1]}"
    echo "${BASH_REMATCH[2]}"
fi
Output:
/path/to/my/file.c
93
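The demo above only runs the first sample line. A Python sketch can check both patterns against both lines from the question, including the one with another_function(char *arg1, int arg2). Assumption: Python's re is a backtracking engine rather than POSIX leftmost-longest, but for these two patterns and inputs the captured groups come out the same. The helper name path_and_line is hypothetical.

```python
import re

lines = [
    "0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).",
    "0xffffffc0006e0584 is in another_function(char *arg1, int arg2) "
    "(/path/to/my/other_file.c:94).",
]

LAST_PARENS = r'.*\((.+):([0-9]+)\)'     # last "(...:digits)" on the line
FIRST_PARENS = r'\(([^()]+):([0-9]+)\)'  # first (...) with no nested parens ending in :digits

def path_and_line(pattern, text):
    m = re.search(pattern, text)
    return (m.group(1), m.group(2)) if m else None

for text in lines:
    print(path_and_line(LAST_PARENS, text), path_and_line(FIRST_PARENS, text))
```

On the second line, FIRST_PARENS skips (char *arg1, int arg2) because that group contains no colon, so both patterns return the file path and line number.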
In the first pattern you use \S+, which matches one or more non-whitespace chars. That is a broad match that also matches, for example, /, which the second pattern does not take into account.
The second pattern starts with [:alpha:], but POSIX character classes are only valid inside a bracket expression; on its own, [:alpha:] is just a set matching one of :, a, l, p or h, so write [[:alpha:]] instead. Also, the first char of the line is a 0, so [[:alnum:]] is needed rather than [[:alpha:]], and since the repeated words contain _ (as in some_function), _ should be added as well.
Note that when a quantifier is applied to a capturing group, the group keeps only the value from the last iteration. So {1,5} is only useful for the repetition; the captured value of that group would be some_function followed by a space.
You might use:
^([[:alnum:]_]+[[:space:]]){1,5}\(((/[[:alpha:]]+)+\.[[:alpha:]]):([[:digit:]]+)\)\.$
Your code could look like
line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
regex="^([[:alnum:]_]+[[:space:]]){1,5}\(((/[[:alpha:]]+)+\.[[:alpha:]]):([[:digit:]]+)\)\.$"
[[ $line1 =~ $regex ]]
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[4]}
Result
/path/to/my/file.c
93
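Or a slightly shorter version using \S, where the values are in groups 2 and 3 (\S is a GNU extension rather than POSIX ERE, so it works with Bash on glibc-based systems but may not work elsewhere):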
^([[:alnum:]_]+[[:space:]]){1,5}\((\S+\.[[:alpha:]]):([[:digit:]]+)\)\.$
Explanation
^ Start of string
([[:alnum:]_]+[[:space:]]){1,5} Repeat 1-5 times what is captured in group 1
\( match (
(\S+\.[[:alpha:]]) Capture group 2 Match 1+ non whitespace chars, . and an alphabetic character
: Match :
([[:digit:]]+) Capture group 3 Match 1+ digits
\)\. Match ).
$ End of string
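A Python sketch of the shorter pattern makes two of the points above checkable: the repeated group keeps only its last iteration, and the anchored word-list prefix does not fit the question's second sample line. The POSIX-to-Python mapping is an assumption: [[:alnum:]_] becomes \w, [[:space:]] becomes \s, [[:alpha:]] becomes [A-Za-z] and [[:digit:]] becomes \d (Python's \w and \d are Unicode-aware, which makes no difference for this ASCII input).

```python
import re

line1 = "0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
line2 = ("0xffffffc0006e0584 is in another_function(char *arg1, int arg2) "
         "(/path/to/my/other_file.c:94).")

STRICT = re.compile(r'^(\w+\s){1,5}\((\S+\.[A-Za-z]):(\d+)\)\.$')

m1 = STRICT.match(line1)
print(repr(m1.group(1)))         # 'some_function '  (only the last repetition survives)
print(m1.group(2), m1.group(3))  # /path/to/my/file.c 93

# "another_function(char" breaks the word-space-word prefix, so there is no match.
m2 = STRICT.match(line2)
print(m2)                        # None
```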

Powershell adding CR at the end of regex match group

I'm getting a CR between the regex match and the ','. What's going on?
$r_date ='ExposeDateTime=([\w /:]{18,23})'
$v2 = (Select-String -InputObject $_ -Pattern $r_date | ForEach-Object {$_.Matches.Groups[1].Value}) + ',';
Example of output:
9/25/2018 8:45:19 AM[CR],
Original String:
ExposeDateTime=9/25/2018 8:45:19 AM
Error=Dap
PostKvp=106
PostMa=400
PostTime=7.2
PostMas=2.88
PostDap=0
Try this:
$original = @"
ExposeDateTime=9/25/2018 8:45:19 AM
Error=Dap
PostKvp=106
PostMa=400
PostTime=7.2
PostMas=2.88
PostDap=0
"@
$r_date ='ExposeDateTime=([\d\s/:]+(?:(?:A|P)M)?)'
$v2 = (Select-String -InputObject $original -Pattern $r_date | ForEach-Object {$_.Matches.Groups[1].Value}) -join ','
Regex details:
ExposeDateTime= Match the characters “ExposeDateTime=” literally
( Match the regular expression below and capture its match into backreference number 1
[\d\s/:] Match a single character present in the list below
A single digit 0..9
A whitespace character (spaces, tabs, line breaks, etc.)
One of the characters “/:”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?: Match the regular expression below
(?: Match the regular expression below
Match either the regular expression below (attempting the next alternative only if this one fails)
A Match the character “A” literally
| Or match regular expression number 2 below (the entire group fails if this one fails to match)
P Match the character “P” literally
)
M Match the character “M” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
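A Python sketch of this pattern, fed a record assumed to have Windows (CRLF) line endings, confirms that the capture stops at AM without a trailing CR. It also shows a limitation: because the class contains \s, a timestamp without AM/PM lets the capture run on into the line break. Assumption: Python's \d and \s behave like .NET's for this ASCII input. The helper name exposure is hypothetical.

```python
import re

# Sample record with CRLF line endings (an assumption about the input file).
record = "ExposeDateTime=9/25/2018 8:45:19 AM\r\nError=Dap\r\nPostKvp=106\r\n"

R_DATE = r'ExposeDateTime=([\d\s/:]+(?:(?:A|P)M)?)'

def exposure(text):
    m = re.search(R_DATE, text)
    return m.group(1) if m else None

print(repr(exposure(record)))  # '9/25/2018 8:45:19 AM'

# Without AM/PM, \s in the class also swallows the CR and LF.
print(repr(exposure("ExposeDateTime=9/25/2018 13:45:19\r\nError=Dap\r\n")))
# '9/25/2018 13:45:19\r\n'
```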
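If your input is a multiline string stored in $Original, then this rather simpler regex seems to do the job. [grin] It uses a named capture group and the multiline regex flag to capture the string after ExposeDateTime= and before the next line ending. One caveat: if the text has Windows (CRLF) line endings, multiline $ matches only before the \n, so .+ also captures the \r; in that case ExposeDateTime=(?<Date>[^\r\n]+) avoids the trailing CR.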
$Original -match '(?m)ExposeDateTime=(?<Date>.+)$'
$Matches.Date
output ...
9/25/2018 8:45:19 AM
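The line-ending behavior of this approach can be checked with a Python sketch, assuming Python's re treats . and multiline $ the way .NET does here (. matches anything but \n, and (?m)$ matches only before \n). Python spells the named group (?P<Date>...) rather than (?<Date>...). The helper name date_of is hypothetical.

```python
import re

lf_text = "ExposeDateTime=9/25/2018 8:45:19 AM\nError=Dap\n"
crlf_text = lf_text.replace("\n", "\r\n")

DOT_PLUS = r'(?m)ExposeDateTime=(?P<Date>.+)$'
NO_CR = r'ExposeDateTime=(?P<Date>[^\r\n]+)'   # stop before CR or LF

def date_of(pattern, text):
    m = re.search(pattern, text)
    return m.group('Date') if m else None

print(repr(date_of(DOT_PLUS, lf_text)))    # '9/25/2018 8:45:19 AM'
print(repr(date_of(DOT_PLUS, crlf_text)))  # '9/25/2018 8:45:19 AM\r'
print(repr(date_of(NO_CR, crlf_text)))     # '9/25/2018 8:45:19 AM'
```

NO_CR deliberately has no trailing $: with CRLF input the class stops at \r, and a multiline $ would not match there.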

Perl Non-greedy Matching -- Is the "?" character used correctly?

I am trying to match the parameter name in a parameter declaration line such as the one below:
parameter BWIDTH = 32;
The Perl regular expression used is:
$line =~ /(\w+)\s*=/
where the parameter name, BWIDTH, is captured into $1. Most parameters I encountered are declared in such a way that the name precedes the equal sign, "=", which is the reason the regular expression is designed with the "=" in it (/(\w+)\s*=/).
However there are special cases where the parameter is declared:
parameter reg [31:0] PORT_WIDTH [BWIDTH-1:0] = 32;
In this case, the parameter name that I am trying to capture is PORT_WIDTH. Revising the regular expression to match this instance does not capture PORT_WIDTH successfully, although it does capture BWIDTH fine.
$line =~ /(\w+)(\s*\[.*?\])*\s*=/
where (\s*\[.*?\])* ends up matching [31:0] PORT_WIDTH [BWIDTH-1:0], so $1 is reg; this looks like greedy matching.
I am baffled as to why the ? metacharacter does not stop the greedy matching. How should I revise the regular expression?
Replace the .*? with [^][]* to match 0+ chars other than ] and [:
/(\w+)(\s*\[[^][]*])*\s*=/
You may also turn the second capturing group into a non-capturing one if you are not using that value.
Pattern details:
(\w+) - Group 1: one or more word chars
(\s*\[[^][]*])* - a capturing group (add ?: after ( to make it non-capturing) matching zero or more occurrences of:
\s* - 0+ whitespaces
\[ - a literal [
[^][]* - a negated character class matching zero or more chars other than ] and [
] - a literal ]
\s* - zero or more whitespaces
= - an equal sign.
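Both the fix and the original problem can be reproduced with a Python sketch. Assumptions: the class [^][]* is spelled [^\[\]]* (the same set, anything except [ and ]), the bracket group is made non-capturing as suggested above, and Python's backtracking behaves like Perl's for these patterns. The helper name param_name is hypothetical.

```python
import re

PARAM_NAME = re.compile(r'(\w+)(?:\s*\[[^\[\]]*\])*\s*=')

def param_name(line):
    m = PARAM_NAME.search(line)
    return m.group(1) if m else None

print(param_name("parameter BWIDTH = 32;"))                              # BWIDTH
print(param_name("parameter reg [31:0] PORT_WIDTH [BWIDTH-1:0] = 32;"))  # PORT_WIDTH

# With the original lazy .*? the bracket group can stretch across
# "] PORT_WIDTH [", so the leftmost word that can succeed is "reg".
LAZY = re.compile(r'(\w+)(\s*\[.*?\])*\s*=')
lazy_name = LAZY.search("parameter reg [31:0] PORT_WIDTH [BWIDTH-1:0] = 32;").group(1)
print(lazy_name)  # reg
```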
Greediness vs. non-greediness affects where a match ends, but it still starts as early as possible. Basically, a greedy match is the leftmost-longest possible match, while non-greedy is leftmost-shortest. But non-greedy is still leftmost, not rightmost.
To get what you want, I would use a more explicit description of what I want matched: /(\w+)(\s*\[[^]]*\])?\s*=/ In English, that's a word (\w+), optionally followed by some text in square brackets ((\s*\[[^]]*\])?), and then optional whitespace and an equals sign. Note that I used a negated character class ([^]]) instead of a non-greedy match for what's inside the brackets - IMO, negated character classes are generally a better option than non-greedy matching.
Results with this regex:
$ perl -E '$x = q(parameter reg [31:0] PORT_WIDTH [BWIDTH-1:0] = 32;); $x =~ /(\w+)(?:\s*\[[^]]*\])?\s*=/; say $1;'
PORT_WIDTH
$ perl -E '$x = q(parameter BWIDTH = 32;); $x =~ /(\w+)(\s*\[[^]]*\])?\s*=/; say $1;'
BWIDTH
You have information available to you which you are choosing not to use. You know the basic structure of each statement you are trying to parse. The statements have mandatory and optional parts. So, put the information you have into the match. For example:
#!/usr/bin/env perl
use strict;
use warnings;
my $stuff_in_square_brackets = qr{ \[ [^\]]+ \] }x;
my $re = qr{
^
parameter \s+
(?: reg \s+)?
(?: $stuff_in_square_brackets \s+)?
(\w+) \s+
(?: $stuff_in_square_brackets \s+)?
= \s+
(\w+) ;
$
}x;
while (my $line = <DATA>) {
if (my($p, $v) = ($line =~ $re)) {
print "'$p' = '$v'\n";
}
}
__DATA__
parameter BWIDTH = 32;
parameter reg [31:0] PORT_WIDTH [BWIDTH-1:0] = 32;
Output:
'BWIDTH' = '32'
'PORT_WIDTH' = '32'
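The same structural approach translates to Python's re.VERBOSE flag, which, like Perl's /x, ignores whitespace and allows comments inside the pattern. This is a hypothetical port, not the answerer's code; the names BRACKETS, PARAM_RE and parse are illustrative.

```python
import re

BRACKETS = r'\[ [^\]]+ \]'   # "[...]" with no "]" inside

PARAM_RE = re.compile(rf'''
    ^
    parameter \s+
    (?: reg \s+ )?            # optional "reg" keyword
    (?: {BRACKETS} \s+ )?     # optional range before the name
    (\w+) \s+                 # group 1: parameter name
    (?: {BRACKETS} \s+ )?     # optional range after the name
    = \s+
    (\w+) ;                   # group 2: value
    $
''', re.VERBOSE)

def parse(lines):
    """Yield (name, value) for every line that looks like a parameter declaration."""
    for line in lines:
        m = PARAM_RE.match(line)
        if m:
            yield m.group(1), m.group(2)

sample = [
    "parameter BWIDTH = 32;\n",
    "parameter reg [31:0] PORT_WIDTH [BWIDTH-1:0] = 32;\n",
]
for name, value in parse(sample):
    print(f"'{name}' = '{value}'")
```

As in Perl, $ also matches just before the trailing newline, so lines read from a file can be passed in unchanged.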

Matching first letter of word

I want to match the first letter of a word in one string to another string that contains the same letter. In this example, the letter H:
25HB matches HC
I am using the match operator shown below:
my ($match) = ( $value =~ m/^d(\w)/ );
The intent is to skip the digits and match the first word character instead. How could I correct this?
That regex doesn't do what you think it does:
m/^d(\w)/
It matches the start of the line, then a literal letter d, then a single word character.
You may want:
m/^\d+(\w)/
Which will then match one or more digits from the start of line, and grab the first word character after that.
E.g.:
my $string = '25HC';
my ( $match ) =( $string =~ m/^\d+(\w)/ );
print $match,"\n";
Prints H
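A Python sketch of ^\d+(\w) shows two edge cases worth knowing. Assumption: Python's \d and \w behave like Perl's defaults for this ASCII input. The function name first_after_digits is hypothetical.

```python
import re

def first_after_digits(value):
    # ^\d+ consumes the leading digits; (\w) captures the next word character
    m = re.match(r'^\d+(\w)', value)
    return m.group(1) if m else None

print(first_after_digits('25HB'))  # H
print(first_after_digits('HC'))    # None: ^\d+ needs at least one leading digit
print(first_after_digits('2512'))  # 2: \w also matches digits, so \d+ backs off by one
```

If only a letter should be captured, ([A-Za-z]) or ([[:alpha:]]) in Perl is stricter than (\w).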
You are not clear about what you want. If you want to match the first letter in a string to the same letter later in the string:
m{
( # start a capture
[[:alpha:]] # match a single letter
) # end of capture
.*? # skip minimum number of any character
\1 # match the captured letter
}msx; # /m: ^ and $ match at line boundaries; /s: . also matches newlines; /x: whitespace and comments in the pattern are ignored
See perldoc perlre for more details.
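The backreference idea carries over to Python, sketched here under two assumptions: [A-Za-z] stands in for [[:alpha:]] (ASCII only), and re.DOTALL plays the role of /s (the /m and /x flags are not needed without ^, $ or pattern whitespace). The name REPEATED_LETTER is hypothetical.

```python
import re

# Capture a letter, skip as little as possible, then require the same letter again.
REPEATED_LETTER = re.compile(r'([A-Za-z]).*?\1', re.DOTALL)

m = REPEATED_LETTER.search('25HB HC')
print(m.group(1), repr(m.group(0)))       # H 'HB H'
print(REPEATED_LETTER.search('25HB xc'))  # None: no letter repeats
```

The match is case-sensitive, so h and H count as different letters.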
Addendum:
If by word, you mean any alphanumeric sequence, this may be closer to what you want:
m{
\b # match a word boundary (start or end of a word)
\d* # greedy match any digits
( # start a capture
[[:alpha:]] # match a single letter
) # end of capture
.*? # skip minimum number of any character
\b # match a word boundary (start or end of a word)
\d* # greedy match any digits
\1 # match the captured letter
}msx; # /m: ^ and $ match at line boundaries; /s: . also matches newlines; /x: whitespace and comments in the pattern are ignored
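The word-boundary variant can be sketched in Python the same way, again assuming [A-Za-z] for [[:alpha:]] and re.DOTALL for /s. Here the repeated letter may be preceded by digits at the start of a later word. The name WORD_START is hypothetical.

```python
import re

# Optional digits, a captured letter, then the same letter after optional
# digits at a later word boundary.
WORD_START = re.compile(r'\b\d*([A-Za-z]).*?\b\d*\1', re.DOTALL)

print(WORD_START.search('25HB HC').group(1))   # H
print(WORD_START.search('25HB 7HC').group(1))  # H
```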
You could try ^.*?([A-Za-z]).
The following code returns:
ITEM: 22hb
MATCH: h
ITEM: 33HB
MATCH: H
ITEM: 3333
MATCH:
ITEM: 43 H
MATCH: H
ITEM: HB33
MATCH: H
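The script:
#!/usr/bin/perl
use strict;
use warnings;
my @array = ('22hb','33HB','3333','43 H','HB33');
for my $item (@array) {
    # capture the first ASCII letter; $match stays undef when there is none
    my ($match) = $item =~ /^.*?([A-Za-z])/;
    print "ITEM: $item \nMATCH: ", $match // '', "\n\n";
}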
I believe this is what you are looking for:
(If you can provide a clearer example of what you are looking for, we may be able to help you better.)
The following code takes two strings and finds the first non-digit character common in both the strings:
my $string1 = '25HB';
my $string2 = 'HC';
# strip all digits
$string1 =~ s/\d//g;
foreach my $alpha (split //, $string1) {
    # for each non-digit, check whether it also occurs in $string2
    # (\Q...\E quotes regex metacharacters such as . or +)
    if ($string2 =~ /\Q$alpha\E/) {
        print "First matching non-numeric character: $alpha\n";
        exit;
    }
}
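The same search needs no regex at all. A Python sketch, assuming plain substring membership is acceptable, which also sidesteps regex metacharacters in the input. The function name first_common_non_digit is hypothetical.

```python
def first_common_non_digit(s1, s2):
    # Skip digits in s1; return the first remaining character that also occurs in s2.
    for ch in s1:
        if not ch.isdigit() and ch in s2:
            return ch
    return None

print(first_common_non_digit('25HB', 'HC'))  # H
```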