Exclude a substring after a pattern is matched using regex - regex

I want to write a regex that splits a string so that only a few elements are selected. For example:
M:\Shares\Profiles\Server\Profiles\abcd.contoso.V2.01
the result I am aiming for is:
abcd.V2.01, so that the domain name i.e. 'contoso' is dropped
However, I am unable to exclude a part of the string after a match is found. I tried
$original = 'M:\Shares\Profiles\Server\Profiles\abcd.contoso.V2.01'
$modified = $original -replace '.*\\([^\\.]+.contoso.V2)[^\\]*$', '$1'
that returns
$modified as 'abcd.contoso.V2'

You can use two capturing groups:
$original = 'M:\Shares\Profiles\Server\Profiles\abcd.contoso.V2.01'
$original -replace '.*\\([^\\.]*)\.contoso(\.V2[^\\]*)$', '$1$2'
# => abcd.V2.01
Do not forget to escape literal dots in the regex pattern. Here is a demo of the above regex. Details:
.* - any zero or more chars other than LF chars
\\ - a \ char
([^\\.]*) - Group 1 ($1): any zero or more chars other than \ and .
\.contoso - a .contoso string
(\.V2[^\\]*) - Group 2 ($2): .V2 string and then any zero or more chars other than \
$ - end of string.
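For readers who want to try the two-group idea outside PowerShell, here is a minimal JavaScript sketch. The helper name dropDomain is made up for illustration, and the pattern is the same one as above, written as a JavaScript regex literal.

```javascript
// Hypothetical helper: keeps the profile name (group 1) and the version
// suffix (group 2), dropping the ".contoso" part between them.
function dropDomain(path) {
  return path.replace(/.*\\([^\\.]*)\.contoso(\.V2[^\\]*)$/, '$1$2');
}

// Backslashes are doubled here only because this is a JavaScript string literal.
const original = 'M:\\Shares\\Profiles\\Server\\Profiles\\abcd.contoso.V2.01';
console.log(dropDomain(original)); // abcd.V2.01
```

Because everything before the last backslash is consumed by .* and not captured, only the two groups survive in the replacement.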

Related

String split in windows powershell

Can you please help me get the desired output? Here SIT is the environment and properties is the file type; I need to remove both the environment and the extension from the string.
# $string = "<ENV>.<can have multiple periods>.properties"
$string = "SIT.com.local.test.stack.properties"
$b = $string.split('.')
$b[0].Substring(1)
Required output : com.local.test.stack //can have multiple period
This should do.
$string = "SIT.com.local.test.stack.properties"
# capture anything up to the first period, and in between first and last period
if($string -match '^(.+?)\.(.+)\.properties$') {
$environment = $Matches[1]
$properties = $Matches[2]
# ...
}
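The same capture-based approach can be sketched in JavaScript. The function name parsePropertiesName and the returned property names are assumptions made for this example; only the pattern comes from the answer above.

```javascript
// Hypothetical helper: returns null when the name does not end in ".properties".
function parsePropertiesName(name) {
  // Group 1: everything up to the first dot (lazy).
  // Group 2: everything between the first dot and the final ".properties".
  const m = name.match(/^(.+?)\.(.+)\.properties$/);
  return m ? { environment: m[1], key: m[2] } : null;
}

const parsed = parsePropertiesName('SIT.com.local.test.stack.properties');
console.log(parsed.environment); // SIT
console.log(parsed.key);         // com.local.test.stack
```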
You may use
$string -replace '^[^.]+\.|\.[^.]+$'
This will remove the first 1+ chars other than a dot and then a dot, and the last dot followed with any 1+ non-dot chars.
Details
^ - start of string
[^.]+ - 1+ chars other than .
\. - a dot
| - or
\. - a dot
[^.]+ - 1+ chars other than .
$ - end of string.
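As a quick check of the alternation, here is a JavaScript sketch of the same removal. The helper name stripEnvAndExtension is hypothetical. PowerShell's -replace replaces every match, so the JavaScript version needs the g flag to remove both ends.

```javascript
// Hypothetical helper: removes the leading "<env>." and the trailing ".<ext>".
function stripEnvAndExtension(name) {
  // First alternative matches at the start, second at the end;
  // the g flag lets both be replaced in one call.
  return name.replace(/^[^.]+\.|\.[^.]+$/g, '');
}

console.log(stripEnvAndExtension('SIT.com.local.test.stack.properties'));
// com.local.test.stack
```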
You can use -match to capture your desired output using regex
$string ="SIT.com.local.test.stack.properties"
$string -match "^.*?\.(.+)\.[^.]+$"
$Matches.1
You can do this with the Split operator also.
($string -split "\.",2)[1]
Explanation:
You split on the literal . character with the regex \.. The ,2 argument tells PowerShell to return at most 2 substrings. The [1] index selects the second element of the returned array; [0] is the first substring (SIT in this case). Note that this removes only the environment prefix: the result still ends in .properties.
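A caution if you port this to JavaScript: the limit argument of String.prototype.split truncates the result instead of leaving the remainder in the last element, so a direct translation loses data. The sketch below shows the difference and one way to get the "everything after the first dot" behaviour; afterFirstDot is a hypothetical helper, and returning an empty string when there is no dot is a choice made for this example.

```javascript
const fileName = 'SIT.com.local.test.stack.properties';

// JavaScript drops the rest of the string once the limit is reached.
console.log(fileName.split('.', 2)); // ['SIT', 'com']

// Hypothetical helper that mimics ($string -split "\.",2)[1].
function afterFirstDot(s) {
  const i = s.indexOf('.');
  return i === -1 ? '' : s.slice(i + 1);
}

console.log(afterFirstDot(fileName)); // com.local.test.stack.properties
```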

Bash regex matching "0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."

In a Bash script I'm writing, I need to capture the /path/to/my/file.c and 93 in this line:
0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).
0xffffffc0006e0584 is in another_function(char *arg1, int arg2) (/path/to/my/other_file.c:94).
With the help of regex101.com, I've managed to create this Perl regex:
^(?:\S+\s){1,5}\((\S+):(\d+)\)
but I hear that Bash doesn't understand \d or ?:, so I came up with this:
^([:alpha:]+[:space:]){1,5}\(([:alpha:]+):([0-9]+)\)
But when I try it out:
line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
regex="^([:alpha:]+[:space:]){1,5}\(([:alpha:]+):([0-9]+)\)"
[[ $line1 =~ $regex ]]
echo ${BASH_REMATCH[0]}
I don't get any match. What am I doing wrong? How can I write a Bash-compatible regex to do this?
You are right, Bash uses POSIX ERE and does not support \d shorthand character class, nor does it support non-capturing groups. See more regex features unsupported in POSIX ERE/BRE in this post.
Use
.*\((.+):([0-9]+)\)
Or even (if you need to grab the first (...) substring in a string):
\(([^()]+):([0-9]+)\)
Details
.* - any 0+ chars, as many as possible (may be omitted, only necessary if there are other (...) substrings and you only need to grab the last one)
\( - a ( char
(.+) - Group 1 (${BASH_REMATCH[1]}): any 1+ chars as many as possible
: - a colon
([0-9]+) - Group 2 (${BASH_REMATCH[2]}): 1+ digits
\) - a ) char.
See the Bash demo (or this one):
test='0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).'
reg='.*\((.+):([0-9]+)\)'
# reg='\(([^()]+):([0-9]+)\)' # This also works for the current scenario
if [[ $test =~ $reg ]]; then
echo ${BASH_REMATCH[1]};
echo ${BASH_REMATCH[2]};
fi
Output:
/path/to/my/file.c
93
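To compare the two patterns on both sample lines from the question, here is a small JavaScript sketch. Both patterns avoid \d and non-capturing groups, so they carry over unchanged; the names lastParen, firstParen and fileAndLine are made up for this example.

```javascript
const lastParen = /.*\((.+):([0-9]+)\)/;     // greedy .* skips to the last "(...)"
const firstParen = /\(([^()]+):([0-9]+)\)/;  // first "(...)" that contains "path:digits"

// Hypothetical helper: returns [file, lineNumber] or null when nothing matches.
function fileAndLine(line, re) {
  const m = line.match(re);
  return m ? [m[1], Number(m[2])] : null;
}

const sample1 = '0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).';
const sample2 = '0xffffffc0006e0584 is in another_function(char *arg1, int arg2) (/path/to/my/other_file.c:94).';

console.log(fileAndLine(sample1, lastParen));  // ['/path/to/my/file.c', 93]
console.log(fileAndLine(sample2, firstParen)); // ['/path/to/my/other_file.c', 94]
```

The second pattern skips the argument list in the second sample because that parenthesised part contains no colon followed by digits.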
In the first pattern you use \S+, which matches one or more non-whitespace chars. That is a broad match that also covers, for example, /, which the second pattern does not account for.
The second pattern starts with [:alpha:], but the first char is a 0. You could use [:alnum:] instead. Since the repetition should also match _, that could be added as well. Also note that POSIX classes such as [:alpha:] only work inside a bracket expression, i.e. [[:alpha:]].
Note that when a quantifier is applied to a capturing group, the group keeps only the value of the last iteration. So with {1,5} the group serves only to repeat the subpattern; its final value here would be some_function followed by a space.
You might use:
^([[:alnum:]_]+[[:space:]]){1,5}\(((/[[:alpha:]]+)+\.[[:alpha:]]):([[:digit:]]+)\)\.$
Your code could look like
line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
regex="^([[:alnum:]_]+[[:space:]]){1,5}\(((/[[:alpha:]]+)+\.[[:alpha:]]):([[:digit:]]+)\)\.$"
[[ $line1 =~ $regex ]]
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[4]}
Result
/path/to/my/file.c
93
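Or a slightly shorter version using \S, with the values in groups 2 and 3 (\S is not part of POSIX ERE, so whether Bash accepts it depends on the platform's regex library):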
^([[:alnum:]_]+[[:space:]]){1,5}\((\S+\.[[:alpha:]]):([[:digit:]]+)\)\.$
Explanation
^ Start of string
([[:alnum:]_]+[[:space:]]){1,5} Repeat 1-5 times what is captured in group 1
\( match (
(\S+\.[[:alpha:]]) Capture group 2 Match 1+ non whitespace chars, . and an alphabetic character
: Match :
([[:digit:]]+) Capture group 3 Match 1+ digits
\)\. Match ).
$ End of string
See this page about bracket expressions
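The "last iteration" behaviour of a quantified capturing group is easy to observe. The JavaScript sketch below is a translation, not the Bash pattern itself: JavaScript has no POSIX classes, so [[:alnum:]_] becomes [A-Za-z0-9_], [[:space:]] becomes \s and [[:digit:]] becomes [0-9]. The function name parseLocation and its property names are hypothetical.

```javascript
// Translation of the shorter pattern above into JavaScript regex syntax.
const locationRe = /^([A-Za-z0-9_]+\s){1,5}\((\S+\.[A-Za-z]):([0-9]+)\)\.$/;

// Hypothetical helper: returns the captured groups or null.
function parseLocation(line) {
  const m = line.match(locationRe);
  if (!m) return null;
  // m[1] holds only the LAST repetition of the quantified group.
  return { lastWord: m[1], file: m[2], line: Number(m[3]) };
}

const loc = parseLocation('0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).');
console.log(JSON.stringify(loc.lastWord)); // "some_function " (note the trailing space)
console.log(loc.file, loc.line);           // /path/to/my/file.c 93
```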

Capture word between optional hyphens regex

I have the following kinds of strings:
abc - xyz
abc - pqr - xyz
abc - - xyz
abc - pqr uvw - xyz
I want to retrieve xyz from the 1st string, pqr from the 2nd, an empty string from the 3rd and pqr uvw from the 4th. The 2nd hyphen is optional. abc is a static string; it has to be there. I've tried the following regex:
/^(?:abc) - (.*)[^ -]?/
But it gives me following output,
xyz
pqr - xyz
- xyz
pqr uvw - xyz
I don't need the last part in the second string. I'm using perl for scripting. Can it be done via regex?
Note that (.*) is a greedily quantified dot: it grabs any 0+ chars other than line break chars, as many as possible, up to the end of the line. The [^ -]? that follows can match an empty string because of the ? quantifier (1 or 0 repetitions), so it matches the empty string at the end of the line. Thus, the pqr - xyz output for abc - pqr - xyz is only logical for the regex engine.
You need to use a more restrictive pattern here. E.g.
/^abc\h*-\h*((?:[^\s-]+(?:\h+[^\s-]+)*)?)/
Details
^ - start of a string
abc - an abc
\h*-\h* - a hyphen enclosed with 0+ horizontal whitespaces
((?:[^\s-]+(?:\h+[^\s-]+)*)?) - Group 1 capturing an optional occurrence of
[^\s-]+ - 1 or more chars other than whitespace and -
(?:\h+[^\s-]+)* - zero or more repetitions of
\h+ - 1+ horizontal whitespaces
[^\s-]+ - 1 or more chars other than whitespace and -
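To check the restrictive pattern against all four inputs, here is a JavaScript sketch. JavaScript has no \h, so this version substitutes [ \t] for horizontal whitespace; that substitution and the helper name extractMiddle are assumptions of this example.

```javascript
// Same structure as the Perl pattern above, with \h replaced by [ \t].
const middleRe = /^abc[ \t]*-[ \t]*((?:[^\s-]+(?:[ \t]+[^\s-]+)*)?)/;

// Hypothetical helper: returns group 1, or null if the string does not start with "abc -".
function extractMiddle(s) {
  const m = s.match(middleRe);
  return m ? m[1] : null;
}

const inputs = ['abc - xyz', 'abc - pqr - xyz', 'abc - - xyz', 'abc - pqr uvw - xyz'];
console.log(inputs.map(extractMiddle)); // ['xyz', 'pqr', '', 'pqr uvw']
```

The optional group is what lets the third input yield an empty string instead of failing to match.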
You could use ^[^-]*-\s*\K[^\s-]*.
Here's how it works:
^ # Matches at the beginning of the line (in multiline mode)
[^-]* # Matches any number of characters other than -
- # Followed by -
\s* # Matches any whitespace characters
\K # Resets the start of the match to the current position
[^\s-]* # Matches any number of characters that are neither whitespace nor -
Update for multiple enclosed words: ^[^-]*-\s*\K[^\s-]*(?:\s*[^\s-]+)*
Last part (?:\s*[^\s-]+)* checks for existence of any other word preceded by space(s).
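\K is not available in every regex flavour. JavaScript, for example, does not support it, so a rough equivalent there is to capture the part that would follow \K in a group. The sketch below does that with the updated pattern; the helper name afterFirstHyphen is hypothetical.

```javascript
// Hypothetical helper: the capturing group stands in for everything after \K.
function afterFirstHyphen(s) {
  const m = s.match(/^[^-]*-\s*([^\s-]*(?:\s*[^\s-]+)*)/);
  return m ? m[1] : null;
}

const samples = ['abc - xyz', 'abc - pqr - xyz', 'abc - - xyz', 'abc - pqr uvw - xyz'];
console.log(samples.map(afterFirstHyphen)); // ['xyz', 'pqr', '', 'pqr uvw']
```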
You could use split:
$answer = (split / \- /, $t)[1];
Where $t is the text string and you want the 2nd field of the split (index [1], since indexing starts at 0). This works for everything except abc - - xyz: with " - " as the separator, that string would need two spaces between the hyphens to yield an empty field. If abc - - xyz is valid input, you can run this before the split so that all cases work:
$t =~ s/- -/-  -/;
It simply inserts an extra space so that " - " matches twice with nothing in between.
Can it be done via regex?
Yes, with three simple regexes: - and ^\s+ and \s+$.
use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;
open my $INFILE, '<', 'data.txt';
my @results = map {
(undef, my $target) = split /-/, $_, 3;
$target =~ s/^\s+//; #remove leading spaces
$target =~ s/\s+$//; #remove trailing spaces
$target;
} <$INFILE>;
close $INFILE;
say Dumper \@results;
--output:--
$VAR1 = [
'xyz',
'pqr',
'',
'pqr uvw'
];
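The split-then-trim idea is not specific to Perl. A minimal JavaScript sketch follows, with the input lines held in an array instead of being read from data.txt; the helper name secondField is hypothetical.

```javascript
// Hypothetical helper: takes the field after the first hyphen and trims it.
function secondField(line) {
  const parts = line.split('-');
  return parts.length > 1 ? parts[1].trim() : null;
}

const rows = ['abc - xyz', 'abc - pqr - xyz', 'abc - - xyz', 'abc - pqr uvw - xyz'];
console.log(rows.map(secondField)); // ['xyz', 'pqr', '', 'pqr uvw']
```

Splitting on the bare hyphen and trimming afterwards sidesteps the question of how many spaces surround each separator.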

how to replace special characters within multiple matched groups

How do I use regex in google apps script to replace the special characters in my text but only those between certain strings?
so if this was the text and x represents random alphanumeric characters...
xx##xxxSTARTxxx###xxx$xxxENDxxxxx##££xxxSTARTxxxx££££xxx&&&&&xxxxENDxxx
what regex would i need so i end up with
xx##xxxSTARTxxxxxxxxxENDxxxxx##££xxxSTARTxxxxxxxxxxxENDxxx
You may use a replace with a callback:
var text = "xx##xxxSTARTxxx###xxx$xxxENDxxxxx##££xxxSTARTxxxx££££xxx&&&&&xxxxENDxxx";
var regex = /(START)([\s\S]*?)(END)/g;
var result = text.replace(regex, function ($0, $1, $2, $3) {
return $1 + $2.replace(/[^\w\s]+/g, '') + $3;
});
console.log(result);
// => xx##xxxSTARTxxxxxxxxxENDxxxxx##££xxxSTARTxxxxxxxxxxxENDxxx
The first regex is a simple regex to match a string between two strings:
(START) - Group 1 ($1): START (may be replaced with any pattern)
([\s\S]*?) - Group 2 ($2): any 0+ chars, but as few as possible
(END) - Group 3 ($3): END (may be replaced with any pattern)
The regex to match special chars I used here is [^\w\s], it matches any 1+ chars other than ASCII letters, digits, _ and whitespaces.
See Check for special characters in string for more variations of the special char regex.
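If the delimiters are not fixed, the callback approach can be wrapped in a function. The sketch below is one possible way to do that. The names stripSpecialsBetween and escapeRegExp are made up for this example, and it assumes a runtime with arrow functions (for instance the V8 runtime in Apps Script).

```javascript
// Escapes regex metacharacters so arbitrary delimiter text is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: removes non-word, non-whitespace chars only between start and end.
function stripSpecialsBetween(text, start, end) {
  const re = new RegExp(escapeRegExp(start) + '([\\s\\S]*?)' + escapeRegExp(end), 'g');
  return text.replace(re, (whole, inner) => start + inner.replace(/[^\w\s]+/g, '') + end);
}

const input = 'xx##xxxSTARTxxx###xxx$xxxENDxxxxx##££xxxSTARTxxxx££££xxx&&&&&xxxxENDxxx';
console.log(stripSpecialsBetween(input, 'START', 'END'));
```

Characters outside the START...END spans are left untouched, since only the matched spans go through the inner replace.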

cant save pattern matches in array using perl and regex

I am trying to save matched patterns in an array using perl and regex, the problem is that when the match is saved it is missing some characters
ex:
my @array;
my @temp_array;
@types_U8 = ("uint8","vuint8","UCHAR");
foreach my $type (@types_U8)
{
@temp_array = $str =~ /\(\s*\Q$type\E\s*\)\s*(0x[0-9ABCDEF]{3,}|\-[1-9]+)/g;
push(@array,@temp_array);
@temp_array = ();
}
So if $str = "any text (uint8)-1"
The saved string in the @temp_array is only ever "-1"
Your current regular expression is:
/\(\s*\Q$type\E\s*\)\s*(0x[0-9ABCDEF]{3,}|\-[1-9]+)/g
this means
1. match a literal left paren: \(
2. match zero or more whitespace characters: \s*
3. match the value that is stored in $type: \Q$type\E
4. match zero or more whitespace characters: \s*
5. match a literal right paren: \)
6. match zero or more whitespace characters: \s*
7. START capturing group: (
8. match a hexadecimal number of 3 or more digits prefixed with 0x: 0x[0-9ABCDEF]{3,}
OR
9. match a literal dash, followed by 1 or more digits from 1 to 9: \-[1-9]+
10. END capturing group: )
As you can see above, your capturing group doesn't start until step #7, whereas you would also like to capture $type and the literal parens.
Extend your capturing group to enclose those areas:
/(\(\s*\Q$type\E\s*\)\s*(?:0x[0-9ABCDEF]{3,}|\-[1-9]+))/;
This means:
1. START a capturing group: (
2. match a literal left paren: \(
3. match zero or more whitespace characters: \s*
4. match the value that is stored in $type: \Q$type\E
5. match zero or more whitespace characters: \s*
6. match a literal right paren: \)
7. match zero or more whitespace characters: \s*
8. START non-capturing group: (?:
9. match a hexadecimal number of 3 or more digits prefixed with 0x: 0x[0-9ABCDEF]{3,}
OR
10. match a literal dash, followed by 1 or more digits from 1 to 9: \-[1-9]+
11. END non-capturing group: )
12. END capturing group: )
(Note: I removed the g (global) modifier because it is unnecessary)
This change gives me a result of (uint8)-1
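The same point, that what you get back depends on where the capturing parentheses sit, can be illustrated in JavaScript. In the sketch below no capturing group is needed at all, because String.prototype.match with the g flag returns the full text of every match. The names findCasts and escapeRegExp are hypothetical, and escapeRegExp plays the role of Perl's \Q...\E.

```javascript
// Escapes regex metacharacters, similar in spirit to \Q...\E in Perl.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: collects every "(type)value" occurrence for each type in turn.
function findCasts(str, types) {
  const found = [];
  for (const type of types) {
    const re = new RegExp(
      '\\(\\s*' + escapeRegExp(type) + '\\s*\\)\\s*(?:0x[0-9A-F]{3,}|-[1-9]+)', 'g');
    // With the g flag, match() returns whole matches (or null when there are none).
    found.push(...(str.match(re) || []));
  }
  return found;
}

console.log(findCasts('any text (uint8)-1', ['uint8', 'vuint8', 'UCHAR']));
// ['(uint8)-1']
```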