I'm trying to regex match patterns with the following criteria:
I want to match a string that has only one occurrence of a single colon in the entire string. I then want to capture the portion before that single colon.
Examples of valid strings:
JohnP: random text here
BobF::student: random text here (this is valid because there's only ONE occurrence of a single colon. the other is a double colon)
Paris: random text here::student (valid for the same reason as above)
Examples of invalid strings:
JohnP: student: random text here
BobF::student: random text here: more
I have no idea how to do a regex match like this. For the valid strings, the group I want to return is:
JohnP
BobF::student
Paris
I would appreciate the help! I have tried $string =~ /^[^:]+:\s*/ but that only matches up to the first colon.
You can use this regex:
^((?:::|[^:])*+):(?!.*(?<!:):(?!:))
It looks for some number of pairs of colons or non-colon characters followed by a colon, using a possessive quantifier (*+) to prevent matching part-way through a double-colon in a string such as Bill:: xyz. Those characters are captured in group 1. A negative lookahead assertion is then used to check that there are no more single colons in the string.
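For readers who want to experiment outside Perl, here is a minimal Python sketch of the same idea. It is an adaptation rather than the exact pattern above: Python's re module only gained possessive quantifiers in version 3.11, so this version uses an ordinary quantifier and adds a (?!:) lookahead right after the matched colon to keep the match from landing on the first half of a double colon. The function name capture_before_single_colon is made up for illustration.

```python
import re

# Assumption: a plain quantifier plus an extra (?!:) guard stands in for
# the possessive *+ used in the Perl pattern.
SINGLE_COLON = re.compile(r'^((?:::|[^:])*):(?!:)(?!.*(?<!:):(?!:))')

def capture_before_single_colon(text):
    """Return the text before the only single colon, or None if invalid."""
    match = SINGLE_COLON.match(text)
    return match.group(1) if match else None

if __name__ == "__main__":
    samples = [
        "JohnP: random text here",
        "BobF::student: random text here",
        "Paris: random text here::student",
        "JohnP: student: random text here",
        "BobF::student: random text here: more",
    ]
    for sample in samples:
        print(repr(sample), "->", capture_before_single_colon(sample))
```

The first three samples should yield a capture and the last two should yield None, mirroring the valid and invalid examples in the question.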
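Perhaps the regular expression can take this form: match up to a : that is neither preceded nor followed by another :. Note that this pattern only captures the text before the last single colon; it does not reject strings that contain more than one single colon.
Note: the code is written in shortened form.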
use strict;
use warnings;
use feature 'say';
my $re = qr/^(.+)(?<!:):(?!:)/;
/$re/ && say $1 for <DATA>;
__DATA__
JohnP: random text here
BobF::student: random text here (this is valid because there's only ONE occurrence of a single colon. the other is a double colon)
Paris: random text here::student (valid for the same reason as above)
Output
JohnP
BobF::student
Paris
Related
How do I write a regular expression that returns only the letters and numbers, without the asterisks in between?
You could use a regex replacement here:
my $var = 'RMRIV43069411**2115.82';
$var =~ s/^.*?\D(\d+(?:\.\d+)*)$/$1/g;
print "$var"; # 2115.82
The idea is to capture the final number in the string, and then replace with only that captured quantity.
Here is an explanation of the pattern:
^                  from the start of the input
.*?                lazily consume content up to
\D                 a non-digit character that is immediately followed by
(\d+(?:\.\d+)*)    match AND capture: a number, with optional decimal component,
                   occurring right before
$                  the end of the input
Then we replace the whole string with just this captured number, which is available in $1.
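The capture-and-replace step can also be sketched in Python, where the replacement refers to the group as \1 instead of $1. This is only an illustration; the function name final_number is an assumption, not part of the original answer.

```python
import re

def final_number(text):
    """Replace the whole string with the number captured at its end."""
    # Same pattern as the Perl substitution: lazy prefix, one non-digit,
    # then the captured number anchored at the end of the input.
    return re.sub(r'^.*?\D(\d+(?:\.\d+)*)$', r'\1', text)

if __name__ == "__main__":
    print(final_number('RMRIV43069411**2115.82'))  # 2115.82
```

If the input does not end in a number preceded by a non-digit, the pattern does not match and the string is returned unchanged.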
I would like to replace "," with # in the following strings, but without changing commas that are part of a number in thousands format (10,000).
x,y,z to x#y#z
x1,y1,z1 to x1#y1#z1
x1,y1 10,000,z1 to x1#y1 10,000#z1
I used s/(\D),/\1#/g, but it doesn't work for examples 2 and 3. How can I exclude only the commas that have a digit on both sides? Can someone help? Thanks so much.
You need a regex that matches a comma unless it has a digit on both its left and its right.
s/(?<!\d),|,(?!\d)/#/g
The negative lookbehind assertion (?<!\d) allows matches such as the comma in x, since x is not a digit. Because the assertion is negated, it also matches at the beginning of a line, e.g. ,x. The negative lookahead assertion (?!\d) allows matches for commas that are not followed by a digit. Neither alternative matches a comma that has a digit on both sides.
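The two lookarounds behave the same way in Python's re module, so the substitution can be checked against the three sample strings with a short sketch. The function name replace_separators is hypothetical.

```python
import re

# A comma is replaced when it lacks a digit on its left OR on its right;
# only commas with digits on both sides (as in 10,000) are left alone.
SEPARATOR = re.compile(r'(?<!\d),|,(?!\d)')

def replace_separators(text):
    return SEPARATOR.sub('#', text)

if __name__ == "__main__":
    for sample in ['x,y,z', 'x1,y1,z1', 'x1,y1 10,000,z1']:
        print(replace_separators(sample))
```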
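Try the following alternative. Note that a lookbehind placed after the comma inspects the comma itself, so (?<!\d) always succeeds there; the pattern behaves like ,(?!\d) and replaces only commas that are not followed by a digit, which is enough for the sample inputs: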
s/,(?<!\d)(?!\d)/\#/g;
sample script
use strict;
use warnings;
my @array = ( 'x,y,z', 'x1,y1,z1', 'x1,y1 10,000,z1');
for my $string (@array) {
    $string =~ s/,(?<!\d)(?!\d)/\#/g;
    print "$string\n";
}
#OUTPUT
#x#y#z
#x1#y1#z1
#x1#y1 10,000#z1
Is there a PowerShell regex command I could use to replace the last of the consecutive leading zeros in a text string with an "M"? For example:
$Pattern = @("000123456", "012345678", "000000001", "000120000")
Final result:
00M123456
M12345678
0000000M1
00M120000
Thanks.
Search for the following regex:
"^(0*)0"
The regex looks for a run of consecutive 0s at the beginning (^) of the string. It captures every 0 except the last one, which is the one to be replaced. "^0(0*)" also works, since all we need is the number of 0s that stay untouched.
With the replacement string:
'$1M'
Note that $1 denotes the text captured by the first capturing group, which is (0*) in the regex.
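The search pattern and replacement string are not specific to PowerShell. As a rough cross-check, here is the same pair in Python, where the group reference is written \g<1> instead of $1. The function name mark_last_leading_zero is made up for this sketch.

```python
import re

def mark_last_leading_zero(text):
    # Keep all leading zeros but the last one (group 1), then put "M"
    # in place of that last leading zero.
    return re.sub(r'^(0*)0', r'\g<1>M', text)

if __name__ == "__main__":
    for sample in ["000123456", "012345678", "000000001", "000120000"]:
        print(mark_last_leading_zero(sample))
```

Strings without a leading zero do not match and come back unchanged.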
Example by @SegFault:
"000120000" -replace "^(0*)0", '$1M'
I've got a regular expression with capture groups that matches what I want in a broader context. I then take capture group $1 and use it for my needs. That's easy.
But how to use capture groups with s/// when I just want to replace the content of $1, not the entire regex, with my replacement?
For instance, if I do:
$str =~ s/prefix (something) suffix/42/
prefix and suffix are removed. Instead, I would like something to be replaced by 42, while keeping prefix and suffix intact.
As I understand it, you can use look-ahead or look-behind assertions, which don't consume characters. Alternatively, capture the surrounding text in groups and put it back in the replacement. Examples:
With look-ahead:
s/your_text(?=ahead_text)//;
Grouping data:
s/(your_text)(ahead_text)/$2/;
If you only need to replace one capture then using @LAST_MATCH_START and @LAST_MATCH_END (with use English; see perldoc perlvar) together with substr might be a viable choice:
use English qw(-no_match_vars);
$your_string =~ m/aaa (bbb) ccc/;
substr $your_string, $LAST_MATCH_START[1], $LAST_MATCH_END[1] - $LAST_MATCH_START[1], "new content";
# replaces "bbb" with "new content"
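The offset-based approach has a close analogue in Python, where a match object reports the start and end of each group. The sketch below is an illustration under that assumption; replace_group is a hypothetical helper, not a library function.

```python
import re

def replace_group(text, pattern, replacement, group=1):
    """Replace only the text matched by one capture group."""
    match = re.search(pattern, text)
    # Leave the text alone if nothing matched or the group did not take part.
    if match is None or match.start(group) == -1:
        return text
    start, end = match.span(group)
    return text[:start] + replacement + text[end:]

if __name__ == "__main__":
    print(replace_group("aaa bbb ccc", r"aaa (bbb) ccc", "new content"))
```

Everything outside the group's span is copied through untouched, which is the same effect the substr call achieves.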
This is an old question, but I found the approach below easier for renaming lines that start with >something to >something_else. It is handy for changing the headers of FASTA sequences.
while ($filelines=~ />(.*)\s/g){
unless ($1 =~ /else/i){
$filelines =~ s/($1)/$1\_else/;
}
}
I use something like this:
s/(?<=prefix)(group)(?=suffix)/$1 =~ s|text|rep|gr/e;
Example:
In the following text I want to normalize the whitespace, but only after ::=:
some   text ::=   a  b    c d  e ;
Which can be achieved with:
s/(?<=::=)(.*)/$1 =~ s|\s+| |gr/e
Resulting in:
some   text ::= a b c d e ;
Explanation:
(?<=::=): Look-behind assertion to match ::=
(.*): Everything after ::=
$1 =~ s|\s+| |gr: With the captured group normalize whitespace. Note the r modifier which makes sure not to attempt to modify $1 which is read-only. Use a different sub delimiter (|) to not terminate the replacement expression.
/e: Treat the replacement text as a perl expression.
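A comparable nested substitution can be sketched in Python, where a function passed to re.sub plays the role of the /e expression. The function name normalize_after_marker and the sample line are assumptions chosen for illustration.

```python
import re

def normalize_after_marker(line):
    """Collapse runs of whitespace, but only in the text after '::='."""
    return re.sub(
        r'(?<=::=)(.*)',
        # The callable receives the match and returns the replacement text,
        # much like the inner s|\s+| |gr expression.
        lambda m: re.sub(r'\s+', ' ', m.group(1)),
        line,
    )

if __name__ == "__main__":
    print(normalize_after_marker("some   text ::=   a  b    c d  e ;"))
```

The text before ::= keeps its original spacing because it is never part of the match.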
Use lookaround assertions. Quoting the documentation:
Lookaround assertions are zero-width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Lookbehind matches text up to the current match position, lookahead matches text following the current match position.
If the beginning of the string has a fixed length, you can thus do:
s/(?<=prefix)(your capture)(?=suffix)/$1/
However, ?<= does not work for variable length patterns (starting from Perl 5.30, it accepts variable length patterns whose length is smaller than 255 characters, which enables the use of |, but still prevents the use of *). The work-around is to use \K instead of (?<=):
s/.*prefix\K(your capture)(?=suffix)/$1/
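For comparison, here is how the two cases might look in Python. The standard re module requires fixed-width lookbehind and has no \K, so for a variable-width prefix this sketch captures the prefix and writes it back instead. Both function names and the sample strings are hypothetical.

```python
import re

def replace_between(text):
    # Fixed-width context: lookbehind and lookahead keep "prefix " and
    # " suffix" out of the match, so only "something" is replaced.
    return re.sub(r'(?<=prefix )something(?= suffix)', '42', text)

def replace_after_variable_prefix(text):
    # Variable-width context: capture the prefix and reinsert it with \g<1>.
    return re.sub(r'(prefix\s+)something(?=\s+suffix)', r'\g<1>42', text)

if __name__ == "__main__":
    print(replace_between("prefix something suffix"))
    print(replace_after_variable_prefix("prefix    something  suffix"))
```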
I need a regular expression that will find numbers that are not inside parentheses.
Example: abcd 1 (35) (df)
It would only see the 1.
Is this very complex? I've tried and had no luck.
Thanks for any help
An easy solution is to first remove the unwanted values:
my $string = "abcd 12 (35) (df) 2311,22";
$string =~ s/\(\d+\)//g; # remove numbers within parens
my @numbers = $string =~ /\d+/g; # extract the numbers
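The same two-step idea, remove first and then extract, can be sketched in Python. The function name numbers_outside_parens is made up for this example.

```python
import re

def numbers_outside_parens(text):
    # Step 1: drop every number that is wrapped in parentheses.
    stripped = re.sub(r'\(\d+\)', '', text)
    # Step 2: collect the digit runs that remain.
    return re.findall(r'\d+', stripped)

if __name__ == "__main__":
    print(numbers_outside_parens("abcd 12 (35) (df) 2311,22"))
```

Like the Perl version, this treats 2311,22 as two separate numbers.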
This is quite hard but something like this will probably do:
^(?:\()(\d+)(?:[^)])|(?:[^(0-9]|^)(\d+)(?:[^)0-9]|^)|(?:[^(])(\d+)(?:\))$
The problem is to match (123, 123) and also to not match the string 123 as the number 2 between the non-parentheses characters 1 and 3. Also there are probably some edge cases for start of and end of string.
My suggestion is to not use a regex for this. Maybe a regex that matches numbers and then use the capture info to check if the surrounding characters are not parentheses.
The regular expression would be:
^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$
The result is the first (and only) matching group of the regex.
You may want to remove the ^ and $ if the pattern should also match when it is not the entire content of a single line. You can also use [a-zA-Z] or [[:alpha:]]. This depends on the regular expression engine you use and, of course, on the content you want to match.
Example perl code:
if (m/^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$/) {
print("$1\n");
}
Please note that your question does not contain enough information to make a good answer possible (you did not say anything about the general format of your expression, for example whether you want to match integers or floating-point numbers).
How about this?
/(?:^|[^\d(])(\d+)(?:[^\d)]|$)/
This matches a string of digits (\d+) that is
preceded by the beginning of the string, or by a character that is not a digit or an open parenthesis ((?:^|[^\d(])), and
followed by the end of the string, or by a character that is not a digit or a close parenthesis ((?:[^\d)]|$)).
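This pattern can be tried as-is in Python, since it uses only character classes and groups. The sketch below is illustrative; unparenthesized_numbers is a hypothetical name.

```python
import re

PATTERN = re.compile(r'(?:^|[^\d(])(\d+)(?:[^\d)]|$)')

def unparenthesized_numbers(text):
    # findall returns only the captured digits, not the boundary characters.
    return PATTERN.findall(text)

if __name__ == "__main__":
    print(unparenthesized_numbers("abcd 1 (35) (df)"))
```

One caveat when collecting every match: the boundary characters are consumed, so two numbers separated by a single character (for example "1 2") share that character and only the first is reported. Turning the boundary groups into lookarounds would likely avoid this.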