Perl search and replace the last character occurrence - regex

I have what I thought would be an easy problem to solve but I am not able to find the answer to this.
How can I find and replace the last occurrence of a character in a string?
I have a string: GE1/0/1 and I would like it to be: GE1/0:1 <- This can be variable length so no substrings please.
Clarification:
I am looking to replace the last / with a : no matter what comes before or after it.

use strict;
use warnings;
my $a = 'GE1/0/1';
(my $b = $a) =~ s{(.*)/}{$1:}xms;
print "$b\n";
I use the greedy behaviour of .*
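To see the greedy behaviour on more than one input, here is a minimal sketch. The helper name replace_last_slash and the extra sample strings are made up for illustration; only GE1/0/1 comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the last '/' in a string with ':'.
sub replace_last_slash {
    my ($s) = @_;
    # (.*) is greedy: it first grabs everything, then backtracks only as far
    # as needed for the following '/' to match, which is the last slash.
    $s =~ s{(.*)/}{$1:}s;
    return $s;
}

# A string without any slash is returned unchanged.
print replace_last_slash($_), "\n"
    for 'GE1/0/1', 'GigabitEthernet10/2/0/48', 'no-slash';
```

This should print GE1/0:1, GigabitEthernet10/2/0:48 and no-slash, one per line.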

Perhaps I have not understood the problem with variable length, but I would do the following:
You can match what you want with the regex :
(.+)/
So, this Perl script
my $text = 'GE1/0/1';
$text =~ s|(.+)/|$1:|;
print 'Result : '.$text;
will output :
Result : GE1/0:1
Because the '+' quantifier is greedy by default, (.+) consumes as much as it can, so the '/' that follows it in the pattern can only match the last slash in the string.
Hope this is what you were asking.

This finds a slash and looks ahead to make sure there are no more slashes after it:
Raw regex:
/(?=[^/]*$)
I think the code would look something like this, but Perl isn't my language:
$string =~ s!/(?=[^/]*$)!\:!g;
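For what it's worth, the lookahead idea can be wrapped up and checked like this. It is only a sketch: the helper name is hypothetical, and I have used \z instead of $ and dropped the /g, since at most one slash can satisfy the lookahead.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the last '/' using a lookahead.
sub replace_last_slash_lookahead {
    my ($s) = @_;
    # Match a '/' only if nothing but non-slash characters follow it
    # up to the end of the string. Nothing before the slash is captured.
    $s =~ s{/(?=[^/]*\z)}{:};
    return $s;
}

print replace_last_slash_lookahead('GE1/0/1'), "\n";   # GE1/0:1
print replace_last_slash_lookahead('a/b/'), "\n";      # a/b:
```

Unlike the greedy (.*) version, nothing has to be captured and put back, which some people find easier to read.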

"last occurrence in a string" is slightly ambiguous. The way I see it, you can mean either:
"Foo: 123, yada: GE1/0/1, Bar: null"
Meaning the last occurrence in the "word" GE1/0/1, or:
"GE1/0/1"
As a complete string.
In the latter case, it is a rather simple matter, you only have to decide how specific you can be in your regex.
$str =~ s{/(\d+)$}{:$1};
Is perfectly fine, assuming the last character(s) can only be digits.
In the former case, which I don't think you are referring to, but I'll include anyway, you'd need to be much more specific:
$str =~ s{(\byada:\s+\w+/\w+)/(\w+\b)}{$1:$2};

Related

Make a regular expression in perl to grep value work on a string with different endings

I have this code in perl where I want to extract the value of 'EUR_AF', in this case '0.39'.
Sometimes 'EUR_AF' ends with ';', sometimes it doesn't.
Alternatively, 'EUR_AF' may end with '=0' instead of '=0.39;' or '=0.39'.
How do I make the code handle that? Can't seem to find it online...I could of course wrap everything in an almost endless if-elsif-else statement, but that seems overkill.
Example text:
AVGPOST=0.9092;AN=2184;RSQ=0.5988;ERATE=0.0081;AC=144;VT=SNP;THETA=0.0045;AA=A;SNPSOURCE=LOWCOV;LDAF=0.0959;AF=0.07;ASN_AF=0.05;AMR_AF=0.10;AFR_AF=0.11;EUR_AF=0.039
Code: $INFO =~ m/\;EUR\_AF\=(.*?)(;)/
I did find that: $INFO =~ m/\;EUR\_AF\=(.*?0)/ handles the cases of EUR_AF=0, but how to handle alternative scenarios efficiently?
Extract one value (either of these works):
my ($eur_af) = $s =~ /(?:^|;)EUR_AF=([^;]*)/;
my ($eur_af) = ";$s" =~ /;EUR_AF=([^;]*)/;
Extract all values:
my %rec = split(/[=;]/, $s);
my $eur_af = $rec{EUR_AF};
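If some fields might be flags without a value, splitting on [=;] in one go would shift the key/value pairs. A slightly more defensive sketch splits on ; first and then on the first = of each field. The helper name parse_info and the shortened sample string are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "KEY=value;KEY=value" into a hash reference.
sub parse_info {
    my ($info) = @_;
    my %rec;
    for my $field (split /;/, $info) {
        # A limit of 2 keeps any '=' inside the value intact.
        my ($key, $value) = split /=/, $field, 2;
        # Empty fields yield no key and are skipped; a bare flag gets undef.
        $rec{$key} = $value if defined $key;
    }
    return \%rec;
}

my $rec = parse_info('AF=0.07;ASN_AF=0.05;EUR_AF=0.39;');
print "$rec->{EUR_AF}\n";   # prints 0.39
```

A trailing ; makes no difference here, because split drops trailing empty fields.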
This regex should work for you: (?<=EUR_AF=)\d+(\.\d+)?
It means
(?<=EUR_AF=) - look for a string preceded by EUR_AF=
\d+(\.\d+)? - consisting of one or more digits, optionally followed by a decimal point and more digits
EDIT: I originally wanted the whole regex to return the correct result, not only the capture group. If you want the correct capture group edit it to (?<=EUR_AF=)(\d+(?:\.\d+)?)
I have found the answer. The code:
$INFO =~ m/(?:^|;)EUR_AF=([^;]*)/
seems to handle the cases where EUR_AF=0 and EUR_AF=0.39, ending with or without ;. The captured value ($1) will be 0 or 0.39.
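A quick way to convince yourself that the (?:^|;) anchor matters is to run the pattern against a few variants. This is a sketch with a hypothetical helper info_value that takes the key as a parameter; the test strings are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of one key, or undef if absent.
sub info_value {
    my ($info, $key) = @_;
    # (?:^|;) makes sure the key starts a field, so AMR_AF or XEUR_AF
    # cannot be mistaken for EUR_AF. \Q...\E escapes the key.
    my ($value) = $info =~ /(?:^|;)\Q$key\E=([^;]*)/;
    return $value;
}

for my $info ('AF=0.07;EUR_AF=0.39;', 'AF=0.07;EUR_AF=0.39', 'EUR_AF=0;AF=1') {
    print info_value($info, 'EUR_AF'), "\n";
}
```

This should print 0.39, 0.39 and 0. For a string such as XEUR_AF=5 the helper returns undef, because the key does not start a field.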

Regular Expression - Perl

I am trying to get a substring from a string using a regular expression, but my regular expression is not working. Can anyone help me write a correct one?
Here is the pattern for which I am trying to write the regular expression:
MSM8_BD_V4.3_1-1_idle-Kr_Run3.xlsx
MSM8_BD_V4.3_2-6_mp3-Kr_Run2.xlsx
MSM8_BD_V4.3_Camera_snap-7.xlsx
MSM8_BD_V4.3_Camera_snap-8.xlsx
MSM8_BD_V4.3_Radio_202.16-0.xlsx
I am trying to get the bold part of the substring.
Below is the regular expression I tried:
my $line = "MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx";
my ($captured) = $line =~ /MSM8939_BD_V4\.\3\_[d]*(.+?)\w/gx;
print "$captured\n";
[d] matches nothing but the literal letter d. You want \d, without the brackets, to match a digit. However, it looks like you also want to include underscores. That would be [\d_].
Try this:
/^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/
If I run this on your input (with e.g. perl -nE 'say $1 if /^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/'), I get this output:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
for (qw(
    MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx
    MSM8939_BD_V4.3_2-6_mp3-Kratos_Run2.xlsx
    MSM8939_BD_V4.3_Camera_snap-7.xlsx
    MSM8939_BD_V4.3_Camera_snap-8.xlsx
    MSM8939_BD_V4.3_Radio_202.16-0.xlsx
)) {
    my ($captured) = ($_ =~ /.*[-_]([^\W_]+_[\w.]+)-/gx);
    print "$captured\n";
}
Use a greedy pattern to go as far as possible, then grab the last two strings that look like what you want which are still followed by a hyphen.
As does the other answer which was just edited while I was typing, this produces:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
This one may be more general in that the beginning of the substring is not hard-coded, i.e., you could use it in other cases which did not necessarily start with MSM8_BD_V4.3.

Don't understand my regex's matches

I'm currently reading XML tags from a file, but I have reduced the problem to this simple example.
#!/usr/bin/perl
use strict;
use warnings;
my $str = '<tag x="20" y="7" x="15" z="14"/>';
if($str =~ /<tag.*(x|y|z)=\"(\d+)\".*(x|y|z)=\"(\d+)\".*(x|y|z)=\"(\d+)\".*\/>/){
print "$1-$2\n";
print "$3-$4\n";
print "$5-$6\n";
}
As I understand my regex, the first x should match the first group, the first y the third group and the second x the fifth group.
So I expect as output:
x-20
y-7
x-15
But I get
y-7
x-15
z-14
Could someone explain what's happening here?
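The quantifiers * and + are greedy by default: .* matches as many characters as possible, so the first .* skips past x="20" and the three groups end up on the last three attributes. Append ? to make the quantifiers non-greedy: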
$str =~ /<tag.*?(x|y|z)=\"(\d+)\".*?(x|y|z)=\"(\d+)\".*?(x|y|z)=\"(\d+)\".*\/>/
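The difference is easy to see by running both patterns side by side. This sketch reuses the string from the question; the while loop at the end is an alternative I am adding, not something from the original code, and it collects every attribute however many there are.

```perl
use strict;
use warnings;

my $str = '<tag x="20" y="7" x="15" z="14"/>';

# Greedy: the first .* runs to the end and backtracks only until the rest
# of the pattern fits, so the groups land on the LAST three attributes.
my @greedy = $str =~ /<tag.*(x|y|z)="(\d+)".*(x|y|z)="(\d+)".*(x|y|z)="(\d+)".*\/>/;

# Non-greedy: each .*? expands only as far as needed, so the groups land
# on the FIRST three attributes.
my @lazy = $str =~ /<tag.*?(x|y|z)="(\d+)".*?(x|y|z)="(\d+)".*?(x|y|z)="(\d+)".*\/>/;

print "greedy: @greedy\n";   # greedy: y 7 x 15 z 14
print "lazy:   @lazy\n";     # lazy:   x 20 y 7 x 15

# Alternative: walk over all attributes with a global match.
my @pairs;
while ($str =~ /\b([xyz])="(\d+)"/g) {
    push @pairs, "$1-$2";
}
print "@pairs\n";            # x-20 y-7 x-15 z-14
```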
Instead of .* use \s+, because what you actually want to match between attributes is whitespace, not arbitrary characters.
If this is really an assignment, you should do it properly: regular expressions are not the right tool for parsing XML. Since it is an assignment, just write a parser. It is easier than you think.

Regular expression which matches a specific pattern

I want to find a regular expression in Perl which matches a pattern such as this:
my $sumthing = "people say
for -->";
Here, after "say", there is a single newline character. So I need a regular expression that can match a pattern containing a newline. Please help me find this, as I'm new to Perl and regular expressions.
The possible methods I tried were these:
if (($sumthing !~ (/\n+$/)) && ($sumthing !~ (/^\n+/m)))
Kindly help me find an expression to match this kind of pattern; with the above I am not getting the desired output.
It's not clear what you want. Do you want to match that string exactly? If so, you could use
$sumthing =~ /^people say\nfor -->\z/
or
$sumthing eq "people say\nfor -->"
Or maybe what you need to know is that . matches any character including newline when /s is used?
/people .* -->/s
The following will check for anything then new line then anything. Not sure if I totally understood your question.
if($sumthing =~ m/.*\n.*/)
Have a look at the /s modifier, which causes . to match any character, including a newline.
my $str = "people say for\nsomething...";
$str =~ m{say(.*)}s and print "'$1'\n";
This would print:
' for
something...'
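To make the effect of /s concrete, here is a small sketch that runs the same pattern with and without the modifier against the string from the question. The variable names $without_s and $with_s are just for illustration.

```perl
use strict;
use warnings;

my $sumthing = "people say\nfor -->";

# Without /s, '.' refuses to cross the newline, so the match fails.
my $without_s = $sumthing =~ /people .* -->/  ? 'match' : 'no match';

# With /s, '.' matches the newline as well, so the match succeeds.
my $with_s    = $sumthing =~ /people .* -->/s ? 'match' : 'no match';

print "without /s: $without_s\n";   # without /s: no match
print "with /s:    $with_s\n";      # with /s:    match
```

If you only need to allow the one newline after "say", writing it explicitly as /say\nfor/ also works and needs no modifier.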

Regex to match suffixes to english words

I'm searching for the word "move" and I want to match "moved" as well when I print.
The way I'm going about this is:
if ($sentence =~ /($search_key)d$/i) {
$search_key = $search_keyd;
}
$subsentences[$i] =~ s/$search_key/ **$search_key** /i;
$subsentences[$i] =~ s/\b$parsewords[1]_\w+/ --$parsewords[1]--/i;
print "MATCH #$count\n",split(/_\S+/,$subsentences[$i]), "\n";
$count++;
This is part of a longer code so if anything is unclear let me know. The _ is because the words in the sentence are tagged (ex. I_NN move_VB to_PREP ....).
Where $search_keyd will be $search_key."d", which worked!
A nice addition would be to check if the word ended in e and therefore only a d would need to be appended. I'd guess it'd look something like this: e?$/d$
Even a general answer will suffice.
I'm new to Perl. So sorry if this is elementary. Thanks in advance!!!
If I understand you correctly, you want to search for "move" and add a highlight, but also include any variation of the basic word, such as "moves" "moved".
When you are replacing words in a text like this, you usually want to replace all occurrences, and for that you need the /g modifier on the regex, like so:
$subsentences[$i] =~ s/$search_key/ **$search_key** /ig
Also, you should make sure not to match parts of words. E.g. you want to match "move", but perhaps not "remove". For this, you can use \b to mark a word boundary:
$subsentences[$i] =~ s/\b$search_key/ **$search_key** /ig
In order to match certain suffixes, you need a character class with valid characters or combination of characters. move[sd] will find "moves" and "moved". However, for a word like "jump", you would need to be a bit more specific: "jump(s|ed)". Note that [sd] can be replaced with (s|d). So barring any bad spelling in your text, you can get away with:
$subsentences[$i] =~ s/\b$search_key(s|d|ed)/ **$search_key$1** /ig
Note that $1 holds whatever was captured by the first pair of parentheses.
To find the number of matching words:
my $matches = $subsentences[$i] =~ s/\b$search_key(s|d|ed)/ **$search_key$1** /ig
If you want to be more specific with the suffixes, i.e. make it not match badly spelled words like "moveed", you'd need to do some special matching. Something like:
my $suffix = $search_key =~ /e$/i ? '(s|d)' : '(s|ed)';
my $matches = $subsentences[$i] =~ s/\b$search_key$suffix/ **$search_key$1** /ig;
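Putting those pieces together, a self-contained sketch might look like the following. The helper name highlight and the sample sentence are made up. Unlike the snippets above, it makes the suffix optional so the bare word is highlighted too, and it captures the whole word so the original capitalisation survives. The (?![a-z]) check is my own addition: it stops "movers" from being half-highlighted while still allowing a tag such as _VB to follow.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a word and its simple inflections in ** **.
# Returns the new text and the number of replacements made.
sub highlight {
    my ($text, $key) = @_;
    # Words ending in "e" take -s/-d; other words take -s/-ed.
    my $suffix = $key =~ /e\z/i ? '(?:s|d)?' : '(?:s|ed)?';
    # \Q...\E escapes regex metacharacters in the key. The negative
    # lookahead rejects matches that are followed by another letter.
    my $count = ($text =~ s/\b(\Q$key\E$suffix)(?![a-z])/**$1**/gi) || 0;
    return ($text, $count);
}

my ($out, $n) = highlight('I move, she Moved; they remove movers', 'move');
print "$n: $out\n";   # 2: I **move**, she **Moved**; they remove movers
```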
It can probably become very complicated the more search words you add.
Some help about regexes here
If what you want is to match all complete words which begin with your search term, i.e. 'move' matches 'move', 'moved', 'movers', etc, then you want to use a character class to detect the end of the word.
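So, instead of:
if ($sentence =~ /($search_key)d$/i)
Try using:
if ($sentence =~ /\b($search_key\w*)/i)
The \b keeps the match from starting in the middle of a word, and \w* matches any number of word characters, so the capture stops at the first whitespace or punctuation character. Note that the underscore is a word character too, so in tagged text such as move_VB the capture will include the tag; use [a-z]* instead of \w* if you want to stop before it.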
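As a rough illustration on tagged text, here is a sketch that collects every word beginning with the search term. The helper name words_starting_with and the tagged sentence are invented for the example; it uses [a-z]* so that the _TAG part is left out of the capture.

```perl
use strict;
use warnings;

# Hypothetical helper: return all words that begin with $key.
sub words_starting_with {
    my ($sentence, $key) = @_;
    # \b stops "remove" from matching; [a-z]* (with /i) stops at the
    # underscore that introduces the tag. In list context, /g returns
    # every captured word.
    return $sentence =~ /\b(\Q$key\E[a-z]*)/gi;
}

my @found = words_starting_with(
    'I_NN moved_VBD the_DT movers_NNS to_PREP remove_VB it_PRP', 'move');
print "@found\n";   # moved movers
```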