Matching string between first and last parentheses - regex

I need to pick up all the text between the first and last parentheses but am having a hard time with the regex.
This is what I have so far; I'm stuck and don't know how to proceed.
/(\w+)\((.*?)\)\s/g)
But it stops at the first ")" that it sees.
Sample:
(me)
(mine)
((me) and (you))
Desired output is
me
mine
(me) and (you)

Your code is almost correct; it would work if you dropped the ? from the regex. For example (I have also removed a couple of things):
/\w+\((.*)\)/
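To see what the ? changes, here is a minimal sketch comparing the lazy and greedy forms on the sample lines. The sub names are made up for illustration, and the patterns drop the leading \w+ on the assumption that each line starts with "(" as in the sample.

```perl
use strict;
use warnings;

# Lazy: .*? stops at the first ")" it can reach.
sub inner_lazy {
    my ($s) = @_;
    return $s =~ /\((.*?)\)/ ? $1 : undef;
}

# Greedy: .* runs to the end of the line, then backtracks to the last ")".
sub inner_greedy {
    my ($s) = @_;
    return $s =~ /\((.*)\)/ ? $1 : undef;
}

for my $line ('(me)', '(mine)', '((me) and (you))') {
    printf "%-18s lazy: %-6s greedy: %s\n",
        $line, inner_lazy($line), inner_greedy($line);
}
```

On the third line the lazy form returns "(me" while the greedy form returns "(me) and (you)", which is the behaviour described in the question.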

Since you want to capture all the text inside the outermost parentheses, you shouldn't use a non-greedy quantifier. You can use this regex, which uses lookarounds and the greedy .* to match everything between the first ( and the last ).
(?<=\().*(?=\))
EDIT: Another alternative solution
The same data can be extracted with the following regex, which has no lookahead or lookbehind. Some regex flavors don't support lookarounds, so this may be useful there.
^\((.*)\)$
Here ^\( matches the opening parenthesis, then (.*) greedily consumes the text and stores it in the first capture group, stopping only at the last ) before the end of the line.
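A rough side-by-side of the two patterns in Perl, as a sketch only. The helper names are hypothetical; the lookaround pattern is wrapped in a capture group so both helpers can return their result the same way.

```perl
use strict;
use warnings;

# Hypothetical helpers; both return undef when nothing matches.

# Lookaround form: the match itself is the inner text.
sub extract_lookaround {
    my ($line) = @_;
    my ($inner) = $line =~ /((?<=\().*(?=\)))/;
    return $inner;
}

# Anchored form: needs "(" at the very start and ")" at the very end.
sub extract_anchored {
    my ($line) = @_;
    my ($inner) = $line =~ /^\((.*)\)$/;
    return $inner;
}

for my $line ('(me)', '(mine)', '((me) and (you))') {
    print extract_lookaround($line), ' | ', extract_anchored($line), "\n";
}
```

The two agree on the sample lines. They differ when there is text outside the parentheses: for a line like foo(bar)baz the lookaround form still finds bar, while the anchored form does not match at all.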

Here's a non-regex solution. Since you want the absolute first and last instances of fixed substrings, index and rindex find the right positions that you can feed to substr:
#!/usr/bin/perl
use v5.10;
while( <DATA> ) {
    chomp;
    my $start = 1 + index $_, '(';
    my $end = rindex $_, ')';
    my $s = substr $_, $start, ($end - $start);
    say "Read: $_";
    say "Extracted: $s";
}
__END__
(me)
(mine)
((me) and (you))

A non-regex way with chop() and reverse()
$string='((me) and (you))';
chop($string);
$string = reverse($string);
chop($string);
$string = reverse($string);
print $string;
Output:
(me) and (you)
DEMO: http://tpcg.io/MhaLed

Related

Make a regular expression in perl to grep value work on a string with different endings

I have this code in perl where I want to extract the value of 'EUR_AF', in this case '0.39'.
Sometimes 'EUR_AF' ends with ';', sometimes it doesn't.
Alternatively, 'EUR_AF' may end with '=0' instead of '=0.39;' or '=0.39'.
How do I make the code handle that? Can't seem to find it online...I could of course wrap everything in an almost endless if-elsif-else statement, but that seems overkill.
Example text:
AVGPOST=0.9092;AN=2184;RSQ=0.5988;ERATE=0.0081;AC=144;VT=SNP;THETA=0.0045;AA=A;SNPSOURCE=LOWCOV;LDAF=0.0959;AF=0.07;ASN_AF=0.05;AMR_AF=0.10;AFR_AF=0.11;EUR_AF=0.039
Code: $INFO =~ m/\;EUR\_AF\=(.*?)(;)/
I did find that: $INFO =~ m/\;EUR\_AF\=(.*?0)/ handles the cases of EUR_AF=0, but how to handle alternative scenarios efficiently?
Extract one value:
my ($eur_af) = $s =~ /(?:^|;)EUR_AF=([^;]*)/;
my ($eur_af) = ";$s" =~ /;EUR_AF=([^;]*)/;
Extract all values:
my %rec = split(/[=;]/, $s);
my $eur_af = $rec{EUR_AF};
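One caveat with splitting on [=;] in a single pass: if any field has no "=value" part, the keys and values get out of step. A slightly more defensive sketch splits on ; first and then on the first = in each field. The sub name parse_info is made up here, and treating a bare flag as a key with an undef value is an assumption, not something stated in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "K1=V1;K2=V2;..." into a hash reference.
sub parse_info {
    my ($info) = @_;
    my %rec;
    for my $field (split /;/, $info) {
        next unless length $field;              # skip empty fields such as ";;"
        my ($key, $value) = split /=/, $field, 2;
        $rec{$key} = $value;                    # undef for a flag with no "="
    }
    return \%rec;
}

my $rec = parse_info('AF=0.07;ASN_AF=0.05;EUR_AF=0.39;');
print "$rec->{EUR_AF}\n";
```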
This regex should work for you: (?<=EUR_AF=)\d+(\.\d+)?
It means
(?<=EUR_AF=) - match only at a position preceded by EUR_AF=
\d+(\.\d+)? - one or more digits, optionally followed by a decimal point and more digits
EDIT: I originally wanted the whole regex to return the correct result, not only the capture group. If you want the correct capture group edit it to (?<=EUR_AF=)(\d+(?:\.\d+)?)
I have found the answer. The code:
$INFO =~ m/(?:^|;)EUR_AF=([^;]*)/
seems to handle the cases where EUR_AF=0 and EUR_AF=0.39, with or without a trailing ;. The captured value ($1) will be 0 or 0.39.
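For what it's worth, here is a small sketch that runs that pattern against the endings mentioned in the question. The sub name eur_af and the shortened test strings are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the EUR_AF value, or undef if the key is absent.
sub eur_af {
    my ($info) = @_;
    my ($value) = $info =~ /(?:^|;)EUR_AF=([^;]*)/;
    return $value;
}

for my $info ('AF=0.07;EUR_AF=0.39;', 'AF=0.07;EUR_AF=0.39',
              'EUR_AF=0;AF=0.07', 'AF=0.07') {
    my $value = eur_af($info);
    print defined $value ? "$value\n" : "not found\n";
}
```

The (?:^|;) part keeps the pattern from matching inside a longer key, and [^;]* stops at the next ; or at the end of the string, whichever comes first.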

Capturing Group with regex

I am using the code as follows,
Code:
my $str = 123455;
if ($str =~ m/([a-z]+)|(\d+)/) {
    print "$1\n";
}
I know that it will not print the result, because the digits land in $2 rather than $1. But I want to keep the code as it is and get the result in $1 by changing only the regular expression.
Is it possible to do it?
Note :
Please do not provide the result as below,
my $str = 123455;
if ($str =~ m/(?:[a-z]+)|(\d+)/) {
    print "$1\n";
}
You can use (?| .. ) for alternative capture group numbering,
use 5.010; # regex feature available since perl 5.10
my $str = 123455;
if ($str =~ m/(?| ([a-z]+)|(\d+) )/x) {
print "$1\n";
}
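A quick sketch of the branch reset group on both kinds of input, to show that $1 is filled whichever alternative matches. The sub name first_token is hypothetical.

```perl
use strict;
use warnings;
use 5.010;   # (?|...) needs perl 5.10 or later

# Hypothetical helper: returns the first run of lowercase letters or digits.
sub first_token {
    my ($str) = @_;
    # Inside (?|...) every alternative numbers its groups from 1 again.
    return $str =~ /(?| ([a-z]+) | (\d+) )/x ? $1 : undef;
}

print first_token('123455'), "\n";   # digits branch, still in $1
print first_token('abc'), "\n";      # letters branch, also in $1
```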
([a-z]+|\d+)
Try this. Replace by $1. See demo.
http://regex101.com/r/sZ2wJ5/1
Add anchors if you want to match only letters or numbers at a time.
^([a-z]+|\d+)$
or
((?:[a-z]+)|(?:\d+))
You could use print "$&\n".
$& contains the entire matched string (in other words : either $1 or $2).
See http://perldoc.perl.org/perlre.html for more details ;-)
What do you mean you don't want to change your group structuring? You want your capture to go to group 1, but what you have won't ever put a number in group 1. You have to change your group structuring.
If you still want to be able to find a numeric in group 2, you can create subgroups -- groups number from the opening parenthesis. Try
([a-z]+|(\d+))
if that's what you want.

how to replace a string with a dynamic string

Case 1.
I have a string of alphabets like fthhdtrhththjgyhjdtygbh. Using regex I want to change it to ftxxxxxxxxxxxxxxxxxxxxx, i.e, keep the first two letters and replace the rest by x.
After a lot of googling, I achieved this:
s/^(\w\w)(\w+)/$1 . "x" x length($2)/e;
Case 2.
I have a string of alphabets like sdsABCDEABCDEABCDEABCDEABCDEsdf. Using regex I want to change it to sdsABCDExyxyxyABCDEsdf, i.e, keep the first and last ABCDE and replace the ABCDE in the middle with xy.
I achieved this:
s/ABCDE((ABCDE)+)ABCDE/$len = length($1)\/5; ABCDE."xy"x $len . ABCDE/e;
Problem: I am not happy with my solutions. Is there a better or neater way to do this?
Constraint: Only one regex may be used.
Sorry for the poor English in the title and the body of the problem; English isn't my first language. Please ask in the comments if anything is not clear.
Task 1: Simplify the password hider regex
Use a Positive Lookbehind Assertion to replace all word characters preceded by two other word characters. This removes the need for the /e Modifier:
my $str = 'fthhdtrhththjgyhjdtygbh';
$str =~ s/(?<=\w{2})\w/x/g;
print $str;
Outputs:
ftxxxxxxxxxxxxxxxxxxxxx
Task 2: Translate inner repeated pattern regex
Use both a Positive Lookbehind and Lookahead Assertion to replace all ABCDE that are bookended by the same string:
my $str = 'sdsABCDEABCDEABCDEABCDEABCDEsdf';
$str =~ s/(?<=(ABCDE))\1(?=\1)/xy/g;
print $str, "\n";
Output:
sdsABCDExyxyxyABCDEsdf
One regex, less redundancy using \1 to refer to first captured group,
s|(ABCDE)\K (\1+) (?=\1)| "xy" x (length($2)/length($1)) |xe;
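If the token or the number of kept characters varies, the lookaround versions can be wrapped in small subs. This is only a sketch: mask_tail and mask_middle are made-up names, and quotemeta is used on the assumption that the token should be treated as literal text rather than as a pattern.

```perl
use strict;
use warnings;

# Keep the first $keep word characters, replace every later one with "x".
sub mask_tail {
    my ($str, $keep) = @_;
    $str =~ s/(?<=\w{$keep})\w/x/g;
    return $str;
}

# Replace every $token that has another $token directly before and after it.
sub mask_middle {
    my ($str, $token, $fill) = @_;
    my $t = quotemeta $token;   # treat the token as literal text
    $str =~ s/(?<=$t)$t(?=$t)/$fill/g;
    return $str;
}

print mask_tail('fthhdtrhththjgyhjdtygbh', 2), "\n";
print mask_middle('sdsABCDEABCDEABCDEABCDEABCDEsdf', 'ABCDE', 'xy'), "\n";
```

Both substitutions rely on lookarounds inspecting the original string, so earlier replacements in the same s///g pass do not disturb later matches.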

Perl search and replace the last character occurrence

I have what I thought would be an easy problem to solve but I am not able to find the answer to this.
How can I find and replace the last occurrence of a character in a string?
I have a string: GE1/0/1 and I would like it to be: GE1/0:1 <- This can be variable length so no substrings please.
Clarification:
I am looking to replace the last / with a : no matter what comes before or after it.
use strict;
use warnings;
my $a = 'GE1/0/1';
(my $b = $a) =~ s{(.*)/}{$1:}xms;
print "$b\n";
I use the greedy behaviour of .*
Perhaps I have not understood the problem with variable length, but I would do the following:
You can match what you want with the regex:
(.+)/
So, this Perl script
my $text = 'GE1/0/1';
$text =~ s|(.+)/|$1:|;
print 'Result : '.$text;
will output :
Result : GE1/0:1
Because the '+' quantifier is greedy by default, (.+) consumes as much as possible, so the / that follows it can only match the last slash character.
Hope this is what you were asking.
This finds a slash and looks ahead to make sure there are no more slashes past it:
Raw regex:
/(?=[^/]*$)
I think the code would look something like this, but perl isn't my language:
$string =~ s!/(?=[^/]*$)!\:!g;
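The greedy-capture and lookahead approaches can be checked side by side with a short sketch like the one below. The sub names are hypothetical, and \z is used instead of $ so that a trailing newline is not silently skipped.

```perl
use strict;
use warnings;

# Greedy capture: (.*) grabs as much as it can, so the "/" after it
# can only be the last slash in the string.
sub replace_last_greedy {
    my ($s, $new) = @_;
    $s =~ s{(.*)/}{$1$new}s;
    return $s;
}

# Lookahead: match a "/" only if no other "/" follows before the end.
sub replace_last_lookahead {
    my ($s, $new) = @_;
    $s =~ s{/(?=[^/]*\z)}{$new};
    return $s;
}

for my $port ('GE1/0/1', 'GE1/0/1/12', 'GE1') {
    print replace_last_greedy($port, ':'), ' ',
          replace_last_lookahead($port, ':'), "\n";
}
```

Neither version depends on the length of the string, and both leave a string without any slash unchanged.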
"last occurrence in a string" is slightly ambiguous. The way I see it, you can mean either:
"Foo: 123, yada: GE1/0/1, Bar: null"
Meaning the last occurrence in the "word" GE1/0/1, or:
"GE1/0/1"
As a complete string.
In the latter case, it is a rather simple matter, you only have to decide how specific you can be in your regex.
$str =~ s{/(\d+)$}{:$1};
Is perfectly fine, assuming the last character(s) can only be digits.
In the former case, which I don't think you are referring to, but I'll include anyway, you'd need to be much more specific:
$str =~ s{(\byada:\s+\w+/\w+)/(\w+\b)}{$1:$2};

How to return the first five digits using Regular Expressions

How do I return the first 5 digits of a string of characters in Regular Expressions?
For example, if I have the following text as input:
15203 Main Street
Apartment 3 63110
How can I return just "15203".
I am using C#.
This isn't really the kind of problem that's ideally solved by a single-regex approach -- the regex language just isn't especially meant for it. Assuming you're writing code in a real language (and not some ill-conceived embedded use of regex), you could do perhaps (examples in perl)
# Capture all the digits into an array
my @digits = $str =~ /(\d)/g;
# Then take the first five and put them back into a string
my $first_five_digits = join "", @digits[0..4];
or
# Copy the string, removing all non-digits
(my $digits = $str) =~ tr/0-9//cd;
# And cut off all but the first five
$first_five_digits = substr $digits, 0, 5;
If for some reason you really are stuck doing a single match, and you have access to the capture buffers and a way to put them back together, then wdebeaum's suggestion works just fine, but I have a hard time imagining a situation where you can do all that, but don't have access to other language facilities :)
It would depend on your flavor of regex and coding language (C#, Perl, etc.), but in C# you'd do something like
string rX = @"\D+";
input = Regex.Replace(input, rX, "");
return input.Substring(0, 5);
Note: I'm not sure about that Regex match (others here may have a better one), but the idea is this: a regex on its own only matches patterns, so you match every run of non-digit characters, replace each with your language's version of the empty string (string.Empty or "" in C#), and then grab the first 5 characters of the resulting string.
You could capture each digit separately and put them together afterwards, e.g. in Perl:
$str =~ /(\d)\D*(\d)\D*(\d)\D*(\d)\D*(\d)/;
$digits = $1 . $2 . $3 . $4 . $5;
I don't think a regular expression is the best tool for what you want.
Regular expressions are to match patterns... the pattern you are looking for is "a(ny) digit"
Your logic external to the pattern is "five matches".
Thus, you either want to loop over the first five digit matches, or capture five digits and merge them together.
But look at that Perl example -- that's not one pattern -- it's one pattern repeated five times.
Can you do this via a regular expression? Just like parsing XML -- you probably could, but it's not the right tool.
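The "loop over the first five digit matches" idea can be sketched in Perl as below. The sub name first_five_digits is made up, and returning fewer than five digits when the input runs out is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: collect digits one match at a time, stop after five.
sub first_five_digits {
    my ($str) = @_;
    my $digits = '';
    while ($str =~ /(\d)/g) {
        $digits .= $1;
        last if length($digits) == 5;
    }
    return $digits;   # may be shorter than five if the input has fewer digits
}

print first_five_digits("15203 Main Street\nApartment 3 63110"), "\n";
```

The pattern stays as simple as "a digit"; the "five of them" logic lives in the loop, which is the split the answer describes.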
Not sure this is best solved by regular expressions since they are used for string matching and usually not for string manipulation (in my experience).
However, you could make a call to:
strInput = Regex.Replace(strInput, @"\D+", "");
to remove all non-digit characters and then just return the first 5 characters.
If you are wanting just a straight regex expression which does all this for you I am not sure it exists without using the regex class in a similar way as above.
A different approach -
# Copy over
$temp = $str;
# Remove all non-digits (the /g makes it remove every one, not just the first)
$temp =~ s/\D//g;
# Get the first 5 digits, exactly.
$temp =~ /(\d{5})/;
# Grab the match. ASSUMES that there will be a match.
$first_digits = $1;
$result =~ s/^(\d{5}).*/$1/;
This matches text that starts with exactly five digits (\d{5}), followed by any number of other characters (.*), and replaces the whole match with $1, the contents of the parentheses, i.e. the first five digits.
If you want the first 5 characters of any kind:
$result =~ s/^(.{5}).*/$1/;
Use whatever programming language you are using to evaluate this,
e.g.
regex.replace(text, "^(.{5}).*", "$1");