Explain the code below dealing with regular expressions in perl [closed] - regex

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Explain the below code:
$x = '12aba34ba5';
@num = split /(a|b)+/, $x;
# gives @num = ('12','a','34','a','5')
@num = split /(?:a|b)+/, $x;
# gives @num = ('12','34','5')

In the first case you are capturing (a|b), so a gets captured. (a|b)+ will match aba, but only a will be stored, because a repeated group remembers only its last iteration. So the split happens at runs of a and b in any order, and the captured a is inserted into the result. In the second case you don't capture (a|b), so you get the plain split result.

The string 12aba34ba5 is being split on occurrences of multiple a or b characters, giving the result 12, 34, 5
However, you also have a capture in the split regex, which inserts the captured string into the split list
If you write 'aba' =~ /(a|b)+/ then there are three occurrences of the pattern (a|b), but only the last one can be saved in $1, and this is the value that split inserts
So you are picking up the last value of aba (a) and the last value of ba (another a) and inserting them into the list, giving 12, a, 34, a, 5
If you wanted the letters separated from the numbers, you could write
@num = split /((?:a|b)+)/, $x;
or, equivalently and more neatly
@num = split /([ab]+)/, $x;
giving 12, aba, 34, ba, 5
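For comparison, Python's re.split also inserts captured groups into its result, so the three cases above can be reproduced there too. This is only a minimal sketch in Python rather than the Perl from the question, and the variable names are illustrative:

```python
import re

x = "12aba34ba5"

# A repeated capturing group keeps only its last iteration, so each
# separator contributes a single 'a' to the result.
last_only = re.split(r"(a|b)+", x)

# A non-capturing group splits without inserting anything.
plain = re.split(r"(?:a|b)+", x)

# Capturing the whole run keeps the full separator.
whole_run = re.split(r"([ab]+)", x)

print(last_only)  # ['12', 'a', '34', 'a', '5']
print(plain)      # ['12', '34', '5']
print(whole_run)  # ['12', 'aba', '34', 'ba', '5']
```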

Related

How to locate and replace a value [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have been looking for a solution for the following:
Replace the < with the words less than, but keep the number <xx (the xx is a number like 55). For example: <55 or < 55 to less than 55.
I have not found a solution.
Mike
The naive way
...would be just using replace() and call it a day:
<cfscript>
s = "Is 54 < 55 and < 56?";
r = replace(s, "<", "less than", "ALL");
writeOutput(r);
</cfscript>
Returns: Is 54 less than 55 and less than 56?
But it's not that easy
...because you eventually encounter:
<cfscript>
s = "Is 54<55 and <56?";
r = replace(s, "<", "less than", "ALL");
writeOutput(r);
</cfscript>
Returns: Is 54less than55 and less than56?
We need to handle missing whitespaces around <.
Easy, we just add spaces around the needle, like this " less than ".
Are we done?
...No, it can always get worse. Look at this:
<cfscript>
s = "Is <b>54</b><55 and < 56?";
r = replace(s, "<", " less than ", "ALL");
writeOutput(r);
</cfscript>
Returns: Is less than b>54 less than /b> less than 55 and less than 56?
We actually need to detect whether the < character is followed by a digit.
The fix
...is called a regular expression, and reReplace() is the name of the function we need:
<cfscript>
s = "Is <b>54</b> <55 and < 56?";
r = reReplace(s, "<\s*([0-9])", "less than \1", "ALL");
writeOutput(r);
</cfscript>
Returns: Is <b>54</b> less than 55 and less than 56?
Breakdown of the regex:
<
pattern starts with <
\s*
any whitespace (\s), can be missing or present in any number (*)
([0-9])
we are capturing any digit [0-9] using parentheses
In the replacement string we replace everything that was not captured with less than and bring back the captured digit using \1. As a side effect, we also remove any whitespace in front of the digit, since we only captured the digit itself and replaced everything between < and the digit.
You could preserve the whitespace in front by extending the capture, and there might also be a need to handle something like 54< 55 so that it results in 54 less than 55. Once you understand how regex capturing works, this won't be a problem for you.
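As a rough illustration of the 54< 55 case, here is a sketch in Python rather than CFML. The function name spell_out_less_than is made up for this example, and swallowing the whitespace before the < is one possible design, not the only one:

```python
import re

def spell_out_less_than(text):
    # \s*<\s* swallows optional whitespace on both sides of "<";
    # (\d) requires a digit right after it, so tags like <b> are skipped.
    return re.sub(r"\s*<\s*(\d)", r" less than \1", text)

print(spell_out_less_than("Is <b>54</b> <55 and < 56?"))
# Is <b>54</b> less than 55 and less than 56?
print(spell_out_less_than("Is 54< 55?"))
# Is 54 less than 55?
```

Note that a < at the very start of the string would pick up a leading space with this pattern.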

How to use regexp to identify the number of hydrogens in a chemical formula? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Which expression should I use to identify the number of hydrogen atoms in a chemical formula?
For example:
C40H51N11O19 - 51 hydrogens
C2HO - 1 hydrogen
CO2 - no hydrogens (empty)
Any suggestions?
Thanks!
Cheers!
You can start with this regex:
H\d*
H -> matches the literal character H
\d* -> matches a digit, 0 to N times
See the example and try other regexes yourself at:
https://regex101.com/r/vdvH8S/2
But a regex won't convert the result for you; it only finds the match.
You need to process the result as follows:
H with a number: extract the number
only H: 1
no match: 0
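The processing steps above could be sketched in Python like this. The function name count_hydrogens is hypothetical, and two things go beyond the answer and are assumptions of this sketch: the (?![a-z]) lookahead, which skips element symbols such as He or Hg, and summing over several H groups in one formula. Parenthesised groups such as Ca(OH)2 are not handled.

```python
import re

# H, not followed by a lowercase letter (so He, Hg, Ho ... are skipped),
# then an optional count. The lookahead is an assumption of this sketch.
HYDROGEN = re.compile(r"H(?![a-z])(\d*)")

def count_hydrogens(formula):
    total = 0
    for match in HYDROGEN.finditer(formula):
        digits = match.group(1)
        # H with a number: use the number; bare H: counts as one atom.
        total += int(digits) if digits else 1
    return total  # no match at all leaves the total at 0

print(count_hydrogens("C40H51N11O19"))  # 51
print(count_hydrogens("C2HO"))          # 1
print(count_hydrogens("CO2"))           # 0
```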
A regular expression that matches H with following digits would be:
/H(\d+)/g
The 'H' is a literal character match for the H in the given chemical formula.
() declares a capture group, so you can then grab the captured digits without the H in whatever programming language you are using.
\d matches any digit, and the + quantifier makes it match 1 or more.
This does not cover every scenario (an H with no digits, for example, does not match), so you might be best using something other than a regex.

Perl regular expression for searching a value inside a range [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I want to search a value which comes inside a range x and y. I want a generic PERL regular expression because the x and y are dynamic.
Please help
This is an excessively bad idea. Not impossible, but hard to write as a general solution.
Let's write a regular expression that matches all numbers between 2 and 123. We have to look at each possible number of digits separately.
1 digit: [2-9] – 2 or larger
2 digits: [1-9][0-9] – any two-digit number
3 digits: [1](?:[0-1][0-9]|[2][0-3]) – either any 3-digit number up to 119, or 12x where 0 <= x <= 3.
Together: /\A(?:[2-9]|[1-9][0-9]|[1](?:[0-1][0-9]|[2][0-3]))\z/
Is this readable or maintainable? Certainly not.
You could use embedded code: /\A([0-9]+)(?(?{ not($x <= $^N && $^N <= $y) })(*F))\z/, but that's rather silly as well.
The best solution is to use code for what should be done with code. Regexes are simply not an appropriate tool here.
my ($num) = $string =~ /\A([0-9]+)\z/ or die "no number in \$string";
if (not($x <= $num and $num <= $y)) {
    die "Number $num out of range [$x .. $y]";
}
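The same contrast can be sketched in Python: the hand-built pattern for the fixed range 2..123 next to a generic check that validates the digits with a regex and compares them as integers. The names RANGE_2_TO_123 and in_range are made up for this illustration.

```python
import re

# Hand-built pattern from above for the fixed range 2..123.
RANGE_2_TO_123 = re.compile(r"[2-9]|[1-9][0-9]|1(?:[01][0-9]|2[0-3])")

def in_range(text, low, high):
    """Validate the digits with a regex, then compare as integers."""
    if re.fullmatch(r"[0-9]+", text) is None:
        raise ValueError(f"no number in {text!r}")
    return low <= int(text) <= high

# The two approaches agree on every value from 0 to 999.
agree = all(
    (RANGE_2_TO_123.fullmatch(str(n)) is not None) == in_range(str(n), 2, 123)
    for n in range(1000)
)
print(agree)  # True
```

Only in_range works for arbitrary bounds; the pattern would have to be rebuilt for every new pair of x and y.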

How to count the number of commas in a string using regular expression in python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have a string in the following format:
str ="India,USA,Australia,Japan,Russia"
I want to extract the word present after third comma by counting the number of commas using regular expression in python.
desired output:Japan
You can do that with a regular expression with something like
([^,]*,){3}([^,]*)
with the meaning
[^,]* Zero or more chars but no commas
, a comma
{3} the previous group must be repeated three times
[^,]* Zero or more chars but no commas
the second group will be the fourth comma-separated value
import re
text = "India,USA,Australia,Japan,Russia"
m = re.match(r"([^,]*,){3}([^,]*)", text)
if m:
    print(m.group(2))
In this specific case, however, it would be much simpler to just split on commas and take the fourth value:
print(text.split(',')[3])

How do I extract two center columns from a tab-delimited line of text? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I need two regular expressions. One that will find the second block of numbers and one that will find the third block of numbers. My data is like this:
8782910291827182 04 1988 081
One code to find the 04 and other to find the 1988. I already have the expression to find the first 16 numbers and the last 3 numbers, but I am stuck in finding those 2 numbers of the second and third section.
Use Field-Splitting Instead
Based on your corpus, it seems that one should be able to rely on the existence of four fields separated by tabs or other whitespace. Splitting fields is much easier than building and testing a regex, so I'd recommend skipping the regex unless there are edge cases not included in your examples.
Consider the following Ruby examples:
# Split the string into fields.
string = '8782910291827182 04 1988 081'
fields = string.split /\s+/
#=> ["8782910291827182", "04", "1988", "081"]
# Access members of the field array.
fields.first
#=> "8782910291827182"
fields[1]
#=> "04"
fields[2]
#=> "1988"
# Unpack array elements into variables.
field1, field2, field3, field4 = fields
p field2, field3
#=> ["04", "1988"]
A regular expression will force you to spend more time on pattern matching, especially as your corpus grows more complex; string-splitting is generally simpler, and will enable you to focus more on the result set. In most cases, the end results will be functionally similar, so which one is more useful to you will depend on what you're really trying to do. It's always good to have alternative options!
Find the 2-digit number:
\b\d{2}\b
Find the 4-digit number:
\b\d{4}\b
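Both approaches can be tried side by side in a short Python sketch. It assumes the fields in the sample line are separated by tabs, as the question title suggests, and the variable names are illustrative.

```python
import re

# Sample line from the question, assumed here to be tab-separated.
line = "8782910291827182\t04\t1988\t081"

# Field splitting: str.split() with no argument splits on any whitespace.
fields = line.split()
second, third = fields[1], fields[2]

# Regex alternative: \b keeps \d{2} and \d{4} from matching inside longer runs.
two_digit = re.findall(r"\b\d{2}\b", line)
four_digit = re.findall(r"\b\d{4}\b", line)

print(second, third)          # 04 1988
print(two_digit, four_digit)  # ['04'] ['1988']
```

The word-boundary patterns match on digit count only, so they would also pick up any other 2-digit or 4-digit field if the data contained one; splitting selects by position instead.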