Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have been looking for a solution for the following:
Replace the < with the words less than, but keep the number that follows, as in <xx (where xx is a number like 55). For example, <55 or < 55 should become less than 55.
I have not found a solution.
Mike
The naive way
...would be to just use replace() and call it a day:
<cfscript>
s = "Is 54 < 55 and < 56?";
r = replace(s, "<", "less than", "ALL");
writeOutput(r);
</cfscript>
Returns: Is 54 less than 55 and less than 56?
But it's not that easy
...because you eventually encounter:
<cfscript>
s = "Is 54<55 and <56?";
r = replace(s, "<", "less than", "ALL");
writeOutput(r);
</cfscript>
Returns: Is 54less than55 and less than56?
We need to handle missing whitespace around <.
Easy, we just add spaces around the needle, like this " less than ".
Are we done?
...No, it can always get worse. Look at this:
<cfscript>
s = "Is <b>54</b><55 and < 56?";
r = replace(s, "<", " less than ", "ALL");
writeOutput(r);
</cfscript>
Returns: Is less than b>54 less than /b> less than 55 and less than 56?
We actually need to detect whether the < character is in front of a digit.
The fix
...is called a regular expression, and reReplace() is the name of the function we need:
<cfscript>
s = "Is <b>54</b> <55 and < 56?";
r = reReplace(s, "<\s*([0-9])", "less than \1", "ALL");
writeOutput(r);
</cfscript>
Returns: Is <b>54</b> less than 55 and less than 56?
Breakdown of the regex:
<
pattern starts with <
\s*
any whitespace (\s), can be missing or present in any number (*)
([0-9])
we are capturing a single digit [0-9] using parentheses
In the replacement string we put less than in place of everything that was not captured and bring back the captured digit using \1. As a side effect, we also removed any additional whitespace in front of the digit, since we only captured the digit itself and replaced everything between < and the digit.
You could preserve the whitespace in front by extending the capture, and there might also be a need to tackle something like 54< 55 to result in 54 less than 55. Once you understand how regex capturing works, this won't be a problem for you.
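The 54< 55 case mentioned above can be sketched like this. It is a minimal Python sketch rather than CFML, and the function name spell_out_less_than is hypothetical. The idea is to swallow optional whitespace on both sides of < and emit exactly one space on each side of the words. Whether the same pattern behaves identically in reReplace() is an assumption that has not been checked here.

```python
import re

def spell_out_less_than(text):
    """Replace '<' directly before a digit with ' less than '."""
    # \s* on both sides swallows optional whitespace around '<';
    # (\d) captures the first digit so it can be restored with \1.
    return re.sub(r"\s*<\s*(\d)", r" less than \1", text)

print(spell_out_less_than("Is <b>54</b><55 and < 56?"))
# Is <b>54</b> less than 55 and less than 56?
print(spell_out_less_than("Is 54<55 and <56?"))
# Is 54 less than 55 and less than 56?
```

One consequence of this design is that a string beginning with <55 ends up with a leading space, so a final trim may be needed.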
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Which expression should I use to identify the number of hydrogen atoms in a chemical formula?
For example:
C40H51N11O19 - 51 hydrogens
C2HO - 1 hydrogen
CO2 - no hydrogens (empty)
Any suggestions?
Thanks!
Cheers!
You can start with this regex:
H\d*
H -> matches the literal character H
\d* -> matches a digit 0 to N times
See the example and try other regexes yourself at:
https://regex101.com/r/vdvH8S/2
But the regex won't convert the result for you; a regex only does the lookup.
You need to process the result as follows:
H with a number: extract the number
only H: 1
no match: 0
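The post-processing rules above could look like the following in Python. This is only a sketch under a few assumptions: the function name count_hydrogens is hypothetical, the result for "no match" is returned as 0, and a negative lookahead (not part of the original pattern) is added so that element symbols such as He or Hg are not counted as hydrogen. Formulas with parentheses, such as Ca(OH)2, are not handled.

```python
import re

# 'H' not followed by a lowercase letter, so He, Hg, Hf, Ho and Hs are skipped;
# the optional digits are captured as the count.
HYDROGEN = re.compile(r"H(?![a-z])(\d*)")

def count_hydrogens(formula):
    """Return the number of hydrogen atoms in a flat formula such as 'C2HO'."""
    total = 0
    for digits in HYDROGEN.findall(formula):
        # A bare 'H' carries an implicit count of 1.
        total += int(digits) if digits else 1
    return total

for formula in ("C40H51N11O19", "C2HO", "CO2"):
    print(formula, count_hydrogens(formula))
```

Summing over all matches means a formula that mentions H more than once, such as CH3CH2OH, is counted as a whole.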
A regex that will match H with following digits would be:
/H(\d+)/g
The 'H' is a literal character match for the H in the given chemical formula
() declares a capture group, so you can then grab the captured group without the H in whatever programming language you are using
\d will match any digit, and the + modifier makes it match 1 or more
Because \d+ needs at least one digit, a bare H with no count (as in C2HO) is not matched.
There is no catch-all solution here; you might be best off using something other than a regex.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I need a regular expression for the Elo credit card that allows only the 6-digit prefixes listed below. The total length is 16, and all 16 characters must be digits. Letters are not allowed.
Allowed prefixes:
401178, 401179, 431274, 438935, 451416, 457393, 457631, 457632,
504175, 627780, 636297, 636368, 655000, 655001, 651652, 651653,
651654, 650485, 650486, 650487, 650488, 506699 to 506778 and 509000
to 509999
Use an alternation, with a bit of extra work to cover the two numerical ranges you have.
^(?:401178|401179|431274|438935|451416|457393|457631|457632|504175|627780|636297|636368|655000|655001|651652|651653|651654|650485|650486|650487|650488|506699|5067[0-6][0-9]|50677[0-8]|509\d{3})\d{10}$
Here is how we handle the two ranges:
506699 to 506778
506699| matches 506699
5067[0-6][0-9]| matches 506700 through and including 506769
50677[0-8] matches 506770 through and including 506778
509000 to 509999
509\d{3} matches 509000 through and including 509999
i.e. 509 followed by any 3 digits
Demo here:
Regex101
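The range decomposition above can be checked by brute force. The following Python sketch (the names RANGE_506699_506778 and matched_prefixes are hypothetical) tries every 6-digit prefix against the three alternatives and reports which ones match.

```python
import re

# The three alternatives that are meant to cover 506699 to 506778.
RANGE_506699_506778 = re.compile(r"506699|5067[0-6][0-9]|50677[0-8]")

def matched_prefixes(pattern, low=0, high=999_999):
    """Return every zero-padded 6-digit number that the pattern fully matches."""
    return [n for n in range(low, high + 1) if pattern.fullmatch(f"{n:06d}")]

hits = matched_prefixes(RANGE_506699_506778)
print(hits[0], hits[-1], len(hits))  # 506699 506778 80
```

Eighty matches with no gaps is exactly the inclusive range 506699 to 506778.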
You can try this:
^(?:40117[8-9]|431274|438935|451416|457393|45763[1-2]|504175|627780|636297|636368|65500[0-1]|65165[2-4]|65048[5-8]|506699|5067[0-6]\d|50677[0-8]|509\d{3})\d{10}$
Demo
Simple Explanation
^ start of the line
( start of group
?: makes the group non-capturing, so the match is not stored
40117[8-9] means 40117 followed by a digit from 8 to 9 (the same applies to the similar alternatives)
| means OR
5067[0-6]\d means 5067 + a digit from 0 to 6 + any single digit
\d{10} means the next 10 characters must be digits (after the valid 6-digit prefix)
$ end of the line
Basically, you need alternation with some range operators to shorten the regex.
The trickiest part is defining the range 506699 to 506778, which can be represented as 506699|5067[0-6]\d|50677[0-8].
(?x)^(?:
40117[89]|431274|438935|451416|457393|457631|457632|504175
|627780|636297|636368|65500[01]|65165[234]|65048[5-8]
|506699|5067[0-6]\d|50677[0-8]
|509\d{3}
)\d{10}$
Demo: https://regex101.com/r/BbnHeQ/2
NB: the (?x) flag makes the regex engine ignore whitespace in the pattern, which makes long expressions easier to read.
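For illustration, here is how a whitespace-tolerant pattern of this kind might be used from Python, where re.VERBOSE plays the role of (?x). This is a sketch only: the names ELO_CARD and is_elo_number are hypothetical, the middle of the 506699 to 506778 range is written as 5067[0-6]\d, and the check covers just the prefix list from the question, not a checksum or any current issuer data.

```python
import re

# Verbose mode lets the alternation span several lines and carry comments.
ELO_CARD = re.compile(
    r"""
    (?:
        40117[89] | 431274 | 438935 | 451416 | 457393 | 45763[12] | 504175
      | 627780 | 636297 | 636368 | 65500[01] | 65165[2-4] | 65048[5-8]
      | 506699 | 5067[0-6]\d | 50677[0-8]    # 506699 to 506778
      | 509\d{3}                             # 509000 to 509999
    )
    \d{10}                                   # remaining 10 digits
    """,
    re.VERBOSE | re.ASCII,
)

def is_elo_number(card_number):
    """True if card_number is 16 ASCII digits with an allowed prefix."""
    return ELO_CARD.fullmatch(card_number) is not None

print(is_elo_number("5067100000000000"))  # True
print(is_elo_number("5067790000000000"))  # False
```

fullmatch stands in for the ^ and $ anchors, and re.ASCII keeps \d from accepting non-ASCII digits.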
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Explain the below code:
$x = '12aba34ba5';
@num = split /(a|b)+/, $x;
# gives @num = ('12','a','34','a','5')
@num = split /(?:a|b)+/, $x;
# gives @num = ('12','34','5')
In the first case you are capturing (a|b), so a gets captured. (a|b)+ will match aba, but only the final a is stored, because a repeated group remembers only its last iteration. So the split happens at runs of a and b in any order, and the captured a is inserted into the result. In the second case you don't capture (a|b), so you get the plain split result.
The string 12aba34ba5 is being split on runs of a or b characters, giving the result 12, 34, 5.
However, you also have a capture in the split regex, which inserts the captured string into the split list.
If you write 'aba' =~ /(a|b)+/ then there are three occurrences of the pattern (a|b), but only the last one can be saved in $1, and this is the value that split inserts.
So you are picking up the last value of aba (a) and the last value of ba (another a) and inserting them into the list, giving 12, a, 34, a, 5.
If you wanted the letters separated from the numbers, you could write
@num = split /((?:a|b)+)/, $x;
or, equivalently and more neatly
@num = split /([ab]+)/, $x;
giving 12, aba, 34, ba, 5
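The same three behaviours can be reproduced outside Perl. As a rough cross-check, Python's re.split also inserts captured groups into its result and keeps only the last iteration of a repeated group, so the following sketch mirrors the three split calls discussed here.

```python
import re

x = "12aba34ba5"

# Repeated capturing group: only the last repetition is kept and inserted.
print(re.split(r"(a|b)+", x))      # ['12', 'a', '34', 'a', '5']
# Non-capturing group: nothing is inserted between the fields.
print(re.split(r"(?:a|b)+", x))    # ['12', '34', '5']
# Capture the whole run of separators instead.
print(re.split(r"([ab]+)", x))     # ['12', 'aba', '34', 'ba', '5']
```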
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I want to match a value that falls within a range from x to y. I want a generic Perl regular expression because x and y are dynamic.
Please help
This is an excessively bad idea. Not impossible, but hard to write as a general solution.
Let's write a regular expression that matches all numbers between 2 and 123. We have to look at each possible number of digits separately.
1 digit: [2-9] – 2 or larger
2 digits: [1-9][0-9] – any two-digit number
3 digits: [1](?:[0-1][0-9]|[2][0-3]) – either any 3-digit number up to 119, or 12x where 0 <= x <= 3.
Together: /\A(?:[2-9]|[1-9][0-9]|[1](?:[0-1][0-9]|[2][0-3]))\z/
Is this readable or maintainable? Certainly not.
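One way to see how fragile such a pattern is: the only practical way to trust it is to test it against every candidate. A small Python sketch of that check follows (the name IN_2_TO_123 is hypothetical; fullmatch stands in for the \A and \z anchors).

```python
import re

IN_2_TO_123 = re.compile(r"(?:[2-9]|[1-9][0-9]|[1](?:[0-1][0-9]|[2][0-3]))")

# Collect every number from 0 to 999 whose decimal form the pattern accepts.
matched = [n for n in range(0, 1000) if IN_2_TO_123.fullmatch(str(n))]
print(matched == list(range(2, 124)))  # True
```

The pattern is correct for 2 to 123, but it would have to be rebuilt and rechecked for every new pair of bounds.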
You could use embedded code: /\A([0-9]+)(?(?{ not($x <= $^N && $^N <= $y) })(*F))\z/, but that's rather silly as well.
The best solution is to use code for what should be done with code. Regexes are simply not an appropriate tool here.
my ($num) = $string =~ /\A([0-9]+)\z/ or die "no number in \$string";
if (not($x <= $num and $num <= $y)) {
die "Number $num out of range [$x .. $y]";
}
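The same division of labour, with the regex checking the shape and ordinary code checking the range, might look like this in Python. The function name in_range is hypothetical, and raising ValueError is just one possible way to report a non-numeric input.

```python
import re

def in_range(text, low, high):
    """Return True if text is a non-negative integer with low <= value <= high."""
    # The regex only checks that the input consists of digits.
    match = re.fullmatch(r"[0-9]+", text)
    if match is None:
        raise ValueError(f"no number in {text!r}")
    # The range test is plain arithmetic, so low and high can be dynamic.
    return low <= int(match.group()) <= high

print(in_range("42", 2, 123))   # True
print(in_range("124", 2, 123))  # False
```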
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I need two regular expressions: one that will find the second block of numbers and one that will find the third block of numbers. My data looks like this:
8782910291827182 04 1988 081
One expression should find the 04 and the other should find the 1988. I already have the expressions to find the first 16 digits and the last 3 digits, but I am stuck on finding the numbers in the second and third sections.
Use Field-Splitting Instead
Based on your corpus, it seems that one should be able to rely on the existence of four fields separated by tabs or other whitespace. Splitting fields is much easier than building and testing a regex, so I'd recommend skipping the regex unless there are edge cases not included in your examples.
Consider the following Ruby examples:
# Split the string into fields.
string = '8782910291827182 04 1988 081'
fields = string.split /\s+/
#=> ["8782910291827182", "04", "1988", "081"]
# Access members of the field array.
fields.first
#=> "8782910291827182"
fields[1]
#=> "04"
fields[2]
#=> "1988"
# Unpack array elements into variables.
field1, field2, field3, field4 = fields
p field2, field3
#=> ["04", "1988"]
A regular expression will force you to spend more time on pattern matching, especially as your corpus grows more complex; string-splitting is generally simpler, and will enable you to focus more on the result set. In most cases, the end results will be functionally similar, so which one is more useful to you will depend on what you're really trying to do. It's always good to have alternative options!
Find the 2-digit block:
\b\d{2}\b
Find the 4-digit block:
\b\d{4}\b
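A quick way to try these two patterns against the sample line is sketched below in Python, alongside the field-splitting alternative for comparison. The variable names are illustrative only.

```python
import re

line = "8782910291827182 04 1988 081"

# Word boundaries keep \d{2} and \d{4} from matching inside longer digit runs.
second = re.findall(r"\b\d{2}\b", line)
third = re.findall(r"\b\d{4}\b", line)
print(second, third)          # ['04'] ['1988']

# Field splitting gives the same values by position.
fields = line.split()
print(fields[1], fields[2])   # 04 1988
```

The word-boundary patterns select blocks by length rather than by position, so they would return several matches if another field happened to have 2 or 4 digits.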