Regular expression match decimal with letters - regex

I have the following strings: 3.14, 123.56f, .123e5f, 123D, 1234, 343E12, 32.
I want a regex that matches any of the inputs above. So far I have started with the following:
^[0-9]\d*(\.\d+)
I realize I have to escape the . since it is a regex metacharacter.
Thanks.

This should also work, if not already proposed.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
try {
    Pattern regex = Pattern.compile("\\.?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?[fD]?\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        // matched text: regexMatcher.group()
        // match start: regexMatcher.start()
        // match end: regexMatcher.end()
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Probably
^(\d+(\.\d+)?|\.\d+)([eE]\d+)?[fD]?$
http://regexr.com?2ut9t
^ start of the string
(\d+(\.\d+)?|\.\d+) one or more digits with an optional ( . and one or more digits)
or
. and one or more digits
([eE]\d+)? an optional ( e or E and one or more digits)
[fD]? an optional f or D
$ end of the string
As a sidenote, I've made the D compatible with everything but the f.
If you need positive and negative sign, add [+-]? after the ^
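As a quick check, the pattern above can be run against the sample inputs from the question. This is only a minimal Python sketch: the names SAMPLES, PATTERN, LENIENT and matches are made up here, and LENIENT is an assumed variant (digits after the dot optional, signed exponent allowed) for the case where the trailing dot in 32. is meant literally.

```python
import re

# Sample inputs from the question ("32." taken literally, with its trailing dot).
SAMPLES = ["3.14", "123.56f", ".123e5f", "123D", "1234", "343E12", "32."]

# The pattern from the answer above.
PATTERN = re.compile(r"^(\d+(\.\d+)?|\.\d+)([eE]\d+)?[fD]?$")

# Assumed variant: digits after the dot are optional and the exponent may carry a sign.
LENIENT = re.compile(r"^(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?[fD]?$")


def matches(pattern, text):
    """Return True if the pattern matches the whole string."""
    return pattern.match(text) is not None


for s in SAMPLES:
    print(f"{s!r}: strict={matches(PATTERN, s)} lenient={matches(LENIENT, s)}")
```

With these definitions the strict pattern accepts every sample except 32., because it requires at least one digit after the dot.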

This will match all of those:
[0-9.]+(?:[Ee][0-9.]*)?[DdFf]?
Note that within a character class (square brackets), the dot . is not a special character and does not need to be escaped.

Maybe that one ?
^\d*(?:\.\d+)?(?:[eE]\d+)?(?:[fD])?$
with
^\d* #possibly a digit or sequence of digits at the start
(?:\.\d+)? #possibly followed by a dot and at least one digit
(?:[eE]\d+)? #possibly a 'e' or 'E' followed by at least one digit
(?:[fD])?$ #optionally followed by an 'f' or 'D' at the end

You can use regexpal to test it out, but this seems to work on all of those examples:
^\d*\.?(\d*[eE]?\d*)[fD]?$

Related

Why does the regex [a-zA-Z]{5} return true for non-matching string?

I defined a regular expression to check if the string only contains alphabetic characters and with length 5:
use regex::Regex;
fn main() {
    let re = Regex::new("[a-zA-Z]{5}").unwrap();
    println!("{}", re.is_match("this-shouldn't-return-true#"));
}
The text I use contains many illegal characters and is longer than 5 characters, so why does this return true?
You have to put it inside ^...$ to match the whole string and not just parts:
use regex::Regex;
fn main() {
    let re = Regex::new("^[a-zA-Z]{5}$").unwrap();
    println!("{}", re.is_match("this-shouldn't-return-true#"));
}
As explained in the docs:
Notice the use of the ^ and $ anchors. In this crate, every expression is executed with an implicit .*? at the beginning and end, which allows it to match anywhere in the text. Anchors can be used to ensure that the full text matches an expression.
Your pattern returns true because it matches any 5 consecutive alphabetic characters; in your case it can match the first five letters of both 'shouldn' (from shouldn't) and 'return'.
Change your regex to: ^[a-zA-Z]{5}$
^ start of string
[a-zA-Z]{5} matches 5 alpha chars
$ end of string
This will match a string only if the string has a length of 5 chars and all of the chars from start to end fall in range a-z and A-Z.
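The same effect can be reproduced outside Rust. The following is a minimal Python sketch, assuming that re.search is a fair stand-in for the crate's is_match (both look for a match anywhere in the text); the variable names are made up for illustration.

```python
import re

TEXT = "this-shouldn't-return-true#"

# Unanchored: succeeds as soon as any run of five letters is found.
unanchored = re.search(r"[a-zA-Z]{5}", TEXT)

# Anchored: the whole string must consist of exactly five letters.
anchored = re.search(r"^[a-zA-Z]{5}$", TEXT)

print(unanchored.group() if unanchored else None)
print(anchored)
```

The unanchored search finds a five-letter run inside the text, while the anchored one fails because the text as a whole is not five letters.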

Regex that does not accept sub strings of more than two 'b'

I need a regex that accepts all strings consisting only of the characters a and b, except those with two or more 'b' in a row.
For example, these should not match:
abb
ababbb
bba
bbbaa
bbb
bb
I came up with this, but it's not working
[a-b]+b{2,}[a-b]*
Here is my code:
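#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
    string input;
    // Raw string literal: in an ordinary literal "\b" is a backspace, not a word boundary.
    regex validator_regex(R"(\b(?:b(?:a+b?)*|(?:a+b?)+)\b)");
    cout << "Hello, " << endl;
    while (!regex_match(input, validator_regex)) {
        cout << "please enter your choice of regEx :" << endl;
        // Stop on end of input instead of looping forever.
        if (!(cin >> input))
            break;
        if (!regex_match(input, validator_regex))
            cout << input + " is not a valid input" << endl;
        else
            cout << input + " is valid " << endl;
    }
}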
Your pattern [a-b]+b{2,}[a-b]* matches one or more a or b characters followed by bb, so it matches exactly the strings you want to reject. It also requires the string to be at least 3 characters long because of the [a-b]+b{2,} part.
To not match 2 b chars in a row you can exclude those matches using a negative lookahead by matching optional chars a or b until you encounter bb
Note that [a-b] is the same as [ab]
\b(?![ab]*?bb)[ab]+\b
\b A word boundary
(?![ab]*?bb) Negative lookahead, assert not 0+ times a or b followed by bb to the right
[ab]+ Match 1+ occurrences of a or b
\b A word boundary
Without using lookarounds, you can match the strings that you don't want by matching a string that contains bb, and capture in group 1 the strings that you want to keep:
\b[ab]*bb[ab]*\b|\b([ab]+)\b
Or use an alternation matching either starting with b and optional repetitions of 1+ a chars followed by an optional b, or match 1+ repetitions of starting with a followed by an optional b
\b(?:b(?:a+b?)*|(?:a+b?)+)\b
The simplest regex is:
^(?!.*bb)[ab]+$
This regex works by adding a negative look ahead (anchored to start) for bb appearing anywhere within input consisting of a or b.
If zero length input should match, change [ab]+ to [ab]*.
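To see how the lookahead and the alternation approaches compare, here is a minimal Python sketch. The names LOOKAHEAD, ALTERNATION, is_valid and patterns_agree are made up for illustration, and the word boundaries are replaced by ^ and $ on the assumption that the whole input is validated, as in the question's regex_match call.

```python
import re
from itertools import product

# Lookahead version: reject the input if "bb" appears anywhere.
LOOKAHEAD = re.compile(r"^(?!.*bb)[ab]+$")

# Lookaround-free version: an optional leading b, then runs of a each followed by at most one b.
ALTERNATION = re.compile(r"^(?:b(?:a+b?)*|(?:a+b?)+)$")

# Strings from the question that must be rejected.
REJECTED = ["abb", "ababbb", "bba", "bbbaa", "bbb", "bb"]


def is_valid(text):
    """Return True if text consists of a and b only and never contains bb."""
    return LOOKAHEAD.match(text) is not None


def patterns_agree(max_len=8):
    """Check that both patterns give the same verdict on every a/b string up to max_len."""
    for n in range(1, max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if (LOOKAHEAD.match(s) is None) != (ALTERNATION.match(s) is None):
                return False
    return True


for s in REJECTED + ["a", "b", "ab", "bab"]:
    print(s, is_valid(s))
```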

Exclude quantifier from regular expression

I have a regular expression with a quantifier that matches a 5-digit code: [0-9]{5}.
How can I exclude any match of the above pattern?
I tried [^([0-9]{5})], but it doesn't seem to work.
Test data follows:
including:
12345678875645 (will be matched)
pppppaaaaa (will be matched)
52p26 (will be matched)
123 (will be matched)
excluding:
12345 (won't be matched)
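Try this:
^(\d{1,4}|\d{6,})$
This won't match numbers with exactly 5 digits. Note that it only accepts digit-only input, so it also rejects pppppaaaaa and 52p26 from the test data.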
demo here: https://regex101.com/r/sHvRMA/1
You can use a negative look ahead:
/(?!^[0-9]{5}$)^.+$/
var rexp = /(?!^[0-9]{5}$)^.+$/;
var str = ['12345', '12345678875645', 'pppppaaaaa', '52p26', '123'];
for (var i = 0; i < str.length; i++) {
    console.log(str[i] + ' - ' + (rexp.test(str[i]) ? 'matched' : 'did not match'));
}
I assume you need a regex that matches everything except a string of exactly 5 digits.
You simply need a negative lookahead assertion to exclude the 5 digits:
\b(?!\d{5}).+|.{6,}\b
It excludes only 5-digit strings, not anything else.
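The difference between the digit-only pattern and the anchored negative lookahead shows up when both are run against the question's test data. This is a minimal Python sketch; DIGITS_ONLY, NOT_FIVE_DIGITS, TEST_DATA and accepted are names invented for the example.

```python
import re

# Digit-only approach: 1-4 digits or 6 and more digits.
DIGITS_ONLY = re.compile(r"^(\d{1,4}|\d{6,})$")

# Negative lookahead approach: anything except a string of exactly five digits.
NOT_FIVE_DIGITS = re.compile(r"(?!^[0-9]{5}$)^.+$")

# Test data from the question; only the last entry should be excluded.
TEST_DATA = ["12345678875645", "pppppaaaaa", "52p26", "123", "12345"]


def accepted(pattern, text):
    """Return True if the pattern finds a match in text."""
    return pattern.search(text) is not None


for s in TEST_DATA:
    print(s, accepted(DIGITS_ONLY, s), accepted(NOT_FIVE_DIGITS, s))
```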

How do you find 3 UNIQUE digits in a string of digits?

I am trying to write a regex that is very specific. I want to find 3 digits in a list. The issue comes because I do not care about repeating digits (5, 555, and 55555555555555 are seen as 5). Also, within the 3 digits, they need to be 3 different digits (123 = good, 311 = bad).
Here is what I have so far to find 3 digits, ignoring repeats but it does not specify 3 unique digits.
^(?:([0]{1,}|[1]{1,}|[2]{1,}|[3]{1,}|[4]{1,}|[5]{1,}|[6]{1,}|[7]{1,}|[8]{1,}|[9]{1,}|[0]{1,})(?!.*\\1)){3}$
Here is an example of the types of data I see.
Matching:
458
3333335555111
2222555111
222255558888
111147
9533333333
And not matching:
999999999
222252
888887
Right now my regex will find all of these. How can I ignore any that do not have 3 unique digits?
If your regex tool of choice supports lookaheads, back-references and possessive quantifiers, you could use
^(\d)\1*+(?!.*\1)(\d)\2*+(\d)\3*+$
^ and $ are anchors to ensure that we check the whole string.
(\d) captures a digit in the first group; \1*+ then possessively matches any following occurrences of that digit, and the negative lookahead (?!.*\1) ensures that the digit does not appear again later in the string.
(\d)\2*+ then matches the next, different digit, again possessively matching any following occurrences (try 122 without the possessive quantifier to see why it is needed here).
(\d)\3*+ matches the last digit with any following occurrences.
Without possessive quantifiers you can rely on a second lookahead instead: ^(\d)\1*(?!.*\1)(\d)\2*(?!.*\2)(\d)\3*$
See https://regex101.com/r/pV2tB2/2 for a demo.
Side note: Regex might not be the best tool for this, but as you specifically asked for it, here you are.
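The variant without possessive quantifiers can be checked against the question's data. The sketch below uses Python and deliberately sticks to the lookahead-only form, since possessive quantifiers are not available in older versions of Python's re module; THREE_RUNS, MATCHING, NOT_MATCHING and has_three_distinct_runs are names chosen for this example.

```python
import re

# Three runs of digits; the first two digits must not appear again later in the string.
THREE_RUNS = re.compile(r"^(\d)\1*(?!.*\1)(\d)\2*(?!.*\2)(\d)\3*$")

# Data from the question.
MATCHING = ["458", "3333335555111", "2222555111", "222255558888", "111147", "9533333333"]
NOT_MATCHING = ["999999999", "222252", "888887"]


def has_three_distinct_runs(text):
    """Return True if text is exactly three runs of three different digits."""
    return THREE_RUNS.match(text) is not None


for s in MATCHING + NOT_MATCHING:
    print(s, has_three_distinct_runs(s))
```

Note that this treats the input as three consecutive runs, so a string such as 1213, which has three unique digits but repeats 1 after a different digit, is rejected.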
This can be done with regex, but it's not the best tool for your work.
Instead of a regex-only approach, you can easily achieve this using Python.
Example:
strings = ['458', '3333335555111', '2222555111', '222255558888', '111147', '9533333333', '955555555', '12222211']
for s in strings:
    # A set keeps one copy of each character, so its size is the number of unique digits.
    if len(set(list(s))) == 3:
        print("Ok :", s)
    else:
        print("Error :", s)
Output:
>> Ok : 458
>> Ok : 3333335555111
>> Ok : 2222555111
>> Ok : 222255558888
>> Ok : 111147
>> Ok : 9533333333
>> Error : 955555555
>> Error : 12222211
I've used the following built-in functions while iterating over the strings in that list:
list()
set()
len()
Using negative lookahead, this should match any string of digits that contains at least 3 unique digits /^(\d)\1*(?!\1)(\d)(?:\2|\1)*(?!\2|\1)(\d)+$/
(\d) - Match a digit
\1* - Allow that digit to repeat
(?!\1) - Make sure that's followed by a digit that does not match the first match
(\d) - Match the new digit
(?:\2|\1)* - Allow repeats of either the first or second digit
(?!\2|\1) - Make sure that's followed by a digit that does not match the first or second match
(\d)+ - Capture the third unique digit, then allow any number of digits of any kind to follow
I'm not sure if an awk script will do it for you, but here it goes:
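awk '
# Record that the digit num occurs on the current line.
function match_func(num) {
    match_array[num] = 1
}
{
    # Reset the per-line state so earlier lines do not leak into this one.
    split("", match_array)
    match_sum = 0
    # awk strings are 1-indexed.
    for (i = 1; i <= length($1); i++)
        match_func(substr($1, i, 1))
    for (i = 0; i < 10; i++)
        if (match_array[i] == 1) match_sum++
    if (match_sum == 3)
        print $1
}'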

Regex: mask all but the last 5 digits, ignoring non-digits

I want to match a number containing 17-23 digits interspersed with spaces or hyphens, then replace all but the last five digits with asterisks. I can match with the following regex:
((?:(?:\d)([\s-]*)){12,18})(\d[\s-]*){5}
My problem is that I can't get the regex to group all instances of [\s-] in the first section, and I have no idea how to get it to replace the initial 12-18 digits with asterisks (*).
How about this:
s/\d(?=(?:[ -]*\d){5,22}(?![ -]*\d))/*/g
The positive lookahead ensures that there are at least 5 digits ahead of the just-matched digit, while the embedded negative lookahead ensures that there aren't more than 22.
However, there could still be more digits before the first-matched digit. That is, if there are 24 or more digits, this regex only operates on the last 23 of them. I don't know if that's a problem for you.
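The substitution above translates directly to other regex engines. Here is a minimal Python sketch of it; MASK and mask_digits are names invented for the example, and the sample number is borrowed from elsewhere in this thread.

```python
import re

# A digit followed by 5 to 22 more digits (separated by spaces or hyphens) and then no further digit.
MASK = re.compile(r"\d(?=(?:[ -]*\d){5,22}(?![ -]*\d))")


def mask_digits(text):
    """Replace every digit that still has at least five digits after it with an asterisk."""
    return MASK.sub("*", text)


print(mask_digits("01-2 3-456-7-190 123-416 78"))
```

As the caveat above says, nothing here enforces the 17-digit minimum, and in a run of 24 or more digits the leading ones are left untouched.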
Even assuming that this is feasible with regex alone I'd bet that it would be way slower than using the non-capturing version of your regex and then reverse iterating over the match, leaving the first 5 digits alone and replacing the rest of them with '*'.
I think your regex is ok, but you might need to have a callback where you can insert the asterisks with another inline regex. The below is a Perl example.
s/((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)/ add_asterisks($1,$2) /xeg
use strict;
use warnings;
my $str = 'sequence of digits 01-2 3-456-7-190 123-416 78 ';
if ($str =~ s/((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)/ add_asterisks($1,$2) /xeg )
{
    print "New string: '$str'\n";
}
# Replace every digit in the leading part with an asterisk; the last five digits stay as they are.
sub add_asterisks {
    my ($pre, $post) = @_;
    $pre =~ s/\d/*/g;
    return $pre . $post;
}
__END__
Output
New string: 'sequence of digits **-* *-***-*-*** ***-416 78 '
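The same callback idea carries over to other languages. The following Python sketch mirrors the Perl code above, using re.sub with a function as the replacement; NUMBER and mask_number are names chosen for this example, while add_asterisks follows the Perl sub.

```python
import re

# Group 1: the first 12-18 digits with their separators; group 2: the last five digits.
NUMBER = re.compile(r"((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)")


def add_asterisks(match):
    """Mask the digits of group 1 and leave group 2 untouched."""
    pre, post = match.group(1), match.group(2)
    return re.sub(r"\d", "*", pre) + post


def mask_number(text):
    """Mask all but the last five digits of every 17-23 digit number found in text."""
    return NUMBER.sub(add_asterisks, text)


print(mask_number("sequence of digits 01-2 3-456-7-190 123-416 78 "))
```

Unlike the pure lookahead substitution, this version leaves numbers with fewer than 17 digits unchanged.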
Here is a Java variant of Alan Moore's answer that masks all word characters (\w, i.e. [a-zA-Z0-9_]) instead of just digits (\d).
It also works on strings of any length. Note that the {4,} quantifier leaves the last four word characters unmasked; use {5,} to keep five.
public String maskNumber(String number) {
    // Mask each word character that still has at least four word characters after it.
    String regex = "\\w(?=(?:\\W*\\w){4,}(?!\\W*\\w))";
    // replaceAll masks every match in one pass instead of calling replaceFirst per match.
    return number.replaceAll(regex, "*");
}
This example
String[] numbers = {
    "F4546-6565-55654-5457",
    "F4546-6565-55654-54-D57",
    "F4546-6565-55654-54-D;5.7",
    "F4546-6565-55654-54-g5.37",
    "hd6g83g.duj7*ndjd.(njdhg75){7dh i8}",
    "####.####.####.675D-45",
    "****.****.****.675D-45",
    "**",
    "12"
};
for (String number : numbers) {
    System.out.println(maskNumber(number));
}
Gives:
*****-****-*****-5457
*****-****-*****-*4-D57
*****-****-*****-*4-D;5.7
*****-****-*****-**-g5.37
*******.*********.(*******){*dh i8}
####.####.####.**5D-45
****.****.****.**5D-45
**
12