Regex for numbers on scientific notation? - c++

I'm loading a .obj file that has lines like
vn 8.67548e-017 1 -1.55211e-016
for the vertex normals. How can I detect them and bring them to double notation?

A regex that would work pretty well would be:
-?[\d.]+(?:e-?\d+)?
Converting to a number can be done like this: String in scientific notation C++ to double conversion, I guess.
The regex is
-? # an optional -
[\d.]+ # a series of digits or dots (see *1)
(?: # start non capturing group
e # "e"
-? # an optional -
\d+ # digits
)? # end non-capturing group, make optional
**1) This is not 100% correct, technically there can be only one dot, and before it only one (or no) digit. But practically, this should not happen. So the regex is a good approximation and false positives should be very unlikely. Feel free to make the regex more specific.*

You can identify the scientific values using: -?\d*\.?\d+e[+-]?\d+ regex.

I tried a number of the other solutions to no avail, so I came up with this.
^(-?\d+)\.?\d+(e-|e\+|e|\d+)\d+$
Debuggex Demo
Anything that matches is considered to be valid Scientific Notation.
Please note: This accepts e+, e- and e; if you don't want to accept e, use this: ^(-?\d+)\.?\d+(e-|e\+|\d+)\d+$
I'm not sure if it works for c++, but in c# you can add (?i) between the ^ and (- in the regex, to toggle in-line case-insensitivity. Without it, exponents declared like 1.05E+10 will fail to be recognised.
Edit: My previous regex was a little buggy, so I've replaced it with the one above.

The standard library function strtod handles the exponential component just fine (so does atof, but strtod allows you to differentiate between a failed parse and parsing the value zero).

If you can be sure that the format of the double is scientific, you can try something like the following:
string inp("8.67548e-017");
istringstream str(inp);
double v;
str >> scientific >> v;
cout << "v: " << v << endl;
If you want to detect whether there is a floating point number of that format, then the regexes above will do the trick.
EDIT: the scientific manipulator is actually not needed, when you stream in a double, it will automatically do the handling for you (whether it's fixed or scientific)

Well this is not exactly what you asked for since it isn't Perl (gak) and it is a regular definition not a regular expression, but it's what I use to recognize an extension of C floating point literals (the extension is permitting "_" in digit strings), I'm sure you can convert it to an unreadable regexp if you want:
/* floats: Follows ISO C89, except that we allow underscores */
let decimal_string = digit (underscore? digit) *
let hexadecimal_string = hexdigit (underscore? hexdigit) *
let decimal_fractional_constant =
decimal_string '.' decimal_string?
| '.' decimal_string
let hexadecimal_fractional_constant =
("0x" |"0X")
(hexadecimal_string '.' hexadecimal_string?
| '.' hexadecimal_string)
let decimal_exponent = ('E'|'e') ('+'|'-')? decimal_string
let binary_exponent = ('P'|'p') ('+'|'-')? decimal_string
let floating_suffix = 'L' | 'l' | 'F' | 'f' | 'D' | 'd'
let floating_literal =
(
decimal_fractional_constant decimal_exponent? |
hexadecimal_fractional_constant binary_exponent?
)
floating_suffix?
C format is designed for programming languages not data, so it may support things your input does not require.

For extracting numbers in scientific notation in C++ with std::regex I normally use
((\\+|-)?[[:digit:]]+)(\\.(([[:digit:]]+)?))?((e|E)((\\+|-)?)[[:digit:]]+)?
which corresponds to
((\+|-)?\d+)(\.((\d+)?))?((e|E)((\+|-)?)\d+)?
Debuggex Demo
This will match any number of the form +12.3456e-78 where
the sign can be either + or - and is optional
the comma as well as the positions after the comma are optional
the exponent is optional and can be written with a lower- or upper-case letter
A corresponding code for parsing might look like this:
std::regex const scientific_regex {"((\\+|-)?[[:digit:]]+)(\\.(([[:digit:]]+)?))?((e|E)((\\+|-)?)[[:digit:]]+)?"};
std::string const str {"8.67548e-017 1 -1.55211e-016"};
for (auto it = std::sregex_iterator(str.begin(), str.end(), scientific_regex); it != std::sregex_iterator(); ++it) {
std::string const match {it->str()};
std::cout << match << std::endl;
}
If you want to convert the found sub-strings to a double number std::stod should handle the conversion correctly as already pointed out by Ben Voigt.
Try it here!

Related

QRegExp not finding expected string pattern

I am working in Qt 5.2, and I have a piece of code that takes in a string and enters one of several if statements based on its format. One of the formats searched for is the letters "RCV", followed by a variable amount of numbers, a decimal, and then one more number. There can be more than one of these values in the line, separated by "|", for example it could one value like "RCV0123456.1" or mulitple values like "RCV12345.1|RCV678.9". Right now I am using QRegExp class to find this, like this:
QString value = "RCV000030249.2|RCV000035360.2"; //Note: real test value from my code
if(QRegExp("^[RCV\d+\.\d\|?]+$").exactMatch(value))
std::cout << ":D" << std::endl;
else
std::cout << ":(" << std::endl;
I want it to use the if statement, but it keeps going into the else statement. Is there something I'm doing wrong with the regular expression?
Your expression should be like #vahancho mentionet in a comment:
if(QRegExp("^[RCV\\d+\\.\\d\\|?]+$").exactMatch(value))
If you use C++11, then you can use its raw strings feature:
if(QRegExp(R"(^[RCV\d+\.\d\|?]+$)").exactMatch(value))
Aside from escaping the backslashes which others has mentioned in answers and comments,
There can be more than one of these values in the line, separated by "|", for example it could one value like "RCV0123456.1" or mulitple values like "RCV12345.1|RCV678.9".
[RCV\d+\.\d\|?] may not be doing what you expect. Perhaps you want () instead of []:
/^
[RCV\d+\.\d\|?]+ # More than one of characters from the list:
# R, C, V, a digit, a +, a dot, a digit, a |, a ?
$/x
/^
(
RCV\d+\.\d # RCV, some digits, a dot, followed by a digit
\|? # Optional: a |
)+ # Quantifier of one or more
$/x
Also, maybe you could revise the regex such that the optional | requires the group to be matched *again*:
/^
(RCV\d+\.\d) # RCV, some digits, a dot, followed by a digit
(
\|(?1) # A |, then match subpattern 1 (Above)
)+ # Quantifier of one or more
$/x
Check if only valid occurences in line with the addition to require an | starting second occurence (having your implementation would not require the | even with double quotes):
QString value = "RCV000030249.2|RCV000035360.2"; //Note: real test value from my code
if(QRegExp("^RCV\\d+\\.\\d(\\|RCV\\d+\\.\\d)*$").exactMatch(value))
std::cout << ":D" << std::endl;
else
std::cout << ":(" << std::endl;

C++11 regex to tokenize Mathematical Expression

I have the following code to tokenize a string of the format: (1+2)/((8))-(100*34):
I'd like to throw an error to the user if they use an operator or character that isn't part of my regex.
e.g if user enters 3^4 or x-6
Is there a way to negate my regex, search for it and if it is true throw the error?
Can the regex expression be improved?
//Using c++11 regex to tokenize input string
//[0-9]+ = 1 or many digits
//Or [\\-\\+\\\\\(\\)\\/\\*] = "-" or "+" or "/" or "*" or "(" or ")"
std::regex e ( "[0-9]+|[\\-\\+\\\\\(\\)\\/\\*]");
std::sregex_iterator rend;
std::sregex_iterator a( infixExpression.begin(), infixExpression.end(), e );
queue<string> infixQueue;
while (a!=rend) {
infixQueue.push(a->str());
++a;
}
return infixQueue;
-Thanks
You can run a search on the string using the search expression [^0-9()+\-*/] defined as C++ string as "[^0-9()+\\-*/]" which finds any character which is NOT a digit, a round bracket, a plus or minus sign (in real hyphen), an asterisk or a slash.
The search with this regular expression search string should not return anything otherwise the string contains a not supported character like ^ or x.
[...] is a positive character class which means find a character being one of the characters in the square brackets.
[^...] is a negative character class which means find a character NOT being one of the characters in the square brackets.
The only characters which must be escaped within square brackets to be interpreted as literal character are ], \ and - whereby - must not be escaped if being first or last character in the list of characters within the square brackets. But it is nevertheless better to escape - always within square brackets as this makes it easier for the regular expression engine / function to detect that the hyphen character should be interpreted as literal character and not with meaning "FROM x to z".
Of course this expression does not check for missing closing round brackets. But formula parsers do often not require that there is always a closing parenthesis for every opening parenthesis in comparison to a compiler or script interpreter simply because not needed to calculate the value based on entered formula.
Answer is given already but perhaps someone might need this
[0-9]?([0-9]*[.])?[0-9]+|[\\-\\+\\\\\(\\)\\/\\*]
This regex separates floats, integers and arithmetic operators
Heres the trick:
[0-9]?([0-9]*[.])?[0-9]+ -> if its a digit and has a point, then grab the digits with the point and the digits that follows it, if not, just grab the digits.
Sorry if my answer isn't clear, i just learned regex and found this solution by my own by just trial and errors.
Heres the code (it takes a mathematical expression and split all digits and operators into a vector)
NOTE: I don't know if it accepts whitespaces, meaning that the mathematical expression that i worked with had no whitespaces. Example: 4+2*(3+1) and would separate everything nicely, but i havent tried with whitespaces.
/* Separate every int or float or operator into a single string using regular expression and store it in untokenize vector */
string infix; //The string to be parse (the arithmetic operation if you will)
vector<string> untokenize;
std::regex words_regex("[0-9]?([0-9]*[.])?[0-9]+|[\\-\\+\\\\\(\\)\\/\\*]");
auto words_begin = std::sregex_iterator(infix.begin(), infix.end(), words_regex);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
cout << (*i).str() << endl;
untokenize.push_back((*i).str());
}
Output:
(<br/>
1<br/>
+<br/>
2<br/>
)<br/>
/<br/>
(<br/>
(<br/>
8<br/>
)<br/>
)<br/>
-<br/>
(<br/>
100<br/>
*<br/>
34<br/>
)<br/>

Ignore String containing special words (Months)

I am trying to find alphanumeric strings by using the following regular expression:
^(?=.*\d)(?=.*[a-zA-Z]).{3,90}$
Alphanumeric string: an alphanumeric string is any string that contains at least a number and a letter plus any other special characters it can be # - _ [] () {} ç _ \ ù %
I want to add an extra constraint to ignore all alphanumerical strings containing the following month formats :
JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre
One solution is to actually match an alphanumerical string. Then check if this string contains one of these names by using the following function:
vector<string> findString(string s)
{
vector<string> vec;
boost::regex rgx("JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre
");
boost::smatch match;
boost::sregex_iterator begin {s.begin(), s.end(), rgx},
end {};
for (boost::sregex_iterator& i = begin; i != end; ++i)
{
boost::smatch m = *i;
vec.push_back(m.str());
}
return vec;
}
Question: How can I add this constraint directly into the regular expression instead of using this function.
One solution is to use negative lookahead as mentioned in How to ignore words in string using Regular Expressions.
I used it as follows:
String : 2-hello-001
Regular expression : ^(?=.*\d)(?=.*[a-zA-Z]^(?!Jan|Feb|Mar)).{3,90}$
Result: no match
Test website: http://regexlib.com/
The edit provided by #Robin and #RyanCarlson : ^[][\w#_(){}ç\\ù%-]{3,90}$ works perfectly in detecting alphanumeric strings with special characters. It's just the negative lookahead part that isn't working.
You can use negative look ahead, the same way you're using positive lookahead:
(?=.*\d)(?=.*[a-zA-Z])
(?!.*(?:JANVIER|FEVRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AOUT|SEPTEMBRE|OCTOBRE|NOVEMBRE|DECEMBRE|Jan|Feb|Mar|Apr|May|Jun|JUN|Jul|Aug|Sep|Oct|Nov|Dec|[jJ]anvier|[fF][ée]vrier|[mM]ars|[aA]vril|[mM]ai|[jJ]uin|[jJ]uillet|[aA]o[éû]t|aout|[sS]eptembre|[oO]ctobre|[nN]ovembre|[dD][eé]cembre)).{3,90}$
Also you regex is pretty unclear. If you want alphanumerical strings with a length between 3 and 90, you can just do:
/^(?!.*(?:JANVIER|F[Eé]VRIER|MARS|AVRIL|MAI|JUIN|JUILLET|AO[Uù]T|SEPTEMBRE|OCTOBRE|NOVEMBRE|D[Eé]CEMBRE|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))
[][\w#_(){}ç\\ù%-]{3,90}$/i
the i flag means it will match upper and lower case (so you can reduce your forbidden list), \w is a shortcut for [0-9a-zA-Z_] (careful if you copy-paste, there's a linebreak here for readability between (?! ) and [ ]). Just add in the final [...] whatever special characters you wanna match.

Detecting text like "#smth" with RegExp (with some more terms)

I'm really bad in regular expressions, so please help me.
I need to find in string any pieces like #text.
text mustn't contain any space characters (\\s). It's length must be at least 2 characters ({2,}), and it must contain at least 1 letter(QChar::isLetter()).
Examples:
#c, #1, #123456, #123 456, #123_456 are incorrect
#cc, #text, #text123, #123text are correct
I use QRegExp.
QRegExp rx("#(\\S+[A-Za-z]\\S*|\\S*[A-Za-z]\\S+)$");
bool result = (rx.indexIn(str) == 0);
rx either finds a non-whitespace followed by a letter and by an unspecified number of non-whitespace characters, or a letter followed by at least non-whitespace.
Styne666 gave the right regex.
Here is a little Perl script which is trying to match its first argument with this regex:
#!/usr/bin/env perl
use strict;
use warnings;
my $arg = shift;
if ($arg =~ m/(#(?=\d*[a-zA-Z])[a-zA-Z\d]{2,})/) {
print "$1 MATCHES THE PATTERN!\n";
} else {
print "NO MATCH\n";
}
Perl is always great to quickly test your regular expressions.
Now, your question is a bit different. You want to find all the substrings in your text string,
and you want to do it in C++/Qt. Here is what I could come up with in couple of minutes:
#include <QtCore/QCoreApplication>
#include <QRegExp>
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
QString str = argv[1];
QRegExp rx("[\\s]?(\\#(?=\\d*[a-zA-Z])[a-zA-Z\\d]{2,})\\b");
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1)
{
QString token = rx.cap(1);
cout << token.toStdString().c_str() << endl;
pos += rx.matchedLength();
}
return 0;
}
To make my test I feed it an input like this (making a long string just one command line argument):
peter#ubuntu01$ qt-regexp "#hjhj 4324 fdsafdsa #33e #22"
And it matches only two words: #hjhj and #33e.
Hope it helps.
The shortest I could come up with (which should work, but I haven't tested extensively) is:
QRegExp("^#(?=[0-9]*[A-Za-z])[A-Za-z0-9]{2,}$");
Which matches:
^ the start of the string
# a literal hash character
(?= then look ahead (but don't match)
[0-9]* zero or more latin numbers
[A-Za-z] a single upper- or lower-case latin letter
)
[A-Za-z0-9]{2,} then match at least two characters which may be upper- or lower-case latin letters or latin numbers
$ then find and consume the end of the line
Technically speaking though this is still wrong. It only matches latin letters and numbers. Replacing a few bits gives you:
QRegExp("^#(?=\\d*[^\\d\\s])\\w{2,}$");
This should work for non-latin letters and numbers but this is totally untested. Have a quick read of the QRegExp class reference for an explanation of each escaped group.
And then to match within larger strings of text (again, untested):
QRegExp("\b#(?=\\d*[^\\d\\s])\\w{2,}\b");
A useful tool is the Regular Expressions Example which comes with the SDK.
use this regular expression. hope fully your problem will solve with given RE.
^([#(a-zA-Z)]+[(a-zA-Z0-9)]+)*(#[0-9]+[(a-zA-Z)]+[(a-zA-Z0-9)]*)*$

Regular expression match decimal with letters

I have following string 3.14, 123.56f, .123e5f, 123D, 1234, 343E12, 32.
What I want to do is match any combination of above inputs. So far I started with the following:
^[0-9]\d*(\.\d+)
I realize I have to escape the . since its a regular expression itself.
Thanks.
This should also work, if not already proposed.
try {
Pattern regex = Pattern.compile("\\.?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?[fD]?\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
Probably
^(\d+(\.\d+)?|\.\d+)([eE]\d+)?[fD]?$
http://regexr.com?2ut9t
^ start of the string
(\d+(\.\d+)?|\.\d+) one or more digits with an optional ( . and one or more digits)
or
. and one or more digits
([eE]\d+)? an optional ( e or E and one or more digits)
[fD]? an optional f or D
$ end of the string
As a sidenote, I've made the D compatible with everything but the f.
If you need positive and negative sign, add [+-]? after the ^
This will match all of those:
[0-9.]+(?:[Ee][0-9.]*)?[DdFf]?
Note that within a character class (square brackets), dot . is not a special character and should not be escaped.
Maybe that one ?
^\d*(?:\.\d+)?(?:[eE]\d+)?(?:[fD])?$
with
^\d* #possibly a digit or sequence of digits at the start
(?:\.\d+)? #possibly followed by a dot and at least one digit
(?:[eE]\d+)? #possibly a 'e' or 'E' followed by at least one digit
(?:[fD])?$ #optionnaly followed by 'f' or 'D' letters until the end
You can use regexpal to test it out, but this seems to work on all of those examples:
^\d*\.?(\d*[eE]?\d*)[fD]?$