makeValidWord(std::string word) not working properly - c++

I'm programming a hash table thing in C++, but this specific piece of code will not run properly. It should return a string of alpha characters and ' and -, but I get cases like "t" instead of "art" when I try to input "'aRT-*".
isWordChar() return a bool value depending on whether the input is a valid word character or not using isAlpha()
// Words cannot contain any digits, or special characters EXCEPT for
// hyphens (-) and apostrophes (') that occur in the middle of a
// valid word (the first and last characters of a word must be an alpha
// character). All upper case characters in the word should be convertd
// to lower case.
// For example, "can't" and "good-hearted" are considered valid words.
// "12mOnkEYs-$" will be converted to "monkeys".
// "Pa55ive" will be stripped "paive".
std::string WordCount::makeValidWord(std::string word) {
if (word.size() == 0) {
return word;
}
string r = "";
string in = "";
size_t incr = 0;
size_t decr = word.size() - 1;
while (incr < word.size() && !isWordChar(word.at(incr))) {
incr++;
}
while (0 < decr && !isWordChar(word.at(decr))) {
decr--;
}
if (incr > decr) {
return r;
}
while (incr <= decr) {
if (isWordChar(word.at(incr)) || word.at(incr) == '-' || word.at(incr) == '\'') {
in =+ word.at(incr);
}
incr++;
}
for (size_t i = 0; i < in.size(); i++) {
r += tolower(in.at(i));
}
return r;
}

Assuming you can use standard algorithms its better to rewrite your function using them. This achieves 2 goals:
code is more readable, since using algorithms shows intent along with code itself
there is less chance to make error
So it should be something like this:
std::string WordCount::makeValidWord(std::string word) {
auto first = std::find_if(word.cbegin(), word.cend(), isWordChar);
auto last = std::find_if(word.crbegin(), word.crend(), isWordChar);
std::string i;
std::copy_if(first, std::next(last), std::back_inserter(i), [](char c) {
return isWordChar(c) || c == '-' || c == '\'';
});
std::string r;
std::transform(i.cbegin(), i.cend(), std::back_inserter(r), std::tolower);
return r;
}

I am going to echo #Someprogrammerdude and say: Learn to use a debugger!
I pasted your code into Visual Studio (changed isWordChar() to isalpha()), and stepped it through with the debugger. Then it was pretty trivial to notice this happening:
First loop of while (incr <= decr) {:
Second loop:
Ooh, look at that; the variable in does not update correctly - instead of collecting a string of the correct characters it only holds the last one. How can that be?
in =+ word.at(incr); Hey, that is not right, that operator should be +=.
Many errors are that easy and effortless to find and correct if you use a debugger. Pick one up today. :)

Related

Please help me write this Function on how to see if a given string is an float [duplicate]

Does anybody know of a convenient means of determining if a string value "qualifies" as a floating-point number?
bool IsFloat( string MyString )
{
... etc ...
return ... // true if float; false otherwise
}
If you can't use a Boost library function, you can write your own isFloat function like this.
#include <string>
#include <sstream>
bool isFloat( string myString ) {
std::istringstream iss(myString);
float f;
iss >> noskipws >> f; // noskipws considers leading whitespace invalid
// Check the entire string was consumed and if either failbit or badbit is set
return iss.eof() && !iss.fail();
}
You may like Boost's lexical_cast (see http://www.boost.org/doc/libs/1_37_0/libs/conversion/lexical_cast.htm).
bool isFloat(const std::string &someString)
{
using boost::lexical_cast;
using boost::bad_lexical_cast;
try
{
boost::lexical_cast<float>(someString);
}
catch (bad_lexical_cast &)
{
return false;
}
return true;
}
You can use istream to avoid needing Boost, but frankly, Boost is just too good to leave out.
Inspired by this answer I modified the function to check if a string is a floating point number. It won't require boost & doesn't relies on stringstreams failbit - it's just plain parsing.
static bool isFloatNumber(const std::string& string){
std::string::const_iterator it = string.begin();
bool decimalPoint = false;
int minSize = 0;
if(string.size()>0 && (string[0] == '-' || string[0] == '+')){
it++;
minSize++;
}
while(it != string.end()){
if(*it == '.'){
if(!decimalPoint) decimalPoint = true;
else break;
}else if(!std::isdigit(*it) && ((*it!='f') || it+1 != string.end() || !decimalPoint)){
break;
}
++it;
}
return string.size()>minSize && it == string.end();
}
I.e.
1
2.
3.10000
4.2f
-5.3f
+6.2f
is recognized by this function correctly as float.
1.0.0
2f
2.0f1
Are examples for not-valid floats. If you don't want to recognize floating point numbers in the format X.XXf, just remove the condition:
&& ((*it!='f') || it+1 != string.end() || !decimalPoint)
from line 9.
And if you don't want to recognize numbers without '.' as float (i.e. not '1', only '1.', '1.0', '1.0f'...) then you can change the last line to:
return string.size()>minSize && it == string.end() && decimalPoint;
However: There are plenty good reasons to use either boost's lexical_cast or the solution using stringstreams rather than this 'ugly function'. But It gives me more control over what kind of formats exactly I want to recognize as floating point numbers (i.e. maximum digits after decimal point...).
I recently wrote a function to check whether a string is a number or not. This number can be an Integer or Float.
You can twist my code and add some unit tests.
bool isNumber(string s)
{
std::size_t char_pos(0);
// skip the whilespaces
char_pos = s.find_first_not_of(' ');
if (char_pos == s.size()) return false;
// check the significand
if (s[char_pos] == '+' || s[char_pos] == '-') ++char_pos; // skip the sign if exist
int n_nm, n_pt;
for (n_nm = 0, n_pt = 0; std::isdigit(s[char_pos]) || s[char_pos] == '.'; ++char_pos) {
s[char_pos] == '.' ? ++n_pt : ++n_nm;
}
if (n_pt>1 || n_nm<1) // no more than one point, at least one digit
return false;
// skip the trailing whitespaces
while (s[char_pos] == ' ') {
++ char_pos;
}
return char_pos == s.size(); // must reach the ending 0 of the string
}
void UnitTest() {
double num = std::stod("825FB7FC8CAF4342");
string num_str = std::to_string(num);
// Not number
assert(!isNumber("1a23"));
assert(!isNumber("3.7.1"));
assert(!isNumber("825FB7FC8CAF4342"));
assert(!isNumber(" + 23.24"));
assert(!isNumber(" - 23.24"));
// Is number
assert(isNumber("123"));
assert(isNumber("3.7"));
assert(isNumber("+23.7"));
assert(isNumber(" -423.789"));
assert(isNumber(" -423.789 "));
}
Quick and dirty solution using std::stof:
bool isFloat(const std::string& s) {
try {
std::stof(s);
return true;
} catch(...) {
return false;
}
}
I'd imagine you'd want to run a regex match on the input string. I'd think it may be fairly complicated to test all the edge cases.
This site has some good info on it. If you just want to skip to the end it says:
^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$
Which basically makes sense if you understand regex syntax.
[EDIT: Fixed to forbid initial whitespace and trailing nonsense.]
#include <sstream>
bool isFloat(string s) {
istringstream iss(s);
float dummy;
iss >> noskipws >> dummy;
return iss && iss.eof(); // Result converted to bool
}
You could easily turn this into a function templated on a type T instead of float. This is essentially what Boost's lexical_cast does.
I always liked strtof since it lets you specify an end pointer.
bool isFloat(const std::string& str)
{
char* ptr;
strtof(str.c_str(), &ptr);
return (*ptr) == '\0';
}
This works because the end pointer points to the character where the parse started to fail, therefore if it points to a nul-terminator, then the whole string was parsed as a float.
I'm surprised no one mentioned this method in the 10 years this question has been around, I suppose because it is more of a C-Style way of doing it. However, it is still perfectly valid in C++, and more elegant than any stream solutions. Also, it works with "+inf" "-inf" and so on, and ignores leading whitespace.
EDIT
Don't be caught out by empty strings, otherwise the end pointer will be on the nul-termination (and therefore return true). The above code should be:
bool isFloat(const std::string& str)
{
if (str.empty())
return false;
char* ptr;
strtof(str.c_str(), &ptr);
return (*ptr) == '\0';
}
With C++17:
bool isNumeric(std::string_view s)
{
double val;
auto [p, ec] = std::from_chars(s.data(), s.data() + s.size(), val);
return ec == std::errc() && p == s.data() + s.size();
}
Both checks on return are necessary. The first checks that there are no overflow or other errors. The second checks that the entire string was read.
You can use the methods described in How can I convert string to double in C++?, and instead of throwing a conversion_error, return false (indicating the string does not represent a float), and true otherwise.
The main issue with other responses is performance
Often you don't need every corner case, for example maybe nan and -/+ inf, are not as important to cover as having speed. Maybe you don't need to handle 1.0E+03 notation. You just want a fast way to parse strings to numbers.
Here is a simple, pure std::string way, that's not very fast:
size_t npos = word.find_first_not_of ( ".+-0123456789" );
if ( npos == std::string::npos ) {
val = atof ( word.c_str() );
}
This is slow because it is O(k*13), checking each char against 0 thur 9
Here is a faster way:
bool isNum = true;
int st = 0;
while (word.at(st)==32) st++; // leading spaces
ch = word.at(st);
if (ch == 43 || ch==45 ) st++; // check +, -
for (int n = st; n < word.length(); n++) {
char ch = word.at(n);
if ( ch < 48 || ch > 57 || ch != 46 ) {
isNum = false;
break; // not a num, early terminate
}
}
This has the benefit of terminating early if any non-numerical character is found, and it checks by range rather than every number digit (0-9). So the average compares is 3x per char, O(k*3), with early termination.
Notice this technique is very similar to the actual one used in the stdlib 'atof' function:
http://www.beedub.com/Sprite093/src/lib/c/stdlib/atof.c
You could use atof and then have special handling for 0.0, but I don't think that counts as a particularly good solution.
This is a common question on SO. Look at this question for suggestions (that question discusses string->int, but the approaches are the same).
Note: to know if the string can be converted, you basically have to do the conversion to check for things like over/underflow.
What you could do is use an istringstream and return true/false based on the result of the stream operation. Something like this (warning - I haven't even compiled the code, it's a guideline only):
float potential_float_value;
std::istringstream could_be_a_float(MyString)
could_be_a_float >> potential_float_value;
return could_be_a_float.fail() ? false : true;
it depends on the level of trust, you need and where the input data comes from.
If the data comes from a user, you have to be more careful, as compared to imported table data, where you already know that all items are either integers or floats and only thats what you need to differentiate.
For example, one of the fastest versions, would simply check for the presence of "." and "eE" in it. But then, you may want to look if the rest is being all digits. Skip whitespace at the beginning - but not in the middle, check for a single "." "eE" etc.
Thus, the q&d fast hack will probably lead to a more sophisticated regEx-like (either call it or scan it yourself) approach. But then, how do you know, that the result - although looking like a float - can really be represented in your machine (i.e. try 1.2345678901234567890e1234567890). Of course, you can make a regEx with "up-to-N" digits in the mantissa/exponent, but thats machine/OS/compiler or whatever specific, sometimes.
So, in the end, to be sure, you probably have to call for the underlying system's conversion and see what you get (exception, infinity or NAN).
I would be tempted to ignore leading whitespaces as that is what the atof function does also:
The function first discards as many
whitespace characters as necessary
until the first non-whitespace
character is found. Then, starting
from this character, takes as many
characters as possible that are valid
following a syntax resembling that of
floating point literals, and
interprets them as a numerical value.
The rest of the string after the last
valid character is ignored and has no
effect on the behavior of this
function.
So to match this we would:
bool isFloat(string s)
{
istringstream iss(s);
float dummy;
iss >> skipws >> dummy;
return (iss && iss.eof() ); // Result converted to bool
}
int isFloat(char *s){
if(*s == '-' || *s == '+'){
if(!isdigit(*++s)) return 0;
}
if(!isdigit(*s)){return 0;}
while(isdigit(*s)) s++;
if(*s == '.'){
if(!isdigit(*++s)) return 0;
}
while(isdigit(*s)) s++;
if(*s == 'e' || *s == 'E'){
s++;
if(*s == '+' || *s == '-'){
s++;
if(!isdigit(*s)) return 0;
}else if(!isdigit(*s)){
return 0;
}
}
while(isdigit(*s)) s++;
if(*s == '\0') return 1;
return 0;
}
I was looking for something similar, found a much simpler answer than any I've seen (Although is for floats VS. ints, would still require a typecast from string)
bool is_float(float val){
if(val != floor(val)){
return true;
}
else
return false;
}
or:
auto lambda_isFloat = [](float val) {return (val != floor(val)); };
Hope this helps !
ZMazz

Separator character in string c++

This is the requirement: Read a string and loop it, whenever a new word is encountered insert it into std::list. If the . character has a space, tab, newline or digit on the left and a digit on the right then it is treated as a decimal point and thus part of a word. Otherwise it is treated as a full stop and a word separator.
And this is the result I run from the template program:
foo.bar -> 2 words (foo, bar)
f5.5f -> 1 word
.4.5.6.5 -> 1 word
d.4.5f -> 3 words (d, 4, 5f)
.5.6..6.... -> 2 words (.5.6, 6)
It seems very complex for me in first time dealing with string c++. Im really stuck to implement the code. Could anyone suggest me a hint ? Thanks
I just did some scratch ideas
bool isDecimal(std::string &word) {
bool ok = false;
for (unsigned int i = 0; i < word.size(); i++) {
if (word[i] == '.') {
if ((std::isdigit(word[(int)i - 1]) ||
std::isspace(word[(int)i -1]) ||
(int)(i - 1) == (int)(word.size() - 1)) && std::isdigit(word[i + 1]))
ok = true;
else {
ok = false;
break;
}
}
}
return ok;
}
void checkDecimal(std::string &word) {
if (!isDecimal(word)) {
std::string temp = word;
word.clear();
for (unsigned int i = 0; i < temp.size(); i++) {
if (temp[i] != '.')
word += temp[i];
else {
if (std::isalpha(temp[i + 1]) || std::isdigit(temp[i + 1]))
word += ' ';
}
}
}
trimLeft(word);
}
I think you may be approaching the problem from the wrong direction. It seems much easier if you turn the condition upside down. To give you some pointers in a pseudocode skeleton:
bool isSeparator(const std::string& string, size_t position)
{
// Determine whether the character at <position> in <string> is a word separator
}
void tokenizeString(const std::string& string, std::list& wordList)
{
// for every character in string
// if(isSeparator(character) || end of string)
// list.push_back(substring from last separator to this one)
}
I suggest to implement it using flex and bison with c++ implementation

Loop quitting for no reason

I have a question regarding C++. This is my current function:
string clarifyWord(string str) {
//Remove all spaces before string
unsigned long i = 0;
int currentASCII = 0;
while (i < str.length()) {
currentASCII = int(str[i]);
if (currentASCII == 32) {
str.erase(i);
i++;
continue;
} else {
break;
}
}
//Remove all spaces after string
i = str.length();
while (i > -1) {
currentASCII = int(str[i]);
if (currentASCII == 32) {
str.erase(i);
i--;
continue;
} else {
break;
}
}
return str;
}
Just to get the basic and obvious things out of the way, I have #include <string> and using namespace std; so I do have access to the string functions.
The thing is though that the loop is quitting and sometimes skipping the second loop. I am passing in the str to be " Cheese " and it should remove all the spaces before the string and after the string.
In the main function, I am also assigning a variable to clarifyWord(str) where str is above. It doesn't seem to print that out either using cout << str;.
Is there something I am missing with printing out strings or looping with strings? Also ASCII code 32 is Space.
Okay so the erase function you are calling looks like this:
string& erase ( size_t pos = 0, size_t n = npos );
The n parameter is the number of items to delete. The npos means, delete everything up until the end of the string, so set the second parameter to 1.
str.erase(i,1)
[EDIT]
You could change the first loop to this:
while (str.length() > 0 && str[0] == ' ')
{
str.erase(0,1);
}
and the second loop to this:
while (str.length() > 0 && str[str.length() - 1] == ' ')
{
str.erase(str.length() - 1, 1);
}
In your second loop, you can't initialize i to str.length().
str[str.length()] is going to be after the end of your string, and so is unlikely to be a space (thus triggering the break out of the second loop).
You're using erase (modifying the string) while you're in a loop checking its size. This is a dangerous way of processing the string. As you return a new string, I would recommend you first to search for the first occurrence in the string of the non-space character, and then the last one, and then returning a substring. Something along the lines of (not tested):
size_t init = str.find_first_not_of(' ');
if (init == std::string::npos)
return "";
size_t fini = std.find_last_not_of(' ');
return str.substr(init, fini - init + 1);
You see, no loops, erases, etc.
unsigned long i ... while (i > -1) Well, that's not right, is it? How would you expect that to work? The compiler will in fact convert both operands to the same type: while (i > static_cast<unsigned long>(-1)). And that's just another way to write ULONG-MAX, i.e. while (i > ULONG_MAX). In other words, while(false).
You're using erase incorrectly. It'll erase from pos to npos.
i.e. string& erase ( size_t pos = 0, size_t n = npos );
See: http://www.cplusplus.com/reference/string/string/erase/
A better way to do this is to note the position of the first non space and where the spaces occur at the end of the string. Then use either substr or erase twice.
You also don't need to go to the trouble of doing this:
currentASCII = int(str[i]);
if (currentASCII == 32) {
Instead do this:
if (str[i] == ' ') {
Which I think you'll agree is a lot easier to read.
So, you can shorten it somewhat with something like: (not tested but it shouldn't be far
off)
string clarifyWord(string str) {
int start = 0, end = str.length();
while (str[start++] == ' ');
while (str[end--] == ' ');
return str.substr(start, end);
}

C++ Remove new line from multiline string

Whats the most efficient way of removing a 'newline' from a std::string?
#include <algorithm>
#include <string>
std::string str;
str.erase(std::remove(str.begin(), str.end(), '\n'), str.cend());
The behavior of std::remove may not quite be what you'd expect.
A call to remove is typically followed by a call to a container's erase method, which erases the unspecified values and reduces the physical size of the container to match its new logical size.
See an explanation of it here.
If the newline is expected to be at the end of the string, then:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
If the string can contain many newlines anywhere in the string:
std::string::size_type i = 0;
while (i < s.length()) {
i = s.find('\n', i);
if (i == std::string:npos) {
break;
}
s.erase(i);
}
You should use the erase-remove idiom, looking for '\n'. This will work for any standard sequence container; not just string.
Here is one for DOS or Unix new line:
void chomp( string &s)
{
int pos;
if((pos=s.find('\n')) != string::npos)
s.erase(pos);
}
Slight modification on edW's solution to remove all exisiting endline chars
void chomp(string &s){
size_t pos;
while (((pos=s.find('\n')) != string::npos))
s.erase(pos,1);
}
Note that size_t is typed for pos, it is because npos is defined differently for different types, for example, -1 (unsigned int) and -1 (unsigned float) are not the same, due to the fact the max size of each type are different. Therefore, comparing int to size_t might return false even if their values are both -1.
s.erase(std::remove(s.begin(), s.end(), '\n'), s.end());
The code removes all newlines from the string str.
O(N) implementation best served without comments on SO and with comments in production.
unsigned shift=0;
for (unsigned i=0; i<length(str); ++i){
if (str[i] == '\n') {
++shift;
}else{
str[i-shift] = str[i];
}
}
str.resize(str.length() - shift);
std::string some_str = SOME_VAL;
if ( some_str.size() > 0 && some_str[some_str.length()-1] == '\n' )
some_str.resize( some_str.length()-1 );
or (removes several newlines at the end)
some_str.resize( some_str.find_last_not_of(L"\n")+1 );
Another way to do it in the for loop
void rm_nl(string &s) {
for (int p = s.find("\n"); p != (int) string::npos; p = s.find("\n"))
s.erase(p,1);
}
Usage:
string data = "\naaa\nbbb\nccc\nddd\n";
rm_nl(data);
cout << data; // data = aaabbbcccddd
All these answers seem a bit heavy to me.
If you just flat out remove the '\n' and move everything else back a spot, you are liable to have some characters slammed together in a weird-looking way. So why not just do the simple (and most efficient) thing: Replace all '\n's with spaces?
for (int i = 0; i < str.length();i++) {
if (str[i] == '\n') {
str[i] = ' ';
}
}
There may be ways to improve the speed of this at the edges, but it will be way quicker than moving whole chunks of the string around in memory.
If its anywhere in the string than you can't do better than O(n).
And the only way is to search for '\n' in the string and erase it.
for(int i=0;i<s.length();i++) if(s[i]=='\n') s.erase(s.begin()+i);
For more newlines than:
int n=0;
for(int i=0;i<s.length();i++){
if(s[i]=='\n'){
n++;//we increase the number of newlines we have found so far
}else{
s[i-n]=s[i];
}
}
s.resize(s.length()-n);//to delete only once the last n elements witch are now newlines
It erases all the newlines once.
About answer 3 removing only the last \n off string code :
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
Will the if condition not fail if the string is really empty ?
Is it not better to do :
if (!s.empty())
{
if (s[s.length()-1] == '\n')
s.erase(s.length()-1);
}
To extend #Greg Hewgill's answer for C++11:
If you just need to delete a newline at the very end of the string:
This in C++98:
if (!s.empty() && s[s.length()-1] == '\n') {
s.erase(s.length()-1);
}
...can now be done like this in C++11:
if (!s.empty() && s.back() == '\n') {
s.pop_back();
}
Optionally, wrap it up in a function. Note that I pass it by ptr here simply so that when you take its address as you pass it to the function, it reminds you that the string will be modified in place inside the function.
void remove_trailing_newline(std::string* str)
{
if (str->empty())
{
return;
}
if (str->back() == '\n')
{
str->pop_back();
}
}
// usage
std::string str = "some string\n";
remove_trailing_newline(&str);
Whats the most efficient way of removing a 'newline' from a std::string?
As far as the most efficient way goes--that I'd have to speed test/profile and see. I'll see if I can get back to you on that and run some speed tests between the top two answers here, and a C-style way like I did here: Removing elements from array in C. I'll use my nanos() timestamp function for speed testing.
Other References:
See these "new" C++11 functions in this reference wiki here: https://en.cppreference.com/w/cpp/string/basic_string
https://en.cppreference.com/w/cpp/string/basic_string/empty
https://en.cppreference.com/w/cpp/string/basic_string/back
https://en.cppreference.com/w/cpp/string/basic_string/pop_back

C++ IsFloat function

Does anybody know of a convenient means of determining if a string value "qualifies" as a floating-point number?
bool IsFloat( string MyString )
{
... etc ...
return ... // true if float; false otherwise
}
If you can't use a Boost library function, you can write your own isFloat function like this.
#include <string>
#include <sstream>
bool isFloat( string myString ) {
std::istringstream iss(myString);
float f;
iss >> noskipws >> f; // noskipws considers leading whitespace invalid
// Check the entire string was consumed and if either failbit or badbit is set
return iss.eof() && !iss.fail();
}
You may like Boost's lexical_cast (see http://www.boost.org/doc/libs/1_37_0/libs/conversion/lexical_cast.htm).
bool isFloat(const std::string &someString)
{
using boost::lexical_cast;
using boost::bad_lexical_cast;
try
{
boost::lexical_cast<float>(someString);
}
catch (bad_lexical_cast &)
{
return false;
}
return true;
}
You can use istream to avoid needing Boost, but frankly, Boost is just too good to leave out.
Inspired by this answer I modified the function to check if a string is a floating point number. It won't require boost & doesn't relies on stringstreams failbit - it's just plain parsing.
static bool isFloatNumber(const std::string& string){
std::string::const_iterator it = string.begin();
bool decimalPoint = false;
int minSize = 0;
if(string.size()>0 && (string[0] == '-' || string[0] == '+')){
it++;
minSize++;
}
while(it != string.end()){
if(*it == '.'){
if(!decimalPoint) decimalPoint = true;
else break;
}else if(!std::isdigit(*it) && ((*it!='f') || it+1 != string.end() || !decimalPoint)){
break;
}
++it;
}
return string.size()>minSize && it == string.end();
}
I.e.
1
2.
3.10000
4.2f
-5.3f
+6.2f
is recognized by this function correctly as float.
1.0.0
2f
2.0f1
Are examples for not-valid floats. If you don't want to recognize floating point numbers in the format X.XXf, just remove the condition:
&& ((*it!='f') || it+1 != string.end() || !decimalPoint)
from line 9.
And if you don't want to recognize numbers without '.' as float (i.e. not '1', only '1.', '1.0', '1.0f'...) then you can change the last line to:
return string.size()>minSize && it == string.end() && decimalPoint;
However: There are plenty good reasons to use either boost's lexical_cast or the solution using stringstreams rather than this 'ugly function'. But It gives me more control over what kind of formats exactly I want to recognize as floating point numbers (i.e. maximum digits after decimal point...).
I recently wrote a function to check whether a string is a number or not. This number can be an Integer or Float.
You can twist my code and add some unit tests.
bool isNumber(string s)
{
std::size_t char_pos(0);
// skip the whilespaces
char_pos = s.find_first_not_of(' ');
if (char_pos == s.size()) return false;
// check the significand
if (s[char_pos] == '+' || s[char_pos] == '-') ++char_pos; // skip the sign if exist
int n_nm, n_pt;
for (n_nm = 0, n_pt = 0; std::isdigit(s[char_pos]) || s[char_pos] == '.'; ++char_pos) {
s[char_pos] == '.' ? ++n_pt : ++n_nm;
}
if (n_pt>1 || n_nm<1) // no more than one point, at least one digit
return false;
// skip the trailing whitespaces
while (s[char_pos] == ' ') {
++ char_pos;
}
return char_pos == s.size(); // must reach the ending 0 of the string
}
void UnitTest() {
double num = std::stod("825FB7FC8CAF4342");
string num_str = std::to_string(num);
// Not number
assert(!isNumber("1a23"));
assert(!isNumber("3.7.1"));
assert(!isNumber("825FB7FC8CAF4342"));
assert(!isNumber(" + 23.24"));
assert(!isNumber(" - 23.24"));
// Is number
assert(isNumber("123"));
assert(isNumber("3.7"));
assert(isNumber("+23.7"));
assert(isNumber(" -423.789"));
assert(isNumber(" -423.789 "));
}
Quick and dirty solution using std::stof:
bool isFloat(const std::string& s) {
try {
std::stof(s);
return true;
} catch(...) {
return false;
}
}
I'd imagine you'd want to run a regex match on the input string. I'd think it may be fairly complicated to test all the edge cases.
This site has some good info on it. If you just want to skip to the end it says:
^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$
Which basically makes sense if you understand regex syntax.
[EDIT: Fixed to forbid initial whitespace and trailing nonsense.]
#include <sstream>
bool isFloat(string s) {
istringstream iss(s);
float dummy;
iss >> noskipws >> dummy;
return iss && iss.eof(); // Result converted to bool
}
You could easily turn this into a function templated on a type T instead of float. This is essentially what Boost's lexical_cast does.
I always liked strtof since it lets you specify an end pointer.
bool isFloat(const std::string& str)
{
char* ptr;
strtof(str.c_str(), &ptr);
return (*ptr) == '\0';
}
This works because the end pointer points to the character where the parse started to fail, therefore if it points to a nul-terminator, then the whole string was parsed as a float.
I'm surprised no one mentioned this method in the 10 years this question has been around, I suppose because it is more of a C-Style way of doing it. However, it is still perfectly valid in C++, and more elegant than any stream solutions. Also, it works with "+inf" "-inf" and so on, and ignores leading whitespace.
EDIT
Don't be caught out by empty strings, otherwise the end pointer will be on the nul-termination (and therefore return true). The above code should be:
bool isFloat(const std::string& str)
{
if (str.empty())
return false;
char* ptr;
strtof(str.c_str(), &ptr);
return (*ptr) == '\0';
}
With C++17:
bool isNumeric(std::string_view s)
{
double val;
auto [p, ec] = std::from_chars(s.data(), s.data() + s.size(), val);
return ec == std::errc() && p == s.data() + s.size();
}
Both checks on return are necessary. The first checks that there are no overflow or other errors. The second checks that the entire string was read.
You can use the methods described in How can I convert string to double in C++?, and instead of throwing a conversion_error, return false (indicating the string does not represent a float), and true otherwise.
The main issue with other responses is performance
Often you don't need every corner case, for example maybe nan and -/+ inf, are not as important to cover as having speed. Maybe you don't need to handle 1.0E+03 notation. You just want a fast way to parse strings to numbers.
Here is a simple, pure std::string way, that's not very fast:
size_t npos = word.find_first_not_of ( ".+-0123456789" );
if ( npos == std::string::npos ) {
val = atof ( word.c_str() );
}
This is slow because it is O(k*13), checking each char against 0 thur 9
Here is a faster way:
bool isNum = true;
int st = 0;
while (word.at(st)==32) st++; // leading spaces
ch = word.at(st);
if (ch == 43 || ch==45 ) st++; // check +, -
for (int n = st; n < word.length(); n++) {
char ch = word.at(n);
if ( ch < 48 || ch > 57 || ch != 46 ) {
isNum = false;
break; // not a num, early terminate
}
}
This has the benefit of terminating early if any non-numerical character is found, and it checks by range rather than every number digit (0-9). So the average compares is 3x per char, O(k*3), with early termination.
Notice this technique is very similar to the actual one used in the stdlib 'atof' function:
http://www.beedub.com/Sprite093/src/lib/c/stdlib/atof.c
You could use atof and then have special handling for 0.0, but I don't think that counts as a particularly good solution.
This is a common question on SO. Look at this question for suggestions (that question discusses string->int, but the approaches are the same).
Note: to know if the string can be converted, you basically have to do the conversion to check for things like over/underflow.
What you could do is use an istringstream and return true/false based on the result of the stream operation. Something like this (warning - I haven't even compiled the code, it's a guideline only):
float potential_float_value;
std::istringstream could_be_a_float(MyString)
could_be_a_float >> potential_float_value;
return could_be_a_float.fail() ? false : true;
it depends on the level of trust, you need and where the input data comes from.
If the data comes from a user, you have to be more careful, as compared to imported table data, where you already know that all items are either integers or floats and only thats what you need to differentiate.
For example, one of the fastest versions, would simply check for the presence of "." and "eE" in it. But then, you may want to look if the rest is being all digits. Skip whitespace at the beginning - but not in the middle, check for a single "." "eE" etc.
Thus, the q&d fast hack will probably lead to a more sophisticated regEx-like (either call it or scan it yourself) approach. But then, how do you know, that the result - although looking like a float - can really be represented in your machine (i.e. try 1.2345678901234567890e1234567890). Of course, you can make a regEx with "up-to-N" digits in the mantissa/exponent, but thats machine/OS/compiler or whatever specific, sometimes.
So, in the end, to be sure, you probably have to call for the underlying system's conversion and see what you get (exception, infinity or NAN).
I would be tempted to ignore leading whitespaces as that is what the atof function does also:
The function first discards as many
whitespace characters as necessary
until the first non-whitespace
character is found. Then, starting
from this character, takes as many
characters as possible that are valid
following a syntax resembling that of
floating point literals, and
interprets them as a numerical value.
The rest of the string after the last
valid character is ignored and has no
effect on the behavior of this
function.
So to match this we would:
bool isFloat(string s)
{
istringstream iss(s);
float dummy;
iss >> skipws >> dummy;
return (iss && iss.eof() ); // Result converted to bool
}
int isFloat(char *s){
if(*s == '-' || *s == '+'){
if(!isdigit(*++s)) return 0;
}
if(!isdigit(*s)){return 0;}
while(isdigit(*s)) s++;
if(*s == '.'){
if(!isdigit(*++s)) return 0;
}
while(isdigit(*s)) s++;
if(*s == 'e' || *s == 'E'){
s++;
if(*s == '+' || *s == '-'){
s++;
if(!isdigit(*s)) return 0;
}else if(!isdigit(*s)){
return 0;
}
}
while(isdigit(*s)) s++;
if(*s == '\0') return 1;
return 0;
}
I was looking for something similar, found a much simpler answer than any I've seen (Although is for floats VS. ints, would still require a typecast from string)
bool is_float(float val){
if(val != floor(val)){
return true;
}
else
return false;
}
or:
auto lambda_isFloat = [](float val) {return (val != floor(val)); };
Hope this helps !
ZMazz