I need to:
1) Find what is the maximum unsigned int value on my current system. I didn't find it on limits.h. Is it safe to write unsigned int maxUnsInt = 0 - 1;? I also tried unsigned int maxUnsInt = MAX_INT * 2 + 1 that returns the correct value but the compiler shows a warning about int overflow operation.
2) Once found, check if a C++ string (that I know it is composed only by digits) exceeded the maximum unsigned int value on my system.
My final objective is to convert the string to a unsigned int using atoi if and only if it is a valid unsigned int. I would prefer to use only the standard library.
There should be a #define UINT_MAX in <limits.h>; I'd be
very surprised if there wasn't. Otherwise, it's guaranteed
that:
unsigned int u = -1;
will result in the maximum value. In C++, you can also use
std::numeric_limits<unsigned int>::max(), but until C++11,
that wasn't an integral constant expression (which may or may
not be a problem).
unsigned int u = 2 * MAX_INT + 1;
is not guaranteed to be anything (on at least one system,
MAX_INT == UMAX_INT).
With regards to checking a string, the simplest solution would
be to use strtoul, then verify errno and the return value:
bool
isLegalUInt( std::string const& input )
{
char const* end;
errno = 0;
unsigned long v = strtoul( input.c_str(), &end, 10 );
return errno == 0 && *end == '\0' && end != input.c_str() && v <= UINT_MAX;
}
If you're using C++11, you could also use std::stoul, which
throws an std::out_of_range exception in case of overflow.
numeric_limits has limits for various numeric types:
unsigned int maxUnsInt = std::numeric_limits<unsigned int>::max();
stringstream can read a string into any type that supports operator>> and tell you whether it failed:
std::stringstream ss("1234567890123456789012345678901234567890");
unsigned int value;
ss >> value;
bool successful = !ss.fail();
According to this you do not need to calculate it, just use appropriate constant, which it this case should be UINT_MAX.
Few notes.
This seems more of a c way in contrast to c++ but since you say you want to use atol I stick with it. c++ would be using numeric_limits as Joachim suggested. However the c++ standard also defines the c-like macros/definitions, so it should be safe to use.
Also if you want it to be c++-way, it would probably be preferred to use stringstream (which is a part of standard c++ library) for conversion.
Lastly I deliberately don't post explicit code solution, 'cause it looks like homework, and you should be good to go from here now.
Related
I'm just wondering should I use std::size_t for loops and stuff instead of int?
For instance:
#include <cstdint>
int main()
{
for (std::size_t i = 0; i < 10; ++i) {
// std::size_t OK here? Or should I use, say, unsigned int instead?
}
}
In general, what is the best practice regarding when to use std::size_t?
A good rule of thumb is for anything that you need to compare in the loop condition against something that is naturally a std::size_t itself.
std::size_t is the type of any sizeof expression and as is guaranteed to be able to express the maximum size of any object (including any array) in C++. By extension it is also guaranteed to be big enough for any array index so it is a natural type for a loop by index over an array.
If you are just counting up to a number then it may be more natural to use either the type of the variable that holds that number or an int or unsigned int (if large enough) as these should be a natural size for the machine.
size_t is the result type of the sizeof operator.
Use size_t for variables that model size or index in an array. size_t conveys semantics: you immediately know it represents a size in bytes or an index, rather than just another integer.
Also, using size_t to represent a size in bytes helps making the code portable.
The size_t type is meant to specify the size of something so it's natural to use it, for example, getting the length of a string and then processing each character:
for (size_t i = 0, max = strlen (str); i < max; i++)
doSomethingWith (str[i]);
You do have to watch out for boundary conditions of course, since it's an unsigned type. The boundary at the top end is not usually that important since the maximum is usually large (though it is possible to get there). Most people just use an int for that sort of thing because they rarely have structures or arrays that get big enough to exceed the capacity of that int.
But watch out for things like:
for (size_t i = strlen (str) - 1; i >= 0; i--)
which will cause an infinite loop due to the wrapping behaviour of unsigned values (although I've seen compilers warn against this). This can also be alleviated by the (slightly harder to understand but at least immune to wrapping problems):
for (size_t i = strlen (str); i-- > 0; )
By shifting the decrement into a post-check side-effect of the continuation condition, this does the check for continuation on the value before decrement, but still uses the decremented value inside the loop (which is why the loop runs from len .. 1 rather than len-1 .. 0).
By definition, size_t is the result of the sizeof operator. size_t was created to refer to sizes.
The number of times you do something (10, in your example) is not about sizes, so why use size_t? int, or unsigned int, should be ok.
Of course it is also relevant what you do with i inside the loop. If you pass it to a function which takes an unsigned int, for example, pick unsigned int.
In any case, I recommend to avoid implicit type conversions. Make all type conversions explicit.
short answer:
Almost never. Use signed version ptrdiff_t or non-standard ssize_t. Use function std::ssize instead of std::size.
long answer:
Whenever you need to have a vector of char bigger that 2gb on a 32 bit system. In every other use case, using a signed type is much safer than using an unsigned type.
example:
std::vector<A> data;
[...]
// calculate the index that should be used;
size_t i = calc_index(param1, param2);
// doing calculations close to the underflow of an integer is already dangerous
// do some bounds checking
if( i - 1 < 0 ) {
// always false, because 0-1 on unsigned creates an underflow
return LEFT_BORDER;
} else if( i >= data.size() - 1 ) {
// if i already had an underflow, this becomes true
return RIGHT_BORDER;
}
// now you have a bug that is very hard to track, because you never
// get an exception or anything anymore, to detect that you actually
// return the false border case.
return calc_something(data[i-1], data[i], data[i+1]);
The signed equivalent of size_t is ptrdiff_t, not int. But using int is still much better in most cases than size_t. ptrdiff_t is long on 32 and 64 bit systems.
This means that you always have to convert to and from size_t whenever you interact with a std::containers, which not very beautiful. But on a going native conference the authors of c++ mentioned that designing std::vector with an unsigned size_t was a mistake.
If your compiler gives you warnings on implicit conversions from ptrdiff_t to size_t, you can make it explicit with constructor syntax:
calc_something(data[size_t(i-1)], data[size_t(i)], data[size_t(i+1)]);
if just want to iterate a collection, without bounds cheking, use range based for:
for(const auto& d : data) {
[...]
}
here some words from Bjarne Stroustrup (C++ author) at going native
For some people this signed/unsigned design error in the STL is reason enough, to not use the std::vector, but instead an own implementation.
size_t is a very readable way to specify the size dimension of an item - length of a string, amount of bytes a pointer takes, etc.
It's also portable across platforms - you'll find that 64bit and 32bit both behave nicely with system functions and size_t - something that unsigned int might not do (e.g. when should you use unsigned long
Use std::size_t for indexing/counting C-style arrays.
For STL containers, you'll have (for example) vector<int>::size_type, which should be used for indexing and counting vector elements.
In practice, they are usually both unsigned ints, but it isn't guaranteed, especially when using custom allocators.
Soon most computers will be 64-bit architectures with 64-bit OS:es running programs operating on containers of billions of elements. Then you must use size_t instead of int as loop index, otherwise your index will wrap around at the 2^32:th element, on both 32- and 64-bit systems.
Prepare for the future!
size_t is returned by various libraries to indicate that the size of that container is non-zero. You use it when you get once back :0
However, in the your example above looping on a size_t is a potential bug. Consider the following:
for (size_t i = thing.size(); i >= 0; --i) {
// this will never terminate because size_t is a typedef for
// unsigned int which can not be negative by definition
// therefore i will always be >= 0
printf("the never ending story. la la la la");
}
the use of unsigned integers has the potential to create these types of subtle issues. Therefore imho I prefer to use size_t only when I interact with containers/types that require it.
When using size_t be careful with the following expression
size_t i = containner.find("mytoken");
size_t x = 99;
if (i-x>-1 && i+x < containner.size()) {
cout << containner[i-x] << " " << containner[i+x] << endl;
}
You will get false in the if expression regardless of what value you have for x.
It took me several days to realize this (the code is so simple that I did not do unit test), although it only take a few minutes to figure the source of the problem. Not sure it is better to do a cast or use zero.
if ((int)(i-x) > -1 or (i-x) >= 0)
Both ways should work. Here is my test run
size_t i = 5;
cerr << "i-7=" << i-7 << " (int)(i-7)=" << (int)(i-7) << endl;
The output: i-7=18446744073709551614 (int)(i-7)=-2
I would like other's comments.
It is often better not to use size_t in a loop. For example,
vector<int> a = {1,2,3,4};
for (size_t i=0; i<a.size(); i++) {
std::cout << a[i] << std::endl;
}
size_t n = a.size();
for (size_t i=n-1; i>=0; i--) {
std::cout << a[i] << std::endl;
}
The first loop is ok. But for the second loop:
When i=0, the result of i-- will be ULLONG_MAX (assuming size_t = unsigned long long), which is not what you want in a loop.
Moreover, if a is empty then n=0 and n-1=ULLONG_MAX which is not good either.
size_t is an unsigned type that can hold maximum integer value for your architecture, so it is protected from integer overflows due to sign (signed int 0x7FFFFFFF incremented by 1 will give you -1) or short size (unsigned short int 0xFFFF incremented by 1 will give you 0).
It is mainly used in array indexing/loops/address arithmetic and so on. Functions like memset() and alike accept size_t only, because theoretically you may have a block of memory of size 2^32-1 (on 32bit platform).
For such simple loops don't bother and use just int.
I have been struggling myself with understanding what and when to use it. But size_t is just an unsigned integral data type which is defined in various header files such as <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <wchar.h> etc.
It is used to represent the size of objects in bytes hence it's used as the return type by the sizeof operator. The maximum permissible size is dependent on the compiler; if the compiler is 32 bit then it is simply a typedef (alias) for unsigned int but if the compiler is 64 bit then it would be a typedef for unsigned long long. The size_t data type is never negative(excluding ssize_t)
Therefore many C library functions like malloc, memcpy and strlen declare their arguments and return type as size_t.
/ Declaration of various standard library functions.
// Here argument of 'n' refers to maximum blocks that can be
// allocated which is guaranteed to be non-negative.
void *malloc(size_t n);
// While copying 'n' bytes from 's2' to 's1'
// n must be non-negative integer.
void *memcpy(void *s1, void const *s2, size_t n);
// the size of any string or `std::vector<char> st;` will always be at least 0.
size_t strlen(char const *s);
size_t or any unsigned type might be seen used as loop variable as loop variables are typically greater than or equal to 0.
size_t is an unsigned integral type, that can represent the largest integer on you system.
Only use it if you need very large arrays,matrices etc.
Some functions return an size_t and your compiler will warn you if you try to do comparisons.
Avoid that by using a the appropriate signed/unsigned datatype or simply typecast for a fast hack.
size_t is unsigned int. so whenever you want unsigned int you can use it.
I use it when i want to specify size of the array , counter ect...
void * operator new (size_t size); is a good use of it.
I am going trough the book "Accelerated C++" by Andrew Koenig and Barbara E. Moo and I have some questions about the main example in chap 2. The code can be summarized as below, and is compiling without warning/error with g++:
#include <string>
using std::string;
int main()
{
const string greeting = "Hello, world!";
// OK
const int pad = 1;
// KO
// int pad = 1;
// OK
// unsigned int pad = 1;
const string::size_type cols = greeting.size() + 2 + pad * 2;
string::size_type c = 0;
if (c == 1 + pad)
{;}
return 0;
}
However, if I replace const int pad = 1; by int pad = 1;, the g++ compiler will return a warning:
warning: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
if (c == 1 + pad)
If I replace const int pad = 1; by unsigned int pad = 1;, the g++ compiler will not return a warning.
I understand why g++ return the warning, but I am not sure about the three below points:
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
From the compiler point of view:
It is unsafe to compare signed and unsinged variables (non-constants).
It is safe to compare 2 unsinged variables of different sizes.
It is safe to compare an unsigned variable with a singed constant if the compiler can check that constant to be in the allowed range for the type of the signed variable (e.g. for 16-bit signed integer it is safe to use a constant in range [0..32767]).
So the answers to your questions:
Yes, it is safe to compare unsigned int and std::string::size_type.
There is no warning because the compiler can perform the safety check (while compiling :)).
There is no problem to use different unsigned types in comparison. Use unsinged int.
Comparing signed and unsigned values is "dangerous" in the sense that you may not get what you expect when the signed value is negative (it may well behave as a very large unsigned value, and thus a > b gives true when a = -1 and b = 100. (The use of const int works because the compiler knows the value isn't changing and thus can say "well, this value is always 1, so it works fine here")
As long as the value you want to compare fits in unsigned int (on typical machines, a little over 4 billion) is fine.
If you are using std::string with the default allocator (which is likely), then size_type is actually size_t.
[support.types]/6 defines that size_t is
an implementation-defined unsigned integer type that is large enough to contain the size
in bytes of any object.
So it's not technically guaranteed to be a unsigned int, but I believe it is defined this way in most cases.
Now regarding your second question: if you use const int something = 2, the compiler sees that this integer is a) never negative and b) never changes, so it's always safe to compare this variable with size_t. In some cases the compiler may optimize the variable out completely and simply replace all it's occurrences with 2.
I would say that it is better to use size_type everywhere where you are to the size of something, since it is more verbose.
What the compiler warns about is the comparison of unsigned and signed integer types. This is because the signed integer can be negative and the meaning is counter intuitive. This is because the signed is converted to unsigned before comparison, which means the negative number will compare greater than the positive.
Is it safe to use an unsigned int in order to compare with a std::string::size_type? The compiler does not return a warning in that case but I am not sure if it is safe.
Yes, they are both unsigned and then the semantics is what's expected. If their range differs the narrower are converted to a wider type.
Why is the compiler not giving a warning with the original code const int pad = 1. Is the compiler automatically converting the variable pad to an unsigned int?
This is because how the compiler is constructed. The compiler parses and to some extent optimizes the code before warnings are issued. The important point is that at the point this warning is being considered the compiler nows that the signed integer is 1 and then it's safe to compare with a unsigned integer.
I could also replace const int pad = 1; by string::size_type pad = 1;, but the meaning of the variable pad is not really linked to a string size in my opinion. Still, would this be the best approach in that case to avoid having different types in the comparison?
If you don't want it to be constant the best solution would probably be to make it at least an unsigned integer type. However you should be aware that there is no guaranteed relation between normal integer types and sizes, for example unsigned int may be narrower, wider or equal to size_t and size_type (the latter may also differ).
in c++, is it okay to compare an int to a char because of implicit type casting? Or am I misunderstanding the concept?
For example, can I do
int x = 68;
char y;
std::cin >> y;
//Assuming that the user inputs 'Z';
if(x < y)
{
cout << "Your input is larger than x";
}
Or do we need to first convert it to an int?
so
if(x < static_cast<int>(y))
{
cout << "Your input is larger than x";
}
The problem with both versions is that you cannot be sure about the value that results from negative/large values (the values that are negative if char is indeed a signed char). This is implementation defined, because the implementation defines whether char means signed char or unsigned char.
The only way to fix this problem is to cast to the appropriate signed/unsigned char type first:
if(x < (signed char)y)
or
if(x < (unsigned char)y)
Omitting this cast will result in implementation defined behavior.
Personally, I generally prefer use of uint8_t and int8_t when using chars as numbers, precisely because of this issue.
This still assumes that the value of the (un)signed char is within the range of possible int values on your platform. This may not be the case if sizeof(char) == sizeof(int) == 1 (possible only if a char is 16 bit!), and you are comparing signed and unsigned values.
To avoid this problem, ensure that you use either
signed x = ...;
if(x < (signed char)y)
or
unsigned x = ...;
if(x < (unsigned char)y)
Your compiler will hopefully complain with warning about mixed signed comparison if you fail to do so.
Your code will compile and work, for some definition of work.
Still you might get unexpected results, because y is a char, which means its signedness is implementation defined. That combined with unknown size of int will lead to much joy.
Also, please write the char literals you want, don't look at the ASCII table yourself. Any reader (you in 5 minutes) will be thankful.
Last point: Avoid gratuituous cast, they don't make anything better and may hide problems your compiler would normally warn about.
Yes you can compare an int to some char, like you can compare an int to some short, but it might be considered bad style. I would code
if (x < (int)y)
or like you did
if (x < static_cast<int>(y))
which I find a bit too verbose for that case....
BTW, if you intend to use bytes not as char consider also the int8_t type (etc...) from <cstdint>
Don't forget that on some systems, char are signed by default, on others they are unsigned (and you could explicit unsigned char vs signed char).
The code you suggest will compile, but I strongly recommend the static_cast version. Using static_cast you will help the reader understand what do you compare to an integer.
C++11 added some new string conversion functions:
http://en.cppreference.com/w/cpp/string/basic_string/stoul
It includes stoi (string to int), stol (string to long), stoll (string to long long), stoul (string to unsigned long), stoull (string to unsigned long long). Notable in its absence is a stou (string to unsigned) function. Is there some reason it is not needed but all of the others are?
related: No "sto{short, unsigned short}" functions in C++11?
The most pat answer would be that the C library has no corresponding “strtou”, and the C++11 string functions are all just thinly veiled wrappers around the C library functions: The std::sto* functions mirror strto*, and the std::to_string functions use sprintf.
Edit: As KennyTM points out, both stoi and stol use strtol as the underlying conversion function, but it is still mysterious why while there exists stoul that uses strtoul, there is no corresponding stou.
I've no idea why stoi exists but not stou, but the only difference between stoul and a hypothetical stou would be a check that the result is in the range of unsigned:
unsigned stou(std::string const & str, size_t * idx = 0, int base = 10) {
unsigned long result = std::stoul(str, idx, base);
if (result > std::numeric_limits<unsigned>::max()) {
throw std::out_of_range("stou");
}
return result;
}
(Likewise, stoi is also similar to stol, just with a different range check; but since it already exists, there's no need to worry about exactly how to implement it.)
unsigned long ulval = std::stoul(buf);
unsigned long mask = ~0xffffffffl;
unsigned int uival;
if( (ulval & mask) == 0 )
uival = (unsigned int)ulval;
else {
...range error...
}
Using masks to do this with the expected value size in bits expressed in the mask, will make this work for 64-bit longs vs 32-bit ints, but also for 32-bit longs vs 32-bit ints.
In the case of 64-bit longs, ~0xffffffffl will become 0xffffffff00000000 and will thus see if any of the top 32 bits are set. With 32-bit longs, it ~0xffffffffl becomes 0x00000000 and the mask check will always be zero.
C++11 added some new string conversion functions:
http://en.cppreference.com/w/cpp/string/basic_string/stoul
It includes stoi (string to int), stol (string to long), stoll (string to long long), stoul (string to unsigned long), stoull (string to unsigned long long). Notable in its absence is a stou (string to unsigned) function. Is there some reason it is not needed but all of the others are?
related: No "sto{short, unsigned short}" functions in C++11?
The most pat answer would be that the C library has no corresponding “strtou”, and the C++11 string functions are all just thinly veiled wrappers around the C library functions: The std::sto* functions mirror strto*, and the std::to_string functions use sprintf.
Edit: As KennyTM points out, both stoi and stol use strtol as the underlying conversion function, but it is still mysterious why while there exists stoul that uses strtoul, there is no corresponding stou.
I've no idea why stoi exists but not stou, but the only difference between stoul and a hypothetical stou would be a check that the result is in the range of unsigned:
unsigned stou(std::string const & str, size_t * idx = 0, int base = 10) {
unsigned long result = std::stoul(str, idx, base);
if (result > std::numeric_limits<unsigned>::max()) {
throw std::out_of_range("stou");
}
return result;
}
(Likewise, stoi is also similar to stol, just with a different range check; but since it already exists, there's no need to worry about exactly how to implement it.)
unsigned long ulval = std::stoul(buf);
unsigned long mask = ~0xffffffffl;
unsigned int uival;
if( (ulval & mask) == 0 )
uival = (unsigned int)ulval;
else {
...range error...
}
Using masks to do this with the expected value size in bits expressed in the mask, will make this work for 64-bit longs vs 32-bit ints, but also for 32-bit longs vs 32-bit ints.
In the case of 64-bit longs, ~0xffffffffl will become 0xffffffff00000000 and will thus see if any of the top 32 bits are set. With 32-bit longs, it ~0xffffffffl becomes 0x00000000 and the mask check will always be zero.