I'm just wondering should I use std::size_t for loops and stuff instead of int?
For instance:
#include <cstdint>
int main()
{
for (std::size_t i = 0; i < 10; ++i) {
// std::size_t OK here? Or should I use, say, unsigned int instead?
}
}
In general, what is the best practice regarding when to use std::size_t?
A good rule of thumb is for anything that you need to compare in the loop condition against something that is naturally a std::size_t itself.
std::size_t is the type of any sizeof expression and as is guaranteed to be able to express the maximum size of any object (including any array) in C++. By extension it is also guaranteed to be big enough for any array index so it is a natural type for a loop by index over an array.
If you are just counting up to a number then it may be more natural to use either the type of the variable that holds that number or an int or unsigned int (if large enough) as these should be a natural size for the machine.
size_t is the result type of the sizeof operator.
Use size_t for variables that model size or index in an array. size_t conveys semantics: you immediately know it represents a size in bytes or an index, rather than just another integer.
Also, using size_t to represent a size in bytes helps making the code portable.
The size_t type is meant to specify the size of something so it's natural to use it, for example, getting the length of a string and then processing each character:
for (size_t i = 0, max = strlen (str); i < max; i++)
doSomethingWith (str[i]);
You do have to watch out for boundary conditions of course, since it's an unsigned type. The boundary at the top end is not usually that important since the maximum is usually large (though it is possible to get there). Most people just use an int for that sort of thing because they rarely have structures or arrays that get big enough to exceed the capacity of that int.
But watch out for things like:
for (size_t i = strlen (str) - 1; i >= 0; i--)
which will cause an infinite loop due to the wrapping behaviour of unsigned values (although I've seen compilers warn against this). This can also be alleviated by the (slightly harder to understand but at least immune to wrapping problems):
for (size_t i = strlen (str); i-- > 0; )
By shifting the decrement into a post-check side-effect of the continuation condition, this does the check for continuation on the value before decrement, but still uses the decremented value inside the loop (which is why the loop runs from len .. 1 rather than len-1 .. 0).
By definition, size_t is the result of the sizeof operator. size_t was created to refer to sizes.
The number of times you do something (10, in your example) is not about sizes, so why use size_t? int, or unsigned int, should be ok.
Of course it is also relevant what you do with i inside the loop. If you pass it to a function which takes an unsigned int, for example, pick unsigned int.
In any case, I recommend to avoid implicit type conversions. Make all type conversions explicit.
short answer:
Almost never. Use signed version ptrdiff_t or non-standard ssize_t. Use function std::ssize instead of std::size.
long answer:
Whenever you need to have a vector of char bigger that 2gb on a 32 bit system. In every other use case, using a signed type is much safer than using an unsigned type.
example:
std::vector<A> data;
[...]
// calculate the index that should be used;
size_t i = calc_index(param1, param2);
// doing calculations close to the underflow of an integer is already dangerous
// do some bounds checking
if( i - 1 < 0 ) {
// always false, because 0-1 on unsigned creates an underflow
return LEFT_BORDER;
} else if( i >= data.size() - 1 ) {
// if i already had an underflow, this becomes true
return RIGHT_BORDER;
}
// now you have a bug that is very hard to track, because you never
// get an exception or anything anymore, to detect that you actually
// return the false border case.
return calc_something(data[i-1], data[i], data[i+1]);
The signed equivalent of size_t is ptrdiff_t, not int. But using int is still much better in most cases than size_t. ptrdiff_t is long on 32 and 64 bit systems.
This means that you always have to convert to and from size_t whenever you interact with a std::containers, which not very beautiful. But on a going native conference the authors of c++ mentioned that designing std::vector with an unsigned size_t was a mistake.
If your compiler gives you warnings on implicit conversions from ptrdiff_t to size_t, you can make it explicit with constructor syntax:
calc_something(data[size_t(i-1)], data[size_t(i)], data[size_t(i+1)]);
if just want to iterate a collection, without bounds cheking, use range based for:
for(const auto& d : data) {
[...]
}
here some words from Bjarne Stroustrup (C++ author) at going native
For some people this signed/unsigned design error in the STL is reason enough, to not use the std::vector, but instead an own implementation.
size_t is a very readable way to specify the size dimension of an item - length of a string, amount of bytes a pointer takes, etc.
It's also portable across platforms - you'll find that 64bit and 32bit both behave nicely with system functions and size_t - something that unsigned int might not do (e.g. when should you use unsigned long
Use std::size_t for indexing/counting C-style arrays.
For STL containers, you'll have (for example) vector<int>::size_type, which should be used for indexing and counting vector elements.
In practice, they are usually both unsigned ints, but it isn't guaranteed, especially when using custom allocators.
Soon most computers will be 64-bit architectures with 64-bit OS:es running programs operating on containers of billions of elements. Then you must use size_t instead of int as loop index, otherwise your index will wrap around at the 2^32:th element, on both 32- and 64-bit systems.
Prepare for the future!
size_t is returned by various libraries to indicate that the size of that container is non-zero. You use it when you get once back :0
However, in the your example above looping on a size_t is a potential bug. Consider the following:
for (size_t i = thing.size(); i >= 0; --i) {
// this will never terminate because size_t is a typedef for
// unsigned int which can not be negative by definition
// therefore i will always be >= 0
printf("the never ending story. la la la la");
}
the use of unsigned integers has the potential to create these types of subtle issues. Therefore imho I prefer to use size_t only when I interact with containers/types that require it.
When using size_t be careful with the following expression
size_t i = containner.find("mytoken");
size_t x = 99;
if (i-x>-1 && i+x < containner.size()) {
cout << containner[i-x] << " " << containner[i+x] << endl;
}
You will get false in the if expression regardless of what value you have for x.
It took me several days to realize this (the code is so simple that I did not do unit test), although it only take a few minutes to figure the source of the problem. Not sure it is better to do a cast or use zero.
if ((int)(i-x) > -1 or (i-x) >= 0)
Both ways should work. Here is my test run
size_t i = 5;
cerr << "i-7=" << i-7 << " (int)(i-7)=" << (int)(i-7) << endl;
The output: i-7=18446744073709551614 (int)(i-7)=-2
I would like other's comments.
It is often better not to use size_t in a loop. For example,
vector<int> a = {1,2,3,4};
for (size_t i=0; i<a.size(); i++) {
std::cout << a[i] << std::endl;
}
size_t n = a.size();
for (size_t i=n-1; i>=0; i--) {
std::cout << a[i] << std::endl;
}
The first loop is ok. But for the second loop:
When i=0, the result of i-- will be ULLONG_MAX (assuming size_t = unsigned long long), which is not what you want in a loop.
Moreover, if a is empty then n=0 and n-1=ULLONG_MAX which is not good either.
size_t is an unsigned type that can hold maximum integer value for your architecture, so it is protected from integer overflows due to sign (signed int 0x7FFFFFFF incremented by 1 will give you -1) or short size (unsigned short int 0xFFFF incremented by 1 will give you 0).
It is mainly used in array indexing/loops/address arithmetic and so on. Functions like memset() and alike accept size_t only, because theoretically you may have a block of memory of size 2^32-1 (on 32bit platform).
For such simple loops don't bother and use just int.
I have been struggling myself with understanding what and when to use it. But size_t is just an unsigned integral data type which is defined in various header files such as <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <wchar.h> etc.
It is used to represent the size of objects in bytes hence it's used as the return type by the sizeof operator. The maximum permissible size is dependent on the compiler; if the compiler is 32 bit then it is simply a typedef (alias) for unsigned int but if the compiler is 64 bit then it would be a typedef for unsigned long long. The size_t data type is never negative(excluding ssize_t)
Therefore many C library functions like malloc, memcpy and strlen declare their arguments and return type as size_t.
/ Declaration of various standard library functions.
// Here argument of 'n' refers to maximum blocks that can be
// allocated which is guaranteed to be non-negative.
void *malloc(size_t n);
// While copying 'n' bytes from 's2' to 's1'
// n must be non-negative integer.
void *memcpy(void *s1, void const *s2, size_t n);
// the size of any string or `std::vector<char> st;` will always be at least 0.
size_t strlen(char const *s);
size_t or any unsigned type might be seen used as loop variable as loop variables are typically greater than or equal to 0.
size_t is an unsigned integral type, that can represent the largest integer on you system.
Only use it if you need very large arrays,matrices etc.
Some functions return an size_t and your compiler will warn you if you try to do comparisons.
Avoid that by using a the appropriate signed/unsigned datatype or simply typecast for a fast hack.
size_t is unsigned int. so whenever you want unsigned int you can use it.
I use it when i want to specify size of the array , counter ect...
void * operator new (size_t size); is a good use of it.
Related
I just came across an extremely strange problem. The function I have is simply:
int strStr(string haystack, string needle) {
for(int i=0; i<=(haystack.length()-needle.length()); i++){
cout<<"i "<<i<<endl;
}
return 0;
}
Then if I call strStr("", "a"), although haystack.length()-needle.length()=-1, this will not return 0, you can try it yourself...
This is because .length() (and .size()) return size_t, which is an unsigned int. You think you get a negative number, when in fact it underflows back to the maximum value for size_t (On my machine, this is 18446744073709551615). This means your for loop will loop through all the possible values of size_t, instead of just exiting immediately like you expect.
To get the result you want, you can explicitly convert the sizes to ints, rather than unsigned ints (See aslgs answer), although this may fail for strings with sufficient length (Enough to over/under flow a standard int)
Edit:
Two solutions from the comments below:
(Nir Friedman) Instead of using int as in aslg's answer, include the header and use an int64_t, which will avoid the problem mentioned above.
(rici) Turn your for loop into for(int i = 0;needle.length() + i <= haystack.length();i ++){, which avoid the problem all together by rearranging the equation to avoid the subtraction all together.
(haystack.length()-needle.length())
length returns a size_t, in other words an unsigned int. Given the size of your strings, 0 and 1 respectively, when you calculate the difference it underflows and becomes the maximum possible value for an unsigned int. (Which is approximately 4.2 billions for a storage of 4 bytes, but could be a different value)
i<=(haystack.length()-needle.length())
The indexer i is converted by the compiler into an unsigned int to match the type. So you're gonna have to wait until i is greater than the max possible value for an unsigned int. It's not going to stop.
Solution:
You have to convert the result of each method to int, like so,
i <= ( (int)haystack.length() - (int)needle.length() )
I was wondering why this size_t is used where I can use say int type. Its said that size_t is a return type of sizeof operator. What does it mean? like if I use sizeof(int) and store what its return to an int type variable, then it also works, it's not necessary to store it in a size_t type variable. I just clearly want to know the basic concept of using size_t with a clearly understandable example.Thanks
size_t is guaranteed to be able to represent the largest size possible, int is not. This means size_t is more portable.
For instance, what if int could only store up to 255 but you could allocate arrays of 5000 bytes? Clearly this wouldn't work, however with size_t it will.
The simplest example is pretty dated: on an old 16-bit-int system with 64 k of RAM, the value of an int can be anywhere from -32768 to +32767, but after:
char buf[40960];
the buffer buf occupies 40 kbytes, so sizeof buf is too big to fit in an int, and it needs an unsigned int.
The same thing can happen today if you use 32-bit int but allow programs to access more than 4 GB of RAM at a time, as is the case on what are called "I32LP64" models (32 bit int, 64-bit long and pointer). Here the type size_t will have the same range as unsigned long.
You use size_t mostly for casting pointers into unsigned integers of the same size, to perform calculations on pointers as if they were integers, that would otherwise be prevented at compile time. Such code is intended to compile and build correctly in the context of different pointer sizes, e.g. 32-bit model versus 64-bit.
It is implementation defined but on 64bit systems you will find that size_t is often 64bit while int is still 32bit (unless it's ILP64 or SILP64 model).
depending on what architecture you are on (16-bit, 32-bit or 64-bit) an int could be a different size.
if you want a specific size I use uint16_t or uint32_t .... You can check out this thread for more information
What does the C++ standard state the size of int, long type to be?
size_t is a typedef defined to store object size. It can store the maximum object size that is supported by a target platform. This makes it portable.
For example:
void * memcpy(void * destination, const void * source, size_t num);
memcpy() copies num bytes from source into destination. The maximum number of bytes that can be copied depends on the platform. So, making num as type size_t makes memcpy portable.
Refer https://stackoverflow.com/a/7706240/2820412 for further details.
size_t is a typedef for one of the fundamental unsigned integer types. It could be unsigned int, unsigned long, or unsigned long long depending on the implementation.
Its special property is that it can represent the size of (in bytes) of any object (which includes the largest object possible as well!). That is one of the reasons it is widely used in the standard library for array indexing and loop counting (that also solves the portability issue). Let me illustrate this with a simple example.
Consider a vector of length 2*UINT_MAX, where UINT_MAX denotes the maximum value of unsigned int (which is 4294967295 for my implementation considering 4 bytes for unsigned int).
std::vector vec(2*UINT_MAX,0);
If you would want to fill the vector using a for-loop such as this, it would not work because unsigned int can iterate only upto the point UINT_MAX (beyond which it will start again from 0).
for(unsigned int i = 0; i<2*UINT_MAX; ++i) vec[i] = i;
The solution here is to use size_t since it is guaranteed to represent the size of any object (and therefore our vector vec too!) in bytes. Note that for my implementation size_t is a typedef for unsigned long and therefore its max value = ULONG_MAX = 18446744073709551615 considering 8 bytes.
for(size_t i = 0; i<2*UINT_MAX; ++i) vec[i] = i;
References: https://en.cppreference.com/w/cpp/types/size_t
I am getting following warning always for following type of code.
std::vector v;
for ( int i = 0; i < v.size(); i++) {
}
warning C4267: 'initializing' : conversion from 'size_t' to 'int', possible loss of data
I understand that size() returns size_t, just wanted to know is this safe to ignore this warning or should I make all my loop variable of type size_t
If you might need to hold more than INT_MAX items in your vector, use size_t. In most cases, it doesn't really matter, but I use size_t just to make the warning go away.
Better yet, use iterators:
for( auto it = v.begin(); it != v.end(); ++it )
(If your compiler doesn't support C++11, use std::vector<whatever>::iterator in place of auto)
C++11 also makes choosing the best index type easier (in case you use the index in some computation, not just for subscripting v):
for( decltype(v.size()) i = 0; i < v.size(); ++i )
What is size_t?
size_t corresponds to the integral data type returned by the language operator sizeof and is defined in the header file (among others) as an unsigned integral type.
Is it okay to cast size_t to int?
You could use a cast if you are sure that size is never going to be > than INT_MAX.
If you are trying to write a portable code, it is not safe because,
size_t in 64 bit Unix is 64 bits
size_tin 64 bit Windows is 32 bits
So if you port your code from Unix to WIndows and if above are the enviornments you will lose data.
Suggested Answer
Given the caveat, the suggestion is to make i of unsigned integral type or even better use it as type size_t.
is this safe to ignore this warning or should I make all my loop variable of type size_t
No. You are opening yourself up to a class of integer overflow attacks. If the vector size is greater than MAX_INT (and an attacker has a way of making that happen), your loop will run forever, causing a denial of service possibility.
Technically, std::vector::size returns std::vector::size_type, though.
You should use the right signedness for your loop counter variables. (Really, for most uses, you want unsigned integers rather than signed integers for loops anyway)
The problem is that you're mixing two different data types. On some architectures, size_t is a 32-bit integer, on others it's 64-bit. Your code should properly handle both.
since size() returns a size_t (not int), then that should be the datatype you compare it against.
std::vector v;
for ( size_t i = 0; i < v.size(); i++) {
}
Here's an alternate view from Bjarne Stroustrup:
http://www.stroustrup.com/bs_faq2.html#simple-program
for (int i = 0; i<v.size(); ++i) cout << v[i] << '\n';
Yes, I know that I could declare i to be a vector::size_type
rather than plain int to quiet warnings from some hyper-suspicious
compilers, but in this case,I consider that too pedantic and
distracting.
It's a trade-off. If you're worried that v.size() could go above 2,147,483,647, use size_t. If you're using i inside your loop for more than just looking inside the vector, and you're concerned about subtle signed/unsigned related bugs, use int. In my experience, the latter issue is more prevalent than the former. Your experience may differ.
Also see Why is size_t unsigned?.
Is it better to cast the iterator condition right operand from size_t to int, or iterate potentially past the maximum value of int? Is the answer implementation specific?
int a;
for (size_t i = 0; i < vect.size(); i++)
{
if (some_func((int)i))
{
a = (int)i;
}
}
int a;
for (int i = 0; i < (int)vect.size(); i++)
{
if (some_func(i))
{
a = i;
}
}
I almost always use the first variation, because I find that about 80% of the time, I discover that some_func should probably also take a size_t.
If in fact some_func takes a signed int, you need to be aware of what happens when vect gets bigger than INT_MAX. If the solution isn't obvious in your situation (it usually isn't), you can at least replace some_func((int)i) with some_func(numeric_cast<int>(i)) (see Boost.org for one implementation of numeric_cast). This has the virtue of throwing an exception when vect grows bigger than you've planned on, rather than silently wrapping around to negative values.
I'd just leave it as a size_t, since there's not a good reason not to do so. What do you mean by "or iterate potentially up to the maximum value of type_t"? You're only iterating up to the value of vect.size().
For most compilers, it won't make any difference. On 32 bit systems, it's obvious, but even on 64 bit systems, both variables will probably be stored in a 64-bit register and pushed on the stack as a 64-bit value.
If the compiler stores int values as 32 bit values on the stack, the first function should be more efficient in terms of CPU-cycles.
But the difference is negligible (although the second function "looks" cleaner)
Why is it that in C++ containers, it returns a size_type rather than an int? If we're creating our own structures, should we also be encouraged to use size_type?
In general, size_t should be used whenever you are measuring the size of something. It is really strange that size_t is only required to represent between 0 and SIZE_MAX bytes and SIZE_MAX is only required to be 65,535...
The other interesting constraints from the C++ and C Standards are:
the return type of sizeof() is size_t and it is an unsigned integer
operator new() takes the number of bytes to allocate as a size_t parameter
size_t is defined in <cstddef>
SIZE_MAX is defined in <limits.h> in C99 but not mentioned in C++98?!
size_t is not included in the list of fundamental integer types so I have always assumed that size_t is a type alias for one of the fundamental types: char, short int, int, and long int.
If you are counting bytes, then you should definitely be using size_t. If you are counting the number of elements, then you should probably use size_t since this seems to be what C++ has been using. In any case, you don't want to use int - at the very least use unsigned long or unsigned long long if you are using TR1. Or... even better... typedef whatever you end up using to size_type or just include <cstddef> and use std::size_t.
A few reasons might be:
The type (size_t) can be defined as the largest unsigned integer on that platform. For example, it might be defined as a 32 bit integer or a 64 bit integer or something else altogether that's capable of storing unsigned values of a great length
To make it clear when reading a program that the value is a size and not just a "regular" int
If you're writing an app that's just for you and/or throwaway, you're probably fine to use a basic int. If you're writing a library or something substantial, size_t is probably a better way to go.
Some of the answers are more complicated than necessary. A size_t is an unsigned integer type that is guaranteed to be big enough to store the size in bytes of any object in memory. In practice, it is always the same size as the pointer type. On 32 bit systems it is 32 bits. On 64 bit systems it is 64 bits.
All containers in the stl have various typedefs. For example, value_type is the element type, and size_type is the number stored type. In this way the containers are completely generic based on platform and implementation.
If you are creating your own containers, you should use size_type too. Typically this is done
typedef std::size_t size_type;
If you want a container's size, you should write
typedef vector<int> ints;
ints v;
v.push_back(4);
ints::size_type s = v.size();
What's nice is that if later you want to use a list, just change the typedef to
typedef list<int> ints;
And it will still work!
I assume you mean "size_t" -- this is a way of indicating an unsigned integer (an integer that can only be positive, never negative) -- it makes sense for containers' sizes since you can't have an array with a size of -7. I wouldn't say that you have to use size_t but it does indicate to others using your code "This number here is always positive." It also gives you a greater range of positive numbers, but that is likely to be unimportant unless you have some very big containers.
C++ is a language that could be implemented on different hardware architectures and platforms. As time has gone by it has supported 16-, 32-, and 64-bit architecture, and likely others in the future. size_type and other type aliases are ways for libraries to insulate the programmers/code from implementation details.
Assuming the size_type uses 32 bits on 32-bit machines and 64 bits on 64-bit machines, the same source code likely would work better if you've used size_type where needed. In most cases you could assume it would be the same as unsigned int, but it's not guaranteed.
size_type is used to express capacities of STL containers like std::vector whereas size_t is used to express byte size of an object in C/C++.
ints are not guaranteed to be 4 bytes in the specification, so they are not reliable. Yes, size_type would be preferred over ints
size_t is unsigned, so even if they're both 32 bits it doesn't mean quite the same thing as an unqualified int. I'm not sure why they added the type, but on many platforms today sizeof (size_t) == sizeof (int) == sizeof (long), so which type you choose is up to you. Note that those relations aren't guaranteed by the standard and are rapidly becoming out of date as 64 bit platforms move in.
For your own code, if you need to represent something that is a "size" conceptually and can never be negative, size_t would be a fine choice.
void f1(size_t n) {
if (n <= myVector.size()) { assert(false); }
size_t n1 = n - myVector.size(); // bug! myVector.size() can be > n
do_stuff_n_times(n1);
}
void f2(int n) {
int n1 = n - static_cast<int>(myVector.size());
assert(n1 >= 0);
do_stuff_n_times(n1);
}
f1() and f2() both have the same bug, but detecting the problem in f2() is easier. For more complex code, unsigned integer arithmetic bugs are not as easy to identify.
Personally I use signed int for all my sizes unless unsigned int should be used. I have never run into situation where my size won't fit into a 32 bit signed integer. I will probably use 64 bit signed integers before I use unsigned 32 bit integers.
The problem with using signed integers for size is a lot of static_cast from size_t to int in your code.