size_t vs int in C++ and/or C

Why do C++ containers return a size_type rather than an int? If we're creating our own structures, should we also be encouraged to use size_type?

In general, size_t should be used whenever you are measuring the size of something. It is really strange that size_t is only required to represent values between 0 and SIZE_MAX, and that SIZE_MAX is only required to be at least 65,535...
The other interesting constraints from the C++ and C Standards are:
the result of the sizeof operator has type size_t, which is an unsigned integer type
operator new() takes the number of bytes to allocate as a size_t parameter
size_t is defined in <cstddef>
SIZE_MAX is defined in <limits.h> in C99 but not mentioned in C++98?!
size_t is not included in the list of fundamental integer types, so I have always assumed that size_t is a type alias for one of the fundamental unsigned types: unsigned char, unsigned short int, unsigned int, or unsigned long int.
If you are counting bytes, then you should definitely be using size_t. If you are counting the number of elements, then you should probably use size_t as well, since that is what C++ itself uses. In any case, you don't want to use int: at the very least use unsigned long, or unsigned long long if you are using TR1. Or, even better, typedef whatever you end up using to size_type, or just include <cstddef> and use std::size_t.
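For example, here is a minimal sketch of what that typedef might look like in a user-defined structure (the Buffer name and its members are purely illustrative):

#include <cstddef>

struct Buffer {
    typedef std::size_t size_type; // mirror the standard containers' convention

    size_type size() const { return count_; }

private:
    std::size_t count_ = 0; // number of elements currently stored
};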

A few reasons might be:
The type (size_t) can be defined as an unsigned integer large enough to hold the size of any object on that platform. For example, it might be defined as a 32-bit integer, a 64-bit integer, or something else altogether that's capable of storing large unsigned values
To make it clear when reading a program that the value is a size and not just a "regular" int
If you're writing an app that's just for you and/or throwaway, you're probably fine to use a basic int. If you're writing a library or something substantial, size_t is probably a better way to go.

Some of the answers are more complicated than necessary. A size_t is an unsigned integer type that is guaranteed to be big enough to store the size in bytes of any object in memory. In practice, it is almost always the same size as the pointer type: 32 bits on 32-bit systems and 64 bits on 64-bit systems.
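You can check this on your own platform with a couple of sizeof queries; a minimal sketch (the printed values are typical, not guaranteed by the standard):

#include <cstddef>
#include <cstdio>

int main() {
    // Typically prints 4/4 on a 32-bit system and 8/8 on a 64-bit system.
    std::printf("sizeof(size_t) = %zu\n", sizeof(std::size_t));
    std::printf("sizeof(void*)  = %zu\n", sizeof(void*));
    return 0;
}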

All containers in the STL have various typedefs. For example, value_type is the element type, and size_type is the type used to store element counts. In this way the containers are completely generic regardless of platform and implementation.
If you are creating your own containers, you should use size_type too. Typically this is done
typedef std::size_t size_type;
If you want a container's size, you should write
typedef std::vector<int> ints;
ints v;
v.push_back(4);
ints::size_type s = v.size();
What's nice is that if later you want to use a list, just change the typedef to
typedef std::list<int> ints;
And it will still work!

I assume you mean "size_t" -- this is a way of indicating an unsigned integer (an integer that can never be negative) -- it makes sense for container sizes since you can't have an array with a size of -7. I wouldn't say that you have to use size_t, but it does indicate to others reading your code, "This number here is always positive." It also gives you a greater range of positive numbers, but that is likely to be unimportant unless you have some very big containers.

C++ is a language that could be implemented on different hardware architectures and platforms. As time has gone by it has supported 16-, 32-, and 64-bit architecture, and likely others in the future. size_type and other type aliases are ways for libraries to insulate the programmers/code from implementation details.
Assuming size_type uses 32 bits on 32-bit machines and 64 bits on 64-bit machines, the same source code is more likely to work correctly on both if you use size_type where needed. In most cases you can assume it will be the same as unsigned int, but that is not guaranteed.
size_type is used to express the capacity of STL containers like std::vector, whereas size_t is used to express the byte size of an object in C/C++.
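If you want to see how the two relate on your implementation, here is a small sketch (on most implementations the assertion holds, but the standard only requires size_type to be some unsigned integer type):

#include <cstddef>
#include <type_traits>
#include <vector>

// Treat this as an inspection of your implementation, not a guarantee:
static_assert(std::is_same<std::vector<int>::size_type, std::size_t>::value,
              "vector's size_type differs from size_t on this implementation");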

ints are not guaranteed to be 4 bytes by the specification, so relying on a fixed size is not portable. Yes, size_type would be preferred over int.

size_t is unsigned, so even if they're both 32 bits it doesn't mean quite the same thing as an unqualified int. I'm not sure why they added the type, but on many platforms today sizeof (size_t) == sizeof (int) == sizeof (long), so which type you choose is up to you. Note that those relations aren't guaranteed by the standard and are rapidly becoming out of date as 64 bit platforms move in.
For your own code, if you need to represent something that is a "size" conceptually and can never be negative, size_t would be a fine choice.

void f1(size_t n) {
    if (n < myVector.size()) { assert(false); } // the wraparound check must come before the subtraction
    size_t n1 = n - myVector.size(); // bug! if myVector.size() > n, n1 wraps to a huge value
    do_stuff_n_times(n1);
}
void f2(int n) {
    int n1 = n - static_cast<int>(myVector.size());
    assert(n1 >= 0); // here the bad case is easy to detect after the fact
    do_stuff_n_times(n1);
}
f1() and f2() both have the same bug, but detecting the problem in f2() is easier. In more complex code, unsigned integer arithmetic bugs are not as easy to identify.
Personally I use signed int for all my sizes unless an unsigned int is actually required. I have never run into a situation where a size wouldn't fit into a 32-bit signed integer. I will probably move to 64-bit signed integers before I ever use unsigned 32-bit integers.
The downside of using signed integers for sizes is the amount of static_cast from size_t to int scattered through your code.
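To make that concrete, here is a minimal sketch of the signed-size style and the casts it requires (process is a hypothetical function):

#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

void process(int index); // hypothetical per-element work

void signedStyle(const std::vector<int>& v) {
    // Using int for sizes means converting at every container boundary,
    // and the conversion is only valid if the size actually fits in an int.
    assert(v.size() <= static_cast<std::size_t>(INT_MAX));
    int n = static_cast<int>(v.size());
    for (int i = 0; i < n; ++i)
        process(i);
}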

Can sizeof(size_t) be less than sizeof(int)?
Do the C and/or C++ standards guarantee that using unsigned int for array indexing is always safe?
Yes, sizeof(size_t) can, in principle, be less than sizeof(int). I don't know of any implementations where this is true, and it's likely that there are none. I can imagine an implementation with 64-bit int and 32-bit size_t.
But indexing an array with unsigned int is safe -- as long as the value of the index is within the bounds imposed by the length of the array. The argument to the [] operator is merely required to be an integer; it is not converted to size_t. Indexing is defined in terms of pointer arithmetic, in which the + operator takes one operand that's a pointer and another operand of any integer type.
If unsigned int is wider than size_t, then an unsigned int index value that exceeds SIZE_MAX will almost certainly cause problems because the array isn't that big. In C++14 and later, defining a type bigger than SIZE_MAX bytes is explicitly prohibited (3.9.2 [compound.types] paragraph 2; section 6.9.2 in C++17). In earlier versions of C++, and in all versions of C, it isn't explicitly prohibited, but it's unlikely that any sane implementation would allow it.
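To illustrate the pointer-arithmetic point, a small sketch; any integer type works as an index, provided the value is in bounds:

int main() {
    double a[10] = {};
    unsigned int ui = 3;
    long li = 4;

    a[ui] = 1.0; // a[ui] is defined as *(a + ui); ui is not converted to size_t
    a[li] = 2.0; // likewise for any other integer type, signed or unsigned

    return 0;
}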
[C answer]
Can sizeof(size_t) be less than sizeof(int)?
Yes. The size of size_t can be less than, greater than, or the same as that of int; their relative sizes/ranges are not specified in C - only the minimums of their _MAX macros: SIZE_MAX >= 65535 and INT_MAX >= 32767.
IMO, sizeof(size_t) < sizeof(int) is a unicorn. Theoretical, but not seen.
Code could use the following to detect such beasties.
#include <limits.h>
#include <stddef.h>
#if SIZE_MAX < UINT_MAX
#error Unexpected small size_t
#endif
Do the C and/or C++ standards guarantee that using unsigned int for array indexing is always safe?
In C, No.
Examples: a small array may only tolerate indexes in [0 ... 2] - regardless of the type of the index - not the entire range of unsigned. A huge array may be indexable over [0 ... UINT_MAX*42ull], so an unsigned cannot represent all of its valid indexes.
A size_t is wide enough to index all arrays.
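A minimal sketch of the always-safe idiom: take the element count as a size_t and index with a size_t (valid in both C and C++):

#include <stddef.h>

double sum(const double *a, size_t n) {
    double total = 0.0;
    /* size_t can represent every valid index of any array object */
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}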

Type for array indices: signed/unsigned integer advantages

In C++, the default type for array indices is size_t, which is an unsigned 64-bit integer on most x86-64 platforms. I am in the process of building my own std::vector class for my High Performance Computing library (one of the main reasons is that I want this class to be able to take ownership of a pointer, something std::vector does not offer). For the type of the array index, I am thinking of using either:
size_t
my own index_t that would be a signed int or a signed long int, depending on my program
The advantages of using a signed integer over an unsigned one are numerous. For example,
for (index_t i = 0; i < v.size() - 1; ++i)
works like it is supposed to (with an unsigned integer, this loop goes crazy when v is of size 0), and
for (index_t i = v.size() - 1; i >= 0; --i)
works like it is supposed to, among many other advantages. In terms of performance, signed even seems to be a little bit better, as
a + 1 < b + 1
can be reduced to a < b with signed integers (overflow is undefined), but not with unsigned integers. The only performance advantage of unsigned seems to be that a /= 2 can be reduced to a shift operation with unsigned integers but not with signed ones.
I am wondering why the C++ committee decided to make size_t an unsigned integer, as it seems to introduce a lot of pain for only a few advantages.
The motivation for using an unsigned type as index or size in the standard is based on constraints only relevant to 16-bit machines. The natural type for any integral value in C++ is int, and that's what should probably be used; as you've noticed, trying to use unsigned types as numerical values in C++ is fraught with problems. If you're worried about sizes being so big that they don't fit into an int, ptrdiff_t would be appropriate; this is, after all, the type of the result of subtracting pointers or iterators. (The fact that v.size() has a different type than v.end() - v.begin() is really a design flaw in the standard library.)
For me, unsigned sizes always make the most sense: since you can't have -32 elements in an array, it is very scary to treat a size/length as a signed quantity.
The corner cases you mention can be coded around. For the first case you can, for example, skip the loop entirely when v is empty (and iterating over all elements except the last doesn't look all that common to begin with).
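For example, a minimal sketch of the usual unsigned-safe idioms for those two loops, assuming a std::vector<int> named v:

#include <cstddef>
#include <vector>

void loops(const std::vector<int>& v) {
    // All elements except the last: "i + 1 < v.size()" cannot wrap around,
    // unlike "i < v.size() - 1", which goes wrong when v is empty.
    for (std::size_t i = 0; i + 1 < v.size(); ++i) { /* ... */ }

    // Reverse iteration with an unsigned index: count down from v.size()
    // and decrement inside the condition, so i never goes below zero.
    for (std::size_t i = v.size(); i-- > 0; ) { /* ... */ }
}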

What is the downside of replacing size_t with unsigned long

The library I am working on needs to be used on both 32- and 64-bit machines; I get lots of compiler warnings because on 64-bit machines unsigned int != size_t.
Is there any downside to replacing all unsigned ints and size_ts with unsigned long? I appreciate it does not look very elegant, but in our case memory is not too much of an issue. I am wondering if there is a possibility of any bugs/unwanted behaviour etc. created by such a replace-all operation (could you give examples)? Thanks.
What warnings? The most obvious one I can think of is for a "narrowing conversion", that is to say you're assigning size_t to unsigned int, and getting a warning that information might be lost.
The main downside of replacing size_t with unsigned long is that unsigned long is not guaranteed to be large enough to contain every possible value of size_t, and on Windows 64 it is not large enough. So you might find that you still have warnings.
The proper fix is that if you assign a size_t to a variable (or data member), you should make sure that variable has a type large enough to contain any value of size_t. That's what the warning is all about. So you should not switch to unsigned long, you should switch those variables to size_t.
Conversely, if you have a variable that doesn't need to be big enough to hold any size, just big enough for unsigned int, then don't use size_t for it in the first place.
Both types (size_t and unsigned int) have valid uses, so any approach that indiscriminately replaces all use of them by some other type must be wrong :-) Actually, you could replace everything with size_t or uintmax_t and for most programs that would be OK. The exceptions are where the code relies on using an unsigned type of the same size as int, or whatever, such that a larger type breaks the code.
The standard makes little guarantees about the sizes of types like int and long. size_t is guaranteed to be large enough to hold any object, and all std containers operate on size_t.
It's perfectly possible for a platform to define long as smaller than size_t, or have the size of long subject to compilation options, for example. To be safe, it's best to stick to size_t.
Another criterion to consider is that size_t carries a meaning - "this thing is used to store a size or an index." It makes the code slightly more self-documenting.
If you replace size_t with unsigned long in places where the code actually receives a size_t, you will introduce new warnings.
example:
size_t count = some_vector.size();
Replace size_t with unsigned long, and (to the degree the two types differ) you will have introduced a new warning, because some_vector.size() returns a size_t - strictly speaking a std::vector<something>::size_type, but in practice it evaluates to the same thing.
There may also be a problem if you assume unsigned int and unsigned long are interchangeable when long is 8 bytes: then (unsigned int) -1 != (unsigned long) -1, and the following code may fail its assertion.
unsigned int i = std::string::npos;
assert(i == std::string::npos); // fails when size_t is wider than unsigned int

Can anyone explain why the size_t type is used, with an example?

I was wondering why size_t is used where I could just use, say, int. It's said that size_t is the return type of the sizeof operator. What does that mean? If I use sizeof(int) and store the result in an int variable, that also works, so it's not strictly necessary to store it in a size_t variable. I just want to understand the basic concept of using size_t, with a clear example. Thanks.
size_t is guaranteed to be able to represent the largest size possible, int is not. This means size_t is more portable.
For instance, what if int could only store values up to 32,767 but you could allocate objects of 50,000 bytes? Clearly an int couldn't hold that size, but size_t can.
The simplest example is pretty dated: on an old 16-bit-int system with 64 k of RAM, the value of an int can be anywhere from -32768 to +32767, but after:
char buf[40960];
the buffer buf occupies 40 kbytes, so sizeof buf is too big to fit in an int, and it needs an unsigned int.
The same thing can happen today if you use 32-bit int but allow programs to access more than 4 GB of RAM at a time, as is the case on what are called "I32LP64" models (32 bit int, 64-bit long and pointer). Here the type size_t will have the same range as unsigned long.
Some code also uses size_t for casting pointers into unsigned integers of the same size, to perform calculations on pointers as if they were integers that would otherwise be prevented at compile time. Such code is intended to compile and build correctly across different pointer sizes, e.g. a 32-bit model versus a 64-bit one (strictly speaking, uintptr_t is the type guaranteed for pointer round trips; size_t is only required to hold object sizes).
It is implementation-defined, but on 64-bit systems you will find that size_t is often 64-bit while int is still 32-bit (unless it's an ILP64 or SILP64 model).
Depending on what architecture you are on (16-bit, 32-bit, or 64-bit), an int could be a different size.
If you want a specific size, use uint16_t or uint32_t. You can check out this thread for more information:
What does the C++ standard state the size of int, long type to be?
size_t is a typedef defined to store object size. It can store the maximum object size that is supported by a target platform. This makes it portable.
For example:
void * memcpy(void * destination, const void * source, size_t num);
memcpy() copies num bytes from source into destination. The maximum number of bytes that can be copied depends on the platform. So, making num as type size_t makes memcpy portable.
See https://stackoverflow.com/a/7706240/2820412 for further details.
size_t is a typedef for one of the fundamental unsigned integer types. It could be unsigned int, unsigned long, or unsigned long long depending on the implementation.
Its special property is that it can represent the size (in bytes) of any object, including the largest object possible. That is one of the reasons it is widely used in the standard library for array indexing and loop counting (it also solves the portability issue). Let me illustrate this with a simple example.
Consider a vector of length 2ULL*UINT_MAX, where UINT_MAX denotes the maximum value of unsigned int (4294967295 on my implementation, with 4 bytes for unsigned int). Note the 2ULL: a plain 2*UINT_MAX would itself wrap around in unsigned int arithmetic.
std::vector<int> vec(2ULL*UINT_MAX, 0);
If you tried to fill the vector using a for loop like the following, it would not work, because unsigned int can only count up to UINT_MAX (beyond which it starts again from 0).
for(unsigned int i = 0; i < 2ULL*UINT_MAX; ++i) vec[i] = i;
The solution is to use size_t, since it is guaranteed to be able to represent the size of any object (and therefore of our vector vec too!). Note that on my implementation size_t is a typedef for unsigned long, so its maximum value is ULONG_MAX = 18446744073709551615 with 8 bytes.
for(size_t i = 0; i < 2ULL*UINT_MAX; ++i) vec[i] = i;
References: https://en.cppreference.com/w/cpp/types/size_t

Where can I look up the definition of size_type for vectors in the C++ STL?

It seems safe to cast the result of my vector's size() function to an unsigned int. How can I tell for sure, though? My documentation isn't clear about how size_type is defined.
Do not assume the type of the container size (or anything else typed inside).
Today?
The best solution for now is to use:
std::vector<T>::size_type
Where T is your type. For example:
std::vector<std::string>::size_type i ;
std::vector<int>::size_type j ;
std::vector<std::vector<double> >::size_type k ;
(Using a typedef can help make this easier to read)
The same goes for iterators, and all other types "inside" STL containers.
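For instance, the iterator types follow the same pattern; a small sketch (the function name is just illustrative):

#include <string>
#include <vector>

void visitAll(const std::vector<std::string>& v) {
    // The iterator type also lives "inside" the container.
    for (std::vector<std::string>::const_iterator it = v.begin();
         it != v.end(); ++it) {
        // use *it
    }
}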
After C++0x?
Once the compiler is able to deduce the type of the variable, you'll be able to use the auto keyword. For example:
void doSomething(const std::vector<double> & p_aData)
{
    std::vector<double>::size_type i = p_aData.size() ; // Old/Current way
    auto j = p_aData.size() ;                           // New C++0x way, definition
    decltype(p_aData.size()) k ;                        // New C++0x way, declaration
}
Edit: Question from JF
What if he needs to pass the size of the container to some existing code that uses, say, an unsigned int? – JF
This is a problem common to the use of the STL: You cannot do it without some work.
The first solution is to design the code to always use the STL type. For example:
typedef std::vector<int>::size_type VIntSize ;

VIntSize getIndexOfSomeItem(const std::vector<int> & p_aInt)
{
    return /* the found value, or some kind of std::npos */ ;
}
The second is to do the conversion yourself, using either a static_cast or a function that asserts if the value goes out of bounds of the destination type (sometimes I see code using char because, "you know, the index will never go beyond 256" [I quote from memory]).
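Here is a minimal sketch of such an asserting conversion function (checked_cast is a hypothetical name, and the sketch assumes the source type is unsigned, like size_t):

#include <cassert>
#include <limits>

// Hypothetical helper: narrow an unsigned value, asserting that it fits
// in the destination type.
template <typename To, typename From>
To checked_cast(From value) {
    assert(value <= static_cast<From>(std::numeric_limits<To>::max()));
    return static_cast<To>(value);
}

// Usage: unsigned int n = checked_cast<unsigned int>(v.size());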
I believe this could be a full question in itself.
According to the standard, you cannot be sure. The exact type depends on your machine. You can look at the definition in your compiler's header implementations, though.
I can't imagine that it wouldn't be safe on a 32-bit system, but 64-bit could be a problem (since ints remain 32 bit). To be safe, why not just declare your variable to be vector<MyType>::size_type instead of unsigned int?
It should always be safe to cast it to size_t. unsigned int isn't enough on most 64-bit systems, and even unsigned long isn't enough on Windows (which uses the LLP64 model instead of the LP64 model most Unix-like systems use).
The C++ standard only states that size_t is declared in <cstddef>, which corresponds to the C header <stddef.h>. My copy of Harbison & Steele places the minimum and maximum values for size_t in <stdint.h>. That should give you a notion of how big your recipient variable needs to be for your platform.
Your best bet is to stick with integer types that are large enough to hold a pointer on your platform. In C99, that'd be intptr_t and uintptr_t, also officially located in <stdint.h>.
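A small sketch of that pointer round trip (uintptr_t is optional in the standard, but present on common platforms):

#include <cstdint>

int main() {
    int x = 42;
    // uintptr_t is specified so that a valid object pointer converted to it
    // and back compares equal to the original pointer.
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(&x);
    int* p = reinterpret_cast<int*>(bits);
    return (p == &x) ? 0 : 1;
}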
As long as you're sure that an unsigned int on your system will be large enough to hold the number of items you'll have in the vector you should be safe ;-)
I'm not sure how well this will work because I'm just thinking off the top of my head, but a compile-time assertion (such as BOOST_STATIC_ASSERT(), or see Ways to ASSERT expressions at build time in C) might help. Something like:
BOOST_STATIC_ASSERT(sizeof(unsigned int) >= sizeof(std::vector<MyType>::size_type));