Can sizeof(size_t) be less than sizeof(int)?
Do the C and/or C++ standards guarantee that using unsigned int for array indexing is always safe?

Yes, sizeof(size_t) can, in principle, be less than sizeof(int). I don't know of any implementations where this is true, and it's likely that there are none, but I can imagine an implementation with 64-bit int and 32-bit size_t.
But indexing an array with unsigned int is safe -- as long as the value of the index is within the bounds imposed by the length of the array. The argument to the [] operator is merely required to be an integer. It's not converted to size_t. It's defined in terms of pointer arithmetic, in which the + operator has one argument that's a pointer and another argument that is of any integer type.
If unsigned int is wider than size_t, then an unsigned int index value that exceeds SIZE_MAX will almost certainly cause problems because the array isn't that big. In C++14 and later, defining a type bigger than SIZE_MAX bytes is explicitly prohibited (3.9.2 [compound.types] paragraph 2; section 6.9.2 in C++17). In earlier versions of C++, and in all versions of C, it isn't explicitly prohibited, but it's unlikely that any sane implementation would allow it.
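A minimal sketch of that pointer-arithmetic definition (the variable names are illustrative, not from the standard):
#include <cassert>

int main() {
    int a[10] = {};
    unsigned int i = 3;        // the index may be any integer type
    a[i] = 42;                 // a[i] is defined as *(a + i): pointer + integer
    assert(a[i] == *(a + i));  // the two forms are equivalent by definition
}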

[C answer]
Can sizeof(size_t) be less than sizeof(int)?
Yes. size_t can be smaller than, larger than, or the same size as int, as their relative sizes/ranges are not specified in C - only their minimum ranges (SIZE_MAX at least 65535, INT_MAX at least 32767).
IMO, sizeof(size_t) < sizeof(int) is a unicorn. Theoretical, but not seen.
Code could use the following to detect such beasties.
#include <limits.h>
#include <stddef.h>
#if SIZE_MAX < UINT_MAX
#error Unexpected small size_t
#endif
Do the C and/or C++ standards guarantee that using unsigned int for array indexing is always safe?
In C, No.
Examples: a small array may tolerate only the indexes [0 ... 2] - regardless of the type of the index - not the entire range of unsigned. A huge array may be indexable over [0 ... UINT_MAX*42ull], so an unsigned cannot represent all valid indexes.
A size_t is wide enough to index all arrays.
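As a small sketch of that guarantee (the array name and size are made up for illustration):
#include <stddef.h>

static double data[100000];

int main(void) {
    /* size_t can represent every valid index of any array object */
    for (size_t i = 0; i < sizeof data / sizeof data[0]; ++i)
        data[i] = 0.0;
    return 0;
}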

Related

Why is the result of sizeof implementation defined? [closed]

In C99, §6.5.3.4:
2 The sizeof operator yields the size (in bytes) of its operand,
which may be an expression or the parenthesized name of a type. ...
4 The value of the result is implementation-defined, and its type (an
unsigned integer type) is size_t, defined in <stddef.h> (and other
headers).
In C++14, §5.3.3:
1 The sizeof operator yields the number of bytes in the object
representation of its operand. ... The result of sizeof applied to any
other fundamental type (3.9.1) is implementation-defined.
The only guaranteed values are sizeof(char), sizeof(unsigned char) and sizeof(signed char), each of which is one.
However, "the number of bytes in the object representation" seems pretty iron-clad to me. For example, in C99 §6.2.6.1:
4 Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object
of that type, in bytes. ...
So why is it implementation-defined if it seems pretty defined?
Many of you seem to be misinterpreting my question. I never claimed that:
A) The size of types are defined or the same on all systems,
B) implementation-defined means it can return "random values"
What I'm getting at here is that n * CHAR_BIT is a fixed formula. The formula itself can't change between implementations. Yes, an int may be 4 bytes or 8 bytes. I get that. But across all implementations, the value must be n * CHAR_BIT.
The result of sizeof is implementation-defined because the sizes of the various basic types are implementation-defined. The only guarantees we have on the sizes of the types in C++ are that
sizeof(char) = 1 and sizeof(char) <= sizeof(short) <= sizeof(int) <=
sizeof(long) <= sizeof(long long)
And that each type has a minimum range it must support, per C11 [Annex E (informative) Implementation limits]/1:
[...]The minimum magnitudes shown shall be replaced by implementation-defined magnitudes with the same sign.[...]
#define CHAR_BIT 8
#define CHAR_MAX UCHAR_MAX or SCHAR_MAX
#define CHAR_MIN 0 or SCHAR_MIN
#define INT_MAX +32767
#define INT_MIN -32767
#define LONG_MAX +2147483647
#define LONG_MIN -2147483647
#define LLONG_MAX +9223372036854775807
#define LLONG_MIN -9223372036854775807
#define MB_LEN_MAX 1
#define SCHAR_MAX +127
#define SCHAR_MIN -127
#define SHRT_MAX +32767
#define SHRT_MIN -32767
#define UCHAR_MAX 255
#define USHRT_MAX 65535
#define UINT_MAX 65535
#define ULONG_MAX 4294967295
#define ULLONG_MAX 18446744073709551615
So per the standard an int has to be able to store a number that could be stored in 16 bits, but it can be bigger, and on most of today's systems it is 32 bits.
What I'm getting at here is that n * CHAR_BIT is a fixed formula. The formula itself can't change between implementations. Yes, an int may be 4 bytes or 8 bytes. I get that. But across all implementations, the value must be n * CHAR_BIT.
You are correct but n is defined per C99 §6.2.6.1 as
where n is the size of an object of that type
emphasis mine
So the formula may be fixed but n is not fixed and different implementations on the same system can use a different value of n.
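A quick sketch that makes the point visible: the formula n * CHAR_BIT is fixed, but the printed n is implementation-specific.
#include <climits>
#include <cstdio>

int main() {
    // n is sizeof(T); the bit count is always n * CHAR_BIT,
    // but n itself may differ between implementations.
    std::printf("int:  n = %zu, bits = %zu\n", sizeof(int), sizeof(int) * CHAR_BIT);
    std::printf("long: n = %zu, bits = %zu\n", sizeof(long), sizeof(long) * CHAR_BIT);
}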
The result of sizeof is not implementation defined. The standard does not say that; it says:
The value of the result is implementation-defined, [...]
That is semantically different. The result of sizeof is well defined:
[...] the size (in bytes) of its operand [...]
Both the bit width of a byte in this context and the number of bytes in non-char types are implementation-defined.
Because the sizes of basic types are defined in terms of efficiency, not in terms of an exact number of bits. An int must be something that the CPU can manipulate efficiently. For most modern systems, this quantity turns out to be 32 bits (or 64 bits). For older systems, it was quite often 16 bits. However, if a 35-bit CPU were to exist, an int on such a system would be 35 bits. In other words, C++ does not apply a penalty to enforce a bit width a CPU might not support at all.
Of course, one could argue that notions of exotic bit widths for basic types have been overtaken by history. I cannot think of any modern CPU that does not support the standard set of 8, 16, and 32 bits (feel free to disagree with this statement, but at least be so kind as to give an example!), and 64 bits is also pretty common (and not a big deal to support in software if hardware support is unavailable).
Arguably the C++ language has already moved away from having variable numbers of bits for char; as far as I know, u8"..." converts to char *, but the Unicode specification demands that UTF-8 code units are 8 bits.
If a char of 8 bits is size 1, then an int of 32 bits is size 4. If a char of 16 bits is size 1, then an int of 32 bits is only size 2. Both situations are equally valid in C++, if such sizes happen to be good choices for their respective hardware.
Padding bits are "unspecified", not "implementation-defined".
Wrong. Very, very wrong. The values of padding bytes are unspecified. The intention here is that the values of these bits may represent trap values, but not necessarily.
The standard tells you an object occupies sizeof(T) * CHAR_BIT bits, but it doesn't specify a size (other than for the exact-width types). The number of bytes a type occupies is implementation-defined, hence sizeof must be as well.
Implementation-defined is described as:
unspecified value where each implementation documents how the choice
is made
When you declare a new variable like this:
size_t a;
it is equivalent to declaring it with whatever unsigned integer type the implementation chose for size_t, for example:
unsigned long a; // could also be unsigned int or unsigned long long
On typical 32-bit computers the size of an int is 4 bytes, and the size of a short int is 2 bytes.
In the C programming language, size_t is the type of the result of the sizeof operator. When you use the sizeof operator it gives you the size of the object; its operand may be an expression or a parenthesized type name (it does not have to be an lvalue). The size of an element (object) cannot be a negative number and it must be an integer.
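To make the operand rules concrete (a sketch; neither form requires an lvalue):
#include <cstddef>

int main() {
    int x = 0;
    std::size_t a = sizeof x;        // operand is an expression
    std::size_t b = sizeof(double);  // operand is a parenthesized type name
    std::size_t c = sizeof(x + 1);   // rvalues work too; the operand is not evaluated
    (void)a; (void)b; (void)c;
    return 0;
}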

Type for array indices: signed/unsigned integer advantages

In C++, the default type for array indices is size_t, which is an unsigned 64-bit integer on most x86-64 platforms. I am in the process of building my own std::vector class for my library for High Performance Computing (one of the main reasons is that I want this class to be able to take ownership of a pointer, something std::vector does not offer). For the type of the array index, I am thinking of either using:
size_t
my own index_t that would be a signed int or a signed long int depending on my program
The advantages of using a signed integer over an unsigned one are numerous, such as
for (index_t i = 0; i < v.size() - 1; ++i)
works like it is supposed to (with an unsigned integer, this loop goes crazy when v is of size 0)
for (index_t i = v.size() - 1; i >= 0; --i)
works like it is supposed to, and many other advantages. In terms of performance, it even seems to be a little bit better as
a + 1 < b + 1
can be reduced to a < b with signed integers (overflow is undefined), but not with unsigned integers. The only advantage performance-wise for unsigned seems to be that a /= 2 can be reduced to a shift operation with unsigned integers but not with signed ones.
I am wondering why the C++ committee has decided to use an unsigned integer for size_t, as it seems to introduce a lot of pain and only a few advantages.
The motivation for using an unsigned type as the index or size type in the standard is based on constraints only relevant to 16-bit machines. The natural type for any integral value in C++ is int, and that's what should probably be used; as you've noticed, trying to use unsigned types as numerical values in C++ is fraught with problems. If you're worried about sizes being so big that they don't fit into an int, ptrdiff_t would be appropriate; this is, after all, the type of the result of subtraction of pointers or iterators. (The fact that v.size() has a different type than v.end() - v.begin() is really a design flaw in the standard library.)
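A sketch of the signed-index style this answer suggests, using ptrdiff_t (the container and function names are illustrative):
#include <cstddef>
#include <vector>

void increment_all(std::vector<int>& v) {
    // ptrdiff_t is the type of v.end() - v.begin(), making it a natural signed index
    for (std::ptrdiff_t i = 0; i < v.end() - v.begin(); ++i)
        v[i] += 1;  // i is non-negative here, so the conversion to size_type is safe
}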
For me, unsigned sizes always make the most sense: since you can't have -32 elements in an array, it is very, very scary to consider the size/length as a signed quantity all the time.
The corner cases you mention can be coded around; you can, e.g., abort the loop before entering it if v is empty for the first case (which doesn't look all that common to begin with - iterating over all elements except the last?), as in the sketch below.
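A hedged sketch of coding around both corner cases while keeping an unsigned index:
#include <cstddef>
#include <vector>

void examples(const std::vector<int>& v) {
    // All but the last element: compare i + 1 < size() instead of i < size() - 1,
    // so an empty or one-element v never underflows.
    for (std::size_t i = 0; i + 1 < v.size(); ++i) { /* use v[i] */ }

    // Reverse loop: test i != 0 first, then decrement inside the body.
    for (std::size_t i = v.size(); i != 0; ) {
        --i;
        /* use v[i] */
    }
}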

can anyone explain why size_t type is used with an example?

I was wondering why size_t is used where I could use, say, int. It's said that size_t is the return type of the sizeof operator. What does that mean? If I use sizeof(int) and store what it returns in an int variable, that works too; it's not necessary to store it in a size_t variable. I just want to clearly understand the basic concept of using size_t, with an easily understandable example. Thanks
size_t is guaranteed to be able to represent the largest object size possible; int is not. This means size_t is more portable.
For instance, what if int could only store up to 255 but you could allocate arrays of 5000 bytes? Clearly this wouldn't work; with size_t, however, it will.
The simplest example is pretty dated: on an old 16-bit-int system with 64 k of RAM, the value of an int can be anywhere from -32768 to +32767, but after:
char buf[40960];
the buffer buf occupies 40 kbytes, so sizeof buf is too big to fit in an int, and it needs an unsigned int.
The same thing can happen today if you use 32-bit int but allow programs to access more than 4 GB of RAM at a time, as is the case on what are called "I32LP64" models (32 bit int, 64-bit long and pointer). Here the type size_t will have the same range as unsigned long.
size_t is also used for casting pointers into unsigned integers of the same size, to perform calculations on pointers as if they were integers that would otherwise be prevented at compile time (although uintptr_t is the type specifically intended for that). Such code is meant to compile and build correctly in the context of different pointer sizes, e.g. a 32-bit model versus a 64-bit one.
It is implementation-defined, but on 64-bit systems you will find that size_t is often 64 bits while int is still 32 bits (unless it uses an ILP64 or SILP64 model).
Depending on what architecture you are on (16-bit, 32-bit or 64-bit), an int could be a different size.
If you want a specific size, use uint16_t or uint32_t. You can check out this thread for more information:
What does the C++ standard state the size of int, long type to be?
size_t is a typedef for an unsigned type used to store object sizes. It can store the maximum object size that is supported by a target platform. This makes it portable.
For example:
void * memcpy(void * destination, const void * source, size_t num);
memcpy() copies num bytes from source into destination. The maximum number of bytes that can be copied depends on the platform. So, giving num the type size_t makes memcpy portable.
Refer https://stackoverflow.com/a/7706240/2820412 for further details.
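A small usage sketch (the buffer names are made up):
#include <cstring>

int main() {
    char src[64] = "hello";
    char dst[64];
    std::memcpy(dst, src, sizeof src);  // num is a size_t, so any object size fits
    return 0;
}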
size_t is a typedef for one of the fundamental unsigned integer types. It could be unsigned int, unsigned long, or unsigned long long depending on the implementation.
Its special property is that it can represent the size (in bytes) of any object (which includes the largest object possible as well!). That is one of the reasons it is widely used in the standard library for array indexing and loop counting (that also solves the portability issue). Let me illustrate this with a simple example.
Consider a vector of length 2*UINT_MAX, where UINT_MAX denotes the maximum value of unsigned int (which is 4294967295 for my implementation, with 4 bytes for unsigned int). Note that 2*UINT_MAX has to be computed in a wider type (here via the ULL suffix), or the multiplication itself would wrap around:
std::vector<size_t> vec(2ULL*UINT_MAX, 0);
If you wanted to fill the vector using a for-loop such as this, it would not work, because unsigned int can iterate only up to UINT_MAX (beyond which it starts again from 0):
for (unsigned int i = 0; i < 2ULL*UINT_MAX; ++i) vec[i] = i; // never terminates: i wraps to 0
The solution here is to use size_t, since it is guaranteed to be able to represent the size of any object (and therefore to index our vector vec too!). Note that for my implementation size_t is a typedef for unsigned long, and therefore its max value = ULONG_MAX = 18446744073709551615, with 8 bytes:
for (size_t i = 0; i < 2ULL*UINT_MAX; ++i) vec[i] = i;
References: https://en.cppreference.com/w/cpp/types/size_t

size guarantee for integral/arithmetic types in C and C++

I know that the C++ standard explicitly guarantees the size of only char, signed char and unsigned char. Also it gives guarantees that, say, short is at least as big as char, int as big as short etc. But no explicit guarantees about absolute value of, say, sizeof(int). This was the info in my head and I lived happily with it. Some time ago, however, I came across a comment in SO (can't find it) that in C long is guaranteed to be at least 4 bytes, and that requirement is "inherited" by C++. Is that the case? If so, what other implicit guarantees do we have for the sizes of arithmetic types in C++? Please note that I am absolutely not interested in practical guarantees across different platforms in this question, just theoretical ones.
18.2.2 guarantees that <climits> has the same contents as the C library header <limits.h>.
The ISO C90 standard is tricky to get hold of, which is a shame considering that C++ relies on it, but the section "Numerical limits" (numbered 2.2.4.2 in a random draft I tracked down on one occasion and have lying around) gives minimum values for the INT_MAX etc. constants in <limits.h>. For example ULONG_MAX must be at least 4294967295, from which we deduce that the width of long is at least 32 bits.
There are similar restrictions in the C99 standard, but of course those aren't the ones referenced by C++03.
This does not guarantee that long is at least 4 bytes, since in C and C++ "byte" is basically defined to mean "char", and it is not guaranteed that CHAR_BIT is 8 in C or C++. CHAR_BIT == 8 is guaranteed by both POSIX and Windows.
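If code depends on 8-bit bytes, that assumption can be pinned down at compile time; a minimal sketch:
#include <climits>

// Fail the build on exotic platforms instead of miscomputing sizes at run time.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");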
Don't know about C++. In C you have
Annex E
(informative)
Implementation limits
[#1] The contents of the header <limits.h> are given below, in alphabetical order. The minimum magnitudes shown shall be replaced by implementation-defined magnitudes with the same sign. The values shall all be constant expressions suitable for use in #if preprocessing directives. The components are described further in 5.2.4.2.1.
#define CHAR_BIT 8
#define CHAR_MAX UCHAR_MAX or SCHAR_MAX
#define CHAR_MIN 0 or SCHAR_MIN
#define INT_MAX +32767
#define INT_MIN -32767
#define LONG_MAX +2147483647
#define LONG_MIN -2147483647
#define LLONG_MAX +9223372036854775807
#define LLONG_MIN -9223372036854775807
#define MB_LEN_MAX 1
#define SCHAR_MAX +127
#define SCHAR_MIN -127
#define SHRT_MAX +32767
#define SHRT_MIN -32767
#define UCHAR_MAX 255
#define USHRT_MAX 65535
#define UINT_MAX 65535
#define ULONG_MAX 4294967295
#define ULLONG_MAX 18446744073709551615
So char <= short <= int <= long <= long long
and
CHAR_BIT * sizeof (char) >= 8
CHAR_BIT * sizeof (short) >= 16
CHAR_BIT * sizeof (int) >= 16
CHAR_BIT * sizeof (long) >= 32
CHAR_BIT * sizeof (long long) >= 64
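Those inequalities can be checked against any implementation; a sketch using C++11 static_assert:
#include <climits>

static_assert(CHAR_BIT * sizeof(char)      >= 8,  "char is at least 8 bits");
static_assert(CHAR_BIT * sizeof(short)     >= 16, "short is at least 16 bits");
static_assert(CHAR_BIT * sizeof(int)       >= 16, "int is at least 16 bits");
static_assert(CHAR_BIT * sizeof(long)      >= 32, "long is at least 32 bits");
static_assert(CHAR_BIT * sizeof(long long) >= 64, "long long is at least 64 bits");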
Yes, C++ type sizes are inherited from C89.
I can't find the specification right now. But it's in the Bible.
Be aware that the guaranteed ranges of these types are one less wide than on most machines:
signed char is guaranteed -127 ... +127, but most two's-complement machines have -128 ... +127
Likewise for the larger types.
There are several inaccuracies in what you read. These inaccuracies were either present in the source, or maybe you remembered it all incorrectly.
Firstly, a pedantic remark about one peculiar difference between C and C++. C language does not make any guarantees about the relative sizes of integer types (in bytes). C language only makes guarantees about their relative ranges. It is true that the range of int is always at least as large as the range of short and so on. However, it is formally allowed by C standard to have sizeof(short) > sizeof(int). In such case the extra bits in short would serve as padding bits, not used for value representation. Obviously, this is something that is merely allowed by the legal language in the standard, not something anyone is likely to encounter in practice.
In C++ on the other hand, the language specification makes guarantees about both the relative ranges and relative sizes of the types, so in C++ in addition to the above range relationship inherited from C it is guaranteed that sizeof(int) is greater or equal than sizeof(short).
Secondly, the C language standard guarantees minimum range for each integer type (these guarantees are present in both C and C++). Knowing the minimum range for the given type, you can always say how many value-forming bits this type is required to have (as minimum number of bits). For example, it is true that type long is required to have at least 32 value-forming bits in order to satisfy its range requirements. If you want to recalculate that into bytes, it will depend on what you understand under the term byte. If you are talking specifically about 8-bit bytes, then indeed type long will always consist of at least four 8-bit bytes. However, that does not mean that sizeof(long) is always at least 4, since in C/C++ terminology the term byte refers to char objects. char objects are not limited to 8-bits. It is quite possible to have 32-bit char type in some implementation, meaning that sizeof(long) in C/C++ bytes can legally be 1, for example.
The C standard does not explicitly say that long has to be at least 4 bytes, but it does specify a minimum range for the different integral types, which implies a minimum size.
For example, the minimum range of an unsigned long is 0 to 4,294,967,295. You need at least 32 bits to represent every single number in that range. So yes, the standard guarantee (indirectly) that a long is at least 32 bits.
C++ inherits the data types from C, so you have to go look at the C standard. The C++ standard actually refers to parts of the C standard in this case.
Just be careful about the fact that some machines have chars that are more than 8 bits. For example, IIRC on the TI C5x, a long is 32 bits, but sizeof(long)==2 because chars, shorts and ints are all 16 bits with sizeof(char)==1.

size_t vs int in C++ and/or C

Why is it that C++ containers return a size_type rather than an int? If we're creating our own structures, should we also be encouraged to use size_type?
In general, size_t should be used whenever you are measuring the size of something. It is really strange that size_t is only required to represent between 0 and SIZE_MAX bytes, and SIZE_MAX is only required to be at least 65,535...
The other interesting constraints from the C++ and C Standards are:
the return type of sizeof() is size_t and it is an unsigned integer
operator new() takes the number of bytes to allocate as a size_t parameter
size_t is defined in <cstddef>
SIZE_MAX is defined in <stdint.h> in C99 but not mentioned in C++98?!
size_t is not included in the list of fundamental integer types, so I have always assumed that size_t is a type alias for one of the fundamental unsigned types: unsigned char, unsigned short, unsigned int, or unsigned long.
If you are counting bytes, then you should definitely be using size_t. If you are counting the number of elements, then you should probably use size_t since this seems to be what C++ has been using. In any case, you don't want to use int - at the very least use unsigned long or unsigned long long if you are using TR1. Or... even better... typedef whatever you end up using to size_type or just include <cstddef> and use std::size_t.
A few reasons might be:
The type (size_t) can be defined as the largest unsigned integer on that platform. For example, it might be defined as a 32 bit integer or a 64 bit integer or something else altogether that's capable of storing unsigned values of a great length
To make it clear when reading a program that the value is a size and not just a "regular" int
If you're writing an app that's just for you and/or throwaway, you're probably fine to use a basic int. If you're writing a library or something substantial, size_t is probably a better way to go.
Some of the answers are more complicated than necessary. A size_t is an unsigned integer type that is guaranteed to be big enough to store the size in bytes of any object in memory. In practice, it is almost always the same size as the pointer type: on 32-bit systems it is 32 bits, and on 64-bit systems it is 64 bits.
All containers in the stl have various typedefs. For example, value_type is the element type, and size_type is the number stored type. In this way the containers are completely generic based on platform and implementation.
If you are creating your own containers, you should use size_type too. Typically this is done
typedef std::size_t size_type;
If you want a container's size, you should write
typedef vector<int> ints;
ints v;
v.push_back(4);
ints::size_type s = v.size();
What's nice is that if later you want to use a list, just change the typedef to
typedef list<int> ints;
And it will still work!
I assume you mean "size_t" -- this is a way of indicating an unsigned integer (an integer that can only be positive, never negative) -- it makes sense for containers' sizes since you can't have an array with a size of -7. I wouldn't say that you have to use size_t but it does indicate to others using your code "This number here is always positive." It also gives you a greater range of positive numbers, but that is likely to be unimportant unless you have some very big containers.
C++ is a language that could be implemented on different hardware architectures and platforms. As time has gone by it has supported 16-, 32-, and 64-bit architectures, and likely others in the future. size_type and other type aliases are ways for libraries to insulate the programmers/code from implementation details.
Assuming size_type uses 32 bits on 32-bit machines and 64 bits on 64-bit machines, the same source code will likely work better if you've used size_type where needed. In most cases you can assume it would be the same as unsigned int, but it's not guaranteed.
size_type is used to express capacities of STL containers like std::vector, whereas size_t is used to express the byte size of an object in C/C++.
ints are not guaranteed to be 4 bytes by the specification, so they are not reliable. Yes, size_type would be preferred over int.
size_t is unsigned, so even if they're both 32 bits it doesn't mean quite the same thing as an unqualified int. I'm not sure why they added the type, but on many platforms today sizeof(size_t) == sizeof(int) == sizeof(long), so which type you choose is up to you. Note that those relations aren't guaranteed by the standard and are rapidly becoming out of date as 64-bit platforms move in.
For your own code, if you need to represent something that is a "size" conceptually and can never be negative, size_t would be a fine choice.
void f1(size_t n) {
    if (n <= myVector.size()) { assert(false); }
    size_t n1 = n - myVector.size(); // bug! myVector.size() can be > n
    do_stuff_n_times(n1);
}

void f2(int n) {
    int n1 = n - static_cast<int>(myVector.size());
    assert(n1 >= 0);
    do_stuff_n_times(n1);
}
f1() and f2() both have the same bug, but detecting the problem in f2() is easier. For more complex code, unsigned integer arithmetic bugs are not as easy to identify.
Personally I use a signed int for all my sizes unless an unsigned int should be used. I have never run into a situation where my size won't fit into a 32-bit signed integer. I will probably use 64-bit signed integers before I use unsigned 32-bit integers.
The problem with using signed integers for sizes is the large number of static_casts from size_t to int scattered through your code.
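For what it's worth, since C++20 the standard library offers std::ssize, which yields a signed size and removes most of those casts; a sketch:
#include <iterator>
#include <vector>

void count_down(const std::vector<int>& v) {
    // std::ssize returns a signed type (ptrdiff_t here), so reverse loops
    // and subtractions behave like ordinary integer arithmetic.
    for (auto i = std::ssize(v) - 1; i >= 0; --i) {
        /* use v[i] */
    }
}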