When/where/why is a size_t not a uint? - c++

I've been getting a lot of criticism for using uint instead of size_t, but every time I check, the toolchain I am working with turns out to define size_t as a uint.
Are there any compiler implementations where size_t is actually not a uint? What are the grounds for that criticism?

size_t is the "size matching the largest possible address range you can use in the machine" (or some words to roughly that effect).
In particular, size_t will be 64 bits on a 64-bit machine, and 32 bits on a 32-bit system.
I'm assuming uint is short for unsigned int, which is pretty much universally 32 bits these days (some older systems used 16-bit ints). So on a 64-bit system, an unsigned int will still be 32 bits, although memory allocations, strings, etc. can be larger than 32 bits in size - which would cause problems if you are trying to use uint for the size.
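A quick way to see the mismatch is to print both sizes; a minimal sketch (the numbers in the comments assume a typical 64-bit desktop target):
#include <cstdio>
#include <cstddef>
int main() {
    // On a typical 64-bit build this prints 4 and 8; on a 32-bit build both
    // are usually 4, which is why uint and size_t look interchangeable until
    // you move to a wider platform.
    std::printf("sizeof(unsigned int) = %zu\n", sizeof(unsigned int));
    std::printf("sizeof(size_t)       = %zu\n", sizeof(size_t));
    return 0;
}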

Related

Why are there different names for same type of data unit?

As far as I know, in C++ on a 32-bit compiler, int = __int32 = long = DWORD. But why have so many? Why not just one?
If I were to pick a name, int32 seems most appropriate since there is no confusion there as to what it could be.
int is a pre-C99 type which is guaranteed to be at least 16 bits, but is 32 bits on most modern architectures. (It was originally intended to be the "native" word size, but even on 64-bit architectures it is usually still 32 bits, largely for backwards compatibility reasons.)
long is a pre-C99 type which is guaranteed to be at least 32 bits, but is allowed to be wider. (Few compilers make it longer, even on 64-bit architectures, largely for backwards compatibility reasons.)
__int32/__int32_t is a nonstandard typedef which was implemented by many C compilers and runtime libraries, to guarantee a fixed width pre-C99.
int32_t is a C99 type which is guaranteed to be exactly 32 bits.
DWORD is a typedef from the original Windows API which is guaranteed to be exactly 32 bits, from the days when there was no language-defined type of exactly 32 bits.
So basically, the large number of ways to say "32-bit integer" come from how C dragged its feet on standardizing fixed-width types, and from the long tenure of 32-bit processors dominating the field, causing everyone to standardize on 32 bits as the "normal" integer size.
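If you want to pin the width down today, the C99/C++11 fixed-width types make the intent explicit; a minimal sketch, assuming a hosted C++11 compiler:
#include <cstdint>
#include <climits>
// int32_t is exactly 32 bits wherever it exists (no padding bits allowed).
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
// Plain int is only guaranteed at least 16 bits, even though it is 32 bits
// on most current desktop compilers.
static_assert(sizeof(int) * CHAR_BIT >= 16, "int is at least 16 bits");
int main() {
    std::int32_t fixed = 0;  // exactly 32 bits
    long wide = 0;           // at least 32 bits, possibly more
    return static_cast<int>(fixed + wide);
}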
Because of legacy applications. An int doesn't describe how big it is at all. It's an integer. Big deal.
In the 16-bit era, an int was not a long. DWORD, being a double word, was precise: a word was 2 bytes, and therefore a DWORD must be two of them.
__intXX are Microsoft specific.
So, there are lots of different reasons why different projects (e.g. Microsoft Windows) use different types.
While compilers TODAY are typically 32-bit, this has not always been the case, and there are compilers that are 64-bit.
The term DWORD originates from way back when Windows was a 16-bit segmented mode application (many members here have probably never worked on a 16-bit segmented mode environment). It is "two 16-bit words", treated, at least these days, as an unsigned 32-bit value.
The type int32_t is defined by the C standard document (and through inheritance, also in C++). It is GUARANTEED to exist only if it is actually exactly 32 bits. On a machine with 36-bit words, there is no int32_t (there is an int_least32_t, which should exist on all systems that support AT LEAST 32 bits).
long is 32 bits in a Windows 32- or 64-bit compiler, but 64-bits in a Linux 64-bit compiler, and 32-bits in a Linux 32-bit compiler. So it's definitely "variable size".
It is also often a good idea to pick your OWN names for types. That is assuming you care at all - it's also fine to use int, long, etc., as long as you are not RELYING on them being some particular size - for(i = 0; i < 10; i++) x += i; will work with i and x being any integer type (the sum is 45, well below 128, so even char would work). Using int here will be fine, since it's likely to be a "fast" type. On some architectures, using long may make the code slower - especially on 16-bit architectures, where a long takes up two 16-bit words and needs (typically) two or more operations for addition and subtraction, for example. This can really slow code down in sensitive places.
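As a sketch of that idea, the <cstdint> aliases already provide "fast" and "least" variants you can wrap in your own names (the alias names below are made up for illustration):
#include <cstdint>
// Hypothetical project-local aliases: descriptive names on top of the
// standard "least"/"fast" typedefs, so the intent (range vs speed) is explicit.
using loop_counter = std::int_fast16_t;   // fast type with at least 16 bits
using sample_value = std::int_least32_t;  // smallest type with at least 32 bits
int main() {
    sample_value x = 0;
    for (loop_counter i = 0; i < 10; ++i)
        x += i;                           // sum is 45, fits in any integer type
    return static_cast<int>(x);
}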
It is because they represent different types which can be translated to different sizes.
int is a default 'integer' and its size is not specified.
'int32' says it is 32 bits (a four-byte integer).
long is a 'longer' integer which can occupy a larger number of bytes. On your 32-bit compiler it is still a 4-byte integer. A 'long long' type (which on Windows, as I remember, was __int64) is 64 bits.
DWORD is a Microsoft-introduced type. It is a 'double word', where a word, at that time, meant 'two bytes'.
Your choice of int32 is good when you know that you need a 32-bit integer.

Why does a long integer take more than 4 bytes on some systems?

I understand that the standard says that the size of a long integer is implementation dependant, but I am not sure why.
All it needs to do is to be able to store -2147483647 to 2147483647 or 0 to 4294967295.
Assuming that 1 byte is 8 bits, this should never need more than 4 bytes. Is it safe to say, then, that a long integer will take more than 4 bytes only if a byte has less than 8 bits? Or could there be other possibilities as well? Like maybe inefficient implementations wasting space?
An obvious use for a long larger than 32 bits is to have a larger range available.
For example, before long long int (and company) were in the standard, DEC was selling 64-bit (Alpha) processors and a 64-bit operating system. They built a (conforming) system with:
char = 1 byte
short = 2 bytes
int = 4 bytes
long = 8 bytes
As to why they'd do this: well, an obvious reason was so their customers would have access to a 64-bit type and take advantage of their 64-bit hardware.
The extra bytes aren't a waste of space. A larger range is quite useful. The standard specifies minimum ranges, not the precise range itself; there's nothing wrong with having wider types.
When the standard originally specified an int should be at least 16 bits, common processors had registers no larger than that. Representing a long took two registers and special operations!
But then 32 bits became the norm, and now ints are 32 bits everywhere and longs are 64. Nowadays most processors have 64-bit instructions, and a long can often be stored in a single register.
You're assuming quite a few things, most notably that a byte is always 8 bits wide; the standard only guarantees that a byte is CHAR_BIT bits and that CHAR_BIT is at least 8.
The PDP-10 had bytes ranging from 1 to 36 bits. The DEC VAX supported operations on 128-bit integer types. So there's plenty of reason to go over and above what the standard mandates.
The limits for data types are given in §3.9.1/8
Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.
Look up the <limits> header.
This article by Jack Klein may be of interest to you!
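A minimal sketch of querying those limits through std::numeric_limits:
#include <limits>
#include <cstdio>
int main() {
    // numeric_limits reports the range the implementation actually provides,
    // which may exceed the minimums required by the standard.
    std::printf("int:  %d .. %d\n",
                std::numeric_limits<int>::min(),
                std::numeric_limits<int>::max());
    std::printf("long: %ld .. %ld\n",
                std::numeric_limits<long>::min(),
                std::numeric_limits<long>::max());
    return 0;
}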
If you want an integer of a specific size, then you want to use the types with the size specified:
int8_t
int16_t
int32_t
int64_t
int128_t
...
These are available from <stdint.h> in C99 (in C++ it is <cstdint>); the exact-width types are optional, so not every width exists on every platform.
You get the unsigned versions by adding a u at the beginning (uint32_t).
The others already answered why the size would be so and so.
Note that the newest Intel processors support numbers of 256 bits too. What a waste, hey?! 8-)
Oh! And time_t is starting to use 64 bits too. In 2038, a signed 32-bit time_t will overflow and wrap around to a date back in December 1901... That's a good reason to adopt 64 bits for a few things.
One reason for using an 8-byte integer is to be able to address more than 4 gigs of memory. I.e. 2^32 = 4 gigabytes. 2^64 = well, it's a lot!
Personally, I've used 8 byte ints for implementing a radix sort on double floats (casting the floats as ints then doing magical things with it that aren't worth describing here. :))

Long Vs. Int C/C++ - What's The Point?

As I've learned recently, a long in C/C++ is the same length as an int. To put it simply, why? It seems almost pointless to even include the datatype in the language. Does it have any uses specific to it that an int doesn't have? I know we can declare a 64-bit int like so:
long long x = 0;
But why does the language choose to do it this way, rather than just making a long well...longer than an int? Other languages such as C# do this, so why not C/C++?
When writing in C or C++, every data type is architecture and compiler specific. On one system int is 32 bits, but you can find ones where it is 16 or 64; it's not defined, so it's up to the compiler.
As for long and int, it comes from the days when the standard integer was 16 bits and long was a 32-bit integer - and it indeed was longer than int.
The specific guarantees are as follows:
char is at least 8 bits (1 byte by definition, however many bits it is)
short is at least 16 bits
int is at least 16 bits
long is at least 32 bits
long long (in versions of the language that support it) is at least 64 bits
Each type in the above list is at least as wide as the previous type (but may well be the same).
Thus it makes sense to use long if you need a type that's at least 32 bits, int if you need a type that's reasonably fast and at least 16 bits.
Actually, at least in C, these lower bounds are expressed in terms of ranges, not sizes. For example, the language requires that INT_MIN <= -32767, and INT_MAX >= +32767. The 16-bit requirement follows from this and from the requirement that integers are represented in binary.
C99 adds <stdint.h> and <inttypes.h>, which define types such as uint32_t, int_least32_t, and int_fast16_t; these are typedefs, usually defined as aliases for the predefined types.
(There isn't necessarily a direct relationship between size and range. An implementation could make int 32 bits, but with a range of only, say, -2^23 .. +2^23-1, with the other 8 bits (called padding bits) not contributing to the value. It's theoretically possible (but practically highly unlikely) that int could be larger than long, as long as long has at least as wide a range as int. In practice, few modern systems use padding bits, or even representations other than 2's-complement, but the standard still permits such oddities. You're more likely to encounter exotic features in embedded systems.)
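Those minimum ranges can be turned into compile-time checks against <climits>; a small sketch (these asserts hold on any conforming C++11 implementation):
#include <climits>
// The standard's minimum ranges, expressed as compile-time checks.
static_assert(SHRT_MAX >= 32767, "short covers at least a 16-bit range");
static_assert(INT_MAX >= 32767, "int covers at least a 16-bit range");
static_assert(LONG_MAX >= 2147483647L, "long covers at least a 32-bit range");
static_assert(LLONG_MAX >= 9223372036854775807LL, "long long covers at least a 64-bit range");
int main() { return 0; }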
long is not the same length as an int. According to the specification, long is at least as large as int. For example, on Linux x86_64 with GCC, sizeof(long) = 8, and sizeof(int) = 4.
long is not the same size as int, it is at least the same size as int. To quote the C++03 standard (3.9.1-2):
There are four signed integer types: “signed char”, “short int”, “int”, and “long int.” In this list, each type provides at least as much storage as those preceding it in the list. Plain ints have the natural size suggested by the architecture of the execution environment; the other signed integer types are provided to meet special needs.
My interpretation of this is "just use int, but if for some reason that doesn't fit your needs and you are lucky to find another integral type that's better suited, be our guest and use that one instead". One way that long might be better is if you're on an architecture where it is... longer.
I was looking for something completely unrelated and stumbled across this and needed to answer. Yeah, this is old, so for people who surf on in later...
Frankly, I think all the answers on here are incomplete.
The size of a long is the number of bits your processor can operate on at one time. It's also called a "word". A "half-word" is a short. A "doubleword" is a long long and is twice as large as a long (and originally was only implemented by vendors and not standard), and even bigger than a long long is a "quadword", which is twice the size of a long long but had no formal name (and is not really standard).
Now, where does the int come in? In part registers on your processor, and in part your OS. Your registers define the native sizes the CPU handles which in turn define the size of things like the short and long. Processors are also designed with a data size that is the most efficient size for it to operate on. That should be an int.
On today's 64-bit machines you'd assume, since a long is a word and a word on a 64-bit machine is 64 bits, that a long would be 64 bits and an int whatever the processor is designed to handle, but it might not be. Why? Your OS has chosen a data model and defined these data sizes for you (pretty much by how it's built). Ultimately, if you're on Windows (and using Win64) it's 32 bits for both long and int. Solaris and Linux use different definitions (long is 64 bits). These definitions are called things like ILP64, LP64, and LLP64. Windows uses LLP64 and Solaris and Linux use LP64:
Model      ILP64   LP64   LLP64
int        64      32     32
long       64      64     32
pointer    64      64     64
long long  64      64     64
Where, e.g., ILP means int-long-pointer, and LLP means long-long-pointer
To get around this most compilers seem to support setting the size of an integer directly with types like int32 or int64.
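A small sketch to find out which data model a given toolchain uses, by printing the sizes from the table above:
#include <cstdio>
int main() {
    // LLP64 (Win64):        int 4, long 4, pointer 8, long long 8
    // LP64 (Linux/Solaris): int 4, long 8, pointer 8, long long 8
    std::printf("int:       %zu\n", sizeof(int));
    std::printf("long:      %zu\n", sizeof(long));
    std::printf("pointer:   %zu\n", sizeof(void *));
    std::printf("long long: %zu\n", sizeof(long long));
    return 0;
}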

What's the difference between size_t and int in C++?

In several C++ examples I see a use of the type size_t where I would have used a simple int. What's the difference, and why size_t should be better?
From the friendly Wikipedia:
The stdlib.h and stddef.h header files define a datatype called size_t which is used to represent the size of an object. Library functions that take sizes expect them to be of type size_t, and the sizeof operator evaluates to size_t.
The actual type of size_t is platform-dependent; a common mistake is to assume size_t is the same as unsigned int, which can lead to programming errors, particularly as 64-bit architectures become more prevalent.
Also, check Why size_t matters
size_t is the type used to represent sizes (as its name implies). It's platform-dependent (and even potentially implementation-dependent), and should be used only for this purpose. Obviously, representing a size, size_t is unsigned. Many standard library functions, including malloc and various string functions, take or return size_t, and the sizeof operator yields one.
An int is signed by default, and even though its size is also platform-dependent, it will be a fixed 32 bits on most modern machines (and though size_t is 64 bits on 64-bit architectures, int remains 32 bits on those architectures).
To summarize: use size_t to represent the size of an object and int (or long) in other cases.
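A typical place the distinction shows up is container indexing; a minimal sketch:
#include <vector>
#include <cstddef>
int main() {
    std::vector<int> v(1000, 1);
    long long sum = 0;
    // v.size() returns the container's size_type, which is size_t here; using
    // size_t for the index avoids signed/unsigned comparison warnings and
    // cannot overflow before the container size does.
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum == 1000 ? 0 : 1;
}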
The size_t type is defined as the unsigned integral type of the sizeof operator. In the real world, you will often see int defined as 32 bits (for backward compatibility) but size_t defined as 64 bits (so you can declare arrays and structures more than 4 GiB in size) on 64-bit platforms. If a long int is also 64-bits, this is called the LP64 convention; if long int is 32 bits but long long int and pointers are 64 bits, that’s LLP64. You also might get the reverse, a program that uses 64-bit instructions for speed, but 32-bit pointers to save memory. Also, int is signed and size_t is unsigned.
There were historically a number of other platforms where addresses were wider or shorter than the native size of int. In fact, in the ’70s and early ’80s, this was more common than not: all the popular 8-bit microcomputers had 8-bit registers and 16-bit addresses, and the transition between 16 and 32 bits also produced many machines that had addresses wider than their registers. I occasionally still see questions here about Borland Turbo C for MS-DOS, whose Huge memory mode had 20-bit addresses stored in 32 bits on a 16-bit CPU (but which could support the 32-bit instruction set of the 80386); the Motorola 68000 had a 16-bit ALU with 32-bit registers and addresses; there were IBM mainframes with 15-bit, 24-bit or 31-bit addresses. You also still see different ALU and address-bus sizes in embedded systems.
Any time int is smaller than size_t, and you try to store the size or offset of a very large file or object in an unsigned int, there is the possibility that it could overflow and cause a bug. With an int, there is also the possibility of getting a negative number. If an int or unsigned int is wider, the program will run correctly but waste memory.
You should generally use the correct type for the purpose if you want portability. A lot of people will recommend that you use signed math instead of unsigned (to avoid nasty, subtle bugs like 1U < -3). For that purpose, the standard library defines ptrdiff_t in <stddef.h> as the signed type of the result of subtracting a pointer from another.
That said, a workaround might be to bounds-check all addresses and offsets against INT_MAX and either 0 or INT_MIN as appropriate, and turn on the compiler warnings about comparing signed and unsigned quantities in case you miss any. You should always, always, always be checking your array accesses for overflow in C anyway.
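A small sketch of both pitfalls mentioned above - the signed/unsigned comparison trap and ptrdiff_t as the signed counterpart of size_t:
#include <cstddef>
#include <cstdio>
int main() {
    // -3 is converted to unsigned for the comparison, becoming a huge value,
    // so this condition is true -- the kind of subtle bug the warning catches.
    if (1U < -3)
        std::puts("surprising, but true");
    int a[10];
    int *first = a;
    int *last = a + 10;                 // one-past-the-end pointer
    std::ptrdiff_t n = last - first;    // pointer subtraction yields ptrdiff_t
    std::printf("elements: %td\n", n);  // prints 10
    return 0;
}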
It's because size_t need not be the same type as int; it is whatever unsigned integer type the implementation picks for representing sizes. The idea is that it decouples its job from the underlying type.
The definition of SIZE_T is found at:
https://msdn.microsoft.com/en-us/library/cc441980.aspx and https://msdn.microsoft.com/en-us/library/cc230394.aspx
Pasting here the required information:
SIZE_T is a ULONG_PTR representing the maximum number of bytes to which a pointer can point.
This type is declared as follows:
typedef ULONG_PTR SIZE_T;
A ULONG_PTR is an unsigned long type used for pointer precision. It is used when casting a pointer to a long type to perform pointer arithmetic.
This type is declared as follows:
typedef unsigned __int3264 ULONG_PTR;

unsigned int vs. size_t

I notice that modern C and C++ code seems to use size_t instead of int/unsigned int pretty much everywhere - from parameters for C string functions to the STL. I am curious as to the reason for this and the benefits it brings.
The size_t type is the unsigned integer type that is the result of the sizeof operator (and the offsetof operator), so it is guaranteed to be big enough to contain the size of the biggest object your system can handle (e.g., a static array of 8Gb).
The size_t type may be bigger than, equal to, or smaller than an unsigned int, and your compiler might make assumptions about it for optimization.
You may find more precise information in the C99 standard, section 7.17, a draft of which is available on the Internet in pdf format, or in the C11 standard, section 7.19, also available as a pdf draft.
Classic C (the early dialect of C described by Brian Kernighan and Dennis Ritchie in The C Programming Language, Prentice-Hall, 1978) didn't provide size_t. The C standards committee introduced size_t to eliminate a portability problem.
Explained in detail at embedded.com (with a very good example)
In short, size_t is never negative, and it maximizes performance because it's typedef'd to be the unsigned integer type that's big enough -- but not too big -- to represent the size of the largest possible object on the target platform.
Sizes should never be negative, and indeed size_t is an unsigned type. Also, because size_t is unsigned, you can store numbers roughly twice as big as in the corresponding signed type, because the sign bit is used to represent magnitude like all the other bits in an unsigned integer. Gaining one more bit multiplies the range of numbers we can represent by a factor of about two.
So, you ask, why not just use an unsigned int? It may not be able to hold big enough numbers. On an implementation where unsigned int is 32 bits, the biggest number it can represent is 4294967295; on an I16LP32 platform (16-bit int, 32-bit long and pointers), unsigned int tops out at 65535 even though the machine can have objects larger than that.
So, you ask, why not use an unsigned long int? It exacts a performance toll on some platforms. Standard C requires that a long occupy at least 32 bits. An I16LP32 platform implements each 32-bit long as a pair of 16-bit words. Almost all 32-bit operators on these platforms require two instructions, if not more, because they work with the 32 bits in two 16-bit chunks. For example, moving a 32-bit long usually requires two machine instructions -- one to move each 16-bit chunk.
Using size_t avoids this performance toll. According to this fantastic article, "Type size_t is a typedef that's an alias for some unsigned integer type, typically unsigned int or unsigned long, but possibly even unsigned long long. Each Standard C implementation is supposed to choose the unsigned integer that's big enough--but no bigger than needed--to represent the size of the largest possible object on the target platform."
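That is the pattern the standard library itself follows; as a sketch (copy_bytes is just an illustrative stand-in, not the real library function), a memcpy-style routine takes its length as size_t so it can describe any object the platform can allocate:
#include <cstddef>
// A simplified byte-copy with the same shape as the standard memcpy signature:
// the count parameter is size_t, so it can express the size of any object.
void *copy_bytes(void *dest, const void *src, std::size_t n) {
    unsigned char *d = static_cast<unsigned char *>(dest);
    const unsigned char *s = static_cast<const unsigned char *>(src);
    for (std::size_t i = 0; i < n; ++i)
        d[i] = s[i];
    return dest;
}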
The size_t type is the type returned by the sizeof operator. It is an unsigned integer capable of expressing the size in bytes of any memory range supported on the host machine. It is (typically) related to ptrdiff_t in that ptrdiff_t is a signed integer value such that sizeof(ptrdiff_t) and sizeof(size_t) are equal.
When writing C code you should always use size_t whenever dealing with memory ranges.
The int type on the other hand is basically defined as the size of the (signed) integer value that the host machine can use to most efficiently perform integer arithmetic. For example, on many older PC-type computers the value sizeof(size_t) would be 4 (bytes) but sizeof(int) would be 2 (bytes). 16-bit arithmetic was faster than 32-bit arithmetic, though the CPU could handle a (logical) memory space of up to 4 GiB.
Use the int type only when you care about efficiency as its actual precision depends strongly on both compiler options and machine architecture. In particular the C standard specifies the following invariants: sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) placing no other limitations on the actual representation of the precision available to the programmer for each of these primitive types.
Note: This is NOT the same as in Java (which actually specifies the bit precision for each of the types 'char', 'byte', 'short', 'int' and 'long').
Type size_t must be big enough to store the size of any possible object. Unsigned int doesn't have to satisfy that condition.
For example, on 64-bit systems int and unsigned int may be 32 bits wide, but size_t must be big enough to store numbers bigger than 4G.
This excerpt from the glibc manual 0.02 may also be relevant when researching the topic:
There is a potential problem with the size_t type and versions of GCC prior to release 2.4. ANSI C requires that size_t always be an unsigned type. For compatibility with existing systems' header files, GCC defines size_t in `stddef.h' to be whatever type the system's `sys/types.h' defines it to be. Most Unix systems that define size_t in `sys/types.h' define it to be a signed type. Some code in the library depends on size_t being an unsigned type, and will not work correctly if it is signed.
The GNU C library code which expects size_t to be unsigned is correct. The definition of size_t as a signed type is incorrect. We plan that in version 2.4, GCC will always define size_t as an unsigned type, and the `fixincludes' script will massage the system's `sys/types.h' so as not to conflict with this.
In the meantime, we work around this problem by telling GCC explicitly to use an unsigned type for size_t when compiling the GNU C library. `configure' will automatically detect what type GCC uses for size_t and arrange to override it if necessary.
If my compiler is set to 32 bit, size_t is nothing other than a typedef for unsigned int. If my compiler is set to 64 bit, size_t is nothing other than a typedef for unsigned long long.
size_t is the size of a pointer.
So in 32 bits, or the common ILP32 (integer, long, pointer) model, size_t is 32 bits,
and in 64 bits, or the common LP64 (long, pointer) model, size_t is 64 bits (integers are still 32 bits).
There are other models, but these are the ones that g++ uses (at least by default).
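As a closing sanity check - on the ILP32 and LP64/LLP64 models described above, size_t and a data pointer happen to be the same width, though the standard doesn't require it; a minimal sketch:
#include <cstdio>
#include <cstddef>
int main() {
    // Prints 4 and 4 under ILP32, 8 and 8 under LP64/LLP64 -- equal on these
    // common models, but not something the language guarantees in general.
    std::printf("sizeof(size_t) = %zu\n", sizeof(std::size_t));
    std::printf("sizeof(void*)  = %zu\n", sizeof(void *));
    return 0;
}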