Where do I find the definition of size_t? (C++)

I see variables defined with this type, but I don't know where it comes from or what its purpose is. Why not use int or unsigned int? (What about other "similar" types, like void_t?)

From Wikipedia
The stdlib.h and stddef.h header files define a datatype called size_t which is used to represent the size of an object. Library functions that take sizes expect them to be of type size_t, and the sizeof operator evaluates to size_t.
The actual type of size_t is platform-dependent; a common mistake is to assume size_t is the same as unsigned int, which can lead to programming errors, particularly as 64-bit architectures become more prevalent.
From C99 7.17.1/2
The following types and macros are defined in the standard header stddef.h
<snip>
size_t
which is the unsigned integer type of the result of the sizeof operator

According to the size_t description on en.cppreference.com, size_t is defined in the following headers:
std::size_t
...
Defined in header <cstddef>
Defined in header <cstdio>
Defined in header <cstring>
Defined in header <ctime>
Defined in header <cwchar>

size_t is the unsigned integer type of the result of the sizeof operator (ISO C99 Section 7.17.)
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t (ISO C99 Section 6.5.3.4.)
IEEE Std 1003.1-2017 (POSIX.1) specifies that size_t be defined in the header sys/types.h, whereas ISO C specifies the header stddef.h. In ISO C++, the type std::size_t is defined in the standard header cstddef.

Practically speaking, size_t represents the number of bytes you can address. On most modern architectures of the last 10-15 years that has been 32 bits, which has also been the size of unsigned int. However, we are moving to 64-bit addressing while unsigned int will most likely stay at 32 bits (its size is not guaranteed by the C++ standard). To make code that depends on the memory size portable across architectures, you should use size_t. For example, array sizes should always be held in a size_t. If you look at the standard containers, their size() members return size_type, which is typically a typedef for size_t.
Also note, Visual Studio has a compile option that can check for these types of errors, called "Detect 64-bit Portability Issues".
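To illustrate, a minimal sketch of that advice (the variable names are mine):

#include <cstddef>
#include <vector>

int main() {
    std::vector<int> v(1000, 0);
    // size() returns the container's size_type (typically std::size_t);
    // indexing with std::size_t avoids signed/unsigned and width warnings.
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = static_cast<int>(i);
    return 0;
}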

This way you always know what the size is, because a specific type is dedicated to sizes. The question itself shows that this can be an issue: is it an int or an unsigned int? Also, what is the magnitude (short, int, long, etc.)?
Because there is a specific type assigned, you don't have to worry about the length or the signed-ness.
The actual definition can be found in the C++ Reference Library, which says:
Type: size_t (Unsigned integral type)
Header: <cstring>
size_t corresponds to the integral data type returned by the language operator sizeof and is defined in the <cstring> header file (among others) as an unsigned integral type.
In <cstring>, it is used as the type of the parameter num in the functions memchr, memcmp, memcpy, memmove, memset, strncat, strncmp, strncpy and strxfrm, where in each case it specifies the maximum number of bytes or characters the function has to affect.
It is also used as the return type for strcspn, strlen, strspn and strxfrm to return sizes and lengths.

size_t should be defined in your standard library's headers. In my experience, it usually is simply a typedef to unsigned int. The point, though, is that it doesn't have to be.
Types like size_t allow the standard library vendor the freedom to change its underlying data types if appropriate for the platform. If you assume size_t is always unsigned int (via casting, etc), you could run into problems in the future if your vendor changes size_t to be e.g. a 64-bit type. It is dangerous to assume anything about this or any other library type for this reason.
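A small sketch of the failure mode, assuming a target where size_t is 64 bits wide and unsigned int is 32:

#include <cstddef>
#include <cstdio>

int main() {
    // On a platform where size_t is 64 bits, this value does not fit
    // in a 32-bit unsigned int; the cast silently drops the high bits.
    std::size_t big = static_cast<std::size_t>(1) << 33;
    unsigned int narrowed = static_cast<unsigned int>(big);
    std::printf("size_t value: %zu, after cast: %u\n", big, narrowed);
    return 0;
}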

I'm not familiar with void_t except as a result of a Google search (it's used in a vmalloc library by Kiem-Phong Vo at AT&T Research - I'm sure it's used in other libraries as well).
The various xxx_t typedefs are used to abstract a type away from any particular concrete implementation, since the concrete types used for certain things might differ from one platform to another. For example:
size_t abstracts the type used to hold the size of objects because on some systems this will be a 32-bit value, on others it might be 16-bit or 64-bit.
void_t abstracts the type of pointer returned by the vmalloc library routines, because it was written to work on systems that pre-date ANSI/ISO C, where the void keyword might not exist. At least that's what I'd guess.
wchar_t abstracts the type used for wide characters, since on some systems it will be a 16-bit type and on others a 32-bit type.
So if you write your wide-character handling code to use the wchar_t type instead of, say, unsigned short, that code will presumably be more portable across platforms, as in the sketch below.
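A tiny sketch of that point (wide_len is just an illustrative name):

#include <cstddef>
#include <cwchar>

// Written against wchar_t, this works whether the platform's wide
// character is 16 or 32 bits; written against unsigned short, it would not.
std::size_t wide_len(const wchar_t* s) {
    return std::wcslen(s);
}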

In minimalistic programs where a size_t definition has not been pulled in "by chance" by some include, but I still need it in some context (for example, to access a std::vector<double>), I use that context to extract the correct type, e.g. typedef std::vector<double>::size_type size_t.
(Surround it with namespace {...} if necessary to limit the scope.)
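A sketch of that trick; I've named the alias size_type rather than size_t so it can't clash with a ::size_t that another header may have already introduced, and the detail namespace name is mine:

#include <vector>

namespace detail {
    // Derive the size type from a context already in use, instead of
    // including a header just for std::size_t.
    typedef std::vector<double>::size_type size_type;
}

int main() {
    std::vector<double> samples(100, 0.0);
    detail::size_type n = samples.size();
    return n == 100 ? 0 : 1;
}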

As for "Why not use int or unsigned int?", simply because it's semantically more meaningful not to. There's the practical reason that it can be, say, typedefd as an int and then upgraded to a long later, without anyone having to change their code, of course, but more fundamentally than that a type is supposed to be meaningful. To vastly simplify, a variable of type size_t is suitable for, and used for, containing the sizes of things, just like time_t is suitable for containing time values. How these are actually implemented should quite properly be the implementation's job. Compared to just calling everything int, using meaningful typenames like this helps clarify the meaning and intent of your program, just like any rich set of types does.

Related

What do I declare message length as? int or char?

I am relatively new to coding and would like to know how to declare the message length (mlen) and ciphertext length (clen) in my C++ code. However, I am not too sure what I am supposed to declare them as. (int? char? unsigned long long?)
The formula that was given to me to include in my code is:
*clen = mlen + CRYPTO_ABYTES
Information that was given to me is:
It operates on a state of 320-bits with message blocks of 64 bits.
UPDATE: Sorry for the bad question. I realized I was given unsigned long long for the message length; it was written in a smaller font, so I did not notice it.
If there is no strict requirement regarding how big your type has to be, you should use the native C++ type designed to represent sizes:
std::size_t
The main advantages of std::size_t over types such as unsigned int, unsigned short, etc. are:
It is capable of holding the maximum possible size of any C++ type, including arrays (since it's defined as the result type of the sizeof operator).
As such, its width typically matches your architecture's address space (e.g. 64 bits on a platform with 64-bit addressing).
It is typically used as the type to represent indices and lengths in new[], std::vector, and large parts of the STL in general. As you are likely going to use such data structures and algorithms, you can avoid useless conversions between different types, making your code more readable, without compiler warnings.
The type std::size_t in your code tells you something about its semantics (indices and sizes; unsigned implied). Other integer types such as unsigned int can be used to represent anything.
For additional information, check out cppreference.com.
On the other hand, if you need to pack your data tightly (e.g. for network protocol), you should use the smallest possible type such as std::uint16_t, std::uint32_t etc.
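A minimal sketch of how that might look with std::size_t; compute_clen is an illustrative name and the CRYPTO_ABYTES value here is made up (the asker's actual API turned out to use unsigned long long, but the shape is the same):

#include <cstddef>

// Hypothetical constant standing in for the question's CRYPTO_ABYTES.
const std::size_t CRYPTO_ABYTES = 16;

// Mirrors the given formula: *clen = mlen + CRYPTO_ABYTES
void compute_clen(std::size_t mlen, std::size_t* clen) {
    *clen = mlen + CRYPTO_ABYTES;
}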

Content within types.h — where does the compiler define the width of int, signed int and others?

I read both /usr/include/bits/types.h and /usr/include/sys/types.h, but I only see them use "unsigned int" or "signed int" to define other, relatively rarely used types for us, e.g.:
typedef signed char __int8_t;
...
typedef signed int __int32_t;
or:
#define __S32_TYPE int
#define __U32_TYPE unsigned int;
As to "where is the signed int (or int) originally defined?" and "in which file, gcc decide the int should be 4 bytes width in my x86-64 server"? I cannot find anything.
I am wondering the process in which the gcc/g++ compiler define these primitive type for us, and want to see the originally definition file.
Please tell me the originally position or enlighten me about some method to find them.
int, unsigned int, long, and some others are built-in types; they are defined by the compiler itself. The standard places some demands on those types, for instance int must be at least 16 bits, but the compiler may make it wider. Usually int is the most efficient integral type of at least 16 bits.
You should not rely on the actual size of int; if you need it to hold more than 32767, stick to long or long long. If you need an integral type of an exact number of bits because of its overflow behavior, you can use the uint16_t/uint32_t types. If you only want to make sure there is at least a certain number of bits, you can also use uint_fast16_t/uint_fast32_t.
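A small sketch of the difference, assuming nothing beyond <cstdint>:

#include <cstdint>
#include <cstdio>

int main() {
    std::uint16_t counter = 65535;  // exactly 16 bits: wrap-around is well-defined
    counter += 1;                   // wraps to 0
    std::uint_fast16_t fast = 0;    // at least 16 bits, whatever is fastest here
    std::printf("counter after wrap: %u, fast type: %zu bytes\n",
                static_cast<unsigned>(counter), sizeof(fast));
    return 0;
}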
The basic types are intrinsic to the compiler; they are built in when the compiler is compiled, and are not defined anywhere you can easily find them. (Somewhere in the code there is the relevant information, but it won't be particularly easy to find.)
Thus, you won't find the information in a header directly. You can get the size information from the sizeof() operator. You can infer sizes from the macros in <limits.h> and <float.h>.
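For example, a minimal sketch that queries the widths instead of hunting for them in headers:

#include <climits>
#include <cstdio>

int main() {
    // The widths are baked into the compiler itself; query them
    // rather than looking for a header that spells them out.
    std::printf("int:  %zu bytes, INT_MAX  = %d\n", sizeof(int), INT_MAX);
    std::printf("long: %zu bytes, LONG_MAX = %ld\n", sizeof(long), LONG_MAX);
    return 0;
}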

What are the usage differences between size_t and off_t?

Other than the size of the values that each type can hold, what are the main differences in usage between size_t and off_t? Is it just a convention that size_t types are used for absolute sizes and off_t types are used for offsets? Or does it go deeper than that?
I am writing a wrapper class to enable the writing of large files using mmap and I want to know what the best types are to use for their arguments. Given that I want to write to files > 4GB, I'm tempted to use size_t for everything, but is that the best practice? (or should I be using some off64_t types for certain functions?)
For example, should my writeAt function be declared as:
MMapWriter::writeAt(off64_t offset, const void* src, size_t size)
or
MMapWriter::writeAt(size_t offset, const void* src, size_t size)
size_t is for objects, off_t is for files.
mmap merges the two concepts, pretty much by definition. Personally I think I'd use size_t, since no matter what else it is, a mapped file is also an array in (virtual) memory.
size_t is standard C++, off_t is POSIX, and off64_t is a GNU extension that goes with the functions fopen64, ftello64, etc. I think it should always be the same type as off_t on 64-bit GNU systems, but don't bet your company on that without checking.
Should it be relevant, off_t is signed whereas size_t is unsigned. But the signed counterpart to size_t is ptrdiff_t, so when you need a signed type it doesn't automatically mean you should use off_t or off64_t.
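A minimal sketch of that signed/unsigned distinction:

#include <cstddef>

int main() {
    char buf[16];
    char* a = buf;
    char* b = buf + 10;
    std::ptrdiff_t back = a - b;   // signed: -10, a "backwards" distance
    std::size_t len = sizeof buf;  // unsigned: an object size
    return (back == -10 && len == 16) ? 0 : 1;
}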
size_t is part of the C++ (and C) standards, and refers to the type of a sizeof expression. off_t is defined by the Posix standard, and refers to the size of a file.
A good rule of thumb for this scenario: use whichever function signature results in the least amount of explicit casting, be it C-style or C++-style. If you have to cast, C++-style casts are safer, as each is more limited in the kinds of conversion it can perform.
The benefit of this is that if you port to a platform where the types don't match up (whether it be a signedness, size, or endianness issue, etc.), you should catch most of the bugs at compile time rather than at runtime. Casting is the sledgehammer approach for squeezing a triangular object into a round hole (you're more or less telling the compiler to keep quiet, you know what you're doing).
Trying to find casting issues at runtime can be a pain as it can be hard to reproduce. It's better to find issues at compile-time rather than runtime. The compiler is your friend.

Getting a pointer to a 4-byte object.. in an implementation independent way

I was programming normally when I realized that it's probably not perfectly safe to assume an int* is going to be a pointer to something 4 bytes in length.
Because some of the aspects of C++'s fundamental types, such as the size of an int, are implementation-defined.
What if you're dealing with something (a waveform, for example) that has 32-bit signed integer samples? You cast the byte pointer to (int*) and deal with it one sample at a time.
I'm just curious what the "safe way" is to acquire a 4-byte pointer, one that ISN'T going to stop working if sometime in the future MSVC decides int is now 8 bytes.
There is a C99 header called stdint.h your compiler might have. It defines types like uint32_t, an unsigned 32-bit integer.
Since C++11, your compiler is required to have this header. You should include it with #include <cstdint>.
If not, check out Boost Integer, which mimics this header as <boost/cstdint.hpp>.
For storing pointers as integers, use intptr_t, defined in the same header.
Use a pointer to uint32_t instead of int.
This type (and others) is defined in stdint.h and is part of the C99 standard.
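A sketch of the waveform scenario using std::int32_t; halve_samples is an illustrative name, and it assumes the buffer is suitably aligned and holds samples in native byte order:

#include <cstddef>
#include <cstdint>

// Halve each 32-bit signed sample in a raw byte buffer, as in the
// waveform scenario described above.
void halve_samples(unsigned char* bytes, std::size_t nbytes) {
    std::int32_t* samples = reinterpret_cast<std::int32_t*>(bytes);
    std::size_t n = nbytes / sizeof(std::int32_t);
    for (std::size_t i = 0; i < n; ++i)
        samples[i] /= 2;  // int32_t stays 32 bits even if int ever changes
}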
One way I've seen it done is abstracting out the size with precompiler directives and typedefs. Then you use the abstracted types which will be correct for the set of systems you want to support.
Perhaps you could just use an assert on sizeof(int), so that at least you'll know if your assumptions are violated in the future.
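One way to realize that suggestion:

// A compile-time version of the assert (static_assert is C++11);
// it fails the build instead of firing at runtime.
static_assert(sizeof(int) == 4, "this code assumes a 4-byte int");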
By far the easiest solution is to get a char* to a char[4]. On each and every platform, char[4] is a 4-byte object. For an entire waveform, you might need a char[4*512].

guidelines on usage of size_t and offset_t?

This is probably a C++ 101 question: I'm curious what the guidelines are for using size_t and offset_t, e.g. what situations they are intended for, what situations they are not intended for, etc. I haven't done a lot of portable programming, so I have typically just used something like int or unsigned int for array sizes, indexes, and the like. However, I gather it's preferable to use some of these more standard typedefs when possible, so I'd like to know how to do that properly.
As a follow-up question, for development on Windows using Visual Studio 2008, where should I look to find the actual typedefs? I've found size_t defined in a number of headers within the VS installation directory, so I'm not sure which of those I should use, and I can't find offset_t anywhere.
You are probably referring to off_t, not offset_t. off_t is a POSIX type, not a C type; it is used to denote file offsets (allowing 64-bit file offsets even on 32-bit systems). Standard C's closest equivalent is fpos_t, used for file positions.
size_t is meant to count bytes or array elements. It matches the address space.
Instead of offset_t do you mean ptrdiff_t? This is the type returned by such routines as std::distance. My understanding is that size_t is unsigned (to match the address space as previously mentioned) whereas ptrdiff_t is signed (to theoretically denote "backwards" distances between pointers, though this is very rarely used).
offset_t isn't mentioned at all in my copy of the C++ standard.
size_t, on the other hand, is meant simply to denote object or array sizes. A size_t is guaranteed to be big enough to store the size of any in-memory object, so it should be used for that.
You use size_t whenever the C++ language specification indicates that it is used in a particular expression. For example, you'd use size_t to store the return value of sizeof, or to represent the number of elements in an array (new[] takes size_t).
I've no idea what offset_t is - it's not mentioned once in the ISO C++ spec.
Dragging an old one back up... offset_t (which is a long long) is used by Solaris' llseek() system call, and is to be used anywhere you're likely to run into real 64-bit file offsets... meaning you're working with really huge files.