This is probably a C++ 101 question: I'm curious what the guidelines are for using size_t and offset_t, e.g. what situations they are intended for, what situations they are not intended for, etc. I haven't done a lot of portable programming, so I have typically just used something like int or unsigned int for array sizes, indexes, and the like. However, I gather it's preferable to use some of these more standard typedefs when possible, so I'd like to know how to do that properly.
As a follow-up question, for development on Windows using Visual Studio 2008, where should I look to find the actual typedefs? I've found size_t defined in a number of headers within the VS installation directory, so I'm not sure which of those I should use, and I can't find offset_t anywhere.
You are probably referring to off_t, not offset_t. off_t is a POSIX type, not a standard C type, and it is used to denote file offsets (allowing 64-bit file offsets even on 32-bit systems). The closest thing standard C offers is fpos_t, though unlike off_t it is an opaque type rather than an arithmetic one.
size_t is meant to count bytes or array elements. It matches the address space.
Instead of offset_t do you mean ptrdiff_t? This is the type returned by such routines as std::distance. My understanding is that size_t is unsigned (to match the address space as previously mentioned) whereas ptrdiff_t is signed (to theoretically denote "backwards" distances between pointers, though this is very rarely used).
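A minimal sketch of where each type naturally shows up (the array and the indices are arbitrary):

#include <cstddef>

int main() {
    int arr[10] = {};
    std::size_t count = sizeof(arr) / sizeof(arr[0]); // sizes and counts: unsigned
    std::ptrdiff_t diff = &arr[2] - &arr[7];          // pointer difference: signed, -5 here
    (void)count; (void)diff;
    return 0;
}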
offset_t isn't mentioned at all in my copy of the C++ standard.
size_t on the other hand, is meant simply to denote object or array sizes. A size_t is guaranteed to be big enough to store the size of any in-memory object, so it should be used for that.
You use size_t whenever the C++ language specification indicates that it is used in a particular expression. For example, you'd use size_t to store the return value of sizeof, or to represent the number of elements in an array (new[] takes size_t).
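For instance (the variable names here are arbitrary):

#include <cstddef>

int main() {
    std::size_t sz = sizeof(long double); // sizeof yields a size_t
    std::size_t n = 42;                   // element count handed to new[]
    double* p = new double[n];
    delete[] p;
    (void)sz;
    return 0;
}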
I've no idea what offset_t is - it's not mentioned once in the ISO C++ spec.
Dragging an old one back up... offset_t (which is a long long) is used by Solaris' llseek() system call, and is to be used anywhere you're likely to run into real 64-bit file offsets... meaning you're working with really huge files.
I'm struggling to understand the usefulness of the C++ std::size_t data type. I realize that this data type is platform dependent, and is supposedly meant to make code more portable. However, it seems like it doesn't solve all the problems.
Say for example I'm working on a machine that has 32-bit ints. Let's say I decide to write a C-style function on this machine that just copies the bytes from one object to another. Inside this function, the memcpy function is used to write the data from object 2 to object 1. I've chosen an arbitrarily large number of bytes.
void writeBytes(obj *pobj1, obj *pobj2)
{
    memcpy(pobj1, pobj2, 1048575);
}
This code should (hopefully) compile just fine. Because memcpy uses size_t in its declaration, and because size_t on this platform should be 32 bits, the number 1048575 should work just fine.
But now let's say I decide to port this function over to a machine that has 16-bit ints. Now the memcpy function interprets size_t as being 16 bits wide. In this case, 1048575 exceeds the allowed values for what memcpy was declared for. The code then fails to compile.
So my question: how exactly was size_t useful in this case? How did it make our code more portable?
size_t is able to hold the size of the largest object you can create. It is not required to be your platform's largest native integer type.
Given your example, your code would work regardless of the native integer being 16-bit or 32-bit if your platform allows 1048575 byte objects. Or the inverse -- if 1048575 doesn't fit in a size_t, you never could have created an object that large to memcpy.
size_t is susceptible to the same overflow and underflow rules as any other integral type. It's only a typedef of an unsigned integral type. Its purpose is not to prevent you from assigning or casting values that are outside of its range. It defines a standard type that will:
"store the maximum size of a theoretically possible object of any type
(including array)."
If you care about that maximum size, use
std::numeric_limits<std::size_t>::max()
and make decisions off of that.
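For instance:

#include <cstddef>
#include <iostream>
#include <limits>

int main() {
    // The largest value a size_t can represent on this platform:
    std::cout << std::numeric_limits<std::size_t>::max() << '\n';
    return 0;
}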
I think size_t makes (not only your) code more readable and consistent, not necessarily more portable.
It will help porting it, however.
Imagine functions working with some size use a happy mixture of int, short, long, unsigned, unsigned short, ...
When one such function dealing with a size calls another that needs a size parameter too, it's very helpful when one size_t fits all (or at least most, like for the result of read()).
Other than the size of the values that each type can hold, what are the main differences in usage between size_t and off_t? Is it just a convention that size_t types are used for absolute sizes and off_t types are used for offsets? Or does it go deeper than that?
I am writing a wrapper class to enable the writing of large files using mmap and I want to know what the best types are to use for their arguments. Given that I want to write to files > 4GB, I'm tempted to use size_t for everything, but is that the best practice? (or should I be using some off64_t types for certain functions?)
For example, should my writeAt function be declared as:
MMapWriter::writeAt(off64_t offset, const void* src, size_t size)
or
MMapWriter::writeAt(size_t offset, const void* src, size_t size)
size_t is for objects, off_t is for files.
mmap merges the two concepts, pretty much by definition. Personally I think I'd use size_t, since no matter what else it is, a mapped file is also an array in (virtual) memory.
size_t is standard C++, off_t is Posix, and off64_t is a GNU extension that goes with the functions fopen64, ftello64, etc. I think it should always be the same type as off_t on 64 bit GNU systems, but don't bet your company on that without checking.
Should it be relevant, off_t is signed whereas size_t is unsigned. But the signed counterpart to size_t is ptrdiff_t, so when you need a signed type it doesn't automatically mean you should use off_t or off64_t.
size_t is part of the C++ (and C) standards, and refers to the type of a sizeof expression. off_t is defined by the Posix standard, and refers to the size of a file.
Good rule of thumb for this scenario: use whatever function signature results in the least amount of explicit casting, be it C-style or C++-style. If you have to cast, C++-style is safer, as it is more limited in the kinds of conversion it can do.
The benefit of this is that if you port to a platform where the types don't match up (whether it be a signedness, size, or endianness issue, etc.), you should catch most of the bugs at compile time rather than at runtime. Casting is the sledgehammer approach for squeezing a triangular object into a round hole (more or less, you're telling the compiler to keep quiet, I know what I'm doing).
Trying to find casting issues at runtime can be a pain as it can be hard to reproduce. It's better to find issues at compile-time rather than runtime. The compiler is your friend.
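A sketch of the idea, using the first signature from the question. The function body and the surrounding class are omitted; the point is that the one signed/unsigned conversion is made explicit, so if the types stop lining up on some platform, the problem is visible right at the cast:

#include <cstddef>
#include <sys/types.h> // off_t (POSIX); off64_t works the same under _LARGEFILE64_SOURCE

// Hypothetical stand-in for the wrapper method discussed above:
void writeAt(off_t offset, const void* src, std::size_t size) { /* ... */ }

void copyRegion(std::size_t pos, const void* src, std::size_t n) {
    writeAt(static_cast<off_t>(pos), src, n); // the only cast, and it is visible
}

int main() {
    char buf[16] = {};
    copyRegion(0, buf, sizeof buf);
    return 0;
}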
I have an API like this:
class IoType {
    ......
    StatusType writeBytes(......, size_t& bytesWritten);
    StatusType writeObjects(......, size_t& objsWritten);
};
A senior member of the team whom I respect seems to have a problem with the type size_t and suggests that I use C99 types. I know it sounds stupid but I always think C99 types like uint32_t and uint64_t look ugly. I do use them, but only when it's really necessary; for instance, when I need to serialize/deserialize a structure, I do want to be specific about the sizes of my data members.
What are the arguments against using size_t? I know it's not a real type but if I know for sure even a 32-bit integer is enough for me and a size type seems to be appropriate for number of bytes or number of objects, etc.
Use exact-size types like uint32_t whenever you're dealing with serialization of any sort (binary files, networking, etc.). Use size_t whenever you're dealing with the size of an object in memory; that's what it's intended for. All of the functions that deal with object sizes, like malloc and strlen, as well as the sizeof operator, use size_t.
If you use size_t correctly, your program will be maximally portable, and it will not waste time and memory on platforms where it doesn't need to. On 32-bit platforms, a size_t will be 32 bits—if you instead used a uint64_t, you'd waste time and space. Conversely, on 64-bit platforms, a size_t will be 64 bits—if you instead used a uint32_t, your program could behave incorrectly (maybe even crash or open up a security vulnerability) if it ever had to deal with a piece of memory larger than 4 GB.
I can't think of anything wrong in using size_t in contexts where you don't need to serialize values. Also, using size_t correctly will increase the code's safety/portability across 32-bit and 64-bit platforms.
Uhm, it's not a good idea to replace size_t (a maximally portable thing) with a less portable C99 fixed size or minimum size unsigned type.
On the other hand, you can avoid a lot of technical problems (wasted time) by using the signed ptrdiff_t type instead. The standard library’s use of unsigned type is just for historical reasons. It made sense in its day, and even today on 16-bit architectures, but generally it is nothing but trouble & verbosity.
Making that change requires some support, though, in particular a general size function that returns array or container size as ptrdiff_t.
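A minimal sketch of such a helper; the Size alias matches the signatures below, but the helper's name and implementation are assumptions (C++20 later standardized the same idea as std::ssize):

#include <cstddef>
#include <vector>

using Size = std::ptrdiff_t; // hypothetical project-wide signed size type

template <class Container>
Size sizeOf(const Container& c) {
    return static_cast<Size>(c.size()); // container size as a signed value
}

template <class T, std::size_t N>
Size sizeOf(const T (&)[N]) {
    return static_cast<Size>(N); // built-in array extent
}

int main() {
    std::vector<int> v(3);
    int a[5] = {};
    return static_cast<int>(sizeOf(v) + sizeOf(a)) - 8; // 3 + 5 - 8 == 0
}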
Now, regarding your function signature
StatusType writeBytes(......, size_t& bytesWritten);
This forces the calling code’s choice of type for the bytes written count.
And then, with the unsigned type size_t forced on the caller, it is easy to introduce a bug, e.g. by checking whether that count is less than or greater than some computed quantity.
A grotesque example: std::string("ah").length() < -5 is guaranteed true.
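That example as a compilable snippet:

#include <iostream>
#include <string>

int main() {
    // -5 is converted to an enormous unsigned value before the comparison,
    // so this prints 1 (true) wherever size_t has at least the rank of int.
    std::cout << (std::string("ah").length() < -5) << '\n';
    return 0;
}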
So instead, make that …
Size writeBytes(......);
or, if you do not want to use exceptions,
Size writeBytes(......, StatusType& status );
It is OK to have an enumeration of possible statuses as an unsigned type or as whatever, because the only operations on status values will be equality checking and possibly use as keys.
I was programming normally when I realized that it's probably not perfectly safe to assume an int* is going to be a pointer to something 4 bytes in length.
Because some of the aspects of C++'s fundamental types, such as the size of an int, are implementation-defined.
What if you're dealing with something (like a waveform, for example) that has 32-bit signed integer samples. You cast the byte pointer to (int*) and deal with it one sample at a time.
I'm just curious what's the "safe way" to acquire a 4-byte pointer, that ISN'T going to stop working if sometime in the future MSVC committee decides int is now 8 bytes.
There is a C99 header called stdint.h your compiler might have. It defines types like uint32_t, an unsigned 32-bit integer.
Since C++11, your compiler is required to have this header. You should include it with #include <cstdint>.
If not, check out Boost Integer, which mimics this header as <boost/cstdint.hpp>.
For storing pointers as integers, use intptr_t, defined in the same header.
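A minimal sketch of both in use (note that uint32_t and intptr_t are optional in the standard; they exist only where the platform can support them):

#include <cstdint>

int main() {
    std::uint32_t sample = 0; // exactly 32 bits, where provided
    std::uint32_t* p = &sample;
    // intptr_t can round-trip a pointer through an integer:
    std::intptr_t addr = reinterpret_cast<std::intptr_t>(p);
    p = reinterpret_cast<std::uint32_t*>(addr);
    return static_cast<int>(*p);
}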
Use a pointer to uint32_t instead of int.
This type (and others) is defined in stdint.h and is part of the C99 standard.
One way I've seen it done is abstracting out the size with precompiler directives and typedefs. Then you use the abstracted types which will be correct for the set of systems you want to support.
Perhaps you could just use an assert on the sizeof(int) so that at least if your assumptions are violated in future you'll know.
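In C++11 and later the check can even be moved to compile time:

static_assert(sizeof(int) == 4, "this code assumes a 4-byte int");

int main() { return 0; }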
By far the easiest solution is to get a char* to a char[4]. On each and every platform, char[4] is a 4-byte object. For an entire waveform, you might need a char[4*512].
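A small sketch combining this with the fixed-width types above (the sample count of 512 is arbitrary):

#include <cstdint>
#include <cstring>

int main() {
    char raw[4 * 512] = {};                // 512 four-byte samples, as raw bytes
    std::int32_t sample;
    std::memcpy(&sample, raw + 4 * 10, 4); // extract sample 10 without aliasing trouble
    return static_cast<int>(sample);       // 0 here, since the buffer is zeroed
}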
I see variables defined with this type but I don't know where it comes from, nor what its purpose is. Why not use int or unsigned int? (What about other "similar" types? void_t, etc.)
From Wikipedia
The stdlib.h and stddef.h header files define a datatype called size_t which is used to represent the size of an object. Library functions that take sizes expect them to be of type size_t, and the sizeof operator evaluates to size_t.
The actual type of size_t is platform-dependent; a common mistake is to assume size_t is the same as unsigned int, which can lead to programming errors, particularly as 64-bit architectures become more prevalent.
From C99 7.17.1/2
The following types and macros are defined in the standard header stddef.h
<snip>
size_t
which is the unsigned integer type of the result of the sizeof operator
According to the size_t description on en.cppreference.com, size_t is defined in the following headers:
std::size_t
...
Defined in header <cstddef>
Defined in header <cstdio>
Defined in header <cstring>
Defined in header <ctime>
Defined in header <cwchar>
size_t is the unsigned integer type of the result of the sizeof operator (ISO C99 Section 7.17.)
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t (ISO C99 Section 6.5.3.4.)
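In code form, both pieces of wording come down to this:

#include <cstddef>

int main() {
    std::size_t a = sizeof(double); // size of a (parenthesized) type
    int x = 0;
    std::size_t b = sizeof x;       // size of an expression's type
    (void)a; (void)b;
    return 0;
}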
IEEE Std 1003.1-2017 (POSIX.1) specifies that size_t be defined in the header sys/types.h, whereas ISO C specifies the header stddef.h. In ISO C++, the type std::size_t is defined in the standard header cstddef.
Practically speaking, size_t represents the number of bytes you can address. On most modern architectures for the last 10-15 years that has been 32 bits, which has also been the size of an unsigned int. However, we are moving to 64-bit addressing while the unsigned int will most likely stay at 32 bits (its size is not guaranteed by the C++ standard). To make code that depends on the memory size portable across architectures, you should use a size_t. For example, things like array sizes should always use a size_t. If you look at the standard containers, ::size() always returns a size_t.
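For example, indexing a standard container with a size_t keeps the index type in line with what size() returns (strictly it returns the container's size_type, which is size_t in practice):

#include <cstddef>
#include <vector>

int main() {
    std::vector<double> v(100, 0.0);
    for (std::size_t i = 0; i < v.size(); ++i) { // no signed/unsigned mismatch
        v[i] = static_cast<double>(i);
    }
    return 0;
}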
Also note, Visual Studio has a compile option that can check for these types of errors, called "Detect 64-bit Portability Issues".
This way you always know what the size is, because a specific type is dedicated to sizes. The question itself shows that this can be an issue: is it an int or an unsigned int? Also, what is its magnitude (short, int, long, etc.)?
Because there is a specific type assigned, you don't have to worry about the length or the signed-ness.
The actual definition can be found in the C++ Reference Library, which says:
Type: size_t (Unsigned integral type)
Header: <cstring>
size_t corresponds to the integral data type returned by the language operator sizeof and is defined in the <cstring> header file (among others) as an unsigned integral type.
In <cstring>, it is used as the type of the parameter num in the functions memchr, memcmp, memcpy, memmove, memset, strncat, strncmp, strncpy and strxfrm, where in all cases it is used to specify the maximum number of bytes or characters the function has to affect.
It is also used as the return type for strcspn, strlen, strspn and strxfrm to return sizes and lengths.
size_t should be defined in your standard library's headers. In my experience, it usually is simply a typedef to unsigned int. The point, though, is that it doesn't have to be.
Types like size_t allow the standard library vendor the freedom to change its underlying data types if appropriate for the platform. If you assume size_t is always unsigned int (via casting, etc), you could run into problems in the future if your vendor changes size_t to be e.g. a 64-bit type. It is dangerous to assume anything about this or any other library type for this reason.
I'm not familiar with void_t except as a result of a Google search (it's used in a vmalloc library by Kiem-Phong Vo at AT&T Research - I'm sure it's used in other libraries as well).
The various xxx_t typedefs are used to abstract a type from a particular definite implementation, since the concrete types used for certain things might differ from one platform to another. For example:
size_t abstracts the type used to hold the size of objects because on some systems this will be a 32-bit value, on others it might be 16-bit or 64-bit.
void_t abstracts the type of pointer returned by the vmalloc library routines because it was written to work on systems that pre-date ANSI/ISO C, where the void keyword might not exist. At least that's what I'd guess.
wchar_t abstracts the type used for wide characters since on some systems it will be a 16 bit type, on others it will be a 32 bit type.
So if you write your wide character handling code to use the wchar_t type instead of, say unsigned short, that code will presumably be more portable to various platforms.
In minimalistic programs where a size_t definition was not pulled in "by chance" by some include but I still need it in some context (for example to access a std::vector<double>), I use that context to extract the correct type. For example: typedef std::vector<double>::size_type size_t;
(Surround with namespace {...} if necessary to make the scope limited.)
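A sketch of that trick; the namespace name detail is an arbitrary choice (a named namespace is used so the alias can't collide with any global size_t the headers happen to declare):

#include <vector>

namespace detail {
    typedef std::vector<double>::size_type size_t; // extracted from the container
}

int main() {
    detail::size_t n = 3;
    std::vector<double> v(n, 0.0);
    return static_cast<int>(v.size()) - static_cast<int>(n); // 0
}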
As for "Why not use int or unsigned int?", simply because it's semantically more meaningful not to. There's the practical reason that it can be, say, typedefd as an int and then upgraded to a long later, without anyone having to change their code, of course, but more fundamentally than that a type is supposed to be meaningful. To vastly simplify, a variable of type size_t is suitable for, and used for, containing the sizes of things, just like time_t is suitable for containing time values. How these are actually implemented should quite properly be the implementation's job. Compared to just calling everything int, using meaningful typenames like this helps clarify the meaning and intent of your program, just like any rich set of types does.