Suppose I'm writing a function which takes a float a[] and an offset into this array, and returns the element at that offset. Is it reasonable to use the signature
float foo(float* a, off_t offset);
for it? Or is off_t only relevant to offsets in bytes, rather than to pointer arithmetic with arbitrary element sizes? i.e. is it reasonable to say a[offset] when offset is of type off_t?
The GNU C Library Reference Manual says:
off_t
This is a signed integer type used to represent file sizes.
but that doesn't tell me much.
My intuition is that the answer is "no": the actual address used in a[offset] is the address of a plus sizeof(float) * offset, so it is the byte quantity sizeof(float) * offset that would be an off_t, and sizeof(float) that is a size_t; both of those carry 'dimensions' (bytes), while the element count offset does not.
Note: The offset might be negative.
Is there any good reason why you just don't use int? It's the
default type for integral values in C++, and should be used
unless there is a good reason not to.
Of course, one good reason could be that it might overflow. If
the context is such that you could end up with very large
arrays, you might want to use ptrdiff_t, which is defined (in
C and C++) as the type resulting from the subtraction of two
pointers: in other words, it is guaranteed not to overflow (when
used as an offset) for all types with a size greater than 1.
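For concreteness, here is a minimal sketch (reusing the foo signature from the question) of how ptrdiff_t relates to pointer subtraction:
#include <cstddef>

// ptrdiff_t is the type produced by subtracting two pointers, which also
// makes it a natural type for a possibly negative element offset:
float foo(float* a, std::ptrdiff_t offset) {
    return a[offset]; // counts elements, not bytes; may be negative
}

// e.g. given float x[8], foo(x + 4, -2) reads x[2], and (x + 4) - x has type ptrdiff_t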
You could use size_t or ptrdiff_t as the type of an index (your second parameter is more an index inside a float array than an offset).
Your use is an index, not an offset. Notice that the standard offsetof macro is defined to return byte offsets!
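For example (a small sketch; the Sample struct is made up for illustration):
#include <cstddef>

struct Sample { char c; double d; };

// offsetof yields a byte offset: the distance from the start of Sample to d,
// typically 8 here because of alignment padding inserted after c.
static_assert(offsetof(Sample, d) >= 1, "d is located after c");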
In practice, you could even use int or unsigned, unless you believe your array could have billions of components.
You may want to #include <stdint.h> (or <cstdint> with a recent C++) and have explicitly sized types like int32_t for your indexes.
For source readability reasons, you might define
typedef unsigned index_t;
and later use it, e.g.
float foo(float a[], index_t i);
My opinion is that you should just use int as the type of your indexes (but handle out-of-bound indexes appropriately).
I would say it is not appropriate, since
off_t is (intended to be) used to represent file sizes
off_t is a signed type.
I would go for size_type (usually a typedef'ed name for size_t), which is the one used by the std containers.
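For instance, the standard containers expose size_type as a nested typedef (a minimal sketch):
#include <vector>

void scale(std::vector<float>& v) {
    // size_type is the container's own unsigned size/index type
    for (std::vector<float>::size_type i = 0; i < v.size(); ++i) {
        v[i] *= 2.0f;
    }
}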
Perhaps the answer is to use ptrdiff_t? It...
can be negative;
alludes to the difference not being in bytes, but in units of arbitrary size depending on the element type.
What do you think?
Related
Let's say I have to accept a size as an argument to an interface which deals with arrays. For example:
void doSomethingRelatedToArrays(const size_t length)
Here, I used size_t with the following in mind:
length must always be positive
size_t is always typedef-ed to the largest unsigned integer type in the system. std::size_t can store the maximum size of a theoretically possible object of any type.
However, I should not use unsigned types in my interfaces, because the client can pass in a negative number, which is implicitly converted to an unsigned number, leaving me no way of validating it in my method. Refer to Scott Meyers's article on this subject here.
So, I should pass in signed integer type to the API. But how can I get the largest signed integer type in the system? Is there any typedef similar to size_t which is signed? Or should I just use size_t instead?
The signed equivalent of size_t is simply ssize_t, but this type is not defined in the C99 standard, even though it is known to many compilers.
In the C99 standard, the largest signed integer type is defined as intmax_t.
Reference: 7. Library / 7.18 Integer types / 7.18.1.5 Greatest-width integer types
The standard type to use is std::intmax_t which is defined in <cstdint>. To get the maximum value you can use std::numeric_limits<intmax_t>::max().
Sample code:
#include <cstdint>
#include <iostream>
#include <limits>
int main(int argc, char* argv[]) {
    std::cout << "max size = " << std::numeric_limits<std::intmax_t>::max() << std::endl;
}
The largest signed integer type in recent C and C++ standards is long long. Until there is a type wider than long long you can always use it. If you want to be more future-proof, use intmax_t.
size_t is always typedef-ed to the largest unsigned integer type in the system
That's completely incorrect!!! ❌
size_t on 32-bit systems is typically also a 32-bit type, which is obviously not the widest type possible. It's only guaranteed to be big enough to represent the size of the biggest object on the system. long long is obviously much wider in that case. On many 16-bit systems it has only 16 bits, not even the same range as long.
In case a signed counterpart of size_t is needed, ptrdiff_t can be used. But it's not the biggest type either.
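A quick sketch to see this on your own machine (the printed sizes are platform-dependent; on a typical 32-bit target you would get 4, 8, 8):
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    std::printf("size_t: %zu, long long: %zu, intmax_t: %zu\n",
                sizeof(std::size_t), sizeof(long long), sizeof(std::intmax_t));
}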
For char arrays shorter than PTRDIFF_MAX, std::ptrdiff_t acts as the signed counterpart of std::size_t: it can store the size of the array of any type and is, on most platforms, synonymous with std::intptr_t
http://en.cppreference.com/w/cpp/types/ptrdiff_t
In C++20, std::ssize was introduced, and guess what: it also uses ptrdiff_t as the signed counterpart of size_t.
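A small C++20 sketch of std::ssize in use:
#include <cstddef>
#include <iterator> // std::ssize
#include <vector>

void demo(const std::vector<float>& v) {
    // std::ssize(v) is signed (ptrdiff_t here), so the index can be signed
    // too, and expressions like i - 1 behave intuitively near zero
    for (std::ptrdiff_t i = 0; i < std::ssize(v); ++i) {
        // use v[i] ...
    }
}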
But how can I get the largest signed integer type in the system?
In C++11's <cstdint> you can find type intmax_t, which is defined after the C standard (7.20.1.5):
The following type designates a signed integer type capable of
representing any value of any signed integer type
Is there any typedef similar to size_t which is signed?
No, not in the C++ standard. POSIX defines ssize_t, but:
it is not meant to replace size_t (which itself is also more restricted by POSIX)
is not meant to store/pass negative values, but to indicate error:
ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}].
So, I should pass in signed integer type to the API. (...) Or should I
just use size_t instead?
If your interface is meant to deal with C++ arrays, size_t is the only type that is assured by the standard to be able to hold any possible array index. Using any other type, you might (in theory) lose the ability to address all of the array (which is even noted in the article you linked).
Using size_t for indexing purposes is therefore common and customary - and used by many libraries (including STL).
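Tying this back to the doSomethingRelatedToArrays interface from the question, a sketch of the customary shape:
#include <cstddef>

void doSomethingRelatedToArrays(const std::size_t length) {
    // length can hold the size of any object; note that a caller passing a
    // negative int would have it wrap to a huge value, which is exactly the
    // Meyers caveat above, so range-check against a sane maximum if needed
}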
I see several posts (such as size_t vs. uintptr_t) about size_t versus uintptr_t/ptrdiff_t, but none about the relative sizes of these new C99 pointer-size types.
Example machine: vanilla Ubuntu 14 LTS x64, gcc 4.8:
printf("%zu, %zu, %zu\n", sizeof(uintptr_t), sizeof(intptr_t), sizeof(ptrdiff_t));
prints: "8, 8, 8"
This does not make sense to me, as I would expect the diff type, which must be signed, to require more bits than the unsigned pointer type itself.
consider:
NULL - (2^64-1) /*largest ptr, 64bits of 1's.*/
which, being a negative two's-complement value, would not fit in 64 bits; hence I would expect ptrdiff_t to be larger than the pointer type itself.
[A related question is why intptr_t is the same size as uintptr_t, although I was comfortable that this was possibly just to allow a signed type to contain the representation's bits (e.g. using signed arithmetic on a negative pointer would (a) be undefined, and (b) have limited utility, as pointers are by definition "positive").]
thanks!
Firstly, it is not clear what uintptr_t is doing here. The languages (C and C++) do not allow you to subtract arbitrary pointer values from each other. Two pointers can only be subtracted if they point into the same object (into the same array object). Otherwise, the behavior is undefined. This means that these two pointers cannot possibly be farther than SIZE_MAX bytes apart. Note: the distance is limited by the range of size_t, not by the range of uintptr_t. In the general case uintptr_t can be a larger type than size_t. Nobody in C/C++ ever promised you that you should be able to subtract two pointers located UINTPTR_MAX bytes apart.
(And yes, I know that on flat-memory platforms uintptr_t and size_t are usually the same type, at least by range and representation. But from the language point of view it is incorrect to assume that they always are.)
Your NULL - (2^64-1) (if interpreted as address subtraction) is a clear example of such questionable subtraction. What made you think that you should be able to do that in the first place?
Secondly, after switching from the irrelevant uintptr_t to the much more relevant size_t, one can say that your logic is perfectly valid. sizeof(ptrdiff_t) should be greater than sizeof(size_t) because of an extra bit required to represent the signed result. Nevertheless, however weird it sounds, the language specification does not require ptrdiff_t to be wide enough to accommodate all pointer subtraction results, even if two pointers point to parts of the same object (i.e. they are no farther than SIZE_MAX bytes apart). ptrdiff_t is legally permitted to have the same bit-count as size_t.
This means that a "seemingly valid" pointer subtraction may actually lead to undefined behavior simply because the result is too large. If your implementation allows you to declare a char array of size, say, SIZE_MAX / 3 * 2
char array[SIZE_MAX / 3 * 2]; // This is smaller than SIZE_MAX
then subtracting perfectly valid pointers to the end and to the beginning of this array might lead to undefined behavior if ptrdiff_t has the same size as size_t
char *b = array;
char *e = array + sizeof array;
ptrdiff_t distance = e - b; // Undefined behavior!
The authors of these languages decided to opt for this easier solution instead of requiring compilers to implement support for [likely non-native] extra wide signed integer type ptrdiff_t.
Real-life implementations are aware of this potential problem and usually take steps to avoid it. They artificially restrict the size of the largest supported object to make sure that pointer subtraction never overflows. In a typical implementation you will not be able to declare an array larger than PTRDIFF_MAX bytes (which is about SIZE_MAX / 2). E.g. even if SIZE_MAX on your platform is 2^64 - 1, the implementation will not let you declare anything larger than 2^63 - 1 bytes (and real-life restrictions derived from other factors might be even tighter than that). With this restriction in place, any legal pointer subtraction will produce a result that fits into the range of ptrdiff_t.
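A sketch for inspecting both limits on your own platform (the values are implementation-specific; on a typical LP64 system PTRDIFF_MAX is about half of SIZE_MAX):
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    std::printf("SIZE_MAX    = %zu\n", static_cast<std::size_t>(SIZE_MAX));
    std::printf("PTRDIFF_MAX = %td\n", static_cast<std::ptrdiff_t>(PTRDIFF_MAX));
}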
See also,
Why is the maximum size of an array “too large”?
The accepted answer is not wrong, but it does not offer much insight into why intptr_t, size_t and ptrdiff_t are actually useful, and how to use them. So here it is:
size_t is basically the type of a sizeof expression. It is only required to be able to hold the size of the largest object that you can make, including arrays. So if you can only ever use 64k of contiguous memory, then size_t can be as little as 16 bits, even if you have 64-bit pointers.
ptrdiff_t is the type of a pointer difference, e.g. &a - &b. And while it is true that 0 - &a is undefined behavior (as is almost everything in C/C++), whatever it is, it must fit into ptrdiff_t. It is usually the same size as pointers, because that makes the most sense. If ptrdiff_t were a weird size, pointer arithmetic itself would break.
intptr_t/uintptr_t have the same size as pointers. They fit into the same int*_t pattern, where * is the size of the int. As with all int*_t/uint*_t types, the standard for some reason allows them to be larger than required, but that's very rare.
As a rule of thumb, you can use size_t for sizes and array indices, and use intptr_t/uintptr_t for everything pointer related. Do not use ptrdiff_t.
Whenever I see malloc in someone else's code, it typically uses sizeof(short) or sizeof(double) etc. to help define the size of memory to be allocated. Why do they not just replace those expressions with 2 or 8, in those two examples?
It makes the code easier to port.
In general there are compiler options which allow you to say how data is to be aligned in a struct. The size of a double may vary between platforms.
By consistently using the data type, you reduce the occurrence of some types of size-mismatch errors.
I think it is a better practice to use the variable name instead of the data type as the operand of sizeof.
float Pi = 3.14f;
float *pieArray = (float *) malloc(sizeof (Pi) * 1000);
Personally I would prefer this method.
typedef float Pi;
Pi *piArray = new Pi[1000];
// use it
delete[] piArray;
new/delete should be preferred over malloc/free in most cases.
The most portable and maintainable way to write a malloc call in C is:
T *p = malloc( N * sizeof *p );
or
T *p;
...
p = malloc( N * sizeof *p );
where T is any arbitrary type and N is the number of objects of that type you want to allocate. Type sizes are not uniform across platforms, and the respective language standards only mandate minimum ranges of values that non-char types must be able to represent. For example, an int must represent at least the range [-32767...32767], meaning it must be at least 16 bits wide, although it may be (and often is) wider. For another example, struct types may have different amounts of padding between members depending on the platform's alignment requirements, so a struct foo type may take up 24 bytes on one platform and 32 on another.
The expression *p has type T, so sizeof *p gives the same result as sizeof (T), which is the number of bytes required to store an object of type T. This will always give you the right number of bytes to store your object (or sequence of objects), regardless of platform, and if you ever change T (from int to long, for example), you don't have to go back and change the arguments to the malloc call.
Note that you shouldn't use malloc or calloc in C++ code; you should use a standard container like a vector or map that handles all the memory management for you. If for some reason a standard container doesn't meet your needs, use the new operator to allocate a single object of type T and new [] to allocate an array of objects.
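For instance, the thousand-float buffer from the answers above becomes a one-liner with a container (a sketch):
#include <vector>

void demo() {
    std::vector<float> piArray(1000); // allocated, zero-initialized, and freed automatically
}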
Neither the size of a double nor that of a short is fixed by the C++ standard. Note that a double doesn't even have to be an IEEE 754 floating-point type; in this respect C++ differs from Java. So it would be a poor idea to hardcode the size.
And use new / new[] and delete / delete[] in C++.
I know that in C++ sizeof(char) is guaranteed to be 1, but is it the only case or are there any other built-in types guaranteed to have exact size?
In particular is sizeof(bool) == 1 or sizeof(int) == 4 demanded by language or is it an implementation detail?
The size is only guaranteed explicitly for char: sizeof(char) == 1. Implicitly this guarantee also applies to signed char and unsigned char as one of them is required to use the same representation as char and the other is bound by the conversion rules between signed char and unsigned char to use the same size.
Other than that, there are only guarantees on the number of bits present in some types and size relations between some types. Note that char can have any number of bits equal to or greater than 8.
The rules are strict enough that the sizes of signed char and unsigned char must also be 1.
There is no other type for which the size is guaranteed; I know of compilers that make sizeof(bool) a value larger than 1, and that make sizeof(int) a value other than 4.
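A compile-time summary of what is and is not portable here (a sketch; the commented-out assertions may fail on some implementations):
#include <climits>

static_assert(sizeof(char) == 1, "guaranteed by the standard");
static_assert(sizeof(signed char) == 1 && sizeof(unsigned char) == 1, "also guaranteed");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
// static_assert(sizeof(bool) == 1, ""); // NOT guaranteed; may fail
// static_assert(sizeof(int) == 4, "");  // NOT guaranteed; may fail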
Types are not guaranteed to have the same byte size across architectures. sizeof(X) is actually evaluated by the compiler and yields an integer constant (1, 2, 4, 8, etc.), so it is not a function call. As a result, the value for a given type (e.g. int) depends on the system for which your application was compiled. This is why you have to recompile an application for a different architecture.
That said, some types are always a particular size (e.g. int32_t).
See: What does the C++ standard state the size of int, long type to be?
In theory, an (old C++) implementation (but probably not C++11) might have sizeof every scalar type (numerical, pointer, boolean) be 1. But I cannot name such an implementation (where sizeof(int), sizeof(double), sizeof(long long), sizeof(bool), sizeof(void*) are all 1).
You probably should use <cstdint> header if you care about data type sizes.
Also, code portability can be tricky. You should care not only about integral data type size, but also about endianess and operating system issues (standards like POSIX should help). An aphorism says that there is no software that is portable, only code that has been painfully ported.
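As a sketch of the <cstdint> suggestion above:
#include <cstdint>

std::int32_t exact;       // exactly 32 bits; only present where the platform provides it
std::int_least32_t least; // at least 32 bits; always available
std::int_fast32_t fast;   // fastest type with at least 32 bits; always available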
I have learned recently that size_t was introduced to help future-proof code against native bit count increases and increases in available memory. The specific use definition seems to be on the storing of the size of something, generally an array.
I now must wonder how far this future-proofing should be taken. Surely it is pointless to have an array length defined using the future-proof and appropriately sized size_t if the very next task of iterating over the array uses, say, an unsigned int as the index type:
void foo(double* vector, size_t vectorLength) {
for (unsigned int i = 0; i < vectorLength; i++) {
//...
}
}
In fact, in this case I would expect the language to up-convert the unsigned int to a size_t for the relational operator.
Does this imply the iterator variable i should simply be a size_t?
Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?
Does it imply any code using logic that develops the index programmatically should then create a new result value of type size_t, particularly if the logic relies on potentially signed integer values? i.e.
double foo[100];
//...
int a = 4;
int b = -10;
int c = 50;
int index = a + b + c;
double d = foo[(size_t)index];
Surely, though, since my code logic creates a fixed bound, up-converting to size_t provides no additional protection.
You should keep in mind the automatic conversion rules of the language.
Does this imply the iterator variable i should simply be a size_t?
Yes it does, because if size_t is larger than unsigned int and your array is actually larger than can be indexed with an unsigned int, then your variable (i) can never reach the size of the array.
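In other words (a sketch, reusing the loop from the question; the name foo is just a placeholder):
#include <cstddef>

void foo(double* vector, std::size_t vectorLength) {
    for (std::size_t i = 0; i < vectorLength; i++) {
        // the index now has the same range as the length, so it can reach
        // every element no matter how large the array is
    }
}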
Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?
You try to make it sound drastic, while it's not. Why do you make one variable a double and another a float? Why would you make one variable unsigned and another not? Why would you make one variable short while another is int? Of course, you always know what your variables are going to be used for, so you decide what types they should get. The choice of size_t is one among many and it's decided similarly.
In other words, every variable in a program should be functionally identified and given the correct type.
Does it imply any code using logic that develops the index programmatically should then create a new result value of type size_t, particularly if the logic relies on potentially signed integer values?
Not at all. First, if the variable can never have negative values, then it could have been unsigned int or size_t in the first place. Second, if the variable can have negative values during computation, then you should definitely make sure that in the end it's non-negative, because you shouldn't index an array with a negative number.
That said, if you are sure your index is non-negative, then casting it to size_t doesn't make any difference. C11 at 6.5.2.1 says:
A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).
Which means whatever type of index for which some_pointer + index makes sense, is allowed to be used as index. In other words, if you know your int has enough space to contain the index you are computing, there is absolutely no need to cast it to a different type.
Surely it is pointless to have an array length defined using the future-proof and appropriately sized size_t if the very next task of iterating over the array uses say an unsigned int as the index array
Yes it is. So don't do it.
In fact in this case I might expect the syntax strictly should up-convert the unsigned int to a size_t for the relation operator.
It will only be promoted in that particular < operation. The upper limit of your unsigned int variable will not be changed, so the ++ operation will always work with an unsigned int, rather than a size_t.
Does this imply the iterator variable i should simply be a size_t?
Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?
Yeah well, it is better than int... But there is a smarter way to write programs: use common sense. Whenever you declare an array, you can actually stop and consider in advance how many items the array would possibly need to store. If it will never contain more than 100 items, there is absolutely no reason for you to use int nor to use size_t to index it.
In the 100 items case, simply use uint_fast8_t. Then the program is optimized for size as well as speed, and 100% portable.
Whenever declaring a variable, a good programmer will activate their brain and consider the following:
What is the range of the values that I will store inside this variable?
Do I actually need to store negative numbers in it?
In the case of an array, how many values will I need in the worst-case? (If unknown, do I have to use dynamic memory?)
Are there any compatibility issues with this variable if I decide to port this program?
As opposed to a bad programmer, who does not activate their brain but simply types int all over the place.
As discussed by Neil Kirk, iterators are a future-proof counterpart of size_t.
An additional point in your question is the computation of a position, and this typically includes an absolute position (e.g. a in your example) and possibly one or more relative quantities (e.g. b or c), potentially signed.
The signed counterpart of size_t is ptrdiff_t, and the analogue for an iterator type I is typename I::difference_type.
As you describe in your question, it is best to use the appropriate types everywhere in your code, so that no conversions are needed. For memory efficiency, if you have e.g. an array of one million positions into other arrays and you know these positions are in the range 0-255, then you can use unsigned char; but then a conversion is necessary at some point.
In such cases, it is best to name this type, e.g.
using pos = unsigned char;
and make all conversions explicit. Then the code will be easier to maintain, should the range 0-255 increase in the future.
Yep, if you use int to index an array, you defeat the point of using size_t in other places. This is why you can use iterators with STL. They are future proof. For C arrays, you can use either size_t, pointers, or algorithms and lambdas or range-based for loops (C++11). If you need to store the size or index in variables, they will need to be size_t or other appropriate types, as will anything else they interact with, unless you know the size will be small. (For example, if you store the distance between two elements which will always be in a small range, you can use int).
constexpr size_t my_array_size = 100;
double my_array[my_array_size]; // a real array, so std::begin/std::end (from <iterator>) and the range-for below work; std::for_each needs <algorithm>
for (double *it = my_array, *end_it = my_array + my_array_size; it != end_it; ++it)
{
// use *it
}
std::for_each(std::begin(my_array), std::end(my_array), [](double& x)
{
// use x
});
for (auto& x : my_array)
{
// use x
}
Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?
I'll pick that point, and say clearly Yes. Besides, in most cases a variable used as an array index is only used as that (or something related to it).
And this rule does not only apply here, but also in other circumstances: there are many use cases where nowadays a special type exists: ptrdiff_t, off_t (which may even change depending on the configuration we use!), pid_t, and a lot of others.