Whenever I see malloc in someone else's code, it typically uses sizeof(short) or sizeof(double) etc. to help define the size of memory to be allocated. Why do they not just replace those expressions with 2 or 8, in those two examples?
It makes the code easier to port.
In general there are compiler options which allow you to say how data is to be aligned in a struct. The size of a double may vary between platforms.
By consistently using the data type, you reduce the occurrence of some types of size mismatch errors.
I think it is better practice to use the variable name rather than the data type as the operand of sizeof.
float Pi = 3.14f;
float *pieArray = (float *) malloc(sizeof (Pi) * 1000);
Personally I would prefer this method.
typedef float Pi;
Pi *piArray = new Pi[1000];
// use it
delete[] piArray;
new/delete should be preferred over malloc/free in most cases.
The most portable and maintainable way to write a malloc call in C is:
T *p = malloc( N * sizeof *p );
or
T *p;
...
p = malloc( N * sizeof *p );
where T is any arbitrary type and N is the number of objects of that type you want to allocate. Type sizes are not uniform across platforms, and the respective language standards only mandate minimum ranges of values that non-char types must be able to represent. For example, an int must represent at least the range [-32767...32767], meaning it must be at least 16 bits wide, although it may be (and often is) wider. For another example, struct types may have different amounts of padding between members depending on the platform's alignment requirements, so a struct foo type may take up 24 bytes on one platform and 32 on another.
The expression *p has type T, so sizeof *p gives the same result as sizeof (T), which is the number of bytes required to store an object of type T. This will always give you the right number of bytes to store your object (or sequence of objects), regardless of platform, and if you ever change T (from int to long, for example), you don't have to go back and change the arguments to the malloc call.
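For example, here is a minimal sketch of the idiom with a hypothetical struct point type (written as C++ here, so the cast is required; in C you would drop the cast):

#include <cstdlib>

struct point { double x, y; };

int main() {
    // N * sizeof *p: the element count times the size of whatever p points at
    point *p = (point *)std::malloc(100 * sizeof *p);
    if (p) {
        p[0].x = 1.0; // use p[0] .. p[99]
        std::free(p);
    }
}

If point later changes to hold three coordinates, the malloc call stays correct without edits.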
Note that you shouldn't use malloc or calloc in C++ code; you should use a standard container like a vector or map that handles all the memory management for you. If for some reason a standard container doesn't meet your needs, use the new operator to allocate a single object of type T and new [] to allocate an array of objects.
Neither the size of a double nor that of a short is fixed by the C++ standard. Note that a double doesn't even have to be an IEEE 754 floating-point type; in this respect C++ differs from Java. So it would be a poor idea to hardcode the size.
And use new / new[] and delete / delete[] in C++.
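To see this concretely, here is a minimal sketch you can run on any platform; the printed values are implementation-defined:

#include <cstdio>

int main() {
    // sizes vary across platforms; never hardcode them
    std::printf("short: %zu, double: %zu\n", sizeof(short), sizeof(double));
}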
I'm currently writing a runtime for my compiler project and I want a general, easy-to-use struct for encoding different types (the source language is Scheme).
My current approach is:
struct SObj {
SType type;
uint64_t *value;
};
Pointers are always 64 or 32 bits wide, so shouldn't it be possible to literally put a float into my value? Then, if I want the actual value of the float, I just take the raw bytes and interpret them as a float.
Thanks in advance.
Not really.
When you write C++ you're programming an abstraction. You're describing a program. Contrary to popular belief, it's not "all just bytes".
Compilers are complex. They can, and will, assume that you follow the rules, and use that assumption to produce the most efficient "actual" code (read: machine code) possible.
One of those rules is that a uint64_t* is a pointer that points to a uint64_t. When you chuck arbitrary bits in there — whether they are identical to the bits that form a valid float, or something else — it is no longer a valid pointer, and simply evaluating it has undefined behaviour.
There are language facilities that can do what you want, like union. But you have to be careful not to violate aliasing rules. You'd store a flag (presumably, that's what your type is) that tells you which union member you're using. Make life easier and have a std::variant instead, which does all this for you.
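For illustration, here is a minimal sketch of the std::variant approach; SVal and the chosen alternatives are hypothetical names, not from the question:

#include <cstdint>
#include <iostream>
#include <variant>

// hypothetical Scheme value type: the variant stores the tag for you
using SVal = std::variant<int64_t, double, bool>;

int main() {
    SVal v = 3.14;                         // currently holds a double
    if (auto d = std::get_if<double>(&v))  // tag-checked access, no UB
        std::cout << *d << '\n';
    v = int64_t{42};                       // now holds an int64_t
    std::cout << std::get<int64_t>(v) << '\n';
}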
That being said, you can std::memcpy/std::copy bits in and copy bits out of, say, a uint64_t, as long as they are a valid representation of the type you've chosen on your system. Just don't expect reinterpret_cast to be valid: it won't be.
Pointers are always 64 or 32 bits wide
No.
so shouldn't it be possible to literally put a float into my value?
Yes, that is possible, although it is very strongly advised against. C++ has many, many other facilities so you do not have to resort to such things yourself. Anyway, you can interpret the bytes inside a pointer as another type, like this:
#include <cassert>
#include <cstring>
#include <type_traits>

static_assert(sizeof(float*) >= sizeof(float));
static_assert(std::is_trivially_copyable<float>::value); // overdramatic

float *ptr; // uninitialized: we only use its storage, never its value
float a = 5;
// use the memory of the pointer to store the float value
std::memcpy(&ptr, &a, sizeof(float));
float b;
std::memcpy(&b, &ptr, sizeof(float));
assert(a == b); // true
I can write a new-expression for a one-dimensional array as follows:
int n{3};
new int[n];
It allocates at least sizeof(int) * n bytes. But when I want to create a two- or more-dimensional array, only the first dimension may be non-constant:
int n{3};
new int[n][3]; //ok
new int[n][n]; //error;
Why do such restrictions exist? Is there any difficulty in determining that at least sizeof(int) * n * n bytes need to be allocated?
The problem in this case is not about determining how much memory to allocate. That part is actually easy, as you noted yourself.
The problem is organizing access to such an array afterwards. As you may know, multidimensional arrays in C++ are implemented as linear (one-dimensional) arrays with index remapping. For example, when you declare
int a[N][M];
the compiler actually creates an int [N * M] array under the hood. And when you later access it as a[i][j], the latter is simply implicitly translated into access to a[i * M + j]. C++ compilers insist on knowing the value of M at compile time (meanwhile, note that the value of N does not participate in the index recalculation formula at all).
This is the reason why in contexts where arrays decay to pointers the first size of multi-dimensional array does not matter, while the second, third and further sizes have to be compile-time constants. This is what determines the restrictions imposed on new [] as well.
P.S. C language supports Variable Length Arrays, which allow all sizes to be run-time values. This requires additional efforts under the hood, like storing the run-time values of M and N together with the array a from the above example. This was eventually deemed unsuitable for C++.
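To illustrate the remapping described above: when both sizes are run-time values, you can allocate one flat block and do the remapping yourself (a minimal sketch; n, m, and a are illustrative names):

#include <cstddef>

int main() {
    std::size_t n = 3, m = 4;   // both known only at run time
    int *a = new int[n * m];    // one flat block, as the compiler would create
    a[1 * m + 2] = 42;          // manual remapping of "a[1][2]": i * M + j
    delete[] a;
}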
The C++ type system does not include arrays with runtime bound. This is a very complicated thing to do, considering that it will have implications for templates and overload resolution. There have been proposals but none has progressed to being accepted for standardization.
So T[n] is not a valid type. However it can be used in a new-expression because there is a special case for it. The new-expression can be either:
new X, where X is a type
new T[n], where T is a type and n is not a constant expression.
Note that both cases are needed because T[n] is not a type but we want to allow that in a new-expression.
The second point needs a little more explanation. It actually uses the C++ infix notation, so if T is an array or function type, the [n] will appear in a different place. For example, new int[n][3] is OK, and is the same as typedef int T[3]; new T[n]. But new int[3][n] is not.
If we did allow new int[3][n], what would the return type be? int (*)[n] is not part of the C++ type system as mentioned earlier.
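For comparison, the allowed form returns a pointer whose element type is a complete, compile-time-sized array (a minimal sketch):

int main() {
    int n = 3;
    int (*p)[3] = new int[n][3]; // result type int (*)[3]: only the first bound is run-time
    p[1][2] = 42;                // the remapping uses the compile-time constant 3
    delete[] p;
}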
I see several posts (such as size_t vs. uintptr_t) about size_t versus uintptr_t/ptrdiff_t, but none about the relative sizes of these new C99 pointer-size types.
Example machine: vanilla Ubuntu 14 LTS x64, GCC 4.8:
printf("%zu, %zu, %zu\n", sizeof(uintptr_t), sizeof(intptr_t), sizeof(ptrdiff_t));
prints: "8, 8, 8"
This does not make sense to me, as I would expect the diff type, which must be signed, to require more bits than the unsigned pointer itself.
consider:
NULL - (2^64-1) /*largest ptr, 64bits of 1's.*/
which, being a 2's-complement negative value, would not fit in 64 bits; hence I would expect ptrdiff_t to be larger than uintptr_t.
[A related question is why intptr_t is the same size as uintptr_t, although I was comfortable that this is possibly just to allow a signed type to contain the representation's bits (e.g., using signed arithmetic on a negative pointer would (a) be undefined, and (b) have limited utility, as pointers are by definition "positive").]
thanks!
Firstly, it is not clear what uintptr_t is doing here. The languages (C and C++) do not allow you to subtract arbitrary pointer values from each other. Two pointers can only be subtracted if they point into the same object (into the same array object). Otherwise, the behavior is undefined. This means that these two pointers cannot possibly be farther than SIZE_MAX bytes apart. Note: the distance is limited by the range of size_t, not by the range of uintptr_t. In the general case uintptr_t can be a larger type than size_t. Nobody in C/C++ ever promised that you should be able to subtract two pointers located UINTPTR_MAX bytes apart.
(And yes, I know that on flat-memory platforms uintptr_t and size_t are usually the same type, at least by range and representation. But from the language point of view it is incorrect to assume that they always are.)
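A minimal sketch of the distinction (variable names are illustrative):

#include <cstddef>

int main() {
    int a[10], b;
    std::ptrdiff_t d = &a[7] - &a[2]; // valid: both point into the same array, d == 5
    // &b - &a[0]                     // undefined: the pointers belong to different objects
    (void)d; (void)b;
}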
Your NULL - (2^64-1) (if interpreted as address subtraction) is a clear example of such a questionable subtraction. What made you think that you should be able to do that in the first place?
Secondly, after switching from the irrelevant uintptr_t to the much more relevant size_t, one can say that your logic is perfectly valid. sizeof(ptrdiff_t) should be greater than sizeof(size_t) because of an extra bit required to represent the signed result. Nevertheless, however weird it sounds, the language specification does not require ptrdiff_t to be wide enough to accommodate all pointer subtraction results, even if two pointers point to parts of the same object (i.e. they are no farther than SIZE_MAX bytes apart). ptrdiff_t is legally permitted to have the same bit-count as size_t.
This means that a "seemingly valid" pointer subtraction may actually lead to undefined behavior simply because the result is too large. If your implementation allows you to declare a char array of size, say, SIZE_MAX / 3 * 2
char array[SIZE_MAX / 3 * 2]; // This is smaller than `SIZE_MAX`
then subtracting perfectly valid pointers to the end and to the beginning of this array might lead to undefined behavior if ptrdiff_t has the same size as size_t
char *b = array;
char *e = array + sizeof array;
ptrdiff_t distance = e - b; // Undefined behavior!
The authors of these languages decided to opt for this easier solution instead of requiring compilers to implement support for a [likely non-native] extra-wide signed integer type for ptrdiff_t.
Real-life implementations are aware of this potential problem and usually take steps to avoid it. They artificially restrict the size of the largest supported object to make sure that pointer subtraction never overflows. In a typical implementation you will not be able to declare an array larger than PTRDIFF_MAX bytes (which is about SIZE_MAX / 2). E.g., even if SIZE_MAX on your platform is 2^64 - 1, the implementation will not let you declare anything larger than 2^63 - 1 bytes (and real-life restrictions derived from other factors might be even tighter than that). With this restriction in place, any legal pointer subtraction will produce a result that fits into the range of ptrdiff_t.
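You can check the limits your implementation advertises with a minimal sketch like this (the values in the comment assume a typical LP64 platform):

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    // typically SIZE_MAX == 2^64 - 1 and PTRDIFF_MAX == 2^63 - 1 on LP64
    std::printf("SIZE_MAX    = %zu\n", (std::size_t)SIZE_MAX);
    std::printf("PTRDIFF_MAX = %td\n", (std::ptrdiff_t)PTRDIFF_MAX);
}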
See also,
Why is the maximum size of an array “too large”?
The accepted answer is not wrong, but does not offer much insight into why intptr_t, size_t and ptrdiff_t are actually useful, and how to use them. So here it is:
size_t is basically the type of a sizeof expression. It is only required to be able to hold the size of the largest object that you can make, including arrays. So if you can only ever use 64 KiB of contiguous memory, then size_t can be as little as 16 bits, even if you have 64-bit pointers.
ptrdiff_t is the type of a pointer difference, e.g. &a - &b. And while it is true that 0 - &a is undefined behavior (as is almost everything in C/C++), whatever it is, it must fit into ptrdiff_t. It is usually the same size as pointers, because that makes the most sense. If ptrdiff_t were a weird size, pointer arithmetic itself would break.
intptr_t/uintptr_t have the same size as pointers. They fit into the same int*_t pattern, where * is the size of the int. As with all int*_t/uint*_t types, the standard for some reason allows them to be larger than required, but that's very rare.
As a rule of thumb, you can use size_t for sizes and array indices, and use intptr_t/uintptr_t for everything pointer related. Do not use ptrdiff_t.
Suppose I'm writing a function which takes a float a[] and an offset into this array, and returns the element at that offset. Is it reasonable to use the signature
float foo(float* a, off_t offset);
for it? Or is off_t only relevant to offsets in bytes, rather than pointer arithmetic with arbitrary element sizes? I.e., is it reasonable to say a[offset] when offset is of type off_t?
The GNU C Library Reference Manual says:
off_t
This is a signed integer type used to represent file sizes.
but that doesn't tell me much.
My intuition is that the answer is "no", since the actual address used in a[offset] is the address of a + sizeof(float) * offset, so "sizeof(float) * offset" is an off_t, and sizeof(float) is a size_t, and both are constants with 'dimensions'.
Note: The offset might be negative.
Is there any good reason why you just don't use int? It's the default type for integral values in C++, and should be used unless there is a good reason not to.
Of course, one good reason could be that it might overflow. If the context is such that you could end up with very large arrays, you might want to use ptrdiff_t, which is defined (in C and C++) as the type resulting from the subtraction of two pointers: in other words, it is guaranteed not to overflow (when used as an offset) for all types with a size greater than 1.
You could use size_t or ptrdiff_t as the type of an index (your second parameter is more an index inside a float array than an offset).
Your use is an index, not an offset. Notice that the standard offsetof macro is defined to return byte offsets!
In practice, you could even use int or unsigned, unless you believe your array could have billions of components.
You may want to #include <stdint.h> (or <cstdint> with a recent C++) and have explicitly sized types like int32_t for your indexes.
For source readability reasons, you might define
typedef unsigned index_t;
and later use it, e.g.
float foo(float a[], index_t i);
My opinion is that you should just use int as the type of your indexes (but handle out-of-bound indexes appropriately).
I would say it is not appropriate, since
off_t is (intended to be) used to represent file sizes
off_t is a signed type.
I would go for size_type (usually a "typedef"ed name for size_t), which is the one used by std containers.
Perhaps the answer is to use ptrdiff_t? It...
can be negative;
alludes to the difference not being in bytes, but in units of arbitrary size depending on the element type.
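A minimal sketch of what that would look like (foo is the function from the question):

#include <cstddef>

// offset counts elements, not bytes, and may be negative
float foo(float *a, std::ptrdiff_t offset) {
    return a[offset];
}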
What do you think?
I think I understand the semantics of pointer arithmetic fairly well, but I only ever see examples when dealing with arrays. Does it have any other uses that can't be achieved by less opaque means? I'm sure you could find a way with clever casting to use it to access members of a struct, but I'm not sure why you'd bother. I'm mostly interested in C, but I'll tag with C++ because the answer probably applies there too.
Edit, based on answers received so far: I know pointers can be used in many non-array contexts. I'm specifically wondering about arithmetic on pointers, e.g. incrementing, taking a difference, etc.
Pointer arithmetic by definition in C happens only on arrays. However, as every object has a representation consisting of an overlaid unsigned char [sizeof object] array, it's also valid to perform pointer arithmetic on this representation. For example:
#include <stddef.h> /* for offsetof */

struct foo {
    int a, b, c;
} bar;

/* Equivalent to: bar.c = 1; */
*(int *)((unsigned char *)&bar + offsetof(struct foo, c)) = 1;
Actually char * would work just as well.
If you follow the language standard to the letter, then pointer arithmetic is only defined when pointing to an array, and not in any other case.
A pointer may point to any element of an array, or one step past the end of the array.
Off the top of my head, I know it's used in XOR linked lists (very nifty) and I've seen it used in very hacky recursions.
On the other hand, it's very hard to find uses, since according to the standard pointer arithmetic is only defined within the bounds of an array.
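For reference, the XOR linked-list trick stores the XOR of the neighbouring node addresses in a single field (a minimal sketch; it relies on uintptr_t round-tripping, which is implementation-specific, not on standard pointer arithmetic):

#include <cstdint>

struct Node {
    int value;
    std::uintptr_t link; // XOR of the previous and next nodes' addresses
};

// given the previous node and the current one, recover the next node
Node *nextNode(Node *prev, Node *cur) {
    return reinterpret_cast<Node *>(cur->link ^ reinterpret_cast<std::uintptr_t>(prev));
}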
a[n] is "just" syntactic sugar for *(a + n). For lulz, try the following
int a[2];
0[a] = 10; /* same as a[0] = 10, since 0[a] == *(0 + a) == *(a + 0) */
1[a] = 20; /* likewise a[1] = 20 */
So one could argue that indexing and pointer arithmetic are merely interchangeable syntax.
Pointer arithmetic is only defined on arrays. Adding an integer to a pointer that does not point to an array element produces undefined behavior.
In embedded systems, pointers are used to represent addresses or locations. There may not be an array defined. (Although one could say that all of memory is one huge array.)
For example, a stack (holding variables and addresses) is manipulated by adding or subtracting values from the stack pointer. (In this case, the stack could be said to be an array based stack.)
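A minimal sketch of that kind of stack manipulation (the names and the downward-growing convention are illustrative):

#include <cstdint>

std::uint32_t stackMem[64];                  // backing storage for the stack
std::uint32_t *sp = stackMem + 64;           // stack pointer; grows downward here

void push(std::uint32_t v) { *--sp = v; }    // decrement, then store
std::uint32_t pop()        { return *sp++; } // load, then increment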
Here's a case for pointer arithmetic outside of (strictly defined) arrays:
double d = 0.5;
unsigned char *bytes = (void *)&d;
for(size_t i = 0; i < sizeof d; i++)
printf("Byte %zu of d is %hhu\n", i, bytes[i]);
Why would you do this? I don't know. But if you want to look at the bitwise representation of an object (useful for things like memcpy and memcmp), you'll need to cast their addresses to unsigned char *s (or signed char *s if you like) and work with them byte-by-byte. (If your task isn't too difficult you can even write the code to work word-by-word, which most memcpy implementations will do. It's the same principle, though, just replace char with int32_t.)
Note that, in the standard, the exact values (or the number of values) that are printed are implementation-defined, but that this will always work as a way to access an object's internal bytewise representation. (It is not required to work for larger integer types, but almost always will - no processor I know of has had trap representations for integers in quite some time).