__restrict pointer aliasing with only one pointer

Is there any advantage to specifying the MSVC/GCC non-standard __restrict qualifier on a function pointer parameter if it is the only pointer parameter? For example,
int longCalculation(int a, int* __restrict b)
My guess is it should allow better optimization since it implies b does not point to a, but all examples I've seen __restrict two pointers to indicate no aliasing between them.

As mentioned in the comments, b can't point to a anyway (a is passed by value), so there is no aliasing potential there. So if the function is pure in the sense that it works only on its parameters, there shouldn't be any real benefit.
However if the function uses global variables internally then __restrict might offer benefits once again, since it makes clear that b doesn't point to any of those global variables.
An interesting case might be the situation where you allocate and deallocate memory inside the function. The compiler could theoretically be sure that b doesn't point to that memory; whether it actually realizes that, I'm not sure, and it might depend on how the allocation is done.
Personally however I prefer to keep __restrict out of the signature and do something like this
int longCalculation(int a, int* b){
    assert(...);//ensure that b doesn't point to anything used
    int* __restrict bx = b;
    ...
}
IMO this has the following advantages:
The function signature doesn't expose the non standard __restrict used
The ability to ensure that the variables actually conform to __restrict using assert, since passing aliasing pointers to a function expecting them to be nonaliasing can lead to hard to track down bugs.
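The global-variable case mentioned above can be sketched as follows. This is only an illustration: g_bias, covers_global, add_bias and add_bias_checked are hypothetical names, and whether a given compiler actually hoists the load of the global depends on its optimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical global that the function reads inside its loop.
int g_bias = 3;

// True if the range [b, b + n) contains the global. std::less gives a
// total order even for pointers into unrelated objects.
bool covers_global(const int* b, std::size_t n) {
    std::less<const int*> before;
    const int* g = &g_bias;
    return !before(g, b) && before(g, b + n);
}

// Without __restrict the compiler must assume that a store to b[i] could
// modify g_bias, so it may reload g_bias on every iteration. With
// __restrict we promise that the ints reached through b are not accessed
// any other way in this function, so the load can be hoisted.
void add_bias(int* __restrict b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        b[i] += g_bias;
    }
}

// Wrapper in the style suggested above: verify the promise, then call.
void add_bias_checked(int* b, std::size_t n) {
    assert(!covers_global(b, n));
    add_bias(b, n);
}
```

The check only runs in debug builds, so release builds pay nothing for it.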

Related

Is there a function to load a non-atomic value atomically?

In C++20 we can write:
double x;
double x_value = std::atomic_ref(x).load();
Is there a function with the same effect?
I have tried std::atomic_load but there seem to be no overloads for non-atomic objects.

Non-portably, of course, there is the GNU C __atomic_load_n(&x, __ATOMIC_SEQ_CST) builtin.
I'm pretty sure you won't find a function in ISO C++ that takes a double * or double &.
Possibly there is one that takes a std::atomic_ref<double> * or reference; I didn't check, but I think the intent of atomic_ref is to be constructed on the fly for free inside a function that needs it.
If you want such a function, write your own that constructs + uses an atomic_ref. It will all inline down to an __atomic_load_n on compilers where atomic uses that under the hood anyway.
But do make sure to declare your global like this, to make sure it's safe + efficient to use with atomic_ref. It's UB (I think) to take an atomic_ref to an object that's not sufficiently aligned, so the atomic_ref constructor can simply assume that the object you use is aligned the same as atomic<T> needs to be.
alignas (std::atomic_ref<double>::required_alignment) double x;
In practice that's only going to be a problem for 8-byte primitive types like double inside structs on 32-bit targets, but something like struct { char c[8]; } could in practice be not naturally aligned if you don't ask for alignment.
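A minimal sketch of such a hand-written helper, under a few assumptions: atomic_load_plain and shared_value are hypothetical names, the C++20 path is selected with the __cpp_lib_atomic_ref feature-test macro, and the fallback relies on the generic GNU __atomic_load builtin (GCC and Clang only).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically load from a plain (non-atomic) object.
template <typename T>
T atomic_load_plain(T& obj) {
    // The object must be aligned as strictly as std::atomic<T> would be.
    assert(reinterpret_cast<std::uintptr_t>(&obj) % alignof(std::atomic<T>) == 0);
#if defined(__cpp_lib_atomic_ref)
    // C++20: build the atomic_ref on the fly and load through it.
    return std::atomic_ref<T>(obj).load();
#elif defined(__GNUC__)
    // Pre-C++20 fallback (assumption: GCC or Clang).
    T result;
    __atomic_load(&obj, &result, __ATOMIC_SEQ_CST);
    return result;
#else
#error "Neither std::atomic_ref nor the GNU __atomic builtins are available"
#endif
}

// Hypothetical global, over-aligned so it is safe to load atomically.
alignas(std::atomic<double>) double shared_value = 1.5;
```

A call then looks like double v = atomic_load_plain(shared_value);.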

Can C's restrict keyword be emulated using strict aliasing in C++?

The Problem
The restrict keyword in C is missing in C++, so out of interest I was looking for a way to emulate the same feature in C++.
Specifically, I would like the following to be equivalent:
// C
void func(S *restrict a, S *restrict b)
// C++
void func(noalias<S, 1> a, noalias<S, 2> b)
where noalias<T, n>
behaves just like T* when accessed with -> and *
can be constructed from an T* (so that the function can be called as func(t1, t2), where t1 and t2 are both of type T*)
the index n specifies the "aliasing class" of the variable, so that variables of type noalias<T, n> and noalias<T, m> may be assumed never to alias for n != m.
An Attempt
Here is my deeply flawed solution:
template <typename T, int n>
class noalias
{
    struct T2 : T {};
    T *t;
public:
    noalias(T *t_) : t(t_) {}
    T2 *operator->() const {return static_cast<T2*>(t);} // <-- UB
};
When accessed with ->, it casts the internally-stored T* to a noalias<T, n>::T2* and returns that instead. Since this is a different type for each n, the strict aliasing rule ensures that they will never alias. Also, since T2 derives from T, the returned pointer behaves just like a T*. Great!
Even better, the code compiles and the assembly output confirms that it has the desired effect.
The problem is the static_cast. If t were really pointing to an object of type T2 then this would be fine. But t points to a T so this is UB. In practice, since T2 is a subclass which adds nothing extra to T it will probably have the same data layout, and so member accesses on the T2* will look for members at the same offsets as they occur in T and everything will be fine.
But having an n-dependent class is necessary for strict aliasing, and that this class derives from T is also necessary so that the pointer can be treated like a T*. So UB seems unavoidable.
Questions
Can this be done in c++14 without invoking UB - possibly using a completely different idea?
If not, then I have heard about a "dot operator" in c++1z; would it be possible with this?
If the above, will something similar to noalias be appearing in the standard library?

You could use the __restrict__ GCC extension for un/aliasing.
From the docs
In addition to allowing restricted pointers, you can specify restricted references, which indicate that the reference is not aliased in the local context.
void fn (int *__restrict__ rptr, int &__restrict__ rref)
{
/* ... */
}
In the body of fn, rptr points to an unaliased integer and rref refers to a (different) unaliased integer.
You may also specify whether a member function's this pointer is unaliased by using __restrict__ as a member function qualifier.
void T::fn () __restrict__
{
/* ... */
}
Within the body of T::fn, this will have the effective definition T *__restrict__ const this. Notice that the interpretation of a __restrict__ member function qualifier is different to that of const or volatile qualifier, in that it is applied to the pointer rather than the object. This is consistent with other compilers which implement restricted pointers.
As with all outermost parameter qualifiers, __restrict__ is ignored in function definition matching. This means you only need to specify __restrict__ in a function definition, rather than in a function prototype as well.

Maybe I don't understand your question, but the C restrict keyword was never adopted into standard C++. Almost every compiler has its own equivalent, though:
Microsoft VS has __restrict for pointers (__declspec(restrict) applies to function return values instead): https://msdn.microsoft.com/en-us/library/8bcxafdh.aspx
and GCC has __restrict__: https://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html
If you want a common definition you could use #defines:
#if defined(_MSC_VER)
#define RESTRICT __restrict
#else
#define RESTRICT __restrict__
#endif
I haven't tested it; let me know if it doesn't work.
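A short sketch of how such a macro might be used. It assumes the MSVC spelling for pointers is __restrict and adds a no-op fallback for unknown compilers; scale_into is a hypothetical function.

```cpp
#include <cassert>
#include <cstddef>

// Portable spelling of the extension (assumption: MSVC, GCC or Clang).
#if defined(_MSC_VER)
#define RESTRICT __restrict
#elif defined(__GNUC__)
#define RESTRICT __restrict__
#else
#define RESTRICT /* unknown compiler: fall back to a plain pointer */
#endif

// Hypothetical example: the caller promises that dst and src do not overlap.
void scale_into(float* RESTRICT dst, const float* RESTRICT src,
                float factor, std::size_t n) {
    // Cheap sanity check only; it does not detect partial overlap.
    assert(dst != src);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * factor;
    }
}
```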

If we're only talking about a pure C++ Standard solution, a runtime check is the only way. I'm actually not even sure if this is possible, given the strength of the definition of C's restrict qualifier, which is that the object can only be accessed through the restrict pointer.

The proper way to add restrict-like semantics to C++ would be to have the Standard define templates for restricted references and restricted pointers in such a way that dummy versions which work like ordinary references and pointers could be coded in C++. While it might be possible to generate templates which behave as required in all defined cases, and invoke UB in all cases which should not be defined, doing so will be useless if not counterproductive unless a compiler is programmed to exploit the UB in question to facilitate such optimizations. Programming a compiler to exploit such optimizations in cases where code uses a Standard-defined type that exists for that purpose is apt to be easier and more effective than trying to identify patterns within user types where it would be exploitable, and would also be less likely to have undesired side-effects.

I think that your solution doesn't fully achieve the intended goal even if the noted UB didn't exist. After all, all real data accesses occur on the built-in type level. If decltype(a->i) is int and your function manipulates int* pointers, then under certain circumstances the compiler should still assume that those pointers could alias a->i.
Example:
int func(noalias<S, 1> a) {
    int s = 0;
    int* p = getPtr();
    for ( int i = 0; i < 10; ++i ) {
        ++*p;
        s += a->i;
    }
    return s;
}
Using noalias will hardly enable the compiler to optimize the above function into the following:
int func(noalias<S, 1> a) {
    *getPtr() += 10;
    return 10 * a->i;
}
My feeling is that restrict cannot be emulated, and must be supported by the compiler directly.

Pointer aliasing of pointer containers

I learned that pointer aliasing may hurt performance, and that a __restrict__ attribute (in GCC, or equivalent attributes in other implementations) may help keeping track of which pointers should or should not be aliased. Meanwhile, I also learned that GCC's implementation of valarray stores a __restrict__'ed pointer (line 517 in https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.1/valarray-source.html), which I think hints the compiler (and responsible users) that the private pointer can be assumed not to be aliased anywhere in valarray methods.
But if we alias a pointer to a valarray object, for example:
#include <valarray>
int main() {
std::valarray<double> *a = new std::valarray<double>(10);
std::valarray<double> *b = a;
return 0;
}
is it valid to say that the member pointer of a is aliased too? And would the very existence of b hurt any optimizations that valarray methods could benefit otherwise? (Is it bad practice to point to optimized pointer containers?)

Let's first understand how aliasing hurts optimization.
Consider this code,
void
process_data(float *in, float *out, float gain, int nsamps)
{
    int i;
    for (i = 0; i < nsamps; i++) {
        out[i] = in[i] * gain;
    }
}
In C or C++, it is legal for the parameters in and out to point to overlapping regions in memory.... When the compiler optimizes the function, it does not in general know whether in and out are aliases. It must therefore assume that any store through out can affect the memory pointed to by in, which severely limits its ability to reorder or parallelize the code (For some simple cases, the compiler could analyze the entire program to determine that two pointers cannot be aliases. But in general, it is impossible for the compiler to determine whether or not two pointers are aliases, so to be safe, it must assume that they are).
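To see why the compiler has to be this careful, here is a small sketch that calls the same loop with overlapping buffers. The helper overlapping_call_last_value is hypothetical; the point is that the well-defined result depends on the stores happening one at a time, in order.

```cpp
#include <cassert>

// Same loop as in the quoted example, without any restrict qualifier.
void process_data(float* in, float* out, float gain, int nsamps) {
    assert(nsamps >= 0);
    for (int i = 0; i < nsamps; i++) {
        out[i] = in[i] * gain;
    }
}

// Hypothetical helper: out starts one element after in, so each store
// feeds the next load.
float overlapping_call_last_value() {
    float buf[4] = {1.0f, 0.0f, 0.0f, 0.0f};
    process_data(buf, buf + 1, 2.0f, 3);
    // buf is now {1, 2, 4, 8}. A version that loaded all inputs before
    // storing anything would have produced {1, 2, 0, 0} instead.
    return buf[3];
}
```

A restrict-qualified version would make this call invalid, which is exactly what frees the compiler to reorder or vectorize the loop.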
Coming to your code,
#include <valarray>
int main() {
std::valarray<double> *a = new std::valarray<double>(10);
std::valarray<double> *b = a;
return 0;
}
Since a and b are aliases, the underlying storage used by valarray will also be aliased (I think it uses an array, but I'm not sure about this). So, any part of your code that uses a and b in a fashion similar to that shown above will not benefit from compiler optimizations like parallelization and reordering. Note that the mere existence of b will not hurt optimization; how you use it will.
Credits:
The quoted part and the code are taken from here. This should serve as a good source for more information about the topic as well.

is it valid to say that the member pointer of a is aliased too?
Yes. For example, (*a)[0] and (*b)[0] reference the same object. That's aliasing.
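As a small illustration of that statement, the sketch below only compares addresses. The function name first_elements_alias is hypothetical.

```cpp
#include <cassert>
#include <valarray>

// Returns true when both pointers reach the same first element.
bool first_elements_alias() {
    std::valarray<double>* a = new std::valarray<double>(10);
    std::valarray<double>* b = a;  // b aliases a: there is only one valarray
    assert(a->size() == 10);
    bool same = (&(*a)[0] == &(*b)[0]);  // same object, hence same address
    delete a;  // a single delete, since only one object was allocated
    return same;
}
```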
And would the very existence of b hurt any optimizations that valarray methods could benefit otherwise?
No.
You haven't done anything with b in your sample code. Suppose you have a function much larger than this sample code that starts with the same construct. There's usually no problem if the first several lines of that function uses a but never b, and the remaining lines uses b but never a. Usually. (Optimizing compilers do rearrange lines of code however.)
If on the other hand you intermingle uses of a and b, you aren't hurting the optimizations. You are doing something much worse: You are invoking undefined behavior. "Don't do it" is the best solution to the undefined behavior problem.
Addendum
The C restrict and gcc __restrict__ keywords are not constraints on the developers of the compiler or the standard library. Those keywords are promises to the compiler/library that restricted data do not overlap other data. The compiler/library doesn't check whether the programmer violated this promise. If this promise enables certain optimizations that might otherwise be invalid with overlapping data, the compiler/library is free to apply those optimizations.
What this means is that restrict (or __restrict__) is a restriction on you, not the compiler. You can violate those restrictions even without your b pointer. For example, consider
*a = (*a)[std::slice(a->size()-1, a->size(), -1)];
This is undefined behavior.

How to implement "_mm_storeu_epi64" without aliasing problems?

(Note: Although this question is about "store", the "load" case has the same issues and is perfectly symmetric.)
The SSE intrinsics provide an _mm_storeu_pd function with the following signature:
void _mm_storeu_pd (double *p, __m128d a);
So if I have vector of two doubles, and I want to store it to an array of two doubles, I can just use this intrinsic.
However, my vector is not two doubles; it is two 64-bit integers, and I want to store it to an array of two 64-bit integers. That is, I want a function with the following signature:
void _mm_storeu_epi64 (int64_t *p, __m128i a);
But the intrinsics provide no such function. The closest they have is _mm_storeu_si128:
void _mm_storeu_si128 (__m128i *p, __m128i a);
The problem is that this function takes a pointer to __m128i, while my array is an array of int64_t. Writing to an object via the wrong type of pointer is a violation of strict aliasing and is definitely undefined behavior. I am concerned that my compiler, now or in the future, will reorder or otherwise optimize away the store thus breaking my program in strange ways.
To be clear, what I want is a function I can invoke like this:
__m128i v = _mm_set_epi64x(2,1);
int64_t ra[2];
_mm_storeu_epi64(&ra[0], v); // does not exist, so I want to implement it
Here are six attempts to create such a function.
Attempt #1
void _mm_storeu_epi64(int64_t *p, __m128i a) {
_mm_storeu_si128(reinterpret_cast<__m128i *>(p), a);
}
This appears to have the strict aliasing problem I am worried about.
Attempt #2
void _mm_storeu_epi64(int64_t *p, __m128i a) {
_mm_storeu_si128(static_cast<__m128i *>(static_cast<void *>(p)), a);
}
Possibly better in general, but I do not think it makes any difference in this case.
Attempt #3
void _mm_storeu_epi64(int64_t *p, __m128i a) {
union TypePun {
int64_t a[2];
__m128i v;
};
TypePun *p_u = reinterpret_cast<TypePun *>(p);
p_u->v = a;
}
This generates incorrect code on my compiler (GCC 4.9.0), which emits an aligned movaps instruction instead of an unaligned movups. (The union is aligned, so the reinterpret_cast tricks GCC into assuming p_u is aligned, too.)
Attempt #4
void _mm_storeu_epi64(int64_t *p, __m128i a) {
union TypePun {
int64_t a[2];
__m128i v;
};
TypePun *p_u = reinterpret_cast<TypePun *>(p);
_mm_storeu_si128(&p_u->v, a);
}
This appears to emit the code I want. The "type-punning via union" trick, although technically undefined in C++, is widely-supported. But is this example -- where I pass a pointer to an element of a union rather than access via the union itself -- really a valid way to use the union for type-punning?
Attempt #5
void _mm_storeu_epi64(int64_t *p, __m128i a) {
p[0] = _mm_extract_epi64(a, 0);
p[1] = _mm_extract_epi64(a, 1);
}
This works and is perfectly valid, but it emits two instructions instead of one.
Attempt #6
void _mm_storeu_epi64(int64_t *p, __m128i a) {
std::memcpy(p, &a, sizeof(a));
}
This works and is perfectly valid... I think. But it emits frankly terrible code on my system. GCC spills a to an aligned stack slot via an aligned store, then manually moves the component words to the destination. (Actually it spills it twice, once for each component. Very strange.)
...
Is there any way to write this function that will (a) generate optimal code on a typical modern compiler and (b) have minimal risk of running afoul of strict aliasing?

SSE intrinsics are one of those niche corner cases where you have to push the rules a bit.
Since these intrinsics are compiler extensions (somewhat standardized by Intel), they are already outside the specification of the C and C++ language standards. So it's somewhat self-defeating to try to be "standard compliant" while using a feature that clearly is not.
Despite the fact that the SSE intrinsic libraries try to act like normal 3rd party libraries, underneath, they are all specially handled by the compiler.
The Intent:
The SSE intrinsics were likely designed from the beginning to allow aliasing between the vector and scalar types - since a vector really is just an aggregate of the scalar type.
But whoever designed the SSE intrinsics probably wasn't a language pedant. (That's not too surprising. Hard-core low-level performance programmers and language lawyering enthusiasts tend to be very different groups of people who don't always get along.)
We can see evidence of this in the load/store intrinsics:
__m128i _mm_stream_load_si128(__m128i* mem_addr) - A load intrinsic that takes a non-const pointer?
void _mm_storeu_pd(double* mem_addr, __m128d a) - What if I want to store to __m128i*?
The strict aliasing problems are a direct result of these poor prototypes.
Starting from AVX512, the intrinsics have all been converted to void* to address this problem:
__m512d _mm512_load_pd(void const* mem_addr)
void _mm512_store_epi64 (void* mem_addr, __m512i a)
Compiler Specifics:
Visual Studio defines each of the SSE/AVX types as a union of the scalar types. This by itself allows aliasing. Furthermore, Visual Studio doesn't do strict-aliasing optimizations, so the point is moot.
The Intel Compiler has never failed me with all sorts of aliasing. It probably doesn't do strict-aliasing either - though I've never found any reliable source for this.
GCC does do strict-aliasing, but from my experience, not across function boundaries. It has never failed me to cast pointers which are passed in (on any type). GCC also declares SSE types as __may_alias__ thereby explicitly allowing it to alias other types.
My Recommendation:
For function parameters that are of the wrong pointer type, just cast it.
For variables declared and aliased on the stack, use a union. That union will already be aligned so you can read/write to them directly without intrinsics. (But be aware of store-forwarding issues that come with interleaving vector/scalar accesses.)
If you need to access a vector both as a whole and by its scalar components, consider using insert/extract intrinsics instead of aliasing.
When using GCC, turn on -Wall or -Wstrict-aliasing. It will tell you about strict-aliasing violations.
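The first recommendation ("just cast it") might look like the sketch below. The names storeu_epi64 and store_pair are hypothetical, SSE2 availability is detected with __SSE2__ or _M_X64, and a scalar fallback is included only so that the example still builds on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics

// Cast the incoming pointer and let the unaligned-store intrinsic do the
// work. This relies on the compiler treating __m128i as allowed to alias
// other types, as described above for GCC and Visual Studio.
inline void storeu_epi64(std::int64_t* p, __m128i a) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(p), a);
}
#endif

// Stores lo to p[0] and hi to p[1].
inline void store_pair(std::int64_t* p, std::int64_t lo, std::int64_t hi) {
    assert(p != nullptr);
#if defined(__SSE2__) || defined(_M_X64)
    // _mm_set_epi64x takes the high element first, then the low one.
    storeu_epi64(p, _mm_set_epi64x(hi, lo));
#else
    p[0] = lo;  // scalar fallback for targets without SSE2
    p[1] = hi;
#endif
}
```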

Needless pointer-casts in C

I got a comment to my answer on this thread:
Malloc inside a function call appears to be getting freed on return?
In short I had code like this:
int * somefunc (void)
{
    int * temp = (int*) malloc (sizeof (int));
    temp[0] = 0;
    return temp;
}
I got this comment:
Can I just say, please don't cast the return value of malloc? It is not required and can hide errors.
I agree that the cast is not required in C. It is mandatory in C++, so I usually add them just in case I have to port the code in C++ one day.
However, I wonder how casts like this can hide errors. Any ideas?
Edit:
Seems like there are very good and valid arguments on both sides. Thanks for posting, folks.

It seems fitting I post an answer, since I left the comment :P
Basically, if you forget to include stdlib.h the compiler will assume malloc returns an int. Without casting, you will get a warning. With casting you won't.
So by casting you get nothing, and run the risk of suppressing legitimate warnings.
Much is written about this, a quick google search will turn up more detailed explanations.
edit
It has been argued that
TYPE * p;
p = (TYPE *)malloc(n*sizeof(TYPE));
makes it obvious when you accidentally don't allocate enough memory because say, you thought p was TYPe not TYPE, and thus we should cast malloc because the advantage of this method overrides the smaller cost of accidentally suppressing compiler warnings.
I would like to point out 2 things:
you should write p = malloc(sizeof(*p)*n); to always ensure you malloc the right amount of space
with the above approach, you need to make changes in 3 places if you ever change the type of p: once in the declaration, once in the malloc, and once in the cast.
In short, I still personally believe there is no need for casting the return value of malloc and it is certainly not best practice.

This question is tagged both for C and C++, so it has at least two answers, IMHO:
C
Ahem... Do whatever you want.
I believe the reason given above ("if you don't include stdlib.h then you won't get a warning") is not a valid one, because one should not rely on this kind of hack to avoid forgetting to include a header.
The real reason not to write the cast is that the C compiler already silently converts a void * into whatever pointer type you want, so doing it yourself is overkill and useless.
If you want to have type safety, you can either switch to C++ or write your own wrapper function, like:
int * malloc_Int(size_t p_iSize) /* number of ints wanted */
{
return malloc(sizeof(int) * p_iSize) ;
}
C++
Sometimes, even in C++, you have to make use of the malloc/realloc/free utilities. Then you'll have to cast. But you already knew that. Using static_cast<>() will be better, as always, than a C-style cast.
And in C++, you could wrap malloc (and realloc, etc.) in templates to achieve type safety:
template <typename T>
T * myMalloc(const size_t p_iSize)
{
    return static_cast<T *>(malloc(sizeof(T) * p_iSize)) ;
}
Which would be used like:
int * p = myMalloc<int>(25) ;
free(p) ;
MyStruct * p2 = myMalloc<MyStruct>(12) ;
free(p2) ;
and the following code:
// error: cannot convert ‘int*’ to ‘short int*’ in initialization
short * p = myMalloc<int>(25) ;
free(p) ;
won't compile, so, no problemo.
All in all, in pure C++, you now have no excuse if someone finds more than one C malloc inside your code...
:-)
C + C++ crossover
Sometimes, you want to produce code that will compile both in C and in C++ (for whatever reasons... Isn't it the point of the C++ extern "C" {} block?). In this case, C++ demands the cast, but C won't understand the static_cast keyword, so the solution is the C-style cast (which is still legal in C++ for exactly this kind of reasons).
Note that even with writing pure C code, compiling it with a C++ compiler will get you a lot more warnings and errors (for example attempting to use a function without declaring it first won't compile, unlike the error mentioned above).
So, to be on the safe side, write code that will compile cleanly in C++, study and correct the warnings, and then use the C compiler to produce the final binary. This means, again, write the cast, in a C-style cast.

One possible error it can introduce is if you are compiling on a 64-bit system using C (not C++).
Basically, if you forget to include stdlib.h, the implicit int rule will apply. Thus the compiler will happily assume that malloc has the prototype int malloc(). On many 64-bit systems an int is 32 bits and a pointer is 64 bits.
Uh oh, the value gets truncated and you only get the lower 32 bits of the pointer! Now if you cast the return value of malloc, this error is hidden by the cast. But if you don't, you will get an error (something along the lines of "cannot convert int to T *").
This does not apply to C++, of course, for two reasons: firstly, it has no implicit int rule; secondly, it requires the cast.
All in all though, you should just use new in C++ code anyway :-P.

Well, I think it's the exact opposite - always directly cast it to the needed type. Read on here!

The "forgot stdlib.h" argument is a straw man. Modern compilers will detect and warn of the problem (gcc -Wall).
You should always cast the result of malloc immediately. Not doing so should be considered an error, and not just because it will fail as C++. If you're targeting a machine architecture with different kinds of pointers, for example, you could wind up with a very tricky bug if you don't put in the cast.
Edit: The commenter Evan Teran is correct. My mistake was thinking that the compiler didn't have to do any work on a void pointer in any context. I freak when I think of FAR pointer bugs, so my intuition is to cast everything. Thanks Evan!

Actually, the only way a cast could hide an error is if you were converting from one datatype to a smaller datatype and lost data, or if you were converting pears to apples. Take the following example:
int int_array[10];
/* initialize array */
int *p = &(int_array[3]);
short *sp = (short *)p;
short my_val = *sp;
in this case the conversion to short would be dropping some data from the int. And then this case:
struct my_struct {
    /* something */
} my_struct_array[100];
int my_int_array[100];
/* initialize array */
struct my_struct *p = (struct my_struct *)&(my_int_array[99]);
in which you'd end up pointing to the wrong kind of data, or even to invalid memory.
But in general, and if you know what you are doing, it's OK to do the casting. Even more so when you are getting memory from malloc, which happens to return a void pointer which you can't use at all unless you cast it, and most compilers will warn you if you are casting to something the lvalue (the value to the left side of the assignment) can't take anyway.

#ifdef __cplusplus
#define MALLOC_CAST(T) (T)
#else
#define MALLOC_CAST(T)
#endif
...
int * p;
p = MALLOC_CAST(int *) malloc(sizeof(int) * n);
or, alternately
#ifdef __cplusplus
#define MYMALLOC(T, N) static_cast<T*>(malloc(sizeof(T) * (N)))
#else
#define MYMALLOC(T, N) malloc(sizeof(T) * (N))
#endif
...
int * p;
p = MYMALLOC(int, n);
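For illustration, here is a self-contained sketch built around the second macro. It assumes the standard predefined macro __cplusplus is the intended test, and make_filled is a hypothetical helper. The file is written so that it should compile both as C and as C++.

```cpp
#include <assert.h>
#include <stdlib.h>

#ifdef __cplusplus
#define MYMALLOC(T, N) static_cast<T*>(malloc(sizeof(T) * (N)))
#else
#define MYMALLOC(T, N) malloc(sizeof(T) * (N))
#endif

/* Hypothetical helper: allocate n ints, each set to value.
   Returns NULL on allocation failure; the caller frees the result. */
int* make_filled(size_t n, int value) {
    int* p;
    size_t i;
    assert(n != 0);
    p = MYMALLOC(int, n);
    if (p == NULL) {
        return NULL;
    }
    for (i = 0; i < n; ++i) {
        p[i] = value;
    }
    return p;
}
```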

People have already cited the reasons I usually trot out: the old (no longer applicable to most compilers) argument about not including stdlib.h and using sizeof *p to make sure the types and sizes always match regardless of later updating. I do want to point out one other argument against casting. It's a small one, but I think it applies.
C is fairly weakly typed. Most safe type conversions happen automatically, and most unsafe ones require a cast. Consider:
int from_f(float f)
{
return *(int *)&f;
}
That's dangerous code. It's technically undefined behavior, though in practice it's going to do the same thing on nearly every platform you run it on. And the cast helps tell you "This code is a terrible hack."
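For comparison, a well-defined way to get at the same bits is to copy them with memcpy. This is a sketch in C++ rather than the answer's own code; float_bits and float_bits_selfcheck are hypothetical names, and the self-check assumes IEEE 754 single precision.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the object representation of a float into a 32-bit integer.
// Unlike *(int *)&f, this does not violate the aliasing rules.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Self-check, assuming IEEE 754, where 1.0f is encoded as 0x3F800000.
inline void float_bits_selfcheck() {
    assert(float_bits(1.0f) == 0x3F800000u);
}
```

Compilers typically turn the memcpy into a single register move, so the safe version need not cost anything.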
Consider:
int *p = (int *)malloc(sizeof(int) * 10);
I see a cast, and I wonder, "Why is this necessary? Where is the hack?" It raises hairs on my neck that there's something evil going on, when in fact the code is completely harmless.
As long as we're using C, casts (especially pointer casts) are a way of saying "There's something evil and easily breakable going on here." They may accomplish what you need accomplished, but they indicate to you and future maintainers that the kids aren't alright.
Using casts on every malloc diminishes the "hack" indication of pointer casting. It makes it less jarring to see things like *(int *)&f;.
Note: C and C++ are different languages. C is weakly typed, C++ is more strongly typed. The casts are necessary in C++, even though they don't indicate a hack at all, because of (in my humble opinion) the unnecessarily strong C++ type system. (Really, this particular case is the only place I think the C++ type system is "too strong," but I can't think of any place where it's "too weak," which makes it overall too strong for my tastes.)
If you're worried about C++ compatibility, don't. If you're writing C, use a C compiler. There are plenty of really good ones available for every platform. If, for some inane reason, you have to write C code that compiles cleanly as C++, you're not really writing C. If you need to port C to C++, you should be making lots of changes to make your C code more idiomatic C++.
If you can't do any of that, your code won't be pretty no matter what you do, so it doesn't really matter how you decide to cast at that point. I do like the idea of using templates to make a new allocator that returns the correct type, although that's basically just reinventing the new keyword.

Casting a function which returns (void *) to instead be an (int *) is harmless: you're casting one type of pointer to another.
Casting a function which returns an integer to instead be a pointer is most likely incorrect. The compiler would have flagged it had you not explicitly cast it.

One possible error (depending on whether this is what you really want or not) is mallocing with the size of one type and assigning to a pointer of a different type. E.g.,
int *temp = (int *)malloc(sizeof(double));
There may be cases where you want to do this, but I suspect that they are rare.

I think you should put the cast in. Consider that there are three locations for types:
T1 *p;
p = (T2*) malloc(sizeof(T3));
The two lines of code might be widely separated. Therefore it's good that the compiler will enforce that T1 == T2. It is easier to visually verify that T2 == T3.
If you miss out the T2 cast, then you have to hope that T1 == T3.
On the other hand you have the missing stdlib.h argument - but I think it's less likely to be a problem.

On the other hand, if you ever need to port the code to C++, it is much better to use the 'new' operator.
