In C++20 we can write:
double x;
double x_value = std::atomic_ref(x).load();
Is there a function with the same effect?
I have tried std::atomic_load but there seem to be no overloads for non-atomic objects.
Non-portably, of course, there is the GNU C __atomic_load_n(&x, __ATOMIC_SEQ_CST) builtin.
I'm pretty sure you don't find a function in ISO C++ that takes a double * or double &.
Possibly one that takes a std::atomic_ref<double> * or reference, I didn't check, but I think the intent of atomic_ref is to be constructed on the fly for free inside a function that needs it.
If you want such a function, write your own that constructs and uses an atomic_ref. It will all inline down to an __atomic_load_n on compilers where atomic uses that under the hood anyway.
But do make sure to declare your global like this, so it's safe and efficient to use with atomic_ref. It's UB (I think) to take an atomic_ref to an object that isn't sufficiently aligned, so the atomic_ref constructor can simply assume that the object you use is aligned the way atomic<T> needs to be.
alignas (std::atomic_ref<double>::required_alignment) double x;
In practice that's only going to be a problem for 8-byte primitive types like double inside structs on 32-bit targets, but something like struct { char c[8]; } could in practice be not naturally aligned if you don't ask for alignment.
Related
Say I have the function
void foo(int * const bar){
    while(!*bar){
        printf("qwertyuiop\n");
    }
}
where I intend to change the value at bar to something other than 0 to stop this loop. Would it be appropriate to instead write it as below?
void foo(int volatile * const bar){
    while(!*bar){
        printf("qwertyuiop\n");
    }
}
volatile was intended for things like memory-mapped device registers, where the pointed-to value could "magically" change "behind the compiler's back" due to the nature of the hardware involved. Assuming you're not writing code that deals with special hardware that might "spontaneously" change the value that bar points to, then you needn't (and shouldn't) use the volatile keyword. Simply omitting the const keyword is sufficient to let the compiler (and any programmer that might call the function) know that the pointed-to value is subject to change.
Note that if you are intending to set *bar from another thread, then the volatile keyword isn't good enough; even if you tag the pointer volatile, the compiler still won't guarantee correct handling. For that use-case to work correctly, you need to either synchronize all reads and writes to *bar with a mutex, or alternatively use a std::atomic<int> instead of a plain int.
Problem
Is it safe to cast a complex * to a float * or double * pointer using reinterpret_cast?
thrust::complex<float> *devicePtr; // only to show type, devicePtr otherwise lives in an object
/* OR */
float _Complex *devicePtr;
/* OR */
std::complex<float> *devicePtr;
cublasScnrm2(cublasv2handle,n,(cuComplex*)xarray,1,reinterpret_cast<float *>(obj->devicePtr));
If not, are there clever ways to solve this problem?
Restrictions
obj is a C struct (so no direct operator overloading possible)
I cannot store devicePtr as a float * within obj
devicePtr only ever holds a pointer to a single value. It may be relevant, given the reinterpret_cast trickery, that behind the scenes devicePtr is part of a pool:
static thrust::complex<float> *pool;
/* OR */
static float _Complex *pool;
/* OR */
std::complex<float> *pool;
void giveObjectDevicePtr(object obj)
{
    for (int i = 0; i < poolSize; ++i) {
        if (poolEntryIsFree(pool, i)) obj->devicePtr = pool + i;
    }
}
The cublas call is made asynchronously on a stream, so copying the contents of devicePtr up to host and syncing stream to perform conversion is to be avoided.
Likewise, launching a micro kernel is also not ideal but perhaps unavoidable.
I have seen many questions about casting double * or float * to complex * but not many the other way around.
Underneath the hood, complex types usable in CUDA should generally be a struct of two values. You can see what I mean by looking at the cuComplex.h header file, as one possible example.
Casting a pointer to such, to a pointer type consistent with the values in that struct, should generally be less risky than the other way around (the other way around has additional alignment requirements beyond the base type).
If you posit the type you are discussing, exactly, then I claim this question has nothing to do with CUDA, and is really just a C++ question.
If you do such a cast, then provide that to a cublas function, in the general case I think you're going to be computing over both real and imaginary components, which seems weird to me. It should not be an issue for the case you have shown, however.
You also seem to have some confusion about where a device pointer lives:
copying devicePtr up to host
Any device pointer usable in a CUBLAS call for recent versions of CUBLAS lives in host memory.
According to the documentation, this should theoretically be possible, though it is not explicit.
If you were dealing with std::complex<T>, the answer would be a definitive "yes". According to cppreference, a pointer to a std::complex<T> array can be reinterpret_cast to a pointer to a T array, with the intuitive semantics. This is for compatibility with C's complex numbers.
Now for thrust::complex<T>, the documentation states, "It is functionally identical to it, but can also be used in device code which std::complex currently cannot." Whether or not "functionally identical" includes compatibility with C's complex types is not explicit. That said, the structure is laid out as one would expect std::complex<T> to be laid out, which means (in a practical sense) it's likely that such a cast will work just like for std::complex<T>.
There are multiple ways of making a method. I'm not quite sure when to use const and reference in method parameters.
Imagine a method called 'getSum' that returns the sum of two integers. The parameters in such a method can have multiple forms.
int getSum1(int, int);
int getSum2(int&, int&);
int getSum3(const int, const int);
int getSum4(const int&, const int&);
Correct me if I'm wrong, but here's how I see these methods:
getSum1 - Copies integers and calculates
getSum2 - Doesn't copy integers, but uses the values directly from memory and calculates
getSum3 - Promises that the values won't change
getSum4 - Promises that the values won't change & doesn't copy the integers, but uses the values directly from memory
So here are some questions:
So is getSum2 faster than getSum1 since it doesn't copy the integers, but uses them directly?
Since the values aren't changed, I don't think 'const' makes any difference in this situation, but should it still be there for const correctness?
Would it be the same with doubles?
Should a reference only be used with very large parameters? e.g. if I were to give it a whole class, then it would make no sense to copy the whole thing
For integers, this is irrelevant in practice. Processors work with registers (and an int fits in a register in all but the most exotic hardware), copying a register is basically the cheapest operation (after a noop) and it may not even be necessary if the compiler allocates registers in a smart way.
Use this if you want to change the passed ints. Non-const reference parameters generally indicate that you intend to modify the argument (for example, store multiple return values).
This does exactly the same as 1. for basically the same reason. You cannot change the passed ints but nobody would be any the wiser if you did (i.e. used 1. instead).
Again, this will effectively do the same thing as 1. for ints (or doubles, if your CPU handles them natively): the compiler understands that passing a const reference to an int (or double) is equivalent to providing a copy, and the copy avoids unnecessary trips to memory. Unless you take a pointer to the argument (in which case the compiler would have to guarantee the reference refers to the caller's int), this is thus pointless.
Note that the above is not in terms of the C++ abstract machine but in terms of what happens with modern hardware/compilers. If you are working on hardware without dedicated floating point capabilities or where ints don't fit in registers, you have to be more careful. I don't have an overview over current embedded hardware trends, but unless you literally write code for toasters, you should be good.
If you are not dealing with ints but with (large) classes, then the semantic differences are much stronger:
The function receives a copy. Note that if you pass in a temporary, that copy may be move-constructed (or even better, elided).
Same as in the "int section", use this over 4. only if you want to change the passed value.
You receive a copy that cannot be changed. This is generally not very useful outside of specific circumstances (or for marginal code clarity increases).
This should be the default to pass a large class (well, pretty much anything bigger than a pointer) if you intend to only read from (or call const methods on) it.
You are correct. The values of a and b would not be copied. But the addresses of a and b would be copied, and in this case you would not gain any speed, since int and pointer-to-int are (about) the same size. You would gain speed if the arguments to the function are large, like a struct or class, as you mention in Q4.
2)
Const means that you can not change the value of the parameter. If it is not declared as a const you can change it inside the function, but the original value or variable you used when calling the function will not be changed.
int getSum1(int a, int b)
{
    a = a + 5;
    return a + b;
}
int a, b, foo;
a = 10;
b = 5;
foo = getSum1(a, b);
In this case foo has the value 20, a still equals 10, and b still equals 5, since the modification of a is only local to the function getSum1().
I learned that pointer aliasing may hurt performance, and that a __restrict__ attribute (in GCC, or equivalent attributes in other implementations) may help keeping track of which pointers should or should not be aliased. Meanwhile, I also learned that GCC's implementation of valarray stores a __restrict__'ed pointer (line 517 in https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.1/valarray-source.html), which I think hints the compiler (and responsible users) that the private pointer can be assumed not to be aliased anywhere in valarray methods.
But if we alias a pointer to a valarray object, for example:
#include <valarray>
int main() {
    std::valarray<double> *a = new std::valarray<double>(10);
    std::valarray<double> *b = a;
    return 0;
}
is it valid to say that the member pointer of a is aliased too? And would the very existence of b hurt any optimizations that valarray methods could benefit otherwise? (Is it bad practice to point to optimized pointer containers?)
Let's first understand how aliasing hurts optimization.
Consider this code,
void
process_data(float *in, float *out, float gain, int nsamps)
{
    int i;
    for (i = 0; i < nsamps; i++) {
        out[i] = in[i] * gain;
    }
}
In C or C++, it is legal for the parameters in and out to point to overlapping regions in memory.... When the compiler optimizes the function, it does not in general know whether in and out are aliases. It must therefore assume that any store through out can affect the memory pointed to by in, which severely limits its ability to reorder or parallelize the code (For some simple cases, the compiler could analyze the entire program to determine that two pointers cannot be aliases. But in general, it is impossible for the compiler to determine whether or not two pointers are aliases, so to be safe, it must assume that they are).
Coming to your code,
#include <valarray>
int main() {
    std::valarray<double> *a = new std::valarray<double>(10);
    std::valarray<double> *b = a;
    return 0;
}
Since a and b are aliases, the underlying storage used by valarray will also be aliased (I think it uses an array; not entirely sure about this). So, any part of your code that uses a and b in a fashion similar to that shown above will not benefit from compiler optimizations like parallelization and reordering. Note that JUST the existence of b will not hurt optimization; what matters is how you use it.
Credits:
The quoted part and the code are taken from here. This should serve as a good source for more information about the topic as well.
is it valid to say that the member pointer of a is aliased too?
Yes. For example, (*a)[0] and (*b)[0] reference the same object. That's aliasing.
And would the very existence of b hurt any optimizations that valarray methods could benefit otherwise?
No.
You haven't done anything with b in your sample code. Suppose you have a function much larger than this sample code that starts with the same construct. There's usually no problem if the first several lines of that function uses a but never b, and the remaining lines uses b but never a. Usually. (Optimizing compilers do rearrange lines of code however.)
If on the other hand you intermingle uses of a and b, you aren't hurting the optimizations. You are doing something much worse: You are invoking undefined behavior. "Don't do it" is the best solution to the undefined behavior problem.
Addendum
The C restrict and gcc __restrict__ keywords are not constraints on the developers of the compiler or the standard library. Those keywords are promises to the compiler/library that restricted data do not overlap other data. The compiler/library doesn't check whether the programmer violated this promise. If this promise enables certain optimizations that might otherwise be invalid with overlapping data, the compiler/library is free to apply those optimizations.
What this means is that restrict (or __restrict__) is a restriction on you, not the compiler. You can violate those restrictions even without your b pointer. For example, consider
*a = (*a)[std::slice(a->size() - 1, a->size(), -1)];
This is undefined behavior.
Hey there,
I wonder if it's worth passing primitive single values like int, float, double or char by pointer? Probably it's not worth!? But if you would simply pass everything by pointer, is this making the program slower?
Should you always just pass arrays as pointer?
Thanks!
I wonder if it's worth passing primitive single values like int, float, double or char by pointer?
What are you trying to accomplish? Do you want to be able to write to the passed in value? Or do you just need to use it? If you want to write to it, the idiomatic way is to pass by reference. If you don't need to write to it, you're best avoiding any risk that you'll write to it accidentally and pass by value. Pass by value will make a copy of the variable for local use. (as an aside, if you don't want to make a copy AND want some level of safety, you can pass by const reference)
But if you would simply pass everything by pointer, is this making the program slower?
Difficult to say. Depends on a lot of things. In both pass by value and pass by reference (or pointer) you're creating a new primitive value. In pass by value, you're making a copy. In pass by reference/pointer you're passing an address to the original. In the latter case, however, you're requiring an extra fetch from memory that may or may not be cached. It's very difficult to say 100% without measuring it.
That all being said, I doubt the difference is even noticeable. The compiler may be able to optimize out the copy in many pass-by-value cases, as indicated in this article. (thanks Space C0wb0y).
Should you always just pass arrays as pointer?
From this.
In C++ it is not possible to pass a complete block of memory by value as a parameter to a function, but we are allowed to pass its address.
To pass an array:
void foo(int bar[], unsigned int length)
{
    // do stuff with bar but don't go past length
}
I'd recommended avoiding arrays and using std::vector which has more easily understood copy semantics.
It's probably not worth passing primitive values by pointer if your concern is speed -- you then have the overhead of the "indirection" to access the value.
However, pointers often are the "width of the bus", meaning the processor can send the whole value at once, and not "shift" values to send-down-the-bus. So, it is possible pointers are transferred on the bus faster than smaller types (like char). That's why the old Cray computers used to make their char values 32 bits (the width of the bus at that time).
When dealing with large objects (such as classes or arrays), passing a pointer is faster than copying the whole object onto the stack. This applies to OOP, for example.
Look in your favorite C++ textbook for a discussion of "output parameters".
Some advantages of using a pointer for output parameters instead of a reference are:
No surprising behavior, no action at a distance, the semantics are clear at the call site as well as the caller.
Compatibility with C (which your question title suggests is important)
Usable by other languages, functions exported from a shared library or DLL should not use C++-only features such as references.
You should rarely have to pass anything by pointer. If you need to modify the value of the parameter, or want to prevent a copy, pass by reference, otherwise pass by value.
Note that preventing a copy can also be done by copy-elision, so you have to be very careful not to fall into the trap of premature optimization. This can actually make your code slower.
There's no real answer to your question, just a few rules that I tend to bear in mind:
char is 1 byte and a pointer is typically 4 or 8 bytes, so never pass a single char as a pointer.
Then, things like int and float are usually the same size as a pointer, but a pointer has to be dereferenced, so that technically takes more time.
If we look at i386 assembly:
loading into a register the value of an int parameter a:
movl 8(%ebp),%eax
the same thing but passed as a pointer:
movl 8(%ebp),%eax
movl (%eax),%eax
Having to dereference the pointer takes another memory operation, so theoretically (not sure it is in real life) passing pointers is slower...
Then there's the memory issue. If you want to code efficiently, every composite type (class, structure, array...) should be passed by pointer.
Just imagine a recursive function with a 16-byte type passed by copy: 1000 calls puts 16000 bytes on the stack (you don't really want that, do you? :) )
So to make it short and clear: look at the size of your type; if it's bigger than a pointer, pass it by pointer, else pass it by copy...
Pass primitive types by value and objects as const references. Avoid pointers as much as you can. Dereferencing pointers has some overhead and it clutters code. Compare the two versions of the factorial function below:
// which version of factorial is shorter and easier to use?
int factorial_1 (int* number)
{
    if ((*number) <= 1)
        return 1;
    int tmp = (*number) - 1;
    return (*number) * factorial_1 (&tmp);
}
// Usage:
int r = 10;
factorial_1 (&r); // => 3628800
int factorial_2 (int number)
{
    return (number <= 1) ? 1 : (number * factorial_2 (number - 1));
}
// Usage:
// No need for a temporary variable to hold the argument.
factorial_2 (10); // => 3628800
Debugging becomes hard, as you cannot say when and where the value of an object could change:
int a = 10;
// f could modify a, so you cannot guarantee g that a is still 10.
f (&a);
g (&a);
Prefer the vector class over arrays. It can grow and shrink as needed and keeps track of its size. The way vector elements are accessed is compatible with arrays:
int add_all (const std::vector<int>& vec)
{
    size_t sz = vec.size ();
    int sum = 0;
    for (size_t i = 0; i < sz; ++i)
        sum += vec[i];
    return sum;
}
NO, the only time you'd pass a non-const reference is if the function requires an output parameter.