Increment pointer when Unpacking Variadic Arguments with Lambda - c++

I have the following code with a function VariadicWrite. This function accepts a pointer that it can modify by incrementing it, so that it points to a different memory location depending on how much data is written to it:
#include <iostream>
#include <cstring>
template<typename T>
T Read(void* &ptr) noexcept
{
T result;
memcpy(&result, ptr, sizeof(result));
ptr = static_cast<T*>(ptr) + 1;
return result;
}
template<typename T>
void Write(void* &ptr, T result) noexcept
{
memcpy(ptr, &result, sizeof(result));
ptr = static_cast<T*>(ptr) + 1;
}
template<typename... Args>
auto VariadicWrite(void*& ptr, Args&&... args)
{
(Write(ptr, args), ...); //unpack Args
return 0;
}
int main()
{
void* ptr = malloc(1024);
memset(ptr, 0, 1024);
void* temp = ptr;
VariadicWrite(ptr, 1, 2, 3);
std::cout<<Read<int>(ptr)<<"\n";
std::cout<<Read<int>(ptr)<<"\n";
std::cout<<Read<int>(ptr)<<"\n";
free(temp);
return 0;
}
The problem here is that the code prints out 0, 0, 0, 0 if I use: void*& ptr.
If I do void* ptr it prints out 0, 1, 2, 3 but the ptr pointer is never incremented.
How can I modify the ptr pointer of VariadicWrite? I thought that void*& would have worked but in this case it doesn't :S

The problem is that your VariadicWrite() call modifies ptr so that it points to the end of the data you wrote. You then call Read() without resetting the pointer back to the start, so you read zeros from the zero-filled (memset) portion of the buffer that follows the data already written.
Insert ptr = temp; between the write and read and see if that fixes it.
The reason void* ptr does not work is that each call to Write(ptr, ...) will increment the local copy of the argument in the scope of the Write() function. The variable ptr in VariadicWrite() does not change after a call to Write() so the next call will use the same value.
If you change to VariadicWrite(void* ptr, …) and Write(void*& ptr, …), you might get the behavior you want. But I would suggest this is a bad idea.
As we can see from the bug in your example, knowing if the function will modify the pass-by-reference parameter or not is of critical importance, yet not readily apparent from the code using the function. This tends to invite bugs just like the one you have created here. An inconsistent interface, where VariadicWrite() does not modify its argument but Write() does, will only make it doubly hard to avoid this kind of bug.
Generally, it's better to avoid non-const references because they often result in bugs like this. I suggest returning the new pointer instead of modifying the argument.
template<typename T>
void* Write(void* ptr, const T& arg)
{
memcpy(ptr, &arg, sizeof(arg)); // copy the value before advancing
return static_cast<T*>(ptr) + 1;
}
template<typename... Args>
void* WriteV(void* ptr, Args&&... args)
{
((ptr = Write(ptr, args)), ...);
return ptr;
}

Related

C/C++ Inline coercion of array of pointers (void**)

I am using a library function that expects an array of pointers (void**) at some point, it works more or less like this.
void* args[] = { &var_a, &var_b, &var_c, ... };
someFunction(args);
After using it once, I would like to call the same function again, so what I do is create another variable like:
void* args_2[] = { &var_d, &var_e, &var_f, ... };
someFunction(args_2);
And so on ...
I would like to find a way to recycle the args symbol, so I don't have to do args_2, args_3, args_4 every time I call it; but when I try to reassign it like:
args = { &var_d, &var_e, &var_f, ... };
I get the following:
error: assigning to an array from an initializer list
I understand the error but I don't know how to avoid it or coerce this thing into the intended array of pointers type.
I know they are two different languages, but I am looking for a solution that works in both C and C++.
Why use a local variable at all, if you only need it once to call someFunction?
using args = void *[];
someFunction(args{&a, &b});
someFunction(args{&a, &b, &c});
Alternatively, C++ify this a bit more with a wrapper:
template <class... T>
decltype(auto) someFunction(T *... args) {
void *args_[] { args... };
return someFunction(args_);
}
someFunction(&a, &b);
someFunction(&a, &b, &c);
In C, you could make use of compound literals with a pointer to pointer, as below. (Compound literals are a C99 feature; C++ compilers accept them only as an extension.)
void** args = (void *[]){ &var_a, &var_b, &var_c, ... };
args = (void *[]){ &var_d, &var_e, &var_f, ... };

Smart replacement for 'new[]' to make_unique

I am in process of replacing statements like:
auto pszOutBuffer = new char[dwSize];
ReadDataFromHttp(pszOutBuffer, dwSize);
if(dwSize>100)
ParseHttpData(pszOutBuffer);
...
delete []pszOutBuffer;
TO:
auto OutBufferPtr = make_unique<char[]>(dwSize);
auto pszOutBuffer = OutBufferPtr.get();
ReadDataFromHttp(pszOutBuffer, dwSize);
if(dwSize>100)
ParseHttpData(pszOutBuffer);
...
This way I get the advantage of the smart pointer unique_ptr. I would like to keep the variable pszOutBuffer as is, so that fewer changes appear in Git commits and unique_ptr.get() doesn't have to be repeated.
To make this simple and less verbose to read, I thought of writing a macro MAKE_UNIQUE_PTR(type,size) which would be a single statement (not two as shown above). But such a macro will not be able to provide the unique_ptr as well as pszOutBuffer, as in:
auto pszOutBuffer = MAKE_UNIQUE_PTR(char, dwSize);
I may think of writing a function template, but then... how to keep unique_ptr after function (MAKE_UNIQUE_PTR) returns?
EDIT:
With this hypothetical macro/function, the code would simply be:
auto pszOutBuffer = MAKE_UNIQUE_PTR(char, dwSize);
ReadDataFromHttp(pszOutBuffer, dwSize);
if(dwSize>100)
ParseHttpData(pszOutBuffer);
...
With these advantages:
unique_ptr still controls the life time of buffer.
Raw-style pointer is still in place without changing them to unique_ptr::get() calls.
Hence can safely delete delete[] calls from multiple code paths.
NOTE that the very first code is the code I've in hand. No macro, no unique_ptr- just legacy code having new and delete. And yeah... those Hungarian notation variables.
Unfortunately, std::make_unique<T[N]>() is not supported for arrays of known bound, for reasons you can see in the original proposal (the unknown-bound form make_unique<T[]>(n) used in the question is fine). Still, nothing stops you from crafting your own make_unique for arrays (e.g., make_unique_array) as below:
template<typename T>
std::enable_if_t<std::is_array<T>::value, std::unique_ptr<T>>
make_unique_array(std::size_t const n) {
using RT = std::remove_extent_t<T>;
return std::unique_ptr<T>(new RT[n]);
}
As you are already considering a macro, you can write one which creates both. It's ugly but does its job.
#define SmartMacro(pszOutBuffer, dwSize) \
auto pszOutBuffer##ptr = make_unique<char[]>(dwSize); \
auto pszOutBuffer = pszOutBuffer##ptr.get(); \
memset(pszOutBuffer, 0, dwSize);
// Usage
SmartMacro(buffer, 10);
// 'buffer' is the raw pointer
// 'bufferptr' is the unique pointer
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <utility>
template<class T, class D>
struct smart_unique:std::unique_ptr<T,D> {
using std::unique_ptr<T,D>::unique_ptr;
// unique_ptr<T[],D>::pointer is T*, so the conversion also works for the array form
using pointer = typename std::unique_ptr<T,D>::pointer;
operator pointer()const{return this->get();}
};
template<class T, class D>
struct make_smart_unique_t {
template<class...Ts>
smart_unique<T, D> operator()(Ts&&...ts)const{
return smart_unique<T,D>( new T(std::forward<Ts>(ts)...) );
}
template<class T0, class...Ts>
smart_unique<T, D> operator()(std::initializer_list<T0> il, Ts&&...ts)const{
return smart_unique<T,D>( new T(il, std::forward<Ts>(ts)...) );
}
};
template<class T, class D>
struct make_smart_unique_t<T[], D> {
smart_unique<T[], D> operator[](std::size_t N)const{
return smart_unique<T[],D>( new T[N] );
}
template<class...Ts>
smart_unique<T[], D> operator()(Ts&&...ts)const{
return smart_unique<T[],D>( new T[sizeof...(Ts)]{std::forward<Ts>(ts)...} );
}
};
template<class T, class D=std::default_delete<T>>
constexpr make_smart_unique_t<T,D> make_smart_unique{};
This should support:
auto pszOutBuffer = make_smart_unique<char[]>[dwSize];
ReadDataFromHttp(pszOutBuffer, dwSize);
if(dwSize>100)
ParseHttpData(pszOutBuffer);
as well as:
auto pszDataBuffer = make_smart_unique<int[]>(1,2,3,4,5);
and
auto pszDataBuffer = make_smart_unique<int>();
No macro magic is needed.
The design here is simple: make_smart_unique<scalar> is similar to make_unique, but it returns a smart_unique instead (which implicitly converts to T*: be very careful!).
make_smart_unique<Array[]> has two different ways to invoke it. With [N] it creates an array of the passed-in size; with (args...) it creates an array whose size is the number of arguments, and constructs each element from the corresponding arg.

Wrapper function for cudaMalloc and cudaMemcpy

I was sick of looking at all the boilerplate cuda code for copying data to the device so I wrote this wrapper function:
void allocateAndCopyToDevice(void* device_array, const void* host_array, const size_t &count)
{
gpuErrchk(cudaMalloc((void**)&device_array, count));
gpuErrchk(cudaMemcpy(device_array, host_array, count, cudaMemcpyHostToDevice));
}
but for some reason this resulted in an out of bounds memory access whenever using an array initialized in this way. The initialization code that I used looked like this:
cuDoubleComplex *d_cmplx;
allocateAndCopyToDevice(d_cmplx,cmplx,size*sizeof(cuDoubleComplex));
Could anyone explain why this doesn't work?
After seeing immibis's comment I realized that cudaMalloc expects a pointer to a pointer, so instead I'm passing by value the pointer to the pointer:
void allocateAndCopyToDevice(void** device_array, const void* host_array, const size_t &count)
{
gpuErrchk(cudaMalloc(device_array, count));
gpuErrchk(cudaMemcpy(*device_array, host_array, count, cudaMemcpyHostToDevice));
}
and the initialization now looks like this:
cuDoubleComplex *d_cmplx;
allocateAndCopyToDevice((void **)&d_cmplx,cmplx,size*sizeof(cuDoubleComplex));
It works, but I'm still wondering if there is a better way of doing this? How do other people handle memory transfers in cuda code?
I would do something like
template <typename T>
T* allocateAndCopyToDevice(const T* host_array, std::size_t count)
{
// some static_assert for allowed types: pod and built-in.
T* device_array = nullptr;
gpuErrchk(cudaMalloc(&device_array, count * sizeof(T)));
gpuErrchk(cudaMemcpy(device_array, host_array, count * sizeof(T), cudaMemcpyHostToDevice));
return device_array;
}
and use it:
cuDoubleComplex *d_cmplx = allocateAndCopyToDevice(cmplx, size);

Understanding the overhead of lambda functions in C++11

This was already touched on in Why C++ lambda is slower than ordinary function when called multiple times? and C++0x Lambda overhead
But I think my example is a bit different from the discussion in the former and contradicts the result in the latter.
While searching for a bottleneck in my code I found a recursive template function that processes a variadic argument list with a given processor function, such as one that copies the value into a buffer.
template <typename T>
void ProcessArguments(std::function<void(const T &)> process)
{}
template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail)
{
process(head);
ProcessArguments(process, tail...);
}
I compared the runtime of a program that uses this code together with a lambda function as well as a global function that copies the arguments into a global buffer using a moving pointer:
int buffer[10];
int main(int argc, char **argv)
{
int *p = buffer;
for (unsigned long int i = 0; i < 10E6; ++i)
{
p = buffer;
ProcessArguments<int>([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}
}
compiled with g++ 4.6 and -O3 measuring with the tool time takes more than 6 seconds on my machine while
int buffer[10];
int *p = buffer;
void CopyIntoBuffer(const int &value)
{
*p++ = value;
}
int main(int argc, char **argv)
{
// use the global p here: a local 'int *p' would shadow it, and CopyIntoBuffer would then never see the reset below
for (unsigned long int i = 0; i < 10E6; ++i)
{
p = buffer;
ProcessArguments<int>(CopyIntoBuffer, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}
return 0;
}
takes about 1.4 seconds.
I do not get what is going on behind the scenes that explains the time overhead and am wondering if I can change something to make use of lambda functions without paying with runtime.
The problem here is your usage of std::function.
You pass it by value, so its contents are copied (and copied again at every level of the recursion as the parameters are unwound).
For a pointer to function, the contents are just the function pointer.
For a lambda, the contents are at least a pointer to the function plus the reference you captured, which is twice as much to copy. In addition, because of std::function's type erasure, copying any data will most likely be slower (not inlined).
There are several options here, and the best is probably to pass the callable as a template parameter instead of std::function. The benefits are that the call is more likely to be inlined, no type erasure by std::function takes place, and no copying happens:
template <typename TFunc>
void ProcessArguments(const TFunc& process)
{}
template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(const TFunc& process, const HEAD &head, const TAIL &... tail)
{
process(head);
ProcessArguments(process, tail...);
}
The second option is to do the same, but pass process by value. Copying does happen now, but it is still neatly inlined.
Equally important, the body of process can also be inlined, especially for a lambda. Depending on the size of the lambda object and the cost of copying it, passing by value may or may not be faster than passing by reference. It may be faster because the compiler can have a harder time reasoning about a reference than about a local copy.
template <typename TFunc>
void ProcessArguments(TFunc process)
{}
template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(TFunc process, const HEAD &head, const TAIL &... tail)
{
process(head);
ProcessArguments(process, tail...);
}
Third option is, well, try passing std::function<> by reference. This way you at least avoid copying, but calls will not be inlined.
Here are some perf results (using ideone's C++11 compiler).
Note that, as expected, inlined lambda body is giving you best performance:
Original function: 0.483035s
Original lambda: 1.94531s
Function via template copy: 0.094748s
Lambda via template copy: 0.0264867s
Function via template reference: 0.0892594s
Lambda via template reference: 0.0264201s
Function via std::function reference: 0.0891776s
Lambda via std::function reference: 0.09s

What would be a proper invalid value for a pointer?

Suppose I have this code. Your basic "if the caller doesn't provide a value, calculate value" scenario.
void fun(const char* ptr = NULL)
{
if (ptr==NULL) {
// calculate what ptr value should be
}
// now handle ptr normally
}
and call this with either
fun(); // don't know the value yet, let fun work it out
or
fun(something); // use this value
However, as it turns out, ptr can have all kinds of values, including NULL, so I can't use NULL as a signal that the caller doesn't provide ptr.
So I'm not sure what default value to give ptr now instead of NULL. What magic value can I use? Does anybody have ideas?
void fun()
{
// calculate what ptr value should be
const char* ptr = /*...*/;
// now handle ptr normally
fun(ptr);
}
Depending on your platform, a pointer is likely either a 32 or 64-bit value.
In those cases, consider using:
0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF
But I think the bigger question is, "How can NULL be passed as a valid parameter?"
I'd recommend instead having another parameter:
void fun(bool isValidPtr, const char* ptr = NULL)
or maybe:
void fun( /*enum*/ ptrState, const char* ptr = NULL)
I agree with all the other answers provided, but here's one more way of handling that, which to me personally looks more explicit, if more verbose:
void fun()
{
// Handle no pointer passed
}
void fun(const char* ptr)
{
// Handle non-nullptr and nullptr separately
}
You should use nullptr for that. It's new in the C++11 standard. Have a look here for some explanation.
Using overloaded versions of the same function for different input is best, but if you want to use a single function, you could make the parameter be a pointer-to-pointer instead:
void fun(const char** ptr = NULL)
{
if (ptr==NULL) {
// calculate what ptr value should be
}
// now handle ptr normally
}
Then you can call it like this:
fun();
or
const char *ptr = ...; // can be NULL; must be const char* so that &ptr converts to const char**
fun(&ptr);
If you want a special value that corresponds to no useful argument, make one.
header file:
extern const char special_value;
void fun(const char* ptr=&special_value);
implementation:
const char special_value = 0; // a const object needs an initializer in C++; only its address matters
void fun(const char* ptr)
{
if (ptr == &special_value) ....
}
1?
I can't imagine anyone allocating you memory with that address.