Different ways to make a kernel - C++

In this tutorial there are 2 methods to run the kernel, and another one mentioned in the comments:
1.
cl::KernelFunctor simple_add(cl::Kernel(program,"simple_add"),queue,cl::NullRange,cl::NDRange(10),cl::NullRange);
simple_add(buffer_A,buffer_B,buffer_C);
However, I found out that KernelFunctor has been removed.
So I tried the alternative way:
2.
cl::Kernel kernel_add=cl::Kernel(program,"simple_add");
kernel_add.setArg(0,buffer_A);
kernel_add.setArg(1,buffer_B);
kernel_add.setArg(2,buffer_C);
queue.enqueueNDRangeKernel(kernel_add,cl::NullRange,cl::NDRange(10),cl::NullRange);
queue.finish();
It compiles and runs successfully.
However, there is a 3rd option in the comments:
3.
cl::make_kernel simple_add(cl::Kernel(program,"simple_add"));
cl::EnqueueArgs eargs(queue,cl::NullRange,cl::NDRange(10),cl::NullRange);
simple_add(eargs, buffer_A,buffer_B,buffer_C).wait();
This does not compile; I think make_kernel needs template arguments.
I'm new to OpenCL and didn't manage to fix the code.
My question is:
1. How should I modify the code in option 3 so that it compiles?
2. Which way is better and why, option 2 or option 3?

You can check the OpenCL C++ Bindings Specification for a detailed description of the cl::make_kernel API (in section 3.6.1), which includes an example of usage.
In your case, you could write something like this to create the kernel functor:
auto simple_add = cl::make_kernel<cl::Buffer&, cl::Buffer&, cl::Buffer&>(program, "simple_add");
Your second question is primarily opinion based, and so is difficult to answer. One could argue that the kernel functor approach is simpler, as it allows you to 'call' the kernel almost as if it were just a function and pass the arguments in a familiar manner. The alternative approach (option 2 in your question) is more explicit about setting arguments and enqueuing the kernel, but more closely represents how you would write the same code using the OpenCL C API. Which method you use is entirely down to personal preference.

Related

C++ link time resource "allocation" without defines

I'm currently working on a C++ class for an ESP32. I want to implement allocation of resources such as IO pins, available RMT channels, and so on.
My idea is to do this with some kind of resource handler which checks this at compile time, but I have no good idea nor did I find anything about something like this yet.
To clarify my problem, let's have an example of what I mean.
Microcontroller X has IO pins 1-5, each of these can be used by exactly one component.
Components don't know anything about each other and take the pin they should use as a ctor argument.
Now I want to have a class/method/... that checks at compile time whether the pin a component needs is already allocated.
CompA a(5); //works well: 5 is not in use
CompB b(3); //same as before, without the next line it should compile
CompC c(5); //Pin 5 is already in use: does not compile!
I'm not sure yet how to do this. My best guess (as I can't use defines here: users should be able to use it only by giving a parameter or template argument) is that it might work with a template function, but I have not yet found any way of checking which other parameters have been passed to a template method/class.
Edit1: Parts of the program may be either autogenerated or user defined in such a manner that they do not know about other pin usages. The allocation thus is a "security" feature which should disallow erroneous code. It should also forbid double use when the registering calls are on different code paths (even if those paths might exclude each other).
Edit2: I got a response that compile time is the wrong stage here, as components might be compiled separately from one another. So the only way to do this seems to be a linker error.
A silly C-style method: as a last resort, you could use __COUNTER__ as the constructor's argument. This macro expands to an integer that increments after each appearance, starting with 0.
I hope there's a better solution.
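To make the behavior concrete, here is a tiny sketch of what __COUNTER__ does (it is a compiler extension supported by GCC, Clang, and MSVC, not standard C++; the starting value may differ if an included header already used the macro, so only the increments are checked):

```cpp
// __COUNTER__ expands to an integer that increments on each use
// within a translation unit.
constexpr int first  = __COUNTER__;
constexpr int second = __COUNTER__;
constexpr int third  = __COUNTER__;

static_assert(second == first + 1, "__COUNTER__ increments per use");
static_assert(third  == second + 1, "__COUNTER__ increments per use");
```

Note that this only yields unique numbers within one translation unit, which is exactly why it cannot catch a double pin use spread across separately compiled files.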

Can I read a SMT2 file into a solver through the z3 c++ interface?

I've got a problem where the z3 code embedded in a larger system isn't finding a solution to a certain set of constraints (added through the C++ interface) despite some fairly long timeouts. When I dump the constraints to a file (using the to_smt2() method on the solver, just before the call to check()), and run the file through the standalone z3 executable, it solves the system in about 4 seconds (returning sat). For what it's worth, the file is 476,587 lines long, so a fairly big set of constraints.
Is there a way I can read that file back into the embedded solver using the C++ interface, replacing the existing constraints, to see if the embedded version can solve starting from the exact same starting point as the standalone solver? (Essentially, how could I create a corresponding from_smt2(stream) method on the solver class?)
They should be the same set of constraints as now, of course, but maybe there's some ordering effect going on when they are read from the file, or maybe there are some subtle differences in the solver introduced when we embedded it, or something that didn't get written out with to_smt2(). So I'd like to try reading the file back, if I can, to narrow down the possible sources of the difference. Suggestions on what to look for while debugging the long-running version would also be helpful.
Further note: it looks like another user is having similar issues here. Unlike that user, my problem uses all bit-vectors, and the only unknown result is the one from the embedded code. Is there a way to invoke the (get-info :reason-unknown) from the C++ interface, as suggested there, to find out why the embedded version is having a problem?
You can use the method solver::reason_unknown() to retrieve an explanation for the search failure.
There are methods for parsing files and strings into a single expression.
In case of a set of assertions, the expression is a conjunction.
It is perhaps a good idea to add such a method directly to the solver class for convenience. It would be:
void from_smt2_string(char const* smt2benchmark) {
    expr fml = ctx().parse_string(smt2benchmark);
    add(fml);
}
So if you were to write it outside of the solver class, you would need:
expr fml = solver.ctx().parse_string(smt2benchmark);
solver.add(fml);

Changing an OpenCV function Standard Parameters

Is there a way to permanently change the standard parameters of an OpenCV function?
For example, how can I modify the MSER Feature Detector so that I can call
MserFeatureDetector detector
instead of
MserFeatureDetector detector(10,50,1000)
I am not precisely well versed in the inner mechanisms of C++ libraries, but I imagine the actual program code has to be somewhere, right?
A bit of information on my actual problem:
I'm currently using MEXOpenCV to run OpenCV functions in MATLAB, and some MEX functions lack (as far as I know) the option to pass input parameters, running with the defaults instead:
detector = cv.FeatureDetector('MSER'); % 'MSER' is the only parameter taken
I reckon changing the standard parameters directly in the OpenCV source would be one way to do it.
Any other ideas on how to solve the actual problem are welcome too!
I solved the actual problem by setting the parameters with the 'set' method of FeatureDetector, like this:
detector = cv.FeatureDetector('MSER');
detector.set('delta', 10);
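On the C++ side, a common alternative to patching the library's defaults is to wrap construction in a small factory that bakes in the parameters you want. A hedged sketch (the struct and field names below are illustrative, not the real OpenCV API):

```cpp
// Illustrative stand-in for a detector's parameter set; with the real
// library you would construct the actual detector object instead.
struct MserConfig {
    int delta;
    int minArea;
    int maxArea;
};

// One place to change the "standard parameters" for the whole project,
// without touching library source.
MserConfig defaultMserConfig() {
    return MserConfig{10, 50, 1000};
}
```

Callers then use defaultMserConfig() everywhere, so changing the defaults later is a one-line edit.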

Insert text into C++ code between functions

I have the following requirements:
Adding text at the entry and exit point of any function.
Not altering the source code, besides the insertions described above (so no preprocessor tricks or anything similar)
For example:
void fn(param-list)
{
ENTRY_TEXT (param-list)
//some code
EXIT_TEXT
}
But not only in such a simple case; it should also work when preprocessor directives are involved!
Example:
void fn(param-list)
#ifdef __WIN__
{
ENTRY_TEXT (param-list)
//some windows code
EXIT_TEXT
}
#else
{
ENTRY_TEXT (param-list)
//some any-os code
if (condition)
{
return; //should become EXIT_TEXT
}
EXIT_TEXT
}
So my question is: is there a proper way of doing this?
I have already tried some of the parsers used by compilers, but since they all rely on running a preprocessor before parsing, they are useless to me.
Also, some of the token-generating parsers which do not need a preprocessor are of little use, because they generate an in-memory mapping of tokens, which then leads to completely new source code instead of just inserting the text.
One thing I am working on is trying it with FLEX (or JFlex); if this is a valid option, I would appreciate some input on it. ;-)
EDIT:
To clarify a little bit: The purpose is to allow something like a stack trace.
I want to trace every function call, and in order to follow the call hierarchy, I need to place a macro at the entry point and the exit point of each function.
This builds a function-call trace. :-)
EDIT2: Compiler-specific options are not quite suitable, since we use many different compilers, some of which are probably not well supported by any tools out there.
Unfortunately, your idea is not only impractical (C++ is complex to parse), it's also doomed to fail.
The main issue you have is that exceptions will bypass your EXIT_TEXT macro entirely.
You have several solutions.
As has been noted, the first solution would be to use a platform-dependent way of computing the stack trace. It can be somewhat imprecise, especially because of inlining: i.e., when small functions are inlined into their callers, they do not appear in the stack trace, as no function call was generated at the assembly level. On the other hand, it's widely available, does not require any surgery of the code, and does not affect performance.
A second solution would be to only introduce something on entry and use RAII to do the exit work. Much better than your scheme as it automatically deals with multiple returns and exceptions, it suffers from the same issue: how to perform the insertion automatically. For this you will probably want to operate at the AST level, and modify the AST to introduce your little gem. You could do it with Clang (look up the c++11 migration tool for examples of rewrites at large) or with gcc (using plugins).
Finally, you also have manual annotations. While it may seem underpowered (and a lot of work), there is value in not leaving logging to a tool. I see 3 advantages to doing it manually: you can avoid introducing the overhead in performance-sensitive parts, you can retain only a "summary" of big arguments, and you can customize the summary based on what's interesting for the current function.
I would suggest using LLVM libraries & Clang to get started.
You could also leverage the C++ language to simplify your process: insert a small object into the code that is constructed on function-scope entry, and rely on the fact that it will be destroyed on exit. That should massively simplify recording the 'exit' of the function.
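A minimal sketch of that RAII idea (the tracer class and output format are made up for illustration): the constructor marks function entry and the destructor marks exit, so early returns and exceptions both produce the exit record without a macro at every return statement.

```cpp
#include <cstdio>

// RAII scope tracer: construction logs entry, destruction logs exit.
struct ScopeTracer {
    const char* name;
    explicit ScopeTracer(const char* n) : name(n) {
        std::printf("enter %s\n", name);   // plays the role of ENTRY_TEXT
    }
    ~ScopeTracer() {
        std::printf("exit %s\n", name);    // plays the role of EXIT_TEXT
    }
};

int fn(bool early) {
    ScopeTracer trace("fn");
    if (early) return 1;   // exit is still logged by the destructor
    return 0;
}
```

Only the entry-side insertion still has to be automated; every exit path, including the early return above, is covered for free.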
This does not really answer your question; however, for your initial need, you may use the backtrace() function from execinfo.h (if you are using GCC).
How to generate a stacktrace when my gcc C++ app crashes
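A minimal sketch of that backtrace() usage (glibc-specific, so Linux only; error handling omitted):

```cpp
#include <execinfo.h>
#include <cstdlib>

// Captures the current call stack and resolves it to printable strings.
int captureFrames() {
    void* frames[32];
    int n = backtrace(frames, 32);                  // raw return addresses
    char** symbols = backtrace_symbols(frames, n);  // human-readable form
    // ... write symbols[0] through symbols[n-1] to your log here ...
    std::free(symbols);                             // caller must free
    return n;
}
```

Note that backtrace_symbols only resolves names for symbols exported in the dynamic symbol table, so linking with -rdynamic is usually needed to get useful output.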

In C++ is there any portable way of jumping to a computed offset?

I'm looking for a portable way of jumping to a computed offset in C++.
I know that GCC has a mechanism for doing this using goto as discussed here:
http://social.msdn.microsoft.com/forums/en-US/vclanguage/thread/ec7e52b5-0978-4123-9d29-9dc7d807c6b4
Sadly I don't think other compilers implement this.
Normally I wouldn't have a reason to use goto in C++, but I found that it could be useful for optimizing an interpreted language (search for 'threaded interpreter' if you are interested in this).
I know that I can implement this using inline assembly language, but the problem then is I have to implement this for every platform the interpreter runs on.
So does anyone know if there is a portable way of doing this?
The solution might involve goto but I'm open to any other sort of hackery that you can think of ;)
UPDATE: Currently the interpreter uses a switch statement. I'm looking for techniques that improve on this and make the interpreter run faster. Specifically I'm trying to figure out a portable way of saying 'goto <next-byte-code-instruction>' where <next-byte-code-instruction> is a computed offset that can be stored in the byte code itself.
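For reference, the GCC mechanism in question is the "labels as values" extension, which GCC and Clang support but which is not portable. A hedged sketch (opcodes invented for illustration: opcode 1 increments an accumulator, opcode 0 halts):

```cpp
// GCC/Clang extension: &&label takes a label's address, and
// "goto *expr" jumps to a computed address. Not standard C++.
int run_threaded(const unsigned char* code) {
    static void* const dispatch[] = { &&op_halt, &&op_inc };
    int acc = 0;
    const unsigned char* ip = code;
    goto *dispatch[*ip];   // 'goto <next-byte-code-instruction>'
op_inc:
    ++acc;
    ++ip;
    goto *dispatch[*ip];   // each handler dispatches the next opcode itself
op_halt:
    return acc;
}
```

The appeal over a switch is that every handler ends in its own indirect jump, which tends to help branch prediction; the drawback is exactly the portability problem the question describes.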
UPDATE: I found a related question here.
What opcode dispatch strategies are used in efficient interpreters?
switch allows jumping to predefined offsets
Array of function pointers/function objects
setjmp/longjmp
I think setjmp/longjmp is as close as you can get. Beyond that, the spec calls things like offsets in the instruction stream "implementation details", and you're stuck with platform-specific stuff like intrinsics and inline asm.
The other (really ugly) thing you could try is using a switch statement, which is typically implemented as a jump table of offsets. That is,
int ip = 0;
top:
switch( ip )
{
    case 0:
        ip += do_whatever(); // returns an offset
        goto top;
    case 1:
        ip += some_other_function();
        goto top;
    case 2:
        ip += etc();
        goto top;
    // ad infinitum...
}
This is in the spirit of Bell's original article, and the gist is that the body of each case is a single VM "opcode" in the stream. But that seems really icky.
Aside from gotos, longjmp, and other non-portable tricks, could you consider virtual functions or at least pointers to functions? You can have several singleton objects of types derived from a single base class. An array of pointers holds all those objects. A virtual function returns the index of the next interpreter state. The base object holds all data ever needed by any derived object.
EDIT: Pointers to functions could be even a little faster, but a little messier too. There is a Guru of the Week article that explains it.
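A portable, hedged sketch of the function-pointer table idea (the opcodes and handlers are invented for illustration): each handler returns how far to advance the instruction pointer, and a step of 0 means halt.

```cpp
#include <cstddef>

// Each handler mutates the interpreter state (here just an accumulator)
// and returns the instruction-pointer advance; 0 means halt.
using Handler = int (*)(int&);

int op_halt(int&)       { return 0; }
int op_inc(int& acc)    { ++acc;    return 1; }
int op_double(int& acc) { acc *= 2; return 1; }

int run(const unsigned char* code) {
    static const Handler table[] = { op_halt, op_inc, op_double };
    int acc = 0;
    std::size_t ip = 0;
    for (;;) {
        int step = table[code[ip]](acc);   // dispatch on the current opcode
        if (step == 0) return acc;
        ip += static_cast<std::size_t>(step);
    }
}
```

This stays within standard C++, at the cost of one indirect call per opcode instead of a direct computed jump.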
It looks like it is not possible to implement this in a portable way.
(Although I still welcome alternative answers!)
I found this blog post from someone who has already tried what I wanted to do. He uses the GCC approach and got a 33% speed improvement (stats are at the end of the post).
The solution is conditionally compiled under Win32 to use inline assembly to compute the address of the labels. But he reports that using inline assembly in this way is 3 times slower than normal! Ouch
http://abepralle.wordpress.com/2009/01/25/how-not-to-make-a-virtual-machine-label-based-threading/
Oh well, I didn't want to use inline assembly anyway.
I seriously wouldn't use any form of goto; it died a long time ago. If you must, however, perhaps you should just write a macro for this and use a bunch of #ifdefs to make it portable. This shouldn't be too hard, and it is the fastest solution.