Can one overload a RenderScript kernel? - overloading

I would like to overload a RenderScript kernel:
/* */
uchar4 __attribute__((kernel, overloadable)) root (uchar4 in) {
    return in;
}
float4 __attribute__((kernel, overloadable)) root (float4 in) {
    return in;
}
However, this generates identically-named Java methods:
public void forEach_root(Allocation ain, Allocation aout, Script.LaunchOptions sc) {
    // check ain
    if (!ain.getType().getElement().isCompatible(__U8_4)) {
        throw new RSRuntimeException("Type mismatch with U8_4!");
    }
    // ...
}
public void forEach_root(Allocation ain, Allocation aout, Script.LaunchOptions sc) {
    // check ain
    if (!ain.getType().getElement().isCompatible(__F32_4)) {
        throw new RSRuntimeException("Type mismatch with F32_4!");
    }
    // ...
}
Is there a way to write the kernels so that overloading works? The usage I expected was:
mInAllocation = Allocation.createFromBitmap(mRS, ...
mOutAllocation = Allocation.createTyped(mRS, mInAllocation.getType());
mScript = new ScriptC_donothing(mRS);
mScript.forEach_root(mInAllocation, mOutAllocation);
// calls uchar4 kernel

There's no way to overload kernel names right now. We're investigating some ways to associate more type information with an allocation in the future, though; we'll keep this use case in mind.


storing, parsing through and executing stored member functions

I'm currently working on some code and need to build something like an event handler: I register explicit events, store them in a vector, and loop through them in my main listen() function. I'm missing something about pointers that I can't pinpoint with the docs or a Google search, and I need help figuring out why my compiler is asking for a pointer to a member.
I've tried creating a typedef for a pointer to a member function, but it currently fails to compile with the error "non-standard syntax; use '&' to create a pointer to member".
class Obj {
    typedef int (Obj::*Event) (std::vector<std::string> in);
    std::vector<Event> events;
    int exampleEvent(std::vector<std::string> input);
    int regEvent(Event ev);
    int listen();
};
example event code
int Obj::exampleEvent(std::vector<std::string> input)
{
    // here's my app logic
    return 0;
}
register events in constructor
regEvent(exampleEvent); // exampleEvent: non-standard syntax; use
                        //'&' to create a pointer to member
listen, and add event to vector.
int Obj::regEvent(Event ev)
{
    events.push_back(ev);
    return 0;
}
// listen for command input
int Obj::listen()
{
    // get input
    string str;
    getline(cin, str);
    vector<string> input = split(str, " ");
    // loop through events
    for (auto ev : events)
        ev(input); // <-- Term does not evaluate to function taking 1 arg.
    return 0;
}
regEvent(exampleEvent); // exampleEvent: non-standard syntax; use
                        //'&' to create a pointer to member
must be
regEvent(&Obj::exampleEvent);
and
for (auto ev : events)
    ev(input); // <-- Term does not evaluate to function taking 1 arg.
must be
for (auto ev : events)
    (this->*ev)(input);
Expected syntax is:
(object.*memberPointer)(args) or (objectPointer->*memberPointer)(args)
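For reference, here is a minimal, self-contained sketch of the whole pattern: registering pointers to member functions in a vector and invoking them on the current object. The member names countArgs, firstLength and dispatch are hypothetical and exist only for illustration; the arguments are also passed by const reference here, which differs from the question's by-value signature.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Obj class in the question.
class Obj {
public:
    using Args = std::vector<std::string>;
    // Pointer to a member function of Obj that takes the arguments by const reference.
    using Event = int (Obj::*)(const Args&);

    Obj() {
        // Both '&' and the class qualifier are required to form the pointer.
        regEvent(&Obj::countArgs);
        regEvent(&Obj::firstLength);
    }

    void regEvent(Event ev) { events.push_back(ev); }

    // Calls every registered event on this object and collects the results.
    std::vector<int> dispatch(const Args& input) {
        std::vector<int> results;
        for (Event ev : events) {
            // A member pointer needs an object to act on: (this->*ev)(...).
            results.push_back((this->*ev)(input));
        }
        return results;
    }

private:
    int countArgs(const Args& in) { return static_cast<int>(in.size()); }
    int firstLength(const Args& in) {
        return in.empty() ? 0 : static_cast<int>(in.front().size());
    }

    std::vector<Event> events;
};
```

If the handlers do not all have to be members of the same class, std::function may be a more flexible element type for the vector, at the cost of some overhead.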

Deleting an element in a vector by value

How can I retrieve an object from the Flight to be compared to the input (flightNumber) in the main? How do I declare the attributes type in the main?
When I compile, an error message is displayed: invalid conversion of 'int' to '*Flight*' at agent1.delete(flightNumber);.
class Flight
int FlightNumber
class TravelAgent
vector <Flight *> flightList;
void Agent::delete(Flight *obj)
vector<Flight*>::iterator ptr;
if ((ptr) == flightList.end())
cout<<"Flight not found"<<endl;
int main
Agent agent1;
int flightNumber;
cout<<"Enter the number of the flight: "<<flush;
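You can add a getter to the Flight class (if it does not already have one):
class Flight {
    int FlightNumber;
public:
    int getFlightNumber() const { return FlightNumber; }
};
and then proceed as follows:
// note: delete is a reserved word in C++, so the real function needs another name
void Agent::delete(int flightNumber)
{
    vector<Flight*>::iterator ptr;
    for (ptr = flightList.begin(); ptr != flightList.end(); ++ptr)
        if ((*ptr)->getFlightNumber() == flightNumber)
            break;
    if (ptr == flightList.end())
        cout<<"Flight not found"<<endl;
    else
        flightList.erase(ptr);
}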
Since the code here isn't fully functional, it's hard to give you good advice.
First, your error happens because you call (what seems to be) the member function void Agent::delete(Flight *obj) with a variable of type int instead of type Flight*. The compiler cannot convert an int to a Flight*, so it reports an error.
Secondly, you want to know how to retrieve attributes from an object. I advise you to look at accessors and mutators.
If you want to retrieve information held in your Flight object, you should expose member functions that allow it.
// in your header file
class Flight
{
    int flight_number;
public:
    // retrieve flight number value
    int get_flight_number(void) const;
    // allow to set the flight number value
    void set_flight_number(int new_flight_number);
    // some other member functions
};
// in your source file
int Flight::get_flight_number(void) const
{
    return this->flight_number;
}
void Flight::set_flight_number(int new_flight_number)
{
    // let's do some verification (do whatever you want)
    if (new_flight_number > 0)
        this->flight_number = new_flight_number;
}
This way you will be able to set and access your flight_number by writing, for example:
void test_function(Flight *f)
{
    if (f->get_flight_number() == 42) {
        // do some stuff
    }
}
int main()
{
    Flight *my_f = new Flight();
    my_f->set_flight_number(4242);
    delete my_f;
}
Now, you have enough information to get going.
You rely heavily on raw pointers. Modern C++ strongly tends to avoid them! Try to use references or move operations instead. You can consult these pages for more information:
cpp-reference - references
cpp-reference - move semantics
It's a bit hardcore for a beginner, though. The web is full of great articles about it.
Your original error is in your main function. You need to change it so that, instead of passing the flight number to your delete method, you create an instance of your Flight class and pass its address.
int main() { // you are also missing parentheses
    Agent agent1;
    int flightNumber;
    cout<<"Enter the number of the flight: "<<flush; // std::flush flushes the output without adding a newline
    cin>>flightNumber;
    Flight flight(flightNumber);
    agent1.delete(&flight); // delete takes a Flight* not an int
}
This requires that your Flight class have an appropriate constructor.
class Flight
Flight(int flightNumber)
: flightNumber_(flightNumber)
int flightNumber_;
Then in your delete method you search your vector for the Flight instance that has the same flightNumber_ as the Flight you want to remove from your vector. This will require your Flight class to have some way of returning its flightNumber_ member variable.
This is definitely NOT the best way to do this and is far from being in accordance with modern C++ standards but it should get you going.
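As a point of comparison, here is a minimal sketch of removing an element by flight number with std::find_if and vector::erase. It assumes the flights are stored by value rather than as raw pointers, and the names add, remove and size are hypothetical. The member function is called remove because delete is a reserved word in C++ and cannot be used as a function name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the Flight class in the question.
struct Flight {
    int number;
};

class TravelAgent {
public:
    void add(int number) { flights.push_back(Flight{number}); }

    // Removes the first flight with the given number.
    // Returns false if no such flight is stored.
    bool remove(int number) {
        auto it = std::find_if(flights.begin(), flights.end(),
                               [number](const Flight& f) { return f.number == number; });
        if (it == flights.end()) {
            return false;
        }
        flights.erase(it);
        return true;
    }

    std::size_t size() const { return flights.size(); }

private:
    // Stored by value, so erase() also destroys the Flight object.
    std::vector<Flight> flights;
};
```

With a vector of raw pointers, as in the question, erasing the element would only remove the pointer; the Flight it points to would still have to be deleted separately.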

Using delete in a std::deque image buffer (OFX Plug-in)

I'm trying to program a video buffer in a std::deque in my OFX video plug-in. I would like to access previously processed images in order to process the current image. My idea is to push processed images to the front of the deque and pop them from the back if the buffer exceeds the maximum size.
The plug-in crashes when I try to free the memory of an image using delete before removing it from the buffer. I found out that I can add one or several images to the buffer and delete and remove them immediately afterwards with no problem. However, if I try to delete an image which has been added in an earlier cycle, it crashes.
The plug-in consists of the main class OFXPlugin and the processor class myplugin. The instance of OFXPlugin stays over time, but for every image to be processed it creates an instance of myplugin and destroys it after processing that frame.
I'm not sure if I'm doing something wrong in the way I use the deque, if I'm not allowed to free memory which has been allocated by another instance of myplugin or if I'm doing something illegal related to the OFX API.
The code below shows the parts of the plug-in related to the problem. It's based on the OFX Support examples. It crashes at delete videoBuffer_.back().img; in the function OFXPlugin::addToVBuff(OFX::Image *img, double t). I cannot catch an exception; apparently it is handled (ignored) in the OFX API.
Thanks a lot for your help!
#include "ofxsImageEffect.h"
#include "ofxsMultiThread.h"
#include "../Support/Plugins/include/ofxsProcessing.H"
#include <deque>
// Video Buffer Element
typedef struct vBuffEl
OFX::Image* img;
double time;
} vBuffEl;
bool operator==(const vBuffEl &a, const double b)
return a.time == b;
class myplugin : public OFX::ImageProcessor {
protected :
OFX::Image *_srcImg;
double _time;
OFXPlugin *_opInstance;
public :
// ctor
myplugin(OFX::ImageEffect &instance)
: OFX::ImageProcessor(instance)
, _srcImg(0)
, _time(0)
void multiThreadProcessImages(OfxRectI procWindow);
void setOFXPlugin(OFXPlugin* opInstance) {_opInstance = opInstance;}
OFXPlugin* getOFXPlugin() {return _opInstance;}
void setTime(double argsTime) {_time = argsTime;}
double getTime() {return _time;}
void setSrcImg(OFX::Image *v) {_srcImg = v;}
OFX::Image* getSrcImg() {return _srcImg;}
class OFXPlugin : public OFX::ImageEffect {
protected :
OFX::Clip *dstClip_;
OFX::Clip *srcClip_;
double time_;
std::deque<vBuffEl> videoBuffer_;
public :
/** @brief ctor */
OFXPlugin(OfxImageEffectHandle handle);
/** @brief dtor */
/* Override the render */
virtual void render(const OFX::RenderArguments &args);
/* get the source Clip */
OFX::Clip* getSrcClip();
/* get the current time */
double getTime();
/* set up and run a processor */
void setupAndProcess(myplugin &, const OFX::RenderArguments &args);
/* add to video buffer */
void addToVBuff(OFX::Image *img, double t);
/* fetch a dst image from buffer */
void fetchDstImageBuff(double t, OFX::Image* &img, bool &buff);
#include "myplugin.h"
#include <algorithm>
void myplugin::multiThreadProcessImages(OfxRectI procWindow)
// Do some filtering of the source image and store result in destination image
myfiltering(_dstImg, _srcImg, procWindow);
// add to buffer
_opInstance->addToVBuff(_dstImg, _time);
/* set up and run a processor */
OFXPlugin::setupAndProcess(myplugin &processor, const OFX::RenderArguments &args)
// get a dst image
std::auto_ptr<OFX::Image> dst(dstClip_->fetchImage(args.time));
OFX::BitDepthEnum dstBitDepth = dst->getPixelDepth();
OFX::PixelComponentEnum dstComponents = dst->getPixelComponents();
// fetch main input image
std::auto_ptr<OFX::Image> src(srcClip_->fetchImage(args.time));
// set the images
// set the render window
// set time
time_ = args.time;
// set OFXPlugin instance
// Call the base class process member, this will call the derived templated process code
OFX::Clip* OFXPlugin::getSrcClip()
return srcClip_;
/* get the current time */
return time_;
// the overridden render function
OFXPlugin::render(const OFX::RenderArguments &args)
try {
myplugin fred(*this);
setupAndProcess(fred, args);
} catch (...) {
outputMessage("ERROR: An unknown error happened!");
/* add to video buffer */
OFXPlugin::addToVBuff(OFX::Image *img, double t)
try {
// if frame already exists in buffer, remove
std::deque<vBuffEl>::iterator it;
it = find(videoBuffer_.begin(), videoBuffer_.end(), t);
if(it != videoBuffer_.end())
delete it->img;
// add new frame to the front
vBuffEl e;
e.time = t;
e.img = new OFX::Image(img->getPropertySet().propSetHandle());
memcpy(e.img, img, sizeof(img));
// remove elements at the end, if the buffer exceeds the max size
while(videoBuffer_.size() > LASTIMG_ARRAY_SIZE)
delete videoBuffer_.back().img;
} catch (...) {
outputMessage("ERROR: An unknown error happened!");
/* fetch a dst image from buffer */
OFXPlugin::fetchDstImageBuff(double t, OFX::Image* &img, bool &buff)
try {
std::deque<vBuffEl>::iterator it;
it = find(videoBuffer_.begin(), videoBuffer_.end(), t);
if(it != videoBuffer_.end())
img = it->img; // return buffered dst image
buff = true;
img = getSrcClip()->fetchImage(t); // fetch and return src image
buff = false;
} catch (...) {
outputMessage("ERROR: An unknown error happened!");
The statement
memcpy(e.img, img, sizeof(img));
doesn't do what you expect it to.
The sizeof operation of a pointer returns the size of the pointer, not what it points to. This means that in this case, you are only copying 4 or 8 bytes (depending on if you are on a 32 or 64 bit platform).
However, there is another worse problem hidden in that memcpy call. If the OFX::Image contains data member pointers, copying the data will copy the pointers and not the data. It's a shallow copy, not a deep copy. This is a reason C++ has copy constructors and copy assignment operators.
What you should do is a simple assignment, and hope that OFX::Image follows the rule of three:
*e.img = *img;
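Both points can be illustrated with a small sketch. It does not use the OFX API: FakeImage is a hypothetical type that owns its pixel data through a std::vector, and the function names are made up for this example. Whether OFX::Image itself supports copy assignment is not something this sketch can confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical image type; OFX::Image is not used here.
struct FakeImage {
    std::vector<unsigned char> pixels;  // owns heap memory
    double time = 0.0;
};

// sizeof applied to a pointer yields the size of the pointer itself,
// which is what memcpy(dst, img, sizeof(img)) would end up copying.
std::size_t bytesMemcpyWouldCopy(const FakeImage* img) {
    return sizeof(img);
}

// Copy assignment produces a deep copy here because std::vector
// copies its elements instead of sharing the underlying buffer.
std::unique_ptr<FakeImage> deepCopy(const FakeImage& src) {
    std::unique_ptr<FakeImage> copy(new FakeImage);
    *copy = src;  // same form as *e.img = *img
    return copy;
}
```

After deepCopy, changing the source image's pixels leaves the copy untouched, which is the property the video buffer needs.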

C++ Inheritance and how to pass and maintain subclass data through a superclass

Alright, wasn't quite sure how to word the question and couldn't find any duplicates that I think really address this situation.
Essentially I have a super class that gets extra data appended to it through a subclass. The container class for this data recognizes only the super class and adjusts characteristics based on an id parameter in the super class.
I've actually never had to use inheritance in C++ until recently, so forgive me if this is trivial. I'm under the impression that when I hard copy a bunch of data using the superclass, the subclass data is lost in translation, so to speak. To bypass this limitation I'm trying to use a typecast pointer; however, I now get a segmentation fault when trying to free the memory, even when typecasting the pointer parameter in the free() function.
Here is the sample code...
// Super class
struct Vertex {
    __declspec(align(4)) unsigned int vType; // Identifies the vertex type.
    Vertex(const unsigned int _vType) : vType(_vType) { }
    Vertex(const Vertex &_rV) : vType(_rV.vType) { } // Copy constructor
    virtual ~Vertex() { }
    unsigned int GetVType() const { return vType; }
};
// Subclass
// Id = 1
struct V_Pos : Vertex {
    __declspec(align(4)) XMFLOAT3 position;
    V_Pos(void) : Vertex(1) { }
    V_Pos(XMFLOAT3 &_rPosition) : Vertex(1), position(_rPosition) { }
    V_Pos(const V_Pos &_rV) : Vertex(_rV), position(_rV.GetPosition()) { } // Copy constructor
    ~V_Pos() { }
    XMFLOAT3 GetPosition() const { return position; }
};
Here is how I'm currently copying the data.
// pBuffer is declared as a Vertex* data type
pBuffer = new V_Pos[_bufSize];
if (_pVBuffer->GetVType() == 1)
for (unsigned int i = 0; i < bufSize; ++i) {
V_Pos *_temp = (V_Pos*)&_pVBuffer[i];
pBuffer[i] = *_temp;
Here is how I am currently de-allocating the data.
if (pBuffer != 0) {
delete [] pBuffer;
pBuffer = 0;
What is the correct approach for this situation?
Edit 1 -
Updated the above code blocks to clarify the comment discussion under knulp's answer.
If you start mixing low-level memory allocation with malloc()/free() and C++ objects, you will run into a lot of trouble, while making your code almost unreadable.
You should create a new object with new and a proper constructor, which automatically 1) allocates memory and 2) initializes the struct. To properly free the memory you should use delete, which runs the destructor.
You should copy using a copy constructor and an assignment operator. If you do not define them, the compiler generates default ones that perform a memberwise copy.
Why are you using a type field? C++ has very strong typing features, so it makes little sense to bypass the C++ mechanisms by defining a vType. Rather, define a base class and two or more derived classes, and eliminate the vType field.
If you use clean OO programming, you will avoid all these problems from the start.
Your base class needs to have a virtual destructor. This will allow you to safely delete a derived class with a base class pointer.
Not that! Use a copy constructor.
// Super class
struct Vertex {
    __declspec(align(4)) unsigned int vType; // Identifies the vertex type.
    Vertex(const unsigned int _vType) : vType(_vType) { }
    unsigned int GetVType() const { return vType; }
    Vertex(const Vertex& v) : vType(v.vType) {}
};
// Subclass
// Id = 1
struct V_Pos : Vertex {
    __declspec(align(4)) XMFLOAT3 position;
    V_Pos(void) : Vertex(1) { }
    V_Pos(XMFLOAT3 &_rPosition) : Vertex(1), position(_rPosition) { }
    V_Pos(const V_Pos& v) : Vertex(v) {
        // XMFLOAT3 exposes x, y and z members rather than operator[]
        position.x = v.position.x;
        position.y = v.position.y;
        position.z = v.position.z;
    }
};
Better yet, use a vector instead of XMFLOAT3.
Create a copy on the heap:
V_Pos original(...);
V_Pos * copyPtr = new V_Pos(original);

Are C++ exceptions sufficient to implement thread-local storage?

I was commenting on an answer that thread-local storage is nice and recalled another informative discussion about exceptions where I supposed
The only special thing about the
execution environment within the throw
block is that the exception object is
referenced by rethrow.
Putting two and two together, wouldn't executing an entire thread inside a function-catch-block of its main function imbue it with thread-local storage?
It seems to work fine, albeit slowly. Is this novel or well-characterized? Is there another way of solving the problem? Was my initial premise correct? What kind of overhead does get_thread incur on your platform? What's the potential for optimization?
#include <iostream>
#include <string>
#include <pthread.h>
using namespace std;
struct thlocal {
    string name;
    thlocal( string const &n ) : name(n) {}
};
struct thread_exception_base {
    thlocal &th;
    thread_exception_base( thlocal &in_th ) : th( in_th ) {}
    thread_exception_base( thread_exception_base const &in ) : th( in.th ) {}
};
thlocal &get_thread() throw() {
    try {
        throw; // rethrow the exception currently being handled
    } catch( thread_exception_base &local ) {
        return local.th;
    }
}
void print_thread() {
    cerr << get_thread().name << endl;
}
void *kid( void *local_v ) try {
    thlocal &local = * static_cast< thlocal * >( local_v );
    throw thread_exception_base( local );
} catch( thread_exception_base & ) {
    print_thread();
    return NULL;
}
int main() {
    thlocal local( "main" );
    try {
        throw thread_exception_base( local );
    } catch( thread_exception_base & ) {
        print_thread();
        pthread_t th;
        thlocal kid_local( "kid" );
        pthread_create( &th, NULL, &kid, &kid_local );
        pthread_join( th, NULL );
    }
    return 0;
}
This does require defining new exception classes derived from thread_exception_base, initializing the base with get_thread(), but altogether this doesn't feel like an unproductive insomnia-ridden Sunday morning…
EDIT: Looks like GCC makes three calls to pthread_getspecific in get_thread. EDIT: and a lot of nasty introspection into the stack, environment, and executable format to find the catch block I missed on the first walkthrough. This looks highly platform-dependent, as GCC is calling some libunwind from the OS. Overhead on the order of 4000 cycles. I suppose it also has to traverse the class hierarchy but that can be kept under control.
In the playful spirit of the question, I offer this horrifying nightmare creation:
class tls
void push(void *ptr)
// allocate a string to store the hex ptr
// and the hex of its own address
char *str = new char[100];
sprintf(str, " |%x|%x", ptr, str);
strtok(str, "|");
template <class Ptr>
Ptr *next()
// retrieve the next pointer token
return reinterpret_cast<Ptr *>(strtoul(strtok(0, "|"), 0, 16));
void *pop()
// retrieve (and forget) a previously stored pointer
void *ptr = next<void>();
delete[] next<char>();
return ptr;
// private constructor/destructor
tls() { push(0); }
~tls() { pop(); }
static tls &singleton()
static tls i;
return i;
void *set(void *ptr)
void *old = pop();
return old;
void *get()
// forget and restore on each access
void *ptr = pop();
return ptr;
This takes advantage of the fact that, according to the C++ standard, strtok stashes its first argument so that subsequent calls can pass 0 to retrieve further tokens from the same string; a thread-aware implementation must therefore be using TLS.
example *e = new example;
example *e2 = reinterpret_cast<example *>(tls::singleton().get());
So as long as strtok is not used in the intended way anywhere else in the program, we have another spare TLS slot.
I think you're onto something here. This might even be a portable way to get data into callbacks that don't accept a user "state" variable, as you've mentioned, even apart from any explicit use of threads.
So it sounds like you've answered the question in your subject: YES.
void *kid( void *local_v ) try {
thlocal &local = * static_cast< thlocal * >( local_v );
throw local;
} catch( thlocal & ) {
return NULL;
void *kid (void *local_v ) { print_thread(local_v); }
I might be missing something here, but it's not a thread local storage, just unnecessarily complicated argument passing. Argument is different for each thread only because it is passed to pthread_create, not because of any exception juggling.
It turned out that I was indeed missing the fact that GCC produces actual thread-local storage calls in this example. That makes the issue interesting. I'm still not quite sure whether this is the case for other compilers, or how it differs from calling thread storage directly.
I still stand by my general argument that the same data can be accessed in a more simple and straight-forward way, be it arguments, stack walking or thread local storage.
Accessing data on the current function call stack is always thread safe. That's why your code is thread safe, not because of the clever use of exceptions. Thread local storage allows us to store per-thread data and reference it outside of the immediate call stack.
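For comparison with the exception-based approach, here is a minimal sketch of the direct route, assuming a C++11 compiler: the thread_local storage class together with std::thread. The names threadName, get_thread_name and runWorker are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <string>
#include <thread>

// One independent copy of this variable exists per thread.
thread_local std::string threadName = "unnamed";

// Can be called from anywhere, not only from inside a catch block.
const std::string& get_thread_name() { return threadName; }

// Starts a worker that sets its own name and reports what it then reads back.
std::string runWorker(const std::string& name) {
    std::string seen;
    std::thread t([&] {
        threadName = name;         // writes the worker's copy only
        seen = get_thread_name();  // reads the worker's copy
    });
    t.join();
    return seen;
}
```

The worker's assignment does not affect the value seen by the calling thread, and the lookup needs no stack unwinding. On some toolchains the program has to be linked with the platform's thread library (for example with -pthread).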