I have a program that was originally executed sequentially, and now I'm trying to parallelize it via OpenMP offloading. The problem is that with the update clause, depending on the case, including the size of the array I want to move returns an incorrect result, while other times it works. For example, this pragma:
#pragma omp target update from(image[:bands])
Is not the same as:
#pragma omp target update from(image)
What I want to do is move the whole thing. Suppose the variable was originally declared in the host as follows:
double* image = (double*)malloc(bands*sizeof(double));
And that these update pragmas are being called inside a target data region where the variable image has been mapped like this:
#pragma omp target data map(to: image[:bands])
{
    // the code
}
I want to move it to the host to do some work that cannot be done on the device. Note: the same thing may happen with the "to" update pragmas, not only the "from".
Well, I don't know why no one from OpenMP answered this question, as the answer is pretty simple (I say this because they don't have a forum anymore and this is supposed to be the best place to ask questions about OpenMP...). If you want to copy dynamically allocated data through pointers, you have to use the omp_target_memcpy() function.
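For what it's worth, here is a minimal sketch of how that can look (this assumes OpenMP 4.5+; host_copy is an illustrative buffer added here, and use_device_ptr is one way to obtain the device address of image):

int dev  = omp_get_default_device();
int host = omp_get_initial_device();
double *host_copy = (double *)malloc(bands * sizeof(double));

#pragma omp target data map(to: image[:bands])
{
    // ... device kernels that write to image ...

    // inside this region, 'image' holds the device address of the mapping
    #pragma omp target data use_device_ptr(image)
    {
        omp_target_memcpy(host_copy, image,        /* dst (host), src (device) */
                          bands * sizeof(double),  /* bytes to move */
                          0, 0,                    /* dst/src offsets */
                          host, dev);              /* dst/src device numbers */
    }

    // ... work on host_copy on the host; copy back by swapping the
    // dst/src arguments (the "to" direction mentioned in the question) ...
}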
I would really appreciate your input on moving from a YieldTermStructure pointer to one with a spread added, as below:
boost::shared_ptr<YieldTermStructure> depoFutSwapTermStructure(new PiecewiseYieldCurve<Discount,
LogLinear>(settlementDate, depoFutSwapInstruments_New, termStructureDayCounter, 1.0e-15));
I tried adding a spread of 50 bps as below...
double OC_Spread(0.50 / 100);
Rate OCSQuote = OC_Spread;
boost::shared_ptr<Quote> OCS_Handler(new SimpleQuote(OCSQuote));
I then proceed to create a zerospreaded object as below:
ZeroSpreadedTermStructure Z_Spread(Handle<YieldTermStructure>(*depoFutSwapTermStructure), Handle<Quote>(OCS_Handler));
But now I am stuck, as the code repeatedly breaks down if I go ahead and do anything like
Z_Spread.zeroYieldImpl;
What is the issue with the above code? I have tried several flavors of the above approach and failed on all fronts.
Also, is there a native way of calling the discount function directly, just as I do now with the TermStructure object prior to adding the spread, as below?
depoFutSwapTermStructure->discount(*it)
I'm afraid you got your interfaces a bit mixed up. The zeroYieldImpl method you're trying to call on your ZeroSpreadedTermStructure is protected, so you can't use it from your code (at least, that's how I'm guessing your code breaks, since you're not reporting the error you get).
The way you interact with the curve you created is through the public YieldTermStructure interface that it inherits; that includes the discount method that you want to call, as well as methods such as zeroRate or forwardRate.
Again, it's hard to say precisely why your call to discount fails, since you're not quoting the error and you're not saying what *it is in the call. From the initialization you report, and from the call you wrote, I'm guessing that you might have instantiated a ZeroSpreadedTermStructure object but are trying to use it with the -> syntax as if it were a pointer. If that's the case, calling Z_Spread.discount(*it) should work instead (assuming *it resolves to a number).
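In other words, a minimal sketch of the suggested usage (assuming *it resolves to a Date; note also that Handle is typically built from the shared_ptr itself, so the dereference in your initialization is probably not needed):

Handle<YieldTermStructure> baseCurve(depoFutSwapTermStructure);   // no '*'
ZeroSpreadedTermStructure Z_Spread(baseCurve, Handle<Quote>(OCS_Handler));

DiscountFactor df = Z_Spread.discount(*it);   // '.' instead of '->'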
If that's not the problem, I'm afraid you'll have to add a few more details to your question.
Finally, for a more general treatment of term structures in QuantLib, you can read here and here.
I would like to build a parallel version of R's sample() function using RcppParallel, but I lack some C++ experience and time.
I thought about starting from the Matrix transform example on the RcppParallel site and using the RcppArmadillo::sample function instead of std::transform.
Two questions:
1/ Is it possible (i.e. thread-safe)?
2/ I don't fully grasp the operator part in the example and how to change it to use another function (the begin and end usage, shown below, is confusing to me).
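For reference, the part I mean looks roughly like this (a condensed version of the site's Matrix transform example, with my own comments added):

#include <Rcpp.h>
#include <RcppParallel.h>
#include <algorithm>
#include <cmath>
using namespace Rcpp;
using namespace RcppParallel;

struct SquareRoot : public Worker {
    const RMatrix<double> input;   // thread-safe read-only view of the input
    RMatrix<double> output;        // thread-safe view of the output

    SquareRoot(const NumericMatrix input, NumericMatrix output)
        : input(input), output(output) {}

    // the "operator part": parallelFor splits [0, length) into chunks and
    // each worker thread calls this with its own [begin, end) slice
    void operator()(std::size_t begin, std::size_t end) {
        std::transform(input.begin() + begin, input.begin() + end,
                       output.begin() + begin, ::sqrt);
    }
};

// invoked as: parallelFor(0, x.length(), SquareRoot(x, output));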
Thank you
I seem to have an odd problem: every time I try to increment an integer that tracks the outgoing networking requests (the response requests will match that int so we can pair up response data), the console will "block" and freeze at the incrementation. Is there any reason why it might do this? It's just a normal tracker_id += 1; that code shouldn't be blocking, and I'm usually never this noobish at these things.
Sometimes you may get the impression that the debugger is on one line while the code is actually stopped at the instruction before or after it.
If tracker_id is a simple variable (e.g. int, long) and not a class instance then there is no way that tracker_id += 1 is blocking. It's just impossible.
Note also that compilers are becoming more and more liberal in how they translate source code to machine code, so be sure to compile with all optimizations disabled if you want to be able to track source code and variables correctly.
I had two classes in my main class. The first class is a simple networking class I created to easily call an API (the Bitcoin JSON-RPC API, so I could just call coin_server->getbalance()). The issue was that both classes were located in the main class, and apparently the Bitcoin class would be destroyed before it was set inside the game server class. That would explain why it crashed when I tried to call the coin API's functions.
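For the record, the bug looked roughly like this (the names below are illustrative, not the actual code):

struct CoinServer {
    double getbalance() { return 0.0; }   // stand-in for the JSON-RPC call
};

struct GameServer {
    CoinServer* coin_server = nullptr;    // non-owning pointer
    void update() { coin_server->getbalance(); }  // crashes once CoinServer is gone
};

int main() {
    GameServer game;
    {
        CoinServer coin;            // narrower lifetime than 'game'
        game.coin_server = &coin;
    }                               // 'coin' destroyed here
    game.update();                  // coin_server now dangles: crash
}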
Having used gprof and callgrind many times, I have reached the (obvious) conclusion that I cannot use them efficiently when dealing with large programs (as in a CAD program that loads a whole car). I was thinking that maybe I could use some C/C++ macro magic and somehow build a simple (but nice) logging mechanism. For example, one can call a function using the following macro:
#define CALL_FUN(fun_name, ...) \
fun_name (__VA_ARGS__);
We could add some clocking/timing stuff before and after the function call, so that every function called with CALL_FUN gets timed, e.g.:
#define CALL_FUN(fun_name, ...) \
    do {                        \
        time(&t0);              \
        fun_name(__VA_ARGS__);  \
        time(&t1);              \
    } while (0)
The variables t0, t1 could be found in a global logging object. That logging object can also hold the calling graph for each function called through CALL_FUN. Afterwards, that object can be written in a (specifically formatted) file, and be parsed from some other program.
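A minimal sketch of that idea (the CallLog type and its record method are my own illustrative names, not an existing API, and std::clock is used here for sub-second resolution):

#include <cstdio>
#include <ctime>

struct CallLog {
    void record(const char *name, double seconds) {
        // a real version would also push/pop a node of the call graph here
        std::printf("%s took %.6f s\n", name, seconds);
    }
} g_log;

#define CALL_FUN(fun_name, ...)                                        \
    do {                                                               \
        std::clock_t c0 = std::clock();                                \
        fun_name(__VA_ARGS__);                                         \
        std::clock_t c1 = std::clock();                                \
        g_log.record(#fun_name, double(c1 - c0) / CLOCKS_PER_SEC);     \
    } while (0)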
So here comes my (first) question: Do you find this approach tractable ? If yes, how can it be enhanced, and if not, can you propose a better way to measure time and log callgraphs ?
A colleague proposed another approach to deal with this problem: annotating each function we care to log with a specific comment. Then, during the make process, a special preprocessor must run, parse each source file, add logging logic for each function we care to log, create a new source file with the newly added code, and build that instead. I guess that reading CALL_FUN... macros (my proposal) all over the place is not the best approach, and his approach would solve this problem. So what is your opinion of this approach?
PS: I am not well versed in the pitfalls of C/C++ macros, so if this can be developed using another approach, please say so.
Thank you.
Well, you could do some C++ magic to embed a logging object. Something like:
class CDebug
{
public:
    CDebug()  { /* ... log entry somehow ... */ }
    ~CDebug() { /* ... log exit somehow ... */ }
};
In your functions you then simply write:
void foo()
{
    CDebug dbg;
    // ...
    // you could add some debug info
    dbg.heythishappened();
    // ...
}   // now the dtor is called, even if the function returns early or is interrupted
I am a bit late, but here is what I am doing for this:
On Windows there is the /Gh compiler switch, which makes the compiler insert a call to a hidden _penter function at the start of each function. There is also a /GH switch for getting a _pexit call at the end of each function.
You can utilize this to get callbacks on each function call. Here is an article with more details and sample source code:
http://www.johnpanzer.com/aci_cuj/index.html
I am using this approach in my custom logging system for storing the last few thousand function calls in a ring buffer. This turned out to be useful for crash debugging (in combination with MiniDumps).
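For illustration, the ring-buffer part can be as small as this (a sketch assuming a callback like log_function_entry is invoked from the _penter hook; this is not the article's code):

#include <atomic>
#include <cstddef>

const std::size_t kSlots = 4096;           // keeps "the last few thousand" calls
void *g_ring[kSlots];
std::atomic<std::size_t> g_next{0};

extern "C" void log_function_entry(void *fn) {
    // relaxed ordering is enough: the buffer is only read post-mortem
    g_ring[g_next.fetch_add(1, std::memory_order_relaxed) % kSlots] = fn;
}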
Some notes on this:
The performance impact very much depends on your callback code. You need to keep it as simple as possible.
You just need to store the function address and module base address in the log file. You can then later use the Debug Interface Access SDK to get the function name from the address (via the PDB file).
All this works surprisingly well for me.
Many nice industrial libraries have their functions' declarations and definitions wrapped in void macros, just in case. If your code is already like that, go ahead and debug your performance problems with some simple asynchronous trace logger. If not, post-insertion of such macros can be unacceptably time-consuming.
I can understand the pain of running a 1Mx1M matrix solver under valgrind, so I would suggest starting with the so-called "Monte Carlo profiling method": start the process and, in parallel, run pstack repeatedly, say every second. As a result you will have N stack dumps (N can be quite significant). The mathematical approach would then be to count the relative frequencies of each stack and draw a conclusion about the most frequent ones. In practice you either immediately see the bottleneck or, if not, you switch to bisection, gprof, and finally to valgrind's toolset.
Let me assume the reason you are doing this is that you want to locate any performance problems (bottlenecks) so you can fix them to get higher performance, as opposed to measuring speed or getting coverage info. It seems you're thinking the way to do this is to log the history of function calls and measure how long each call takes.
There's a different approach. It's based on the idea that, mainly, the program walks a big call tree. If time is being wasted, it is because the call tree is more bushy than necessary, and during the time that's being wasted, the code that's doing the wasting is visible on the stack. It can be terminal instructions, but more likely function calls, at almost any level of the stack.

Simply pausing the program under a debugger a few times will eventually display it. Anything you see it doing on more than one stack sample, if you can improve it, will speed up the program. It works whether the time is being spent in CPU, I/O, or anything else that consumes wall clock time.

What it doesn't show you is tons of stuff you don't need to know. The only way it can fail to show you bottlenecks is if they are very small, in which case the code is pretty near optimal. Here's more of an explanation.
Although I think it will be hard to do anything better than gprof, you can create a special class, LOG for instance, and instantiate it at the beginning of each function you want to log.
class LOG {
public:
    LOG(const char* name, const char* file)
        : name_(name), file_(file), t0_(time(NULL)) {
        // log time_t of the beginning of the call
    }
    ~LOG() {
        // calculate the total time spent, by difference between the current
        // time and that saved in the constructor (destructors cannot take
        // arguments, so keep what you need as members)
    }
private:
    const char *name_, *file_;
    time_t t0_;
};
void somefunction() {
    LOG log(__FUNCTION__, __FILE__);
    // ... do other things ...
}
Now you can integrate this approach with the preprocessing one you mentioned. Just add something like this at the beginning of each function you want to log:
// ### LOG
and then replace the string automatically in debug builds (it shouldn't be hard).
Maybe you should use a profiler. AQTime is a relatively good one for Visual Studio. (If you have VS2010 Ultimate, you already have a profiler.)