Optimal lag of Johansen cointegration test in EViews

I chose the optimal lag for my VAR using the lag length criteria function in EViews. When I conduct the Johansen cointegration test, EViews states that the lag specification refers to the differenced terms, so the lag I need to specify should be the optimal VAR lag minus one.
Is this right?
Some videos posted on YouTube say that to perform the Johansen test we can use the optimal lag from the VAR, while others say it should be the optimal lag minus one.
Which one is right?

You can use the optimal lag from the VAR. EViews reports VAR lag order criteria that include a lag of zero, so "zero minus one" cannot be accepted as an optimal lag. Regardless of the theoretical framework, it is technically not possible to use the optimal lag minus one in that case.
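For reference, the "minus one" in the question comes from the standard algebra linking a levels VAR(p) to its vector error correction (VECM) form; this is a textbook identity rather than anything EViews-specific, and the EViews documentation should be the final word on which convention its dialog expects:

y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + \varepsilon_t

\Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t, \qquad \Pi = \sum_{i=1}^{p} A_i - I, \qquad \Gamma_i = -\sum_{j=i+1}^{p} A_j

So a VAR with p lags in levels corresponds to a VECM with p-1 lagged differences.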

Related

To what extent do redundant conditions inside IIF affect the overall performance of the code?

Let me explain the situation first.
We have an IIF condition that looks like this:
IIF (IN( PRCSS_TYP,'X#') AND IN( YR_TYP,'X4','X5','X6'), 'YES', 'NO')
Now, it is a known fact that if YR_TYP is any of X4, X5 or X6, then PRCSS_TYP will always be 'X#'. Hence, the first part of the condition, IN(PRCSS_TYP, 'X#'), becomes unnecessary.
Obviously, this type of programming practice is not encouraged in any programming language, but my question here is specific to Informatica. To what extent does this affect the performance of this code snippet in Informatica? Is it negligible, or is it significant? Also, is there any way to measure how much performance degradation it can cause?
There will not be a noticeable performance issue from this redundancy alone. For a large data volume it might have a slight impact.
This is because the number of operations increases linearly with the volume of data being processed. Serious performance issues arise when the running time (or number of operations) grows as a quadratic, cubic, or exponential function of the input data volume.
Also, a typical Informatica process usually has bigger source and target bottlenecks, so you are unlikely to notice any lag in the expression transformation. Informatica uses separate threads for reading, writing, and transformation. If your transformation thread is sitting idle most of the time, redundant code in transformations will not have any impact on the total running time. You can check in the session log how much of the time the transformation thread is busy. If any transformation takes a long time, Informatica includes that transformation's share of the running time (%) separately in the session log.
However, as @Maciejg suggested, you should always try to avoid redundant code, as multiple such glitches can add up and impact the performance significantly.

Spike Filtering in Real time C++

I'm trying to implement a spike filter on some torque values that I'm reading in from an SEA in real time. As of now, we're using a moving average to replace spike values that cross a certain threshold. (We're getting spikes because the actuator sometimes messes up and gives a sudden spike.)
I am trying to figure out a better, more accurate way to filter the spikes, so that the filter more closely reflects what the torque would have been instead of the spike.
BTW, this is a C++ program.
Thanks!
If your torque isn't changing very fast, the easiest way to filter spikes is a so-called "slew rate limiter". The operation is trivial and can easily be implemented in any language. You store the last good value; when you get a new reading, compare it with the last one, and if it's larger, increment the stored value, and if it's smaller, decrement it.
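A minimal C++ sketch of that idea (the class name and the maxStep parameter are illustrative; the step size would need to be tuned to how fast the real torque can change):

#include <algorithm>

// Slew-rate limiter: the output may move toward the new reading by at most
// maxStep per sample, so a one-sample spike gets clamped instead of passed on.
class SlewRateLimiter {
public:
    explicit SlewRateLimiter(double maxStep, double initial = 0.0)
        : maxStep_(maxStep), last_(initial) {}

    double filter(double reading) {
        const double delta = reading - last_;
        last_ += std::clamp(delta, -maxStep_, maxStep_);  // C++17
        return last_;
    }

private:
    double maxStep_;  // largest change accepted per sample
    double last_;     // last accepted (filtered) value
};

// Per sample: torqueFiltered = limiter.filter(torqueRaw);

One consequence of this design is that during a genuine fast change the output lags behind by at most maxStep per sample, which is usually acceptable if the real torque changes slowly.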

Caching in a high-performance financial application

I am writing an application whose purpose is to optimize a trading strategy. For the sake of simplicity, assume only that we have a trading strategy that says "enter here", another that says "exit here if in a trade", and two models: one says how much risk we should take (how much we lose if we're on the wrong side of the market) and the other says how much profit we should take (i.e. how much profit we take if the market agrees).
For simplicity's sake, I will refer to historical realized trades as ticks. That means if I "enter on tick 28", I would have entered a trade at the time of the 28th trade in my dataset, at the price of that trade. Ticks are stored chronologically in my dataset.
Now, imagine the entry strategy on the whole dataset comes up with 500 entries. For each entry, I can precalculate the exact entry tick. I can also calculate the exit point determined by the exit strategy for each entry point (again as a tick number). For each entry, I can also precalculate the modeled loss and profit and the ticks where these losses or profits would have been hit. The last thing that remains to be done is calculating what would have happened first, i.e. exit by strategy, exit at a loss, or exit at a profit.
Hence, I iterate through the array of trades and calculate exitTick[i] = min(exitTickByStrat[i], exitTickByLoss[i], exitTickByProfit[i]). And the whole process is bloody slow (let's say I do this 100M times). I suspect cache misses are the main culprit. And the question is: can this be made faster somehow? I have to iterate through 4 arrays of some non-trivial length. One suggestion I have come up with is to group the data in tuples of four, i.e. have one array of structures like (entryTick, exitOnStrat, exitOnLoss, exitOnProfit). This might be faster due to better cache predictability, but I cannot say for sure. The reason I haven't tested it so far is that instrumenting profilers somehow don't work on release binaries of my app, while sampling profilers seem unreliable to me (I have tried Intel's profiler).
So the final questions are: can this problem be made faster? What is the best profiler to use for mem profiling with release binaries? I work on Win7, VS2010.
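To make the array-of-structures idea from the question concrete, here is a minimal hedged sketch (the struct and function names are illustrative, not taken from the real code); the point is that all four values for trade i end up adjacent in memory:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Group the four per-trade values so that one cache line serves one trade.
struct TradeExits {
    int32_t entryTick;
    int32_t exitTickByStrat;
    int32_t exitTickByLoss;
    int32_t exitTickByProfit;
};

// For each trade, the effective exit is whichever of the three exits happens first.
std::vector<int32_t> evaluateExits(const std::vector<TradeExits>& trades) {
    std::vector<int32_t> exitTick(trades.size());
    for (std::size_t i = 0; i < trades.size(); ++i) {
        const TradeExits& t = trades[i];
        exitTick[i] = std::min({t.exitTickByStrat, t.exitTickByLoss, t.exitTickByProfit});
    }
    return exitTick;
}

Whether this actually beats four separate arrays depends on the access pattern, so it is worth measuring both layouts.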
Edit:
Many thanks to all. I tried to simplify my original question as much as possible, hence the confusion. Just to make sure it's readable - target means an envisaged/realized profit, stop means an envisaged/realized loss.
The optimizer is a brute-force one. So, I have some strat settings (e.g. indicator periods, whatever), then min/max breakEvenAfter/breakEvenBy, and then formulas to give you stop/target values in ticks. These formulas are also objects of optimization. Hence, I have an optimization structure like
for each in params
{
    calculateEntries()
    for each in beSettings
    {
        precalculateBeData()
        for each in targetFormulaSettings
        {
            precalculateTargetsAndRespectiveExitTicks()
            for each in stopFormulaSettings
            {
                precalculateStopsAndRespectiveExitTicks()
                evaluateExitsAndDetermineImprovement()
            }
        }
    }
}
So I precalculate as much as possible and only calculate something when I need it. And out of 30 seconds, the calculation spends 25 seconds in the evaluateExitsAndDetermineImprovement() function, which does just what I described in the original question, i.e. picks min(exitOnPattern, exitOnStop, exitOnTarget). The reason I need to call the function 100M times is that I have 100M combinations of all the params combined. But within the innermost for loop only the exitOnStops array changes. I can post some code if that helps. I'm grateful for all the comments!
I don't know much about trading strategies, but I usually do some optimisation work.
There are many optimisation options: the type of container, using a different min function (I think Boost has a somewhat faster one than the STL), reducing repeated calculations, etc.
You can also gain speed by using faster functions, or by redesigning your algorithm.
For profiling I use GlowCode under Win7 x64, and it works for release builds too.
Maybe I misunderstand your system completely, but:
What is it that you "pre-calculate", when, and why 100M times?
I don't know if it will help you, but it may simplify your system significantly - there are 2 common trading strategies (the descriptions are mine and not official):
1) "fixed point exit" - when the trade happens all exit points are calculated once and they are checked against market conditions/price periodically.
2) "variable point exit" - when the market moves the exit points are recalculated (usually to lock in more profit/reduce loss).
In case 1) the actual calculation happens only once so it should be VERY fast
In case 2) the calculations will happen every time, but it can be optimised in many different ways - one of them being that you may store your trades indexed by exit points and only get and re-calculate those close to the actual market situation.
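As a hedged sketch of the "store trades indexed by exit points" idea in case 2 (all names and types here are made up for illustration), you could keep open trades in an ordered container keyed by their nearest exit price and only touch the ones close to the current market price:

#include <map>

struct Trade { int id; double stopPrice; double targetPrice; };

// Open trades keyed by the exit price currently closest to the market, so a
// price update only has to look at a narrow band of keys around the new price.
using TradesByExit = std::multimap<double, Trade>;

// Re-check only the trades whose indexed exit price lies within `band` of `price`.
template <typename Recalc>
void touchNearbyTrades(TradesByExit& trades, double price, double band, Recalc recalc) {
    auto lo = trades.lower_bound(price - band);
    auto hi = trades.upper_bound(price + band);
    for (auto it = lo; it != hi; ++it) {
        recalc(it->second);  // e.g. move the stop to lock in more profit
    }
}

If the recalculation moves a trade's exit price, its entry has to be re-keyed (erased and re-inserted).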
I am not sure which cache misses you are referring to. Your data cache? The CPU cache?
So, after some work, I understood the advice from Alexandre C. When I ran cache-miss profiling, I found that out of 15M calls of the evaluateExits() function I have only 30K cache misses, hence the performance of this function cannot be hindered by the cache. So I had to "start believing" that VTune is actually producing valid results, albeit weird ones. Since the analysis of the VTune output no longer matches this thread's title, I decided to start a new thread. Thank you all for your opinions and recommendations.

Adaptive optimization of django transaction size

I'm bulk loading data into a django model, and have noticed that the number of objects loaded into memory before doing a commit affects the average time to save each object. I realise this can be due to many different factors, so would rather focus on optimizing this STEPSIZE variable.
What would be a simple algorithm for optimizing this variable, in realtime, while taking into account the fact that this optimum might also change during the process?
I imagine this would be some sort of gradient descent, with a bit of jitter to look for changes in the landscape? Is there a formally defined algorithm for this type of search?
I'd start out assuming that
1) Your function increases monotonically in both directions away from the optimum
2) You roughly know the size of the region in which the optimum will live.
Then I'd recommend a bracket and subdivide approach as follows:
Evaluate your function outwards from the previous optimum in both directions. Stop the search in each direction when a value higher than the previous optimum's is reached. With the assumptions above, this gives you a bracketed interval in which the new optimum lives. Break this region into left and right halves by evaluating its midpoint, choose the half with the lower values, and repeat recursively until the region is as small as you like.
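A minimal sketch of that bracket-and-subdivide search, written in C++ like the rest of this page even though the question is about Django; cost() is a placeholder that would time a commit of a given batch size, maxStep is an assumed safety bound, and the sketch relies on the rough unimodality assumed above:

#include <algorithm>
#include <functional>

// Bracket-and-subdivide search for the batch size with the lowest cost.
long tuneStepSize(long previousBest, long maxStep,
                  const std::function<double(long)>& cost) {
    const double pivot = cost(previousBest);

    // 1) Bracket: walk outwards from the previous optimum in both directions,
    //    stopping in each direction once the cost rises above the pivot value.
    long lo = previousBest;
    while (lo > 1 && cost(lo / 2) <= pivot) lo /= 2;
    lo = std::max(lo / 2, 1L);

    long hi = previousBest;
    while (hi * 2 <= maxStep && cost(hi * 2) <= pivot) hi *= 2;
    hi = std::min(hi * 2, maxStep);

    // 2) Subdivide: keep whichever side of the interval has the cheaper probe
    //    point, until the interval is small enough.
    while (hi - lo > 2) {
        const long m1 = lo + (hi - lo) / 3;
        const long m2 = hi - (hi - lo) / 3;
        if (cost(m1) < cost(m2)) hi = m2; else lo = m1;
    }

    // Pick the cheapest of the few remaining candidates.
    long best = lo;
    for (long s = lo + 1; s <= hi; ++s)
        if (cost(s) < cost(best)) best = s;
    return best;
}

In practice you would cache the measured costs rather than re-timing the same batch size, and re-run the search periodically from the previous optimum to track a drifting optimum.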

Predict C++ program running time

How can one predict a C++ program's running time if the program performs different kinds of work (database access, reading files, parsing XML, and so on)? How do installers do it?
They do not predict the time. They report the number of operations completed out of the total number of operations.
You can predict the time by using measurement and estimation. Of course the quality of the predictions will differ. And BTW: The word "predict" is correct.
You split the workload into small tasks, and create an estimation rule for each task, e.g.: if copying files one to ten took 10s, then the remaining 90 files may take another 90s. Measure the time that these tasks take at runtime, and update your estimations.
Each new measurement will make the prediction a bit more precise.
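As a hedged illustration of that measure-and-extrapolate loop (the class and method names are made up):

#include <chrono>
#include <cstddef>

// Running estimate of remaining time: measure the average time per completed
// task so far and extrapolate over the tasks that are left.
class EtaEstimator {
    using Clock = std::chrono::steady_clock;
public:
    explicit EtaEstimator(std::size_t totalTasks)
        : total_(totalTasks), start_(Clock::now()) {}

    // Call after each task finishes; returns the estimated seconds remaining.
    double taskDone() {
        ++done_;
        const double elapsed =
            std::chrono::duration<double>(Clock::now() - start_).count();
        const double perTask = elapsed / static_cast<double>(done_);
        return perTask * static_cast<double>(total_ - done_);
    }

private:
    std::size_t total_;
    std::size_t done_ = 0;
    Clock::time_point start_;
};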
There really is no way to do this in any sort of reliable way, since it depends on thousands of factors.
Progress bars typically measure this in one of two ways:
Overall progress - I have n number of bytes/files/whatever to transfer, and so far I have done m.
Overall work divided by current speed - I have n bytes to transfer, and so far I have done m and it took t seconds, so if things continue at this rate it will take u seconds to complete.
Short answer:
No, you can't. For progress bars and the like, most applications simply set the bar length to a percentage based on the overall tasks done. Some pseudo-code:
for (int i = 0; i < num_files_to_load; ++i) {
    files.push_back(File(filepath[i]));
    // report the fraction completed so far, in the range [0, 1]
    SetProgressBarLength((float)(i + 1) / (float)num_files_to_load);
}
This is a very simplified example. Making a for-loop like this would surely block the window system's event/message queue. You would probably add a timed event or something similar instead.
Longer answer:
Given N known parameters, the problem of determining whether a program completes at all is undecidable; this is called the halting problem. You can, however, find the time it takes to execute a single instruction. Some very old games actually depended on exact cycle timings and failed to execute correctly on newer computers due to race conditions caused by subtle differences in runtime. Also, on architectures with data and instruction caches, the number of cycles an instruction consumes is no longer constant, so caching makes cycle counting unpredictable.
Raymond Chen discussed this issue in his blog.
"Why does the copy dialog give such horrible estimates? Because the copy dialog is just guessing. It can't predict the future, but it is forced to try. And at the very beginning of the copy, when there is very little history to go by, the prediction can be really bad."
In general it is impossible to predict the running time of a program. It is even impossible to predict whether a program will even halt at all. This is undecidable.
http://en.wikipedia.org/wiki/Halting_problem
As others have said, you can't predict the time. Approaches suggested by Partial and rmn are valid solutions.
What you can do more is assign weights to certain operations (for instance, if you know a db call takes roughly twice as long as some processing step, you can adjust accordingly).
A cool installer compiler would execute a faux install, time each op, then save this to disk for the future.
I used such a technique for a 3D application once, which had a pretty dead-on progress bar for loading and mashing data, after you've run it a few times. It wasn't that hard, and it made development much nicer. (Since we had to see that bar 10-15 times/day, startup was 10-20 secs)
You can't predict it entirely.
What you can do is wait until a fraction of the work is done, say 1%, and estimate the remaining time from that: time how long the 1% takes and multiply by 100, for example. That is easily done if you can enumerate everything you have to do in advance, or if you have some kind of loop going on.
As I mentioned in a previous answer, it is impossible in general to predict the running time.
However, empirically it may be possible to predict with good accuracy.
Typically all of these programs are approximately linear in some input.
But if you wanted a more sophisticated approach, you could define a large number of features (database size, file size, OS, etc. etc.) and input those feature values + running time into a neural network. If you had millions of examples (obviously you would have an automated method for gathering data, e.g. some discovery programs) you might come up with a very flexible and intelligent prediction algorithm.
Of course this would only be worth doing for fun, as I'm sure the value to your company over some crude guessing algorithm will probably be nil :)
You should estimate the time needed for the different phases of the program, for example: reading files - 50, working with the database - 30, working with the network - 20. Ideally you would also report progress via a callback during each of those phases, but that requires coding the progress calculation into the iterations of the algorithm.
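A minimal hedged sketch of such phase weighting, using the example weights above (all names are illustrative):

#include <string>
#include <vector>

// Overall progress expressed as a weighted sum of per-phase progress.
struct Phase {
    std::string name;
    double weight;        // relative share of the total work, e.g. 50, 30, 20
    double fractionDone;  // 0.0 .. 1.0, updated from that phase's callback
};

// Returns overall completion in [0, 1].
double overallProgress(const std::vector<Phase>& phases) {
    double totalWeight = 0.0, done = 0.0;
    for (const Phase& p : phases) {
        totalWeight += p.weight;
        done += p.weight * p.fractionDone;
    }
    return totalWeight > 0.0 ? done / totalWeight : 0.0;
}

// Example: { {"reading files", 50, 0}, {"database", 30, 0}, {"network", 20, 0} }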