Interval branching - c++

A project (C++11) I am working on involves a block of code that will be run somewhere in the trillions of times. I have an integer parameter B in [1,N] and points 1 = b1 < b2 < ... < bk = N where the code executes a different small block of code depending on which interval [bi, b(i+1)) B lies in. The only value that is changing throughout execution is B. However while the value of the bi's are fixed, they are only determined at runtime.
The naive approach is to write a chain of if and else if statements, which takes k comparisons in the worst case. However, one can do this in constant time: construct a vector myGotos of size N and, for each interval [bi, b(i+1)), store the location of the corresponding code block at every index in that interval. Then you just do goto myGotos[B].
The solution above seems to me like it would be on average quicker, but the code would be quite ugly. I was wondering if there is a better way to do this.

The common way to do this is with a switch statement
switch(B){
case b1:..//
break;
}
If you can express those sections of code as lambdas or std::function objects, provided they all take the same arguments, you can store them in a container and look them up by key. Even a templated function might be OK. It's tough to answer without knowing what you actually need to run these functions.
map<int,decltype(yourLambda)>
Seems like it would work ok as well.

Initialize an array K of N slots, where each slot contains the index of the interval that contains that slot's position.
Then
switch (K[B])
{
case 1: // [B1,B2)
...
}
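A minimal sketch of this table-plus-switch idea, assuming the breakpoints arrive at runtime as a sorted vector. The names buildIntervalTable and dispatch, the returned values, and the choice to put the final point bk = N into the last interval are all illustrative assumptions, not part of the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build K so that K[B] is the 1-based index i of the interval [b_i, b_(i+1))
// containing B. `breakpoints` holds b_1 < b_2 < ... < b_k with b_1 == 1.
// Slot 0 is unused because B starts at 1. The last point b_k == N is assigned
// to the last interval (an assumption made for this sketch).
std::vector<int> buildIntervalTable(const std::vector<int>& breakpoints) {
    assert(breakpoints.size() >= 2 && breakpoints.front() == 1);
    const int n = breakpoints.back();
    std::vector<int> table(static_cast<std::size_t>(n) + 1, 0);
    std::size_t interval = 0;  // 0-based position in `breakpoints`
    for (int b = 1; b <= n; ++b) {
        // Move to the next interval once its lower bound is reached,
        // but never step past the last interval.
        while (interval + 2 < breakpoints.size() && b >= breakpoints[interval + 1]) {
            ++interval;
        }
        table[static_cast<std::size_t>(b)] = static_cast<int>(interval) + 1;
    }
    return table;
}

// Hypothetical hot path: one table load followed by a dense switch.
// The returned numbers stand in for the real per-interval code blocks.
int dispatch(const std::vector<int>& K, int B) {
    assert(B >= 1 && static_cast<std::size_t>(B) < K.size());
    switch (K[static_cast<std::size_t>(B)]) {
        case 1: return 100;  // [b1, b2)
        case 2: return 200;  // [b2, b3)
        case 3: return 300;  // [b3, b4]
        default: return -1;  // interval without a handler
    }
}
```

Because the case labels are small consecutive integers, a compiler is free to turn the switch into a jump table, which is roughly what the goto-vector idea does by hand. The table costs N integers of memory, so it is worth measuring against a binary search over the breakpoints when N is large.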

Related

Is there a way to manipulate function arguments by their position number?

I wish to be able to manipulate function arguments by the order in which they are given. So,
void sum(int a, int b, int c)
std::cout<< arguments[0]+arguments[1];
sum(1,1,4);
should print 2. This feature is in JavaScript.
I need it to implement a numerical scheme. I'd create a function that takes 4 corner values and tangential direction as input. Then, using the tangential direction, it decides which corners to use. I wish to avoid an 'if' condition as this function would be called several times.
EDIT - The reason I do not wish to use an array as input is potential optimization and readability. Let me explain my situation a bit more. solution is a 2D array. We would be running this double for loop several times
for (int i = 0;i<N_x;i++)
for (int j = 0;j<N_y;j++)
update_solution(solution[i,j],solution[i+1,j],solution[i-1,j],...);
Optimization: N_x,N_y are large enough for me to be concerned about whether or not adding a step like variables_used = {solution(i,j),solution(i+1,j),...} in every single loop will increase the cost.
Readability: The arguments of update_solution indicate which indices were used to update the solution. Putting that in the previous line is slightly non-standard, judging by the codes I have read.
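C++ has no built-in arguments object like JavaScript's. One possible sketch, keeping the named parameters in the signature for readability, is to copy them into a small local std::array and pick positions from a table indexed by the direction. The function name updateSolution, the 0/1 encoding of the direction, and the picks table are assumptions made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical update: adds the two corner values selected by `direction`.
// The call site still lists each corner explicitly, as the question wants.
inline double updateSolution(double c0, double c1, double c2, double c3,
                             int direction) {
    assert(direction == 0 || direction == 1);
    // Local copy so the arguments can be addressed by position.
    const std::array<double, 4> corners = {{c0, c1, c2, c3}};
    // Which argument positions each direction uses (assumed mapping).
    static const std::size_t picks[2][2] = {{0, 1}, {2, 3}};
    return corners[picks[direction][0]] + corners[picks[direction][1]];
}
```

The selection is an indexed load rather than an if. Whether the small local array costs anything after inlining depends on the compiler, so it is worth checking the generated code or timing the loop before relying on it.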

Haxe: Efficiency of mapping a variable to one of two values

I feel this question is too general and poorly asked. Improvement suggestions would be appreciated.
Say I have an integer variable which I need to convert to one of two other values. If the variable is 0, it will remain 0, otherwise, it will become 1. I can think of two ways to do this.
Method 1: An inline if/else assignment.
function runFunc(input:Int):Void
{
<script>
}
for (index in 0...5)
{
runFunc(if (index == 0) {0;} else {1;});
}
Method 2: Division and rounding.
function runFunc(input:Int):Void
{
<script>
}
for (index in 0...5)
{
runFunc(Math.round(index / index));
}
Method 1 is more standardised and would work for other data-types and other values, but Method 2 seems like it would take less processing power (especially if this needs to be done almost constantly).
Assuming that Method 2 wouldn't have a problem with rounding, is there enough of a difference in times between the two methods to consider using one over the other? How do different things like if/else statements and Math.round() impact processing time?
Well, first things first, your method 2 has a division by zero. So it's not even a valid solution.
However, I assume you want an answer "in general". And, of course, this kind of question comes with a TON of "it depends" qualifications. It depends on the type of CPU, the programming language, the optimizations the compiler might make, runtime optimizations, etc, etc.
However, in general, the order of cost of the relevant operations is roughly:
Logic and arithmetic operations are cheapest
Of these, integer operations are cheaper than floating point operations.
Branches (ifs and loops) are moderately expensive.
Division is moderately expensive.
Function calls are very expensive.
Basically, this is because (respectively):
This is what CPUs do, and they are optimized to do it quickly
Branches represent two possible paths, which can slow down CPU parallelization and optimizations.
Division is a multi-step process.
Function calls have to push and pop the stack.
Thus, I'd probably bet on your method 1 above. And, just for clarity, I'd write the if statement in your first example with an equivalent ternary operator:
for (index in 0...5)
{
runFunc( index==0 ? 0 : 1 );
}
This is well and good for mapping a simple boolean (either index is 0 or it isn't). But what about when the mapping becomes more complex? There are a couple constructs that generally give good performance when mapping one value to another: a lookup table or a hash map.
The lookup table is useful if your keys are integers, bounded between some minimum and maximum values. The hash map is good if your keys are hashable (often Ints, Strings, or Objects can be used as hash keys. Look at haxe.ds.IntMap, haxe.ds.StringMap and haxe.ds.ObjectMap.)
Example 1: A common look-up table might map numeric integer "day of the week" to the word used for that day. Assuming day:Int is always 0-6, a day int to string lookup table would be:
var day_name_lut = [ "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday" ];
So now trace('Today is a '+day_name_lut[Date.now().getDay()]); will tell us what day it is today. That's mapping an integer to a String for VERY low cost. Cheaper than a function call or an IntMap<String>.
Example 2: A sample StringMap might be used to store some object by a string value (for example, maybe a Person by their name.) This will allow us to lookup people by name later on:
var people_by_name = new StringMap<Person>();
var joe = new Person("Joe");
people_by_name.set(joe.name, joe);
var bob = new Person("Bob");
people_by_name.set(bob.name, bob);
This is very common for caching expensive operations. Imagine caching an HTTP response by its URL. It'd be much cheaper to look it up the second time from the StringMap than to go back and fetch the response again.
--- update ---
I also note you say "inline if/else assignment", referring to this:
runFunc(if (index == 0) {0;} else {1;});
Note that actually writing the if inline is the kind of thing that makes absolutely no performance difference. It'll perform exactly the same as:
var tmp = if (index == 0) {0;} else {1;}
runFunc(tmp);
And these are exactly the same as well:
runFunc( if (index == 0) 0 else 1);
runFunc( (index == 0) ? 0 : 1);
These types of semantic differences don't make any difference to the final execution of code. The processor still has to compute and temporarily store the value to call the function, whether you wrote it inline, stored it in a tmp variable, or wrote the branch with an if/else or ternary operator. Compilers and VMs are optimized to take your (potentially variable) code and boil it down to the best instructions for the target CPU.
How about binary op index & 1 ?
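For comparison, here is a small C++ sketch of the two candidate mappings discussed in this thread (the thread's own code is Haxe; the function names are made up for illustration). It shows that index & 1 keeps only the lowest bit, so it agrees with the "0 stays 0, everything else becomes 1" mapping only for odd values and zero.

```cpp
// Maps 0 to 0 and every other value to 1.
constexpr int toZeroOrOne(int index) {
    return index != 0 ? 1 : 0;
}

// The bitwise suggestion: keeps only the lowest bit of index.
constexpr int lowestBit(int index) {
    return index & 1;
}

// The two agree for 0 and 1 but differ for even non-zero inputs.
static_assert(toZeroOrOne(0) == 0 && lowestBit(0) == 0, "both map 0 to 0");
static_assert(toZeroOrOne(1) == 1 && lowestBit(1) == 1, "both map 1 to 1");
static_assert(toZeroOrOne(2) == 1 && lowestBit(2) == 0, "they differ at 2");
```

So the bitwise form is only a drop-in replacement if the input is already known to be 0 or 1.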

performance of passing arguments by value

in the process of refactoring some code, want to change a function like this
bool A::function() {
return this->a == this->b || this->c == this->d || this->e == this->f || this->g == this->h ;
}
to something like this
bool A::function(int a, int b, int c, int d, int e, int g) {
return a == b || c == d || e == this->f || g == this->h ;
}
this function is supposed to be called each time inside a main loop which would have at most 10M elements
The people I'm working with are reluctant to use the second version because of the performance cost of passing 6 ints.
I'm pretty sure that this is negligible, considering that each iteration of the loop goes through a LOT of code, and it takes roughly 1 minute to process the 10M elements.
Is the cost of passing 6 ints by value all the time so high? If not, how can I make them change their mind?
edit :
about inlining, I told them that the penalty would be 0 if the function was inlined, but their answer was basically "we can't know for sure if it will be inlined", which I seem to recall is true (it's up to the compiler)
I suspect that you won't see any big difference between these two variants in reasonably optimised code. However, the proof of that would be to actually change the code and compare the timings. (Even more so if 10M entries are being processed in a minute: that's 6 microseconds per item, so on the order of tens of thousands of instructions on a modern processor. Adding 6 argument passes won't budge it one way or the other, I'd say, unless this function is called many times in the loop, of course.)
And yes, if the function is inlined, the result would be identical code for the two alternatives. But as your colleagues say, you can't know for sure whether it is inlined or not. The only way to really determine that is to have a look at the generated machine code (compile with -S, or use objdump or similar).
In terms of performance, I would suggest you profile your code, and see if there is a difference that matters. Passing ints around is usually very cheap and open to automatic optimization, so I doubt you would see a measurable performance hit.
Also worth pointing out that the two functions are different. The second doesn't necessarily use the member variables and the first does. If you're always comparing member variables, why pass them as parameters? Extra unnecessary parameters means more source code and a greater scope for bugs.
Write the code and, as Shane says, profile it. Or, as I prefer, grab a few stack samples, because then you can see exactly what's going on.
If you find the program counter in the instructions that pass those int arguments, on more than one sample, then they are costing a significant fraction of time, and you should do something about it.
On the other hand, the samples might tell you something else is the main time-taker, and maybe you should fix that first.
Then the program will be faster, and if you do the whole process again, it might come back to your original question.
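As a starting point for the "change it and measure" advice above, here is a rough sketch of both variants side by side with a tiny timing helper. The member names follow the question; fromMembers, fromArguments and timeMs are hypothetical names, and the helper is far cruder than a real profiler.

```cpp
#include <cassert>
#include <chrono>

struct A {
    int a, b, c, d, e, f, g, h;

    // Original form: every operand is read from the object.
    bool fromMembers() const {
        return a == b || c == d || e == f || g == h;
    }

    // Refactored form: six operands arrive by value, f and h stay members.
    bool fromArguments(int a_, int b_, int c_, int d_, int e_, int g_) const {
        return a_ == b_ || c_ == d_ || e_ == f || g_ == h;
    }
};

// Hypothetical helper: calls fn(i) for i in [0, iterations), counts the true
// results in `hits`, and returns the elapsed wall time in milliseconds.
template <typename Fn>
long long timeMs(Fn fn, long iterations, long& hits) {
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        hits += (fn(i) ? 1 : 0);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Accumulating into hits keeps the compiler from discarding the calls entirely. Numbers from a loop this small say little about the real program, so the measurement that matters is the full 10M-element run before and after the change.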

How do jump-tables work?

In the following document, pages 4-5:
http://www.open-std.org/jtc1/sc22/wg21/docs/ESC_Boston_01_304_paper.pdf
typedef int (* jumpfnct)(void * param);
static int CaseError(void * param)
{
return -1;
}
static jumpfnct const jumptable[] =
{
CaseError, CaseError, ...
.
.
.
Case44, CaseError, ...
.
.
.
CaseError, Case255
};
result = index <= 0xFF ? jumptable[index](param) : -1;
it is comparing IF-ELSE vs SWITCH and then introduces this "Jump table". Apparently it is the fastest implementation of the three. What exactly is it? I cannot see how it could work??
The jumptable is a method of mapping some input integer to an action. It stems from the fact that you can use the input integer as the index of an array.
The code sets up an array of pointers to functions. Your input integer is then used to select one of these function pointers. Generally, it looks like it's going to be a pointer to the function CaseError. However, every now and again, it will be a different function that is being pointed to.
It's designed so that
jumptable[62] = Case62;
jumptable[95] = Case95;
jumptable[35] = Case35;
jumptable[34] = CaseError; /* For example... and so it goes on */
Thus, selecting the right function to call takes constant time. With the if-elses and switches, the time taken to select the correct function depends on the input integer, assuming the compiler doesn't optimize the switch into a jump table itself. If it's for embedded code, there's a chance that optimizations of this kind have been disabled; you'd have to check.
Once the correct function-pointer is found, the last line simply calls it:
result = index <= 0xFF ? jumptable[index](param) : -1;
becomes
result = index <= 0xFF /* Check that the index is within
the range of the jump table */
? jumptable[index](param) /* jumptable[index] selects a function
then it gets called with (param) */
: -1; /* If the index is out of range, set result to be -1
Personally, I think a better choice would be to call
CaseError(param) here */
jumpfnct is a pointer-to-function type. jumptable is an array that consists of a number of jumpfncts. The functions can be called just by referencing their position in the array.
For example, jumptable[0](param) will execute the first function, passing along param. jumptable[1](param) will execute the second function, etc.
If you don't know about function pointers, you shouldn't use this trick. They're very handy, in a narrow domain.
It's very fast and space efficient, when what you're doing is switching between a large number of similar function calls. You are adding a function call overhead that a switch statement doesn't necessarily have, so it might not be appropriate in all circumstances. If your code is something like this:
switch(x) {
case 1:
function1();
break;
case 2:
function2();
break;
...
}
A jump table might be a good substitution. If, though, your switch is something like this:
switch(x) {
case 1:
y++;
break;
case 1023:
y--;
break;
...
}
It probably wouldn't be worth doing.
I've used them in a toy FORTH language interpreter, where they were invaluable, but in most cases you're not going to see a speed benefit that makes them worth using. Use them if it makes the logic of your program clearer, not for optimization.
This jumptable returns a pointer-to-function by its index. You define this table in a way that invalid indexes point to the function that returns some invalid code (like -1 in the example) and valid indexes point to the functions you need to call.
Construction
jumptable[index]
returns pointer-to-function and this function gets called
jumptable[index](param)
where param is some custom parameter.
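To make the mechanism concrete, here is a small self-contained sketch of the same pattern. The handler names, the four-entry table, and the int parameter are invented for illustration; the paper's table has 256 entries and takes a void* parameter.

```cpp
#include <cstddef>

// Every handler shares one signature so they can sit in the same array.
typedef int (*Handler)(int param);

static int caseError(int) { return -1; }
static int caseDouble(int param) { return param * 2; }
static int caseNegate(int param) { return -param; }

// Only indices 1 and 3 have real handlers; the rest fall through to caseError.
static const Handler kTable[] = { caseError, caseDouble, caseError, caseNegate };
static const std::size_t kTableSize = sizeof(kTable) / sizeof(kTable[0]);

int dispatchByIndex(std::size_t index, int param) {
    // Check the bounds first, then index the array and call the result.
    return index < kTableSize ? kTable[index](param) : caseError(param);
}
```

With this table, dispatchByIndex(1, 21) calls caseDouble, while an unused slot such as index 0 or an out-of-range index ends up in caseError. The cost of selecting the handler is one comparison and one array load, regardless of which index is requested.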
A Jump-Table is an obvious, but rarely used optimization, that for some reason seems to have fallen out of favor.
Briefly, instead of testing a value and exiting out of a switch/case or if-else block to branch to function or code path, you create an array which is filled with the addresses of the functions the program can branch to.
Once completed, this arrangement eliminates the relentless if testing attendant with if-else and switch/case blocks. The code uses the variable that would otherwise be tested with if as a subscript into the function-pointer array, and proceeds directly to the appropriate code - sans ANY if testing. A perfectly efficient branch. The assembly code should literally be a jump.
If you profile code, and find a hot-spot where the program is spending a large % of its time, look to this kind of optimization to improve performance. A little bit of this can go a long way if it's part of a code's hot-spot.
Thanks for the link. Nice find!
As mentioned in the comment above, whether this solution is more or less efficient than, for example, a switch statement depends on the amount of work needed to be done for each case.
Writing a regular switch statement for the values you want to process will definitely be a clearer way to see what the code does. So unless either space or speed requirements dictate a more sophisticated solution, I would suggest that this is not a "better" solution.
Tables of function pointers are however an efficient and good way to solve certain problems. I use function pointers in a table quite regularly to do things like "Benchmark 11 different solutions to a problem", where I have a struct with the name of the function and the function, and some parameters perhaps. Then I have one function to time and loop over the code a few million times (or whatever it takes to get a long enough measurement to make sense)
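A rough sketch of that benchmarking arrangement, with only two made-up candidates (sumLoop and sumFormula) instead of eleven. The struct layout, the names, and the use of std::chrono are assumptions; the answer does not show its actual code.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// One table row: a label plus the function to time.
struct Candidate {
    const char* name;
    long (*run)(long);
};

// Two hypothetical solutions to the same problem: sum of 1..n.
static long sumLoop(long n) {
    long s = 0;
    for (long i = 1; i <= n; ++i) s += i;
    return s;
}
static long sumFormula(long n) { return n * (n + 1) / 2; }

struct Timing {
    std::string name;
    long result;
    long long microseconds;
};

// Runs every candidate `repeats` times on `input` and records the elapsed time.
std::vector<Timing> benchmarkAll(const Candidate* candidates, std::size_t count,
                                 long input, int repeats) {
    std::vector<Timing> out;
    for (std::size_t i = 0; i < count; ++i) {
        long result = 0;
        const auto start = std::chrono::steady_clock::now();
        for (int r = 0; r < repeats; ++r) result = candidates[i].run(input);
        const auto stop = std::chrono::steady_clock::now();
        Timing t;
        t.name = candidates[i].name;
        t.result = result;
        t.microseconds =
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        out.push_back(t);
    }
    return out;
}
```

Adding a twelfth candidate is then a one-line change to the table. An optimizing compiler may hoist or fold work out of such a simple loop, so real measurements usually need inputs that vary and results that are consumed.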

How does this C++ function use memoization?

#include <vector>
std::vector<long int> as;
long int a(size_t n){
if(n==1) return 1;
if(n==2) return -2;
if(as.size()<n+1)
as.resize(n+1);
if(as[n]<=0)
{
as[n]=-4*a(n-1)-4*a(n-2);
}
return mod(as[n], 65535);
}
The above code sample uses memoization to calculate a recursive formula based on some input n. I know that this uses memoization, because I have written a purely recursive function that uses the same formula, but this one is much, much faster for much larger values of n. I've never used vectors before, but I've done some research and I understand the concept of them. I understand that memoization is supposed to store each calculated value, so that instead of performing the same calculations over again, it can simply retrieve ones that have already been calculated.
My question is: how is this memoization, and how does it work? I can't seem to see in the code at which point it checks to see if a value for n already exists. Also, I don't understand the purpose of the if(as[n]<=0). This formula can yield positive and negative values, so I'm not sure what this check is looking for.
Thank you, I think I'm close to understanding how this works, it's actually a bit more simple than I was thinking it was.
I do not think the values in the sequence can ever be 0, so this should work for me, as I think n has to start at 1.
However, if zero was a viable number in my sequence, what is another way I could solve it? For example, what if five could never appear? Would I just need to fill my vector with fives?
Edit: Wow, I got a lot of other responses while checking code and typing this one. Thanks for the help everyone, I think I understand it now.
if (as[n] <= 0) is the check. If valid values can be negative like you say, then you need a different sentinel to check against. Can valid values ever be zero? If not, then just make the test if (as[n] == 0). This makes your code easier to write, because by default vectors of ints are filled with zeroes.
The code appears to be incorrectly checking if (as[n] <= 0), so it recalculates the negative values of the function (which appear to be approximately every other value). Even so, the work scales linearly with n instead of 2^n as with the purely recursive solution, so it runs a lot faster.
Still, a better check would be to test if (as[n] == 0), which appears to run 3x faster on my system. Even if the function can return 0, a 0 value just means it will take slightly longer to compute (although if 0 is a frequent return value, you might want to consider a separate vector that flags whether the value has been computed or not instead of using a single vector to store the function's value and whether it has been computed)
If the formula can yield both positive and negative values then this function has a serious bug. The check if(as[n]<=0) is supposed to be checking whether it has already cached this value of the computation. But if the formula can be negative, this function recalculates the cached value a lot...
What it really probably wanted was a vector<pair<bool, unsigned> >, where the bool says if the value has been calculated or not.
The code, as posted, only memoizes about 40% of the time (precisely when the remembered value is positive). As Chris Jester-Young pointed out, a correct implementation would instead check if(as[n]==0). Alternatively, one can change the memoization code itself to read as[n]=mod(-4*a(n-1)-4*a(n-2),65535);
(Even the ==0 check would spend effort when the memoized value was 0. Luckily, in your case, this never happens!)
There's a bug in this code. It will continue to recalculate the values of as[n] for as[n] <= 0. It will memoize the values of a that turn out to be positive. It works a lot faster than code without the memoization because there are enough positive values of as[] so that the recursion is terminated quickly. You could improve this by using a value greater than 65535 as a sentinel. The new values of the vector are initialized to zero when the vector expands.
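One way to act on the suggestions in these answers is to track "already computed" separately from the value, so that no particular value has to serve as a sentinel. The sketch below is an assumption-laden rewrite, not the asker's code: mod here is a hypothetical stand-in that always returns a result in [0, m), the vectors are renamed cache and computed, and the reduced value is what gets stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's mod(): result is always in [0, m).
long mod(long x, long m) {
    const long r = x % m;
    return r < 0 ? r + m : r;
}

std::vector<long> cache;     // cache[n] holds a(n) reduced mod 65535
std::vector<char> computed;  // computed[n] is non-zero once cache[n] is valid

long a(std::size_t n) {
    assert(n >= 1);
    if (n == 1) return 1;
    if (n == 2) return -2;
    if (cache.size() < n + 1) {
        cache.resize(n + 1, 0);
        computed.resize(n + 1, 0);
    }
    if (!computed[n]) {
        // Compute into a local first, then store the value and mark it done.
        const long value = mod(-4 * a(n - 1) - 4 * a(n - 2), 65535);
        cache[n] = value;
        computed[n] = 1;
    }
    return cache[n];
}
```

With the flag vector, each a(n) for n >= 3 is computed once no matter whether the result is zero, negative, or positive. The recursion still goes n levels deep on the first call, so very large n may need an iterative fill instead.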