How do jump-tables work? - c++

In the following document, pages 4-5:
http://www.open-std.org/jtc1/sc22/wg21/docs/ESC_Boston_01_304_paper.pdf
typedef int (* jumpfnct)(void * param);
static int CaseError(void * param)
{
return -1;
}
static jumpfnct const jumptable[] =
{
CaseError, CaseError, ...
.
.
.
Case44, CaseError, ...
.
.
.
CaseError, Case255
};
result = index <= 0xFF ? jumptable[index](param) : -1;
it is comparing IF-ELSE vs SWITCH and then introduces this "Jump table". Apparently it is the fastest implementation of the three. What exactly is it? I cannot see how it could work??

The jumptable is a method of mapping some input integer to an action. It stems from the fact that you can use the input integer as the index of an array.
The code sets up an array of pointers to functions. Your input integer is then used to select on of these function-pointers. Generally, it looks like it's going to be a pointer to the function CaseError. However, every now and again, it will be a different function that is being pointed to.
It's designed so that
jumptable[62] = Case62;
jumptable[95] = Case95;
jumptable[35] = Case35;
jumptable[34] = CaseError; /* For example... and so it goes on */
Thus, selecting the right function to call is constant time... with the if-elses and selects, the time taken to select the correct function is dependent on the input integer... assuming the compiler doesn't optimize the select to a jumptable itself... if it's for embedded code, then there's a chance that optimizations of this kind have been disabled... you'd have to check.
Once the correct function-pointer is found, the last line simply calls it:
result = index <= 0xFF ? jumptable[index](param) : -1;
becomes
result = index <= 0xFF /* Check that the index is within
the range of the jump table */
? jumptable[index](param) /* jumptable[index] selects a function
then it gets called with (param) */
: -1; /* If the index is out of range, set result to be -1
Personally, I think a better choice would be to call
CaseError(param) here */

Jumpfnct is a pointer to a function. Jumptable is an array that consists of a number of jumpfncts. The functions can be called just by referencing their position in the array.
For example, jumptable0 will execute the first function, passing along param. jumptable1 will execute the second function, etc.
If you don't know about function pointers, you shouldn't use this trick. They're very handy, in a narrow domain.
It's very fast and space efficient, when what you're doing is switching between a large number of similar function calls. You are adding a function call overhead that a switch statement doesn't necessarily have, so it might not be appropriate in all circumstances. If your code is something like this:
switch(x) {
case 1:
function1();
break;
case 2:
function2();
break;
...
}
A jump table might be a good substitution. If, though, your switch is something like this:
switch(x) {
case 1:
y++;
break;
case 1023:
y--;
break;
...
}
It probably wouldn't be worth doing.
I've used them in a toy FORTH language interpreter, where they were invaluable, but in most cases you're not going to see a speed benefit that makes them worth using. Use them if it makes the logic of your program clearer, not for optimization.

This jumptable returns a pointer-to-function by its index. You define this table in a way that invalid indexes point to the function that returns some invalid code (like -1 in the example) and valid indexes point to the functions you need to call.
Construction
jumptable[index]
returns pointer-to-function and this function gets called
jumptable[index](param)
where param is some custom parameter.

A Jump-Table is an obvious, but rarely used optimization, that for some reason seems to have fallen out of favor.
Briefly, instead of testing a value and exiting out of a switch/case or if-else block to branch to function or code path, you create an array which is filled with the addresses of the functions the program can branch to.
Once completed, this arrangement eliminates the relentless if testing attendant with if-else and switch/case blocks. The code uses the variable that would otherwise be tested with if as a subscript into the function-pointer array, and proceeds directly the the appropriate code - sans ANY if testing. A perfectly efficient branch. The assembly code should literally be a jump.
If you profile code, and find a hot-spot where the program is spending a large % of it's time, look to this kind of optimization to improve performance. A little bit of this can go a long way if it's part of a code's hot-spot.
Thanks for the link. Nice find!

As mentioned in the comment above, whether this solution is more or less effiicent than, for example, a switch statement depends on the amount of work needed to be done for each case.
Writing a regular switch statement for the values you want to process will definitely be a clearer way to see what the code does. So unless either space or speed requirements dictate that a more sophisticated solution, I would suggest that this is not a "better" solution.
Tables of function pointers is however an efficient and good way to solve certain problems. I use function pointers in a table quite regularly to do things like "Benchmark 11 different solutions to a problem", where I have a struct wiht the name of the function and the function, and some parameters perhaps. Then I have one function to time and loop over the code a few million times (or whatever it takes to get a long enough measurement to make sense)

Related

Interval branching

A project (C++11) I am working on involves a block of code that will be run somewhere in the trillions of times. I have an integer parameter B in [1,N] and points 1 = b1 < b2 < ... < bk = N where the code executes a different small block of code depending on which interval [bi, b(i+1)) B lies in. The only value that is changing throughout execution is B. However while the value of the bi's are fixed, they are only determined at runtime.
The naive thing to do is to write a bunch of if and else if statements, which at worst case involves k comparisons. However one can do this in constant time: construct a vector myGotos of size N and on each interval [bi, b(i+1)) store the location of the corresponding code block. Then you just do goto myGotos[B].
The solution above seems to me like it would be on average quicker, but the code would be quite ugly. I was wondering if there is a better way to do this.
The common way to do this is with a switch statement
switch(B){
case b1:..//
break;
}
If you can declare those sections of code as lambdas or std::function provided they took the same arguments.Even a templated function might be ok. Its tough to answer without knowing what you actually need to run these functions.
map<int,decltype(yourLambda)>
Seems like it would work ok as well.
Initialize an array of N slots, let K, where each slot contains the index of the containing interval.
Then
switch (K[B])
{
case 1: // [B1,B2)
...
}

Use of Literals, yay/nay in C++

I've recently heard that in some cases, programmers believe that you should never use literals in your code. I understand that in some cases, assigning a variable name to a given number can be helpful (especially in terms of maintenance if that number is used elsewhere). However, consider the following case studies:
Case Study 1: Use of Literals for "special" byte codes.
Say you have an if statement that checks for a specific value stored in (for the sake of argument) a uint16_t. Here are the two code samples:
Version 1:
// Descriptive comment as to why I'm using 0xBEEF goes here
if (my_var == 0xBEEF) {
//do something
}
Version 2:
const uint16_t kSuperDescriptiveVarName = 0xBEEF;
if (my_var == kSuperDescriptiveVarName) {
// do something
}
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once. Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
Case Study 2: Use of sizeof
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns. Take the two code examples into account. The scenario is that you are computing the offset into a packet buffer (an array of uint8_t) where the first part of the packet is stored as my_packet_header, which let's say is a uint32_t.
Version 1:
const int offset = sizeof(my_packet_header);
Version 2:
const int offset = 4; // good comment telling reader where 4 came from
Clearly, version 1 is preferred, but what about for cases where you have multiple data fields to skip over? What if you have the following instead:
Version 1:
const int offset = sizeof(my_packet_header) + sizeof(data_field1) + sizeof(data_field2) + ... + sizeof(data_fieldn);
Version 2:
const int offset = 47;
Which is preferred in this case? Does is still make sense to show all the steps involved with computing the offset or does the literal usage make sense here?
Thanks for the help in advance as I attempt to better my code practices.
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
Sounds like you understand the main point... factoring values (and their comments) that are used in multiple places. Further, it can sometimes help to have a group of constants in one place - so their values can be inspected, verified, modified etc. without concern for where they're used in the code. Other times, there are many constants used in proximity and the comments needed to properly explain them would obfuscate the code in which they're used.
Countering that, having a const variable means all the programmers studying the code will be wondering whether it's used anywhere else, keeping it in mind as they inspect the rest of the scope in which it's declared etc. - the less unnecessary things to remember the surer the understanding of important parts of the code will be.
Like so many things in programming, it's "an art" balancing the pros and cons of each approach, and best guided by experience and knowledge of the way the code's likely to be studied, maintained, and evolved.
Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
There's no performance implications in optimised code.
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns.
And other reasons too. A big factor in good programming is reducing the points of maintenance when changes are done. If you can modify the type of a variable and know that all the places using that variable will adjust accordingly, that's great - saves time and potential errors. Using sizeof helps with that.
Which is preferred [for calculating offsets in a struct]? Does is still make sense to show all the steps involved with computing the offset or does the literal usage make sense here?
The offsetof macro (#include <cstddef>) is better for this... again reducing maintenance burden. With the this + that approach you illustrate, if the compiler decides to use any padding your offset will be wrong, and further you have to fix it every time you add or remove a field.
Ignoring the offsetof issues and just considering your this + that example as an illustration of a more complex value to assign, again it's a balancing act. You'd definitely want some explanation/comment/documentation re intent here (are you working out the binary size of earlier fields? calculating the offset of the next field?, deliberately missing some fields that might not be needed for the intended use or was that accidental?...). Still, a named constant might be enough documentation, so it's likely unimportant which way you lean....
In every example you list, I would go with the name.
In your first example, you almost certainly used that special 0xBEEF number at least twice - once to write it and once to do your comparison. If you didn't write it, that number is still part of a contract with someone else (perhaps a file format definition).
In the last example, it is especially useful to show the computation that yielded the value. That way, if you encounter trouble down the line, you can easily see either that the number is trustworthy, or what you missed and fix it.
There are some cases where I prefer literals over named constants though. These are always cases where a name is no more meaningful than the number. For example, you have a game program that plays a dice game (perhaps Yahtzee), where there are specific rules for specific die rolls. You could define constants for One = 1, Two = 2, etc. But why bother?
Generally it is better to use a name instead of a value. After all, if you need to change it later, you can find it more easily. Also it is not always clear why this particular number is used, when you read the code, so having a meaningful name assigned to it, makes this immediately clear to a programmer.
Performance-wise there is no difference, because the optimizers should take care of it. And it is rather unlikely, even if there would be an extra instruction generated, that this would cause you troubles. If your code would be that tight, you probably shouldn't rely on an optimizer effect anyway.
I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
I think kSuperDescriptiveVarName will definitely be used more than once. One for check and at least one for assignment, maybe in different part of your program.
There will be no difference in performance, since an optimization called Constant Propagation exists in almost all compilers. Just enable optimization for your compiler.

Switch statement with huge number of cases

What happens if the switch has more than 5000 case. What are the drawbacks and how we can replace it with something faster?
Note: I am not expecting to use array to store cases as it's the same.
There's no specific reason to think you'd want anything other than a switch/case statement (and indeed I'd actively expect it to be unhelpful). The compiler should create efficient dispatching code, which might involve some combination of static [sparse] table(s) and direct indexing, binary branching etc.; it's got insights into the static values of the cases and should do an excellent job (retuning it on the fly each time you change the cases, whereas new values that don't fit well with a hand-crafted approach - such as wildly differing values when you'd had a pretty packed array lookup - could require reworking of code or silently cause memory bloat or a performance drop).
People really cared about this kind of thing back when C was trying to win over hard-core assembly programmers... the compilers were held accountable for generating good code. Put another way - if it's not (measurably) broken, don't fix it.
More generally, it's great to be curious about this kind of thing and get people's ideas on alternatives and their performance implications, but if you really care and the performance difference could make a useful difference to your program (especially if profiling suggests it) then always benchmark with your program doing real work.
As food for thought... in case one might be stuck with an old/buggy/inefficient compiler or just love hacking.
Inner work of switch statement consist of two parts. Finding address to jump, and well jumping there. For the first part you need to use a table to find the address. If the number of cases increases, table gets bigger - searching address to jump takes time. This is the point compilers tries to optimize, combining several techniques but one easy approach is to use table directly which depends on case value space.
In a back of the napkin example;
switch (n) {
case 1: foo(); break;
case 2: bar(); break;
case 3: baz(); break;
}
with such piece of code compiler can create an array of jump_addresses and directly get the address by array[n]. Now search just took O(1). But if you had a switch like below:
switch (n) {
case 10: foo(); break;
case 17: bar(); break;
case 23: baz(); break;
// and a lot other
}
compiler needs to generate a table containing case_id, jump_address pairs and code to search through that structure which with worst implementation can take O(n). (Decent compilers optimize the hell out of such scenario when they are fully unleashed by enabling their optimization flags to a degree that when you need to debug such optimized code your brain starts to fry.)
Then question is can you do this all yourself at C level to beat the compiler? and funny thing is while creating tables and searching through them seems easy, jumping to a variable point using goto is not possible in standard C. So there is a chance that if you are not going to use function pointers due to overhead or code structure, you are stuck... well if you are not using GCC. GCC has a non-standard feature called Labels as Values which helps you to get pointers to labels.
To complete the example you can write the second switch statement with "labels as values" feature like this:
const void *cases[] = {&&case_foo, &&case_bar, &&case_baz, ....};
goto *labels[n];
case_foo:
foo();
goto switch_end;
case_bar:
bar();
goto switch_end;
case_baz:
baz();
goto switch_end;
// and a lot other
switch_end:
Of course if you are talking about 5000 cases, it is much better if you write a piece of code to create this code for you - and it is probably only way to maintain such software.
As closing notes; will this improve your daily work? No. Will this improve your skills? Yes and talking from experience, I once found myself improved a security algorithm in a smart card just by optimizing case values. It is a strange world.
Try to use Dictionary class with Delegate values. At least it makes code a little bit more readable.
Big switch statement, generally auto-generated one, may take long time to compile. But I like the idea that compiler optimizes the switch statement.
One way to break apart the switch statement is to use bucketing,
int getIt(int input)
{
int bucket = input%16;
switch(bucket)
{
case 1:
return getItBucket1(input);
case 2:
return getItBucket2(input);
...
...
}
return -1;
}
So in the code above, we broke apart our switch statement into 16 parts. It is easy to change the number of buckets in auto-generated code.
This code has added run-time cost of one layer of indirection or function-call. . But considering the buckets defined in different files, it is faster to compile them in parallel.

C++ test to verify equality operator is kept consistent with struct over time

I voted up #TomalakGeretkal for a good note about by-contract; I'm haven't accepted an answer as my question is how to programatically check the equals function.
I have a POD struct & an equality operator, a (very) small part of a system with >100 engineers.
Over time I expect the struct to be modified (members added/removed/reordered) and I want to write a test to verify that the equality op is testing every member of the struct (eg is kept up to date as the struct changes).
As Tomalak pointed out - comments & "by contract" is often the best/only way to enforce this; however in my situation I expect issues and want to explore whether there are any ways to proactively catch (at least many) of the modifications.
I'm not coming up with a satisfactory answer - this is the best I've thought of:
-new up two instances struct (x, y), fill each with identical non-zero data.
-check x==y
-modify x "byte by byte"
-take ptr to be (unsigned char*)&x
-iterator over ptr (for sizeof(x))
-increment the current byte
-check !(x==y)
-decrement the current byte
-check x==y
The test passes if the equality operator caught every byte (NOTE: there is a caveat to this - not all bytes are used in the compilers representation of x, therefore the test would have to 'skip' these bytes - eg hard code ignore bytes)
My proposed test has significant problems: (at least) the 'don't care' bytes, and the fact that incrementing one byte of the types in x may not result in a valid value for the variable at that memory location.
Any better solutions?
(This shouldn't matter, but I'm using VS2008, rtti is off, googletest suite)
Though tempting to make code 'fool-proof' with self-checks like this, it's my experience that keeping the self-checks themselves fool-proof is, well, a fool's errand.
Keep it simple and localise the effect of any changes. Write a comment in the struct definition making it clear that the equality operator must also be updated if the struct is; then, if this fails, it's just the programmer's fault.
I know that this will not seem optimal to you as it leaves the potential for user error in the future, but in reality you can't get around this (at least without making your code horrendously complicated), and often it's most practical just not to bother.
I agree with (and upvoted) Tomalak's answer. It's unlikely that you'll find a foolproof solution. Nonetheless, one simple semi-automated approach could be to validate the expected size within the equality operator:
MyStruct::operator==(const MyStruct &rhs)
{
assert(sizeof(MyStruct) == 42); // reminder to update as new members added
// actual functionality here ...
}
This way, if any new members are added, the assert will fire until someone updates the equality operator. This isn't foolproof, of course. (Member vars might be replaced with something of same size, etc.) Nonetheless, it's a relatively simple (one line assert) that has a good shot of detecting the error case.
I'm sure I'm going to get downvoted for this but...
How about a template equality function that takes a reference to an int parameter, and the two objects being tested. The equality function will return bool, but will increment the size reference (int) by the sizeof(T).
Then have a large test function that calls the template for each object and sums the total size --> compare this sum with the sizeof the object. The existence of virtual functions/inheritance, etc could kill this idea.
it's actually a difficult problem to solve correctly in a self-test.
the easiest solution i can think of is to take a few template functions which operate on multiple types, perform the necessary conversions, promotions, and comparisons, then verify the result in an external unit test. when a breaking change is introduced, at least you'll know.
some of these challenges are more easily maintained/verified using approaches such as composition, rather than extension/subclassing.
Agree with Tomalak and Eric. I have used this for very similar problems.
Assert does not work unless the DEBUG is defined, so potentially you can release code that is wrong. These tests will not always work reliably. If the structure contains bit fields, or items are inserted that take up slack space cause by compiler aligning to word boundaries, the size won't change. For this reason they offer limited value. e.g.
struct MyStruct {
char a ;
ulong l ;
}
changed to
struct MyStruct {
char a ;
char b ;
ulong l ;
}
Both structures are 8 bytes (on 32bit Linux x86)

Optimal virtual machine/byte-code interpreter loop

My project has a VM that executes a byte-code compiled from a domain-specific-language. I'm looking at ways that I can improve the execution time of the byte-code. As a first step I'd like to see if there is a way to simply improve the byte-code interpreter before I venture into machine code compilation.
The main loop of the interpreter looks like this:
while(true)
{
uint8_t cmd = *code++;
switch( cmd )
{
case op_1: ...; break;
...
}
}
QUESTION: Is there a faster way to implement this loop without resorting to assembler?
The one option I see is GCC specific to use dynamic goto with label addresses. Rather than a break at the end of each case I could jump directly to the next instruction. I had hoped the optimizer would do this for me, but looking at the disassembly it apparently doesn't: there is a repeated constant jump at the end of most op_codes.
If relevant the VM is a simple register based machine with floating point and integer registers (8 of each). There is no stack, only a global heap (that language is not that complicated).
One very easy optimisation is that instead of
switch /case/case/case/case/case,
just define an array with function pointers (where each function would process a specified command, or a couple of commands in which case you could set several entries in the array to the same function, and the function itself could check the exact code), and instead of
switch(cmd)
just do a
array[cmd]()
This is given that you dont have too many commands. Also, do some checking if you will not define all the possible commands (maybe you only have 300 commands, but you have to use 2 bytes for representing them, so instead of definining an array with 65536 items, just check if the command is less than 301 and if its not, dont do the lookup)
If you won't do that, at least sort the commands that the most used ones are in the beginning of the switch statement.
Otherwise it would be to look into hashtables, but I assume you don't have that many commands, and in that case overhead of doing a hash function would probably cost you more than not having to do a switch. (Or have a VERY simple hash function)
What's the architecture? You may get a speed-up with word-aligned opcodes, but it'll blow out your code size, which means you'll have to balance it against the cost of a cache miss.
Few obvious optimization I see are,
If you don't use cmd anywhere than switch() then, directly use the pointer indirection, switch( *code++ ). For longer while(true) loop, this can be little helpful.
In switch(), you can use continue instead of break. Because when continue is used inside if/else or switch, compiler knows that execution has to jump to the outer loop; the same is not true for break (with respect to switch).
Hope this helps.