LLVM suitable parameters for the createBranchWeights() - llvm

We are trying to instrument a branch instruction with the API SplitBlockAndInsertIfThenElse(). We noticed that the 3rd parameter of the API seems to be an "estimation" of the branch prediction, which is generated by createBranchWeights(param1, param2). However, we want to ask which parameters are suitable for the estimation, i.e., the parameters of createBranchWeights().
To the best of our knowledge, 64 and 4 are llvm.expect's default values. Someone also uses createBranchWeights(1, 1000) to declare a rarely executed true branch. How do the parameters affect the probability of the branch? Which parameters are suitable?

If you expect that one branch will be taken twice as often as the other, 2 and 1 are suitable. If three times as often, 3 and 1. If you have no idea, you can skip setting this metadata; it is optional.
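For illustration, here is a minimal sketch of wiring such weights into SplitBlockAndInsertIfThenElse (LLVM C++ API; the exact signature varies between LLVM versions, with newer releases taking an insertion iterator instead of an Instruction*, so treat this as a sketch rather than version-exact code):

#include "llvm/IR/MDBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
using namespace llvm;

// Assumes Cond is an i1 Value* and SplitBefore is the instruction to split at.
void instrumentBranch(Value *Cond, Instruction *SplitBefore) {
  MDBuilder MDB(SplitBefore->getContext());
  // Weights are relative, not absolute probabilities: 2:1 means the "then"
  // branch is expected to be taken about twice as often as the "else" branch.
  MDNode *Weights = MDB.createBranchWeights(/*TrueWeight=*/2, /*FalseWeight=*/1);

  Instruction *ThenTerm = nullptr, *ElseTerm = nullptr;
  SplitBlockAndInsertIfThenElse(Cond, SplitBefore, &ThenTerm, &ElseTerm, Weights);
  // Instrumentation code can now be inserted before ThenTerm / ElseTerm.
}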

Related

Optimize simple comparison with zero for performance

I have a bottleneck (about 20% CPU time) in my code which is in following if statement:
if (a == 0) { // here
...
}
where a is a uint8_t, so a number from 0 to 255.
Are there any low level optimizations to make it faster?
I thought about using bitwise NOR (~(a | 0)), but that would only work if a were 1-bit, right?
Just in case: I don't care about code readability in this particular case.
Unless your compiler is garbage, there is nothing you can do to speed up integer comparison.
However, it is possible that the bottleneck you observe is not really the comparison itself, but rather the result of unlucky branch prediction.
There are two ways of getting around this:
If "to branch or not to branch" follows a pattern, move this last second decision further up in your program logic where you can use the pattern, just don't branch in your hot function. This might require serious thinking. A hacky way to find out whether you have patterns: Print 1 if you branch and 0 else for enough calls, Zip is up and see whether the resulting archive gets much smaller (in bits) than the number of values you printed. (Of course there are also smart formulas for that if you like it more theoretical.)
If you choose one branch over the other most of the time, you can tell the compiler which branch is the likely one. With gcc, checkout __builtin_expect, for other compilers, read the manual.
Important for both solutions: You will need to measure whether that actually helped. Especially the second one will not be magically be better, it might even make things much worse.
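A minimal sketch of the second approach with GCC/Clang (the likely/unlikely macro names are just a common convention, not part of any standard; C++20 offers [[likely]]/[[unlikely]] attributes for the same purpose):

#include <cstdint>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

void process(std::uint8_t a) {
    if (unlikely(a == 0)) {
        // rarely taken: the compiler lays out the common path first
    } else {
        // common, fast path
    }
}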

Different conditional checking ways

To check whether an int is in the range [1, ∞) or not, I can use the following ways (I use #1 and #2 a lot):
if (a>=1)
if (a>0)
if (a>1 || a==1)
if (a==1 || a>1)
Is there any difference that I should pay attention to among the four versions?
Functionally there is no difference between the 4 ways you listed. This is mainly an issue of style. I would venture that #1 and #2 are the most common forms, though; if I saw #3 or #4 in a code review I would suggest a change.
Perf-wise, I suppose it is possible that some compiler out there optimizes one better than the other, but I really doubt it. At best it would be a micro-optimization and nothing I would ever base my coding style on without direct profiler input.
I don't really see why you would use 3 or 4. Apart from being longer to type, they will generate more code. Since in an || condition the second check is skipped if the first is true, there shouldn't be a performance hit, except for version 4 if the value is often not 1 (of course, hardware with branch prediction will mostly negate that).
1. if (a>=1)
2. if (a>0)
3. if (a>1 || a==1)
4. if (a==1 || a>1)
On x86, options 1 and 2 produce a cmp instruction, which sets the condition flags. The cmp is then followed by a conditional branch/jump based on those flags: for the first the compiler emits jge (jump if greater or equal), for the second jg (jump if greater).
Options 3 and 4 would, in theory, require two cmps and two branches, but chances are the compiler will simply optimize them to be the same as 1.
You should generally choose whichever (a) follows the conventions in the code you are working on (b) use whichever most clearly expresses the algorithm you are implementing.
There are times when you explicitly mean "if a is equal to one, or it has a value greater than one", and in those times you should write if (a == 1 || a > 1). But if you are just checking that a has a positive, non-zero, integer value, you should write if (a > 0), since that is what it says.
If you find that such a case is a part of a performance bottleneck, you should inspect the assembly instructions and adjust accordingly - e.g. if you find you have two cmps and branches, then write the code to use one compare and one branch.
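For that kind of inspection, a minimal sketch (function names made up) that you could compile with something like g++ -O2 -S and then diff the emitted assembly of the four variants:

// With optimization enabled, mainstream compilers typically emit
// identical code for all four of these.
bool v1(int a) { return a >= 1; }
bool v2(int a) { return a > 0; }
bool v3(int a) { return a > 1 || a == 1; }
bool v4(int a) { return a == 1 || a > 1; }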
Nope! They all are the same for an int. However, I would prefer to use if (a>0).

Use of Literals, yay/nay in C++

I've recently heard that in some cases, programmers believe that you should never use literals in your code. I understand that in some cases, assigning a variable name to a given number can be helpful (especially in terms of maintenance if that number is used elsewhere). However, consider the following case studies:
Case Study 1: Use of Literals for "special" byte codes.
Say you have an if statement that checks for a specific value stored in (for the sake of argument) a uint16_t. Here are the two code samples:
Version 1:
// Descriptive comment as to why I'm using 0xBEEF goes here
if (my_var == 0xBEEF) {
//do something
}
Version 2:
const uint16_t kSuperDescriptiveVarName = 0xBEEF;
if (my_var == kSuperDescriptiveVarName) {
// do something
}
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once. Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
Case Study 2: Use of sizeof
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns. Take the two code examples into account. The scenario is that you are computing the offset into a packet buffer (an array of uint8_t) where the first part of the packet is stored as my_packet_header, which let's say is a uint32_t.
Version 1:
const int offset = sizeof(my_packet_header);
Version 2:
const int offset = 4; // good comment telling reader where 4 came from
Clearly, version 1 is preferred, but what about for cases where you have multiple data fields to skip over? What if you have the following instead:
Version 1:
const int offset = sizeof(my_packet_header) + sizeof(data_field1) + sizeof(data_field2) + ... + sizeof(data_fieldn);
Version 2:
const int offset = 47;
Which is preferred in this case? Does it still make sense to show all the steps involved in computing the offset, or does the literal usage make sense here?
Thanks for the help in advance as I attempt to better my code practices.
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
Sounds like you understand the main point... factoring out values (and their comments) that are used in multiple places. Further, it can sometimes help to have a group of constants in one place, so their values can be inspected, verified, modified etc. without concern for where they're used in the code. Other times, there are many constants used in proximity, and the comments needed to properly explain them would obfuscate the code in which they're used.
Countering that, having a const variable means all the programmers studying the code will be wondering whether it's used anywhere else, keeping it in mind as they inspect the rest of the scope in which it's declared, etc. The fewer unnecessary things there are to remember, the surer the understanding of the important parts of the code will be.
Like so many things in programming, it's "an art" balancing the pros and cons of each approach, and best guided by experience and knowledge of the way the code's likely to be studied, maintained, and evolved.
Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
There are no performance implications in optimised code.
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns.
And other reasons too. A big factor in good programming is reducing the points of maintenance when changes are done. If you can modify the type of a variable and know that all the places using that variable will adjust accordingly, that's great - saves time and potential errors. Using sizeof helps with that.
Which is preferred [for calculating offsets in a struct]? Does it still make sense to show all the steps involved in computing the offset, or does the literal usage make sense here?
The offsetof macro (#include <cstddef>) is better for this, again reducing the maintenance burden. With the this + that approach you illustrate, if the compiler decides to insert any padding your offset will be wrong, and furthermore you have to fix it every time you add or remove a field.
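A minimal sketch of that, with a hypothetical packet struct (the field names are made up for illustration):

#include <cstddef>
#include <cstdint>

struct Packet {
    std::uint32_t header;
    std::uint16_t data_field1;
    std::uint8_t  data_field2;
    std::uint8_t  payload[32];
};

// Stays correct even if fields are resized, reordered, or padded by the compiler.
const std::size_t offset = offsetof(Packet, payload);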
Ignoring the offsetof issue and just considering your this + that example as an illustration of a more complex value to assign, again it's a balancing act. You'd definitely want some explanation/comment/documentation about the intent here (are you working out the binary size of the earlier fields? calculating the offset of the next field? deliberately skipping some fields that might not be needed for the intended use, or was that accidental?...). Still, a named constant might be enough documentation, so it's likely unimportant which way you lean....
In every example you list, I would go with the name.
In your first example, you almost certainly used that special 0xBEEF number at least twice - once to write it and once to do your comparison. If you didn't write it, that number is still part of a contract with someone else (perhaps a file format definition).
In the last example, it is especially useful to show the computation that yielded the value. That way, if you encounter trouble down the line, you can easily see either that the number is trustworthy, or what you missed and fix it.
There are some cases where I prefer literals over named constants though. These are always cases where a name is no more meaningful than the number. For example, you have a game program that plays a dice game (perhaps Yahtzee), where there are specific rules for specific die rolls. You could define constants for One = 1, Two = 2, etc. But why bother?
Generally it is better to use a name instead of a value. After all, if you need to change it later, you can find it more easily. Also, it is not always clear why a particular number is used when you read the code, so having a meaningful name assigned to it makes this immediately clear to a programmer.
Performance-wise there is no difference, because the optimizer should take care of it. And even if an extra instruction were generated, it is rather unlikely that it would cause you trouble. If your code is that tight, you probably shouldn't rely on an optimizer effect anyway.
I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
I think kSuperDescriptiveVarName will definitely be used more than once: once for the check and at least once for an assignment, maybe in a different part of your program.
There will be no difference in performance, since an optimization called Constant Propagation exists in almost all compilers. Just enable optimization for your compiler.
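As a small illustration using the names from the question: with optimization enabled, both of these should compile down to the same comparison against the immediate 0xBEEF.

#include <cstdint>

const std::uint16_t kSuperDescriptiveVarName = 0xBEEF;

// With -O1 or higher the constant is propagated into the comparison,
// so the two functions typically generate identical machine code.
bool check_literal(std::uint16_t my_var) { return my_var == 0xBEEF; }
bool check_named(std::uint16_t my_var)   { return my_var == kSuperDescriptiveVarName; }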

Profiling a simple, one cycle length operation

We have an assignment where we need to profile a 'simple instruction' (addition or bit-wise and for example). This means performing the same operation a large number of times (100K+) and measuring the average time in microseconds. The result should be presented in cycle-lengths: (totalTime/iterations)*cphMHz.
So, results may vary but all in all we were told that we should get a result close to 1 cycle-length. Actual result doesn't matter as long as programming is correct.
My question is: what is a good operation to profile?
There are two points I need to consider:
I use loop unrolling to be a bit more accurate, so in each iteration I perform 10 simple instructions. This means I have to choose an operation that wouldn't be collapsed into a single instance by compiler optimization (we can't use the -O0 flag, as the school staff does not).
Bad example: var = i; - the compiler would only perform the last command.
What is a real 'simple instruction'? How do I know the number of operations that are actually performed? I tried reading the assembly output, but I couldn't understand it.
Hope I was clear enough, any idea would be great.
Thanks anyway
P.S. I don't know if it matters, but I write in C++.
1) This sounds (to me) like an impossible task if optimizations are (or might be) enabled. You can never be sure what the compiler will do during optimization. I'd definitely do something like reusing the previous result. If allowed/possible, I'd try to include a raw assembler snippet to be profiled (so you can be sure there's no additional overhead; although it could still be optimized).
2) As for instructions: one assembler command is one instruction. E.g. a += i will - depending on the available instruction set and so on - most likely result in 4 instructions: read a, read i, add, write a. Reading assembly is pretty much straightforward. Depending on the assembler syntax and processor, operands may be read in different "directions": Intel-syntax x86 assemblers prefer instruction target, source, while AT&T syntax and many DSP assemblers use instruction source, target. Just important to know: moving data has to happen through registers, so even a single assignment like a = b will result in two instructions (b to a register, and the register to a).
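To make that concrete, a minimal sketch (assumes GCC or Clang: the empty asm statement is an optimization barrier that keeps the dependent adds from being folded together; the iteration count and names are arbitrary):

#include <chrono>
#include <cstdint>
#include <cstdio>

// One add, followed by a barrier so the compiler cannot merge consecutive adds.
#define ONE_ADD do { acc += i; asm volatile("" : "+r"(acc)); } while (0)

int main() {
    constexpr std::uint64_t kIterations = 100000000;  // total number of adds
    std::uint64_t acc = 0;

    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < kIterations; i += 10) {
        // manual 10x unrolling; each add reuses the previous result
        ONE_ADD; ONE_ADD; ONE_ADD; ONE_ADD; ONE_ADD;
        ONE_ADD; ONE_ADD; ONE_ADD; ONE_ADD; ONE_ADD;
    }
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::micro> us = stop - start;
    std::printf("acc=%llu  avg %.5f us per add\n",
                (unsigned long long)acc, us.count() / kIterations);
}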
In general, if this answer goes in the wrong direction, try to elaborate a bit more on your specific task and its requirements (e.g. which compiler is to be used) and drop me a short comment.

Why do we follow opposite conventions while returning from main()?

I have gone through this and this,
but the question I am asking here is: why is 0 considered a success?
We always associate 0 with false, don't we?
Because there are more fail cases than success cases.
Usually, there is only one reason we succeed (because we're successful :)), but there are a whole lot of reasons why we could fail. So 0 means success, and everything else means failure, and the value could be used to report the reason.
For functions in your own code, this is different, because you are the one specifying the interface and can thus just use a bool if it suffices. For main, there is one fixed interface for returns; there may be programs that just report success/failure, but others need finer-grained error reporting. To satisfy them all, we have multiple error cases.
I have to quibble with Johannes' answer a bit. True, 0 is used for success because there is only one successful outcome while there can be many unsuccessful outcomes, but my experience is that return codes have less to do with reasons for failure than with levels of failure.
Back in the days of batch programming there were usually conventions for return codes that allowed for some automation of the overall stream of execution. So a return code of 4 might be a warning but the next job could continue; an 8 might mean the job stream should stop; a 12 might mean something catastrophic happened and the fire department should be notified.
Similarly, batches would set aside some range of return codes so that the overall batch stream could branch. If an update program returned XX, for instance, the batch might skip a backup step because nothing changed.
Return codes as reasons for failure aren't all that helpful, certainly not nearly as much as log files, core dumps, console alerts, and whatnot. I have never seen a system that returns XX because "such and such file was not found", for instance.
Generally the return values for any given program tend to be a list (enum) of possible values, such as Success or specific errors. As a "list", this generally starts at 0 and counts upwards. (As an aside, this is partly why the Microsoft error code 0 is ERROR_SUCCESS.)
Globally speaking, Success tends to be one of the only return values that almost any program should be capable of returning. Even if a program has several different error values, Success tends to be a shared necessity, and as such is given the most common position in a list of return values.
It's just the simplest way to allow a most common return value by default. It's completely separate from the idea of a boolean.
Here's the convention that I'm used to from various companies (although this obviously varies from place to place); a small sketch follows the list:
A negative number indicates an error occurred. The value of the negative number indicates (hopefully) the type of error.
A zero indicates success (generic success).
A positive number indicates a type of success (in some business cases there are various things that can trigger a successful case; this indicates which successful case happened).
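A minimal sketch of such a convention (the specific codes are made up; note that on POSIX systems only the low 8 bits of the exit status reach the parent process, so -1 shows up as 255):

enum ReturnCode {
    kBadArguments  = -1,  // negative: an error; the value identifies which one
    kSuccess       =  0,  // zero: generic success
    kSuccessNoWork =  1,  // positive: a success variant (e.g. nothing to do)
};

int main(int argc, char **argv) {
    if (argc < 2)
        return kBadArguments;
    // ... do the actual work ...
    return kSuccess;
}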
I, too, found this confusing when I first started programming. I resolved it in my mind by saying, 0 means no problems.