Prevent misuse of logical operator instead of bitwise operators - c++

In C++ it's possible to use a logical operator where a bitwise operator was intended:
int unmasked = getUnmasked(); //some wide value
int masked = unmasked & 0xFF; // isolate lowest 8 bits
The second statement could easily be mistyped:
int masked = unmasked && 0xFF; //&& used instead of &
This will cause incorrect behaviour: masked will now be either 0 or 1 when it is intended to be from 0 to 255. And C++ will never complain.
Is it possible to design code in such a way that such errors are detected at compiler level?

Ban in your coding standards the direct use of any bitwise operations in an arbitrary part of the code. Make it mandatory to call a function instead.
So instead of:
int masked = unmasked & 0xFF; // isolate lowest 8 bits
You write:
int masked = GetLowestByte(unmasked);
As a bonus, you'll get a code base which doesn't have dozens of error prone bitwise operations spread all over it.
Only in one place (the implementation of GetLowestByte and its sisters) you'll have the actual bitwise operations. Then you can read these lines two or three times to see if you blew it. Even better, you can unit test that part.
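A minimal sketch of what such helpers could look like. The name GetByte and the unsigned 32-bit parameter type are assumptions for illustration; the original only names GetLowestByte. The static_assert lines act as tiny compile-time unit tests.

```cpp
#include <cstdint>

// Hypothetical helpers: the only place where raw bitwise operators appear.
constexpr std::uint32_t GetLowestByte(std::uint32_t value) {
    return value & 0xFFu;                      // keep bits 0..7
}

// index 0 is the lowest byte; index must be 0..3 (larger shifts are undefined).
constexpr std::uint32_t GetByte(std::uint32_t value, unsigned index) {
    return (value >> (8u * index)) & 0xFFu;
}

// If & were mistyped as && in GetLowestByte, it would return 1 and this check would fail.
static_assert(GetLowestByte(0x1234u) == 0x34u, "mask keeps the low byte");
static_assert(GetByte(0x12345678u, 2) == 0x34u, "byte 2 of 0x12345678");
```

Callers then write GetLowestByte(unmasked) and never touch & or && themselves.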

This is a bit Captain Obvious, but you could of course apply encapsulation and just hide the bitmask inside a class. Then you can use operator overloading to make sure that the boolean operator&&() behaves as you see fit, for example by making it a compile error.
I guess that a decent implementation of such a "safe mask" need not be overly expensive performance-wise, either.
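One way such a "safe mask" might look, as a rough sketch. The class name ByteMask and its interface are invented here. Because the class has no conversion to bool, value && mask would already be rejected; the deleted overloads only make the intent explicit and the diagnostic clearer.

```cpp
#include <cstdint>

// Hypothetical wrapper: a mask is its own type, not a plain integer.
class ByteMask {
public:
    explicit constexpr ByteMask(std::uint32_t bits) : bits_(bits) {}
    constexpr std::uint32_t bits() const { return bits_; }
private:
    std::uint32_t bits_;
};

// The intended operation: bitwise AND with the mask.
constexpr std::uint32_t operator&(std::uint32_t value, ByteMask mask) {
    return value & mask.bits();
}

// The mistake: logical AND with a mask is explicitly rejected at compile time.
template <typename T> void operator&&(T, ByteMask) = delete;
template <typename T> void operator&&(ByteMask, T) = delete;
```

With this in place, unmasked & ByteMask(0xFF) compiles, while unmasked && ByteMask(0xFF) does not. Since the wrapper is a single integer with constexpr functions, an optimizing compiler would normally be expected to reduce it to the plain AND.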

In some instances you might get a compiler warning (I wouldn't expect one in your example though). A tool like lint might be able to spot possible mistakes.
I think the only way to be sure is to define your coding standards to make the difference between the two operators more obvious - something like:
template<typename T>
T BitwiseAnd( T value, T mask ) { return value & mask; }
and ban the bitwise operators & and |

Both operators represent valid operations on integers, so I don't see any way of detecting a problem. How is the compiler supposed to know which operation you really wanted?

Related

Standardized ways to pack multiple values into one atomic

Assuming I have two atomic variables of type int32, I could instead choose to represent them as a single std::atomic<int64> both and reserve the first 32 bits for my first int and the last 32 bits for my second int.
This seems like quite a space & time saver on x64 architectures, not to mention it allows for all sorts of black magic since one can abstract over various operations and make them atomic:
first == a && second ==b
becomes
both == ( int64(a) + (int64(b) << 32) )
//Or some such... I'm not 100% sure this is correct but you get the idea
The one problem with this trick that I see is that I'm not particularly fond of operating at the bit level, and C++ is not very kind when it comes to operations at the bit level, especially once you try to accomplish more complex operations or pack more than two variables (e.g. two numbers and several bools) into the same atomic.
So I'm wondering if there is a standardized way to apply this kind of trick. A pattern or even std functionality that is easily recognizable by other coders when seen and easier to work with for the implementer? Likewise, is this pattern useful enough to warrant such a standardization, or does its usefulness quickly become obsolete when compared to the possible annoyances and UB it can bring?
The way to get around Read-Then-Write with atomics is using a loop:
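void setBit(std::atomic<int64_t>& bitset, int bit)
{
    int64_t val = int64_t(1) << bit;
    int64_t prev = bitset.load();
    // Stop when the bit is already set or the exchange succeeds.
    // On failure, compare_exchange_weak reloads prev with the current value.
    while (!(prev & val) &&
           !bitset.compare_exchange_weak(prev, prev | val))
        ;
}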
You can extend this method to create generic bitwise operation functions
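As a sketch of how the packing from the question could be wrapped in helpers: the names pack, unpackFirst, unpackSecond and storeSecond are hypothetical, and the layout (first value in the low half, second in the high half) is an assumption. Unsigned storage is used so that shifts and sign extension cannot surprise.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: 'first' in the low 32 bits, 'second' in the high 32 bits.
inline std::uint64_t pack(std::int32_t first, std::int32_t second) {
    // Go through uint32_t so a negative 'first' is not sign-extended into the high half.
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(first)) |
           (static_cast<std::uint64_t>(static_cast<std::uint32_t>(second)) << 32);
}

// uint32_t -> int32_t is modular in practice (guaranteed from C++20 on).
inline std::int32_t unpackFirst(std::uint64_t both) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(both));
}

inline std::int32_t unpackSecond(std::uint64_t both) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(both >> 32));
}

// Atomically replace 'second' while leaving 'first' untouched (CAS retry loop).
inline void storeSecond(std::atomic<std::uint64_t>& both, std::int32_t second) {
    std::uint64_t prev = both.load();
    while (!both.compare_exchange_weak(prev, pack(unpackFirst(prev), second))) {
        // prev now holds the current value; try again.
    }
}
```

The combined comparison from the question then becomes both.load() == pack(a, b).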

Is masking effective for thwarting side channel attacks?

I'm working with some bigint public-key cryptography code. Is it safe to use bitwise masking to ensure that the calculation timing and memory addresses accessed are independent of the data values?
Is this technique vulnerable to side-channel attacks based on instruction timing, power, RF emissions, or other things I'm unaware of? (For reference, I'm aware of techniques like RSA blinding, EC Montgomery ladder, cache flushing, and such.)
Example of straightforward code (C/C++):
uint a = (...), b = (...);
if (a < b)
a += b;
Now translated to use constant-time masking:
uint a = (...), b = (...);
uint mask = -(uint)(a < b);
a = ((a + b) & mask) | (a & ~mask);
Note that a < b is 0 or 1, and mask is 0x00000000 or 0xFFFFFFFF.
Similarly, for a high-level operation (C++):
Integer x = (...);
if (x.isFoo())
x.doBar();
Is the following an acceptable safe translation?
Integer x = (...);
uint mask = -(uint)x.isFoo(); // Assume this is constant-time
Integer y(x); // Copy constructor
y.doBar(); // Assume this is constant-time
x.replace(y, mask); // Assume this uses masking
This technique may be safe... if the operations we assume to take constant time really do, and if the compiler doesn't change the code to do something else instead.
In particular, let's take a look at your first example:
uint a = (...), b = (...);
uint mask = -(uint)(a < b);
a = ((a + b) & mask) | (a & ~mask);
I see two somewhat plausible ways in which this could fail to run in constant time:
The comparison a < b might or might not take constant time, depending on the compiler (and CPU). If it's compiled to simple bit manipulation, it may be constant-time; if it's compiled to use a conditional jump, it may well not be.
At high optimization levels, it's possible that a too-clever compiler might detect what's happening (say, by splitting the code into two paths based on the comparison, and optimizing them separately before merging them back) and "optimize" it back into the non-constant time code we were trying to avoid.
(Of course, it's also possible that a sufficiently clever compiler could optimize the naïve, seemingly non-constant time code into a constant-time operation, if it thought that would be more efficient!)
One possible way to avoid the first issue would be to replace the comparison with explicit bit manipulation, as in:
uint32_t a = (...), b = (...);
uint32_t mask = -((a - b) >> 31);
a = ((a + b) & mask) | (a & ~mask);
However, note that this is only equivalent to your original code if we can be sure that a and b differ by less than 2^31. If that is not guaranteed, we'd have to cast the variables into a longer type before the subtraction, e.g.:
uint32_t mask = (uint32_t)(( (uint64_t)a - (uint64_t)b ) >> 32);
All that said, even this is not foolproof, as the compiler could still decide to turn this code into something that is not constant-time. (For instance, 64-bit subtraction on a 32-bit CPU could potentially take variable time depending on whether there's a borrow or not — which is precisely what we're trying to hide, here.)
In general, the only way to make sure that such timing leaks don't occur is to:
inspect the generated assembly code manually (e.g. looking for jump instructions where you didn't expect any), and
actually benchmark the code to verify that it does, indeed, take the same time to run regardless of the inputs.
Obviously, you'll also need to do this separately for each combination of compiler and target platform that you wish to support.
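The 64-bit subtraction variant above can be packaged as follows. The function names lessThanMask and addIfLess are made up for this sketch. It only demonstrates that the result matches the branching version; whether the generated machine code is constant-time still has to be checked per compiler and platform, as described above.

```cpp
#include <cstdint>

// Returns 0xFFFFFFFF when a < b and 0 otherwise, without a comparison operator.
inline std::uint32_t lessThanMask(std::uint32_t a, std::uint32_t b) {
    // The 64-bit difference wraps around when a < b, so its upper 32 bits are all ones.
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(a) - static_cast<std::uint64_t>(b)) >> 32);
}

// Masked equivalent of: if (a < b) a += b;
inline std::uint32_t addIfLess(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t mask = lessThanMask(a, b);
    return ((a + b) & mask) | (a & ~mask);
}
```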
It can be sketchy using masking or other techniques in code because compilers do all sorts of optimizations that you are often not aware of. Some of the methods that you mentioned in your original post are much better.
As a general rule of thumb, use well-known crypto libraries because they should be hardened against side channel attacks. Failing that, you can often transform the information, process it, and then transform the results back. This can work particularly well with public key cryptography as it is often homomorphic.

Explain Bit Test macro in C++

I'm trying to figure out how this code works, but I can't manage to get a single answer.
#define testbit(x, y) ( ( ((const char*) & (x))[(y)>>3] & 0x80 >> ((y)&0x07)) >> (7-((y)&0x07) ) )
I'm new to pointers, so if you can figure out a way to explain this in simplified English, I would really appreciate it.
It belongs to a segment of code for an X-Plane Plug-in found at https://code.google.com/p/xplugins/source/browse/trunk/Xsaitekpanels/SwitchPanel.cpp?r=38 line=19
The macro tests the value of the y-th bit in x. You can't directly address bits, so the code starts by treating x as an array of bytes (the const char* cast).
It then looks up the byte where the bit lives. There are 8 bits in a byte, so it divides by 8. Chasing performance, instead of simply dividing by 8, the code uses the binary trick of shifting right 3 places. In general, for unsigned x and y, x >> y = x/2^y, and x << y = x*2^y.
At this point you need to test the bit within the byte, so you get the remainder of y/8. Yet another bit trick, using y & 7 instead of the clearer y % 8.
With this information you can make a mask, a single on bit, 0x80 and shift it into position to test the y%8-th bit. The mask is ANDed against the byte and a non-zero result here means the bit was set to 1, otherwise 0.
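The steps of the macro, written out as a function, might look like the sketch below. The name testBit and the void pointer parameter are assumptions, and it reads the bytes as unsigned char rather than the macro's const char. It also includes the final shift that reduces the result to 0 or 1.

```cpp
#include <cstddef>

// Same steps as the macro, spelled out. 'data' is the object viewed as raw bytes.
inline int testBit(const void* data, std::size_t bit) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    const std::size_t byteIndex = bit >> 3;                        // bit / 8: which byte
    const unsigned bitInByte = static_cast<unsigned>(bit & 0x07);  // bit % 8: position in byte
    const unsigned mask = 0x80u >> bitInByte;                      // bit 0 is the most significant bit
    return static_cast<int>((bytes[byteIndex] & mask) >> (7 - bitInByte));  // 0 or 1
}
```

Note that bits are counted from the most significant bit of the first byte, so bit 0 of a buffer starting with 0x80 is set.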
Completing @RhythmicFistman's answer
@RhythmicFistman's answer is missing one small part, and that is the last step in the shifts.
The >> (7-((y)&0x07)) step ensures that you only ever get a result of 1 or 0. With this code it is safe to do comparisons like:
if (testbit(variable, 6) == 1) {
// do something
}
Where without that step testbit would return a bit mask in which the 6th bit would be set to 1 or 0 and all the other bits are always set to 0. That is the intent but it is not implemented in what is considered a portable way, see Warning 3 below.
Possible issues with using this code
Now to add something to the other answers. The other answers have not pointed out some keywords that should be mentioned here and they are strict aliasing and shift arithmetic right. My elaboration will come in the form of warnings below.
Warning 1: Endianness
This code assumes that you are using a big endian architecture or only wish to get the correct bit from an array of chars.
The reason is that if you convert an int into an array of chars (bytes) you will get different results on a big endian machine vs a little endian machine.
Warning 2: Strict Aliasing
The macro makes use of a cast (const char*) &(x) which is designed to change the type, a.k.a. alias, of (x) so that it is easier to get to the correct bits.
This is dangerous and the reason why is explained beautifully in this SO answer. The short version is that if you compile this code with optimisations strange things can happen.
The wikipedia pages on Aliasing and Pointer Aliasing are also useful and should be read.
Warning 3: Shift Arithmetic Right
In addition to this there could be a potential issue with the way this code uses the right shift operator >>. This operator has two different behaviors depending on whether the variable it is operating on is signed or unsigned. So long as you never use negative numbers you will be safe but this code will not protect you against that mistake. I suspect though, that you're less likely to make such a mistake anyway so it should be ok to use it.
Also worth mentioning, you are using signed char and are shifting it right. Though this works I would prefer unsigned char which would improve portability because it will not risk generating an arithmetic shift right when char and int are the same width (which is almost never the case in practice, granted). This works because char is promoted to int for the shift, see this SO answer for an explanation.
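If the goal is simply to test a bit of an integer value, a sketch that avoids all three warnings is to shift the unsigned value itself instead of aliasing its bytes. The name testValueBit is hypothetical, and the bit numbering differs from the macro: here bit 0 is the least significant bit of the value, regardless of byte order.

```cpp
#include <cstdint>

// Tests bit n of the value itself (bit 0 = least significant bit).
// No pointer cast, no dependence on endianness, and only unsigned shifts.
// n must be in the range 0..31.
constexpr bool testValueBit(std::uint32_t value, unsigned n) {
    return ((value >> n) & 1u) != 0;
}

static_assert(testValueBit(0x40u, 6), "bit 6 of 0x40 is set");
static_assert(!testValueBit(0x40u, 5), "bit 5 of 0x40 is clear");
```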
What you see is a macro that does the following job:
(In order)
Shifts y right by 3 bits (that is, divides it by 8) to get a byte index
Takes the address of x, treats it as an array of chars, and picks the char at that index
Shifts 0x80 right by (y & 0x07) bits to build a single-bit mask
ANDs the selected char with that mask
Shifts the result right by 7 - (y & 0x07) bits
Well, does this help you? I don't think so!
That's because this macro is clearly improper, and kind of tricky.
Bit masks, binary operations, binary shifts...
If you can explain more precisely what you want to understand in this, maybe I can be helpful.

Why use the '+' operator when '|' is perfectly good?

This is more of a philosophical question, but I've seen this a bunch of times in codebases here and there and do not really understand how this programming method came to be.
Suppose you have to set bits 2 and 3 to some value x without changing the other values in the uint. Doing so is pretty trivial and a common task, and I would be inclined to do it this way:
uint8_t someval = 0xFF; //some random previous value
uint8_t x = 0x2; //some random value to assign.
someval = (someval & ~0xC) | (x << 2); //Set the value to 0x2 for bits 2-3
I've seen code that instead of using '|' uses '+':
uint8_t someval = 0xFF; //some random previous value
uint8_t x = 0x2; //some random value to assign.
someval = (someval & ~0xC) + (x << 2); //Set the value to 0x2 for bits 2-3
Are they equivalent?
Yes.
Is one better than the other?
Only if your hardware doesn't have a bitwise OR instruction, but I have never ever ever seen a processor that didn't have a bitwise OR (even small PIC10 processors have an OR instruction).
So why would some programmers be inclined to use '+' instead of '|'? Am I missing some really obvious, uber powerful optimization here?
If you want to perform bitwise operations, use bitwise operators.
If you want to perform arithmetic operations, use arithmetic operators.
It's true that for some values some arithmetic operations can be implemented as simple bitwise operations, but that's essentially an implementation detail to which you should never expose your readers. First and foremost the logic of the code should be clear and if possible self-explanatory. The compiler will choose appropriate low-level operations for you to implement your desire.
That's being philanthropic.
Are they equivalent?
Yes, as long as the bitfield being written to is clear beforehand. Otherwise, they'll go wrong in slightly different ways.
Is one better than the other?
No, although some would say that bitwise operations express the intent more clearly.
So why would some programmers be inclined to use '+' instead of '|'?
Because they're equivalent, and neither is particularly better than the other.
Am I missing some really obvious, uber powerful optimization here?
No.
So why would some programmers be inclined to use '+' instead of '|'?
+ could bring out logical bugs faster. a | a would appear to work, whereas a simple a + a definitely wouldn't (of course, depends on the logic, but the + version is more error-prone).
Of course you should stick to the standard way of doing things (use bitwise operations when you want a bitwise operation, and arithmetic operations when you want to do math).
It's just a question of style. Any modern CPU will complete both operations in the same number of cycles (typically 1). Personally I prefer | in these cases since it more explicitly states to the code reader that you're doing bit twiddling instead of arithmetic.
If you have a bug in your code, then using + could lead to strange behavior, whereas using | would tend to mask the bug. For example, if you accidentally include the same bit twice, ORing it again is a no-op, but adding it will clear the bit and carry up into the next bit (and possibly farther, if more bits are set). So that would usually lead to fail-fast behavior instead of failure-masking behavior, which is generally preferable.
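A small sketch of that difference, using a made-up flag FLAG_A on bit 2. Including the flag twice is harmless with | but changes the result with +.

```cpp
#include <cstdint>

constexpr std::uint8_t FLAG_A = 0x04;  // hypothetical flag occupying bit 2

// OR is idempotent: including the same bit twice changes nothing.
constexpr std::uint8_t setTwiceWithOr(std::uint8_t v) {
    return static_cast<std::uint8_t>((v | FLAG_A) | FLAG_A);
}

// Addition is not: the second add clears bit 2 and carries into bit 3.
constexpr std::uint8_t setTwiceWithAdd(std::uint8_t v) {
    return static_cast<std::uint8_t>((v + FLAG_A) + FLAG_A);
}

static_assert(setTwiceWithOr(0x00) == 0x04, "bit 2 is set, as intended");
static_assert(setTwiceWithAdd(0x00) == 0x08, "bit 2 cleared, bit 3 set instead");
```

With +, the duplicated flag shows up as a visibly wrong value; with |, the duplication goes unnoticed.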

Is there any advantage to using '<< 1' instead of '* 2'?

I've seen this a couple of times, but it seems to me that using the bitwise shift left hinders readability. Why is it used? Is it faster than just multiplying by 2?
You should use * when you are multiplying, and << when you are bit shifting. They are mathematically equivalent, but have different semantic meanings. If you are building a flag field, for example, use bit shifting. If you are calculating a total, use multiplication.
It is faster on old compilers that don't optimize the * 2 calls by emitting a left shift instruction. That optimization is really easy to detect and any decent compiler already does it.
If it affects readability, then don't use it. Always write your code in the most clear and concise fashion first, then if you have speed problems go back and profile and do hand optimizations.
It's used when you're concerned with the individual bits of the data you're working with. For example, if you want to set the upper byte of a word to 0x9A, you would not write
n |= 0x9A * 256
You'd write:
n |= 0x9A << 8
This makes it clearer that you're working with bits, rather than the data they represent.
For some architectures, bit shifting is faster than multiplying. However, any compiler worth its salt will optimize *2 (or any multiplication by a power of 2) to a left bit shift (when a bit shift would be faster).
For readability of values used as bitfields:
enum Flags { UP = (1<<0),
DOWN = (1<<1),
STRANGE = (1<<2),
CHARM = (1<<3),
...
which I think is preferable to either '=1,...,=2,...=4' or '=1,...=2, =2*2,...=2*3' especially if you have 8+ flags.
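A complete version of such an enum, with a usage example, might read as follows. The fixed underlying type and the helper hasFlag are additions for this sketch and are not part of the original snippet.

```cpp
// Each flag occupies exactly one bit; the shift count is the bit position.
enum Flags : unsigned {
    UP      = 1u << 0,
    DOWN    = 1u << 1,
    STRANGE = 1u << 2,
    CHARM   = 1u << 3
};

// Hypothetical helper: true when flag f is present in the combined set.
constexpr bool hasFlag(unsigned set, Flags f) {
    return (set & f) != 0;
}

static_assert(hasFlag(UP | CHARM, CHARM), "CHARM is in the set");
static_assert(!hasFlag(UP | CHARM, DOWN), "DOWN is not in the set");
```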
If you are using an old C compiler, it is preferable to use the bitwise shift. For readability you can comment your code, though.