I have a problem where two different compilers (GCC and IAR) appear to drop my mask from an if that compares variables of different sizes.
I have the following code:
uint8_t Value2;
uint16_t WriteOffset;
bool Fail;
void test(void)
{
uint8_t buff[100];
uint16_t r;
for(r=0;r<Value2+1;r++)
{
if(buff[r]!=(WriteOffset+r)&0xFF)
{
Fail=true;
}
}
}
The if fails (goes into the {} block) when buff[r] ==0 and WriteOffset+r == 0x100.
GCC outputs the following assembly:
movzwl -0xc(%ebp),%eax ; Load 'r'->EAX
mov -0x70(%ebp,%eax,1),%al ; Load 'buff[r]'->AL
movzbl %al,%edx ; Move AL to (unsigned int)EDX
mov 0x4b19e0,%ax ; Load 'WriteOffset'->AX
movzwl %ax,%ecx ; Move AX to (unsigned int)ECX
movzwl -0xc(%ebp),%eax ; Load 'r'->EAX
lea (%ecx,%eax,1),%eax ; 'WriteOffset' + 'r'->EAX
cmp %eax,%edx ; (unsigned int)'WriteOffset+r' == (unsigned int)'buff[r]'
je 0x445e28 <Test+1254> ; If == skip {} block
My question is why is the compiler dropping my &0xFF from the if (I have already fixed the problem with a cast, but I still do not understand why it dropped it in the first place)?
It isn't; operator precedence is biting you here.
You want if( buff[r] != ((WriteOffset+r)&0xFF) )
What you currently have is the same as if( (buff[r]!=(WriteOffset+r)) & 0xFF )
The precedence confusion is causing you to mask a value that can only be 0 or 1 (the result of the comparison expression) with 0xFF, so the optimizer is quite reasonably removing it.
!= is higher precedence than &. I think you need an extra set of parentheses.
http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B has a C/C++ operator precedence table, have a look.
Operator != has higher priority than &. So you should write so:
if(buff[r]!=((WriteOffset+r)&0xFF))
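To make the difference concrete, here is a minimal sketch of the two groupings as separate functions. The function names are hypothetical and not from the question; the first one writes out explicitly how the compiler parses the original condition.

```cpp
#include <cstdint>

// Hypothetical helper: the original condition as the compiler parses it,
// i.e. (b != (offset + r)) & 0xFF, because != binds tighter than &.
bool mismatch_as_parsed(std::uint8_t b, std::uint16_t offset, std::uint16_t r)
{
    return ((b != (offset + r)) & 0xFF) != 0;
}

// Hypothetical helper: the intended condition, masking the sum before comparing.
bool mismatch_parenthesized(std::uint8_t b, std::uint16_t offset, std::uint16_t r)
{
    return b != ((offset + r) & 0xFF);
}
```

With b == 0, offset == 0xFF and r == 1 the sum is 0x100, so the first form reports a mismatch while the second does not, which matches the failure described in the question.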
I've been struggling trying to convert this assembly code to C++ code.
It's a function from an old game that takes pixel data Stmp, and I believe it places it to destination void* dest
void Function(int x, int y, int yl, void* Stmp, void* dest)
{
unsigned long size = 1280 * 2;
unsigned long j = yl;
void* Dtmp = (void*)((char*)dest + y * size + (x * 2));
_asm
{
push es;
push ds;
pop es;
mov edx,Dtmp;
mov esi,Stmp;
mov ebx,j;
xor eax,eax;
xor ecx,ecx;
loop_1:
or bx,bx;
jz exit_1;
mov edi,edx;
loop_2:
cmp word ptr[esi],0xffff;
jz exit_2;
mov ax,[esi];
add edi,eax;
mov cx,[esi+2];
add esi,4;
shr ecx,2;
jnc Next2;
movsw;
Next2:
rep movsd;
jmp loop_2;
exit_2:
add esi,2;
add edx,size;
dec bx;
jmp loop_1;
exit_1:
pop es;
};
}
Here is how far I've gotten (I'm not sure it's even correct):
while (j > 0)
{
if (*stmp != 0xffff)
{
}
++stmp;
dtmp += size;
--j;
}
Any help is greatly appreciated. Thank you.
It saves / restores ES around setting it equal to DS so rep movsd will use the same addresses for load and store. That instruction is basically memcpy(edi, esi, ecx) but incrementing the pointers in EDI and ESI (by 4 * ecx). https://www.felixcloutier.com/x86/movs:movsb:movsw:movsd:movsq
In a flat memory model, you can totally ignore that. This code looks like it might have been written to run in 16-bit unreal mode, or possibly even real mode, hence the use of 16-bit registers all over the place.
It looks like it's loading some kind of records that tell it how many bytes to copy, and reading until the end of the record, at which point it looks for the next record there. There's an outer loop around that, looping through records.
The records look like this I think:
struct sprite_line {
uint16_t skip_dstbytes, src_bytes;
uint16_t src_data[]; // flexible array member, actual size unlimited but assumed to be a multiple of 2.
};
The inner loop is this:
;; char *dstp; // in EDI
;; struct sprite_line *p // in ESI
loop_2:
cmp word ptr[esi],0xffff ; while( p->skip_dstbytes != (uint16_t)-1 ) {
jz exit_2;
mov ax,[esi]; ; EAX was xor-zeroed earlier; some old CPUs maybe had slow movzx loads
add edi,eax; ; dstp += p->skip_dstbytes;
mov cx,[esi+2]; ; bytelen = p->src_bytes;
add esi,4; ; ESI now points at p->src_data
shr ecx,2; ; length in dwords = bytelen >> 2
jnc Next2;
movsw; ; one 16-bit (word) copy if bytelen >> 1 is odd, i.e. if last bit shifted out was a 1.
; The first bit shifted out isn't checked, so size is assumed to be a multiple of 2.
Next2:
rep movsd; ; copy in 4-byte chunks
Old CPUs (before IvyBridge) had rep movsd faster than rep movsb, otherwise this code could just have done that.
or bx,bx;
jz exit_1;
That's an obsolete idiom that comes from 8080 for test bx,bx / jz, i.e. jump if BX was zero. So it's a while( bx != 0 ) {} loop. With dec bx in it. It's an inefficient way to write a while (--bx) loop; a compiler would put a dec/jnz .top_of_loop at the bottom, with a test once outside the loop in case it needs to run zero times. Why are loops always compiled into "do...while" style (tail jump)?
Some people would say that's what a while loop looks like in asm, if they're picturing totally naive translation from C to asm.
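Putting the pieces above together, a C++ translation might look roughly like the sketch below. It is an interpretation of the asm, not a verified port: the name blit_sprite and the helper load_u16le are hypothetical, the record layout is the one guessed above, and the copy length drops bit 0 because the asm never checks it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a little-endian 16-bit value, like mov ax,[esi].
static std::uint16_t load_u16le(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical name; parameters mirror Function() from the question.
void blit_sprite(int x, int y, int yl, const void* Stmp, void* dest)
{
    const std::size_t pitch = 1280 * 2;  // bytes per destination row ("size")
    const unsigned char* src = static_cast<const unsigned char*>(Stmp);
    unsigned char* row = static_cast<unsigned char*>(dest) + y * pitch + x * 2;

    // The asm only tests BX, i.e. the low 16 bits of the line count.
    for (std::uint16_t lines = static_cast<std::uint16_t>(yl); lines != 0; --lines) {
        unsigned char* dst = row;                         // mov edi,edx
        while (load_u16le(src) != 0xFFFF) {               // 0xFFFF ends this row
            const std::uint16_t skip = load_u16le(src);
            const std::uint16_t len = load_u16le(src + 2);
            src += 4;
            dst += skip;                                  // add edi,eax
            const std::size_t n = len & ~std::size_t{1};  // movsw + rep movsd copy this many bytes
            std::memcpy(dst, src, n);
            src += n;
            dst += n;
        }
        src += 2;      // step over the 0xFFFF marker
        row += pitch;  // add edx,size
    }
}
```

Using memcpy instead of hand-written word and dword copies leaves the choice of copy strategy to the compiler and library; the result should be the same as long as source and destination do not overlap.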
my code has a lot of patterns like
int a, b.....
bool c = x ? a >= b : a <= b;
and similarly for other inequality comparison operators. Is there a way to write this to achieve better performance/branchlessness for x86.
Please spare me the "Have you benchmarked your code? Is this really your bottleneck?" type of comment. I am asking for other ways to write this so I can benchmark and test.
EDIT:
bool x
Original expression:
x ? a >= b : a <= b
Branch-free equivalent expression without short-circuit evaluation:
!!x & a >= b | !x & a <= b
This is an example of a generic pattern without resorting to arithmetic trickery. Watch out for operator precedence; you may need parentheses for more complex examples.
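As a small sketch of that pattern, the two forms can be put side by side so they are easy to benchmark or compare. The function names are hypothetical, and the parentheses only spell out how the unparenthesized expression is already grouped. Whether the second form actually compiles to branch-free code depends on the compiler and flags, so the generated assembly is worth checking.

```cpp
// Reference form from the question.
bool cmp_ternary(bool x, int a, int b)
{
    return x ? a >= b : a <= b;
}

// Branch-free form without short-circuit evaluation. Since x is already
// a bool, !!x is just x. Both comparisons are always evaluated.
bool cmp_bitwise(bool x, int a, int b)
{
    return (x & (a >= b)) | (!x & (a <= b));
}
```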
Another way would be :
bool c = (2*x - 1) * (a - b) >= 0;
This generates a branch-less code here: https://godbolt.org/z/1nAp7G
#include <stdbool.h>
bool foo(int a, int b, bool x)
{
return (2*x - 1) * (a - b) >= 0;
}
------------------------------------------
foo:
movzx edx, dl
sub edi, esi
lea eax, [rdx-1+rdx]
imul eax, edi
not eax
shr eax, 31
ret
Since you're just looking for equivalent expressions, this comes from patching @AlexanderZhang's comment:
(a==b) || (x != (a<b))
The way you currently have it is possibly unbeatable.
But for positive integral a and b and bool x you can use
a / b * x + b / a * !x
(You could adapt this, at the cost of extra cpu burn, by replacing a with a + 1 and similarly for b if you need to support zero.)
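If a>=b, a-b will be non-negative and the first bit (sign bit) is 0. Otherwise a-b is negative and the first bit is 1 (assuming a-b does not overflow).
So we can simply “xor” the first bit of a-b and the value of x. Note that when a == b and x is false this yields false, whereas the original expression yields true.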
constexpr auto shiftBit = sizeof(int)*8-1;
bool foo(bool x, int a, int b){
return x ^ bool((a-b)>>shiftBit);
}
foo(bool, int, int):
sub esi, edx
mov eax, edi
shr esi, 31
xor eax, esi
ret
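Before benchmarking the sign-bit variant it may be worth checking where it agrees with the original expression. The sketch below uses hypothetical function names and does the subtraction in unsigned arithmetic, which is a swapped-in detail to keep wraparound and the shift well defined; it is not the exact code from the answer.

```cpp
#include <limits>

// Reference: the expression from the question.
bool cmp_reference(bool x, int a, int b)
{
    return x ? a >= b : a <= b;
}

// Sign-bit variant: xor x with the top bit of (a - b).
// The difference is computed as unsigned so wraparound is defined.
bool cmp_signbit(bool x, int a, int b)
{
    const unsigned diff = static_cast<unsigned>(a) - static_cast<unsigned>(b);
    const bool negative = (diff >> (std::numeric_limits<unsigned>::digits - 1)) != 0;
    return x ^ negative;
}
```

The two functions agree whenever a != b (and the true difference fits in an int), and also when a == b with x true. For a == b with x false, cmp_signbit returns false while cmp_reference returns true.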
I'm trying to solve a timing leak by removing an if statement in my code, but because of the way C++ interprets integer inputs in if statements I am stuck.
Note that I assume the compiler does create a conditional branch, which results in timing information being leaked!
The original code is:
int s;
if (s)
r = A;
else
r = B;
Now I'm trying to rewrite it as:
int s;
r = s*A + (1-s)*B;
Because s is not bound to [0,1] I run into the problem that it multiplies by A and B incorrectly if s is out of [0,1]. What can I do, without using an if-statement on s to solve this?
Thanks in advance
What evidence do you have that the if statement is resulting in the timing leak?
If you use a modern compiler with optimizations turned on, that code should not produce a branch. You should check what your compiler is doing by looking at the assembly language output.
For instance, g++ 5.3.0 compiles this code:
int f(int s, int A, int B) {
int r;
if (s)
r = A;
else
r = B;
return r;
}
to this assembly:
movl %esi, %eax
testl %edi, %edi
cmove %edx, %eax
ret
Look, ma! No branches! ;)
If you know the number of bits in the integer, it's pretty easy, although there are a few complications making it standards-clean with the possibility of unusual integer representations.
Here's one simple solution for 32-bit integers:
uint32_t mask = s;
mask |= mask >> 1;
mask |= mask >> 2;
mask |= mask >> 4;
mask |= mask >> 8;
mask |= mask >> 16;
mask &= 1;
r = b ^ (-mask & (a ^ b));
The five shift-and-or statements propagate any set bit in mask so that in the end the low-order bit is 1 unless the mask was originally 0. Then we isolate the low-order bit, resulting in a 1 or 0. The last statement is a bit-hacking equivalent of your two multiplies and add.
Here is a faster one based on the observation that if you subtract one from a number and the sign bit changes from 0 to 1, then the number was 0:
uint32_t mask = (((uint32_t(s)-1U)&~uint32_t(s))>>31) - 1U;
That is essentially the same computation as subtracting 1 and then using the carry bit, but unfortunately the carry bit is not exposed to the C language (except possibly through compiler-specific intrinsics).
Other variations are possible.
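The two mask computations can be wrapped into small selection functions, sketched below. The function names are hypothetical, and the code assumes 32-bit values as in the text. Note that nothing in the language guarantees the compiler keeps these branch-free, so for timing-sensitive code the generated assembly still needs to be inspected.

```cpp
#include <cstdint>

// Returns 1 if s != 0, else 0, by smearing any set bit down to bit 0.
std::uint32_t nonzero_bit(std::uint32_t s)
{
    std::uint32_t mask = s;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    return mask & 1u;
}

// Returns 0xFFFFFFFF if s != 0, else 0 (the faster variant).
std::uint32_t nonzero_mask(std::uint32_t s)
{
    return (((s - 1u) & ~s) >> 31) - 1u;
}

// r = a if s != 0, else b, using the shift-and-or mask.
std::uint32_t select_smear(std::uint32_t s, std::uint32_t a, std::uint32_t b)
{
    const std::uint32_t mask = 0u - nonzero_bit(s);  // all ones or all zeros
    return b ^ (mask & (a ^ b));
}

// r = a if s != 0, else b, using the subtract-one mask.
std::uint32_t select_fast(std::uint32_t s, std::uint32_t a, std::uint32_t b)
{
    return b ^ (nonzero_mask(s) & (a ^ b));
}
```

For example, select_fast(0, 10, 20) yields 20 and select_fast(5, 10, 20) yields 10.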
The only way to do it without branches when the optimization is not available is to resort to inline assembly. Assuming 8086:
mov ax, s
neg ax ; CF = (ax != 0)
sbb ax, ax ; ax = (s != 0 ? -1 : 0)
neg ax ; ax = (s != 0 ? 1 : 0)
mov s, ax ; now use s at will, it will be: s = (s != 0 ? 1 : 0)
C++: How do I check if a character is between a given range of characters?
Say, if I have a string name.
I want to check if the first character of this string is between 'a' to 'n'.
How do I do it?
To do (name[0] == 'a') || (name[0] == 'b')... would be too long...
If possible, I would like a solution that deals with ASCII values elegantly.
If you want to check whether or not the first character of your string is between 'a' and 'n', for instance, checking name[0] >= 'a' && name[0] <= 'n' should do the job properly.
Keep in mind, however, that if you can also have caps as the first character in your string, you have to check (name[0] >= 'a' && name[0] <= 'n') || (name[0] >= 'A' && name[0] <= 'N') instead.
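As a minimal sketch, the check can be wrapped in a function that also guards against an empty string, since name[0] on an empty std::string is just the terminator. The function name is hypothetical, and the code assumes a character set such as ASCII where 'a' to 'n' and 'A' to 'N' are contiguous.

```cpp
#include <string>

// Hypothetical helper: true if the first character is in a..n or A..N.
// Assumes the letters are contiguous in the execution character set (true for ASCII).
bool first_char_in_a_to_n(const std::string& name)
{
    if (name.empty()) {
        return false;  // nothing to check
    }
    const char c = name[0];
    return (c >= 'a' && c <= 'n') || (c >= 'A' && c <= 'N');
}
```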
You can use std::all_of in combination with a lambda expression:
std::all_of(name.begin(), name.end(), [](char i) { return (i >= 'a' && i <= 'z'); });
This is portable enough for most applications, since the character set is usually implemented following the ASCII conventions, as explained in §2.3/14:
The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.
The complexity of the above algorithm is O(n). The alternative (check every character to be one in the character range with k characters) is O(n*k), but at least you can be sure it's not implementation defined.
If you're sure the used character set on your platform(s) is ASCII, you can use something like :
if (std::all_of(name.begin(), name.end(), [](char c){return ((c >= 'a') && (c <= 'n'));}) ) {
// name contains only characters between 'a' and 'n' inclusive
}
Otherwise, something like this should do the trick :
if (name.find_first_not_of("abcdefghijklmn") == std::string::npos) {
// name contains only characters between 'a' and 'n' inclusive
}
An old fashioned portable method:
bool is_in_range(char range_start, char range_end, char c)
{
static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
unsigned int start_position = 0;
unsigned int end_position = 0;
unsigned int character_position = 0;
c = std::tolower(c);
for (unsigned int i = 0; i < sizeof(alphabet) - 1; ++i) // -1 skips the terminating '\0'
{
if (range_start == alphabet[i])
{
start_position = i;
}
if (range_end == alphabet[i])
{
end_position = i;
}
if (c == alphabet[i])
{
character_position = i;
}
}
bool result = false;
if (end_position <= start_position)
{
result = false;
}
else
{
if ((character_position >= start_position) && (character_position <= end_position))
{
result = true;
}
}
return result;
}
Loop through the string, check every character, and see if it stays between a and n (inclusive) using str[i]>='a' and str[i]<='n'.
For a contiguous range of characters you can:
_Bool isbetween(int c, int start, int end){ // inclusive: start <= c <= end
return ((unsigned)c-start <= (unsigned)(end-start));
}
To account for case, use tolower() and the lower case range:
static inline int tolower(int c){
return c | ( ((unsigned)c-'A' < 26)<<5 );
}
//isbetween(tolower(x),'a','n');
For a non-contiguous range, you may need to create a mask. In this example, I will check for vowels (for brevity because there are only 5, but any combination in a range of 32 could be used or 64 with some modifications ...
in fact, a 64 bit mask on a 64 bit platform would eliminate the need for case handling).
static const unsigned vowel_mask = (1<<('a'-'a'))
|(1<<('e'-'a'))|(1<<('i'-'a'))|(1<<('o'-'a'))|(1<<('u'-'a'));
int isvowel(int c){ //checks if c is a,A,e,E,i,I,o,O,u,U
unsigned x = (c|32)-'a';
return ((x<32)<<x)&vowel_mask;
}
Note that these implementations contain no branches; however, the use of unsigned comparison may prevent automatic compiler vectorization (Intel intrinsics don't have an unsigned compare). If that is your goal, you can use two &ed comparisons instead. This method may or may not work on non-ASCII systems, depending on the separation distance of the characters.
GCC
isvowel:
or edi, 32 # tmp95,
xor eax, eax # tmp97
sub edi, 97 # x,
cmp edi, 31 # x,
setbe al #, tmp97
shlx eax, eax, edi # tmp99, tmp97, x
and eax, 1065233 # tmp96,
ret
Clang
isvowel: # #isvowel
or edi, 32
add edi, -97
mov eax, 32
xor ecx, ecx
cmp edi, eax
setb cl
shlx eax, ecx, edi
and eax, 1065233
ret
ICC
isvowel:
xor eax, eax #15.26
or edi, 32 #14.23
add edi, -97 #14.27
cmp edi, 32 #15.26
setb al #15.26
shlx eax, eax, edi #15.23
and eax, 1065233 #15.26
ret #15.26
In addition to the standard stackoverflow license, this code is released to the Public Domain
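The 64-bit mask idea mentioned above can be sketched as follows. This is an illustration under the assumption of an ASCII execution character set, where 'A' through 'u' span fewer than 64 code points; the names letter_bit, vowel_mask64 and isvowel64 are hypothetical.

```cpp
#include <cstdint>

// Bit position of a letter relative to 'A' (ASCII assumed).
constexpr std::uint64_t letter_bit(char ch)
{
    return std::uint64_t{1} << (ch - 'A');
}

// One bit per vowel, upper and lower case, so no case folding is needed.
constexpr std::uint64_t vowel_mask64 =
    letter_bit('a') | letter_bit('e') | letter_bit('i') | letter_bit('o') | letter_bit('u') |
    letter_bit('A') | letter_bit('E') | letter_bit('I') | letter_bit('O') | letter_bit('U');

bool isvowel64(int c)
{
    const unsigned x = static_cast<unsigned>(c) - 'A';
    // (x & 63) keeps the shift count in range even when x is out of bounds;
    // the (x < 64) term then discards the result for such inputs.
    return (x < 64) & ((vowel_mask64 >> (x & 63)) & 1);
}
```

Shifting the mask right rather than shifting a 1 left keeps every shift count below the operand width, so the expression avoids out-of-range shifts for arbitrary input.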
bool x = false, y = false, z = true;
if(x || y || z){}
or
if(x | y | z){}
Does the second if statement perform a bitwise "or" operation on all booleans, treating them as if they were bytes? E.g. (0000 | 0000 | 0001) = true...
Or does it act like a Java | on booleans, where it will evaluate every bool in the expression even if the first was true?
I want to know how bitwise operators work on bool values. Is it equivalent to integer bitwise operations?
Efficiency depends. The logical or operator || is a short-circuit operator,
meaning that if x in your example is true it will not evaluate y or z.
If it were a logical and (&&), then if x is false it would not test y or z.
It's important to note that this operation does not exist as an instruction,
so you have to use test and jump instructions. This means branching, which can slow things down, since modern CPUs are pipelined.
But the real answer is that it depends, like many other questions of this nature, as sometimes the benefit of short-circuiting operations outweighs the cost.
In the following extremely simple example you can see that bitwise or | is superior.
#include <iostream>
bool test1(bool a, bool b, bool c)
{
return a | b | c;
}
bool test2(bool a, bool b, bool c)
{
return a || b || c;
}
int main()
{
bool a = true;
bool b = false;
bool c = true;
test1(a,b,c);
test2(a,b,c);
return 0;
}
The following is the intel-style assembly listings produced by gcc-4.8 with -O3 :
test1 assembly :
_Z5test1bbb:
.LFB1264:
.cfi_startproc
mov eax, edx
or eax, esi
or eax, edi
ret
.cfi_endproc
test2 assembly :
_Z5test2bbb:
.LFB1265:
.cfi_startproc
test dil, dil
jne .L6
test sil, sil
mov eax, edx
jne .L6
rep; ret
.p2align 4,,10
.p2align 3
.L6:
mov eax, 1
ret
.cfi_endproc
You can see that it has branch instructions, which mess up the pipeline.
Sometimes however short-circuiting is worth it such as
return x && deep_recursion_function();
Disclaimer:
I would always use logical operators on bools, unless performance really is critical, or perhaps in a simple case like test1 and test2 but with lots of bools.
And in either case first verify that you do get an improvement.
The second acts like Java's | on integers, a bitwise or. As C originally didn't have a boolean type, the if statement reads any non-zero value as true, so you can use it that way, but it is often more efficient to use the short-circuiting operator || instead, especially when calling functions that return the conditions.
I would also like to point out that short-circuit lets you check unsafe conditions, like if(myptr == NULL || myptr->struct_member < 0) return -1;, while using the bitwise or there will give you a segfault when myptr is null.
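The difference in evaluation can be made visible with a small sketch that counts how many operands are actually evaluated. The names touch, calls_with_logical_or and calls_with_bitwise_or are hypothetical. The explicit int casts only make the integer promotion visible (and quiet compiler warnings about using | on bools); the operands would be promoted to int either way.

```cpp
static int calls = 0;

// Hypothetical helper: records that it was evaluated, then returns its argument.
static bool touch(bool v)
{
    ++calls;
    return v;
}

// || stops at the first true operand.
int calls_with_logical_or()
{
    calls = 0;
    const bool r = touch(true) || touch(false) || touch(true);
    (void)r;
    return calls;
}

// | always evaluates every operand.
int calls_with_bitwise_or()
{
    calls = 0;
    const bool r = (int(touch(true)) | int(touch(false)) | int(touch(true))) != 0;
    (void)r;
    return calls;
}
```

Here calls_with_logical_or() evaluates one operand and calls_with_bitwise_or() evaluates all three, which is why only the || form is safe for guards like the null-pointer check above.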