How to optimize a loop? - C++

I have the following bottleneck function.
typedef unsigned char byte;
void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3)
{
    const byte b1 = 128 - 30;
    const byte b2 = 128 + 30;
    for (const byte * p1 = p1Start; p1 != p1End; ++p1, ++p2, ++p3) {
        *p3 = (*p1 < *p2) ? b1 : b2;
    }
}
I want to replace the C++ code with SSE2 intrinsic functions. I have tried _mm_cmpgt_epi8, but it performs a signed compare, and I need an unsigned compare.
Is there any trick (SSE, SSE2, SSSE3) to solve my problem?
Note:
I do not want to use multi-threading in this case.

Instead of offsetting your unsigned values so that a signed comparison gives the right answer, a slightly more efficient way is to do the following:
use _mm_min_epu8 to get the unsigned min of p1, p2
compare this min for equality with p2 using _mm_cmpeq_epi8
the resulting mask will now be 0x00 for elements where p1 < p2 and 0xff for elements where p1 >= p2
you can now use this mask with _mm_and_si128, _mm_andnot_si128 and _mm_or_si128 to select the appropriate b1/b2 values, as in the sketch below
Note that this needs only two instructions to generate the mask (min + cmpeq), compared with three (two XORs + cmpgt) for the offset + signed comparison approach; the select step is the same in both cases.
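A minimal sketch of this approach (assuming the buffer length is a multiple of 16 bytes; a scalar tail loop, not shown, would handle any remainder):
#include <emmintrin.h> // SSE2
typedef unsigned char byte;
void CompareArraysSSE2(const byte * p1, const byte * p1End, const byte * p2, byte * p3)
{
    const __m128i vb1 = _mm_set1_epi8((char)(128 - 30));
    const __m128i vb2 = _mm_set1_epi8((char)(128 + 30));
    for (; p1 != p1End; p1 += 16, p2 += 16, p3 += 16) {
        __m128i v1   = _mm_loadu_si128((const __m128i *)p1);
        __m128i v2   = _mm_loadu_si128((const __m128i *)p2);
        __m128i vmin = _mm_min_epu8(v1, v2);      // unsigned min
        __m128i mask = _mm_cmpeq_epi8(vmin, v2);  // 0xFF where *p1 >= *p2
        __m128i res  = _mm_or_si128(_mm_and_si128(mask, vb2),
                                    _mm_andnot_si128(mask, vb1));
        _mm_storeu_si128((__m128i *)p3, res);
    }
}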

You can subtract 128 from your numbers (equivalently, XOR each byte with 0x80) and then use _mm_cmpgt_epi8
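As a sketch, the bias can be folded into a small helper (cmpgt_epu8 is a made-up name, not a real intrinsic; XORing with 0x80 is the same as subtracting 128 modulo 256):
#include <emmintrin.h>
// Bias both inputs by 0x80 so the signed compare yields the unsigned ordering.
__m128i cmpgt_epu8(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi8((char)0x80);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}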

Yes, this can be done in SIMD, but it will take a few steps to build the mask.
Ruslik got it right, I think. You want to XOR each element with 0x80 to flip the sense of the signed and unsigned comparisons. _mm_xor_si128 (PXOR) gets you that -- you can create the 0x80 constant as a static char array somewhere before loading it into a SIMD register (or simply use _mm_set1_epi8). Then _mm_cmpgt_epi8 gets you a mask, and you can use a bitwise AND (e.g. _mm_and_si128) to perform a masked move.

If the SSE approaches don't work out for you, you can also improve this code's performance on a multi-core machine by using OpenMP:
void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3)
{
    const byte b1 = 128 - 30;
    const byte b2 = 128 + 30;
    const int n = static_cast<int>(p1End - p1Start);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        p3[i] = (p1Start[i] < p2[i]) ? b1 : b2;
    }
}

Unfortunately, some of the reasoning in the answers above is subtly wrong, so let's work through a small example. Assume a 3-bit word:
unsigned: 4 5 6 7 0 1 2 3 == signed: -4 -3 -2 -1 0 1 2 3 (bits: 100 101 110 111 000 001 010 011)
Note that the min-based method by Paul R works only because _mm_min_epu8 is an unsigned min. A signed min would fail: suppose we want to know if 7 > 2. The value 7 is -1 in signed representation, so signed min(-1, 2) == -1, which would wrongly suggest that 7 is not greater than 2.
The method by Andrey is incorrect. Suppose we want to know if 7 > 2, with a = 7 and b = 2. The value 7 is -1 in signed representation, so the first term (a > b) fails, and the method wrongly concludes that 7 is not greater than 2.
However, the method by BJobnh, as corrected by Alexey, is correct. Just subtract 2^(n-1) from the values, where n is the number of bits. In this case, we would subtract 4 to obtain new corresponding values:
old signed: -4 -3 -2 -1 0 1 2 3 => new signed: 0 1 2 3 -4 -3 -2 -1 == new unsigned 0 1 2 3 4 5 6 7.
In other words, unsigned_greater_than(a,b) is equivalent to signed_greater_than(a - 2^(n-1), b - 2^(n-1)).
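A scalar sanity check of this equivalence for 8-bit values (here n = 8, so the bias is 2^7 = 0x80; the helper name is mine):
#include <cassert>
#include <cstdint>
// Flipping the top bit (subtracting 2^(n-1) mod 2^n) turns an unsigned
// greater-than into a signed greater-than.
static bool unsigned_gt_via_signed(uint8_t a, uint8_t b)
{
    return (int8_t)(a ^ 0x80) > (int8_t)(b ^ 0x80);
}
int main()
{
    for (int a = 0; a < 256; ++a)
        for (int b = 0; b < 256; ++b)
            assert(unsigned_gt_via_signed((uint8_t)a, (uint8_t)b) == (a > b));
    return 0;
}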

Use pcmpeqb, and may the Power be with you.

Related

Bitwise Operators NOT

I encountered a problem with bit arithmetic, specifically bitwise NOT.
If A = 5, then what is ~A?
The binary of 5 is 101; the inverse is 010, which converted to decimal is 0 * 2^2 + 1 * 2^1 + 0 * 2^0 = 2.
But when I test in the IDE, the output is as follows:
System.out.println( ~5 );
Output:
-6
I don't know why. Thanks!!!
If you are using a standard int, then after assigning 5 to A:
int A = 5;
your A is not 101b but 00000000000000000000000000000101b - all 32 bits.
After the NOT operation, which inverts every bit, you get:
A = 11111111111111111111111111111010
and this int value is -6 in the two's-complement representation used by most computers.
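You can see this directly; the two's-complement identity ~x == -x - 1 also predicts ~5 == -6. A minimal C++ demo (Java's int behaves the same way):
#include <bitset>
#include <iostream>
int main()
{
    int a = 5;
    std::cout << std::bitset<32>(a)  << " = " <<  a << '\n';  // ...00000101 = 5
    std::cout << std::bitset<32>(~a) << " = " << ~a << '\n';  // ...11111010 = -6
    return 0;
}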

Using bit wise operators

I'm working on a C++ app on the Windows platform. There's an unsigned char pointer that gets filled with bytes, shown below as decimal values.
unsigned char array[160];
This will have values like this,
array[0] = 0
array[1] = 0
array[2] = 176
array[3] = 52
array[4] = 0
array[5] = 0
array[6] = 223
array[7] = 78
array[8] = 0
array[9] = 0
array[10] = 123
array[11] = 39
array[12] = 0
array[13] = 0
array[14] = 172
array[15] = 51
... and so forth.
I need to take each block of 4 bytes and calculate its decimal value.
For example, for the first 4 bytes the combined hex value is B034 (176 is 0xB0 and 52 is 0x34). I need to convert this to decimal and divide it by 1000.
As you can see, the first 2 bytes of each 4-byte block are always 0, so I can ignore them and take just the last 2 bytes of the block - from the example above, 176 and 52.
There are many ways of doing this, but I want to do it using bitwise operators.
Below is what I tried, but it's not working. Basically I am skipping the first 2 bytes of every 4-byte block.
int index = 0;
for (int i = 0; i <= 160; i++) {
    index++;
    index++;
    float Val = ((Array[index] << 8) + Array[index + 1]) / 1000.0f;
    index++;
}
Since you're processing the array four bytes at a time, I recommend incrementing i by 4 in the for loop. You can also avoid confusion by dropping the unnecessary index variable - you already have i in the loop and can use it directly.
Another thing: prefer bitwise OR over arithmetic addition when "concatenating" numbers, even though the outcome here is identical.
for (int i = 0; i < 160; i += 4) {
    float val = ((array[i + 2] << 8) | array[i + 3]) / 1000.0f;
}
First of all, i <= 160 is one iteration too many.
Second, your incrementation is wrong; for index, you have
Iteration 1:
1, 2, 3
And you're combining 2 and 3 - this is correct.
Iteration 2:
4, 5, 6
And you're combining 5 and 6 - should be 6 and 7.
Iteration 3:
7, 8, 9
And you're combining 8 and 9 - should be 10 and 11.
You need to increment four times per iteration, not three.
But I think it's simpler to start looping at the first index you're interested in - 2 - and increment by 4 (the "stride") directly:
for (int i = 2; i < 160; i += 4) {
    float Val = ((Array[i] << 8) + Array[i + 1]) / 1000.0f;
}
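Putting the pieces together, a self-contained sketch (the function name DecodeBlocks and the vals output buffer are mine, added so the results are actually stored):
#include <cstddef>
// Walk the 160-byte buffer in 4-byte blocks, combine bytes 2 and 3 of each
// block into a 16-bit value, and scale by 1000.
void DecodeBlocks(const unsigned char array[160], float vals[40])
{
    for (std::size_t block = 0; block < 40; ++block) {
        unsigned hi = array[block * 4 + 2];        // e.g. 176 == 0xB0
        unsigned lo = array[block * 4 + 3];        // e.g.  52 == 0x34
        vals[block] = ((hi << 8) | lo) / 1000.0f;  // 0xB034 / 1000 = 45.108
    }
}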

C++ using AND operator in integer expression

I'm reading some source code for designing an octree and found this in the code. I've removed some elements for simplification, but can anyone explain what i & 4 is supposed to evaluate to?
for (int i = 0; i < 8; i++)
{
    float j = i & 4 ? .5f : -.5f;
}
& is the bitwise AND operator.
It performs a bitwise AND of the value stored in i and 4 (0x4).
This isolates the third-least-significant bit, since 2^2 = 4.
The expression in the loop checks whether that bit is set in i and assigns 0.5 to j if it is, or -0.5 if it is not.
I am not sure, but it may evaluate the bitwise AND of i and 4 (binary 100), so any number which has a 1 in its third bit will evaluate to true, otherwise false.
Ex:
5 (101) & 4 (100) = 100 (4), which is different from 0, so it's true
8 (1000) & 4 (100) = 0000 (0), which is false
The & operator in this case is a bitwise AND. Since the second operand is 4, a power of 2, the expression evaluates to 4 when i has bit 2 (the third-least-significant bit) set, and to 0 otherwise.
The for loop takes i from 0 to 7, inclusive. Consider bit representations of i in this range:
0000 - 0
0001 - 1
0010 - 2
0011 - 3
0100 - 4
0101 - 5
0110 - 6
0111 - 7
 ^
 |
This bit determines the result of i & 4
Therefore, the end result of the conditional is as follows: if bit 2 is set (i.e. when i is 4, 5, 6, or 7), the result is 0.5f; otherwise, it is -0.5f.
For the given range of values, this expression can be rewritten as
float j = (i >= 4) ? .5f : -.5f;
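A quick standalone check of this (a small demo, not from the original answer):
#include <iostream>
int main()
{
    for (int i = 0; i < 8; i++) {
        float j = (i & 4) ? .5f : -.5f;
        std::cout << i << " & 4 = " << (i & 4) << " -> j = " << j << '\n';
    }
    return 0;
}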
i & 4 evaluates to nonzero (true) exactly when the bit with value 4 is set. In your case this only happens in the second half of the loop, so the code could actually be rewritten as:
for (int i = 0; i < 4; i++)
{
    float j = -.5f;
}
for (int i = 4; i < 8; i++)
{
    float j = .5f;
}

Why does "number & (~(1 << 3))" not work for 0's?

I'm writing a program that exchanges the values of the bits at positions 3, 4 and 5 with the bits at positions 24, 25 and 26 of a given 32-bit unsigned integer.
So let's say I use the number 15 and I want to turn the 4th bit into a 0; I'd use...
int number = 15;
int newnumber = number & (~(1 << 3));
// output is 7
This makes sense because I'm changing the 4th bit from 1 to 0, so 15 (1111) becomes 7 (0111).
However, this won't work the other way round (changing a 0 to a 1). I know how to achieve that via a different method, but I really want to understand the code in this one.
So why won't it work?
The truth table for x AND y is:
x y Output
-----------
0 0 0
0 1 0
1 0 0
1 1 1
In other words, the output/result will only be 1 if both inputs are 1, which means that you cannot change a bit from 0 to 1 through a bitwise AND. Use a bitwise OR for that (e.g. int newnumber = number | (1 << 3);)
To summarize:
Use & ~(1 << n) to clear bit n.
Use | (1 << n) to set bit n.
To set the fourth bit to 0, you AND the number with ~(1 << 3), which is the complement of 1000 - that is, 0111 (with all higher bits set).
By the same reasoning, you can set it to 1 by ORing with 1000.
To toggle it, XOR with 1000.
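The three operations side by side (a small demo; std::bitset is used only to print the low four bits):
#include <bitset>
#include <iostream>
int main()
{
    int number = 15;  // 1111
    std::cout << std::bitset<4>(number & ~(1 << 3)) << '\n';  // clear bit 3: 0111
    std::cout << std::bitset<4>(number |  (1 << 3)) << '\n';  // set bit 3:   1111
    std::cout << std::bitset<4>(number ^  (1 << 3)) << '\n';  // toggle:      0111
    return 0;
}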

Bitfields in C++

I have the following code for self learning:
#include <iostream>
using namespace std;
struct bitfields{
    unsigned field1: 3;
    unsigned field2: 4;
    unsigned int k: 4;
};
int main(){
    bitfields field;
    field.field1 = 8;
    field.field2 = 17;
    field.k = 18;
    cout << field.k << endl;
    cout << field.field1 << endl;
    cout << field.field2 << endl;
    return 0;
}
I know that unsigned int k:4 means that k is 4 bits wide, so its maximum value is 15. The output is the following:
2
0
1
For example, field1 can be from 0 to 7 (inclusive), and field2 and k from 0 to 15. Why such a result? Maybe it should be all zeros?
You're overflowing your fields. Let's take k as an example: it's 4 bits wide, so it can hold values, as you say, from 0 to 15. In binary representation this is
0 -> 0000
1 -> 0001
2 -> 0010
3 -> 0011
...
14 -> 1110
15 -> 1111
So when you assign 18, having binary representation
18 -> 1 0010 (space added between 4th and 5th bit for clarity)
k can only hold the lower four bits, so
k = 0010 = 2.
The same holds true for the rest of your fields as well.
You got these results because each assignment overflowed its bitfield.
The field field1 is 3 bits wide, but 8 takes 4 bits to represent (1000). The lower three bits are all zero, so field1 is zero.
For field2, 17 is represented by 10001, but field2 is only four bits wide, so the lower four bits represent the value 1.
Finally, for k, 18 is represented by 10010, but k is only four bits wide, so the lower four bits represent the value 2.
I hope that helps clear things up.
In C++, any unsigned type wraps around when you exceed its maximum value[1]. When you define a bitfield of 4 bits, every value you store is wrapped the same way. The possible values for a bitfield of size 4 are 0-15: if you store 17 you wrap around to 1, and for 18 you go one more, to 2.
Mathematically, the wrapped value is the original value modulo the number of possible values for the destination type:
For the bitfield of size 4 (2**4 possible values):
18 % 16 == 2
17 % 16 == 1
For the bitfield of size 3 (2**3 possible values):
8 % 8 == 0.
[1] This is not true for signed types, where the behavior on overflow is not defined.
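The wrap-around rule, checked directly with the widths from the question (a tiny sanity check, not from the original answer):
#include <cassert>
int main()
{
    // stored value == assigned value modulo 2^width
    assert( 8 % (1 << 3) == 0);  // field1 : 3 bits ->  8 wraps to 0
    assert(17 % (1 << 4) == 1);  // field2 : 4 bits -> 17 wraps to 1
    assert(18 % (1 << 4) == 2);  // k      : 4 bits -> 18 wraps to 2
    return 0;
}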