I know '&' as the bitwise AND operator and as the operator that takes the memory address of a variable.
What does it do in this line of code?
res = res & (a[i] < a[i+1]);
If it is a bitwise AND, then as far as I know the second condition is always evaluated,
but if I used a logical AND instead, wouldn't the result still be the same?
If the first part is (say) false and the second part is evaluated and comes out true, res still remains false.
Would it be the same (in this case) to use a logical AND? Or does the & operator have some other use here?
int a[] {1,3,4,2};
int pos = 3;
bool res = true;
for(int i = 0; i < pos; i++)
res &= (a[i] < a[i + 1]);
(Sorry for bad english)
If it is a bitwise AND, then as far as I know the second condition is always evaluated, but if I used a logical AND instead, wouldn't the result still be the same?
No. Logical AND (written as && or and) uses short-circuit evaluation: if the left operand is false, the right operand is not evaluated at all. This lets you write code like this:
if( pointer != nullptr && pointer->value > 100 ) ...
Without short-circuit evaluation this code would have undefined behaviour. For example, this code:
if( pointer != nullptr & pointer->value > 100 ) ...
has undefined behaviour when pointer is equal to nullptr, because both operands of & are always evaluated.
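A minimal sketch of the short-circuit guard, assuming a hypothetical Node struct with a value member (neither name comes from the question). With &&, the dereference is only reached when the pointer is non-null:

```cpp
#include <cassert>

// Hypothetical type, used for illustration only.
struct Node {
    int value;
};

// Safe: the right-hand operand is evaluated only if p is not null.
inline bool value_over_100(const Node* p) {
    return p != nullptr && p->value > 100;
}
```

Calling value_over_100(nullptr) simply returns false; the same function written with & instead of && would dereference the null pointer.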
Would it be the same (in this case) to use a logical AND? Or does the & operator have some other use here?
You cannot, as there is no &&= operator in C++. You can write:
res = res && (a[i] < a[i + 1]);
and that would short-circuit as well. The compiler may even be smart enough to stop the loop early, though I doubt it, and in any case the early exit should be expressed explicitly:
bool res = true;
for(int i = 0; res && i < pos; i++)
res = a[i] < a[i + 1];
which does the same thing, but is cleaner and more efficient.
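To make the difference concrete, here is a small sketch that wraps both loops in functions and counts how many comparisons each one performs. The function names and the comparisons counter are assumptions added for illustration; the loop bodies are the ones shown above.

```cpp
#include <cassert>

// Bitwise form from the question: every adjacent pair is compared.
inline bool ascending_bitwise(const int a[], int pos, int& comparisons) {
    bool res = true;
    comparisons = 0;
    for (int i = 0; i < pos; i++) {
        ++comparisons;
        res &= (a[i] < a[i + 1]);
    }
    return res;
}

// Early-exit form: the loop stops at the first out-of-order pair.
inline bool ascending_early_exit(const int a[], int pos, int& comparisons) {
    bool res = true;
    comparisons = 0;
    for (int i = 0; res && i < pos; i++) {
        ++comparisons;
        res = a[i] < a[i + 1];
    }
    return res;
}
```

Both functions return the same result for any input; they only differ in how much work they do once a pair is found to be out of order.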
In any case, when you need a logical (boolean) AND you should use one, to make your intention clear and avoid surprises.
Besides the short-circuiting issue, if res is an integer and res == 2 then:
res & 1 evaluates to 0, which is interpreted as false.
res && 1 evaluates to true.
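A short sketch of that difference for integer operands. The two helper names are hypothetical and exist only to make the two expressions easy to compare:

```cpp
#include <cassert>

// Bitwise AND of the two values, then converted to bool.
inline bool bitwise_and_as_bool(int lhs, int rhs) {
    return (lhs & rhs) != 0;
}

// Logical AND: each operand is first converted to bool.
inline bool logical_and(int lhs, int rhs) {
    return lhs && rhs;
}
```

With lhs = 2 and rhs = 1 the two helpers disagree, because 2 and 1 share no set bits even though both are non-zero. When both operands are already bool, as in the question, the two forms give the same value.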
Your question is not clear.
Okay, let's dive into your code.
Your code is clear enough. You perform a bitwise AND pos (3) times, and on each iteration you compare a[i] with a[i+1]. Note that the loop condition is i < pos, so i only takes the values 0, 1 and 2; on the last iteration i+1 is 3, which is the index of the last element of a[].
So every access is in bounds, and the value of res is well defined for the bitwise AND.
Now consider the logical AND. Since a[] = {1,3,4,2}, the comparison a[i] < a[i+1] is false once, for the pair 4 and 2 (4 > 2). That makes the final value of res false, because a logical AND is false as soon as one of its operands is false.
I think you have got this.
I am new to C++ and I have a problem where I have to translate pseudocode into C++, C or Pascal. The answer at the end of the book is written in Pascal.
The problem with my C++ code is that on line 12 I get the error that can be found in the title. Any idea?
Pascal Code:
var n,x:integer;
begin
n:=0;
repeat
write('x=');read(X);
if x<>0 then
if x mod 5 = 0 then
n:=n+1
else
n:=n-1;
until x=0;
if n=0 then
write('yes')
else
write('no')
end;
My C++ Code:
int main()
{
int x,n;
cin>>x;
while(x>0)
{
if(x>0)
{
if(x%5=0){
n=n+1;
} else {
n=n-1;
}
}
if(n=0){
cout<<"Yes"<<;
} else {
cout<<"No"<<;
}
}
}
You have a simple typo: if(x%5=0){ is an attempt to assign 0 to x % 5 (due to operator precedence modulus is computed before assignment). x % 5 cannot be assigned to (it's not an lvalue) and the compiler is telling you that.
The fix, of course, is to write x % 5 == 0.
You're lucky in this case that the error is picked up at compile time. Something like if (n = 0) (on line 18) might not be, since n = 0 is an expression with value 0.
Two ways to guard against that:
Ensure that your compiler warnings are as aggressive as you can bear. With gcc, I use -Wall -Wextra, and that combination is enough to catch this common problem.
Some developers will write if (0 == x) since an errant if (0 = x) would be picked up at compile time as an attempt to assign to 0. Personally, I find that obfuscating.
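For reference, here is a sketch of how the Pascal program might be translated once the == fixes are applied. It is not the book's answer: the function name balance_of_fives and the choice to read from a std::istream are assumptions made so the logic can be exercised without a keyboard, and the 'x=' prompt is omitted.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Mirrors the Pascal repeat/until loop: reads integers until a 0 is seen,
// adding 1 for every multiple of 5 and subtracting 1 for every other value.
inline std::string balance_of_fives(std::istream& in) {
    int n = 0;  // must be initialised, like n:=0 in the Pascal version
    int x = 0;
    do {
        if (!(in >> x)) {
            break;  // stop on end of input or on malformed input
        }
        if (x != 0) {
            if (x % 5 == 0) {   // comparison, not assignment
                n = n + 1;
            } else {
                n = n - 1;
            }
        }
    } while (x != 0);
    return n == 0 ? "yes" : "no";  // comparison, not assignment
}
```

In a real program this would be called as std::cout << balance_of_fives(std::cin);. Note that the do/while loop reads a new value on every iteration, which the original while loop never did.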
The assignment operator requires an lvalue, meaning the left-hand operand must be a variable or location that can hold a value.
This is what is meant by the error.
What you need in your if statement is most likely ==, not assignment, as mentioned in the other answers.
You need to use == in conditions (while, if, ...) for equality check in C++.
if(x%5 = 0)
should be
if(x%5 == 0)
"x%5" is not an lvalue in that you can not assign a value to it, hence the error.
I have a function that looks like this:
int div_round_up(int x, int y) {
/**
* This function only works for positive divisor and non-negative dividend!!
*/
assert(y > 0 && x >= 0);
if (x == 0)
return 0;
return (x - 1) / y + 1;
}
It won't work with y <= 0 or x < 0. That's ok with me, I can even dynamically check for right values, but I would like to check statically, when someone feeds it wrong values. If I defined x and y as unsigned, they would get silently converted from negative values to huge positive values which would produce erroneous result, so I don't want that. I would like to make compilation fail when someone attempts to feed it negative values like in div_round_up(variable, -7). What should I do?
To verify a number at compile time (which is what static_assert does), it has to be known at compile time. To see why, consider a call like div_round_up(read_integer_from_file(), read_keyboard_character()): the compiler cannot know what those values will be. The obvious drawback of a compile-time check is therefore that you have to know the numbers at compile time.
The easiest way is to make them template parameters, which allows you to leave the implementation of the function (almost) the same:
template<int x, int y>
int div_round_up() {
static_assert(y > 0 && x >= 0, "This function only works for positive divisor and non-negative dividend");
if (x == 0)
return 0;
return (x - 1) / y + 1;
}
It can be called as div_round_up<3, 4>() and will fail the compilation when the static_assert fires.
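An alternative that is not from the answer above, sketched here under the assumption that C++11 constexpr is available: a constexpr function keeps the ordinary call syntax, and when it is evaluated in a constant expression with bad arguments, reaching the throw makes compilation fail. The name div_round_up_checked is hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Single-return form so that it is valid C++11 constexpr.
// In a constant expression, bad arguments reach the throw and the build fails;
// in an ordinary runtime call they raise std::invalid_argument instead.
constexpr int div_round_up_checked(int x, int y) {
    return (y > 0 && x >= 0)
        ? (x == 0 ? 0 : (x - 1) / y + 1)
        : throw std::invalid_argument(
              "positive divisor and non-negative dividend required");
}

// Checked entirely at compile time.
static_assert(div_round_up_checked(7, 2) == 4, "7 / 2 rounds up to 4");
```

The compile-time failure only happens where a constant expression is required, for example constexpr int n = div_round_up_checked(3, -7);. A plain call such as div_round_up_checked(variable, -7) is still only checked at run time.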
If you're using gcc or clang you can define a macro
#define div_round_up(a, b) (__builtin_constant_p(b) ? drus(a, b) : drud(a, b))
and two different functions, where drus includes a static assertion for b while drud does not.
Yes, you can do it with some magic (an unnamed Russian code guru told me this trick):
#define check2(x) typedef char checkVal[(x)?1:-1];
int main() {
check2(3<4);
check2(5<4);
return 0;
}
But this approach has one limitation too: the compiler has to know the value of the expression. In any other case it's impossible (IMHO).
Given a vector X of size L, where every element of X is from the binary set {0,1}, the task is to find the dot product z=dot(X,Y), where the vector Y of size L consists of integer-valued elements. I suspect there must be a very fast way to do it.
Let's say we have L=4; X[L]={1, 0, 0, 1}; Y[L]={-4, 2, 1, 0} and we have to find z=X[0]*Y[0] + X[1]*Y[1] + X[2]*Y[2] + X[3]*Y[3] (which in this case will give us -4).
It is obvious that X can be represented using binary digits, e.g. an integer of type int32 for L=32. Then all we have to do is find the dot product of this integer with an array of 32 integers. Do you have any ideas or suggestions on how to do this very fast?
This really would require profiling but an alternative you might want to consider:
int result=0;
int mask=1;
for ( int i = 0; i < L; i++ ){
if ( X & mask ){
result+=Y[i];
}
mask <<= 1;
}
Typically bit shifting and bitwise operations are faster than multiplication. However, the if statement might be slower than a multiplication, although with branch prediction and large L my guess is that it might be faster. You would really have to profile it to determine whether it results in any speedup.
As has been pointed out in the comments below, unrolling the loop either manually or via a compiler flag (such as "-funroll-loops" on GCC) could also speed this up (eliding the loop condition).
Edit
In the comments below, the following good tweak has been proposed:
int result=0;
for ( int i = 0; i < L; i++ ){
if ( X & 1 ){
result+=Y[i];
}
X >>= 1;
}
Is a suggestion to look into SSE2 helpful? It has dot-product type operations already, plus you can trivially do 4 (or perhaps 8, I forget the register size) simple iterations of your naive loop in parallel.
SSE also has some simple logic-type operations so it may be able to do additions rather than multiplications without using any conditional operations... again you'd have to look at what ops are available.
Try this:
int result=0;
for ( int i = 0; i < L; i++ ){
result+=Y[i] & (~(((X>>i)&1)-1));
}
This avoids a conditional statement and uses bitwise operators to mask the scalar value with either zeros or ones.
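The masking idea can be sketched as a self-contained function. The names bit_mask and dot_branchless are hypothetical, X is assumed to be a 32-bit unsigned word, and L is assumed to be at most 32; the mask is written as a negation, which gives the same all-ones or all-zeros value as the expression above.

```cpp
#include <cassert>
#include <cstdint>

// All bits set (-1) when bit i of X is 1, zero when it is 0.
inline int bit_mask(std::uint32_t X, int i) {
    return -static_cast<int>((X >> i) & 1u);
}

// Sums Y[i] for every set bit i of X without branching on the bit.
inline int dot_branchless(std::uint32_t X, const int Y[], int L) {
    int result = 0;
    for (int i = 0; i < L; ++i) {
        result += Y[i] & bit_mask(X, i);  // Y[i] & -1 == Y[i], Y[i] & 0 == 0
    }
    return result;
}
```

For the example from the question, X = {1, 0, 0, 1} packs to the word 9 (bits 0 and 3 set), and dot_branchless(9, Y, 4) with Y = {-4, 2, 1, 0} yields -4.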
Since size explicitly doesn’t matter, I think the following is probably the most efficient general-purpose code:
int result = 0;
for (size_t i = 0; i < 32; ++i)
result += Y[i] & -X[i];
Bit-encoding X just doesn’t bring anything to the table (even if the loop may potentially terminate earlier, as @Mathieu correctly noted). But omitting the if inside the loop does.
Of course, loop unrolling can speed this up drastically, as others have noted.
This solution is identical to Micheal Aaron's, but slightly faster (by my test):
long Lev=1;
long Result=0;
for (int i=0;i<L;i++) {
if (X & Lev)
Result+=Y[i];
Lev*=2;
}
I thought there was a numerical way to rapidly find the next set bit in a word, which should improve performance if your X data is very sparse, but I currently cannot find that formulation.
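One common formulation, offered as a sketch rather than as what the author had in mind: X & (X - 1) clears the lowest set bit, and a count-trailing-zeros operation gives that bit's index, so the loop runs once per set bit instead of once per position. The function names are hypothetical, and __builtin_ctz is a GCC/Clang builtin; a plain loop is used as a fallback on other compilers.

```cpp
#include <cassert>
#include <cstdint>

// Index of the lowest set bit of a non-zero word.
inline int lowest_set_bit(std::uint32_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctz(x);  // GCC/Clang builtin: count trailing zeros
#else
    int i = 0;
    while ((x & 1u) == 0) {   // portable fallback
        x >>= 1;
        ++i;
    }
    return i;
#endif
}

// Sums Y[i] for every set bit i of X, visiting only the set bits.
inline int dot_sparse(std::uint32_t X, const int Y[]) {
    int result = 0;
    while (X != 0) {
        result += Y[lowest_set_bit(X)];
        X &= X - 1;  // clear the lowest set bit
    }
    return result;
}
```

Whether this beats the straight loop depends on how sparse X really is, so it would need the same profiling as the other variants.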
I've seen a number of responses with bit trickery (to avoid branching), but none got the loop right IMHO :/
Optimizing @Goz's answer:
int result=0;
unsigned x = X; // unsigned, so the loop also works when the top bit of X is set
for (int i = 0; x != 0; ++i, x >>= 1 )
{
result += Y[i] & -(int)(x & 1);
}
Advantages:
no need to do i bit-shifting operations each time (X>>i)
the loop stops sooner if X contains 0 in higher bits
Now, I do wonder if it runs faster, especially since the premature stop of the for loop might not be as easy for loop unrolling (compared to a compile-time constant).
How about combining a shifting loop with a small lookup table?
int result=0;
for ( unsigned x=X; x!=0; x>>=4 ){
switch (x&15) {
case 0: break;
case 1: result+=Y[0]; break;
case 2: result+=Y[1]; break;
case 3: result+=Y[0]+Y[1]; break;
case 4: result+=Y[2]; break;
case 5: result+=Y[0]+Y[2]; break;
case 6: result+=Y[1]+Y[2]; break;
case 7: result+=Y[0]+Y[1]+Y[2]; break;
case 8: result+=Y[3]; break;
case 9: result+=Y[0]+Y[3]; break;
case 10: result+=Y[1]+Y[3]; break;
case 11: result+=Y[0]+Y[1]+Y[3]; break;
case 12: result+=Y[2]+Y[3]; break;
case 13: result+=Y[0]+Y[2]+Y[3]; break;
case 14: result+=Y[1]+Y[2]+Y[3]; break;
case 15: result+=Y[0]+Y[1]+Y[2]+Y[3]; break;
}
Y+=4;
}
The performance of this will depend on how good the compiler is at optimising the switch statement, but in my experience they are pretty good at that nowadays....
There is probably no general answer to this question. You need to profile your code on all the different targets. Performance will depend on compiler optimizations such as loop unrolling, and on the SIMD instructions that are available on most modern CPUs (x86, PPC, ARM all have their own implementations).
For small L, you can use a switch statement instead of a loop. For example, if L = 8, you could have:
int dot8(unsigned int X, const int Y[])
{
switch (X)
{
case 0: return 0;
case 1: return Y[0];
case 2: return Y[1];
case 3: return Y[0]+Y[1];
// ...
case 255: return Y[0]+Y[1]+Y[2]+Y[3]+Y[4]+Y[5]+Y[6]+Y[7];
}
assert(0 && "X too big");
}
And if L = 32, you can write a dot32() function which calls dot8() four times, inlined if possible. (If your compiler refuses to inline dot8(), you could rewrite dot8() as a macro to force inlining.) Added:
int dot32(unsigned int X, const int Y[])
{
return dot8(X >> 0 & 255, Y + 0) +
dot8(X >> 8 & 255, Y + 8) +
dot8(X >> 16 & 255, Y + 16) +
dot8(X >> 24 & 255, Y + 24);
}
This solution, as mikera points out, may have an instruction cache cost; if so, using a dot4() function might help.
Further update: This can be combined with mikera's solution:
static int dot4(unsigned int X, const int Y[])
{
switch (X)
{
case 0: return 0;
case 1: return Y[0];
case 2: return Y[1];
case 3: return Y[0]+Y[1];
//...
case 15: return Y[0]+Y[1]+Y[2]+Y[3];
}
}
Looking at the resulting assembler code with the -S -O3 options with gcc 4.3.4 on CYGWIN, I'm slightly surprised to see that this is automatically inlined within dot32(), with eight 16-entry jump-tables.
But adding __attribute__((__noinline__)) seems to produce nicer-looking assembler.
Another variation is to use fall-throughs in the switch statement, but gcc adds jmp instructions, and it doesn't look any faster.
Edit--Completely new answer: After thinking about the 100 cycle penalty mentioned by Ants Aasma, and the other answers, the above is likely not optimal. Instead, you could manually unroll the loop as in:
int dot(unsigned int X, const int Y[])
{
return (Y[0] & -!!(X & 1u<<0)) +
(Y[1] & -!!(X & 1u<<1)) +
(Y[2] & -!!(X & 1u<<2)) +
(Y[3] & -!!(X & 1u<<3)) +
//...
(Y[31] & -!!(X & 1u<<31)); // 1u avoids shifting into the sign bit of a signed int
}
This, on my machine, generates 32 x 5 = 160 fast instructions. A smart compiler could conceivably unroll the other suggested answers to give the same result.
But I'm still double-checking.
result = 0;
for(int i = 0; i < L ; i++)
if(X[i]!=0)
result += Y[i];
It's quite likely that the time spent to load X and Y from main memory will dominate. If this is the case for your CPU architecture, the algorithm is faster when loading less. This means that storing X as a bitmask and expanding it into L1 cache will speed up the algorithm as a whole.
Another relevant question is whether your compiler will generate optimal loads for Y. This is highly CPU and compiler dependent. But in general, it helps if the compiler can see precisely which values are needed when. You could manually unroll the loop. However, if L is a constant, leave it to the compiler:
template<int I> inline void calcZ(int (&X)[L], int(&Y)[L], int &Z) {
Z += X[I] * Y[I]; // Essentially free, as it operates in parallel with loads.
calcZ<I-1>(X,Y,Z);
}
template< > inline void calcZ<0>(int (&X)[L], int(&Y)[L], int &Z) {
Z += X[0] * Y[0];
}
inline int calcZ(int (&X)[L], int(&Y)[L]) {
int Z = 0;
calcZ<L-1>(X,Y,Z);
return Z;
}
(Konrad Rudolph questioned this in a comment, wondering about memory use. That's not the real bottleneck in modern computer architectures; bandwidth between memory and CPU is. This answer is almost irrelevant if Y is somehow already in cache.)
You can store your bit vector as a sequence of ints where each int packs a couple of coefficients as bits. Then, the component-wise multiplication is equivalent to bit-and. With this you simply need to count the number of set bits which could be done like this:
inline int count(uint32_t x) {
// see link
}
int dot(uint32_t a, uint32_t b) {
return count(a & b);
}
For a bit hack to count the set bits see http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
Edit: Sorry I just realized only one of the vectors contains elements of {0,1} and the other one doesn't. This answer only applies to the case where both vectors are limited to coefficients from the set of {0,1}.
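For the both-binary case this answer covers, the elided count() can be filled in with the well-known parallel bit count. This is a sketch assuming 32-bit words; count_bits and dot_binary are hypothetical names chosen to avoid clashing with the declarations above.

```cpp
#include <cassert>
#include <cstdint>

// Parallel (SWAR) bit count: sums bits in 2-bit, then 4-bit, then 8-bit
// fields, and adds the four byte counts with one multiplication.
inline int count_bits(std::uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    return static_cast<int>((((v + (v >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24);
}

// Dot product of two {0,1} vectors packed one coefficient per bit.
inline int dot_binary(std::uint32_t a, std::uint32_t b) {
    return count_bits(a & b);
}
```

For example, a = {1, 1, 0, 1} packs to 0xB and b = {0, 1, 1, 0} packs to 0x6; they share only position 1, so dot_binary(0xB, 0x6) is 1.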
Represent X using a linked list of the positions where x[i] = 1.
To find the required sum you need O(N) operations, where N is the size of your list.
Well, you want all bits to get through if it's a 1 and none if it's a 0. So you want to somehow turn 1 into -1 (i.e. 0xffffffff) while 0 stays the same. That's just -X, so you do:
Y & (-X)
for each element ... job done?
Edit2: To give a code example you can do something like this and avoid the branch:
int result=0;
for ( int i = 0; i < L; i++ )
{
result+=Y[i] & -(int)((X >> i) & 1);
}
Of course you'd be best off keeping the 1s and 0s in an array of ints and therefore avoiding the shifts.
Edit: It's also worth noting that if the values in Y are 16 bits in size then you can do 2 of these AND operations per operation (4 if you have 64-bit registers). It does mean negating the X values one by one into a larger integer, though.
I.e. YVals = -4, 3 in 16-bit = 0xFFFC, 0x3; put into one 32-bit value and you get 0xFFFC0003. If you have 1, 0 as the X values then you form a bit mask of 0xFFFF0000, AND the two together, and you've got 2 results in 1 bitwise-AND op.
Another edit:
If you want code for the 2nd method, something like this should work (though it takes advantage of unspecified behaviour, so it may not work on every compiler; it works on every compiler I've come across, though).
union int1632
{
int32_t i32;
int16_t i16[2];
};
int result=0;
int i = 0; // declared outside the loop so the odd-length tail below can use it
for ( ; i < (L & ~0x1); i += 2 )
{
int1632 y1632;
y1632.i16[0] = Y[i + 0];
y1632.i16[1] = Y[i + 1];
int1632 x1632;
x1632.i16[0] = -(int16_t)((X >> (i + 0)) & 1);
x1632.i16[1] = -(int16_t)((X >> (i + 1)) & 1);
int1632 res1632;
res1632.i32 = y1632.i32 & x1632.i32; // one AND masks both 16-bit values
result += res1632.i16[0] + res1632.i16[1];
}
if ( i < L ) // handle the last element when L is odd
result+=Y[i] & -(int)((X >> i) & 1);
Hopefully the compiler will optimise out the assignments (off the top of my head I'm not sure, but the idea could be reworked so that they definitely are) and give you a small speed-up, in that you now only need to do 1 bitwise-AND instead of 2. The speed-up would be minor, though.
My team need the "Sobol quasi-random number generator" - a common RNG which is famous for good quality results and speed of operation. I found what looks like a simple C implementation on the web. At home I was able to compile it almost instantaneously using my Linux GCC compiler.
The following day I tried it at work: if I compile in Visual Studio in debug mode it takes about 1 minute. If I compile it in release mode it takes about 40 minutes.
Why?
I know that "release" mode triggers some compiler optimization... but how on earth could a file this small take so long to optimize? It's mostly comments and static-data. There's hardly anything worth optimizing.
None of these PCs are particularly slow, and in any case I know that the compile time is consistent across a range of Windows computers. I've also heard that newer versions of Visual Studio have a faster compile time, however for now we are stuck with Visual Studio.Net 2003. Compiling on GCC (the one bundled with Ubuntu 8.04) always takes microseconds.
To be honest, I'm not really sure the code's that good. It's got a nasty smell to it. Namely, this function:
unsigned int i4_xor ( unsigned int i, unsigned int j )
//****************************************************************************80
//
// Purpose:
//
// I4_XOR calculates the exclusive OR of two integers.
//
// Modified:
//
// 16 February 2005
//
// Author:
//
// John Burkardt
//
// Parameters:
//
// Input, unsigned int I, J, two values whose exclusive OR is needed.
//
// Output, unsigned int I4_XOR, the exclusive OR of I and J.
//
{
unsigned int i2;
unsigned int j2;
unsigned int k;
unsigned int l;
k = 0;
l = 1;
while ( i != 0 || j != 0 )
{
i2 = i / 2;
j2 = j / 2;
if (
( ( i == 2 * i2 ) && ( j != 2 * j2 ) ) ||
( ( i != 2 * i2 ) && ( j == 2 * j2 ) ) )
{
k = k + l;
}
i = i2;
j = j2;
l = 2 * l;
}
return k;
}
There's an i8_xor too. And a couple of abs functions.
I think a post to the DailyWTF is in order.
EDIT: For the non-C programmers, here's a quick guide to what the above does:
function xor i:unsigned, j:unsigned
answer = 0
bit_position = 1
while i <> 0 or j <> 0
if least significant bit of i <> least significant bit of j
answer = answer + bit_position
end if
bit_position = bit_position * 2
i = i / 2
j = j / 2
end while
return answer
end function
To determine if the least significant bit is set or cleared, the following is used:
bit set if i <> (i / 2) * 2
bit clear if i == (i / 2) * 2
What makes the code extra WTFy is that C defines an XOR operator, namely '^'. So, instead of:
result = i4_xor (a, b);
you can have:
result = a ^ b; // no function call at all!
The original programmer really should have known about the xor operator. But even if they didn't (and granted, it's another obfuscated C symbol), their implementation of an XOR function is unbelievably poor.
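As a sanity check, here is a compact restatement of the same bit-by-bit loop, using & 1 and shifts in place of the divide-and-multiply test, so it can be compared against the built-in operator. The name slow_xor is hypothetical and this is not the library's code, only a sketch of the algorithm described above.

```cpp
#include <cassert>

// Bit-by-bit XOR: adds the current bit's weight whenever the
// least significant bits of i and j differ, then moves to the next bit.
inline unsigned slow_xor(unsigned i, unsigned j) {
    unsigned k = 0;  // accumulated answer
    unsigned l = 1;  // weight of the current bit position
    while (i != 0 || j != 0) {
        if ((i & 1u) != (j & 1u)) {
            k += l;
        }
        i >>= 1;
        j >>= 1;
        l <<= 1;
    }
    return k;
}
```

For any pair of unsigned values, slow_xor(a, b) and a ^ b give the same result; the operator just does it in a single instruction instead of a loop of up to 32 iterations.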
I'm using VC++ 2003 and it compiled instantly in both debug/release modes.
Edit:
Do you have the latest service pack installed on your systems?
I would recommend you download a trial edition of Visual Studio 2008 and try the compile there, just to see if the problem is inherent. Also, if it does happen on a current version, you would be able to report the problem, and Microsoft might fix it.
On the other hand, there is no chance that Microsoft will fix whatever bug is in VS2003.