I'm currently digging through the ol' Doom engine source code, and came across an interesting line:
counter = (++counter)&(MAX_VALUE-1);
Looks like a way to increment a counter without going over a certain number, but I have a hard time doing bitwise operations in my head, so I whipped up a quick console project to try this out, and lo and behold, it works beautifully. I assume this way is more efficient than an if() statement, especially if the code is executed rapidly, for example within a loop, where performance in a real-time game is crucial. What I'm trying to figure out is the order of operations the compiler uses to execute this line. If the increment operator '++' is placed after 'counter', the counter always remains zero. It only works if the increment is used as a prefix ("++counter"), but even then, if I write it out with pen and paper, I get the arbitrary result of the bitwise AND operation, not a counter that increments. It seems the bitwise operation is calculated and THEN the increment is performed, but I'm stumped figuring out why. Any ideas?
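While the parentheses have higher precedence than operator ++ or bitwise AND (operator &), precedence does not decide when side effects happen: there is no sequence point between the increment of counter and the assignment to counter in the same expression. So in C (the language Doom is written in), your code exhibits undefined behavior.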
With the side effect of operator++ removed, what this line is intended to do is
(counter + 1)&(MAX_VALUE-1);
If MAX_VALUE is 32, then MAX_VALUE - 1 (31) in binary is
11111
So if you have a larger value than that and use &, any bits to the left of bit 5 (counting from the right) will be cleared:
0000011111 // assume this is MAX_VALUE - 1
1100110110 // assume this is counter + 1
__________
0000010110
The result is just the low five bits of counter + 1, i.e. (counter + 1) modulo 32, so the counter wraps back to 0 instead of reaching 32.
The formula
counter = (++counter)&(MAX_VALUE-1);
has undefined behaviour, see CoryKramer's answer.
The formula
counter = (counter + 1)&(MAX_VALUE - 1);
works only when MAX_VALUE is a power of 2. Only then does MAX_VALUE - 1 have this form in binary:
000...000111...111
When such a value is used in an AND operation, it "truncates" the higher bits of the other value, which has the effect of wrapping around when the other value reaches MAX_VALUE.
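I think the normal modulo operation is just as fast on modern hardware (when MAX_VALUE is a power-of-two constant and counter is unsigned, compilers usually emit the AND for you), and it does not have the restriction mentioned above: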
counter = (counter + 1)%MAX_VALUE;
I've just started the "advanced" stages of the Python 2.7 course on Codecademy and went to the extra effort of writing a function that performs a manual bitwise OR (|) operation.
What I came up with is not the most readable solution (so not so Pythonic)...
def bitwiseOr(bin1, bin2):
    """Perform a bitwise OR of two string representations of binary numbers (e.g. "0b101")"""
    # if the second bit string is longer, swap them so bin1 is the longer one
    if len(bin1) < len(bin2):
        bin1, bin2 = bin2, bin1
    # copy the longer string into a list of characters
    result = list(bin1)
    # offset that right-aligns the digits of bin2 with those of bin1
    offset = len(bin1) - len(bin2)
    # start at the last char of the shorter bit string, move backwards by 1 char,
    # and stop after the first digit (index 2, just past the "0b" prefix)
    for i in range(len(bin2) - 1, 1, -1):
        if bin2[i] == "1":
            result[i + offset] = "1"
    # convert the list back into a string
    return "".join(result)

print bin(0b1110 | 0b101)
print bitwiseOr("0b101", "0b1110")
The two print statements above output the same result, 0b1111 (bin() also returns a string representation of the binary number, so both values are in fact identical strings).
Readability aside, I'm interested to see how it's logically done under the hood by Python internally. Poking around in the CPython repository didn't yield much, even though I think I found the right file (C is very foreign to me).
What I mean by logically is that there are normally a few ways to solve any given problem, and I solved this one by passing the base 2 representations of each number as strings, creating a list based on the longer binary string, and comparing each character, preferring the 1.
How does Python do it internally?
The OR operator is like a parallel electrical circuit with two routes: even if one of the routes is broken, the current still flows through. It only stops when both routes are broken. But you have to be careful with OR operators in Python: although they look simple, the logic has to be thought through carefully, or else you might have a very hard time debugging your code.
This question, to the untrained eye (someone without a computer science background), is about trying to understand the C implementation of Python's bitwise OR operation.
However, after some more research I've realised that the question, to the trained eye, might seem absurd, since it is actually the processor itself which understands this type of operation.
What this means is that the internal logic of Python's bitwise OR ultimately relies on the OR instruction available on the CPU, rather than on logic written in a higher-level language. (More precisely, CPython stores integers as arrays of fixed-size "digits" in Objects/longobject.c and applies the C | operator to each pair of digits, which the C compiler turns into the native OR instruction.)
So, in summary, Python's bitwise OR operator works directly on the bits (the 1's and 0's) of the machine in question, combining them position by position, which, mind-blowingly, is not dissimilar to how Ancient Egyptians performed multiplication. Whoa.
The problem statement is the following:
Xorq has invented an encryption algorithm which uses bitwise XOR operations extensively. This encryption algorithm uses a sequence of non-negative integers x1, x2, … xn as key. To implement this algorithm efficiently, Xorq needs to find the maximum value of (a xor xj) for given integers a, p and q such that p<=j<=q. Help Xorq implement this function.
Input
First line of input contains a single integer T (1<=T<=6). T test cases follow.
First line of each test case contains two integers N and Q separated by a single space (1<= N<=100,000; 1<=Q<= 50,000). Next line contains N integers x1, x2, … xn separated by a single space (0<=xi< 2^15). Each of next Q lines describe a query which consists of three integers ai,pi and qi (0<=ai< 2^15, 1<=pi<=qi<= N).
Output
For each query, print the maximum value for (ai xor xj) such that pi<=j<=qi in a single line.
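#include <iostream>
using namespace std;

int xArray[100000];  // N <= 100,000

int main()
{
    int t, n, q;
    cin >> t;
    for (int j = 0; j < t; j++)
    {
        cin >> n >> q;
        int i, a, pi, qi;
        for (i = 0; i < n; i++)
        {
            cin >> xArray[i];
        }
        for (i = 0; i < q; i++)
        {
            cin >> a >> pi >> qi;
            // brute force: scan the whole 1-based range [pi, qi] for every query
            int max = 0;
            for (int it = pi - 1; it < qi; it++)
            {
                int v = xArray[it] ^ a;
                if (v > max)
                    max = v;
            }
            cout << max << "\n";
        }
    }
    return 0;
}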
No other assumptions may be made except for those stated in the text of the problem (numbers are not sorted).
The code is functional but not fast enough; is reading from stdin really that slow or is there anything else I'm missing?
XOR flips bits. The maximum possible XOR result is all ones; for the 15-bit values in this problem that is 0b111111111111111 (2^15 - 1).
To get the best result:
if 'a' has a 1 in the ith place, you have to XOR it with a key whose ith bit is 0
if 'a' has a 0 in the ith place, you have to XOR it with a key whose ith bit is 1
Put simply: for bit B you need !B.
Another obvious thing is that higher order bits are more important than lower order bits.
That is:
if 'a' has B in the highest place and you have found a key whose highest bit is !B,
then ALL keys whose highest bit is B are worse than this one.
This cuts the number of candidate keys in half, on average.
How about building a huge binary tree from all the keys, ordering them in the tree by their bits from MSB to LSB? Then, walking through A bit by bit from MSB to LSB tells you which branch (left or right) to take next to get the best result. Of course, that ignores the PI/QI limits, but it surely gives you the best result, since you always pick the best available bit at the i-th level.
Now, if you annotate the tree nodes with the low/high index ranges of their subelements (done only once, when building the tree), then later, when querying a case A-PI-QI, you can use them to filter out branches that do not fall in the index range.
The point is that if you order the tree levels in MSB->LSB bit order, the decision made at the "upper nodes" guarantees that you are currently in the best possible branch, and this holds even if all the subbranches turn out to be the worst:
Being at level 3, the result of
0b111?????
can be then expanded into
0b11100000
0b11100001
0b11100010
and so on, but even if the ????? are expanded poorly, the overall result is still greater than
0b11011111
which would be the best possible result if you had picked the other branch at level 3.
I have absolutely no idea how much preparing the tree would cost, but querying it for an A-PI-QI with 32-bit values seems to take something like 32 rounds of comparisons and jumps (one per bit level), certainly faster than iterating up to 100000 times and xor/maxing. And since you have up to 50000 queries, building such a tree can actually be a good investment, since it is built only once per keyset.
Now, the best part is that you actually don't need the whole tree. You may build one from, e.g., only the first two, four or eight bits, and use the index ranges from the nodes to limit your xor-max loop to a smaller part of the range. At worst, you'd end up with the same range as PI-QI. At best, it'd be down to one element.
But, looking at the maximum N, I think the whole tree might actually fit in memory, and you may get away without any xor-maxing loop.
I've spent some time googling this problem, and it seems that it appears in various programming competitions. While the brute-force approach is intuitive, it does not really solve the challenge, as it is too slow.
There are a few constraints in the problem which you need to exploit in order to write a faster algorithm:
the input consists of at most 100k numbers, but there are only 32768 (2^15) possible values
for each input array there are Q (at most 50k) test cases; each test case consists of 3 values, a, pi and qi. Since 0<=a<2^15 and there can be 50k cases, the same value of a may come up again (and must, whenever Q > 32768).
I've found 2 ideas for solving the problem: splitting the input into sqrt(N) intervals and building a segment tree (a nice explanation of these approaches can be found here).
The biggest problem is that each test case can have a different value of a, which makes previous results useless, since max(a^x[i]) has to be recomputed for every new a. However, when Q is large enough that values of a repeat, reusing previous results becomes possible.
I will come back with the actual results once I finish implementing both methods
I need to find out whether a pixel in an image has the maximum value compared with the 8 pixels around it.
I am not sure what the best way is, so my idea is to use an if statement like this:
if(pixel > pixel1 && pixel > pixel2 && pixel > pixel3 && ... && pixel> pixel8)
My question is the following: if it finds that, for instance, pixel is not bigger than pixel1, will it still check the rest of the condition, or, since it's only ANDs, will it already discard it and move on?
And if the answer is the former, that would make it very computationally heavy to check each pixel every time; can somebody give me a hint as to how to approach this simple problem more efficiently?
This is called Short Circuit Evaluation.
the second argument is only executed or evaluated if the first argument does not suffice to determine the value of the expression
Since the condition is &&, it will NOT check further if it gets a false in any of the conditions.
Similarly if the condition were ||, it would stop checking once it finds a true.
Btw, I am not absolutely certain of the precedence rules, and because of that I would surround each condition in parentheses just to be safe.
if((pixel > pixel1) && (pixel > pixel2) && ...
Edit: Operator precedence rules seem to indicate that the parentheses in this case are unnecessary.
No, it won't check the rest of the conditions. C++ "short-circuits" the logical operators, ignoring the second operand of && if the first is false (and ignoring the second operand of || if the first is true).
The operators && and || are so-called 'short circuit operators' in C++ (and in most other languages as well). This means that evaluation will stop as soon as the result can be determined. For &&, this means that evaluation of other terms will stop if one term is false, because then the answer is false, independent of the other terms. Conversely, for || this means that evaluation of other terms will stop if one term is true.
See also this link.
Think of it not as a series but as a grouping of expressions: && has just a left and a right side, and is left-associative.
If the left-hand side evaluates to false, the standard guarantees that the right-hand side is not evaluated. The right-hand side might even contain something that would cause an access violation if it were evaluated (and often does), e.g. checking on the left side that a pointer is non-null, then dereferencing it on the right.
Your operation is O(N) at worst. If you do this once, it is the optimal way; if you are going to do this a lot, you'd be better off finding the max value of your pixels first and then just checking against that one.
C++ has "short-circuit" evaluation: with &&, if the first condition is false, the second condition is not checked.
For example, if pixel > pixel1 evaluates to false, the remaining conditions are skipped.
For details, see "Short-circuit evaluation".
While short-circuit evaluation has been explained in other answers, it's worth pointing out that comparing two pixels may not be completely trivial. For example, you may wish to add the red, green and blue values after multiplying each by a weighting factor (as the human eye is more sensitive to some colours than others). In that case, unless you store the overall pixel value inside the object being compared (which costs memory both for that value and for somehow tracking when it's invalidated, plus CPU time to check and regenerate it when necessary), you'll have to repeat this calculation in every one of those comparisons. To avoid this, you might, for example, add a "get_brightness()" function that returns a user-defined type that can be compared efficiently with each of the other pixels.
I have this line of code:
base_num = (arr[j]/base)%256;
This line runs in a loop, and the "/" and "%" operations take a lot of resources and time to perform. I would like to change this line to use bit operations in order to maximize the program's performance. How can I do that?
Thanks.
If base is the nth power of two, you can replace division by it with a right shift by n bits. Then, since taking an integer mod 256 is equivalent to keeping its last 8 bits, you can AND the result with 0xFF. Alternatively, you can reverse the order of the operations: AND with 256*base - 1, then shift right by n. (This holds for unsigned or non-negative values; for negative signed values, / and % round toward zero and give different results.)
base_num = arr[j] >> n;
base_num &= 0xFF;
Of course, any half-decent compiler should be able to do this for you.
Add -O1 or greater to your compiler options and the compiler will do it for you.
In gcc, -O1 turns on -ftree-slsr which is, according to the docs,
Perform straight-line strength reduction on trees. This recognizes related expressions involving multiplications and replaces them by less expensive calculations when possible.
This will replace the modulo, and the division by base if base is a constant. However, if you know that base will be some non-constant power of two, you can refactor the surrounding code to give you the log2 of that number and shift right (>>) by that amount.
You could also just declare base_num as an 8 bit integer:
#include <stdint.h>

int main(void)
{
    uint8_t base_num;
    uint16_t crap;
    crap = 0xFF00;
    base_num = crap;  /* conversion to uint8_t keeps only the low 8 bits */
    return base_num;
}
If your compiler is standards-compliant, it will put the value of byte(0xFF00), i.e. 0x00, into base_num: conversion to an unsigned type keeps the value modulo 2^8.
I have yet to meet a compiler that does saturating arithmetic in plain C (or in C++ or C#); if one did, since 0xFF00 is greater than 0xFF, it would put sat_byte(0xFF00), i.e. 0xFF, into base_num.
Keep in mind that your compiler may warn you about a loss of precision in this case, and it may even error out (Visual Studio does with Treat Warnings as Errors on). If that happens, you can just do:
base_num = (uint8_t)crap;
but this seems like what you are trying to avoid.
It seems that what you are trying to do is remove the modulus operator, since it requires a division, and division is the most costly basic arithmetic operation. I generally would not think of this as a bottleneck at all, as any "intelligent" compiler (even in debug mode) would "optimize" it to:
base_num = crap & 0xFF;
on a supported platform (every mainstream processor I've heard of: x86, AMD64, ARM, MIPS), which should be any. I would be dumbfounded to hear of a processor that has no basic AND and OR instructions.
Suppose I have
x &(num-1)
where x is an unsigned long long and num a regular int and & is the bitwise and operator.
I'm getting a significant speed reduction as the value of num increases. Is that normal behavior?
These are the other parts of the code that are affected:
int* hash = new int[num];
I don't think that the bitwise operation is slowing down, I think you're using it a lot more times. And probably it isn't even the bitwise operation that's taking too long, but whatever else you're also doing more times.
Use a profiler.
If you're executing the code in a tight loop, it's wholly possible that you'll see performance drop the higher num gets. I'm guessing that your C++ compiler isn't able to find a native instruction to perform the & on an unsigned long long; if you're getting a slowdown for each power of two, then I'd expect that the code generated for the & repeatedly "divides num" by 2 until it's zero, performing the AND bit by bit.
Another possibility is that the CPU you're running on is lame and doesn't perform AND in a fixed number of cycles.
Problem solved. It had to do with the CPU cache and locality.