Performance of doing bitwise operations on bitsets - c++

In C++ if I do a logical OR (or AND) on two bitsets, for example:
bitset<1000000> b1, b2;
//some stuff
b1 |= b2;
Does this happen in O(n) or O(1) time? Why?
Also, can this be accomplished using an array of bools in O(1) time?
Thanks.

It has to happen in O(N) time since there is a finite number of bits that can be processed in any given chunk of time by a given processor platform. In other words, the larger the bit-set, the longer the amount of time each operation will take, and the increase will be linear with respect to the number of bits in the bitset.
You also end up with the same problem using the array of bool types. While each individual operation itself will take O(1) time, the total amount of time for N objects will be O(N).

It's impossible to perform a logical operation (e.g. OR or AND) on arbitrary arrays of flags in unit time. True Big-Oh analysis deals with runtime as the size of the data tends to infinity, and a Core i7 is never going to OR together a billion bits in the same time it takes to OR together two bits.

I think it needs to be made clear that Big O is a bound: an asymptotic upper bound (the time required grows no faster than a constant multiple of f(n)), and it states the order of magnitude of the cost of a computation. So think about how an array works: if you can do the operation in one computation or so, or in a known number of steps that is very small and does not depend on N, then it is constant. If you need to iterate in some manner (in this case all the bits need to be examined, and there is no shortcut for bitwise OR), then N bits need to be processed, and therefore it's O(n). [It's actually a tighter bound than that, Theta(n), but we're dealing with just Big O.] An array itself stores N bits.
In fact, few things are really O(1). An index lookup at a known address using a pointer can be O(1) (if you already know what you are looking up). But if you have M things that each need to be looked up in constant time, then the total is M * O(1) = O(M).
This is a consequence of how modern computers work, since most things are processed sequentially (multi-core helps, but doesn't come close to affecting big O notation yet). There is, of course, the ability of the computer to process the bits of a word in parallel, but even that only gains a constant factor: O(n / 64) is still O(n).
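To make the "constant factor" point concrete, here is a minimal sketch of a block-wise OR. It is not how any particular std::bitset is implemented; the names blocks_for_bits and or_blocks are made up for illustration, and 64-bit blocks are an assumption. The number of word operations is n / 64 rounded up, which still grows linearly with n.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of 64-bit blocks needed to hold `bits` bits.
std::size_t blocks_for_bits(std::size_t bits) {
    return (bits + 63) / 64;
}

// Hypothetical helper: ORs src into dst one 64-bit block at a time and
// returns how many block-level OR operations were performed.
std::size_t or_blocks(std::vector<std::uint64_t>& dst,
                      const std::vector<std::uint64_t>& src) {
    const std::size_t n = std::min(dst.size(), src.size());
    std::size_t ops = 0;
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] |= src[i];  // 64 bits handled by a single machine operation
        ++ops;
    }
    return ops;
}
```

For a million bits this performs 15625 block operations instead of 1000000 single-bit ones: 64 times fewer, but doubling the number of bits still doubles the work.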

Related

Time complexity for finding if number is power of two

What is the time complexity of this code to find if the number is power of 2 or not.
Is it O(1)?
bool isPowerOfTwo(int x) {
// x is false when x == 0; !(x & (x - 1)) is true when x has at most one bit set
return (x && !(x & (x - 1)));
}
LeetCode 231
Informally it's O(1) because the code takes a bounded amount of time to run. It's not constant time as the runtime does depend on the input (for example, if x is 0, the function returns quickly), but it's bounded.
More formally, it's ambiguous because O(n) is defined for functions parameterized by arbitrarily large n, and here int is limited to 2^31 or 2^63. Often in complexity calculations of real programs this can be ignored (because an array of size 2^31 is very large), but here it's easy to think up numbers out of the range that your function accepts.
In practice, complexity theorists commonly generalize your problem in one of two ways:
Either assume int contains Theta(log n) bits, and arithmetic operations work in O(1) time for Theta(log n) bits (that is, the size of memory cells and registers gets larger as the input size increases). That's sometimes called the "word model".
Or assume that only operations on single bits take O(1) time. That's sometimes called the "bit model".
In the word model, the function is O(1). In the bit model, the function is O(log n).
Note, if you replace int with a big-int type, then your function will certainly be O(log n).
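As a rough illustration of the big-int case, here is a sketch that represents the number as a vector of 64-bit limbs. This representation and the name is_power_of_two are assumptions for the example, not any real big-int library's API. The check has to look at every limb, so the work grows with the number of bits, i.e. O(log n) in the value n.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical big-int: a sequence of 64-bit limbs (limb order does not
// matter for this check). The value is a power of two exactly when one limb
// is a power of two and every other limb is zero.
bool is_power_of_two(const std::vector<std::uint64_t>& limbs) {
    bool seen_nonzero = false;
    for (std::uint64_t limb : limbs) {          // one pass over all limbs
        if (limb == 0) continue;
        // A second non-zero limb, or a limb with more than one bit set,
        // means the whole number has more than one bit set.
        if (seen_nonzero || (limb & (limb - 1)) != 0) return false;
        seen_nonzero = true;
    }
    return seen_nonzero;                        // all-zero is not a power of two
}
```

Using an unsigned limb type also sidesteps the signed overflow that x - 1 would hit for the most negative int in the original function.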
Yes, it is O(1), but the actual time for bitwiseAnd(10^9, 1) and bitwiseAnd(10, 1) need not be the same even though both are O(1). There are 4 basic operations in your expression, which we treat as unit operations. In reality, each of these basic operations also costs on the order of 32 or 64 bit-level operations, since 32 or 64 bits are used to represent a number in most cases. So O(1) here means that the worst case is bounded by a fixed number of 32- or 64-bit operations; since 32 and 64 are both very small values and these operations are performed at the machine level, we do not think much about the time these unit steps take.
Yes, the code's time complexity is O(1), because the running time is bounded by a constant and does not depend on the size of the input.

How to efficiently implement bitwise rotate of an arbitrary sequence?

"The permutation p of n elements defined by an index permutation p(i) = (i + k) mod n is called the k-rotation." -- Stepanov & McJones
std::rotate has become a well known algorithm thanks to Sean Parent, but how to efficiently implement it for an arbitrary sequence of bits?
By efficient, I mean minimizes at least two things, i) the number of writes and ii) the worst-case space complexity.
That is, the input should be similar to std::rotate but bit-wise specific, I guess like this:
A pointer to the memory where the bit sequence starts.
Three bit indices: first, middle and last.
The type of the pointer could be any unsigned integer, and presumably the larger the better. (Boost.Dynamic Bitset calls it the "block".)
It's important to note that the indices may all be offset from the start of a block by different amounts.
According to Stepanov and McJones, rotate on random access data can be implemented in n + gcd(n, k) assignments. The algorithm that reverses each subrange followed by reversing the entire range takes 3n assignments. (However, I agree with the comments below that it is effectively 2n assignments.) Since the bits in an array can be accessed randomly, I assume the same optimal bound applies. Each assignment will usually require two reads because of different subrange block offsets but I'm less concerned about reads than writes.
Does an efficient or optimal implementation of this algorithm already exist out in the open source wild?
If not, how would one do it?
I've looked through Hacker's Delight and Volume 4A of Knuth but can't find an algorithm for it.
Using a vector<uint32_t>, for example, it's easy and reasonably efficient to do the fractional-element part of the rotation in one pass yourself (shift_amount%32), and then call std::rotate to do the rest. The fractional part is easy and only operates on adjacent elements, except at the ends, so you only need to remember one partial element while you're working.
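A minimal sketch of that two-step approach follows. It assumes the bit sequence fills a whole number of 32-bit words, that bit i lives in word i / 32 at bit position i % 32 (least significant bit first), and that the direction matches std::rotate (bit k moves to position 0). The name rotate_bits is made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: after the call, bit i holds what bit (i + k) mod n
// held before, where n = 32 * v.size().
void rotate_bits(std::vector<std::uint32_t>& v, std::size_t k) {
    const std::size_t m = v.size();
    if (m == 0) return;
    k %= m * 32;
    const unsigned r = static_cast<unsigned>(k % 32);  // fractional-word part
    const std::size_t w = k / 32;                      // whole-word part

    if (r != 0) {
        // One pass over adjacent words; only the original first word has to
        // be remembered, because it wraps around into the last word.
        const std::uint32_t first = v[0];
        for (std::size_t j = 0; j + 1 < m; ++j)
            v[j] = (v[j] >> r) | (v[j + 1] << (32 - r));
        v[m - 1] = (v[m - 1] >> r) | (first << (32 - r));
    }

    // The remaining rotation is by whole words, so std::rotate can do it.
    std::rotate(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(w), v.end());
}
```

Handling first and last indices that are not word-aligned would need extra masking at both ends, which this sketch deliberately leaves out.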
If you want to do the whole thing yourself, then you can do the rotation by reversing the order of the entire vector, and then reversing the order of the front and back sections. The trick to doing this efficiently is that when you reverse the whole vector, you don't actually bit-reverse each element -- you just think of them as being in the opposite order. The reversal of the front and back sections is trickier and requires you to remember 4 partial elements while you work.
In terms of writes to memory or cache, both of the above methods make 2N writes. The optimal rotation you refer to in the question takes N, but if you extend it to work with fractional-word rotations, then each write spans two words and it then takes 2N writes. It provides no advantage and I think it would turn out to be complicated.
That said... I'm sure you could get closer to N writes with a fixed amount of register storage by doing m words at a time, but that's a lot of code for a simple rotation, though, and your time (or at least my time :) would be better spent elsewhere.
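For reference, the n + gcd(n, k) bound from the question can be seen in a cycle-based rotation on whole elements. The sketch below is an illustration on a std::vector, not a bit-level implementation; rotate_cycles is a hypothetical name, and std::gcd assumes C++17. It counts assignments, including the copies to and from the temporary.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: rotates v so that the element at index k moves to
// index 0, and returns the number of element assignments performed.
template <typename T>
std::size_t rotate_cycles(std::vector<T>& v, std::size_t k) {
    const std::size_t n = v.size();
    if (n == 0) return 0;
    k %= n;
    if (k == 0) return 0;

    std::size_t assignments = 0;
    const std::size_t cycles = std::gcd(n, k);  // number of independent cycles
    for (std::size_t start = 0; start < cycles; ++start) {
        T tmp = v[start];                        // save the head of the cycle
        ++assignments;
        std::size_t i = start;
        for (;;) {
            std::size_t j = i + k;               // source index (i + k) mod n
            if (j >= n) j -= n;
            if (j == start) break;               // cycle closed
            v[i] = v[j];
            ++assignments;
            i = j;
        }
        v[i] = tmp;                              // drop the saved head in place
        ++assignments;
    }
    return assignments;
}
```

Each cycle has n / gcd(n, k) elements and costs one assignment more than its length, which gives n + gcd(n, k) in total.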

What is the complexity of copying a bitset? Is it the same as that for a bit mask?

I want to understand the difference in performance of a bit mask compared to a bitset. I know copying a bitmask takes O(1), as it is basically represented as just an integer. Is that the same for bitsets as well, where each value is represented by 1 bit, making it the same size as a bitmask? Or will copying a bitset take O(N) time?
I'm trying to measure the usefulness of bitmasking, specifically in the context of competitive programming.
Thanks!
Copying a bitmask isn't constant-time. It's O(n) in the number of bits, just like any other operation that has to touch every element of a structure once.
Generally speaking, a C++ bitset object should behave comparably to a hand-rolled integer bitmask. For instance, operations on a bitset<32> should perform identically to the equivalent bitwise operations on a uint32_t.
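One way to see why copying scales with the number of bits is to look at the object size. The sketch below assumes, as is the case on the mainstream standard library implementations, that std::bitset<N> stores its bits inside the object itself; bitset_bytes is a made-up helper name.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: the number of bytes a plain copy of std::bitset<N>
// has to move.
template <std::size_t N>
constexpr std::size_t bitset_bytes() {
    return sizeof(std::bitset<N>);
}
```

A bitset<32> fits in one or two machine words, so copying it costs about the same as copying an integer mask, while a bitset<1000000> needs at least 125000 bytes and its copy takes proportionally longer.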
When you say that something is O(N), you are talking about its asymptotic complexity. "Asymptotic" is an important word here. It means you are saying that the actual complexity of the thing approaches some linear function of N as N increases without bound.
So, it's important to know what N is. In the case of a bit-mapped set, it's probably the number of unique elements that can be in (or not in) the set. But what is N when you are talking about a data structure that fits in an int? How can N increase without bound in that case?
It doesn't make any sense to talk about the asymptotic complexity of a thing if the thing doesn't scale. An int does not scale. An int is just an int. It doesn't make any sense to say that an operation on an int is O(1) or O(anythingelse) for that matter.

how does IF affect complexity?

Let's say we have an array of 1.000.000 elements and we go through all of them to check something simple, for example if the first character is "A". From my (very little) understanding, the complexity will be O(n) and it will take some X amount of time. If I add another IF (not else if) to check, let's say, if the last character is "G", how will it change complexity? Will it double the complexity and time? Like O(2n) and 2X?
I would like to avoid taking into consideration the number of calculations different commands have to make. For example, I understand that Len() requires more calculations to give us the result than a simple char comparison does, but let's say that the commands used in the IFs will have (almost) the same amount of complexity.
O(2n) = O(n). Generalizing, O(kn) = O(n), with k being a constant. Sure, with two IFs it might take twice the time, but execution time will still be a linear function of input size.
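A small sketch of the situation in the question, written in C++ as an assumption since the question does not name a language. The Tally struct and scan function are made up for illustration; the counter shows that two constant-time checks per element add up to 2n checks.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: how many checks ran and how many succeeded.
struct Tally {
    std::size_t checks;
    std::size_t matches;
};

// Runs two independent constant-time checks on every non-empty string.
Tally scan(const std::vector<std::string>& items) {
    Tally t{0, 0};
    for (const std::string& s : items) {
        if (s.empty()) continue;
        ++t.checks;
        if (s.front() == 'A') ++t.matches;  // first IF: first character
        ++t.checks;
        if (s.back() == 'G') ++t.matches;   // second IF: last character
    }
    return t;
}
```

With n non-empty elements the loop performs exactly 2n checks, so doubling n doubles the work whether there is one IF or two.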
Asymptotic complexity (which is what big-O uses) is not dependent on constant factors, more specifically, you can add / remove any constant factor to / from the function and it will remain equivalent (i.e. O(2n) = O(n)).
Assuming an if-statement takes a constant amount of time, it will only add a constant factor to the complexity.
A "constant amount of time" means:
The time taken for that if-statement for a given element is not dependent on how many other elements there are in the array
So basically if it doesn't call a function which looks through the other elements in the array in some way or something similar to this
Any non-function-calling if-statement is probably fine (unless it contains a statement that goes through the array, which some languages allow)
Thus 2 (constant-time) if-statements called for each element will be O(2n), but this is equal to O(n) (well, it might not really be 2n, more on that in the additional note).
See Wikipedia for more details and a more formal definition.
Note: Apart from not being dependent on constant factors, it is also not dependent on asymptotically smaller terms (terms which remain smaller regardless of how big n gets), e.g. O(n) = O(n + sqrt(n)). And big-O is just an upper bound, so saying it is O(n^9999) would also be correct (though saying that in a test / exam will probably get you 0 marks).
Additional note: The problem when not ignoring constant factors is - what classifies as a unit of work? There is no standard definition here. One way is to use the operation that takes the longest, but determining this may not always be straight-forward, nor would it always be particularly accurate, nor would you be able to generically compare complexities of different algorithms.
Some key points about time complexity:
Theta notation - Exact bound. If the code being analyzed contains a conditional if/else and either part has code whose cost grows with the input size, then an exact bound can't be obtained, since either branch might be taken, and Theta notation is not advisable for such cases. On the other hand, if both branches resolve to constant-time code, then Theta notation is applicable.
Big O notation - Upper bound. If the code has conditionals where either branch might grow with input size n, we assume the maximum, or upper bound, when calculating the time taken by the code; that is, we use Big O for such conditionals assuming we take the path with the greatest time consumption. So the path with the lower time can be assumed to be O(1) in amortized analysis (provided this path has no recursion that may grow with the input size), and we calculate the Big O time complexity for the lengthiest path.
Big Omega notation - Lower bound. This is the minimum guaranteed time that a piece of code can take irrespective of the input. It is useful for cases where the time taken by the code doesn't grow with input size n, but it consumes a significant amount of time k. In these cases, we can use lower-bound analysis.
Note: None of these notations depends on the input being best/avg/worst, and all of them can be applied to any piece of code.
So as discussed above, Big O doesn't care about the constant factors such as k and only sees how time increases with respect to growth in n, in which case here it is O(kn) = O(n) linear.
PS: This post was about the relation of big O and conditionals evaluation criteria for amortized analysis.
It's related to a question I posted myself today.
In your example it depends on whether you can jump from the first to the last element and if you can't then it also depends on the average length of each entry.
If, as you went down through the array, you had to read each full entry in order to evaluate your two if statements, then your order would be O(1,000,000xN), where N is the average length of each entry. If N is variable then it will affect the order. An example would be standard multiplication, where we perform Log(N) additions of an entry which is Log(N) in length, and so the order is O(Log^2(N)) or, if you prefer, O((Log(N))^2).
On the other hand if you can just check the first and last character then N = 2 and is constant so can be ignored.
This is an IMPORTANT point. You have to be careful, though, because how can you decide whether your multiplier can be ignored? For example, say we were doing Log(N) additions of a Log(N/100) number. Just because Log(N/100) is the smaller term doesn't mean we can ignore it. The multiplying factor cannot be ignored if it is variable.

Cost of using std::map with std::string keys vs int keys?

I know that the individual map queries take a maximum of log(N) time. However I was wondering, I have seen a lot of examples that use strings as map keys. What is the performance cost of associating a std::string as a key to a map instead of an int for example ?
std::map<std::string, aClass*> someMap; vs std::map<int, aClass*> someMap;
Thanks!
Analyzing an algorithm's asymptotic performance means identifying the operations that must be performed and the cost each adds to the total. You first need to know which operations are performed, and then evaluate their costs.
Searching for a key in a balanced binary tree (which maps happen to be) requires O( log N ) complex operations. Each of those operations implies comparing the key for a match and following the appropriate pointer (child) if the key did not match. This means that the overall cost is proportional to log N times the cost of those two operations. Following pointers is a constant-time operation, O(1), and the cost of comparing keys depends on the key. For an integer key, comparisons are fast, O(1). Comparing two strings is another story: it takes time proportional to the sizes of the strings involved, O(L) (where I have intentionally used L for the string length instead of the more common N).
When you sum all the costs up, you get that with integers as keys the total cost is O( log N )*( O(1) + O(1) ), which is equivalent to O( log N ). (The O(1) terms get absorbed into the constant that the O notation silently hides.)
If you use strings as keys, the total cost is O( log N )*( O(L) + O(1) ) where the constant time operation gets hidden by the more costly linear operation O(L) and can be converted into O( L * log N ). That is, the cost of locating an element in a map keyed by strings is proportional to the logarithm of the number of elements stored in the map times the average length of the strings used as keys.
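To see the O( log N ) factor directly, one can count how many key comparisons a single lookup makes. The sketch below uses a custom comparator for that; CountingLess and comparisons_for_lookup are hypothetical names, and the exact count depends on the standard library's tree implementation. Each counted call is one string comparison, which itself costs up to O(L).

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical comparator: behaves like std::less<std::string> but counts
// how many times it is invoked.
struct CountingLess {
    std::size_t* calls;
    bool operator()(const std::string& a, const std::string& b) const {
        ++*calls;
        return a < b;  // up to O(L) character comparisons per call
    }
};

// Builds a map with n string keys and returns the number of key
// comparisons made by a single find().
std::size_t comparisons_for_lookup(std::size_t n) {
    std::size_t calls = 0;
    std::map<std::string, int, CountingLess> m(CountingLess{&calls});
    for (std::size_t i = 0; i < n; ++i)
        m.emplace("key" + std::to_string(i), static_cast<int>(i));
    calls = 0;  // ignore the comparisons made while inserting
    m.find("key0");
    return calls;
}
```

For a map with about a thousand keys the count should come out at a few tens at most, not in the hundreds, which is the log N factor; the O(L) factor is hidden inside each comparison.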
Note that the big-O notation is most appropriate to use as an analysis tool to determine how the algorithm will behave when the size of the problem grows, but it hides many facts underneath that are important for raw performance.
As the simplest example, if you change the key from a generic string to an array of 1000 characters you can hide that cost within the constant dropped out of the notation. Comparing arrays of 1000 chars is a constant operation that just happens to take quite a bit of time. With the asymptotic notation that would just be a O( log N ) operation, as with integers.
The same happens with many other hidden costs, such as the cost of creating the elements, which is usually considered a constant-time operation just because it does not depend on the parameters of your problem. The cost of locating a block of memory in each allocation does not depend on your data set, but rather on memory fragmentation, which is outside the scope of the algorithm analysis. The cost of acquiring the lock inside malloc, which guarantees that no two processes are handed the same block of memory, depends on the contention for the lock, which itself depends on the number of processors and processes and on how many memory requests they perform; again, this is outside the scope of the algorithm analysis. When reading costs in big-O notation you must be conscious of what it really means.
In addition to the time complexity from comparing strings already mentioned, a string key will also cause an additional memory allocation each time an item is added to the container. In certain cases, e.g. highly parallel systems, a global allocator mutex can be a source of performance problems.
In general, you should choose the alternative that makes the most sense in your situation, and only optimize based on actual performance testing. It's notoriously hard to judge what will be a bottleneck.
The cost difference will be linked to the difference in cost between comparing two ints versus comparing two strings.
When comparing two strings, you have to dereference a pointer to get to the first chars, and compare them. If they are identical, you have to compare the second chars, and so on. If your strings have a long common prefix, this can slow down the process a bit. It is very unlikely to be as fast as comparing ints, though.
The cost is of course that ints can be compared in true O(1) time, whereas strings are compared in O(n) time (n being the length of the longest shared prefix). Also, the storage of strings consumes more space than that of integers.
Other than these apparent differences, there's no major performance cost.
First of all, I doubt that in a real application, whether you have string keys or int keys makes any noticeable difference. Profiling your application will tell you if it matters.
If it does matter, you could change your key to be something like this (untested):
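#include <string>

class Key {
public:
    unsigned hash;
    std::string s;
    // Returns a negative, zero, or positive value, like strcmp.
    int cmp(const Key& other) const {
        // Compare the hashes first; subtracting unsigned values could wrap.
        if (hash != other.hash)
            return hash < other.hash ? -1 : 1;
        // Equal hashes do not imply equal strings, so compare them fully.
        return s.compare(other.s);
    }
};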
Now you're doing an int comparison on the hashes of two strings. If the hashes are different, the strings are certainly different. If the hashes are the same, you still have to compare the strings because of the Pigeonhole Principle.
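To actually use such a key in a std::map, the type needs an ordering rather than a cmp member. A minimal sketch, assuming std::hash<std::string> as the hash function (the answer does not say which hash to use) and a made-up type name HashedKey:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical key type that caches the hash of its string.
struct HashedKey {
    std::size_t hash;  // declared first, so it is computed before s is moved
    std::string s;

    explicit HashedKey(std::string str)
        : hash(std::hash<std::string>{}(str)), s(std::move(str)) {}

    // Orders by hash first; falls back to the strings only on equal hashes.
    bool operator<(const HashedKey& other) const {
        if (hash != other.hash) return hash < other.hash;
        return s < other.s;
    }
};
```

Usage would look like std::map<HashedKey, int> m; m[HashedKey("apple")] = 1;. Note that such a map iterates in hash order rather than alphabetical order, and every lookup pays for hashing the whole string once, so whether this helps at all is something only profiling can tell.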
In a simple test that just accessed values in two maps with the same number of keys, one with int keys and the other with the same values as string keys, the lookups took 8 times longer with strings.