Apart from this code being horribly inefficient, is the way I'm writing the recursive function here considered "good style"? For example, what I am doing: creating a wrapper and then passing it the int mid and a counter int count.
What this code does is get a value from the array and then see if that, combined with blockIndex, is greater than or equal to mid. So, apart from being inefficient, would I get a job writing recursive functions like this?
int NumCriticalVotes :: CountCriticalVotesWrapper(Vector<int> & blocks, int blockIndex)
{
    int indexValue = blocks.get(blockIndex);
    blocks.remove(blockIndex);
    int mid = 9;

    return CountCriticalVotes(blocks, indexValue, mid, 0);
}

int NumCriticalVotes :: CountCriticalVotes(Vector<int> & blocks, int blockIndex, int mid, int counter)
{
    if (blocks.isEmpty())
    {
        return counter;
    }

    if (blockIndex + blocks.get(0) >= mid)
    {
        counter += 1;
    }

    Vector<int> rest = blocks;
    rest.remove(0);
    return CountCriticalVotes(rest, blockIndex, mid, counter);
}
This is valid to the extent that it'll work for sufficiently small collections.
It is, however, quite inefficient -- for each recursive call you're creating a copy of the entire uncounted part of the Vector. So, if you count a vector containing, say, 1000 items, you'll first create a Vector of 999 items, then another of 998 items, then another of 997, and so on, all the way down to 0 items.
This would be pretty wasteful by itself, but it seems to get even worse. You're then removing an item from your Vector -- but you're removing the first item. Assuming your Vector is something like std::vector, removing the last item takes constant time, but removing the first item takes linear time -- i.e., to remove the first item, each item after it is shifted "forward" into the vacated spot.
This means that instead of taking constant space and linear time, your algorithm is quadratic in both space and time. Unless the collection involved is extremely small, it's going to be quite wasteful.
Instead of creating an entire new Vector for each call, I'd just pass around offsets into the existing Vector. This will avoid both copying and removing items, so it's pretty trivial to make it linear in both time and space (which is still well short of optimum, but at least not nearly as bad as quadratic).
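Something like this minimal sketch is what I mean (shown with std::vector and an explicit offset; the names are placeholders, and adapting it to the Vector class you're using should be straightforward):

#include <cstddef>
#include <vector>

// Sketch: count entries whose value plus blockIndex reaches mid, recursing
// on an offset instead of copying the remaining part of the container.
int CountFromOffset(const std::vector<int> &blocks, std::size_t offset,
                    int blockIndex, int mid)
{
    if (offset == blocks.size())
        return 0;                                        // nothing left to count
    int here = (blockIndex + blocks[offset] >= mid) ? 1 : 0;
    return here + CountFromOffset(blocks, offset + 1, blockIndex, mid);
}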
To reduce the space used still further, treat the array as two halves. Count each half separately, then add the results together. This reduces the recursion depth to logarithmic instead of linear, which is generally quite substantial (e.g., for 1000 items, the depth is about 10 instead of about 1000; for a million items, it goes up to about 20 instead of a million).
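And a sketch of that halving version, under the same assumptions as above:

#include <cstddef>
#include <vector>

// Sketch: count over the half-open range [lo, hi) by splitting it in half,
// so the recursion depth is O(log N) rather than O(N).
int CountRange(const std::vector<int> &blocks, std::size_t lo, std::size_t hi,
               int blockIndex, int mid)
{
    if (lo == hi)
        return 0;                                        // empty range
    if (hi - lo == 1)
        return (blockIndex + blocks[lo] >= mid) ? 1 : 0; // single element
    std::size_t half = lo + (hi - lo) / 2;
    return CountRange(blocks, lo, half, blockIndex, mid)
         + CountRange(blocks, half, hi, blockIndex, mid);
}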
Without knowing exactly what you are trying to accomplish, this is a very tough question to answer. The way I see recursion, or coding in general, is whether it satisfies the following three requirements:
Does it accomplish all desired functionality?
Is it error resilient? Meaning it will not break when passed invalid inputs, or edge cases.
Does it accomplish its goal in sufficient time?
I think you are worried about number 3, and I can say that the time should fit the problem. For example, if you are searching through 2 huge lists, O(n^2) is probably not acceptable. However, if you are searching through 2 small sets, O(n^2) is probably sufficiently fast.
What I can say is: try timing different implementations of your algorithm on test cases. Just because your solution is recursive doesn't mean it will always be faster than a "brute force" implementation. (This of course depends on the specific case.)
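For instance, a rough timing harness using <chrono> could look like this (CountSomething is just a placeholder for whichever implementation you want to measure):

#include <chrono>
#include <iostream>
#include <vector>

// Stand-in for the implementation under test; replace with your own.
int CountSomething(const std::vector<int> &data)
{
    int count = 0;
    for (int v : data) count += v;
    return count;
}

int main()
{
    std::vector<int> data(100000, 1);                    // one test case
    auto start = std::chrono::steady_clock::now();
    int result = CountSomething(data);
    auto stop = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "result=" << result << ", took " << us.count() << " us\n";
}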
To answer your question: as far as recursion goes, this sample looks fine. However, will you get a job writing code like this? I don't know; how well does it satisfy the other two coding requirements?
Very subjective question. The tail recursion is nice (in my book) but I'd balance that against creating a new vector on every call, which makes a linear algorithm quadratic. Independent of recursion, that's a big no-no, particularly as it is easily avoidable.
A few comments about what the code is intended to accomplish would also be helpful, although I suppose in context that would be less problematic.
The issue with your solution is that you're passing the count around. Stop passing the count and use the call stack to keep track of it. The other issue is that I'm not sure what your second condition is supposed to do.
int NumCriticalVotes :: CountCriticalVotesWrapper(Vector<int> & blocks, int blockIndex)
{
    int indexValue = blocks.get(blockIndex);
    blocks.remove(blockIndex);
    int mid = 9;

    return CountCriticalVotes(blocks, indexValue, mid);
}

int NumCriticalVotes :: CountCriticalVotes(Vector<int> & blocks, int blockIndex, int mid)
{
    if (blocks.isEmpty())
    {
        return 0;
    }

    if (/*Not sure what the condition is*/)
    {
        return 1 + CountCriticalVotes(blocks, blockIndex, mid);
    }

    return CountCriticalVotes(blocks, blockIndex, mid);
}
In C++, traversing lists of arbitrary length using recursion is never a good practice. It's not just about performance. The standard doesn't mandate tail call optimization, so you have a risk of stack overflow if you can't guarantee that the list has some limited size.
Sure, with typical stack sizes the recursion depth has to reach several hundred thousand before you actually overflow, but it's hard to know when designing a program what kind of inputs it must be able to handle in the future. The problem might come back to haunt you much later.
In programming, we use many control structures to iterate. So which one is the best way to iterate with respect to time complexity?
int a[500], n = 10;
for (int i = 0; i <= n; i++)
{
    cin >> a[i];
}
How can I change this iteration to achieve less complexity?
Which one is the best way to use for iteration:
for
while
do while
for, while and do-while (and also goto) are really the same thing. No matter what loop you create with one of them, you can always create an equivalent loop with the same time complexity using any of the others. (Almost true. The only exception is that the do-while loop has to run at least once.)
For example, your loop
for(int i=0;i<=n;i++) {
...
}
corresponds to
int i=0;
while(i<=n) {
...
i++;
}
and
int i=0;
START:
...
i++;
if(i<=n)
goto START;
You could make an equivalent do-while too, but it does not really make sense.
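For completeness, it would need an extra guard, precisely because a do-while body always runs at least once:

int i = 0;
if (i <= n) {        // guard, since a do-while body always executes once
    do {
        // ...
        i++;
    } while (i <= n);
}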
Which one you should choose is more a matter of design than performance. In general:
for - When you know the number of iterations before the loop starts
while - When you don't know
do-while - When you don't know, but at least once
goto - Never (Some exceptions exists)
A benefit with for loops is that you can declare variables that only exists within the loop scope and also can be used for the loop condition.
This will iterate from i=0 to i=10, so 11 iterations in total. The time complexity of any basic loop like this is O(N).
All the options mentioned above (for loop, while loop, do-while loop) have the same time complexity.
As always, you should consider caching techniques for such purposes. If you are interested: for and while in fact do the same thing with almost the same instructions (both come down to compare-and-jump instructions). There is no silver bullet; depending on the nature of your program, the only ways to optimize looping are caching or parallelization, if they fit your goals. Maybe there are constant values which are computed only once and used multiple times? Then cache the result if possible; this can reduce the time to 'constant'. Or do it in parallel. But I think that is often not the proper way; the compiler will do many of these things for you. Better to concentrate on the architecture of your program.
The use of for, while, do-while, for-each, etc. could be considered a classic example of syntactic sugar. They're just different ways to do the same thing, but in certain cases some control structures can be "sweeter" than others. For instance, if you want to keep iterating iff (if and only if) a boolean stays true (for instance using an Iterator), a while looks much better than a for (though that's a subjective comment), but the complexity will be the same.
while (iter.next()) {
// Do something
}
for (;iter.next();) {
// Do something
}
In terms of time complexity they're iterating over the same number of elements; in your example N=10, therefore O(N). How can you make it better? It depends. If you have to iterate over the whole array, the big-O best case will always be O(N). Now, in terms of ~N, that statement is not always true. For instance, if you iterate over just half of the array using 2 starting points, one at i=0 and the other at i=n-1, you can achieve a time complexity of ~N/2:
for (int i = 0; i < n/2; i++)
{
    int x = a[i];
    int y = a[n-i-1];
    // Do something with those values
}
In big-O terms it is the same complexity, given that ~N/2 -> O(N), but if you have a set of 10k records, doing just 5k iterations is an achievement! In this last case, what I'm trying to say is that if you want to improve your code's complexity you need to start looking at better data structures and algorithms (this is just a simple, silly example; there are beautiful algorithms and data structures for many cases). Just remember: for or while are not the big prOblems!
I have the following problem: I have a set of N elements (N being somewhere between several hundred and several thousand elements, let's say between 500 and 3000 elements). Out of these elements, a small percentage will have some property "X", but the elements "gain" and "lose" this property in a semi-random fashion; so if I store them all in an array, and assign 1 to elements with property X and zero otherwise, this array of N elements will have n 1's and N-n 0's (n being small, in the 20-50 range).
The problem is the following: these elements change very frequently in a semi-random way (meaning that any element can flip from 0 to 1 and vice versa, but the process that controls that is somewhat stable, so the total number "n" fluctuates a bit, but is reasonably stable in the 20-50 range); and I frequently need all the "X" elements of the set (in other words, indices of the array where value of the array is 1), to perform some task on them.
One simple and slow way to achieve this is to simply loop through the array and, if index k has value 1, perform the task; but this is kinda slow, because well over 95% of all the elements have value 0. The solution would be to put all the 1's into a different structure (with n elements) and then loop through that structure, instead of looping through all N elements. The question is: what's the best structure to use?
Elements will flip from 0 to 1 and vice versa randomly (from several different threads), so there's no ordering of any sort (the time when an element flipped from 0 to 1 has nothing to do with the time it will flip back), and when I loop through them (from another thread), I do not need to loop in any particular order (in other words, I just need to get them all; it's not relevant in which order).
Any suggestions on what would be the optimal structure for this? "std::map" comes to mind, but since the keys of std::map are sorted (and I don't need that feature), the question is whether there is anything faster.
EDIT: To clarify, the array example is just one (slow) way to solve the problem. The essence of the problem is that out of one big set "S" with "N" elements, there is a continuously changing subset "s" of "n" elements (with n much smaller than N), and I need to loop through that set "s". Speed is of the essence, both for adding/removing elements to "s" and for looping through them. So while suggestions like having 2 arrays and moving elements between them would be fast from the iteration perspective, adding and removing elements to an array would be prohibitively slow. It sounds like some hash-based approach like std::unordered_set would work reasonably fast on both the iteration and addition/removal fronts; the question is whether there is something better than that. Reading the documentation on "unordered_map" and "unordered_set" doesn't really clarify how much faster addition/removal of elements is relative to std::map and std::set, nor how much slower the iteration through them would be. Another thing to keep in mind is that I don't need a generic solution that works best in all cases, I need one that works best when N is in the 500-3000 range and n is in the 20-50 range. Finally, speed is really of the essence; there are plenty of slow ways of doing it, so I'm looking for the fastest way.
Since order doesn't appear to be important, you can use a single array and keep the elements with property X at the front. You will also need an index or iterator to the point in the array that is the transition from X set to unset.
To set X, increment the index/iterator and swap that element with the one you want to change.
To unset X, do the opposite: decrement the index/iterator and swap that element with the one you want to change.
Naturally with multiple threads you will need some sort of mutex to protect the array and index.
Edit: to keep a half-open range as iterators are normally used, you should reverse the order of the operations above: swap, then increment/decrement. If you keep an index instead of an iterator then the index does double duty as the count of the number of X.
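For illustration, a minimal single-threaded sketch of that scheme (my own names; it needs a small side table to find where an element currently lives, which the description above leaves implicit, and the mutex is left out):

#include <cstddef>
#include <utility>
#include <vector>

// Sketch only. Slots [0, boundary) hold the ids of elements that currently
// have property X; 'slotOf' tells us where each id currently lives.
class XPartition {
    std::vector<std::size_t> slots;    // a permutation of the ids 0..N-1
    std::vector<std::size_t> slotOf;   // slotOf[id] == current position of id in slots
    std::size_t boundary = 0;          // number of ids that have X
public:
    explicit XPartition(std::size_t n) : slots(n), slotOf(n) {
        for (std::size_t i = 0; i < n; ++i) slots[i] = slotOf[i] = i;
    }
    void setX(std::size_t id) {
        if (slotOf[id] < boundary) return;   // already set
        moveTo(id, boundary);                // swap into the first non-X slot...
        ++boundary;                          // ...then grow the X region
    }
    void unsetX(std::size_t id) {
        if (slotOf[id] >= boundary) return;  // already unset
        --boundary;                          // shrink the X region...
        moveTo(id, boundary);                // ...and swap into the freed slot
    }
    std::size_t countX() const { return boundary; }
    std::size_t xAt(std::size_t k) const { return slots[k]; }  // valid for k < countX()
private:
    void moveTo(std::size_t id, std::size_t target) {
        std::size_t cur = slotOf[id];
        std::size_t other = slots[target];
        std::swap(slots[cur], slots[target]);
        slotOf[id] = target;
        slotOf[other] = cur;
    }
};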
N=3000 isn't really much. If you use a single bit for each element, you have a structure smaller than 400 bytes. You can use std::bitset for that. If you use an unordered_set or a set, however, be mindful that you'll spend many more bytes for each of the n elements in your list: even if you just allocate a pointer for each element on a 64-bit architecture you'll use at least 8*50 = 400 bytes, much more than the bitset.
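For illustration, a sketch of what that looks like (with N fixed at the upper end of the stated range):

#include <bitset>
#include <cstddef>

constexpr std::size_t N = 3000;
std::bitset<N> hasX;                       // roughly N/8 = 375 bytes of flags

void visitX() {
    for (std::size_t i = 0; i < N; ++i)    // still touches all N bits, but that's
        if (hasX.test(i)) {                // only ~375 bytes, which sits in cache
            // do the task for element i
        }
}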
@geza: perhaps I misunderstood what you meant by two arrays; I assume you meant something like having one std::vector (or something similar) in which I store all elements with property X, and another where I store the rest? In reality, I don't care about the others, so I really need only one array. Adding an element is obviously simple if I can just add it to the end of the array; now, correct me if I'm wrong here, but finding an element in that array is an O(n) operation (since the array is unsorted), and then removing it from the array requires shifting all the subsequent elements by one place, so on average this requires n/2 operations. If I use a linked list instead of a vector, then deleting an element is faster, but finding it still takes O(n). That's what I meant when I said it would be prohibitively slow; if I misunderstood you, please do clarify.
It sounds like std::unordered_set or std::unordered_map would be fastest in adding/deleting elements, since it's O(1) to find an element, but it's unclear to me how fast one can loop through all the keys; the documentation clearly states that iteration through the keys of std::unordered_map is slower than iteration through the keys of std::map, but it's not quantified in any way just how slow "slower" is, and how fast "faster" is.
And finally, to repeat one more time, I'm not interested in a general solution, I'm interested in one for small "n". So if, for example, I have two solutions, one that's k_1*log(n) and a second that's k_2*n^2, the first one might be faster in principle (and for large n), but if k_1 >> k_2 (let's say k_1 = 1000, k_2 = 2 and n = 20), the second one can still be faster for relatively small "n" (1000*log(20) is still larger than 2*20^2). So even if addition/deletion in std::unordered_map is done in constant time O(1), for small "n" it still matters whether that constant time is 1 nanosecond or 1 microsecond or 1 millisecond. So I'm really looking for suggestions that work best for small "n", not in the asymptotic limit of large "n".
An alternative approach (in my opinion worth it only if the number of elements increases at least tenfold) might be keeping a double index:
#include <algorithm>
#include <cstddef>
#include <vector>

class didx {
    // v == indexes[i] && v > 0  <==>  flagged[v-1] == i
    std::vector<ptrdiff_t> indexes;
    std::vector<ptrdiff_t> flagged;
public:
    didx(size_t size) : indexes(size) {}

    // loop through flagged items using iterators
    auto begin() { return flagged.begin(); }
    auto end()   { return flagged.end(); }

    void flag(ptrdiff_t index) {
        if (!isflagged(index)) {
            flagged.push_back(index);
            indexes[index] = flagged.size();
        }
    }

    void unflag(ptrdiff_t index) {
        if (isflagged(index)) {
            // in "flagged" we swap the last element with the element to be
            // removed, and update "indexes" accordingly
            auto idx = indexes[index] - 1;
            auto last_element = flagged.back();
            std::swap(flagged.back(), flagged[idx]);
            std::swap(indexes[index], indexes[last_element]);
            // remove the element, which is now last in "flagged"
            flagged.pop_back();
            indexes[index] = 0;
        }
    }

    bool isflagged(ptrdiff_t index) {
        return indexes[index] > 0;
    }
};
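For what it's worth, usage would look something like this:

#include <cstdio>

int main() {
    didx d(3000);            // N possible elements, none flagged initially
    d.flag(7);
    d.flag(1500);
    d.flag(7);               // no effect: already flagged
    d.unflag(1500);
    for (auto idx : d)       // loops over the flagged indices only
        std::printf("%td is flagged\n", idx);
}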
Wikipedia says this about dynamic programming:
In mathematics, computer science, economics, and bioinformatics, dynamic programming is a method for solving a complex problem by breaking it down into a collection of simpler subproblems. It is applicable to problems exhibiting the properties of overlapping subproblems and optimal substructure. When applicable, the method takes far less time than other methods that don't take advantage of the subproblem overlap (like depth-first search).
and also, from Introduction to Algorithms (Cormen), I have learnt that dynamic programming is a method for avoiding the recomputation of subproblems that have already been computed once. In layman's terms,
if you're going to compute something again and again, better store it somewhere.
Applying this to Fibonacci, I could write its algorithm as follows:
int arr[3] = {1, 1, 1}; // first two indices hold values for computation, last one keeps track

int fibbDyn(int n){
    if (n <= 1 || arr[2] > n) return n;  // return on base conditions
    else {
        int prev = fibbDyn(n - 1);       // compute the smaller subproblem first
        int res = arr[0] + prev;
        arr[0] = arr[1];
        arr[1] = res;
        arr[2] += 1;                     // increment the tracker by 1
        return res;
    }
}
While I believe this algorithm follows the idea of dynamic programming, as it reduces the extra computations being done in the original recursive Fibonacci version:
int fibb(int n){
    if (n <= 1) return n;
    else return fibb(n - 1) + fibb(n - 2);
}
as here, due to the two separate calls at each recursive step (else return fibb(n-1) + fibb(n-2)), many computations are repeated.
An iterative solution would probably look like this:
int FibonacciIterative(int n)
{
if (n == 0) return 0;
if (n == 1) return 1;
int prevPrev = 0;
int prev = 1;
int result = 0;
for (int i = 2; i <= n; i++)
{
result = prev + prevPrev;
prevPrev = prev;
prev = result;
}
return result;
}
So my question is , will an iterative solution to Fibonacci problem be classified as dynamic programming?
My reasoning for disagreeing is that an iterative solution doesn't exhibit overlapping subproblems the way a recursive solution does. In an iterative solution, there are no redundant and repetitive computations being made, so it shouldn't be included in dynamic programming.
Relevant articles: optimal substructure, overlapping subproblems, dynamic programming.
Yes. That's just a special case of Bottom Up dynamic programming. You're allowed to discard table entries that you know you will never use again, in the case of Fibonacci that means you only need to keep 2 entries, and then you can forget it was ever a table and just use two named variables. So, it ends up looking different, almost too simple. But the structure of that algorithm is still DP. The overlapping subproblems that you say aren't there are still there, because you use every result twice (once when it's in prev, again when it's in prevPrev), except in the end. Of course there are no redundant computations made, but then that's the idea of DP - remove redundant computation by reuse.
There is a general "plan of attack" for problems that allow dynamic programming, namely
state the problem recursively
(prove that DP can be applied)
identify an ordering of sub-problems such that they are topologically sorted (so computing a solution relies only on trivial solutions and previously-computed solutions, not on future ones)
fill a table iteratively in that order, if there was a nice order. Maybe keep the Top Down structure if the order is annoying.
In the case of Fibonacci, what happened is that the order is trivial (that's not even particularly uncommon, but it makes it look as if we're "not really doing anything special"), and the dependencies never go back more than 2 places, so the only part of the table that has to be remembered is the previous two cells. So applying all that, you get the well-known iterative algorithm. That doesn't mean it's not DP anymore, it means that the DP was extremely successful.
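For contrast, a Top Down (memoized) sketch of the same computation, before the table gets collapsed down to two variables, could look like this (my own naming; -1 marks entries not computed yet):

#include <vector>

// Top Down DP: the same recursion as the naive version, but each subproblem
// is computed once and then reused from the table.
long long fibMemo(int n, std::vector<long long> &memo)
{
    if (n <= 1) return n;
    if (memo[n] != -1) return memo[n];         // already solved: reuse it
    memo[n] = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
    return memo[n];
}

long long fib(int n)
{
    std::vector<long long> memo(n + 1, -1);    // -1 marks "not computed yet"
    return fibMemo(n, memo);
}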
As for the properties (optimal substructure, overlapping subproblems), they're properties of the problem, they don't go away no matter how you decide to solve it. But you can still see them back in the iterative algorithm, as I pointed out in the first paragraph.
On the Wikipedia page on Dynamic Programming, read the section "Dynamic programming in computer programming". It explains the two approaches: Top Down, which falls out of the recursive formulation of the problem, and Bottom Up, in which we iteratively build solutions to bigger problems and store the current solutions in a table. In this case, a table is not required and the job can be done using two variables.
Thus, in the case of the iterative approach, only two variables are used for holding the values of the subproblems, i.e. prev (the (i-1)th value) and prevPrev (the (i-2)th value). Here, prev and prevPrev are used to find the solution of the ith iteration (the bigger problem).
result = prev + prevPrev;
is nothing but the result of the ith iteration, which equals prev (the (i-1)th value) + prevPrev (the (i-2)th value). Thus, reuse of subproblems takes place in the iterative approach too.
This is the Bottom Up approach of dynamic programming, and the recursive approach is the Top Down approach of dynamic programming.
I needed to sort an array of large sized objects and it got me thinking: could there be a way to minimize the number of swaps?
So I used quicksort (but any other fast sort should work here too) to sort indices to the elements in the array; indices are cheap to swap. Then I used those indices to swap the actual objects into their places. Unfortunately this uses O(n) additional space to store the indices. The code below illustrates the algorithm (which I'm calling IndexSort), and in my tests, appears to be faster than plain quicksort for arrays of large sized objects.
#include <algorithm>   // for sort, swap
#include <numeric>     // for iota
#include <vector>
using namespace std;

template <class Itr>
void IndexSort(Itr begin, Itr end)
{
    const size_t count = end - begin;

    // Create indices
    vector<size_t> ind(count);
    iota(ind.begin(), ind.end(), 0);

    // Sort indices
    sort(ind.begin(), ind.end(), [&begin] (const size_t i, const size_t j)
    {
        return begin[i] < begin[j];
    });

    // Create indices to indices. This provides
    // constant time search in the next step.
    vector<size_t> ind2(count);
    for(size_t i = 0; i < count; ++i)
        ind2[ind[i]] = i;

    // Swap the objects into their final places
    for(size_t i = 0; i < count; ++i)
    {
        if( ind[i] == i )
            continue;

        swap(begin[i], begin[ind[i]]);

        const size_t j = ind[i];
        swap(ind[i], ind[ind2[i]]);
        swap(ind2[i], ind2[j]);
    }
}
Now I have measured the swaps (of the large objects) done by both quicksort and IndexSort, and found that quicksort does a far greater number of swaps. So I know why IndexSort could be faster.
But can anyone with a more academic background explain why/how does this algorithm actually work? (it's not intuitive to me, although I somehow came up with it).
Thanks!
Edit: The following code was used to verify the results of IndexSort
#include <algorithm>   // for copy, sort
#include <cstdlib>     // for rand
#include <cstring>     // for memcpy, memcmp
#include <iostream>
#include <vector>
using namespace std;

// A class whose objects will be large
struct A
{
    int id;
    char data[1024];

    // Use the id to compare less than ordering (for simplicity)
    bool operator < (const A &other) const
    {
        return id < other.id;
    }

    // Copy assign all data from another object
    void operator = (const A &other)
    {
        memcpy(this, &other, sizeof(A));
    }
};

int main()
{
    const size_t arrSize = 1000000;

    // Create an array of objects to be sorted
    vector<A> randArray(arrSize);
    for( auto &item: randArray )
        item.id = rand();

    // arr1 will be sorted using quicksort
    vector<A> arr1(arrSize);
    copy(randArray.begin(), randArray.end(), arr1.begin());

    // arr2 will be sorted using IndexSort
    vector<A> arr2(arrSize);
    copy(randArray.begin(), randArray.end(), arr2.begin());

    {
        // Measure time for this
        sort(arr1.begin(), arr1.end());
    }

    {
        // Measure time for this
        IndexSort(arr2.begin(), arr2.end());
    }

    // Check if IndexSort yielded the same result as quicksort
    if( memcmp(arr1.data(), arr2.data(), sizeof(A) * arr1.size()) != 0 )
        cout << "sort failed" << endl;

    return 0;
}
Edit: Made the test less pathological; reduced the size of the large object class to just 1024 bytes (plus one int), and increased the number of objects to be sorted to one million. This still results in IndexSort being significantly faster than quicksort.
Edit: This requires more testing for sure. But it makes me think: what if std::sort could, at compile time, check the object size, and (depending on some size threshold) choose either the existing quicksort implementation or this IndexSort implementation?
Also, IndexSort could be described as an "in-place tag sort" (see samgak's answer and my comments below).
It seems to be a tag sort:
For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.
One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced with one pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem. This procedure is sometimes called "tag sort".
As described above, a tag sort can be used to sort a large array of data that cannot fit into memory. However, even when the data does fit in memory, it still requires fewer memory read/write operations for arrays of large objects, as illustrated by your solution, because the entire objects are not copied each time.
Implementation detail: while your implementation sorts just the indices, and refers back to the original array of objects via the index when doing comparisons, another way of implementing it is to store index/sort key pairs in the sort buffer, using the sort keys for comparisons. This means that you can do the sort without having the entire array of objects in memory at once.
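A sketch of that variant (my own names; the key extractor stands for whatever small field the records are ordered by):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sort (key, original index) pairs instead of the objects themselves; the
// returned order can then be used to read or place the records.
template <class Itr, class KeyFn>
std::vector<std::size_t> SortedOrder(Itr begin, Itr end, KeyFn key)
{
    using Key = decltype(key(*begin));
    const std::size_t count = end - begin;

    std::vector<std::pair<Key, std::size_t>> tagged(count);
    for (std::size_t i = 0; i < count; ++i)
        tagged[i] = { key(begin[i]), i };       // copy only the small key

    std::sort(tagged.begin(), tagged.end());    // compares keys, never the objects

    std::vector<std::size_t> order(count);
    for (std::size_t i = 0; i < count; ++i)
        order[i] = tagged[i].second;            // order[k] = index of k-th smallest
    return order;
}

With the test class from the question, the call would be something like SortedOrder(arr2.begin(), arr2.end(), [](const A &a) { return a.id; }).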
One example of a tag sort is the LINQ to Objects sorting algorithm in .NET:
The sort is somewhat flexible in that it lets you supply a comparison delegate. It does not, however, let you supply a swap delegate. That’s okay in many cases. However, if you’re sorting large structures (value types), or if you want to do an indirect sort (often referred to as a tag sort), a swap delegate is a very useful thing to have. The LINQ to Objects sorting algorithm, for example uses a tag sort internally. You can verify that by examining the source, which is available in the .NET Reference Source. Letting you pass a swap delegate would make the thing much more flexible.
I wouldn't exactly call that an algorithm so much as an indirection.
The reason you're doing fewer swaps of the larger objects is because you have the sorted indices (the final result, implying no redundant intermediary swaps). If you counted the number of index swaps in addition to object swaps, then you'd get more swaps total with your index sorting.
Nevertheless, you're not necessarily bound by algorithmic complexity all the time. Spending the expensive sorting time swapping cheap little indices around saves more time than it costs.
So you have a higher number of total swaps with the index sort, but the bulk of them are cheaper and you're doing far fewer of the expensive swaps of the original object.
The reason it's faster is that your original objects are larger than indices but can't benefit from a move constructor (they don't necessarily store dynamically-allocated data, so moving them is as expensive as copying them).
At this level, the cost of the swap is going to be bound more by the structure size of the elements you're sorting, and this will be practical efficiency rather than theoretical algorithmic complexity. And if you get into the hardware details, that's going to boil down to things like more fitting in a cache line.
With sorting, the amount of computation done over the same data set is substantial. We're doing, at the optimum, O(N log N) compares and swaps, often more in practice. So when you use indices, you make both the swapping and the comparison potentially cheaper (in your case, just the swapping, since you're still using a comparator predicate to compare the original objects).
Put another way, std::sort is O(NLogN). Your index sort is O(N+NLogN). Yet you're making the bigger NLogN work much cheaper using indices and an indirection.
In your updated test case, you're using a very pathological case of enormous objects. So your index sorting is going to pay off big time there. More commonly, you don't have objects of type T where sizeof(T) spans 100 kilobytes. Typically if an object stores data of such size, it's going to store a pointer to it elsewhere and a move constructor to simply shallow copy the pointers (making it about as cheap to swap as an int). So most of the time you won't necessarily get such a big pay off sorting things indirectly this way, but if you do have enormous objects like that, this kind of index or pointer sort will be a great optimization.
Edit: This requires more testing for sure. But it makes me think: what if std::sort could, at compile time, check the object size, and (depending on some size threshold) choose either the existing quicksort implementation or this IndexSort implementation?
I think that's not a bad idea. At least making it available might be a nice start. Yet I would suggest against the automatic approach. The reason I think it might be better to leave it to the side, as a potential optimization the developer can opt into when appropriate, is that there are sometimes cases where memory is more valuable than processing. The indices are going to seem trivial if you create 1-kilobyte objects, but there are a lot of iffy, borderline cases where you might be dealing with something more like 32-64 bytes (ex: a list of 32-byte, 4-component double-precision mathematical vectors). In those borderline cases, this index sort method may still be faster, but the extra temporary memory usage of 2 extra indices per element may actually become a factor (and may occasionally cause a slowdown at runtime depending on the physical state of the environment). Consider the attempt to specialize vector<bool> -- it often creates more harm than good. At the time it seemed like a great idea to treat vector<bool> as a bitset; now it often gets in the way. So I'd suggest leaving it to the side and letting people opt into it, but having it available might be a welcome addition.
This is an interview question I faced recently.
Given an array of 1 and 0, find a way to partition the bits in place so that 0's are grouped together, and 1's are grouped together. It does not matter whether 1's are ahead of 0's or 0's are ahead of 1's.
An example input is 101010101, and output is either 111110000 or 000011111.
Solve the problem in less than linear time.
Make the problem simpler. The input is an integer array, with each element either 1 or 0. Output is the same integer array with integers partitioned well.
To me, this is an easy question if it can be solved in O(N). My approach is to use two pointers, starting from both ends of the array. Advance each pointer toward the middle; if a pointer does not point to the correct value, swap the two.
int * start = array;
int * end = array + length - 1;

while (start < end) {
    // Assume 0 always at the end
    if (*end == 0) {
        --end;
        continue;
    }

    // Assume 1 always at the beginning
    if (*start == 1) {
        ++start;
        continue;
    }

    swap(*start, *end);
}
However, the interviewer insists there is a sub-linear solution. This made me think hard, but I still couldn't get an answer.
Can anyone help on this interview question?
UPDATE: Seeing replies on SO stating that the problem cannot be solved in sub-linear time, I can confirm my original idea that there cannot be a sub-linear solution.
Is it possible the interviewer was playing a trick?
I don't see how there can be a solution faster than linear time.
Imagine a bit array that is all 1's. Any solution will require examining every bit in this array before declaring that it is already partitioned. Examining every bit takes linear time.
It's not possible. Doing it in less than linear time implies that you don't look at every array element (like a binary search). However since there is no way to know what any element of the array is without looking at it, you must look at each array element at least once.
You can use lookup tables to make it faster, but O(n/8) is still O(n), so either the interviewer was wrong or you misunderstood the question.
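For reference, the byte-at-a-time lookup table would look something like the sketch below (my own names); it is still O(n) overall, just with a smaller constant, and once the count of 1's is known the partitioned output is a single pass of writes:

#include <cstddef>
#include <cstdint>

// A 256-entry table of per-byte bit counts, so a packed bit array can be
// counted 8 bits at a time.
static unsigned char popcount8[256];

void InitPopcountTable()
{
    for (int v = 0; v < 256; ++v) {
        unsigned char c = 0;
        for (int b = 0; b < 8; ++b)
            c += (v >> b) & 1;
        popcount8[v] = c;
    }
}

std::size_t CountOnes(const std::uint8_t *bytes, std::size_t len)
{
    std::size_t ones = 0;
    for (std::size_t i = 0; i < len; ++i)
        ones += popcount8[bytes[i]];    // one table lookup per 8 input bits
    return ones;
}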
It is possible to do it faster than linear time, given that you have enough memory; it can be done in O(1).
Use the bitmask as an index into a vector which maps to the partitioned bitmask.
Using your example, at index 341 (101010101) the value 496 (111110000) is stored.
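A sketch of how that table could be precomputed for k-bit masks (the O(1) lookup only works because the O(2^k) time and memory are paid up front):

#include <cstddef>
#include <vector>

// table[mask] = the mask with the same number of 1's, grouped at the top.
std::vector<unsigned> MakePartitionTable(unsigned k)
{
    std::vector<unsigned> table(std::size_t(1) << k);
    for (unsigned mask = 0; mask < table.size(); ++mask) {
        unsigned ones = 0;
        for (unsigned b = 0; b < k; ++b)
            ones += (mask >> b) & 1;
        // 'ones' 1-bits placed in the top positions of a k-bit value
        table[mask] = ((1u << ones) - 1) << (k - ones);
    }
    return table;
}

// With k = 9: MakePartitionTable(9)[341] == 496, i.e. 101010101 -> 111110000.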
Perhaps the confusion comes from "less than linear time". For example, this solution counts the number of set bits, then makes a mask containing that many bits. It only counts bits while there are uncounted on-bits:
// from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
unsigned count_bits(unsigned pX)
{
    unsigned result;
    for (result = 0; pX; ++result)
    {
        pX &= pX - 1;   // clear the lowest set bit
    }
    return result;
}
unsigned n = /* the number */;
// r contains 000...111, with the number of 1's equal to the number of 1's in n
unsigned r = (1 << count_bits(n)) - 1;
Even though this minimizes the number of bits to count, it's still linear. So if this is what is meant by "sub-linear", there you go.
But if they really meant sub-linear as in logarithmic or constant, I don't see a way. You could conceivably make a look-up table for every value, but :/
Technically you could send each element of the array to a separate processor and then do it in less than linear time. If you have N processors, you could even do it in O(1) time!
As others said, I don't believe this can be done in less than linear time. For a linear-time solution, you can use STL algorithms instead of your own loop, like this:
#include <algorithm>   // for std::remove and std::fill

int a1[8] = {1,0,1,0,1,0,1,0};
std::fill(std::remove(a1, a1+8, 0), a1+8, 0);
Well... it can be done in "less than linear" time (cheeky method).
if(n % 2)
{
// Arrange all 1's to the right and DON'T check the right-most bit, because it's 1
}else{
// Arrange all 0's to the right and DON'T check the right-most bit, because it's 0.
}
So, technically you 'group' the bits in less than linear time :P
To me, the most likely interpretations are:
The bits are supposed to be in an int instead of an array, in which case you can use something like http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan or an 8-bit (or more) lookup table.
they used "sublinear" to mean "less than n operations" rather than less-than-O(n). But even that seems impossible for the same reasons listed below.
There is another miscommunication in the question
Otherwise the question is wrong, since all elements of the array must be examined to determine the answer, and that is at least 'n' operations.
Listing either 0s or 1s first, and the references to bits rather than bools make me think something like the first option was intended, even though, when dealing with only one word, it doesn't make very much difference. I'm curious to know what the interviewer actually had in mind.
Splitting this work among parallel processors costs N/M (or O(N)) only if you assume that parallelism increases more slowly than problem size does. For the last ten years or so, parallelism (via the GPU) has been increasing more rapidly than typical problem sizes, and this trend looks set to continue for years to come. For a broad class of problems, it is instructive to assume "infinite parallelism", or more precisely "parallelism greater than any expected problem size", because the march of progress in GPUs and cloud computing provides such a thing over time.
Assuming infinite parallelism, this problem can be solved in O(log N) time, because the addition needed to add up all the 0 and 1 bits is associative, so a parallel reduction completes the count in about log N time steps.
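Under that model the counting step is a standard parallel reduction. As a sketch, in C++17 the whole thing could be expressed as follows (how much real parallelism you actually get depends on the standard library's backend; for example, libstdc++ needs TBB for the parallel policies):

#include <algorithm>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Count the 1's with a parallel reduction (O(log N) depth given enough
// processors, since + is associative), then write the grouped output.
void GroupBits(std::vector<int> &bits)
{
    const std::size_t ones = std::reduce(std::execution::par,
                                         bits.begin(), bits.end(),
                                         std::size_t{0});
    auto split = bits.begin() + static_cast<std::ptrdiff_t>(ones);
    std::fill(std::execution::par, bits.begin(), split, 1);
    std::fill(std::execution::par, split, bits.end(), 0);
}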