Constant-time "is bool array all 0's?" - C++

I have a boolean array:
bool * arr = new bool[n];
I want to figure out in constant time if there are any 1's in this array; can this be done?
I know bitset has a none() method which does the same thing, but this array needs to be dynamically sized, and boost's dynamic_bitset isn't really an option.
Edited for clarity

Short answer: No. You have to examine half the elements on average, and that takes O(n) time.
Long answer: Yes, if you're prepared to add a little O(1) work to your write operations: keep track of every 0->1 and 1->0 transition with an up/down counter.
Note: I'm assuming the general case of bool *arr = new bool[n];. For a constant-sized array, yes, of course the query will be constant time!

Not out of the box, but you can easily maintain a count of 1s:
increment it whenever 0 turns into 1
and decrement it when 1 turns into 0.
Once you have done that, comparing the count to 0 is a constant-time operation.
So, essentially, you trade slightly degraded write performance in exchange for answering "are there any 1s left?" faster. Whether this is the right tradeoff, only you can know...

If you're able to control access to the array, you could write a wrapper that maintains a count of the number of 1's in the array as items are updated.
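A minimal sketch of such a wrapper (the class and member names here are made up, it is single-threaded, and std::vector<bool> stands in for the raw array):
#include <cstddef>
#include <vector>

// Hypothetical wrapper: keeps a running count of true elements so that the
// "is anything set?" query is O(1), at the cost of O(1) extra work per write.
class FlagArray {
    std::vector<bool> data_;
    std::size_t ones_ = 0;
public:
    explicit FlagArray(std::size_t n) : data_(n, false) {}

    void set(std::size_t i, bool value) {
        if (data_[i] != value) {            // only 0->1 and 1->0 transitions matter
            if (value) ++ones_; else --ones_;
            data_[i] = value;
        }
    }

    bool get(std::size_t i) const { return data_[i]; }
    bool any() const { return ones_ != 0; } // constant-time "are there any 1's?"
};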

Related

Optimal data structure (in C++) for random access and looping through elements

I have the following problem: I have a set of N elements (N being somewhere between several hundred and several thousand elements, let's say between 500 and 3000). Out of these elements, a small percentage will have some property "X", but the elements "gain" and "lose" this property in a semi-random fashion; so if I store them all in an array, and assign 1 to elements with property X and 0 otherwise, this array of N elements will have n 1's and N-n 0's (n being small, in the 20-50 range).
The problem is the following: these elements change very frequently in a semi-random way (meaning that any element can flip from 0 to 1 and vice versa, but the process that controls that is somewhat stable, so the total number "n" fluctuates a bit but is reasonably stable in the 20-50 range); and I frequently need all the "X" elements of the set (in other words, the indices of the array where the value is 1) to perform some task on them.
One simple and slow way to achieve this is to simply loop through the array and, if index k has value 1, perform the task; but this is kinda slow because well over 95% of all the elements have value 0. The solution would be to put all the 1's into a different structure (with n elements) and then loop through that structure instead of looping through all N elements. The question is what's the best structure to use?
Elements will flip from 0 to 1 and vice versa randomly (from several different threads), so there's no order there of any sort (the time when an element flipped from 0 to 1 has nothing to do with the time it will flip back), and when I loop through them (from another thread), I do not need to loop in any particular order (in other words, I just need to get them all, but it's not relevant in which order).
Any suggestions what would be the optimal structure for this? "std::map" comes to mind, but since the keys of std::map are sorted (and I don't need that feature), the question is whether there is anything faster?
EDIT: To clarify, the array example is just one (slow) way to solve the problem. The essence of the problem is that out of one big set "S" with N elements, there is a continuously changing subset "s" of n elements (with n much smaller than N), and I need to loop through that set "s". Speed is of the essence, both for adding/removing elements to "s" and for looping through them. So while suggestions like having 2 arrays and moving elements between them would be fast from the iteration perspective, adding and removing elements to an array would be prohibitively slow. It sounds like some hash-based approach like std::set would work reasonably fast on both the iteration and addition/removal fronts; the question is whether there is something better than that. Reading the documentation on "unordered_map" and "unordered_set" doesn't really clarify how much faster addition/removal of elements is relative to std::map and std::set, nor how much slower the iteration through them would be. Another thing to keep in mind is that I don't need a generic solution that works best in all cases; I need one that works best when N is in the 500-3000 range and n is in the 20-50 range. Finally, speed is really of the essence; there are plenty of slow ways of doing it, so I'm looking for the fastest way.
Since order doesn't appear to be important, you can use a single array and keep the elements with property X at the front. You will also need an index or iterator to the point in the array that is the transition from X set to unset.
To set X, increment the index/iterator and swap that element with the one you want to change.
To unset X, do the opposite: decrement the index/iterator and swap that element with the one you want to change.
Naturally with multiple threads you will need some sort of mutex to protect the array and index.
Edit: to keep a half-open range as iterators are normally used, you should reverse the order of the operations above: swap, then increment/decrement. If you keep an index instead of an iterator then the index does double duty as the count of the number of X.
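A rough, single-threaded sketch of that idea (the struct and member names are made up; note that to flag or unflag a given element you also need to know where it currently sits in the array, which the double-index class further down in this thread makes explicit):
#include <cstddef>
#include <utility>
#include <vector>

// Keeps all flagged elements at the front of 'items'. 'count' is both the
// number of flagged elements and the index of the first unflagged one.
struct PackedFlags {
    std::vector<int> items;   // pre-filled with all N element ids, in any order
    std::size_t count = 0;    // items[0..count) currently have property X

    // 'pos' is the element's current position inside 'items'.
    void set(std::size_t pos) {
        if (pos >= count) {                      // not flagged yet
            std::swap(items[pos], items[count]);
            ++count;
        }
    }

    void unset(std::size_t pos) {
        if (pos < count) {                       // currently flagged
            --count;
            std::swap(items[pos], items[count]);
        }
    }
};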
N=3000 isn't really much. If you use a single bit for each element, you have a structure smaller than 400 bytes; you can use std::bitset for that. If you use an unordered_set or a set, however, be mindful that you'll spend many more bytes for each of the n elements in your list: even if you just allocate a pointer for each element on a 64-bit architecture, you'll use at least 8*50 = 400 bytes, much more than the bitset.
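A minimal sketch of the bitset approach (the size 3008 is just an illustrative upper bound for N):
#include <bitset>
#include <cstddef>

std::bitset<3008> flags;              // covers N <= 3000 in under 400 bytes

void example() {
    flags.set(42);                    // element 42 gains property X
    flags.reset(42);                  // and loses it again
    flags.set(17);

    // Looping over the n set bits still inspects all N positions,
    // but the structure itself is tiny and cache friendly.
    for (std::size_t i = 0; i < flags.size(); ++i)
        if (flags.test(i)) {
            // process element i
        }
}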
#geza : perhaps I misunderstood what you meant by two arrays; I assume you meant something like having one std::vector (or something similar) in which I store all elements with property X, and another where I store the rest? In reality, I don't care about the others, so I really need only one array. Adding an element is obviously simple if I can just add it to the end of the array; now, correct me if I'm wrong here, but finding an element in that array is an O(n) operation (since the array is unsorted), and then removing it from the array again requires shifting all the elements by one place, so on average this requires n/2 operations. If I use a linked list instead of a vector, then deleting an element is faster, but finding it still takes O(n). That's what I meant when I said it would be prohibitively slow; if I misunderstood you, please do clarify.
It sounds like std::unordered_set or std::unordered_map would be fastest at adding/deleting elements, since it's O(1) to find an element, but it's unclear to me how fast one can loop through all the keys; the documentation clearly states that iteration through the keys of std::unordered_map is slower than iteration through the keys of std::map, but it's not quantified in any way just how slow "slower" is, and how fast "faster" is.
And finally, to repeat one more time, I'm not interested in a general solution, I'm interested in one for small "n". So if, for example, I have two solutions, one that's k_1*log(n) and a second that's k_2*n^2, the first one might be faster in principle (and for large n), but if k_1 >> k_2 (let's say, for example, k_1 = 1000, k_2 = 2 and n = 20), the second one can still be faster for relatively small "n" (1000*log(20) is still larger than 2*20^2). So even if addition/deletion in std::unordered_map can be done in constant O(1) time, for small "n" it still matters whether that constant time is 1 nanosecond or 1 microsecond or 1 millisecond. So I'm really looking for suggestions that work best for small "n", not in the asymptotic limit of large "n".
An alternative approach (in my opinion worth it only if the number of elements increases at least tenfold) might be to keep a double index:
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

class didx {
    // v == indexes[i] && v > 0  <==>  flagged[v-1] == i
    std::vector<ptrdiff_t> indexes;
    std::vector<ptrdiff_t> flagged;
public:
    didx(size_t size) : indexes(size) {}

    // loop through flagged items using iterators
    auto begin() { return flagged.begin(); }
    auto end()   { return flagged.end(); }

    void flag(ptrdiff_t index) {
        if (!isflagged(index)) {
            flagged.push_back(index);
            indexes[index] = flagged.size();
        }
    }

    void unflag(ptrdiff_t index) {
        if (isflagged(index)) {
            // in "flagged" we swap the last element with the element to be
            // removed, and update "indexes" accordingly
            auto idx = indexes[index] - 1;
            auto last_element = flagged.back();
            std::swap(flagged.back(), flagged[idx]);
            std::swap(indexes[index], indexes[last_element]);
            // remove the element, which is now last in "flagged"
            flagged.pop_back();
            indexes[index] = 0;
        }
    }

    bool isflagged(ptrdiff_t index) {
        return indexes[index] > 0;
    }
};
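Usage would look roughly like this (the printf format is just for illustration):
#include <cstdio>

int main() {
    didx d(3000);          // N elements, none flagged initially
    d.flag(7);
    d.flag(42);
    d.unflag(7);

    for (auto i : d)       // visits only the currently flagged indexes
        std::printf("%td\n", i);
}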

C++ What is the fastest way to scan for certain elements within a unsigned char array and a unsigned char vector?

I have a small question: what is the FASTEST way to scan for certain elements within a LARGE unsigned char array and a vector that contains only unsigned char elements? A straight answer would be great, but an in-depth, detailed answer would be awesome. What do I mean by fast? Basically, to search for certain characters within at least a second. I know that wasn't a very educated definition...
Note: The array is not sorted.
Common Declaration:
unsigned char* Array = new unsigned char[ 50000 ];
std::vector< unsigned char > Vec( 50000 );
/*
* Fill Array & Vec with random bytes
*/
Let's say I want to search for the letter 'a' in Array; I would simply write this loop to search for it:
Note: The searching process will search for more than one element; potentially all 256 possible values. Hence, you can exploit that magic number.
For loop method:
unsigned int Count = 0;
for ( unsigned int Index = 0; Index != 50000; ++ Index )
    if ( Array[ Index ] == 'a' ) Count++;
std::count method:
unsigned int Count = std::count ( Array, Array + 50000, 'a' );
Are there any faster way to search for certain elements within Array?
Some IDEAS - Please don't give me a thumbs down for this! It's only an idea. I want some opinions.
Sorting
Would the speed be better if we made a copy of Array and sorted it? Why make a copy? Well, because we need to keep the original content. The goal is basically to scan and count the occurrences of a character. Remember, speed matters. That means the copying process must be fast.
Answer: No, and it's not worth it!
Why? Well, let's read this:
#Kiril Kirov:
Depends. If you plan to search for a single char -
absolutely not. Copying the array is an expensive operation. Sorting it - even more expensive.
Well, if you will have only one array and you plan to search for, let's say, 100 different characters, then this method could give you a better performance. Now, this really depends on your usage. And nobody will be able to give you the absolutely correct answer for this case. You need to run it and profile.
*Scroll down to #Kiril Kirov's informative post for more.
Answer:
So far there isn't a solid answer, because there isn't a really "fast" method to achieve this goal, especially when the array is not SORTED. However, threads could be a possible solution. But watch out for your CPU! This is based on #Andrea's submitted answer (scroll down a little more for more info) - I hope I read it right.
As others wrote, the complexity of the best algorithm is O(n), especially since your array is not sorted.
To make the search faster, you could subdivide the array and scan each portion separately in separate threads. This would scale linearly with the number of CPU cores you have available on your machine.
If, for example, you have four cores available, then spawn four threads and let each thread scan one fourth of the array.
Probably this discussion might help: Using threads to reduce array search time
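A rough sketch of the chunked, multi-threaded scan (the thread count, chunking and function name are all illustrative):
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Counts occurrences of 'value' by splitting the range across worker
// threads and summing their partial counts.
std::size_t parallel_count(const unsigned char* data, std::size_t n,
                           unsigned char value, unsigned num_threads = 4)
{
    std::vector<std::size_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = n / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t + 1 == num_threads) ? n : begin + chunk;
        workers.emplace_back([&, t, begin, end] {
            partial[t] = std::count(data + begin, data + end, value);
        });
    }
    for (auto& w : workers) w.join();

    std::size_t total = 0;
    for (auto c : partial) total += c;
    return total;
}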
In any case (and this is true for any performance related issues), you should profile your code. Create a test case for the approach you have, measure the time it takes and take this as a baseline. Then, for each modification you do, redo the measurement to check if it really improves the execution time. Also make sure to do each measurement more than once (within the same test case) and calculate the average, to reduce caching and other warming up effects (ideally, execute the code at least once before starting the first measurement).
This is Java-related, but it gives some good feedback on why it does not always make sense to parallelize: A Beginner's Guide to Hardcore Concurrency
The best algorithm would be O(n), where n is the number of elements.
As you need to check each element, you must go through the whole array.
The easiest way I can think of is already written in your own answer.
And there's no faster way to do this - the memory is contiguous, the array is not sorted, and you need to "touch" each element. That's the fastest possible solution.
Regarding your edit: using std::count and "manually" looping through the array will give you the same performance.
Are there any faster way to search for certain elements within Array
Yes, if the array is sorted: then you can achieve up to O(log(n)), using an existing search algorithm such as binary search.
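For example, if you can afford to sort a copy once and then answer many count queries, a sketch (function names are made up) could look like this:
#include <algorithm>
#include <cstddef>
#include <vector>

// Build a sorted copy once (O(n log n))...
std::vector<unsigned char> make_sorted_copy(const unsigned char* data, std::size_t n)
{
    std::vector<unsigned char> copy(data, data + n);
    std::sort(copy.begin(), copy.end());
    return copy;
}

// ...then each count query is O(log n) via binary search.
std::size_t count_of(const std::vector<unsigned char>& sorted, unsigned char c)
{
    auto range = std::equal_range(sorted.begin(), sorted.end(), c);
    return static_cast<std::size_t>(range.second - range.first);
}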
Would the speed be better if we made a copy of Array and sort it
Depends. If you plan to search for a single char - absolutely not. Copying the array is an expensive operation. Sorting it - even more expensive.
Well, if you will have only one array and you plan to search for, let's say, 100 different characters, then this method could give you a better performance. Now, this really depends on your usage. And nobody will be able to give you the absolutely correct answer for this case. You need to run it and profile.
What do you mean by "fast"?
Fast as in complexity, or as an improvement by a constant factor? You cannot achieve better complexity with an unsorted array. However, if you change the array very rarely and search it very often, you could consider sorting it after each change, or better yet, use a different data structure (like a multimap or a set).
If you intend to get a better constant inside your O(n), there are some neat tricks which use/abuse the cache of your CPU. If you search for multiple elements, it's generally faster to search the first few hundred array elements for each of the characters, then the next few hundred, and so on, rather than scanning the whole array once per search term. The improvements are not in the complexity, so the effect will usually not be that great. Unless this search happens at your bottleneck, repeated deep inside some other algorithm, I would not recommend it. So unless it's inside a rendering algorithm, or a device driver, or for one specific architecture, etc., it is most probably not worth it. However, in the rare cases where it might be appropriate, I've seen speed improvements of 3x - 4x or more by using inline assembly and abusing the CPU cache.
EDIT:
Your comment indicated it might be a good idea to include a short introduction to data structures.
array, vector: fastest access, slow searching, slow adding/removing unless appended at the end
list: slow access, slow searching, fastest adding/removing
trees, hash tables, etc.: best searching (hash tables offer practically O(1) searching!), slow changing (depends on the type)
I recommend learning about the different data structures (vector, list, map, multimap, set, multiset, etc.) in C++, so you can use the one which best fits your needs.
About the CPU cache: it seems the choosing of a better fitting data structure and code organization is much more important. However, I include this for the sake of completeness.
If you search the array in shorter chunks rather than the whole array at once, that part of the array is added to the cache of your CPU, and accessing the cache is much faster than accessing RAM. So you can work on that smaller chunk of your data (for example, search for multiple elements), then switch to the next chunk of data, and so on. This means, for example,
search "a" in elements 1..100
search "b" in elements 1..100
search "c" in elements 1..100
search "a" in elements 101..200
search "b" in elements 101..200
search "c" in elements 101..200
...
search "c" in elements 999901 .. 1000000
can be faster than
search "a" in elements 1..1000000
search "b" in elements 1..1000000
search "c" in elements 1..1000000
if the number of searched elements (a, b, c, ...) is sufficiently large. Why? Because with a cache size of 100, in the first example the data is read from RAM 10000 times, while in the second example it is read 30000 times.
However, the efficiency of this (and your choice of the data chunk size) heavily depends on your architecture, and is only recommended if you are really sure that this is your real bottleneck. Usually it's not.
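For illustration, a sketch of the chunked multi-character count (the block size is a tuning assumption and the function name is made up):
#include <algorithm>
#include <cstddef>

// Counts each character in 'wanted' over 'data', processing the input in
// cache-sized blocks so that every block is scanned for all characters
// while it is still hot in the cache.
void blocked_count(const unsigned char* data, std::size_t n,
                   const unsigned char* wanted, std::size_t num_wanted,
                   std::size_t* counts)   // num_wanted entries, zero-initialised
{
    const std::size_t block = 4096;       // tuning parameter
    for (std::size_t start = 0; start < n; start += block) {
        std::size_t end = std::min(start + block, n);
        for (std::size_t w = 0; w < num_wanted; ++w)
            counts[w] += std::count(data + start, data + end, wanted[w]);
    }
}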
It depends on whether it is a one-time scan or many scans.
Sorting will help a lot with the scan speed; you can always narrow down the scan with a binary search, and the complexity would be O(log(n)) per lookup.
Or, if you can build the array that will be scanned by inserting elements one at a time, you can use a red-black tree, which is slower to insert into but is always sorted.
Last but not least, for your specific question, in which you are scanning an unsigned char array, the number of distinct element values is limited (256). You can do a one-time scan, but it needs more memory: use the value of each element of your unsigned char array as an index into another array that stores the scan result.
If you want the position of every element, the other array could be: int scanresult[256][n], where n is the largest count of any single character.
If you only need to count how many 'a's are in the array, the other array could be: int scanresult[256]. Taking this as the example, the complexity is O(n), but it only needs to run once:
unsigned char* Array = new unsigned char[ 50000 ];
/* Fill Array */
int scanresult[256];
for ( int i=0;i<256;++i) { scanresult[i]=0; }
for ( unsigned int Index = 0; Index != 50000; ++ Index )
    scanresult[ Array[ Index ] ]++;
For a single character search, std::count is probably as fast as you're going to get. And for small sets of data (and 50000 is small), you're not likely to notice the time anyway. Of course, for a single character, almost any reasonable algorithm will take less time than it takes to read the data. (std::count on 50000 elements in a vector or a C-style array will be close to instantaneous on a modern machine. Orders of magnitude under your "at least a second", at any rate.)
If you want to go faster, the solution is to not create the array to begin with, but to do the processing on the fly, while you're reading the data (or to get the array immediately, via mmap). And if you need the data for more than one character... just build up a character frequency table as you read the data. And find the fastest way of reading the data (almost certainly mmap under Linux, at least according to some measurements I made recently). After that, just index into this table when you want the count. Reading the data will be O(n) (and there's no way around that), but after that, getting the count is O(1), with a very, very small constant factor as well (under a nanosecond on a lot of machines).
Don't forget, an unsigned char can only take values from 0 to 255, so there are just 256 possible buckets...
#include <cstring>   // for std::memset

#define MAX 50000
unsigned char* Array = new unsigned char[ MAX ];
unsigned int Logs[ 256 ];
// Fill Array ...
std::memset( Logs, 0, sizeof( Logs ) );      // sizeof(Logs) is already the full size in bytes
for ( unsigned int Index = 0; Index != MAX; ++ Index )
    Logs[ Array[ Index ] ]++;
// ... use Logs ...
delete [] Array;   // Logs lives on the stack; only Array was allocated with new

Set all values of a row and/or column in c++ to 1 or 0

I have a problem which requires resetting all values in a row or column to 0 or 1. The code I am using is the normal naive approach of setting the values by iterating each time. Is there any faster implementation?
// Size of the board is N*N
i = 0;
cin >> x >> y; x--;
if (query == "SetRow")
{
    while (i != N) { board[x][i] = y; i++; }   // set row x
}
else   // "SetColumn"
{
    while (i != N) { board[i][x] = y; i++; }   // set column x
}
y can be 0 or 1
Well, other than iterating over the column, there are a few optimizations you might want to make:
Iterating over a column is less efficient than iterating over a row due to cache behavior: in column iteration you take a cache miss for nearly every element, while in row iteration you take a miss only once per cache line (how many elements that is depends on the architecture and the element size; a typical 64-byte line holds 16 four-byte integers).
Thus, if you iterate over columns more often than rows, redesign the layout so that you iterate over rows more often. This thread discusses a similar issue.
Also, once you have done that, you can use memset(), which I believe is better optimized for this task.
(Note: compilers might do that for you automatically in some cases.)
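For illustration, once the row is contiguous in memory it can be set in a single call (a sketch with an assumed int board; std::fill for value 0 is typically lowered to a memset):
#include <algorithm>

const int N = 1000;            // illustrative board size
int board[N][N];

void set_row(int row, int y) {
    // std::fill works for any y; for y == 0 it usually compiles down to memset.
    std::fill(board[row], board[row] + N, y);
}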
You can use lazy initialization; there is actually an O(1) algorithm to initialize an array with a constant value, described in more detail here: initialize an array in constant time. This comes at the cost of roughly triple the amount of space and more expensive accesses later on.
The idea behind it (2) is to maintain an additional stack (logically; implemented as an array plus a pointer to the top) and an additional array; the additional array indicates when each element was first written (a number from 0 to n) and the stack records which elements have already been written.
When you access array[i]: if additionalArray[i] < top && stack[additionalArray[i]] == i, then array[i] holds a real value; otherwise it holds the "initialized" (default) value.
When doing array[i] = x, if it was not written yet (checked as above), you should set stack[top] = i and additionalArray[i] = top, and then increase top.
This results in O(1) initialization, but as said it requires additional memory and each access is more expensive.
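A sketch of that lazy-initialization scheme (the class and member names are made up; see the comment on the constructor for one honest caveat):
#include <cstddef>
#include <vector>

// Array whose elements all read as 'default_value' until they are written,
// with O(1) reads and writes.
class LazyArray {
    std::vector<int> values;          // the actual payload
    std::vector<std::size_t> when;    // when[i]: slot in 'stack' that wrote i
    std::vector<std::size_t> stack;   // stack[k]: k-th index ever written
    std::size_t top = 0;
    int default_value;

    bool written(std::size_t i) const {
        return when[i] < top && stack[when[i]] == i;
    }
public:
    // Note: std::vector zero-fills, so construction here is still O(n);
    // the genuinely O(1) version uses raw, uninitialized storage instead.
    LazyArray(std::size_t n, int def)
        : values(n), when(n), stack(n), default_value(def) {}

    int get(std::size_t i) const {
        return written(i) ? values[i] : default_value;
    }

    void set(std::size_t i, int x) {
        if (!written(i)) {
            stack[top] = i;
            when[i] = top;
            ++top;
        }
        values[i] = x;
    }
};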
The same principles described by the article regarding initializing an array in O(1) can also be applied here.
The problem is taken from running codechef long contest.... hail cheaters .. close this thread

Inserting and removing elements from an array while maintaining the array to be sorted

I'm wondering whether somebody can help me with this problem. I'm using C/C++ to program and I need to do the following:
I am given a sorted array P (biggest first) containing floats. It usually has a very big size, sometimes holding correlation values from 10-megapixel images. I need to iterate through the array until it is empty. Within the loop there is additional processing taking place.
The gist of the problem is that at the start of the loop, I need to remove the elements with the maximum value from the array, check certain conditions and if they hold, then I need to reinsert the elements into the array but after decreasing their value. However, I want the array to be efficiently sorted after the reinsertion.
Can somebody point me towards a way of doing this? I have tried the naive approach of re-sorting every time I insert, but that seems really wasteful.
Change the data structure. Repeatedly accessing the largest element, and then quickly inserting new values, in such a way that you can still efficiently repeatedly access the largest element, is a job for a heap, which may be fairly easily created from your array in C++.
BTW, please don't talk about "C/C++". There is no such language. You're instead making vague implications about the style in which you're writing things, most of which will strike experienced programmers as bad.
I would look into std::priority_queue (http://www.cplusplus.com/reference/stl/priority_queue/), as it is designed to do just this.
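A rough sketch of how that might look (the condition check is a placeholder for the asker's own test):
#include <queue>
#include <vector>

// Repeatedly takes the current maximum, optionally decreases it and puts it
// back, until the queue is drained.
void process(std::vector<float>& correlations, float adjustment)
{
    // max-heap built from the existing values (O(n) construction)
    std::priority_queue<float> pq(correlations.begin(), correlations.end());

    while (!pq.empty()) {
        float top = pq.top();
        pq.pop();
        // ... additional processing on 'top' ...
        bool condition_holds = false;    // placeholder for the real check
        if (condition_holds)
            pq.push(top - adjustment);   // reinsert the decreased value, O(log n)
    }
}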
You could use a binary search to determine where to insert the changed value after you removed it from the array. Note that inserting or removing at the front or somewhere in the middle is not very efficient either, as it requires moving all items with a higher index up or down, respectively.
ISTM that you should rather put your changed items into a new array and sort that once, after you finished iterating over the original array. If memory is a problem, and you really have to do things in place, change the values in place and only sort once.
I can't think of a better way to do this. Keeping the array sorted all the time seems rather inefficient.
Since the array is already sorted, you can use a binary search to find the location to insert the updated value. C++ provides std::lower_bound or std::upper_bound for this purpose, C provides bsearch. Just shift all the existing values up by one location in the array and store the new value at the newly cleared spot.
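For example, a sketch of a binary-search insert into a std::vector kept in descending order:
#include <algorithm>
#include <functional>
#include <vector>

// Inserts 'value' into 'v', which is kept sorted largest-first. The binary
// search is O(log n); the insertion itself still shifts elements, so the
// overall cost is O(n).
void insert_sorted_desc(std::vector<float>& v, float value)
{
    auto pos = std::lower_bound(v.begin(), v.end(), value, std::greater<float>());
    v.insert(pos, value);
}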
Here's some pseudocode that may work decently if you aren't decreasing the removed values by much:
For example, say you're processing the element with the maximum value in the array, and say the array is sorted in descending order (largest first).
Remove array[0].
Let newVal = array[0] - adjustment, where adjustment is the amount you're decreasing the value by.
Now loop through, adjusting only the values you need to:
Pseudocode:
i = 0;
// shift the values still larger than newVal one slot toward the front,
// filling the hole left by the removed maximum
while (i + 1 < length && newVal < array[i + 1]) {
    array[i] = array[i + 1];
    i++;
}
array[i] = newVal;
Again, if you're not decreasing the removed values by a large amount (relative to the values in the array), this could work fairly efficiently.
Of course, the generally better alternative is to use a more appropriate data structure, such as a heap.
Maybe using another temporary array could help.
This way you can first sort the "changed" elements alone.
After that, just do a regular O(n) merge of the two sub-arrays into the temp array, and copy everything back into the original array.
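A sketch of that merge step, assuming both sequences are kept largest-first:
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// 'original' is already sorted descending; 'changed' holds the reinserted,
// decreased values. Sort the small 'changed' set, merge, and swap back.
void merge_back(std::vector<float>& original, std::vector<float>& changed)
{
    std::sort(changed.begin(), changed.end(), std::greater<float>());

    std::vector<float> merged;
    merged.reserve(original.size() + changed.size());
    std::merge(original.begin(), original.end(),
               changed.begin(), changed.end(),
               std::back_inserter(merged), std::greater<float>());

    original.swap(merged);
    changed.clear();
}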

How to partition bits in a bit array with less than linear time

This is an interview question I faced recently.
Given an array of 1 and 0, find a way to partition the bits in place so that 0's are grouped together, and 1's are grouped together. It does not matter whether 1's are ahead of 0's or 0's are ahead of 1's.
An example input is 101010101, and output is either 111110000 or 000011111.
Solve the problem in less than linear time.
Make the problem simpler. The input is an integer array, with each element either 1 or 0. Output is the same integer array with integers partitioned well.
To me, this is an easy question if it can be solved in O(N). My approach is to use two pointers, starting from both ends of the array. Advance each pointer inwards; if it does not point to the correct integer, swap the two elements.
int * start = array;
int * end = array + length - 1;
while (start < end) {
    // Assume 0 always at the end
    if (*end == 0) {
        --end;
        continue;
    }
    // Assume 1 always at the beginning
    if (*start == 1) {
        ++start;
        continue;
    }
    swap(*start, *end);
}
However, the interviewer insists there is a sub-linear solution. This has made me think hard, but I still haven't found an answer.
Can anyone help with this interview question?
UPDATE: Seeing replies on SO stating that the problem cannot be solved in sub-linear time, I can confirm my original view that there cannot be a sub-linear solution.
Is it possible the interviewer was playing a trick?
I don't see how there can be a solution faster than linear time.
Imagine a bit array that is all 1's. Any solution will require examining every bit in this array before declaring that it is already partitioned. Examining every bit takes linear time.
It's not possible. Doing it in less than linear time implies that you don't look at every array element (like a binary search). However since there is no way to know what any element of the array is without looking at it, you must look at each array element at least once.
You can use lookup tables to make it faster, but O(n/8) is still O(n), so either the interviewer was wrong or you misunderstood the question.
It is possible to do it faster than linear time, given that you have enough memory; it can be done in O(1).
Use the bitmask as an index into a vector which maps to the partitioned bitmask.
Using your example, at index 341 (101010101) the value 496 (111110000) is stored.
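A sketch of how such a table could be built for k-bit masks (only feasible for small k, since the table has 2^k entries):
#include <cstddef>
#include <vector>

// For k-bit inputs, table[m] holds the mask with the same number of 1 bits
// as m but packed at the top, e.g. table[341] == 496 for k = 9.
std::vector<unsigned> build_partition_table(unsigned k)
{
    std::vector<unsigned> table(std::size_t(1) << k);
    for (std::size_t m = 0; m < table.size(); ++m) {
        unsigned bits = 0;
        for (unsigned v = static_cast<unsigned>(m); v; v &= v - 1) ++bits;  // popcount
        table[m] = ((1u << bits) - 1) << (k - bits);
    }
    return table;
}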
Perhaps the confusion comes from "less than linear time". For example, this solution counts the number of set bits, then makes a mask containing that many bits. It only counts bits while there are uncounted on-bits:
// from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
unsigned count_bits(unsigned pX)
{
    unsigned result;
    for (result = 0; pX; ++result)
    {
        pX &= pX - 1;   // clear the lowest set bit
    }
    return result;
}

unsigned n = /* the number */;
// r contains 000...111, with the number of 1's equal to the number of 1's in n
unsigned r = (1u << count_bits(n)) - 1;
Even though this minimizes the number of bits to count, it's still linear. So if this is what is meant by "sub-linear", there you go.
But if they really meant sub-linear as in logarithmic or constant, I don't see a way. You could conceivably make a look-up table for every value, but :/
Technically you could send each element of the array to a separate processor and then do it in less than linear time. If you have N processors, you could even do it in O(1) time!
As others have said, I don't believe this can be done in less than linear time. For a linear-time solution, you can use STL algorithms instead of your own loop, like this:
int a1[8] = {1,0,1,0,1,0,1,0};
std::fill(std::remove(a1, a1+8, 0), a1+8, 0);
Well.. It can be done in 'less than linear' time (cheeky method).
if(n % 2)
{
// Arrange all 1's to the right and DON'T check the right-most bit, because it's 1
}else{
// Arrange all 0's to the right and DON'T check the right-most bit, because it's 0.
}
So, technically you 'group' the bits in less than linear time :P
To me, the most likely interpretations are:
The bits are supposed to be in an int instead of an array, in which case you can use something like http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan or an 8-bit (or more) lookup table.
they used "sublinear" to mean "less than n operations" rather than less-than-O(n). But even that seems impossible for the same reasons listed below.
There is another miscommunication in the question
Otherwise the question is wrong, since all elements of the array must be examined to determine the answer, and that is at least 'n' operations.
Listing either 0s or 1s first, and the references to bits rather than bools make me think something like the first option was intended, even though, when dealing with only one word, it doesn't make very much difference. I'm curious to know what the interviewer actually had in mind.
Splitting this work among parallel processors costs N/M (or O(N)) only if you assume that parallelism grows more slowly than the problem size does. For the last ten years or so, parallelism (via the GPU) has been increasing more rapidly than typical problem sizes, and this trend looks set to continue for years to come. For a broad class of problems, it is instructive to assume "infinite parallelism", or more precisely "parallelism greater than any expected problem size", because the march of progress in GPUs and cloud computing provides such a thing over time.
Assuming infinite parallelism, this problem can be solved in O(log N) time, because the addition operator needed to add up all the 0 and 1 bits is associative: the partial sums can be combined in a reduction tree, which takes (and requires at least) on the order of log N time steps to complete.