Fast way to pick randomly from a set, with each entry picked only once? - c++

I'm working on a program to solve the n queens problem (the problem of putting n chess queens on an n x n chessboard such that none of them is able to capture any other using the standard chess queen's moves). I am using a heuristic algorithm, and it starts by placing one queen in each row and picking a column randomly out of the columns that are not already occupied. I feel that this step is an opportunity for optimization. Here is the code (in C++):
vector<int> colsleft;
//fills the vector sequentially with integer values
for (int c=0; c < size; c++)
colsleft.push_back(c);
for (int i=0; i < size; i++)
{
vector<int>::iterator randplace = colsleft.begin() + rand()%colsleft.size();
/* chboard is an integer array, with each entry representing a row
and holding the column position of the queen in that row */
chboard[i] = *randplace;
colsleft.erase(randplace);
}
If it is not clear from the code: I start by building a vector containing an integer for each column. Then, for each row, I pick a random entry in the vector, assign its value to that row's entry in chboard[]. I then remove that entry from the vector so it is not available for any other queens.
I'm curious about methods that could use arrays and pointers instead of a vector. Or <list>s? Is there a better way of filling the vector sequentially, other than the for loop? I would love to hear some suggestions!

The following should fulfill your needs:
#include <algorithm>
...
int randplace[size];
for (int i = 0; i < size; i ++)
randplace[i] = i;
random_shuffle(randplace, randplace + size);
You can do the same stuff with vectors, too, if you wish.
Source: http://gethelp.devx.com/techtips/cpp_pro/10min/10min1299.asp

Couple of random answers to some of your questions :):
As far as I know, there's no way to fill an array with consecutive values without iterating over it first. HOWEVER, if you really just need consecutive values, you do not need to fill the array - just use the cell indices as the values: a[0] is 0 and a[100] is 100 - when you get a random number, treat the number as the value.
You can implement the same with a list<> and remove cells you already hit, or...
For better performance, rather than removing cells, why not put an "already used" value in them (like -1) and check for that. Say you get a random number like 73, and a[73] contains -1, you just get a new random number.
Finally, describing item 3 reminded me of a re-hashing function. Perhaps you can implement your algorithm as a hash-table?

Your colsleft.erase(randplace); line is really inefficient, because erasing an element in the middle of the vector requires shifting all the ones after it. A more efficient approach that will satisfy your needs in this case is to simply swap the element with the one at index (size - i - 1) (the element whose index will be outside the range in the next iteration, so we "bring" that element into the middle, and swap the used one out).
And then we don't even need to bother deleting that element -- the end of the array will accumulate the "chosen" elements. And now we've basically implemented an in-place Knuth shuffle.

Related

C++: Remove repeated numbers in a matrix

I want to remove numbers from a matrix that represents coordinates with the format 'x y z'. One example:
1.211 1.647 1.041
2.144 2.684 1.548
1.657 2.245 1.021
1.657 0.984 2.347
2.154 0.347 2.472
1.211 1.647 1.041
In this example the coordinates 1 and 6 are the same (x, y and z are the same) and I want to remove them but I do not want to remove cases with only one value equal as coordinates 3 and 4 for x-coordinate).
These values are in a text file and I want to print the coordinates without duplication in another file or even in the same one.
A very simple solution would be to treat each line as a string and use a set of strings. As you traverse the file line-wise, you check if the current line exists in the set and if not, you insert and print it.
Complexity: O(nlogn), extra memory needed: almost the same as your input file in the worst case
With the same complexity and exactly the worst case memory consumption as the previous solution, you can load the file in memory, sort it line-wise, and then easily skip duplicates while printing. The same can be done inside the file if you are allowed to re-order it, and this way you need very little extra memory, but be much slower.
If memory and storage is an issue (I'm assuming since you can't duplicate the file), you can use the simple method of comparing the current line with all previous lines before printing, with O(n^2) complexity but no extra memory. This however is a rather bad solution, since you have to read multiple times from the file, which can be really slow compared to the main memory.
How to do this if you want to preserve the order.
Read the coordinates into an array of structures like this
struct Coord
{
double x,y,z;
int pos;
bool deleted;
};
pos is the line number, deleted is set to false.
Sort the structs by whatever axis tends to show the greatest variation.
Run through the array comparing the value of the axis you were using in the sort from the previous item to the value in the current item. If the difference is less than a certain preset delta (.i.e. if you care about three digits after the decimal point you would look for a difference of 0.000999999 or so) you compare the remaining values and set deleted for any line where x,y,z are close enough.
for(int i=1;i<count;i++)
{
if(fabs(arr[i].x-arr[i-1].x)<0.001)
if(fabs(arr[i].y-arr[i-1].y)<0.001)
if(fabs(arr[i].z-arr[i-1].z)<0.001)
arr[i].deleted=true;
}
sort the array again, this time ascending by pos to restore the order.
Go through the array and output all items where deleted is false.
In c++, you can use the the power of STL to solve this problem. Use the map and store the three coordinates x, y and z as a key in the map. The mapped value to the key will store the count of that key.
Key_type = pair<pair<float,float>,float>
mapped_type = int
Create a map m with the above given key_type and mapped_type and insert all the rows into the map updating the count for each row. Let's assume n is the total number of rows.
for(i = 0; i < n; i++) {
m[make_pair(make_pair(x,y),z)]++;
}
Each insertion takes O(logn) and you have to insert n times. So,overall time complexity will be O(nlogn). Now, loop over all the rows of the matrix again and if the mapped_value of that row is 1, then it is unique.

How do I print out vectors in different order every time

I'm trying to make two vectors. Where vector1 (total1) is containing some strings and vector2(total2) is containing some random unique numbers(that are between 0 and total1.size() - 1)
I want to make a program that print out total1s strings, but in different order every turn. I don't want to use iterators or something because I want to improve my problem solving capacity.
Here is the specific function that crash the program.
for (unsigned i = 0; i < total1.size();)
{
v1 = rand() % total1.size();
for (unsigned s = 0; s < total1.size(); ++s)
{
if (v1 == total2[s])
;
else
{
total2.push_back(v1);
++i;
}
}
}
I'm very grateful for any help that I can get!
Can I suggest you change of algorithm?. Because, even if your current one is correctly implemented ("s", in your code, must go from 0 to total2.size not total1.size and if element is found, break and generate a new random), it has the following drawback: assume vectors of 1.000.000 elements and you are trying the last random number. You have one probability in 1.000.000 of find a random number not previously used. That is a very small amount.Last but one number has a probability of 2 in 1.000.000 also small. In conclusion, your program will loop and expend lots of CPU resources.
Your best alternative is follow #NathanOliver suggestion and look for function std::shuffle. The manual page shows the implementation algorithm, that is what you are looking for.
Another simple algorithm, with some pros and cons, is:
init total2 with sequence 0,1,2,...,n where n is the size total1 - 1
choice two random numbers, i1 and i2, in range [0,n-1].
Swap elements i1 and i2 in total2.
repeat from (2) a fixed number of times "R".
This method allows to known a priori the necessary steps and to control the level of "randomness" of the final vector (bigger R is more random). However, it is far to be good in its randomness quality.
Another method, better in the probabilistic distribution:
fill a list L with number 0,1,2,...size total1-1.
choice a random number i between 0 and the size of list L - 1 .
Store in total2 the i-th element in list L.
Remove this element from L.
repeat from (2) until L is empty.
If you just want to shuffle vector<string> total1, you can do this without using helping vector<int> total2. Here is an implementation based on Fisher–Yates shuffle.
for(int i=n-1; i>=1; i--) {
int j=rand()%(i+1);
swap(total1[j], total1[i]); // your prof might not allow use of swap:)
}
If you must use vector<int> total2 then shuffle it using above algorithm. Next you can use it to create a new vector<string> result from total1 where result[i]=total1[total2[i]].

C++ - read 1000 floats and insert them into a vector of size 10 by keeping the lowest 10 numbers only

So I am pretty new to c++ and I am not sure if there is a data structure already created to facilitate what I am trying to do (so I do not reinvent the wheel):
What I am trying to do
I am reading a file where I need to parse the file, do some calculations on every floating value on every row of the file, and return the top 10 results from the file in ascending order.
What am I trying to optimize
I am dealing with a 1k file and a 1.9 million row file so for each row, I will get a result that is of size 72 so in 1k row, I will need to allocate a vector of 72000 elements and for the 1.9 million rows ... well you get the idea.
What I have so far
I am currently working with a vector for the results which then I sort and resize it to 10.
const unsigned int vector_space = circularVector.size()*72;
//vector for the results
std::vector<ResultType> results;
results.reserve(vector_space);
but this is extremely inefficient.
*What I want to accomplish *
I want to only keep a vector of size 10, and whenever I perform a calculation, I will simply insert the value into the vector and remove the largest floating point that was in the vector, thus maintaining the top 10 results in ascending order.
Is there a structure already in c++ that will have such behavior?
Thanks!
EDIT: Changed to use the 10 lowest elements rather than the highest elements as the question now makes clear which is required
You can use a std::vector of 10 elements as a max heap, in which the elements are partially sorted such that the first element always contains the maximum value. Note that the following is all untested, but hopefully it should get you started.
// Create an empty vector to hold the highest values
std::vector<ResultType> results;
// Iterate over the first 10 entries in the file and put the results in the vector
for (... ; i < 10; i++) {
// Calculate the value of this row
ResultType r = ....
// Add it to the vector
results.push_back(r);
}
// Now that the vector is "full", turn it into a heap
std::make_heap(results.begin(), results.end());
// Iterate over all the remaining rows, adding values which are lower than the
// current maximum
for (i = 10; .....) {
// Calculate the value for this row
ResultType r = ....
// Compare it to the max element in the heap
if (r < results.front()) {
// Add the new element to the vector
results.push_back(r);
// Move the existing minimum to the back and "re-heapify" the rest
std::pop_heap(results.begin(), results.end());
// Remove the last element from the vector
results.pop_back();
}
}
// Finally, sort the results to put them all in order
// (using sort_heap just because we can)
std::sort_heap(results.begin(), results.end());
Yes. What you want is a priority queue or heap, defined so as to remove the lowest value. You just need to do such a remove if the size after the insertion is greater than 10. You should be able to do this with STL classes.
Just use std::set to do that, since in std::set all values are sorted from min to max.
void insert_value(std::set<ResultType>& myset, const ResultType& value){
myset.insert(value);
int limit = 10;
if(myset.size() > limit){
myset.erase(myset.begin());
}
}
I think MaxHeap will work for this problem.
1- Create a max heap of size 10.
2- Fill the heap with 10 elements for the first time.
3- For 11th element check it with the largest element i.e root/element at 0th index.
4- If 11th element is smaller; replace the root node with 11th element and heapify again.
Repeat the same steps until the whole file is parsed.

Randomly select index from a STL vector from truth value

I have a vector that looks like:
vector<int> A = {0, 1, 1, 0, 0, 1, 0, 1};
I'd like to select a random index from the non-zero values of A. Using this example A, I want to randomly select an element from the array {1,2,5,7}.
Currently I do this by creating another array
vector<int> b;
for(int i=0;i<A.size();i++)
if(A[i])
b.push_back(i);
Once b is created, I find the index by using this answer:
get random element from container
Is there a more STL-like (or C++11) way of doing this, perhaps one that does not create an intermediate array? In this example A is small, but in my production code this selection process is in an inner-loop and A is non-static and thousands of elements long.
A great way to do this is Reservoir Sampling.
In short, you walk your array until you find the first non-zero value, and record that index as the first possible answer you might return.
Then, you continue to walk the array. Every time you find a non-zero value, you randomly might change which new index is your possible answer, with decreasing probability.
This algorithm also works great if you need M random index values from your array.
What's great about this, is that you walk each element only one time, and you don't need a separate memory structure to record the non-zero elements. It's O(N) in speed, and O(M) in memory, in your case it's O(1) in memory, since you only want 1 random value.
On the flip side, random number generators are traditionally quite slow. So, you might want to performance test this against any other ideas people come up with here, to see if the trade-off of speed-vs-memory is worth it for you.
With a single pass through the array, you can determine how many false (or true) values there are. If you are doing this kind of thing often, you can even write a class to keep track of this for you.
Regardless, you can then pick a random number i between 0 and num_false (or num_true). Then with another pass through the array, you can return the ith false (or true) index.
We can loop through each non-zero value and assign it a random number. The index with the largest random number is the one we select.
int value = 0;
int index = 0;
while(int i = 0; i < A.size(); i++) {
if(!A[i]) continue;
auto j = rand();
if(j > value) {
index = i;
value = j;
}
}
vector<int> A = {0,1,1,0,0,1,0,1};
random_shuffle(A.begin(),A.end());
auto it = find_if(A.begin(),A.end(),[](const int elem){return elem;});

Sorting an integer array of 100 elements having only 3 elements in it

Suppose I have an array of 100 numbers. The only distinct values in the array are 1, 2 and 3. The values are randomly ordered throughout the array. For instance, the array might be populated as:
int values[100];
for (int i = 0; i < 100; i++)
values[i] = 1 + rand() % 3;
How can I efficiently sort an array like this?
The fastest solution is not to "sort" at all:
Run through the array and count the number of occurrences of 1,2 and 3. These counts should hopefully fit in registers...
Fill the array with the right number of 1s, 2s and 3s, overwriting whatever is there already.
At the end you will have a fully sorted array.
In general, this can be a useful O(n) sorting algorithm when you have a very small range of possible values compared to the size of the array.
Dutch National flag algorithm is the commonly cited algorithm for this and is actually the partition step in one of the variants of quicksort (1 corresponds to less than, 2 to equal to and 3 to greater than). In that variant, you don't need to sort the middle portion.