C++: Remove repeated numbers in a matrix - c++

I want to remove numbers from a matrix that represents coordinates with the format 'x y z'. One example:
1.211 1.647 1.041
2.144 2.684 1.548
1.657 2.245 1.021
1.657 0.984 2.347
2.154 0.347 2.472
1.211 1.647 1.041
In this example, coordinates 1 and 6 are the same (x, y and z all match) and I want to remove the repetition. I do not want to remove rows where only one value is equal, such as coordinates 3 and 4, which share the same x-coordinate.
These values are in a text file and I want to print the coordinates without duplication in another file or even in the same one.

A very simple solution would be to treat each line as a string and use a set of strings. As you traverse the file line-wise, you check if the current line exists in the set and if not, you insert and print it.
Complexity: O(n log n). Extra memory needed: almost the same as your input file in the worst case.
With the same complexity and the same worst-case memory consumption as the previous solution, you can load the file into memory, sort it line-wise, and then easily skip duplicates while printing. The same can be done inside the file if you are allowed to re-order it; this way you need very little extra memory, but it will be much slower.
If memory and storage are an issue (I'm assuming so, since you can't duplicate the file), you can use the simple method of comparing the current line with all previous lines before printing, with O(n^2) complexity but no extra memory. This, however, is a rather bad solution, since you have to read from the file multiple times, which can be really slow compared to main memory.
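The set-of-strings idea from the start of this answer can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names write_unique_lines and unique_lines are made up here, and it assumes duplicate rows are textually identical (the same digits and spacing).

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Copies each distinct line from `in` to `out`, keeping the first occurrence
// and the original order. Returns the number of lines written.
std::size_t write_unique_lines(std::istream& in, std::ostream& out) {
    std::set<std::string> seen;
    std::size_t written = 0;
    std::string line;
    while (std::getline(in, line)) {
        // insert() reports whether the line was new, so one lookup is enough.
        if (seen.insert(line).second) {
            out << line << '\n';
            ++written;
        }
    }
    return written;
}

// Convenience wrapper for in-memory text, handy for trying the function out.
std::string unique_lines(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    write_unique_lines(in, out);
    return out.str();
}
```

For real files, the same function should work with a std::ifstream as input and a std::ofstream as output.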

Here is how to do this if you want to preserve the order.
Read the coordinates into an array of structures like this:
struct Coord
{
    double x, y, z;
    int pos;
    bool deleted;
};
pos is the line number; deleted is initially set to false.
Sort the structs by whatever axis tends to show the greatest variation.
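Run through the array, comparing the value of the axis you used in the sort between the current item and the items before it. If the difference is less than a certain preset delta (i.e. if you care about three digits after the decimal point, you would use a delta safely below 0.001, such as 0.0005, so that rounding error cannot make neighbouring values look equal), you compare the remaining values and set deleted for any line where x, y and z are close enough. Comparing only against the immediately preceding item is not enough, because several items can share the same value on the sort axis.
const double delta = 0.0005;
for (int i = 1; i < count; i++)
{
    // look back over every item whose x is within delta of the current one
    for (int j = i - 1; j >= 0 && fabs(arr[i].x - arr[j].x) < delta; j--)
    {
        if (fabs(arr[i].y - arr[j].y) < delta && fabs(arr[i].z - arr[j].z) < delta)
        {
            arr[i].deleted = true;
            break;
        }
    }
}
Sort the array again, this time ascending by pos, to restore the order.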
Go through the array and output all items where deleted is false.
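Put together, the steps above might look like the sketch below. It is only one possible reading of this answer: the names close_enough and remove_duplicates and the default delta of 0.0005 are assumptions made for illustration, x is assumed to be the sort axis, and the later copy (by line number) is the one dropped.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Coord {
    double x, y, z;
    std::size_t pos;   // original line number
    bool deleted;
};

// Hypothetical helper: true when two points agree on every axis within delta.
bool close_enough(const Coord& a, const Coord& b, double delta) {
    return std::fabs(a.x - b.x) < delta &&
           std::fabs(a.y - b.y) < delta &&
           std::fabs(a.z - b.z) < delta;
}

std::vector<Coord> remove_duplicates(std::vector<Coord> pts, double delta = 0.0005) {
    for (std::size_t i = 0; i < pts.size(); ++i) {
        pts[i].pos = i;
        pts[i].deleted = false;
    }
    // Sort by x (assumed here to be the axis with the most variation).
    std::sort(pts.begin(), pts.end(), [](const Coord& a, const Coord& b) {
        return a.x != b.x ? a.x < b.x : a.pos < b.pos;
    });
    for (std::size_t i = 1; i < pts.size(); ++i) {
        // Walk back over every earlier item whose x lies within delta.
        for (std::size_t j = i; j-- > 0 && std::fabs(pts[i].x - pts[j].x) < delta; ) {
            if (close_enough(pts[i], pts[j], delta)) {
                // Drop whichever copy appeared later in the input.
                (pts[i].pos > pts[j].pos ? pts[i] : pts[j]).deleted = true;
            }
        }
    }
    // Restore the original order and keep only the survivors.
    std::sort(pts.begin(), pts.end(),
              [](const Coord& a, const Coord& b) { return a.pos < b.pos; });
    pts.erase(std::remove_if(pts.begin(), pts.end(),
                             [](const Coord& c) { return c.deleted; }),
              pts.end());
    return pts;
}
```

If many points share nearly the same x, the inner loop degrades towards O(n^2), which is why the answer suggests sorting on the axis with the greatest variation.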

In C++, you can use the power of the STL to solve this problem. Use a map and store the three coordinates x, y and z as the key. The value mapped to each key stores the count of that key.
key_type = pair<pair<float,float>,float>
mapped_type = int
Create a map m with the key_type and mapped_type given above and insert all the rows into the map, updating the count for each row. Let's assume n is the total number of rows.
for(i = 0; i < n; i++) {
    // x, y and z are the coordinates of row i
    m[make_pair(make_pair(x,y),z)]++;
}
Each insertion takes O(log n) and you have to insert n times, so the overall time complexity is O(n log n). Now loop over all the rows of the matrix again; if the mapped value of a row is 1, that row is unique.
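A compact variant of the map idea, as a sketch only. It departs from the answer in two labelled ways: it uses std::tuple as the key instead of nested pairs, and it keeps the first copy of every row instead of reporting only rows whose count is 1. The names Point and unique_points are made up, and keys are compared exactly, which assumes repeated rows parse to identical values.

```cpp
#include <map>
#include <tuple>
#include <vector>

using Point = std::tuple<double, double, double>;

// Returns the input rows with repeats removed, keeping the first copy of each.
std::vector<Point> unique_points(const std::vector<Point>& rows) {
    std::map<Point, int> count;   // how often each (x, y, z) has been seen so far
    std::vector<Point> result;
    for (const Point& p : rows) {
        if (count[p]++ == 0) {    // first sighting: keep the row
            result.push_back(p);
        }
    }
    return result;
}
```

std::tuple already provides the lexicographic operator< that std::map needs, so no custom comparator is required.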

Related

Find minimum value at each index after queries which tell you minimum value over a range

Assume that initially in array a each element has infinity as value.
Now M queries are input of the type l r x.
Here l to r is the range in which a[i] must be updated to x if a[i]>x, where l<=i<=r and l<=r<=n.
After M queries you need to output the minimum value at each index.
One way to do this is brute force:
for(i=0;i<n;i++)
    a[i]=inf; // memset fills bytes, not ints, so assign inf explicitly
while(j<m)
{
    scanf("%d %d %d",&l,&r,&c);
    for(i=l-1;i<r;i++)
    {
        if(a[i]>c)
            a[i]=c;
    }
    j++;
}
for(i=0;i<n;i++)
    printf("%d ",a[i]);
This takes O(mn) time, because the range of a single query can cover all n elements in the worst case.
What are more efficient ways to solve this with a lower time complexity?
There is an approach that has a different asymptotic complexity. It involves keeping a sorted list of the begin and end points of the queries. To avoid actual sorting, I'm using a sparse array the size of a.
The general idea is that you store the queries and, while iterating, keep a heap containing the queries whose range you are currently in:
# size of array (n)
count = ...
# for each array element you have a list of ranges that
# start or end at this array element
list<queries> l[count]
list<queries> r[count]
heap<queries> h
for i in range(count):
    if l[i]:
        h.push(l[i])
    if h is empty:
        output(inf)
    else:
        output(h.lowest().value)
    if r[i]:
        h.pop(r[i])
The actual performance of this (and other algorithms) greatly depends on the size of the array and the density of the queries, neither of which is covered in the asymptotic complexity of this algorithm. Finding an optimal algorithm can't be done while ignoring the actual input data. It could also be worthwhile to change algorithms depending on the data.
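The sweep described in this answer could be written in C++ roughly as follows. This is a sketch under assumptions: RangeQuery and sweep_minimum are invented names, queries use 1-based inclusive ranges as in the question, and a std::multiset stands in for the heap because it allows removing an arbitrary stored value.

```cpp
#include <cassert>
#include <limits>
#include <set>
#include <vector>

struct RangeQuery { int l, r, x; };  // 1-based inclusive range [l, r], value x

std::vector<int> sweep_minimum(int n, const std::vector<RangeQuery>& queries) {
    const int inf = std::numeric_limits<int>::max();
    // starts[i] / ends[i]: values of the queries whose range begins / ends at index i.
    std::vector<std::vector<int>> starts(n), ends(n);
    for (const RangeQuery& q : queries) {
        assert(1 <= q.l && q.l <= q.r && q.r <= n);
        starts[q.l - 1].push_back(q.x);
        ends[q.r - 1].push_back(q.x);
    }
    std::multiset<int> active;       // values of all queries covering the current index
    std::vector<int> result(n, inf);
    for (int i = 0; i < n; ++i) {
        for (int x : starts[i]) active.insert(x);
        if (!active.empty()) result[i] = *active.begin();
        for (int x : ends[i]) active.erase(active.find(x));  // erase one copy only
    }
    return result;
}
```

With m queries this runs in O((n + m) log m) time, and it is an offline method: all queries must be known before the sweep starts.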
Note: my answer assumes that the problem is online, so you must execute updates and queries as they arrive. An advantage of this is that my solution is more robust, allowing you to add more types of updates and queries in the same complexity. The disadvantage is that it might not be the absolute best choice for your problem if you're dealing with an offline problem.
You can use a segment tree. Have each node in the segment tree store the minimum value set for its associated interval (initially infinity, something very large) and use a lazy update and query scheme.
Update(left, right, c)
Update(node, left, right, c):
    if node.interval does not intersect [left, right]:
        return
    if node.interval included in [left, right]:
        node.minimum = min(c, node.minimum)
        return
    Update(node.left, left, right, c)
    Update(node.right, left, right, c)
Query(index)
Query(node, minimum = infinity, index):
    if node.interval == [index, index]:
        return min(minimum, node.minimum)
    if index included in node.left.interval:
        return Query(node.left, min(minimum, node.minimum), index)
    return Query(node.right, min(minimum, node.minimum), index)
Total complexity: O(log n) for each update and query operation. You need to call Query for every element in the end.
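For reference, here is one way the pseudocode above might translate to C++. It is a sketch, not the answerer's code: the class name MinSegmentTree, the array-based node layout and the 0-based inclusive indices are all choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

class MinSegmentTree {
public:
    explicit MinSegmentTree(int n)
        : n_(n), tag_(4 * std::max(n, 1), std::numeric_limits<int>::max()) {}

    // Applies a[i] = min(a[i], c) for every i in [left, right] (0-based, inclusive).
    void update(int left, int right, int c) {
        assert(0 <= left && left <= right && right < n_);
        update(1, 0, n_ - 1, left, right, c);
    }

    // Returns the current value at index: the smallest tag on the root-to-leaf path.
    int query(int index) const {
        assert(0 <= index && index < n_);
        int node = 1, lo = 0, hi = n_ - 1;
        int best = tag_[node];
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (index <= mid) { node = 2 * node; hi = mid; }
            else              { node = 2 * node + 1; lo = mid + 1; }
            best = std::min(best, tag_[node]);
        }
        return best;
    }

private:
    void update(int node, int lo, int hi, int left, int right, int c) {
        if (right < lo || hi < left) return;       // no overlap
        if (left <= lo && hi <= right) {           // node fully covered
            tag_[node] = std::min(tag_[node], c);
            return;
        }
        int mid = lo + (hi - lo) / 2;
        update(2 * node, lo, mid, left, right, c);
        update(2 * node + 1, mid + 1, hi, left, right, c);
    }

    int n_;
    std::vector<int> tag_;  // tag_[v]: smallest c applied to the whole interval of node v
};
```

Because taking a minimum is order-independent, the tags never need to be pushed down; a query simply combines every tag it passes on the way to the leaf.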

C++ - read 1000 floats and insert them into a vector of size 10 by keeping the lowest 10 numbers only

So I am pretty new to c++ and I am not sure if there is a data structure already created to facilitate what I am trying to do (so I do not reinvent the wheel):
What I am trying to do
I am reading a file where I need to parse the file, do some calculations on every floating value on every row of the file, and return the top 10 results from the file in ascending order.
What am I trying to optimize
I am dealing with a 1k-row file and a 1.9-million-row file. Each row produces 72 results, so for the 1k-row file I need to allocate a vector of 72,000 elements, and for the 1.9 million rows ... well, you get the idea.
What I have so far
I am currently working with a vector for the results which then I sort and resize it to 10.
const unsigned int vector_space = circularVector.size()*72;
//vector for the results
std::vector<ResultType> results;
results.reserve(vector_space);
but this is extremely inefficient.
What I want to accomplish
I want to only keep a vector of size 10, and whenever I perform a calculation, I will simply insert the value into the vector and remove the largest floating point that was in the vector, thus maintaining the top 10 results in ascending order.
Is there a structure already in c++ that will have such behavior?
Thanks!
EDIT: Changed to use the 10 lowest elements rather than the highest elements as the question now makes clear which is required
You can use a std::vector of 10 elements as a max heap, in which the elements are partially sorted such that the first element always contains the maximum value. Note that the following is all untested, but hopefully it should get you started.
// Create an empty vector to hold the lowest values
std::vector<ResultType> results;
// Iterate over the first 10 entries in the file and put the results in the vector
for (... ; i < 10; i++) {
    // Calculate the value of this row
    ResultType r = ....
    // Add it to the vector
    results.push_back(r);
}
// Now that the vector is "full", turn it into a heap
std::make_heap(results.begin(), results.end());
// Iterate over all the remaining rows, adding values which are lower than the
// current maximum
for (i = 10; .....) {
    // Calculate the value for this row
    ResultType r = ....
    // Compare it to the max element in the heap
    if (r < results.front()) {
        // Move the existing maximum to the back and "re-heapify" the rest
        std::pop_heap(results.begin(), results.end());
        // Overwrite the old maximum with the new element
        results.back() = r;
        // Restore the heap property over the whole vector
        std::push_heap(results.begin(), results.end());
    }
}
// Finally, sort the results to put them all in order
// (using sort_heap just because we can)
std::sort_heap(results.begin(), results.end());
Yes. What you want is a priority queue or heap, defined so as to remove the highest value. You just need to do such a removal if the size after the insertion is greater than 10. You should be able to do this with STL classes.
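Just use std::set to do that, since in std::set all values are sorted from min to max. (A std::set silently drops equal values; use std::multiset if repeated results must be kept.)
void insert_value(std::set<ResultType>& myset, const ResultType& value){
    myset.insert(value);
    const std::size_t limit = 10;
    if(myset.size() > limit){
        // the largest value is the last element; drop it to keep the 10 lowest
        myset.erase(std::prev(myset.end()));
    }
}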
I think a max heap will work for this problem.
1- Create a max heap of size 10.
2- Fill the heap with the first 10 elements.
3- Compare the 11th element with the largest element, i.e. the root (the element at index 0).
4- If the 11th element is smaller, replace the root node with it and heapify again.
Repeat the same steps until the whole file is parsed.
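The steps above map fairly directly onto std::priority_queue, which is a max heap by default. The sketch below assumes the results are plain float values; the class name LowestN and its member names are made up for illustration.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Keeps the `limit` smallest values seen so far; the largest of them sits on top.
class LowestN {
public:
    explicit LowestN(std::size_t limit) : limit_(limit) {}

    void offer(float value) {
        if (heap_.size() < limit_) {
            heap_.push(value);                 // still filling up
        } else if (!heap_.empty() && value < heap_.top()) {
            heap_.pop();                       // drop the current largest
            heap_.push(value);
        }
    }

    // Returns the kept values in ascending order.
    std::vector<float> sorted() const {
        std::priority_queue<float> copy = heap_;
        std::vector<float> out(copy.size());
        for (std::size_t i = out.size(); i-- > 0; ) {
            out[i] = copy.top();               // largest remaining goes last
            copy.pop();
        }
        return out;
    }

private:
    std::size_t limit_;
    std::priority_queue<float> heap_;          // max heap by default
};
```

Memory stays at `limit` elements no matter how many rows are processed, and each offer costs O(log limit).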

Find dominant mode of an unsorted array

Note, this is a homework assignment.
I need to find the mode of an array (positive values) and secondarily return that value if the mode occurs more than sizeof(array)/2 times, i.e. it is the dominant value. Some arrays will have neither.
That is simple enough, but there is a constraint that the array must NOT be sorted prior to the determination, additionally, the complexity must be on the order of O(nlogn).
Using this second constraint, and the master theorem we can determine that the time complexity 'T(n) = A*T(n/B) + n^D' where A=B and log_B(A)=D for O(nlogn) to be true. Thus, A=B=D=2. This is also convenient since the dominant value must be dominant in the 1st, 2nd, or both halves of an array.
Using 'T(n) = A*T(n/B) + n^D' we know that the search function will call itself twice at each level (A), divide the problem set by 2 at each level (B). I'm stuck figuring out how to make my algorithm take into account the n^2 at each level.
To make some code of this:
int search(a,b) {
    search(a, a+(b-a)/2);
    search(a+(b-a)/2+1, b);
}
The "glue" I'm missing here is how to combine these divided functions and I think that will implement the n^2 complexity. There is some trick here where the dominant must be the dominant in the 1st or 2nd half or both, not quite sure how that helps me right now with the complexity constraint.
I've written down some examples of small arrays and I've drawn out ways it would divide. I can't seem to go in the correct direction of finding one, single method that will always return the dominant value.
At level 0, the function needs to call itself to search the first half and second half of the array. That needs to recurse, and call itself. Then at each level, it needs to perform n^2 operations. So in an array [2,0,2,0,2] it would split that into a search on [2,0] and a search on [2,0,2] AND perform 25 operations. A search on [2,0] would call a search on [2] and a search on [0] AND perform 4 operations. I'm assuming these would need to be a search of the array space itself. I was planning to use C++ and use something from STL to iterate and count the values. I could create a large array and just update counts by their index.
If some number occurs more than half the time, it can be found in O(n) time and O(1) space as follows:
int num = a[0], occ = 1;
for (int i=1; i<n; i++) {
    if (a[i] == num) occ++;
    else {
        occ--;
        if (occ < 0) {
            num = a[i];
            occ = 1;
        }
    }
}
Since you are not sure whether such a number exists, apply the above algorithm to get a candidate first, then iterate over the whole array a second time to count the occurrences of that number and check whether the count is greater than half the array size.
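The candidate pass plus the verification pass can be combined into one function. The sketch below uses the more common formulation of the same majority-vote idea (reset the candidate when the counter reaches zero), requires C++17 for std::optional, and the name dominant_value is an assumption made for this example.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Returns the value that occurs more than a.size()/2 times, if there is one.
std::optional<int> dominant_value(const std::vector<int>& a) {
    int candidate = 0;
    std::size_t count = 0;
    // Pass 1: majority vote. A dominant value, if any, survives as the candidate.
    for (int v : a) {
        if (count == 0) { candidate = v; count = 1; }
        else if (v == candidate) ++count;
        else --count;
    }
    // Pass 2: confirm the candidate really occurs more than half the time.
    std::size_t occurrences = 0;
    for (int v : a) {
        if (v == candidate) ++occurrences;
    }
    if (occurrences > a.size() / 2) return candidate;
    return std::nullopt;
}
```

The second pass matters: without it, an array with no dominant value still produces some candidate.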
If you want to find just the dominant mode of an array, and do it recursively, here's the pseudo-code:
def DominantMode(array):
    # if there is only one element, that's the dominant mode
    if len(array) == 1: return array[0]
    # otherwise, find the dominant mode of the left and right halves
    left = DominantMode(array[0:len(array)//2])
    right = DominantMode(array[len(array)//2:len(array)])
    # if both sides have the same dominant mode, the whole array has that mode
    if left == right: return left
    # otherwise, we have to scan the whole array to determine which one wins
    leftCount = sum(element == left for element in array)
    rightCount = sum(element == right for element in array)
    if leftCount > len(array) / 2: return left
    if rightCount > len(array) / 2: return right
    # if neither wins, just return None
    return None
The above algorithm is O(nlogn) time but only O(logn) space.
If you want to find the mode of an array (not just the dominant mode), first compute the histogram. You can do this in O(n) time (visiting each element of the array exactly once) by storing the histogram in a hash table that maps the element value to its frequency.
Once the histogram has been computed, you can iterate over it (visiting each element at most once) to find the highest frequency. Once you find a frequency larger than half the size of the array, you can return immediately and ignore the rest of the histogram. Since the size of the histogram can be no larger than the size of the original array, this step is also O(n) time (and O(n) space).
Since both steps are O(n) time, the resulting algorithmic complexity is O(n) time.
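A possible C++ rendering of the histogram approach, using std::unordered_map as the hash table. The function name find_mode and the pair-based return value are assumptions for this sketch; when several values tie for the highest frequency, which one is returned is unspecified.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns {mode, frequency}. For an empty input the frequency is 0.
std::pair<int, std::size_t> find_mode(const std::vector<int>& a) {
    std::unordered_map<int, std::size_t> histogram;
    for (int v : a) ++histogram[v];                 // step 1: build the histogram

    std::pair<int, std::size_t> best{0, 0};
    for (const auto& entry : histogram) {           // step 2: find the highest frequency
        if (entry.second > best.second) best = {entry.first, entry.second};
        if (best.second > a.size() / 2) break;      // dominant: nothing can beat it
    }
    return best;
}
```

The caller can test for dominance with `result.second > a.size() / 2`. Hash table operations are O(1) on average, so the O(n) bound is an expected-time bound.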

Find pair of elements in integer array such that abs(v[i]-v[j]) is minimized

Let's say we have an int array with 5 elements: 1, 2, 3, 4, 5
What I need to do is find the minimum absolute value of the differences between the array's elements.
We need to check them like this:
1-2 2-3 3-4 4-5
1-3 2-4 3-5
1-4 2-5
1-5
And find the minimum absolute value of these subtractions. We can find it with 2 for loops. The question is: is there an algorithm for finding the value with a single for loop?
Sort the list and subtract each pair of adjacent elements; the smallest result is the answer.
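As a concrete sketch of the sort-then-scan idea: the function name min_abs_difference is made up, std::sort is used as the sorting step (O(n log n)), and the result is widened to long long so that extreme int values cannot overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Smallest |v[i] - v[j]| over all pairs i != j. Needs at least two elements.
long long min_abs_difference(std::vector<int> v) {
    assert(v.size() >= 2);
    std::sort(v.begin(), v.end());
    long long best = std::numeric_limits<long long>::max();
    // After sorting, the closest pair is always adjacent.
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        long long diff = static_cast<long long>(v[i + 1]) - v[i];
        best = std::min(best, diff);
    }
    return best;
}
```

The vector is taken by value, so the caller's data keeps its original order.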
The provably best-performing solution is asymptotically linear, O(n), up to constant factors.
This means that the time taken is proportional to the number of the elements in the array (which of course is the best we can do as we at least have to read every element of the array, which already takes O(n) time).
Here is one such O(n) solution (which also uses O(1) space if the list can be modified in-place):
int mindiff(vector<int>& v)
{
    // sorts v in place, so v cannot be const
    IntRadixSort(v.begin(), v.end());
    int best = INT_MAX;
    for (size_t i = 0; i + 1 < v.size(); i++)
    {
        int diff = abs(v[i]-v[i+1]);
        if (diff < best)
            best = diff;
    }
    return best;
}
IntRadixSort is a linear time fixed-width integer sorting algorithm defined here:
http://en.wikipedia.org/wiki/Radix_sort
The concept is that you leverage the fixed-bitwidth nature of ints by partitioning them in a series of fixed passes on the bit positions, i.e. partition them on the high bit (32nd), then on the next highest (31st), then on the next (30th), and so on, which only takes linear time.
The problem is equivalent to sorting. Any sorting algorithm could be used, and at the end, return the difference between the nearest elements. A final pass over the data could be used to find that difference, or it could be maintained during the sort. Before the data is sorted the min difference between adjacent elements will be an upper bound.
So to do it without two loops, use a sorting algorithm that does not have two loops. In a way it feels like semantics, but recursive sorting algorithms will do it with only one loop. If the issue is the n(n-1)/2 subtractions required by the simple two-loop case, you can use an O(n log n) algorithm.
No. Unless you know the list is sorted, you need two loops.
It's simple: iterate in one for loop.
Keep four variables: "minpos" and "maxpos", and "minneg" and "maxneg".
Check the sign of each value you encounter; store the maximum positive number in "maxpos" and the minimum positive number in "minpos", and do the same for numbers less than zero.
Now store the difference maxpos-minpos in one variable and the difference maxneg-minneg in another, and print the larger of the two. You will get the desired result.
I believe you definitely know how to find the max and min in one for loop.
Correction: the above finds the maximum difference. For the minimum, you need to take the max and second max instead of the max and min :)
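This might help you:
int n=5; // number of elements in a
int subtractmin=INT_MAX;
int m=0; // index of the first element of the pair
for(int i=1;m<n-1;i++){
    if(abs(a[m]-a[m+i])<subtractmin)
        subtractmin=abs(a[m]-a[m+i]);
    if(m+i==n-1){ // last pair for this m: advance m and restart the offset
        m=m+1;
        i=0;
    }
}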

Fast way to pick randomly from a set, with each entry picked only once?

I'm working on a program to solve the n queens problem (the problem of putting n chess queens on an n x n chessboard such that none of them is able to capture any other using the standard chess queen's moves). I am using a heuristic algorithm, and it starts by placing one queen in each row and picking a column randomly out of the columns that are not already occupied. I feel that this step is an opportunity for optimization. Here is the code (in C++):
vector<int> colsleft;
//fills the vector sequentially with integer values
for (int c=0; c < size; c++)
    colsleft.push_back(c);
for (int i=0; i < size; i++)
{
    vector<int>::iterator randplace = colsleft.begin() + rand()%colsleft.size();
    /* chboard is an integer array, with each entry representing a row
    and holding the column position of the queen in that row */
    chboard[i] = *randplace;
    colsleft.erase(randplace);
}
If it is not clear from the code: I start by building a vector containing an integer for each column. Then, for each row, I pick a random entry in the vector, assign its value to that row's entry in chboard[]. I then remove that entry from the vector so it is not available for any other queens.
I'm curious about methods that could use arrays and pointers instead of a vector. Or <list>s? Is there a better way of filling the vector sequentially, other than the for loop? I would love to hear some suggestions!
The following should fulfill your needs:
#include <algorithm>
...
int randplace[size];
for (int i = 0; i < size; i ++)
    randplace[i] = i;
// note: random_shuffle was removed in C++17; std::shuffle replaces it
random_shuffle(randplace, randplace + size);
You can do the same stuff with vectors, too, if you wish.
Source: http://gethelp.devx.com/techtips/cpp_pro/10min/10min1299.asp
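With a C++11 or later compiler, the same idea can be expressed with std::iota and std::shuffle. This is a sketch rather than a drop-in replacement: the function name random_columns is invented here, and the caller is assumed to supply a seeded std::mt19937.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Returns a random permutation of 0 .. size-1: entry i is the column for row i.
std::vector<int> random_columns(int size, std::mt19937& rng) {
    assert(size >= 0);
    std::vector<int> cols(size);
    std::iota(cols.begin(), cols.end(), 0);        // fills 0, 1, 2, ...
    std::shuffle(cols.begin(), cols.end(), rng);   // uniform random permutation
    return cols;
}
```

std::iota also answers the side question about filling a container sequentially without writing the loop by hand.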
A couple of random answers to some of your questions :):
1. As far as I know, there's no way to fill an array with consecutive values without iterating over it first. HOWEVER, if you really just need consecutive values, you do not need to fill the array - just use the cell indices as the values: a[0] is 0 and a[100] is 100 - when you get a random number, treat the number as the value.
2. You can implement the same with a list<> and remove cells you already hit, or...
3. For better performance, rather than removing cells, why not put an "already used" value in them (like -1) and check for that. Say you get a random number like 73, and a[73] contains -1, you just get a new random number.
4. Finally, describing item 3 reminded me of a re-hashing function. Perhaps you can implement your algorithm as a hash-table?
Your colsleft.erase(randplace); line is really inefficient, because erasing an element in the middle of the vector requires shifting all the ones after it. A more efficient approach that will satisfy your needs in this case is to simply swap the element with the one at index (size - i - 1) (the element whose index will be outside the range in the next iteration, so we "bring" that element into the middle, and swap the used one out).
And then we don't even need to bother deleting that element -- the end of the array will accumulate the "chosen" elements. And now we've basically implemented an in-place Knuth shuffle.
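The swap trick described above, applied to the original loop, might look like this. It is a sketch under assumptions: chboard is taken to be a std::vector<int> with one entry per row, the function name place_queens_randomly is made up, and <random> is used in place of rand().

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Assigns each row a distinct random column without erasing from the middle.
void place_queens_randomly(std::vector<int>& chboard, std::mt19937& rng) {
    const std::size_t size = chboard.size();
    std::vector<int> colsleft(size);
    for (std::size_t c = 0; c < size; ++c) colsleft[c] = static_cast<int>(c);

    for (std::size_t i = 0; i < size; ++i) {
        const std::size_t remaining = size - i;   // unused columns live in [0, remaining)
        std::uniform_int_distribution<std::size_t> pick(0, remaining - 1);
        const std::size_t k = pick(rng);
        chboard[i] = colsleft[k];
        // Move the chosen column past the end of the unused range: O(1) instead of erase().
        std::swap(colsleft[k], colsleft[remaining - 1]);
    }
}
```

Each row now costs constant time, so the whole placement is O(n) rather than the O(n^2) of repeated erase() calls.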