Compare each element of a 2D array with the rest C++ - c++

I'm new to C++, and what I have to do is to write a method that checks if a 2D array contains any duplicate items.
So for example, I have a matrix[3][4], I've been able to compare the first element [0][0] with the rest, till the last one [2][3].
The problem is that I don't know how to proceed. The method should then compare element [0][1] with the rest (except the previous one and itself, of course), and so on.

First off, the fact that it's a 2D array is irrelevant in this context; you want to find duplicates across the entire array, so you'd be better off with 1-dimensional indexing anyway. Incidentally, that's a suggested way of handling two-dimensional arrays in C++.
So assuming you can put the indices in some order, the general idea is to compare every element with all subsequent elements. This gives you O(n^2) comparisons. The pseudocode is, unsurprisingly, two loops, a common pattern used e.g. for collision detection:
for (iterA = 0; iterA < num - 1; iterA++) {
for (iterB = iterA + 1; iterB < num; iterB++) { // note how iterB starts one further
if (*iterA == *iterB)
return true; // found a duplicate
}
}
In case of a 2D array, the *iterA dereference can be replaced with a function that breaks up the composite 1-dimensional index into two components, e.g. with i % width, i / width. This is, again, a very common pattern.
That being said, why bother? Make an std::set, start putting elements one-by-one and call find before every insert. If find returns something, break.
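Both ideas above can be sketched roughly as follows. This is only an illustration for a fixed-size int matrix; the function names has_duplicate_pairwise and has_duplicate_set are made up for this example. The set version relies on the return value of insert (which reports whether the value was new) instead of a separate find call.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper: quadratic scan using a single flattened index.
template <std::size_t Rows, std::size_t Cols>
bool has_duplicate_pairwise(const int (&m)[Rows][Cols]) {
    const std::size_t num = Rows * Cols;
    for (std::size_t a = 0; a + 1 < num; ++a) {
        for (std::size_t b = a + 1; b < num; ++b) {  // b starts one past a
            // Split each flat index into (row, column).
            if (m[a / Cols][a % Cols] == m[b / Cols][b % Cols])
                return true;
        }
    }
    return false;
}

// Hypothetical helper: std::set version.
// insert() returns a pair whose .second is false if the value was already present.
template <std::size_t Rows, std::size_t Cols>
bool has_duplicate_set(const int (&m)[Rows][Cols]) {
    std::set<int> seen;
    for (const auto& row : m)
        for (int value : row)
            if (!seen.insert(value).second)
                return true;
    return false;
}
```

For a matrix[3][4] either function can be called directly with the array; the set version does O(n log n) work instead of O(n^2).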

Related

How to avoid out of range exception when erasing vector in a loop?

My apologies for the lengthy explanation.
I am working on a C++ application that loads two files into two 2D string vectors, rearranges those vectors, builds another 2D string vector, and outputs it all in a report. The first element of the two vectors is a code that identifies the owner of the item and the item in the vector. I pass the owner's identification to the program on start and loop through the two vectors in a nested while loop to find those that have matching first elements. When I do, I build a third vector with components of the first two, and I then need to capture any that don't match.
I was using the syntax "vector.erase(vector.begin() + i)" to remove elements from the two original arrays when they matched. When the loop completed, I had my new third vector, and I was left with two vectors that only had the elements that didn't match, which is what I needed. This was working fine as I tried the various owners in the files (the program accepts one owner at a time). Then I tried one that generated an out of range error.
I could not figure out how to do the erase inside of the loop without throwing the error (it didn't seem that swap and pop or erase-remove were feasible solutions). I solved my problem for the program with two extra nested while loops after building my third vector in this one.
I'd like to know how to make the erase method work here (as it seems a simpler solution) or at least how to check for my out of range error (and avoid it). There were a lot of "rows" for this particular owner; so debugging was tedious. Before giving up and going on to the nested while solution, I determined that the second erase was throwing the error. How can I make this work, or are my nested whiles after the fact, the best I can do? Here is the code:
i = 0;
while (i < AIvector.size())
{
CHECK:
j = 0;
while (j < TRvector.size())
{
if (AIvector[i][0] == TRvector[j][0])
{
linevector.clear();
// Add the necessary data from both vectors to Combo_outputvector
for (x = 0; x < AIvector[i].size(); x++)
{
linevector.push_back(AIvector[i][x]); // add AI info
}
for (x = 3; x < TRvector[j].size(); x++) // Don't need the the first three elements; so start with x=3.
{
linevector.push_back(TRvector[j][x]); // add TR info
}
Combo_outputvector.push_back(linevector); // build the combo vector
// then erase these two current rows/elements from their respective vectors, this revises the AI and TR vectors
AIvector.erase(AIvector.begin() + i);
TRvector.erase(TRvector.begin() + j);
goto CHECK; // jump from here because the erase will have changed the two increments
}
j++;
}
i++;
}
As already discussed, your goto jumps to the wrong position. Simply moving it out of the first while loop should solve your problems. But can we do better?
Erasing from a vector can be done cleanly with std::remove and std::vector::erase (the erase-remove idiom) for cheap-to-move objects, which vector and string both are. After some thought, however, I believe this isn't the best solution for you, because you need a function that does more than just check if a certain row exists in both containers, and that is not easily expressed with the erase-remove idiom.
Retaining the current structure, then, we can use iterators for the loop condition. We have a lot to gain from this, because std::vector::erase returns an iterator to the next valid element after the erased one. Not to mention that it takes an iterator anyway. Conditionally erasing elements in a vector becomes as simple as
auto it = vec.begin();
while (it != vec.end()) {
if (...)
it = vec.erase(it);
else
++it;
}
Because we assign erase's return value to it we don't have to worry about iterator invalidation. If we erase the last element, it returns vec.end() so that doesn't need special handling.
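As a concrete instance of that loop, here is a minimal sketch with an assumed condition (dropping even numbers from a vector of ints). The function name erase_even is hypothetical and only stands in for whatever test you actually need.

```cpp
#include <vector>

// Hypothetical example: remove every even number from a vector of ints.
void erase_even(std::vector<int>& vec) {
    auto it = vec.begin();
    while (it != vec.end()) {
        if (*it % 2 == 0)
            it = vec.erase(it);  // erase returns the iterator following the removed element
        else
            ++it;                // only advance when nothing was erased
    }
}
```

Note that the iterator is advanced only in the else branch; after an erase, the returned iterator already points at the next element to examine.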
Your second loop can be removed altogether. The C++ standard defines functions for searching inside STL containers. std::find_if searches a container for an element that satisfies a condition and returns an iterator to it, or end() if no such element exists. You haven't declared your types anywhere, so I'm just going to assume the rows are std::vector<std::string>.
using row_t = std::vector<std::string>;
auto AI_it = AIVector.begin();
while (AI_it != AIVector.end()) {
// Find a row in TRVector with the same first element as *AI_it
auto TR_it = std::find_if (TRVector.begin(), TRVector.end(), [&AI_it](const row_t& row) {
return row[0] == (*AI_it)[0];
});
// If a matching row was found
if (TR_it != TRVector.end()) {
// Copy the line from AIVector
auto linevector = *AI_it;
// Do NOT do this unless every TR row is guaranteed to have at least 3 elements
assert(TR_it->size() >= 3);
std::copy(TR_it->begin() + 3, TR_it->end(),
std::back_inserter(linevector));
Combo_outputvector.emplace_back(std::move(linevector));
AI_it = AIVector.erase(AI_it);
TRVector.erase(TR_it);
}
else
++AI_it;
}
As you can see, switching to iterators completely sidesteps your initial problem of figuring out how not to access invalid indices. If you don't understand the syntax of the arguments for find_if, search for the term lambda. It is beyond the scope of this answer to explain what they are.
A few notable changes:
linevector is now encapsulated properly. There is no reason for it to be declared outside this scope and reused.
linevector simply copies the desired row from AIVector rather than calling push_back for every element in it. This works as long as Combo_outputvector (and therefore linevector) contains the same type as AIVector and TRVector.
std::copy is used instead of a for loop. Apart from being slightly shorter, it is also more generic, meaning you could change your container type to anything that supports random access iterators and inserting at the back, and the copy would still work.
linevector is moved into Combo_outputvector. This can be a huge performance optimization if your vectors are large!
It is possible that you used a non-encapsulated linevector because you wanted to keep a copy of the last inserted row outside of the loop. That would prohibit moving it, however. For this reason it is faster and more descriptive to do it as I showed above and then simply do the following after the loop.
auto linevector = Combo_outputvector.back();

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with a huge number of elements (millions or even billions), where the values are integers within a small range (1 to 100 or 1 to 1000). In such a case, is std::sort, or the parallelized version __gnu_parallel::sort, the best choice for me?
Actually, I want to sort a vector of my own class, with an integer member representing the processor index.
As there are other members inside the class, two objects with the same integer member used for comparison might still not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0,m), the most efficient way to do so is to have a vector in which the index represents the element and the value is the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
if (counts.size() <= static_cast<size_t>(i)) { // grow so that index i is valid
counts.resize(i+1, 0);
}
counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
const int i = t.sort_field();
if (count_sorted.size() <= static_cast<size_t>(i)) { // grow so that index i is valid
count_sorted.resize(i+1, {});
}
count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially because you need to store the vectors of pointers. The space complexity went from O(m) to O(n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort is in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the objects themselves and move them in. If you need count_sorted to outlive to_sort you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but a value v is now stored at index v + l, and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
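To make that last step concrete, here is a rough sketch for plain ints that builds the counts and then walks them to produce the sorted sequence. It assumes every value lies in [0, m) and that m is known up front; the name counting_sorted is invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counting sort for values assumed to lie in [0, m).
std::vector<int> counting_sorted(const std::vector<int>& to_sort, std::size_t m) {
    std::vector<std::size_t> counts(m, 0);  // sized once because m is known
    for (int v : to_sort)
        ++counts[static_cast<std::size_t>(v)];

    // Walk the counts in index order and emit each value as often as it occurred.
    std::vector<int> sorted;
    sorted.reserve(to_sort.size());
    for (std::size_t value = 0; value < m; ++value)
        sorted.insert(sorted.end(), counts[value], static_cast<int>(value));
    return sorted;
}
```

This variant materializes the result, so it uses O(n) extra memory; iterating over counts directly avoids that if you only need to visit the values in order.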
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some Java implementations (I believe the Guava implementation would be efficient) but not in C++, where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
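The cumulative-sum step on its own might look like the sketch below. The helper name start_offsets is hypothetical, and index 0 of the counts vector is assumed to correspond to the smallest value (value 1 in the example above).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: turn per-value counts into starting indices.
// The result has one extra trailing entry holding the total element count.
std::vector<std::size_t> start_offsets(const std::vector<std::size_t>& counts) {
    std::vector<std::size_t> offsets(counts.size() + 1, 0);
    for (std::size_t i = 0; i < counts.size(); ++i)
        offsets[i + 1] = offsets[i] + counts[i];
    return offsets;
}
```

With counts {3, 5, 7} this yields {0, 3, 8, 15}, matching the numbers in the example.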
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jump ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X) where X is the maximum value you allow the sorting of.
Regular old counting sort (as seen on some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(Nlog(N))). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions but, since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place we have to increment the value in the prefix sum so that the next element with same value doesn't remove the previous element from its place.
Second: either
keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place. This requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array.
or keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
template<class It, class KeyOf>
void countsort (It begin, It end, KeyOf key_of) {
constexpr int max_value = 1000;
int final_destination[max_value] = {}; // zero initialized
int destination[max_value] = {}; // zero initialized
// Record counts
for (It it = begin; it != end; ++it)
final_destination[key_of(*it)]++;
// Build prefix sum of counts
for (int i = 1; i < max_value; ++i) {
final_destination[i] += final_destination[i-1];
destination[i] = final_destination[i-1];
}
for (auto it = begin; it != end; ++it) {
auto key = key_of(*it);
// while item is not in the correct position
while ( std::distance(begin, it) != destination[key] &&
// and not all items of this value have reached their final position
final_destination[key] != destination[key] ) {
// swap into the right place
std::iter_swap(it, begin + destination[key]);
// tidy up for next iteration
++destination[key];
key = key_of(*it);
}
}
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const &person){
    return person.id()-1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort,
here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and Counting Sort is definitely the way to go. However, I personally prefer not to go resizing the vector progressively, but I'd rather do it this way (assuming your range is [0-1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue=0;
for (int i : to_sort) {
if(i > maxvalue) maxvalue = i;
counts[i]++;
}
counts.resize(maxvalue+1);
It is essentially the same, but no need to be constantly managing the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

Replacing For loop with memcopy, memmove, or std:copy?

I've got a shift function to which I am continuously sending new data points, and it shifts my points by an offset of 1. This is to achieve a "graphical shifting" where the points represent points on a graph.
The shifting function is the following:
void Chart_Buffer::ShiftData()
{
for(int index = 0; index < (_channel_Samples - 1); ++index)
{
_sample_Points[index].y = _sample_Points[index + 1].y;
}
return;
}
The problem with this is that it runs through a huge array of up to 800 data points, and it does this every time a new data point is added, so I wanted to see if I can optimize this process by shifting all values by an offset of 1 without running through a for loop. I looked at implementations of memcpy, memmove, and std::copy, but I can't figure out how to use them for my purpose.
Basically, if I have elements 0-799 in the array, I want to shift elements 1-799 by 1 so that I have 0-798, and then just add the new element to the array.
Edit: _sample_Points is type tagPOINT with the following structure:
typedef struct tagPOINT
{
LONG x;
LONG y;
} POINT, *PPOINT, NEAR *NPPOINT, FAR *LPPOINT;
It's hard to give a firm answer to this without knowing what you are doing with _sample_Points. But I believe that I can firmly say that copying every element in the array down one is an expensive approach.
In the best case: You just need to access the front of the array and add to the back of the array. If that's the case you're describing a queue.
To add a new element to the back of a queue use: push
To inspect the front element use: front
To "copy everything down one" (just delete the front element) use: pop.
Otherwise you'd be in the case where: You need random access to the array. If that's the case you can still get potentially better performance from a deque.
To add a new element to the back of a deque use: push_back
To inspect the front element use: front
To "copy everything down one" (just delete the front element) use: pop_front
So if you use a queue for your _sample_Points Chart_Buffer::ShiftData could be replaced by _sample_Points.pop().
If you use a deque for your _sample_Points Chart_Buffer::ShiftData could be replaced by _sample_Points.pop_front().
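A minimal sketch of the deque variant follows. Point is a stand-in for the Windows POINT struct (assuming LONG is a long), and push_sample is a hypothetical helper, not part of the original Chart_Buffer class.

```cpp
#include <cstddef>
#include <deque>

// Stand-in for the POINT struct from the question (assumption: LONG is a long).
struct Point {
    long x;
    long y;
};

// Hypothetical helper: keep at most max_samples points, dropping the oldest first.
void push_sample(std::deque<Point>& samples, Point p, std::size_t max_samples) {
    if (!samples.empty() && samples.size() >= max_samples)
        samples.pop_front();  // constant time, no element-by-element copy
    samples.push_back(p);
}
```

Unlike the loop in the question, this moves whole points rather than only their y values, so the x coordinate would have to be derived from each point's position in the deque when drawing.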
It looks like you are looking for a std::deque. It is a double-ended queue, which means you can pop an element from the back and push on the front.
If what you are looking for is to keep the elements of your array in a certain order, this will help you do just that.
Now if you also want to have them contiguous in memory, then you could do it like this:
memmove(array+1, array, sizeof(element)*(array_size-1));
array[0] = new_element;
You cannot do this with fewer operations than you are already doing, whether you spell all of them out or call an algorithm. The problem is that the operation is not what you described initially: it is not shifting the data, but shifting part of the data (only the y coordinate) and leaving the other half as it is.
If you don't want to spell out the operation, you can play with the transform algorithm in a way similar to the answer by id256, but I am not sure whether that is an improvement really, the loop in the question is easier and cleaner than the transform...
If it is an acceptable amount of refactoring of your code, you could also let go of tagPOINT and instead of having one _sample_Points, have two arrays, one for the x and one for the y. Then you can memmove() the array of ys. Like:
LONG _sample_Points_x[DIMENSION];
LONG _sample_Points_y[DIMENSION];
void Chart_Buffer::ShiftData() {
memmove(_sample_Points_y, _sample_Points_y + 1, (DIMENSION-1) * sizeof _sample_Points_y[0]);
}

Adding object to vector with push_back working fine, but adding objects with accessor syntax [ ] , not working

I've implemented a merge function for vectors, which basically combines two sorted vectors into one sorted vector (yes, it is for a merge sort algorithm). I was trying to make my code faster and avoid overhead, so I decided not to use the push_back method on the vector, but to use the array syntax instead, which has less overhead. However, something is going terribly wrong, and the output is messed up when I do this. Here's the code:
while(size1<left.size() && size2 < right.size()) //left and right are the input vectors
{
//it1 and it2 are iterators on the two sorted input vectors
if(*it1 <= *it2)
{
final.push_back(*it1); //final is the final vector to output
//final[count] = *it1; // this does not work for some reason
it1++;
size1++;
//cout<<"count ="<<count<<" size1 ="<<size1<<endl;
}
else
{
final.push_back(*it2);
//final[count] = left[size2];
it2++;
size2++;
}
count++;
//cout<<"count ="<<count<<" size1 ="<<size1<<"size2 = "<<size2<<endl;
}
It seems to me that the two methods should be functionally equivalent.
PS I have already reserved space for the final vector so that shouldnt be a problem.
You can't add new objects to a vector using operator[]. .reserve() doesn't add them either; it only allocates capacity. You have to use either .resize() or .push_back().
Also, you are not avoiding overhead at all; the call cost of operator[] isn't really much better than that of push_back(), so until you profile your code thoroughly, just use push_back. You can still use reserve to make sure unnecessary allocations won't be made.
In most of the cases, "optimizations" like this don't really help. If you want to make your code faster, profile it first and look for the hot paths.
There is a huge difference between
vector[i] = item;
and
vector.push_back(item);
Differences:
The first one modifies the element at index i and i must be valid index. That is,
0 <= i < vector.size() must be true
If i is an invalid index, the first one invokes undefined behavior, which means ANYTHING can happen. You could, however, use at() which throws exception if i is invalid:
vector.at(i) = item; //throws exception if i is invalid
The second one adds an element to the vector at the end, which means the size of the vector increases by one.
Since the two are semantically different, choose the one you need.
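If you do want the indexed form, a merge that writes through operator[] could look like the sketch below. The function merge_sorted is a hypothetical rewrite for int vectors, not the code from the question; the key point is that the output vector is constructed with its final size, so every index written is valid.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical merge of two sorted vectors that writes through operator[].
// This is valid only because the output is sized up front, not merely reserved.
std::vector<int> merge_sorted(const std::vector<int>& left, const std::vector<int>& right) {
    std::vector<int> out(left.size() + right.size());  // size() is already left + right
    std::size_t i = 0, j = 0, count = 0;
    while (i < left.size() && j < right.size())
        out[count++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    while (i < left.size())  out[count++] = left[i++];   // copy what is left over
    while (j < right.size()) out[count++] = right[j++];
    return out;
}
```

After a plain reserve the vector's size() is still 0, so writing to out[count] would be undefined behavior even though the memory has been allocated.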

Implementing own quicksort on dynamic array

I have to implement my own sort on a dynamic string array. An example of such an array is:
string * sortArray;
I then read in the size of the array from a text file and make the array as long as needed and fill it. So, I have...
sortArray = new string[_numberOfNames];
for(int i = 0; i < _numberOfNames; ++i){
sin >> _data[i];
}
Now I need to create my own sorting method and I thought I'd go with quicksort. My problem is, I'm not sure how to go about it.
When I choose a pivot, how can I then go about setting up two more dynamic string arrays to put the lower values and higher values into, and then recurse on them? There is no way of knowing beforehand how big each array needs to be before I start putting values into them.
I thought I could do something like define the size of each array as being the same as the array being sorted, and then somehow remove any unwanted empty spaces from the end, but I'm not sure this is possible?
Any help would be much appreciated.
P.S. I know about the std::sort, I already have this in the program, I'm just trying to implement a sort myself.
Two options as from the comments above:
1.) Use std::vector. There you can have variable size arrays.
2.) Use an "in place" version of quicksort that does the sorting in your original array. See http://en.wikipedia.org/wiki/Quicksort#In-place_version
Let's say your array has size N and your pivot value is x.
Keep two pointers, one at the beginning (0) and one at the end (N-1), and move both toward the middle. Whenever the value at the beginning pointer is greater than x and the value at the end pointer is lower than x, swap them. Once you have finished and placed x in its new location (where the two pointers met), recurse on the part to the left of x and the part to the right of x.
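The two-pointer scheme can be sketched as below. This is one common in-place variant (middle element as pivot, inclusive index bounds), offered as an illustration rather than the only way to do it; the function name and signature are assumptions.

```cpp
#include <string>
#include <utility>

// Hypothetical in-place quicksort over the inclusive index range [lo, hi].
void quicksort(std::string* a, int lo, int hi) {
    if (lo >= hi) return;
    const std::string pivot = a[lo + (hi - lo) / 2];  // copied, since swaps move elements
    int i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) ++i;  // skip values already on the correct left side
        while (pivot < a[j]) --j;  // skip values already on the correct right side
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    quicksort(a, lo, j);  // left part
    quicksort(a, i, hi);  // right part
}
```

For the array in the question this would be called as quicksort(sortArray, 0, _numberOfNames - 1); no additional arrays are allocated.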