I have a complex problem and have been trying to find a very efficient algorithm for it. I'm hoping I can get some ideas from you helpful folks. Here is the situation.
I have a vector of vectors. These nested vectors are of various lengths, all storing integers in no particular order, such as (pseudocode):
vector_list = {
{ 1, 4, 2, 3 },
{ 5, 9, 2, 1, 3, 3 },
{ 2, 4, 2 },
...,
100 more,
{ 8, 2, 2, 4 }
}
and so on, with over 100 different vectors at a time inside vector_list. Note that the same integer can appear more than once in a single vector. I need to remove from vector_list any vector that is a duplicate of another vector. A vector is a duplicate of another vector if:
It has the same integers as the other vector (regardless of order). So if we have
vec1 = { 1, 2, 3 }
vec2 = { 2, 3, 1 }
These are duplicates and I need to remove one of them; it doesn't matter which one.
It is contained in the other vector, i.e. the other vector has all of its integers (repeats included). So if we have
vec1 = { 3, 2, 2 }
vec2 = { 4, 2, 3, 2, 5 }
vec2 has all of the ints of vec1 and is bigger, so I need to delete vec1 in favor of vec2.
The problem is, as I mentioned, that the list of vectors can be big (over 100 vectors), and the algorithm may need to run as many as 1000 times on a button click, each time with a different group of 100+ vectors. Hence the need for efficiency. I have considered the following:
Sorting the vectors may make life easier, but as I said, this has to be efficient, and I'd rather not sort if I don't have to.
It's more complicated by the fact that the vectors aren't in any order with respect to their size. For example, if the vectors in the list were ordered by size:
vector_list = {
{ },
{ },
{ },
{ },
{ },
...
{ },
{ }
}
It might make life easier, but that seems like it would take a lot of effort and I'm not sure about the gain.
The best effort I've had so far to try and solve this problem is:
#include <algorithm>
#include <iostream>
#include <vector>
// list of vectors, just 4 for illustration, but in reality more like 100, with lengths from 5 to 15 integers
std::vector<std::vector<int>> vector_list;
vector_list.push_back({9});
vector_list.push_back({3, 4, 2, 8, 1});
vector_list.push_back({4, 2});
vector_list.push_back({1, 3, 2, 4});
std::vector<int>::iterator it;
int i;
int j;
int k;
// to test if a smaller vector is a duplicate of a larger vector, I copy the smaller vector, then
// loop through the ints in the larger vector, seeing if I can find them in the copy of the smaller one. If I can,
// I remove the item from the smaller copy, and if the size of the smaller copy reaches 0, then the smaller vector
// was a duplicate of the larger vector and can be removed.
std::vector<int> copy;
// flag for breaking a for loop below
bool erased_i;
// loop through vector list
for ( i = 0; i < vector_list.size(); i++ )
{
// loop again, so we can compare every vector to every other vector
for ( j = 0; j < vector_list.size(); j++ )
{
// don't want to compare a vector to itself
if ( i != j )
{
// if the vector in i loop is at least as big as the vector in j loop
if ( vector_list[i].size() >= vector_list[j].size() )
{
// copy the smaller j vector
copy = vector_list[j];
// loop through each item in the larger i vector
for ( k = 0; k < vector_list[i].size(); k++ ) {
// if the item in the larger i vector is in the smaller vector,
// remove it from the smaller vector
it = std::find(copy.begin(), copy.end(), vector_list[i][k]);
if (it != copy.end())
{
// erase
copy.erase(it);
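// if the smaller vector has reached size 0, then it must have been a smaller duplicate that
// we can delete
if ( copy.size() == 0 ) {
vector_list.erase(vector_list.begin() + j);
// erasing j shifts the later vectors down by one, so if j was before i,
// vector i is now at position i - 1
if ( j < i ) {
i--;
}
j--;
// vector j is gone, so stop scanning against it
break;
}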
}
}
}
else
{
// otherwise vector j must be bigger than vector i, so we do the same thing
// in reverse, trying to erase vector i
copy = vector_list[i];
erased_i = false;
for ( k = 0; k < vector_list[j].size(); k++ ) {
it = std::find(copy.begin(), copy.end(), vector_list[j][k]);
if (it != copy.end()) {
copy.erase(it);
if ( copy.size() == 0 ) {
vector_list.erase(vector_list.begin() + i);
// step i back so the outer loop's i++ revisits the vector that moved into position i
i--;
// put an extra flag so we break out of the j loop as well as the k loop
erased_i = true;
break;
}
}
}
if ( erased_i ) {
// break the j loop because we have to start over with whatever
// vector is now in position i
break;
}
}
}
}
}
std::cout << "ENDING VECTORS\n";
// TERMINAL OUTPUT:
vector_list[0]
[9]
vector_list[1]
[3, 4, 2, 8, 1]
So this function gives me the right results, as these are the 2 unique vectors. It also gives me the correct results if I push the initial 4 vectors in reverse order (so the smallest one comes last, for example). But it feels inefficient to compare every vector to every other vector, and I also have to create these "copies" and try to reduce them to a .size() of 0 with every comparison I make.
Anyway, any ideas on how I could make this faster would be much appreciated. Maybe some kind of organization by vector length? I don't know. It seems wasteful to compare them all to each other.
Thanks!
Loop through the vectors and, for each vector, count the occurrences of each distinct value in it. An unordered_map<int, int> (value -> count) suffices for this; let's call it M.
Also maintain a set<unordered_map<int, int>>, say S, ordered by the size of unordered_map<int, int> in decreasing order.
Now we have to compare the contents of M with the contents of the unordered_maps in S. Let's call M' the current unordered_map in S being compared with M. M is a subset of M' only when the count of every element in M is less than or equal to the count of the same element in M'. If that's the case, M is a duplicate and we don't insert it; in every other case, we insert it. Also notice that if the size of M is greater than the size of M', M can't be a subset of M', and since S is ordered by decreasing size, it can't be a subset of any map after M' either, so we can insert M straight away. This pre-condition can be used to speed things up. Keep the indices of the vectors that weren't inserted in S: these are the duplicates, and they have to be deleted from vector_list at the end.
Time Complexity: O(N*M) + O(N^2*D) + O(N*log(N)) = O(N^2*D) where N is the number of vectors in vector_list, M is the average size of the vectors in vector_list and D is the average size of unordered_map's in S. This is for the worst case when there aren't any duplicates. For average case, when there are duplicates, the second complexity will come down.
Edit: The above procedure has a problem: a subset M1 of a map M2 can be inserted into S before M2 if the vector of M1 comes before the vector of M2 in vector_list. To fix that, we need to build the unordered_maps of all vectors first, store them in a vector V, and sort V in decreasing order of map size. Then we start from the biggest map in V and apply the above procedure. So now we don't really need S; we can compare the maps within V itself. The complexity doesn't change.
Edit 2: The same problem occurs again if two unordered_maps in V have the same size. To fix that, we also need an order on the maps' contents. So replace unordered_map with map, and in the comparator, if two maps have the same size, compare them element by element: at the first position where the keys differ, or the keys are the same but the counts M[key] differ, put the map with the bigger element first in V.
Edit 3: New Time Complexity: O(N*M*log(D)) + O(N*D*log(N)) + O(N^2*D*log(D)) = O(N^2*D*log(D)). Also you might want to pair the maps with the index of the respective vectors in vector_list so as to know which vector you must delete from vector_list when you find a duplicate in V.
IMPORTANT: In the sorted V, we must start checking from the end just to be safe (in case we choose to delete a duplicate from both vector_list and V as soon as we encounter it). So for the last map in V, compare it with all the maps before it to check whether it is a duplicate.
Example:
vector_list = {
{1, 2, 3},
{2, 3, 1},
{3, 2, 2},
{4, 2, 3, 2, 5},
{1, 2, 3, 4, 6, 2},
{2, 3, 4, 5, 6},
{1, 5}
}
Creating maps of respective vectors:
V = {
{1->1, 2->1, 3->1},
{1->1, 2->1, 3->1},
{2->2, 3->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{1->1, 5->1}
}
After sorting:
V = {
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 2->1, 3->1},
{1->1, 2->1, 3->1},
{1->1, 5->1},
{2->2, 3->1}
}
After deleting duplicates:
V = {
{1->1, 2->2, 3->1, 4->1, 6->1},
{2->1, 3->1, 4->1, 5->1, 6->1},
{2->2, 3->1, 4->1, 5->1},
{1->1, 5->1}
}
Edit 4: I tried coding it up. Running it 1000 times on a list of 100 vectors, with each vector's size in the range [1, 250] and its elements in the range [0, 50], and assuming the input is available for all 1000 runs, it takes around 2 minutes on my machine. There is certainly room for improvement in my code (and my machine).
My approach is to copy the vectors that pass the test to an empty vector.
May be inefficient.
May have bugs.
HTH :)
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
int main(int, char **) {
using namespace std;
using vector_of_integers = vector<int>;
using vector_of_vectors = vector<vector_of_integers>;
vector_of_vectors in = {
{ 1, 4, 2, 3 }, // unique
{ 5, 9, 2, 1, 3, 3 }, // unique
{ 3, 2, 1 }, // exists
{ 2, 4, 2 }, // exists
{ 8, 2, 2, 4 }, // unique
{ 1, 1, 1 }, // exists
{ 1, 2, 2 }, // exists
{ 5, 8, 2 }, // unique
};
vector_of_vectors out;
// doesnt_contain_vector returns true when no entry in out is a superset of the passed vector
auto doesnt_contain_vector = [&out](const vector_of_integers &in_vector) {
// is_subset returns true when out_vector contains all the integers of in_vector
auto is_subset = [&in_vector](const vector_of_integers &out_vector) {
// contained returns true when the vector contains the passed integer
auto contained = [&out_vector](int i) {
return find(out_vector.cbegin(), out_vector.cend(), i) != out_vector.cend();
};
return all_of(in_vector.cbegin(), in_vector.cend(), contained);
};
return find_if(out.cbegin(), out.cend(), is_subset) == out.cend();
};
copy_if(in.cbegin(), in.cend(), back_insert_iterator<vector_of_vectors>(out), doesnt_contain_vector);
// show results
for (auto &vi: out) {
copy(vi.cbegin(), vi.cend(), std::ostream_iterator<int>(std::cout, ", "));
cout << "\n";
}
}
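You could try something like this. I use std::sort and std::includes (std::includes requires both ranges to be sorted, which is why every nested vector is sorted first, and it takes repeated values into account). Perhaps this is not the most efficient solution.
// sort all nested vectors (vlist is a std::vector<std::vector<int>>; needs <algorithm> and <vector>)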
std::for_each(vlist.begin(), vlist.end(), [](auto& v)
{
std::sort(v.begin(), v.end());
});
// sort vector of vectors by length of items
std::sort(vlist.begin(), vlist.end(), [](const std::vector<int>& a, const std::vector<int>& b)
{
return a.size() < b.size();
});
// exclude all duplicates
auto i = std::begin(vlist);
while (i != std::end(vlist)) {
if (std::any_of(i + 1, std::end(vlist), [&](const std::vector<int>& a){
return std::includes(std::begin(a), std::end(a), std::begin(*i), std::end(*i));
}))
i = vlist.erase(i);
else
++i;
}
I have a 2D and 3D vector
using namespace std;
vector< vector<int> > vec_2d;
vector<vector<vector<int>>> vec_3d;
I know how to iterate 2D vector row-wise using two iterators. The first the iterator of the "rows" and the second the iterators of the "columns" in that "row". Now, I need to iterate over 2D vector such that the first iterator becomes the iterator of the "columns" and the second the iterator of the rows in that "column" i.e. column-wise.
Using iterators, this will be very difficult. I'd say you would probably need to implement your own random-access iterator class, declaring its iterator_category, value_type, difference_type, pointer and reference member types (inheriting from std::iterator<random_access_iterator_tag, Type> does the same thing, but std::iterator is deprecated since C++17).
If you don't actually need iterators, and really have a good reason for traversing a vector of vectors in this order (and are aware that it hurts cache locality, since consecutive accesses jump between separately allocated rows), then it can easily be done using indexes.
Here's an example using indexes which handles the tricky case where the inner vectors are not all of the same length.
#include <iostream>
#include <vector>
using namespace std;
int main()
{
vector< vector<int> > vec_2d = { {1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10} };
bool is_col_out_of_bounds = false;
for (size_t col=0; ! is_col_out_of_bounds; col++)
{
is_col_out_of_bounds = true;
for (size_t row=0; row<vec_2d.size(); row++)
{
if (col < vec_2d[row].size())
{
is_col_out_of_bounds = false;
cout << vec_2d[row][col] << endl;
}
}
}
return 0;
}
Output:
1
4
8
2
5
9
3
6
10
7
If you want to guarantee that all rows are of the same length, then vector<array<T, N>> may be a better choice.
A simple answer: who said your layout should be interpreted row-wise? A std::vector<std::vector<Foo>> has no knowledge of rows and columns, so let the outermost vector represent columns instead of rows.
This is a pain when printing to the terminal, which is likely to do it row-wise, but if column layout is preferred internally, do it that way.
A question that might appear trivial: is there a way of obtaining the count of each integer that was made unique after I transform an array containing repeated integers into an unordered_set? To be clear, I start with some array and turn it into an unordered_set; the unordered_set then only contains unique integers, and I'm after the number of times each of those integers was repeated.
Is this possible at all? (something like unordered_set.count(index) ?)
If the container is contiguous, like an array, then I believe you can use ptrdiff_t to count them after doing some iterator math. I'm not sure about non-contiguous containers, though.
Since you start with an array:
Sort the array (std::unique only removes adjacent duplicates)
Call std::unique on the array
std::unique returns an iterator to the new logical end of the unique range
Calculate the count as a ptrdiff_t with std::distance(begin, new_end); the number of removed repeats is end - new_end
Iterator (and pointer) subtraction already counts elements rather than bytes, so no adjustment for sizeof is needed.
But to paraphrase Beta, some containers lend themselves to this and others do not. If you have an unordered set (or a map or a tree), then the information will not be readily available.
Based on your answer to user2357112's question, I will write a solution.
So let's assume that instead of an unordered_set we use a vector, and our vector has values like this:
{1, 1, 1, 3, 4, 1, 1, 4, 4, 5, 5};
So we want to get (probably in a separate vector) how many times each particular value appears in the vector, right? In this specific case the result would be: 1 appears 5 times, 3 appears once, 4 appears 3 times and 5 appears 2 times.
To get this done, one possible solution can be like this:
Get unique entries from source vector and store them in different vector, so this vector will contain: 1, 3, 4, 5
Iterate through whole unique vector and count these elements from source vector.
Print result
The code from point 1, can be like this:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
template <typename Type>
vector<Type> unique_entries (vector<Type> vec) {
for (auto iter = vec.begin (); iter != vec.end (); ++iter) {
auto f = find_if (iter+1, vec.end (), [&] (const Type& val) {
return *iter == val;
});
if (f != vec.end ()) {
vec.erase (remove (iter+1, vec.end (), *iter), vec.end ());
}
}
return vec;
}
The code from point 2, can be like this:
template <typename Type>
struct Properties {
Type key;
typename vector<Type>::difference_type count; // the type std::count returns for vector iterators
};
template <typename Type>
vector<Properties<Type>> get_properties (const vector<Type>& vec) {
vector<Properties<Type>> ret {};
auto unique_vec = unique_entries (vec);
for (const auto& uv : unique_vec) {
auto c = count (vec.begin (), vec.end (), uv); // (X)
ret.push_back ({uv, c});
}
return ret;
}
Of course we don't need the Properties class to store the key and count; you could return just a vector of ints (the counts), but as I said, this is one of several possible solutions. By using unique_entries we get a vector with the unique entries :), then for each of them we count its occurrences in the whole vector vec with std::count (in get_properties, marked (X)) and push_back a Properties object to the vector ret.
The code from point 3, can be like this:
template <typename Type>
void show (const vector<Properties<Type>>& vec) {
for (const auto& v : vec) {
cout << v.key << " " << v.count << endl;
}
}
// usage below
vector<int> vec {1, 1, 1, 3, 4, 1, 1, 4, 4, 5, 5};
auto properties = get_properties (vec);
show (properties);
And the result looks like this:
1 5
3 1
4 3
5 2
It is worth noting that this example uses templates, so you can choose the element type of the vector. If you want to store long, long long, short, etc. instead of int, all you have to do is change the definition of the source vector, for example:
vector<unsigned long long> vec2 {1, 3, 2, 3, 4, 4, 4, 4, 3, 3, 2, 3, 1, 7, 2, 2, 2, 1, 6, 5};
show (get_properties (vec2));
will produce:
1 3
3 5
2 5
4 4
7 1
6 1
5 1
which is the desired result.
One more note: you can do this with a vector of strings as well.
vector<string> vec_str {"Thomas", "Rick", "Martin", "Martin", "Carol", "Thomas", "Martin", "Josh", "Jacob", "Jacob", "Rick"};
show (get_properties (vec_str));
And result is:
Thomas 2
Rick 2
Martin 3
Carol 1
Josh 1
Jacob 2
I assume you're trying to get a list of unique values AND the number of their occurrences. If that's the case, then std::map provides the cleanest and simplest solution:
//Always prefer std::vector (or at least std::array) over raw arrays if you can
std::vector<int> myInts {2,2,7,8,3,7,2,3,46,7,2,1};
std::map<int, unsigned> uniqueValues;
//Get unique values and their count
for (int val : myInts)
++uniqueValues[val];
//Output:
for (const auto & val : uniqueValues)
std::cout << val.first << " occurs " << val.second << " times." << std::endl;
In this case it doesn't have to be std::unordered_set.
How do you create a new row of values in an array from user input (cin)?
Say there's a row of values already in the array and you need to add a second row of values, not appended to the first row.
And how would you put the braces and the comma in: does the user type them, or is there something that automatically puts the braces and commas in?
int test [] = { 1, 21, 771, 410, 120711 },
{ 1, 2, 3, 4, 5 };
Without very bad and dirty tricks this is not possible. Better to use a list or a vector (a vector is the closest to an array). The other possibility is to use a pointer to dynamic memory: to extend it, allocate a new, larger block, copy the old data into it and then add the new data.
There is no way to change the size of an array while preserving its contents. A built-in array can't be resized at all; with dynamic memory (new[]) you can allocate a new, larger block, but the old contents aren't carried over automatically: you have to copy them yourself and delete[] the old block. If you want a resizable array, you should probably use std::vector.
#include <vector>
int main()
{
// initialise
std::vector<std::vector<int>> test = { { 1, 21, 771, 410, 120711 },
{ 1, 2, 3, 4, 5 } };
// add a new row (hard-coded here; it could come from user input)
test.push_back({9, 8, 7, 6, 5, 4, 3, 2, 1});
}
You're asking for a two-dimensional array. This is declared like this:
int test[][5] = {
{1, 21, 771, 410, 120711},
{1, 2, 3, 4, 5 },
// Add more if you want.
};
The first array is accessed through test[0], the second through test[1], etc. The first element of the first array is test[0][0], the second test[0][1] and so forth.
Note that this is an array with a static size. You can't change it at runtime. If you know in advance how many rows you need, you just declare it as:
int test[NUMBER_OF_ROWS][NUMBER_OF_COLUMNS]; // both must be compile-time constants
and then fill it with values later. But you cannot change the size. If you want a fully dynamic array, then you should use a vector of vectors:
std::vector< std::vector<int> > test;
You then add rows with:
test.push_back(std::vector<int>());
and add elements to each row with:
// Adds a number to the first row.
test[0].push_back(some_int);
Access happens the same way as with the static array (test[0], test[0][0], etc.)
I have a huge data set and I want to use an array to store it. In more detail:
In my array, I want 3 columns: Number, number_of_points and point_numbers. For this I can create an array like mypointstore[][] (for example mypointstore[20][3]). But my problem is that I want to store several point numbers in column 3, like 20, 1, 32, 9, 200, 12, etc. (mypointstore[0][0] = 1, mypointstore[0][1] = 6 and mypointstore[0][2] = { 20, 1, 32, 9, 200, 12 }). I don't know whether it is possible to use an array for this structure. If so, please help me solve this problem.
I tried to use a map like map<int,int,vector<int>> mypointstore; but I don't know how to insert data into this map.
Here is some of my code:
map<int,int, vector<int>> mypointstore;
int size = 20;
for (int i = 0; i <= size; i++) {
int det = 0;
for (std::size_t j = 0; j < points.size(); j++) { // points is one of my arrays of points; < keeps at() in range
if (points.at(j) > Z1 && points.at(j) <= Z2) {
// Here I want to store i, det and points.at(j) (i in the 1st column, det in the 2nd and
// the point number in the 3rd column). In each step of the loop it takes a point
// number which satisfies the if condition, so it should be included in my
// vector of map
det++;
}
}
// here I want to put the total det value into the 2nd element of my map, so it looks like (0)(6)( 20, 1, 32, 9, 200, 12)
}
The same procedure repeats for the next steps, so finally it should be:
(0)(6)( 20, 1, 32, 9, 200, 12)
(1)(10)( 20, 1, 32, 9, 200, 12, 233, 80, 12, 90)
(2)(3)( 3, 15, 32)
It sounds to me like you probably want a vector of structs, something like:
struct point_data {
int number;
std::vector<int> point_numbers;
};
std::vector<point_data> points;
I've only put in two "columns", because (at least as I understand it) your number_of_points is probably point_numbers.size().
If you're going to use the number to find the rest of the data, then your idea to use a map would make sense:
std::map<int, std::vector<int> > points;
You could use a multimap<int, int> instead of map<int, vector<int> > but I usually find the latter more understandable.