Vector iterating over itself - C++

In my project I have a vector with some relational data (a struct that holds two similar objects and represents a relationship between them), and I need to check for combinations of relationships between all the data in the vector.
What I am doing is iterating over the vector, and inside the first for loop I iterate again to look for relationships between the data.
This is a simplified model of what I am doing:
for (size_t a = 0; a < vec.size(); a++)
{
    for (size_t b = 0; b < vec.size(); b++)
    {
        if (vec[a].something == vec[b].something) { ... }
    }
}
My collection has 2800 elements which means that I will be iterating 2800*2800 times...
What kind of data structure is more suitable for this kind of operation?
Would using for_each be any faster than traversing the vector like this?
Thanks in advance!
vec holds structs that are made up of two integers, and nothing is ordered.

No, for_each still does the same thing.
Using a hash map could speed this up. Start with an empty hash table and iterate through the list. For each element, see if it's in the hash table. If it's not, add it. If it is, then you have a duplicate and you run your code.
In C++, you can use std::map (or std::unordered_map for an actual hash table). In C, there is no built-in map data structure, so you'd have to make your own.
The high-level pseudocode would look something like this:
foreach (element in array)
    if map.has_key(element)
        do_stuff(element)
    else
        map.add_key(element)
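A minimal C++ sketch of that idea, assuming the elements are structs compared on an integer field (the Item type and its something member are hypothetical stand-ins for the question's struct), using std::unordered_set as the hash:

#include <unordered_set>
#include <vector>

struct Item { int something; int other; };   // hypothetical stand-in for the question's struct

void do_stuff(const Item&) { /* handle the matching element here */ }

int main()
{
    std::vector<Item> vec = {{1, 10}, {2, 20}, {1, 30}};

    std::unordered_set<int> seen;            // keys already encountered
    for (const Item& item : vec)
    {
        if (seen.count(item.something))      // key seen before: this element matches an earlier one
            do_stuff(item);
        else
            seen.insert(item.something);
    }
}

This runs over the data once, so for 2800 elements it does about 2800 hash lookups instead of 2800*2800 comparisons.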

The easiest way to improve the efficiency of this operation would be to sort the vector and then look for duplicates. If sorting the vector isn't an option, you could create another vector of pointers to the elements of this vector and sort that. Either way you go from O(N^2) complexity to O(N log N) complexity (assuming, of course, that you use an O(N log N) sort). This does mean using more space, but trading a bit of space for a significant time improvement is often very reasonable.
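A minimal sketch of the sort-then-scan variant, under the same assumption that the comparison key is a single int field:

#include <algorithm>
#include <vector>

struct Item { int something; int other; };   // hypothetical stand-in for the question's struct

int main()
{
    std::vector<Item> vec = {{3, 0}, {1, 0}, {3, 1}, {2, 0}};

    // Sort by the key you compare on: O(N log N).
    std::sort(vec.begin(), vec.end(),
              [](const Item& a, const Item& b) { return a.something < b.something; });

    // Equal keys are now adjacent, so a single linear pass finds every match.
    for (std::size_t i = 1; i < vec.size(); ++i)
    {
        if (vec[i].something == vec[i - 1].something)
        {
            // ... handle the matching pair vec[i - 1], vec[i] ...
        }
    }
}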

Assuming your vector contains a "relation" structure like:
class Entity;
struct Relation {
    Entity* something;
    Entity* relative;
};
and you have a vector of "relations":
std::vector<Relation> ties;
So if I understood it correctly, you want to segment ties and have a list of Relations for each Entity. This may be represented by a map:
std::map<Entity*,std::vector<Relation*>> entityTiesIndex;
Then you could just scan once through all ties and collect the relations for each entity:
for (std::size_t i = 0; i < ties.size(); ++i) {
    Relation* relation = &ties[i];
    entityTiesIndex[relation->something].push_back(relation);
}
Note the usual caveat about pointers and references to container elements: they may be invalidated when the container is altered.
Hope this makes sense.

Related

C++ unordered_map or unordered_set: What to use if I wish to keep an "isVisited" data structure

I want to keep a data structure for storing all the elements that I have seen till now. Considering that keeping an array for this is out of the question, as elements can be of the order of 10^9, what data structure should I use to achieve this: unordered_map or unordered_set in C++?
Maximum elements that will be visited in worst case : 10^5
-10^9 <= element <= 10^9
As #MikeCAT said in the comments, a map would only make sense if you wanted to store additional information about the element or the visitation. But if you wanted only to store the truth value of whether the element has been visited or not, the map would look something like this:
// if your elements were strings
std::unordered_map<std::string, bool> isVisited;
and then this would just be a waste of space. Storing the truth value is redundant, if the mere presence of the string within the map already indicates that it has been visited. Let's see a comparison:
std::unordered_map<std::string, bool> isVisitedMap;
std::unordered_set<std::string> isVisitedSet;
// Visit some places
isVisitedMap["madrid"] = true;
isVisitedMap["london"] = true;
isVisitedSet.insert("madrid");
isVisitedSet.insert("london");
// Maybe the information expires so you want to remove them
isVisitedMap["london"] = false;
isVisitedSet.erase("london");
Now the elements stored in each structure will be:
For the map:
{{"london", false}, {"madrid", true}} <--- 4 elements
{"madrid"} <--- 1 element. Much better
In a project in which I had a binary tree converted to a binary DAG for optimization purposes (GRAPHGEN) I passed the exploration function a map from node pointers to bool:
std::map<BinaryDrag<conact>::node*, bool> &visited_fl
The map kept track of the pointers in order not to go through the same nodes again when doing multiple passes.
You could use a std::unordered_map<Value, bool>.
I want to keep a data structure for storing all the elements that I have seen till now.
A way to re-phrase that is to say "I want a data structure to store the set of all elements that I've seen till now". The clue is in the name. Without more information, std::unordered_set seems like a reasonable choice to represent a set.
That said, in practice it depends on details like what you're planning to do with this set. An array can be a good choice as well (yes, even for billions of elements; a sketch follows below), other set implementations may be better, and maps can be useful in some use cases.
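To illustrate the array option for the stated range (-10^9 to 10^9), here is a sketch using a bit vector; it needs roughly 250 MB up front regardless of how few elements you visit, so whether it beats an unordered_set depends on your memory budget:

#include <cstdint>
#include <vector>

int main()
{
    const std::int64_t kOffset = 1000000000LL;            // shifts [-10^9, 10^9] to [0, 2*10^9]
    std::vector<bool> visited(2 * kOffset + 1, false);    // ~250 MB of bits

    std::int64_t element = -999999999LL;
    visited[element + kOffset] = true;                    // mark as visited

    bool seen = visited[element + kOffset];               // check later in O(1)
    (void)seen;
}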

C++ efficient way to store and update sorted items

I have an operation that continuously generates random solutions (std::vector<float>). I evaluate each solution against a mathematical function to see its usefulness (a float). I would like to keep the top 10 solutions at all times. What would be the most efficient way to do this in C++?
I need to store both the solutions (std::vector) and their usefulness (float). I am performing several hundred thousand evaluations, so I need an efficient solution.
Edit:
I am aware of sorting methods. I am looking for methods other than sorting and storing the values. Looking for better data structures if any.
You evaluate the float score() function for the current std::vector<T> solution and store the two together in a std::pair<std::vector<T>, float>.
Use a std::priority_queue<std::pair<std::vector<T>, float>> to store the 10 best solutions based on their score, along with the score itself. std::priority_queue is a heap, so it lets you extract its extreme element according to a comparison function that you set up to compare the scores (make it a min-heap on the score, so the top is the worst of the kept solutions).
Store the first 10 pairs; then, for each new one, compare it with the top of the heap: if score(new) > score(10th), pop() the old 10th element off the priority_queue p and push(new).
Keep doing this inside a loop until you run out of vector<T> solutions.
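A sketch of that heap idea; score() here is a hypothetical stand-in for the evaluation function you already have, and the queue is a min-heap on the score so that top() is always the worst of the 10 kept solutions:

#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Solution = std::vector<float>;
using Scored   = std::pair<float, Solution>;      // score first, so pair comparison uses it

// Hypothetical evaluation function; substitute your own.
float score(const Solution& s) { return s.empty() ? 0.0f : s.front(); }

void keep_top_10(const std::vector<Solution>& candidates)
{
    // std::greater turns the priority_queue into a min-heap: top() is the worst kept score.
    std::priority_queue<Scored, std::vector<Scored>, std::greater<Scored>> best;

    for (const Solution& s : candidates)
    {
        float sc = score(s);
        if (best.size() < 10)
        {
            best.push({sc, s});
        }
        else if (sc > best.top().first)           // better than the current 10th best
        {
            best.pop();                           // drop the old 10th
            best.push({sc, s});
        }
    }
    // best now holds the 10 highest-scoring solutions seen so far.
}

Each candidate costs at most one O(log 10) heap operation, which is effectively constant per evaluation.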
Have a vector of pairs, where each pair holds the solution as one element and its usefulness as the other. Then write a custom comparator to compare elements in the vector.
Add the new element at the end, sort the vector, and remove the last element.
As #user4581301 mentioned in the comments, for 10 elements you don't need a full sort. Just traverse the vector every time, or perform an ordered insert into the vector (see the sketch after the links below).
Here are some links to help you:
https://www.geeksforgeeks.org/sorting-vector-of-pairs-in-c-set-1-sort-by-first-and-second/
Comparator for vector<pair<int,int>>
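For completeness, a sketch of the ordered-insert variant mentioned above: the vector stays sorted by score in descending order and is capped at 10 entries, so no full sort is ever needed. The names here are illustrative, not from the question:

#include <algorithm>
#include <utility>
#include <vector>

using Solution = std::vector<float>;
using Scored   = std::pair<float, Solution>;   // (score, solution)

// Keeps `best` sorted by descending score and capped at 10 entries.
void insert_if_good(std::vector<Scored>& best, float sc, const Solution& s)
{
    auto pos = std::lower_bound(best.begin(), best.end(), sc,
                                [](const Scored& entry, float value)
                                { return entry.first > value; });   // descending order
    best.insert(pos, {sc, s});
    if (best.size() > 10)
        best.pop_back();                                            // drop the new worst
}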

How to make a fast search for an object with a particular value in a vector of structs or classes? C++

If I have thousands of struct or class objects in a vector, how do I find the ones I need quickly?
For example:
Making a game, and I need the fastest way of doing collision detection. Each tile is a struct; there are many tiles in the vector map, each with values x and y.
So basically I do:
for (i = 0; i < vec.size(); i++)
{
    // searching if x == 100 and y == 200
}
So maybe there is a different way, like smart pointers or something, to search for particular objects faster?
You should sort your vector and then use standard library algorithms like binary_search, lower_bound, or upper_bound.
This gives you better complexity than the O(n) of walking through the entire vector or of using the standard library algorithm find.
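A sketch of the sort-plus-lower_bound idea, assuming a tile holds plain x and y members as described in the question:

#include <algorithm>
#include <vector>

struct Tile { int x; int y; };                   // hypothetical tile layout

// Order tiles by (x, y) so equal positions are adjacent and binary-searchable.
bool tile_less(const Tile& a, const Tile& b)
{
    return a.x != b.x ? a.x < b.x : a.y < b.y;
}

const Tile* find_tile(const std::vector<Tile>& tiles, int x, int y)
{
    Tile key{x, y};
    auto it = std::lower_bound(tiles.begin(), tiles.end(), key, tile_less);
    if (it != tiles.end() && it->x == x && it->y == y)
        return &*it;
    return nullptr;                              // not found
}

int main()
{
    std::vector<Tile> tiles = {{100, 200}, {5, 7}, {42, 1}};
    std::sort(tiles.begin(), tiles.end(), tile_less);   // sort once, or keep the vector sorted
    const Tile* hit = find_tile(tiles, 100, 200);       // O(log n) lookup
    (void)hit;
}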
I think you have to go more in depth than a simple search for a value inside a group of structs, all the more so if you are planning on searching among a large number of them.
How are the structs generated, how are they collected, and how do you keep track of them? Is there a common key that you can use to order them while you create them?
You should focus on sorting them as you add them to the whole structure; that way you avoid a massive burst of computation every time you have to perform a search. Choose a good algorithm (for example, a self-balancing tree such as an AVL tree); that way you can have O(log(n)) adding/deleting/searching.
A vector is just an unordered collection of objects. There is not really any way to do what you are asking unless you start sorting your vector in specific ways (e.g. if it is sorted, you can jump to the middle of the vector and potentially cut your search time in half).
You may be better off picking a different data structure (either instead of the vector or in combination with it); a sketch of one such combination follows the example below.
For example:
std::for_each(v.begin(), v.end(), [](int e)
{
    if (e % 2 == 1)   // vector elements that are not evenly divisible by 2
        std::cout << e << std::endl;
});
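As an illustration of combining the vector with a second data structure (as suggested above), here is a sketch that indexes tile positions in an unordered_map for average O(1) lookup; packing (x, y) into one 64-bit key is just one possible choice, not something from the question:

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Tile { int x; int y; };                              // hypothetical tile layout

// Pack (x, y) into a single 64-bit key; assumes coordinates fit in 32 bits.
std::uint64_t key_of(int x, int y)
{
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(x)) << 32)
         | static_cast<std::uint32_t>(y);
}

int main()
{
    std::vector<Tile> tiles = {{100, 200}, {5, 7}, {42, 1}};

    // Build the index once; update it whenever tiles are added, removed, or moved.
    std::unordered_map<std::uint64_t, std::size_t> index;  // key -> position in the vector
    for (std::size_t i = 0; i < tiles.size(); ++i)
        index[key_of(tiles[i].x, tiles[i].y)] = i;

    auto it = index.find(key_of(100, 200));                 // average O(1)
    const Tile* hit = (it != index.end()) ? &tiles[it->second] : nullptr;
    (void)hit;
}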

How to associate to a number another number without using array

Let's say we have read these values:
3
1241
124515
5322353
341
43262267234
1241
1241
3213131
And I have an array like this (with the elements above):
a[0]=1241
a[1]=124515
a[2]=43262267234
a[3]=3
...
The thing is that the elements' order in the array is not constant (I have to change it somewhere else in my program).
How can I know at which position an element appears in the data that was read?
Note that I cannot do:
vector <int> a[1000000000000];
a[number].push_back(all_positions);
Because a will be too large (there's a memory restriction). (let's say I have only 3000 elements, but they're values are from 0 to 2^32)
So, in the example above, I would want to know all the positions 1241 is appearing on without iterating again through all the read elements.
In other words, how can I associate with the number "1241" the positions "1, 6, 7", so I can simply access them in O(1) (where the constant is really the number of positions at which the element appears)?
If O(1) isn't possible, I want to know what the optimal complexity is ...
I don't know if I've made myself clear. If not, just say it and I'll update my question :)
You need to use some sort of dynamic array, like a vector (std::vector) or other similar containers (std::list, maybe, it depends on your needs).
Such data structures are safer and easier to use than C-style array, since they take care of memory management.
If you also need to look for an element in O(1) you should consider using some structures that will associate both an index to an item and an item to an index. I don't think STL provides any, but boost should have something like that.
If O(log n) is a cost you can afford, also consider std::map
You can use what is commonly referred to as a multimap. That is, it stores a key along with multiple values. Lookup is O(log n).
If you're working with Visual Studio, it provides its own hash_multimap; otherwise, may I suggest using boost::unordered_map with a list as your value?
You don't need a sparse array of 1000000000000 elements; use an std::map to map positions to values.
If you want bi-directional lookup (that is, you sometimes want "what are the indexes for this value?" and sometimes "what value is at this index?") then you can use a boost::bimap.
Things get further complicated as you have values appearing more than once. You can sacrifice the bi-directional lookup and use a std::multimap.
You could use a map for that. Like:
std::map<int, std::vector<int>> MyMap;
So every time you encounter a value while reading the file, you append its position to the map. Say X is the value you read and Y is the position; then you just do
MyMap[X].push_back( Y );
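Put together, a minimal sketch of that approach; the hard-coded values stand in for whatever you read from the file, and long long is used here only because the sample data contains 43262267234:

#include <iostream>
#include <map>
#include <vector>

int main()
{
    // Stand-in for the values read from the file, in reading order.
    std::vector<long long> values =
        {3, 1241, 124515, 5322353, 341, 43262267234LL, 1241, 1241, 3213131};

    std::map<long long, std::vector<int>> positions;   // value -> every position it appears at
    for (int i = 0; i < static_cast<int>(values.size()); ++i)
        positions[values[i]].push_back(i);

    for (int p : positions[1241])                       // prints 1 6 7
        std::cout << p << ' ';
    std::cout << '\n';
}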
Instead of your array, use
std::map<int, vector<int> > a;
You need an associative collection, but one where a key can be associated with multiple values.
You can use std::multimap< int, int >
or
you can use std::map< int, std::set< int > >
I have found in practice that the latter is easier for removing items if you just need to remove one element. It is unique on key-value combinations, but not on the key or value alone.
If you need higher performance, you may wish to use a hash map instead of map. For the inner collection, though, you will not gain much from using a hash, as you will have very few duplicates, so it is better to use std::set.
There are many implementations of hash_map, and one is in the new standard (std::unordered_map). If you don't have the new standard, go for Boost.
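A short sketch of the map-of-sets variant and the single-element removal described above:

#include <map>
#include <set>

int main()
{
    std::map<int, std::set<int>> positions;   // value -> set of positions it appears at

    positions[1241].insert(1);
    positions[1241].insert(6);
    positions[1241].insert(7);

    // Remove just one (value, position) pair; the other positions stay.
    positions[1241].erase(6);

    // Optionally drop the key entirely once its set is empty.
    if (positions[1241].empty())
        positions.erase(1241);
}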
It seems you need a std::map<int,int>. You can store the mapping, such as 1241->0, 124515->1, etc., then perform a lookup on this map to get the array index.
Besides the std::map solution offered by others here (O(log n)), there's the approach of a hash map (implemented as boost::unordered_map or std::unordered_map in C++0x, supported by modern compilers).
It would give you O(1) lookup on average, which often is faster than a tree-based std::map. Try for yourself.
You can use a std::multimap to store both a key (e.g. 1241) and multiple values (e.g. 1, 6 and 7).
An insert has logarithmic complexity, but you can speed it up if you give the insert method a hint where it can insert the item.
For O(1) lookup you could hash the number to find its entry (key) in a hash map (boost::unordered_map, dictionary, stdex::hash_map etc)
The value could be a vector of the indices where the number occurs, or a 3000-bit array (375 bytes) in which the bit is set for each index at which the number (key) occurs.
boost::unordered_map<unsigned long, std::vector<unsigned long>> myMap;
for (unsigned long i = 0; i < sizeof(a) / sizeof(*a); ++i)
{
    myMap[a[i]].push_back(i);
}
Instead of storing an array of integer, you could store an array of structure containing the integer value and all its positions in an array or vector.

Is this usage of unordered map efficient/right way?

I want to learn about mapping functions in C/C++ in general, so this is a basic program on unordered mapping. I use unordered mapping because my input data are not sorted, and I read that unordered_map is very efficient. Here I have an array with which I create the hash table, and I use the lookup function to find whether the elements of another array are in the hash table or not. I have several questions regarding this implementation:
#include <stdio.h>
#include <unordered_map>
using namespace std;

typedef std::unordered_map<int, int> Mymap;

int main()
{
    int x, z, l = 0;
    int samplearray[5] = {0, 6, 4, 3, 8};
    int testarray[10] = {6, 3, 8, 67, 78, 54, 64, 74, 22, 77};
    Mymap c1;

    for (x = 0; x < (int)(sizeof(samplearray) / sizeof(int)); x++)
        c1.insert(Mymap::value_type(samplearray[x], x));

    for (z = 0; z < (int)(sizeof(testarray) / sizeof(int)); z++)
        if (c1.find(testarray[z]) != c1.end())
            l++;

    printf("The number of elements equal are : %d\n", l);
    printf("the size of samplearray and testarray are : %d\t%d\n",
           (int)(sizeof(samplearray) / sizeof(int)),
           (int)(sizeof(testarray) / sizeof(int)));
}
First of all, is this the right way to implement it? I'm getting the right answers, but it seems that I use too many for loops.
This seems fairly okay with very small data, but if I'm dealing with files of size > 500 MB, it seems that the hash table I build for a 500 MB file will itself be about twice that size, i.e. 1000 MB. Is this always the case?
What is the difference between std::unordered_map and boost::unordered_map?
Finally, a small request. I'm new to C/C++ so if you are giving suggestions like using some other typedef/libraries, I'd highly appreciate if you could use a small example or implement it on my code. Thanks
You're starting off on the wrong foot. A map (ordered or otherwise) is intended to store a key along with some associated data. In your case, you're only storing a number (twice, as both the key and the data). For this situation, you want a set (again, ordered or otherwise) instead of a map.
I'd also avoid at least the first for loop, and use std::copy instead:
// There are better ways to do this, but it'll work for now:
#define end(array) ((array) + (sizeof(array) / sizeof(array[0])))
std::copy(samplearray,
          end(samplearray),
          std::inserter(Myset, Myset.end()));
If you only need to count how many items are common between the two sets, your for loop is fairly reasonable. If you need/want to actually know what items are common between them, you might want to consider using std::set_intersection:
std::set<int> myset, test_set, common;
std::copy(samplearray, end(samplearray), std::inserter(myset, myset.end()));
std::copy(testarray, end(testarray), std::inserter(test_set, test_set.end()));
std::set_intersection(myset.begin(), myset.end(),
                      test_set.begin(), test_set.end(),
                      std::inserter(common, common.end()));
// show the common elements (including a count):
std::cout << common.size() << " common elements:\t";
std::copy(common.begin(), common.end(),
          std::ostream_iterator<int>(std::cout, "\t"));
Note that you don't need to have an actual set to use set_intersection -- all you need is a sorted collection of items, so if you preferred to you could just sort your two arrays, then use set_intersection on them directly. Likewise, the result could go in some other collection (e.g., a vector) if you prefer.
As mentioned by Jerry, you could use a for loop for the search if you only need to know the number of matches. If that is the case, I would recommend using an unordered_set, since you don't need the elements to be sorted.
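A sketch of that suggestion, using std::unordered_set and the same arrays as the question to count the matches:

#include <stdio.h>
#include <unordered_set>

int main()
{
    int samplearray[5] = {0, 6, 4, 3, 8};
    int testarray[10]  = {6, 3, 8, 67, 78, 54, 64, 74, 22, 77};

    // Build the set from the sample values; no (key, value) pairs needed.
    std::unordered_set<int> sample(samplearray,
                                   samplearray + sizeof(samplearray) / sizeof(*samplearray));

    int common = 0;
    for (int value : testarray)
        if (sample.count(value))
            ++common;

    printf("The number of elements equal are : %d\n", common);
}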