I have a variable that looks like this:
map< string, vector<double> > a_data;
Long story short, a_data can only be filled by node 0, so it has to be broadcast to the other nodes with MPI_Bcast().
As we know, MPI_Bcast() only works with primitive data types. So how can I broadcast an STL type like map using MPI_Bcast()?
One approach you can take is to:
first broadcast the number of keys to every process, so that every process knows how many keys it will have to handle;
broadcast an array encoding the size of each of those keys;
broadcast another array encoding the size of each array of values;
loop over the keys;
broadcast the key string first (as an array of chars);
then broadcast the values as an array of doubles.
In pseudo-code it would look like:
// number_of_keys <- get number of keys from a_data;
// MPI_Bcast() number_of_keys;
// int key_sizes[number_of_keys];
// int value_sizes[number_of_keys];
//
// if(node == 0){ // the root process
// for every key in a_data do
// key_sizes[i] = the size of the key;
// value_sizes[i] = size of the vector of values associated to key
// }
//
// MPI_Bcast() the array key_sizes
// MPI_Bcast() the array value_sizes
//
// for(int i = 0; i < number_of_keys; i++){
// key <- get key in position i from a_data
// values <- get the values associated with the key
//
// MPI_Bcast() the key and use the size stored on key_sizes[i]
// MPI_Bcast() the values and use the size stored on value_sizes[i]
//
// // Non root processes
// if(node != 0){
// add key to the a_data of the process
// add the values to the corresponded key
// }
// }
You just need to translate this into C++ (of which I am not an expert, so you may have to adapt it a bit), but the big picture is there. Once the approach works, you can optimize further by reducing the number of broadcasts needed. That can be done by packing more information per broadcast. For instance, you can broadcast first the number of items, then the sizes of the keys and values, and finally the keys and the values together. For the latter you would need to create a custom MPI datatype, similar to the example showcased here.
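The packing/unpacking side of this scheme can be sketched in plain C++ (the MPI calls are left out; each flat buffer below would be sent with one MPI_Bcast() after broadcasting its length, and the struct/function names are just illustrative):

```cpp
#include <map>
#include <string>
#include <vector>

// Flatten the map into contiguous buffers, each broadcastable in one call.
struct FlatMap {
    std::vector<int> key_sizes;    // length of each key string
    std::vector<int> value_sizes;  // length of each value vector
    std::vector<char> keys;        // all keys concatenated
    std::vector<double> values;    // all values concatenated
};

FlatMap pack(const std::map<std::string, std::vector<double>>& m) {
    FlatMap f;
    for (const auto& kv : m) {
        f.key_sizes.push_back((int)kv.first.size());
        f.value_sizes.push_back((int)kv.second.size());
        f.keys.insert(f.keys.end(), kv.first.begin(), kv.first.end());
        f.values.insert(f.values.end(), kv.second.begin(), kv.second.end());
    }
    return f;
}

std::map<std::string, std::vector<double>> unpack(const FlatMap& f) {
    std::map<std::string, std::vector<double>> m;
    std::size_t kpos = 0, vpos = 0;
    for (std::size_t i = 0; i < f.key_sizes.size(); i++) {
        std::string key(f.keys.begin() + kpos,
                        f.keys.begin() + kpos + f.key_sizes[i]);
        std::vector<double> vals(f.values.begin() + vpos,
                                 f.values.begin() + vpos + f.value_sizes[i]);
        kpos += f.key_sizes[i];
        vpos += f.value_sizes[i];
        m[key] = std::move(vals);
    }
    return m;
}
```

On the root you would pack(), broadcast number_of_keys, the two size arrays, then keys and values; the other ranks allocate the buffers from the received sizes and call unpack().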
I'm having trouble implementing a C++ function and would like your help. Let me try to explain the goal of my function. I have two vectors, vector<int> keys and vector<list<int>> values, as private variables of my class. First I am trying to fill the vector keys with the key values. Next I am trying to fill the vector values with a list of values for each particular key: the index that holds a key in my keys vector holds, at the same index in my values vector, a list of the int values associated with that key.
For Example
insert(2,4)
insert(2,6)
Now the first index of my keys vector will contain 2, and the first index of my values vector will hold a list containing 4 and 6. Here is what I have come up with, but I keep getting a segmentation fault. Can you help me achieve this, especially how to store the values? Here is my code for the insert function.
class A1
{
public:
void insert(int key, int value)
{
for (unsigned int i=0;i<keys.size();i++)
{
if (keys.at(i)==key)
{
return;
}
}
keys.push_back(key);
/* for (unsigned int i=0;i<keys.size();i++)
{
if(keys[i]==key)
{
values[i].push_back(value);
}
}*/
}
private:
vector<int>keys;
vector<list<int>> values;
};
The code that I have commented out using /* */ is the part that is giving me the error.
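For what it's worth, a likely cause of the segfault is that values is still empty when values[i].push_back(value) runs, because only keys was ever grown. A minimal sketch of an insert that keeps the two vectors in sync (the valuesFor accessor is hypothetical, added only for demonstration):

```cpp
#include <list>
#include <stdexcept>
#include <vector>
using namespace std;

class A1
{
public:
    void insert(int key, int value)
    {
        // If the key already exists, append the value to its list.
        for (unsigned int i = 0; i < keys.size(); i++)
        {
            if (keys.at(i) == key)
            {
                values.at(i).push_back(value);
                return;
            }
        }
        // New key: grow both vectors together so they stay the same size.
        keys.push_back(key);
        values.push_back(list<int>{value});
    }

    // Hypothetical accessor, only here to demonstrate the layout.
    const list<int>& valuesFor(int key) const
    {
        for (unsigned int i = 0; i < keys.size(); i++)
            if (keys[i] == key)
                return values[i];
        throw out_of_range("key not found");
    }

private:
    vector<int> keys;
    vector<list<int>> values;
};
```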
I have a program where I use records of the form:
// declaring a struct for each record
struct record
{
int number; // number of record
vector<int> content; // content of record
};
Within main I then declare each record:
record batch_1; // stores integers from 1 - 64
record batch_2; // stores integers from 65 - 128
Where each batch stores 64 integers from a list of numbers (in this instance from a list of 128 total numbers). I would like to make this program open ended, such that the program is capable of handling any list size (with the constraint of it being a multiple of 64). Therefore, if the list size was 256 I would need four records (batch_1 - batch_4). I am not sure how I can create N-many records, but I am looking for something like this (which is clearly not the solution):
//creating the batch records
for (int i = 1; i <= (list_size / 64); i++)
{
record batch_[i]; // each batch stores 64 integers
}
How can this be done, and will the scope of something declared within the for loop extend beyond the loop itself? I imagine an array would satisfy the scope requirement, but I am not sure how to implement it.
As many suggested in the comments, why not use the resizable container provided by the C++ Standard Library, std::vector?
So, instead of having this:
record batch_1; // stores integers from 1 - 64
record batch_2; // stores integers from 65 - 128
.
.
record batch_n // Stores integers x - y
Replace with:
std::vector<record> batches;
// And to create the batch records
for (int i = 1; i <= (list_size / 64); i++) {
record r;
r.number = i;
r.content = ....;
batches.push_back(r);
// You could also declare a constructor for your record struct to facilitate instantiating it.
}
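To make the constructor suggestion concrete, here is one hypothetical sketch, assuming batch i should hold the 64 integers from (i-1)*64 + 1 up to i*64:

```cpp
#include <vector>

struct record {
    int number;                // number of record
    std::vector<int> content;  // content of record

    // Constructor: batch i holds the 64 integers (i-1)*64 + 1 .. i*64.
    record(int i) : number(i) {
        for (int v = (i - 1) * 64 + 1; v <= i * 64; v++)
            content.push_back(v);
    }
};

// Build one record per 64-integer batch, for any multiple-of-64 list size.
std::vector<record> make_batches(int list_size) {
    std::vector<record> batches;
    for (int i = 1; i <= list_size / 64; i++)
        batches.push_back(record(i));
    return batches;
}
```

Because the records live inside the vector, they stay in scope as long as the vector does, which answers the scope concern about loop-local declarations.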
Why don't you try this:
// code
vector<record> v(list_size / 64);
// additional code goes here
Now you can access your data as follows:
(v[i].content).at(j);
For those not familiar with the disjoint-set data structure:
https://en.wikipedia.org/wiki/Disjoint-set_data_structure
I'm trying to find the number of groups of friends from the given sets of friends and their relationships. Of course, this could easily be implemented using BFS/DFS, but I chose to use a disjoint set: I also want to find the friend group a person belongs to, etc., and a disjoint set certainly sounds appropriate for that case.
I have implemented the disjoint-set data structure; now I need to find the number of disjoint sets it contains (which will give me the number of groups).
I'm stuck on how to find the number of disjoint sets efficiently, as the number of friends can be as large as 100,000.
Options that I think should work:
Attach the new set at the back of the original, and destroy the old set.
Change their parents of each element at every union.
But since the number of friends is huge, I'm not sure that either is the correct approach. Is there a more efficient way, or should I go ahead and implement one of the above?
Here is my code for additional details. (I have not implemented counting the disjoint sets here.)
//disjoint set concept
//https://www.topcoder.com/community/data-science/data-science-tutorials/disjoint-set-data-structures/
// initially all the vertices are taken as singleton sets, and each is its own representative.
// next we compare two vertices; if they have the same parent (representative of the set), we leave them.
// if they don't, we merge them into one set.
// finally we get different disjoint sets.
#includes ...
using namespace std;
#define edge pair<int, int>
const int max = 1000000;
vector<pair<int, edge > > graph, mst;
int N, M, total;
int parent[max];
int findset(int x, int* parent){
//find the set representative.
if(x != parent[x]){
parent[x] = findset(parent[x], parent);
}
return parent[x];
}
void disjoints(){
for(int i=0; i<M; i++){
int pu = findset(graph[i].second.first, parent);
int pv = findset(graph[i].second.second, parent);
if(pu != pv){ //if not in the same set.
mst.push_back(graph[i]);
total += graph[i].first;
parent[pu] = parent[pv]; // create the link between these two sets
}
}
}
void noOfDisjoints(){
//returns the No. of disjoint set.
}
void reset(){
for(int i=0; i<N; i++){
parent[i] = i;
}
}
int main() {
cin>>N>>M; // No. of friends and M edges
int u,v,w; // u= source, v= destination, w= weight(of no use here).
reset();
for(int i =0; i<M ;i++){
cin>>u>>v>>w;
graph.push_back(pair<int, edge>(w,edge(u,v)));
}
disjoints();
print();
return 0;
}
Each union operation on two items a, b in a disjoint-set data structure has two possible scenarios:
You tried to unite items from the same set. In this case, nothing is done, and the number of disjoint sets remains the same.
You united items from two different sets, so you merged two sets into one, effectively decreasing the number of disjoint sets by exactly one.
From this, we can conclude that it is easy to find the number of disjoint sets at any moment by tracking the number of unions of type (2) above.
If we denote this number by succ_unions, then the total number of sets at each point is number_of_initial_sets - succ_unions.
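The idea above can be sketched as a minimal union-find that carries the counter along (names are illustrative):

```cpp
#include <numeric>
#include <vector>

// Minimal union-find that tracks the number of disjoint sets,
// decrementing the counter only on successful unions (scenario 2).
struct DisjointSet {
    std::vector<int> parent;
    int count;  // number_of_initial_sets - succ_unions

    DisjointSet(int n) : parent(n), count(n) {
        std::iota(parent.begin(), parent.end(), 0);  // each element is its own set
    }

    int findset(int x) {
        if (x != parent[x])
            parent[x] = findset(parent[x]);  // path compression
        return parent[x];
    }

    void unite(int u, int v) {
        int pu = findset(u), pv = findset(v);
        if (pu != pv) {
            parent[pu] = pv;
            count--;  // one successful union -> one fewer set
        }
    }
};
```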
If all you need to know is the number of disjoint sets and not what they are, one option would be to add in a counter variable to your data structure counting how many disjoint sets there are. Initially, there are n of them, one per individual element. Every time you perform a union operation, if the two elements don't have the same representative, then you know you're merging two disjoint sets into one, so you can decrement the counter. That would look something like this:
if (pu != pv){ //if not in the same set.
numDisjointSets--; // <--- Add this line
mst.push_back(graph[i]);
total += graph[i].first;
parent[pu] = parent[pv]; // create the link between these two sets
}
Hope this helps!
I'm currently working on a graph representation using a vector of vectors. I am attempting to insert a vector of edges at a specific location within adjacencies, which is defined as adjacencies = new std::vector< std::vector<Edge*>* >;
I am running into an issue with the vector not inserting at the specified .stateId location. It is quite possible the logic isn't what I intend it to be. Do I need to be resizing the vector? From the documentation, I would assume the vector resizes automatically when inserting at a location not currently in the vector. I appreciate the clarification.
Here is my method:
/*
* Connecting Edge vertF -----> vertT via weigh
* adjacencies[v][e]
*/
void GraphTable::InsertEdgeByWeight(Vertex* vertF,Vertex* vertT, char weigh){
Edge* tempEdge = new Edge(vertT,weigh);
/*
* Need to figure out how to properly allocate the space in adjacencies.size()
* Test 4 works with initial ID 0 but not test 5 with ID 4
*/
std::vector<Edge*>* temp_vec = new vector<Edge*>;
temp_vec->push_back(tempEdge);
/* if the vector at that location doesn't exist, we will push a new vector of edges;
 * otherwise we will need to push the edge into the current vector
 */
if(adjacencies->size()<vertF->thisState.stateId){
adjacencies->resize(vertF->thisState.stateId);
adjacencies[vertF->thisState.stateId].push_back(temp_vec);
}else{
adjacencies[vertF->thisState.stateId].push_back(temp_vec);
}
cout<< adjacencies->capacity() << endl;
//cout<< adjacencies->max_size() << endl;
}
You are resizing adjacencies to vertF->thisState.stateId and then accessing adjacencies[vertF->thisState.stateId].
If the size of a vector/array is x, then the highest valid index is x-1.
So you should write this instead -:
adjacencies[vertF->thisState.stateId-1].push_back(temp_vec);
Edit: As Ankit Garg pointed out in the comments, you should probably push tempEdge directly into adjacencies instead of creating a temporary vector.
Expanding on my comment, I think you need to do something like this:
if(adjacencies->size() < vertF->thisState.stateId)
{
// the vector does not exist, hence push_back the new vector
.....
}
else
{
// vector already exists, so just push_back the edge
adjacencies[vertF->thisState.stateId].push_back(temp_edge);
}
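One more thing worth noting: since adjacencies is a pointer to a vector, indexing it as adjacencies[i] is pointer arithmetic, not element access; it has to be (*adjacencies)[i]. A minimal self-contained sketch (with a simplified Edge and a plain index in place of thisState.stateId, both assumptions), resizing so that index stateId itself is valid:

```cpp
#include <cstddef>
#include <vector>

struct Edge { int to; char weight; };  // simplified stand-in for the real type

// Grow the outer vector so that index stateId is valid, then append the edge
// to the inner vector at that position. Note the dereference: adjacencies is
// a pointer, so element access must be (*adjacencies)[i], not adjacencies[i].
void insertEdge(std::vector<std::vector<Edge*>*>* adjacencies,
                std::size_t stateId, Edge* e) {
    if (adjacencies->size() <= stateId)
        adjacencies->resize(stateId + 1, nullptr);  // new slots start empty
    if ((*adjacencies)[stateId] == nullptr)
        (*adjacencies)[stateId] = new std::vector<Edge*>;
    (*adjacencies)[stateId]->push_back(e);
}
```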
We have an application that runs on a low-power processor and needs to respond quickly to incoming data. The data comes in with an associated key. Each key ranges from 0 to 0xFE (max 0xFF). The data itself ranges from 1 kB to 4 kB in size. The system processes data like this:
data in
key lookup -> insert key & data if not found
buffer data in existing key
After some event, a key and its associated data are destroyed.
We are trialing a couple of solutions:
A pre-allocated std::vector<std::pair<int, unsigned char *>> that looks up a key based on the index position.
std::map<int, unsigned char *>
Red-Black tree
A std::vector<...> kept sorted, with binary search on the keys
Are there any other algorithms that could be fast at insert-search-delete?
Thanks.
std::map uses a balanced tree (like red-black tree) itself, so there is no point in re-implementing it.
A sorted std::vector with binary search has the same lookup performance as a balanced binary tree. The difference is that inserting a key into the middle of the vector is costly.
Since your keys have a very limited range, your best choice is similar to your first suggestion:
std::vector<unsigned char *> data(0xFF); // no need to have a pair
This way, a simple check of data[key] == NULL tells you whether data for this key exists. If it were me, I would make it even simpler:
unsigned char *data[0xFF];
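A sketch of insert / lookup / delete on that fixed-size table (ownership and buffer sizes here are simplified assumptions):

```cpp
#include <cstddef>

const int kMaxKeys = 0xFF;                   // keys range over 0 .. 0xFE
unsigned char *table[kMaxKeys] = {nullptr};  // one slot per possible key

// Insert: O(1) -- take ownership of the buffer for this key.
void insert(int key, unsigned char *data) {
    table[key] = data;
}

// Lookup: O(1) -- nullptr means "no data for this key".
unsigned char *lookup(int key) {
    return table[key];
}

// Destroy: O(1) -- free the buffer and clear the slot.
void destroy(int key) {
    delete[] table[key];
    table[key] = nullptr;
}
```

All three operations are constant time, which is hard to beat for a key space this small.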
If the key is in the range [0, 0xFF), then you could use this:
std::vector<std::string> lut(0xFF); //lookup table
//insert
lut[key] = data; //insert data at position 'key'
//delete
lut[key].clear(); //clear - it means data empty
//search
if(lut[key].empty() ) //empty string means no key, no data!
{
//key not found
}
else
{
std::string & data = lut[key]; //found
}
Note that I used an empty string to indicate that the data doesn't exist.