Hazelcast: make sure related objects are physically stored on same member - mapreduce

I have 3 distributed maps whose objects share one property: an identifier. This identifier is used as the key of one map, while the 2 other maps use cluster-wide global ids as keys. There's also a Map-Reduce job that combines the objects related by this identifier and stores the result in another map. The idea is to minimize network traffic between cluster members, so that the job communicates only with the member on which it is executing.
The question is: do I need to take any extra action to make sure that the partitions of the different distributed maps are physically stored on the same member?

PartitionAware will do this for you.
If you want to guarantee that three objects reside in the same partition, their key classes should implement PartitionAware and return the same result from the getPartitionKey() method.
For example, to keep all members of the same family together:
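import com.hazelcast.core.PartitionAware; // com.hazelcast.partition.PartitionAware in Hazelcast 4.x+
import java.io.Serializable;
public class Person implements PartitionAware<String>, Serializable {
    private String firstName;
    private String lastName;
    // Every key with the same lastName is placed in the same partition
    @Override
    public String getPartitionKey() {
        return this.lastName;
    }
}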
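You can verify the partition by checking that related keys return the same value from hazelcastInstance.getPartitionService().getPartition(key).getPartitionId().
Each partition holds a slice of every map: partition 0 contains the first part of each of map X, map Y and map Z, partition 1 contains the next part, and so on. Because each partition is owned by a single member, entries that share a partition key end up on the same member, whichever map they belong to.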


Multiple keys to same underlying object in C++

While building virtual network functions (VNFs), I came across a situation where I need to store a class or struct object that can be accessed with any one of a set of valid keys tied to that specific object.
Formally, here are the requirements:
We need to be able to access a single data blob using multiple keys. A key can be either "empty" or hold a valid "name/value".
Consider all the keys that will be tied to a piece of data and identify it. If all of them are empty, insert the data as soon as any <key1, data> pair arrives as input.
If another key2 comes up for the same data, bind key2 to the existing <key1, data> entry.
For example, if person_data is an object, then a unique phone number or a unique house number can be tied to it as keys. These keys may or may not be member variables of the person_data class/struct. If they are not member variables, then as programmers we might already know the relevant keys. We should be able to get the person_data object by phone number or by house number, i.e. we don't need all the keys at the same time to retrieve the object.
Is there any existing C++ utility for this? It seems a very valid use case in general, or at least in NFV (network function virtualization).
Thanks!
It seems that you have your definition of "key" and "data" reversed.
Your example shows that you group two "keys" under a single "data" value. In Computer Science, the value by which you group is called a key. A key can certainly be associated with two data values.
For instance, for a person_data object, the fields house_number and phone_number would be data members. They certainly would not be keys; house numbers are not unique and not everybody has a phone number.
Hence, you can simply use a std::vector<person_data>. If you do have a proper key for persons (a value that uniquely identifies each and every person), you might order by that key and use a std::set<person_data>, with operator< comparing only that key.
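A minimal sketch of the std::set suggestion, assuming a hypothetical person_id field that really does identify each person uniquely; house_number and phone_number stay ordinary data members. Because std::set uses operator< both to order elements and to detect duplicates, the comparison looks only at the key.

```cpp
#include <set>
#include <string>

// Hypothetical record: person_id is assumed to uniquely identify every person;
// house_number and phone_number are ordinary data members, not keys.
struct person_data
{
    std::string person_id;
    std::string house_number;
    std::string phone_number;
};

// std::set orders (and de-duplicates) elements with operator<, so compare
// only the unique key.
bool operator<(const person_data& a, const person_data& b)
{
    return a.person_id < b.person_id;
}

// Looks a person up by the unique key; returns nullptr if there is no match.
const person_data* find_person(const std::set<person_data>& people,
                               const std::string& id)
{
    auto it = people.find(person_data{id, "", ""});
    return it == people.end() ? nullptr : &*it;
}
```

Two persons may share a house number here; only a second record with the same person_id is rejected.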

C++, Oracle DB and linked lists: which data structures to use?

I am writing a program that receives data over time and I am looking for different patterns in the data.
I have to save data for the different processes that I create in the program, for future calculations, and I want to save that data to an Oracle DB (which has some support for storing objects).
The information that I want to relate to a new process has the following structure:
A list of logic expressions:
stage 1 ->(a*b*c)+(d*e)+..(can have more conditions)
stage 2 ->(f*a*c)+(a*b)+..
stage 3 ->(g*h*i)+(j*k)+..
Each letter (a, b, c, d, etc.) represents a logic function that has different parameters related to it; I need to save these parameters for future use of each logic function.
The * represents logical AND.
The + represents logical OR.
The question is: how do I implement this?
I can create an object for each letter, e.g. for "a" (which can be a function or a condition that needs to be checked, etc.), and save the data of this object to the Oracle DB.
An ID number can be given to each process to identify it; however, I am not sure how to identify each of the logic functions (e.g. "a"), because I later need to reassemble the data from the database back into the original process that I am handling (for example, stage 1).
Regarding linked lists, I am not sure whether to use them in my program to represent the structure of the logic in each stage, e.g. a->b->c->(new OR expression)->d->e, or whether there is a better solution. I could also save this information as a string and parse it later,
e.g. std::string command = "stage 1 ->(a*b*c)+(d*e)";
If I do use linked lists, I am not sure how to save the structure of the lists to the database.
For the outer structure (stage 1, stage 2, stage 3, etc.), I am also not sure whether to use linked lists or how to save them to a database.
I would appreciate some advice on how to build it.
Thanks!
Let's build this from the bottom up. You want to refrain from writing your own linked list structures, if possible.
A stage consists of one or more products that are summed.
Each product is made up of pointers to functions or function objects; let's call them functors.
Each group of function objects could be a std::vector<function_object>. The results within a group are multiplied (ANDed) together, which can be handled in a loop.
A stage is one or more of the above vectors. This can be expressed as:
std::vector< std::vector<function_object> >
You can add another dimension for the stages:
std::vector< std::vector< std::vector<function_object> > >
If you prefer to use a linked list, replace std::vector with std::list.
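The nested-vector layout can be sketched as follows, assuming (hypothetically) that each logic function is a callable taking no arguments and returning bool; the answer leaves the exact function_object type open. A product is ANDed in a loop and a stage ORs its products, which matches the (a*b*c)+(d*e) shape from the question.

```cpp
#include <functional>
#include <vector>

// Assumption: each logic function ("a", "b", ...) is a callable returning bool.
using function_object = std::function<bool()>;
using product = std::vector<function_object>;   // a*b*c        (logical AND)
using stage   = std::vector<product>;           // (a*b*c)+(d*e) (logical OR)

// A product is true only if every functor in it is true.
bool evaluate(const product& p)
{
    for (const auto& f : p)
        if (!f())
            return false;   // short-circuit like &&
    return true;
}

// A stage is true if at least one of its products is true.
bool evaluate(const stage& s)
{
    for (const auto& p : s)
        if (evaluate(p))
            return true;    // short-circuit like ||
    return false;
}
```

A std::vector<stage> would add the outer dimension for stage 1, stage 2, and so on.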
Edit 1: Function IDs not objects
Most databases have a difficult time storing code for a function. So, you'll have to use function identifiers instead.
A function identifier is a number associated with a function. This association lives in your code, not in the data. The easiest implementation is to use an array of function objects or pointers: use the function identifier as an index into the array, then retrieve the functor.
A more robust method is to use a table of <function_id, functor>. This structure allows the records to be in any order, and records can be deleted without damaging the code. With the array, slots must never be removed.
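// Assumed signature: each logic function evaluates to true/false
typedef bool (*Function_Pointer)();

bool logic_function_1(); // defined elsewhere

struct Table_Entry
{
    unsigned int     function_id;    // the ID stored in the database
    Function_Pointer p_function;     // the code, which stays in the program
    const char *     function_name;
};

Table_Entry function_associations[] =
{
    {5, logic_function_1, "Logic Function 1"},
    //...
};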
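Following the table idea, here is a sketch of how stored function IDs could be turned back into functors, assuming the database keeps each stage as lists of IDs (one list per product). The bool() signature, the IDs 5 and 9 and the placeholder logic_function_2 are hypothetical; a std::map plays the role of the <function_id, functor> table, so IDs need not be contiguous or ordered.

```cpp
#include <map>
#include <stdexcept>
#include <vector>

// Assumed signature for the logic functions (hypothetical).
typedef bool (*Function_Pointer)();

bool logic_function_1() { return true; }    // placeholder implementations
bool logic_function_2() { return false; }

// Registry of <function_id, functor>: it lives in the code, not in the database.
const std::map<unsigned int, Function_Pointer>& function_registry()
{
    static const std::map<unsigned int, Function_Pointer> registry = {
        {5, logic_function_1},
        {9, logic_function_2},
    };
    return registry;
}

// Rebuilds a stage from the IDs read back from the database, e.g. {{5, 9}, {9}}
// for (f5*f9)+(f9). map::at throws std::out_of_range for an unknown ID.
std::vector<std::vector<Function_Pointer>>
rebuild_stage(const std::vector<std::vector<unsigned int>>& stored)
{
    std::vector<std::vector<Function_Pointer>> stage;
    for (const auto& product_ids : stored)
    {
        std::vector<Function_Pointer> product;
        for (unsigned int id : product_ids)
            product.push_back(function_registry().at(id));
        stage.push_back(product);
    }
    return stage;
}
```

Only the IDs cross the database boundary; if an ID disappears from the program, the failure is an explicit exception rather than a wrong function being called.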

STL map like data structure to allow searching based on two keys

I want to make a map-like structure that allows searching by two keys, both of which will be strings. Here's an example:
Myclass s;
Person p = s.find("David"); // searching by name
// OR
p = s.find("XXXXX"); // searching by ID
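I don't want a code solution; I just want some help getting started, like which structures I can use to achieve what I want. Help is appreciated, guys, it's finals week.
Put your records into a vector (or list). Add a pointer to each record to two maps, one keyed by one key and one by the other. Note that pointers into a std::vector are invalidated when it reallocates, so with a vector either reserve enough capacity up front or use a list, whose elements never move.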
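A minimal sketch of this answer, using std::list so that the pointers held by the two maps stay valid as records are added. The Person fields and the find_by_id/find_by_name helpers are assumptions for illustration; find() mimics the question's s.find(...) by trying the name index first, so it presumes names and IDs never collide. There are no duplicate checks.

```cpp
#include <list>
#include <map>
#include <string>

struct Person
{
    std::string id;
    std::string name;
};

class Myclass
{
public:
    Myclass() = default;
    // A copy's indexes would still point into the original's list, so forbid copying.
    Myclass(const Myclass&) = delete;
    Myclass& operator=(const Myclass&) = delete;

    void add(const Person& p)
    {
        records_.push_back(p);
        Person* stored = &records_.back();   // std::list never relocates elements
        by_id_[stored->id] = stored;
        by_name_[stored->name] = stored;
    }

    const Person* find_by_id(const std::string& id) const { return lookup(by_id_, id); }
    const Person* find_by_name(const std::string& name) const { return lookup(by_name_, name); }

    // Mirrors s.find(...) from the question: try the name index, then the ID index.
    const Person* find(const std::string& key) const
    {
        const Person* p = find_by_name(key);
        return p != nullptr ? p : find_by_id(key);
    }

private:
    static const Person* lookup(const std::map<std::string, Person*>& index,
                                const std::string& key)
    {
        auto it = index.find(key);
        return it == index.end() ? nullptr : it->second;
    }

    std::list<Person> records_;
    std::map<std::string, Person*> by_id_;
    std::map<std::string, Person*> by_name_;
};
```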
There are many different ways how this could be achieved. The question is: what are the complexities of insert, delete and lookup operations that you aim for?
std::map is typically implemented as a red-black tree, which provides incredibly quick self-balancing (rotations) and all of the mentioned operations (lookup/find, insert, delete) with complexity O(log(n)). Note that this suits the idea of a single key.
With 2 keys you cannot keep elements sorted, because the order based on one key will most likely differ from the order based on the other one. The most straightforward and natural approach would be to store the records in one container and to hold the keys used by this container in 2 different structures, one optimized for retrieving this key given an id and the other for retrieving it given a name.
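One way to realize this "one container plus two key structures" layout, sketched under assumptions: each record is stored once under an internal record number (a hypothetical choice of key), and two maps translate an id or a name into that number. Each operation touches up to three std::maps, so all of them remain O(log(n)); ids and names are assumed to be unique.

```cpp
#include <map>
#include <string>

struct Person
{
    std::string id;
    std::string name;
};

class PersonIndex
{
public:
    void add(const Person& p)
    {
        unsigned int key = next_key_++;   // internal key of the record container
        records_[key] = p;
        key_by_id_[p.id] = key;
        key_by_name_[p.name] = key;
    }

    const Person* find_by_id(const std::string& id) const { return lookup(key_by_id_, id); }
    const Person* find_by_name(const std::string& name) const { return lookup(key_by_name_, name); }

    // Deleting has to update all three structures, each in O(log(n)).
    bool erase_by_id(const std::string& id)
    {
        auto it = key_by_id_.find(id);
        if (it == key_by_id_.end())
            return false;
        auto rec = records_.find(it->second);
        key_by_name_.erase(rec->second.name);
        records_.erase(rec);
        key_by_id_.erase(it);
        return true;
    }

private:
    const Person* lookup(const std::map<std::string, unsigned int>& index,
                         const std::string& k) const
    {
        auto it = index.find(k);
        if (it == index.end())
            return nullptr;
        return &records_.at(it->second);
    }

    unsigned int next_key_ = 0;
    std::map<unsigned int, Person> records_;
    std::map<std::string, unsigned int> key_by_id_;
    std::map<std::string, unsigned int> key_by_name_;
};
```

Storing keys rather than pointers keeps the indexes valid even if the record container were swapped for one that relocates its elements.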
If there is a constraint of storing everything in one place, while you'd like to optimize a find operation that supports two different keys, you could create a wrapper around std::map<std::string, Person> in which each element is contained twice (each time under a different key), i.e. something like:
std::map<std::string, Person> myContainer;
...
Person p;
std::string id = "1E57A";
std::string name = "David";
myContainer[id] = p;
myContainer[name] = p;
I can think of 2 advantages of doing this:
quite satisfying performance:
lookup with complexity O(log(2*n)), which is still O(log(n))
insertion & deletion with complexity O(2*log(2*n)), also O(log(n))
extremely simple implementation (using an existing container)
you just need to remember that the "expected" size of the container is half of its actual size
both of the keys: id and name should be attributes of Person so that when you find a concrete element given one of these keys, you immediately have the other one too
The disadvantage is that it will consume twice as much memory, and there may even be a constraint that:
none of the names may be the id of some other person at the same time, and vice versa (no id may be the name of some other person)
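The single-map wrapper can be sketched like this; the DualKeyMap class and its member names are hypothetical. insert() enforces the constraint above by refusing a person whose id or name is already used as any key, and because both copies carry both keys, erase() can remove both entries given either one.

```cpp
#include <cstddef>
#include <map>
#include <string>

struct Person
{
    std::string id;
    std::string name;
};

// One std::map holds every Person twice: once under its id, once under its name.
class DualKeyMap
{
public:
    // Rejects the insert if either key is already in use (as an id OR a name).
    bool insert(const Person& p)
    {
        if (p.id == p.name || data_.count(p.id) != 0 || data_.count(p.name) != 0)
            return false;
        data_[p.id] = p;
        data_[p.name] = p;
        return true;
    }

    // Lookup works with either key: O(log(2*n)) = O(log(n)).
    const Person* find(const std::string& key) const
    {
        auto it = data_.find(key);
        return it == data_.end() ? nullptr : &it->second;
    }

    // Either key is enough to remove both copies.
    bool erase(const std::string& key)
    {
        auto it = data_.find(key);
        if (it == data_.end())
            return false;
        Person p = it->second;   // copy the keys before erasing the entries
        data_.erase(p.id);
        data_.erase(p.name);
        return true;
    }

    // The "expected" size is half of the underlying map's size.
    std::size_t size() const { return data_.size() / 2; }

private:
    std::map<std::string, Person> data_;
};
```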

Why is a set used instead of a map? C++

Sets are used to get information about an object by providing all of its information; they are usually used to check whether the data exists. A map is used to get the information about an object by using a key (a single piece of data). Correct me if I am wrong. Now the question is: why would we need a set in the first place? Can't we use a map to see whether the data exists? Why would we need to provide all the information just to see whether it exists?
There are many operations where you just need a set. Using a map would just waste extra space.
Set operations (Union, Intersection etc.).
Keeping unique elements from a collection of numbers, objects etc.
A set serves to group items of the same type that are different among themselves (i.e., they are not equal). For example, the numbers 1 and 2 are both of int type, but 1!=2.
Set containers are useful when you want to keep track of collections of homogeneous things as a group, and perform mathematical operations on such groups (like intersection, union, difference, etc.). For example, imagine a set of search results containing all the documents mentioning the words cat and dog, and another set containing all the documents mentioning the word pet. The union of those two sets gives you every document that mentions either cat and dog, or pet. Notice that such a group has no repetitions (i.e., if a document was in both sets initially, it appears only once in the union).
Maps are certainly not sets, but they can be seen as an arrangement that allows you to associate a value with every element of a set. They are used to represent relationships. For example, each person working for a company has an associated employee_number; in this case a map would be a useful structure to represent such a relationship.
Going back to the previous example, if you wanted to know how many times each page has been accessed, you could create a map along the lines of std::map<Page, int>, that is, a relationship between the pages and the number of times each has been visited.
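The search-results example maps directly onto std::set_union (and its counterpart std::set_intersection); the document names below are made up. Both algorithms require sorted input ranges, which std::set provides.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Union: every document from either set, each appearing exactly once.
std::set<std::string> unite(const std::set<std::string>& a,
                            const std::set<std::string>& b)
{
    std::set<std::string> result;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(result, result.begin()));
    return result;
}

// Intersection: only the documents present in both sets.
std::set<std::string> intersect(const std::set<std::string>& a,
                                const std::set<std::string>& b)
{
    std::set<std::string> result;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(result, result.begin()));
    return result;
}
```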
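A sketch of the page-visit counter, assuming pages are identified by a URL path so that Page is just std::string (the answer does not define Page). std::map::operator[] value-initializes a missing count to 0, so incrementing works on the first visit too.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical: a page is identified by its URL path.
using Page = std::string;

// Counts how many times each page was visited; each page appears once as a key.
std::map<Page, int> count_visits(const std::vector<Page>& visits)
{
    std::map<Page, int> counts;
    for (const Page& p : visits)
        ++counts[p];   // inserts 0 first if p is not yet a key
    return counts;
}
```

The resulting map has one entry per distinct page, however many times it was visited, which is the "keys form a set" property.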
Notice that the keys of a map form a set (probably this is what confuses many people), and an implication of this property is that you can only have a given key once (there are some esoteric containers where a key can be mapped to different values though).
So, if you need to interact with groups and collections as a whole, and with the members of the group itself, probably you want a set. If you need to associate certain things with members of a group or a collection, probably you want a map. If one key must be associated with several values, probably you want a std::multimap.
An important note: in C++, std::set and std::map are ordered. C++11 offers alternative unordered containers called std::unordered_set and std::unordered_map.
A Set contains a unique list of ordered values, whereas a Map can contain non-unique, unordered values that are accessed using a key.
Either could be used to determine whether an object exists; it depends on your use case and on how you need to access that object. Can you test whether the Set contains an object that you have a reference to, or do you need to look it up by one or more keys to be able to compare it?
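The two existence checks side by side, as a small sketch with made-up data: a set is queried with the value itself, while a map is queried with the key alone and hands back the associated value.

```cpp
#include <map>
#include <set>
#include <string>

// Set: membership is tested with the whole value.
bool has_tag(const std::set<std::string>& tags, const std::string& tag)
{
    return tags.count(tag) != 0;   // 0 or 1, since set elements are unique
}

// Map: membership is tested with the key only, and the value comes back with it.
const int* find_age(const std::map<std::string, int>& ages, const std::string& name)
{
    auto it = ages.find(name);
    return it == ages.end() ? nullptr : &it->second;
}
```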

QMap::insertMulti or QMultiMap?

Which should I use, QMap::insertMulti or QMultiMap, to handle:
2 -> abc
2 -> def
3 -> ghi
3 -> jkl
What's the difference between the 2 solutions?
From the Container Classes documentation:
QMap<Key, T>
This provides a dictionary (associative array) that maps keys of type Key to values of type T. Normally each key is associated with a single value. QMap stores its data in Key order; if order doesn't matter QHash is a faster alternative.
QMultiMap<Key, T>
This is a convenience subclass of QMap that provides a nice interface for multi-valued maps, i.e. maps where one key can be associated with multiple values.
It looks like both can do the job. The same document also has an Algorithmic Complexity section, where you can see that both classes have the same complexity.
I would choose QMultiMap just to better document the fact I'm going to hold multiple values with the same key.
Both can serve this purpose. QMultiMap is actually a subclass of QMap (in Qt 5; since Qt 6, QMultiMap is a separate class and QMap::insertMulti() has been removed).
If you want to have multiple values for a single key, you can use:
QMap : for inserting use insertMulti
QMultiMap : for inserting use insert
If you want to have a single value for a single key, you can use:
QMap : for inserting use insert
QMultiMap : for inserting use replace
So both can serve both purposes, but each has a default behavior that matches its name, and each has some methods or operators that are convenient for the single-valued or multi-valued case.
It is good practice to choose the type according to your need. For example, if you use QMap to store multiple values per key, someone going through your class members might get the impression (from the data type) that you intend to store single key-value pairs.
Similarly, if you use QMultiMap, anyone reading the definition can tell that the data will have multiple values for the same key.