Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am writing a simulation in C++ with a lot of little entities indexed by an integer, each having relationships of various types with one another. I have a basic structure storing the data about the relationship, and the relationships are unidirectional (A can be friends with B but B doesn't necessarily have any relationship with A).
So I have a lot of data that has the form (integer index, integer index, data...)
1. I will very often need to start with an entity (index) and find all of the relationships it has with others (so all triples with the first entry equal to some index).
2. I also will sometimes need to remove entities from the simulation and destroy all relationships that reference them (remove all triples with either the first or second entry equal to a given integer).
On one extreme, I can store everything in arbitrary order and search through all of it every time I need one of the lists from #1 and #2 above. This uses the least memory, but will also be very slow. At the other extreme, I keep track of multiple indexing structures that make both operations fast but take up extra memory. It's hard to describe compactly, but imagine a list of lists that lets you quickly answer the question "what are all the relationships that have 47 as their first entry in the triple?"
I don't know anything about data structures, but I imagine this must be a problem people have encountered and thought about before. Are there any C++ libraries with data structures that would automatically keep track of this type of indexing information or that are relevant to what I'm describing? Thanks!
I'd do something like this:
```cpp
#include <list>
#include <vector>

struct Entity;

struct Relationship  // usually called Edge
{
    Entity *from;
    Entity *to;
    // other data
};

struct Entity  // usually called Node
{
    std::list<Relationship*> incoming;
    std::list<Relationship*> outgoing;
};

std::vector<Entity*> roster;
```
You might want to wrap roster in some sort of EntityHandler class, to manage all of those pointers.
For #1, look up an Entity in the roster by its number (e.g. roster[5]); that Entity's outgoing list is what you want. You can make a shallow copy (of the pointers) or a deep copy (of the Relationships).
For #2, look up the Entity and iterate over both of its Relationship lists; for each Relationship, remove the corresponding pointer from the list in the Entity at the other end, then delete the Relationship. Then delete the Entity. And don't forget to set the pointer in the roster to NULL.
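A minimal sketch of #1 and #2 on top of the structures above, assuming raw owning pointers exactly as in the answer (an EntityHandler or smart pointers would normally own them). The functions `connect`, `relationships_from` and `remove_entity` and the `kind` field are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <list>
#include <vector>

struct Entity;

struct Relationship  // edge: from -> to
{
    Entity *from;
    Entity *to;
    int kind;  // hypothetical payload standing in for "other data"
};

struct Entity  // node
{
    std::list<Relationship*> incoming;
    std::list<Relationship*> outgoing;
};

// Creates a relationship and registers it at both ends.
Relationship *connect(Entity *from, Entity *to, int kind)
{
    Relationship *r = new Relationship{from, to, kind};
    from->outgoing.push_back(r);
    to->incoming.push_back(r);
    return r;
}

// #1: shallow copy of the pointers to all relationships starting at `index`.
std::vector<Relationship*> relationships_from(const std::vector<Entity*> &roster,
                                              std::size_t index)
{
    const Entity *e = roster[index];
    if (e == nullptr) return {};  // entity was already removed
    return std::vector<Relationship*>(e->outgoing.begin(), e->outgoing.end());
}

// #2: remove an entity and every relationship that references it.
void remove_entity(std::vector<Entity*> &roster, std::size_t index)
{
    Entity *e = roster[index];
    if (e == nullptr) return;
    for (Relationship *r : e->outgoing) {
        r->to->incoming.remove(r);    // unlink from the entity at the other end
        delete r;
    }
    for (Relationship *r : e->incoming) {
        r->from->outgoing.remove(r);  // unlink from the entity at the other end
        delete r;
    }
    delete e;
    roster[index] = nullptr;
}
```

Note that std::list::remove makes each unlink linear in the other entity's list length, which is usually fine when entities have only a handful of relationships.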
Wikipedia has a good section on how to implement graphs, including their time and storage complexity.
The three main representations are incidence matrices, adjacency matrices, and adjacency lists. Generally speaking:
Incidence matrices use a nodes × edges sized matrix and are best for hypergraphs (graphs whose edges can connect more than two nodes) and multigraphs (graphs that allow more than one connection between the same two nodes). However, the more edges, the more space they take.
Adjacency lists are most efficient when there is a lot of resizing, since the data is not contiguous, and for sparse graphs, since they only allocate space for links that actually exist; however, they use pointers, which take up extra space. Adjacency lists are probably the easiest to implement for directed graphs.
Lastly, there is the adjacency matrix, which stores a nodes × nodes matrix to check whether two nodes are connected: for example, if [1][3] is true, the 1st node is connected to the 3rd.
https://en.wikipedia.org/wiki/Graph_(data_structure) has good descriptions of their implementations. Just watch out: it says there is an 'incidence list', but that is just a link to an adjacency list implementation.
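To illustrate the incidence-matrix trade-off described above, here is a small sketch for a directed multigraph. The class name `IncidenceMatrix` and the sign convention (-1 for the source, +1 for the target) are assumptions; conventions differ between texts. Parallel edges get their own columns, which a boolean adjacency matrix could not distinguish, but every new edge grows every row.

```cpp
#include <cstddef>
#include <vector>

// Incidence matrix of a directed graph: one row per node, one column per edge.
// Assumed convention: -1 marks the edge's source, +1 its target.
// Self-loops cannot be represented with this convention.
class IncidenceMatrix
{
public:
    explicit IncidenceMatrix(std::size_t nodes) : rows_(nodes) {}

    // Adds an edge from -> to and returns its column index. Parallel edges
    // (multigraphs) simply get separate columns.
    std::size_t add_edge(std::size_t from, std::size_t to)
    {
        for (auto &row : rows_) row.push_back(0);  // new column: O(nodes)
        std::size_t e = rows_[from].size() - 1;
        rows_[from][e] = -1;
        rows_[to][e] = +1;
        return e;
    }

    // Counts the edges from `from` to `to` by scanning every column: O(edges).
    std::size_t count_edges(std::size_t from, std::size_t to) const
    {
        std::size_t count = 0;
        for (std::size_t e = 0; e < rows_[from].size(); ++e)
            if (rows_[from][e] == -1 && rows_[to][e] == +1) ++count;
        return count;
    }

    std::size_t edge_count() const { return rows_.empty() ? 0 : rows_[0].size(); }

private:
    std::vector<std::vector<int>> rows_;  // rows_[node][edge]
};
```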
Related
I am looking for a data structure in C++ and need some advice.
I have nodes; every node has a unique_id and a group_id:
1 1.1.1.1
2 1.1.1.2
3 1.1.1.3
4 1.1.2.1
5 1.1.2.2
6 1.1.2.3
7 2.1.1.1
8 2.1.1.2
I need a data structure to answer these questions:
1. What is the group_id of node 4?
2. Give me a list (probably a vector) of the unique_ids that belong to group 1.1.1.
3. Give me a list (probably a vector) of the unique_ids that belong to group 1.1.
4. Give me a list (probably a vector) of the unique_ids that belong to group 1.
Is there a data structure that can answer these questions, and what is the time complexity of inserting and answering? Or should I implement it myself?
I would appreciate an example.
EDIT:
At the beginning, I need to build this data structure. Most of the operations are reads by group id; insertions will happen, but less often than reads.
Time complexity is more important than memory space.
To me, hierarchical data like the group ID calls for a tree structure. (I assume that for 500 elements this is not really necessary, but it seems natural and scales well.)
Each element in the first two levels of the tree would just hold vectors (if they come ordered) or maps (if they come un-ordered) of sub-IDs.
The third level in the tree hierarchy would hold pointers to leaves, again in a vector or map, which contain the fourth group ID part and the unique ID.
Questions 2-4 are easily and quickly answered by navigating the tree.
For question 1 one needs an additional map from unique IDs to leaves in the tree; each element inserted into the tree also has a pointer to it inserted into the map.
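A minimal sketch of that tree, assuming group ids are stored as integer components (1.1.2.3 as {1, 1, 2, 3}). It uses std::map at every level (the answer suggests vectors when sub-IDs arrive in order), and for question 1 it keeps a side std::unordered_map from unique_id to a copy of the group id rather than a pointer to the leaf. `GroupTree` and its member names are hypothetical.

```cpp
#include <map>
#include <memory>
#include <unordered_map>
#include <vector>

// Assumed encoding: the group_id 1.1.2.3 is stored as {1, 1, 2, 3}.
using GroupId = std::vector<int>;

class GroupTree
{
public:
    // Walks (and creates) one tree level per group-id component.
    void insert(int unique_id, const GroupId &group)
    {
        TreeNode *node = &root_;
        for (int part : group) {
            std::unique_ptr<TreeNode> &child = node->children[part];
            if (!child) child = std::make_unique<TreeNode>();
            node = child.get();
        }
        node->ids.push_back(unique_id);
        group_of_[unique_id] = group;  // side index for question 1
    }

    // Question 1: average O(1); throws std::out_of_range for unknown ids.
    const GroupId &group_of(int unique_id) const { return group_of_.at(unique_id); }

    // Questions 2-4: descend to the prefix, then collect the whole subtree.
    std::vector<int> members(const GroupId &prefix) const
    {
        const TreeNode *node = &root_;
        for (int part : prefix) {
            auto it = node->children.find(part);
            if (it == node->children.end()) return {};
            node = it->second.get();
        }
        std::vector<int> out;
        collect(*node, out);
        return out;
    }

private:
    struct TreeNode
    {
        std::map<int, std::unique_ptr<TreeNode>> children;  // next component -> subtree
        std::vector<int> ids;                                // unique ids whose group ends here
    };

    static void collect(const TreeNode &node, std::vector<int> &out)
    {
        out.insert(out.end(), node.ids.begin(), node.ids.end());
        for (const auto &child : node.children) collect(*child.second, out);
    }

    TreeNode root_;
    std::unordered_map<int, GroupId> group_of_;
};
```

Descending to a group costs one map lookup per level, and collecting its members is linear in the size of that subtree.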
First of all, if you are going to have only a small number of nodes, it would probably make sense not to bother with advanced data structures. A simple linear search could be sufficient.
Next, this looks like a good job for SQL, so it may be a good idea to incorporate the SQLite library into your app. But even if you want to do it without SQL, it's still a good hint: what you need are two index trees to support quick searching through your array. The complexity (if using balanced trees) will be logarithmic for all operations.
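One way to get those two indexes with only the standard library is a std::map for unique_id and a std::multimap for group_id (both are usually implemented as balanced red-black trees). This sketch assumes the same {1, 1, 2, 3} encoding of group ids; `NodeIndex` is a hypothetical name. Because vectors compare lexicographically, every group that starts with a given prefix sorts contiguously at or after that prefix, so questions 2-4 become a lower_bound followed by a short scan.

```cpp
#include <algorithm>
#include <map>
#include <vector>

using GroupId = std::vector<int>;  // assumed encoding: 1.1.2.3 -> {1, 1, 2, 3}

class NodeIndex
{
public:
    void insert(int unique_id, const GroupId &group)
    {
        by_id_[unique_id] = group;  // assumes unique_id is not already present
        by_group_.emplace(group, unique_id);
    }

    // Question 1: O(log n); throws std::out_of_range for unknown ids.
    const GroupId &group_of(int unique_id) const { return by_id_.at(unique_id); }

    // Questions 2-4: O(log n + k) for k results.
    std::vector<int> members(const GroupId &prefix) const
    {
        std::vector<int> out;
        for (auto it = by_group_.lower_bound(prefix); it != by_group_.end(); ++it) {
            const GroupId &g = it->first;
            bool has_prefix = g.size() >= prefix.size() &&
                              std::equal(prefix.begin(), prefix.end(), g.begin());
            if (!has_prefix) break;  // past the contiguous block of matches
            out.push_back(it->second);
        }
        return out;
    }

private:
    std::map<int, GroupId> by_id_;          // index 1: unique_id -> group_id
    std::multimap<GroupId, int> by_group_;  // index 2: group_id -> unique_id
};
```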
Depends...
How often do you insert? Or do you mostly read?
How often do you access by Id or GroupId?
With a maximum of 500 nodes I would put them in a simple vector where the Id is the offset into the array (if the Ids are indeed as shown). The group search can then be implemented by iterating over the array and comparing the partial group ids.
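A sketch of the simple vector approach, assuming ids are small and dense so they can index the vector directly, and group ids are kept as the dotted strings from the question. `members_linear` is a hypothetical name. The only subtle part is the partial comparison: "1.1" must match "1.1.2.1" but not "1.10.2.1", so the prefix has to be followed by a dot or the end of the string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// groups[id] holds the group_id of node `id`; index 0 is unused because the
// example ids start at 1. Question 1 is then simply groups[id].
std::vector<int> members_linear(const std::vector<std::string> &groups,
                                const std::string &prefix)
{
    std::vector<int> out;
    for (std::size_t id = 1; id < groups.size(); ++id) {
        const std::string &g = groups[id];
        // The prefix must match and then be followed by '.' or end the string.
        bool match = g.compare(0, prefix.size(), prefix) == 0 &&
                     (g.size() == prefix.size() || g[prefix.size()] == '.');
        if (match) out.push_back(static_cast<int>(id));
    }
    return out;
}
```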
If this is too expensive and you really access the structure a lot and need very high performance, or you do a lot of inserts, I would implement a tree with a hash map for the Ids.
If the data is stored in a database, you may use SELECT ... CONNECT BY if your system supports it and query the information directly from the DB.
Sorry for not providing a clear answer, but the solution depends on too many factors ;-)
Sounds like you need a container with two separate indexes on unique_id and group_id. Question 1 will be handled by the first index, Questions 2-4 will be handled by the second.
Maybe take a look at Boost Multi-index Containers Library
I am not sure what the perfect data structure for this is, but I would use a map.
A std::map keyed by unique_id answers question 1 in O(log n) (average O(1) with std::unordered_map), and insertion and deletion cost the same. The issue comes with questions 2, 3 and 4, where you have to scan every entry, so they are O(n), where n is the number of nodes.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
It is not clear from what I have read about ADTs and data structures, here and elsewhere, what the difference between them is. Suppose I have a class that has a private data member (an array), and member functions that restrict pushing and popping elements to the top only. This would be a stack, a specific data structure. But it would also be an ADT:
"An ADT is a collection of data and a set of operations on that data." ("Data Abstraction and Problem Solving with C++", Carrano, pg 17).
But since Wikipedia calls ADTs "purely theoretical entities", is the above class the ADT, while the implementation (the object) is the data structure?
No. 'Instantiated abstract' is essentially a contradiction in terms.
A data structure is any kind of organization of data; ADTs are a particular kind (and use) of data structures.
Well, kinda sorta. An ADT is a description of the state space and of the operations defined against that state space. So, an integer is a set of states corresponding to the range of numbers you're representing, along with the operations add, subtract and multiply. (We'll ignore divide since we'd have to define exactly which integer divide we mean.)
An integer in a computer program is indeed an instance in the sense that it's a concrete chunk of memory and appropriate code to implement the ADT.
Fred Brooks makes a nice distinction, where Int as an abstraction is a specification, the Int ADT is an implementation because it fully describes the exact type, and the code (microcode, instructions) that actually can do adding and so on is the realization. Unfortunately, that distinction isn't really commonly used except among Fred's former students.
An abstract data type is defined as a set of behaviors and properties common to all implementations of that abstract data type. A .NET List data type has a length, holds data, and provides operations on the data such as get front and get back; the sequence of data is ordered, ...
A std::list<> is another implementation of the abstract data type list.
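The same ADT/data-structure split is visible in the C++ standard library: std::stack is an adapter that exposes only the stack operations (push, pop, top, empty), while its second template argument picks the underlying data structure. In this sketch, `ArrayStack`, `LinkedStack` and `pop_all` are illustrative names; `pop_all` is written purely against the stack ADT and behaves identically for both representations.

```cpp
#include <list>
#include <stack>
#include <vector>

// Same ADT (LIFO stack), two different data structures underneath.
using ArrayStack = std::stack<int, std::vector<int>>;  // contiguous array
using LinkedStack = std::stack<int, std::list<int>>;   // doubly linked list

// Uses only the stack operations, so it cannot tell the two apart.
template <typename Stack>
std::vector<int> pop_all(Stack s)
{
    std::vector<int> out;
    while (!s.empty()) {
        out.push_back(s.top());
        s.pop();
    }
    return out;
}
```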
If you found a list type in another programming language, you would know what it does from your knowledge of the abstract data type list and of how lists work in other languages.
And if that list in that language would start painting pictures on your screen, you would rightfully delete that language from your system ;)
You can consider the semantics of many abstract data types fixed, such as lists, sets, stacks, and queues. In contrast, until an abstract data type has gained collective acceptance, its semantics may vary.
For example, if you polled opinions on the definition of the abstract data type "fuzzy set", you would most likely not get a single answer.
On a side note, if you are looking for a more theoretical approach to your question, you might want to search for material on "kinds". There, for example, a generic List is called a type constructor, because applying List<> to a specific element type 'a constructs the concrete type List<'a>; List<> itself is a higher-kinded type. (I sure hope I got that one right...) Elaborating further on that matter is beyond the scope of this answer.
How would one look for entities with specific components in an entity component system?
In my current implementation I'm storing components in a std::unordered_map<entity_id, std::unordered_map<type_index, Component *>>.
So if a system needs access to entities with specific components, what is the most efficient way to access them?
I currently have two ideas:
1. Iterate through the map and skip the entities that don't have those components.
2. Create "mappers" or "views" that hold a pointer to the entity and update them every time a component is assigned to or removed from an entity.
I saw some approaches with bitmasks and such, but that doesn't seem scalable.
Your situation calls for std::unordered_multimap.
The find method returns an iterator to an element whose key matches. The equal_range method returns a pair of iterators delimiting all the elements matching your key: the first points to the first match, and the second points one past the last match, so you iterate over the half-open range [first, second).
Actually what unordered_multimap allows you to create is an in-memory key-value database that stores a bunch of objects for the same key.
If your "queries" get more complicated than "give me all entities with component T" and turn into something like "give me all entities that have components T and B at the same time", you would be better off creating a class that has an unordered_multimap as a member and provides a bunch of utility methods for querying the stuff.
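A sketch of such a wrapper class, assuming entity ids are 32-bit integers and components are identified by std::type_index. `ComponentIndex`, `with`, `with_both`, `Motion` and `Sprite` are hypothetical names, and removal is omitted. The two-type query intersects the ranges with a hash set, which is one simple choice among several.

```cpp
#include <cstdint>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using entity_id = std::uint32_t;  // assumption: entity ids are 32-bit integers

struct Motion {};  // hypothetical component types
struct Sprite {};

// One multimap entry per (component type, owning entity).
class ComponentIndex
{
public:
    template <typename T>
    void add(entity_id e) { by_type_.emplace(std::type_index(typeid(T)), e); }

    // "give me all entities with component T": walk the equal_range.
    template <typename T>
    std::vector<entity_id> with() const
    {
        auto range = by_type_.equal_range(std::type_index(typeid(T)));
        std::vector<entity_id> out;
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }

    // "give me all entities with components T and B": intersect two ranges.
    template <typename T, typename B>
    std::vector<entity_id> with_both() const
    {
        std::unordered_set<entity_id> b_owners;
        for (entity_id e : with<B>()) b_owners.insert(e);
        std::vector<entity_id> out;
        for (entity_id e : with<T>())
            if (b_owners.count(e)) out.push_back(e);
        return out;
    }

private:
    std::unordered_multimap<std::type_index, entity_id> by_type_;
};
```

The order of elements within an equal_range result is unspecified, so callers should not rely on it.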
More on the subject:
http://www.cplusplus.com/reference/unordered_map/unordered_multimap/equal_range/
unordered_multimap - iterating the result of find() yields elements with different value (somewhat related question - the accepted answer could be helpful)
The way I do it involves storing a back index (32 bits) from each component to its entity. It adds a bit of memory overhead, but the total overhead in my implementation is around 8 bytes per component and around 4 bytes per entity, which is usually not too bad for my use cases.
Now when you have a back index to an entity, what you can do when satisfying a query for all entities that have 2 or more component types is to use parallel bit sets.
For example, if you are looking for entities with two component types, Motion and Sprite, we start out by iterating through the motion components and setting the associated bits for the entities that own them.
Next, we iterate through the sprite components and look for the entity bits already set by the pass through the motion components. If an entity index appears in both the motion components and the sprite components, we add the entity to the list of entities that contain both. Here is a diagram of the idea, as well as of how to multithread it and pool the entity-parallel bit arrays:
That gives you a set intersection in linear time with a very small constant factor (very, very cheap work per iteration, as we're just marking and inspecting a bit, which is much, much, much cheaper than a hash table lookup, for example). I can actually perform a set intersection between two sets with 100 million elements each in under a second using this technique. As a bonus, with some minor effort, you can get the entities back in sorted order for cache-friendly access patterns if you use the bitset to grab the indices of the entities that have 2 or more of the requested components.
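A single-threaded sketch of the two-pass bitset intersection described above, assuming each component array can report the 32-bit back index of its owning entity and that every entity index is below `entity_count`. `entities_with_both` is a hypothetical name. The second bitset is only there to return the result in ascending entity order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// owners_a / owners_b: the back index of every component of type A / B.
// Returns the entities owning both, in ascending order.
std::vector<std::uint32_t> entities_with_both(const std::vector<std::uint32_t> &owners_a,
                                              const std::vector<std::uint32_t> &owners_b,
                                              std::size_t entity_count)
{
    std::vector<std::uint64_t> in_a((entity_count + 63) / 64, 0);
    std::vector<std::uint64_t> in_both(in_a.size(), 0);

    // Pass 1: mark every entity that owns an A component.
    for (std::uint32_t e : owners_a)
        in_a[e / 64] |= std::uint64_t{1} << (e % 64);

    // Pass 2: keep each B owner whose bit was set in pass 1.
    for (std::uint32_t e : owners_b)
        if (in_a[e / 64] & (std::uint64_t{1} << (e % 64)))
            in_both[e / 64] |= std::uint64_t{1} << (e % 64);

    // Walking the result bits word by word yields sorted entity indices.
    std::vector<std::uint32_t> out;
    for (std::size_t w = 0; w < in_both.size(); ++w) {
        for (std::uint64_t bits = in_both[w]; bits != 0; bits &= bits - 1) {
            unsigned bit = 0;
            while (((bits >> bit) & 1) == 0) ++bit;  // index of the lowest set bit
            out.push_back(static_cast<std::uint32_t>(w * 64 + bit));
        }
    }
    return out;
}
```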
There are ways to do this in better than linear time (Log(N)/Log(64)), though it gets considerably more involved; with them you can perform a set intersection between two sets containing a hundred million elements each in under a millisecond. Here's a hint:
I'm currently coding a physical simulation on a lattice, and I'm interested in describing loops in this lattice: closed curves composed of the edges of the lattice cells. I'm storing the information on these lattice cells (by information I mean a Boolean variable saying whether or not the edge is valid for composing a loop) in a three-dimensional Boolean array.
I'm now thinking about a good structure to handle these loops. They are basically a list of edges, so I would need something like an array of 3D integer vectors, each edge being defined by 3 coordinates in my current parameterization. I'm already thinking about building a class around this "list" object, as I'll need methods computing the loop diameter and probably more in the future.
However, I'm not very familiar with the structures available for this; my physics background hasn't taught me enough C++. So I'd like to hear your suggestions for shaping this piece of code. I would really enjoy discovering some new ways of coding this kind of thing.
You want two separate things. One is keeping track of all edges and allowing fast lookup of edge objects by an (int,int,int) index (you probably don't want int there but something like size_t or so). This is entirely independent of your second goal, creating ordered subsets of these.
General Collection (1)
Since your edge database is going to be sparse (i.e. only a few of the possible indices will actually identify as a particular edge), my prior suggestion of using a 3d matrix is unsuitable. Instead, you probably want to lookup edges with a hash map.
How easy this is depends on the expected size of the individual integers. If you can manage with no more than 21 bits per integer (for instance if your integers are short int values, which usually have only 16 bits), then you can concatenate them into one 64-bit value, which already has a std::hash implementation. Otherwise, you will have to implement your own hash specialisation for, e.g., std::hash<std::array<uint32_t,3>> (which is also quite easy, and highly stackable).
Once you can hash your key, you can throw it into an std::unordered_map and be done with it. That thing is fast.
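A sketch of both options, assuming non-negative coordinates. `Edge`, `pack`, `Key` and `KeyHash` are hypothetical names, and the mixing constant is the one popularised by boost::hash_combine. Rather than specializing std::hash<std::array<...>>, which the standard strictly permits only for specializations involving a program-defined type, the array version passes a hasher as the third template argument of std::unordered_map.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

struct Edge { bool usable = false; /* other per-edge data */ };  // hypothetical

// Option A: each coordinate fits in 21 bits (0 .. 2097151), so the three of
// them pack into one 64-bit key, which std::hash already supports.
std::uint64_t pack(std::uint32_t x, std::uint32_t y, std::uint32_t z)
{
    const std::uint64_t mask = (std::uint64_t{1} << 21) - 1;
    return ((std::uint64_t{x} & mask) << 42) |
           ((std::uint64_t{y} & mask) << 21) |
           (std::uint64_t{z} & mask);
}

using EdgeMapPacked = std::unordered_map<std::uint64_t, Edge>;

// Option B: full 32-bit coordinates with a custom hasher.
using Key = std::array<std::uint32_t, 3>;

struct KeyHash
{
    std::size_t operator()(const Key &k) const
    {
        std::size_t h = 0;
        for (std::uint32_t v : k)  // hash_combine-style mixing
            h ^= std::hash<std::uint32_t>{}(v) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

using EdgeMap = std::unordered_map<Key, Edge, KeyHash>;
```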
Loop detection (2)
Then you want to have short-lived data structures for identifying loops, so you want a data structure that extends on one end but never on the other. That means you're probably fine with an std::vector or possibly with an std::deque if you have very large instances (but try the vector first!).
I'd suggest simply keeping the index of each edge in the local vector; you can always look up the edge object in your unordered_map. Then the question is how to represent the index. If Int represents your integer type (e.g. int, size_t, short, ...), it's probably most consistent to use a std::array<Int,3>; if the types of the integers differ, you'll want a std::tuple<...>.
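A minimal sketch of such a loop class, assuming `Int` is int and that "diameter" means the largest Euclidean distance between the index triples of any two edges in the loop. That definition, the class `Loop`, and its method names are assumptions, so adapt `diameter()` to whatever measure the physics actually needs.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Int = int;                       // assumption: pick your coordinate type
using EdgeIndex = std::array<Int, 3>;  // same index used as the edge-map key

// Short-lived loop: an ordered list of edge indices that only grows at the end.
class Loop
{
public:
    void append(const EdgeIndex &e) { edges_.push_back(e); }
    std::size_t size() const { return edges_.size(); }
    const EdgeIndex &operator[](std::size_t i) const { return edges_[i]; }

    // Hypothetical "diameter": largest Euclidean distance between the index
    // triples of any two edges in the loop, computed in O(n^2).
    double diameter() const
    {
        double best = 0.0;
        for (std::size_t i = 0; i < edges_.size(); ++i)
            for (std::size_t j = i + 1; j < edges_.size(); ++j) {
                double d2 = 0.0;
                for (int k = 0; k < 3; ++k) {
                    double diff = double(edges_[i][k]) - double(edges_[j][k]);
                    d2 += diff * diff;
                }
                best = std::max(best, std::sqrt(d2));
            }
        return best;
    }

private:
    std::vector<EdgeIndex> edges_;
};
```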
Greetings code-gurus!
I am writing an algorithm that connects, for instance, node_A of Region_A with node_D of Region_D (node_A and node_D are just integers). There could be 100k+ such nodes.
Assume that the line segment between A and D passes through a number of other regions, B, C, Z. There will be a maximum of 20 regions in between these two nodes.
Each region has its own properties that may vary according to the connection A-D. I want to access these at a later point of time.
I am looking for a good data structure (perhaps an STL container) that can hold this information for a particular connection.
For example, for connection A - D I want to store :
node_A,
node_D,
cross-sectional area (computed elsewhere),
regionB,
regionB_thickness,
regionB other properties,
regionC, ....
The data can be double, int, or string, and could also be an array/vector etc.
First I considered creating structs or classes for regionB, regionC etc.
But, for each connection A-D, certain properties like thickness of the region through which this connection passes are different.
There will only be 3 or 4 different things I need to store pertaining to a region.
Which data structure should I consider here? Any STL container, like vector? Could you please suggest one? (I would appreciate a code snippet.)
To access a connection between nodes A-D, I want to make use of int node_A (an index).
This probably means I need to use a hashmap or similar data structure.
Can anyone please suggest a good data structure in C++ that can efficiently hold this sort of data for the connection A-D described above? (I would appreciate a code snippet.)
thank you!
UPDATE
For some reasons, I cannot use packages like Boost, so I want to know if I can use anything from the STL.
You should try to group stuff together when you can. You can group the information on each region together with something like the following:
```cpp
struct Region;  // defined elsewhere

struct Region_Info {
    Region *ptr;
    int thickness;
    // Put other properties here.
};
```
Then, you can more easily create a data structure for your line segment, maybe something like the following:
```cpp
#include <list>

struct Line_Segment {
    int node_A;
    int node_D;
    int crosssectional_area;
    std::list<Region_Info> regions;  // regions crossed, in order
};
```
If you are limited to only 20 regions, then a list should work fine. A vector is also fine if you would prefer.
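Since the update rules out Boost, here is a standard-library-only sketch of the hash map keyed by node_A that the question asks for. It uses an integer region id instead of the `Region *` above so it is self-contained; `RegionInfo`, `LineSegment`, `ConnectionTable` and the `material` field are hypothetical names, and areas and thicknesses are assumed to be doubles.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Per-connection data about one crossed region.
struct RegionInfo
{
    int region_id;
    double thickness;
    std::string material;  // hypothetical extra property
};

struct LineSegment
{
    int node_A;
    int node_D;
    double cross_sectional_area;
    std::vector<RegionInfo> regions;  // at most ~20, in crossing order
};

// Lookup by node_A. If one node_A can start several connections, use
// std::unordered_multimap or a std::vector<LineSegment> as the mapped type.
using ConnectionTable = std::unordered_map<int, LineSegment>;
```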
Have you considered an adjacency array for each node, which stores the nodes it is connected to, along with other data?
First, define a node:
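```cpp
#include <vector>
struct AdjacencyInfo { int neighbor_id; /* placeholder: other per-connection data */ };
class Node
{
    int id_;
    std::vector<AdjacencyInfo> adjacency_;
};
```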
Here the class AdjacencyInfo can store the myriad data you need. You can change the vector to a hash map with the node id as the key if lookup speed is an issue. For fancy access, you may want to overload the [] operator if it is an essential requirement.
So, as an example:
```cpp
#include <map>

class Graph
{
    std::map<int, Node> graph_;
};
```
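A compilable sketch of this Graph, assuming directed connections and a hypothetical `AdjacencyInfo` payload (neighbour id, area, crossed region ids). The node id lives in the map key here, so the `id_` member becomes redundant; `connect` and `neighbors` are illustrative names.

```cpp
#include <map>
#include <utility>
#include <vector>

struct AdjacencyInfo  // hypothetical edge payload
{
    int neighbor_id;
    double cross_sectional_area;
    std::vector<int> regions_crossed;
};

class Graph
{
public:
    // Stores a directed connection from -> to; creates nodes on first use.
    void connect(int from, int to, double area, std::vector<int> regions)
    {
        graph_[from].adjacency.push_back({to, area, std::move(regions)});
        graph_[to];  // make sure the target node exists too
    }

    // All connections leaving `id` (empty if the node is unknown).
    const std::vector<AdjacencyInfo> &neighbors(int id) const
    {
        static const std::vector<AdjacencyInfo> none;
        auto it = graph_.find(id);
        return it == graph_.end() ? none : it->second.adjacency;
    }

private:
    struct Node { std::vector<AdjacencyInfo> adjacency; };
    std::map<int, Node> graph_;  // std::unordered_map works as well
};
```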
Boost has a graph library, Boost.Graph. Check it out to see if it is useful in your case.
Well, as everyone else has noticed, that's a graph. The question is, is it a sparse graph or a dense one? There are generally two ways of representing graphs (there are more, but you'll probably only need to consider these two):
adjacency matrix
adjacency list
An adjacency matrix is basically an N×N matrix whose rows and columns correspond to the nodes and whose cells hold the connection data (edges), so you can index edges by vertices. Sorry if my English is poor; it's not my native language. Anyway, you should only consider an adjacency matrix if you have a dense graph and need to find node->edge->node connections really fast. However, iterating through neighbours or adding/removing vertices in an adjacency matrix is slow: the first requires N iterations, and the second requires resizing the array/vector you use to store the matrix.
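A sketch of an adjacency matrix whose cells hold connection data, assuming C++17 for std::optional and a single double per edge as a hypothetical payload. It shows the trade-off described above: looking up an edge is a single index computation, listing neighbours scans a whole row, and memory grows as N*N regardless of how many edges exist.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// N x N matrix stored row-major in one vector; cell (from, to) holds the
// connection data if that edge exists, and is empty otherwise.
class AdjacencyMatrix
{
public:
    explicit AdjacencyMatrix(std::size_t n) : n_(n), cells_(n * n) {}

    void connect(std::size_t from, std::size_t to, double data) { cells_[from * n_ + to] = data; }

    // O(1): direct lookup of the edge between two vertices.
    const std::optional<double> &edge(std::size_t from, std::size_t to) const
    {
        return cells_[from * n_ + to];
    }

    // O(N): listing neighbours means scanning the whole row.
    std::vector<std::size_t> neighbours(std::size_t from) const
    {
        std::vector<std::size_t> out;
        for (std::size_t to = 0; to < n_; ++to)
            if (cells_[from * n_ + to]) out.push_back(to);
        return out;
    }

private:
    std::size_t n_;
    std::vector<std::optional<double>> cells_;  // n_ * n_ cells
};
```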
Your other option is to use an adjacency list. Basically, you have a class that represents a node, and one that represents an edge, that stores all the data for that edge, and two pointers that point to the nodes it's connected to. The node class has a collection of some sort (a list will do), and keeps track of all the edges it's connected to. Then you'll need a manager class, or simply a bunch of functions that operate on your nodes. Adding/connecting nodes is trivial in this case as is listing neighbours or connected edges. However, it's harder to iterate over all the edges. This structure is more flexible than the adjacency matrix and it's better for sparse graphs.
I'm not sure that I understood your question completely, but if I did, I think you'd be better off with an adjacency matrix: it seems like you have a dense graph with lots of interconnected nodes and only need connection info.
Wikipedia has a good article on graphs as a data structure, as well as good references and links, and finding examples shouldn't be hard. Hope this helps.