Issue while designing B+Tree template class in C++

Issue while designing B+Tree template class in C++ - c++

I'm trying to write a generic C++ implementation of B+Tree. My problem comes from the fact that there are two kinds of nodes in a B+Tree; the internal nodes, which contain keys and pointers to child nodes, and leaf nodes, which contain keys and values, and pointers in an internal node can either point to other internal nodes, or leaf nodes.
I can't figure how to model such a relationship with templates (I don't want to use casts or virtual classes).
Hope there is a solution to my problem or a better way to implement B+Tree in C++.

The simplest way:
bool mIsInternalPointer;
union {
InternalNode<T>* mInternalNode;
LeafNode<T>* mLeafNode;
};
This can be somewhat simplified by using boost::variant :)

Related

Working on Parser with Binary and unary node?

I now this is bit weird title but I hope you will understand what I am asking about. Few months back I worked on Interpreter program in python and that was kind of great but now I want to implement same in C++ but doing so is offering great problems to me as C++ is type strict.
Lets start of from what I did in my python program. First I created a Lexer that would separate everything into tokens (key-value pair) and I wrote a Parser which will convert a arithmetic grammar into Operation Nodes as BinaryOpNode, UnaryOpNode, and NumberNode. ex- (-2+7)^3 will be converted into AST as a Binary Node having left node as another Binary Node, operator as POW(power) and right node as Number node of 3. Left Node of this node is Binary Node whose Left Node is Unary Node (MINUS and a Number Node 2), opeartor as PLUS and Right Node as Number Node 7.
I did this by identifying expression, term and factor. I have wrote a Lexer in C++ but having problem in Parser. Please help me to do same in C++.
What I have done so Far??
I tried something weird but kind of working. I created a class BinaryOpNode with two void* members for Right and left Node, A Enum member for operation between Rt and Lt node. Now two another boolean members for both nodes which would help to now what type of void* Lt and Rt are? Are they UnaryOpNode or BinaryOpNode(default). This will help me to typecast the Node into respective types.
However I am not satisfied with my results as they look like less optimized and also I can't keep track of NumberNode this way.
Please Help me. THANKS IN ADVANCE

What you are looking for is polymorphism. That is, code that a programmer writes, and does different things depending on the types of the things it operates on.
C++ supports a bewildering array of ways to do polymorphism.
The most supported kind is inheritance based virtual polymorphism. In this, you create a base class:
struct INode {
virtual ~INode() {}
};
and add in common operations to it, making those common operations pure-virtual:
struct INode {
virtual ~INode() {}
virtual std::vector<INode*> GetChildren() const = 0;
};
This requires that you work with pointers instead of object instances.
In this system, if you know the type of an object, you can use dynamic_cast<RealType*>(iNodePointer) to get a pointer to the object as an instance of that type. It returns nullptr if the types don't match. This lets you access the methods you have in the descended type that aren't in the base interface.
A second kind of polymorphism is std::variant based. This is a closed set of types, which parsers often have.
using AnyNode = std::variant<Node::BinaryOp, Node::UnaryOp, Node::Number>;
here you use std::visit to operate on the concrete type instead of dynamic_cast, and your parse tree is value-based instead of pointer-based.
There is some pain when you want a node to have inside itself a vector of AnyNode.
A third way is std::function type-erasure style. Here you write your own polymorphic system that takes objects of arbitrary type and wraps their operations up in a value-semantics wrapper.
A forth option is CRTP static polymorphism. This isn't suitable to build a dynamic parse tree, but it can be used to help implement some of the above.
A fifth option is aspect oriented std::function operation bundles.
A sixth option is manual function table tweaking, basically reimplementing the C++ vtable solution manually as if you are in C, but in C++. This can permit you to have features similar to other OO-languages.
A seventh option is to write up a signals-slots system and send messages to your objects.
There are almost certainly more.
The easiest solution is probably to first learn about inheritance and virtual functions in C++ (the first option above). I personally would probably write a parse tree using std::variant at this point, but if you probably don't know enough C++ at this point to practically do that.

Recursive data-structures without the use of pointers

During my bachelor degree in CS I've come across the use of recursive data-structures a lot of times. In C++ I always ended up using pointers to make my data structures recursive, just like what I would do in C.
A simplified example could be the following:
struct Tree{
int data;
struct Tree *left, *right;
};
However, using pointers tends to be a risky job and involves a lot hours debugging and testing the code. For these resouns I would like to know if there is any other efficient way of defining recursive data-structures in C++.
In other programming languages, like Rust, I've seen things like that:
struct Node {
children: Vec<Node>,
node_type: NodeType,
}
Is there a safer and confortable way of defining such recursive structures in C++. One possibility would be to use std::Vector, but I am not aware of the performance of the method.

The reason pointers are used rather than values is because you would never be able to define your struct as its size would be infinitely recursive.
struct Tree{
int data;
struct Tree left, right;
};
Neglecting padding etc, you could approximate the size of Tree as
sizeof(Tree) == sizeof(int) + sizeof(Tree) + sizeof(Tree)
// ^data ^left ^right
but you can see that since Tree has two members of Tree, and those members themselves have two Tree members, and those have two Tree members.... you can see where this is going.

The Rust example uses a vector of children - this can be empty as well.
In C++, the member variable can be an object, a pointer or a reference (omitted for simplicity).
Since a node object cannot be used directly (this would loop infinitely) and you do not wish to use a pointer, your options are:
use a vector as well (though for a binary tree this is not the most convenient type - you could however limit it in code to always two elements),
use a map (key could be an enum CHILD_LEFT, CHILD_RIGHT),
reconsider using pointers, or better yet: smart pointers (this looks like a good use case for regular unique_ptrs).

List design (Object oriented) suggestion needed

I'm trying to implement a generic class for lists for an embedded device using C++. Such a class will provide methods to update the list, sort the list, filter the list based on some user specified criteria, group the list based on some user specified criteria etc. But there are quite a few varieties of lists I want this generic class to support and each of these varieties can have different display aspects. Example: One variety of list can have strings and floating point numbers in each of its elements. Other variety could have a bitmap, string and special character in each of it's elements. etc.
I wrote down a class with the methods of interest (sort, group, etc). This class has an object of another class (say DisplayAspect) as its member. But the number of member variables and the type of each member variable of class DisplayAspect is unknown. What would be a better way to implement this?

Why not use the std::list, C++ provides that and it provides all the functionality you mentioned(It is templated class, So it supports all data types you can think of).
Also, there is no point reinventing the wheel as the code you write will almost will never be as efficient as std::list.
In case you still want to reinvent this wheel, You should write a template list class.

First, you should probably use std::list as your list, as others have stated. It seems to me that you are having problems more with what to put in the list, however, so I'm focusing on that part of the question.
Since you want to also store multiple bits of information in each element of the list, you will need to create multiple classes, one to store each combination. You don't describe why you are storing mutiple bits of information, but you'd want to use a logical name for each class. So if, for example, you were storing a name and a price (string and a double), you could give the class some name like Product.
You mention creating a class called DisplayAspect.
If this is because you want to have one piece of code print all of these lists, then you should use inheritance and polymorphism to accomplish this goal. One way to accomplish that is to make your DisplayAspect class an abstract class with the needed functions (printItem() for example) pure virtual and have each of the classes you created for the combinations of data be subclasses of this DisplayAspect class.
If, on the other hand, you created the DisplayAspect class so that you could reuse your list code, you should look into template classes. std::list is an example of a template class and it will hold any type you'd like to put into it and in that case, you could drop your DisplayAspect class.

Others (e.g., #Als) have already given the obvious, direct, answer to the question you asked. If you really want a linked list, they're undoubtedly correct: std::list is the obvious first choice.
I, however, am going to suggest that you probably don't want a linked list at all. A linked list is only rarely a useful data structure. Given what you've said you want (sorting, grouping), and especially your target (embedded system, so you probably don't have a lot of memory to waste) a linked list probably isn't a very good choice for what you're trying to do. At least right off, it sounds like something closer to an array probably makes a lot more sense.
If you end up (mistakenly) deciding that a linked list really is the right choice, there's a fair chance you only need a singly linked list though. For that, you might want to look at Boost Slist. While it's a little extra work to use (it's intrusive), this will generally have lower overhead, so it's at least not quite a poor of a choice as many generic linked lists.

Is "node" an ADT? If so, what is its interface?

Nodes are useful for implementing ADTs, but is "node" itself an ADT? How does one implement "node"? Wikipedia uses a plain old struct with no methods in its (brief) article on nodes. I googled node to try and find an exhaustive article on them, but mostly I found articles discussing more complex data types implemented with nodes.
Just what is a node? Should a node have methods for linking to other nodes, or should that be left to whatever owns the nodes? Should a node even be its own standalone class? Or is it enough to include it as an inner struct or inner class? Are they too general to even have this discussion?

A node is an incredibly generic term. Essentially, a node is a vertex in a graph - or a point in a network.
In relation to data structures, a node usually means a single basic unit of data which is (usually) connected to other units, forming a larger data structure. A simple data structure which demonstrates this is a linked list. A linked list is merely a chain of nodes, where each node is linked (via a pointer) to the following node. The end node has a null pointer.
Nodes can form more complex structures, such as a graph, where any single node may be connected to any number of other nodes, or a tree where each node has two or more child nodes. Note that any data structure consisting of one or more connected nodes is a graph. (A linked list and a tree are both also graphs.)
In terms of mapping the concept of a "node" to Object Oriented concepts like classes, in C++ it is usually customary to have a Data Structure class (sometimes known as a Container), which will internally do all the work on individual nodes. For example, you might have a class called LinkedList. The LinkedList class then would have an internally defined (nested) class representing an individual Node, such as LinkedList::Node.
In some more cruder implementations you may also see a Node itself as the only way to access the data structure. You then have a set of functions which operate on nodes. However, this is more commonly seen in C programs. For example, you might have a struct LinkedListNode, which is then passed to functions like void LinkedListInsert(struct LinkedListNode* n, Object somethingToInsert);
In my opinion, the Object Oriented approach is superior, because it better hides details of implementation from the user.

Generally you want to leave node operations to whatever ADT owns them. For example a list should have the ability to traverse its own nodes. It doesn't need to the node to have that ability.
Think of the node as a simple bit of data that the ADT holds.

In the strictest terms, any assemblage of one or more primitive types into some kind of bundle, usually with member functions to operate on the data, is an Abstract Data Type.
The grey area largely comes from which language you operate under. For example, in Python, some coders consider the list to be a primitive type, and thus not an ADT. But in C++, the STL List is definitely an ADT. Many would consider the STL string to be an ADT, but in C# it's definitely a primitive.
To answer your question more directly: Any time you are defining a data structure, be it struct or class, with or without methods, it is necessarily an ADT because you are abstracting primitive data types into some kind of construct for which you have another purpose.

An ADT isn't a real type. That's why it's called an ADT. Is 'node' an ADT? Not really, IMO. It can be a part of one, such as a linked list ADT. Is 'this node I just created to contain thingys' an ADT? Absolutely not! It's, at best, an example of an implementation of an ADT.
There's really only one case in which ADT's can be shown expressed as code, and that's as templated classes. For example, std::list from the C++ STL is an actual ADT and not just an example of an instance of one. On the other hand, std::list<thingy> is an example of an instance of an ADT.
Some might say that a list that can contain anything that obeys some interface is also an ADT. I would mildly disagree with them. It's an example of an implementation of an ADT which can contain a wide variety of objects that all have to obey a specific interface.
A similar argument could be made about the requirements of the std::list's "Concepts". For instance that type T must be copyable. I would counter that by saying that these are simply requirements of the ADT itself while the previous version actually requires a specific identity. Concepts are higher level than interfaces.
Really, an ADT is quite similar to a "pattern" except that with ADT's we're talking about algorithms, big O, etc... With patterns we're talking about abstraction, reuse, etc... In other words, patterns are a way to build something that's implementations solve a particular type of problem and can be extended/reused. An ADT is a way to build an object that can be manipulated through algorithms but isn't exactly extensible.

Nodes are a detail of implementing the higher class. Nodes don't exist or operate on their own- they only exist because of the need for separate lifetimes and memory management than the initial, say, linked list, class. As such, they don't really define themselves as their own type, but happily exist with no encapsulation from the owning class, if their existence is effectively encapsulated from the user. Nodes typically also don't display polymorphism or other OO behaviours.
Generally speaking, if the node doesn't feature in the public or protected interface of the class, then don't bother, just make them structs.

In the context of ADT a node is the data you wish to store in the data structure, plus some plumbing metadata necessary for the data structure to maintain its integrity. No, a node is not an ADT. A good design of an ADT library will avoid inheritance here because there is really no need for it.
I suggest you read the code of std::map in your compiler's standard C++ library to see how its done properly. Granted, you will probably not see an ADT tree but a Red-Black tree, but the node struct should be the same. In particular, you will likely see a lightweight struct that remains private to the data structure and consisting of little other than data.

You're mixing in three mostly orthogonal concepts in your question: C++, nodes, ADTs.
I don't think it's useful to try to sort out what can be said in general about the intersection of those concepts.
However, things can be said about e.g. singly linked list nodes in C++.
#include <iostream>
template< class Payload >
struct Node
{
Node* next;
Payload value;
Node(): next( 0 ) {}
Node( Payload const& v ): next( 0 ), value( v ) {}
void linkInFrom( Node*& aNextPointer )
{
next = aNextPointer;
aNextPointer = this;
}
static Node* unlinked( Node*& aNextPointer)
{
Node* const result = aNextPointer;
aNextPointer = result->next;
return result;
}
};
int main()
{
using namespace std;
typedef Node<int> IntNode;
IntNode* pFirstNode = 0;
(new IntNode( 1 ))->linkInFrom( pFirstNode );
(new IntNode( 2 ))->linkInFrom( pFirstNode );
(new IntNode( 3 ))->linkInFrom( pFirstNode );
for( IntNode const* p = pFirstNode; p != 0; p = p->next )
{
cout << p->value << endl;
}
while( pFirstNode != 0 )
{
delete IntNode::unlinked( pFirstNode );
}
}
I first wrote these operations in Pascal, very early eighties.
It continually surprises me how little known they are. :-)
Cheers & hth.,

Structure for hierarchal Component storage

I've been batting this problem around in my head for a few days now and haven't come to any satisfactory conclusions so I figured I would ask the SO crew for their opinion. For a game that I'm working on I'm using a Component Object Model as described here and here. It's actually going fairly well but my current storage solution is turning out to be limiting (I can only request components by their class name or an arbitrary "family" name). What I would like is the ability to request a given type and iterate through all components of that type or any type derived from it.
In considering this I've first implemented a simple RTTI scheme that stores the base class type through the derived type in that order. This means that the RTTI for, say, a sprite would be: component::renderable::sprite. This allows me to compare types easily to see if type A is derived from type B simply by comparing the all elements of B: i.e. component::renderable::sprite is derived from component::renderable but not component::timer. Simple, effective, and already implemented.
What I want now is a way to store the components in a way that represents that hierarchy. The first thing that comes to mind is a tree using the types as nodes, like so:
component
/ \
timer renderable
/ / \
shotTimer sprite particle
At each node I would store a list of all components of that type. That way requesting the "component::renderable" node will give me access to all renderable components regardless of derived type. The rub is that I want to be able to access those components with an iterator, so that I could do something like this:
for_each(renderable.begin(), renderable.end(), renderFunc);
and have that iterate over the entire tree from renderable down. I have this pretty much working using a really ugly map/vector/tree node structure and an custom forward iterator that tracks a node stack of where I've been. All the while implementing, though, I felt that there must be a better, clearer way... I just can't think of one :(
So the question is: Am I over-complicating this needlessly? Is there some obvious simplification I'm missing, or pre-existing structure I should be using? Or is this just inheritly a complex problem and I'm probably doing just fine already?
Thanks for any input you have!

You should think about how often you need to do the following:
traverse the tree
add/remove elements from the tree
how many objects do you need to keep track of
Which is more frequent will help determine the optimum solution
Perhaps instead of make a complex tree, just have a list of all types and add a pointer to the object for each type it is derived from. Something like this:
map<string,set<componenet *>> myTypeList
Then for an object that is of type component::renderable::sprite
myTypeList["component"].insert(&object);
myTypeList["renderable"].insert(&object);
myTypeList["sprite"].insert(&object);
By registering each obejct in multiple lists, it then becomes easy to do something to all object of a given type and subtypes
for_each(myTypeList["renderable"].begin(),myTypeList["renderable"].end(),renderFunc);
Note that std::set and my std::map construct may not be the optimum choice, depending on how you will use it.
Or perhaps a hybrid approach storing only the class heirarchy in the tree
map<string, set<string> > myTypeList;
map<string, set<component *> myObjectList;
myTypeList["component"].insert("component");
myTypeList["component"].insert("renderable");
myTypeList["component"].insert("sprite");
myTypeList["renderable"].insert("renderable");
myTypeList["renderable"].insert("sprite");
myTypeList["sprite"].insert("sprite");
// this isn't quite right, but you get the idea
struct doForList {
UnaryFunction f;
doForList(UnaryFunction f): func(f) {};
operator ()(string typename) {
for_each(myTypeList[typename].begin();myTypeList[typename].end(), func);
}
}
for_each(myTypeList["renderable"].begin(),myTypeList["renderable"].end(), doForList(myFunc))

The answer depends on the order you need them in. You pretty much have a choice of preorder, postorder, and inorder. Thus have obvious analogues in breadth first and depth first search, and in general you'll have trouble beating them.
Now, if you constraint the problem a litle, there are a number of old fashioned algorithms for storing trees of arbitrary data as arrays. We used them a lot in the FORTRAN days. One of them had the key trick being to store the children of A, say A2 and A3, at index(A)*2,index(A)*2+1. The problem is that if your tree is sparse you waste space, and the size of your tree is limited by the array size. But, if I remember this right, you get the elements in breadth-first order by simple DO loop.
Have a look at Knuth Volume 3, there is a TON of that stuff in there.

If you want to see code for an existing implementation, the Game Programming Gems 5 article referenced in the Cowboy Programming page comes with a somewhat stripped down version of the code we used for our component system (I did a fair chunk of the design and implementation of the system described in that article).
I'd need to go back and recheck the code, which I can't do right now, we didn't represent things in a hierarchy in the way you show. Although components lived in a class hierarchy in code, the runtime representation was a flat list. Components just declared a list of interfaces that they implemented. The user could query for interfaces or concrete types.
So, in your example, Sprite and Particle would declare that they implemented the RENDERABLE interface, and if we wanted to do something to all renderables, we'd just loop through the list of active components and check each one. Not terribly efficient on the face of it, but it was fine in practice. The main reason it wasn't an issue was that it actually turns out to not be a very common operation. Things like renderables, for example, added themselves to the render scene at creation, so the global scene manager maintained its own list of renderable objects and never needed to query the component system for them. Similarly with phyics and collision components and that sort of thing.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js