I'm trying to implement a data structure that allows me to look up a number in a database as quickly as possible. Let's say I have a database that has 5450 different numbers. My primary concern is speed, not memory efficiency. I found this article online about multi-way trees: http://c0.typesafety.net/courses/15122-f10/lectures/18-tries.pdf. So I decided to implement a 10-way tree where each node is an array of size 10, but I'm having some difficulty working out how to design the classes for the structure. Here is a rough outline that I came up with:
class MSTNode{
bool isDigit; //is it one of the digit in the number
int arrayNode[];
MST(bool isWord): isWord(isWord){
arrayNode[] = [0,1,2,3,4,5,6,7,8,9];
}
}
class MST{
MSTNode * root;
//What functions should be included in this class?
//Insert Function?
//Search Function?
}
I just need a little help to get the ball rolling. I would very much appreciate it if somebody could point out the potential problems with my design above. What should be included? What should not? Basically, I need help coming up with the design of the data structure. I'm in no way looking for free code. I just need help with the design at the beginning; I can implement the rest.
You may have something like:
class MSTNode{
public:
void Insert(unsigned int n) {
// GetOrCreate MSTNode in the first digit of n
// and recursively call insert with n without this digit
// once no more digit, set the isFinal flag.
}
bool Search(unsigned int n) const {
// Get MSTNode of the first digit of n
// if nullptr, return false.
// and recursively call Search with n without this digit
// once no more digit, return the isFinal flag.
}
private:
std::unique_ptr<MSTNode> arrayNode[10];
bool isFinal = false; // true if a stored number ends at this node
};
The first MSTNode is the root.
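The outline above can be filled in roughly as follows. This is only a minimal sketch: the class name `DigitTrie` is hypothetical, and it assumes the numbers are walked as decimal strings via `std::to_string` (most significant digit first), using a loop instead of the recursion described in the comments.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical name; a 10-way trie keyed by the decimal digits of a number.
class DigitTrie {
private:
    struct Node {
        std::array<std::unique_ptr<Node>, 10> children; // one slot per digit 0..9
        bool isFinal = false;                           // a stored number ends here
    };
    Node root_;

public:
    void insert(unsigned int n) {
        Node* node = &root_;
        for (char c : std::to_string(n)) {
            auto& child = node->children[static_cast<std::size_t>(c - '0')];
            if (!child) child = std::make_unique<Node>(); // get-or-create
            node = child.get();
        }
        node->isFinal = true; // no more digits: mark the end of a number
    }

    bool search(unsigned int n) const {
        const Node* node = &root_;
        for (char c : std::to_string(n)) {
            const auto& child = node->children[static_cast<std::size_t>(c - '0')];
            if (!child) return false; // path does not exist
            node = child.get();
        }
        return node->isFinal; // a prefix of a longer number does not count
    }
};
```

A lookup touches at most one node per digit, so the cost depends on the length of the number rather than on how many numbers are stored.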
I am making an inter-city route planning program where the graph that is formed has string-type nodes (e.g. LHR, ISB, DXB). It's undirected but weighted, and is initialized as:
map<pair<string, string>, int> city;
and then I can add edges by for example:
Graph g;
g.addEdge("DXB", "LHR", 305);
g.addEdge("HTR", "LHR", 267);
g.addEdge("HTR", "ISB", 543);
and the resultant output will be:
ISB LHR
DXB 0 305
HTR 543 267
Now, the question... I'm trying to implement Dijkstra's algorithm on this graph, but so far I have been unable to run it correctly on string-type nodes, as opposed to learning and doing it on int-type nodes. Can someone guide me through the correct steps of implementing it in the most efficient way possible?
The data structure used by a graph application has a big impact on the efficiency and ease of coding.
Many designs start off with the nodes. I guess the nodes, in the problems that are being modelled, often have a physical reality while the links can be abstract relationships. So it is more natural to start writing a node class, and add on the links later.
However, when coding algorithms that solve problems in graph theory, it becomes clear that the links are the real focus. So, let's start with a link class.
class cLink
{
public:
cLink(int c = 1)
: myCost(c), myValue(0)
{
}
int myCost; // a constraint, e.g. distance of a road, max capacity of a pipe
int myValue; // a calculated value, e.g. actual flow through a pipe
};
If we store the out edges of node in a map keyed by the destination node index, then the map will be an attribute of the source node.
class cNode
{
public:
cNode(const std::string &name = "???")
: myName(name)
{
}
std::string myName;
std::map<int, cLink> myLink; // out edges of node, keyed by destination
};
We have links and nodes, so we are ready to construct a class to store the graph
class cGraph {
public:
std::map<int, cNode> myG; // the graph, keyed by internal node index
};
Where did the node index come from? Humans are terrible at counting, so it is better for the computer to generate the index when the node is added.
void cGraph::createNode( const std::string& name )
{
int n = myG.size();
myG.insert(std::make_pair(n, cNode(name)));
}
Don't implement this! It has a snag: it can create two nodes with the same name. We need to be able to check whether a node with a specified name already exists.
int cGraph::find(const std::string &name)
{
for (const auto &n : myG) // iterate by reference to avoid copying every node
{
if (n.second.myName == name)
{
return n.first;
}
}
return -1;
}
This is inefficient. However, it only needs to be done once when the node is added. Then the algorithms that search through the graph can use fast lookup of nodes by index number.
Now we can prevent two nodes being created with the same name
int cGraph::findoradd(const std::string &name)
{
// search among the existing nodes
int n = find(name);
if (n < 0)
{
// node does not exist, create a new one
// with a new index and add it to the graph
n = myG.size();
myG.insert(std::make_pair(n, cNode(name)));
}
return n;
}
Humans, in addition to being terrible counters, are also overconfident in their counting prowess. When they specify a graph like this:
1 -> 2
1 -> 3
let’s not be taken in. Let’s regard these numbers as names and continue to use our own node index system.
/** Add costed link between two nodes
*
* If the nodes do not exist, they will be added.
*
*/
void addLink(
const std::string & srcname,
const std::string & dstname,
int cost = 1) // int, to match cLink::myCost
{
int u = findoradd(srcname);
int v = findoradd(dstname);
myG.find(u)->second.myLink.insert(
std::make_pair(v, cLink(cost)));
if (!myfDirected) // myfDirected: assumed to be a bool member of cGraph, true for a directed graph
myG.find(v)->second.myLink.insert(
std::make_pair(u, cLink(cost)));
}
With the addition of a few getters and setters, we are ready to start implementing graph algorithms!
To see a complete implementation, including Dijkstra, using these ideas, check out PathFinder.
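To make the idea concrete, here is a compact sketch of Dijkstra's algorithm running on internal node indices while the caller only ever sees names. It is not the PathFinder code: the class `RouteGraph` and its members are hypothetical, the name lookup uses a `std::map` instead of the linear `find` above, the graph is assumed undirected, and costs are assumed non-negative.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

class RouteGraph { // hypothetical name
public:
    // Return the index for a name, creating the node on first use.
    int findOrAdd(const std::string& name) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;
        const int n = static_cast<int>(adj_.size());
        index_.emplace(name, n);
        adj_.emplace_back();
        return n;
    }

    // Undirected link: store it in both directions.
    void addLink(const std::string& a, const std::string& b, int cost) {
        const int u = findOrAdd(a);
        const int v = findOrAdd(b);
        adj_[u].push_back({v, cost});
        adj_[v].push_back({u, cost});
    }

    // Cost of the cheapest path, or -1 for an unknown name or no path.
    int shortest(const std::string& from, const std::string& to) const {
        const auto s = index_.find(from);
        const auto t = index_.find(to);
        if (s == index_.end() || t == index_.end()) return -1;

        std::vector<int> dist(adj_.size(), std::numeric_limits<int>::max());
        using Item = std::pair<int, int>; // (distance so far, node index)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
        dist[s->second] = 0;
        pq.push({0, s->second});

        while (!pq.empty()) {
            const Item top = pq.top();
            pq.pop();
            const int d = top.first;
            const int u = top.second;
            if (d > dist[u]) continue;        // stale queue entry
            if (u == t->second) return d;     // target settled
            for (const auto& edge : adj_[u]) {
                const int nd = d + edge.second;
                if (nd < dist[edge.first]) {
                    dist[edge.first] = nd;
                    pq.push({nd, edge.first});
                }
            }
        }
        return -1;
    }

private:
    std::map<std::string, int> index_;                 // name -> internal index
    std::vector<std::vector<std::pair<int, int>>> adj_; // index -> (neighbour, cost)
};
```

The strings are touched only when edges are added and when a query starts; the search loop itself works purely on integers.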
The core problem is that when we work on graphs with integer vertices, the index into the adjacency list represents the node (since the indexes are also numbers). Now, instead of using an adjacency list like vector<pair<int, int> > adj[N], we can use map<string, vector<pair<string, int> > > adj. Then adj["DXB"] will contain a vector of pairs of the form <string, int>, which is the <name, weight> for each city connected to "DXB".
If this approach seems too complex, you can use some extra memory to map each city to a number, and then code everything as if the graph had integer vertices.
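As a rough illustration of the string-keyed variant, the sketch below runs Dijkstra directly on a `map<string, vector<pair<string, int>>>`. The names `Adjacency`, `addEdge` and `dijkstra` are made up for this example, and edge weights are assumed to be non-negative.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// city name -> list of (neighbour name, weight)
using Adjacency = std::map<std::string, std::vector<std::pair<std::string, int>>>;

// Undirected edge: record it under both endpoints.
void addEdge(Adjacency& adj, const std::string& a, const std::string& b, int w) {
    adj[a].push_back({b, w});
    adj[b].push_back({a, w});
}

// Returns the cheapest known distance from source to every reachable city.
std::map<std::string, int> dijkstra(const Adjacency& adj, const std::string& source) {
    std::map<std::string, int> dist;
    using Item = std::pair<int, std::string>; // (distance so far, city)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0;
    pq.push({0, source});

    while (!pq.empty()) {
        const Item top = pq.top();
        pq.pop();
        const int d = top.first;
        const std::string& u = top.second;
        if (d > dist[u]) continue; // stale queue entry
        const auto it = adj.find(u);
        if (it == adj.end()) continue; // city without outgoing edges
        for (const auto& edge : it->second) {
            const int nd = d + edge.second;
            const auto found = dist.find(edge.first);
            if (found == dist.end() || nd < found->second) {
                dist[edge.first] = nd;
                pq.push({nd, edge.first});
            }
        }
    }
    return dist;
}
```

Compared with integer indices, every comparison and lookup here handles strings, which is the extra cost the second paragraph above suggests trading for a name-to-number table.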
I was given a homework that asked me to iterate through a linked list with a given class header which I should not change:
template<typename ItemType>
class LinkedList{
public:
...
LinkedList();
LinkedList(const LinkedList<ItemType>&);
int getCurrentSize340RecursiveNoHelper() const;
private:
Node<ItemType>* headPtr{ nullptr }; // Pointer to first node
};
The node class header is:
template<typename ItemType>
class Node {
public:
...
Node();
Node(const ItemType&);
Node(const ItemType&, Node<ItemType>*);
Node<ItemType>* getNext() const;
private:
ItemType item; // A data item
Node<ItemType>* next{ nullptr }; // Pointer to next node
};
In the function getCurrentSize340RecursiveNoHelper(), we are supposed to iterate the linked list to get the size.
I know that I could iterate the linked list with the help of static or global, but my professor says that we should avoid using them. Is there any possible way to do that?
You can recurse through member variables rather than parameters.
template<typename ItemType>
int LinkedList<ItemType>::getCurrentSize340RecursiveNoHelper() const {
if (this->headPtr == nullptr){
return 0;
}
Node<ItemType>* nextNode = this->headPtr->getNext(); // getNext() returns a pointer
if (nextNode != nullptr){
LinkedList<ItemType> restOfList;
restOfList.add(nextNode); // a bit weird but this is the only way you can set the headPtr of a linked list.
return 1 + restOfList.getCurrentSize340RecursiveNoHelper();
}
return 1;
}
In the function getCurrentSize340RecursiveNoHelper(), we are supposed
to iterate the linked list to get the size.
I know that I could iterate the linked list with the help of static or
global, but my professor says that we should avoid using them.
Is there any possible way to do that?
Last question first: YES, this is possible, even easy. Software is remarkably flexible.
Recursion without function parameters (nor static vars, nor global vars).
Iteration and recursion without parameters are both easy to do, once you've seen how.
And with practice, you can easily avoid both static and global vars.
See also: https://stackoverflow.com/a/45550373/2785528, which presents a simple mechanism to "pass user input through a system without using global vars". We might summarize it into 2 steps, a) early in main instantiate a custom class instance to contain (for 'transport') any user inputs / values that you would otherwise place into global vars for the various users to fetch, and then b) pass these transport objects by reference to the ctors of objects that need them (in similar fashion to a travel case). Easy.
Kudos to the professor for insisting on user-defined types (UDT). This is the stuff you need to practice on.
However, for this post, it will be simpler to study the how-to (of iteration and recursion without parameters, etc.) by simply ignoring the UDT with which you stated no problems. In this way, we can concentrate on the function forms.
To simplify even more, my examples will be Functors ... a short and simple form of classes. I recommend Functor classes as both simple and efficient.
And one more simplification, I will use std::list. There are plenty of examples of how to make, fill, and use this container.
One key idea (to support no-parameter-functions) is to 'bundle' your data and methods into the class. Encapsulation is your friend.
Idea 1 -- Minimize main(). Keep it short. Get out and direct to business.
int main(int, char**)
{
int retVal = 0;
retVal += F820_Iterative_t()();
retVal += F820_Recursive_t()();
return retVal;
}
Here, two functors are invoked. I separated the iterative from recursive examples.
Note that these functors are invoked early in main. There are many things that get initialized before main (but see the well-known static initialization order fiasco). Invoking the functors from main simplifies and controls when these initializations happen.
The first functor will instantiate, process, then destruct, completing its full lifetime before the next functor gets started, i.e. they are serialized.
data types:
// examples use std::list
// user typedefs ---------------vvvvvvvvvvv
typedef list<string> StrList_t;
typedef list<string>::iterator StrListIt_t;
data attributes:
// Functor
class F820_Iterative_t
{
// NOTE: class instance data attributes are not global vars
StrList_t m_strList; // std::list<std::string>
StrListIt_t m_it; // std::list<std::string>::iterator
...
// Functor
class F820_Recursive_t
{
StrList_t m_strList; // std::list<std::string>
StrListIt_t m_it; // std::list<std::string>::iterator
...
Example 1 -- iterative element count
// iterate with no parameters, static vars, or global vars
uint F820_Iterative_t::iterateCount( )
{ // --^-- no params
uint lcount = 0;
do // -------------iterative loop
{
if (m_it == m_strList.end()) // detect end of list
break; // kick out
lcount += 1;
m_it++; // step to next element
} while(true);
return(lcount);
}
Example 2 -- recursive element count
// recurse with no parameters, static vars, or global vars
uint F820_Recursive_t::recurseCount( )
{ // --^-- no params
if (m_it == m_strList.end()) // RTC (recursion termination clause)
return 0;
m_it++; // step to next element
return (1 + recurseCount()); // not a tail call: 1 is added after the call returns
} // --^^^^^^^^^^^^-- recursive invocation
In both of the above element count functions, the Functor data attributes are initialized before F820_xxx::xxxCount() is invoked.
This code uses no function parameters and no global vars. The vars used in these funcs (m_it and m_strList) are data attributes of the class instance. The functions access them directly through the implied 'this' pointer.
Life interrupts.
Both xxxxCount() are above for comparison. Ask for more if you want, this code compiles and runs. I plan to find time to insert the rest.
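Since the rest of the functor is elided above, here is one way the recursive variant might be completed. Treat it as a guess at the shape, not the original code: the class name `CountFunctor`, its constructor and its `operator()` are assumptions made for this sketch.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Hypothetical functor: the list and the iterator are instance data,
// so the recursive member function needs no parameters, statics or globals.
class CountFunctor {
public:
    explicit CountFunctor(std::list<std::string> items)
        : m_strList(std::move(items)), m_it(m_strList.begin()) {}

    // Invoking the functor resets the cursor and starts the recursion.
    unsigned operator()() {
        m_it = m_strList.begin();
        return recurseCount();
    }

private:
    unsigned recurseCount() {
        if (m_it == m_strList.end()) // recursion termination clause
            return 0;
        ++m_it;                      // step to next element
        return 1 + recurseCount();   // count this element plus the rest
    }

    std::list<std::string> m_strList; // declared before m_it: initialized first
    std::list<std::string>::iterator m_it;
};
```

The iterator plays the role that a parameter would normally play: each call advances the shared cursor before recursing.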
When you're dealing with recursive problems, a trick is to just assume that the recursive function works if and only if you give it a simpler problem to work on than the one you're dealing with. You're given a pointer to the head of the linked list. After you've dealt with the base case, you now know that the list has at least one element. A simpler problem you can ask your recursive function to do is to calculate the length of the rest of the list and then just add 1 to the result.
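The "rest of the list plus one" idea looks like this in its plainest form. Note that this sketch passes the node as a parameter, so it shows the recursion itself and not the no-helper constraint from the homework; `SimpleNode` and `length` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical minimal node, standing in for Node<ItemType>.
struct SimpleNode {
    int item;
    SimpleNode* next;
};

// Length of the list starting at head.
int length(const SimpleNode* head) {
    if (head == nullptr)             // base case: empty list
        return 0;
    return 1 + length(head->next);   // this node plus the rest of the list
}
```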
Premise: suppose I have a rectangular subset of 2D space and a collection of points, all with different x-values, in this subset. In the interest of the optimization of an algorithm as yet unwritten, I want to split my box into cells according to the following process: I halve my rectangle into 2 equal parts along the x-axis. Then I repeatedly halve each sub-rectangle until every cell contains either 1 point or none at all.
In this illustration the vertical lines represent the “halving process” and the lines are ordered by darkness (darker is newer).
First I’ll define two basic classes:
class Point{
private:
double x;
double y;
public:
// [...]
// the relevant constructor and getter
// overloaded operators +, -, * for vector calculations
};
class Box{
private:
Point bottom_left_point;
double width;
double height;
public:
Box(Point my_point, double my_x, double my_y) : // constructor
bottom_left_point(my_point), width(my_x), height(my_y){}
bool contains(const Point& p); // returns true iff the box contains p in the geometric sense
Box halve(bool b) const; // takes a boolean as input and returns the left half-rectangle for false, and the right half-rectangle for true
};
Now to implement the “halving algorithm” I’ll need a binary tree-like structure. Each node will represent a sub-cell of the rectangle (with the root node representing the total rectangle). A node may have two children, in which case the children represent its left and right halves. A node may also have a pointer to a particle which exists in the cell. The ultimate idea will be to start with an empty tree and insert the points in, one by one using a method insert(Point* to_be_inserted).
So I’ll define the following recursive class, whose private attributes are rather self-explanatory:
class Node;
class Node{
private:
enum node_type{ INT, EXT, EMPTY };
node_type type;
// type == INT means that it is an interior node, i.e. has children
// type == EXT means that it is an exterior node, i.e. has no children but contains a point
// type == EMPTY means that it has no children and no point
std::array<Node*,2> children;
Box domain; // the geometric region which is being represented
Point* tenant; // address of the particle that exists in this cell (if one such exists)
public:
Node(Box my_domain) :
type(EMPTY), children({nullptr}), domain(my_domain){}
//
// to be continued...
The first order of business is to define a subdivide() method which endows my node with two children:
void Node::subdivide(void){
type = INT;
children[0] = new Node(domain.halve(false));
children[1] = new Node(domain.halve(true));
}
Now everything is in place to write the crux of this whole affair, the insert method. Since it will be written recursively, the easiest solution is to have a boolean return type which tells us if the insertion was a success or failure. With this in mind, here’s the code:
bool Node::insert(Point* to_be_inserted){
if(not domain.contains(*to_be_inserted)) return false;
switch(type){
case INT:{
for(Node* child : children) if(child->insert(to_be_inserted)) return true;
return false;
}
case EXT:{
subdivide();
for(Node* child : children) if(child->insert(to_be_inserted)) break;
tenant = nullptr;
for(Node* child : children) if(child->insert(to_be_inserted)) return true;
break;
}
case EMPTY:{
type = EXT;
tenant = to_be_inserted;
return true;
}
}
throw 1; // this line should not, in theory, ever be reached
}
(Note that, for the sake of abstraction and generality, I have used for loops on the array children when I could have simply written out the two cases.)
Explanation:
First we check if to_be_inserted is in the geometric region represented by this. If not, return false.
If this is an internal node, we pass the point on to the each child until it is successfully inserted.
If this is an external node, that means that we have to split the node in two in order to be able to properly isolate to_be_inserted from the point that currently lives in the node.
First we call subdivide().
Then we attempt to insert the current tenant into one of the children (please excuse how obscene this sounds, I assure you that it’s unintentional).
Once that is done, we do the same with to_be_inserted and return the result. (Note that a priori the insertion would be a success at this point because of the preliminary call to Box::contains.)
Finally, if this is an EMPTY node, we simply have to assign to_be_inserted to tenant and change type to EXT and we’re done.
Ok, so let's try it out with a simple main:
int main(void){
Box my_box(ORIGIN, 1.0, 1.0); // rectangle of vertices (0,0),(0,1),(1,0),(1,1)
Node tree(my_box); // initializes an empty tree representing the region of my_box
Point p(0.1, 0.1);
Point q(0.6, 0.7);
tree.insert(&p);
tree.insert(&q);
return 0;
}
This compiles, but upon running the exception at the bottom of insert is thrown after a few calls. How is this possible, given that at no point a Node is constructed without a type value?
Edit: I have noticed, as well as this one, several possible errors which may also occur with small changes in the code:
An inexplicable call to nullptr->insert(something)
A call to insert by the address 0x0000000000000018 which doesn't point to an initialized Node.
The entirety of the code, including a makefile with the relevant debugging flags, can be found at https://github.com/raphael-vock/phantom-call.
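For comparison, here is a small sketch of the insertion steps exactly as the explanation describes them, where the old tenant is reinserted first and the new point second. In the posted EXT branch both loops pass to_be_inserted, which differs from the explanation and may be related to the exception. The sketch makes simplifying assumptions: cells are half-open x-intervals [lo, hi) rather than a Box, children are owned by `std::unique_ptr`, and the names `Cell`, `insertIntoChild` and `count` are invented here.

```cpp
#include <cassert>
#include <memory>

struct Point { double x; double y; }; // simplified stand-in for the Point class

class Cell {
public:
    Cell(double lo, double hi) : lo_(lo), hi_(hi) {}

    bool insert(const Point* p) {
        if (p->x < lo_ || p->x >= hi_) return false;            // outside this cell
        if (left_) return insertIntoChild(p);                   // interior node
        if (tenant_ == nullptr) { tenant_ = p; return true; }   // empty leaf
        if (tenant_->x == p->x) return false; // equal x can never be separated

        // Occupied leaf: split, then move the old tenant down before the new point.
        const Point* old = tenant_;
        tenant_ = nullptr;
        subdivide();
        insertIntoChild(old);
        return insertIntoChild(p);
    }

    // Number of points stored in this subtree.
    int count() const {
        if (left_) return left_->count() + right_->count();
        return tenant_ != nullptr ? 1 : 0;
    }

private:
    void subdivide() {
        const double mid = lo_ + (hi_ - lo_) / 2.0;
        left_ = std::make_unique<Cell>(lo_, mid);
        right_ = std::make_unique<Cell>(mid, hi_);
    }

    // Half-open intervals mean exactly one child accepts an in-range point.
    bool insertIntoChild(const Point* p) {
        return left_->insert(p) || right_->insert(p);
    }

    double lo_, hi_;
    std::unique_ptr<Cell> left_, right_;
    const Point* tenant_ = nullptr; // always initialized, unlike a bare pointer member
};
```

Using the presence of children to tell interior nodes from leaves removes the separate type field, so the node state cannot get out of sync with its contents.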
I'm an absolute beginner in OOP (and C++). Trying to teach myself using resources my university offers for students of higher years, and a bunch of internet stuff I can find to clear things up.
I know basic things about OOP - I get the whole point of abstracting stuff into classes and using them to create objects, I know how inheritance works (at least, probably the basics), I know how to create operator functions (although as far as I can see that only helps in code readability in a sense that it becomes more standard, more language like), templates, and stuff like that.
So I've tried my first "project": to code Minesweeper (in command line, I never created a GUI before). Took me a few hours to create the program, and it works as desired, but I feel like I'm missing a huge point of OOP in there.
I've got a class "Field" with two attributes, a Boolean mine and a character forShow. I've defined the default constructor for it to initialize an instance as an empty field (mine is false, and forShow is '.', indicating a not yet opened field). I've got some simple inline functions such as isMine, addMine, removeMine, setForShow, getForShow, etc.
Then I've got the class Minesweeper. Its attributes are numberOfColumns, numberOfRows, numberOfMines, a pointer ptrGrid of type Field*, and numberOfOpenedFields. I've got some obvious methods such as generateGrid, printGrid, printMines (for testing purposes).
The main thing about it is a function openField, which writes the number of mines surrounding the opened field, and another function clickField, which recursively calls itself for surrounding fields if the field which is currently being opened has 0 neighbor mines. However, those two functions take an argument -- the index of the field in question. That kinda misses the point of OOP, if I understand it correctly.
For example, to call the function for the field right to the current one, I have to call it with argument i+1. The moment I noticed this, I wanted to make a function in my Field class which would return a pointer to the number right to it... but for the class Field itself, there is no matrix, so I can't do it!
Is it even possible to do it, is it too hard for my current knowledge? Or is there another more OOP-ish way to implement it?
TLDR version:
It's a noob's implementation of the Minesweeper game in C++. I've got a class Minesweeper and a class Field. Minesweeper has a pointer to a matrix of Fields, but the navigation through fields (going one up, down, wherever) doesn't seem very OOP-ish.
I want to do something like the following:
game->(ptrMatrix + i)->field.down().open(); // this
game->(ptrMatrix + i + game.numberOfColumns).open(); // instead of this
game->(ptrMatrix + i)->field.up().right().open(); // this
game->(ptrMatrix + i + 1 - game.numberOfColumns).open(); // instead of this
There are a couple of ways that you could do this in an OOP-ish manner. @Peter Schneider has provided one such way: have each cell know about its neighbours.
The real root of the problem is that you're using a dictionary (mapping exact coordinates to objects), when you want both dictionary-style lookups as well as neighbouring lookups. I personally wouldn't use "plain" OOP in this situation, I'd use templates.
/* Wrapper class. Instead of passing around (x,y) pairs everywhere as two
separate arguments, make this into a single index. */
class Position {
private:
int m_x, m_y;
public:
Position(int x, int y) : m_x(x), m_y(y) {}
// Getters and setters -- what could possibly be more OOPy?
int x() const { return m_x; }
int y() const { return m_y; }
};
// Stubbed, but these are the objects that we're querying for.
class Field {
public:
// don't have to use an operator here, in fact you probably shouldn't . . .
// ... I just did it because I felt like it. No justification here, move along.
operator Position() const {
// ... however you want to get the position
// Probably want the Fields to "know" their own location.
return Position(-1,-1);
}
};
// This is another kind of query. For obvious reasons, we want to be able to query for
// fields by Position (the user clicked on some grid), but we also would like to look
// things up by relative position (is the cell to the lower left revealed/a mine?)
// This represents a Position with respect to a new origin (a Field).
class RelativePosition {
private:
Field *m_to;
int m_xd, m_yd;
public:
RelativePosition(Field *to, int xd, int yd) : m_to(to), m_xd(xd),
m_yd(yd) {}
Field *to() const { return m_to; }
int xd() const { return m_xd; }
int yd() const { return m_yd; }
};
// The ultimate storage/owner of all Fields, that will be manipulated externally by
// querying its contents.
class Minefield {
private:
Field **m_field;
public:
Minefield(int w, int h) {
m_field = new Field*[w];
for(int x = 0; x < w; x ++) {
m_field[x] = new Field[h]; // index by the loop variable x, not w (out of bounds)
}
}
~Minefield() {
// cleanup
}
Field *get(int x, int y) const {
// TODO: check bounds etc.
// NOTE: equivalent to &m_field[x][y], but cleaner IMO.
return m_field[x] + y;
}
};
// The Query class! This is where the interesting stuff happens.
class Query {
public:
// Generic function that will be instantiated in a bit.
template<typename Param>
static Field *lookup(const Minefield &field, const Param &param);
};
// This one's straightforwards . . .
template<>
Field *Query::lookup<Position>(const Minefield &field, const Position &pos) {
return field.get(pos.x(), pos.y());
}
// This one, on the other hand, needs some precomputation.
template<>
Field *Query::lookup<RelativePosition>(const Minefield &field,
const RelativePosition &pos) {
Position base = *pos.to();
return field.get(
base.x() + pos.xd(),
base.y() + pos.yd());
}
int main() {
Minefield field(5,5);
Field *f1 = Query::lookup(field, Position(1,1));
Field *f0 = Query::lookup(field, RelativePosition(f1, -1, -1));
return 0;
}
There are a couple of reasons why you might want to do it this way, even if it is complicated.
Decoupling the whole "get by position" idea from the "get neighbour" idea. As mentioned, these are fundamentally different, so expose a different interface.
Doing it in this manner gives you the opportunity to expand later with more Query types in a straightforwards fashion.
You get the advantage of being able to "store" a Query for later use. Perhaps to be executed in a different thread if it's a really expensive query, or in an event loop to be processed after other events, or . . . lots of reasons why you might want to do this.
You end up with something like this: (C++11 ahead, be warned!)
std::function<Field *()> f = std::bind(&Query::lookup<RelativePosition>,
std::cref(field), RelativePosition(f1, -1, -1)); // std::cref: bind would otherwise copy the Minefield
. . . wait, what?
Well, what we essentially want to do here is "delay" an execution of Query::lookup(field, RelativePosition(f1, -1, -1)) for later. Or, rather, we want to "set up" such a call, but not actually execute it.
Let's start with f. What is f? Well, by staring at the type signature, it appears to be a function of some sort, with signature Field *(). How can a variable be a function? Well, it's actually more like a function pointer. (There are good reasons why not to call it a function pointer, but that's getting ahead of ourselves here.)
In fact, f can be assigned to anything that, when called, produces a Field * -- not just a function. If you overload the operator () on a class, that's a perfectly valid thing for it to accept as well.
Why do we want to produce a Field * with no arguments? Well, that's an execution of the query, isn't it? But the function Query::lookup<RelativePosition> takes two arguments, right?
That's where std::bind comes in. std::bind essentially takes an n-argument function and turns it into an m-argument function, with m <= n. So the std::bind call takes in a two-place function (in this case), and then fixes its first two arguments, leaving us with . . .
. . . a zero-argument function, that returns a Field *.
And so we can pass around this "function pointer" to a different thread to be executed there, store it for later use, or even just repeatedly call it for kicks, and if the Position of Fields was to magically change for some reason (not applicable in this situation), the result of calling f() will dynamically update.
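The same mechanism can be tried out in isolation. The following sketch is not tied to the Minefield classes: `cellAt` and `deferLookup` are hypothetical names, and a plain `std::vector<int>` stands in for the grid. It binds both arguments of a two-argument function and stores the result as a zero-argument `std::function`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A two-argument lookup, standing in for Query::lookup.
int cellAt(const std::vector<int>& grid, std::size_t index) {
    return grid.at(index);
}

// Fix both arguments now; run the lookup later.
// std::cref stores a reference to grid instead of a copy, so the caller
// must keep grid alive for as long as the returned function is used.
std::function<int()> deferLookup(const std::vector<int>& grid, std::size_t index) {
    return std::bind(cellAt, std::cref(grid), index);
}
```

Because the grid is captured by reference, a later change to the underlying data is visible the next time the stored function is called, which is the "dynamically update" behaviour described above.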
So now that I've turned a 2D array lookup into a mess of templates . . . we have to ask a question: is it worth it? I know this is a learning exercise and all, but my response: sometimes, an array is really just an array.
You can link the four neighbours to the cell via pointers or references. That would likely happen after the playing field has been created. Whether that's good or bad design I'm not sure (I see the same charm that you see, though). For large fields it would increase the memory footprint substantially, because a cell probably doesn't hold much data besides these pointers:
class Cell
{
// "real" data
Cell *left, *right, *upper, *lower;
// and diagonals? Perhaps name them N, NE, E, SE, S...
};
void init()
{
// allocate etc...
// pseudo code
foreach r: row
{
foreach c: column
{
// bounds check ok
cells[r][c].upper = &cells[r-1][c];
cells[r][c].left = &cells[r][c-1];
// etc.
}
}
// other stuff
}
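A runnable version of the linking step might look like the sketch below. The names `LinkedCell` and `Board` are hypothetical, the cells are assumed to live in one contiguous `std::vector`, and neighbours beyond the edge are left as `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LinkedCell {
    bool opened = false; // "real" data would go here
    LinkedCell* up = nullptr;
    LinkedCell* down = nullptr;
    LinkedCell* left = nullptr;
    LinkedCell* right = nullptr;
};

class Board {
public:
    Board(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols) {
        // The vector is never resized afterwards, so these addresses stay valid.
        for (std::size_t r = 0; r < rows_; ++r) {
            for (std::size_t c = 0; c < cols_; ++c) {
                LinkedCell& cell = at(r, c);
                if (r > 0)         cell.up    = &at(r - 1, c);
                if (r + 1 < rows_) cell.down  = &at(r + 1, c);
                if (c > 0)         cell.left  = &at(r, c - 1);
                if (c + 1 < cols_) cell.right = &at(r, c + 1);
            }
        }
    }

    // Copying would leave the copy's pointers aimed at the original's cells.
    Board(const Board&) = delete;
    Board& operator=(const Board&) = delete;

    LinkedCell& at(std::size_t r, std::size_t c) { return cells_[r * cols_ + c]; }

private:
    std::size_t rows_, cols_;
    std::vector<LinkedCell> cells_;
};
```

With this in place, navigation reads like `board.at(1, 1).up->right`, at the price of four pointers per cell and a null check at the borders.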
I currently have a HashSet of NElement objects. Each NElement object has a unique Element field, and an integer n.
Here are 2 operations I need to do with the data:
Iterate over all the values in collection.
With Element e, search the collection for an instance of NElement that has e and process it.
Here's an example of #2:
public void Add(NElement ne) {
foreach(NElement ne2 in elements) { //elements is the HashSet
if(ne2.element == ne.element) {
ne2.Number += ne.Number; //Number is the integer
return;
}
}
elements.Add(ne);
}
I think there is a better way to accomplish this using a collection other than a List or Set. Any suggestions?
A possible solution would be a slightly different design. A molecular formula consists of a set of elements along with a count for each of them. So one option is a MolecularFormula class that wraps this information and is backed by a Map<Element, Integer>.
A possible example:
public class MolecularFormula
{
private Map<Element, Integer> elements = new HashMap<Element, Integer>(); // Java generics need Integer, not int
//... Constructors etc
//A list to iterate through all values
public List<NElement> getElements()
{
List<NElement> retList = new ArrayList<NElement>();
for (Element e : elements.keySet())
{
retList.add(new NElement(e, elements.get(e)));
}
return retList;
}
//To add something
public void add(Element e, int num)
{
if(elements.containsKey(e))
{
int newNum = elements.get(e) + num;
elements.remove(e);
elements.put(e, newNum);
}
else
{
elements.put(e, num);
}
}
}
This is hastily thrown together and not very efficient at all, but it should give you an idea of a possible option.
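The same map-backed idea, written as a C++ sketch for consistency with the other examples on this page. The class name `Formula` is hypothetical and elements are assumed to be identified by their symbol as a `std::string`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical class: element symbol -> number of atoms.
class Formula {
public:
    // operator[] value-initializes a missing count to 0, so one line
    // covers both "new element" and "existing element".
    void add(const std::string& element, int n) { counts_[element] += n; }

    // Count for an element, or 0 if it is not present.
    int count(const std::string& element) const {
        const auto it = counts_.find(element);
        return it == counts_.end() ? 0 : it->second;
    }

    // Read-only access for iterating over all (element, count) pairs.
    const std::unordered_map<std::string, int>& elements() const { return counts_; }

private:
    std::unordered_map<std::string, int> counts_;
};
```

Both operations from the question are covered: iterating over everything via `elements()`, and finding one element's entry in expected constant time instead of scanning the whole set.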
Try using SMARTS, SMILES, InChi or ASL. The first two are open source, I believe. InChi is maintained by the IUPAC, and is nicely hashable for database use. ASL is proprietary to Schrödinger, Inc, though if you are already using Schrödinger software, I'd recommend using their Python API directly.
Using any of these tools, you could find functional groups (or atoms) described by a specific SMARTS/SMILES/ASL string within a molecule described by SMARTS/SMILES/ASL.