Developing dynamic branching factor trees in C++

struct avail
{
int value;
avail **child;
};
avail *n = new avail;
n->child = new avail*[25];
for (int i = 0; i < 25; i++)
n->child[i] = new avail;
This is my solution for generating dynamic trees, but I need to specify the number of children (25) at the start. For the rest of the code I want this to be done dynamically, something along the lines of
push(avail(n->child[newindex]))
Or
n->child[29]=new avail;
I want to add nodes on an as-needed basis and build a proper hierarchy. I would have used stacks for this, but I want a parent-child relation between the nodes. I want to avoid vectors so as not to complicate the code.
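One way to get this without std::vector is to let each node track how many child slots it has used and grow its pointer array on demand. The following is only a sketch under assumptions: the `count` and `capacity` fields and the `make_node`, `add_child`, and `destroy` helpers are hypothetical additions, not part of the original struct.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of `avail` that remembers how many child slots are in use.
struct avail {
    int value;
    avail **child;        // dynamically sized array of child pointers
    std::size_t count;    // children currently stored
    std::size_t capacity; // slots allocated in `child`
};

// Allocates a node that has no children yet.
avail *make_node(int value) {
    avail *n = new avail;
    n->value = value;
    n->child = nullptr;
    n->count = 0;
    n->capacity = 0;
    return n;
}

// Appends a new child to `parent`, doubling the pointer array when it is full.
avail *add_child(avail *parent, int value) {
    assert(parent != nullptr);
    if (parent->count == parent->capacity) {
        std::size_t new_cap = parent->capacity == 0 ? 4 : parent->capacity * 2;
        avail **bigger = new avail*[new_cap];
        for (std::size_t i = 0; i < parent->count; ++i)
            bigger[i] = parent->child[i];
        delete[] parent->child;
        parent->child = bigger;
        parent->capacity = new_cap;
    }
    avail *c = make_node(value);
    parent->child[parent->count++] = c;
    return c;
}

// Frees a node together with its whole subtree.
void destroy(avail *n) {
    if (n == nullptr) return;
    for (std::size_t i = 0; i < n->count; ++i)
        destroy(n->child[i]);
    delete[] n->child;
    delete n;
}
```

Because `add_child` returns the new node, a hierarchy can be built by calling it again on the returned pointer. The doubling step is the same growth idea std::vector uses internally, written out by hand.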

Related

B-Tree Node Splitting Techniques

I've stumbled upon a problem whilst doing my DSA (Data Structures and Algorithms) homework. I've been asked to implement a B-Tree with insertion and search algorithms. So far the search is working correctly, but I'm having trouble implementing the insertion function, specifically the logic behind the B-Tree node-splitting algorithm. The C-style pseudocode I came up with is the following:
#define D 2
#define DD (2*D)
typedef struct node btreenode;
typedef btreenode* btree;
struct node
{
int keys[DD]; // DD == 2*D, with D == 2
btree pointers[DD+1];
int index; // used to iterate through the "keys" array
};
void splitNode(btree* parent, btree* child1, btree* child2)
{
//Copies the content from the split node to the children
(*child1)->keys[0] = (*parent)->keys[0];
(*child1)->keys[1] = (*parent)->keys[1];
(*child2)->keys[0] = (*parent)->keys[2];
(*child2)->keys[1] = (*parent)->keys[3];
(*child1)->index = 1;
(*child2)->index = 1;
//"Clears" the parent node of any data
for(int i = 0; i<DD; i++) (*parent)->keys[i] = -1;
for(int i = 0; i<DD+1; i++) (*parent)->pointers[i] = NULL;
//Fixes the pointers to the children
(*parent)->index = 0;
//the line below was taken out because it created a new node that didn't have to be there.
//(*parent)->keys[(*parent)->index] = newNode(); // The newNode() function allocates and inserts the new key that I need to insert.
(*parent)->pointers[(*parent)->index] = (*child1);
(*parent)->pointers[(*parent)->index+1] = (*child2);
}
I'm almost sure that I'm messing up something with the pointers, but I'm not sure what. Any help is appreciated. Maybe I need a little bit more study on the B-Tree subject? I must add that while I can use basic input/output from C++, I need to use C-style structs.
You don't need to create a new node here. You've apparently already created the two new child nodes. All you have to do here after populating the children is make the parent now point to the two children, via a copy of the first key in each of them, and adjust its key count to two. You don't need to set the parent keys to -1 either.
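The advice above could look roughly like the sketch below. It rests on several assumptions that the question does not confirm: `index` is treated as the number of keys in use, the node being split is a full leaf, and both children were already allocated by the caller. Plain pointers are used instead of the `btree*` double indirection to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

#define D 2
#define DD (2 * D)

struct btreenode {
    int keys[DD];
    btreenode *pointers[DD + 1];
    int index; // assumed here to hold the number of keys in use
};

// Splits a full leaf `parent` into two caller-allocated children, then makes
// the parent point at them through a copy of each child's first key.
void splitNode(btreenode *parent, btreenode *child1, btreenode *child2) {
    assert(parent && child1 && child2);
    assert(parent->index == DD); // only a full node is split

    // Lower half of the keys goes to child1, upper half to child2.
    for (int i = 0; i < D; ++i) {
        child1->keys[i] = parent->keys[i];
        child2->keys[i] = parent->keys[D + i];
    }
    child1->index = D;
    child2->index = D;

    // Leaf assumption: none of the three nodes keeps old child pointers.
    for (int i = 0; i <= DD; ++i) {
        child1->pointers[i] = NULL;
        child2->pointers[i] = NULL;
        parent->pointers[i] = NULL;
    }

    // The parent now holds two keys, each a copy of a child's first key.
    parent->keys[0] = child1->keys[0];
    parent->keys[1] = child2->keys[0];
    parent->pointers[0] = child1;
    parent->pointers[1] = child2;
    parent->index = 2;
}
```

Note that the parent's old keys are simply overwritten or left unused beyond `index`, so there is no need to fill them with -1.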

How to generate a hashmap for huge chunk of data?

I want to make a map such that a set of pointers point to arrays of dynamic size.
I used hashing with chaining, but since the data set is huge, the program throws std::bad_alloc after a few iterations. The cause may be the calls to new that build the linked lists.
Could someone suggest which data structure I should use?
Or anything else that can improve memory usage with my hash table?
Program is in C++.
This is what my code looks like:
Initialization of hashtable:
class Link
{
public:
double iData;
Link* pNext;
Link(double it) : iData(it), pNext(NULL)
{ }
void displayLink()
{ cout << iData << " "; }
};
class List
{
private:
Link* pFirst;
public:
List()
{ pFirst = NULL; }
void insert(double key)
{
if(pFirst==NULL)
pFirst = new Link(key);
else
{
Link* pLink = new Link(key);
pLink->pNext = pFirst;
pFirst = pLink;
}
}
};
class HashTable
{
public:
int arraySize;
vector<List*> hashArray;
HashTable(int size)
{
hashArray.resize(size);
for(int j=0; j<size; j++)
hashArray[j] = new List;
}
};
main snippet:
int t_sample = 1000;
for(int i=0; i < k; i++) // initialize random position
{
x[i] = (cal_rand() * dom_sizex); //dom_sizex = 20e-10 cal_rand() generates rand no between 0 and 1
y[i] = (cal_rand() * dom_sizey); //dom_sizey = 10e-10
}
for(int t=0; t < t_sample; t++)
{
int size;
size = cell_nox * cell_noy; //size of hash table cell_nox = 212, cell_noy = 424
HashTable theHashTable(size); //make table
int hashValue = 0;
for(int n=0; n<k; n++) // k = 10*212*424
{
int m = x[n] /cell_width; //cell_width = 4.7e-8
int l = y[n] / cell_width;
hashValue = (kx*l)+m;
theHashTable.hashArray[hashValue]->insert(n);
}
-------
-------
}
First things first, use a Standard Container. In your specific case, you might want:
either std::unordered_multimap<int, double>
or std::unordered_map<int, std::vector<double>>
(Note: if you do not have C++11, those are available in Boost)
Your main loop becomes (using the second option):
typedef std::unordered_map<int, std::vector<double>> HashTable;
for(int t = 0; t < t_sample; ++t)
{
size_t const size = cell_nox * cell_noy;
// size of hash table cell_nox = 212, cell_noy = 424
HashTable theHashTable;
theHashTable.reserve(size);
for (int n = 0; n < k; ++n) // k = 10*212*424
{
int m = x[n] / cell_width; //cell_width = 4.7e-8
int l = y[n] / cell_width;
int const cellId = (kx*l)+m;
theHashTable[cellId].push_back(n);
}
}
This will not leak memory (reliably), although of course you might have other leaks, and thus will give you a reliable baseline. It is also probably faster than your approach, with a more convenient interface, etc...
In general you should not re-invent the wheel, unless you have a specific need that is not addressed by the available wheels or you are actually trying to learn how to create a wheel or to create a better wheel.
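For reference, here is a self-contained sketch of the second option that compiles on its own. The function name `bin_particles` and its parameters are illustrative; `cells_per_row` stands in for `kx` from the question, and particle indices are stored as int rather than double.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

typedef std::unordered_map<int, std::vector<int> > CellMap;

// Groups particle indices by the grid cell their (x, y) position falls into.
CellMap bin_particles(const std::vector<double> &x,
                      const std::vector<double> &y,
                      double cell_width, int cells_per_row) {
    assert(x.size() == y.size());
    assert(cell_width > 0.0);
    CellMap cells;
    for (std::size_t n = 0; n < x.size(); ++n) {
        int m = static_cast<int>(x[n] / cell_width); // column of the cell
        int l = static_cast<int>(y[n] / cell_width); // row of the cell
        cells[cells_per_row * l + m].push_back(static_cast<int>(n));
    }
    return cells; // every bucket is released when the map is destroyed
}
```

Since the map and its vectors own their storage, a table created inside the time-step loop is freed at the end of each iteration, which is the property the hand-written `List` version lacks.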
The OS has to solve the same issues with memory pages, so maybe it's worth looking at how that is done. First of all, let's assume all pages are on disk. A page is a fixed-size memory chunk. For your use case, let's say it's an array of your records. Because RAM is limited, the OS maintains a mapping between the page number and its location in RAM.
So, let's say your pages have 1000 records, and you want to access record 2024, you would ask the OS for page 2, and read record 24 from that page. That way, your map is only 1/1000 in size.
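The arithmetic in that example is just an integer division and a remainder. A minimal sketch, with `PageAddress`, `locate`, and `kRecordsPerPage` as made-up names:

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kRecordsPerPage = 1000; // page size taken from the example above

struct PageAddress {
    std::size_t page;   // which page holds the record
    std::size_t offset; // position of the record inside that page
};

// Translates a record number into a page number and an offset within the page.
PageAddress locate(std::size_t record) {
    PageAddress a;
    a.page = record / kRecordsPerPage;
    a.offset = record % kRecordsPerPage;
    assert(a.page * kRecordsPerPage + a.offset == record);
    return a;
}
```

Calling `locate(2024)` yields page 2 and offset 24, matching the numbers in the text.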
Now, if your page has no mapping to a memory location, then it is either on disk or has never been accessed before (is empty). Then you need to swap out another page, and load that page from disk (and update the location mapping).
This is a very simplified description of what happens, and I wouldn't be surprised if someone jumps down my throat for describing it like this.
The point is: what does this mean for you?
First of all, your data exceeds your RAM, so you won't get around writing to disk unless you try compression first.
Second, your chains can work as pages if you want, but I wonder whether just paging your hash code would work better. What I mean is: use the upper bits as the page number and the lower bits as the offset in the page. Avoiding collisions is still key, as you want to load as few pages as possible. You can still chain your pages, and end up with a much smaller map.
Third, a crucial part is deciding which pages to swap out to make room for new pages. LRU should do OK. If you can better predict which pages you will (not) need, so much the better for you.
Fourth, you need placeholders for your pages to tell you whether they are in memory or on disk.
Hope this helps.
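The LRU bookkeeping mentioned above can be sketched as follows. This covers only the decision of which page to swap out; reading and writing pages on disk is left out, and `LruPageTable` with its members is an illustrative name, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Tracks which pages are resident in memory and picks the least recently
// used page as the victim when room is needed.
class LruPageTable {
public:
    explicit LruPageTable(std::size_t max_resident) : max_resident_(max_resident) {
        assert(max_resident > 0);
    }

    bool is_resident(std::size_t page) const { return where_.count(page) != 0; }

    // Marks `page` as just used. Returns true and sets `evicted` when another
    // page had to be swapped out to make room for it.
    bool touch(std::size_t page, std::size_t &evicted) {
        auto it = where_.find(page);
        if (it != where_.end()) {
            // Already resident: move it to the front of the usage order.
            order_.splice(order_.begin(), order_, it->second);
            return false;
        }
        bool swapped = false;
        if (order_.size() == max_resident_) {
            evicted = order_.back(); // least recently used page
            where_.erase(evicted);
            order_.pop_back();
            swapped = true;
        }
        order_.push_front(page);
        where_[page] = order_.begin();
        return swapped;
    }

private:
    std::size_t max_resident_;
    std::list<std::size_t> order_; // most recently used page first
    std::unordered_map<std::size_t, std::list<std::size_t>::iterator> where_;
};
```

The `where_` map doubles as the placeholder table: a page that is missing from it is either on disk or has never been touched.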

Randomly shuffling a linked list

I'm currently working on a project and the last piece of functionality I have to write is to shuffle a linked list using the rand function.
I'm very confused on how it works.
Could someone clarify on how exactly I could implement this?
I've looked at my past code examples and what I did to shuffle an array but the arrays and linked lists are pretty different.
Edit:
For further clarifications my Professor is making us shuffle using a linked list because he is 'awesome' like that.
You can always add another level of indirection... ;)
(see Fundamental theorem of software engineering in Wikipedia)
Just create an array of pointers, sized to the list's length, unlink items from the list and put their pointers to the array, then shuffle the array and re-construct the list.
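A minimal sketch of that idea, assuming a singly linked `node` type with `val` and `next` members (the type and the function name `shuffle_list` are made up for illustration). It uses rand() because the assignment requires it, and a std::vector as the temporary array of pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct node {
    int val;
    node *next;
};

// Shuffles a singly linked list by copying the node pointers into an array,
// running a Fisher-Yates shuffle on the array, and relinking the nodes.
// Returns the new head of the list.
node *shuffle_list(node *head) {
    std::vector<node *> items;
    for (node *p = head; p != nullptr; p = p->next)
        items.push_back(p);
    if (items.empty()) return nullptr;

    // Fisher-Yates: swap each position with a random earlier-or-equal one.
    for (std::size_t i = items.size() - 1; i > 0; --i) {
        std::size_t j = static_cast<std::size_t>(std::rand()) % (i + 1); // slight modulo bias
        node *tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }

    // Rebuild the links in the shuffled order.
    for (std::size_t i = 0; i + 1 < items.size(); ++i)
        items[i]->next = items[i + 1];
    items.back()->next = nullptr;
    return items.front();
}
```

Seed the generator once with `std::srand` before calling it, otherwise every run produces the same order.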
EDIT
If you must use lists you might use an approach similar to merge-sort:
split the list into halves,
shuffle both sublists recursively,
merge them, picking randomly next item from one or the other sublist.
I don't know if it gives a reasonable random distribution :D
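Here is a sketch of the split, shuffle, and merge steps for a singly linked list. One detail differs from the description above and is my own assumption: during the merge, each side is chosen with probability proportional to the number of nodes it still has left rather than 50/50. With that rule every interleaving of the two halves is equally likely, so the result is a uniform shuffle apart from the small bias of `rand() % n`. The `node` type and the name `merge_shuffle` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct node {
    int val;
    node *next;
};

// Shuffles a list of exactly `n` nodes starting at `head`; returns the new head.
node *merge_shuffle(node *head, std::size_t n) {
    if (n < 2) return head;

    // Split the list after the first n/2 nodes.
    std::size_t left_n = n / 2;
    std::size_t right_n = n - left_n;
    node *cut = head;
    for (std::size_t i = 1; i < left_n; ++i) cut = cut->next;
    node *right = cut->next;
    cut->next = nullptr;

    // Shuffle both halves recursively.
    node *left = merge_shuffle(head, left_n);
    right = merge_shuffle(right, right_n);

    // Merge: pick the next node from a side with probability
    // proportional to how many nodes that side still holds.
    node *result = nullptr;
    node **tail = &result;
    while (left_n + right_n > 0) {
        std::size_t r = static_cast<std::size_t>(std::rand()) % (left_n + right_n);
        if (r < left_n) {
            *tail = left;
            left = left->next;
            --left_n;
        } else {
            *tail = right;
            right = right->next;
            --right_n;
        }
        tail = &(*tail)->next;
    }
    *tail = nullptr;
    return result;
}
```

The caller has to pass the list length, which costs one extra walk over the list. The whole shuffle takes O(n log n) time and needs no auxiliary array.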
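// Caution: a comparator that returns random results is not a strict weak
// ordering, so the behavior of sort() is undefined and the shuffle is not uniform.
bool randcomp(int, int)
{
return (rand()%2) != 0;
}
mylist.sort(randcomp);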
You can try iterating over the list several times and swapping adjacent nodes with a certain probability. Something like this:
const float swapchance = 0.25;
const int itercount = 100;
struct node
{
int val;
node *next;
};
node *first = 0;
{ // Creating example list
for (int i = 0; i < 20; i++)
{
node *tmp = new node;
tmp->val = i;
tmp->next = first; // push each new node onto the front
first = tmp;
}
}
// Shuffling
for (int i = 0; i < itercount; i++)
{
node *ptr = first;
node *prev = 0;
while (ptr && ptr->next)
{
if (std::rand() % 1000 / 1000.0 < swapchance)
{
node *nxt = ptr->next;
if (prev) prev->next = nxt; // relink the predecessor...
else first = nxt; // ...or the head when the first pair is swapped
ptr->next = nxt->next;
nxt->next = ptr;
}
prev = ptr;
ptr = ptr->next;
}
}
The big difference between an array and a linked list is that with an array you can directly access a given element using pointer arithmetic, which is how operator[] works.
That, however, does not preclude you from writing your own operator[] or similar, where you walk the list and count out the nth element. Once you have got this far, removing the element and placing it into a new list is quite simple.
The cost is in complexity: where the shuffle is O(n) for an array, it becomes O(n^2) for a linked list accessed this way.

Execution time of creating a graph ADT in C++

Generally, is creating an undirected graph ADT supposed to take a long time?
If I have a graph of 40 nodes, and each node is connected to 20% of the other nodes, my program will stall when it tries to link the nodes together.
The max I can really get up to is 20% density of 20 nodes. My code to link vertexes together looks like this:
while(CalculateDensity()){
LinkRandom();
numLinks++;
}
void LinkRandom(){
int index = rand()%edgeList.size();
int index2 = rand()%edgeList.size();
edgeList.at(index).links.push_back(edgeList.at(index2));
edgeList.at(index2).links.push_back(edgeList.at(index));
}
Is there any way to do this faster?
EDIT: Here is where the data structure is populated:
for(int i=0; i<TOTAL_NODES; i++){
Node *ptr = new Node();
edgeList.push_back(*ptr); //populate edgelist with nodes
}
cout<<"edgelist populated"<<endl;
cout<<"linking nodes..."<<endl;
while(CalculateDensity()){
LinkRandom();
numLinks++;
}
Seems to me that you're copying a growing structure with each push_back.
That could be the cause of slowness.
If you could show the data structure declaration I could try to be more specific.
Edit: I still can't see the Node declaration; nevertheless, I would try changing edgeList to a container of pointers to Node. Then:
// hypothetical declarations
class Node {
public:
list<Node*> links;
};
vector<Node*> edgeList; // a vector, since .at() is used below
//populate edgelist with nodes
for(int i=0; i<TOTAL_NODES; i++)
edgeList.push_back(new Node());
....
void LinkRandom(){
int index = rand()%edgeList.size();
int index2 = rand()%edgeList.size();
edgeList.at(index)->links.push_back(edgeList.at(index2));
edgeList.at(index2)->links.push_back(edgeList.at(index));
}
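Put together as something that compiles on its own, the pointer-based layout might look like the sketch below. `Node` is still a guess at the real declaration, and the `Graph` wrapper with its `nodes` member and `link_random` method is a made-up stand-in for `edgeList` and `LinkRandom`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical Node: it stores pointers to its neighbours, never copies of them.
struct Node {
    std::vector<Node *> links;
};

struct Graph {
    std::vector<Node *> nodes; // owns the nodes

    explicit Graph(std::size_t total_nodes) {
        for (std::size_t i = 0; i < total_nodes; ++i)
            nodes.push_back(new Node());
    }
    ~Graph() {
        for (std::size_t i = 0; i < nodes.size(); ++i)
            delete nodes[i];
    }
    // Copying would free the nodes twice, so it is disabled.
    Graph(const Graph &) = delete;
    Graph &operator=(const Graph &) = delete;

    // Links two randomly chosen nodes in both directions. Like the original,
    // this may pick the same node twice or repeat an existing edge.
    void link_random() {
        assert(!nodes.empty());
        Node *a = nodes[std::rand() % nodes.size()];
        Node *b = nodes[std::rand() % nodes.size()];
        a->links.push_back(b);
        b->links.push_back(a);
    }
};
```

Each `push_back` now copies one pointer. In the by-value version, pushing a Node copies its entire `links` vector, and every Node in that vector carries a copy of its own links, so the amount copied grows quickly as the graph gets denser.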

C++ inserting (and shifting) data into an array

I am trying to insert data into a leaf node (an array) of a B-Tree. Here is the code I have so far:
void LeafNode::insertCorrectPosLeaf(int num)
{
for (int pos=count; pos>=0; pos--) // goes through values in leaf node
{
if (num < values[pos-1]) // if inserting num < previous value in leaf node
{continue;} // continue searching for correct place
else // if inserting num >= previous value in leaf node
{
values[pos] = num; // inserts in position
break;
}
}
count++;
} // insertCorrectPos()
Before the line values[pos] = num, I think I need to write some code that shifts the existing data instead of overwriting it. I am trying to use memmove but have a question about it. Its third parameter is the number of bytes to copy. If I am moving a single int on a 64-bit machine, does this mean I would put a "4" here? If I am going about this completely wrong, any help would be greatly appreciated. Thanks
The easiest way (and probably the most efficient) would be to use one of the standard library's predefined containers to implement "values". I suggest either list or vector, because both have an insert function that does the shifting for you. I suggest the vector class specifically because it has the same kind of interface that an array has. However, if you want to optimize for the speed of this action specifically, then I would suggest the list class because of the way it is implemented.
If you would rather do it the hard way, then here goes...
First, you need to make sure that you have the space to work in. You can either allocate dynamically:
int *values = new int[size];
or statically
int values[MAX_SIZE];
If you allocate statically, then you need to make sure that MAX_SIZE is some gigantic value that you will never ever exceed. Furthermore, you need to check the actual size of the array against the amount of allocated space every time you add an element.
if (size < MAX_SIZE-1)
{
// add an element
size++;
}
If you allocate dynamically, then you need to reallocate the whole array every time you add an element.
int *temp = new int[size+1];
for (int i = 0; i < size; i++)
temp[i] = values[i];
delete [] values;
values = temp;
temp = NULL;
// add the element
size++;
When you insert a new value, you need to shift every value over.
int temp = 0;
for (int i = 0; i < size+1; i++)
{
if (i == size || values[i] > num) // test i first so values[size] is never read in the comparison
{
temp = values[i];
values[i] = num;
num = temp;
}
}
Keep in mind that this is not at all optimized. A truly magical implementation would combine the two allocation strategies by dynamically allocating more space than you need, then growing the array by blocks when you run out of space. This is exactly what the vector implementation does.
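For comparison, the vector route suggested at the top of this answer needs only a couple of lines. This is a sketch with a made-up function name, `insert_sorted`, and it assumes the vector is already sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inserts `num` into an already sorted vector and keeps it sorted.
void insert_sorted(std::vector<int> &values, int num) {
    // First position whose element is greater than num.
    std::vector<int>::iterator pos =
        std::upper_bound(values.begin(), values.end(), num);
    values.insert(pos, num); // shifts the tail right and grows storage if needed
}
```

`std::upper_bound` finds the slot with a binary search, and `insert` does the shifting and any reallocation.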
The list implementation uses a linked list, which has O(1) time for inserting a value once the position is known, because of its structure. However, it is much less space efficient and takes O(n) time to access the element at location n.
Also, this code was written on the fly... be careful when using it. There might be a weird edge case that I am missing in the last code segment.
Cheers!
Ned
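On the memmove part of the question: the third argument is a byte count, so it is safer to write it as the number of elements times sizeof(int) than to hard-code 4. A minimal sketch for a plain array follows; `insert_shifted` is a made-up name, and the caller is assumed to guarantee room for one more element.

```cpp
#include <cassert>
#include <cstring>

// Inserts `num` into the sorted array `values`, which holds `count` elements
// and has room for at least count + 1. Returns the new element count.
int insert_shifted(int *values, int count, int num) {
    assert(values != nullptr && count >= 0);

    // Walk from the right to find the slot where num belongs.
    int pos = count;
    while (pos > 0 && num < values[pos - 1]) --pos;

    // Move the tail [pos, count) one slot to the right.
    // The size argument is in bytes, hence the sizeof(int) factor.
    std::memmove(&values[pos + 1], &values[pos], (count - pos) * sizeof(int));
    values[pos] = num;
    return count + 1;
}
```

memmove is used rather than memcpy because the source and destination ranges overlap.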