How to implement a compact linked list with array? - c++

Here is exercise 10.3-4 from CLRS, which I am trying to solve:
It is often desirable to keep all elements of a doubly linked list compact in storage,
using, for example, the first m index locations in the multiple-array representation.
(This is the case in a paged, virtual-memory computing environment.) Explain how to implement the procedures ALLOCATE OBJECT and FREE OBJECT so that the representation is compact. Assume that there are no pointers to elements of the linked list outside the list itself. (Hint: Use the array implementation of a stack.)
Here is my solution so far:
int free;
int allocate()
{
if(free == n+1)
return 0;
int tmp = free;
free = next[free];
return tmp;
}
int deallocate(int pos)
{
for(;pos[next]!=free;pos[next])
{
next[pos] = next[next[pos]];
prev[pos] = prev[next[pos]];
key[pos] = key[next[pos]];
}
int tmp = free;
free = pos;
next[free] = tmp;
}
Now the problem is: if this is the approach, we don't need a linked list at all. If deletion is O(n), we could just as well use a plain array. Also, I have not used the array implementation of a stack. So where is the catch? How should I start?

You don't have to shrink the list right away. Simply leave a hole and link that hole to your free list. Once you've allocated the memory, it's yours. So let's say your page size is 1K. Your initial allocated list size would then be 1K, even if the list is empty. Now you can add and remove items very effectively.
Then introduce another method to pack your list, i.e. remove all holes. Keep in mind that after calling the pack-method, all 'references' become invalid.
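For comparison with the deferred packing described above, the hint in the exercise is usually read differently: treat the occupied slots as a stack, so that allocation pushes and freeing pops. To free a slot in the middle, the element in the last occupied slot is moved into the hole and its neighbours are re-pointed, which keeps both operations O(1). The following is a minimal sketch of that reading, not the answer author's method. All names (CompactList, kCapacity, kNil, pushFront, remove, release) are hypothetical, and it uses 0-based indices where CLRS uses 1-based ones.

```cpp
// Hypothetical names throughout; a sketch, not a reference solution.
const int kCapacity = 8;
const int kNil = -1;

struct CompactList {
    int key[kCapacity];
    int next[kCapacity];
    int prev[kCapacity];
    int head = kNil;
    int used = 0;  // slots 0..used-1 are occupied; this is the stack top

    // ALLOCATE-OBJECT: push. Returns the new slot, or kNil if full.
    int allocate() {
        if (used == kCapacity) return kNil;
        return used++;
    }

    // FREE-OBJECT: x must already be unlinked. Move the last occupied
    // slot into the hole at x, fix its neighbours, then pop.
    void release(int x) {
        int last = used - 1;
        if (x != last) {
            key[x] = key[last];
            next[x] = next[last];
            prev[x] = prev[last];
            if (prev[x] != kNil) next[prev[x]] = x; else head = x;
            if (next[x] != kNil) prev[next[x]] = x;
        }
        --used;
    }

    // Insert a key at the front of the list; returns its slot or kNil.
    int pushFront(int k) {
        int x = allocate();
        if (x == kNil) return kNil;
        key[x] = k;
        next[x] = head;
        prev[x] = kNil;
        if (head != kNil) prev[head] = x;
        head = x;
        return x;
    }

    // Unlink slot x from the list and give the slot back.
    void remove(int x) {
        if (prev[x] != kNil) next[prev[x]] = next[x]; else head = next[x];
        if (next[x] != kNil) prev[next[x]] = prev[x];
        release(x);
    }
};
```

Note that remove() can change the slot number of the element that used to be last. That is why the exercise assumes there are no pointers to list elements from outside the list.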

Related

How to manage an array of pointers to a struct

I have the following struct:
struct Item
{
Item* nextPtr;
int intKey;
int intValueLength;
};
Based on this struct I need to maintain several linked lists, which means I need to keep track of one head pointer for each one. I have thought about using an array (HEADS) which will contain a head pointer for each list. The number of lists is variable and will be calculated at run time, so I am defining the array dynamically as follows:
int t = 10;
Item* HEADS = new Item[t];
Firstly, I need to initialize each head pointer to NULL because the linked lists are empty when the program runs. How do I do this initialization?
for (int i = 0; i <= t - 1; i++)
// Initialize each element of HEADS to NULL.
And, of course, I will also need to update each element of HEADS with the proper pointer to a linked list (when inserting and deleting items) and also to get the value of each head pointer to display the elements of each list.
I have seen other posts similar to this one in the forum but I am still confused, that is why I am asking my specific situation.
Is this a good approach?
I will very much appreciate your advice.
Respectfully,
Jorge Maldonado
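In C++ the common way to write the initialization for loop would be the following. Note that this only compiles if HEADS is an array of pointers, declared as Item** HEADS = new Item*[t]; rather than as an array of Item objects: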
for (int i = 0; i < t ; i++)
HEADS[i] = NULL;
Or you could write
for (int i = 0 ; i < t ; HEADS[i++] = NULL);
which is slightly more compact.
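As an alternative to managing the raw array by hand, the head pointers could live in a std::vector, which handles both the run-time size and the initialization to null. This is only a sketch under that assumption; the helper names pushFront, length and destroyAll are hypothetical and not part of the question.

```cpp
#include <vector>

struct Item {
    Item* nextPtr;
    int intKey;
    int intValueLength;
};

// Insert a new item at the front of list number `which`.
void pushFront(std::vector<Item*>& heads, int which, int key) {
    Item* item = new Item{heads[which], key, 0};
    heads[which] = item;
}

// Count the items in one list by walking it.
int length(const Item* head) {
    int n = 0;
    for (; head != nullptr; head = head->nextPtr) ++n;
    return n;
}

// Free every node of every list and reset all heads to nullptr.
void destroyAll(std::vector<Item*>& heads) {
    for (Item*& head : heads) {
        while (head != nullptr) {
            Item* next = head->nextPtr;
            delete head;
            head = next;
        }
    }
}
```

With this, std::vector<Item*> heads(t, nullptr); creates t empty lists in one line, and heads[i] is updated whenever the front of list i changes.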
As to the question of whether an array of pointers is a good idea or not - if you're going to have a variable number of lists, perhaps you should use a linked list of pointers to other linked lists.
I do wonder about your data structure, though. In it you have a pointer to the next element in the list, a key value, and the length of the value, but you don't appear to have a reference to a value, unless the "key" is really the value, in which case you have mixed terminology: you refer to something in one place as a "key" and in another as a "value". Perhaps you need a pointer to a "value"? But I don't know what you're trying to do here, so I just thought I'd note that issue.
Best of luck.
Good approach? That very much depends. Good for a student starting to learn C, maybe. Good for a real C++ programmer? Absolutely not. If you really want to create a linked list, you should make a class that encapsulates the elements and adds them dynamically. This is how std::list, for example, works (std::list is a doubly linked list, and far more complicated).
Here's a sample class showing how this could look (off the top of my head; I haven't compiled it, but it should work):
struct LinkedList
{
    Item* list;
    int size = 0;
    LinkedList() //constructor of this class/struct, it's the function that will be called once you create an object of LinkedList
    {
        list = nullptr; //prefer nullptr over NULL in C++11 and later (change it back to NULL if you are stuck with C++98)
    }
    void addItem(const int& key)
    {
        Item item; //construct a new item
        item.intKey = key; //fill the value in the item
        item.intValueLength = 0; //no value is attached in this example
        item.nextPtr = nullptr; //the new item is the last one
        Item* newList = new Item[size+1]; //create the new list with one more element
        for(int i = 0; i < size; i++) //copy the old list to the new list
        {
            newList[i] = list[i]; //copy element by element
        }
        newList[size] = item; //fill in the new item
        for(int i = 0; i < size; i++) //the copied "next pointers" still point into the old array, so re-link them
        {
            newList[i].nextPtr = &newList[i + 1];
        }
        delete[] list; //free the old array (deleting nullptr is safe)
        list = newList; //use the new array from now on
        size = size+1; //increase the size of the list
    }
    ~LinkedList()
    {
        delete[] list; //deleting nullptr does nothing, so no check is needed
    }
};
Now this is better, but it's still far from optimal. However, this is how C++ should be used. You create objects and deal with them. What you did above is more like C, not C++.
To use my code, you would write:
LinkedList myList;
myList.addItem(55);
There are many things to do here to make this optimal. I'll mention a few:
In my code, every time you add an item, a new array is allocated. This is bad! std::vector solves this problem by allocating a bigger size than needed (for example, you add 1 item, it reserves 10, but uses only 1, and doesn't tell you that). Once you need more than 10, say 11, it reserves 20, maybe. This optimizes performance.
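The effect of over-allocating can be observed by counting how often a std::vector's capacity changes while items are appended. The sketch below does only that. The function name countReallocations is hypothetical, and the exact growth factor is implementation-defined, so the concrete numbers (10, 20) in the paragraph above are illustrative rather than what any particular library does.

```cpp
#include <cstddef>
#include <vector>

// Count how many times the vector's capacity changes while n items
// are appended one at a time.
std::size_t countReallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}
```

Because push_back is required to run in amortized constant time, the count grows far more slowly than n, whereas the addItem approach above reallocates on every single insertion.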
Try to read my code and understand it. You'll learn so much. Ask questions; I'll try to answer. And my recommendation is: get a C++ book, and start reading.

Memory fragmentation using std list?

I'm using a list of lists to store point data in my application.
Here are some example tests I made:
//using list of lists
list<list<Point>> ls;
for(int i=0;i<10000;++i)
{
list<Point> lp;
lp.resize(4);
lp.push_back(Point(1,2));
ls.push_back(lp);
}
I assume that the memory used will be
10k elements * 5 Points * Point size = 10000*5*2*4 = 400,000 bytes + some overhead of the list container, but the memory used by the program rises dramatically.
Is it due to overhead of list container or maybe because of memory fragmentation?
EDIT:
add some info and another example
Point is the MFC CPoint class, or you can define your own point class with int x, y. I'm using VS2008 in debug mode on Win XP, and Windows Task Manager to view the memory of the application.
I can't use vector instead of outer list because I don't know total size N of it beforehand, so I must push_back every new entry.
here is modified example
int N=10000;
list<vector<CPoint>> ls;
for(int i=0;i<N;++i)
{
vector<CPoint> vp;
vp.resize(5);
vp.reserve(5);
ls.push_back(vp);
}
and I compare it to
CPoint* p= new CPoint[N*5];
It's not "+ some overhead of list container". List overhead is linear with the number of objects, not constant. There's 50,000 Points, but with each Point you also have two pointers (std::list is doubly-linked), and also with each element in ls, you have two pointers. Plus, each list is going to have a head and tail pointer.
So that's 140,002 (I think) extra pointers that your math doesn't account for. Note that this dwarfs the size of the Point objects themselves, since they're so small. Are you sure that list is the right container for you? vector has constant overhead per container, basically three pointers each, which would be just 30,003 additional pointers on top of the Point objects. That's a large memory saving, if that is something that matters.
[Update based on Bill Lynch's comment] vector could allocate more space than 5 for your points. Worst-case, it will allocate twice as much space as you need. But since sizeof(Point) == sizeof(Point*) for you, that's still strictly better than list since list will always use three times as much space.
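The pointer counts quoted above can be reproduced with a little arithmetic. The sketch below counts only the pointers named in this answer (two per list node, a head and tail per list, three per vector). It deliberately ignores anything else a real implementation or heap adds per allocation, which typically comes on top. The function names are hypothetical.

```cpp
// Extra pointers in a list of `outer` lists with `inner` points each.
constexpr long listOfListsPointers(long outer, long inner) {
    return 2 * outer * inner  // prev/next for every Point node
         + 2 * outer          // prev/next for every node of the outer list
         + 2 * outer          // head/tail of every inner list
         + 2;                 // head/tail of the outer list
}

// Extra pointers in a vector of `outer` vectors: three per container,
// counting the outer vector itself.
constexpr long vectorOfVectorsPointers(long outer) {
    return 3 * (outer + 1);
}
```

For the example in the question, listOfListsPointers(10000, 5) gives 140,002 and vectorOfVectorsPointers(10000) gives 30,003, matching the figures above.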

Sorting an array of valid and invalid numbers in c++ for an embedded system

I am writing a program in C++ that will be used with Windows Embedded Compact 7. I have heard that it is best not to dynamically allocate arrays when writing embedded code. I will be keeping track of between 0 and 50 objects, so I am initially allocating 50 objects.
Object objectList[50];
int activeObjectIndex[50];
static const int INVALID_INDEX = -1;
int activeObjectCount=0;
activeObjectCount tells me how many objects I am actually using, and activeObjectIndex tells me which objects I am using. If the 0th, 7th, and 10th objects were being used I would want activeObjectIndex = [0,7,10,-1,-1,-1,...,-1]; and activeObjectCount=3;
As different objects become active or inactive I would like activeObjectIndex list to remain ordered.
Currently I am just sorting the activeObjectIndex at the end of each loop that the values might change in.
First, is there a better way to keep track of objects (that may or may not be active) in an embedded system than what I am doing? If not, is there an algorithm I can use to keep the objects sorted each time I add or remove an active object? Or should I just periodically do a bubble sort or something to keep them in order?
You have a hard question, where the answer requires quite a bit of knowledge about your system. Without that knowledge, no answer I can give would be complete. However, 15 years of embedded design has taught me the following:
You are correct, you generally don't want to allocate objects during runtime. Preallocate all the objects, and move them to active/inactive queues.
Keeping things sorted is generally hard. Perhaps you don't need to. You don't mention it, but I'll bet you really just need to keep your Objects in "used" and "free" pools, and you're using the index to quickly find/delete Objects.
I propose the following solution. Change your object to the following:
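class Object {
    Object *mNext, *mPrev;
public:
    // A fresh node is a one-element circular list that points at itself.
    Object() : mNext(this), mPrev(this) { /* etc. */ }
    // Link p2 into the list directly after this node.
    void insertAfter(Object *p2) {
        mNext->mPrev = p2;
        p2->mNext = mNext;
        mNext = p2;
        p2->mPrev = this;
    }
    // Unlink this node and turn it back into a one-element list.
    void removeFromList() {
        mPrev->mNext = mNext;
        mNext->mPrev = mPrev;
        mNext = mPrev = this;
    }
    Object* getNext() {
        return mNext;
    }
    // True if anything besides this node is linked into the list.
    bool hasObjects() {
        return mNext != this;
    }
};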
And use your Objects:
#include <cassert>
#define NUM_OBJECTS (50)
// gFree and gUsed are list heads; they never hold payload themselves.
Object gObjects[NUM_OBJECTS], gFree, gUsed;
void InitObjects() {
    for(int i = 0; i < NUM_OBJECTS; ++i) {
        gFree.insertAfter(&gObjects[i]);
    }
}
Object* GetNewObject() {
    assert(gFree.hasObjects());
    Object *obj = gFree.getNext();
    obj->removeFromList();
    gUsed.insertAfter(obj);
    return obj;
}
void ReleaseObject(Object *obj) {
    obj->removeFromList();
    gFree.insertAfter(obj);
}
Edited to fix a small glitch. Should work now, although not tested. :)
The overhead of a std::vector is very small. The problem you can have is that dynamic resizing will allocate more memory than needed. However, as you have 50 elements, this shouldn't be a problem at all. Give it a try, and change it only if you see a strong impact.
If you cannot/do not want to remove unused objects from a std::vector, you can maybe add a boolean to your Object that indicates if it is active? This won't require more memory than using activeObjectIndex (maybe even less depending on alignment issues).
To sort the data with a boolean (not active at the end), write a function :
bool compare(const Object & a, const Object & b) {
if(a.active && !b.active) return true;
else return false;
}
std::sort(objectList,objectList + 50, &compare); // if you use an array
std::sort(objectList.begin(),objectList.end(), &compare); // if you use std::vector
If you want to sort using activeObjectIndex it will be more complicated.
If you want to use a structure that is always ordered, use std::set. However it will require more memory (but for 50 elements, it won't be an issue).
Ideally, implement the following function :
bool operator<(const Object & a, const Object & b) {
if(a.active && !b.active) return true;
else return false;
}
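This allows you to call std::sort(objectList.begin(), objectList.end()) directly. Be careful with std::set, though: with this comparison all active objects compare as equivalent, so a std::set<Object> would keep only one active and one inactive object. To keep a container sorted this way you would need a std::multiset, or a comparison that also breaks ties (for example by an id).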
One way to keep track of active / inactive is to have the active Objects be on a doubly linked list. When an object goes from inactive to active then add to the list, and active to inactive remove from the list. You can add these to Object
Object * next, * prev;
so this does not require memory allocation.
If no dynamic memory allocation is allowed, I would use a simple C array or std::array together with an index that points one past the last object in use. The objects are always kept in sorted order.
Adding is done by inserting the new object at the correct position in the sorted array. To find the insert position, std::lower_bound or std::find_if can be used. For 50 elements, the second (a linear scan) will probably be faster. Removal is similar.
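The sorted-insert idea above can be sketched with a fixed array of active indices and std::lower_bound. This is an illustration under assumptions, not the answer author's code: the names kMaxObjects, gActive, gActiveCount, activate and deactivate are hypothetical, and the array stores object indices rather than the objects themselves.

```cpp
#include <algorithm>

const int kMaxObjects = 50;   // hypothetical capacity, as in the question
int gActive[kMaxObjects];     // sorted indices of the active objects
int gActiveCount = 0;         // one past the last used slot of gActive

// Insert idx at its sorted position. Returns false if it is already
// active or the array is full.
bool activate(int idx) {
    int* end = gActive + gActiveCount;
    int* pos = std::lower_bound(gActive, end, idx);
    if (pos != end && *pos == idx) return false;
    if (gActiveCount == kMaxObjects) return false;
    std::copy_backward(pos, end, end + 1);  // shift the tail right by one
    *pos = idx;
    ++gActiveCount;
    return true;
}

// Remove idx and close the gap. Returns false if it was not active.
bool deactivate(int idx) {
    int* end = gActive + gActiveCount;
    int* pos = std::lower_bound(gActive, end, idx);
    if (pos == end || *pos != idx) return false;
    std::copy(pos + 1, end, pos);           // shift the tail left by one
    --gActiveCount;
    return true;
}
```

Both operations shift at most 50 integers, so the array never needs a separate sorting pass and no memory is allocated at run time.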
You should not worry about keeping the list sorted. Scanning the list of indices to find the active ones is O(N), and since N is at most 50 in your case, that is effectively constant time; the extra check costs very little for such a small array.
You could maintain the index of the last element checked, until it reaches the limit:
// Returns the next used slot after `last`, or -1 if there is none.
// Pass -1 as `last` to start from the beginning.
int next(int last) {
    for (int i = last + 1; i < MAX_ARRAY_SIZE; i++) {
        if (activeObjectIndex[i] != -1) {
            return i;
        }
    }
    return -1;
}
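However, if you really want to have a side index, you can simply triple the size of the array and store a doubly linked list of the elements in it, using three slots per element:
int activeObjectIndex[MAX_ARRAY_SIZE * 3]; // fill with -1 before use; "= {-1}" would only set the first slot
// For the element stored at offset i (a multiple of 3):
// activeObjectIndex[i]     holds the element id
// activeObjectIndex[i + 1] holds the position of the previous element
// activeObjectIndex[i + 2] holds the position of the next element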

Difference between linked lists and array of structs?

what's the difference between those pieces of code?
1)
struct MyStruct
{
int num;
} ms[2];
ms[0].num = 5;
ms[1].num = 15;
2)
struct MyStruct
{
int num;
MyStruct *next;
};
MyStruct *ms = new MyStruct;
ms->num = 5;
ms->next = new MyStruct;
ms->next->num = 15;
I'm probably a little confused about linked lists and lists in general. Are they useful for something in particular? Please explain this to me in more detail.
Your first definition...
struct MyStruct
{
int num;
} ms[1];
...creates a statically allocated array with a single element. You cannot change the size of the array while your program is running; this array will never hold more than one element. You can access items in the array by direct indexing; e.g., ms[5] would get you the sixth element in the array (remember, C and C++ arrays are 0-indexed, so the first element is ms[0]), assuming that you had defined an array of the appropriate size.
Your second definition...
struct MyStruct
{
int num;
MyStruct *next;
};
...creates a dynamically allocated linked list. Memory for this list is allocated dynamically during runtime, and the linked list can grow (or shrink) during the lifetime of the program. Unlike arrays, you cannot directly access any element in the list; to get to the sixth element you have to start at the first element and then iterate 5 times.
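To make the access difference concrete, here is a small sketch of what "iterate 5 times" means for the linked list. The helper name nth is hypothetical; it is the list counterpart of writing ms[n] for an array.

```cpp
#include <cstddef>

struct MyStruct {
    int num;
    MyStruct* next;
};

// Follow n links starting at head. Returns nullptr if the list has
// fewer than n + 1 elements. Cost grows with n, unlike array indexing.
MyStruct* nth(MyStruct* head, std::size_t n) {
    while (head != nullptr && n > 0) {
        head = head->next;
        --n;
    }
    return head;
}
```

So nth(ms, 5) reaches the sixth element after five pointer hops, while an array gets there with a single address calculation.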
Regarding the errors in your code: the first version creates a fixed number of MyStruct elements and stores them in the array ms. You presumably meant it to hold 2 elements; either way, you cannot add another element to ms later, so the number of MyStruct elements is limited. In the second case, the linked list, you can chain as many MyStruct elements as you want at run time, so the number of elements is dynamic. Conceptually, the second case looks like this in memory:
[ MyStruct#1 ] ----> [ MyStruct#2 ] ----> [ NULL ]
NULL though could be a MyStruct#3 for example, while the first one:
[ MyStruct#1 ] ----> [ MyStruct#2 ]
and that's it, no MyStruct#3 can be added.
Now let's go through the code you wrote:
struct MyStruct
{
int num;
} ms[1];
ms[1] really means create me an ms array of one MyStruct element.
The code that follows assumes you created two:
ms[0].num = 5;
ms[1].num = 15;
Hence it should have been:
struct MyStruct
{
int num;
} ms[2];
And it will work fine! and keep in mind the simple illustration I made for it:
[ MyStruct#1 ] ----> [ MyStruct#2 ]
Second Case:
struct MyStruct
{
int num;
MyStruct *next;
};
MyStruct *ms = new MyStruct;
ms->num = 5;
ms->next = new MyStruct;
ms->next->num = 15;
This code uses the C++ operator new. If you save your source code as .cpp, it will compile as a C++ application with no errors. For C, the syntax should change like so:
struct MyStruct
{
int num;
struct MyStruct *next;
};
struct MyStruct *ms = malloc(sizeof(struct MyStruct));
ms->num = 5;
ms->next = malloc(sizeof(struct MyStruct));
ms->next->num = 15;
ms->next->next = NULL;
Don't forget to #include <stdlib.h> for the malloc() function; you can read more about this function here.
And as the first case recall my illustration for the linked-list:
[ MyStruct#1 ] ----> [ MyStruct#2 ] ----> [ NULL ]
Here NULL stands for the next pointer of the ms->next node, which marks the end of the list. It has to be set explicitly, because neither new nor malloc zeroes it for you. To explain further: ms->next is a pointer to MyStruct, and we have allocated space for it on the heap, so it now points to a block of memory the size of one MyStruct structure.
Finally here is a Stackoverflow question about when to use a linked-list and when to use an array so you can get exactly why people all around the world prefer linked-list sometimes and array other times.
Oh, my friend, there are dozens of different kinds of data structures that pretty much just hold a bunch of num values or whatever. The reason programmers don't just use arrays for everything is the differences in the amount of memory required, and the ease of doing whichever operations are most important for your particular needs.
Linked lists happen to be very quick at adding or removing individual items. The trade off is that finding an item in the middle of the list is relatively slow, and the extra memory required by the next pointers. A properly-sized array is very compact in memory, and you can access an item in the middle very quickly, but to add a new item at the end you either have to know the maximum number of elements beforehand, which is often impossible or wastes memory, or reallocate a larger array and copy everything over, which is slow.
Therefore, someone who doesn't know how big their list needs to be, and who mostly only needs to deal with items at the beginning or end of the list or always loops over the entire list, and cares more about execution speed than saving a few bytes of memory, is very likely to choose a linked list over an array.
The main differences between lists and arrays in general:
Ordering in lists is explicit; each element stores the location of the preceding/succeeding element. Ordering in arrays is implicit; each element is assumed to have a preceding/succeeding element. Note that a single list may contain multiple orderings. For example, you could have something like
struct dualList {
T data1;
K data2;
struct dualList *nextT;
struct dualList *nextK;
};
that allows you to order the same list two different ways, one by data1 and the other by data2.
Adjacent array elements are in adjacent memory locations; adjacent list elements don't have to be in adjacent locations.
Arrays offer random access to their elements; lists only offer sequential access (i.e., you have to walk down the list to find an element).
Arrays are (usually) fixed in length [1]: adding elements to or removing elements from the array doesn't change the array's size. Lists can grow or shrink as needed.
Lists are great for maintaining a dynamically changing sequence of values, especially if the values need to remain ordered. They're not so hot for storing relatively static data that needs to be retrieved quickly and frequently, since you can't access elements randomly.
[1] You can get around this by allocating memory dynamically and then using realloc to resize that memory block as needed, but it needs to be done carefully and can be a bit of a PITA.
Linked lists are useful when element ordering is important and the number of elements is not known in advance. The price is that accessing an element in a linked list takes O(n) time: when you look for an element, in the worst case you'll have to look at every element of the list.
For array, the number must be known in advance. When you define an array in C, you have to pass it its size. On the other hand, accessing an array element takes O(1) time, since an element can be addressed by index. With linked list, that is not possible.
However, that is not C++ related question, since the concept of linked list and array is not tied to C++.
An array is a contiguous, pre-allocated block of memory, whereas a linked list is a collection of pieces of memory allocated at run time (with malloc), not necessarily contiguous, and linked to each other via pointers (*next). You would generally use an array of structs if you know at compile time the maximum number of elements you need to store. A linked list of structs, however, is useful if you don't know the maximum number of elements that will need to be stored. Also, with a linked list the number of elements can change as you add and remove elements.

Delete on a very deep tree

I am building a suffix trie (unfortunately, no time to properly implement a suffix tree) for a 10 character set. The strings I wish to parse are going to be rather long (up to 1M characters). The tree is constructed without any problems, however, I run into some when I try to free the memory after being done with it.
In particular, if I set up my constructor and destructor as follows (where CNode.child is a pointer to an array of 10 pointers to other CNodes, and count is a simple unsigned int):
CNode::CNode(){
count = 0;
child = new CNode* [10];
memset(child, 0, sizeof(CNode*) * 10);
}
CNode::~CNode(){
for (int i=0; i<10; i++)
delete child[i];
}
I get a stack overflow when trying to delete the root node. I might be wrong, but I am fairly certain that this is due to too many nested destructor calls (each destructor calls up to 10 other destructors). I know this is suboptimal both space-wise and time-wise; however, this is supposed to be a quick-and-dirty solution to the repeated substring problem.
tl;dr: how would one go about freeing the memory occupied by a very deep tree?
Thank you for your time.
One option is to let a single owner keep track of every node it allocates and then free them all at once, so that CNode's destructor no longer has to delete its children recursively.
For example (untested):
class CNodeBuffer {
private:
std::vector<CNode *> nodes;
public:
~CNodeBuffer() {
empty();
}
CNode *get(...) {
CNode *node = new CNode(...);
nodes.push_back(node);
return node;
}
void empty() {
for(std::vector<CNode *>::iterator i = nodes.begin(); i != nodes.end(); ++i) {
delete *i;
}
nodes = std::vector<CNode *>();
}
};
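Pointers to a std::vector's elements are invalidated whenever the vector reallocates, so a plain std::vector<CNode> is only simpler and safe if you reserve() enough capacity up front. A std::deque<CNode> keeps pointers to existing elements valid when you push_back, which may be a better fit. Either way, this requires testing.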
Do you initialize the memory for the nodes themselves? From what I can see, your code only allocates memory for the pointers, not the actual nodes.
As far as your question goes, try to iterate over the tree in an iterative manner, not recursively. Recursion is bad, it's nice only when it's on the paper, not in the code, unfortunately.
Have you considered just increasing your stack size?
In visual studio you do it with /FNUMBER where NUMBER is stack size in bytes. You might also need to specify /STACK:reserve[,commit].
You're going to do quite a few deletes. That will take a lot of time, because you will access memory in a very haphazard way. However, at that point you don't need the tree structure anymore. Hence, I would do this in three steps. In the first step, create a std::vector<CNode*>, and reserve() enough space for all nodes in your tree. Now recurse over the tree and copy all CNode*'s to your vector. In the second step, sort them (!). Then, in the third step, delete all of them. The second step is technically optional but likely makes the third step a lot faster. If not, try sorting in reverse order.
I think in this case a breadth-first cleanup might help, by putting all the back-tracking information into a deque rather than on the OS provided stack. It still won't pleasantly solve the problem of making it happen in the destructor though.
Pseudocode:
void CNode::cleanup()
{
std::deque<CNode*> nodes;
nodes.push_back(this);
while(!nodes.empty())
{
// Get and remove front node from deque.
// From that node, put all non-null children at end of deque.
// Delete front node.
}
}
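The pseudocode above can be fleshed out as follows. This is a minimal, self-contained sketch rather than a drop-in replacement for the question's class. Node is a hypothetical stand-in for CNode with the child pointers stored inline, it has no recursive destructor, and the teardown lives in a free function (destroyTree, also a hypothetical name) instead of a member.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for CNode. It deliberately has no destructor
// that deletes children; destroyTree does all the teardown.
struct Node {
    Node* child[10];
    Node() { for (int i = 0; i < 10; ++i) child[i] = nullptr; }
};

// Delete a whole tree breadth-first without recursion.
// Returns the number of nodes that were freed.
std::size_t destroyTree(Node* root) {
    std::size_t freed = 0;
    std::deque<Node*> pending;
    if (root != nullptr) pending.push_back(root);
    while (!pending.empty()) {
        // Get and remove the front node from the deque.
        Node* node = pending.front();
        pending.pop_front();
        // Queue all of its non-null children.
        for (int i = 0; i < 10; ++i) {
            if (node->child[i] != nullptr) pending.push_back(node->child[i]);
        }
        // Now the node itself can go.
        delete node;
        ++freed;
    }
    return freed;
}
```

The call stack stays at constant depth no matter how deep the tree is; the bookkeeping moves into the deque on the heap. Using pop_back() instead of pop_front() would give a depth-first order with the same property.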