Popping/deleting nodes from a Huffman Tree Minheap - c++

I'm having trouble popping correctly from a Huffman Tree. Right now I am creating a Huffman Tree based off a minheap and I want to do the following:
If we assume A and B to be two different subtrees, I would say that A would be popped off first if A's frequency is less than B's frequency. If they have the same frequency, then I would find the smallest character in ASCII value in any of A's leaf nodes. Then I would see if that smallest character leaf node in A is smaller than that in any of B's leaf nodes. If so I would pop off A before B. If not I would pop off B. <- this is what I'm having trouble with.
For example:
Let's assume I input:
eeffgghh\n (every letter except for \n's frequency which is 1 is 2)
into my Huffman Tree. Then my tree would look like this:
9
/ \
5 4
/ \ / \
3 h g f
/\
e \n
Below is my attempt for popping out of my Huffman minheap. I am having trouble with the part of comparing if the frequencies of two letters are the same. If anyone could help, that would be great. Thanks!
void minHeap::heapDown(int index)
{
HuffmanNode *t = new HuffmanNode();
if(arr[index]->getFreq() == arr[left]->getFreq() || arr[index]->getFreq() == arr[right]->getFreq()) //arr is an array of HeapNodes
{
if(arr[left]->getLetter() < arr[right]->getLetter())
{
t = arr[index]; //equals operator is overloaded for swapping
arr[index] = arr[left];
arr[left] = t;
heapDown(left);
}
else
{
t = arr[index];
arr[index] = arr[right];
arr[right] = t;
heapDown(right);
}
}
if(arr[index]->getFreq() > arr[left]->getFreq() || arr[index]->getFreq() > arr[right]->getFreq())
{
if(arr[left]->getFreq() < arr[right]->getFreq())
{
t = arr[index];
arr[index] = arr[left];
arr[left] = t;
heapDown(left);
}
else
{
t = arr[index];
arr[index] = arr[right];
arr[right] = t;
heapDown(right);
}//else
}//if
}

The standard C++ library contains heap algorithms. Unless you're not allowed to use it, you might well find it easier.
The standard C++ library also contains swap(a, b), which would be a lot more readable than the swap you're doing. However, swapping in heapDown is inefficient: what you should do is hold onto the element to be placed in a temporary, then sift children down until you find a place to put the element, and then put it there.
Your code would also be a lot more readable if you implemented operator< for HuffmanNode. In any event, you're doing one more comparison than is necessary; what you really want to do is (leaving out lots of details):
heapDown(int index, Node* value) {
int left = 2 * min - 1; // (where do you do this in your code???)
// It's not this simple because you have to check if left and right both exist
min = *array[left] < *array[left + 1] ? left : left + 1; // 1 comparison
if (array[min] < to_place) {
array[index] = array[min];
heapDown(min, value);
} else {
array[index] = value;
}
Your first comparison (third line) is completely wrong. a == b || a == c does not imply that b==c, or indeed give you any information about which of b and c is less. Doing only the second comparison on b and c will usually give you the wrong answer.
Finally, you're doing a new unnecessarily on every invocation, but never doing a delete. So you are slowly but inexorably leaking memory.

Related

clarification about working of two pointer approach

I have some doubts about using two pointer approach.
Case 1: - Suppose we have an array A, that is sorted and a target value B. We want to find out if there exist two elements whose difference is equal to B or not.
int helper(vector<int> &A, int B)
{
int left = 0, n = A.size();
int right = left + 1;
while (right < n)
{
int currDiff = A[right] - A[left];
if (currDiff < B)
right++;
else if (currDiff > B)
{
left++;
if (left == right)
right++;
}
else
return 1;
}
return 0;
}
Case 2: - Suppose we have an array A, that is sorted and a target value B. We want to find out if there exist two elements whose sum is equal to B or not.
int helper(vector<int> &A, int B)
{
int left = 0, n = A.size();
int right = n - 1;
while (left < right)
{
int currSum = A[right] + A[left];
if (currSum < B)
left++;
else if (currSum > B)
{
right--;
}
else
return 1;
}
return 0;
}
The doubt is that in case 1 we set both pointers on the left side(left = 0, right = left + 1) and start scanning while in case 2 we set one pointer on the left side and the other one on the right side(left = 0, right = A.size() - 1).
I am a bit confused about how this is working.
There's no rule that you must have to set the two pointers in different way. It's all about the algorithm you're following. It may be good, it may be bad. Let's say, for difference, we set the left=0 and right=A.size()-1. As the given array A is sorted, the first difference between A[right] and A[left] will be maximum.
int currDiff = A[right] - A[left]; //max possible value for currDiff for A
So, now if currDiff is greater than the given number, what will you do? increase the left or decrease the right? Let say you do the later one, I mean decrease the right, and the corresponding condition satisfies again, do the same, decrease the right. Now, let say now you got the currDiff is smaller than the given number, what will you do? increase the left? probably. But in the next iteration, if you get the same condition satisfied, that is, currDiff is still smaller than the given number, what will you do now? Again increase the left? What if increasing the right in this particular position would give you the result?
So, you see, there arises a lot of cases needed to be handled if you started the finding diff of pair having left and right in the opposite ends.
Finally, what I want to say is, it's all about the algorithm you are following, nothing else.

Convert linked list into binary search tree, do stuff and return tree as list

I have the following problem:
I have a line with numbers that I have to read. The first number from the line is the amount of operations I will have to perform on the rest of the sequence.
There are two types of operations I will have to do:
Remove- we remove the number after the current one, then we move forward X steps in the sequence, where X=value of removed element)
Insert- we insert a new number after the current one with a value of (current element's value-1), then we move forward by X steps in the sequence where X = value of the current element (i.e not the new one)
We do "Remove" if the current number's value is even, and "Insert" if the value is odd.
After the amount of operations we have to print the whole sequence, starting from the number we ended the operations.
Properly working example:
Input: 3 1 2 3
Output:0 0 3 1
3 is the first number and it becomes the OperCount value.
First operation:
Sequence: 1 2 3, first element: 1
1 is odd, so we insert 0 (currNum's value-1)
We move forward by 1(currNum's value)
Output sequence: 1 0 2 3, current position: 0
Second operation:
0 is even so we remove the next value (2)
Move forward by the removed element's value(2):
From 0 to 3
From 3 to 1
Output sequence: 1 0 3, current position: 1
Third operation:
1 is even, so once again we insert new element with value of 0
Move by current element's value(1), onto the created 0.
Output sequence: 1 0 0 3, current position: first 0
Now here is the deal, we have reached the final condition and now we have to print whole sequence, but starting from the current position.
Final Output:
0 0 3 1
I have the working version, but its using the linked list, and because of that, it doesn't pass all the tests. Linked list traversal is too long, thats why I need to use the binary tree, but I kinda don't know how to start with it. I would appreciate any help.
First redefine the operations to put most (but not all) the work into a container object: We want 4 operations supported by the container object:
1) Construct from a [first,limit) pair of input random access iterators
2) insert(K) finds the value X at position K, inserts a X-1 after it and returns X
3) remove(K) finds the value X at position K, deletes it and returns X
4) size() reports the size of the contents
The work outside the container would just keep track of incremental changes to K:
K += insert(K); K %= size();
or
K += remove(K); K %= size();
Notice the importance of a sequence point before reading size()
The container data is just a root pointing to a node.
struct node {
unsigned weight;
unsigned value;
node* child[2];
unsigned cweight(unsigned s)
{ return child[s] ? child[s]->weight : 0; }
};
The container member functions insert and remove would be wrappers around recursive static insert and remove functions that each take a node*& in addition to K.
The first thing each of either recursive insert or remove must do is:
if (K<cweight(0)) recurse passing (child[0], K);
else if ((K-=cweight(0))>0) recurse passing (child[1], K-1);
else do the basic operation (read the result, create or destroy a node)
After doing that, you fix the weight at each level up the recursive call stack (starting where you did the work for insert or the level above that for remove).
After incrementing or decrementing the weight at the current level, you may need to re-balance, remembering which side you recursively changed. Insert is simpler: If child[s]->weight*4 >= This->weight*3 you need to re-balance. The re-balance is one of the two basic tree rotations and you select which one based on whether child[s]->cweight(s)<child[s]->cweight(1-s). rebalance for remove is the same idea but different details.
This system does a lot more worst case re-balancing than a red-black or AVL tree. But still is entirely logN. Maybe there is a better algorithm for a weight-semi-balanced tree. But I couldn't find that with a few google searches, nor even the real name of nor other details about what I just arbitrarily called a "weight-semi-balanced tree".
Getting the nearly 2X speed up of strangely mixing the read operation into the insert and remove operations, means you will need yet another recursive version of insert that doesn't mix in the read, and is used for the portion of the path below the point you read from (so it does the same recursive weight changes and re-balancing but with different input and output).
Given random access input iterators, the construction is a more trivial recursive function. Grab the middle item from the range of iterators and make a node of it with the total weight of the whole range, then recursively pass the sub ranges before and after the middle one to the same recursive function to create child subtree.
I haven't tested any of this, but I think the following is all the code you need for remove as well as the rebalance needed for both insert and remove. Functions taking node*& are static member function of tree and those not taking node*& are non static.
unsigned tree::remove(unsigned K)
{
node* removed = remove(root, K);
unsigned result = removed->value;
delete removed;
return result;
}
// static
node* tree::remove( node*& There, unsigned K) // Find, unlink and return the K'th node
{
node* result;
node* This = There;
unsigned s=0; // Guess at child NOT removed from
This->weight -= 1;
if ( K < This->cweight(0) )
{
s = 1;
result = remove( This->child[0], K );
}
else
{
K -= This->cweight(0);
if ( K > 0 )
{
result = remove( This->child[1], K-1 );
}
else if ( ! This->child[1] )
{
// remove This replacing it with child[0]
There = This->child[0];
return This; // Nothing here/below needs a re-balance check
}
else
{
// remove This replacing it with the leftmost descendent of child[1]
result = This;
There = This = remove( This->child[1], 0 );
This->child[0] = Result->child[0];
This->child[1] = Result->child[1];
This->weight = Result->weight;
}
}
rebalance( There, s );
return result;
}
// static
void tree::rebalance( node*& There, unsigned s)
{
node* This = There;
node* c = This->child[s];
if ( c && c->weight*4 >= This->weight*3 )
{
node* b = c->child[s];
node* d = c->child[1-s];
unsigned bweight = b ? b->weight : 0;
if ( d && bweight < d->weight )
{
// inner rotate: d becomes top of subtree
This->child[s] = d->child[1-s];
c->child[1-s] = d->child[s];
There = d;
d->child[s] = c;
d->child[1-s] = This;
d->weight = This->weight;
c->weight = bweight + c->cweight(1-s) + 1;
This->weight -= c->weight + 1;
}
else
{
// outer rotate: c becomes top of subtree
There = c;
c->child[1-s] = This;
c->weight = This->weight;
This->child[s] = d;
This->weight -= bweight+1;
}
}
}
You can use std::set which is implemented as binary tree. It's constructor allows construction from the iterator, thus you shouldn't have problem transforming list to the set.

Understanding the Baeza-Yates Régnier algorithm (multiple string matching, extended from Boyer-Moore)

First of all, excuse me if I write a lot, I tried to summarize my research so that everyone can understand.
R. Baeza-Yates and M. Regnier published in 1990 a new algorithm for searching a two dimensional mm pattern in a two dimensional nn text. The publication is very well written and quite understandable for a novice like me, the algorithm is described in pseudocode and I was able to implements it successfully.
One part of the BYR algorithm requires the Aho-Corasick algorithm. This allows to search occurences of multiple keywords in a string text. However, they also say that this part of their algorithm can be greatly improved by using Aho-Corasick not, but Commentz-Walter algorithm (based on Boyer-Moore rather than Knuth-Morris-Pratt algorithm). They evoke an alternative to the Commentz-Walter algorithm, alternative that they themselves developed. This is described and explained in their previous publication (see 4th chapter).
This is where my problem lies. As I said, the algorithm goes through the text and check if it contains a word from the set of keywords. The words are arranged upside down and placed in a tree. To be efficient, it will sometimes be necessary to skip a number of letters, when he knows that there is no match found.
To determine the number of characters that can be skipped, two tables d and dd have to be computed. Then, the algorithm is very simple:
The algorithm works as follows:
We align the root of the trie with position m in the text, and we start matching the text from right to left following the corresponding
path in the trie.
If a match is found (final node), we output the index of the corresponding string.
After a match or mismatch, we move the trie further in the text using the maximum of the shift associated to the current node (means dd), and the
value of d[x], where x is the character in the text corresponding to
the root of the trie.
Start matching the trie again from right to left in the new position.
My problem is that I do not know how to compute the dd function. In their publication, R. Baeza-Yates and M. Regnier propose a formal definition of it:
pi is a word among the set of keyword, j is the index of a letter in this word, so pi[j] is like a node in the previous trie I showed. Number in the node represented dd(node). L is the number of words, and mi is the number of letters in the word pi.
They give no indication concerning the construction of this function. They only recommend to watch the work of W. Rytter. This document builds a function similar to that expected, the difference being that in this case, there is only one keyword and not a set.
The definiton of dd (called D here), is as follow:
It may be noted similarities with the previous definition, but I do not understand everything.
The pseudocode for the construction of this function is given in the paper, I have implemented it, here in C++:
int pattern[] = { 1, 2, 3, 1 }; /* I use int instead of char, simpler */
const int n = sizeof(pattern) / 4;
int D[n];
int f[n];
int j = n;
int t = n + 1;
for (int k = 1; k <= n; k++){
D[k-1] = 2 * n - k;
}
while (j > 0) {
f[j-1] = t;
while (t <= n) {
if (pattern[j-1] != pattern[t-1]) {
D[t-1] = min(D[t-1], n - j);
t = f[t-1];
}
else {
break;
}
}
t = t - 1;
j = j - 1;
}
int f1[n];
int q = t;
t = n + 1 - q;
int q1 = 1;
int j1 = 1;
int t1 = 0;
while (j1 <= t) {
f1[j1 - 1] = t1;
while (t1 >= 1) {
if (pattern[j1 - 1] != pattern[t1 - 1]) {
t1 = f1[t1 - 1];
}
else {
break;
}
}
t1 = t1 + 1;
j1 = j1 + 1;
}
while (q < n) {
for (int k = q1; k <= q; k++) {
D[k - 1] = min(D[k - 1], n + q - k);
}
q1 = q + 1;
q = q + t - f1[t - 1];
t = f1[t - 1];
}
for (int i = 0; i < n; i++)
{
cout << D[i] << " ";
}
It works, but I do not know how to expand it for several words, I do not know how to coincide with the formal definition of dd given by Baeza-Yates and Régnier. I said that the two definitions was similar, but I do not know to what extent.
I did not find any other information about their algorithm, it is impossible for me to know how to implement the construction of dd, but I am looking for someone who could perhaps understand and show me how to get there, explaining me the link between the definitions of D and dd.
I think d[x] corresponds to the bad character rule in http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm and D corresponds to the Good Suffix rule in the same article. This would mean that x in d[x] is not the character in the root of the tree, but the value of the first character in the text being searched that fails to match a child of the current node.
I think the idea is the same as Boyer-Moore. You move along the tree as long as you have a match, and when you have a mismatch you know two things: the character causing the mismatch, and the substring you have matched so far. Taking each of these things independently, you may be able to work out that if you shifted along the text being searched 1,2,..k positions you still wouldn't have a match, because at these offsets the character that caused a mismatch would still cause a mismatch, or the portion of the text that previously matched would not match at this shifted offset. So you can skip on to the first offset not ruled out by either value.
Actually, this suggests a variant scheme, in which d and DD provide not numbers but bit-masks, and you and together the two bitmaps and shift according to the position of the first bit that is still set. Presumably this doesn't save you enough to be worth the extra set-up time.

Modify lazy propogation in segment tree

I recently read about lazy propogation in segment tree and coded it too.But i got stuck when suppose instead of adding value(=val) i need to divide by value.How to do it ?
Please help
My update function is as follow :
void update_tree(int node, int a, int b, int i, int j, int value) {
if(lazy[node] != 0) { // This node needs to be updated
tree[node] += lazy[node]; // Update it
if(a != b) {
lazy[node*2] += lazy[node]; // Mark child as lazy
lazy[node*2+1] += lazy[node]; // Mark child as lazy
}
lazy[node] = 0; // Reset it
}
if(a > b || a > j || b < i) // Current segment is not within range [i, j]
return;
if(a >= i && b <= j) { // Segment is fully within range
tree[node] += value;
if(a != b) { // Not leaf node
lazy[node*2] += value;
lazy[node*2+1] += value;
}
return;
}
update_tree(node*2, a, (a+b)/2, i, j, value); // Updating left child
update_tree(1+node*2, 1+(a+b)/2, b, i, j, value); // Updating right child
tree[node] = max(tree[node*2], tree[node*2+1]); // Updating root with max value
}
HINTS
Suppose you need to divide by a fixed value of K.
One possibility would be to convert your numbers to base K and in each node maintain an array of numbers A[], where A[i] is the total in all lower nodes of all digits in position i (when thought of as a base K number).
So, for example, if K was 10, then A[0] would store the total of all the units, while A[1] would store the total of all the tens.
The reason to do this is that it then becomes easy to divide lazily by K, all you need to do is set A[i]=A[i+1] and you can use the same lazy update trick as in your code.
EXAMPLE
Suppose we had an array 5,11,20,100 and K was 10
We would construct a node for element 5,11 containing the value:
Total = A[1]*10+A[0]*1 with A[1]=1 and A[0]=5+1 (the sum of the unit values)
we would also have a node for 20,100 containing the value:
Total = A[2]*100+A[1]*10+A[0]*1 with A[2]=1,A[1]=2,A[0]=0
and a node for the entire 5,11,20,100 array with:
Total = A[2]*100+A[1]*10+A[0]*1 with A[2]=1,A[1]=2+1,A[0]=5+1
If we then wanted to divide the whole array by 10, we would simply change the array elements for the top node:
A=[1,3,6] changes to [0,1,3]
and then we could query the sum of all the node by computing:
Total = A[2]*100+A[1]*10+A[0]*1 = 0*100+1*10+3*1=13
which is the same as
(5/10=0)+(11/10=1)+(20/10=2)+(100/10=10)

Randomly permute N first elements of a singly linked list

I have to permute N first elements of a singly linked list of length n, randomly. Each element is defined as:
typedef struct E_s
{
struct E_s *next;
}E_t;
I have a root element and I can traverse the whole linked list of size n. What is the most efficient technique to permute only N first elements (starting from root) randomly?
So, given a->b->c->d->e->f->...x->y->z I need to make smth. like f->a->e->c->b->...x->y->z
My specific case:
n-N is about 20% relative to n
I have limited RAM resources, the best algorithm should make it in place
I have to do it in a loop, in many iterations, so the speed does matter
The ideal randomness (uniform distribution) is not required, it's Ok if it's "almost" random
Before making permutations, I traverse the N elements already (for other needs), so maybe I could use this for permutations as well
UPDATE: I found this paper. It states it presents an algorithm of O(log n) stack space and expected O(n log n) time.
I've not tried it, but you could use a "randomized merge-sort".
To be more precise, you randomize the merge-routine. You do not merge the two sub-lists systematically, but you do it based on a coin toss (i.e. with probability 0.5 you select the first element of the first sublist, with probability 0.5 you select the first element of the right sublist).
This should run in O(n log n) and use O(1) space (if properly implemented).
Below you find a sample implementation in C you might adapt to your needs. Note that this implementation uses randomisation at two places: In splitList and in merge. However, you might choose just one of these two places. I'm not sure if the distribution is random (I'm almost sure it is not), but some test cases yielded decent results.
#include <stdio.h>
#include <stdlib.h>
#define N 40
typedef struct _node{
int value;
struct _node *next;
} node;
void splitList(node *x, node **leftList, node **rightList){
int lr=0; // left-right-list-indicator
*leftList = 0;
*rightList = 0;
while (x){
node *xx = x->next;
lr=rand()%2;
if (lr==0){
x->next = *leftList;
*leftList = x;
}
else {
x->next = *rightList;
*rightList = x;
}
x=xx;
lr=(lr+1)%2;
}
}
void merge(node *left, node *right, node **result){
*result = 0;
while (left || right){
if (!left){
node *xx = right;
while (right->next){
right = right->next;
}
right->next = *result;
*result = xx;
return;
}
if (!right){
node *xx = left;
while (left->next){
left = left->next;
}
left->next = *result;
*result = xx;
return;
}
if (rand()%2==0){
node *xx = right->next;
right->next = *result;
*result = right;
right = xx;
}
else {
node *xx = left->next;
left->next = *result;
*result = left;
left = xx;
}
}
}
void mergeRandomize(node **x){
if ((!*x) || !(*x)->next){
return;
}
node *left;
node *right;
splitList(*x, &left, &right);
mergeRandomize(&left);
mergeRandomize(&right);
merge(left, right, &*x);
}
int main(int argc, char *argv[]) {
srand(time(NULL));
printf("Original Linked List\n");
int i;
node *x = (node*)malloc(sizeof(node));;
node *root=x;
x->value=0;
for(i=1; i<N; ++i){
node *xx;
xx = (node*)malloc(sizeof(node));
xx->value=i;
xx->next=0;
x->next = xx;
x = xx;
}
x=root;
do {
printf ("%d, ", x->value);
x=x->next;
} while (x);
x = root;
node *left, *right;
mergeRandomize(&x);
if (!x){
printf ("Error.\n");
return -1;
}
printf ("\nNow randomized:\n");
do {
printf ("%d, ", x->value);
x=x->next;
} while (x);
printf ("\n");
return 0;
}
Convert to an array, use a Fisher-Yates shuffle, and convert back to a list.
I don't believe there's any efficient way to randomly shuffle singly-linked lists without an intermediate data structure. I'd just read the first N elements into an array, perform a Fisher-Yates shuffle, then reconstruct those first N elements into the singly-linked list.
First, get the length of the list and the last element. You say you already do a traversal before randomization, that would be a good time.
Then, turn it into a circular list by linking the first element to the last element. Get four pointers into the list by dividing the size by four and iterating through it for a second pass. (These pointers could also be obtained from the previous pass by incrementing once, twice, and three times per four iterations in the previous traversal.)
For the randomization pass, traverse again and swap pointers 0 and 2 and pointers 1 and 3 with 50% probability. (Do either both swap operations or neither; just one swap will split the list in two.)
Here is some example code. It looks like it could be a little more random, but I suppose a few more passes could do the trick. Anyway, analyzing the algorithm is more difficult than writing it :vP . Apologies for the lack of indentation; I just punched it into ideone in the browser.
http://ideone.com/9I7mx
#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;
struct list_node {
int v;
list_node *n;
list_node( int inv, list_node *inn )
: v( inv ), n( inn) {}
};
int main() {
srand( time(0) );
// initialize the list and 4 pointers at even intervals
list_node *n_first = new list_node( 0, 0 ), *n = n_first;
list_node *p[4];
p[0] = n_first;
for ( int i = 1; i < 20; ++ i ) {
n = new list_node( i, n );
if ( i % (20/4) == 0 ) p[ i / (20/4) ] = n;
}
// intervals must be coprime to list length!
p[2] = p[2]->n;
p[3] = p[3]->n;
// turn it into a circular list
n_first->n = n;
// swap the pointers around to reshape the circular list
// one swap cuts a circular list in two, or joins two circular lists
// so perform one cut and one join, effectively reordering elements.
for ( int i = 0; i < 20; ++ i ) {
list_node *p_old[4];
copy( p, p + 4, p_old );
p[0] = p[0]->n;
p[1] = p[1]->n;
p[2] = p[2]->n;
p[3] = p[3]->n;
if ( rand() % 2 ) {
swap( p_old[0]->n, p_old[2]->n );
swap( p_old[1]->n, p_old[3]->n );
}
}
// you might want to turn it back into a NULL-terminated list
// print results
for ( int i = 0; i < 20; ++ i ) {
cout << n->v << ", ";
n = n->n;
}
cout << '\n';
}
For the case when N is really big (so it doesn't fit your memory), you can do the following (a sort of Knuth's 3.4.2P):
j = N
k = random between 1 and j
traverse the input list, find k-th item and output it; remove the said item from the sequence (or mark it somehow so that you won't consider it at the next traversal)
decrease j and return to 2 unless j==0
output the rest of the list
Beware that this is O(N^2), unless you can ensure random access in the step 3.
In case the N is relatively small, so that N items fit into the memory, just load them into array and shuffle, like #Mitch proposes.
If you know both N and n, I think you can do it simply. It's fully random, too. You only iterate through the whole list once, and through the randomized part each time you add a node. I think that's O(n+NlogN) or O(n+N^2). I'm not sure. It's based upon updating the conditional probability that a node is selected for the random portion given what happened to previous nodes.
Determine the probability that a certain node will be selected for the random portion given what happened to previous nodes (p=(N-size)/(n-position) where size is number of nodes previously chosen and position is number of nodes previously considered)
If node is not selected for random part, move to step 4. If node is selected for the random part, randomly choose place in random part based upon the size so far (place=(random between 0 and 1) * size, size is again number of previous nodes).
Place the node where it needs to go, update the pointers. Increment size. Change to looking at the node that previously pointed at what you were just looking at and moved.
Increment position, look at the next node.
I don't know C, but I can give you the pseudocode. In this, I refer to the permutation as the first elements that are randomized.
integer size=0; //size of permutation
integer position=0 //number of nodes you've traversed so far
Node head=head of linked list //this holds the node at the head of your linked list.
Node current_node=head //Starting at head, you'll move this down the list to check each node, whether you put it in the list.
Node previous=head //stores the previous node for changing pointers. starts at head to avoid asking for the next field on a null node
While ((size not equal to N) or (current_node is not null)){ //iterating through the list until the permutation is full. We should never pass the end of list, but just in case, I include that condition)
pperm=(N-size)/(n-position) //probability that a selected node will be in the permutation.
if ([generate a random decimal between 0 and 1] < pperm) //this decides whether or not the current node will go in the permutation
if (j is not equal to 0){ //in case we are at start of list, there's no need to change the list
pfirst=1/(size+1) //probability that, if you select a node to be in the permutation, that it will be first. Since the permutation has
//zero elements at start, adding an element will make it the initial node of a permutation and percent chance=1.
integer place_in_permutation = round down([generate a random decimal between 0 and 1]/pfirst) //place in the permutation. note that the head =0.
previous.next=current_node.next
if(place_in_permutation==0){ //if placing current node first, must change the head
current_node.next=head //set the current Node to point to the previous head
head=current_node //set the variable head to point to the current node
}
else{
Node temp=head
for (counter starts at zero. counter is less than place_in_permutation-1. Each iteration, increment counter){
counter=counter.next
} //at this time, temp should point to the node right before the insertion spot
current_node.next=temp.next
temp.next=current_node
}
current_node=previous
}
size++ //since we add one to the permutation, increase the size of the permutation
}
j++;
previous=current_node
current_node=current_node.next
}
You could probably increase the efficiency if you held on to the most recently added node in case you had to add one to the right of it.
Similar to Vlad's answer, here is a slight improvement (statistically):
Indices in algorithm are 1 based.
Initialize lastR = -1
If N <= 1 go to step 6.
Randomize number r between 1 and N.
if r != N
4.1 Traverse the list to item r and its predecessor.
If lastR != -1
If r == lastR, your pointer for the of the r'th item predecessor is still there.
If r < lastR, traverse to it from the beginning of the list.
If r > lastR, traverse to it from the predecessor of the lastR'th item.
4.2 remove the r'th item from the list into a result list as the tail.
4.3 lastR = r
Decrease N by one and go to step 2.
link the tail of the result list to the head of the remaining input list. You now have the original list with the first N items permutated.
Since you do not have random access, this will reduce the traversing time you will need within the list (I assume that by half, so asymptotically, you won't gain anything).
O(NlogN) easy to implement solution that does not require extra storage:
Say you want to randomize L:
is L has 1 or 0 elements you are done
create two empty lists L1 and L2
loop over L destructively moving its elements to L1 or L2 choosing between the two at random.
repeat the process for L1 and L2 (recurse!)
join L1 and L2 into L3
return L3
Update
At step 3, L should be divided into equal sized (+-1) lists L1 and L2 in order to guaranty best case complexity (N*log N). That can be done adjusting the probability of one element going into L1 or L2 dynamically:
p(insert element into L1) = (1/2 * len0(L) - len(L1)) / len(L)
where
len(M) is the current number of elements in list M
len0(L) is the number of elements there was in L at the beginning of step 3
There is an algorithm takes O(sqrt(N)) space and O(N) time, for a singly linked list.
It does not generate a uniform distribution over all permutation sequence, but it can gives good permutation that is not easily distinguishable. The basic idea is similar to permute a matrix by rows and columns as described below.
Algorithm
Let the size of the elements to be N, and m = floor(sqrt(N)). Assuming a "square matrix" N = m*m will make this method much clear.
In the first pass, you should store the pointers of elements that is separated by every m elements as p_0, p_1, p_2, ..., p_m. That is, p_0->next->...->next(m times) == p_1 should be true.
Permute each row
For i = 0 to m do:
Index all elements between p_i->next to p_(i+1)->next in the link list by an array of size O(m)
Shuffle this array using standard method
Relink the elements using this shuffled array
Permute each column.
Initialize an array A to store pointers p_0, ..., p_m. It is used to traverse the columns
For i = 0 to m do
Index all elements pointed A[0], A[1], ..., A[m-1] in the link list by an array of size m
Shuffle this array
Relink the elements using this shuffled array
Advance the pointer to next column A[i] := A[i]->next
Note that p_0 is an element point to the first element and the p_m point to the last element. Also, if N != m*m, you may use m+1 separation for some p_i instead. Now you get a "matrix" such that the p_i point to the start of each row.
Analysis and randomness
Space complexity: This algorithm need O(m) space to store the start of row. O(m) space to store the array and O(m) space to store the extra pointer during column permutation. Hence, time complexity is ~ O(3*sqrt(N)). For N = 1000000, it is around 3000 entries and 12 kB memory.
Time complexity: It is obviously O(N). It either walk through the "matrix" row by row or column by column
Randomness: The first thing to note is that each element can go to anywhere in the matrix by row and column permutation. It is very important that elements can go to anywhere in the linked list. Second, though it does not generate all permutation sequence, it does generate part of them. To find the number of permutation, we assume N=m*m, each row permutation has m! and there is m row, so we have (m!)^m. If column permutation is also include, it is exactly equal to (m!)^(2*m), so it is almost impossible to get the same sequence.
It is highly recommended to repeat the second and third step by at least one more time to get an more random sequence. Because it can suppress almost all the row and column correlation to its original location. It is also important when your list is not "square". Depends on your need, you may want to use even more repetition. The more repetition you use, the more permutation it can be and the more random it is. I remember that it is possible to generate uniform distribution for N=9 and I guess that it is possible to prove that as repetition tends to infinity, it is the same as the true uniform distribution.
Edit: The time and space complexity is tight bound and is almost the same in any situation. I think this space consumption can satisfy your need. If you have any doubt, you may try it in a small list and I think you will find it useful.
The list randomizer below has complexity O(N*log N) and O(1) memory usage.
It is based on the recursive algorithm described on my other post modified to be iterative instead of recursive in order to eliminate the O(logN) memory usage.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef struct node {
struct node *next;
char *str;
} node;
unsigned int
next_power_of_two(unsigned int v) {
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
return v + 1;
}
void
dump_list(node *l) {
printf("list:");
for (; l; l = l->next) printf(" %s", l->str);
printf("\n");
}
node *
array_to_list(unsigned int len, char *str[]) {
unsigned int i;
node *list;
node **last = &list;
for (i = 0; i < len; i++) {
node *n = malloc(sizeof(node));
n->str = str[i];
*last = n;
last = &n->next;
}
*last = NULL;
return list;
}
node **
reorder_list(node **last, unsigned int po2, unsigned int len) {
node *l = *last;
node **last_a = last;
node *b = NULL;
node **last_b = &b;
unsigned int len_a = 0;
unsigned int i;
for (i = len; i; i--) {
double pa = (1.0 + RAND_MAX) * (po2 - len_a) / i;
unsigned int r = rand();
if (r < pa) {
*last_a = l;
last_a = &l->next;
len_a++;
}
else {
*last_b = l;
last_b = &l->next;
}
l = l->next;
}
*last_b = l;
*last_a = b;
return last_b;
}
unsigned int
min(unsigned int a, unsigned int b) {
return (a > b ? b : a);
}
randomize_list(node **l, unsigned int len) {
unsigned int po2 = next_power_of_two(len);
for (; po2 > 1; po2 >>= 1) {
unsigned int j;
node **last = l;
for (j = 0; j < len; j += po2)
last = reorder_list(last, po2 >> 1, min(po2, len - j));
}
}
int
main(int len, char *str[]) {
if (len > 1) {
node *l;
len--; str++; /* skip program name */
l = array_to_list(len, str);
randomize_list(&l, len);
dump_list(l);
}
return 0;
}
/* try as: a.out list of words foo bar doz li 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
*/
Note that this version of the algorithm is completely cache unfriendly, the recursive version would probably perform much better!
If both the following conditions are true:
you have plenty of program memory (many embedded hardwares execute directly from flash);
your solution does not suffer that your "randomness" repeats often,
Then you can choose a sufficiently large set of specific permutations, defined at programming time, write a code to write the code that implements each, and then iterate over them at runtime.