Time complexity for this snippet - c++

I am solving the following question from LeetCode.com:
Given a binary tree, return the zigzag level order traversal of its nodes' values. (ie, from left to right, then right to left for the next level and alternate between).
For the tree:
3
/ \
9 20
/ \
15 7
The answer is:
[
[3],
[20,9],
[15,7]
]
I came up with the following code:
/**
* Definition for a binary tree node.
* struct TreeNode {
* int val;
* TreeNode *left;
* TreeNode *right;
* TreeNode(int x) : val(x), left(NULL), right(NULL) {}
* };
*/
class Solution {
public:
vector<vector<int>> result;
void zigzagLevelOrderUtil(TreeNode* root, int level) {
if(root==NULL) return;
if(result.size()==level)
result.push_back(vector<int>());
result[level].push_back(root->val);
zigzagLevelOrderUtil(root->left, level+1);
zigzagLevelOrderUtil(root->right, level+1);
}
vector<vector<int>> zigzagLevelOrder(TreeNode* root) {
result.clear();
zigzagLevelOrderUtil(root, 0);
for(int i=0; i<result.size(); i++)
if(i%2!=0)
reverse(result[i].begin(), result[i].end()); //I think the complexity here is O(hn)
return result;
}
};
However, I am not sure about the complexity - is it O(hn) (where h is the height and n is the number of nodes); or is it just O(h). I am confused because technically, I am working (reversing) only on half of h; and I think it can have at max n/2 nodes. [Please see the comment in the code].
Could someone please confirm? Thanks!

Let's see.
Since there are h inner vectors in result, one might say the complexity is O(h.n), but in reality it is O(n).
Why is that? std::reverse is linear in the length of the range it is given. It looks like O(h.n) because there are h vectors in result, but remember that all those vectors together hold only n elements, which are the nodes of the tree.
Let's say we have a full binary tree with 3 levels including the root level. The result vector is something like [[x], [y, z], [t, w, m, n]]. 1 operation for result[0] and 1 for inner vector 0, 1 operation for result[1] and 2 for inner vector 1, 1 operation for result[2] and 4 for inner vector 2. So we have 10 operations, which equals h+n. (To keep the count simple, assume every level is reversed, not only the odd-indexed ones.)
As the inner vectors grow bigger, the step of moving to the next index of the outer vector becomes insignificant. The time complexity is O(h+n), as seen in the example, and since h <= n we can say it is O(n).
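The h+n count can be checked with a few lines of C++. This is only a sketch of the counting argument, not part of the original solution; the function name countLoopSteps is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Counts the steps of the reversal loop under the simplification used above:
// one step per outer index plus one step per element of every inner vector.
// (countLoopSteps is a hypothetical helper, not part of the LeetCode solution.)
std::size_t countLoopSteps(const std::vector<std::vector<int>>& result) {
    std::size_t steps = 0;
    for (const auto& level : result) {
        steps += 1;             // moving to this index of the outer vector
        steps += level.size();  // touching each node stored at this level
    }
    return steps;  // h + n, where h = result.size() and n = total node count
}
```

For the three-level example, passing {{1}, {2, 3}, {4, 5, 6, 7}} gives 3 + 7 = 10 steps.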

Related

Appropriate data structure for add and find queries

I have two types of queries.
1 X Y
Add element X, Y times in the collection.
2 N
Find Nth element in the sorted collection.
Number of queries < 5 * 10^5
X < 10^9
Y < 10^9
I tried STL set but it did not work.
I think we need balanced tree with each node containing two data values.
First value will be element X. And another will be prefix sum of all the Ys of elements smaller than or equal to value.
When we are adding element X, find the predecessor of that first value. Add the second value associated with the predecessor to Y.
When finding the Nth element, search the tree (by second value) for the value immediately lower than N.
How to efficiently implement this data structure ?
This can easily be done using segment tree data structure with complexity of O(Q*log(10^9))
We should use so called "sparse" segment tree so that we only create nodes when needed, instead of creating all nodes.
In every node we will save count of elements in range [L, R]
Now additions of some element y times can easily be done by traversing segment tree from root to leaf and updating the values (also creating nodes that do not exist yet).
Since the height of segment tree is logarithmic this takes log N time where N is our initial interval length (10^9)
Finding k-th element can easily be done using binary search on segment tree, since on every node we know the count of elements in some range, we can use this information to traverse left or right to the element which contains the k-th
Sample code (C++):
#include <bits/stdc++.h>
using namespace std;
#define ll long long
const int sz = 31*4*5*100000;
ll seg[sz];
int L[sz],R[sz];
int nxt = 2;
void IncNode(int c, int l, int r, int idx, int val)
{
if(l==r)
{
seg[c]+=val;
return;
}
int m = (l+r)/2;
if(idx <= m)
{
if(!L[c])L[c]=nxt++;
IncNode(L[c],l,m,idx,val);
}
else
{
if(!R[c])R[c]=nxt++;
IncNode(R[c],m+1,r,idx,val);
}
seg[c] = seg[L[c]] + seg[R[c]];
}
int FindKth(int c, int l, int r, ll k)
{
if(l==r)return r;
int m = (l+r)/2;
if(seg[L[c]] >= k)return FindKth(L[c],l,m,k);
return FindKth(R[c],m+1,r,k-seg[L[c]]);
}
int main()
{
ios::sync_with_stdio(0);cin.tie(0);cout.tie(0);
int Q;
cin>>Q;
int L = 0, R = 1e9;
while(Q--)
{
int type;
cin>>type;
if(type==1)
{
int x,y;
cin>>x>>y;
IncNode(1,L,R,x,y);
}
else
{
ll k; // N can exceed the int range: up to 5*10^5 additions of up to 10^9 copies each
cin>>k;
cout<<FindKth(1,L,R,k)<<"\n";
}
}
}
Maintaining a prefix sum in each node is not practical. It would mean that every time you add a new node, you have to update the prefix sum in every node succeeding it in the tree. Instead, you need to maintain subtree sums: each node should contain the sum of Y-values for its own key and the keys of all descendants. Maintaining subtree sums when the tree is updated should be straightforward.
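When you answer a query of type 2, at each node, you would descend into the left subtree if N is less than or equal to the subtree sum value S of the left child (I'm assuming N is 1-indexed). Otherwise, subtract S from N; if what remains is at most the current node's own count Y, the current node's key is the answer. If not, subtract that count as well and descend into the right subtree.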
By the way, if the entire set of X values is known in advance, then instead of a balanced BST, you could use a range tree or a binary indexed tree.
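A rough sketch of the binary indexed tree variant, assuming all X values can be read before any query is answered (the offline case). The struct name CountingBIT and its members are invented for this example. Counts are kept in 64-bit integers because the total can reach roughly 5*10^5 * 10^9.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Fenwick (binary indexed) tree over the sorted distinct X values.
// Hypothetical helper type; assumes every X is known up front.
struct CountingBIT {
    std::vector<long long> xs;    // sorted distinct X values
    std::vector<long long> tree;  // 1-based Fenwick array of counts
    int logn = 0;                 // highest bit used by the k-th descent

    explicit CountingBIT(std::vector<long long> allX) : xs(std::move(allX)) {
        std::sort(xs.begin(), xs.end());
        xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
        tree.assign(xs.size() + 1, 0);
        while ((std::size_t(1) << (logn + 1)) <= xs.size()) ++logn;
    }

    // Query type 1: add value x to the collection y times (x must be in xs).
    void add(long long x, long long y) {
        std::size_t i = std::lower_bound(xs.begin(), xs.end(), x) - xs.begin() + 1;
        for (; i < tree.size(); i += i & (~i + 1)) tree[i] += y;
    }

    // Query type 2: n-th smallest element, 1-indexed.
    // Assumes 1 <= n <= total number of stored elements.
    long long kth(long long n) const {
        std::size_t pos = 0;
        for (int b = logn; b >= 0; --b) {
            std::size_t next = pos + (std::size_t(1) << b);
            if (next < tree.size() && tree[next] < n) {
                pos = next;       // everything up to next is still below n
                n -= tree[next];
            }
        }
        return xs[pos];  // pos is the 0-based index of the answer in xs
    }
};
```

Both operations take O(log M) time, where M is the number of distinct X values, and the structure needs only O(M) memory.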

Finding kthSmallestElement in the BST

I am trying to solve the following question from LeetCode:
https://leetcode.com/problems/kth-smallest-element-in-a-bst/description/
The aim is, given a BST, we have to find out the Kth-smallest element in it and return its value.
I could come up with a O(n) time and space solution myself. But another solution which I wrote with online help is far better:
/**
* Definition for a binary tree node.
* struct TreeNode {
* int val;
* TreeNode *left;
* TreeNode *right;
* TreeNode(int x) : val(x), left(NULL), right(NULL) {}
* };
*/
class Solution {
public:
int kthSmallestUtil(TreeNode* root, int& k) {
if(!root) return -1;
int value=kthSmallestUtil(root->left, k);
if(!k) return value;
k--;
if(k==0) return root->val;
return kthSmallestUtil(root->right, k);
}
int kthSmallest(TreeNode* root, int k) {
return kthSmallestUtil(root, k);
}
};
I have understood the above solution. I also debugged it (https://onlinegdb.com/BJnoIkrLM) by inserting break points at 29, 30, 33 and 37. However, I still feel a bit uneasy because of the following reason:
In case of the call kthSmallestUtil(root->left, k);, we pass the original value of k; we then (understandably) decrement the value of k for the current root (since we are doing in order traversal). But, when we again recurse for kthSmallestUtil(root->right, k);, why don't we pass the original value of k? Why does the right child get a 'preferential' treatment - a decremented value of k?
I know because of debugging how the values of k change and we get the final answer.. But I am seeking some intuition behind using the original value of k for the left child and the decremented value of k for the right child.
This solution seems to assume an ordered binary search tree.
That means the left branch of the tree contains only values smaller than the current node's val. Thus it first recurses into the left branch, decrementing k along the way; then, if k is not 0, k is decremented for the current element. If k is still not 0, then the values in the right branch, all greater than the current node's value, are considered.
What you need to understand is that the k being decremented in the k--; line is not the original value of k but the value of k after the traversal of the entire left branch.
The recursive calls all modify the same k because k is passed by reference and not by value.
The code works more or less this way: go as deep as you can in the left branch of the BST. When you reach the leftmost leaf (the smallest value), decrement k and start searching in the remaining part of the BST. Because we have already visited the smallest value in the whole tree and we are searching for the kth smallest value, we must search for the (k-1)th smallest value in the rest of the tree (as we no longer take this leftmost leaf into account). And so, if k is equal to zero, the current node has the kth smallest value. Otherwise it is necessary to also search the right subtrees.
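One way to see that k is a single counter shared by the whole traversal is to write the same in-order walk with an explicit stack instead of recursion. The sketch below is an illustration, not the code from the question; kthSmallestIterative is a made-up name, and it returns -1 when the tree has fewer than k nodes, mirroring the original.

```cpp
#include <stack>

// Same node layout as the LeetCode definition quoted in the question.
struct TreeNode {
    int val;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
};

// In-order traversal with an explicit stack: k counts down once per visited
// node, no matter whether that node was reached through a left or a right link.
int kthSmallestIterative(TreeNode* root, int k) {
    std::stack<TreeNode*> pending;
    TreeNode* cur = root;
    while (cur || !pending.empty()) {
        while (cur) {            // go as far left as possible
            pending.push(cur);
            cur = cur->left;
        }
        cur = pending.top();     // smallest node not yet visited
        pending.pop();
        if (--k == 0) return cur->val;
        cur = cur->right;        // continue with the same, already reduced k
    }
    return -1;                   // fewer than k nodes in the tree
}
```

Here there is no "left call" or "right call" at all, only one counter that ticks down in sorted order, which is what the by-reference parameter achieves in the recursive version.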

What's time complexity of this algorithm for finding all Path Sum?

Path Sum Given a binary tree and a sum, find all root-to-leaf paths where each path's sum equals the given sum.
For example: sum = 11.
5
/ \
4 8
/ / \
2 -2 1
The answer is :
[
[5, 4, 2],
[5, 8, -2]
]
Personally I think, the time complexity = O(2^n), n is the number of
nodes of the given binary tree.
Thank you Vikram Bhat and David Grayson, the tight time
complexity = O(nlogn), n is the number of nodes in the given binary
tree.
The algorithm checks each node once, which costs O(n).
"vector<int> one_result(subList);" copies the entire path from subList to one_result each time, which costs O(logn), because the height is O(logn).
So finally, the time complexity = O(n * logn) =O(nlogn).
The idea of this solution is DFS[C++].
/**
* Definition for binary tree
* struct TreeNode {
* int val;
* TreeNode *left;
* TreeNode *right;
* TreeNode(int x) : val(x), left(NULL), right(NULL) {}
* };
*/
#include <vector>
using namespace std;
class Solution {
public:
vector<vector<int> > pathSum(TreeNode *root, int sum) {
vector<vector<int>> list;
// Input validation.
if (root == NULL) return list;
vector<int> subList;
int tmp_sum = 0;
helper(root, sum, tmp_sum, list, subList);
return list;
}
void helper(TreeNode *root, int sum, int tmp_sum,
vector<vector<int>> &list, vector<int> &subList) {
// Base case.
if (root == NULL) return;
if (root->left == NULL && root->right == NULL) {
// Have a try.
tmp_sum += root->val;
subList.push_back(root->val);
if (tmp_sum == sum) {
vector<int> one_result(subList);
list.push_back(one_result);
}
// Roll back.
tmp_sum -= root->val;
subList.pop_back();
return;
}
// Have a try.
tmp_sum += root->val;
subList.push_back(root->val);
// Do recursion.
helper(root->left, sum, tmp_sum, list, subList);
helper(root->right, sum, tmp_sum, list, subList);
// Roll back.
tmp_sum -= root->val;
subList.pop_back();
}
};
Though it seems that the time complexity is O(N), if you need to print all paths then it is O(N*logN). Suppose that you have a complete binary tree: then the total number of paths will be N/2 and each path will have logN nodes, so a total of O(N*logN) in the worst case.
Your algorithm looks correct, and the complexity should be O(n) because your helper function will run once for each node, and n is the number of nodes.
Update: Actually, it would be O(N*log(N)) because each time the helper function runs, it might print a path to the console consisting of O(log(N)) nodes, and it will run O(N) times.
TIME COMPLEXITY
The time complexity of the algorithm is O(N^2), where ‘N’ is the total number of nodes in the tree. This is due to the fact that we traverse each node once (which will take O(N)), and for every leaf node we might have to store its path which will take O(N).
We can calculate a tighter time complexity of O(NlogN) from the space complexity discussion below.
SPACE COMPLEXITY
If we ignore the space required for all paths list, the space complexity of the above algorithm will be O(N) in the worst case. This space will be used to store the recursion stack. The worst-case will happen when the given tree is a linked list (i.e., every node has only one child).
How can we estimate the space used for the all paths list? Take the example of the following balanced tree:
1
/ \
2 3
/ \ / \
4 5 6 7
Here we have seven nodes (i.e., N = 7). Since, for binary trees, there exists only one path to reach any leaf node, we can easily say that total root-to-leaf paths in a binary tree can’t be more than the number of leaves. As we know that there can’t be more than N/2 leaves in a binary tree, therefore the maximum number of elements in all paths list will be O(N/2) = O(N). Now, each of these paths can have many nodes in them. For a balanced binary tree (like above), each leaf node will be at maximum depth. As we know that the depth (or height) of a balanced binary tree is O(logN) we can say that, at the most, each path can have logN nodes in it. This means that the total size of the all paths list will be O(N*logN). If the tree is not balanced, we will still have the same worst-case space complexity.
From the above discussion, we can conclude that the overall space complexity of our algorithm is O(N*logN).
Also from the above discussion, since for each leaf node, in the worst case, we have to copy log(N) nodes to store its path, therefore the time complexity of our algorithm will also be O(N*logN).
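For the balanced case, the count can be written out explicitly. As a sketch, assume a perfect binary tree with h levels, so that every leaf sits at depth h (the symbol h is introduced here only for this derivation):

```latex
N = 2^{h} - 1, \qquad
\text{leaves} = 2^{h-1} = \frac{N+1}{2}, \qquad
\text{total path size} = \frac{N+1}{2} \cdot h
  = \frac{N+1}{2}\,\log_2 (N+1) = O(N \log N).
```

For the seven-node tree above, h = 3, so storing all four root-to-leaf paths takes 4 * 3 = 12 values.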
The worst case time complexity is not O(nlogn), but O(n^2).
to visit every node, we need O(n) time
to generate all paths, we have to add the nodes to the path for every valid path.
So the time taken is sum of len(path). To estimate an upper bound of the sum: the number of paths is bounded by n, the length of path is also bounded by n, so O(n^2) is an upper bound. Both worst case can be reached at the same time if the top half of the tree is a linear tree, and the bottom half is a complete binary tree, like this:
1
1
1
1
1
1 1
1 1 1 1
number of paths is n/4, and length of each path is n/2 + log(n/2) ~ n/2
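To make this count concrete, here is a small sketch that builds the shape described above (a chain on top, a complete binary tree at the bottom) and sums the lengths of all root-to-leaf paths. The names Node, buildComplete, buildChainThenComplete and totalPathLength are made up for this example, and the nodes are deliberately never freed to keep it short.

```cpp
// Minimal node type for the experiment; values are irrelevant here.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Complete binary tree with the given number of levels (0 levels = empty).
Node* buildComplete(int levels) {
    if (levels == 0) return nullptr;
    Node* n = new Node;
    n->left = buildComplete(levels - 1);
    n->right = buildComplete(levels - 1);
    return n;
}

// chainLen single-child nodes stacked on top of a complete tree.
Node* buildChainThenComplete(int chainLen, int levels) {
    Node* top = buildComplete(levels);
    for (int i = 0; i < chainLen; ++i) {
        Node* n = new Node;
        n->left = top;
        top = n;
    }
    return top;
}

// Sum over all leaves of the number of nodes on the root-to-leaf path,
// i.e. the total work needed to copy every path once.
long long totalPathLength(const Node* root, long long depth = 1) {
    if (!root) return 0;
    if (!root->left && !root->right) return depth;
    return totalPathLength(root->left, depth + 1) +
           totalPathLength(root->right, depth + 1);
}
```

With a chain of 4 nodes above a 3-level complete tree (11 nodes, the shape drawn above), there are 4 leaves and each path has 7 nodes, so the total is 28. A 3-level complete tree alone gives 12. Growing both halves together makes the total grow quadratically in the node count.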

printing all binary trees from inorder traversal

Came across this question in an interview.
Given inorder traversal of a binary tree. Print all the possible binary trees from it.
Initial thought:
If say we have only 2 elements in the array. Say 2,1.
Then two possible trees are
2
\
1
1
/
2
If 3 elements Say, 2,1,4. Then we have 5 possible trees.
2 1 4 2 4
\ / \ / \ /
1 2 4 1 4 2
\ / / \
4 2 1 1
So, basically if we have n elements, then we have n-1 branches (children, / or \).
We can arrange these n-1 branches in any order.
For n=3, n-1 = 2. So, we have 2 branches.
We can arrange the 2 branches in these ways:
/ \ \ / /\
/ \ / \
Initial attempt:
struct node *findTree(int *A,int l,int h)
{
node *root = NULL;
if(h < l)
return NULL;
for(int i=l;i<h;i++)
{
root = newNode(A[i]);
root->left = findTree(A,l,i-1);
root->right = findTree(A,i+1,h);
printTree(root);
cout<<endl;
}
}
This problem breaks down quite nicely into subproblems. Given an inorder traversal, after choosing a root we know that everything before that is the left subtree and everything after is the right subtree (either is possibly empty).
So to enumerate all possible trees, we just try all possible values for the root and recursively solve for the left & right subtrees (the number of such trees grows quite quickly though!)
antonakos provided code that shows how to do this, though that solution may use more memory than desirable. That could be addressed by adding more state to the recursion so it doesn't have to save lists of the answers for the left & right and combine them at the end; instead nesting these processes, and printing each tree as it is found.
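To put a number on how quickly the count grows, the same "choose a root, then solve left and right" recursion can be used to count the trees instead of building them. This is a sketch with a made-up function name (countTrees); the values it produces are the Catalan numbers, and the 64-bit result will overflow for larger n.

```cpp
#include <vector>

// Number of distinct binary trees with n nodes and a fixed inorder sequence.
// c[size] = sum over every root position of (left count) * (right count).
unsigned long long countTrees(unsigned n) {
    std::vector<unsigned long long> c(n + 1, 0);
    c[0] = 1;  // the empty tree
    for (unsigned size = 1; size <= n; ++size) {
        for (unsigned root = 0; root < size; ++root) {
            // root splits the sequence into `root` nodes on the left
            // and `size - 1 - root` nodes on the right
            c[size] += c[root] * c[size - 1 - root];
        }
    }
    return c[n];
}
```

This gives 2 trees for 2 elements and 5 trees for 3 elements, matching the drawings in the question, and 14 for 4 elements.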
I'd write one function for constructing the trees and another for printing them.
The construction of the trees goes like this:
#include <vector>
#include <iostream>
#include <boost/foreach.hpp>
struct Tree {
int value;
Tree* left;
Tree* right;
Tree(int value, Tree* left, Tree* right) :
value(value), left(left), right(right) {}
};
typedef std::vector<Tree*> Seq;
Seq all_trees(const std::vector<int>& xs, int from, int to)
{
Seq result;
if (from >= to) result.push_back(0);
else {
for (int i = from; i < to; i++) {
const Seq left = all_trees(xs, from, i);
const Seq right = all_trees(xs, i + 1, to);
BOOST_FOREACH(Tree* tl, left) {
BOOST_FOREACH(Tree* tr, right) {
result.push_back(new Tree(xs[i], tl, tr));
}
}
}
}
return result;
}
Seq all_trees(const std::vector<int>& xs)
{
return all_trees(xs, 0, (int)xs.size());
}
Observe that for each root value there are multiple trees that can be constructed from the values to the left and to the right of the root value. All combinations of these left and right trees are included.
Writing the pretty-printer is left as an exercise (a boring one), but we can test that the function indeed constructs the expected number of trees:
int main()
{
const std::vector<int> xs(3, 0); // 3 values gives 5 trees.
const Seq result = all_trees(xs);
std::cout << "Number of trees: " << result.size() << "\n";
}
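If a full pretty-printer is more than you need, a one-line textual form is often enough to tell the trees apart. The following is a minimal sketch, assuming the same Tree struct as above (repeated so the snippet compiles on its own); to_text and its output format are invented for this example.

```cpp
#include <string>

// Same layout as the Tree struct used by all_trees.
struct Tree {
    int value;
    Tree* left;
    Tree* right;
    Tree(int value, Tree* left, Tree* right)
        : value(value), left(left), right(right) {}
};

// Writes a tree as "(left value right)", with "." standing for an empty subtree.
std::string to_text(const Tree* t) {
    if (!t) return ".";
    return "(" + to_text(t->left) + " " + std::to_string(t->value) + " " +
           to_text(t->right) + ")";
}
```

A root 1 whose only child is a left child 2 comes out as "((. 2 .) 1 .)". Calling to_text on every element of the sequence returned by all_trees would list each shape on its own line.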

Randomly permute N first elements of a singly linked list

I have to permute N first elements of a singly linked list of length n, randomly. Each element is defined as:
typedef struct E_s
{
struct E_s *next;
}E_t;
I have a root element and I can traverse the whole linked list of size n. What is the most efficient technique to permute only N first elements (starting from root) randomly?
So, given a->b->c->d->e->f->...x->y->z I need to make something like f->a->e->c->b->...x->y->z
My specific case:
n-N is about 20% relative to n
I have limited RAM resources, the best algorithm should make it in place
I have to do it in a loop, in many iterations, so the speed does matter
The ideal randomness (uniform distribution) is not required, it's Ok if it's "almost" random
Before making permutations, I traverse the N elements already (for other needs), so maybe I could use this for permutations as well
UPDATE: I found this paper. It states it presents an algorithm of O(log n) stack space and expected O(n log n) time.
I've not tried it, but you could use a "randomized merge-sort".
To be more precise, you randomize the merge-routine. You do not merge the two sub-lists systematically, but you do it based on a coin toss (i.e. with probability 0.5 you select the first element of the first sublist, with probability 0.5 you select the first element of the right sublist).
This should run in O(n log n) and use O(1) space (if properly implemented).
Below you find a sample implementation in C you might adapt to your needs. Note that this implementation uses randomisation at two places: In splitList and in merge. However, you might choose just one of these two places. I'm not sure if the distribution is random (I'm almost sure it is not), but some test cases yielded decent results.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 40
typedef struct _node{
int value;
struct _node *next;
} node;
void splitList(node *x, node **leftList, node **rightList){
int lr=0; // left-right-list-indicator
*leftList = 0;
*rightList = 0;
while (x){
node *xx = x->next;
lr=rand()%2;
if (lr==0){
x->next = *leftList;
*leftList = x;
}
else {
x->next = *rightList;
*rightList = x;
}
x=xx;
lr=(lr+1)%2;
}
}
void merge(node *left, node *right, node **result){
*result = 0;
while (left || right){
if (!left){
node *xx = right;
while (right->next){
right = right->next;
}
right->next = *result;
*result = xx;
return;
}
if (!right){
node *xx = left;
while (left->next){
left = left->next;
}
left->next = *result;
*result = xx;
return;
}
if (rand()%2==0){
node *xx = right->next;
right->next = *result;
*result = right;
right = xx;
}
else {
node *xx = left->next;
left->next = *result;
*result = left;
left = xx;
}
}
}
void mergeRandomize(node **x){
if ((!*x) || !(*x)->next){
return;
}
node *left;
node *right;
splitList(*x, &left, &right);
mergeRandomize(&left);
mergeRandomize(&right);
merge(left, right, &*x);
}
int main(int argc, char *argv[]) {
srand(time(NULL));
printf("Original Linked List\n");
int i;
node *x = (node*)malloc(sizeof(node));
node *root=x;
x->value=0;
for(i=1; i<N; ++i){
node *xx;
xx = (node*)malloc(sizeof(node));
xx->value=i;
xx->next=0;
x->next = xx;
x = xx;
}
x=root;
do {
printf ("%d, ", x->value);
x=x->next;
} while (x);
x = root;
node *left, *right;
mergeRandomize(&x);
if (!x){
printf ("Error.\n");
return -1;
}
printf ("\nNow randomized:\n");
do {
printf ("%d, ", x->value);
x=x->next;
} while (x);
printf ("\n");
return 0;
}
Convert to an array, use a Fisher-Yates shuffle, and convert back to a list.
I don't believe there's any efficient way to randomly shuffle singly-linked lists without an intermediate data structure. I'd just read the first N elements into an array, perform a Fisher-Yates shuffle, then reconstruct those first N elements into the singly-linked list.
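A sketch of that approach in C++, assuming O(N) extra pointers are affordable (which may not hold under the asker's memory limits). E_t mirrors the element type from the question; shuffleFirstN is a name invented here, and std::shuffle stands in for a hand-written Fisher-Yates loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

struct E_t {
    E_t* next;  // same single link as in the question
};

// Shuffles the first N nodes of the list and returns the new head.
// Nodes after the first N keep their order. If the list is shorter than N,
// the whole list is shuffled.
template <class Rng>
E_t* shuffleFirstN(E_t* head, std::size_t N, Rng& rng) {
    std::vector<E_t*> nodes;
    E_t* cur = head;
    while (cur && nodes.size() < N) {  // collect pointers to the first N nodes
        nodes.push_back(cur);
        cur = cur->next;
    }
    if (nodes.size() < 2) return head;  // nothing to permute

    std::shuffle(nodes.begin(), nodes.end(), rng);  // uniform shuffle of the array

    for (std::size_t i = 0; i + 1 < nodes.size(); ++i) {
        nodes[i]->next = nodes[i + 1];  // relink in the shuffled order
    }
    nodes.back()->next = cur;           // reattach the untouched tail
    return nodes.front();
}
```

It can be called with any standard engine, for example a std::mt19937 seeded once outside the loop that performs the repeated shuffles.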
First, get the length of the list and the last element. You say you already do a traversal before randomization, that would be a good time.
Then, turn it into a circular list by linking the first element to the last element. Get four pointers into the list by dividing the size by four and iterating through it for a second pass. (These pointers could also be obtained from the previous pass by incrementing once, twice, and three times per four iterations in the previous traversal.)
For the randomization pass, traverse again and swap pointers 0 and 2 and pointers 1 and 3 with 50% probability. (Do either both swap operations or neither; just one swap will split the list in two.)
Here is some example code. It looks like it could be a little more random, but I suppose a few more passes could do the trick. Anyway, analyzing the algorithm is more difficult than writing it :vP . Apologies for the lack of indentation; I just punched it into ideone in the browser.
http://ideone.com/9I7mx
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <ctime>
using namespace std;
struct list_node {
int v;
list_node *n;
list_node( int inv, list_node *inn )
: v( inv ), n( inn) {}
};
int main() {
srand( time(0) );
// initialize the list and 4 pointers at even intervals
list_node *n_first = new list_node( 0, 0 ), *n = n_first;
list_node *p[4];
p[0] = n_first;
for ( int i = 1; i < 20; ++ i ) {
n = new list_node( i, n );
if ( i % (20/4) == 0 ) p[ i / (20/4) ] = n;
}
// intervals must be coprime to list length!
p[2] = p[2]->n;
p[3] = p[3]->n;
// turn it into a circular list
n_first->n = n;
// swap the pointers around to reshape the circular list
// one swap cuts a circular list in two, or joins two circular lists
// so perform one cut and one join, effectively reordering elements.
for ( int i = 0; i < 20; ++ i ) {
list_node *p_old[4];
copy( p, p + 4, p_old );
p[0] = p[0]->n;
p[1] = p[1]->n;
p[2] = p[2]->n;
p[3] = p[3]->n;
if ( rand() % 2 ) {
swap( p_old[0]->n, p_old[2]->n );
swap( p_old[1]->n, p_old[3]->n );
}
}
// you might want to turn it back into a NULL-terminated list
// print results
for ( int i = 0; i < 20; ++ i ) {
cout << n->v << ", ";
n = n->n;
}
cout << '\n';
}
For the case when N is really big (so it doesn't fit your memory), you can do the following (a sort of Knuth's 3.4.2P):
1. j = N
2. k = random between 1 and j
3. Traverse the input list, find the k-th item and output it; remove that item from the sequence (or mark it somehow so that you won't consider it at the next traversal).
4. Decrease j and return to step 2 unless j == 0.
5. Output the rest of the list.
Beware that this is O(N^2), unless you can ensure random access in step 3.
If N is relatively small, so that N items fit into memory, just load them into an array and shuffle, as @Mitch proposes.
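The selection procedure can also be done directly on the list, without extra storage, by unlinking the chosen node and splicing it in right after the already shuffled prefix. The sketch below is one possible reading of the steps, not a tested production routine. Node and selectionShuffleFirstN are made-up names, it picks k from 0 rather than 1, and it assumes the list really has at least N nodes.

```cpp
#include <cstddef>
#include <random>

struct Node {
    int value;
    Node* next;
};

// In-place O(N^2) shuffle of the first N nodes; returns the new head.
// Precondition (assumed, not checked): the list contains at least N nodes.
template <class Rng>
Node* selectionShuffleFirstN(Node* head, std::size_t N, Rng& rng) {
    Node** slot = &head;  // link where the next chosen node will be placed
    for (std::size_t remaining = N; remaining > 1; --remaining) {
        std::uniform_int_distribution<std::size_t> pick(0, remaining - 1);
        Node** link = slot;
        for (std::size_t k = pick(rng); k > 0; --k) {
            link = &(*link)->next;  // walk to the k-th not yet chosen node
        }
        Node* chosen = *link;
        *link = chosen->next;       // unlink it from its current position
        chosen->next = *slot;       // splice it in at the end of the prefix
        *slot = chosen;
        slot = &chosen->next;       // the shuffled prefix grew by one node
    }
    return head;
}
```

Each of the N rounds walks at most N links, which is where the quadratic cost comes from.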
If you know both N and n, I think you can do it simply. It's fully random, too. You only iterate through the whole list once, and through the randomized part each time you add a node. I think that's O(n+NlogN) or O(n+N^2). I'm not sure. It's based upon updating the conditional probability that a node is selected for the random portion given what happened to previous nodes.
1. Determine the probability that a certain node will be selected for the random portion given what happened to previous nodes (p=(N-size)/(n-position) where size is number of nodes previously chosen and position is number of nodes previously considered)
2. If node is not selected for random part, move to step 4. If node is selected for the random part, randomly choose place in random part based upon the size so far (place=(random between 0 and 1) * size, size is again number of previous nodes).
3. Place the node where it needs to go, update the pointers. Increment size. Change to looking at the node that previously pointed at what you were just looking at and moved.
4. Increment position, look at the next node.
I don't know C, but I can give you the pseudocode. In this, I refer to the permutation as the first elements that are randomized.
integer size=0; //size of permutation
integer position=0 //number of nodes you've traversed so far
Node head=head of linked list //this holds the node at the head of your linked list.
Node current_node=head //Starting at head, you'll move this down the list to check each node, whether you put it in the list.
Node previous=head //stores the previous node for changing pointers. starts at head to avoid asking for the next field on a null node
While ((size not equal to N) and (current_node is not null)){ //iterating through the list until the permutation is full. We should never pass the end of list, but just in case, I include that condition
pperm=(N-size)/(n-position) //probability that the current node will be in the permutation (use floating-point division).
if ([generate a random decimal between 0 and 1] < pperm){ //this decides whether or not the current node will go in the permutation
if (position is not equal to 0){ //in case we are at start of list, there's no need to change the list
pfirst=1/(size+1) //probability that, if you select a node to be in the permutation, that it will be first. Since the permutation has
//zero elements at start, adding an element will make it the initial node of a permutation and percent chance=1.
integer place_in_permutation = round down([generate a random decimal between 0 and 1]/pfirst) //place in the permutation. note that the head =0.
previous.next=current_node.next
if(place_in_permutation==0){ //if placing current node first, must change the head
current_node.next=head //set the current Node to point to the previous head
head=current_node //set the variable head to point to the current node
}
else{
Node temp=head
for (counter starts at zero. counter is less than place_in_permutation-1. Each iteration, increment counter){
temp=temp.next
} //at this time, temp should point to the node right before the insertion spot
current_node.next=temp.next
temp.next=current_node
if (temp is the same node as previous) previous=current_node //the node went back to where it was, so do not step back onto it
}
current_node=previous
}
size++ //since we add one to the permutation, increase the size of the permutation
}
position++;
previous=current_node
current_node=current_node.next
}
You could probably increase the efficiency if you held on to the most recently added node in case you had to add one to the right of it.
Similar to Vlad's answer, here is a slight improvement (statistically):
Indices in algorithm are 1 based.
1. Initialize lastR = -1.
2. If N <= 1 go to step 6.
3. Randomize number r between 1 and N.
4. If r != N:
4.1 Traverse the list to item r and its predecessor.
If lastR != -1:
If r == lastR, your pointer to the predecessor of the r'th item is still there.
If r < lastR, traverse to it from the beginning of the list.
If r > lastR, traverse to it from the predecessor of the lastR'th item.
4.2 Remove the r'th item from the list into a result list as the tail.
4.3 lastR = r
5. Decrease N by one and go to step 2.
6. Link the tail of the result list to the head of the remaining input list. You now have the original list with the first N items permuted.
Since you do not have random access, this will reduce the traversing time you will need within the list (I assume that by half, so asymptotically, you won't gain anything).
O(NlogN) easy to implement solution that does not require extra storage:
Say you want to randomize L:
1. If L has 1 or 0 elements, you are done.
2. Create two empty lists L1 and L2.
3. Loop over L destructively, moving its elements to L1 or L2, choosing between the two at random.
4. Repeat the process for L1 and L2 (recurse!).
5. Join L1 and L2 into L3.
6. Return L3.
Update
At step 3, L should be divided into equal sized (+-1) lists L1 and L2 in order to guarantee best case complexity (N*log N). That can be done by adjusting the probability of an element going into L1 or L2 dynamically:
p(insert element into L1) = (1/2 * len0(L) - len(L1)) / len(L)
where
len(M) is the current number of elements in list M
len0(L) is the number of elements there was in L at the beginning of step 3
There is an algorithm that takes O(sqrt(N)) space and O(N) time for a singly linked list.
It does not generate a uniform distribution over all permutations, but it can give a good permutation that is not easily distinguishable from a uniform one. The basic idea is similar to permuting a matrix by rows and columns, as described below.
Algorithm
Let the number of elements be N, and m = floor(sqrt(N)). Assuming a "square matrix" N = m*m will make this method much clearer.
In the first pass, you should store pointers to the elements that are m elements apart as p_0, p_1, p_2, ..., p_m. That is, p_0->next->...->next(m times) == p_1 should be true.
Permute each row
For i = 0 to m do:
Index all elements between p_i->next to p_(i+1)->next in the link list by an array of size O(m)
Shuffle this array using standard method
Relink the elements using this shuffled array
Permute each column.
Initialize an array A to store pointers p_0, ..., p_m. It is used to traverse the columns
For i = 0 to m do
Index all elements pointed A[0], A[1], ..., A[m-1] in the link list by an array of size m
Shuffle this array
Relink the elements using this shuffled array
Advance the pointer to next column A[i] := A[i]->next
Note that p_0 points to the first element and p_m points to the last element. Also, if N != m*m, you may use a separation of m+1 for some p_i instead. Now you get a "matrix" such that the p_i point to the start of each row.
Analysis and randomness
Space complexity: This algorithm needs O(m) space to store the start of each row, O(m) space to store the array, and O(m) space to store the extra pointers during column permutation. Hence, the space complexity is ~ O(3*sqrt(N)). For N = 1000000, it is around 3000 entries and 12 kB memory.
Time complexity: It is obviously O(N). It walks through the "matrix" either row by row or column by column.
Randomness: The first thing to note is that each element can go anywhere in the matrix through row and column permutation. It is very important that elements can go anywhere in the linked list. Second, though it does not generate all permutation sequences, it does generate part of them. To find the number of permutations, we assume N=m*m; each row permutation has m! possibilities and there are m rows, so we have (m!)^m. If column permutation is also included, there are (m!)^(2*m) combinations of row and column shuffles (not all of which need to produce distinct sequences), so it is almost impossible to get the same sequence.
It is highly recommended to repeat the second and third steps at least one more time to get a more random sequence, because this suppresses almost all the correlation between an element's row and column and its original location. It is also important when your list is not "square". Depending on your needs, you may want to use even more repetitions. The more repetitions you use, the more permutations it can produce and the more random it is. I remember that it is possible to generate a uniform distribution for N=9, and I guess that it is possible to prove that as the number of repetitions tends to infinity, it is the same as the true uniform distribution.
Edit: The time and space complexity is a tight bound and is almost the same in any situation. I think this space consumption can satisfy your need. If you have any doubt, you may try it on a small list and I think you will find it useful.
The list randomizer below has complexity O(N*log N) and O(1) memory usage.
It is based on the recursive algorithm described on my other post modified to be iterative instead of recursive in order to eliminate the O(logN) memory usage.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef struct node {
struct node *next;
char *str;
} node;
unsigned int
next_power_of_two(unsigned int v) {
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
return v + 1;
}
void
dump_list(node *l) {
printf("list:");
for (; l; l = l->next) printf(" %s", l->str);
printf("\n");
}
node *
array_to_list(unsigned int len, char *str[]) {
unsigned int i;
node *list;
node **last = &list;
for (i = 0; i < len; i++) {
node *n = malloc(sizeof(node));
n->str = str[i];
*last = n;
last = &n->next;
}
*last = NULL;
return list;
}
node **
reorder_list(node **last, unsigned int po2, unsigned int len) {
node *l = *last;
node **last_a = last;
node *b = NULL;
node **last_b = &b;
unsigned int len_a = 0;
unsigned int i;
for (i = len; i; i--) {
double pa = (1.0 + RAND_MAX) * (po2 - len_a) / i;
unsigned int r = rand();
if (r < pa) {
*last_a = l;
last_a = &l->next;
len_a++;
}
else {
*last_b = l;
last_b = &l->next;
}
l = l->next;
}
*last_b = l;
*last_a = b;
return last_b;
}
unsigned int
min(unsigned int a, unsigned int b) {
return (a > b ? b : a);
}
void
randomize_list(node **l, unsigned int len) {
unsigned int po2 = next_power_of_two(len);
for (; po2 > 1; po2 >>= 1) {
unsigned int j;
node **last = l;
for (j = 0; j < len; j += po2)
last = reorder_list(last, po2 >> 1, min(po2, len - j));
}
}
int
main(int len, char *str[]) {
if (len > 1) {
node *l;
len--; str++; /* skip program name */
l = array_to_list(len, str);
randomize_list(&l, len);
dump_list(l);
}
return 0;
}
/* try as: a.out list of words foo bar doz li 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
*/
Note that this version of the algorithm is completely cache unfriendly, the recursive version would probably perform much better!
If both of the following conditions are true:
you have plenty of program memory (many embedded devices execute directly from flash);
your solution does not suffer if the "randomness" repeats often,
then you can choose a sufficiently large set of specific permutations, defined at programming time, write code that generates the code implementing each one, and then iterate over them at runtime.