I have implemented a BST for a multiset using the C++ code below, where each node stores the number of occurrences num of each distinct value data, and I try to find the number of elements less than a certain value x using the order function below.
It works, but it is inefficient in terms of execution time.
Is there any method with better time complexity?
struct Node {
int data;
int height;
int num;
Node *left;
Node *right;
};
int order(Node *n, int x) {
int sum = 0;
if (n != NULL) {
if (n->data < x) {
sum += n->num;
sum += order(n->right, x);
}
sum += order(n->left, x);
}
return sum;
}
You can bring the algorithm down to O(logN) time by storing in each node the number of elements in the subtree of which it is the root. Then you'd only have to recurse on one of the two children of each node (go left if x < node->data, right if x > node->data), which if the tree is balanced only takes logarithmic time.
struct Node {
int data;
int height;
int num;
    int size; // number of elements (counting duplicates) in the subtree rooted at this node
Node *left;
Node *right;
};
int order(Node *n, int x) {
if(n == NULL) return 0;
// elements less than n->data make up the whole left subtree
if (x == n->data) {
return n->left ? n->left->size : 0;
}
// even smaller? just recurse left
else if (x < n->data) {
return order(n->left, x);
}
    // bigger? take the whole left subtree, this node's occurrences, and part of the right one
    else {
        return (n->left ? n->left->size : 0) + n->num + order(n->right, x);
}
}
Of course, now you have to keep track of the size, but this can be done very efficiently when updating the tree: simply recalculate the size of each modified node (n->left->size + n->right->size + n->num, treating a missing child as size 0) during an insertion or deletion.
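For illustration, here is a minimal sketch (my own, not from the answer) of a plain unbalanced insert that keeps size up to date under this rule; in a balanced tree, the nodes whose children change during a rotation would need their size recomputed the same way:

Node *insert(Node *n, int x) {
    if (n == NULL) {                  // new distinct value: one element, empty subtrees
        Node *leaf = new Node;
        leaf->data = x;
        leaf->height = 1;
        leaf->num = 1;
        leaf->size = 1;
        leaf->left = leaf->right = NULL;
        return leaf;
    }
    if (x < n->data)      n->left  = insert(n->left, x);
    else if (x > n->data) n->right = insert(n->right, x);
    else                  n->num++;   // duplicate value: only the count changes
    // recompute size bottom-up, counting duplicates via num
    n->size = (n->left  ? n->left->size  : 0)
            + (n->right ? n->right->size : 0)
            + n->num;
    return n;
}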
If you can add the size to your structure, I highly recommend using Dario Petrillo’s answer.
If you have to stick to your structure, you can reduce the number of instructions and recursions.
int count_all(Node* n) {
int acc = n->num;
if (n->left != NULL) acc += count_all(n->left);
if (n->right != NULL) acc += count_all(n->right);
return acc;
}
int order(Node *n, int x) {
if (n == NULL) return 0;
// Find the first left node which is < x
while (n->data >= x) {
n = n->left;
if (n == NULL) return 0;
}
assert(n != NULL && n->data < x);
int sum = n->num;
// Grab everything left because all of them are < x
if (n->left != NULL) sum += count_all(n->left);
// Some of the right nodes may be < x, some may not
// Repeat the algorithm to find out
if (n->right != NULL) sum += order(n->right, x);
return sum;
}
This reduces the number of recursive calls when the root is not less than x: the loop walks left without recursing until it finds the first node with n->data < x. It also removes a ton of unnecessary comparisons against x on the left side of a tree where you already know that everything is < x.
I need an algorithm which can find the median of a singly linked list in linear time complexity O(n) and constant space complexity O(1).
EDIT: The singly linked list is a C-style singly linked list. No STL allowed (no containers, no functions, everything STL is forbidden, e.g. no std::forward_list). I am not allowed to move the numbers into any other container (like an array).
It's acceptable to have a space complexity of O(log n), as in practice that will be under 100 for my lists. Also, I am not allowed to use STL functions like nth_element.
Basically I have a linked list with about 3 * 10^6 elements and I need to get the median within 3 seconds, so I can't sort the list first (that would be O(n log n) and would take maybe 10-14 seconds).
I've done some searching online and I've found that it's possible to find the median of an std::vector in O(n) time and O(1) space with quickselect (the worst case is O(n^2), but it is rare), for example: https://www.geeksforgeeks.org/quickselect-a-simple-iterative-implementation/
But I can't find any algorithm that does this for a linked list. The issue is that quickselect relies on random access by array index, which a vector has and a linked list does not, so adapting that algorithm makes the complexity much bigger. For example, when I move the pivot index to the left I actually have to traverse the list to reach that new element and continue from there (this gets me to at least O(kn) with a large k for my list, even approaching O(n^2)...).
EDIT 2:
I know I have too many variables but I've been testing different stuff and I am still working on my code...
My current code:
#include <bits/stdc++.h>
using namespace std;
template <class T> class Node {
public:
T data;
Node<T> *next;
};
template <class T> class List {
public:
Node<T> *first;
};
template <class T> T getMedianValue(List<T> & l) {
Node<T> *crt,*pivot,*incpivot;
int left, right, lung, idx, lungrel,lungrel2, left2, right2, aux, offset;
pivot = l.first;
crt = pivot->next;
lung = 1;
    //lung is the length of the linked list (yeah, it's "length" in Romanian...)
    //lungrel and lungrel2 are the relative lengths of the part of
    //the list I am processing, e.g. 2 3 4 in a list with 1 2 3 4 5
right = left = 0;
while (crt != NULL) {
if(crt->data < pivot->data){
aux = pivot->data;
pivot->data = crt->data;
crt->data = pivot->next->data;
pivot->next->data = aux;
pivot = pivot->next;
left++;
}
else right++;
// cout<<crt->data<<endl;
crt = crt->next;
lung++;
}
if(right > left) offset = left;
// cout<<endl;
// cout<<pivot->data<<" "<<left<<" "<<right<<endl;
// printList(l);
// cout<<endl;
lungrel = lung;
incpivot = l.first;
// offset = 0;
while(left != right){
//cout<<"parcurgere"<<endl;
if(left > right){
//cout<<endl;
//printList(l);
//cout<<endl;
//cout<<"testleft "<<incpivot->data<<" "<<left<<" "<<right<<endl;
crt = incpivot->next;
pivot = incpivot;
idx = offset;left2 = right2 = lungrel = 0;
//cout<<idx<<endl;
while(idx < left && crt!=NULL){
if(pivot->data > crt->data){
// cout<<"1crt "<<crt->data<<endl;
aux = pivot->data;
pivot->data = crt->data;
crt->data = pivot->next->data;
pivot->next->data = aux;
pivot = pivot->next;
left2++;lungrel++;
}
else {
right2++;lungrel++;
//cout<<crt->data<<" "<<right2<<endl;
}
//cout<<crt->data<<endl;
crt = crt->next;
idx++;
}
left = left2 + offset;
right = lung - left - 1;
if(right > left) offset = left;
//if(pivot->data == 18) return 18;
//cout<<endl;
//cout<<"l "<<pivot->data<<" "<<left<<" "<<right<<" "<<right2<<endl;
// printList(l);
}
else if(left < right && pivot->next!=NULL){
idx = left;left2 = right2 = 0;
incpivot = pivot->next;offset++;left++;
//cout<<endl;
//printList(l);
//cout<<endl;
//cout<<"testright "<<incpivot->data<<" "<<left<<" "<<right<<endl;
pivot = pivot->next;
crt = pivot->next;
lungrel2 = lungrel;
lungrel = 0;
// cout<<"p right"<<pivot->data<<" "<<left<<" "<<right<<endl;
while((idx < lungrel2 + offset - 1) && crt!=NULL){
if(crt->data < pivot->data){
// cout<<"crt "<<crt->data<<endl;
aux = pivot->data;
pivot->data = crt->data;
crt->data = (pivot->next)->data;
(pivot->next)->data = aux;
pivot = pivot->next;
// cout<<"crt2 "<<crt->data<<endl;
left2++;lungrel++;
}
else right2++;lungrel++;
//cout<<crt->data<<endl;
crt = crt->next;
idx++;
}
left = left2 + left;
right = lung - left - 1;
if(right > left) offset = left;
// cout<<"r "<<pivot->data<<" "<<left<<" "<<right<<endl;
// printList(l);
}
else{
//cout<<cmx<<endl;
return pivot->data;
}
}
//cout<<cmx<<endl;
return pivot->data;
}
template <class T> void printList(List<T> const & l) {
Node<T> *tmp;
if(l.first != NULL){
tmp = l.first;
while(tmp != NULL){
cout<<tmp->data<<" ";
tmp = tmp->next;
}
}
}
template <class T> void push_front(List<T> & l, int x)
{
Node<T>* tmp = new Node<T>;
tmp->data = x;
tmp->next = l.first;
l.first = tmp;
}
int main(){
List<int> l;
int n = 0;
push_front(l, 19);
push_front(l, 12);
push_front(l, 11);
push_front(l, 101);
push_front(l, 91);
push_front(l, 21);
push_front(l, 9);
push_front(l, 6);
push_front(l, 25);
push_front(l, 4);
push_front(l, 18);
push_front(l, 2);
push_front(l, 8);
push_front(l, 10);
push_front(l, 200);
push_front(l, 225);
push_front(l, 170);
printList(l);
n=getMedianValue(l);
cout<<endl;
cout<<n;
return 0;
}
Do you have any suggestion on how to adapt quickselect to a singly linked list, or another algorithm that would work for my problem?
In your question, you mentioned that you are having trouble selecting a pivot that is not at the start of the list, because this would require traversing the list. If you do it correctly, you only have to traverse the entire list twice:
once for finding the middle and the end of the list in order to select a good pivot (e.g. using the "median-of-three" rule)
once for the actual sorting
The first step is not necessary if you don't care much about selecting a good pivot and you are happy with simply selecting the first element of the list as the pivot (which causes worst case O(n^2) time complexity if the data is already sorted).
If you remember the end of the list the first time you traverse it by maintaining a pointer to the end, then you should never have to traverse it again to find the end. Also, if you are using the standard Lomuto partition scheme (which I am not using, for the reasons stated below), then you must also maintain two pointers into the list which represent the i and j indices of the standard Lomuto partition scheme. By using these pointers, you should never have to traverse the list to access a single element.
Also, if you maintain a pointer to the middle and the end of every partition, then, when you later must sort one of these partitions, you will not have to traverse that partition again to find the middle and end.
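As a small illustration of that first traversal, a helper along the following lines (my own sketch, not part of the implementation posted below) can record the length, the middle node and the last node of a partition in a single pass:

template <typename T>
void find_middle_and_end( Node<T> *head, Node<T> *&middle, Node<T> *&end, int &length )
{
    // One pass: remember the last node, and advance 'middle' every second step.
    // This is exactly the information the median-of-three pivot selection needs.
    middle = end = head;
    length = ( head != nullptr ) ? 1 : 0;
    for ( Node<T> *p = head; p != nullptr && p->next != nullptr; p = p->next )
    {
        end = p->next;
        length++;
        if ( length % 2 == 0 ) middle = middle->next;
    }
}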
I have now created my own implementation of the QuickSelect algorithm for linked lists, which I have posted below.
Since you stated that the linked list is singly-linked and cannot be upgraded to a doubly-linked list, I can't use the Hoare partition scheme, as iterating a singly-linked list backwards is very expensive. Therefore, I am using the generally less efficient Lomuto partition scheme instead.
When using the Lomuto partition scheme, either the first element or the last element is typically selected as a pivot. However, selecting either of those has the disadvantage that sorted data will cause the algorithm to have the worst-case time complexity of O(n^2). This can be prevented by selecting a pivot according to the "median-of-three" rule, which is to select a pivot from the median value of the first element, middle element and last element. Therefore, in my implementation, I am using this "median-of-three" rule.
Also, the Lomuto partition scheme typically creates two partitions, one for values smaller than the pivot and one for values larger than or equal to the pivot. However, this will cause the worst-case time complexity of O(n^2) if all values are identical. Therefore, in my implementation, I am creating three partitions, one for values smaller than the pivot, one for values larger than the pivot, and one for values equal to the pivot.
Although these measures don't completely eliminate the possibility of worst-case time complexity of O(n^2), they at least make it highly unlikely (unless the input is provided by a malicious attacker). In order to guarantee a time complexity of O(n), a more complex pivot selection algorithm would have to be used, such as median of medians.
One significant problem I encountered is that for an even number of elements, the median is defined as the arithmetic mean of the two "middle" or "median" elements. For this reason, I can't simply write a function similar to std::nth_element, because if, for example, the total number of elements is 14, then I will be looking for the 7th and 8th largest element. This means I would have to call such a function twice, which would be inefficient. Therefore, I have instead written a function which can search for the two "median" elements at once. Although this makes the code more complex, the performance penalty due to the additional code complexity should be minimal compared to the advantage of not having to call the same function twice.
Please note that although my implementation compiles perfectly on a C++ compiler, I wouldn't call it textbook C++ code, because the question states that I am not allowed to use anything from the C++ standard template library. Therefore, my code is rather a hybrid of C code and C++ code.
In the following code, I only use the standard template library (in particular the function std::nth_element) for testing my algorithm and for verifying the results. I do not use any of these functions in my actual algorithm.
#include <iostream>
#include <iomanip>
#include <cassert>
// The following headers are only required for testing the algorithm and verifying
// the correctness of its results. They are not used in the algorithm itself.
#include <random>
#include <algorithm>
#include <memory>
// The following setting can be changed to print extra debugging information
// possible settings:
// 0: no extra debugging information
// 1: print the state and length of all partitions in every loop iteration
// 2: additionally print the contents of all partitions (if they are not too big)
#define PRINT_DEBUG_LEVEL 0
template <typename T>
struct Node
{
T data;
Node<T> *next;
};
// NOTE:
// The return type is not necessarily the same as the data type. The reason for this is
// that, for example, the data type "int" requires a "double" as a return type, so that
// the arithmetic mean of "3" and "6" returns "4.5".
// This function may require template specializations to handle overflow or wrapping.
template<typename T, typename U>
U arithmetic_mean( const T &first, const T &second )
{
return ( static_cast<U>(first) + static_cast<U>(second) ) / 2;
}
//the main loop of the function find_median can be in one of the following three states
enum LoopState
{
//we are looking for one median value
LOOPSTATE_LOOKINGFORONE,
//we are looking for two median values, and the returned median
//will be the arithmetic mean of the two
LOOPSTATE_LOOKINGFORTWO,
//one of the median values has been found, but we are still searching for
//the second one
LOOPSTATE_FOUNDONE
};
template <
typename T, //type of the data
typename U //type of the return value
>
U find_median( Node<T> *list )
{
//This variable points to the pointer to the first element of the current partition.
//During the partition phase, the linked list will be broken and reassembled afterwards, so
//the pointer this pointer points to will be nullptr until it is reassembled.
Node<T> **pp_start = &list;
//This pointer represents nothing more than the cached value of *pp_start and it is
//not always valid
Node<T> *p_start = *pp_start;
//These pointers are maintained for accessing the middle of the list for selecting a pivot
//using the "median-of-three" rule.
Node<T> *p_middle;
Node<T> *p_end;
//result is not defined if list is empty
assert( p_start != nullptr );
//in the main loop, this variable always holds the number of elements in the current partition
int num_total = 1;
    // First, we must traverse the entire linked list in order to determine the number of elements,
    // which we need in order to calculate k. If the number of elements n is odd, then the median is
    // the element with zero-based rank k = (n - 1) / 2. If n is even, then the median is defined as
    // the arithmetic mean of the elements with zero-based ranks k and k + 1.
// We also set a pointer to the nodes in the middle and at the end, which will be required later
// for selecting a pivot according to the "median-of-three" rule.
p_middle = p_start;
for ( p_end = p_start; p_end->next != nullptr; p_end = p_end->next )
{
num_total++;
if ( num_total % 2 == 0 ) p_middle = p_middle->next;
}
// find out whether we are looking for only one or two median values
enum LoopState loop_state = num_total % 2 == 0 ? LOOPSTATE_LOOKINGFORTWO : LOOPSTATE_LOOKINGFORONE;
//set k to the index of the middle element, or if there are two middle elements, to the left one
int k = ( num_total - 1 ) / 2;
// If we are looking for two median values, but we have only found one, then this variable will
// hold the value of the one we found. Whether we have found one can be determined by the state of
// the variable loop_state.
T val_found;
for (;;)
{
//make p_start cache the value of *pp_start again, because a previous iteration of the loop
//may have changed the value of pp_start
p_start = *pp_start;
assert( p_start != nullptr );
assert( p_middle != nullptr );
assert( p_end != nullptr );
assert( num_total != 0 );
if ( num_total == 1 )
{
switch ( loop_state )
{
case LOOPSTATE_LOOKINGFORONE:
return p_start->data;
case LOOPSTATE_FOUNDONE:
return arithmetic_mean<T,U>( val_found, p_start->data );
default:
assert( false ); //this should be unreachable
}
}
//select the pivot according to the "median-of-three" rule
T pivot;
if ( p_start->data < p_middle->data )
{
if ( p_middle->data < p_end->data )
pivot = p_middle->data;
else if ( p_start->data < p_end->data )
pivot = p_end->data;
else
pivot = p_start->data;
}
else
{
if ( p_start->data < p_end->data )
pivot = p_start->data;
else if ( p_middle->data < p_end->data )
pivot = p_end->data;
else
pivot = p_middle->data;
}
#if PRINT_DEBUG_LEVEL >= 1
//this line is conditionally compiled for extra debugging information
std::cout << "\nmedian of three: " << (*pp_start)->data << " " << p_middle->data << " " << p_end->data << " ->" << pivot << std::endl;
#endif
// We will be dividing the current partition into 3 new partitions (less-than,
// equal-to and greater-than) each represented as a linked list. Each list
// requires a pointer to the start of the list and a pointer to the pointer at
// the end of the list to write the address of new elements to. Also, when
// traversing the lists, we need to keep a pointer to the middle of the list,
// as this information will be required for selecting a new pivot in the next
// iteration of the loop. The latter is not required for the equal-to partition,
// as it would never be used.
Node<T> *p_less = nullptr, **pp_less_end = &p_less, **pp_less_middle = &p_less;
Node<T> *p_equal = nullptr, **pp_equal_end = &p_equal;
Node<T> *p_greater = nullptr, **pp_greater_end = &p_greater, **pp_greater_middle = &p_greater;
// These pointers are only used as a cache to the location of the end node.
// Despite their similar name, their function is quite different to pp_less_end
// and pp_greater_end.
Node<T> *p_less_end = nullptr;
Node<T> *p_greater_end = nullptr;
// counter for the number of elements in each partition
int num_less = 0;
int num_equal = 0;
int num_greater = 0;
// NOTE:
// The following loop will temporarily split the linked list. It will be merged later.
Node<T> *p_next_node = p_start;
//the following line isn't necessary; it is only used to clarify that the pointers no
//longer point to anything meaningful
*pp_start = p_start = nullptr;
for ( int i = 0; i < num_total; i++ )
{
assert( p_next_node != nullptr );
Node<T> *p_current_node = p_next_node;
p_next_node = p_next_node->next;
if ( p_current_node->data < pivot )
{
//link node to pp_less
assert( *pp_less_end == nullptr );
*pp_less_end = p_less_end = p_current_node;
pp_less_end = &p_current_node->next;
p_current_node->next = nullptr;
num_less++;
if ( num_less % 2 == 0 )
{
pp_less_middle = &(*pp_less_middle)->next;
}
}
else if ( p_current_node->data == pivot )
{
//link node to pp_equal
assert( *pp_equal_end == nullptr );
*pp_equal_end = p_current_node;
pp_equal_end = &p_current_node->next;
p_current_node->next = nullptr;
num_equal++;
}
else
{
//link node to pp_greater
assert( *pp_greater_end == nullptr );
*pp_greater_end = p_greater_end = p_current_node;
pp_greater_end = &p_current_node->next;
p_current_node->next = nullptr;
num_greater++;
if ( num_greater % 2 == 0 )
{
pp_greater_middle = &(*pp_greater_middle)->next;
}
}
}
assert( num_total == num_less + num_equal + num_greater );
assert( num_equal >= 1 );
#if PRINT_DEBUG_LEVEL >= 1
//this section is conditionally compiled for extra debugging information
{
std::cout << std::setfill( '0' );
switch ( loop_state )
{
case LOOPSTATE_LOOKINGFORONE:
std::cout << "LOOPSTATE_LOOKINGFORONE k = " << k << "\n";
break;
case LOOPSTATE_LOOKINGFORTWO:
std::cout << "LOOPSTATE_LOOKINGFORTWO k = " << k << "\n";
break;
case LOOPSTATE_FOUNDONE:
std::cout << "LOOPSTATE_FOUNDONE k = " << k << " val_found = " << val_found << "\n";
}
std::cout << "partition lengths: ";
std::cout <<
std::setw( 2 ) << num_less << " " <<
std::setw( 2 ) << num_equal << " " <<
std::setw( 2 ) << num_greater << " " <<
std::setw( 2 ) << num_total << "\n";
#if PRINT_DEBUG_LEVEL >= 2
Node<T> *p;
std::cout << "less: ";
if ( num_less > 10 )
std::cout << "too many to print";
else
for ( p = p_less; p != nullptr; p = p->next ) std::cout << p->data << " ";
std::cout << "\nequal: ";
if ( num_equal > 10 )
std::cout << "too many to print";
else
for ( p = p_equal; p != nullptr; p = p->next ) std::cout << p->data << " ";
std::cout << "\ngreater: ";
if ( num_greater > 10 )
std::cout << "too many to print";
else
for ( p = p_greater; p != nullptr; p = p->next ) std::cout << p->data << " ";
std::cout << "\n\n" << std::flush;
#endif
std::cout << std::flush;
}
#endif
//insert less-than partition into list
assert( *pp_start == nullptr );
*pp_start = p_less;
//insert equal-to partition into list
assert( *pp_less_end == nullptr );
*pp_less_end = p_equal;
//insert greater-than partition into list
assert( *pp_equal_end == nullptr );
*pp_equal_end = p_greater;
//link list to previously cut off part
assert( *pp_greater_end == nullptr );
*pp_greater_end = p_next_node;
//if less-than partition is large enough to hold both possible median values
if ( k + 2 <= num_less )
{
//set the next iteration of the loop to process the less-than partition
//pp_start is already set to the desired value
p_middle = *pp_less_middle;
p_end = p_less_end;
num_total = num_less;
}
//else if less-than partition holds one of both possible median values
else if ( k + 1 == num_less )
{
if ( loop_state == LOOPSTATE_LOOKINGFORTWO )
{
//the equal_to partition never needs sorting, because all members are already equal
val_found = p_equal->data;
loop_state = LOOPSTATE_FOUNDONE;
}
//set the next iteration of the loop to process the less-than partition
//pp_start is already set to the desired value
p_middle = *pp_less_middle;
p_end = p_less_end;
num_total = num_less;
}
//else if equal-to partition holds both possible median values
else if ( k + 2 <= num_less + num_equal )
{
//the equal_to partition never needs sorting, because all members are already equal
if ( loop_state == LOOPSTATE_FOUNDONE )
return arithmetic_mean<T,U>( val_found, p_equal->data );
return p_equal->data;
}
//else if equal-to partition holds one of both possible median values
else if ( k + 1 == num_less + num_equal )
{
switch ( loop_state )
{
case LOOPSTATE_LOOKINGFORONE:
return p_equal->data;
case LOOPSTATE_LOOKINGFORTWO:
val_found = p_equal->data;
loop_state = LOOPSTATE_FOUNDONE;
k = 0;
//set the next iteration of the loop to process the greater-than partition
pp_start = pp_equal_end;
p_middle = *pp_greater_middle;
p_end = p_greater_end;
num_total = num_greater;
break;
case LOOPSTATE_FOUNDONE:
return arithmetic_mean<T,U>( val_found, p_equal->data );
}
}
//else both possible median values must be in the greater-than partition
else
{
k = k - num_less - num_equal;
//set the next iteration of the loop to process the greater-than partition
pp_start = pp_equal_end;
p_middle = *pp_greater_middle;
p_end = p_greater_end;
num_total = num_greater;
}
}
}
// NOTE:
// The following code is not part of the algorithm, but is only intended to test the algorithm
// This simple class is designed to contain a singly-linked list
template <typename T>
class List
{
public:
List() : first( nullptr ) {}
// the following is required to abide by the rule of three/five/zero
// see: https://en.cppreference.com/w/cpp/language/rule_of_three
List( const List<T> & ) = delete;
List( const List<T> && ) = delete;
List<T>& operator=( List<T> & ) = delete;
List<T>& operator=( List<T> && ) = delete;
~List()
{
Node<T> *p = first;
while ( p != nullptr )
{
Node<T> *temp = p;
p = p->next;
delete temp;
}
}
    void push_front( const T &data )
{
Node<T> *temp = new Node<T>;
temp->data = data;
temp->next = first;
first = temp;
}
//member variables
Node<T> *first;
};
int main()
{
//generated random numbers will be between 0 and 2 billion (fits in 32-bit signed int)
constexpr int min_val = 0;
constexpr int max_val = 2*1000*1000*1000;
//will allocate array for 1 million ints and fill with random numbers
constexpr int num_values = 1*1000*1000;
//this class contains the singly-linked list and is empty for now
List<int> l;
double result;
//These variables are used for random number generation
std::random_device rd;
std::mt19937 gen( rd() );
std::uniform_int_distribution<> dis( min_val, max_val );
try
{
//fill array with random data
std::cout << "Filling array with random data..." << std::flush;
auto unsorted_data = std::make_unique<int[]>( num_values );
for ( int i = 0; i < num_values; i++ ) unsorted_data[i] = dis( gen );
//fill the singly-linked list
std::cout << "done\nFilling linked list..." << std::flush;
for ( int i = 0; i < num_values; i++ ) l.push_front( unsorted_data[i] );
std::cout << "done\nCalculating median using STL function..." << std::flush;
//calculate the median using the functions provided by the C++ standard template library.
//Note: this is only done to compare the results with the algorithm provided in this file
if ( num_values % 2 == 0 )
{
int median1, median2;
std::nth_element( &unsorted_data[0], &unsorted_data[(num_values - 1) / 2], &unsorted_data[num_values] );
median1 = unsorted_data[(num_values - 1) / 2];
std::nth_element( &unsorted_data[0], &unsorted_data[(num_values - 0) / 2], &unsorted_data[num_values] );
median2 = unsorted_data[(num_values - 0) / 2];
result = arithmetic_mean<int,double>( median1, median2 );
}
else
{
int median;
std::nth_element( &unsorted_data[0], &unsorted_data[(num_values - 0) / 2], &unsorted_data[num_values] );
median = unsorted_data[(num_values - 0) / 2];
result = static_cast<int>(median);
}
std::cout << "done\nMedian according to STL function: " << std::setprecision( 12 ) << result << std::endl;
// NOTE: Since the STL functions only sorted the array, but not the linked list, the
// order of the linked list is still random and not pre-sorted.
//calculate the median using the algorithm provided in this file
std::cout << "Starting algorithm" << std::endl;
result = find_median<int,double>( l.first );
std::cout << "The calculated median is: " << std::setprecision( 12 ) << result << std::endl;
std::cout << "Cleaning up\n\n" << std::flush;
}
    catch ( const std::bad_alloc & )
{
std::cerr << "Error: Unable to allocate sufficient memory!" << std::endl;
return -1;
}
return 0;
}
I have successfully tested my code with one million randomly generated elements and it found the correct median virtually instantaneously.
So what you can do is use iterators to hold the positions. I have written the algorithm above to work with std::forward_list. I know this isn't perfect, but I wrote this up quickly and hope it helps.
int partition(int leftPos, int rightPos, std::forward_list<int>::iterator& currIter,
std::forward_list<int>::iterator lowIter, std::forward_list<int>::iterator highIter) {
auto iter = lowIter;
int i = leftPos - 1;
for(int j = leftPos; j < rightPos - 1; j++) {
if(*iter <= *highIter) {
++currIter;
++i;
std::iter_swap(currIter, iter);
}
iter++;
}
std::forward_list<int>::iterator newIter = currIter;
std::iter_swap(++newIter, highIter);
return i + 1;
}
std::forward_list<int>::iterator kthSmallest(std::forward_list<int>& list,
std::forward_list<int>::iterator left, std::forward_list<int>::iterator right, int size, int k) {
int leftPos {0};
int rightPos {size};
int pivotPos {0};
std::forward_list<int>::iterator resetIter = left;
std::forward_list<int>::iterator currIter = left;
++left;
while(leftPos <= rightPos) {
pivotPos = partition(leftPos, rightPos, currIter, left, right);
if(pivotPos == (k-1)) {
return currIter;
} else if(pivotPos > (k-1)) {
right = currIter;
rightPos = pivotPos - 1;
} else {
left = currIter;
++left;
resetIter = left;
++left;
leftPos = pivotPos + 1;
}
currIter = resetIter;
}
return list.end();
}
When making a call to kthSmallest, the left iterator should be one position before where you intend to start. This allows us to be one position behind low in partition(). Here is an example of executing it:
int main() {
std::forward_list<int> list {10, 12, 12, 13, 4, 5, 8, 11, 6, 26, 15, 21};
auto startIter = list.before_begin();
int k = 6;
int size = getSize(list);
auto kthIter = kthSmallest(list, startIter, getEnd(list), size - 1, k);
std::cout << k << "th smallest: " << *kthIter << std::endl;
return 0;
}
6th smallest: 10
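The helpers getSize and getEnd used in main are not shown in the answer; they could plausibly be written as follows (an assumption on my part: getSize counts the elements, and getEnd returns an iterator to the last element, which partition() dereferences as the pivot):

int getSize(std::forward_list<int>& list) {
    int n = 0;
    for (auto it = list.begin(); it != list.end(); ++it) ++n;  // count the elements
    return n;
}

std::forward_list<int>::iterator getEnd(std::forward_list<int>& list) {
    auto it = list.begin();            // assumes the list is not empty
    auto next = it;
    for (++next; next != list.end(); ++it, ++next) {}  // stop at the last element
    return it;
}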
I'm trying to implement backtracing of the vertices visited for the lowest-cost route to a vertex. I'm getting incorrect results and don't understand why. The only correct output was the last one in the image. What is causing the incorrect output?
Note: driverMap is a 14x14 2D integer vector that holds the distances to get to each vertex, and path is an int array that holds the previous vertex taken on the path. The first block below is a section of the code from my main function, to help out.
This is different from my previous questions; this time I'm asking for help with my current output versus the expected output when attempting to backtrace.
for(int i = 0; i < allCars.size(); i++) //allCars.size()
{
int startInt = allCars[i].getStart(), loopEnder = 0;
for(int k = 0; k < driverMap.size(); k++)
{
path[k] = 0;
Distances[k] = 0;
}
for(int j = 0; j < driverMap.size(); j++)
{
Distances[j] = driverMap[startInt][j];
}
cout << "\nSTART INTERSECTION: '" << startInt << "' END INTERSECTION: '" << allCars[i].getEnd() << "'" << endl;
Dijkstra(driverMap, Distances, path, startInt);
int endInt = allCars[i].getEnd(), atInt = path[endInt];
cout << "END = " << endInt;
//allCars[i].addPath(endInt);
do
{
cout << "AT = " << atInt;
allCars[i].addPath(atInt);
atInt = path[atInt];
loopEnder++;
}while(atInt != endInt && loopEnder < 5);
cout << endl;
//allCars[i].addPath(startInt);
allCars[i].displayCar();
}
void Dijkstra(const vector< vector<int> > & driverMap, int Distances[], int path[], int startInt)
{
int Intersections[driverMap.size()];
for(int a = 0; a < driverMap.size(); a++)
{
Intersections[a] = a;
}
Intersections[startInt] = -1;
for(int l = 0; l < driverMap.size(); l++)
{
int minValue = 99999;
int minNode = 0;
for (int i = 0; i < driverMap.size(); i++)
{
if(Intersections[i] == -1)
{
continue;
}
if(Distances[i] > 0 && Distances[i] < minValue)
{
minValue = Distances[i];
minNode = i;
}
}
Intersections[minNode] = -1;
for(int i = 0; i < driverMap.size(); i++)
{
if(driverMap[minNode][i] < 0)
{
continue;
}
if(Distances[i] < 0)
{
Distances[i] = minValue + driverMap[minNode][i];
path[i] = minNode;
continue;
}
if((Distances[minNode] + driverMap[minNode][i]) < Distances[i])
{
Distances[i] = minValue + driverMap[minNode][i];
path[i] = minNode;
}
}
}
}
Doing backtracking in Dijkstra:
Record the node that updates the value of the current node with a smaller value.
// Every time you update distance value with a smaller value
Distances[i] = minValue + driverMap[minNode][i];
back[i] = minNode; //Record the node with an int array, should be something like this
After you have completed the whole Dijkstra loop, backtrack from any point except the start point.
Say we want to trace from pt 5 to pt 0 in your graph, where pt 5 is the starting point. We start at 0 and take back[0] (which should equal 4), then we take back[4] (which should equal 8), then we take back[8] (which should equal 5); then we need some mechanism to stop here, as pt 5 is the starting point. As a result you get 0-4-8-5, and you reverse the order to get the path 5-8-4-0.
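A rough sketch of that backtrace in code (written against the question's 14-vertex setup and the back[] array described above; the "stop mechanism" is simply walking until the start node is reached):

void printPath(const int back[], int startInt, int endInt)
{
    int reversed[14];                     // the question's graph has 14 intersections
    int count = 0;
    for (int at = endInt; at != startInt; at = back[at])
        reversed[count++] = at;           // collect end -> ... -> first hop after start
    reversed[count++] = startInt;         // finally add the start node itself
    for (int i = count - 1; i >= 0; i--)  // print in reverse, i.e. start -> ... -> end
        cout << reversed[i] << (i > 0 ? "-" : "\n");
}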
In my approach, pathTaken[minNode].push_back(i); is not used. You might need to initialize the int array back[] with the starting point's value for those nodes that are directly connected to the starting point.
Edit Part
You missed this point: "You might need to initialize the int array back[] with the starting point's value for those nodes that are directly connected to the starting point."
path[k] = 0; is wrong. You should not initialize path with a fixed index for all cases. Instead, you should initialize it with startInt (for nodes directly connected to the starting node) and with the non-existent node index -1 (for nodes not directly connected to the starting node).
What back-tracking does is record which node provides a new minimum cost to the current node (and keep updating that record in the Dijkstra loop). At the end, every node has a value (back[i] in my case) that points to the node that gives node i its minimum cost.
Based on the concept of the Dijkstra algorithm with back-tracking, back[i] is the previous node on the path from the starting node to node i. That means the path looks like:
(start node) -> (path with zero or more nodes) -> node pointed to by back[i] -> node i
Applying this concept, we can track the path backwards with back[i], back[back[i]], back[back[back[i]]], ...
And why is path[k] = 0; wrong? In your code, the starting node is not always node 0, but startInt. Consider a case like startInt = 13 with target destination node 11; obviously the path is 13-11. Node 11 will never hit Distances[i] = minValue + driverMap[minNode][i]; when startInt = 13, because it is already the first minimum-cost node. And what have you set? back[11] = 0 (from the initialization), which claims that the previous node on the path from 13 to 11 is node 0, which is conceptually wrong, and back[0] = 0 (the best path from 13 to 0 is 13-0, so it is never updated either), which loops back to itself: 0 = back[back[0]] = back[back[back[0]]] = ...
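Concretely, the initialization loop in the question could be changed along these lines (a sketch; it assumes, like the rest of the question's code, that a negative driverMap entry means "no direct edge"):

for (int k = 0; k < driverMap.size(); k++)
{
    Distances[k] = driverMap[startInt][k];
    // directly reachable from the start: predecessor is startInt,
    // otherwise use the sentinel -1 ("no predecessor known yet")
    path[k] = (driverMap[startInt][k] > 0) ? startInt : -1;
}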
In Dijkstra, if you go backwards (finish -> start), just pick the lowest-cost node at each step to reach the starting point. This works after you have solved the graph and every node has been evaluated/costed.
This program should work correctly, but it doesn't! Assume you are building a min-heap by inserting numbers into an array. Each insertion should be followed by the Heapify function to make sure that the order of the numbers does not violate the min-heap rule. This is what I wrote, but there is something wrong with it and I couldn't figure out what!
int P(int i) //returning the index of parent
{
if (i % 2 == 0) { i = ((i - 2) / 2); }
else { i = ((i - 1) / 2); }
return i;
}
void Heapify(double A[], int i)//putting the smallest value in the root because we have a min heap
{
if (P(i) != NULL && A[i] < A[P(i)])
{
temp = A[P(i)];
A[P(i)] = A[i];
A[i] = temp;
Heapify(A, P(i));
}
}
Generally speaking, your heapify function doesn't seem to take the minimum of the left and right children into consideration. Let me show you an ideal, working implementation (it is object-oriented, so you might want to pass the heap as a parameter instead). You can find the exact pseudocode all over the internet, so I'm not really presenting anything unique.
void Heap::Heapify (int i)
{
int l = left(i);
int r = right(i);
int lowest;
if (l < heap_size && heap[l] -> value < heap[i] -> value )
lowest = l;
else
lowest = i;
if (r < heap_size && heap[r] -> value < heap[lowest] -> value)
lowest = r;
if (lowest != i)
{
swap (heap[i], heap[lowest]);
Heapify(lowest);
}
}
where
int left ( int i ) { return 2 * i; }
int right ( int i ) { return 2 * i + 1; }
As you can see, the algorithm first checks which of the left and right children has the lower value. That value is swapped with the current value, and Heapify recurses into that child. That is all there is to it.
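Applied to the question's plain array of doubles (0-based indexing, children at 2*i + 1 and 2*i + 2), the same sift-down idea could look roughly like this; it is a standalone adaptation of the logic above, not the answer's class:

void SiftDown(double A[], int n, int i)   // n = number of elements in A
{
    while (true)
    {
        int l = 2 * i + 1;                // left child (0-based)
        int r = 2 * i + 2;                // right child
        int lowest = i;
        if (l < n && A[l] < A[lowest]) lowest = l;
        if (r < n && A[r] < A[lowest]) lowest = r;
        if (lowest == i) break;           // neither child is smaller: heap property holds
        double temp = A[i];               // swap with the smaller child and continue there
        A[i] = A[lowest];
        A[lowest] = temp;
        i = lowest;
    }
}

Building a heap from an existing array would then call SiftDown(A, n, i) for i from n/2 - 1 down to 0.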
I am working on implementing a minimum heap using an array data structure, but I am having an issue with my moveDown method (used at the root node to restore the heap property when either of the root's children is ever smaller than it). I assume that the reader will know what a min heap is, but I'll describe it just in case someone does not, or my understanding is incorrect.
A minimum heap (in this case) is a binary tree represented by an array such that:
The root node is the smallest value in the data structure
A node must always be smaller than its children
Given a node at array index I, its left child will be at index I*2 + 1 and the right child at index I*2 + 2
The current issue I am having is that my moveDown function runs into an infinite loop when swapping. I am having a very difficult time finding a logic error, so I fear it might be something closer to the root (pun intended, I couldn't help myself).
Notable data members of the heap.cpp file:
int size;
MiniVector array; // the custom array class that backs the heap
void moveDown(int root);
moveDown function in my heap.cpp file:
void BinaryHeap::moveDown( int root ){
int temp = root;
int tempLC = temp*2 + 1;//LeftChild
int tempRC = temp*2 + 2;//RightChild
//Not a leaf
while( ( tempLC < array.size() && tempRC < array.size() )
&&
( (array[temp] > array[tempLC]) || (array[temp] > array[tempRC]) ) ){
int hold = array[temp];
if( array[temp] > array[tempRC] ){
array[temp] = array[tempRC];
array[tempRC] = hold;
temp = tempRC;
}
else{
array[temp] = array[tempLC];
array[tempLC] = hold;
temp = tempLC;
}
int tempLC = temp*2 + 1;//LeftChild
int tempRC = temp*2 + 2;//RightChild
}
}
You redeclare your variables. At the bottom of the while loop
int tempLC = temp*2 + 1;//LeftChild
int tempRC = temp*2 + 2;//RightChild
should be
tempLC = temp*2 + 1;//LeftChild
tempRC = temp*2 + 2;//RightChild
Wouldn't happen in Java.
Also, it wouldn't happen if you rewrote your loop as an infinite for loop with a break in the middle:
for (;;)
{
int tempLC = temp*2 + 1;//LeftChild
int tempRC = temp*2 + 2;//RightChild
if (...)
break;
...
}
But I get flamed whenever I suggest this kind of loop is a good idea. Last time someone suggested it was 'almost an anti-pattern', and that was one of the more polite responses.
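For concreteness, here is a sketch of moveDown in that style; besides the loop shape, it also compares against the smaller of the two children and handles a node that has only a left child, which the original comparisons do not:

void BinaryHeap::moveDown( int root ){
    int temp = root;
    for (;;)
    {
        int tempLC = temp*2 + 1;  // LeftChild (declared fresh each iteration, which is fine here)
        int tempRC = temp*2 + 2;  // RightChild
        int smallest = temp;
        if( tempLC < array.size() && array[tempLC] < array[smallest] )
            smallest = tempLC;
        if( tempRC < array.size() && array[tempRC] < array[smallest] )
            smallest = tempRC;
        if( smallest == temp )    // heap property restored
            break;
        int hold = array[temp];   // swap with the smaller child and continue from it
        array[temp] = array[smallest];
        array[smallest] = hold;
        temp = smallest;
    }
}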