I need an algorithm which can find the median of a singly linked list in linear time complexity O(n) and constant space complexity O(1).
EDIT: The list is a C-style singly linked list. Nothing from the STL is allowed (no containers, no functions, e.g. no std::forward_list), and I am not allowed to move the numbers into any other container (such as an array).
A space complexity of O(log n) is also acceptable, since that will be well under 100 for my lists. I am also not allowed to use STL functions such as std::nth_element.
Basically, I have a linked list with about 3 * 10^6 elements and I need to get the median within 3 seconds, so I can't use a sorting algorithm (that would be O(n log n) and would probably take something like 10-14 seconds).
I've searched online and found that it's possible to find the median of a std::vector in O(n) time and O(1) space with quickselect (the worst case is O(n^2), but it is rare), for example: https://www.geeksforgeeks.org/quickselect-a-simple-iterative-implementation/
But I can't find any algorithm that does this for a linked list. The issue is that with a vector I can use the array index for random access. If I adapt that algorithm to a list, the complexity will be much bigger. For example, when I move the pivot index to the left, I actually need to traverse the list to reach that new element before going further (this gets me at least O(kn) with a big k for my list, even approaching O(n^2)...).
EDIT 2:
I know I have too many variables but I've been testing different stuff and I am still working on my code...
My current code:
#include <bits/stdc++.h>
using namespace std;
template <class T> class Node {
public:
T data;
Node<T> *next;
};
template <class T> class List {
public:
Node<T> *first;
};
template <class T> T getMedianValue(List<T> & l) {
Node<T> *crt,*pivot,*incpivot;
int left, right, lung, idx, lungrel,lungrel2, left2, right2, aux, offset;
pivot = l.first;
crt = pivot->next;
lung = 1;
//lung is the length of the linked list (yes, it means length in Romanian...)
//lungrel and lungrel2 are the relative lengths of the part of
//the list I am processing, e.g.: 2 3 4 in a list with 1 2 3 4 5
right = left = 0;
while (crt != NULL) {
if(crt->data < pivot->data){
aux = pivot->data;
pivot->data = crt->data;
crt->data = pivot->next->data;
pivot->next->data = aux;
pivot = pivot->next;
left++;
}
else right++;
// cout<<crt->data<<endl;
crt = crt->next;
lung++;
}
if(right > left) offset = left;
// cout<<endl;
// cout<<pivot->data<<" "<<left<<" "<<right<<endl;
// printList(l);
// cout<<endl;
lungrel = lung;
incpivot = l.first;
// offset = 0;
while(left != right){
//cout<<"parcurgere"<<endl;
if(left > right){
//cout<<endl;
//printList(l);
//cout<<endl;
//cout<<"testleft "<<incpivot->data<<" "<<left<<" "<<right<<endl;
crt = incpivot->next;
pivot = incpivot;
idx = offset;left2 = right2 = lungrel = 0;
//cout<<idx<<endl;
while(idx < left && crt!=NULL){
if(pivot->data > crt->data){
// cout<<"1crt "<<crt->data<<endl;
aux = pivot->data;
pivot->data = crt->data;
crt->data = pivot->next->data;
pivot->next->data = aux;
pivot = pivot->next;
left2++;lungrel++;
}
else {
right2++;lungrel++;
//cout<<crt->data<<" "<<right2<<endl;
}
//cout<<crt->data<<endl;
crt = crt->next;
idx++;
}
left = left2 + offset;
right = lung - left - 1;
if(right > left) offset = left;
//if(pivot->data == 18) return 18;
//cout<<endl;
//cout<<"l "<<pivot->data<<" "<<left<<" "<<right<<" "<<right2<<endl;
// printList(l);
}
else if(left < right && pivot->next!=NULL){
idx = left;left2 = right2 = 0;
incpivot = pivot->next;offset++;left++;
//cout<<endl;
//printList(l);
//cout<<endl;
//cout<<"testright "<<incpivot->data<<" "<<left<<" "<<right<<endl;
pivot = pivot->next;
crt = pivot->next;
lungrel2 = lungrel;
lungrel = 0;
// cout<<"p right"<<pivot->data<<" "<<left<<" "<<right<<endl;
while((idx < lungrel2 + offset - 1) && crt!=NULL){
if(crt->data < pivot->data){
// cout<<"crt "<<crt->data<<endl;
aux = pivot->data;
pivot->data = crt->data;
crt->data = (pivot->next)->data;
(pivot->next)->data = aux;
pivot = pivot->next;
// cout<<"crt2 "<<crt->data<<endl;
left2++;lungrel++;
}
else right2++;lungrel++;
//cout<<crt->data<<endl;
crt = crt->next;
idx++;
}
left = left2 + left;
right = lung - left - 1;
if(right > left) offset = left;
// cout<<"r "<<pivot->data<<" "<<left<<" "<<right<<endl;
// printList(l);
}
else{
//cout<<cmx<<endl;
return pivot->data;
}
}
//cout<<cmx<<endl;
return pivot->data;
}
template <class T> void printList(List<T> const & l) {
Node<T> *tmp;
if(l.first != NULL){
tmp = l.first;
while(tmp != NULL){
cout<<tmp->data<<" ";
tmp = tmp->next;
}
}
}
template <class T> void push_front(List<T> & l, int x)
{
Node<T>* tmp = new Node<T>;
tmp->data = x;
tmp->next = l.first;
l.first = tmp;
}
int main(){
List<int> l;
int n = 0;
push_front(l, 19);
push_front(l, 12);
push_front(l, 11);
push_front(l, 101);
push_front(l, 91);
push_front(l, 21);
push_front(l, 9);
push_front(l, 6);
push_front(l, 25);
push_front(l, 4);
push_front(l, 18);
push_front(l, 2);
push_front(l, 8);
push_front(l, 10);
push_front(l, 200);
push_front(l, 225);
push_front(l, 170);
printList(l);
n=getMedianValue(l);
cout<<endl;
cout<<n;
return 0;
}
Do you have any suggestions on how to adapt quickselect to a singly linked list, or another algorithm that would work for my problem?
In your question, you mentioned that you are having trouble selecting a pivot that is not at the start of the list, because this would require traversing the list. If you do it correctly, you only have to traverse the entire list twice:
once for finding the middle and the end of the list in order to select a good pivot (e.g. using the "median-of-three" rule)
once for the actual sorting
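The first of these traversals can be sketched in isolation as follows. This is a minimal illustration under assumed names (`Node`, `Ends` and `find_middle_and_end` are made up for this example): a single pass counts the nodes, remembers the last node, and advances a middle pointer on every second step.

```cpp
#include <cstddef>

// Hypothetical node type for illustration.
struct Node {
    int data;
    Node *next;
};

// Result of one traversal: node count, middle node and last node.
struct Ends {
    std::size_t count;
    Node *middle;
    Node *end;
};

// Walks the list once. The middle pointer advances on every second step,
// so for an even count it ends on the right one of the two middle nodes.
// Precondition: head != nullptr.
Ends find_middle_and_end(Node *head) {
    Ends r{1, head, head};
    while (r.end->next != nullptr) {
        r.end = r.end->next;
        ++r.count;
        if (r.count % 2 == 0) r.middle = r.middle->next;
    }
    return r;
}
```

With the middle and end nodes cached like this, a "median-of-three" pivot can be read off without any further traversal.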
The first step is not necessary if you don't care much about selecting a good pivot and you are happy with simply selecting the first element of the list as the pivot (which causes worst case O(n^2) time complexity if the data is already sorted).
If you remember the end of the list the first time you traverse it by maintaining a pointer to the end, then you should never have to traverse it again to find the end. Also, if you are using the standard Lomuto partition scheme (which I am not using for the reasons stated below), then you must also maintain two pointers into the list which represent the i and j indices of the standard Lomuto partition scheme. By using these pointers, you should never have to traverse the list just to access a single element.
Also, if you maintain a pointer to the middle and the end of every partition, then, when you later must sort one of these partitions, you will not have to traverse that partition again to find the middle and end.
I have now created my own implementation of the QuickSelect algorithm for linked lists, which I have posted below.
Since you stated that the linked list is singly-linked and cannot be upgraded to a doubly-linked list, I can't use the Hoare partition scheme, as iterating a singly-linked list backwards is very expensive. Therefore, I am using the generally less efficient Lomuto partition scheme instead.
When using the Lomuto partition scheme, either the first element or the last element is typically selected as the pivot. However, selecting either of those has the disadvantage that sorted data will cause the algorithm to have the worst-case time complexity of O(n^2). This can be prevented by selecting the pivot according to the "median-of-three" rule, which is to use the median of the first element, the middle element and the last element as the pivot. Therefore, in my implementation, I am using this "median-of-three" rule.
Also, the Lomuto partition scheme typically creates two partitions, one for values smaller than the pivot and one for values larger than or equal to the pivot. However, this will cause the worst-case time complexity of O(n^2) if all values are identical. Therefore, in my implementation, I am creating three partitions, one for values smaller than the pivot, one for values larger than the pivot, and one for values equal to the pivot.
Although these measures don't completely eliminate the possibility of worst-case time complexity of O(n^2), they at least make it highly unlikely (unless the input is provided by a malicious attacker). In order to guarantee a time complexity of O(n), a more complex pivot selection algorithm would have to be used, such as median of medians.
One significant problem I encountered is that for an even number of elements, the median is defined as the arithmetic mean of the two "middle" or "median" elements. For this reason, I can't simply write a function similar to std::nth_element, because if, for example, the total number of elements is 14, then I will be looking for the 7th and 8th largest element. This means I would have to call such a function twice, which would be inefficient. Therefore, I have instead written a function which can search for the two "median" elements at once. Although this makes the code more complex, the performance penalty due to the additional code complexity should be minimal compared to the advantage of not having to call the same function twice.
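As a small worked example of the index arithmetic, assuming 0-based positions in sorted order (the struct and function names are made up for illustration): for 14 elements the two median positions are 6 and 7, i.e. the 7th and 8th elements, while for 15 elements both positions coincide at 7.

```cpp
#include <cstddef>

// Hypothetical helper: 0-based sorted positions of the median element(s).
struct MedianPositions {
    std::size_t lower;
    std::size_t upper;  // equal to lower when n is odd
};

// Precondition: n >= 1.
MedianPositions median_positions(std::size_t n) {
    return MedianPositions{(n - 1) / 2, n / 2};
}
```

When the two positions differ, the median is the arithmetic mean of the values found there.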
Please note that although my implementation compiles perfectly on a C++ compiler, I wouldn't call it textbook C++ code, because the question states that I am not allowed to use anything from the C++ standard template library. Therefore, my code is rather a hybrid of C code and C++ code.
In the following code, I only use the standard template library (in particular the function std::nth_element) for testing my algorithm and for verifying the results. I do not use any of these functions in my actual algorithm.
#include <iostream>
#include <iomanip>
#include <cassert>
// The following headers are only required for testing the algorithm and verifying
// the correctness of its results. They are not used in the algorithm itself.
#include <random>
#include <algorithm>
#include <memory> // std::make_unique
#include <new>    // std::bad_alloc
// The following setting can be changed to print extra debugging information
// possible settings:
// 0: no extra debugging information
// 1: print the state and length of all partitions in every loop iteration
// 2: additionally print the contents of all partitions (if they are not too big)
#define PRINT_DEBUG_LEVEL 0
template <typename T>
struct Node
{
T data;
Node<T> *next;
};
// NOTE:
// The return type is not necessarily the same as the data type. The reason for this is
// that, for example, the data type "int" requires a "double" as a return type, so that
// the arithmetic mean of "3" and "6" returns "4.5".
// This function may require template specializations to handle overflow or wrapping.
template<typename T, typename U>
U arithmetic_mean( const T &first, const T &second )
{
return ( static_cast<U>(first) + static_cast<U>(second) ) / 2;
}
//the main loop of the function find_median can be in one of the following three states
enum LoopState
{
//we are looking for one median value
LOOPSTATE_LOOKINGFORONE,
//we are looking for two median values, and the returned median
//will be the arithmetic mean of the two
LOOPSTATE_LOOKINGFORTWO,
//one of the median values has been found, but we are still searching for
//the second one
LOOPSTATE_FOUNDONE
};
template <
typename T, //type of the data
typename U //type of the return value
>
U find_median( Node<T> *list )
{
//This variable points to the pointer to the first element of the current partition.
//During the partition phase, the linked list will be broken and reassembled afterwards, so
//the pointer this pointer points to will be nullptr until it is reassembled.
Node<T> **pp_start = &list;
//This pointer represents nothing more than the cached value of *pp_start and it is
//not always valid
Node<T> *p_start = *pp_start;
//These pointers are maintained for accessing the middle of the list for selecting a pivot
//using the "median-of-three" rule.
Node<T> *p_middle;
Node<T> *p_end;
//result is not defined if list is empty
assert( p_start != nullptr );
//in the main loop, this variable always holds the number of elements in the current partition
int num_total = 1;
// First, we must traverse the entire linked list in order to determine the number of elements,
// in order to calculate k1 and k2. If it is odd, then the median is defined as the k'th smallest
// element where k = n / 2. If the number of elements is even, then the median is defined as the
// arithmetic mean of the k'th element and the (k+1)'th element.
// We also set a pointer to the nodes in the middle and at the end, which will be required later
// for selecting a pivot according to the "median-of-three" rule.
p_middle = p_start;
for ( p_end = p_start; p_end->next != nullptr; p_end = p_end->next )
{
num_total++;
if ( num_total % 2 == 0 ) p_middle = p_middle->next;
}
// find out whether we are looking for only one or two median values
enum LoopState loop_state = num_total % 2 == 0 ? LOOPSTATE_LOOKINGFORTWO : LOOPSTATE_LOOKINGFORONE;
//set k to the index of the middle element, or if there are two middle elements, to the left one
int k = ( num_total - 1 ) / 2;
// If we are looking for two median values, but we have only found one, then this variable will
// hold the value of the one we found. Whether we have found one can be determined by the state of
// the variable loop_state.
T val_found;
for (;;)
{
//make p_start cache the value of *pp_start again, because a previous iteration of the loop
//may have changed the value of pp_start
p_start = *pp_start;
assert( p_start != nullptr );
assert( p_middle != nullptr );
assert( p_end != nullptr );
assert( num_total != 0 );
if ( num_total == 1 )
{
switch ( loop_state )
{
case LOOPSTATE_LOOKINGFORONE:
return p_start->data;
case LOOPSTATE_FOUNDONE:
return arithmetic_mean<T,U>( val_found, p_start->data );
default:
assert( false ); //this should be unreachable
}
}
//select the pivot according to the "median-of-three" rule
T pivot;
if ( p_start->data < p_middle->data )
{
if ( p_middle->data < p_end->data )
pivot = p_middle->data;
else if ( p_start->data < p_end->data )
pivot = p_end->data;
else
pivot = p_start->data;
}
else
{
if ( p_start->data < p_end->data )
pivot = p_start->data;
else if ( p_middle->data < p_end->data )
pivot = p_end->data;
else
pivot = p_middle->data;
}
#if PRINT_DEBUG_LEVEL >= 1
//this line is conditionally compiled for extra debugging information
std::cout << "\nmedian of three: " << (*pp_start)->data << " " << p_middle->data << " " << p_end->data << " ->" << pivot << std::endl;
#endif
// We will be dividing the current partition into 3 new partitions (less-than,
// equal-to and greater-than) each represented as a linked list. Each list
// requires a pointer to the start of the list and a pointer to the pointer at
// the end of the list to write the address of new elements to. Also, when
// traversing the lists, we need to keep a pointer to the middle of the list,
// as this information will be required for selecting a new pivot in the next
// iteration of the loop. The latter is not required for the equal-to partition,
// as it would never be used.
Node<T> *p_less = nullptr, **pp_less_end = &p_less, **pp_less_middle = &p_less;
Node<T> *p_equal = nullptr, **pp_equal_end = &p_equal;
Node<T> *p_greater = nullptr, **pp_greater_end = &p_greater, **pp_greater_middle = &p_greater;
// These pointers are only used as a cache to the location of the end node.
// Despite their similar name, their function is quite different to pp_less_end
// and pp_greater_end.
Node<T> *p_less_end = nullptr;
Node<T> *p_greater_end = nullptr;
// counter for the number of elements in each partition
int num_less = 0;
int num_equal = 0;
int num_greater = 0;
// NOTE:
// The following loop will temporarily split the linked list. It will be merged later.
Node<T> *p_next_node = p_start;
//the following line isn't necessary; it is only used to clarify that the pointers no
//longer point to anything meaningful
*pp_start = p_start = nullptr;
for ( int i = 0; i < num_total; i++ )
{
assert( p_next_node != nullptr );
Node<T> *p_current_node = p_next_node;
p_next_node = p_next_node->next;
if ( p_current_node->data < pivot )
{
//link node to pp_less
assert( *pp_less_end == nullptr );
*pp_less_end = p_less_end = p_current_node;
pp_less_end = &p_current_node->next;
p_current_node->next = nullptr;
num_less++;
if ( num_less % 2 == 0 )
{
pp_less_middle = &(*pp_less_middle)->next;
}
}
else if ( p_current_node->data == pivot )
{
//link node to pp_equal
assert( *pp_equal_end == nullptr );
*pp_equal_end = p_current_node;
pp_equal_end = &p_current_node->next;
p_current_node->next = nullptr;
num_equal++;
}
else
{
//link node to pp_greater
assert( *pp_greater_end == nullptr );
*pp_greater_end = p_greater_end = p_current_node;
pp_greater_end = &p_current_node->next;
p_current_node->next = nullptr;
num_greater++;
if ( num_greater % 2 == 0 )
{
pp_greater_middle = &(*pp_greater_middle)->next;
}
}
}
assert( num_total == num_less + num_equal + num_greater );
assert( num_equal >= 1 );
#if PRINT_DEBUG_LEVEL >= 1
//this section is conditionally compiled for extra debugging information
{
std::cout << std::setfill( '0' );
switch ( loop_state )
{
case LOOPSTATE_LOOKINGFORONE:
std::cout << "LOOPSTATE_LOOKINGFORONE k = " << k << "\n";
break;
case LOOPSTATE_LOOKINGFORTWO:
std::cout << "LOOPSTATE_LOOKINGFORTWO k = " << k << "\n";
break;
case LOOPSTATE_FOUNDONE:
std::cout << "LOOPSTATE_FOUNDONE k = " << k << " val_found = " << val_found << "\n";
}
std::cout << "partition lengths: ";
std::cout <<
std::setw( 2 ) << num_less << " " <<
std::setw( 2 ) << num_equal << " " <<
std::setw( 2 ) << num_greater << " " <<
std::setw( 2 ) << num_total << "\n";
#if PRINT_DEBUG_LEVEL >= 2
Node<T> *p;
std::cout << "less: ";
if ( num_less > 10 )
std::cout << "too many to print";
else
for ( p = p_less; p != nullptr; p = p->next ) std::cout << p->data << " ";
std::cout << "\nequal: ";
if ( num_equal > 10 )
std::cout << "too many to print";
else
for ( p = p_equal; p != nullptr; p = p->next ) std::cout << p->data << " ";
std::cout << "\ngreater: ";
if ( num_greater > 10 )
std::cout << "too many to print";
else
for ( p = p_greater; p != nullptr; p = p->next ) std::cout << p->data << " ";
std::cout << "\n\n" << std::flush;
#endif
std::cout << std::flush;
}
#endif
//insert less-than partition into list
assert( *pp_start == nullptr );
*pp_start = p_less;
//insert equal-to partition into list
assert( *pp_less_end == nullptr );
*pp_less_end = p_equal;
//insert greater-than partition into list
assert( *pp_equal_end == nullptr );
*pp_equal_end = p_greater;
//link list to previously cut off part
assert( *pp_greater_end == nullptr );
*pp_greater_end = p_next_node;
//if less-than partition is large enough to hold both possible median values
if ( k + 2 <= num_less )
{
//set the next iteration of the loop to process the less-than partition
//pp_start is already set to the desired value
p_middle = *pp_less_middle;
p_end = p_less_end;
num_total = num_less;
}
//else if less-than partition holds one of both possible median values
else if ( k + 1 == num_less )
{
if ( loop_state == LOOPSTATE_LOOKINGFORTWO )
{
//the equal_to partition never needs sorting, because all members are already equal
val_found = p_equal->data;
loop_state = LOOPSTATE_FOUNDONE;
}
//set the next iteration of the loop to process the less-than partition
//pp_start is already set to the desired value
p_middle = *pp_less_middle;
p_end = p_less_end;
num_total = num_less;
}
//else if equal-to partition holds both possible median values
else if ( k + 2 <= num_less + num_equal )
{
//the equal_to partition never needs sorting, because all members are already equal
if ( loop_state == LOOPSTATE_FOUNDONE )
return arithmetic_mean<T,U>( val_found, p_equal->data );
return p_equal->data;
}
//else if equal-to partition holds one of both possible median values
else if ( k + 1 == num_less + num_equal )
{
switch ( loop_state )
{
case LOOPSTATE_LOOKINGFORONE:
return p_equal->data;
case LOOPSTATE_LOOKINGFORTWO:
val_found = p_equal->data;
loop_state = LOOPSTATE_FOUNDONE;
k = 0;
//set the next iteration of the loop to process the greater-than partition
pp_start = pp_equal_end;
p_middle = *pp_greater_middle;
p_end = p_greater_end;
num_total = num_greater;
break;
case LOOPSTATE_FOUNDONE:
return arithmetic_mean<T,U>( val_found, p_equal->data );
}
}
//else both possible median values must be in the greater-than partition
else
{
k = k - num_less - num_equal;
//set the next iteration of the loop to process the greater-than partition
pp_start = pp_equal_end;
p_middle = *pp_greater_middle;
p_end = p_greater_end;
num_total = num_greater;
}
}
}
// NOTE:
// The following code is not part of the algorithm, but is only intended to test the algorithm
// This simple class is designed to contain a singly-linked list
template <typename T>
class List
{
public:
List() : first( nullptr ) {}
// the following is required to abide by the rule of three/five/zero
// see: https://en.cppreference.com/w/cpp/language/rule_of_three
List( const List<T> & ) = delete;
List( List<T> && ) = delete;
List<T>& operator=( const List<T> & ) = delete;
List<T>& operator=( List<T> && ) = delete;
~List()
{
Node<T> *p = first;
while ( p != nullptr )
{
Node<T> *temp = p;
p = p->next;
delete temp;
}
}
void push_front( const T &data )
{
Node<T> *temp = new Node<T>;
temp->data = data;
temp->next = first;
first = temp;
}
//member variables
Node<T> *first;
};
int main()
{
//generated random numbers will be between 0 and 2 billion (fits in 32-bit signed int)
constexpr int min_val = 0;
constexpr int max_val = 2*1000*1000*1000;
//will allocate array for 1 million ints and fill with random numbers
constexpr int num_values = 1*1000*1000;
//this class contains the singly-linked list and is empty for now
List<int> l;
double result;
//These variables are used for random number generation
std::random_device rd;
std::mt19937 gen( rd() );
std::uniform_int_distribution<> dis( min_val, max_val );
try
{
//fill array with random data
std::cout << "Filling array with random data..." << std::flush;
auto unsorted_data = std::make_unique<int[]>( num_values );
for ( int i = 0; i < num_values; i++ ) unsorted_data[i] = dis( gen );
//fill the singly-linked list
std::cout << "done\nFilling linked list..." << std::flush;
for ( int i = 0; i < num_values; i++ ) l.push_front( unsorted_data[i] );
std::cout << "done\nCalculating median using STL function..." << std::flush;
//calculate the median using the functions provided by the C++ standard template library.
//Note: this is only done to compare the results with the algorithm provided in this file
if ( num_values % 2 == 0 )
{
int median1, median2;
std::nth_element( &unsorted_data[0], &unsorted_data[(num_values - 1) / 2], &unsorted_data[num_values] );
median1 = unsorted_data[(num_values - 1) / 2];
std::nth_element( &unsorted_data[0], &unsorted_data[(num_values - 0) / 2], &unsorted_data[num_values] );
median2 = unsorted_data[(num_values - 0) / 2];
result = arithmetic_mean<int,double>( median1, median2 );
}
else
{
int median;
std::nth_element( &unsorted_data[0], &unsorted_data[(num_values - 0) / 2], &unsorted_data[num_values] );
median = unsorted_data[(num_values - 0) / 2];
result = static_cast<double>(median);
}
std::cout << "done\nMedian according to STL function: " << std::setprecision( 12 ) << result << std::endl;
// NOTE: Since the STL functions only sorted the array, but not the linked list, the
// order of the linked list is still random and not pre-sorted.
//calculate the median using the algorithm provided in this file
std::cout << "Starting algorithm" << std::endl;
result = find_median<int,double>( l.first );
std::cout << "The calculated median is: " << std::setprecision( 12 ) << result << std::endl;
std::cout << "Cleaning up\n\n" << std::flush;
}
catch ( const std::bad_alloc & )
{
std::cerr << "Error: Unable to allocate sufficient memory!" << std::endl;
return -1;
}
return 0;
}
I have successfully tested my code with one million randomly generated elements and it found the correct median virtually instantaneously.
What you can do is use iterators to hold the position. I have adapted the algorithm above to work with std::forward_list. I know this isn't perfect; I wrote it up quickly and hope it helps.
int partition(int leftPos, int rightPos, std::forward_list<int>::iterator& currIter,
std::forward_list<int>::iterator lowIter, std::forward_list<int>::iterator highIter) {
auto iter = lowIter;
int i = leftPos - 1;
for(int j = leftPos; j < rightPos - 1; j++) {
if(*iter <= *highIter) {
++currIter;
++i;
std::iter_swap(currIter, iter);
}
iter++;
}
std::forward_list<int>::iterator newIter = currIter;
std::iter_swap(++newIter, highIter);
return i + 1;
}
std::forward_list<int>::iterator kthSmallest(std::forward_list<int>& list,
std::forward_list<int>::iterator left, std::forward_list<int>::iterator right, int size, int k) {
int leftPos {0};
int rightPos {size};
int pivotPos {0};
std::forward_list<int>::iterator resetIter = left;
std::forward_list<int>::iterator currIter = left;
++left;
while(leftPos <= rightPos) {
pivotPos = partition(leftPos, rightPos, currIter, left, right);
if(pivotPos == (k-1)) {
return currIter;
} else if(pivotPos > (k-1)) {
right = currIter;
rightPos = pivotPos - 1;
} else {
left = currIter;
++left;
resetIter = left;
++left;
leftPos = pivotPos + 1;
}
currIter = resetIter;
}
return list.end();
}
When making a call to kthSmallest, the left iterator should be one position before where you intend to start. This allows us to be one position behind low in partition(). Here is an example of executing it:
int main() {
std::forward_list<int> list {10, 12, 12, 13, 4, 5, 8, 11, 6, 26, 15, 21};
auto startIter = list.before_begin();
int k = 6;
int size = getSize(list);
auto kthIter = kthSmallest(list, startIter, getEnd(list), size - 1, k);
std::cout << k << "th smallest: " << *kthIter << std::endl;
return 0;
}
6th smallest: 10
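The example calls `getSize` and `getEnd`, which are not shown in the answer. A plausible sketch, assuming `getSize` counts the elements and `getEnd` returns an iterator to the last element (both are guesses at the intended behavior, not the author's code):

```cpp
#include <forward_list>
#include <iterator>

// Assumed helper: number of elements (std::forward_list has no size()).
int getSize(std::forward_list<int>& list) {
    return static_cast<int>(std::distance(list.begin(), list.end()));
}

// Assumed helper: iterator to the last element, or end() if the list is empty.
std::forward_list<int>::iterator getEnd(std::forward_list<int>& list) {
    auto last = list.before_begin();
    for (auto it = list.begin(); it != list.end(); ++it) last = it;
    return list.empty() ? list.end() : last;
}
```

Both helpers walk the whole list, so each call costs O(n).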
I have an assignment that requires me to create a binary search function that will search an array of structs that contain dates for a specified month and then print all of those entries with matching months.
I am having a very difficult time getting the binary search to work properly when I am searching for multiple values, and can't seem to figure out where I'm going wrong.
Here is my binary search function:
void binsearch(Event* ev_ptr[], int size, int month)
{
int low = 0, high = size - 1, first_index = -1, last_index = -1;
while (low <= high) //loop to find first occurrence
{
int mid = (low + high) / 2;
if (ev_ptr[mid]->date.month < month)
{
low = mid + 1;
}
else if (ev_ptr[mid]->date.month > month)
{
first_index = mid;
high = mid - 1;
}
else if (ev_ptr[mid]->date.month == month)
{
low = mid + 1;
}
}
low = 0; high = size - 1; //Reset so we can find the last occurrence
while (low <= high) //loop to find last occurrence
{
int mid = (low + high) / 2;
if (ev_ptr[mid]->date.month < month)
{
last_index = mid;
low = mid + 1;
}
else if (ev_ptr[mid]->date.month > month)
{
high = mid - 1;
}
else if (ev_ptr[mid]->date.month == month)
{
high = mid + 1;
}
}
for (int i = first_index; i <= last_index; i++)
{
cout << "\nEntry found: "
<< endl << ev_ptr[i]->desc
<< endl << "Date: " << ev_ptr[i]->date.month << '/' << ev_ptr[i]->date.day << '/' << ev_ptr[i]->date.year
<< endl << "Time: " << setw(2) << setfill('0') << ev_ptr[i]->time.hour << ':' << setw(2) << setfill('0') << ev_ptr[i]->time.minute << endl;
}
}
and here is my main function:
const int MAX = 50;
int main()
{
Event* event_pointers[MAX];
int count, userMonth;
char userString[80];
count = readEvents(event_pointers, MAX);
sort_desc(event_pointers, count);
display(event_pointers, count);
cout << "\n\nEnter a search string: ";
cin.getline(userString, 80, '\n');
cin.ignore();
linsearch(event_pointers, count, userString);
sort_date(event_pointers, count);
display(event_pointers, count);
cout << "\n\nEnter a month to list Events for: ";
cin >> userMonth;
cin.ignore();
binsearch(event_pointers, count, userMonth);
for (int j = 0; j < count; j++) //Cleanup loop
delete event_pointers[j];
cout << "\nPress any key to continue...";
(void)_getch();
return 0;
}
I've gotten everything else to work as I need to for this assignment, but it's just this binary search that seems to be causing problems. I have tried using some things I found online in the most recent iteration (What I posted above), but to no avail. Any help would be greatly appreciated!
Don't set these indices inside the binary search. Search for any one occurrence, then step downwards and upwards from it until the condition fails. Something like
else if (ev_ptr[mid]->date.month == month)
{
// mid = some occurence found
// increment and decrement mid until condition fails
}
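A minimal sketch of this idea, using a sorted array of plain month numbers instead of the `Event` pointers (the function name and the index-pair return value are assumptions for illustration):

```cpp
#include <utility>

// Returns the inclusive index range [first, last] of entries equal to
// `month` in the ascending array `months`, or {-1, -1} if there is none.
std::pair<int, int> find_month_range(const int months[], int size, int month) {
    int low = 0, high = size - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (months[mid] < month) {
            low = mid + 1;
        } else if (months[mid] > month) {
            high = mid - 1;
        } else {
            // mid is some occurrence: widen the range in both directions
            int first = mid, last = mid;
            while (first > 0 && months[first - 1] == month) --first;
            while (last < size - 1 && months[last + 1] == month) ++last;
            return {first, last};
        }
    }
    return {-1, -1};
}
```

The widening step is linear in the number of matches, so the whole search costs O(log n + m) for m matching entries.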
To design a correct binary search function, don't try to guess the solution; it's hard to get right that way. Use the method of loop invariants. The function that finds the first occurrence is called lower_bound in the standard library, so let's use this name here, too:
template<class It, typename T>
It lower_bound(It first, std::size_t size, const T& value);
Let's introduce the last variable: auto last = first + size. We will be looking for a transition point pt, such that in the range [first, pt), all elements have values < value, and in the range [pt, last), all elements have values >= value. Let's introduce two iterators (pointers) left and right with the loop invariants:
in the range [first, left) all elements have values < value,
in the range [right, last) all elements have values >= value.
These ranges represent elements examined so far. Initially, left = first, and right = last, so both ranges are empty. At each iteration one of them will be expanded. Finally, left = right, so the whole range [first, last) has been examined. From the definitions above, it follows that pt = right.
The following algorithm implements this idea:
template<class It, typename T>
It lower_bound(const It first, const std::size_t size, const T& value) {
const auto last = first + size;
auto left = first;
auto right = last;
while (left < right) {
const auto mid = left + (right - left) / 2;
if (*mid < value) // examined [first, left)
left = mid + 1;
else // examined [right, last)
right = mid;
}
return right;
}
Here we could reuse the variables first and last to represent left and right. I didn't do so for clarity.
Now let's analyze your implementation. I can infer the following loop invariants:
[first, low) - all elements have values < value,
(high, last) - all elements have values >= value.
These are the same invariants, with right being replaced with high + 1. The while loop itself is correct, but the condition, which can be rewritten as
if (*mid <= value)
low = mid + 1;
else {
first_index = mid;
high = mid - 1;
}
is broken. With this condition, the range [first, low) will contain all elements with values <= value. This corresponds to the upper_bound. The comparison should be <, not <=.
You can analyse the second loop in the same way. In that loop, at least one assignment that uses mid is incorrect.
int mid = (low + high) / 2;
...
high = mid + 1;
...
This is potentially an infinite loop. If high = low + 1, then mid = low, and you set high to mid + 1 = high. You modify neither low, nor high, and the loop becomes infinite.
The first approach, with two half-open ranges is beneficial IMO. It is symmetrical and is easier to reason about. If no value has been found, last = first + size is returned, which is a natural choice to represent the end of the range. You should check for first_index and last_index after the loops. What if they have not been reassigned and still hold -1?
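Following the same invariant method, here is a sketch of the counterpart that finds the end of the run of matching values. It is named `upper_bound_sketch` to avoid clashing with the standard library; the name is an assumption for illustration. The only change from the lower-bound version is the comparison: elements in [first, left) are `<= value`, and elements in [right, last) are `> value`.

```cpp
#include <cstddef>

// Returns the first position in the sorted range [first, first + size)
// whose element is greater than value, or first + size if there is none.
template<class It, typename T>
It upper_bound_sketch(const It first, const std::size_t size, const T& value) {
    auto left = first;
    auto right = first + size;
    while (left < right) {
        const auto mid = left + (right - left) / 2;
        if (!(value < *mid))   // *mid <= value: extend [first, left)
            left = mid + 1;
        else                   // *mid > value: extend [right, last)
            right = mid;
    }
    return right;
}
```

Together with the lower bound, the matching entries are exactly the half-open range between the two results. If both results are equal, nothing matched, which takes the place of the `-1` check.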
1. Define your struct as in this example:
struct element {
YourDate date;
...
operator int() const { return date.month;}
};
2. Sort the elements:
std::sort(elements.begin(), elements.end(), std::less<int>());
3. Use
std::equal_range(elements.begin(), elements.end(), your_target_month);
4. Print what you get from std::equal_range.
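Put together, the four steps might look like the following sketch. `Date`, `Element` and `events_in_month` are made-up names for illustration; the implicit conversion to `int` is what lets `std::less<int>` and `std::equal_range` compare an element directly with a month number.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

struct Date {
    int month, day, year;
};

// Hypothetical element type; converts to its month for comparisons.
struct Element {
    Date date;
    operator int() const { return date.month; }
};

// Sorts by month and returns how many elements fall in target_month.
int events_in_month(std::vector<Element>& elements, int target_month) {
    std::sort(elements.begin(), elements.end(), std::less<int>());
    auto range = std::equal_range(elements.begin(), elements.end(), target_month);
    // [range.first, range.second) holds every element with a matching month
    return static_cast<int>(range.second - range.first);
}
```

To print the matches instead of counting them, iterate from `range.first` up to (but not including) `range.second`.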
So I have written this quick sort function, and it works for SOME input.
For example, it works for the following inputs: "5 4 3 2 1", "3 4 5 6 7", etc.
However, when I input something like "0 3 5 4 -5 100 7777 2014", it always mixes up the multi-digit numbers.
I was hoping someone could help point me to where my code is failing at this test case.
Sort.cpp
std::vector<int> QuickSort::sortFunc(std::vector<int> vec, int left, int right) {
int i = left, j = right;
int tmp;
int pivot = vec.at( (left + right) / 2 );
/* partition */
while (i <= j) {
while (vec.at(i) < pivot)
i++;
while (vec.at(j) > pivot)
j--;
if (i <= j) {
tmp = vec.at(i);
vec.at(i) = vec.at(j);
vec.at(j) = tmp;
i++;
j--;
}
}
/* recursion */
if (left < j)
return sortFunc( vec, left, j );
if (i < right)
return sortFunc( vec, i, right );
else
{
return vec;
}
}
main.cpp
int main()
{
// The user inputs a string of numbers (e.g. "6 4 -2 88 ..etc") and those integers are then put into a vector named 'vec'.
std::vector<int> vec;
// Converts string from input into integer values, and then pushes said values into vector.
std::string line;
if ( getline(std::cin, line) )
{
std::istringstream str(line);
int value;
str >> value;
vec.push_back( value );
while ( str >> value )
{
vec.push_back( value );
}
}
// Creating QuickSort object.
QuickSort qSort;
QuickSort *ptrQSort = &qSort;
// Creating new vector that has been 'Quick Sorted'.
int vecSize = vec.size();
std::vector<int> qSortedVec;
qSortedVec = ptrQSort->sortFunc( vec, 0, vecSize-1 );
// Middle, start, and end positions on the vector.
int mid = ( 0 + (vec.size()-1) ) / 2;
int start = 0, end = vec.size() - 1;
// Creating RecursiveBinarySearch object.
RecursiveBinarySearch bSearch;
RecursiveBinarySearch *ptrBSearch = &bSearch;
//bool bS = ptrBSearch->binarySearch( qSortedVec, mid, start, end );
bool bS = ptrBSearch->binarySearch( bSortedVec, mid, start, end );
/*--------------------------------------OUTPUT-----------------------------------------------------------------------*/
// Print out inputted integers and the binary search result.
// Depending on the binary search, print either 'true' or 'false'.
if ( bS == 1 )
{
std::cout << "true ";
}
if ( bS == 0 )
{
std::cout << "false ";
}
// Prints the result of the 'quick sorted' array.
int sortedSize = qSortedVec.size();
for ( int i = 0; i < sortedSize; i++ )
{
std::cout << qSortedVec[i] << " ";
}
std::cout << "\n";
return 0;
}
Thanks for any and all help you can give me guys.
I'm not sure this fixes everything, but after sorting the left part you still need to sort the right part. Your code returns straight after the first recursive call instead, so the right part is never sorted.
Also, passing the vector by value and returning it is overhead and not needed, because in the end there should only be one version of the vector, so passing by reference is preferred. Passing by value and returning is sometimes needed when doing recursion, especially when backtracking (looking for different paths), but not in this case where left and right provide the needed state.
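Both points combined could look like the sketch below: the vector is taken by reference and both halves are sorted. The free function quick_sort is a hypothetical replacement for QuickSort::sortFunc, with the same inclusive left/right bounds.

```cpp
#include <utility>
#include <vector>

// Sorts vec[left..right] (inclusive bounds) in place.
void quick_sort(std::vector<int>& vec, int left, int right) {
    if (left >= right) return;          // zero or one element: nothing to do
    int i = left, j = right;
    const int pivot = vec[left + (right - left) / 2];
    /* partition */
    while (i <= j) {
        while (vec[i] < pivot) i++;
        while (vec[j] > pivot) j--;
        if (i <= j) {
            std::swap(vec[i], vec[j]);
            i++;
            j--;
        }
    }
    /* recursion: sort BOTH parts, without returning in between */
    if (left < j)  quick_sort(vec, left, j);
    if (i < right) quick_sort(vec, i, right);
}
```

It would be called as quick_sort(vec, 0, static_cast<int>(vec.size()) - 1), after which vec itself is sorted and no second vector is needed.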
EDIT: I took a different approach and found the solution; the function below is updated to correctly find the mode or modes.
I've been at this algorithm all day and night, I've looked at about 12 code examples 10x over but none of them seem to go above and beyond to address my problem.
Problem: Find the mode(s) in an array, if the array has more than one mode, display them all. (This is a homework assignment so I must use arrays/pointers)
Sample array:
-1, -1, 5, 6, 1, 1
Sample output:
This array has the following mode(s): -1, 1
The problem I'm having is trying to figure out how to store and display just the highest mode OR the multiple modes if they exist.
I have used a lot of approaches and so I will post my most recent approach:
void getMode(int *arr, int size)
{
int *count = new int[size]; // to hold the number of times a value appears in the array
// fill the count array with zeros
for (int i = 0; i < size; i++)
count[i] = 0;
// find the possible modes
for (int x = 0; x < size; x++)
{
for (int y = 0; y < size; y++)
{
// don't count the values that will always occur at the same element
if (x == y)
continue;
if (arr[x] == arr[y])
count[x]++;
}
}
// find the greatest occurrence count
int maxCount = getMaximum(count, size);
// store only unique values in the mode array
int *mode = new int[size]; // to store the mode(s) in the list
int modeCount = 0; // to count the number of modes
if (maxCount > 0)
{
for (int i = 0; i < size; i++)
{
if (count[i] == maxCount)
{
// call to function searchList
if (!searchList(mode, modeCount, arr[i]))
{
mode[modeCount] = arr[i];
modeCount++;
}
}
}
}
// display the modes
if (modeCount == 0)
cout << "The list has no mode\n";
else if (modeCount == 1)
{
cout << "The list has the following mode: " << mode[0] << endl;
}
else if (modeCount > 1)
{
cout << "The list has the following modes: ";
for (int i = 0; i < modeCount - 1; i++)
{
cout << mode[i] << ", ";
}
cout << mode[modeCount - 1] << endl;
}
// delete the dynamically allocated arrays
delete[]count;
delete[]mode;
count = NULL;
mode = NULL;
}
/*
definition of function searchList.
searchList accepts a pointer to an int array, its size, and a value to be searched for as its arguments.
if searchList finds the value to be searched for, searchList returns true.
*/
bool searchList(int *arr, int size, int value)
{
for (int x = 0; x < size; x++)
{
if (arr[x] == value)
{
return true;
}
}
return false;
}
It's best to build algorithms from smaller building blocks. The standard <algorithm> library is a great source of such components. Even if you're not using that, the program should be similarly structured with subroutines.
For homework at least, the reasoning behind the program should be fairly "obvious," especially given some comments.
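Here's a version using <algorithm>, and std::unique_ptr instead of new, which you should never use directly. It needs C++14 for std::make_unique and for the transparent comparators std::greater<> and std::not_equal_to<>. If it helps to satisfy the homework requirements, you might implement your own versions of the standard library facilities.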
#include <algorithm>
#include <functional>
#include <memory>
#include <utility>
// Input: array "in" of size "count"
// Output: array "in" contains "count" modes of the input sequence
void filter_modes( int * in, int & count ) {
auto end = in + count;
std::sort( in, end ); // Sorting groups duplicate values.
// Use an ordered pair data type, <frequency, value>
typedef std::pair< int, int > frequency_value;
// Reserve memory for the analysis. (A unique_ptr is not a raw pointer, so plain "auto".)
auto frequencies = std::make_unique< frequency_value[] >( count );
int frequency_count = 0;
// Loop once per group of equal values in the input.
for ( auto group = in; group != end; ++ group ) {
auto group_start = group;
// Skip to the last equal value in this subsequence.
group = std::adjacent_find( group, end, std::not_equal_to<>{} );
if ( group == end ) -- group; // No differing neighbor: the group runs to the last element.
frequencies[ frequency_count ++ ] = { // Record this group in the list.
static_cast< int >( group - group_start + 1 ), // One unique value plus # skipped values.
* group // The value.
};
}
// Sort <frequency, value> pairs in decreasing order (by frequency).
std::sort( frequencies.get(), frequencies.get() + frequency_count,
std::greater<>{} );
// Copy modes back to input array and set count appropriately.
// The bound on frequency_count stops the scan when every group is a mode.
for ( count = 0; count < frequency_count
&& frequencies[ count ].first == frequencies[ 0 ].first; ++ count ) {
in[ count ] = frequencies[ count ].second;
}
}
There's no real answer because of the way the mode is defined. Occasionally you see in British high school leaving exams the demand to identify the mode from a small distribution which is clearly amodal, but has one bin with excess count.
You need to bin the data, choosing bins so that the data has definite peaks and troughs. The modes are then the tips of the peaks. However little subsidiary peaks on the way up to the top are not modes, they're a sign that your binning has been too narrow. It's easy enough to eyeball the modes, a bit more difficult to work it out in a computer algorithm which has to be formal. One test is to move the bins by half a bin. If a mode disappears, it's noise rather than a real mode.
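One possible way to formalise the half-bin test is sketched below. The function names, the definition of a peak (a bin strictly higher than both neighbours) and the matching rule (a shifted peak within one bin width) are all assumptions made for this illustration, not a standard definition.

```cpp
#include <cmath>
#include <map>
#include <vector>

// Counts samples per bin; bin k covers [offset + k*width, offset + (k+1)*width).
std::map<long, int> histogram(const std::vector<double>& data, double width, double offset) {
    std::map<long, int> bins;
    for (double x : data)
        ++bins[static_cast<long>(std::floor((x - offset) / width))];
    return bins;
}

// Centres of bins whose count is strictly greater than both neighbours.
// A missing neighbour counts as 0; flat plateaus produce no peak.
std::vector<double> peak_centers(const std::map<long, int>& bins, double width, double offset) {
    auto count_at = [&bins](long k) {
        auto it = bins.find(k);
        return it == bins.end() ? 0 : it->second;
    };
    std::vector<double> peaks;
    for (const auto& bin : bins) {
        if (bin.second > count_at(bin.first - 1) && bin.second > count_at(bin.first + 1))
            peaks.push_back(offset + (bin.first + 0.5) * width);
    }
    return peaks;
}

// Keeps a peak only if the histogram shifted by half a bin still has a
// peak less than one bin width away (assumed matching rule).
std::vector<double> stable_modes(const std::vector<double>& data, double width) {
    const std::vector<double> original = peak_centers(histogram(data, width, 0.0), width, 0.0);
    const std::vector<double> shifted = peak_centers(histogram(data, width, width / 2), width, width / 2);
    std::vector<double> modes;
    for (double p : original) {
        for (double q : shifted) {
            if (std::fabs(p - q) < width) {
                modes.push_back(p);
                break;
            }
        }
    }
    return modes;
}
```

The bin width still has to be chosen by hand; the sketch only automates the check that a peak survives the shift.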
I'm trying to print a B-tree in level order, but it keeps crashing. I'm not sure what the real reason is, but I think it's crashing because of the pointers. I'm trying to use a function I found online that goes through each level, puts the nodes in a queue and prints them, but I've run into this problem. If anyone has another way of doing it, please let me know.
// C++ program for B-Tree insertion
#include<iostream>
#include <queue>
using namespace std;
int ComparisonCount = 0;
// A BTree node
class BTreeNode
{
int *keys; // An array of keys
int t; // Minimum degree (defines the range for number of keys)
BTreeNode **C; // An array of child pointers
int n; // Current number of keys
bool leaf; // Is true when node is leaf. Otherwise false
public:
BTreeNode(int _t, bool _leaf); // Constructor
// A utility function to insert a new key in the subtree rooted with
// this node. The assumption is, the node must be non-full when this
// function is called
void insertNonFull(int k);
// A utility function to split the child y of this node. i is index of y in
// child array C[]. The Child y must be full when this function is called
void splitChild(int i, BTreeNode *y);
// A function to traverse all nodes in a subtree rooted with this node
void traverse();
// A function to search a key in subtree rooted with this node.
BTreeNode *search(int k); // returns NULL if k is not present.
// Make BTree friend of this so that we can access private members of this
// class in BTree functions
friend class BTree;
};
// A BTree
class BTree
{
BTreeNode *root; // Pointer to root node
int t; // Minimum degree
public:
// Constructor (Initializes tree as empty)
BTree(int _t)
{
root = NULL; t = _t;
}
// function to traverse the tree
void traverse()
{
if (root != NULL) root->traverse();
}
// function to search a key in this tree
BTreeNode* search(int k)
{
return (root == NULL) ? NULL : root->search(k);
}
// The main function that inserts a new key in this B-Tree
void insert(int k);
};
// Constructor for BTreeNode class
BTreeNode::BTreeNode(int t1, bool leaf1)
{
// Copy the given minimum degree and leaf property
t = t1;
leaf = leaf1;
// Allocate memory for maximum number of possible keys
// and child pointers
keys = new int[2 * t - 1];
C = new BTreeNode *[2 * t];
// Initialize the number of keys as 0
n = 0;
}
// Function to traverse all nodes in a subtree rooted with this node
/*void BTreeNode::traverse()
{
// There are n keys and n+1 children, travers through n keys
// and first n children
int i;
for (i = 0; i < n; i++)
{
// If this is not leaf, then before printing key[i],
// traverse the subtree rooted with child C[i].
if (leaf == false)
{
ComparisonCount++;
C[i]->traverse();
}
cout << " " << keys[i];
}
// Print the subtree rooted with last child
if (leaf == false)
{
ComparisonCount++;
C[i]->traverse();
}
}*/
// Function to search key k in subtree rooted with this node
BTreeNode *BTreeNode::search(int k)
{
// Find the first key greater than or equal to k
int i = 0;
while (i < n && k > keys[i])
i++;
// If the found key is equal to k, return this node
if (keys[i] == k)
{
ComparisonCount++;
return this;
}
// If key is not found here and this is a leaf node
if (leaf == true)
{
ComparisonCount++;
return NULL;
}
// Go to the appropriate child
return C[i]->search(k);
}
// The main function that inserts a new key in this B-Tree
void BTree::insert(int k)
{
// If tree is empty
if (root == NULL)
{
ComparisonCount++;
// Allocate memory for root
root = new BTreeNode(t, true);
root->keys[0] = k; // Insert key
root->n = 1; // Update number of keys in root
}
else // If tree is not empty
{
// If root is full, then tree grows in height
if (root->n == 2 * t - 1)
{
ComparisonCount++;
// Allocate memory for new root
BTreeNode *s = new BTreeNode(t, false);
// Make old root as child of new root
s->C[0] = root;
// Split the old root and move 1 key to the new root
s->splitChild(0, root);
// New root has two children now. Decide which of the
// two children is going to have new key
int i = 0;
if (s->keys[0] < k)
{
ComparisonCount++;
i++;
}
s->C[i]->insertNonFull(k);
// Change root
root = s;
}
else // If root is not full, call insertNonFull for root
root->insertNonFull(k);
}
}
// A utility function to insert a new key in this node
// The assumption is, the node must be non-full when this
// function is called
void BTreeNode::insertNonFull(int k)
{
// Initialize index as index of rightmost element
int i = n - 1;
// If this is a leaf node
if (leaf == true)
{
ComparisonCount++;
// The following loop does two things
// a) Finds the location of new key to be inserted
// b) Moves all greater keys to one place ahead
while (i >= 0 && keys[i] > k)
{
keys[i + 1] = keys[i];
i--;
}
// Insert the new key at found location
keys[i + 1] = k;
n = n + 1;
}
else // If this node is not leaf
{
// Find the child which is going to have the new key
while (i >= 0 && keys[i] > k)
i--;
// See if the found child is full
if (C[i + 1]->n == 2 * t - 1)
{
ComparisonCount++;
// If the child is full, then split it
splitChild(i + 1, C[i + 1]);
// After split, the middle key of C[i] goes up and
// C[i] is splitted into two. See which of the two
// is going to have the new key
if (keys[i + 1] < k)
i++;
}
C[i + 1]->insertNonFull(k);
}
}
// A utility function to split the child y of this node
// Note that y must be full when this function is called
void BTreeNode::splitChild(int i, BTreeNode *y)
{
// Create a new node which is going to store (t-1) keys
// of y
BTreeNode *z = new BTreeNode(y->t, y->leaf);
z->n = t - 1;
// Copy the last (t-1) keys of y to z
for (int j = 0; j < t - 1; j++)
z->keys[j] = y->keys[j + t];
// Copy the last t children of y to z
if (y->leaf == false)
{
ComparisonCount++;
for (int j = 0; j < t; j++)
z->C[j] = y->C[j + t];
}
// Reduce the number of keys in y
y->n = t - 1;
// Since this node is going to have a new child,
// create space of new child
for (int j = n; j >= i + 1; j--)
C[j + 1] = C[j];
// Link the new child to this node
C[i + 1] = z;
// A key of y will move to this node. Find location of
// new key and move all greater keys one space ahead
for (int j = n - 1; j >= i; j--)
keys[j + 1] = keys[j];
// Copy the middle key of y to this node
keys[i] = y->keys[t - 1];
// Increment count of keys in this node
n = n + 1;
}
void BTreeNode::traverse()
{
std::queue<BTreeNode*> queue;
queue.push(this);
while (!queue.empty())
{
BTreeNode* current = queue.front();
queue.pop();
int i;
for (i = 0; i < n; i++)
{
if (leaf == false)
queue.push(current->C[i]);
cout << " " << current->keys[i] << endl;
}
if (leaf == false)
queue.push(current->C[i]);
}
}
// Driver program to test above functions
int main()
{
BTree t(4); // A B-Tree with minimum degree 4
srand(29324);
for (int i = 0; i<200; i++)
{
int p = rand() % 10000;
t.insert(p);
}
cout << "Traversal of the constucted tree is ";
t.traverse();
int k = 6;
(t.search(k) != NULL) ? cout << "\nPresent" : cout << "\nNot Present";
k = 28;
(t.search(k) != NULL) ? cout << "\nPresent" : cout << "\nNot Present";
cout << "There are " << ComparisonCount << " comparison." << endl;
system("pause");
return 0;
}
Your traversal code uses the field values for this as though they were the values for the current node in the loop body.
You need to stick current-> in front of the member references in the loop body like this (in the lines marked with "//*"):
while (!queue.empty())
{
BTreeNode* current = queue.front();
queue.pop();
int i;
for (i = 0; i < current->n; i++) //*
{
if (current->leaf == false) //*
queue.push(current->C[i]);
cout << " " << current->keys[i] << endl;
}
if (current->leaf == false) //*
queue.push(current->C[i]);
}
This is a strong indicator that all the stuff qualified with current-> in reality wants to live in a function where it is this and thus does not need to be named explicitly.
Your code is better organised and more pleasant to read than most debug requests we get here, but it is still fairly brittle and it contains quite a few smelly bits like if (current->leaf == false) instead of if (not current->is_leaf).
You may want to post it over on Code Review when you have got it into working shape; I'm certain that the experienced coders hanging out there can give you lots of valuable advice on how to improve your code.
In order to ease prototyping and development I would strongly advise the following:
use std::vector<> instead of naked arrays during the prototype phase
invalidate invalid entries during development/prototyping (set keys to -1 and pointers to 0)
use assert() for documenting - and checking - local invariants
write functions that verify the structural invariants exactly and call them before/after every function that modifies the structure
compile your code with warnings enabled (/W4 or /Wall with MSVC, -Wall -Wextra with GCC or Clang) and clean it up so that it always compiles without warnings
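As an illustration of the vector-based prototype and the structural check suggested in the list above, here is a minimal sketch. Node and verify are hypothetical names, t is the minimum degree as in the question, duplicate keys are tolerated, and the "all leaves at the same depth" rule is deliberately left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical vector-based node for prototyping.
struct Node {
    std::vector<int> keys;
    std::vector<Node*> children;   // empty for a leaf
    bool is_leaf() const { return children.empty(); }
};

// Returns true if the subtree rooted at node satisfies:
//  - at most 2*t - 1 keys, at least t - 1 (at least 1 for the root),
//  - keys in non-decreasing order and inside the bounds [*lo, *hi],
//  - an inner node has exactly keys.size() + 1 non-null children.
// lo and hi may be null, meaning "no bound on that side".
bool verify(const Node* node, std::size_t t, bool is_root, const int* lo, const int* hi) {
    const std::size_t n = node->keys.size();
    if (n > 2 * t - 1) return false;
    if (is_root ? n == 0 : n + 1 < t) return false;
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0 && node->keys[i - 1] > node->keys[i]) return false;
        if (lo && node->keys[i] < *lo) return false;
        if (hi && node->keys[i] > *hi) return false;
    }
    if (node->is_leaf()) return true;
    if (node->children.size() != n + 1) return false;
    for (std::size_t i = 0; i <= n; ++i) {
        // Child i must fit between the keys on either side of it.
        const int* child_lo = (i == 0) ? lo : &node->keys[i - 1];
        const int* child_hi = (i == n) ? hi : &node->keys[i];
        if (node->children[i] == nullptr) return false;
        if (!verify(node->children[i], t, false, child_lo, child_hi)) return false;
    }
    return true;
}
```

During development, something like assert(verify(root, t, true, nullptr, nullptr)) before and after every insert tends to catch a broken split at the point where it happens rather than much later in the traversal.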
Also, don't use int indiscriminately; the basic type for things that cannot become negative is unsigned (node degree, current key count etc.).
P.S.: it would be easier to build a conforming B-tree by pinning the order on the number of keys (i.e. number of keys can vary between K and 2*K for some K). Pinning the order on the number of pointers makes things more difficult, and one consequence is that the number of keys for 'order' 2 (where a node is allowed to have between 2 and 4 pointers) can vary between 1 and 3. For most folks dealing with B-trees that will be a rather unexpected sight!
I want to identify which values are duplicated in a linked list built from user input, and how many times each one occurs. This is the code I wrote for it:
int count;
int compare, compare2;
for (p = first; p != NULL; p = p->next){
compare = p->num;
for (j = first; j != NULL; j = j->next){
if (compare == j->num){
compare2 = j->num;
count++;
}
}
if (count > 1){
cout << "There are at least 2 identical values of: " << compare2 << " that repeat for: " << count << "times" << endl;
}
}
Basically the idea of it was that I take the first element in the first loop and compare it to all the elements in the second loop and count if there are cases of them being similar, and print the result after - then I take next element and so on.
However the output is all the elements and it doesn't count correctly either. I'm just lost at how to adjust it.
I tried using the same p variable in both loops as it is the same list I want to loop, but then the .exe failed as soon as I'd finished input.
I saw a few examples around with a function for deleting duplicate values, but there the comparison part ran in a while loop, and I'm just wondering: what am I doing wrong in this one?
Your O(N*N) approach:
// Pick an element
for (p = first; p != NULL && p->next != NULL; p = p->next)
{
count = 1; // Reset for every element; p itself is one occurrence
// Compare it with the remaining elements
for (j = p->next; j != NULL; j = j->next)
{
if (p->num == j->num)
{
count++;
}
}
if (count > 1)
{
std::cout << p->num << " occurs " << count << " times" << '\n';
}
}
It's better to use a hash map (std::unordered_map), which solves this in O(N) time with O(N) extra space:
std::unordered_map<int, int> m ;
for( p = first; p != NULL ; p = p->next )
{
m[ p->num ]++;
}
for (const auto &pair : m )
{
if( pair.second > 1 )
std::cout << pair.first << ": " << pair.second << '\n';
}
Your logic is flawed since both p and j iterate over the entire list. When p == j, the values are bound to match.
Change the block
if (compare == j->num){
compare2 = j->num;
count++;
}
to
if (p != j && compare == j->num){
compare2 = j->num;
count++;
}
Also, you don't need the line
compare2 = j->num;
since compare2 will be equal to compare.
You can reduce the number of tests by changing the inner for loop a bit. Then, you won't need the p != j bit either.
for (j = p->next; j != NULL; j = j->next){
if (compare == j->num){
count++;
}
}
The problem is that you don't exclude the element you are comparing against (compare) from the inner loop, so every element finds at least one duplicate: itself!
In the inner loop, try comparing only against the elements that follow the current one (p).
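Putting these suggestions together, a complete version might look like the sketch below. The Node layout is assumed from the question (a num field and a next pointer), and find_duplicates is a made-up helper that returns the results instead of printing them. It also resets count for every element and skips values already reported, which the question's loop does not do.

```cpp
#include <utility>
#include <vector>

// Assumed shape of the asker's list node.
struct Node {
    int num;
    Node* next;
};

// Returns (value, occurrences) for every value that appears more than once,
// reported once per value in order of first appearance. O(N*N) time.
std::vector<std::pair<int, int>> find_duplicates(const Node* first) {
    std::vector<std::pair<int, int>> result;
    for (const Node* p = first; p != nullptr; p = p->next) {
        // Skip p if the same value already occurred earlier in the list.
        bool seen_before = false;
        for (const Node* q = first; q != p; q = q->next) {
            if (q->num == p->num) {
                seen_before = true;
                break;
            }
        }
        if (seen_before) continue;

        int count = 1;  // reset for every p; p itself is one occurrence
        for (const Node* j = p->next; j != nullptr; j = j->next) {
            if (j->num == p->num) count++;
        }
        if (count > 1) result.push_back(std::make_pair(p->num, count));
    }
    return result;
}
```

For a list holding 1, 2, 1, 3, 2, 1 this returns (1, 3) and (2, 2). The extra backward scan keeps everything inside the list itself; when extra memory is acceptable, the std::unordered_map version does the same work in O(N).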