I am solving a problem where, given a list of dates, we have to print the third-latest date.
Input: [24-01-2001, 9-2-2068, 4-04-2019, 31-10-1943, 2-10-2013, 17-12-1990]
output:2-10-2013
I have written the following code for it:
#include <set>
#include <vector>
using namespace std;
struct Date
{
int Day;
int Year;
int Month;
};
// comparator function used during insertion in set
bool operator<(const Date& date1, const Date & date2)
{
if(date1.Year<date2.Year)
return true;
if(date1.Year == date2.Year and date1.Month<date2.Month)
return true;
if(date1.Year == date2.Year and date1.Month==date2.Month and date1.Day<date2.Day)
return true;
return false;
}
Date ThirdLatest(std::vector<Date> & dates) {
//using set data structure to eliminate duplicate dates
set<struct Date>UniqueDates;
//using operator function the dates are inserted into the
//set in a sorted manner
for(auto i:dates)
{
UniqueDates.insert(i);
}
//clear original dates vector
dates.clear();
//push dates from the set back into dates vector
for(auto i: UniqueDates)
{
dates.push_back(i);
}
int DatesSize=dates.size();
return dates[DatesSize-3];
}
I was wondering about the complexity of this code, as it uses just an ordered set, and the elements are inserted into it using the overloaded operator< to sort the dates instead of using the sort() function. Insertion into an ordered set is O(log n), so is the complexity of this code also O(log n), or am I calculating it wrong?
Also, I had one more question regarding operator overloading. I read about it here: when the symbol is used, the overloaded function is called in its place. But how does that work in this code, given that the symbol < is not mentioned anywhere for insertion into the set? The code works, so how is < being used here?
Your overloaded operator< is used because you are inserting elements into a sorted set: std::set orders its elements with std::less<Key> by default, and std::less simply evaluates a < b, which is why the symbol never appears in your own code. std::set is typically implemented as a red-black tree, a kind of self-balancing binary search tree. Since it is a balanced binary search tree, inserting one element is O(log(n)), and inserting n elements is O(n*log(n)), so the function as a whole is O(n*log(n)), not O(log(n)). The overloaded operator< drives the search in the tree: if the new element is < the current element, the search goes to the left subtree, otherwise it goes to the right subtree, and it continues until the element or an empty position is found. A detailed explanation can be found here: https://www.geeksforgeeks.org/red-black-tree-set-1-introduction-2/.
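To make the hidden use of operator< visible, here is a minimal sketch that counts how often the operator is called while a std::set is being filled. The counter g_less_calls and the helper insertDemo are hypothetical names introduced only for this illustration; the Date struct keeps the field order from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct Date {
    int Day;
    int Year;
    int Month;
};

// Hypothetical counter, added only to make the hidden calls visible.
int g_less_calls = 0;

bool operator<(const Date& a, const Date& b) {
    ++g_less_calls;  // every comparison made by the set passes through here
    if (a.Year != b.Year) return a.Year < b.Year;
    if (a.Month != b.Month) return a.Month < b.Month;
    return a.Day < b.Day;
}

// Inserts three dates (one of them a duplicate) and returns how many were kept.
std::size_t insertDemo() {
    std::set<Date> dates;             // same type as std::set<Date, std::less<Date>>
    dates.insert(Date{24, 2001, 1});  // fields are {Day, Year, Month}
    dates.insert(Date{9, 2068, 2});
    dates.insert(Date{24, 2001, 1});  // neither a < b nor b < a holds, so it is rejected
    assert(g_less_calls > 0);         // insert() really did call operator<
    return dates.size();
}
```

Calling insertDemo() returns 2: the set treats two dates as duplicates when neither compares less than the other, so it never needs operator==.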
Also, instead of inserting the elements into a set and then copying them back into the vector, you could sort the existing vector itself with std::sort(): std::sort(dates.begin(), dates.end()). You don't need a 3rd argument since you have already overloaded operator<. Unlike the set, sorting keeps duplicate dates, so remove them afterwards if they should count only once. Refer to: https://www.cplusplus.com/reference/algorithm/sort/ for more details.
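A possible version of the std::sort() approach is sketched below. The names thirdLatestSorted and sameDate are hypothetical, and the sketch assumes the input contains at least three distinct dates; std::unique is added to reproduce the duplicate removal that the set did implicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Date {
    int Day;
    int Year;
    int Month;
};

bool operator<(const Date& a, const Date& b) {
    if (a.Year != b.Year) return a.Year < b.Year;
    if (a.Month != b.Month) return a.Month < b.Month;
    return a.Day < b.Day;
}

// Two dates are the same when neither orders before the other.
bool sameDate(const Date& a, const Date& b) {
    return !(a < b) && !(b < a);
}

// Takes the vector by value so the caller's data is left untouched.
Date thirdLatestSorted(std::vector<Date> dates) {
    std::sort(dates.begin(), dates.end());  // uses operator< by default
    // std::unique moves adjacent duplicates to the tail; erase drops them.
    dates.erase(std::unique(dates.begin(), dates.end(), sameDate), dates.end());
    assert(dates.size() >= 3);  // assumption: at least three distinct dates
    return dates[dates.size() - 3];
}
```

For the sample input from the question this returns 2-10-2013. The cost is still O(n*log(n)), but it avoids the node allocations of the set.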
Also, the comparison above relies on the fields of Date supporting both operator< and operator==, which happens to be true for int. In general that is not guaranteed, so it is better to write operator<() using only operator<() on the members, like:
// comparator function used during insertion in set or by sort method
bool operator<(const Date& date1, const Date & date2)
{
if(date1.Year < date2.Year)
{
return true;
}
if(date2.Year < date1.Year)
{
return false;
}
// Equality case for year
if(date1.Month < date2.Month)
{
return true;
}
if(date2.Month < date1.Month)
{
return false;
}
// Equality case for year and month
if(date1.Day < date2.Day)
{
return true;
}
if(date2.Day < date1.Day)
{
return false;
}
return false;
}
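The same member-by-member comparison can be written more compactly with std::tie, which builds tuples of references and compares them lexicographically using only operator< on the members. This is a sketch of that idiom rather than a required change; checkOrdering is a hypothetical helper that exists only to show usage.

```cpp
#include <cassert>
#include <tuple>

struct Date {
    int Day;
    int Year;
    int Month;
};

// Compares Year first, then Month, then Day.
bool operator<(const Date& a, const Date& b) {
    return std::tie(a.Year, a.Month, a.Day) < std::tie(b.Year, b.Month, b.Day);
}

// Small usage check on two sample dates; fields are {Day, Year, Month}.
void checkOrdering() {
    Date earlier{31, 1943, 10};
    Date later{2, 2013, 10};
    assert(earlier < later);
    assert(!(later < earlier));
}
```

The order of the arguments to std::tie decides the priority of the fields, so the most significant field goes first.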
Clearing the vector and pushing all the elements back also has a cost: every date is copied twice, once into the set and once back into the vector. Note that clear() leaves the vector's capacity unchanged, so these particular push_back calls will not reallocate. If you fill a fresh vector instead, it gets re-allocated and re-copied each time it outgrows its allocated memory, so I would recommend calling vector.reserve() before inserting, or simply assigning over the existing values without clearing and pushing back.
And if your code processes dates frequently and requires the entries to be ordered by date, I would recommend using map/set instead of vector for storing dates.
Since the problem is to find the 3rd largest date, you do not need to sort all the elements, either by sorting the entire vector or by inserting all of them into a set; it is enough to pass the elements through a priority_queue that never holds more than 3 of them. In general, to find the Kth largest element, you maintain a priority_queue of size K. A priority queue is implemented as a heap, which is a complete binary tree stored in an array. It is not fully ordered like the AVL or red-black trees used in ordered sets and maps, but it is optimized for insertion and for removing the top element, and it is faster than an ordered set in this case.
By default, a priority queue keeps the greatest element on top. So you need to define a comparator that places the smallest element on top, which is what this case requires.
template<class T>
class TestAscending
{
public:
bool operator() (const Date& l, const Date&r) const
{
return r < l;
}
};
// Somewhere in code you would define priority queue as
priority_queue<Date, vector<Date>, TestAscending<Date> > p;
So, you insert the elements of the vector into the priority_queue one by one. While the size of the priority queue is < k, you can add elements without any conditional check. Once the size of the priority_queue (p.size()) reaches k, you add an element only when it is greater than the top-most element of the priority queue (p.top()), which is the smallest one it holds. You do that by removing the existing smallest element with p.pop() and adding the new one with p.push().
At the end, the topmost element p.top() is the kth largest element.
Since the priority queue never holds more than K elements, the complexity is reduced from O(n*log(n)) to O(n*log(k)) in the worst case. A set capped at size k would have the same asymptotic complexity, but heap insertion has lower constant factors, so the priority queue would typically run faster.
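The procedure described above could look roughly like this. The names kthLatest and EarliestOnTop are hypothetical, and the sketch assumes 1 <= k <= dates.size(). Unlike the set-based code, it does not remove duplicate dates, so a date that occurs twice is counted twice.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

struct Date {
    int Day;
    int Year;
    int Month;
};

bool operator<(const Date& a, const Date& b) {
    if (a.Year != b.Year) return a.Year < b.Year;
    if (a.Month != b.Month) return a.Month < b.Month;
    return a.Day < b.Day;
}

// Reverses operator< so that the earliest date sits on top (a min-heap).
struct EarliestOnTop {
    bool operator()(const Date& l, const Date& r) const { return r < l; }
};

Date kthLatest(const std::vector<Date>& dates, std::size_t k) {
    assert(k > 0 && dates.size() >= k);  // assumption stated in the text above
    std::priority_queue<Date, std::vector<Date>, EarliestOnTop> heap;
    for (const Date& d : dates) {
        if (heap.size() < k) {
            heap.push(d);             // still filling the first k slots
        } else if (heap.top() < d) {  // d beats the smallest of the current top k
            heap.pop();
            heap.push(d);
        }
    }
    return heap.top();  // smallest of the k largest, i.e. the kth largest
}
```

With the six sample dates and k = 3, the heap ends up holding 2013, 2019 and 2068, and the function returns 2-10-2013.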
I have a std::vector<std::vector<type T>> matrix, I insert elements of type T to this matrix and I do some instructions by line on these elements. I need also at each iteration to delete the element with a minimum cost.
I created a std::priority_queue<Costelement, std::vector<Costelement>, Costelement::CompareCosts> Costvec; where:
struct Costelement
{
int row;
int column;
std::vector<double> cost;
struct CompareCosts
{
bool operator()(const Costelement &e1, const Costelement &e2)
{
return (e1.cost > e2.cost);
}
};
};
where row and column are the position of the element in matrix having the corresponding cost. However, when I delete the element with minimum key from matrix, the positions of the elements in the corresponding row change. I used std::min_element at each iteration on matrix but this was very costly. How can we model efficiently this case?
A std::priority_queue is by default just a std::vector that is kept in heap order (not fully sorted). It can still be expensive to insert and remove elements from the queue, and as you noticed, you would potentially need to update all of the Costelements in the queue when you insert or remove an element from matrix in order to reflect the new positions. However, you can make that a bit more efficient by making the priority queue two-dimensional as well, something that looks like:
std::priority_queue<std::priority_queue<Costelement, ...>, ...> cost_matrix;
Basically, the inner priority queue sorts the costs of the columns of a single row, and the outer priority queue then sorts whole rows by their lowest cost. Let's create ColumnCost and RowCost structs:
struct ColumnCost {
int column;
double cost;
friend bool operator<(const ColumnCost &a, const ColumnCost &b) {
return a.cost > b.cost;
}
};
struct RowCost {
int row;
std::priority_queue<ColumnCost> columns;
friend bool operator<(const RowCost &a, const RowCost &b) {
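// Uses ColumnCost's (reversed) operator<, so the row holding the lowest cost ends up on top.
// Rows must be non-empty: calling top() on an empty queue is undefined.
return a.columns.top() < b.columns.top();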
}
};
std::priority_queue<RowCost> cost_matrix;
Now you can easily get the lowest cost element from cost_matrix: its top() returns the RowCost that contains the lowest cost element, and from that one you get the ColumnCost with the lowest cost:
const auto &lowest_row = cost_matrix.top();
const auto &lowest_column = lowest_row.columns.top();
int row = lowest_row.row;
int column = lowest_column.column;
When you now insert or delete an element from matrix, you insert or delete from cost_matrix in the same way. You still need to update row or column coordinates, but now it is much less work. The only thing to be aware of is that if you add or remove an element in the priority queue of a RowCost, you need to remove that whole row from cost_matrix and re-insert it, so that the outer priority queue stays correctly ordered.
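As a sketch of the remove-and-re-insert step, the following removes the cheapest element overall. Cell and popCheapest are hypothetical names; the sketch assumes that no empty row is ever stored in the outer queue and it does not renumber the remaining column indices, which still has to be handled separately.

```cpp
#include <cassert>
#include <queue>
#include <utility>

struct ColumnCost {
    int column;
    double cost;
    // Reversed on purpose: the lowest cost counts as the "greatest" element.
    friend bool operator<(const ColumnCost& a, const ColumnCost& b) {
        return a.cost > b.cost;
    }
};

struct RowCost {
    int row;
    std::priority_queue<ColumnCost> columns;
    // Rows are ordered by their cheapest column; rows must never be empty.
    friend bool operator<(const RowCost& a, const RowCost& b) {
        return a.columns.top() < b.columns.top();
    }
};

struct Cell {
    int row;
    int column;
    double cost;
};

// Removes and returns the cheapest element of the whole structure.
Cell popCheapest(std::priority_queue<RowCost>& cost_matrix) {
    assert(!cost_matrix.empty());
    RowCost row = cost_matrix.top();  // copy, because top() is read-only
    cost_matrix.pop();                // take the whole row out ...
    ColumnCost best = row.columns.top();
    row.columns.pop();
    Cell result{row.row, best.column, best.cost};
    if (!row.columns.empty()) {
        cost_matrix.push(std::move(row));  // ... and re-insert it with its new minimum
    }
    return result;
}
```

Copying the row on every call is the price of std::priority_queue only exposing a const top(); a std::set of rows would be one way to avoid that copy.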
Another possible optimization is to use a std::priority_queue to keep the rows sorted, but use std::min_element() to keep track of the minimum of each individual row. This greatly reduces the amount of memory necessary to store the cost_matrix, and you would only need to call std::min_element() to recalculate the minimum cost element of a row when you change that row.
You may want to replace a row vector with a rope (see the rope data structure in Wikipedia).
It's a binary-tree-based structure that allows quite efficient removal of elements and searching for the n-th element ('indexing'), so you needn't update positions in all elements when you remove one of them.
Suppose I have a QList of 100 MyItem objects inserted in a certain order. Every MyItem has an associated timestamp and some property p, which is not guaranteed to be unique.
struct MyItem {
enum MyProperty { ONE, TWO, THREE };
double timestamp; //unique
MyProperty p; //non-unique
bool operator<(const MyItem& other) const {
return p < other.p;
}
};
Supposing I added my 100 objects in chronological order, if I were to run qStableSort on that container (thereby sorting by p), do I have a guarantee that for a given value of p that they are still in chronological order?
https://en.wikipedia.org/wiki/Category:Stable_sorts
Stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a sorting algorithm is stable if whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list.
Therefore the keyword stable in qStableSort is referring exactly to what you're asking for.
Note however, that qStableSort is obsoleted in Qt 5.5
Use std::stable_sort instead.
Sorts the items in range [begin, end) in ascending order using a stable sorting algorithm.
If neither of the two items is "less than" the other, the items are taken to be equal. The item that appeared before the other in the original container will still appear first after the sort. This property is often useful when sorting user-visible data.
As per the Qt documentation, you should prefer to use std::stable_sort
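A small sketch of the std::stable_sort replacement follows. It uses std::vector instead of QList to stay independent of Qt, and the helpers sortByProperty and chronologicalWithinGroups are hypothetical names added for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct MyItem {
    enum MyProperty { ONE, TWO, THREE };
    double timestamp;  // unique
    MyProperty p;      // non-unique
    bool operator<(const MyItem& other) const { return p < other.p; }
};

// True if items that share the same p appear in ascending timestamp order.
bool chronologicalWithinGroups(const std::vector<MyItem>& items) {
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i - 1].p == items[i].p &&
            items[i - 1].timestamp > items[i].timestamp) {
            return false;
        }
    }
    return true;
}

// Sorts by p only; equal-p items keep the relative order they had on input.
std::vector<MyItem> sortByProperty(std::vector<MyItem> items) {
    std::stable_sort(items.begin(), items.end());
    assert(std::is_sorted(items.begin(), items.end()));
    return items;
}
```

If the input was added in chronological order, chronologicalWithinGroups(sortByProperty(items)) holds; with plain std::sort that is not guaranteed.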
This is a data structures question, but also regarding implementation. A set is typically implemented using a BST, but my professor wants us to know how to implement some data structures when only given limited options. So he wants us to be able to understand how to create a set using only an array.
Using a standard (unsorted) array I understand the implementation/complexity...
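void add(Student[] arr, Student findstu)
{
int i = 0;
boolean found = false;
// Walk the occupied slots; the first null slot marks the end of the set.
while (i < arr.length && arr[i] != null)
{
if (arr[i].equals(findstu))
{
found = true;
}
i++;
}
// i now indexes the first free slot, if there is one; append only new elements.
if (!found && i < arr.length)
{
arr[i] = findstu;
}
}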
The add/remove/contains are all pretty much the same code; each has the first while loop, which makes them O(n).
But if we used a sorted array, why would contains be O(lgn) and add/remove O(n)?
Searching would be O(log N) because the array is sorted, so you can apply binary search, which has O(log N) complexity.
Insertion and erasure would be O(N) (i.e., linear time): binary search finds the position in O(log N), but every time you insert or erase an element in the sorted array you have to shift all the elements after that position by one, which takes linear time.
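The difference between the two costs can be seen in a short sketch. It is written in C++ (not the Java of the question), uses int keys instead of Student, and uses a std::vector as the backing array; SortedArraySet is a hypothetical class name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class SortedArraySet {
public:
    // O(log n): binary search over the sorted array.
    bool contains(int x) const {
        return std::binary_search(data_.begin(), data_.end(), x);
    }

    // O(n): finding the slot is O(log n), but insert() shifts the tail.
    bool add(int x) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), x);
        if (pos != data_.end() && *pos == x) return false;  // already present
        data_.insert(pos, x);
        assert(std::is_sorted(data_.begin(), data_.end()));  // invariant holds
        return true;
    }

    // O(n): erase() shifts every element after the removed one.
    bool remove(int x) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), x);
        if (pos == data_.end() || *pos != x) return false;  // not present
        data_.erase(pos);
        return true;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<int> data_;  // always kept in ascending order
};
```

Only the shifting inside insert() and erase() is linear; the lookup that precedes it is the same binary search that contains() uses.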
In a "self-avoiding random walk" situation, I have a 2-dimensional vector with a configuration of step-coordinates. I want to be able to check if a certain site has been occupied, but the problem is that the axis can be zero, so checking if the fabs() of the coordinate is true (or that it has a value), won't work. Therefore, I've considered looping through the steps and checking if my coordinate equals another coordinate on all axis, and if it does, stepping back and trying again (a so-called depth-first approach).
Is there a more efficient way to do this? I've seen someone use a boolean array with all possible coordinates, like so:
bool occupied[nMax][nMax]; // true if lattice site is occupied
for (int y = -rMax; y <= rMax; y++)
for (int x = -rMax; x <= rMax; x++)
occupied[index(y)][index(x)] = false;
But, in my program the number of dimensions is unknown, so would an approach such as:
typedef std::vector<std::vector<long int>> WalkVec;
WalkVec walk(1, std::vector<long int>(dof,0));
siteVisited = false; counter = 0;
while (counter < (walkVec.back().size()-1))
{
tdof = 1;
while (tdof <= dimensions)
{
if (walkHist.back().at(tdof-1) == walkHist.at(counter).at(tdof-1) || walkHist.back().at(tdof-1) == 0)
{
siteVisited = true;
}
else
{
siteVisited = false;
break;
}
tdof++;
}
work, where dof is the number of dimensions? (The check for zero checks if the position is the origin. Three zero coordinates, or three visited coordinates on the same step, is the only way to make it true.)
Is there a more efficient way of doing it?
You can do this check in O(log n) or O(1) time using STL's set or unordered_set respectively. The unordered_set container requires you to write a custom hash function for your coordinates, while the set container only needs you to provide a comparison function. The set implementation is particularly easy, and logarithmic time should be fast enough:
#include <iostream>
#include <set>
#include <vector>
#include <cassert>
class Position {
public:
Position(const std::vector<long int> &c)
: m_coords(c) { }
size_t dim() const { return m_coords.size(); }
bool operator <(const Position &b) const {
assert(b.dim() == dim());
for (size_t i = 0; i < dim(); ++i) {
if (m_coords[i] < b.m_coords[i])
return true;
if (m_coords[i] > b.m_coords[i])
return false;
}
return false;
}
private:
std::vector<long int> m_coords;
};
int main(int argc, const char *argv[])
{
std::set<Position> visited;
std::vector<long int> coords(3, 0);
visited.insert(Position(coords));
std::cout << "x, y, z: ";
// Stop cleanly at end of input instead of looping forever on a failed read.
while (std::cin >> coords[0] >> coords[1] >> coords[2]) {
Position candidate(coords);
if (visited.find(candidate) != visited.end())
std::cout << "Already visited!" << std::endl;
else
visited.insert(candidate);
std::cout << "x, y, z: ";
}
return 0;
}
Of course, as iavr mentions, any of these approaches will require O(n) storage.
Edit: The basic idea here is very simple. The goal is to store all the visited locations in a way that allows you to quickly check if a particular location has been visited. Your solution had to scan through all the visited locations to do this check, which makes it O(n), where n is the number of visited locations. To do this faster, you need a way to rule out most of the visited locations so you don't have to compare against them at all.
You can understand my set-based solution by thinking of a binary search on a sorted array. First you come up with a way to compare (sort) the D-dimensional locations. That's what the Position class' < operator is doing. As iavr pointed out in the comments, this is basically just a lexicographic comparison. Then, when all the visited locations are sorted in this order, you can run a binary search to check if the candidate point has been visited: you recursively check if the candidate would be found in the upper or lower half of the list, eliminating half of the remaining list from comparison at each step. This halving of the search domain at each step gives you logarithmic complexity, O(log n).
The STL set container is just a nice data structure that keeps your elements in sorted order as you insert and remove them, ensuring insertion, removal, and queries are all fast. In case you're curious, the STL implementation I use uses a red-black tree to implement this data structure, but from your perspective this is irrelevant; all that matters is that, once you give it a way to compare elements (the < operator), inserting elements into the collection (set::insert) and asking if an element is in the collection (set::find) are O(log n). I check against the origin by just adding it to the visited set--no reason to treat it specially.
The unordered_set is a hash table, an asymptotically more efficient data structure (O(1)), but a harder one to use because you must write a good hash function. Also, for your application, going from O(n) to O(log n) should be plenty good enough.
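For completeness, a hash-based variant might look like the sketch below. Coords, CoordsHash, VisitedSet and visit are hypothetical names, and the mixing step follows the widely used hash-combining recipe (the same shape as Boost's hash_combine); it is an illustration under those assumptions, not a tuned hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

using Coords = std::vector<long>;

// Folds the hash of every coordinate into one value.
struct CoordsHash {
    std::size_t operator()(const Coords& c) const {
        std::size_t seed = c.size();
        for (long v : c) {
            seed ^= std::hash<long>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// Equality comes from std::vector's own operator==.
using VisitedSet = std::unordered_set<Coords, CoordsHash>;

// Marks the site as visited; returns false if it had been visited before.
bool visit(VisitedSet& visited, const Coords& site) {
    assert(!site.empty());  // assumption: at least one dimension
    return visited.insert(site).second;
}
```

Because insert() reports whether the element was new, one call both checks and records the site, in O(1) on average.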
Your question concerns the algorithm rather the use of the (C++) language, so here is a generic answer.
What you need is a data structure to store a set (of point coordinates) with an efficient operation to query whether a new point is in the set or not.
Explicitly storing the set as a boolean array provides constant-time query (fastest), but at space that is exponential in the number of dimensions.
An exhaustive search (your second option) provides queries that are linear in the set size (walk length), at a space that is also linear in the set size and independent of dimensionality.
The other two common options are tree structures and hash tables, e.g. available as std::set (typically using a red-black tree) and std::unordered_set (the latter since C++11). A tree structure typically has logarithmic-time query, while a hash table query can be constant-time in practice, almost bringing you back to the complexity of a boolean array. But in both cases the space needed is again linear in the set size and independent of dimensionality.
In Smalltalk, you can create a sortedCollection, which is to say that you can add an element and it would insert it into the correct location.
Is there anything like this in C++? Or even better is there anything like a sortedQueue, such that when you add an element, it would sort it into a queue like structure that you can just pop the first element off of?
I looked into set, which is what I need in terms of sorting, but it is an unordered collection. I am looking for as small a run time as possible.
There are four sorted containers in the C++ standard library:
std::set - A sorted sequence of unique values.
std::map - A sorted sequence of unique key/value pairs.
std::multiset - A sorted sequence of values (possible repeats).
std::multimap - A sorted sequence of key/value pairs (possible repeats).
If you just want a sorted queue, then what you are looking for is std::priority_queue, which is a container adaptor rather than a stand-alone container.
#include <cassert>
#include <queue>
int main()
{
std::priority_queue<int> q;
q.push(2);
q.push(3);
q.push(1);
assert(q.top() == 3); q.pop();
assert(q.top() == 2); q.pop();
assert(q.top() == 1); q.pop();
return 0;
}
If you want to store your own types in a priority_queue then you need to define operator< for your class.
class Person
{
public:
Person(int age) : m_age(age) {}
bool operator<(const Person& other) const
{
return m_age < other.m_age;
}
private:
int m_age;
};
Creating a priority_queue of Persons would then give you a queue with the oldest people at the front.
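A short sketch of that behaviour: the age() accessor and the helper agesInPopOrder are hypothetical additions, needed only so the result can be inspected from outside the class.

```cpp
#include <cassert>
#include <queue>
#include <vector>

class Person {
public:
    explicit Person(int age) : m_age(age) { assert(age >= 0); }
    int age() const { return m_age; }  // hypothetical accessor for the demo
    bool operator<(const Person& other) const { return m_age < other.m_age; }

private:
    int m_age;
};

// Empties a copy of the queue and records the ages in the order they come out.
std::vector<int> agesInPopOrder(std::priority_queue<Person> q) {
    std::vector<int> ages;
    while (!q.empty()) {
        ages.push_back(q.top().age());  // top() is the "largest" per operator<
        q.pop();
    }
    return ages;
}
```

Pushing people aged 30, 65 and 12 yields the ages 65, 30, 12 in pop order. Reversing the comparison inside operator< would put the youngest person first instead.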
The STL container choice flowchart (from this question):
You seem to be looking for the std::priority_queue, which is located in the <queue> header file. With push(), you can insert an element into the priority queue; with top(), you will get the currently largest element in the queue (or the smallest one, depending on how you implement operator<); and with pop(), you will remove the largest/smallest element.
As far as I know, it's implemented with a heap, which makes the time complexity of each push and pop operation O(lg n). Simply looking at the top element is done in O(1).
std::map for sorted container
std::queue for queue.
std::priority_queue for sorted queue
std::set is an ordered collection; iterating over it will give you the elements in order (either as defined by the < operator or a custom predicate). Finding and removing the first element are O(1).
Alternatively you could use std::priority_queue, which is basically a heap and allows efficient insertion and removal of the top item (the greatest by default, or the least with a reversed comparator).
In fact it's harder to find unordered (hashed) containers - they weren't part of the original standard, although they were widely available in non-standard form.
Of course you may find that simply holding your items in a sorted vector is faster, even if it is theoretically slower, if the number of items is not significantly large.