Multiple conditions on a sort (C++)

I'm trying to do this LeetCode problem using only the C++ sort function, but I can't seem to find the correct way to sort elements depending on multiple conditions.
The problem wants you to return the destination city, which doesn't have any outgoing path to another city. In my sorting, I want this city to be the last element of my vector. Right now, this is the code that I have:
class Solution {
public:
    string destCity(vector<vector<string>>& paths) {
        sort(paths.begin(), paths.end(),
             [](vector<string> a, vector<string> b) { return a[1] == b[0]; });
        return paths[paths.size() - 1][1];
    }
};
The input is a vector of paths ([cityA -> cityB], [cityM -> cityN], ...).
You can see that my sort only ensures that two paths will be consecutive if the outgoing city of one path equals the ingoing city of the other ([cityA -> cityB], [cityB -> cityC], ...). The problem is that my sort doesn't handle the case of the city (the destination city) that doesn't have any outgoing path to another city. I would like to place this particular city as the last element of my vector. Can I do this by adding some conditions to my sort function? If yes, how?
Thank you.

You cannot do it with std::sort. The sort function requires a total order over the elements, and you only have a partial one. Once you put all your pairs in, the sort might decide to compare two unrelated elements; it won't try to find a way to compare them in the "right" order.
You need to find a new algorithm. May I suggest:
std::map<string, string> destinations;
And simply following the chain of destinations until you reach the end?
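For illustration, a minimal sketch of that idea (my code, relying on the problem's guarantee that the paths form a single line):

#include <map>
#include <string>
#include <vector>

std::string destCity(const std::vector<std::vector<std::string>>& paths)
{
    // Map each starting city to the city its path leads to.
    std::map<std::string, std::string> destinations;
    for (const auto& path : paths)
        destinations[path[0]] = path[1];

    // Start anywhere and follow the chain; the destination city is the
    // first city with no outgoing path.
    std::string city = paths[0][0];
    for (auto it = destinations.find(city); it != destinations.end();
         it = destinations.find(city))
        city = it->second;
    return city;
}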

Sorting is O(N log N); even though you might find ways to sort, we'd ideally want to avoid sorting here and reduce the time complexity to O(N) for this problem.
This will pass:
#include <string>
#include <unordered_set>
#include <vector>

struct Solution {
    static std::string destCity(
        const std::vector<std::vector<std::string>>& paths
    ) {
        // Record every city that has an outgoing path.
        std::unordered_set<std::string> starts_map;
        for (const auto& path : paths) {
            starts_map.insert(path[0]);
        }
        // The destination city is the only arrival city that never
        // appears as a departure city.
        for (const auto& path : paths) {
            if (!starts_map.count(path[1])) {
                return path[1];
            }
        }
        return "";
    }
};
References
For additional details, see the LeetCode Discussion Board for this problem. There are plenty of accepted solutions in a variety of languages, with explanations, efficient algorithms, and asymptotic time/space complexity analysis.

Related

Vector Sort Algorithm, sort only elements bigger than 0

I have to sort a vector of structs. Let's say the struct has two members:
struct game
{
    string name;
    int rating;
};
So I have created a std::vector<game> games and simply sort them by rating:
std::sort(games.begin(), games.end(), [](game& info1, game& info2)
{
    return info1.rating > info2.rating;
});
Everything is alright so far.
The problem is that if all games have rating value 0, they get mixed up. Simply put, I have to sort only the elements with a rating bigger than zero. Let me give you an example:
All games are pushed into the vector by name in alphabetical order, with rating 0; when a sort is triggered, the alphabetical order gets violated.
Example before sort:
"A_Game", "B_Game", "C_Game", "E_Game", "G_Game", etc. (continue with all next letters)
after sort (all games have rating 0):
"G_Game", "S_Game", "P_Game", "M_Game", "L_Game", "I_Game", etc.
I need to sort only those games that have a rating bigger than 0.
Thanks in advance.
You can use std::stable_sort to keep the relative order of elements that are not distinguished by the sorting criteria.
std::stable_sort(games.begin(), games.end(), [](game& info1, game& info2)
{
    return info1.rating > info2.rating;
});
std::sort() is not a stable sorting algorithm, i.e., elements with equivalent keys may not preserve the original order between them after being sorted.
You can use std::stable_sort() instead of std::sort():
std::stable_sort(games.begin(), games.end(), [](game& info1, game& info2)
{
    return info1.rating > info2.rating;
});
As its name already suggests, std::stable_sort() implements a stable sorting algorithm.
You can use std::stable_sort().
Note that you cannot keep using std::sort() and simply make the comparator return true for games with the same rating (e.g. by changing the condition to return !(info1.rating < info2.rating)): std::sort requires a strict weak ordering, and a comparator that returns true for two equivalent elements causes undefined behavior, so it would not preserve the relative order anyway.
You can use stable_sort instead of sort. This would be the best option for this question.
You can also modify the sort so that when two games have an equal rating, it compares the two names alphabetically (or applies any other condition that might come up in the future). It might look like this:
std::sort(games.begin(), games.end(), [](game& info1, game& info2)
{
    if (info1.rating == info2.rating)
        return info1.name < info2.name; // note: name.compare() returns an int, not a bool
    return info1.rating > info2.rating;
});
std::sort indeed doesn't guarantee any ordering for elements that compare equal. std::stable_sort guarantees that the original order is kept for elements that compare equal. (See the other answers.)
When in doubt about the original order, I like to explicitly sort with all of the criteria:
std::sort(games.begin(), games.end(), [](game const & info1, game const & info2)
{
    if (info1.rating != info2.rating)
        return info1.rating > info2.rating;
    return info1.name < info2.name;
});
In the above, I prefer to use the following pattern:
if member1 differs
    return comparison of member1
if member2 differs
    return comparison of member2
return comparison of the last member (or compare pointers)
This pattern is easily recognizable and easily extendable when you add extra members.
Ideally, when you want to use this sorting in other places, you make it a function with an unambiguous name. (Don't use operator<, as this causes confusion, since the game titles could just as well be used as the logical way of sorting.)
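For example, a minimal sketch of such a named comparison (the name byRatingThenName is my invention, not from the original post):

// Reusable, unambiguously named comparison: rating descending, then name.
bool byRatingThenName(game const & info1, game const & info2)
{
    if (info1.rating != info2.rating)
        return info1.rating > info2.rating;
    return info1.name < info2.name;
}

// Usage:
// std::sort(games.begin(), games.end(), byRatingThenName);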

Constraining remove_if on only part of a C++ list

I have a C++11 list of complex elements that are defined by a structure node_info. A node_info element, in particular, contains a field time and is inserted into the list in an ordered fashion according to its time field value. That is, the list contains various node_info elements that are time ordered. I want to remove from this list all the nodes that verify some specific condition specified by coincidence_detect, which I am currently implementing as a predicate for a remove_if operation.
Since my list can be very large (on the order of 100k -- 10M elements), and since, given the way I build the list, this coincidence_detect condition is only verified by a few (thousands of) elements close to the "lower" end of the list -- the one containing elements whose time value is less than some t_xv -- I thought that, to improve the speed of my code, I don't need to run remove_if through the whole list, but can restrict it to the elements whose time < t_xv.
remove_if(), however, does not seem to allow the user to control up to which point to iterate through the list.
My current code.
The list elements:
struct node_info {
    const char *type = "x"; // note: a string literal needs const char*
    int ID = -1;
    double time = 0.0;
    bool spk = true;
};
The predicate/condition for remove_if:
// Remove all events occurring at t_event
class coincident_events {
double t_event; // Event time
bool spk; // Spike condition
public:
coincident_events(double time,bool spk_) : t_event(time), spk(spk_){}
bool operator()(node_info node_event){
return ((node_event.time==t_event)&&(node_event.spk==spk)&&(strcmp(node_event.type,"x")!=0));
}
};
The actual removing from the list:
void remove_from_list(double t_event, bool spk_) {
    // Remove all events occurring at t_event
    coincident_events coincidence(t_event, spk_);
    event_heap.remove_if(coincidence);
}
Pseudo main:
int main() {
    // My list
    std::list<node_info> event_heap;
    ...
    // Populate the list with elements with random time values, kept in ascending time order
    ...
    remove_from_list(0.5, true);
    return 0;
}
It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for cycle as suggested for example in this post?
It seems that remove_if may not be ideal in this context. Should I consider instead instantiating an iterator and run an explicit for loop?
Yes and yes. Don't fight to use code that is preventing you from reaching your goals. Keep it simple. Loops are nothing to be ashamed of in C++.
First thing: comparing doubles for exact equality is not a good idea, as you are subject to floating-point errors.
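For example, a common workaround is to compare within a tolerance (the tolerance value below is an arbitrary assumption):

#include <cmath>

// Treat two event times as equal if they differ by less than eps.
bool same_time(double a, double b, double eps = 1e-9)
{
    return std::fabs(a - b) < eps;
}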
You could always find the point up to which you want to search using lower_bound (I assume your list is properly sorted by time).
Then you could use the free-function algorithm std::remove_if followed by erase() to remove the items between the iterator returned by remove_if and the one returned by lower_bound.
However, doing that you would make multiple passes over the data and you would move element values around, so it would affect performance.
See also: https://en.cppreference.com/w/cpp/algorithm/remove
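A minimal sketch of that approach (my code, assuming the list is sorted by time and reusing the coincident_events predicate from the question):

#include <algorithm>
#include <list>

void remove_coincident_before(std::list<node_info>& event_heap,
                              double t_xv, double t_event, bool spk_)
{
    coincident_events coincidence(t_event, spk_);

    // First element with time >= t_xv: a linear traversal on a list,
    // but only O(log N) comparisons.
    auto last = std::lower_bound(
        event_heap.begin(), event_heap.end(), t_xv,
        [](const node_info& n, double t) { return n.time < t; });

    // Classic erase-remove idiom, restricted to [begin, last): shift the
    // elements to keep toward the front, then erase the leftover tail.
    event_heap.erase(std::remove_if(event_heap.begin(), last, coincidence),
                     last);
}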
So in the end, it is probably preferable to do your own loop over the whole container and, for each element, check whether it needs to be removed; if not, check whether you should break out of the loop:
for (auto it = event_heap.begin(); it != event_heap.end(); )
{
    if (coincidence(*it))
    {
        auto itErase = it;
        ++it;
        event_heap.erase(itErase);
    }
    else if (it->time < t_xv)
    {
        ++it;
    }
    else
    {
        break;
    }
}
As you can see, the code can easily become quite long for something that should be simple. Thus, if you need this kind of algorithm often, consider writing your own generic algorithm.
Also, in practice you might not need a complete search for the endpoint with the first solution if you process your data in increasing time order.
Finally, you might consider using a std::set instead. It could lead to simpler and more optimized code.
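A minimal sketch of the std::set idea (my code: a std::multiset ordered by time, since several events can share a time value; the helper names are my own):

#include <iterator>
#include <set>

// Order events by time so equal_range can locate all events at t_event.
struct by_time {
    bool operator()(const node_info& a, const node_info& b) const
    { return a.time < b.time; }
};

void remove_coincident(std::multiset<node_info, by_time>& events,
                       double t_event, bool spk_)
{
    coincident_events coincidence(t_event, spk_);
    node_info probe;    // only the time field matters for the lookup
    probe.time = t_event;

    // Locate the events at t_event in O(log N), then erase the matching ones.
    auto range = events.equal_range(probe);
    for (auto it = range.first; it != range.second; )
        it = coincidence(*it) ? events.erase(it) : std::next(it);
}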
Thanks. I used your comments and came up with this solution, which seemingly increases speed by a factor of 5 to 10:
void remove_from_list(double t_event, bool spk_) {
    coincident_events coincidence(t_event, spk_);
    for (auto it = event_heap.begin(); it != event_heap.end(); ) {
        if (t_event >= it->time) {
            if (coincidence(*it)) {
                it = event_heap.erase(it);
            }
            else {
                ++it;
            }
        }
        else {
            break;
        }
    }
}
The idea of using the iterator returned by erase (which already points to the next element, so no extra ++it is needed) was suggested by this other post. Note that in this implementation I am actually scanning all list elements up to the t_event value (meaning t_event plays the role of t_xv).

Sort relative paths

My problem is pretty simple, but I can't think of an elegant solution.
In my application a bunch of relative paths are collected from user input (they don't really exist in the filesystem).
As an example:
somefile.ext
z/file.ext
A/B/file2.ext
A/B/z/file.ext
file1.ext
A/B/file.ext
Those are gathered in a QList (which contains pointers to instances of a class that has the paths as a QString member), but the container doesn't really matter to me.
The desired result is having the list in an ordered form, directories before files (as we are used to from most file managers), basically the way tree would list a real filesystem:
A/B/z/file.ext
A/B/file.ext
A/B/file2.ext
z/file.ext
file1.ext
somefile.ext
Obviously a normal string-based sort would not work, but what is the most elegant / easiest way to sort a list like this?
I would write a custom comparator function for a builtin sort. Something like this may work:
auto cmp = [](const string& lhs, const string& rhs) {
    int len = min(lhs.size(), rhs.size());
    for (int i = 0; i < len; i++) {
        if (lhs[i] == rhs[i]) continue;
        // check if there is a '/' starting from position i in both paths
        // put the directory first (if only one is a directory)
        // or return lhs[i] < rhs[i]
    }
    // also handle the case where one path is a prefix of the other
};
It should be fairly easy to finish this code, and then:
sort(begin(paths), end(paths), cmp);
Note that I'm using C++11.
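For completeness, here is one way the comparator could be finished (my sketch, not the answerer's final code): skip the common prefix, put the side whose remainder still contains a '/' (the directory branch) first, and otherwise fall back to character or length comparison. On the example input above, this reproduces the desired ordering.

#include <algorithm>
#include <string>

bool path_less(const std::string& lhs, const std::string& rhs)
{
    std::size_t len = std::min(lhs.size(), rhs.size());
    std::size_t i = 0;
    while (i < len && lhs[i] == rhs[i]) ++i;    // skip the common prefix

    bool l_dir = lhs.find('/', i) != std::string::npos;
    bool r_dir = rhs.find('/', i) != std::string::npos;
    if (l_dir != r_dir)
        return l_dir;                   // the directory branch comes first
    if (i == len)
        return lhs.size() < rhs.size(); // one path is a prefix of the other
    return lhs[i] < rhs[i];             // otherwise plain character order
}

// Usage: sort(begin(paths), end(paths), path_less);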
Take a step back, and take a moment to summarize your desired sorting criteria, in plain English:
Directories come first, files come second.
Directories are sorted alphabetically, and files are sorted alphabetically.
Now, once your logic is reduced to this kind of simplistic, straightforward definition, simply translate it directly into C++ code.
First, translate each string into a single value, 0 or 1, that represents whether the string has a pathname component, or not.
Sort on that. Then if both strings' values are equivalent, sort on the strings themselves, as usual. Simple:
std::sort(container.begin(),
          container.end(),
          [](const QString &a, const QString &b)
          {
              // Compute whether a or b is in a subdirectory, representing
              // each as:
              //
              //   0: directory
              //   1: file
              int a_hier = a.contains('/') ? 0 : 1;
              int b_hier = b.contains('/') ? 0 : 1;

              if (a_hier != b_hier)
                  return a_hier < b_hier;

              return a < b;
          });
This is your basic, initial approach. If necessary, it can be tweaked to sort the pathnames with directories slightly differently (since the / component in the pathname participates in the relative sorting order, which you may or may not want; this was unclear in your question).

How to find an object in std::vector?

Suppose I have a class called Bank, with attributes
class Bank {
    string _name;
};
Now I declare a vector of Bank.
vector<Bank> list;
Given a string, how do I search the vector list for the particular Bank object that has that string as its name?
I'm trying to avoid writing loops and would like to know if there is an STL function that can do this.
You can use good old linear search:
auto it = std::find_if(list.begin(), list.end(), [&](const Bank& bank)
{
    return bank._name == the_name_you_are_looking_for;
});
If there is no such bank in the list, the end iterator will be returned:
if (it == list.end())
{
    // no bank in the list with the name you were looking for :-(
}
else
{
    // *it is the first bank in the list with the name you were looking for :-)
}
If your compiler is from the stone ages, it won't understand lambdas and auto. Untested C++98 code:
struct NameFinder
{
    const std::string& captured_name;
    bool operator()(const Bank& bank) const
    {
        return bank._name == captured_name;
    }
};
NameFinder finder = {the_name_you_are_looking_for};
std::vector<Bank>::iterator it = std::find_if(list.begin(), list.end(), finder);
As per popular request, just a side note to warn potential beginners attracted by this question in the future:
std::find_if performs a linear search, because the underlying container (a vector in this case) is not designed with search efficiency in mind.
Using a vector for data where search time is critical will possibly work, given the computing power available in your average PC, but it could quickly become slow as the volume of data grows.
If you need to search quickly, you have other containers (std::set, std::map and a few variants) that allow retrieval in logarithmic time.
You can even use hash tables for (near) instant access in containers like unordered_set and unordered_map, but the cost of other operations grows accordingly. It's all a matter of balance.
You can also sort the vector first and then perform a dichotomic (binary) search with standard algorithms: binary_search if you have a strict order, or lower_bound, upper_bound and equal_range if you can only define a partial order on your elements.
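For illustration, a minimal sketch of that sort-then-binary-search approach (it assumes Bank exposes its name through a hypothetical name() accessor, which is not part of the original class):

#include <algorithm>
#include <string>
#include <vector>

bool contains_bank(std::vector<Bank>& banks, const std::string& key)
{
    // Sort once by name: O(N log N).
    std::sort(banks.begin(), banks.end(),
              [](const Bank& a, const Bank& b) { return a.name() < b.name(); });

    // Each subsequent lookup is only O(log N).
    auto it = std::lower_bound(banks.begin(), banks.end(), key,
                               [](const Bank& b, const std::string& k)
                               { return b.name() < k; });
    return it != banks.end() && it->name() == key;
}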
std::find and std::find_if will allow you to search through the vector in a variety of ways.

Algorithm: A Better Way To Calculate Frequencies of a list of words

This question is actually quite simple, yet I would like to hear some ideas before jumping into coding. Given a file with a word on each line, calculate the n most frequent words.
The first, and unfortunately only, thing that pops into my mind is to use a std::map. I know fellow C++'ers will say that unordered_map would be much more reasonable.
I would like to know if anything could be added on the algorithm side, or if this is basically just a 'whoever picks the best data structure wins' type of question. I've searched the internet and read that a hash table and a priority queue might provide an algorithm with O(n) running time; however, I assume it would be too complex to implement.
Any ideas?
The best data structure to use for this task is a Trie:
http://en.wikipedia.org/wiki/Trie
It will outperform a hash table for counting strings.
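For illustration, a minimal counting-trie sketch (my code, not the answerer's): each node keeps its children keyed by character, plus a count of the words ending at it.

#include <map>
#include <memory>
#include <string>

struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    unsigned count = 0; // how many inserted words end at this node
};

void insert(TrieNode& root, const std::string& word)
{
    TrieNode* node = &root;
    for (char c : word) {
        auto& child = node->children[c];
        if (!child)
            child = std::make_unique<TrieNode>(); // create the path lazily
        node = child.get();
    }
    ++node->count; // one more occurrence of this word
}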
There are many different approaches to this question. It would finally depend on the scenario and other factors, such as the size of the file (if the file has a billion lines, a HashMap would not be an efficient way to do it). Here are some things you can do depending on your problem:
If you know that the number of unique words is very limited, you can use a TreeMap or, in your case, std::map.
If the number of words is very large, then you can build a trie and keep the counts of the various words in another data structure. This could be a heap (min/max depends on what you want to do) of size n, so you don't need to store all the words, just the necessary ones; see the sketch after this list.
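A sketch of that counting-plus-heap idea (my code, C++17, with a hash map standing in for the trie for brevity): count every word, then keep only the n most frequent in a size-n min-heap.

#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

std::vector<std::pair<std::string, unsigned>>
topN(const std::vector<std::string>& words, std::size_t n)
{
    // Count occurrences: O(total words) expected.
    std::unordered_map<std::string, unsigned> counts;
    for (const auto& w : words)
        ++counts[w];

    // Min-heap on the count: the least frequent candidate sits on top
    // and is evicted first, so the heap never exceeds n entries.
    using Entry = std::pair<unsigned, std::string>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const auto& [word, count] : counts) {
        heap.emplace(count, word);
        if (heap.size() > n)
            heap.pop();
    }

    std::vector<std::pair<std::string, unsigned>> result;
    for (; !heap.empty(); heap.pop())
        result.emplace_back(heap.top().second, heap.top().first);
    return result; // least frequent of the top n comes first
}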
I would not start with std::map (or unordered_map) if I had much choice (though I don't know what other constraints may apply).
You have two data items here, and you use one as the key part of the time, but the other as the key another part of the time. For that, you probably want something like a Boost Bimap or possibly Boost MultiIndex.
Here's the general idea using Bimap:
#include <boost/bimap.hpp>
#include <boost/bimap/list_of.hpp>
#include <iostream>

#define elements(array) ((sizeof(array)/sizeof(array[0])))

class uint_proxy {
    unsigned value;
public:
    uint_proxy() : value(0) {}
    uint_proxy& operator++() { ++value; return *this; }
    unsigned operator++(int) { return value++; }
    operator unsigned() const { return value; }
};

int main() {
    int b[] = {2,4,3,5,2,6,6,3,6,4};
    boost::bimap<int, boost::bimaps::list_of<uint_proxy> > a;

    // walk through the array, counting how often each number occurs:
    for (size_t i = 0; i < elements(b); i++)
        ++a.left[b[i]];

    // print out the most frequent:
    std::cout << a.right.rbegin()->second;
}
For the moment, I've only printed out the most frequent number, but iterating N times to print out the N most frequent is pretty trivial.
If you are just interested in the top N most frequent words, and you don't need the result to be exact, then there is a very clever structure you can use. I heard of this by way of Udi Manber; it works as follows:
You create an array of N elements, where each element tracks a value and a count, and you also keep a counter that indexes into this array. Additionally, you have a map from value to index into that array.
Every time you update your structure with a value (like a word from a stream of text), you first check your map to see if that value is already in your array; if it is, you increment the count for that value. If it is not, then you decrement the count of whatever element your counter is pointing at, and then increment the counter.
This sounds simple, and nothing about the algorithm makes it seem like it will yield anything useful, but for typical real data it tends to do very well. Normally, if you wish to track the top N things, you might want to make this structure with a capacity of 10*N, since there will be a lot of empty values in it. Using the King James Bible as input, here is what this structure lists as the most frequent words (in no particular order):
0 : in
1 : And
2 : shall
3 : of
4 : that
5 : to
6 : he
7 : and
8 : the
9 : I
And here are the top ten most frequent words (in order):
0 : the , 62600
1 : and , 37820
2 : of , 34513
3 : to , 13497
4 : And , 12703
5 : in , 12216
6 : that , 11699
7 : he , 9447
8 : shall , 9335
9 : unto , 8912
You see that it got 9 of the top 10 words correct, and it did so using space for only 50 elements. Depending on your use case the savings on space here may be very useful. It is also very fast.
Here is the implementation of topN that I used, written in Go:
type Event string

type TopN struct {
    events  []Event
    counts  []int
    current int
    mapped  map[Event]int
}

func makeTopN(N int) *TopN {
    return &TopN{
        counts:  make([]int, N),
        events:  make([]Event, N),
        current: 0,
        mapped:  make(map[Event]int, N),
    }
}

func (t *TopN) RegisterEvent(e Event) {
    if index, ok := t.mapped[e]; ok {
        t.counts[index]++
    } else {
        if t.counts[t.current] == 0 {
            t.counts[t.current] = 1
            t.events[t.current] = e
            t.mapped[e] = t.current
        } else {
            t.counts[t.current]--
            if t.counts[t.current] == 0 {
                delete(t.mapped, t.events[t.current])
            }
        }
    }
    t.current = (t.current + 1) % len(t.counts)
}
Given a file with a word on each line, calculate the n most frequent words.
...
I've searched the internet and read that a hash table and a priority queue might provide an algorithm with O(n)
If you meant that the two *n*s are the same, then no, this is not possible. However, if you just meant time linear in the size of the input file, then a trivial implementation with a hash table will do what you want.
There might be probabilistic approximate algorithms with sublinear memory.