Sort relative paths - c++

My problem is pretty simple, but I can't think of an elegant solution.
In my application a bunch of relative paths are collected from user input (they don't really exist in the filesystem).
As an example:
somefile.ext
z/file.ext
A/B/file2.ext
A/B/z/file.ext
file1.ext
A/B/file.ext
Those are gathered in a QList (which contains pointers to instances of a class that have the paths as a QString member), but the container doesn't really matter to me.
The desired result is the list in sorted order, with directories before files (as we are used to from most file managers), basically the way you would tree a real filesystem.
A/B/z/file.ext
A/B/file.ext
A/B/file2.ext
z/file.ext
file1.ext
somefile.ext
Obviously a normal string based sorting would not work, but what is the most elegant / easiest way to sort a list like this?

I would write a custom comparator function for a builtin sort. Something like this may work:
auto cmp = [](const string& lhs, const string& rhs) {
    size_t len = min(lhs.size(), rhs.size());
    for (size_t i = 0; i < len; i++) {
        if (lhs[i] == rhs[i]) continue;
        // check if there is a '/' starting from position i in both paths
        // put directory first (if only one is a directory)
        // or return lhs[i] < rhs[i]
    }
    // one path is a prefix of the other: the shorter one goes first
    return lhs.size() < rhs.size();
};
It should be fairly easy to finish this code,
and then
sort(begin(paths), end(paths), cmp);
Note that I'm using c++11
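One way the comparator idea above could be completed is sketched below. It is not the answerer's code: it compares whole path components instead of single characters, and the names pathLess and sortPaths are made up for illustration. It assumes '/' is the only separator, that a component followed by '/' is a directory, and that names are compared byte-wise (so case-sensitively).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if path a should be listed before path b.
// Paths are compared one component at a time; at the first component that
// differs, a directory (a component followed by '/') sorts before a file.
bool pathLess(const std::string& a, const std::string& b) {
    std::size_t pos = 0;  // start of the current component in both paths
    while (true) {
        const std::size_t endA = a.find('/', pos);
        const std::size_t endB = b.find('/', pos);
        const bool aIsDir = endA != std::string::npos;
        const bool bIsDir = endB != std::string::npos;
        if (aIsDir != bIsDir) return aIsDir;  // directory before file
        const std::string compA = a.substr(pos, aIsDir ? endA - pos : std::string::npos);
        const std::string compB = b.substr(pos, bIsDir ? endB - pos : std::string::npos);
        if (compA != compB) return compA < compB;  // same kind: plain string order
        if (!aIsDir) return false;                 // identical paths are not "less"
        pos = endA + 1;                            // same directory name: go one level down
    }
}

// Hypothetical helper: sorts the list in place with the comparator above.
void sortPaths(std::vector<std::string>& paths) {
    std::sort(paths.begin(), paths.end(), pathLess);
}
```

With the six example paths from the question, this should yield the desired order (reading the fifth entry as file1.ext). For a QList of objects, the same comparison could be applied to the QString member after converting it or porting the logic to QString.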

Take a step back, and take a moment to summarize your desired sorting criteria, in plain English:
Directories come first, files come second.
Directories are sorted alphabetically, and files are sorted alphabetically.
Now, once your logic is reduced to this kind of simple, straightforward definition, translate it directly into C++ code.
First, translate each string into a single value, 0 or 1, that represents whether the string has a pathname component, or not.
Sort on that. Then if both strings' values are equivalent, sort on the strings themselves, as usual. Simple:
std::sort(container.begin(),
          container.end(),
          [](const QString &a, const QString &b)
          {
              // Compute whether a or b is a subdirectory, representing
              // each as:
              //
              // 0: directory
              // 1: file
              int a_hier = a.contains('/') ? 0 : 1;
              int b_hier = b.contains('/') ? 0 : 1;
              if (a_hier != b_hier)
                  return a_hier < b_hier;
              return a < b;
          });
This is your basic, initial approach. If necessary, it can be tweaked to sort the pathnames with directories slightly differently (the / component in the pathname participates in the relative sorting order, which you may or may not want; this was unclear in your question).

Related

Multiple condition on a sort (c++)

I'm trying to do this LeetCode problem using only the C++ sort function, but I can't seem to find the correct way to sort elements depending on multiple conditions.
The problem wants you to return the destination city, which is the city that doesn't have any path outgoing to another city. In my sorting, I want this city to be the last element of my vector. Right now, this is the code that I have:
class Solution {
public:
    string destCity(vector<vector<string>>& paths) {
        sort(paths.begin(), paths.end(), [](vector<string> a, vector<string> b) { return (a[1] == b[0]); });
        return (paths[paths.size() - 1][1]);
    }
};
The input is a vector of paths ([cityA -> cityB], [cityM -> city N] , ... ).
In the following picture, we can see the input, output, my stdout and the expected result of the function :
You can see that my sort only ensures that two paths will be consecutive if the outgoing city of one path equals the incoming city of another path: [cityA -> cityB], [cityB -> cityC], ... . But my issue is that my sort doesn't handle the case of a city (the destination city) that doesn't have any path outgoing to another city. I would like to place this particular city as the last element of my vector. Can I do this by adding some code (conditions) to my sort function? If yes, how?
Thank you.
You cannot do it with std::sort. The sort function requires a comparator that defines a consistent ordering (a strict weak ordering) over all the elements. You only have a partial one. Once you put all your pairs in, the sort might decide to compare two unrelated elements. It won't try to find a way to compare them in the "right" order.
You need to find a new algorithm. May I suggest:
std::map<string, string> destinations;
And simply following the destination until you get to the end?
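As a rough sketch of what "following the destination" could look like: the function name destCityByFollowing is made up here, and the code assumes (as the problem statement does) that the paths form a single line with no cycles and that every path has exactly two entries.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: builds a start -> destination map, then walks it
// from an arbitrary start city until a city with no outgoing path is found.
std::string destCityByFollowing(const std::vector<std::vector<std::string>>& paths) {
    if (paths.empty()) return "";

    std::map<std::string, std::string> destinations;
    for (const auto& path : paths) {
        destinations[path[0]] = path[1];
    }

    // Assumption: no cycles, so this walk terminates.
    std::string city = paths.front()[0];
    auto it = destinations.find(city);
    while (it != destinations.end()) {
        city = it->second;
        it = destinations.find(city);
    }
    return city;
}
```

Building the std::map costs O(N log N) comparisons; swapping in std::unordered_map would make the lookups average O(1).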
Sorting is O(N log N). Even if you find a way to make sorting work, ideally we'd avoid it here and reduce the time complexity to O(N) for this problem.
This will pass:
#include <string>
#include <unordered_set>
#include <vector>
struct Solution {
    static std::string destCity(
        const std::vector<std::vector<std::string>>& paths
    ) {
        // Collect every city that has an outgoing path.
        std::unordered_set<std::string> starts_map;
        for (const auto& path : paths) {
            starts_map.insert(path[0]);
        }
        // The destination is the only city that never appears as a start.
        for (const auto& path : paths) {
            if (!starts_map.count(path[1])) {
                return path[1];
            }
        }
        return "";
    }
};
References
For additional details, you can see the Discussion Board. There are plenty of accepted solutions in a variety of languages, with explanations, efficient algorithms, and asymptotic time/space complexity analysis.

efficient is_in(vector<string>& S, string P) function

Given a set S of strings { S0, S1, S2, ..., Sn-1 } and a string P, how can I implement the function bool is_in( string, vector ) without doing the obvious loop?
Meaning that I don't want to do this:
bool is_in(vector<string>& S, string P)
{
for(int i=0; i<S.size(); i++)
if(P == S[i]) return true;
return false;
}
Ideally, I would like to have a sort of hash function, that I could compute a priori. Something like this:
bool is_in(vector<string>& S, string P)
{
    someHashType h = hash( S );
    if( someFunction( h, P ) ) return true;
    return false;
}
Note:
S is a static vector (in my case, size 1000, unsorted)
P is an entry of a collection of strings I'm testing against S (also unsorted; in my case, 10M entries)
So that's why I need to be fast.
This is NOT a homework problem - But part of a large scale software.
The problem with "I want this function to be faster" is that it nearly always involves SOME extra work somewhere else, which may or may not make the improvement "worth it". That depends on how the collection of strings you are searching is used in the rest of the code. If it's just "if the word is in this list, then do X" (e.g. a bad-word check for commit messages, which must not contain swear words or company names), then I would change the vector to an unordered_set. That has O(1) average search time, and would look something like:
bool is_in(unordered_set<string>& S, string P)
{
auto it = S.find(P);
return (it != S.end());
}
But this will of course have consequences elsewhere, and if you rely on the list being a vector so that for example iterating over it is fast somewhere else in the code, this will probably slow that part down.
Edit: You have, I take it, profiled your code in a real use-case and found this particular function to take a significant amount of time. Otherwise, you'd be better off measuring that FIRST.
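If the rest of the code has to keep the vector, one option is to build the unordered_set once from it and query that. The sketch below assumes S really is static after the set is built; make_lookup is a made-up helper name.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: build the lookup table once, up front.
std::unordered_set<std::string> make_lookup(const std::vector<std::string>& S) {
    return std::unordered_set<std::string>(S.begin(), S.end());
}

// Average O(1) membership test against the prebuilt table.
bool is_in(const std::unordered_set<std::string>& lookup, const std::string& P) {
    return lookup.find(P) != lookup.end();
}
```

The one-time cost is hashing the 1000 strings of S; each of the 10M queries then hashes P once instead of comparing it against up to 1000 entries.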
Finally I found what I was looking for:
There is a data structure called a Bloom filter, which acts as a pre-computed hash of a collection of strings.
I developed my solution around the code located at C++ Bloom Filter Library
The code would go like this:
insert all strings to the 'bloom' filter
check if a given string is in the filter.
The advantage is that the strings don't need to be stored in memory, as they would be in a set, unordered_set or any container like that.
In my particular case, I had a table of 10M strings (800MB).
The size of the filter in memory is around 20M, and the search is much faster.
The Bloom filter is a probabilistic structure, so it can produce a few false positives, but the probability of that is quite low (controlled by a parameter).
Note that there is no false negative.
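To make the idea concrete, here is a toy Bloom filter. It is only an illustration of the principle and not the API of the library mentioned above; the class name, the use of std::hash, and the way the second hash is derived are all assumptions made for this sketch. It assumes the bit count passed in is greater than zero.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy Bloom filter (illustrative only): a bit array plus k index functions.
class ToyBloomFilter {
public:
    ToyBloomFilter(std::size_t bits, std::size_t hashes)
        : bits_(bits, false), hashes_(hashes) {}

    // Set the k bits that correspond to s.
    void insert(const std::string& s) {
        for (std::size_t k = 0; k < hashes_; ++k) bits_[index(s, k)] = true;
    }

    // false: s was definitely never inserted. true: s was probably inserted.
    bool maybe_contains(const std::string& s) const {
        for (std::size_t k = 0; k < hashes_; ++k) {
            if (!bits_[index(s, k)]) return false;
        }
        return true;
    }

private:
    // Derive the k-th index from two base hashes (double hashing).
    std::size_t index(const std::string& s, std::size_t k) const {
        const std::size_t h1 = std::hash<std::string>{}(s);
        const std::size_t h2 = std::hash<std::string>{}(s + '\x01');  // salted second hash
        return (h1 + k * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t hashes_;
};
```

Only the bit array is kept, not the strings, which is where the memory saving comes from. More bits per inserted string lowers the false-positive rate.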

Iterating through one variable in a vector of struct with lower/upper bound

I have 2 structs, one simply has 2 values:
struct combo {
int output;
int input;
};
And another that sorts the input element based on the index of the output element:
struct organize {
bool operator()(combo const &a, combo const &b)
{
return a.input < b.input;
}
};
Using this:
sort(myVector.begin(), myVector.end(), organize());
What I'm trying to do with this is iterate through the input variable and check whether each element is equal to another input, 'in'.
If it is equal, I want to insert the value at the same index it was found to be equal at for input, but from output into another temp vector.
I originally went with a more simple solution (when I wasn't using a structs and simply had 2 vectors, one input and one output) and had this in a function called copy:
for(int i = 0; i < input.size(); ++i){
    if(input[i] == in){
        temp.push_back(output[i]);
    }
}
Now this code did work exactly how I needed it, the only issue is it is simply too slow. It can handle 10 integer inputs, or 100 inputs but around 1000 it begins to slow down taking an extra 5 seconds or so, then at 10,000 it takes minutes, and you can forget about 100,000 or 1,000,000+ inputs.
So, I asked how to speed it up on here (just the function iterator) and somebody suggested sorting the input vector which I did, implemented their suggestion of using upper/lower bound, changing my iterator to this:
std::vector<int>::iterator it = input.begin();
auto lowerIt = std::lower_bound(input.begin(), input.end(), in);
auto upperIt = std::upper_bound(input.begin(), input.end(), in);
for (auto it = lowerIt; it != upperIt; ++it)
{
temp.push_back(output[it - input.begin()]);
}
And it worked: it made it much faster. I would still like it to handle 1,000,000+ inputs in seconds, but I'm not sure how to do that yet.
I then realized that I can't have the input vector sorted, what if the inputs are something like:
input.push_back(10);
input.push_back(-1);
output.push_back(1);
output.push_back(2);
Well then we have 10 in input corresponding to 1 in output, and -1 corresponding to 2. Obviously 10 doesn't come before -1 so sorting it smallest to largest doesn't really work here.
So I found a way to sort the input based on the output. So no matter how you organize input, the indexes match each other based on what order they were added.
My issue is, I have no clue how to iterate through just input with the same upper/lower bound iterator above. I can't seem to call upon just the input variable of myVector, I've tried something like:
std::vector<combo>::iterator it = myVector.input.begin();
But I get an error saying there is no member 'input'.
How can I iterate through just input so I can apply the upper/lower bound iterator to this new way with the structs?
Also I explained everything so everyone could get the best idea of what I have and what I'm trying to do, also maybe somebody could point me in a completely different direction that is fast enough to handle those millions of inputs. Keep in mind I'd prefer to stick with vectors because not doing so would involve me changing 2 other files to work with things that aren't vectors or lists.
Thank you!
I think that if you sort it in smallest to largest (x is an integer after all) that you should be able to use std::adjacent_find to find duplicates in the array, and process them properly. For the performance issues, you might consider using reserve to preallocate space for your large vector, so that your push back operations don't have to reallocate memory as often.
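On the narrower question of applying lower/upper bound to a vector of structs: the standard algorithms accept a comparator, so the search can look at the input member directly. The sketch below is one possible arrangement, not code from the thread. The names by_input, sort_by_input and outputs_for are made up, and it assumes the vector is sorted by input once before any lookups.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct combo {
    int output;
    int input;
};

// Comparator that can compare combo with combo (for sorting) and
// combo with a plain int key (for the binary search).
struct by_input {
    bool operator()(const combo& a, const combo& b) const { return a.input < b.input; }
    bool operator()(const combo& a, int key) const { return a.input < key; }
    bool operator()(int key, const combo& a) const { return key < a.input; }
};

// Sort once; stable_sort keeps insertion order among equal inputs.
void sort_by_input(std::vector<combo>& items) {
    std::stable_sort(items.begin(), items.end(), by_input());
}

// Collect the outputs of every element whose input equals in.
// Precondition: items is already sorted by input.
std::vector<int> outputs_for(const std::vector<combo>& items, int in) {
    const auto range = std::equal_range(items.begin(), items.end(), in, by_input());
    std::vector<int> temp;
    temp.reserve(static_cast<std::size_t>(range.second - range.first));
    for (auto it = range.first; it != range.second; ++it) {
        temp.push_back(it->output);
    }
    return temp;
}
```

Because each output travels with its input inside the struct, sorting no longer breaks the pairing, and each lookup costs O(log N) plus the number of matches.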

Iterating over unordered_map using indexed for loop

I am trying to access values stored in an unordered_map using a for loop, but I am stuck trying to access values using the current index of my loop. Any suggestions, or a link to look at? Thanks. [Hint: I don't want to use an iterator].
my sample code:
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;
int main()
{
unordered_map<int,string>hash_table;
//filling my hash table
hash_table.insert(make_pair(1,"one"));
hash_table.insert(make_pair(2,"two"));
hash_table.insert(make_pair(3,"three"));
hash_table.insert(make_pair(4,"four"));
//now, i want to access values of my hash_table with for loop, `i` as index.
//
for (int i=0;i<hash_table.size();i++ )
{
cout<<"Value at index "<<i<<" is "<<hash_table[i].second;//I want to do something like this. I don't want to use iterator!
}
return 0;
}
There are two ways to access an element from an std::unordered_map.
An iterator.
Subscript operator, using the key.
I am stuck trying to access values using the current index of my loop
As you can see, accessing an element using the index is not listed in the possible ways to access an element.
I'm sure you realize that since the map is unordered the phrase element at index i is quite meaningless in terms of ordering. It is possible to access the ith element using the begin iterator and std::advance but...
[Hint: I don't want to use an iterator].
Hint: You just ran out of options. What you want to do is not possible. Solution: Start wanting to use tools that are appropriate to achieving your objective.
If you want to iterate a std::unordered_map, then you use iterators because that's what they're for. If you don't want to use iterators, then you cannot iterate an std::unordered_map. You can hide the use of iterators with a range based for loop, but they're still used behind the scenes.
If you want to iterate something using a position - index, then what you need is an array such as a std::vector.
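A minimal sketch of that last point: copy the values into a std::vector once (with a range-based for loop, which uses iterators behind the scenes), then index into the vector. The helper name values_of is made up, and the order of the copied values is unspecified because the map is unordered.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: snapshot the mapped values into an indexable vector.
// The resulting order is whatever order the map happens to iterate in.
std::vector<std::string> values_of(const std::unordered_map<int, std::string>& table) {
    std::vector<std::string> values;
    values.reserve(table.size());
    for (const auto& entry : table) {
        values.push_back(entry.second);
    }
    return values;
}
```

After `auto values = values_of(hash_table);`, a loop such as `for (std::size_t i = 0; i < values.size(); ++i)` can use `values[i]`. The snapshot does not follow later changes to the map.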
First, why would you want to use an index versus an iterator?
Suppose you have a list of widgets you want your UI to draw. Each widget can have its own list of child widgets, stored in a map. Your options are:
Make each widget draw itself. Not ideal since widgets are now coupled to the UI kit you are using.
Return the map and use an iterator in the drawing code. Not ideal because now the drawing code knows your storage mechanism.
An API that can avoid both of these might look like this.
const Widget* Widget::EnumerateChildren(size_t* io_index) const;
You can make this work with maps but it isn't efficient. You also can't guarantee the stability of the map between calls. So this isn't recommended but it is possible.
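const Widget* Widget::EnumerateChildren(size_t* io_index) const
{
    // Past the last child: signal the end of the enumeration.
    if (*io_index >= m_children.size())
        return nullptr;
    auto it = m_children.begin();  // by value: a non-const reference cannot bind to the temporary
    std::advance(it, *io_index);
    *io_index += 1;
    return it->second;
}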
You don't have to use std::advance and could use a for loop to advance the iterator yourself. Not efficient or very safe.
A better solution to the scenario I described would be to copy out the values into a vector.
void Widget::GetChildren(std::vector<Widget*>* o_children) const;
You can't do it without an iterator. An unordered map could store the contents in any order and move them around as it likes. The concept of "3rd element" for example means nothing.
If you had a list of the keys from the map then you could index into that list of keys and get what you want. However unless you already have it you would need to iterate over the map to generate the list of keys so you still need an iterator.
An old question.
OK, I'm taking the risk: here may be a workaround (not perfect though: it is just a workaround).
This post is a bit long because I explain why this may be needed. In fact one might want to use the very same code to load and save data from and to a file. This is very useful to synchronize the loading and saving of data, to keep the same data, the same order, and the same types.
For example, if op is load_op or save_op:
load_save_data( var1, op );
load_save_data( var2, op );
load_save_data( var3, op );
...
load_save_data hides the things performed inside. Maintenance is thus much easier.
The problem is when it comes to containers. For example (back to the question) it may do this for sets (source A) to save data:
int thesize = theset.size();
load_save_data( thesize, save_op ); // template (member) function with 1st arg being a typename
for( auto elem : theset ) {
    load_save_data( elem, save_op );
}
However, to read (source B):
int thesize;
load_save_data( thesize, load_op );
for( int i=0; i<thesize; i++ ) {
    Elem elem;
    load_save_data( elem, load_op );
    theset.insert(elem);
}
So, the whole source code would be something like this, with two loops:
if(op == save_op) { A } else { B }
The problem is there are two different kinds of loop, and it would be nice to merge them as one only. Ideally, it would be nice to be able to do:
int thesize = theset.size();
load_save_data( thesize, op );
for( int i=0; i<thesize; i++ ) {
    Elem elem;
    if( op == save_op ) {
        elem=theset[i]; // not possible
    }
    load_save_data( elem, op );
    if( op == load_op ) {
        theset.insert(elem);
    }
}
(As this code is used in different contexts, care can be taken to give the compiler enough information to strip the unnecessary code (the right "if"); this is not obvious, but possible.)
This way, each call to load_save_data is in the same order, the same type. You forget a field for both or none, but everything is kept synchronized between save and load. You may add a variable, change a type, change the order etc in one place only. The code maintenance is thus easier.
A solution to the impossible "theset[i]" is indeed to use a vector or a map instead of a set but you're losing the properties of a set (avoid two identical items).
So a workaround (but it has a heavy price: efficiency and simplicity) is something like:
void ...::load_save( op )
{
    ...
    int thesize = theset.size();
    set<...> tmp;
    load_save_data( thesize, op );
    for( int i=0; i<thesize; i++ ) {
        Elem elem;
        if( op == save_op ) {
            // the workaround: take the first element out and park it in tmp
            elem=*(theset.begin());
            theset.erase(elem);
            tmp.insert(elem);
        }
        load_save_data( elem, op );
        if( op == load_op ) {
            theset.insert(elem);
        }
    }
    if(op == save_op) {
        theset.insert(tmp.begin(), tmp.end()); // the workaround: put the parked elements back
    }
    ...
}
Not very beautiful, but it does the trick, and (IMHO) it is the closest answer to the question.
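A variation on the same single-loop idea, sketched under stated assumptions: instead of erasing and re-inserting elements, the save path copies the set into a vector so the loop can index it. This swaps the technique above for a snapshot; the int element type, the Op enum, and the stringstream-based load_save_data are stand-ins invented for this sketch.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <vector>

enum Op { load_op, save_op };

// Stand-in primitive: writes or reads one int, depending on op.
void load_save_data(int& value, Op op, std::stringstream& io) {
    if (op == save_op) {
        io << value << ' ';
    } else {
        io >> value;
    }
}

// One loop serves both directions; the set itself is never modified on save.
void load_save_set(std::set<int>& theset, Op op, std::stringstream& io) {
    std::vector<int> snapshot;  // indexable copy, filled only when saving
    if (op == save_op) snapshot.assign(theset.begin(), theset.end());

    int thesize = static_cast<int>(snapshot.size());
    load_save_data(thesize, op, io);  // written on save, overwritten on load

    for (int i = 0; i < thesize; ++i) {
        int elem = 0;
        if (op == save_op) elem = snapshot[static_cast<std::size_t>(i)];
        load_save_data(elem, op, io);
        if (op == load_op) theset.insert(elem);
    }
}
```

The price is one extra copy of the elements during a save, in exchange for leaving the set untouched.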

assignment vs std::swap and merging and keeping duplicates in separate object

Say I have two std::set<std::string>s. The first one, old_options, needs to be merged with additional options, contained in new_options. I can't just use std::merge (well, I do, but not only that) because I also check for doubles and warn the user about this accordingly. To this effect, I have
void merge_options( set<string> &old_options, const set<string> &new_options )
{
// find duplicates and create merged_options, a stringset containing the merged options
// handle duplicates the way I want to
// ...
old_options = merged_options;
}
Is it better to use
std::swap( merged_options, old_options );
or the assignment I have?
Is there a better way to filter duplicates and return the merged set than consecutive calls to std::set_intersection and std::set_union to detect dupes and merge the sets? I know it's slower than one traversal and doing both at once, but these sets are small (performance is not critical) and I trust the Standard more than I trust myself.
What's wrong with
void merge_options( set<string> &old_options, const set<string> &new_options )
{
for (set<string>::const_iterator i = new_options.begin();
i != new_options.end(); ++i)
if (old_options.find(*i) != old_options.end())
warn_duplicate(*i);
else
old_options.insert(*i);
}
This is a simple O(m lg n) algorithm, where m = new_options.size() and n = old_options.size().
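A small variation on the loop above, as a sketch: std::set::insert already reports whether the element was new, so the separate find can be dropped. Collecting the duplicates into a returned vector (rather than calling warn_duplicate) is a choice made here so that the example is self-contained.

```cpp
#include <set>
#include <string>
#include <vector>

// Merges new_options into old_options and returns the options that were
// already present, so the caller can warn about them however it likes.
std::vector<std::string> merge_options(std::set<std::string>& old_options,
                                       const std::set<std::string>& new_options) {
    std::vector<std::string> duplicates;
    for (const auto& option : new_options) {
        // insert().second is false when the element was already in the set.
        if (!old_options.insert(option).second) {
            duplicates.push_back(option);
        }
    }
    return duplicates;
}
```

The complexity is the same O(m lg n), with one lookup per option instead of two.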
Given (as you've stated) that performance isn't critical here, I'd use the assignment and the two-pass algorithm. It's simpler and easier to understand; it's only worth using a "trick" like swap if you really need what it gains.
Writing your own algorithm wouldn't be such a bad thing, but again, unless you have a real use for the benefit it provides, I wouldn't bother.
This is in part an answer to larsmans. There is a remove_copy_if algorithm that encapsulates his for loop into a single function. The following uses a C++0x lambda for the predicate.
void merge_options( set<string> &old_options, const set<string> &new_options )
{
remove_copy_if(
new_options.begin(),
new_options.end(),
inserter(old_options, old_options.end()),
[&](const string &s) {
return (old_options.count(s)) ? warn_duplicate(s), true : false;
}
);
}