Sorting struct(ures)

Sorting struct(ures) - c++

If I define a structure
struct dat{
int a;
char b;
};
and then I declare an array of that structure, i.e.
dat array[10];
and then I sort the dat array by array[i].a, i.e.
std::sort((array.a),(array.a+10);
will this work?
And suppose that after sorting, array[5].a moves to array[2].a: does array[5].b also move to array[2].b? If not, how can I do this using the standard library function sort?

To sort your data structure using the std::sort() algorithm, you could supply a comparator function as its third argument.
For example, to sort by the values of dat.a:
bool IntSorter (const dat& dat1, const dat& dat2) { return dat1.a < dat2.a; }
Then, you call sort like this:
std::sort(array, array + 10, IntSorter);
You could also refactor your code to avoid the magic number 10, so that it is not duplicated when referring to the one-past-last element in the call to std::sort().

No, it won't work as written.
First, std::sort((array.a),(array.a+10); is incorrect: array.a is not an array (array has no member a at all), so the expression will not compile.
You would instead need to sort the array itself (std::sort(array, array+10);), but that won't work either, because you haven't provided an overload of operator< for dat.
You could provide one:
bool operator<(const dat& l, const dat& r)
{
return l.a < r.a;
}
Then std::sort(array, array+10); would work as expected.
When you sort an object, it "all goes together". That means that dat::a and dat::b will not be modified within a specific object, but the location of that object in the sorted array may be at a different index.
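To make the "all goes together" point concrete, here is a minimal sketch using the operator< from this answer. The sample values and the example() function are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

struct dat {
    int a;
    char b;
};

// Order dat objects by a; b plays no part in the comparison.
bool operator<(const dat& l, const dat& r) { return l.a < r.a; }

void example() {
    dat array[] = {{3, 'c'}, {1, 'a'}, {2, 'b'}};
    std::sort(std::begin(array), std::end(array));  // uses operator< above
    // Whole objects were moved, so each b is still next to its original a.
    assert(array[0].a == 1 && array[0].b == 'a');
    assert(array[1].a == 2 && array[1].b == 'b');
    assert(array[2].a == 3 && array[2].b == 'c');
}
```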

If you want to sort the array of structures according to the value of a:
std::sort(std::begin(array), std::end(array),
[](dat const & lhs, dat const & rhs){return lhs.a < rhs.a;});
This will move the b values along with their corresponding a values, which I think is what you say you want. If you want the b values to stay where they are, then it would be somewhat messier. C++ doesn't provide any way to treat an array of dat as an array of int.
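If the b values really must stay where they are, one possible workaround (my own sketch, not something this answer spells out) is to copy the a values into a temporary vector, sort that, and write the sorted values back. The name sort_a_only is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct dat {
    int a;
    char b;
};

// Hypothetical helper: sorts only the a members; every b keeps its index.
void sort_a_only(dat* first, std::size_t count) {
    std::vector<int> keys;
    keys.reserve(count);
    for (std::size_t i = 0; i < count; ++i) keys.push_back(first[i].a);

    std::sort(keys.begin(), keys.end());

    for (std::size_t i = 0; i < count; ++i) first[i].a = keys[i];
}

void example() {
    dat array[] = {{3, 'x'}, {1, 'y'}, {2, 'z'}};
    sort_a_only(array, 3);
    // The a values are sorted, the b values did not move.
    assert(array[0].a == 1 && array[0].b == 'x');
    assert(array[1].a == 2 && array[1].b == 'y');
    assert(array[2].a == 3 && array[2].b == 'z');
}
```

This costs an extra copy of the keys, which is part of why it counts as "somewhat messier".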

For starters, array.a isn't a legal expression, since array
doesn't have an a member, and in fact, isn't even a struct.
If you want to sort the dat members by field a, you'll need
to either provide a custom ordering function (the preferred way,
especially if you have C++11 and can use lambda functions), or
define an operator< on dat. If you just want to move the
a members during the sort, leaving the b members where they
are... you'd have to define a custom swap function as well:
void
swap( dat& lhs, dat& rhs )
{
std::swap( lhs.a, rhs.a );
}
But this would be very weird, and I'd suggest finding some other
way of organizing your data.

Related

Stable sorting a vector using std::sort

So I have some code like this, I want to sort the vector based on id and put the last overridden element first:
struct Data {
int64_t id;
double value;
};
std::vector<Data> v;
// add some Datas to v
// add some 'override' Datas with duplicated `id`s
std::sort(v.begin(), v.end(),
[](const Data& a, const Data& b) {
if (a.id < b.id) {
return true;
} else if (b.id < a.id) {
return false;
}
return &a > &b;
});
Since vectors are contiguous, &a > &b should work to put the appended overrides first in the sorted vector, which should be equivalent to using std::stable_sort, but I am not sure if there is a state in the std::sort implementation where the equal values would be swapped such that the address of an element that appeared later in the original vector is earlier now. I don't want to use stable_sort because it is significantly slower for my use case. I have also considered adding a field to the struct that keeps track of the original index, but I will need to copy the vector for that.
It seems to work here: https://onlinegdb.com/Hk8z1giqX

std::sort gives no guarantees whatsoever on when elements are compared, and in practice, I strongly suspect most implementations will misbehave for your comparator.
The common std::sort implementation is either plain quicksort or a hybrid sort (quicksort switching to a different sort for small ranges), implemented in-place to avoid using extra memory. As such, the comparator will be invoked with the same element at different memory addresses as the sort progresses; you can't use memory addresses to implement a stable sort.
Either add the necessary info to make the sort innately stable (e.g. the suggested initial index value) or use std::stable_sort. Using memory addresses to stabilize the sort won't work.
For the record, having experimented a bit, I suspect your test case is too small to trigger the issue. At a guess, the hybrid sorting strategy works coincidentally for smallish vectors, but breaks down when the vector gets large enough for an actual quicksort to occur. Once I increase your vector size with some more filler, the stability disappears.
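A rough sketch of the "add the original index" route follows. It assumes that copying the elements into a temporary vector is acceptable; the names Indexed and sort_overrides_first are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Data {
    std::int64_t id;
    double value;
};

// Hypothetical wrapper: remembers where each element started.
struct Indexed {
    Data data;
    std::size_t original_index;
};

// Sort by id; among equal ids, the element appended later comes first.
// The tie-break uses the stored index, never a memory address.
std::vector<Data> sort_overrides_first(const std::vector<Data>& v) {
    std::vector<Indexed> tmp;
    tmp.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) tmp.push_back({v[i], i});

    std::sort(tmp.begin(), tmp.end(),
              [](const Indexed& a, const Indexed& b) -> bool {
                  if (a.data.id != b.data.id) return a.data.id < b.data.id;
                  return a.original_index > b.original_index;
              });

    std::vector<Data> out;
    out.reserve(tmp.size());
    for (const Indexed& e : tmp) out.push_back(e.data);
    return out;
}

void example() {
    // The last entry overrides the first one (same id).
    std::vector<Data> v = {{2, 1.0}, {1, 2.0}, {2, 3.0}};
    std::vector<Data> sorted = sort_overrides_first(v);
    assert(sorted[0].id == 1);
    assert(sorted[1].id == 2 && sorted[1].value == 3.0);
    assert(sorted[2].id == 2 && sorted[2].value == 1.0);
}
```

Because the indices are unique, the comparator never reports two elements as equivalent, so the result does not depend on how std::sort shuffles elements internally.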

c++ struct operator overload not working

I have a vector of pointers to structs, and want to check for existing items and sort by the value of a struct member. However, it appears that the check for existing items (I'm using QVector::indexOf()) is not working, and that std::sort is applying its sorting to some pointer value rather than the member value. I don't understand why the struct operator overloads aren't getting called (or called properly).
In header file:
struct PlotColumn {
quint32 octagonLength;
QVector<Qt::GlobalColor> columnVector;
bool operator< (const PlotColumn& pc) const
{
return (this->octagonLength < pc.octagonLength);
}
bool operator==(PlotColumn const& pc)
{
return this->octagonLength == pc.octagonLength;
}
bool operator> (const PlotColumn& pc) const
{
return (this->octagonLength > pc.octagonLength);
}
};
QVector<PlotColumn*> *plotMap;
In source file:
PlotColumn *pcNew = new PlotColumn();
pcNew->octagonLength = octagonLength;
// check if octagonLength has arrived before:
int plotXAxis = plotMap->indexOf(pcNew);
if (plotXAxis == -1) { // unknown, so insert:
... (some housekeeping)
plotMap->append(pcNew);
std::sort(plotMap->begin(), plotMap->end());
....
(some more housekeeping)
}
Using an external comparison function (not in the struct) is possible for sort, but not for indexOf (afaik).
Any thoughts?
PS: yes I know there are tons of already answered questions about sorting vectors of pointers to structs and I've tried to apply the pattern in those solutions but it still doesn't work.

You need to provide a comparator to std::sort() that works with pointers.
bool PlotColumnPointerComparator(const PlotColumn *a, const PlotColumn *b)
{
return (*a) < (*b); // this will call your PlotColumn:: operator<()
}
And change your std::sort() statement to
std::sort(plotMap->begin(), plotMap->end(), PlotColumnPointerComparator);
In C++11, you could do the above with a lambda function, but that's just a syntactic convenience.
Compilers are not omnipotent mindreaders. If you tell it to sort a set of pointers, it will do pointer comparison. If you want the comparison to dereference the pointers and compare the pointed-to objects, you need to tell it to do that .... as in the above.
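For what it's worth, here is a sketch of the lambda variant, plus a value-based lookup that could stand in for the pointer-based indexOf check. It uses std::vector and a stripped-down PlotColumn instead of the Qt types, so treat it as an illustration rather than drop-in code; sort_by_length and index_of_length are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for the question's struct (Qt types left out).
struct PlotColumn {
    unsigned octagonLength;
};

// Sort pointers by the objects they point to.
void sort_by_length(std::vector<PlotColumn*>& columns) {
    std::sort(columns.begin(), columns.end(),
              [](const PlotColumn* a, const PlotColumn* b) {
                  return a->octagonLength < b->octagonLength;
              });
}

// Value-based lookup: index of the first column with the given length,
// or -1 if there is none (mirroring the -1 check in the question).
int index_of_length(const std::vector<PlotColumn*>& columns, unsigned length) {
    auto it = std::find_if(columns.begin(), columns.end(),
                           [length](const PlotColumn* p) {
                               return p->octagonLength == length;
                           });
    return it == columns.end() ? -1 : static_cast<int>(it - columns.begin());
}

void example() {
    PlotColumn a{30}, b{10}, c{20};
    std::vector<PlotColumn*> columns = {&a, &b, &c};
    sort_by_length(columns);
    assert(columns[0] == &b && columns[1] == &c && columns[2] == &a);
    assert(index_of_length(columns, 20) == 1);
    assert(index_of_length(columns, 99) == -1);
}
```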

std::sort is applying its sorting to some pointer value
Let's see:
QVector<PlotColumn*> *plotMap
std::sort(plotMap->begin(), plotMap->end());
Yep, that's exactly what you told std::sort to do (assuming QVector resembles vector): you have a container of pointers, told it to sort the elements of the container. Thus, you told it to sort pointers, not PlotColumn objects.
If you want to sort the pointers based on how the objects they point to compare, you have to apply one of those
answered questions about sorting vectors of pointers to structs
You've already identified your problem:
I've tried to apply the pattern in those solutions but it still doesn't work.
Don't give up on that line of inquiry: you've correctly identified something you need to work on, and should work to understand how those patterns work and how to apply them to your problem, and maybe ask a focused question regarding your attempts at that.

It will not work because you are using the same signature. Overloading only works for different signatures. Check your function signatures.

subtract two order-less std::vector of objects

I have two vectors of object. Something like:
std::vector<thing> all_things;
std::vector<thing> bad_things;
I want to obtain a third vector that contains the good_things, in other words, every object in all_things that does not belong to bad_things:
std::vector<thing> good_things=subtract(all_things,bad_things);
Any ideas about how to implement subtract in the most efficient and standard way?
P.S the vectors can NOT be ordered because the class thing does not have any thing to be ordered by.
Thanks!
Edit:
and I do not want to make any change to all_things.
e.g.
void substract(const std::vector<thing>& a, const std::vector<thing>& b);

From the comments, your things can be sorted, but in a meaningless way.
And that is ok.
Sort them meaninglessly.
Write a function that takes two things and gives them a meaningless order that is consistent, such that two things compare not-less-than each other only if they are equal.
Call this bool arb_order_thing(thing const&, thing const&).
Now std::sort both vectors and use std::set_difference.
Now, this can be expensive if things are expensive to copy. So instead create two vectors of thing const*, write bool arb_order_thing_ptr(thing const*, thing const*) (which dereferences and compares using the meaningless ordering), sort the vectors-of-pointers using that, use set_difference using that, then convert back to a vector<thing>.
Alternatively, consider writing a thing const* hasher (not a std::hash<thing*>, because that is global and rude) and use unordered_set<thing const*>s to do the work manually. Hash the smaller of the two vectors, then do a std::copy_if testing against the hash on the other vector.
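A minimal sketch of the sort-then-set_difference idea, assuming a made-up thing with two members (the real class isn't shown in the question). The arbitrary order just compares all members lexicographically.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical thing: the real class is not shown in the question.
struct thing {
    int code;
    std::string label;
};

// Arbitrary but consistent order: compares every member, so two things
// are equivalent under it only when they are equal.
bool arb_order_thing(thing const& l, thing const& r) {
    return std::tie(l.code, l.label) < std::tie(r.code, r.label);
}

// Takes copies so the callers' vectors are left untouched.
std::vector<thing> subtract(std::vector<thing> all, std::vector<thing> bad) {
    std::sort(all.begin(), all.end(), arb_order_thing);
    std::sort(bad.begin(), bad.end(), arb_order_thing);
    std::vector<thing> good;
    std::set_difference(all.begin(), all.end(), bad.begin(), bad.end(),
                        std::back_inserter(good), arb_order_thing);
    return good;
}

void example() {
    std::vector<thing> all = {{3, "c"}, {1, "a"}, {2, "b"}};
    std::vector<thing> bad = {{2, "b"}};
    std::vector<thing> good = subtract(all, bad);
    assert(good.size() == 2);
    assert(good[0].code == 1 && good[1].code == 3);
}
```

Note that the result comes back in the arbitrary sort order, not in the original order of all_things.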

If you can't order them, you can use the brute force way. Just compare the vectors. E.g:
std::vector<Thing> all;
std::vector<Thing> bad;
std::vector<Thing> good (all.size());
auto it = std::copy_if (all.begin(), all.end(), good.begin(),
[&bad](const Thing& t){return std::find(bad.begin(), bad.end(), t) == bad.end();} );
good.resize(std::distance(good.begin(),it)); // shrink good to the elements actually copied

If thing is expensive to construct or copy, the container is long, and most of its elements are bad, it is not a good idea to construct an equally long 'not bad' array. In effect, a flag matrix of all.size() x bad.size() has to be filled based on thing comparisons. If uniqueness is ensured, some of the iteration through the bads can be spared. But the complexity is O(N^2) anyway.

I would like to suggest code similar to mkaes, but with few adjustments:
std::vector<thing> substract(const std::vector<thing>& a, const std::vector<thing>& b) {
std::vector<thing> sub;
sub.reserve(a.size());
for (const auto &item : a) {
if (std::find(b.begin(), b.end(), item) == b.end()) {
sub.push_back(item);
}
}
return sub;
}
It's a brute-force version of what you want to achieve, but it's the best you can do if you can't sort the elements of the vectors. Remember, though, that you need to be able to compare two objects of the item type, meaning you will need to provide operator==.

How can I pass the whole struct to a function including all elements?

I have a structure that I want to pass to a function which will sort the struct. However, I don't know how to pass the WHOLE structure.
What I've done is this until now:
void sort_datoteka_sifra(pole &artikli){
}
And I call it like sort_datoteka_sifra(artikli[0]) etc., but it only passes the [0] element. I want to pass the whole structure, so that I can use it in the function without having to call artikli[0], artikli[1] and so on in the main function.

You have several options here.
Pass the array as a pointer to its first element as well as the number of elements:
void sort_datoteka_sifra(pole *artikli, int count){
}
If count is static (known at compile time), you can also pass the array by reference:
void sort_datoteka_sifra(pole (&artikli)[100]){
}
If you don't want to hardcode the count, use a function template:
template <int N>
void sort_datoteka_sifra(pole (&artikli)[N]){
}
Use std::vector instead of C-arrays:
void sort_datoteka_sifra(std::vector<pole> &artikli){
}
Use std::sort instead of your custom sort function (#include <algorithm>) and use it with either your existing C-array or (recommended) a std::vector:
std::sort(std::begin(artikli), std::end(artikli));
You have to provide a way to compare two objects; this is done by either overloading operator< or by passing a function (or functor) to the sort algorithm:
bool comparePole(const pole & a, const pole & b) {
return /* condition when you want to have a before b */
}
std::sort(std::begin(artikli), std::end(artikli), &comparePole);
If you don't want to write a function and have C++11, you can use a lambda function:
std::sort(std::begin(artikli), std::end(artikli), [](const pole & a, const pole & b) {
return /* condition when you want to have a before b */
});
If you want to compare the elements by some member (which has a corresponding operator< overload, which is the case for simple types like int, std::string, etc.), use compareByMember from my other answer at https://stackoverflow.com/a/20616119/592323, e.g. let's say pole has an int ID by which you want to sort:
std::sort(std::begin(artikli), std::end(artikli), compareByMember(&pole::ID));
To sort a sub-array of size count, don't use std::end but:
std::sort(std::begin(artikli), std::begin(artikli) + count, &comparePole);
Of course you can combine the third option with one of the first two, i.e. provide a sort function which is implemented in terms of std::sort.
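Putting the template option and std::sort together might look like the sketch below. The layout of pole is a guess (the question doesn't show its members), so the int ID field is an assumption, and std::size_t is used for N instead of int.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical layout: the question does not show pole's members.
struct pole {
    int ID;
};

bool comparePole(const pole& a, const pole& b) { return a.ID < b.ID; }

// The array is passed by reference, so N is deduced and nothing decays
// to a pointer.
template <std::size_t N>
void sort_datoteka_sifra(pole (&artikli)[N]) {
    std::sort(std::begin(artikli), std::end(artikli), comparePole);
}

void example() {
    pole artikli[] = {{42}, {7}, {19}};
    sort_datoteka_sifra(artikli);  // the whole array, not artikli[0]
    assert(artikli[0].ID == 7 && artikli[1].ID == 19 && artikli[2].ID == 42);
}
```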

Your function requests a reference to a single element, and you obviously also pass only a single element. So, to pass the complete array, you should use a pointer if it's an array allocated with new or a statically allocated array, e.g.
void fun(pole* artikli);
Otherwise for C++, it's common to use std::vector and pass it by reference:
std::vector<pole> artikli;
void fun(std::vector<pole>& artikli);

C++ - Check if One Array of Strings Contains All Elements of Another

I've recently been porting a Python application to C++, but am now at a loss as to how I can port a specific function. Here's the corresponding Python code:
def foo(a, b): # Where `a' is a list of strings, as is `b'
for x in a:
if not x in b:
return False
return True
I wish to have a similar function:
bool
foo (char* a[], char* b[])
{
// ...
}
What's the easiest way to do this? I've tried working with the STL algorithms, but can't seem to get them to work. For example, I currently have this (using the glib types):
gboolean
foo (gchar* a[], gchar* b[])
{
gboolean result;
std::sort (a, (a + (sizeof (a) / sizeof (*a))); // The second argument corresponds to the size of the array.
std::sort (b, (b + (sizeof (b) / sizeof (*b)));
result = std::includes (b, (b + (sizeof (b) / sizeof (*b))),
a, (a + (sizeof (a) / sizeof (*a))));
return result;
}
I'm more than willing to use features of C++11.

I'm just going to add a few comments to what others have stressed and give a better algorithm for what you want.
Do not use pointers here. Using pointers doesn't make it c++, it makes it bad coding. If you have a book that taught you c++ this way, throw it out. Just because a language has a feature does not mean it is proper to use it anywhere you can. If you want to become a professional programmer, you need to learn to use the appropriate parts of your languages for any given action. When you need a data structure, use the one appropriate to your activity. Pointers aren't data structures, they are reference types used when you need an object with state lifetime - i.e. when an object is created on one asynchronous event and destroyed on another. If an object lives its lifetime without any asynchronous wait, it can be modeled as a stack object and should be. Pointers should never be exposed to application code without being wrapped in an object, because standard operations (like new) throw exceptions, and pointers do not clean themselves up. In other words, pointers should always be used only inside classes and only when necessary to respond with dynamically created objects to events external to the class (which may be asynchronous).
Do not use arrays here. Arrays are simple homogeneous collection data types of stack lifetime of size known at compiletime. They are not meant for iteration. If you need an object that allows iteration, there are types that have built in facilities for this. To do it with an array, though, means you are keeping track of a size variable external to the array. It also means you are enforcing external to the array that the iteration will not extend past the last element using a newly formed condition each iteration (note this is different than just managing size - it is managing an invariant, the reason you make classes in the first place). You do not get to reuse standard algorithms, are fighting decay-to-pointer, and generally are making brittle code. Arrays are (again) useful only if they are encapsulated and used where the only requirement is random access into a simple type, without iteration.
Do not sort a vector here. This one is just odd, because it is not a good translation from your original problem, and I'm not sure where it came from. Don't optimise early, but don't pessimise early by choosing a bad algorithm either. The requirement here is to look for each string inside another collection of strings. A sorted vector is an invariant (so, again, think something that needs to be encapsulated) - you can use existing classes from libraries like boost or roll your own. However, a little bit better on average is to use a hash table. With amortised O(N) lookup (with N the size of a lookup string - remember it's an amortised O(1) number of hash-compares, and for strings this is O(N)), a natural first way to translate "look up a string" is to make an unordered_set<string> be your b in the algorithm. This changes the complexity of the algorithm from O(NM log P) (with N now the average size of strings in a, M the size of collection a and P the size of collection b), to O(NM). If the collection b grows large, this can be quite a savings.
In other words
gboolean foo(vector<string> const& a, unordered_set<string> const& b)
Note, you can now pass constant to the function. If you build your collections with their use in mind, then you often have potential extra savings down the line.
The point with this response is that you really should never get in the habit of writing code like that posted. It is a shame that there are a few really (really) bad books out there that teach coding with strings like this, and it is a real shame because there is no need to ever have code look that horrible. It fosters the idea that c++ is a tough language, when it has some really nice abstractions that do this easier and with better performance than many standard idioms in other languages. An example of a good book that teaches you how to use the power of the language up front, so you don't build bad habits, is "Accelerated C++" by Koenig and Moo.
But also, you should always think about the points made here, independent of the language you are using. You should never try to enforce invariants outside of encapsulation - that was the biggest source of savings of reuse found in Object Oriented Design. And you should always choose your data structures appropriate for their actual use. And whenever possible, use the power of the language you are using to your advantage, to keep you from having to reinvent the wheel. C++ already has string management and compare built in, it already has efficient lookup data structures. It has the power to make many tasks that you can describe simply coded simply, if you give the problem a little thought.

Your first problem is related to the way arrays are (not) handled in C++. Arrays live a kind of very fragile shadow existence where, if you as much as look at them in a funny way, they are converted into pointers. Your function doesn't take two pointers-to-arrays as you expect. It takes two pointers to pointers.
In other words, you lose all information about the size of the arrays. sizeof(a) doesn't give you the size of the array. It gives you the size of a pointer to a pointer.
So you have two options: the quick and dirty ad-hoc solution is to pass the array sizes explicitly:
gboolean foo (gchar** a, int a_size, gchar** b, int b_size)
Alternatively, and much nicer, you can use vectors instead of arrays:
gboolean foo (const std::vector<gchar*>& a, const std::vector<gchar*>& b)
Vectors are dynamically sized arrays, and as such, they know their size. a.size() will give you the number of elements in a vector. But they also have two convenient member functions, begin() and end(), designed to work with the standard library algorithms.
So, to sort a vector:
std::sort(a.begin(), a.end());
And likewise for std::includes.
Your second problem is that you don't operate on strings, but on char pointers. In other words, std::sort will sort by pointer address, rather than by string contents.
Again, you have two options:
If you insist on using char pointers instead of strings, you can specify a custom comparer for std::sort (using a lambda because you mentioned you were ok with them in a comment)
std::sort(a.begin(), a.end(), [](gchar* lhs, gchar* rhs) { return strcmp(lhs, rhs) < 0; });
Likewise, std::includes takes an optional fifth parameter used to compare elements. The same lambda could be used there.
Alternatively, you simply use std::string instead of your char pointers. Then the default comparer works:
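// The vectors are taken by value because std::sort reorders its input,
// which a const reference would not allow.
gboolean
foo (std::vector<std::string> a, std::vector<std::string> b)
{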
gboolean result;
std::sort (a.begin(), a.end());
std::sort (b.begin(), b.end());
result = std::includes (b.begin(), b.end(),
a.begin(), a.end());
return result;
}
Simpler, cleaner and safer.
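For the first option (keeping char pointers), a minimal sketch might look like this. Plain const char* stands in for gchar*, and the helper name cstr_less is made up for illustration. It assumes a lists each string only once, since std::includes counts duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Order C strings by content rather than by address.
bool cstr_less(const char* lhs, const char* rhs) {
    return std::strcmp(lhs, rhs) < 0;
}

// Takes copies because both ranges have to be sorted, with the same
// comparator that is later handed to std::includes.
bool foo(std::vector<const char*> a, std::vector<const char*> b) {
    std::sort(a.begin(), a.end(), cstr_less);
    std::sort(b.begin(), b.end(), cstr_less);
    return std::includes(b.begin(), b.end(), a.begin(), a.end(), cstr_less);
}

void example() {
    std::vector<const char*> a = {"world", "hello"};
    std::vector<const char*> b = {"hello", "stack", "world"};
    assert(foo(a, b));   // everything in a is also in b
    assert(!foo(b, a));  // "stack" is missing from a
}
```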

The sort in the C++ version isn't working because it's sorting the pointer values (comparing them with std::less as it does with everything else). You can get around this by supplying a proper comparison functor. But why aren't you actually using std::string in the C++ code? The Python strings are real strings, so it makes sense to port them as real strings.

In your sample snippet, your use of std::includes is pointless, since it will use operator< to compare your elements. Unless you are storing the same pointers in both your arrays, the operation will not yield the result you are looking for.
Comparing addresses is not the same thing as comparing the true content of your c-style-strings.
You'll also have to supply std::sort with the necessary comparator, preferably std::strcmp (wrapped in a functor).
It's currently suffering from the same problem as your use of std::includes: it's comparing addresses instead of the contents of your c-style-strings.
This whole "problem" could have been avoided by using std::strings and std::vectors.
Example snippet
#include <iostream>
#include <algorithm>
#include <cstring>
typedef char gchar;
gchar const * a1[5] = {
"hello", "world", "stack", "overflow", "internet"
};
gchar const * a2[] = {
"world", "internet", "hello"
};
...
int
main (int argc, char *argv[])
{
auto Sorter = [](gchar const* lhs, gchar const* rhs) {
return std::strcmp (lhs, rhs) < 0 ? true : false;
};
std::sort (a1, a1 + 5, Sorter);
std::sort (a2, a2 + 3, Sorter);
if (std::includes (a1, a1 + 5, a2, a2 + 3, Sorter)) {
std::cerr << "all elements in a2 was found in a1!\n";
} else {
std::cerr << "all elements in a2 wasn't found in a1!\n";
}
}
output
all elements in a2 was found in a1!

A naive transcription of the python version would be:
bool foo(std::vector<std::string> const &a,std::vector<std::string> const &b) {
for(auto &s : a)
if(end(b) == std::find(begin(b),end(b),s))
return false;
return true;
}
It turns out that sorting the input is very slow. (And wrong in the face of duplicate elements.) Even the naive function is generally much faster. Just goes to show again that premature optimization is the root of all evil.
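To illustrate the duplicate-elements remark, here is a small sketch comparing the naive loop with a sort-plus-std::includes version. The names foo_naive and foo_sorted are made up for this comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Naive version: mirrors the Python loop.
bool foo_naive(const std::vector<std::string>& a,
               const std::vector<std::string>& b) {
    for (const std::string& s : a)
        if (std::find(b.begin(), b.end(), s) == b.end()) return false;
    return true;
}

// Sort-based version: std::includes needs sorted ranges, hence the copies.
bool foo_sorted(std::vector<std::string> a, std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}

void example() {
    std::vector<std::string> a = {"x", "x"};  // "x" appears twice
    std::vector<std::string> b = {"x", "y"};  // but only once here
    assert(foo_naive(a, b));    // matches the Python behaviour
    assert(!foo_sorted(a, b));  // the second "x" has no partner in b
}
```

std::includes treats its inputs as sorted multisets, so a repeated string in a needs just as many copies in b.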
Here's an unordered_set version that is usually somewhat faster than the naive version (or was for the values/usage patterns I tested):
bool foo(std::vector<std::string> const& a,std::unordered_set<std::string> const& b) {
for(auto &s:a)
if(b.count(s) < 1)
return false;
return true;
}
On the other hand, if the vectors are already sorted and b is relatively small ( less than around 200k for me ) then std::includes is very fast. So if you care about speed you just have to optimize for the data and usage pattern you're actually dealing with.
