choose the best variable after comparing them

choose the best variable after comparing them - c++

Surely the question was asked but I didn't know how to formulate my research to get relevant results.
In my problem, I have a point A and several other points B,C... on a plane.
I want to compare the distances between A-B, A-C, ... and return the closest point for example B.
Except that my way is not very optimized maybe you have better ideas.
here is a small piece of my code that summarizes what I said here is a small piece of my code that summarizes what I said and I replaced the points that are two dimensional vectors by simple int to simplify the explanation.
int A(5);
int B(7);
int C(-3);
int D(9);
int i(1);
int best_dist = A-B;
if((A-C) < best_dist)
{
best_dist = A-C;
i = 2;
}
if((A-D) < best_dist)
{
best_dist = A-D;
i = 3;
}
switch(i){
case 1:
return B;
break;
case 2:
return C;
break;
case 3:
return D;
break;
}

You could put the numbers (/coordinates) into a container (say std::array) and use std::min_element to find the member from which the "from" point (A, in your example) has the smallest absolute distance. E.g.:
#include <algorithm> // std::min_element
#include <array>
#include <cmath> // std::abs
#include <iostream>
int main() {
constexpr int from{5};
constexpr std::array<int, 3> candidates{7, -3, 9};
// consider using an != end() check on the resulting
// iterator if you do not know the container to be non-empty.
int const closest_point_on_plane = *std::min_element(
candidates.begin(), candidates.end(), [](int lhs, int rhs) {
return std::abs(from - lhs) < std::abs(from - rhs);
});
std::cout << closest_point_on_plane; // 7
}
For your actual use case of a plane, you will have to replace the
std::abs(from_point_on_line - to_point_on_line)
distance-on-a-line metric used in the compare lambda with an appropriate
distance(from_point_on_plane, to_point_on_plane)
distance-on-a-plane metric.

Related

How to get an element (struct) in an array by a value in the struct

Let's say I have this struct containing an integer.
struct Element
{
int number;
Element(int number)
{
this->number = number;
}
};
And I'm gonna create a vector containing many Element structs.
std::vector<Element> array;
Pretend that all the Element structs inside array have been initialized and have their number variable set.
My question is how can I instantly get an element based on the variable number?
It is very possible to do it with a for loop, but I'm currently focusing on optimization and trying to avoid as many for loops as possible.
I want it to be as instant as getting by index:
Element wanted_element = array[wanted_number]
There must be some kind of overloading stuff, but I don't really know what operators or stuff to overload.
Any help is appreciated :)

With comparator overloading implemented, std::find is available to help:
#include <iostream>
#include <vector>
#include <algorithm>
struct Element
{
int number;
Element(int number)
{
this->number = number;
}
bool operator == (Element el)
{
return number == el.number;
}
};
int main()
{
std::vector<Element> array;
std::vector<int> test;
for(int i=0;i<100;i++)
{
auto t = clock();
test.push_back(t);
array.push_back(Element(t));
}
auto valToFind = test[test.size()/2];
std::cout << "value to find: "<<valToFind<<std::endl;
Element toFind(valToFind);
auto it = std::find(array.begin(),array.end(),toFind);
if(it != array.end())
std::cout<<"found:" << it->number <<std::endl;
return 0;
}
The performance on above method depends on the position of the searched value in the array. Non-existing values & last element values will take the highest time while first element will be found quickest.
If you need to optimize searching-time, you can use another data-structure instead of vector. For example, std::map is simple to use here and fast on average (compared to latest elements of vector-version):
#include <iostream>
#include <vector>
#include <algorithm>
#include <map>
struct Element
{
int number;
Element(){ number = -1; }
Element(int number)
{
this->number = number;
}
};
int main()
{
std::map<int,Element> mp;
std::vector<int> test;
for(int i=0;i<100;i++)
{
auto t = clock();
test.push_back(t);
mp[t]=Element(t);
}
auto valToFind = test[test.size()/2];
std::cout << "value to find: "<<valToFind<<std::endl;
auto it = mp.find(valToFind);
if(it != mp.end())
std::cout<<"found:" << it->second.number <<std::endl;
return 0;
}
If you have to use vector, you can still use the map near the vector to keep track of its elements the same way above method just with extra memory space & extra deletions/updates on the map whenever vector is altered.
Anything you invent would with success would look like hashing or a tree in the end. std::unordered_map uses hashing while std::map uses red-black tree.
If range of values are very limited, like 0-to-1000 only, then simply saving its index in a second vector would be enough:
vec[number] = indexOfVector;
Element found = array[vec[number]];
If range is full and if you don't want to use any map nor unordered_map, you can still use a direct-mapped caching on the std::find method. On average, simple caching should decrease total time taken on duplicated searches (how often you search same item?).

Sort Integers by The Number of 1 Bits . I used one sort function to sort the vector ? But why sort is not working?

Sort Integers by The Number of 1 Bits
Leetcode : Problem Link
Example Testcase :
Example 1:
Input: arr = [0,1,2,3,4,5,6,7,8]
Output: [0,1,2,4,8,3,5,6,7]
Explantion: [0] is the only integer with 0 bits.
[1,2,4,8] all have 1 bit.
[3,5,6] have 2 bits.
[7] has 3 bits.
The sorted array by bits is [0,1,2,4,8,3,5,6,7]\
Example 2:
Input: arr = [1024,512,256,128,64,32,16,8,4,2,1]
Output: [1,2,4,8,16,32,64,128,256,512,1024]
Explantion: All integers have 1 bit in the binary representation, you should just sort them in ascending order.
My Solution :
class Solution {
public:
unsigned int setBit(unsigned int n){
unsigned int count = 0;
while(n){
count += n & 1;
n >>= 1;
}
return count;
}
vector<int> sortByBits(vector<int>& arr) {
map<int,vector<int>>mp;
for(auto it:arr){
mp[setBit(it)].push_back(it);
}
for(auto it:mp){
vector<int>vec;
vec=it.second;
sort(vec.begin(),vec.end()); //This Sort Function of vector is not working
}
vector<int>ans;
for(auto it:mp){
for(auto ele:it.second){
ans.push_back(ele);
}
}
return ans;
}
};
In my code why sort function is not working ?
[1024,512,256,128,64,32,16,8,4,2,1]
For the above testcase output is [1024,512,256,128,64,32,16,8,4,2,1] because of sort function is not working. It's correct output is [1,2,4,8,16,32,64,128,256,512,1024]
Note : In the above example testcase every elements of the testcase has only one set-bit(1)

As your iteration in //This sort function ...
refers to mp as the copy of the value inside the map, sort function will not sort the vector inside it, but the copy of it. Which does not affecting the original vector<int> inside the mp. Therefore, no effect occurs. You should refer the vector inside the map as a reference like this:
class Solution {
public:
unsigned int setBit(unsigned int n) {
unsigned int count = 0;
while (n) {
count += n & 1;
n >>= 1;
}
return count;
}
vector<int> sortByBits(vector<int>& arr) {
map<int, vector<int>>mp;
for (auto it : arr) {
mp[setBit(it)].push_back(it);
}
for (auto& it : mp) {
sort(it.second.begin(), it.second.end()); //Now the sort function works
}
vector<int>ans;
for (auto it : mp) {
for (auto ele : it.second) {
ans.push_back(ele);
}
}
return ans;
}
};
Although there is more design problem inside your solution, this will be a solution with minimized modification.

vector<int>vec is a copy of a copy of the one in the map which is then discarded. Try:
for(auto& entry:mp){
vector<int>&vec=entry.second;
sort(vec.begin(),vec.end());
}
Your other for loops should also use references for efficiency but it won't affect the behaviour.

I assume the OP is just learning, so fiddling with various data structures etc. can carry some educational value. Still, only one of the comments pointed out that the starting approach to the problem is wrong, and the whole point of the exercise is to find a custom method of comparing the numbers, by number of bits first, then - by value.
Provided std::sort is allowed (OP uses it), I guess the whole solution comes down to, conceptually, sth likes this (but I haven't verified it against LeetCode):
template <typename T>
struct Comp
{
std::size_t countBits(T number) const
{
size_t count;
while(number) {
count += number & 1;
number>>=1;
}
return count;
}
bool operator()(T lhs, T rhs) const
{
/*
auto lb{countBits(lhs)};
auto rb{countBits(rhs)};
return lb==rb ? lhs < rhs : lb < rb;
* The code above is the manual implementation of the line below
* that utilizes the standard library
*/
return std::tuple{countBits(lhs), lhs} < std::tuple{countBits(rhs), rhs};
}
};
class Solution {
public:
void sortByBits(vector<int>& arr) {
std::sort(begin(arr), end(arr), Comp<int>{});
}
};
Probably it can improved even further, but I'd take it as starting point for analysis.

Here is memory efficient and fast solution. I don't know why you are using map and extra vector. we can solve this questions without any extra memory efficiently. We just have to make a Comparator function which will sort elements according to our own requirements. Please let me know in comments if you require further help in code (or if you find difficult to understand my code). I am using __builtin_popcount() function which will return me number of set bits in a number.
bool sortBits(const int a, const int b){ //Comparator function to sort elements according to number of set bits
int numOfBits1 = __builtin_popcount(a);
int numOfBits2 = __builtin_popcount(b);
if(numOfBits1 == numOfBits2){ //if number of set bits are same, then sorting the elements according to magnitude of element (greater/smaller element)
return a < b;
}
return (numOfBits1 < numOfBits2); //if number of set bits are not same, then sorting the elements according to number of set bits in element
}
class Solution {
public:
vector<int> sortByBits(vector<int>& arr) {
sort(arr.begin(),arr.end(), sortBits);
return arr;
}
};

The problem is already evaluated and the fix is aready explained.
I want to give 2 additional/alternative solution proposals.
In C++17 we have the std::bitset count function. Please see here
And in C++20 we have directly the std::popcount function. Please see here.
(Elderly and grey haired people like me would also find 5 additional most efficient solutions in the Book "Hackers Delight")
Both variants lead to a one statement solution using std::sort with a lambda.
Please see:
#include <algorithm>
#include <vector>
#include <iostream>
#include <bitset>
// Solution
class Solution {
public:
std::vector<int> sortByBits(std::vector<int>& arr) {
std::sort(arr.begin(), arr.end(), [](const unsigned int i1, const unsigned int i2)
{ size_t c1{ std::bitset<14>(i1).count() }, c2{ std::bitset<14>(i2).count() }; return c1 == c2 ? i1 < i2 : c1 < c2; });
//{ int c1=std::popcount(i1), c2=std::popcount(i2); return c1 == c2 ? i1 < i2 : c1 < c2; });
return arr;
}
};
// Test
int main() {
std::vector<std::vector<int>> testData{
{0,1,2,3,4,5,6,7,8},
{1024,512,256,128,64,32,16,8,4,2,1}
};
Solution s;
for (std::vector<int>& test : testData) {
for (const int i : s.sortByBits(test)) std::cout << i << ' ';
std::cout << '\n';
}
}

Recommend C++ container to hold top 20 minimum values

In SQL there is the feature to say something like
SELECT TOP 20 distance FROM dbFile ORDER BY distance ASC
If my SQL is correct with, say 10,000 records, this should return the 20 smallest distances in my databse.
I don't have a database. I have a 100,000-element simple array.
Is there a C++ container, Boost, MFC or STL that provides simple code for a struct like
struct closest{
int ID;
double distance;
closest():ID(-1), distance(std::numeric_limits<double>::max( )){}
};
Where I can build a sorted by distance container like
boost::container::XXXX<closest> top(20);
And then have a simple
top.replace_if(closest(ID,Distance));
Where the container will replace the entry with the current highest distance in my container with my new entry if it is less than the current highest distance in my container.
I am not worried about speed. I like elegant clean solutions where containers and code do all the heavy lifting.
EDIT. Addendum after all the great answers received.
What I really would of liked to have found, due to its elegance. Is a sorted container that I could create with a container size limit. In my case 20. Then I could push or insert to my hearts content a 100 000 items or more. But. There is always a but. The container would of maintained the max size of 20 by replacing or not inserting an item if its comparator value was not within the lowest 20 values.
Yes. I know now from all these answers that via programming and tweaking existing containers the same effect can be achieved. Perhaps when the next round of suggestions for the C & C++ standards committee sits. We could suggest. Self sorting (which we kind of have already) and self size limiting containers.

What you need is to have a maxheap of size 20. Recall that the root of your heap will be the largest value in the heap.
This heap will contain the records with smallest distance that you have encountered so far. For the first 20 out of 10000 values you just push to the heap.
At this point you iterate through the rest of the records and for each record, you compare it to the root of your heap.
Remember that the root of your heap is basically the very worst of the very best.(The record with the largest distance, among the 20 records with the shortest distance you have encountered so far).
If the value you are considering is not worth keeping (its distance is larger that the root of your tree), ignore that record and just keep moving.
Otherwise you pop your heap (get rid of the root) and push the new value in. The priority queue will automatically put its record with the largest distance on the root again.
Once you keep doing this over the entire set of 10000 values, you will be left with the 20 records that have the smallest distance, which is what you want.
Each push-pop takes constant O(1) time, iterating through all inputs of N is O(n) so this is a Linear solution.
Edit:
I thought it would be useful to show my idea in C++ code. This is a toy example, you can write a generic version with templates but I chose to keep it simple and minimalistic:
#include <iostream>
#include <queue>
using namespace std;
class smallestElements
{
private:
priority_queue<int,std::vector<int>,std::less<int> > pq;
int maxSize;
public:
smallestElements(int size): maxSize(size)
{
pq=priority_queue<int, std::vector<int>, std::less<int> >();
}
void possiblyAdd(int newValue)
{
if(pq.size()<maxSize)
{
pq.push(newValue);
return;
}
if(newValue < pq.top())
{
pq.pop(); //get rid of the root
pq.push(newValue); //priority queue will automatically restructure
}
}
void printAllValues()
{
priority_queue<int,std::vector<int>,std::less<int> > cp=pq;
while(cp.size()!=0)
{
cout<<cp.top()<<" ";
cp.pop();
}
cout<<endl;
}
};
How you use this is really straight forward. basically in your main function somewhere you will have:
smallestElements se(20); //we want 20 smallest
//...get your stream of values from wherever you want, call the int x
se.possiblyAdd(x); //no need for bounds checking or anything fancy
//...keep looping or potentially adding until the end
se.printAllValues();//shows all the values in your container of smallest values
// alternatively you can write a function to return all values if you want

If this is about filtering the 20 smallest elements from a stream on the fly, then a solution based on std::priority_queue (or std::multiset) is the way to go.
However, if it is about finding the 20 smallest elements in a given arraym I wouldn't go for a special container at all, but simply the algorithm std::nth_element - a partial sorting algorithm that will give you the n smallest elements - EDIT: or std::partial_sort (thanks Jarod42) if the elements also have to be sorted. It has linear complexity and it's just a single line to write (+ the comparison operator, which you need in any case):
#include <vector>
#include <iostream>
#include <algorithm>
struct Entry {
int ID;
double distance;
};
std::vector<Entry> data;
int main() {
//fill data;
std::nth_element(data.begin(), data.begin() + 19, data.end(),
[](auto& l, auto& r) {return l.distance < r.distance; });
std::cout << "20 elements with smallest distance: \n";
for (size_t i = 0; i < 20; ++i) {
std::cout << data[i].ID << ":" << data[i].distance << "\n";
}
std::cout.flush();
}
If you don't want to change the order of your original array, you would have to make a copy of the whole array first though.

My first idea would be using a std::map or std::set with a custom comparator for this (edit: or even better, a std::priority_queue as mentioned in the comments).
Your comparator does your sorting.
You essentially add all your elements to it. After an element has been added, check whether there are more than n elements inside. If there are, remove the last one.

I am not 100% sure, that there is no more elegant solution, but even std::set is pretty pretty.
All you have to do is to define a proper comparator for your elements (e.g. a > operator) and then do the following:
std::set<closest> tops(arr, arr+20)
tops.insert(another);
tops.erase(tops.begin());

I would use nth_element like #juanchopanza suggested before he deleted it.
His code looked like:
bool comp(const closest& lhs, const closest& rhs)
{
return lhs.distance < rhs.distance;
}
then
std::vector<closest> v = ....;
nth_element(v.begin(), v.begin() + 20, v.end(), comp);
Though if it was only ever going to be twenty elements then I would use a std::array.

Just so you can all see what I am currently doing which seems to work.
struct closest{
int base_ID;
int ID;
double distance;
closest(int BaseID, int Point_ID,
double Point_distance):base_ID(BaseID),
ID(Point_ID),distance(Point_distance){}
closest():base_ID(-1), ID(-1),
distance(std::numeric_limits<double>::max( )){}
bool operator<(const closest& rhs) const
{
return distance < rhs.distance;
}
};
void calc_nearest(void)
{
boost::heap::priority_queue<closest> svec;
for (int current_gift = 0; current_gift < g_nVerticesPopulated; ++current_gift)
{ double best_distance=std::numeric_limits<double>::max();
double our_distance=0.0;
svec.clear();
for (int all_other_gifts = 0; all_other_gifts < g_nVerticesPopulated;++all_other_gifts)
{
our_distance = distanceVincenty(g_pVertices[current_gift].lat,g_pVertices[current_gift].lon,g_pVertices[all_other_gifts].lat,g_pVertices[all_other_gifts].lon);
if (our_distance != 0.0)
{
if (our_distance < best_distance) // don't bother to push and sort if the calculated distance is greater than current 20th value
svec.push(closest(g_pVertices[current_gift].ID,g_pVertices[current_gift].ID,our_distance));
if (all_other_gifts%100 == 0)
{
while (svec.size() > no_of_closest_points_to_calculate) svec.pop(); // throw away any points above no_of_closest_points_to_calculate
closest t = svec.top(); // the furthest of the no_of_closest_points_to_calculate points for optimisation
best_distance = t.distance;
}
}
}
std::cout << current_gift << "\n";
}
}
As you can see. I have 100 000 lat & long points draw on an openGl sphere.
I am calculating each point against every other point and only retaining currently the closest 20 points. There is some primitive optimisation going on by not pushing a value if it is bigger than the 20th closest point.
As I am used to Prolog taking hours to solve something I am not worried about speed. I shall run this overnight.
Thanks to all for your help.
It is much appreciated.
Still have to audit the code and results but happy that I am moving in the right direction.

I have posted a number of approaches to the similar problem of retrieving the top 5 minimum values recently here:
https://stackoverflow.com/a/33687969/1025391
There are implementations that keep a specific number of smallest or greatest items from an input vector in different ways. The nth_element algorithm performs a partial sort, the priority queue maintains a heap, the set a binary search tree, and the deque- and vector-based approaches just remove an element based on a (linear) min/max search.
It should be fairly easy to implement a custom comparison operator and to adapt the number of items to keep n.
Here's the code (refactored based off the other post):
#include <algorithm>
#include <functional>
#include <queue>
#include <set>
#include <vector>
#include <random>
#include <iostream>
#include <chrono>
template <typename T, typename Compare = std::less<T>>
std::vector<T> filter_nth_element(std::vector<T> v, typename std::vector<T>::size_type n) {
auto target = v.begin()+n;
std::nth_element(v.begin(), target, v.end(), Compare());
std::vector<T> result(v.begin(), target);
return result;
}
template <typename T, typename Compare = std::less<T>>
std::vector<T> filter_pqueue(std::vector<T> v, typename std::vector<T>::size_type n) {
std::vector<T> result;
std::priority_queue<T, std::vector<T>, Compare> q;
for (auto i: v) {
q.push(i);
if (q.size() > n) {
q.pop();
}
}
while (!q.empty()) {
result.push_back(q.top());
q.pop();
}
return result;
}
template <typename T, typename Compare = std::less<T>>
std::vector<T> filter_set(std::vector<T> v, typename std::vector<T>::size_type n) {
std::set<T, Compare> s;
for (auto i: v) {
s.insert(i);
if (s.size() > n) {
s.erase(std::prev(s.end()));
}
}
return std::vector<T>(s.begin(), s.end());
}
template <typename T, typename Compare = std::less<T>>
std::vector<T> filter_deque(std::vector<T> v, typename std::vector<T>::size_type n) {
std::deque<T> q;
for (auto i: v) {
q.push_back(i);
if (q.size() > n) {
q.erase(std::max_element(q.begin(), q.end(), Compare()));
}
}
return std::vector<T>(q.begin(), q.end());
}
template <typename T, typename Compare = std::less<T>>
std::vector<T> filter_vector(std::vector<T> v, typename std::vector<T>::size_type n) {
std::vector<T> q;
for (auto i: v) {
q.push_back(i);
if (q.size() > n) {
q.erase(std::max_element(q.begin(), q.end(), Compare()));
}
}
return q;
}
template <typename Clock = std::chrono::high_resolution_clock>
struct stopclock {
std::chrono::time_point<Clock> start;
stopclock() : start(Clock::now()) {}
~stopclock() {
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(Clock::now() - start);
std::cout << "elapsed: " << elapsed.count() << "ms\n";
}
};
std::vector<int> random_data(std::vector<int>::size_type n) {
std::mt19937 gen{std::random_device()()};
std::uniform_int_distribution<> dist;
std::vector<int> out(n);
for (auto &i: out)
i = dist(gen);
return out;
}
int main() {
std::vector<int> data = random_data(1000000);
stopclock<> sc;
std::vector<int> result = filter_nth_element(data, 5);
std::cout << "smallest values: ";
for (auto i : result) {
std::cout << i << " ";
}
std::cout << "\n";
std::cout << "largest values: ";
result = filter_nth_element<int, std::greater<int>>(data, 5);
for (auto i : result) {
std::cout << i << " ";
}
std::cout << "\n";
}
Example output is:
$ g++ test.cc -std=c++11 && ./a.out
smallest values: 4433 2793 2444 4542 5557
largest values: 2147474453 2147475243 2147477379 2147469788 2147468894
elapsed: 123ms
Note that in this case only the position of the nth element is accurate with respect to the order imposed by the provided comparison operator. The other elements are guaranteed to be smaller/greater or equal to that one however, depending on the comparison operator provided. That is, the top n min/max elements are returned, but they are not correctly sorted.
Don't expect the other algorithms to produce results in a specific order either. (While the approaches using priority queue and set actually produce sorted output, their results have the opposite order).
For reference:
http://en.cppreference.com/w/cpp/algorithm/nth_element
http://en.cppreference.com/w/cpp/container/priority_queue
http://en.cppreference.com/w/cpp/container/set
http://en.cppreference.com/w/cpp/algorithm/max_element

I actually have 100 000 Lat & Lon points drawn on a opengl sphere. I want to work out the 20 nearest points to each of the 100 000 points. So we have two loops to pick each point then calculate that point against every other point and save the closest 20 points.
This reads as if you want to perform a k-nearest neighbor search in the first place. For this, you usually use specialized data structures (e.g., a binary search tree) to speed up the queries (especially when you are doing 100k of them).
For spherical coordinates you'd have to do a conversion to a cartesian space to fix the coordinate wrap-around. Then you'd use an Octree or kD-Tree.
Here's an approach using the Fast Library for Approximate Nearest Neighbors (FLANN):
#include <vector>
#include <random>
#include <iostream>
#include <flann/flann.hpp>
#include <cmath>
struct Point3d {
float x, y, z;
void setLatLon(float lat_deg, float lon_deg) {
static const float r = 6371.; // sphere radius
float lat(lat_deg*M_PI/180.), lon(lon_deg*M_PI/180.);
x = r * std::cos(lat) * std::cos(lon);
y = r * std::cos(lat) * std::sin(lon);
z = r * std::sin(lat);
}
};
std::vector<Point3d> random_data(std::vector<Point3d>::size_type n) {
static std::mt19937 gen{std::random_device()()};
std::uniform_int_distribution<> dist(0, 36000);
std::vector<Point3d> out(n);
for (auto &i: out)
i.setLatLon(dist(gen)/100., dist(gen)/100.);
return out;
}
int main() {
// generate random spherical point cloud
std::vector<Point3d> data = random_data(1000);
// generate query point(s) on sphere
std::vector<Point3d> query = random_data(1);
// convert into library datastructures
auto mat_data = flann::Matrix<float>(&data[0].x, data.size(), 3);
auto mat_query = flann::Matrix<float>(&query[0].x, query.size(), 3);
// build KD-Tree-based index data structure
flann::Index<flann::L2<float> > index(mat_data, flann::KDTreeIndexParams(4));
index.buildIndex();
// perform query: approximate nearest neighbor search
int k = 5; // number of neighbors to find
std::vector<std::vector<int>> k_indices;
std::vector<std::vector<float>> k_dists;
index.knnSearch(mat_query, k_indices, k_dists, k, flann::SearchParams(128));
// k_indices now contains for each query point the indices to the neighbors in the original point cloud
// k_dists now contains for each query point the distances to those neighbors
// removed printing of results for brevity
}
You'd receive results similar to this one (click to enlarge):
For reference:
https://en.wikipedia.org/wiki/Nearest_neighbor_search
https://en.wikipedia.org/wiki/Octree
https://en.wikipedia.org/wiki/Kd-tree
http://www.cs.ubc.ca/research/flann/

Heap is the data structure that you need. pre-C++11 stl only had functions which managed heap data in your own arrays. Someone mentioned that boost has a heap class, but you don't need to go so far as to use boost if your data is simple integers. stl's heap will do just fine. And, of course, the algorithm is to order the heap so that the highest value is the first one. So with each new value, you push it on the heap, and, once the heap reaches 21 elements in size, you pop the first value from the heap. This way whatever 20 values remain are always the 20 lowest.

implicit transformation while calling std::adjacent_difference()

I wanted to get a vector of distances between adjacent points in a vector:
struct Point { double x, y, z; }
vector<double> adjacent_distances( vector<Point> points ) {
...
}
I thought that stl::adjacent_difference() would do the trick for me if I simply provided a function that finds the distance between 2 points:
double point_distance( Point a, Point b ) {
return magnitude(a-b); // implementation details are unimportant
}
Thus, I was hoping that this would work,
vector<double> adjacent_distances( vector<Point> points )
{
vector<double> distances;
std::adjacent_difference( points.begin(), points.end(),
std::back_inserter(distances),
ptr_fun( point_distance ) );
return distances;
}
only to find that input and output vectors had to be of (practically) the same type because adjacent_difference() calls
output[0] = input[0]; // forces input and output to be of same value_type
output[1] = op( input[1], input[0] );
output[2] = op( input[2], input[1] );
....
which, sadly, is inconsistent with respect to how std::adjacent_find() works.
So, I had to convert my code to
double magnitude( Point pt );
Point difference( Point a, Point b ); // implements b-a
vector<double> adjacent_distances( vector<Point> points )
{
vector<Point> differences;
std::adjacent_difference( points.begin(), points.end(),
std::back_inserter(differences),
ptr_fun( point_difference ) );
vector<double> distances;
std::transform( differences.begin(), differences.end(),
std::back_inserter(distances),
ptr_fun( magnitude ) );
return distances;
}
NB: the first element of differences had to be removed for the function to behave correctly, but I skipped the implementation details, for brevity.
Question: is there a way I could achieve some transformation implicitly, so that I don't have to create the extra vector, and achieve a call to adjacent_difference() with input_iterator and output_iterator of different value_types ?

Probably this isn't so neat though, in this specific case, std::transform
with 2 input sequences might meet the purpose.
For example:
vector<double> adjacent_distances( vector<Point> points ) {
if ( points.empty() ) return vector<double>();
vector<double> distances(
1, point_distance( *points.begin(), *points.begin() ) );
std::transform( points.begin(), points.end() - 1,
points.begin() + 1,
std::back_inserter(distances),
ptr_fun( point_distance ) );
return distances;
}
Hope this helps

Indeed that adjacent_difference algorithm is logically broken (why should be the difference of the same time of the elements? Why is the first output element equal to the first one instead of getting an output sequence one item shorter than the input one (way more logical)?
Anyway I don't understand why you are punishing yourself by using a functional approach with C++ where clearly the code is going to be harder to write, harder to read, slower to compile and not faster to execute. Oh.. and let's not talk about the kind of joke error message you are going to face if there is any error in what you type.
What is the bad part of
std::vector<double> distances;
for (int i=1,n=points.size(); i<n; i++)
distances.push_back(magnitude(points[i] - points[i-1]));
?
This is shorter, more readable, faster to compile and may be even faster to execute.
EDIT
I wanted to check my subjective "shorter, more readable, faster to compile and may be faster to execute". Here the results:
~/x$ time for i in {1..10}
> do
> g++ -Wall -O2 -o algtest algtest.cpp
> done
real 0m2.001s
user 0m1.680s
sys 0m0.150s
~/x$ time ./algtest
real 0m1.121s
user 0m1.100s
sys 0m0.010s
~/x$ time for i in {1..10}
> do
> g++ -Wall -O2 -o algtest2 algtest2.cpp
> done
real 0m1.651s
user 0m1.230s
sys 0m0.190s
~/x$ time ./algtest2
real 0m0.941s
user 0m0.930s
sys 0m0.000s
~/x$ ls -latr algtest*.cpp
-rw-r--r-- 1 agriffini agriffini 932 2011-11-25 21:44 algtest2.cpp
-rw-r--r-- 1 agriffini agriffini 1231 2011-11-25 21:45 algtest.cpp
~/x$
The following is the accepted solution (I fixed what is clearly a brainfart of passing the vector of points by value).
// ---------------- algtest.cpp -------------
#include <stdio.h>
#include <math.h>
#include <functional>
#include <algorithm>
#include <vector>
using std::vector;
using std::ptr_fun;
struct Point
{
double x, y;
Point(double x, double y) : x(x), y(y)
{
}
Point operator-(const Point& other) const
{
return Point(x - other.x, y - other.y);
}
};
double magnitude(const Point& a)
{
return sqrt(a.x*a.x + a.y*a.y);
}
double point_distance(const Point& a, const Point& b)
{
return magnitude(b - a);
}
vector<double> adjacent_distances( const vector<Point>& points ) {
if ( points.empty() ) return vector<double>();
vector<double> distances(
1, point_distance( *points.begin(), *points.begin() ) );
std::transform( points.begin(), points.end() - 1,
points.begin() + 1,
std::back_inserter(distances),
ptr_fun( point_distance ) );
return distances;
}
int main()
{
std::vector<Point> points;
for (int i=0; i<1000; i++)
points.push_back(Point(100*cos(i*2*3.141592654/1000),
100*sin(i*2*3.141592654/1000)));
for (int i=0; i<100000; i++)
{
adjacent_distances(points);
}
return 0;
}
Here is instead the explicit loop solution; it requires two include less, one function definition less and the function body is also shorter.
// ----------------------- algtest2.cpp -----------------------
#include <stdio.h>
#include <math.h>
#include <vector>
struct Point
{
double x, y;
Point(double x, double y) : x(x), y(y)
{
}
Point operator-(const Point& other) const
{
return Point(x - other.x, y - other.y);
}
};
double magnitude(const Point& a)
{
return sqrt(a.x*a.x + a.y*a.y);
}
std::vector<double> adjacent_distances(const std::vector<Point>& points)
{
std::vector<double> distances;
if (points.size()) distances.reserve(points.size()-1);
for (int i=1,n=points.size(); i<n; i++)
distances.push_back(magnitude(points[i] - points[i-1]));
return distances;
}
int main()
{
std::vector<Point> points;
for (int i=0; i<1000; i++)
points.push_back(Point(100*cos(i*2*3.141592654/1000),
100*sin(i*2*3.141592654/1000)));
for (int i=0; i<100000; i++)
{
adjacent_distances(points);
}
return 0;
}
Summary:
code size is shorter (algtest2.cpp is less than 76% of algtest.cpp)
compile time is better (algtest2.cpp requires less than 83% of algtest.cpp)
execution time is better (algtest2.cpp runs in less than 85% of algtest.cpp)
So apparently on my system (not hand-picked) I was right on all points except execution speed (the one with "maybe") where to get from slightly slower to substantially faster I had to call reserve on the result array. Even with this optimization the code is of course shorter.
I also think that the fact that this version is more readable is also objective and not an opinion... but I'd be happy to be proven wrong by meeting someone that can understand what the functional thing is doing and that cannot understand what the explicit one is doing instead.

Yes, this can be done, but not easily. I don't think it's worth the effort, unless you really need to avoid the copy.
If you really want to do this, you can try creating your own iterator that iterates over the vector<Point> and a wrapper around Point.
The iterator class will dereference to an instance of the wrapper class. The wrapper class should support operator - or your distance function, and it should store the distance. You should then implement an operator for implicit conversion to double, which will be invoked when adjacent_difference attempts to assign the wrapper to the vector<double>.
I don't have time to go into detail, so if anything is unclear, I'll check back later or someone else can try to explain better. Below is an example of a wrapper that does this.
struct Foo {
Foo(double value) { d = value; }
operator double() { return d; }
double d;
};
Foo sub(const Foo& a, const Foo& b) {
return Foo(a.d - b.d);
}
vector<Foo> values = {1, 2, 3, 5, 8};
vector<double> dist;
adjacent_difference(values.begin(), values.end(), back_inserter(dist), sub);
// dist = {1, 1, 1, 2, 3}

This is maybe a bit dirty, but you could simply add
struct Point {
double x,y,z;
operator double() { return 0.0; }
};
or perhaps
struct Point {
double x,y,z;
operator double() { return sqrt(x*x + y*y + z*z); } // or whatever metric you are using
};
The effect being to set the first distance to 0, or the distance of the first point from the origin. However, I could imagine that you wouldn't want to pollute your Point struct with a rather arbitrary definition for conversion to double - in which case dauphic's wrapper is a cleaner solution.

Since you have no use for the first element returned by adjacent_difference, which is precisely the one giving trouble, you can write your own version of the algorithm, skipping that initial assignment:
template <class InputIterator, class OutputIterator, class BinaryOperation>
OutputIterator my_adjacent_difference(InputIterator first, InputIterator last,
OutputIterator result,
BinaryOperation binary_op)
{
if (first != last)
{
InputIterator prev = first++; // To start
while (first != last)
{
InputIterator val = first++;
*result++ = binary_op(*val, *prev);
prev = val;
}
}
return result;
}
This should work, though you will be missing some STL optimisations.

I like the a) formulation of the problem, b) comparison of the execution times, c) my_adjacent_difference, d) self-comment that my_adjacent_difference may lack built-in optimizations. I agree that the Standard C++ adjacent_difference logic limits the algorithm's application and that the three lines loop-code is a solution, which many would go with. I reuse the idea to apply the algorithm transform and present the version in C++ 11 illustrating lambdas. Regards.
#include <iostream> /* Standard C++ cout, cerr */
#include <vector> /* Standard C++ vector */
#include <algorithm> /* Standard C++ transform */
#include <iterator> /* Standard C++ back_inserter */
#include <cmath> /* Standard C++ sqrt */
#include <stdexcept> /* Standard C++ exception */
using namespace std; /* Standard C++ namespace */
struct Point {double x, y, z;}; // I would define this differently.
int main(int, char*[])
{
try {
const Point points[] = {{0, 0, 0}, {1, 0, 0}, {1, 0, 3}};
vector<double> distances;
transform(points + 1, points + sizeof(points) / sizeof(Point),
points, back_inserter(distances),
[](const Point& p1, const Point& p2)
{
double dx = p2.x - p1.x;
double dy = p2.y - p1.y;
double dz = p2.z - p1.z;
return sqrt(dx * dx + dy * dy + dz * dz);
});
copy(distances.begin(), distances.end(),
ostream_iterator<double>(cout, "\n"));
}
catch(const exception& e) {
cerr << e.what() << endl;
return -1;
}
catch(...) {
cerr << "Unknown exception" << endl;
return -2;
}
return 0;
}
The output:
1
3

Sorting a set<string> on the basis of length

My question is related to this.
I wanted to perform a sort() operation over the set with the help of a lambda expression as a predicate.
My code is
#include <set>
#include <string>
#include <iostream>
#include <algorithm>
int main() {
using namespace std;
string s = "abc";
set<string> results;
do {
for (int n = 1; n <= s.size(); ++n) {
results.insert(s.substr(0, n));
}
} while (next_permutation(s.begin(), s.end()));
sort (results.begin(),results.end());[](string a, string b)->bool{
size_t alength = a.length();
size_t blength = b.length();
return (alength < blength);
});
for (set<string>::const_iterator x = results.begin(); x != results.end(); ++x) {
cout << *x << '\n';
}
return 0;
}
But the numbers and types of errors were so complex that I couldn't understand how to fix them. Can someone tell me whats wrong with this code.

Edit: Note that Steve Townsend's solution is actually the one you're searching for, as he inlines as a C++0x Lambda what I write as C++03 code below.
Another solution would be to customize the std::set ordering function:
The std::set is already ordered...
The std::set has its own ordering, and you are not supposed to change it once it is constructed. So, the following code:
int main(int argc, char* argv[])
{
std::set<std::string> aSet ;
aSet.insert("aaaaa") ;
aSet.insert("bbbbb") ;
aSet.insert("ccccccc") ;
aSet.insert("ddddddd") ;
aSet.insert("e") ;
aSet.insert("f") ;
outputSet(aSet) ;
return 0 ;
}
will output the following result:
- aaaaa
- bbbbb
- ccccccc
- ddddddd
- e
- f
... But you can customize its ordering function
Now, if you want, you can customize your set by using your own comparison function:
struct MyStringLengthCompare
{
bool operator () (const std::string & p_lhs, const std::string & p_rhs)
{
const size_t lhsLength = p_lhs.length() ;
const size_t rhsLength = p_rhs.length() ;
if(lhsLength == rhsLength)
{
return (p_lhs < p_rhs) ; // when two strings have the same
// length, defaults to the normal
// string comparison
}
return (lhsLength < rhsLength) ; // compares with the length
}
} ;
In this comparison functor, I did handle the case "same length but different content means different strings", because I believe (perhaps wrongly) that the behaviour in the original program is an error. To have the behaviour coded in the original program, please remove the if block from the code.
And now, you construct the set:
int main(int argc, char* argv[])
{
std::set<std::string, MyStringLengthCompare> aSet ;
aSet.insert("aaaaa") ;
aSet.insert("bbbbb") ;
aSet.insert("ccccccc") ;
aSet.insert("ddddddd") ;
aSet.insert("e") ;
aSet.insert("f") ;
outputSet(aSet) ;
return 0 ;
}
The set will now use the functor MyStringLengthCompare to order its items, and thus, this code will output:
- e
- f
- aaaaa
- bbbbb
- ccccccc
- ddddddd
But beware of the ordering mistake!
When you create your own ordering function, it must follow the following rule:
return true if (lhs < rhs) is true, return false otherwise
If for some reason your ordering function does not respect it, you'll have a broken set on your hands.

std::sort rearranges the elements of the sequence you give it. The arrangement of the sequence in the set is fixed, so the only iterator you can have is a const iterator.
You'll need to copy results into a vector or deque (or such) first.
vector sortable_results( results.begin(), results.end() );

You can customize the ordering of the elements in the set by providing a custom predicate to determine ordering of added elements relative to extant members. set is defined as
template <
class Key,
class Traits=less<Key>,
class Allocator=allocator<Key>
>
class set
where Traits is
The type that provides a function
object that can compare two element
values as sort keys to determine their
relative order in the set. This
argument is optional, and the binary
predicate less is the default
value.
There is background on how to use lambda expression as a template parameter here.
In your case this translates to:
auto comp = [](const string& a, const string& b) -> bool
{ return a.length() < b.length(); };
auto results = std::set <string, decltype(comp)> (comp);
Note that this will result in set elements with the same string length being treated as duplicates which is not what you want, as far as I can understand the desired outcome.

sort requires random access iterators which set doesn't provide (It is a bidirectional iterator). If you change the code to use vector it compiles fine.

You cannot sort a set. It's always ordered on keys (which are elements themselves).
To be more specific, std::sort requires random access iterators. The iterators provided by std::set are not random.

Since I wrote the original code you're using, perhaps I can expand on it... :)
struct cmp_by_length {
template<class T>
bool operator()(T const &a, T const &b) {
return a.length() < b.length() or (a.length() == b.length() and a < b);
}
};
This compares by length first, then by value. Modify the set definition:
set<string, cmp_by_length> results;
And you're good to go:
int main() {
using namespace std;
string s = "abc";
typedef set<string, cmp_by_length> Results; // convenience for below
Results results;
do {
for (int n = 1; n <= s.size(); ++n) {
results.insert(s.substr(0, n));
}
} while (next_permutation(s.begin(), s.end()));
// would need to add cmp_by_length below, if I hadn't changed to the typedef
// i.e. set<string, cmp_by_length>::const_iterator
// but, once you start using nested types on a template, a typedef is smart
for (Results::const_iterator x = results.begin(); x != results.end(); ++x) {
cout << *x << '\n';
}
// of course, I'd rather write... ;)
//for (auto const &x : results) {
// cout << x << '\n';
//}
return 0;
}

std::set is most useful to maintain a sorted and mutating list. It faster and smaller to use a vector when the set itself wont change much once it's been built.
#include <vector>
#include <string>
#include <iostream>
#include <algorithm>
int main() {
using namespace std;
string s = "abc";
vector<string> results;
do {
for (size_t n = 1; n <= s.size(); ++n) {
results.push_back(s.substr(0, n));
}
} while (next_permutation(s.begin(), s.end()));
//make it unique
sort( results.begin(), results.end() );
auto end_sorted = unique( results.begin(), results.end() );
results.erase( end_sorted, results.end() );
//sort by length
sort (results.begin(),results.end());
[](string lhs, string rhs)->bool
{ return lhs.length() < rhs.length(); } );
for ( const auto& result: results ) {
cout << result << '\n';
}
}
I used the classic, sort/unique/erase combo to make the results set unique.I also cleaned up your code to be a little bit more c++0x-y.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

choose the best variable after comparing them - c++

Related

How to get an element (struct) in an array by a value in the struct

Sort Integers by The Number of 1 Bits . I used one sort function to sort the vector ? But why sort is not working?

Recommend C++ container to hold top 20 minimum values

implicit transformation while calling std::adjacent_difference()

Sorting a set<string> on the basis of length

Categories

Resources