As I wrote in the other post, I am trying to implement the K-means algorithm in C++. Debugging reports no errors, but when I run the program I get the error mentioned in the title:
terminate called after throwing an instance of 'std::invalid_argument'
what(): stof
I know this error occurs when a string can't be converted to float. So what I wanted to ask is: is there something I should change in my code, or would it be better to use another file format as input instead of a *.csv file? (Obviously, my hypothesis is that the program can't read something from the file. I don't know if that's the right reasoning, though.)
Thank you everyone!
#include <cstdlib> // rand, srand
#include <ctime> // time
#include <fstream>
#include <iostream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
// Initialize the point
struct Point {
    double x, y; // coordinates of the point
    int cluster; // cluster the point belongs to (-1 = none yet)
    double minDist; // distance to the nearest centroid so far
    Point()
        : x(0.0)
        , y(0.0)
        , cluster(-1)
        , minDist(numeric_limits<double>::max())
    {
    }
    Point(double x, double y)
        : x(x)
        , y(y)
        , cluster(-1)
        , minDist(numeric_limits<double>::max())
    {
    }
    // Squared euclidean distance to p (no sqrt needed for comparisons)
    double distance(Point p)
    {
        return (p.x - x) * (p.x - x) + (p.y - y) * (p.y - y);
    }
};
vector<Point> readcsv()
{
    vector<Point> points;
    string line;
    ifstream file("Mall_Customers.csv");
    while (getline(file, line)) {
        stringstream lineStream(line);
        string bit;
        double x, y;
        getline(lineStream, bit, ',');
        x = stof(bit); // stof throws std::invalid_argument when the field is not numeric (e.g. a header row)
        getline(lineStream, bit, '\n');
        y = stof(bit);
        points.push_back(Point(x, y));
    }
    return points;
}
void kMeansClustering(vector<Point>* points, int epochs, int k)
{
    int n = points->size();

    // Pick k random points as the initial centroids
    vector<Point> centroids;
    srand(time(0));
    for (int i = 0; i < k; ++i) {
        centroids.push_back(points->at(rand() % n));
    }

    // Repeat the assignment and update steps for each epoch
    for (int epoch = 0; epoch < epochs; ++epoch) {
        // Assign each point to the nearest centroid
        for (vector<Point>::iterator c = begin(centroids); c != end(centroids); ++c) {
            int clusterId = c - begin(centroids);
            for (vector<Point>::iterator it = points->begin(); it != points->end(); ++it) {
                Point p = *it;
                double dist = c->distance(p);
                if (dist < p.minDist) {
                    p.minDist = dist;
                    p.cluster = clusterId;
                }
                *it = p;
            }
        }

        // Accumulate per-cluster counts and coordinate sums
        vector<int> nPoints;
        vector<double> sumX, sumY;
        for (int j = 0; j < k; j++) {
            nPoints.push_back(0);
            sumX.push_back(0.0);
            sumY.push_back(0.0);
        }
        for (vector<Point>::iterator it = points->begin(); it != points->end(); ++it) {
            int clusterId = it->cluster;
            nPoints[clusterId] += 1;
            sumX[clusterId] += it->x;
            sumY[clusterId] += it->y;
            it->minDist = numeric_limits<double>::max(); // reset distance
        }

        // Compute the new centroids
        for (vector<Point>::iterator c = begin(centroids); c != end(centroids); ++c) {
            int clusterId = c - begin(centroids);
            c->x = sumX[clusterId] / nPoints[clusterId];
            c->y = sumY[clusterId] / nPoints[clusterId];
        }
    }

    // Write to csv
    ofstream myfile;
    myfile.open("output.csv");
    myfile << "x,y,c" << endl;
    for (vector<Point>::iterator it = points->begin(); it != points->end(); ++it) {
        myfile << it->x << "," << it->y << "," << it->cluster << endl;
    }
    myfile.close();
}
int main()
{
    vector<Point> points = readcsv();
    // Run k-means with 100 iterations and for 5 clusters
    kMeansClustering(&points, 100, 5);
}
It seems to work now; thank you everyone for the help, in particular @Yksisarvinen and @molbdnilo: the problem was that I blindly imported the whole csv file into the program, when I was supposed to import only two columns: one with the income, the other with the spending score.
Although now I have to figure out why it shows me the final result not in the form of "bubbles", as one would expect from the algorithm, but in the form of points clustered on a straight line.
When trying to open the original file, the original dataset, I get the error shown in the image below.
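For reference, a minimal sketch of a readcsv that skips the header row and extracts only the two relevant columns might look like the following. This assumes the income and spending score are the 4th and 5th comma-separated fields of Mall_Customers.csv; adjust the indices if your layout differs.
vector<Point> readcsv()
{
    vector<Point> points;
    ifstream file("Mall_Customers.csv");
    string line;
    getline(file, line); // skip the header row instead of feeding it to stof/stod
    while (getline(file, line)) {
        stringstream lineStream(line);
        string cell;
        vector<string> cells;
        while (getline(lineStream, cell, ','))
            cells.push_back(cell);
        if (cells.size() < 5)
            continue; // skip malformed rows
        // columns 3 and 4 (0-based) are assumed to hold income and spending score
        points.push_back(Point(stod(cells[3]), stod(cells[4])));
    }
    return points;
}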
How do I merge certain elements in a vector or similar data structure by only iterating over the complete list once? Is there a more efficient way than what I have got?
I have a vector of vectors of points: std::vector<std::vector<cv::Point>> contours
I need to compare them two at a time and then decide whether to merge them or continue comparing.
ContourMoments has some helper functions, for example to calculate the distance between points. The merge() function simply takes all the points from one ContourMoments object and adds them to the calling ContourMoments object.
#include <iostream>
#include <cmath>
#include <ctime>
#include <cstdlib>
#include <string>
#include <vector>
#include <opencv/cv.h>
#include <opencv2/imgproc/imgproc.hpp>

class ContourMoments
{
private:
    void init()
    {
        moments = cv::moments(contour);
        center = cv::Point2f(moments.m10 / moments.m00, moments.m01 / moments.m00);
        float totalX = 0.0, totalY = 0.0;
        for (auto const &p : contour)
        {
            totalX += p.x;
            totalY += p.y;
        }
        gravitationalCenter = cv::Point2f(totalX / contour.size(), totalY / contour.size());
    }
public:
    cv::Moments moments;
    std::vector<cv::Point2f> contour;
    cv::Point2f center;
    cv::Point2f gravitationalCenter;

    ContourMoments(const std::vector<cv::Point2f> &c)
    {
        contour = c;
        init();
    }
    ContourMoments(const ContourMoments &cm)
    {
        contour = cm.contour;
        init();
    }
    void merge(const ContourMoments &cm)
    {
        contour.insert(contour.end(), std::make_move_iterator(cm.contour.begin()), std::make_move_iterator(cm.contour.end()));
        init();
    }
    float horizontalDistanceTo(const ContourMoments &cm)
    {
        return std::abs(center.x - cm.center.x);
    }
    float verticalDistanceTo(const ContourMoments &cm)
    {
        return std::abs(center.y - cm.center.y);
    }
    float centerDistanceTo(const ContourMoments &cm)
    {
        return std::sqrt(std::pow(center.x - cm.center.x, 2) + std::pow(center.y - cm.center.y, 2));
    }
    ContourMoments() = default;
    ~ContourMoments() = default;
};
float RandomFloat(float a, float b) {
    float random = ((float) rand()) / (float) RAND_MAX;
    float diff = b - a;
    float r = random * diff;
    return a + r;
}

std::vector<std::vector<cv::Point2f>> createData()
{
    std::vector<std::vector<cv::Point2f>> cs;
    for (int i = 0; i < 2000; ++i) {
        std::vector<cv::Point2f> c;
        int j_stop = rand() % 11;
        for (int j = 0; j < j_stop; ++j) {
            c.push_back(cv::Point2f(RandomFloat(0.0, 20.0), RandomFloat(0.0, 20.0)));
        }
        cs.push_back(c);
    }
    return cs;
}

void printVectorOfVectorsOfPoints(const std::vector<std::vector<cv::Point2f>> &cs) {
    std::cout << "####################################################" << std::endl;
    for (const auto &el : cs) {
        bool first = true;
        for (const auto &pt : el) {
            if (!first) {
                std::cout << ", ";
            }
            first = false;
            std::cout << "{x: " + std::to_string(pt.x) + ", y: " + std::to_string(pt.y) + "}";
        }
        std::cout << std::endl;
    }
    std::cout << "####################################################" << std::endl;
}
void merge(std::vector<std::vector<cv::Point2f>> &contours, int &counterMerged) {
    for (auto it = contours.begin(); it < contours.end(); /*++it*/)
    {
        int counter = 0;
        if (it->size() < 5)
        {
            it = contours.erase(it);
            continue;
        }
        for (auto it2 = it + 1; it2 < contours.end(); /*++it2*/)
        {
            if (it2->size() < 5)
            {
                it2 = contours.erase(it2);
                continue;
            }
            ContourMoments cm1(*it);
            ContourMoments cm2(*it2);
            if (cm1.centerDistanceTo(cm2) > 4.0)
            {
                ++counter;
                ++it2;
                continue;
            }
            counterMerged++;
            cm1.merge(std::move(cm2));
            it2 = contours.erase(it2);
        }
        if (counter > 0)
        {
            std::advance(it, counter);
        }
        else
        {
            ++it;
        }
    }
}
int main(int argc, const char * argv[])
{
    srand(time(NULL));
    std::vector<std::vector<cv::Point2f>> contours = createData();
    printVectorOfVectorsOfPoints(contours);
    int counterMerged = 0;
    merge(contours, counterMerged);
    printVectorOfVectorsOfPoints(contours);
    std::cout << "Merged " << std::to_string(counterMerged) << " vectors." << std::endl;
    return 0;
}
Thanks in advance!
Best wishes
Edit
Posted the complete example; install OpenCV first.
This generates 2000 vectors of up to 10 points each and merges them if they are close together.
For each of your contours, you can pre-compute the centers. Then, what you want to do is to cluster the contours. Each pair of contours whose center is at most a distance of d apart should belong to the same cluster.
This can be done with a simple radius query. Like this:
for each contour ci in all contours
for each contour cj with cluster center at most d away from ci
merge cluster ci and cj
For the radius query, I would suggest something like a k-d tree. Enter all your contours into the tree and then you can query for the neighbors in O(log n) instead of O(n) like you are doing now.
For the merging part, I would suggest a union-find data structure. This lets you do the merge in practically constant time. At the end, you can just gather all your contour data into a big cluster contour.
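For illustration, here is a minimal union-find sketch in C++ (a generic structure, not tied to the ContourMoments code above); each contour index starts as its own set, and every pair reported by the radius query gets united:
#include <numeric>
#include <vector>

struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0); // each element is its own root
    }
    int find(int x) {
        while (parent[x] != x)
            x = parent[x] = parent[parent[x]]; // path halving
        return x;
    }
    void unite(int a, int b) {
        parent[find(a)] = find(b); // union by rank/size omitted for brevity
    }
};
Usage sketch: for every pair (i, j) returned by the radius query, call uf.unite(i, j); afterwards, contours with the same uf.find(...) root belong to one cluster and can be concatenated in a single pass.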
I didn't scrutinize your code. If what you want to do is to scan the vector, form groups of contiguous points and emit a single point per group, in sequence, e.g.
a b c|d e|f g h|i j k|l => a b c d f i l
the simplest is to perform the operation in-place.
Keep an index just past the last emitted point, call it n (initially 0), and whenever you emit a new point, store it at array[n++], overwriting an element that has already been processed. At the end, resize the array.
One might think of a micro-optimization to compare n to the current index, and avoid overwriting an element with itself. Anyway, as soon as an element has been dropped, the test becomes counter-productive.
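A sketch of that write-index pattern, assuming a flat vector of points and a predicate close() that decides whether the next point belongs to the current group (both hypothetical):
#include <vector>

struct Pt { float x, y; };

// Collapse each run of contiguous "close" points to its first point, in place.
template <class Close>
void collapse_groups(std::vector<Pt>& pts, Close close)
{
    std::size_t n = 0; // index just past the last emitted point
    std::size_t i = 0;
    while (i < pts.size()) {
        std::size_t j = i + 1;
        while (j < pts.size() && close(pts[j - 1], pts[j]))
            ++j;               // extend the current group
        pts[n++] = pts[i];     // emit one point per group, overwriting processed slots
        i = j;                 // (self-assignment on the first group is harmless, as noted above)
    }
    pts.resize(n); // drop the tail
}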
I am trying to convert vector values that are read from a file into two different formats. After converting into the first format and printing the vector out, I want to use the original "read in" values to convert them into the second format.
At the moment, it seems that the second conversion is happening on the already converted values. However, I don't understand why it doesn't convert back to the original value then. And ultimately, how can I use the original values of the vector for the second conversion so that
else {
    GetType();
    GetXArg();
    GetYArg();
}
works for the second time?
Here is the code snippet:
void Force::convToP() //converts to polar
{
    if (forceType == 'c')
    {
        SetType('p');
        SetXArg(sqrt(xArg * xArg + yArg * yArg));
        SetYArg(atan(yArg / xArg));
    }
    else {
        GetType(); //just return type, xArg and yArg in their original form
        GetXArg();
        GetYArg();
    }
}

void Force::convToC() //converts to cartesian
{
    if (forceType == 'p') {
        SetType('c');
        SetXArg(xArg * cos(yArg));
        SetYArg(xArg * sin(yArg));
    }
    else {
        GetType();
        GetXArg();
        GetYArg();
    }
}
and the main function:
while (file >> type >> x >> y) {
    Force f(type, x, y);
    force.push_back(f);
}
for (int i = 0; i < force.size(); ++i) {
    force[i].printforce();
}
cout << "Forces in Polar form: " << endl;
for (int i = 0; i < force.size(); ++i)
{
    force[i].convToP();
    force[i].printforce();
}
cout << "Forces in Cartesian form: " << endl;
for (int i = 0; i < force.size(); ++i) {
    force[i].convToC();
    force[i].printforce();
}
finally, the output at the moment is:
p 10 0.5
c 12 14
p 25 1
p 100 0.8
c 50 50
p 20 3.14
c -100 25
p 12 1.14
Forces in Polar form: <-first conversion. All works fine
p 10 0.5
p 18.4391 0.649399
p 25 1
p 100 0.8
p 70.7107 0.61548
p 20 3.14
p 103.078 0.237941
p 12 1.14
Forces in Cartesian form:
c 8.77583 4.20736 <-works fine
c 14.6858 8.8806 <- why doesn't this convert back to c 12 14? / how can I use the original values of the vector?
c 13.5076 11.3662
c 69.6707 49.9787
c 57.735 33.3333
c -20 -0.0318509
c 100.173 23.6111
c 5.01113 4.55328
Press any key to continue . . .
Very new to this, been baffled for some time, so would really appreciate any help and advice.
You are modifying xArg and then using the modified value to convert yArg. You need to do both conversions before either modification.
void Force::convToP() //converts to polar
{
    if (forceType == 'c')
    {
        forceType = 'p';
        decltype(xArg) newX = sqrt(xArg * xArg + yArg * yArg);
        decltype(yArg) newY = atan(yArg / xArg);
        xArg = newX;
        yArg = newY;
    }
    // no else needed
}

void Force::convToC() //converts to cartesian
{
    if (forceType == 'p') {
        forceType = 'c';
        decltype(xArg) newX = xArg * cos(yArg);
        decltype(yArg) newY = xArg * sin(yArg);
        xArg = newX;
        yArg = newY;
    }
    // no else needed
}
You can verify the correct values with std::complex
#include <complex>
#include <vector>
#include <iostream>

int main() {
    std::vector<std::complex<double>> nums
    {
        std::polar<double>(10, 0.5),
        std::complex<double>(12, 14),
        std::polar<double>(25, 1),
        std::polar<double>(100, 0.8),
        std::complex<double>(50, 50),
        std::polar<double>(20, 3.14),
        std::complex<double>(-100, 25),
        std::polar<double>(12, 1.14)
    };
    for (auto num : nums)
    {
        std::cout << num << " (" << std::abs(num) << ", " << std::arg(num) << ")\n";
    }
}
(8.77583,4.79426) (10, 0.5)
(12,14) (18.4391, 0.86217)
(13.5076,21.0368) (25, 1)
(69.6707,71.7356) (100, 0.8)
(50,50) (70.7107, 0.785398)
(-20,0.0318531) (20, 3.14)
(-100,25) (103.078, 2.89661)
(5.01113,10.9036) (12, 1.14)
It's often helpful to break sub-operations down into small single-concern functions.
The compiler will optimise away all the redundant copies, loads and stores:
#include <cmath>
#include <tuple>

auto computed(double xArg, double yArg)
{
    return
        std::make_tuple(
            std::sqrt(xArg * xArg + yArg * yArg),
            std::atan(yArg / xArg));
}

void modify(double& xArg, double& yArg)
{
    std::tie(xArg, yArg) = computed(xArg, yArg);
}
Without knowing exactly what GetXArg() etc. do, it appears that you are modifying your original forces when you call force[i].convToP();. So after this loop all the forces in your array are converted to polar. If all you want to do is print the polar and cartesian representations without changing the originals, you should generate copies of the forces:
cout << "Forces in Polar form: " << endl;
for (int i = 0; i < force.size(); ++i)
{
    Force tmpForce = force[i];
    tmpForce.convToP();
    tmpForce.printforce();
}
etc.
Edit: Looks like @Aconcagua beat me to it.
Your conversion function obviously changes the object it operates on. You also need to be aware that the index operator ([]) returns a reference to the object in the vector. If you do not want to modify the original object, you'll have to make a copy of it first:
for (int i = 0; i < force.size(); ++i)
{
    auto copy = force[i];
    copy.convToP();
    copy.printforce();
}
Now, you can operate on the unchanged values in the second run. Assuming you don't need the original values afterwards any more, you can just leave the second loop as is, otherwise, make copies again...
All a little easier with range based for loop:
for (auto c : force)
{
    c.convToP();
    c.printforce();
}
for (auto& c : force)
// ^ if you don't need the original values any more, you can use a reference now
{
    c.convToC();
    c.printforce();
}
I'm a programming student, and for a project I'm working on, one of the things I have to do is compute the median value of a vector of int values, and it must be done by passing the vector through functions. The vector is initially generated randomly using the C++ random generator mt19937, which I have already written into my code. I'm to do this using the sort function and vector member functions such as .begin(), .end(), and .size().
I'm supposed to make sure I find the median value of the vector and then output it.
And I'm stuck. Below I have included my attempt. So where am I going wrong? I would appreciate it if you would be willing to give me some pointers or resources to get going in the right direction.
Code:
#include<iostream>
#include<vector>
#include<cstdlib>
#include<ctime>
#include<random>
using namespace std;

double find_median(vector<double>);

double find_median(vector<double> len)
{
    {
        int i;
        double temp;
        int n = len.size();
        int mid;
        double median;
        bool swap;
        do
        {
            swap = false;
            for (i = 0; i < len.size() - 1; i++)
            {
                if (len[i] > len[i + 1])
                {
                    temp = len[i];
                    len[i] = len[i + 1];
                    len[i + 1] = temp;
                    swap = true;
                }
            }
        }
        while (swap);
        for (i = 0; i < len.size(); i++)
        {
            if (len[i] > len[i + 1])
            {
                temp = len[i];
                len[i] = len[i + 1];
                len[i + 1] = temp;
            }
            mid = len.size() / 2;
            if (mid % 2 == 0)
            {
                median = len[i] + len[i + 1];
            }
            else
            {
                median = (len[i] + 0.5);
            }
        }
        return median;
    }
}

int main()
{
    int n, i;
    cout << "Input the vector size: " << endl;
    cin >> n;
    vector<double> foo(n);
    mt19937 rand_generator;
    rand_generator.seed(time(0));
    uniform_real_distribution<double> rand_distribution(0, 0.8);
    cout << "original vector: " << " ";
    for (i = 0; i < n; i++)
    {
        double rand_num = rand_distribution(rand_generator);
        foo[i] = rand_num;
        cout << foo[i] << " ";
    }
    double median;
    median = find_median(foo);
    cout << endl;
    cout << "The median of the vector is: " << " ";
    cout << median << endl;
}
The median is given by
const auto median_it = len.begin() + len.size() / 2;
std::nth_element(len.begin(), median_it, len.end());
auto median = *median_it;
For an even number of elements you need to be a bit more precise. E.g., you can use
assert(!len.empty());
if (len.size() % 2 == 0) {
    const auto median_it1 = len.begin() + len.size() / 2 - 1;
    const auto median_it2 = len.begin() + len.size() / 2;

    std::nth_element(len.begin(), median_it1, len.end());
    const auto e1 = *median_it1;

    std::nth_element(len.begin(), median_it2, len.end());
    const auto e2 = *median_it2;

    return (e1 + e2) / 2;
} else {
    const auto median_it = len.begin() + len.size() / 2;
    std::nth_element(len.begin(), median_it, len.end());
    return *median_it;
}
There are of course many different ways to get element e1; for example, we could take the max of the elements before median_it2 instead. But this line is important, because nth_element only places the nth element correctly; the remaining elements are ordered before or after this element, depending on whether they are smaller or larger, and each side is otherwise unsorted.
This code is guaranteed to have linear complexity on average, i.e., O(N), therefore it is asymptotically better than sort, which is O(N log N).
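A tiny demonstration of that partial-ordering property (everything left of the nth position ends up <= it, everything right ends up >= it, but both sides stay unsorted):
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{9, 1, 8, 2, 7, 3, 6};
    const auto mid = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), mid, v.end());
    std::cout << "median element: " << *mid << '\n'; // 6, as in the fully sorted order
    for (int x : v) std::cout << x << ' ';           // each side may appear in any order
    std::cout << '\n';
}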
Regarding your code:
for (i=0; i<len.size(); i++){
if (len[i]>len[i+1])
This will not work, as you access len[len.size()] in the last iteration which does not exist.
std::sort(len.begin(), len.end());
double median = len[len.size() / 2];
will do it. You might need to take the average of the middle two elements if size() is even, depending on your requirements:
0.5 * (len[len.size() / 2 - 1] + len[len.size() / 2]);
Instead of trying to do everything at once, you should start with simple test cases and work upwards:
#include<vector>

double find_median(std::vector<double> len);

// Return the number of failures - shell interprets 0 as 'success',
// which suits us perfectly.
int main()
{
    return find_median({0, 1, 1, 2}) != 1;
}
This already fails with your code (even after fixing i to be an unsigned type), so you could start debugging (even 'dry' debugging, where you trace the code through on paper; that's probably enough here).
I do note that with a smaller test case, such as {0, 1, 2}, I get a crash rather than merely failing the test, so there's something that really needs to be fixed.
Let's replace the implementation with one based on overseas's answer:
#include <algorithm>
#include <limits>
#include <vector>

double find_median(std::vector<double> len)
{
    if (len.size() < 1)
        return std::numeric_limits<double>::signaling_NaN();

    const auto alpha = len.begin();
    const auto omega = len.end();

    // Find the two middle positions (they will be the same if size is odd)
    const auto i1 = alpha + (len.size()-1) / 2;
    const auto i2 = alpha + len.size() / 2;

    // Partial sort to place the correct elements at those indexes (it's okay to
    // modify the vector, as we've been given a copy; otherwise, we could use
    // std::partial_sort_copy to populate a temporary vector).
    std::nth_element(alpha, i1, omega);
    std::nth_element(i1, i2, omega);

    return 0.5 * (*i1 + *i2);
}
Now, our test passes. We can write a helper method to allow us to create more tests:
#include <cmath>
#include <iostream>

bool test_median(const std::vector<double>& v, double expected)
{
    auto actual = find_median(v);
    if (std::abs(expected - actual) > 0.01) {
        std::cerr << actual << " - expected " << expected << std::endl;
        return true;
    } else {
        std::cout << actual << std::endl;
        return false;
    }
}
int main()
{
    return test_median({0, 1, 1, 2}, 1)
         + test_median({5}, 5)
         + test_median({5, 5, 5, 0, 0, 0, 1, 2}, 1.5);
}
Once you have the simple test cases working, you can manage more complex ones. Only then is it time to create a large array of random values to see how well it scales:
#include <algorithm>
#include <ctime>
#include <functional>
#include <iterator>
#include <random>
#include <string>

int main(int argc, char **argv)
{
    std::vector<double> foo;
    const int n = argc > 1 ? std::stoi(argv[1]) : 10;
    foo.reserve(n);

    std::mt19937 rand_generator(std::time(0));
    std::uniform_real_distribution<double> rand_distribution(0, 0.8);
    std::generate_n(std::back_inserter(foo), n, std::bind(rand_distribution, rand_generator));

    std::cout << "Vector:";
    for (auto v: foo)
        std::cout << ' ' << v;
    std::cout << "\nMedian = " << find_median(foo) << std::endl;
}
(I've taken the number of elements as a command-line argument; that's more convenient in my build than reading it from cin). Notice that instead of allocating n doubles in the vector, we simply reserve capacity for them, but don't create any until needed.
For fun and kicks, we can now make find_median() generic. I'll leave that as an exercise; I suggest you start with:
template <class Iterator>
auto find_median(Iterator alpha, Iterator omega)
{
    using value_type = typename Iterator::value_type;
    if (alpha == omega)
        return std::numeric_limits<value_type>::signaling_NaN();
}
Consider the following dataset and centroids. There are 7 individuals and two means, each with 8 dimensions. They are stored in row-major order.
short dim = 8;

float centroids[] = {
    0.223, 0.002, 0.223, 0.412, 0.334, 0.532, 0.244, 0.612,
    0.742, 0.812, 0.817, 0.353, 0.325, 0.452, 0.837, 0.441
};
float data[] = {
    0.314, 0.504, 0.030, 0.215, 0.647, 0.045, 0.443, 0.325,
    0.731, 0.354, 0.696, 0.604, 0.954, 0.673, 0.625, 0.744,
    0.615, 0.936, 0.045, 0.779, 0.169, 0.589, 0.303, 0.869,
    0.275, 0.406, 0.003, 0.763, 0.471, 0.748, 0.230, 0.769,
    0.903, 0.489, 0.135, 0.599, 0.094, 0.088, 0.272, 0.719,
    0.112, 0.448, 0.809, 0.157, 0.227, 0.978, 0.747, 0.530,
    0.908, 0.121, 0.321, 0.911, 0.884, 0.792, 0.658, 0.114
};
I want to calculate each euclidean distance: c1 - d1, c1 - d2, and so on.
On CPU I would do:
float dist = 0.0, dist_sqrt;
for (int i = 0; i < 2; i++)
    for (int j = 0; j < 7; j++)
    {
        float dist_sum = 0.0;
        for (int k = 0; k < dim; k++)
        {
            dist = centroids[i * dim + k] - data[j * dim + k];
            dist_sum += dist * dist;
        }
        dist_sqrt = sqrt(dist_sum);
        // do something with the distance
        std::cout << dist_sqrt << std::endl;
    }
Is there any built-in solution for vector distance calculation in Thrust?
It can be done in thrust. Explaining how will be rather involved, and the code is rather dense.
The key observation to start with is that the core operation can be done via a transformed reduction. The thrust transform operation is used to perform the elementwise subtraction of the vectors (individual-centroid) and squaring of each result, and the reduction sums the results together to produce the square of the euclidean distance. The starting point for this operation is thrust::reduce_by_key, but it gets rather involved to present the data correctly to reduce_by_key.
The final results are produced by taking the square root of each result from above, and we can use an ordinary thrust::transform for this.
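As an aside, here is roughly what that transformed reduction looks like for a single (centroid, individual) pair; this sketch uses thrust::inner_product, which fuses the elementwise squared difference with the summation (the real problem needs reduce_by_key because many such reductions run at once):
#include <thrust/device_vector.h>
#include <thrust/inner_product.h>
#include <thrust/functional.h>
#include <cmath>

struct diff_sq
{
    __host__ __device__ float operator()(float c, float d) const {
        float t = c - d;
        return t * t;
    }
};

// Squared differences are summed in one fused pass; one sqrt at the end.
float euclidean_distance(const thrust::device_vector<float> &c,
                         const thrust::device_vector<float> &d)
{
    float sum_sq = thrust::inner_product(c.begin(), c.end(), d.begin(), 0.0f,
                                         thrust::plus<float>(), diff_sq());
    return std::sqrt(sum_sq);
}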
That summary covers the only 2 lines of thrust code that do all the work. However, the first line has considerable complexity to it. In order to exploit parallelism, the approach I took was to virtually "lay out" the necessary vectors in sequence, to be presented to reduce_by_key. To take a simple example, suppose we have 2 centroids and 4 individuals, and suppose our dimension is 2.
centroid 0: C00 C01
centroid 1: C10 C11
individ 0: I00 I01
individ 1: I10 I11
individ 2: I20 I21
individ 3: I30 I31
We can "lay out" the vectors like this:
C00 C01 C00 C01 C00 C01 C00 C01 C10 C11 C10 C11 C10 C11 C10 C11
I00 I01 I10 I11 I20 I21 I30 I31 I00 I01 I10 I11 I20 I21 I30 I31
To facilitate the reduce_by_key, we will also need to generate key values to delineate the vectors:
0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
The above laid-out data sets can be quite large, and we don't want to incur the storage and retrieval cost, so we will generate them "on-the-fly" using thrust's collection of fancy iterators. This is where things get quite dense. With the above strategy in mind, we will use thrust::reduce_by_key to do the work. We'll create a custom functor provided to a transform_iterator to do the subtraction (and squaring) of the I and C vectors, which will be zipped together for this purpose. The "lay out" of the vectors will be created on the fly using permutation iterators with additional custom index-creation functors, to help with the replicated patterns in each of I and C.
Therefore, working from the "inside out", the sequence of steps is as follows:
for both I (data) and C (centr) use a counting_iterator combined with a custom indexing functor inside of a transform_iterator to produce the indexing sequences we will need.
using the indexing sequences created in step 1 and the base I and C vectors, virtually "lay out" the vectors via a permutation_iterator (one for each laid-out vector).
zip the 2 "laid out" virtual I and C vectors together, to create a <float, float> tuple vector (virtual).
take the zip_iterator from step 3, and combine with a custom distance-calculation functor ((I-C)^2) in a transform_iterator
use another transform_iterator, combining a counting_iterator with a custom key-generating functor, to produce the key sequence (virtual)
pass the iterators in steps 4 and 5 to reduce_by_key as the inputs (keys, values) to be reduced. The output vectors for reduce_by_key are also keys and values. We don't need the keys, so we'll use a discard_iterator to dump those. The values we will save.
The above steps are all accomplished in a single line of thrust code.
Here's code illustrating the above:
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <thrust/copy.h>
#include <math.h>
#include <time.h>
#include <sys/time.h>
#include <stdlib.h>

#define MAX_DATA 100000000
#define MAX_CENT 5000
#define TOL 0.001

unsigned long long dtime_usec(unsigned long long prev){
#define USECPSEC 1000000ULL
  timeval tv1;
  gettimeofday(&tv1,0);
  return ((tv1.tv_sec * USECPSEC)+tv1.tv_usec) - prev;
}

unsigned verify(float *d1, float *d2, int len){
  unsigned pass = 1;
  for (int i = 0; i < len; i++)
    if (fabsf(d1[i] - d2[i]) > TOL){
      std::cout << "mismatch at: " << i << " val1: " << d1[i] << " val2: " << d2[i] << std::endl;
      pass = 0;
      break;}
  return pass;
}
void eucl_dist_cpu(const float *centroids, const float *data, float *rdist, int num_centroids, int dim, int num_data, int print){
  int out_idx = 0;
  float dist, dist_sqrt;
  for(int i = 0; i < num_centroids; i++)
    for(int j = 0; j < num_data; j++)
    {
      float dist_sum = 0.0;
      for(int k = 0; k < dim; k++)
      {
        dist = centroids[i * dim + k] - data[j * dim + k];
        dist_sum += dist * dist;
      }
      dist_sqrt = sqrt(dist_sum);
      // do something with the distance
      rdist[out_idx++] = dist_sqrt;
      if (print) std::cout << dist_sqrt << ", ";
    }
  if (print) std::cout << std::endl;
}
struct dkeygen : public thrust::unary_function<int, int>
{
  int dim;
  int numd;
  dkeygen(const int _dim, const int _numd) : dim(_dim), numd(_numd) {};

  __host__ __device__ int operator()(const int val) const {
    return (val / dim);
  }
};

typedef thrust::tuple<float, float> mytuple;

struct my_dist : public thrust::unary_function<mytuple, float>
{
  __host__ __device__ float operator()(const mytuple &my_tuple) const {
    float temp = thrust::get<0>(my_tuple) - thrust::get<1>(my_tuple);
    return temp * temp;
  }
};

struct d_idx : public thrust::unary_function<int, int>
{
  int dim;
  int numd;
  d_idx(int _dim, int _numd) : dim(_dim), numd(_numd) {};

  __host__ __device__ int operator()(const int val) const {
    return (val % (dim * numd));
  }
};

struct c_idx : public thrust::unary_function<int, int>
{
  int dim;
  int numd;
  c_idx(int _dim, int _numd) : dim(_dim), numd(_numd) {};

  __host__ __device__ int operator()(const int val) const {
    return (val % dim) + (dim * (val / (dim * numd)));
  }
};

struct my_sqrt : public thrust::unary_function<float, float>
{
  __host__ __device__ float operator()(const float val) const {
    return sqrtf(val);
  }
};
unsigned long long eucl_dist_thrust(thrust::host_vector<float> &centroids, thrust::host_vector<float> &data, thrust::host_vector<float> &dist, int num_centroids, int dim, int num_data, int print){
  thrust::device_vector<float> d_data = data;
  thrust::device_vector<float> d_centr = centroids;
  thrust::device_vector<float> values_out(num_centroids*num_data);

  unsigned long long compute_time = dtime_usec(0);
  thrust::reduce_by_key(
    thrust::make_transform_iterator(thrust::make_counting_iterator<int>(0), dkeygen(dim, num_data)),
    thrust::make_transform_iterator(thrust::make_counting_iterator<int>(dim*num_data*num_centroids), dkeygen(dim, num_data)),
    thrust::make_transform_iterator(
      thrust::make_zip_iterator(
        thrust::make_tuple(
          thrust::make_permutation_iterator(d_centr.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator<int>(0), c_idx(dim, num_data))),
          thrust::make_permutation_iterator(d_data.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator<int>(0), d_idx(dim, num_data))))),
      my_dist()),
    thrust::make_discard_iterator(),
    values_out.begin());
  thrust::transform(values_out.begin(), values_out.end(), values_out.begin(), my_sqrt());
  cudaDeviceSynchronize();
  compute_time = dtime_usec(compute_time);

  if (print){
    thrust::copy(values_out.begin(), values_out.end(), std::ostream_iterator<float>(std::cout, ", "));
    std::cout << std::endl;
  }
  thrust::copy(values_out.begin(), values_out.end(), dist.begin());
  return compute_time;
}
int main(int argc, char *argv[]){
  int dim = 8;
  int num_centroids = 2;
  float centroids[] = {
    0.223, 0.002, 0.223, 0.412, 0.334, 0.532, 0.244, 0.612,
    0.742, 0.812, 0.817, 0.353, 0.325, 0.452, 0.837, 0.441
  };
  int num_data = 8;
  float data[] = {
    0.314, 0.504, 0.030, 0.215, 0.647, 0.045, 0.443, 0.325,
    0.731, 0.354, 0.696, 0.604, 0.954, 0.673, 0.625, 0.744,
    0.615, 0.936, 0.045, 0.779, 0.169, 0.589, 0.303, 0.869,
    0.275, 0.406, 0.003, 0.763, 0.471, 0.748, 0.230, 0.769,
    0.903, 0.489, 0.135, 0.599, 0.094, 0.088, 0.272, 0.719,
    0.112, 0.448, 0.809, 0.157, 0.227, 0.978, 0.747, 0.530,
    0.908, 0.121, 0.321, 0.911, 0.884, 0.792, 0.658, 0.114,
    0.721, 0.555, 0.979, 0.412, 0.007, 0.501, 0.844, 0.234
  };

  std::cout << "cpu results: " << std::endl;
  float dist[num_data*num_centroids];
  eucl_dist_cpu(centroids, data, dist, num_centroids, dim, num_data, 1);

  thrust::host_vector<float> h_data(data, data + (sizeof(data)/sizeof(float)));
  thrust::host_vector<float> h_centr(centroids, centroids + (sizeof(centroids)/sizeof(float)));
  thrust::host_vector<float> h_dist(num_centroids*num_data);

  std::cout << "gpu results: " << std::endl;
  eucl_dist_thrust(h_centr, h_data, h_dist, num_centroids, dim, num_data, 1);

  float *data2, *centroids2, *dist2;
  num_centroids = 10;
  num_data = 1000000;

  if (argc > 2) {
    num_centroids = atoi(argv[1]);
    num_data = atoi(argv[2]);
    if ((num_centroids < 1) || (num_centroids > MAX_CENT)) {std::cout << "Num centroids out of range" << std::endl; return 1;}
    if ((num_data < 1) || (num_data > MAX_DATA)) {std::cout << "Num data out of range" << std::endl; return 1;}
    if (num_data * dim * num_centroids > 2000000000) {std::cout << "data set out of range" << std::endl; return 1;}}
  std::cout << "Num Data: " << num_data << std::endl;
  std::cout << "Num Cent: " << num_centroids << std::endl;
  std::cout << "result size: " << ((num_data*num_centroids*4)/1048576) << " Mbytes" << std::endl;

  data2 = new float[dim*num_data];
  centroids2 = new float[dim*num_centroids];
  dist2 = new float[num_data*num_centroids];

  for (int i = 0; i < dim*num_data; i++) data2[i] = rand()/(float)RAND_MAX;
  for (int i = 0; i < dim*num_centroids; i++) centroids2[i] = rand()/(float)RAND_MAX;

  unsigned long long dtime = dtime_usec(0);
  eucl_dist_cpu(centroids2, data2, dist2, num_centroids, dim, num_data, 0);
  dtime = dtime_usec(dtime);
  std::cout << "cpu time: " << dtime/(float)USECPSEC << "s" << std::endl;

  thrust::host_vector<float> h_data2(data2, data2 + (dim*num_data));
  thrust::host_vector<float> h_centr2(centroids2, centroids2 + (dim*num_centroids));
  thrust::host_vector<float> h_dist2(num_data*num_centroids);

  dtime = dtime_usec(0);
  unsigned long long ctime = eucl_dist_thrust(h_centr2, h_data2, h_dist2, num_centroids, dim, num_data, 0);
  dtime = dtime_usec(dtime);
  std::cout << "gpu total time: " << dtime/(float)USECPSEC << "s, gpu compute time: " << ctime/(float)USECPSEC << "s" << std::endl;

  if (!verify(dist2, &(h_dist2[0]), num_data*num_centroids)) {std::cout << "Verification failure." << std::endl; return 1;}
  std::cout << "Success!" << std::endl;
  return 0;
}
Notes:
The code is set up to do 2 passes, a short one using a data set similar to yours, with printout for visual check. Then a larger data set can be entered, via command-line sizing parameters (number of centroids, then number of individuals), for benchmark comparison and validation of results.
Contrary to what I stated in the comments, the thrust code is only running about 25% faster than the naive single-threaded CPU code. Your mileage may vary.
This is just one way to think about handling it. I have had other ideas, but not enough time to flesh them out.
The data sets can become rather large. The code right now is intended to be limited to data sets where the product of dimension*number_of_centroids*number_of_individuals is less than 2 billion. However, as you approach even this number, you will need a GPU and CPU that both have a few GB of memory. I briefly explored larger data set sizes. A few code changes would be needed in various places to extend from e.g. int to unsigned long long, etc. However I haven't provided that as I am still investigating an issue with that code.
For another, non-thrust-related look at computing euclidean distances on the GPU, you may be interested in this question. If you follow the sequence of optimizations that were made there, it may shed some light on either how this thrust code might be improved, or else how another non-thrust realization could be used.
Sorry I wasn't able to squeeze more performance out.