Cosine similarity using Armadillo C++ gives me negative results - c++

I have implemented a cosine similarity function which uses the Armadillo C++ linear algebra library. My code is as follows:
double cosine_similarity(rowvec A, rowvec B)
{
double dot = as_scalar(A*B.t());
double denomA = as_scalar(A*A.t());
double denomB = as_scalar(B*B.t());
return dot / (sqrt(denomA) * sqrt(denomB)) ;
}
I have this matrix as an example:
-0.0261 -0.6780 -0.7338 0.0345
-0.0230 0.0082 -0.0400 -0.7056
-0.2590 -0.7052 0.6590 -0.0371
-0.9650 0.2072 -0.1551 0.0426
-0.0230 0.0082 -0.0400 -0.7056
When I calculate the cosine similarity between the second row and all the rows, I get the following results:
Similarity [1,0]: -1.07944e-16
Similarity [1,1]: 1
Similarity [1,2]: -1.96262e-17
Similarity [1,3]: -1.71729e-16
Similarity [1,4]: 1
Is this correct? I am worried about the negative results, even though they effectively mean zero. I am wondering if I am doing something wrong. cosine_similarity is used in this way:
for (unsigned int row = 0; row < redV.n_rows ; row++)
{
double ans = cosine_similarity(redV.row(indicate), redV.row(row));
cout << "Similarity [" << indicate << "," << row << "]: " << ans << endl;
cout << "Similarity [" << indicate << "," << row << "]: " << norm_dot(redV.row(indicate), redV.row(row)) << endl;
}

Your code seems right; you’re just encountering machine-precision issues. With A as the third row and B as the second row (or vice versa), A*B.t() should be exactly zero. The computed value is not, but it is zero to within machine precision. SciPy’s cosine has the same problem:
In [10]: from scipy.spatial.distance import cosine
In [11]: 1 - cosine([-0.2590, -0.7052, 0.6590, -0.0371], [-0.0230, 0.0082, -0.0400, -0.7056])
Out[11]: -1.114299639159988e-05 # <=============== should not be negative!
(I subtract from 1 only because SciPy’s cosine returns a distance, i.e. 1 minus the similarity. This result doesn’t match yours because you posted only four decimal places, but the punchline is that it’s negative.)
If you want to check whether a floating-point number x is within machine precision of another number y, compare their difference with std::numeric_limits<double>::epsilon(), scaled to the magnitude of the values. See the definition of almost_equal here. You might want cosine_similarity to check whether the result is almost_equal to 0 or 1 and, if so, return exactly 0 or 1.
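The suggestion above could be sketched roughly as follows. This is only an illustration: it uses std::vector<double> instead of Armadillo's rowvec so that it needs nothing beyond the standard library, the names almost_equal and cosine_similarity_snapped are hypothetical, and the tolerance of 4 epsilon-sized steps is an arbitrary assumption. It also assumes both vectors have the same length and are non-zero.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper: true when x and y differ by no more than `ulp`
// machine epsilons, scaled to the larger magnitude (but at least 1.0).
bool almost_equal(double x, double y, int ulp = 4)
{
    const double eps = std::numeric_limits<double>::epsilon();
    const double scale = std::max(1.0, std::max(std::fabs(x), std::fabs(y)));
    return std::fabs(x - y) <= eps * scale * ulp;
}

// Cosine similarity that snaps results within tolerance of -1, 0 or 1
// to exactly that value. Assumes equal-length, non-zero vectors.
double cosine_similarity_snapped(const std::vector<double>& a,
                                 const std::vector<double>& b)
{
    const double dot = std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
    const double normA = std::inner_product(a.begin(), a.end(), a.begin(), 0.0);
    const double normB = std::inner_product(b.begin(), b.end(), b.begin(), 0.0);
    const double c = dot / (std::sqrt(normA) * std::sqrt(normB));

    if (almost_equal(c, 0.0)) return 0.0;
    if (almost_equal(c, 1.0)) return 1.0;
    if (almost_equal(c, -1.0)) return -1.0;
    return c;
}
```

With this tolerance, values such as -1.07944e-16 would be reported as 0, whereas a result like -1.1e-05 (from inputs rounded to four decimal places) is far outside machine precision and is returned unchanged.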

Related

Why is the volume always calculated by zero?

I'm a beginner at using functions, and I wrote this simple code. But I don't know why the volume is always calculated as zero.
#include <iostream>
using namespace std;
double area (double) ;
double volume (double) ;
int main () {
double radious ;
cout << "please enter the Radious \n" ;
cin >> radious ;
cout << "The area = " << area(radious) << "\n" ;
cin >> radious ;
cout << "The volume = " << volume(radious) << "\n" ;
cout << radious << "\n" ;
}
// definition of the area function
double area (double R) {
return ( (4) * (3.14) * (R * R) ) ;
}
// definition of the volume function
double volume (double R) {
return ( (3/4) * (3.14) * (R * R * R) ) ;
}
Your function volume will always return zero because 3 / 4 is evaluated as an integer division: both operands are int, so the result is an int and the fractional part is discarded, giving 0. There is no rounding in the mathematical sense; the same truncation happens when a real value is converted to an integer, e.g. 2.7 as int becomes 2, not 3.
You can fix this in 2 ways:
A) reorder your equation so that the division is the last operation you perform, e.g. ((3.14*R*R*R*3)/4). Note that this is often necessary when you want your result to be an int, which is not the case here.
B) explicitly say that either (or both) 3 or 4 have to be treated as a real number (float/double) by adding .0, e.g. 3.0/4 or 3/4.0 or 3.0/4.0. This approach is better in your case since you expect double anyway.
For more information refer to Numeric conversions and this FAQ
The part 3/4 in your code performs an integer division. Integers cannot hold a fractional part, so the fractional part of the result is truncated, which leaves 0 in your case.
You can replace 3/4 with 3.0/4.0 to make it work.
Good Luck!

Usage of arrays to generate random numbers

I'm trying to generate N random floats between 0 and 1 where N is specified by the user. Then I need to find the mean and the variance of the generated numbers. Struggling with finding the variance.
Already tried using variables instead of an array but have changed my code to allow for arrays instead.
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cmath>
using namespace std;
int main(){
int N, i;
float random_numbers[i], sum, mean, variance, r;
cout << "Enter an N value" << endl;
cin >> N;
sum = 0;
variance = 0;
for (i = 0; i < N; i++) {
srand(i + 1);
random_numbers[i] = ((float) rand() / float(RAND_MAX));
sum += random_numbers[i];
cout << random_numbers[i] << endl;
mean= sum / N;
variance += pow(random_numbers[i]-mean,2);
}
variance = variance / N;
cout << " The sum of random numbers is " << sum << endl;
cout << " The mean is " << mean << endl;
cout << " The variance is " << variance << endl;
}
The mean and sum are currently correct; however, the variance is not.
The mean you calculate inside the loop is a "running mean", i.e. for each new incoming number it is recomputed from the partial sum so far, and it only reaches its final value after the last iteration. For the variance, however, your formula is incorrect. This:
variance += pow(random_numbers[i]-mean,2);
would be correct if mean was the final value, but as it is the running mean the result for variance is incorrect. You basically have two options. Either you use the correct formula (search for "variance single pass algorithm" or "running variance") or you first calculate the mean and then set up a second loop to calculate the variance (for this case your formula is correct).
Note that the naive single-pass formula for the variance (based on the sum of squares) is numerically not as stable as using two loops, so if you can afford it memory- and performance-wise, you should prefer the two-pass algorithm.
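A minimal sketch of the two-pass approach is shown below. The struct Stats and the function mean_and_variance are hypothetical names, and the sketch divides by N (population variance) as the question's code does; it also uses double rather than float.

```cpp
#include <vector>

// Hypothetical result type holding both statistics.
struct Stats {
    double mean;
    double variance;
};

// Two-pass computation: first the final mean, then the squared
// deviations from that final mean. Divides by N, as in the question.
Stats mean_and_variance(const std::vector<double>& xs)
{
    if (xs.empty()) {
        return {0.0, 0.0};
    }
    const double n = static_cast<double>(xs.size());

    double sum = 0.0;
    for (double x : xs) {
        sum += x;                              // pass 1: accumulate the sum
    }
    const double mean = sum / n;

    double squared = 0.0;
    for (double x : xs) {
        squared += (x - mean) * (x - mean);    // pass 2: use the final mean
    }
    return {mean, squared / n};
}
```

For the values 1, 2, 3, 4 this gives a mean of 2.5 and a variance of 1.25.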
PS: there are other issues with your code, but I concentrated on your main question.
The mean that you use inside the variance computation is based only on the first i + 1 elements. You should compute the mean of the whole sample first, then run a second loop to compute the variance.
Enjoy

Having difficulties locating a problem in the for loop

I am writing a program where a user inputs the names of contestants and buys tickets for a competition. I am trying to figure out the percent chance for each contestant to win, but for some reason it's returning zero. Here's the code:
for(int i = 0; i < ticPurch.size(); i++){
totalTics = ticPurch[i] + totalTics; //Figuring out total amount of ticket bought
}
cout << totalTics;
for (int i = 0; i < names.size(); i++){
cout << "Contenstant " << " Chance of winning " << endl;
cout << names[i] << " " << ((ticPurch.at(i))/(totalTics)) * 100 << " % " << endl; //Figuring out the total chance of winning
}
ticPurch is a vector of the tickets each contestant bought and names is a vector of the contestants' names. For some reason the percent is always returning zero and I don't know why.
return 0;
Dividing an integer by an integer gives you an integer, by truncation of the fractional part.
Since your values are less than one, your result will always be zero.
You could cast an operand to a floating-point type to get the calculation you wanted:
(ticPurch.at(i) / (double)totalTics) * 100
Then probably round this result, since you seem to want whole number results:
std::floor((ticPurch.at(i) / (double)totalTics) * 100)
My preferred approach, which avoids floating-point entirely (always nice!), is to multiply to the resolution of your calculation first:
(ticPurch.at(i) * 100) / totalTics
This will always round down, so be aware of that if you decided to go with, say, std::round (or std::ceil) instead of std::floor in the example above. Arithmetic trickery can mimic those if needs be.
Now, instead of e.g. (3/5) * 100 (which is 0*100 (which is 0)), you have e.g. (3*100)/5 (which is 300/5 (which is 60)).

Cos function giving me zero

I am a complete beginner in programming and I was given the following assignment:
Write a C++ program that computes a pair of estimates of π, using a sequence of inscribed and circumscribed regular polygons. Halt after no more than 30 steps, or when the difference between the perimeters of the circumscribed and inscribed polygons is less than a tolerance of ε=10⁻¹⁵. Your output should have three columns, for the number of sides, the perimeter of an inscribed polygon, and perimeter of the circumscribed polygon. For the last two columns, display 14 digits after the decimal point.
Well, I decided to use the law of cosines to find the lengths of the sides of the polygon, but when I was testing my program I realized that the line:
a = cos(360 / ngon);
keeps giving me zero as the output, which makes everything else zero as well, and I am not sure what is wrong. Please help.
P.S. Sorry if the program looks really sloppy, I am really bad at this.
#include "stdafx.h"
#include <iostream>
#include <iomanip>
#include <fstream>
#define _USE_MATH_DEFINES
#include <math.h>
#include <cmath>
using namespace std;
int main()
{
char zzz;
int ngon = 3, a, ak;
double insngon = 0.0;
double cirngon = 0.0;
cout << "Number of Sides" << "\t\t\t" << "Perimeter of insribed region" << "\t\t\t" << "Perimeneter of circumscribed polygon" << "\t\t" << "\n";
while (ngon <= 30)
{
a = cos(360 / ngon);
ak = pow(.5, 2) + pow(.5, 2) - 2 * .5*.5*a;
insngon = (ak*ngon);
cirngon = (ak / (sqrt(1 - pow(ak, 2))));
cout << fixed << setprecision(14) << ngon << " " << insngon << " " << cirngon << endl;
ngon++;
if (cirngon - insngon <= pow(10.0, -15));
cin >> zzz;
return 0;
}
cout << "\nEnter any character and space to end ";
cin >> zzz;
return 0;
}
One issue is that you declared integers, yet you are using them in the call to cos here:
int ngon = 3, a, ak;
//...
a = cos(360 / ngon);
Since a is an integer, the return value of cos (which is of type double) will be truncated. Also, since ngon is an integer, the 360 / ngon will also truncate.
The fix is to make a a double, and divide 360.0 by ngon to prevent the truncation:
int ngon = 3, ak;
double a;
//...
a = cos(360.0 / ngon);
The other issue, as pointed out in the comments, is that the trigonometric functions in C++ take their argument in radians, not degrees. You need to convert the argument to radians (multiply the angle in degrees by π/180).
Another issue is that you're using pow to compute values that are constant. There is no need to introduce an unnecessary function call to compute constant values. Just define the constants and use them.
For example:
const double HALF_SQUARED = 0.25;
const double EPSILON_VALUE = 1.0e-15;
and then use HALF_SQUARED and EPSILON_VALUE instead of the calls to pow.
Also, pow is itself a floating-point function and can therefore produce results that are not exact, as discussed in this question. Thus pow(ak, 2) should simply be replaced with ak * ak.
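Putting the type fix and the radians fix together, the cosine of the central angle might be computed as in the sketch below. The function name cos_of_central_angle is hypothetical, and π is obtained from std::acos(-1.0) here only to avoid depending on the non-standard M_PI macro.

```cpp
#include <cmath>

// Cosine of the central angle (360 / ngon degrees) of a regular polygon.
// Assumes ngon >= 3.
double cos_of_central_angle(int ngon)
{
    const double pi = std::acos(-1.0);
    const double degrees = 360.0 / ngon;          // 360.0 forces double division
    const double radians = degrees * pi / 180.0;  // std::cos expects radians
    return std::cos(radians);
}
```

For ngon = 3 this gives approximately -0.5, the cosine of 120 degrees, instead of the 0 produced by the original line.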
Use float a; (or double a;) instead of int a.
Here the type of a is int, and calculating
a = cos(360/ngon)
is equivalent to a = cos(120) when ngon is 3. The result of cos(120) is 0.8141, and because a is an integer it stores only the integer part.
Therefore a will be 0, and the fractional part is discarded.
Also use double ak; instead of int ak;, because the pow function used here has return type double.

Am I doing double to float conversion here

const double dBLEPTable_8_BLKHAR[4096] = {
0.00000000000000000000000000000000,
-0.00000000239150987901837200000000,
-0.00000000956897738824125100000000,
-0.00000002153888378764179400000000,
-0.00000003830892270073604800000000,
-0.00000005988800189093979000000000,
-0.00000008628624126316708500000000,
-0.00000011751498329992671000000000,
-0.00000015358678995269770000000000,
-0.00000019451544774895524000000000,
-0.00000024031597312124120000000000,
-0.00000029100459975062165000000000
};
If I change the double above to float, am I incurring conversion CPU cycles when I perform operations on the array contents? Or is the "conversion" sorted out at compile time?
Say, dBLEPTable_8_BLKHAR[1] + dBLEPTable_8_BLKHAR[2] , something simple like this?
On a related note, how many trailing decimal places should a float be able to store?
This is C++.
Any good compiler will convert the initializers during compile time. However, you also asked
am I incurring conversion cpu cycles when I perform operations on the array contents?
and that depends on the code performing the operations. If your expression combines array elements with variables of double type, then the operation will be performed at double precision, and the array elements will be promoted (converted) before the arithmetic takes place.
If you just combine array elements with variables of float type (including other array elements), then the operation is performed on floats and the language doesn't require any promotion. (But if your hardware only implements double-precision operations, conversion might still be done. Such hardware surely makes the conversions very cheap, though.)
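The result types described above can be checked at compile time. The sketch below uses two hypothetical helper functions on a float table; the static_assert lines only confirm which type each kind of expression has in the language, not what any particular hardware does internally.

```cpp
#include <type_traits>

// float + float has type float: no promotion is required by the language.
static_assert(std::is_same<decltype(1.0f + 2.0f), float>::value,
              "float + float should have type float");

// float + double has type double: the float operand is converted first.
static_assert(std::is_same<decltype(1.0f + 2.0), double>::value,
              "float + double should have type double");

// Combines two float elements; the arithmetic is done on floats.
float sum_as_float(const float* table)
{
    return table[1] + table[2];
}

// Combines a float element with a double; the element is converted
// to double before the addition.
double sum_with_double(const float* table, double offset)
{
    return table[1] + offset;
}
```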
Ben Voigt's answer addresses most of your question.
But you also ask:
On a related note, how many trailing decimal places should a float be able to store
It depends on the value of the number you are trying to store. For large numbers there are no decimal places at all; in fact, the format can't even represent every integer exactly. For instance:
float x = BIG_NUMBER;
float y = x + 1;
if (x == y)
{
// The code gets here if BIG_NUMBER is very high!
}
else
{
// The code gets here if BIG_NUMBER is not so high!
}
If BIG_NUMBER is 2^23 the next greater number would be (2^23 + 1).
If BIG_NUMBER is 2^24 the next greater number would be (2^24 + 2).
The value (2^24 + 1) can not be stored.
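The 2^23 and 2^24 cases can be tried directly with a small check like the one below. The function name adding_one_changes_value is hypothetical, and the sketch assumes float is the usual IEEE 754 single-precision format with a 24-bit significand.

```cpp
// Returns true if x + 1 is stored as a float different from x.
bool adding_one_changes_value(float x)
{
    const float y = x + 1.0f;  // the result is rounded to float when stored
    return y != x;
}
```

Under that assumption, the function returns true for 8388608.0f (2^23) and false for 16777216.0f (2^24), because 2^24 + 1 rounds back to 2^24.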
For very small numbers (i.e. close to zero), you will have a lot of decimal places.
Floating-point values must be used with great care because their precision is limited.
http://en.wikipedia.org/wiki/Single-precision_floating-point_format
For small numbers you can experiment with the program below.
Change the exp variable to set the starting point. The program will show you what the step size is for the range and the first four valid numbers.
#include <iostream>
using namespace std;
int main (int argc, char* argv[])
{
int exp = -27; // <--- !!!!!!!!!!!
// Change this to set starting point for the range
// Starting point will be 2 ^ exp
float f;
unsigned int *d = (unsigned int *)&f; // Brute force to set f in binary format
unsigned int e;
cout.precision(100);
// Calculate step size for this range
e = ((127-23) + exp) << 23;
*d = e;
cout << "Step size = " << fixed << f << endl;
cout << "First 4 numbers in range:" << endl;
// Calculate first four valid numbers in this range
e = (127 + exp) << 23;
*d = e | 0x00000000;
cout << hex << "0x" << *d << " = " << fixed << f << endl;
*d = e | 0x00000001;
cout << hex << "0x" << *d << " = " << fixed << f << endl;
*d = e | 0x00000002;
cout << hex << "0x" << *d << " = " << fixed << f << endl;
*d = e | 0x00000003;
cout << hex << "0x" << *d << " = " << fixed << f << endl;
return 0;
}
For exp = -27 the output will be:
Step size = 0.0000000000000008881784197001252323389053344726562500000000000000000000000000000000000000000000000000
First 4 numbers in range:
0x32000000 = 0.0000000074505805969238281250000000000000000000000000000000000000000000000000000000000000000000000000
0x32000001 = 0.0000000074505814851022478251252323389053344726562500000000000000000000000000000000000000000000000000
0x32000002 = 0.0000000074505823732806675252504646778106689453125000000000000000000000000000000000000000000000000000
0x32000003 = 0.0000000074505832614590872253756970167160034179687500000000000000000000000000000000000000000000000000
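As an alternative to setting the bits through a pointer cast, the step size at a given value can also be obtained with std::nextafter from <cmath>. This is only a sketch with a hypothetical function name, step_size_at, and it assumes x is finite and non-negative.

```cpp
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
// Assumes x is finite and non-negative.
float step_size_at(float x)
{
    const float next = std::nextafter(x, std::numeric_limits<float>::infinity());
    return next - x;
}
```

For x = 2^-27 this gives 2^-50, which matches the step size printed by the program above for exp = -27; for x = 2^24 it gives 2.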
const double dBLEPTable_8_BLKHAR[4096] = {
If you change the double in that line to float, then one of two things will happen:
At compile time, the compiler will convert numbers such as -0.00000000239150987901837200000000 to the float values that best represent them, and will then store that data directly into the array.
At runtime, during the program initialization (before main() is called!) the runtime that the compiler generated will fill that array with data of type float.
Either way, once you get to main() and to code that you've written, all of that data will be stored as float variables.