c++ - Implementing a multivariate probability density function for a likelihood filter

I'm trying to construct a multivariate likelihood function in C++, with the aim of comparing multiple temperature simulations against observations for consistency while taking into account autocorrelation between the time steps. I am inexperienced in C++ and have been struggling to understand how to write the equation in code. I have the covariance matrix, the simulations I wish to judge, and the observations to compare against. The equation is as follows:
f(x, μ, Σ) = (1 / √(|Σ| (2π)^d)) * exp(−(1/2) (x − μ) Σ^(−1) (x − μ)')
So I need to find the determinant and the inverse of the covariance matrix. Does anyone know how to do that in C++ if x, μ, and Σ are all specified?
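For context, here is a minimal sketch of what this could look like using the Armadillo library (Armadillo is an assumption on my part, it is not named in the question; any linear algebra library that provides a determinant and an inverse would work the same way). The names mvn_log_density, x, mu, and Sigma are illustrative only. It evaluates the log of the density above, which is usually safer numerically when the dimension d is large.

#include <armadillo>
#include <cmath>
#include <iostream>

// Log of the multivariate normal density f(x; mu, Sigma).
// Working in log space avoids underflow for large dimension d.
double mvn_log_density(const arma::vec& x, const arma::vec& mu, const arma::mat& Sigma)
{
    const double d = static_cast<double>(x.n_elem);
    const arma::vec diff = x - mu;

    // log|Sigma|, computed without forming the (possibly tiny or huge) determinant directly
    double log_det_val, sign;
    arma::log_det(log_det_val, sign, Sigma);

    // (x - mu)' * Sigma^{-1} * (x - mu)
    const double quad = arma::as_scalar(diff.t() * arma::inv_sympd(Sigma) * diff);

    return -0.5 * (d * std::log(2.0 * arma::datum::pi) + log_det_val + quad);
}

int main()
{
    // Toy 2-D example; replace with your observations, simulation mean and covariance.
    arma::vec x     = {1.0, 0.5};
    arma::vec mu    = {0.8, 0.4};
    arma::mat Sigma = {{1.0, 0.3},
                       {0.3, 2.0}};

    const double log_f = mvn_log_density(x, mu, Sigma);
    std::cout << "log density = " << log_f
              << ", density = " << std::exp(log_f) << std::endl;
    return 0;
}

If you only need to compare simulations against each other, comparing log densities directly (rather than exponentiating) avoids underflow entirely.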

I have found a few examples and resources to follow
https://github.com/dirkschumacher/rcppglm
https://www.youtube.com/watch?v=y8Kq0xfFF3U&t=953s
https://www.codeproject.com/Articles/25335/An-Algorithm-for-Weighted-Linear-Regression
https://www.geeksforgeeks.org/regression-analysis-and-the-best-fitting-line-using-c/
https://cppsecrets.com/users/489510710111510497118107979811497495464103109971051084699111109/C00-MLPACK-LinearRegression.php
https://stats.stackexchange.com/questions/146230/how-to-implement-glm-computationally-in-c-or-other-languages

Related

Weighted Matrix Factorization

So I have this weird function which can be formulated into an objective function as shown in the attached image. I know I can't use any L2 low-rank matrix factorization (LRMF) algorithms to solve this, as this is a weighted low-rank matrix factorization (weighted LRMF). I have looked at iteratively reweighted least squares and weighted principal component analysis, but unfortunately they are available in MATLAB only. Does anyone know any Python functions I can use to solve this problem? My project does not require me to write an L2 LRMF algorithm and merely requires me to solve it using existing functions. Any help would be ideal.

Randomized incremental algorithm for linear optimization

I'm trying to figure out a randomized incremental algorithm for a huge set of equations (around 60K). The function I'm attempting to create takes in the number of inequalities, the coefficient matrix, the right-hand side, and the coefficients of the objective function. What I wanted to ask is which algorithm would work best; I've tried to do research on this, but I just can't seem to find one that would work.

C++ armadillo not correctly solving poorly conditioned matrix

I have a relatively simple question regarding the linear solver built into Armadillo. I am a relative newcomer to C++ but have experience coding in other languages. I am solving a fluid flow problem by successive linearization, using the Armadillo function solve(A, b) to get the solution at each iteration.
The issue that I am running into is that my matrix is very ill-conditioned. The determinant is on the order of 10^-20 and the condition number is 75000. I know these are terrible conditions but it's what I've got. Does anyone know if it is possible to specify the precision in my A matrix and in the solve function to something beyond double (long double perhaps)? I know that there are double matrix classes in Armadillo but I haven't found any documentation for higher levels of precision.
To approach this from another angle, I wrote some code in Mathematica and the LinearSolve worked very well and the program converged to the correct answer. My reasoning is that Mathematica variables have higher precision which can handle the higher levels of rounding error.
If anyone has any insight on this, please let me know. I know there are other ways to approach a poorly conditioned matrix (like preconditioning and pivoting), but my work is more in the physics than in the actual numerical solution so I'm trying to steer clear of that.
EDIT: I just limited the precision in the Mathematica version to 15 decimal places and the program still converges. This leads me to believe it is NOT a variable precision question but rather an issue with the method.
As you said "your work is more in the physics": rather than trying to increase the accuracy, I would use the Moore-Penrose Pseudo-Inverse, which in Armadillo can be obtained by the function pinv. You should then experience a bit with the parameter tolerance to set it to a reasonable level.
The geometrical interpretation is as follows: bad condition numbers are due to the fact that the row/column vectors are (nearly) linearly dependent. In physics, such linear dependencies usually have an origin which at least needs to be interpreted. The pseudo-inverse first projects the matrix onto a lower-dimensional space in which the vectors are "less linearly dependent", by dropping all singular vectors with singular values smaller than the tolerance parameter. The resulting matrix has a better condition number, so the standard inverse can be constructed with fewer problems.
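A short sketch of what this looks like in code; the matrix, right-hand side, and tolerance value below are placeholders, not taken from the original problem.

#include <armadillo>
#include <iostream>

int main()
{
    arma::mat A;        // your (nearly singular) system matrix
    arma::vec b;        // right-hand side
    A.randu(100, 100);  // placeholder data for illustration only
    b.randu(100);

    // Singular values below tol are treated as zero, i.e. the corresponding
    // (nearly) linearly dependent directions are projected out.
    const double tol = 1e-10;
    arma::mat A_pinv = arma::pinv(A, tol);

    arma::vec x = A_pinv * b;
    std::cout << "residual norm: " << arma::norm(A * x - b) << std::endl;
    return 0;
}

Varying tol and watching how the solution and residual change is a good way to see which directions in the system are physically meaningful and which are numerical noise.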

Better understanding of cosine similarity

I am doing a little research on text mining and data mining. I need more help in understanding cosine similarity. I have read about it and noticed that all of the examples given on the internet use tf-idf before computing the cosine similarity.
My questions:
1. Is it possible to calculate cosine similarity just by using the highest frequency distribution from a text file (which will be the dataset)? Most of the videos and tutorials that I go through run tf-idf before inputting the data into cosine similarity. If not, what other types of equations/algorithms can be input into cosine similarity?
2. Why is normalization used with tf-idf to compute cosine similarity? (Can I do it without normalization?) Cosine similarity is computed from the normalized tf-idf output. Why is normalization needed?
3. What does cosine similarity actually do to the tf-idf weights?
I do not understand question 1.
TF-IDF weighting is a weighting scheme that has worked well for lots of people on real data (think Lucene search). But its theoretical foundations are a bit weak. In particular, everybody seems to be using a slightly different version of it... and yes, it is weights + cosine similarity. In practice, you may want to try e.g. Okapi BM25 weighting instead, though.
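For illustration, here is one possible tf-idf variant in C++ (a sketch only; as noted above there are many slightly different formulations, and this one simply uses raw term frequency times log(N / document frequency); the names Document and tf_idf are hypothetical helpers).

#include <cmath>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Document = std::map<std::string, int>;  // term -> raw count in one document

std::map<std::string, double> tf_idf(const Document& doc,
                                     const std::vector<Document>& corpus)
{
    const double N = static_cast<double>(corpus.size());
    std::map<std::string, double> weights;
    for (const auto& [term, count] : doc) {
        // document frequency: how many documents in the corpus contain the term
        double df = 0.0;
        for (const auto& d : corpus)
            if (d.count(term)) df += 1.0;
        weights[term] = count * std::log(N / df);
    }
    return weights;
}

int main()
{
    std::vector<Document> corpus = {
        {{"cat", 2}, {"sat", 1}, {"mat", 1}},
        {{"dog", 3}, {"sat", 1}},
        {{"cat", 1}, {"dog", 1}},
    };
    for (const auto& [term, w] : tf_idf(corpus[0], corpus))
        std::cout << term << ": " << w << "\n";
    return 0;
}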
I do not understand this question either. Angular similarity is beneficial because the length of the text has less influence than with other distances. Furthermore, sparsity can be nicely exploited. As for the weights, IDF is a heuristic with only loose statistical arguments: frequent words are more likely to occur at random, and thus should have less weight.
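And a minimal sketch of the cosine similarity itself, which works the same way whether the per-term weights are raw frequencies or tf-idf values (the vectors below are illustrative, not from any real corpus):

#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Cosine similarity between two term-weight vectors over the same vocabulary.
double cosine_similarity(const std::vector<double>& a, const std::vector<double>& b)
{
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;  // avoid division by zero
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

int main()
{
    // Two documents over a 4-term vocabulary; values are illustrative weights only.
    std::vector<double> doc1 = {0.0, 1.5, 3.0, 0.5};
    std::vector<double> doc2 = {1.0, 1.0, 2.5, 0.0};
    std::cout << cosine_similarity(doc1, doc2) << std::endl;  // in [0, 1] for non-negative weights
    return 0;
}

The division by both vector norms is where the "normalization" happens: only the direction of the weight vectors matters, not their magnitude, which is why document length has little influence.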
Maybe you can try to rephrase your questions so I can fully understand them. Also search for related questions such as these: Cosine similarity and tf-idf and
Better text documents clustering than tf/idf and cosine similarity?

Is there any free ITERATIVE linear system solver in c++ that allows me to feed in an arbitrary initial guess?

I am looking for an iterative linear system solver to calculate a continuously changing field. For the simulation to work properly, I need to re-calculate the field (maybe several times) at every time step. Fortunately, I have a good initial guess for each time step, so it would be better if I could feed it into an iterative solver. Also, the coefficient matrix is very dense.
The problem is that I checked several iterative solvers online, like Gmm++, IML++, ITL, DUNE/ISTL and so on. They are either for sparse systems or don't provide interfaces for inputting initial guesses (I might be wrong, since I didn't have time to go through all the documentation).
So I have two questions:
1. Is there any such C++ solver available online?
2. Since the coefficient matrix can be as large as thousands × thousands, could a direct solver be quicker than an iterative solver with a really good initial guess?
Great Thanks!
If you check the header for Conjugate Gradient in IML++ (http://math.nist.gov/iml++/cg.h.txt), you'll see that you can very easily provide the initial guess for the solution in the very variable where you'd expect to get the solution.
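To illustrate the pattern (this is not IML++'s actual code), below is a minimal, self-contained conjugate gradient in C++ using Armadillo, where x carries the initial guess on entry and the solution on exit, just as in cg.h. Note that plain CG assumes a symmetric positive definite matrix; for a general dense system you would want a method such as GMRES or BiCGSTAB, which follow the same initial-guess convention.

#include <armadillo>
#include <cmath>
#include <iostream>

// Solve A*x = b for symmetric positive definite A.
// On entry x is the initial guess; on exit it holds the (approximate) solution.
// Returns the number of iterations performed.
int conjugate_gradient(const arma::mat& A, arma::vec& x, const arma::vec& b,
                       int max_iter, double tol)
{
    arma::vec r = b - A * x;          // residual for the supplied initial guess
    arma::vec p = r;
    double rs_old = arma::dot(r, r);

    for (int it = 0; it < max_iter; ++it) {
        if (std::sqrt(rs_old) < tol) return it;   // converged
        arma::vec Ap = A * p;
        const double alpha = rs_old / arma::dot(p, Ap);
        x += alpha * p;
        r -= alpha * Ap;
        const double rs_new = arma::dot(r, r);
        p = r + (rs_new / rs_old) * p;
        rs_old = rs_new;
    }
    return max_iter;
}

int main()
{
    // Small SPD test system; in the real application x would be the field from
    // the previous time step, i.e. the good initial guess.
    arma::mat A = {{4.0, 1.0}, {1.0, 3.0}};
    arma::vec b = {1.0, 2.0};
    arma::vec x = {0.5, 0.5};          // initial guess fed straight into the solver

    const int iters = conjugate_gradient(A, x, b, 100, 1e-10);
    std::cout << "converged in " << iters << " iterations, x = " << x.t();
    return 0;
}

A good initial guess shows up directly in the first residual r = b - A*x: the closer the guess, the smaller the starting residual and the fewer iterations are needed.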