Evaluation of SHAP outputs

I am working on implementing XAI using the SHAP library. The implementation itself is fine, but if we want to prove that the SHAP outputs are accurate and can eventually be trusted, are there any metrics for this, or how can we make sure that the values we get are truly the correct feature contributions?
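One basic sanity check is SHAP's own local-accuracy (additivity) property: for each sample, the base value plus the per-feature SHAP values should reproduce the model's prediction. Below is a minimal sketch of that check, assuming a scikit-learn tree model and TreeExplainer; the model and data are illustrative placeholders, not part of the original question.

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Train a small model so the attributions can be checked end to end.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)                  # (n_samples, n_features)
base_value = np.atleast_1d(explainer.expected_value)[0]

# Local accuracy: base value + sum of attributions == model prediction.
reconstructed = base_value + shap_values.sum(axis=1)
assert np.allclose(reconstructed, model.predict(X), atol=1e-4)
```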

Related

What are the main rates and values we should compute to evaluate feature detection, description, and matching?

I work on palmprint recognition using feature2D in the OpenCV library, and I use algorithms such as SIFT, SURF, ORB... to detect features and extract/match descriptors. My tests include (1 vs 1) palmprint comparisons and also (1 vs database) searches.
Once I get the results, I need to evaluate the algorithm, and for this I know that there are some rates or scores (like EER, rank-1 identification, recall, and accuracy) which give an estimate of how successful the method was. Now I need to know whether any of those rates are implemented in OpenCV, and how to use them. If they aren't, what are the different formulas used in the literature?
As far as I know there is little implemented in OpenCV. A common way is to store the results (e.g. in JSON) and process those with other programs such as Matlab or Python. This also allows you to change the evaluation without the need to recompute the algorithms.
There is no overall best method to show the results. It always depends on what you want to show. In my opinion ROC is the best way to express your output. It is also very widely used in research.
If you insist on doing it in C++, then you could use:
Roceasy or
DLIB
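Since the answer above suggests dumping the results and doing the evaluation in Python, here is a hedged sketch of computing a ROC curve and the EER from genuine/impostor match scores with scikit-learn; the label and score arrays are illustrative placeholders.

```python
import numpy as np
from sklearn.metrics import roc_curve

# 1 = genuine comparison, 0 = impostor comparison; scores are similarity values.
labels = np.array([1, 1, 0, 0, 1, 0, 0, 1])
scores = np.array([0.90, 0.80, 0.35, 0.40, 0.70, 0.20, 0.55, 0.85])

fpr, tpr, thresholds = roc_curve(labels, scores)
fnr = 1.0 - tpr

# The EER is the operating point where false accepts and false rejects are equal.
idx = np.nanargmin(np.abs(fpr - fnr))
eer = (fpr[idx] + fnr[idx]) / 2.0
print(f"EER ~ {eer:.3f} at threshold {thresholds[idx]:.2f}")
```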

How to write tests for mathematical optimization procedures?

I'm working on a project where I need to minimize functions of several variables, like func(input_parameters, variable_parameters) -> min over variable_parameters.
I use optimization functions from SciPy, so the minimization process is a grey box: I can see the code on GitHub and read about the algorithms used, but I'd prefer to assume it is fine and concentrate on testing my own project.
In any case, the particular library shouldn't matter for this question.
At the moment I use a few approaches:
Create simple examples, find the global/local minima by hand, and write a test that runs the optimization and compares its solution with the known one
If the method needs gradients, compare analytically calculated gradients with their numerical approximation in tests (see the sketch after this list)
For iterative algorithms built on top of the ones provided by SciPy, check in tests that the sequence of function values is monotonically nonincreasing
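A hedged sketch of the gradient check from the second item, using scipy.optimize.check_grad to compare an analytic gradient against a finite-difference approximation; the Rosenbrock function here is just an illustrative stand-in for the real objective.

```python
import numpy as np
from scipy.optimize import check_grad

def rosenbrock(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

def rosenbrock_grad(x):
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ])

def test_gradient_matches_numerical_approximation():
    rng = np.random.default_rng(0)
    for _ in range(10):
        x0 = rng.uniform(-2.0, 2.0, size=2)
        # check_grad returns the 2-norm of (analytic - finite-difference) gradient.
        assert check_grad(rosenbrock, rosenbrock_grad, x0) < 1e-2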
Is there a book or an article about testing of mathematical optimization procedures?
P. S. I'm not talking about Test functions for optimization
, I'm asking about approaches used to test optimization procedure to find bugs faster.
I find the hypothesis library really useful for testing optimisation algorithms in development.
You can set it up to generate random test cases (functions, linear programs, etc) according to some specification. The idea is that you pass these to your algorithm and test for known invariants. For example you could have it throw random problems or subproblems at your algorithm and check that (for example):
Gradient descent methods produce a series of nonincreasing objectives
Local search finds a solution with no better neighbours
Heuristics maintain feasibility
There's a useful PyCon talk here explaining the idea of property based testing. It focuses more on testing APIs than algorithms, but I think the ideas transfer. I've found this approach does a pretty good job finding cases of unexpected behaviour as I'm writing a new algorithm.
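As a hedged illustration of that idea, the sketch below uses hypothesis to generate random convex quadratics and checks the first invariant from the list above (gradient descent produces nonincreasing objectives); the toy solver and all names are illustrative, not part of the original answer.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays

@given(
    A=arrays(np.float64, (3, 3), elements=st.floats(-1, 1)),
    b=arrays(np.float64, (3,), elements=st.floats(-1, 1)),
)
@settings(max_examples=50)
def test_gradient_descent_objectives_nonincreasing(A, b):
    Q = A @ A.T + np.eye(3)             # symmetric positive definite
    f = lambda x: 0.5 * x @ Q @ x - b @ x
    grad = lambda x: Q @ x - b

    step = 1.0 / np.linalg.norm(Q, 2)   # safe step size for a quadratic
    x = np.zeros(3)
    values = [f(x)]
    for _ in range(100):
        x = x - step * grad(x)
        values.append(f(x))

    assert all(nxt <= prev + 1e-9 for prev, nxt in zip(values, values[1:]))
```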

Support Vector Machine works in Matlab, doesn't work in C++

I'm writing an application that uses an SVM to do classification on some images (specifically these). My Matlab implementation works really well. Using a SIFT bag-of-words approach, I'm able to get near 100% accuracy with a linear kernel.
I need to implement this in C++ for speed/portability reasons, and so I've tried using both libsvm and dlib. I've tried multiple SVM types (c_svm, nu_svm, one_class) and multiple kernels (linear, polynomial, rbf). The best I've been able to achieve is around 50% accuracy - even on the same samples that I've trained on. I've confirmed that my feature generators are working, because when I export my c++-generated features to Matlab and train on those, I'm able to get near-perfect results again.
Is there something magical about Matlab's SVM implementation? Are there any common pitfalls or areas that I might look into that would explain the behavior I'm seeing? I know this is a little vague, but part of the problem is that I don't know where to go. Please let me know in the comments if there is other info I can provide that would be helpful.
There is nothing magical about the Matlab version of the libraries, other than that it runs in Matlab, which makes it harder to shoot yourself in the foot.
A checklist:
Are you normalizing your data, making all values lie between 0 and 1 (or between -1 and 1), either linearly or using the mean and the standard deviation? (See the sketch after this checklist.)
Are you searching for a good value of C (or C and gamma in the case of an RBF kernel), using cross-validation or a hold-out set?
Are you sure that you're handling NaN and all other floating-point nastiness? Matlab is very good at hiding this from you; C++ not so much.
Could it be that you're loading your data incorrectly, e.g. reading a "%s" into a double or something else that is adding noise to your input data?
Could it be that libsvm/dlib expects the data in row-major order and you're sending it in column-major (or the other way around)? Again, Matlab makes this almost impossible; C++ not so much.
32/64-bit nastiness: one version of the library, the executable compiled with the other?
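A hedged sketch of the first point: fit the min/max scaling on the training features only and reuse the exact same transform for the test features (what libsvm's svm-scale does). The arrays and the helper name are illustrative.

```python
import numpy as np

def fit_minmax(train_features, lo=-1.0, hi=1.0):
    mins = train_features.min(axis=0)
    maxs = train_features.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)   # avoid divide-by-zero
    return lambda X: lo + (hi - lo) * (X - mins) / span

train = np.random.rand(100, 8) * 10.0
test = np.random.rand(20, 8) * 10.0

scale = fit_minmax(train)
train_scaled = scale(train)
test_scaled = scale(test)    # reuse the *training* min/max, never refit on test
```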
Some other things:
Could it be that in Matlab you're somehow leaking the class (y) into the preprocessing? No one does this on purpose, but I've seen it happen. If you make almost any f(y) a feature, you'll get almost 100% every time.
Sometimes it helps to verify that everything is numerically identical by printing to file before training, both in C++ and Matlab.
I'm very happy with libsvm using the RBF kernel. carlosdc pointed out the most common errors in the correct order :-). For libsvm: did you use the Python tools shipped with libsvm? If not, I recommend doing so. Write your feature vectors to a file (from Matlab and/or C++) and do a meta-training for the RBF kernel with easy.py. You get the parameters and a prediction for the generated model. If this prediction is OK, continue with C++. From training you also get a scaled feature file (min/max transformed to -1.0/1.0 for every feature). Compare these to your C++ implementation as well.
Some libsvm issues: a nasty habit is (if I remember correctly) that values scaling to 0 (zero) are omitted in the scaled file. In grid.py there is a parameter "nr_local_worker" which defines the number of threads; you might wish to increase it.
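For reference, a hedged sketch of writing feature vectors in the sparse text format that libsvm's tools (easy.py, grid.py, svm-train) consume: one line per sample, a label followed by 1-based index:value pairs. The arrays and file name are illustrative.

```python
import numpy as np

def write_libsvm_file(path, features, labels):
    with open(path, "w") as fh:
        for label, row in zip(labels, features):
            # libsvm format omits zero-valued features entirely.
            pairs = " ".join(f"{i + 1}:{v:.6f}" for i, v in enumerate(row) if v != 0.0)
            fh.write(f"{int(label)} {pairs}\n")

features = np.random.rand(10, 4)
labels = np.random.randint(0, 2, size=10)
write_libsvm_file("train.libsvm", features, labels)
```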

What are some good practices for unit testing probability distributions?

I'm working on a project where I need to generate Poisson, Normal, etc. variables from scratch. I know there are implementations in Python. I'm used to writing tests for almost everything I code.
I'm wondering what would be a good practice (if any) to test those functions?
I assume that your implementation is built on top of a uniform-distribution pseudorandom number generator which you trust to be good enough (not only the distribution of the generated values, but also the randomness of their order; see the Diehard tests).
You should build two histograms: The first, based on values generated by your implementation. The second, based on a trusted implementation, or better - based on a maximum-likelihood estimate of the value count in each histogram column of the given distribution.
Next, you can verify that the counts match, for all histogram columns, using a tight confidence interval.
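A hedged sketch of that histogram comparison, using a chi-square goodness-of-fit test to compare the binned counts against the expected Poisson probabilities; my_poisson_sample is an illustrative placeholder for the generator under test.

```python
import numpy as np
from scipy import stats

def my_poisson_sample(lam, size, rng):
    # Stand-in for the hand-rolled generator being tested.
    return rng.poisson(lam, size)

def test_poisson_histogram_matches_expected():
    rng = np.random.default_rng(12345)
    lam, n = 4.0, 50_000
    samples = my_poisson_sample(lam, n, rng)

    # Bin counts 0..10 and lump the tail into one bin so every expected
    # count is large enough for the chi-square approximation.
    ks = np.arange(0, 11)
    observed = np.array([np.sum(samples == k) for k in ks] + [np.sum(samples >= 11)])
    expected = np.append(stats.poisson.pmf(ks, lam), stats.poisson.sf(10, lam)) * n

    _, p_value = stats.chisquare(observed, f_exp=expected)
    assert p_value > 0.001   # fail only on a gross mismatch
```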
What I've done in similar circumstances is a) write a simple histogram routine that plots a histogram of samples, and run it on a few thousand samples to eyeball it; and b) test some key statistics - standard deviation, mean, ... to see that they behave as they should.
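A hedged sketch of point (b), checking the sample mean and standard deviation against the theoretical values; my_normal_sample is an illustrative placeholder for the generator under test.

```python
import numpy as np

def my_normal_sample(mu, sigma, size, rng):
    # Stand-in for the hand-rolled generator being tested.
    return rng.normal(mu, sigma, size)

def test_normal_mean_and_std():
    rng = np.random.default_rng(7)
    mu, sigma, n = 3.0, 2.0, 100_000
    samples = my_normal_sample(mu, sigma, n, rng)

    # Standard error of the mean is sigma / sqrt(n); allow a few multiples of it.
    assert abs(samples.mean() - mu) < 5.0 * sigma / np.sqrt(n)
    assert abs(samples.std(ddof=1) - sigma) < 0.05 * sigma
```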
You could at the very least assert that the returned value is not null and in the range you expect. That still ensures that the methods at least run and don't error out and that they pass a basic sanity check.
You could also gather many values, and assert that you get somewhere close to the expected distribution of values but that would take more work.

What's a good technique to unit test digital audio generation?

I want to unit test a signal generator - let's say it generates a simple sine wave, or does frequency modulation of a signal onto a sine wave. It's easy enough to define sensible test parameters, and it's well known what the output should "look like" - but this is quite hard to test.
I could do (e.g.) a frequency analysis on the output and check that, check the maximum amplitude, etc., but a) this will make the test code significantly more complicated than the code it's testing and b) it doesn't fully test the shape of the output.
Is there an established way to do this?
One way to do this would be to capture a "known good" output and compare bit-for-bit against that. As long as your algorithm is deterministic you should get the same output every time. You might have to recalibrate it occasionally if anything changes, but at least you'll know if it does change at all.
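A hedged sketch of that golden-output approach: render the signal with fixed parameters and compare it sample-for-sample against a stored reference; generate_sine and the reference file name are illustrative placeholders.

```python
import numpy as np

def generate_sine(freq_hz, sample_rate, n_samples, amplitude=1.0):
    t = np.arange(n_samples) / sample_rate
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)

def test_sine_matches_golden_output():
    output = generate_sine(440.0, 44100, 4410)

    reference_file = "golden_sine_440hz.npy"   # captured once from a known-good run
    try:
        reference = np.load(reference_file)
    except FileNotFoundError:
        np.save(reference_file, output)        # first run: record the golden output
        reference = output

    # A deterministic generator should reproduce the reference exactly;
    # fall back to np.allclose if the platform's math library differs slightly.
    assert np.array_equal(output, reference)
```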
This situation is a strong argument for a modeling tool like Matlab, to generate and review a well-understood test set automatically, as well as to provide an environment for automatic comparison and scoring. Especially where combinatorial explosions of test variations take place, automation makes it possible and straightforward to generate a huge dataset, locate problems, and pare it back if needed to a representative qualification test set.
Often undervalued is the ability to generate large, extensive tests exercising both the requirements and the limits of your implementation. Thinking about and designing those cases up front is also a huge advantage in delivering a clean, problem-free system.
One possible semi-automated way of testing is to code up your signal generators from the spec using 3 different algorithms, or perhaps 3 different programmers in 3 different programming languages. Then randomly generate parameters within the complete range of legal control input values and capture and compare the outputs of all 3 generators to see whether they agree within some error bound. You could also include some typical and some suspected worst-case parameters. If the outputs always agree, there is a much higher probability that everything works per spec than if they don't.
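A hedged sketch of that N-version comparison with two independent formulations of the same generator (the third is omitted for brevity); the generator names, parameter ranges, and error bound are illustrative.

```python
import numpy as np

def sine_v1(freq, rate, n):
    t = np.arange(n) / rate
    return np.sin(2.0 * np.pi * freq * t)

def sine_v2(freq, rate, n):
    # Deliberately different formulation: incremental phase accumulation.
    phase_step = 2.0 * np.pi * freq / rate
    return np.sin(np.cumsum(np.full(n, phase_step)) - phase_step)

def test_independent_implementations_agree():
    rng = np.random.default_rng(42)
    for _ in range(100):
        freq = rng.uniform(20.0, 20000.0)   # random legal control input
        a = sine_v1(freq, 44100, 1024)
        b = sine_v2(freq, 44100, 1024)
        assert np.max(np.abs(a - b)) < 1e-6
```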