In GTest, for parameterized tests created with INSTANTIATE_TEST_SUITE_P, I use the ::testing::Combine method extensively to generate thousands of tests. Is there a comparable feature in other unit test frameworks? I may need to use a different framework than GTest, but would hate to lose the Combine feature.
Here is GTest's own description of Combine, in case it's helpful:
// Combine() allows the user to combine two or more sequences to produce
// values of a Cartesian product of those sequences' elements.
//
// Synopsis:
// Combine(gen1, gen2, ..., genN)
// - returns a generator producing sequences with elements coming from
// the Cartesian product of elements from the sequences generated by
// gen1, gen2, ..., genN. The sequence elements will have a type of
// tuple<T1, T2, ..., TN> where T1, T2, ..., TN are the types
// of elements from sequences produces by gen1, gen2, ..., genN.
//
// Combine can have up to 10 arguments. This number is currently limited
// by the maximum number of elements in the tuple implementation used by Google
// Test.
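For reference, the behaviour described in the comment above can be sketched without any test framework. This is a minimal sketch, not GTest's implementation: `cartesian_product` is a hypothetical helper name, and the iteration order (first sequence varies slowest) is an assumption chosen for the example.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Base case: a single sequence becomes a sequence of 1-tuples.
template <typename T>
std::vector<std::tuple<T>> cartesian_product(const std::vector<T>& xs) {
    std::vector<std::tuple<T>> out;
    out.reserve(xs.size());
    for (const T& x : xs) out.emplace_back(x);
    return out;
}

// Recursive case: prepend each element of the first sequence to every
// tuple of the Cartesian product of the remaining sequences.
template <typename T, typename... Rest>
std::vector<std::tuple<T, Rest...>> cartesian_product(
        const std::vector<T>& xs, const std::vector<Rest>&... rest) {
    const auto tails = cartesian_product(rest...);
    std::vector<std::tuple<T, Rest...>> out;
    out.reserve(xs.size() * tails.size());
    for (const T& x : xs)
        for (const auto& tail : tails)
            out.push_back(std::tuple_cat(std::make_tuple(x), tail));
    return out;
}
```

A framework that accepts a plain container of parameters could then be fed the resulting vector of tuples, one test case per element.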
In a property-based test setting like Haskell's quickcheck for custom data structures, how do you generate test data for n-ary properties of relations, e.g., transitivity or symmetry? The implementation language does not matter, I think.
Here is a naive C++ example using rapidcheck (just because I have this tool at hand, right now):
rc::check("Double equality is symmetric.", [](double a, double b) {
RC_ASSERT(!(a == b) || (b == a)); // a == b ==> b == a
});
In such a naive case it is quite unlikely that the tool will generate many examples where the premise (a == b) actually holds, so you end up wasting a lot of effort on meaningless tests. It gets even worse for 3-ary relations like transitivity.
Is there a general technique to tackle these issues? Do I need to generate equal pairs (for some constructive definition of "equals")? What about stuff like orderings?
What I do to raise the probability of value clashes is to restrict value generation to a smaller range and sometimes combine that with a more general generator.
Consider the following generator adapted from https://johanneslink.net/how-to-specify-it/#46-a-note-on-generation:
@Provide
Arbitrary<Integer> keys() {
return Arbitraries.oneOf(
Arbitraries.integers().between(-25, 25),
Arbitraries.integers()
);
}
Generation will first choose with equal probability between any integer and an integer between -25 and +25. Thus about every 100th value will be a duplicate.
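Since the question used C++, here is a rough equivalent of that generator built on the standard <random> header. It is only a sketch under my own assumptions: `next_key` and `count_equal_pairs` are hypothetical names, and the second function merely measures how often two independently drawn values collide, i.e. how often the premise a == b would actually be exercised.

```cpp
#include <cassert>
#include <limits>
#include <random>

// Hypothetical counterpart of the keys() generator: with equal probability,
// return a value from the narrow range [-25, 25] or an arbitrary int.
template <typename Rng>
int next_key(Rng& rng) {
    std::bernoulli_distribution pick_narrow(0.5);
    if (pick_narrow(rng)) {
        return std::uniform_int_distribution<int>(-25, 25)(rng);
    }
    return std::uniform_int_distribution<int>(
        std::numeric_limits<int>::min(),
        std::numeric_limits<int>::max())(rng);
}

// Counts how many of n independently generated pairs (a, b) satisfy a == b.
template <typename Rng>
int count_equal_pairs(Rng& rng, int n) {
    assert(n >= 0);
    int equal = 0;
    for (int i = 0; i < n; ++i) {
        const int a = next_key(rng);
        const int b = next_key(rng);
        if (a == b) ++equal;
    }
    return equal;
}
```

With this mix, a pair collides mainly when both values come from the narrow range (probability 1/4) and then match (probability 1/51), so shrinking the narrow range or raising its weight makes equal pairs more frequent.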
In more difficult cases I might even have a generator that picks from a small set of predefined values.
UPDATE: The latest version of jqwik lets you explicitly generate duplicates with a given probability: https://jqwik.net/docs/snapshot/user-guide.html#inject-duplicate-values
I don't know, though, if QuickCheck or any other PBT library has a similar feature.
I've been asked to perform a linear discriminant analysis on a set of data for one of my projects. I'm using ALGLIB (C++ version) which has a fisherlda function but I need some help understanding how to use it.
The user answers a set of 6 questions (answers are a number from 1-7) which gives me an example data set of e.g. {1,2,3,4,5,6}. I then have 5 classes of 6 values each e.g. {0.765, 0.895, 1.345, 2.456, 0.789, 5.678}.
The fisherlda function takes a 2-dimensional array of values and returns a 1-dimensional array of values (I have no idea what they mean).
As I understand it, I need to see which class the user's answers fit best?
Any help understanding LDA and/or how I can use this function would be greatly appreciated.
EDIT:
Here's the definition of the function I'm trying to use:
/*************************************************************************
Multiclass Fisher LDA
Subroutine finds coefficients of linear combination which optimally separates
training set on classes.
INPUT PARAMETERS:
XY - training set, array[0..NPoints-1,0..NVars].
First NVars columns store values of independent
variables, next column stores number of class (from 0
to NClasses-1) which dataset element belongs to. Fractional
values are rounded to nearest integer.
NPoints - training set size, NPoints>=0
NVars - number of independent variables, NVars>=1
NClasses - number of classes, NClasses>=2
OUTPUT PARAMETERS:
Info - return code:
* -4, if internal EVD subroutine hasn't converged
* -2, if there is a point with class number
outside of [0..NClasses-1].
* -1, if incorrect parameters was passed (NPoints<0,
NVars<1, NClasses<2)
* 1, if task has been solved
* 2, if there was a multicollinearity in training set,
but task has been solved.
W - linear combination coefficients, array[0..NVars-1]
-- ALGLIB --
Copyright 31.05.2008 by Bochkanov Sergey
*************************************************************************/
void fisherlda(const real_2d_array &xy, const ae_int_t npoints, const ae_int_t nvars, const ae_int_t nclasses, ae_int_t &info, real_1d_array &w);
You are using the fisherlda function, which is an implementation of the LDA algorithm.
LDA (Linear Discriminant Analysis) aims to find a linear combination of features that best characterizes or separates two or more classes of objects or events.
Assume the line y = wx (w and x both stand for matrices here). The result of fisherlda is a 1D array of coefficients, which is w. You can then use this line to determine which class the answers belong to.
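To make the last step concrete, here is a minimal sketch of how w could be used once fisherlda has returned it. It does not call ALGLIB. The nearest-projected-mean rule in `nearest_class` is my own assumption about a reasonable decision rule, not something the ALGLIB documentation prescribes, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Projects a sample x onto the direction w (the coefficients returned by
// fisherlda): y = w . x
double project(const std::vector<double>& w, const std::vector<double>& x) {
    assert(w.size() == x.size());
    double y = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) y += w[i] * x[i];
    return y;
}

// Assumed decision rule: assign x to the class whose projected mean is
// closest to the projection of x. class_means[k] is the mean vector of class k.
std::size_t nearest_class(const std::vector<double>& w,
                          const std::vector<std::vector<double>>& class_means,
                          const std::vector<double>& x) {
    assert(!class_means.empty());
    const double y = project(w, x);
    std::size_t best = 0;
    double best_dist = std::abs(project(w, class_means[0]) - y);
    for (std::size_t k = 1; k < class_means.size(); ++k) {
        const double d = std::abs(project(w, class_means[k]) - y);
        if (d < best_dist) {
            best_dist = d;
            best = k;
        }
    }
    return best;
}
```

In the question's setting, x would be the user's six answers and class_means the five 6-value class vectors, with w computed from a training set laid out as the fisherlda documentation describes.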
In my business field, I have values that are expressed in W.m^-2.K^-1.
From a 'base dimensions' point of view, these values are expressed in kg.s^-3.K^-1 (W = kg.m^2.s^-3).
How do I implement this dimension and this unit with Boost.Units?
The only examples I found, including the official documentation, were about deriving dimensions from base dimensions, but I'd like to derive from the 'power' dimension, which is itself a derived dimension.
Also, I don't know if I have to derive from the power dimension, or if I must derive my dimension from the base ones and set my dimension's unit so that it is expressed in W.m^-2.K^-1. I foresee that the latter would be more difficult to manipulate (getting the number of watts, given an area and a temperature, would not be trivial given that my 'base-derived' dimension is about kilograms and seconds...).
Thanks.
You can use the unit operators to manipulate higher-level dimensions and then typedef them into something useful. These unit operators are available in the <boost/units/operators.hpp> header file.
Examples are available in the documentation, and they are used to create high-level dimensions for physical constants in <boost/units/systems/si/codata/typedefs.hpp>:
typedef divide_typeof_helper<frequency,electric_potential>::type frequency_over_electric_potential;
typedef divide_typeof_helper<electric_charge,mass>::type electric_charge_over_mass;
typedef divide_typeof_helper<mass,amount>::type mass_over_amount;
and for your specific case:
typedef divide_typeof_helper< power , area >::type power_over_area;
typedef divide_typeof_helper< power_over_area, temperature >::type heat_transfer_coeff;
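On the worry about deriving from power versus deriving from base dimensions: both routes end at the same dimension, because dividing dimensions just subtracts exponents. The following library-independent sketch illustrates that arithmetic. It is not the Boost.Units API; `Dim` and `Divide` are hypothetical names made up for this example.

```cpp
#include <type_traits>

// Hypothetical dimension type holding the exponents of
// mass (kg), length (m), time (s) and temperature (K).
template <int M, int L, int T, int K>
struct Dim {
    static constexpr int mass = M;
    static constexpr int length = L;
    static constexpr int time = T;
    static constexpr int temperature = K;
};

// Dividing two dimensions subtracts their exponents.
template <typename A, typename B>
using Divide = Dim<A::mass - B::mass, A::length - B::length,
                   A::time - B::time, A::temperature - B::temperature>;

using Power = Dim<1, 2, -3, 0>;        // W = kg.m^2.s^-3
using Area = Dim<0, 2, 0, 0>;          // m^2
using Temperature = Dim<0, 0, 0, 1>;   // K

using PowerOverArea = Divide<Power, Area>;
using HeatTransferCoeff = Divide<PowerOverArea, Temperature>;

// Deriving from power yields exactly kg.s^-3.K^-1.
static_assert(std::is_same<HeatTransferCoeff, Dim<1, 0, -3, -1>>::value,
              "W.m^-2.K^-1 should reduce to kg.s^-3.K^-1");
```

So the typedef built from power, area and temperature describes the same quantity as one assembled from kilograms, seconds and kelvins, and you can pick whichever reads best.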
I developed the same algorithm (Baum-Welch for estimating parameters of a hidden Markov model) both in F# (.Net) and C++. In both cases I developed the same test that generates random test data with known distribution and then uses the algorithm to estimate the parameters, and makes sure it converges to the known right answer.
The problem is that the test works fine in the F# case, but fails to converge in the C++ implementation. I compared both algorithms on some real-world data and they give the same results, so my guess is that the generation of the test data is broken in the C++ case. Hence my question: What is the random number generator that comes with .Net 4 (I think this is the default version with VS2010)?
In F# I am using:
let random = new Random()
let randomNormal () = //for a standard normal random variable
let u1 = random.NextDouble()
let u2 = random.NextDouble()
let r = sqrt (-2. * (log u1))
let theta = 2. * System.Math.PI * u2
r * (sin theta)
//random.NextDouble() for uniform random variable on [0-1]
In C++ I use the standard Boost classes:
class HmmGenerator
{
public:
HmmGenerator() :
rng(37), //the seed does change the result, but it doesn't make it work
normalGenerator(rng, boost::normal_distribution<>(0.0, 1.0)),
uniformGenerator(rng, boost::uniform_01<>()) {}//other stuff here as well
private:
boost::mt19937 rng;
boost::variate_generator<boost::mt19937&,
boost::normal_distribution<> > normalGenerator;
boost::variate_generator<boost::mt19937&,
boost::uniform_01<> > uniformGenerator;
};
Should I expect different results using these two ways of generating random numbers?
EDIT: Also, is the generator used in .Net available in Boost (ideally with the same parameters), so I could run it in C++ and compare the outcomes?
Hence my question: What is the random number generator that comes with .Net 4 (I think this is the default version with VS2010)?
From the documentation on Random
The current implementation of the Random class is based on Donald E. Knuth's subtractive random number generator algorithm. For more information, see D. E. Knuth. "The Art of Computer Programming, volume 2: Seminumerical Algorithms". Addison-Wesley, Reading, MA, second edition, 1981.
Should I expect different results using these two ways of generating random numbers?
The Mersenne-Twister algorithm you're using in C++ is considered very respectable, compared to other off-the-shelf random generators.
I suspect any discrepancy in your code lies elsewhere.
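One way to narrow this down is to run the same Box-Muller transform as the F# code on top of the Mersenne Twister, so that only the uniform source differs between the two programs. The sketch below uses the standard <random> header rather than Boost, and the function names are hypothetical. The individual numbers will not match the .NET sequence, only the distribution should. The shift to (0, 1] is my addition to keep log(u1) finite.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Box-Muller transform mirroring the F# randomNormal function.
// Expects u1 in (0, 1] and u2 in [0, 1).
double box_muller(double u1, double u2) {
    assert(u1 > 0.0 && u1 <= 1.0);
    const double pi = 3.14159265358979323846;
    const double r = std::sqrt(-2.0 * std::log(u1));
    const double theta = 2.0 * pi * u2;
    return r * std::sin(theta);
}

// Draws one standard normal variate from any uniform random bit generator.
template <typename Rng>
double random_normal(Rng& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);  // [0, 1)
    const double u1 = 1.0 - uniform(rng);  // shift to (0, 1] so log(u1) is finite
    const double u2 = uniform(rng);
    return box_muller(u1, u2);
}
```

If the C++ test converges with this generator but not with the original one, the problem is in how the test data is generated. If it still fails, the estimation code is the more likely culprit.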
I'm trying to write a Monte Carlo simulation. In my simulation I need to generate many random variates from a discrete probability distribution.
I do have a closed-form solution for the distribution and it has finite support; however, it is not a standard distribution. I am aware that I could draw a uniform[0,1) random variate and compare it to the CDF to get a random variate from my distribution, but the parameters of the distribution are always changing. Using this method is too slow.
So I guess my question has two parts:
Is there a method/algorithm to quickly generate finite, discrete random variates without using the CDF?
Is there a Python module and/or a C++ library which already has this functionality?
Acceptance/Rejection:
Find a function that is always higher than the pdf. Generate 2 Random variates. The first one you scale to calculate the value, the second you use to decide whether to accept or reject the choice. Rinse and repeat until you accept a value.
Sorry I can't be more specific, but I haven't done it for a while..
It's a standard algorithm, but I'd personally implement it from scratch, so I'm not aware of any implementations.
Indeed, acceptance/rejection is the way to go if you know your pdf analytically. Let's call it f(x). Find a pdf g(x) for which there exists a constant c such that c.g(x) >= f(x) everywhere, and such that you know how to simulate a variable with pdf g(x). For example, as you work with a distribution with finite support, a uniform will do: g(x) = 1/(size of your domain) over the domain.
Then draw a couple (G, U) such that G is simulated with pdf g(x), and U is uniform on [0, c.g(G)]. Then, if U < f(G), accept G as your variable. Otherwise draw again. The G you finally accept will have f as its pdf.
Note that the constant c determines the efficiency of the method. The smaller c is, the more efficient you will be: on average you will need c draws to get one accepted variable. It is best to pick a function g that is simple enough (don't forget you need to draw variables using g as a pdf) but with the smallest possible c.
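For a finite support {0, ..., n-1} with a uniform proposal, the scheme can be sketched as follows. This is a minimal sketch, not a tuned implementation: `sample_by_rejection` is a hypothetical name, the pmf is passed as a vector (it may be unnormalized), and the bound c.g(x) is taken as the largest pmf value, found here by a linear scan. If you know an analytical upper bound on f, you could use that instead and skip the scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Acceptance-rejection on the finite support {0, ..., n-1} with a uniform
// proposal g(x) = 1/n. pmf[x] plays the role of f(x) and may be unnormalized.
template <typename Rng>
std::size_t sample_by_rejection(const std::vector<double>& pmf, Rng& rng) {
    assert(!pmf.empty());
    // With c = n * max f, the envelope c * g(x) is simply max f.
    const double f_max = *std::max_element(pmf.begin(), pmf.end());
    assert(f_max > 0.0);
    std::uniform_int_distribution<std::size_t> propose(0, pmf.size() - 1);
    std::uniform_real_distribution<double> height(0.0, f_max);
    for (;;) {
        const std::size_t g = propose(rng);  // candidate G drawn from g
        const double u = height(rng);        // U uniform on [0, c * g(G))
        if (u < pmf[g]) return g;            // accept the candidate G
    }
}
```

The expected number of proposals per accepted value is n times max f divided by the sum of f, so the method works best when the pmf is not too peaked.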
If acceptance/rejection is also too inefficient, you could try a Markov chain Monte Carlo (MCMC) method. These generate a sequence of samples, each one dependent on the previous one, so by keeping only one sample per block you obtain a more or less independent set. They only need the pdf, or even just a multiple of it. They usually work with fixed distributions, but can also be adapted to slowly changing ones.
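As an illustration of the MCMC idea, here is a minimal sketch of a single Metropolis step on the finite support {0, ..., n-1}. The uniform proposal is my own assumption, picked for simplicity, and `metropolis_step` is a hypothetical name. Only ratios of weights are used, so the weights do not have to be normalized.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One Metropolis step on {0, ..., n-1} with a uniform (hence symmetric)
// proposal. weights[x] is proportional to the target probability of x.
template <typename Rng>
std::size_t metropolis_step(const std::vector<double>& weights,
                            std::size_t current, Rng& rng) {
    assert(current < weights.size());
    assert(weights[current] > 0.0);
    std::uniform_int_distribution<std::size_t> propose(0, weights.size() - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const std::size_t candidate = propose(rng);
    // Accept with probability min(1, weights[candidate] / weights[current]).
    if (unit(rng) * weights[current] < weights[candidate]) return candidate;
    return current;
}
```

Calling this repeatedly and keeping, say, every k-th state gives the subsampled sequence described above. The start state must have positive weight, and successive states are correlated, so k has to be chosen with the application in mind.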