I have to generate an integer number z which can take the values +1 or -1: it represents the charge of muons. The problem is that positive muons are 1.3 times more abundant than the negative ones. How can I generate the charges with that distribution in Fortran?
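One possible approach, sketched here in C++ rather than Fortran: if positive muons are 1.3 times as abundant as negative ones, the probability of drawing +1 is 1.3 / (1 + 1.3), roughly 0.565. A uniform random number in [0, 1) is then compared against that threshold. The names `positive_fraction` and `sample_charge` are made up for this illustration. The same comparison should carry over to Fortran using the `random_number` intrinsic for the uniform draw.

```cpp
#include <random>

// Probability of a positive charge when N(+)/N(-) = ratio.
// ratio = 1.3 gives 1.3 / 2.3, roughly 0.565.
double positive_fraction(double ratio)
{
    return ratio / (1.0 + ratio);
}

// Draws one charge: +1 with probability p_plus, otherwise -1.
// Engine is any standard random engine, e.g. std::mt19937.
template <class Engine>
int sample_charge(Engine& eng, double p_plus)
{
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(eng) < p_plus ? +1 : -1;
}
```

Typical use would be to seed a `std::mt19937` once and call `sample_charge(eng, positive_fraction(1.3))` for each muon.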
I am trying to get a better understanding of the outputs given by Google's sentiment analysis API. It takes in a sentence and gives out two values - magnitude and score. I am trying to interpret the magnitude value better. Magnitude is defined in the documentation as -
A non-negative number in the [0, +inf) range, which represents the absolute magnitude of sentiment regardless of score (positive or negative).
Initially, I thought it was a confidence score or weight, but I am not sure how the value would change since it can be ANY number. Does anybody know how it is calculated or what it means apart from the definition provided in the documentation?
magnitude indicates the overall strength of emotion (both positive and negative) within the given text, between 0.0 and +inf. Unlike score, magnitude is not normalized; each expression of emotion within the text (both positive and negative) contributes to the text's magnitude (so longer text blocks may have greater magnitudes) (Ref).
The score of a document's sentiment indicates the overall emotion of a document. The magnitude of a document's sentiment indicates how much emotional content is present within the document, and this value is often proportional to the length of the document (Ref).
A document with a neutral score (around 0.0) may indicate a low-emotion document, or may indicate mixed emotions, with both high positive and negative values which cancel each other out. Generally, you can use magnitude values to disambiguate these cases, as truly neutral documents will have a low magnitude value, while mixed documents will have higher magnitude values (Ref).
I'm using feed forward, gradient descent, backpropagation neural networks
where hidden/output neurons are using tanh activation function and input neurons are linear.
What is the best way, in your opinion, for normalizing numerical data if:
Maximum number is known and for example maximum positive number would be 1000 and maximum negative -1000.
Maximum number is unknown.
Also, should I keep the maximum the same for all inputs, or would it be okay
if the network's inputs are normalized in different ways?
Thanks!
If max and min are known, the easiest normalization is:
normalized = (val - min) / (max - min)
If max is unknown, you can normalize based on the data you do have, knowing that tanh still behaves well for inputs whose magnitude exceeds 1: it saturates smoothly instead of growing without bound.
You should normalize different inputs based on the range of values of those inputs and you may use different normalization procedures for different inputs.
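As a rough sketch of the two cases, the first function below applies the formula above with a known range, and the second estimates the range from the samples seen so far for a single input. The names `normalize` and `normalize_from_data` are made up for this illustration, and mapping a constant input to 0.0 is an arbitrary choice.

```cpp
#include <algorithm>
#include <vector>

// Min-max scaling from the formula above: maps [lo, hi] onto [0, 1].
double normalize(double val, double lo, double hi)
{
    return (val - lo) / (hi - lo);
}

// Unknown range: estimate lo and hi from the samples available for
// this one input, then scale with them. Later values that fall
// outside the estimated range will land outside [0, 1].
std::vector<double> normalize_from_data(const std::vector<double>& samples)
{
    std::vector<double> out;
    if (samples.empty()) return out;
    auto mm = std::minmax_element(samples.begin(), samples.end());
    double lo = *mm.first;
    double hi = *mm.second;
    for (double s : samples)
        out.push_back(hi > lo ? normalize(s, lo, hi) : 0.0);  // constant input: arbitrary 0.0
    return out;
}
```

With the known range from the question, `normalize(0.0, -1000.0, 1000.0)` gives 0.5. Each input of the network can be passed through its own call with its own range.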
I have a set of values which follows an exponential distribution. Now, I want to calculate the rate parameter alpha. Can anybody help me calculate it (I am using C++ to code it)?
If you know these values are from an exponential distribution, then you can calculate the maximum likelihood estimate of λ (lambda, not alpha) as 1 divided by the average of the values (because the mean of the exponential distribution is 1 / λ). This is a statistical calculation, since you are trying to assess a parameter through observation.
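A minimal sketch of that estimate in C++: since the mean of the distribution is 1 / λ, the estimate is the reciprocal of the sample mean, i.e. the number of values divided by their sum. The function name `estimate_rate` is made up for this illustration.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Maximum likelihood estimate of the exponential rate:
// lambda_hat = n / (x_1 + ... + x_n) = 1 / sample mean.
double estimate_rate(const std::vector<double>& values)
{
    if (values.empty())
        throw std::invalid_argument("need at least one value");
    double sum = std::accumulate(values.begin(), values.end(), 0.0);
    if (sum <= 0.0)
        throw std::invalid_argument("values must sum to a positive number");
    return values.size() / sum;
}
```

For example, the values 0.5, 1.5 and 1.0 have mean 1.0, so the estimated rate is 1.0.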
I'm wondering what the best way is to create two lookup tables for the square root and cube root of float values in the range [0.0, 1.0).
I already profiled the code and saw that this is quite a strong performance bottleneck (because I need to compute them for several tens of thousands of values each). Then I remembered about lookup tables and thought they would help me increase the performance.
Since my values are in a small range I was thinking about splitting the range with steps of, let's say, 0.0025 (hoping it's enough) but I'm unsure about which should be the most efficient way to retrieve them.
I can easily populate the lookup table but I need a way to efficiently get the correct value for a given float (which is not discretized on any step). Any suggestions or well known approaches to this problem?
I'm working with a mobile platform, just to specify.
Thanks in advance
You have (1.0-0.0)/0.0025 = 400 steps.
Just create a 400-element array and index it by multiplying the float whose square or cube root you want by 400.
For instance, suppose you want to look up the square root of 0.0075. Multiply 0.0075 by 400 and you get 3, which is your index in the array.
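static double table[400];  // table[i] holds sqrt(i * 0.0025), filled once at startup

double table_sqrt(double v)
{
    // v must be in [0.0, 1.0); truncation selects the step at or below v
    return table[(unsigned int)(v * 400.0)];
}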
You could multiply the values by whatever precision that you want, and then use a hash-table since the results would be integral values.
For instance, rather than using a floating point key-value for something like 0.002, give yourself a precision of three or four decimal places, making your key value for 0.002 equal to 2 or 20. Then you can quickly look up the resulting floating point value for the square and cube root stored in the hash-table under that key.
If you also want to get values out of the non-discrete ranges in between slots, you could use an array or tree rather than a hash-table so that you can generate "in-between" values by interpolating between the roots stored at two adjacent key-value slots.
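To illustrate the array variant with interpolation, here is a rough sketch for the square root only, using the 400 steps mentioned earlier. The names `SqrtTable`, `lookup`, `interpolate` and `kSteps` are made up for this example, and the table stores one extra entry so that the last interval has an upper neighbour.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kSteps = 400;  // 1.0 / 0.0025

struct SqrtTable {
    // sqrt sampled at kSteps + 1 evenly spaced points in [0, 1].
    std::array<double, kSteps + 1> values;

    SqrtTable()
    {
        for (std::size_t i = 0; i <= kSteps; ++i)
            values[i] = std::sqrt(static_cast<double>(i) / kSteps);
    }

    // Nearest-below lookup: one multiply, one truncation, one load.
    double lookup(double v) const
    {
        assert(v >= 0.0 && v < 1.0);
        return values[static_cast<std::size_t>(v * kSteps)];
    }

    // Linear interpolation between the two neighbouring entries.
    double interpolate(double v) const
    {
        assert(v >= 0.0 && v < 1.0);
        double pos = v * kSteps;
        std::size_t i = static_cast<std::size_t>(pos);
        double frac = pos - i;  // position inside the step, in [0, 1)
        return values[i] + frac * (values[i + 1] - values[i]);
    }
};
```

Both roots are steep near 0, so the error of either lookup is largest for small inputs. Whether a fixed step of 0.0025 is accurate enough there would have to be checked against the precision the application needs.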
If you only need to split into 10 different stripes, find the inputs which correspond to the thresholds between stripes, and use an unrolled binary search to test against those 9 values. Or is there additional computation required before the threshold test is done, so that the looked-up value isn't the final result?
I'm working with the random number generator available within C++11. At the moment, I'm using a uniform distribution, which should give me an equal probability of getting any number within the range between A and B which I specify.
However, I'm confused about generating Poisson distributions. While I understand how to determine the Poisson probability, I don't understand how a random series of numbers can be "distributed" based on the Poisson distribution.
For instance, the C++11 constructor for a Poisson distribution takes one argument, λ, which is the mean of the distribution:
std::mt19937 eng(std::random_device{}());     // engine from <random>
std::poisson_distribution<int> poisson(7.0);  // result type must be an integer type
std::cout << poisson(eng) << std::endl;
In a Poisson probability problem, this is equal to the expected number of successes / occurrences during a given interval. However, I don't understand what it represents in this instance. What is a "success" / "occurrence" in a random number scenario?
I appreciate any assistance or reference materials which I can use to help me understand this.
A Poisson distribution gives the probability that a specific count occurs. Imagine you want to count how many cars pass a certain point each day. The count will be higher on some days and lower on others. But if you keep track of it over a long period of time, a mean will start to emerge, with counts in its vicinity occurring more often and counts further away (0 cars per day, or ten times the mean) being less likely. λ is that mean.
Applied to RNGs, the algorithm returns the number of cars that passed on a random day (which is selected uniformly). As you can imagine, values near the mean λ are the most likely to come up, and the extremes are the least likely.
The following link has an example of the distribution Poisson has, showing the discrete results you acquire, and the chance each of them has of occurring:
http://www.mathworks.com/help/toolbox/stats/brn2ivz-127.html
A sample implementation could calculate the probability of each value and then split the interval from 0.0 to 1.0 into ranges of those widths to translate a uniform distribution into a Poisson one. E.g. for λ == 2 we have about a 13.5% chance for 0, 27% for 1, 27% for 2, and so on. Then we generate a good old uniform random number between 0.0 and 1.0. If this number is <= 0.135, return 0. If it is <= 0.406, return 1. If it is <= 0.677, return 2, etc.
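The range-based idea above can be sketched in a few lines of C++. This is only an illustration of the technique (inverse-transform sampling), not how any particular standard library implements it, and the names `poisson_from_uniform` and `sample_poisson` are made up for the example. In real code `std::poisson_distribution<int>` from `<random>` does the same job.

```cpp
#include <cmath>
#include <random>

// Walks the cumulative Poisson probabilities until they reach the
// uniform draw u in [0, 1), and returns the count where that happens.
int poisson_from_uniform(double u, double lambda)
{
    int k = 0;
    double p = std::exp(-lambda);  // P(X = 0)
    double cumulative = p;
    // The p > 0.0 guard stops the loop if rounding keeps the
    // cumulative sum just below a draw that is very close to 1.
    while (u > cumulative && p > 0.0) {
        ++k;
        p *= lambda / k;           // P(X = k) from P(X = k - 1)
        cumulative += p;
    }
    return k;
}

// Draws one Poisson-distributed count using a uniform random number.
template <class Engine>
int sample_poisson(Engine& eng, double lambda)
{
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return poisson_from_uniform(uniform(eng), lambda);
}
```

For λ == 2 the cumulative thresholds are roughly 0.135, 0.406 and 0.677, so a uniform draw of 0.30 maps to 1 and a draw of 0.50 maps to 2.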