couchdb: calculate average of a value - mapreduce

I'm new to CouchDB. I want to calculate the average of a value in Futon.
My map function:
    function(doc) {
        if (doc.Fares.Taxes)
            emit(1, doc.Fares.Taxes);
    }
My reduce function:
    function(amount, values) {
        return sum(values/amount);
    }

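CouchDB has built-in reduce functions for sum and count (_sum and _count), and it has _stats; you should be able to use them to calculate the average, for example by dividing the sum by the count that _stats returns: http://wiki.apache.org/couchdb/Built-In_Reduce_Functions
Note that a custom reduce function receives (keys, values, rereduce), so its first argument is not the number of values.
There is also a sample of how to calculate a sum "manually" in the "Cookbook": http://guide.couchdb.org/draft/cookbook.html#aggregate You should be able to extend that sum sample to calculate an average instead.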
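CouchDB reduce functions are written in JavaScript, so the following C++ is only a language-neutral sketch of the idea behind a sum-and-count reduce: keep partial results as (sum, count) pairs and merge them by adding both fields, dividing only at the very end. The names Partial, reduceValues, combine and average are hypothetical and not part of any CouchDB API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical partial aggregate, analogous to the sum and count
// that a sum/count style reduce would return for one group of rows.
struct Partial {
    double sum = 0.0;
    std::size_t count = 0;
};

// First pass: fold raw emitted values into one partial result.
Partial reduceValues(const std::vector<double>& values) {
    Partial p;
    for (double v : values) {
        p.sum += v;
        ++p.count;
    }
    return p;
}

// Second pass: merge partial results by adding sums and counts.
// Averaging the partial averages instead would weight groups wrongly.
Partial combine(const std::vector<Partial>& parts) {
    Partial total;
    for (const Partial& p : parts) {
        total.sum += p.sum;
        total.count += p.count;
    }
    return total;
}

// Divide only once, at the end; an empty aggregate yields 0 by convention here.
double average(const Partial& p) {
    return p.count ? p.sum / static_cast<double>(p.count) : 0.0;
}
```

Merging the groups {1, 2, 3} and {4} this way gives 10 / 4 = 2.5, whereas averaging the two group averages (2 and 4) would give 3.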

Related

Subtract sample from list, update average

I have a map defined as:
    std::map<float, std::map<unsigned short, int>>
The float key is the value I keep an average of, e.g. a speed. The unsigned short is an identifier, e.g. a date. The int is a count of how many samples with that specific value have been seen. For example, on a given date I have a 'bin' for the value 220 km/h, and its count could be 2132.
When I add a new sample, e.g. a car that drove 220 km/h, it falls into the bin for that speed and day, and I update the average and count as follows:
    count++;
    average = average + ((samplevalue - average) / count);
This seems to work fine; the average is updated.
But my problem is when I want to remove the entries for a specific identifier, e.g. a day.
    for (auto& bin : *containerMap_)
    {
        auto iter = bin.second.find(currentDay);
        if (iter != bin.second.end())
        {
            average = ((average * count) - iter->first) / (count - (iter->second));
            meassurementCount -= iter->second;
            bin.second.erase(iter);
        }
    }
Here iter->first is the value of the bin (speed) and iter->second is the count.
However, when I delete the entries for a specific identifier, the average is not updated as expected.
In my test case I only insert samples with one speed, so I only have one bin. Each speed is 3, so I expect the average to stay at 3. It does so on insert, but when I remove one identifier (day), the average is updated to:
Avg : 3.00139
Why does that calculation not work? I would really hate to iterate through everything to update the average, since the count could be in the billions.
Best regards :)

Recursively finding maximum sum possible in a vector with a limit

I have a limit, let's say 100, and a vector filled with some numbers like 20, 60, 45, etc.
I need to find the maximum possible sum that is less than the limit.
I have tried a couple of things, but after many attempts I gave up, so I have nothing to show for now.
Basically, I just couldn't find the algorithm.
I tried adding more parameters to the function, like totalWeight,
although I'm supposed to return a totalWeight.
    int findOptimumLuggageWeight(int weightLimit, vector<int> collectionOfWeights)
    {
        int numberOfItems = collectionOfWeights.size();
        if () {
        }
        else {
        }
    }
There is an example of output:
weightLimit = 100
collectionOfWeights = 40,25,20,30,44
Output: 99
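One common way to approach this recursively is to decide, for each item in turn, whether to take it or skip it, and keep the better of the two results. The following is a minimal sketch, assuming a sum equal to the limit is acceptable (the question says "less than", so pass weightLimit - 1 if the limit itself must be excluded) and that all weights are non-negative. The helper bestFrom is a hypothetical name introduced for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: best achievable sum using items from index i onward,
// without exceeding the remaining capacity.
int bestFrom(const std::vector<int>& weights, std::size_t i, int remaining) {
    if (i == weights.size()) {
        return 0;  // no items left to consider
    }
    // Option 1: skip item i.
    const int skip = bestFrom(weights, i + 1, remaining);
    if (weights[i] > remaining) {
        return skip;  // item i does not fit, so skipping is the only option
    }
    // Option 2: take item i and recurse with less capacity.
    const int take = weights[i] + bestFrom(weights, i + 1, remaining - weights[i]);
    return std::max(skip, take);
}

int findOptimumLuggageWeight(int weightLimit, std::vector<int> collectionOfWeights) {
    return bestFrom(collectionOfWeights, 0, weightLimit);
}
```

For the example above (limit 100, weights 40, 25, 20, 30, 44) this returns 99, from 25 + 30 + 44. The recursion explores up to 2^n combinations, which is fine for a handful of items but would need memoization or a table for large inputs.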

Display an updated average of random numbers in a file

I have a program that writes one random number to a file.
    #include <cstdlib>   // rand()
    #include <fstream>
    #include <iostream>
    using namespace std;
    int main() {
        std::ofstream file("file.txt", std::ios_base::app);
        int var = rand() % 100 + 1;
        file << var;
        return 0;
    }
Results after 4 trials:
1,2 2,20 3,40 1,88
I don't want to write the numbers themselves, only their updated average after each run.
Is there any way to calculate the average incrementally?
The file should contain only the average value.
For example, after the first trial:
1.2
After the second trial, the file contains the average (1.2+2.2)/2:
1.7

Even though what you are trying to do is kind of strange, and I'm sure there is a better way of doing it, here's how you can do it:
    // getRandomNum() and readLastAvgFromFile() are assumed to be defined elsewhere.
    float onTheFlyAverage()
    {
        static int nCount = 0;
        float avg, newAvg;
        int newNumber = getRandomNum();
        nCount++;                      // increment the total number count
        avg = readLastAvgFromFile();   // this will read the last average in your file
        newAvg = avg * (nCount - 1) / nCount + (float)(newNumber) / nCount;
        return newAvg;
    }
If for some reason you want to keep an average in a file which you provide as input to your program and expect it to keep on averaging numbers for you (a sort of stop and continue feature), you will have to save/load the total number count in the file as well as the average.
But if you do it in one go this should work. IMHO this is far from the best way of doing it - but there you have it :)
NOTE: there is a divide by 0 corner-case I did not take care of; I leave that up to you.

You can use some simple math to calculate the mean incrementally. However, you have to keep count of how many values have contributed to the mean.
Let's say you have n numbers with a mean value of m. Your next number x changes the mean in the following way:
m_new = (m*n + x)/(n+1)

It's bad for performance to divide and then multiply the average back on every update. I suggest you store the sum and the count instead.
(pseudocode)
    // these variables are stored between function calls
    int sum = 0
    int n = 0
    function float getNextRandomAverage() {
        int rnd = getOneRandom()
        n++
        sum += rnd
        float avg = (float)sum / n   // cast first, otherwise this is integer division
        return avg
    }
    function writeNextRandomAverage() {
        writeToFile(getNextRandomAverage())
    }
Also it seems strange to me that your method closes the file. How does it know that it should close it? What if the file should be used later? (Say, consecutive uses of this method).

Erroneous calculation method in C++

I am taking a beginner's C++ course and I am struggling with an assignment right now. The assignment was:
A particular talent competition has 5 judges, each of whom awards a score between 0 and 10 to each performer. Write a program that uses these rules to calculate and display a contestant’s score. It should include the following functions:
• int getJudgeData() should ask the user for a judge’s score, store it in a reference parameter variable, and validate it. This function should be called by main once for each of the 5 judges.
• double calcScore() should calculate and return the average of the 3 scores that remain after dropping the highest and lowest scores the performer received. This function should be called just once by main and should be passed the 5 scores.
Two additional functions, described below, should be called by calcScore, which uses the returned information to determine which of the scores to drop.
• int findLowest() should find and return the lowest of the 5 scores passed to it.
• int findHighest() should find and return the highest of the 5 scores passed to it.
When testing my program, it works properly if the score from judge one is the lowest, but it will not work properly when any other judge's score is the lowest.
Example: I enter 2, 1, 5, 4, 3, so it should drop the 1 and the 5 and come out with an average of 3, but the result is 2.6667.
The code I have for int findLowest() is:
    int findLowest(int scoreOne, int scoreTwo, int scoreThree, int scoreFour, int scoreFive)
    {
        int lowest = scoreOne;
        if ( scoreTwo < lowest )
            lowest = scoreTwo;
        if ( scoreThree < lowest )
            lowest = scoreThree;
        if ( scoreFour < lowest )
            lowest = scoreFour;
        if ( scoreFive < lowest )
            lowest = scoreFive;
        return lowest;
    }
The int findHighest() function is similar, but with the less-than symbols switched to greater-than.
For the calcAverage() function I have:
    double calcAverage(double OneScore, double twoScore, double threeScore, double fourScore, double fiveScore)
    {
        double lowest, highest, sum;
        lowest = findLowest(OneScore, twoScore, threeScore, fourScore, fiveScore);
        highest = findHighest(OneScore, twoScore, threeScore, fourScore, fiveScore);
        sum = (OneScore + twoScore + threeScore + fourScore + fiveScore);
        sum = sum - lowest;
        sum = sum - highest;
        sum = sum / 3;
        cout << "\nAfter droping highest and lowest scores\n";
        cout << "Your average score is " << sum << endl;
        return 0;
    }
EDIT: I have put cout statements in the findHighest and findLowest functions to check which number each one selects. Each time, findHighest selects the correct highest number, but findLowest reports 0.
EDIT TWO: I have found that the program sets score one to 0 regardless of what is entered. The program takes the correct input for the other scores.

After a quick look, the code you have should be working. Run through it again and make sure each if statement is consistent with what you want to do. If that doesn't work, another way you could try is:
    double lowest = oneScore;
    if (twoScore < lowest)
        lowest = twoScore;
    if (threeScore < lowest)
        lowest = threeScore;
    etc...
Your calcAverage() function is probably only failing because of the findLowest() and findHighest() functions.
EDIT: If you have learned arrays, declare an array of scores and then use a for loop to iterate through it, as Paul is hinting at in his comment.
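A minimal sketch of the array-based approach, assuming the five scores have already been read and validated. The signature of calcScore below (taking a std::array) is my own choice for illustration and differs from the five-parameter form the assignment asks for. Unlike the calcAverage() shown in the question, it returns the computed average rather than 0. The problem of score one always being 0 would sit in the input code, which is not shown, so this sketch does not address it.

```cpp
#include <algorithm>
#include <array>
#include <numeric>

// Average of the three scores left after dropping one lowest and one highest.
// Hypothetical signature: the assignment passes five separate scores instead.
double calcScore(const std::array<int, 5>& scores) {
    // minmax_element returns iterators to the smallest and largest element.
    const auto lowHigh = std::minmax_element(scores.begin(), scores.end());
    const int sum = std::accumulate(scores.begin(), scores.end(), 0);
    // Divide by 3.0 so the division is done in floating point.
    return (sum - *lowHigh.first - *lowHigh.second) / 3.0;
}
```

With the scores 2, 1, 5, 4, 3 from the question, this drops the 1 and the 5 and returns 3.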

prevent long running averaging from overflow?

Suppose I want to calculate the average value of a data set, such as:
    class Averager {
        float total = 0;
        size_t count = 0;
    public:
        float addData(float value) {
            this->total += value;
            return this->total / ++this->count;
        }
    };
Sooner or later the total or the count will overflow, so I avoid storing the total:
    class Averager {
        float currentAverage = 0;
        size_t count = 0;
    public:
        float addData(float value) {
            // count is read and incremented in separate statements to avoid undefined behavior
            this->currentAverage = (this->currentAverage * count + value) / (count + 1);
            ++count;
            return this->currentAverage;
        }
    };
This seems to postpone the overflow, but the multiplication of the average by the count can still overflow, so the next solution is:
    class Averager {
        float currentAverage = 0;
        size_t count = 0;
    public:
        float addData(float value) {
            this->currentAverage += (value - this->currentAverage) / ++count;
            return this->currentAverage;
        }
    };
This seems better. The next problem is: how do I prevent count from overflowing?

Aggregated buckets.
We pick a bucket size that's comfortably less than sqrt(MAXINT). To keep it simple, let's pick 10.
Each new value is added to the current bucket, and the moving average can be computed as you describe.
When the bucket is full, start a new bucket, remembering the average of the full one. We can safely calculate the overall average by combining the averages of the full buckets and the current, partial bucket. When we get to 10 full buckets, we create a bigger bucket with capacity 100.
To compute the total average, we first compute the average of the "10s" and then combine that with the "100s". This pattern repeats for the "1,000s", "10,000s", and so on. At each stage we only need to consider two levels, one 10 times bigger than the previous one.

Use double total; unsigned long long count;. You should still worry about accuracy, but it will be much less of a problem than with float.

What about using arbitrary-precision arithmetic?
There's a list of libraries you could use on Wikipedia: http://en.wikipedia.org/wiki/Bignum#Libraries
Most arbitrary-precision arithmetic libraries will not overflow until the stored digits fill the available memory (which is quite unlikely).

You want to use Kahan's summation algorithm:
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
See also the section about errors in summation in "What Every Computer Scientist Should Know About Floating-Point Arithmetic":
http://docs.sun.com/source/806-3568/ncg_goldberg.html#1262

You could use arbitrary-precision (bignum) integer types, where integers can keep growing until your RAM is full.

I was just thinking about this also. I think this solution works in terms of the new value 'moving the needle'. It only moves it by a factor of the number of previous values that contributed to the average-so-far (plus 1 for itself). It will lose accuracy as the inputs grow but on average should be practically acceptable.
Here's some Java code that seems to work. I used floats and ints here to demonstrate that it will work with those limitations but you could use double to gain accuracy. This is just to give you an idea of how to average an array of near-max integers. You would need to keep track of the total number of inputs and the current average, but not the total sum of the inputs. If your total number of inputs approaches MAX_INT, this eventually won't work and you should use the bucket suggestion above, but that is pretty drastic in most cases.
    public float calcAverageContinuous(int[] integers)
    {
        float ave = 0;
        for (int i = 0; i < integers.length; i++) {
            ave += (((float)integers[i] - ave) / (float)(i + 1));
        }
        return ave;
    }
