I'm using the CERN ROOT Data Analysis Framework's MultiRootFinder (a wrapper around the GSL multidimensional root finders) to solve for the unknowns, x and y, in the following system:
The system describes the solution to a localization problem. That is, given the locations [x_i, y_i] of three parties, the speed of some signal, and the time at which each party "saw" the signal, I want to determine the coordinates, [x,y], of the source. We can assume the three parties and the source are coplanar, for simplicity.
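Written out explicitly (this is just the parameterization of the two TF2s in the code below, with $(x_i, y_i)$ the position of party $i$ and $t_i$ the time it recorded), the system is

$$\sqrt{(x-x_1)^2+(y-y_1)^2} + c\,(t_2-t_1) - \sqrt{(x-x_2)^2+(y-y_2)^2} = 0,$$
$$\sqrt{(x-x_2)^2+(y-y_2)^2} + c\,(t_3-t_2) - \sqrt{(x-x_3)^2+(y-y_3)^2} = 0.$$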
My code is as follows. Note that this code is designed to run in Cling, where it is a perfectly valid program; void localize() plays the role of int main() here, and some changes would be needed to compile it with, say, gcc.
#define x1 300000      // party/station 1 coordinates
#define y1 360000
#define x2 210000      // party/station 2 coordinates
#define y2 210000
#define x3 96000       // party/station 3 coordinates
#define y3 360000
#define c 29980000000  // signal speed (the speed of light, in cm/s)
void localize(){
    ROOT::RDataFrame frame("D","./path/to/data");  // open the tree "D" holding the recorded times
    auto statn1 = frame.Take<double>("statn1");    // times at which each party "saw" the signal
    auto statn2 = frame.Take<double>("statn2");
    auto statn3 = frame.Take<double>("statn3");
    // each TF2 has the form |r - r_i| + c*(t_j - t_i) - |r - r_j|, with r = (x,y)
    TF2 *f1 = new TF2("f1","sqrt((x-[0])^2 + (y-[1])^2) + [2]*([3] - [4]) - sqrt((x-[5])^2 + (y-[6])^2)");
    TF2 *f2 = new TF2("f2","sqrt((x-[0])^2 + (y-[1])^2) + [2]*([3] - [4]) - sqrt((x-[5])^2 + (y-[6])^2)");
    ROOT::Math::MultiRootFinder rootFinder(0);     // GSL-based multidimensional root finder
    int i = 0;                                     // use the first event only
    f1->SetParameters(x1,y1,c, statn2->at(i), statn1->at(i), x2,y2);
    f2->SetParameters(x2,y2,c, statn3->at(i), statn2->at(i), x3,y3);
    ROOT::Math::WrappedMultiTF1 g1(*f1,2);         // wrap each TF2 as a 2-variable function
    ROOT::Math::WrappedMultiTF1 g2(*f2,2);
    rootFinder.AddFunction(g1);
    rootFinder.AddFunction(g2);
    rootFinder.SetPrintLevel(1);
    double init[2] = {2000, 3200};                 // starting point for (x, y)
    rootFinder.Solve(init);
}
When I run the code, I get the error
Error in <ROOT::Math::GSLMultiRootFinder::Solve>: The iteration is not making any progress
I've set the iterations to the maximum number possible, and I've chosen the starting point to be the center of the circle passing through the three parties (their circumcenter).
I fed the solver only the first two equations. Feeding it all three is, as I understand it, unnecessary, and it produces an error, since the solution to all three involves a third unknown: the time of emission.
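(For reference, the full system before differencing would be

$$\sqrt{(x-x_i)^2+(y-y_i)^2} = c\,(t_i - t_0), \qquad i = 1,2,3,$$

with $t_0$ the unknown emission time; subtracting these equations pairwise is what eliminates $t_0$ and produces the two difference equations I actually feed to the solver.)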
What am I doing wrong here? Apologies if this is a basic question. This isn't something I'm very familiar with.
EDIT:
I'm wondering if this has anything to do with the fact that two equations will produce two solutions, in general. So, perhaps the root finder isn't converging on a single solution, causing the error? I can't find anything in the docs that suggests this, but I can't find anything that doesn't either.
If this is the case, I'm wondering how I could introduce the third equation so as to disambiguate.
EDIT:
I've tried playing with the tolerances and the number of iterations. If I start with 3 iterations, I get a new error:
ROOT::Math::GSLMultiRootFinder::Solve: exceeded max iterations, reached tolerance is not sufficient; absTol = 1e-06
where the tolerance is set to its default value of 1E-06. This continues up to 10 iterations, at which point I get the "not making progress" error again.
I'm adding items to a list, and each insert takes a bit longer than the last (this is a requirement; assume you can't change it). I've manually timed a sample dataset on MY computer, but I want a generalized way to predict the time on any computer, for ANY dataset size.
In my flailing around trying to figure this out, what I have collected is a vector of 100 entries recording how long each 1/100th of the sample data took. My example data set has 237,965 objects, so each bucket in that vector tells how long it took to add about 2,379 items.
Here's a link to the sample data of 100 items. You can see the first ~2k items took about 8 seconds and the last ~2k items took about 101 seconds. Adding up all the entries gives 4,295 seconds, or about 1 hr 11 minutes.
So my question is: given this data set, and using it for future predictions, how do I estimate the remaining time when adding a different amount of data?
In more flailing, I made some plots, wondering if they could help. The first plot is just the raw data on a log graph:
I then made a second data set based on the first, this time showing accumulated time rather than the time for the current slice, and plotted that on a linear graph:
Notice the lovely trend line formula? That MUST be something I just need to plug into my code, but I can't for the life of me figure out how.
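(For reference, the trend line on that accumulated-time plot - as transcribed into FunctionX further down - is approximately

$$y \approx 0.3203\,x^2 + 9.6097\,x - 7.5273,$$

with $x$ the slice index from 0 to 100 and $y$ the accumulated seconds.)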
Should I have instead gathered the data into time-slices rather than index-slices? I.e., I KNOW this data takes 1:10 to load, so take snapshots every 1/100th of that duration instead of snapshotting every 1/100th of the data set?
Or HOW do I figure this out?
The function I need to write has this API:
CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT);
So given only those three variables (maxI, curI, and elapsedT), and knowing the trend line formula from above, I need to return the "duration until maxI" (in seconds).
Any ideas?
Update:
Well, it seems that after much futzing around, I can just do this (note "LERP" is just a linear interpolation macro; see the footnotes):
#define kDataSetMax 237965
double FunctionX(int in_x)
{
    // map an item index in [0, kDataSetMax] onto the trend line's 0-100 x axis
    double _x(LERP(0, 100, in_x, 0, kDataSetMax));
    // the quadratic trend line fitted to the accumulated-time plot
    double resultF =
          (0.32031139888898874 * math_square(_x))
        + (9.609731568497784 * _x)
        - (7.527252350031663);
    if (resultF <= 1) {
        resultF = 1;
    }
    return resultF;
}
CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT)
{
    CFTimeInterval endT(FunctionX(maxI));
    return endT; // curI and elapsedT are never used here
}
But that means I'm just ignoring curI and elapsedT?? That doesn't seem... right. What am I missing?
Footnotes:
#define LERP(to_min, to_max, from, from_min, from_max) \
((from_max) == (from_min) ? from : \
(double)(to_min) + ((double)((to_max) - (to_min)) \
* ((double)((from) - (from_min)) \
/ (double)((from_max) - (from_min)))))
#define LERP_PERCENT(from, from_max) \
LERP(0.0f, 1.0f, from, 0.0f, from_max)
Your FunctionX is most of the way there. It's currently calculating expectedTimeToReachMaxIOnMyMachine. What you need to do is figure out how much slower the current time is relative to the expected on your machine to reach this same point, and then extrapolate that same ratio to the maximum time.
CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT) {
//calculate how long we expected it to take to reach this point
CFTimeInterval expectedTimeToReachCurrentIOnMyMachine = FunctionX(curI);
//calculate how much slower we are than the expectation
//if this machine is faster, the math still works out.
double slowerThanExpectedByRatio
= double(elapsedT) / expectedTimeToReachCurrentIOnMyMachine;
//calculate how long we expected to reach the max
CFTimeInterval expectedTimeToReachMaxIOnMyMachine = FunctionX(maxI);
//if we continue to be the same amount slower, we'll reach the max at:
CFTimeInterval estimatedTimeToReachMaxI
= expectedTimeToReachMaxIOnMyMachine * slowerThanExpectedByRatio;
return estimatedTimeToReachMaxI;
}
Note that a smart implementation can cache and reuse expectedTimeToReachMaxIOnMyMachine and not calculate it every time.
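For example, one way to do that caching (just a sketch; it assumes maxI is the same on every call of a given run):

CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT) {
    // computed once on the first call and reused afterwards;
    // assumes maxI is identical on every call of a given run
    static const CFTimeInterval expectedTimeToReachMaxIOnMyMachine = FunctionX(maxI);

    CFTimeInterval expectedTimeToReachCurrentIOnMyMachine = FunctionX(curI);
    double slowerThanExpectedByRatio
        = double(elapsedT) / expectedTimeToReachCurrentIOnMyMachine;
    return expectedTimeToReachMaxIOnMyMachine * slowerThanExpectedByRatio;
}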
Basically this assumes that after doing X% of the work, we can calculate how much slower we were than the expected curve, and assume we will stay approximately that same amount slower than the expected curve.
In the example below, the expected time taken is the blue line. At 4,000 elements, the expected time on your machine was 8,055,826, but the actual time taken on this machine was 10,472,573, which is 30% higher (slowerThanExpectedByRatio = 1.3). At that point, we can extrapolate that we'll probably remain 30% higher throughout the entire process (the purple line). So if the total expected time on your machine for 10,000 elements was 32,127,229, then our total estimated time on this machine for 10,000 elements will be 41,765,398 (30% higher).
I am currently developing a stimuli provider for the brain's visual cortex as part of a university project. The program is to (preferably) be written in C++, using Visual Studio and OpenCV. The way it is supposed to work is that the program creates a number of threads, according to the number of different frequencies, each running a timer for its respective frequency.
The code looks like this so far:
void timerThread(void *param) {
    t *args = (t*)param;    // thread arguments: stimulus id and flicker frequency
    int id = args->data1;
    float freq = args->data2;
    unsigned long period = round((double)1000 / (double)freq) - 1;  // period in ms
    while (true) {
        Sleep(period);
        show[id] = 1;       // stimulus on
        Sleep(period);
        show[id] = 0;       // stimulus off
    }
}
It seems to work okay for some of the frequencies, but others vary quite a lot in frame rate. I have tried creating my own timing function, similar to what is done in Arduino's "blinkWithoutDelay" example, but that worked very badly. I have also tried OpenCV's waitKey() function, which behaved much like the Sleep() call used now.
Any help would be greatly appreciated!
You should use timers instead of "sleep" to fix this, as the loop itself may take more or less time to complete on each pass.
Restart the timer at the start of the loop and read its value right before the reset - this gives you the time the loop took to complete.
If this time is greater than the "period" value, it means you're late, and you need to execute right away (and even lower the period for the next loop).
Otherwise, if it's lower, it means you need to wait until it is greater.
I personally dislike sleep, and instead keep checking the timer until the elapsed time exceeds the period.
I suggest looking into "fixed timestep" code, such as the one below. You'll need to put this snippet of code on every thread with varying values for the period (ns) and put your code where "doUpdates()" is.
If you need a "timer" library, since I don't know OpenCV, I recommend SFML (SFML's timer docs).
The following code is from here:
long int start = 0, end = 0;
double delta = 0;
double ns = 1000000.0 / 60.0; // Syncs updates at 60 per second (59 - 61)
while (!quit) {
    start = timeAsMicro();
    delta += (double)(start - end) / ns; // You can skip dividing by ns here and do "delta >= ns" below instead
    end = start;
    while (delta >= 1.0) {
        doUpdates();
        delta -= 1.0;
    }
}
Please mind the fact that in this code, the timer is never reset.
(This may not be completely accurate but is the best assumption I can make to fix your problem given the code you've presented)
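As an aside, here is a sketch of the same deadline-based idea using std::chrono and sleep_until (C++11), so each toggle is scheduled against an absolute time and sleep jitter doesn't accumulate. The show array and on/off semantics are borrowed from the question; the half-period assumes one full on/off cycle every 1/freq seconds, so adjust it if that's not what you intend:

#include <atomic>
#include <chrono>
#include <thread>

extern std::atomic<int> show[];   // assumed equivalent of the question's global show array

void timerThreadChrono(int id, float freq) {
    using clock = std::chrono::steady_clock;
    // half of one on/off cycle; use 1.0 / freq here instead if the original
    // Sleep(period) per half-cycle was the intended timing
    const clock::duration half_period = std::chrono::duration_cast<clock::duration>(
        std::chrono::duration<double>(1.0 / (2.0 * freq)));
    clock::time_point next = clock::now();
    bool on = false;
    while (true) {
        next += half_period;                  // schedule against an absolute deadline
        std::this_thread::sleep_until(next);  // so sleep jitter does not accumulate
        on = !on;
        show[id] = on ? 1 : 0;
    }
}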
I am trying to use REFPROP's HSFLSH subroutine to compute properties for steam.
When the same state point is calculated over many iterations (fixed enthalpy and entropy: enthalpy = 50000 J/mol, entropy = 125 J/mol), the time taken by HSFLSH on every 4th or 5th iteration jumps to about 0.15 ms, against a negligible amount of time for the other iterations. This is problematic because my program calls this subroutine several thousand times, leading to abnormally long run times.
The program used to generate the above log is here:
C refprop check
program time_check
parameter(ncmax=20)
dimension x(ncmax)
real hkj,skj
character hrf*3, herr*255
character*255 hf(ncmax),hfmix
C
C SETUP FOR WATER
C
nc=1 !Number of components
hf(1)='water.fld' !Fluid name
hfmix='hmx.bnc' !Mixture file name
hrf='DEF' !Reference state (DEF means default)
call setup(nc,hf,hfmix,hrf,ierr,herr)
if (ierr.ne.0) write (*,*) herr
call INFO(1,wm,ttp,tnbp,tc,pc,dc,zc,acf,dip,rgas)
write(*,*) 'Mol weight ', wm
h = 50000.0
s = 125.0
c
C
DO I=1,NCMAX
x(I) = 0
END DO
C ******************************************************
C THIS IS THE ACTUAL CALL PLACE
C ******************************************************
do I=1,100
call cpu_time(tstrt)
CALL HSFLSH(h,s,x,T_TEMP,P_TEMP,RHO_TEMP,dl,dv,xliq,xvap,
& WET_TEMP,e,
& cv,cp,VS_TEMP,ierr,herr)
call cpu_time(tstop)
write(*,*) I,' time taken to run hsflsh routine= ',tstop - tstrt
end do
stop
end
(Of course you will need the REFPROP Fortran files, which unfortunately I cannot share since REFPROP isn't open source.)
Can someone help me figure out why this is happening?
P.S.: The above code was compiled using gfortran -fdefault-real-8
UPDATE
I tried using system_clock to time my computations, as suggested by @Ross below. The results are uniform across the loop (image below). I will have to find other ways to improve the computation speed, I guess (sigh!).
I don't have a concrete answer, but this sort of behaviour looks like what I would expect if all calls really took around 3 ms, but your call to CPU_TIME doesn't register anything below around 15 ms. Do you see any output with a time taken less than, say, 10 ms? Of particular interest to me is the approximately even spacing between calls that return a nonzero time - it's roughly every 5 calls.
CPU timing can be a tricky business. I recommended in a comment that you try system_clock, which can have higher precision than CPU_TIME. You said it doesn't work, but I'm unconvinced. Did you pass a long integer to system_clock? What was the count_rate for your system? Were all the times still either 15 or 0 ms?
I have a scheduler, endlessly executing n actions. Each action is scheduled for x seconds into the future. When an action completes, it is re-scheduled for another x seconds into the future after its previously scheduled time. Every 1s, the scheduler "ticks", executing at most 25 actions which are due to fire. Actions may take a second or so to complete (though this value should be considered variable and unpredictable).
Say that x is 60 seconds. Due to the throttling of at most 25 actions being executed simultaneously, when n grows large, it is conceivable that the scheduler won't have time to execute all n actions within a 60 second window, and actions will be executed later and later as time goes on. This is undesirable, as it'll become true that there are actions to execute on every single tick and this increases load on my system. It's less important to me to keep x exactly constant than it is to keep load down.
So I wish to implement an adaptive "handicap", an automatically-applied fudge factor h, increasing it when a majority of actions are executed "late", and decreasing it (edging it back to its default of zero) when they're all seemingly and consistently on time. The scheduler would then be made to schedule actions for x+h seconds' time, rather than x.
At a high level, how would you approach this? How would you define "a majority of actions are executed 'late'" and how would you represent/detect it in C++03 code?
Better yet, is there an existing well-known approach that objectively "works" here?
To be clear, you are aiming to avoid sustained high load where there are tasks
every tick, rather than aiming to minimise the scheduling delay.
Correspondingly, the metric you should be looking at when considering the fudge
factor is the load, not the lateness.
If you have full knowledge of the system — the number of tasks, their
rescheduling intervals, the distribution of their execution time —
you could in principle exactly solve for a handicap value that would give you
a mean target load when busy, or would say, only exceed the target load
10% of the time when busy, or so on.
On the other hand, if this information is not available or predictable,
you will need an adaptive approach.
The general theory for this sort of thing is control theory, which can get
quite involved. Broadly though the heuristic is: if the load is less than the
threshold, and we have a positive handicap, reduce the handicap; if the load is
over the threshold, increase the handicap.
The handicap should be proportional, rather than additional: if, for example,
we knew we were consistently 10% overloaded, then we'd be right on target if we
applied a proportional delay of 10% on the scheduling of jobs. That is, we're
looking to apply a handicap factor h such that jobs are scheduled at x*h
seconds' time instead of x. A factor of 1 would correspond to no handicap.
When we're overloaded, but not maximally overloaded, the response then is linear
in the log: log(h) = log(load) - log(load_target). So the simplest method
would be:
load = get_current_load();
if (load>load_target) h = load/load_target;
else h = 1.0;
Unfortunately, there is a maximum measured load, and linearity breaks down
here. The linear model can be extended to incorporate the accumulated
deviation from the target load, and the rate of change of the load.
This corresponds to the proportional-integral-derivative controller.
As this is a noisy environment (there is variation in the action
execution times), it might be wise to shy away from the derivative bit
of this model, and stick with the proportional-integral (PI) part.
When this model is discretized, we get an expression for log(h)
that is proportional to the current (log) overload, plus a term that
captures how badly we've been doing:
load = get_current_load();
deviation = load > load_target ? log(load/load_target) : 0;
accum += p1 * deviation;
log_h = p2 * deviation + accum;
h = log_h < 0 ? 1.0 : exp(log_h);
Except, the problem isn't symmetric: as written above, when we drop below
the load target the accumulated error term just stays where it is.
We could work around it by accumulating negative deviations
as well, but limiting the accumulated error to be at least
non-negative, so that a period of legitimately low load
doesn't give us a free pass for later:
load = get_current_load();
if (load > 0) {
deviation = log(load/load_target);
accum += p1 * deviation;
if (accum < 0) accum = 0;
if (deviation < 0) deviation = 0;
}
else {
accum = 0;
deviation = 0;
}
log_h = p2 * deviation + accum;
h = log_h < 0 ? 1.0 : exp(log_h);
The value for p2 will be somewhere (roughly) between 0.5 and 0.9,
to leave some room for the influence of the accumulated error.
A good value for p1 will probably be around 0.3 to 0.5 times
the reciprocal of the lag time, the number of steps it takes for a change
in h to present itself as a change in load. This can be estimated
by the mean rescheduling time of the actions.
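For concreteness, here is the last fragment wrapped into a small C++03 class, with purely illustrative parameter values (the caveat at the end applies here too):

#include <cmath>

class HandicapController {
public:
    HandicapController(double load_target, double p1, double p2)
        : load_target_(load_target), p1_(p1), p2_(p2), accum_(0.0) {}

    // call once per tick with the measured load; returns the factor h >= 1
    double update(double load) {
        double deviation = 0.0;
        if (load > 0.0) {
            deviation = std::log(load / load_target_);
            accum_ += p1_ * deviation;
            if (accum_ < 0.0) accum_ = 0.0;        // no credit banked for quiet spells
            if (deviation < 0.0) deviation = 0.0;  // proportional term only when overloaded
        } else {
            accum_ = 0.0;
        }
        double log_h = p2_ * deviation + accum_;
        return log_h < 0.0 ? 1.0 : std::exp(log_h);
    }

private:
    double load_target_;
    double p1_;
    double p2_;
    double accum_;
};

// usage (values purely illustrative): HandicapController ctl(20.0, 0.05, 0.7);
//                                     double h = ctl.update(current_load);
//                                     // ... then schedule each job for x*h seconds' time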
You can play around with these parameters to get the sort of
response you'd like, or you can make a more faithful mathematical
model of your scheduling problem and then do maths to it!
The parameters themselves can also be modified adaptively over
time, based on the observed response to changes in load.
(Warning, I haven't actually tried these fragments in a mock scheduler!)
This is a follow-up to Fast percentile in C++
I have a sorted array of 365 daily cashflows (xDailyCashflowsDistro) which I randomly sample 365 times to get a generated yearly cashflow. Generation is carried out by
1/ picking a random probability in the [0,1] interval,
2/ converting this probability to an index in the [0,364] interval,
3/ determining which daily cashflow corresponds to this probability, using the index and some linear approximation,
and summing the 365 generated daily cashflows. Following the previously mentioned thread, my code precalculates the differences of the sorted daily cashflows (xDailyCashflowDiffs), where
xDailyCashflowDiffs[i] = xDailyCashflowsDistro[i+1] - xDailyCashflowsDistro[i]
and thus the whole code looks like
double _dIdxConverter = ((double)(365 - 1)) / (double)(RAND_MAX - 1);  // maps a random int to a fractional index in [0,364]
for ( unsigned int xIdx = 0; xIdx < _xCount; xIdx++ )
{
    double generatedVal = 0.0;
    for ( unsigned int xDayIdx = 0; xDayIdx < 365; xDayIdx ++ )
    {
        double dIdx = (double)fastRand()* _dIdxConverter;   // random fractional index
        long iIdx1 = (unsigned long)dIdx;                   // integer part
        double dFloor = (double)iIdx1;
        // linear interpolation between neighbouring sorted daily cashflows
        generatedVal += xDailyCashflowsDistro[iIdx1] + xDailyCashflowDiffs[iIdx1] *(dIdx - dFloor);
    }
    results.push_back(generatedVal) ;
}
_xCount (the number of simulations) is 1K+, usually 10K.
The problem:
This simulation is being carried out 15M times (compared to 100K when the first thread was written), and it currently takes ~10 minutes on a 3.4 GHz machine. Due to the nature of the problem, this 15M is unlikely to be significantly lowered in the future, only increased. Having used VTune Analyzer, I am told that the last-but-one line (generatedVal += ...) generates 80% of the runtime. My question is why, and how I can work with that.
Things I have tried:
1/ getting rid of the (dIdx - dFloor) part to see whether the double difference and multiplication were the main culprit - runtime dropped by a couple of percent
2/ declaring xDailyCashflowsDistro and xDailyCashflowDiffs as __restrict so as to prevent the compiler from thinking they depend on each other - no change
3/ using 16 days (as opposed to 365) to see whether cache misses were dragging my performance down - not the slightest change
4/ using floats as opposed to doubles - no change
5/ compiling with different /fp: options - no change
6/ compiling as x64 - has an effect on the double <-> ulong conversions, but the line in question is unaffected
What I am willing to sacrifice is resolution - I do not care whether the generatedVal is 100010.1 or 100020.0 at the end if the speed gain is substantial.
EDIT:
The daily/yearly cashflows are related to the whole portfolio. I could divide all daily cashflows by the portfolio size and would thus (at a 99.99% confidence level) ensure that daily cashflow/portfolio_size will not fall outside the [-1000,+1000] interval. In this case, though, I would need precision to the hundredths.
Perhaps you could turn your piecewise linear function into a piecewise-linear "histogram" of its values. The number you're sampling appears to be the sum of 365 samples from that histogram, and what you're doing now is a not-particularly-fast way to sample from the distribution of that sum.
You might try computing a Fourier (or wavelet or similar) transform, keeping only the first few terms, raising it to the 365th power, and computing the inverse transform. You won't get a probability distribution in the end, but there shouldn't be "too much" mass below 0 or above 1 and the total mass shouldn't be "too different" from 1 with this technique. (I don't know what your data looks like; this technique may well be unworkable for good mathematical reasons.)
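The reasoning behind that, for what it's worth, is the convolution theorem: if a single daily draw has density $f$ with Fourier transform (characteristic function) $\hat f$, then the yearly total $S$, being a sum of 365 independent draws, satisfies

$$\hat f_S(\omega) = \hat f(\omega)^{365},$$

so one forward transform, a pointwise 365th power, and one inverse transform give an approximation to the distribution of the yearly total, from which you could sample directly instead of summing 365 per-day samples.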