How would you implement this adaptive 'fudge factor' in a scheduler? - c++

I have a scheduler, endlessly executing n actions. Each action is scheduled for x seconds into the future. When an action completes, it is re-scheduled for another x seconds into the future after its previously scheduled time. Every 1s, the scheduler "ticks", executing at most 25 actions which are due to fire. Actions may take a second or so to complete (though this value should be considered variable and unpredictable).
Say that x is 60 seconds. Due to the throttling of at most 25 actions being executed simultaneously, when n grows large, it is conceivable that the scheduler won't have time to execute all n actions within a 60 second window, and actions will be executed later and later as time goes on. This is undesirable, as it'll become true that there are actions to execute on every single tick and this increases load on my system. It's less important to me to keep x exactly constant than it is to keep load down.
So I wish to implement an adaptive "handicap", an automatically-applied fudge factor h, increasing it when a majority of actions are executed "late", and decreasing it (edging it back to its default of zero) when they're all seemingly and consistently on time. The scheduler would then be made to schedule actions for x+h seconds' time, rather than x.
At a high level, how would you approach this? How would you define "a majority of actions are executed 'late'" and how would you represent/detect it in C++03 code?
Better yet, is there an existing well-known approach that objectively "works" here?

To be clear, you are aiming to avoid sustained high load where there are tasks
every tick, rather than aiming to minimise the scheduling delay.
Correspondingly, the metric you should be looking at when considering the fudge
factor is the load, not the lateness.
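One way to put a number on "load" is sketched below. This is only an assumption about what get_current_load() might return in the fragments further down: the fraction of the per-tick cap that was actually used, smoothed with an exponential moving average. LoadMeter, max_per_tick and alpha are hypothetical names, not part of the question's scheduler.

```cpp
#include <cassert>

// Hypothetical helper: smoothed estimate of how full each tick is.
class LoadMeter {
public:
    // max_per_tick: the scheduler's per-tick cap (25 in the question).
    // alpha: smoothing weight in (0, 1]; larger values react faster.
    LoadMeter(int max_per_tick, double alpha)
        : max_per_tick_(max_per_tick), alpha_(alpha), load_(0.0) {
        assert(max_per_tick > 0);
        assert(alpha > 0.0 && alpha <= 1.0);
    }

    // Call once per tick with the number of actions actually executed.
    double update(int executed) {
        const double instant = static_cast<double>(executed) / max_per_tick_;
        load_ = alpha_ * instant + (1.0 - alpha_) * load_;
        return load_;
    }

    double current() const { return load_; }

private:
    int max_per_tick_;
    double alpha_;
    double load_;
};
```

With this definition the measured load saturates at 1.0 once every tick is full, so the target load has to sit somewhere below 1.0.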
If you have full knowledge of the system — the number of tasks, their
rescheduling intervals, the distribution of their execution time —
you could in principle exactly solve for a handicap value that would give you
a mean target load when busy, or would, say, only exceed the target load
10% of the time when busy, and so on.
On the other hand, if this information is not available or predictable,
you will need an adaptive approach.
The general theory for this sort of thing is control theory, which can get
quite involved. Broadly though the heuristic is: if the load is less than the
threshold, and we have a positive handicap, reduce the handicap; if the load is
over the threshold, increase the handicap.
The handicap should be multiplicative rather than additive: if, for example,
we knew we were consistently 10% overloaded, then we'd be right on target if we
applied a proportional delay of 10% to the scheduling of jobs. That is, we're
looking to apply a handicap factor h such that jobs are scheduled at x*h
seconds' time instead of x. A factor of 1 would correspond to no handicap.
When we're overloaded, but not maximally overloaded, the response then is linear
in the log: log(h) = log(load) - log(load_target). So the simplest method
would be:
load = get_current_load();
if (load>load_target) h = load/load_target;
else h = 1.0;
Unfortunately, there is a maximum measured load, and linearity breaks down
here. The linear model can be extended to incorporate the accumulated
deviation from the target load, and the rate of change of the load.
This corresponds to the proportional-integral-derivative controller.
As this is a noisy environment (there is variation in the action
execution times), it might be wise to shy away from the derivative bit
of this model, and stick with the proportional-integral (PI) part.
When this model is discretized, we get an expression for log(h)
that is proportional to the current (log) overload, plus a term that
captures how badly we've been doing:
load = get_current_load();
deviation = load > load_target ? log(load/load_target) : 0;
accum += p1 * deviation;
log_h = p2 * deviation + accum;
h = log_h < 0 ? 1.0 : exp(log_h);
Except that the problem isn't symmetric: when we're below
the load target, the deviation is clamped to zero, so the accumulated
error term never shrinks and stays high.
We could work around this by accumulating negative deviations
as well, but clamping the accumulated error at zero,
so that a period of legitimately low load
doesn't give us a free pass for later:
load = get_current_load();
if (load > 0) {
    deviation = log(load/load_target);
    accum += p1 * deviation;
    if (accum < 0) accum = 0;
    if (deviation < 0) deviation = 0;
}
else {
    accum = 0;
    deviation = 0;
}
log_h = p2 * deviation + accum;
h = log_h < 0 ? 1.0 : exp(log_h);
The value for p2 will be somewhere (roughly) between 0.5 and 0.9,
to leave some room for the influence of the accumulated error.
A good value for p1 will probably be around 0.3 to 0.5 times
the reciprocal of the lag time, the number of steps it takes for a change
in h to present itself as a change in load. This can be estimated
by the mean rescheduling time of the actions.
You can play around with these parameters to get the sort of
response you'd like, or you can make a more faithful mathematical
model of your scheduling problem and then do maths to it!
The parameters themselves can also be modified adaptively over
time, based on the observed response to changes in load.
(Warning, I haven't actually tried these fragments in a mock scheduler!)
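For what it's worth, the last fragment could be wrapped up along these lines. This is a sketch only, kept to C++03, and just as untested against a real scheduler as the fragments above. The class name HandicapController and its method names are made up for illustration; p1, p2 and load_target are the parameters discussed above and would still need tuning.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical wrapper around the PI fragment: returns a handicap factor h >= 1.
class HandicapController {
public:
    HandicapController(double load_target, double p1, double p2)
        : load_target_(load_target), p1_(p1), p2_(p2), accum_(0.0) {
        assert(load_target > 0.0);
    }

    // Feed in the current load once per control step; returns the factor h
    // by which the rescheduling interval x should be multiplied.
    double update(double load) {
        double deviation = 0.0;
        if (load > 0.0) {
            deviation = std::log(load / load_target_);
            accum_ += p1_ * deviation;            // integral term
            if (accum_ < 0.0) accum_ = 0.0;       // no credit for quiet periods
            if (deviation < 0.0) deviation = 0.0; // proportional term only when overloaded
        } else {
            accum_ = 0.0;                         // idle system: reset
        }
        const double log_h = p2_ * deviation + accum_;
        return log_h < 0.0 ? 1.0 : std::exp(log_h);
    }

private:
    double load_target_;
    double p1_;
    double p2_;
    double accum_;
};
```

The scheduler would then reschedule each completed action for x * h seconds after its previously scheduled time, calling update() at whatever interval the load is sampled.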

Related

c++ Create time remaining estimate when data calcs get progressively longer?

I'm adding items to a list, so each insert takes just a bit longer than the last (this is a requirement, assume you can't change that). I've manually timed a sample dataset on MY computer but I want a generalized way to predict the time on any computer, and given ANY dataset size.
In my flailing around trying to figure this out, what i have collected is a vector, 100 long, of "how long 1/100th the sample data" took. So in my example data set i have 237,965 objects, which means in the vector of times i collected, each bucket tells how long it took to add 2,379 items.
Here's a link to the sample data of 100 items. So you can see the first 2k items took about 8 seconds, and the last 2k items took about 101 seconds. All together, if you add all the time, that's 4,295 seconds or about 1 hr 11 minutes.
So my question is, given this data set, and using it for future predictions, how do i estimate the remaining time when adding different size data?
In more flailing, i made some plots, wondering if it could help. First plot is just the raw data on a log graph:
I then made a 2nd data set based on first, this time showing accumulated time, rather than just the time for the current slice, and plotted that on a linear graph:
Notice the lovely trend line formula? That MUST be something that i just need to somehow plug into my code but i can't for the life of me figure out how.
Should i have instead gathered the data into time-slices and not index-slices? ie: i KNOW this data takes 1:10 to load, so take snapshots every 1/100th of that duration, instead of snapshotting every 1/100th of the data set?
Or HOW do i figure this out?
the function I need to write has this API:
CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT);
so given only those three variables (maxI, curI, and elapsedT), and knowing the trend line formula from above, i need to return "duration until maxI" (seconds).
Any ideas?
Update:
well it seems after much futzing around, i can just do this (note "LERP" is just linear interpolate):
#define kDataSetMax 237965
double FunctionX(int in_x)
{
    // map the item index onto the 0..100 slice scale used for the trend line
    double _x(LERP(0, 100, in_x, 0, kDataSetMax));
    double resultF =
        (0.32031139888898874 * math_square(_x))
        + (9.609731568497784 * _x)
        - (7.527252350031663);
    if (resultF <= 1) {
        resultF = 1;
    }
    return resultF;
}
CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT)
{
    CFTimeInterval endT(FunctionX(maxI));
    return endT;
}
But that means i'm just ignoring curI and elapsedT?? That doesn't seem... right? What am I missing?
Footnotes:
#define LERP(to_min, to_max, from, from_min, from_max) \
((from_max) == (from_min) ? from : \
(double)(to_min) + ((double)((to_max) - (to_min)) \
* ((double)((from) - (from_min)) \
/ (double)((from_max) - (from_min)))))
#define LERP_PERCENT(from, from_max) \
LERP(0.0f, 1.0f, from, 0.0f, from_max)
Your FunctionX is most of the way there. It's currently calculating expectedTimeToReachMaxIOnMyMachine. What you need to do is figure out how much slower the current run is, relative to the time you expected on your machine to reach this same point, and then extrapolate that same ratio to the maximum time.
CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT) {
    //calculate how long we expected it to take to reach this point
    CFTimeInterval expectedTimeToReachCurrentIOnMyMachine = FunctionX(curI);
    //calculate how much slower we are than the expectation
    //if this machine is faster, the math still works out.
    double slowerThanExpectedByRatio
        = double(elapsedT) / expectedTimeToReachCurrentIOnMyMachine;
    //calculate how long we expected to reach the max
    CFTimeInterval expectedTimeToReachMaxIOnMyMachine = FunctionX(maxI);
    //if we continue to be the same amount slower, we'll reach the max at:
    CFTimeInterval estimatedTimeToReachMaxI
        = expectedTimeToReachMaxIOnMyMachine * slowerThanExpectedByRatio;
    return estimatedTimeToReachMaxI;
}
Note that a smart implementation can cache and reuse expectedTimeToReachMaxIOnMyMachine and not calculate it every time.
Basically this assumes that after doing X% of the work, we can calculate how much slower we were than the expected curve, and assume we will stay approximately that same amount slower than the expected curve.
In the example below, the expected time taken is the blue line. At 4000 elements, we see that the expected time on your machine was 8,055,826, but the actual time taken on this machine was 10,472,573, which is 30% higher (slowerThanExpectedByRatio=1.3). At that point, we can extrapolate that we'll probably remain 30% higher throughout the entire process (the purple line). So if the total expected time on your machine for 10000 elements was 32,127,229, then our total estimated time on this machine for 10000 will be 41,765,398 (30% higher).
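Here is a self-contained sketch of the same ratio idea, in plain C++ with double standing in for CFTimeInterval. The function above returns the estimated total time; since the question asks for the duration until maxI, this variant subtracts the time already elapsed, which is an assumption about what the caller wants. The names expected_seconds_on_reference and estimate_remaining_seconds are made up for illustration, and the coefficients are the trend-line values from the question.

```cpp
#include <algorithm>
#include <cassert>

const int kDataSetMax = 237965;  // size of the reference data set from the question

// Expected cumulative seconds on the reference machine after `items` inserts,
// using the question's quadratic trend line (fitted on a 0..100 slice scale).
double expected_seconds_on_reference(int items) {
    const double x = 100.0 * items / kDataSetMax;
    const double t = 0.32031139888898874 * x * x
                   + 9.609731568497784 * x
                   - 7.527252350031663;
    return std::max(t, 1.0);  // same clamp as FunctionX; also avoids dividing by ~0
}

// Estimated seconds still to go, assuming this machine stays slower (or faster)
// than the reference curve by the ratio observed so far.
double estimate_remaining_seconds(int maxI, int curI, double elapsedT) {
    assert(curI >= 0 && curI <= maxI);
    assert(elapsedT >= 0.0);
    const double ratio = elapsedT / expected_seconds_on_reference(curI);
    const double total = expected_seconds_on_reference(maxI) * ratio;
    return std::max(total - elapsedT, 0.0);
}
```

Early on, when curI is small, the ratio is computed from very little data, so the estimate will jump around until a reasonable fraction of the work is done.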

Monitor task CPU utilization in VxWorks while program is running

I'm running a VxWorks 6.9 OS embedded system and I need to see when I'm starving low priority tasks. Ideally I'd like to have CPU utilization by task so I know what is eating up all my CPU time.
I know this is a built in feature in many operating systems but have been so far unable to find it for VxWorks 6.9.
If I can't measure by task I'd like to at least to see what percentage of time the CPU is idle.
To that end I've been trying to make a lowest priority task that will run the function below that would try to measure it indirectly.
float Foo::IdleTime(Foo* f)
{
    bool inIdleTask;
    float timeIdle;
    float totalTime;
    float percentIdle;
    while(true)
    {
        startTime = _time(); //get time before measurement starts
        inIdleTask = true;
        timeIdle = 0;
        while(inIdleTask) // I have no clue how to detect when the task left and set this to false
        {
            timeIdle += (amount_of_time_for_inner_loop); //measure idle time
        }
        returnTime = _time(); //get time after you return to IdleTime task
        totalTime = ( returnTime - startTime );
        percentIdle = ( timeIdle / totalTime ) * 100; //calculate percentage of idle time
        //logic to report percentIdle
    }
}
The big problem with this concept is I don't know how I would detect when this task is left for a higher priority task.
If you are looking for a one-time measurement done during development, then spyLib is what you are looking for. Simply call spy from the command line to get a per-task CPU usage report at 10s intervals. Call spyHelp to learn how to configure spy. (You might need to include spyLib in the kernel if it is not already included.)
If you want to go the extra mile, taskHookLib is what you need. Simply put, you hook a function to be called on every task switch. The call gives you the TASK_IDs of the tasks going in and out of the CPU. You can either simply monitor the starvation of low-priority tasks or take action and increase their priority temporarily.
From experience, spy adds a little performance overhead, especially if stdout goes to a slow I/O device (e.g. a 9600 baud serial line), but it is fairly easy to use. Task hooking adds little to no overhead if you are not immediately printing the results to the terminal, but takes a bit of programming to get running.
Another thing that might be of interest is WindRiver's remote debugger. I haven't used that one personally; I imagine it would require setting up the Workbench and the target properly.
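To give an idea of the bookkeeping a switch hook would do, here is a platform-independent sketch. None of it is VxWorks API: CpuAccounting, kMaxTasks and the slot numbers are hypothetical, and the mapping from a task to a slot, as well as the timestamp source, are left to the caller. On the target, on_switch would be called from the function hooked in through taskHookLib.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kMaxTasks = 32;  // hypothetical upper bound on tracked tasks

// Fixed-size accounting table, so nothing is allocated inside the hook.
class CpuAccounting {
public:
    CpuAccounting() : last_switch_(0) {
        for (std::size_t i = 0; i < kMaxTasks; ++i) ticks_[i] = 0;
    }

    // Call on every task switch. outgoing_slot identifies the task leaving
    // the CPU; now is a monotonically increasing timestamp in ticks.
    void on_switch(std::size_t outgoing_slot, unsigned long now) {
        assert(now >= last_switch_);
        if (outgoing_slot < kMaxTasks) {
            ticks_[outgoing_slot] += now - last_switch_;
        }
        last_switch_ = now;
    }

    unsigned long ticks(std::size_t slot) const {
        return slot < kMaxTasks ? ticks_[slot] : 0;
    }

    // Share of total_ticks spent in the given slot, as a percentage.
    double percent(std::size_t slot, unsigned long total_ticks) const {
        if (total_ticks == 0) return 0.0;
        return 100.0 * ticks(slot) / total_ticks;
    }

private:
    unsigned long ticks_[kMaxTasks];
    unsigned long last_switch_;
};
```

Reserving one slot for a lowest-priority idle task gives the percentage of idle time directly, which covers the fallback the question asks about.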

Want to change a value continuously from min to max to min in a loop as a Sine curve

I am working on a game where I need an algorithm to vary a value in a loop. I have implemented the algorithm, but I guess it's not working the way I want it to. Here's what I want and what I have already implemented:
Given :
a commodity whose price I want to circulate (from min to max to min again and continuously in a loop)
I am using cocos2d-x (C++) where I have a scheduler which runs a function at a given interval say SCHEDULE_INTERVAL
MIN_PRICE and MAX_PRICE of the commodity
currentPrice
Time duration which it will take to complete one cycle (min-max-min)
Current Implementation :
SCHEDULE_INTERVAL = 0.3 (sec) (so the function is running every 0.3 secs)
counter = 0;
timeDuration = time to complete one cycle
function
{
counter++;
_amplitude = (maxPrice - minPrice)/2;
_midValue = (maxPrice + minPrice)/2;
currentPrice = _midValue + _amplitude * sin (2*PI*counter/timeDuration)
}
Why I am using a sine wave: because at the peaks I want the transitions to be slow.
Problem: for some reason it's not behaving the way I want it to.
I want to continuously change currentPrice from minPrice to maxPrice and back to minPrice over timeDuration, with the loop running every SCHEDULE_INTERVAL.
Please suggest any solutions.
Thanks :)
EDIT :
what's not working in the above implementation is that the values are not changing according to the 'timeDuration' variable
If the pseudocode you posted accurately mirrors the expressions you use in real code, you probably want to change the argument of sin to this:
2 * PI * (counter * SCHEDULE_INTERVAL) / timeDuration
counter is the number of executions, while timeDuration is (I presume) the desired length in seconds.
In other words, your units don't match - it's always worthwhile to perform a dimensional analysis when formulae don't work.
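Put together, the corrected update might look like the sketch below. It is written as a free function so the units are explicit; price_at and its parameter names are made up for illustration, and both schedule_interval and time_duration are assumed to be in seconds.

```cpp
#include <cassert>
#include <cmath>

// Price after `counter` scheduler callbacks, one full min-max-min cycle
// taking time_duration seconds.
double price_at(int counter, double schedule_interval, double time_duration,
                double min_price, double max_price) {
    assert(time_duration > 0.0);
    const double pi = 3.14159265358979323846;
    const double amplitude = (max_price - min_price) / 2.0;
    const double mid_value = (max_price + min_price) / 2.0;
    const double elapsed = counter * schedule_interval;  // seconds, not call count
    return mid_value + amplitude * std::sin(2.0 * pi * elapsed / time_duration);
}
```

Note that a plain sine starts at the mid value rather than at minPrice; if the cycle has to begin at the minimum, a phase shift would be needed.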

Fast percentile in C++ - speed more important than precision

This is a follow-up to Fast percentile in C++
I have a sorted array of 365 daily cashflows (xDailyCashflowsDistro) which I randomly sample 365 times to get a generated yearly cashflow. Generating is carried out by
1/ picking a random probability in the [0,1] interval
2/ converting this probability to an index in the [0,364] interval
3/ determining what daily cashflow corresponds to this probability by using the index and some linear approximation.
and summing 365 generated daily cashflows. Following the previously mentioned thread, my code precalculates the differences of sorted daily cashflows (xDailyCashflowDiffs) where
xDailyCashflowDiffs[i] = xDailyCashflowsDistro[i+1] - xDailyCashflowsDistro[i]
and thus the whole code looks like
double _dIdxConverter = ((double)(365 - 1)) / (double)(RAND_MAX - 1);
for ( unsigned int xIdx = 0; xIdx < _xCount; xIdx++ )
{
    double generatedVal = 0.0;
    for ( unsigned int xDayIdx = 0; xDayIdx < 365; xDayIdx ++ )
    {
        double dIdx = (double)fastRand()* _dIdxConverter;
        long iIdx1 = (unsigned long)dIdx;
        double dFloor = (double)iIdx1;
        generatedVal += xDailyCashflowsDistro[iIdx1] + xDailyCashflowDiffs[iIdx1] *(dIdx - dFloor);
    }
    results.push_back(generatedVal) ;
}
_xCount (the number of simulations) is 1K+, usually 10K.
The problem:
This simulation is being carried out 15M times (compared to 100K when the first thread was written) at the moment, and it takes ~10 minutes on a 3.4GHz machine. Due to the nature of problem, this 15M is unlikely to be significantly lowered in the future, only increased. Having used VTune Analyzer, I am being told that the last but one line (generatedVal += ...) generates 80% of runtime. And my question is why and how I can work with that.
Things I have tried:
1/ getting rid of the (dIdx - dFloor) part to see whether double difference and multiplication is the main culprit - runtime dropped by a couple of percent
2/ declaring xDailyCashflowsDistro and xDailyCashflowDiffs as __restrict so as to prevent the compiler thinking they are dependent on each other - no change
3/ tried using 16 days (as opposed to 365) to see whether it is cache misses that drag my performance - not a slight change
4/ tried using floats as opposed to doubles - no change
5/ compiling with different /fp: - no change
6/ compiling as x64 - has effect on the double <-> ulong conversions, but the line in question is unaffected
What I am willing to sacrifice is resolution - I do not care whether the generatedVal is 100010.1 or 100020.0 at the end if the speed gain is substantial.
EDIT:
The daily/yearly cashflows are related to the whole portfolio. I could divide all daily cashflows by portfolio size and would thus (at 99.99% confidence level) ensure that daily cashflows/pflio_size will not reach out of the [-1000,+1000] interval. In this case, though, I would need precision to the hundredths.
Perhaps you could turn your piecewise linear function into a piecewise-linear "histogram" of its values. The number you're sampling appears to be the sum of 365 samples from that histogram. What you're doing is a not-particularly-fast way to sample from the sum of 365 samples from that histogram.
You might try computing a Fourier (or wavelet or similar) transform, keeping only the first few terms, raising it to the 365th power, and computing the inverse transform. You won't get a probability distribution in the end, but there shouldn't be "too much" mass below 0 or above 1 and the total mass shouldn't be "too different" from 1 with this technique. (I don't know what your data looks like; this technique may well be unworkable for good mathematical reasons.)
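The reason for raising the transform to the 365th power is the convolution theorem. As a sketch, assuming the 365 daily draws X_k are independent and share the same density f (the symbols here are introduced only for illustration):

```latex
S = \sum_{k=1}^{365} X_k, \qquad
f_S = \underbrace{f * f * \cdots * f}_{365\ \text{copies of } f}, \qquad
\hat{f}_S(\omega) = \bigl(\hat{f}(\omega)\bigr)^{365}
```

Here * is convolution and the hat denotes the Fourier transform, so the density of the yearly sum is recovered by inverting the rightmost expression once, after which yearly cashflows can be sampled from it directly instead of summing 365 draws each time.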

Random scheduling of n non-overlapping events in a time interval

Please forgive me if this is a well-known class of problem with a well-known solution. I've been searching but obviously not succeeding.
Assume I have n events that must occur in an interval (e.g., [0,1]). Each event is associated with a duration that is stochastically drawn from a predefined distribution. Assume that the interval is much larger than all the event durations combined: valid schedules always exist. The probabilities of event occurrence over the [0,1] interval are not uniform, but the events are independent as long as they are not overlapping.
What is an efficient and accurate (random) way to schedule these events?
Here's my current pseudocode:
lowerBound = min(interval)
upperBound = max(interval)
for n in numEvents {
    draw startTime
    draw endTime
    while ( startTime is less than lowerBound || endTime exceeds upperBound ) {
        draw startTime
        draw endTime
    }
    add event
    reset lowerBound and upperBound to define largest remaining (event-free) interval
}
I think that by choosing the larger remaining interval in which to schedule the new event, I'm making the events overdispersed--more spaced out than they'd otherwise be. This nonetheless seems very efficient and probably extremely accurate when the number of events is small.
I'm using C++, although that's probably irrelevant.
I'd also greatly appreciate search terms, if you know what this kind of problem is called.
Context: Accuracy is most important to me here. This is not a time-intensive step in the overall program.
I'd just place each event randomly, checking for collisions, and if there's a collision wipe the slate clean and start over.
You say that the interval is much larger than the sum of the event durations, so collisions will be rare, so this method is quite fast in practice.
You say you want accurate results. I'm not sure what that means, but at a guess I'd say that all valid solutions should be equally probable; the current solution in the question doesn't satisfy that requirement, but this one does.
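A minimal sketch of that restart-on-collision approach, in C++03. Several things here are assumptions for illustration: start times are drawn uniformly with std::rand (the question says the real distribution is not uniform, so draw_start would have to be replaced), durations are passed in already drawn, and the names Event, draw_start, has_overlap and schedule are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

typedef std::pair<double, double> Event;  // (start, end)

// Assumption: uniform start times in [lo, hi); swap in the real distribution.
static double draw_start(double lo, double hi) {
    return lo + (hi - lo) * (std::rand() / (RAND_MAX + 1.0));
}

// True if any two events overlap; works on a sorted copy.
static bool has_overlap(std::vector<Event> events) {
    std::sort(events.begin(), events.end());
    for (std::size_t i = 1; i < events.size(); ++i) {
        if (events[i].first < events[i - 1].second) return true;
    }
    return false;
}

// Place every event independently; if anything collides, discard the whole
// schedule and start over, so no placement biases the others.
std::vector<Event> schedule(const std::vector<double>& durations,
                            double lo, double hi) {
    std::vector<Event> events;
    do {
        events.clear();
        for (std::size_t i = 0; i < durations.size(); ++i) {
            assert(durations[i] >= 0.0 && durations[i] < hi - lo);
            // Draw the start so that the event ends inside the interval.
            const double start = draw_start(lo, hi - durations[i]);
            events.push_back(Event(start, start + durations[i]));
        }
    } while (has_overlap(events));
    return events;
}
```

Throwing away the whole schedule, rather than only re-drawing the colliding event, is what keeps earlier placements from skewing the later ones. The loop only terminates quickly because the interval is much longer than the combined durations.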