How to calculate time estimate for parallel tasks? - concurrency

I need to calculate the total amount of time for a certain number of tasks to be completed. Details:
5 tasks total. Time estimates (in seconds) for each: [30, 10, 15, 20, 25]
Concurrency: 3 tasks at a time
How can I calculate the total time it will take to process all tasks, given the concurrency? I know it will take at least as long as the longest task (30 seconds), but is there a formula/method to calculate a rough total estimate that will scale as more tasks are added?

If you don't mind making some approximations, it can be quite simple. If the tasks take roughly the same time to complete, you can use the average task duration as a basis (here, 20 seconds).
Assuming that the system always has tasks queued, that each task is short relative to the whole run, that there are many tasks, and that the concurrency level is high enough, then:
estimated_duration = average_task_duration * nb_tasks / nb_workers
where nb_workers is the number of concurrent threads. With your numbers, that gives 20 * 5 / 3 ≈ 33 seconds.
Here is some Python code that shows the idea:
from random import random
from time import sleep, monotonic
from concurrent.futures import ThreadPoolExecutor


def task(i: int, duration: float):
    sleep(duration)  # simulate a task that takes `duration` seconds


def main():
    nb_tasks = 20
    nb_workers = 3
    average_task_duration = 2.0
    expected_duration = nb_tasks * average_task_duration / nb_workers
    # task durations spread around the average (between 1.5 s and 2.5 s here)
    durations = [average_task_duration + (random() - 0.5) for _ in range(nb_tasks)]
    print(f"Starting work... Expected duration: {expected_duration:.2f} s")
    start = monotonic()
    with ThreadPoolExecutor(max_workers=nb_workers) as executor:
        for i, d in enumerate(durations):
            executor.submit(task, i, d)
    # leaving the `with` block waits for all submitted tasks to finish
    stop = monotonic()
    print(f"Elapsed: {(stop - start):.2f} s")


if __name__ == "__main__":
    main()
If these assumptions do not hold in your case, you are better off using a bin packing algorithm, as Jerôme suggested.
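If you need a tighter figure than the averaged formula, you can also simulate the schedule directly by always handing the next task to whichever worker frees up first. This greedy simulation (a sketch, not the bin packing approach itself) uses the durations and concurrency from the question; in the order given it predicts a makespan of 40 seconds, versus the roughly 33 seconds the averaged formula gives:

import heapq

def estimate_makespan(durations, nb_workers):
    # each entry is the time at which a worker becomes free again
    workers = [0.0] * nb_workers
    heapq.heapify(workers)
    for d in durations:
        free_at = heapq.heappop(workers)   # worker that frees up first
        heapq.heappush(workers, free_at + d)
    return max(workers)                    # the last worker to finish

print(estimate_makespan([30, 10, 15, 20, 25], 3))  # -> 40.0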


Planning subsequent orders

Let's say I have 5 orders and 3 drivers. I want to maximize the amount of miles they have on the road. Each driver has times that they're available to drive and orders have times that they're able to be picked up at.
Ideally, I would like to be able to plan subsequent orders in one go, rather than chaining several models together. My current iteration writes multiple models whose outputs are fed into subsequent models as inputs. How can this be written as a single LP model?
O = {Order1, Order2, Order3, Order4, Order5}
D = {Driver1, Driver2, Driver3}
O_avail = {2 pm, 3 pm, 2:30 pm, 8 pm, 9 pm, 12 am}
D_avail = {2 pm, 3 pm, 2:30 pm}
Time_to_depot = {7 hours, 5 hours, 2 hours, 5 hours, 3 hours, 4 hours}
Constraints:
d_avail <= o_avail
Objective function:
max sum_i D_i * time_to_depot_i
I laid it out in such a way that driver 1 takes order 1, order 5 and order 6. Driver 2 takes order 2 and order 4.
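One way to fold this into a single model is to treat it as an assignment problem with a binary variable x[d, o] that is 1 when driver d takes order o. The sketch below is only an illustration of that idea, not a complete answer: it uses the PuLP library, invented numeric availability values (hours on a 24-hour clock), and it ignores the sequencing of subsequent orders for the same driver.

from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

orders = ["Order1", "Order2", "Order3", "Order4", "Order5"]
drivers = ["Driver1", "Driver2", "Driver3"]
# illustrative numbers only, not the question's exact data
o_avail = {"Order1": 14, "Order2": 15, "Order3": 14.5, "Order4": 20, "Order5": 21}
d_avail = {"Driver1": 14, "Driver2": 15, "Driver3": 14.5}
time_to_depot = {"Order1": 7, "Order2": 5, "Order3": 2, "Order4": 5, "Order5": 3}

prob = LpProblem("order_assignment", LpMaximize)
x = LpVariable.dicts("x", [(d, o) for d in drivers for o in orders], cat=LpBinary)

# objective: maximise total time to depot over all assignments
prob += lpSum(x[d, o] * time_to_depot[o] for d in drivers for o in orders)

for o in orders:
    # each order is taken by at most one driver
    prob += lpSum(x[d, o] for d in drivers) <= 1
    for d in drivers:
        # a driver can only take orders that become available after the driver does
        if d_avail[d] > o_avail[o]:
            prob += x[d, o] == 0

prob.solve()
assignment = [(d, o) for d in drivers for o in orders if x[d, o].value() == 1]
print(assignment)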

Power BI: how to remove top 20% of values from an average

I'm working with call center data and am looking to calculate the average ring time of calls while removing the highest 20% of ring times. I assume I'll need PERCENTILEX.EXC embedded somewhere in AVERAGE, but I'm not quite sure where, or if I'm totally off base. Two other caveats: calls answered immediately (queue time = 0) still have to be counted in the average, and only rows where the disposition column = Handled are used.
Example:
The Aborted and Abandoned calls would be filtered out. Of the remaining calls, the top 20% of queue times (the 14, 9, 6, and one of the 5s) would be eliminated, and the average would be 3 seconds.
Appreciate any help on this!
I would do it like this:
VAR totalRows = COUNTROWS(FILTER(table, table[disposition] = "Handled"))
VAR bottomN = ROUNDDOWN(totalRows * 0.8, 0)
RETURN
    AVERAGEX(
        TOPN(bottomN, FILTER(table, table[disposition] = "Handled"), table[queue time], ASC),
        table[queue time]
    )
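If you want to sanity-check the measure outside of Power BI, the same bottom-80% logic is easy to express in Python with pandas. This is only a sketch with made-up rows and assumed column names ("disposition", "queue_time"):

import pandas as pd

# illustrative data; the real column names and values come from your call center table
df = pd.DataFrame({
    "disposition": ["Handled", "Handled", "Aborted", "Handled", "Handled", "Handled"],
    "queue_time": [0, 3, 14, 5, 5, 9],
})

handled = df[df["disposition"] == "Handled"].sort_values("queue_time")
bottom_n = int(len(handled) * 0.8)                 # keep the lowest 80% of queue times
print(handled["queue_time"].head(bottom_n).mean())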

Is it possible to measure accurate Milliseconds time between few code statements using python?

I am developing a Python application using Python 2.7.
The application uses the pyserial library to read data bytes from a serial port, with a while loop that reads 1 byte per iteration.
In each iteration I have to measure the execution time between statements; if it is less than 10 ms, I need to wait until 10 ms have passed before starting the next iteration. There are two questions here:
Time measurement
What is the best way to measure time between Python statements with millisecond accuracy (a 1-2 ms difference is acceptable)?
Time delay
How can I use that measured time to add a delay so that the total iteration time is 10 ms (total time = code execution time + delay)?
I have tried the time library, but it does not give good millisecond resolution; for very small durations it sometimes reports no elapsed time at all.
for example:
import time

while uart.is_open:
    uart.read()
    start = time.time()
    # user code will go here
    end = time.time()
    execution_time = end - start
    if execution_time < 0.010:               # 10 ms, expressed in seconds
        remaining_time = 0.010 - execution_time
        time.sleep(remaining_time)           # wait out the rest of the 10 ms slot
You can get a string that is minutes:seconds:microseconds using datetime like so:
import datetime
string = datetime.datetime.now().strftime("%M:%S:%f")
And then turn it into a single number (microseconds) to make comparisons easier:
m, s, u = string.split(":")
time_us = float(m) * 60e6 + float(s) * 1e6 + float(u)
In your example, that would look like this:
import datetime

def time_us_now():
    # current time of day expressed in microseconds
    h, m, s, u = (float(part) for part in datetime.datetime.now().strftime("%H:%M:%S:%f").split(":"))
    return h * 3600e6 + m * 60e6 + s * 1e6 + u
start = time_us_now()
# User code will go here.
end = time_us_now()
execution_time = end - start
Solution:
I managed to get 1 ms resolution by doing two things:
1. Changed (increased) the baud rate from 19200 to 115200.
2. Used time.clock() rather than time.time(), as it has better accuracy and resolution.
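As a note on that last point (this is about the platform in general, not something stated in the question): on Python 2.7, time.clock() is a high-resolution wall clock on Windows but measures CPU time on Linux, so a portable way to write the 10 ms pacing loop is timeit.default_timer, which picks the best clock for the platform. A minimal sketch, with read_byte() standing in for the pyserial read:

import time
from timeit import default_timer   # time.clock on Windows, time.time elsewhere (Python 2.7)

def read_byte():
    # placeholder for uart.read(); replace with the real serial read
    pass

PERIOD = 0.010  # 10 ms per iteration

for _ in range(100):                   # bounded loop so the sketch terminates
    start = default_timer()
    read_byte()
    # ... user code goes here ...
    elapsed = default_timer() - start
    if elapsed < PERIOD:
        time.sleep(PERIOD - elapsed)   # sleep off the remainder of the 10 ms slot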

Django formatting multiple times

I'm using Django's default duration (timedelta) field, but as most people know, it renders horribly by default. So I wrote my own function, convert_timedelta(duration), which computes days, hours, minutes and seconds, much like the examples you can find on this site. It works like a charm on a single value, i.e. when I was calculating an average time for one specific column. However, I am now changing the query so that it returns multiple times for a column, grouped by their ID, rather than a single value. The multiple returned times work fine with the default formatting, but when I apply my function to them it breaks, and no particularly informative errors are raised.
def convert_timedelta(duration):
    days, seconds = duration.days, duration.seconds
    hours = days * 24 + seconds // 3600
    minutes = (seconds % 3600) // 60
    seconds = seconds % 60
    return days, hours, minutes, seconds   # return the broken-down components
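Without the traceback it's hard to say exactly what fails, but a common gotcha when switching from a single aggregate to per-ID aggregates is that some groups come back as None instead of timedelta objects. A minimal sketch of applying the function above per group, with a model and field names that are only assumptions, not taken from the question:

from datetime import timedelta
from django.db.models import Avg
from myapp.models import Job          # hypothetical app/model with group_id and a DurationField `elapsed`

rows = Job.objects.values("group_id").annotate(avg_elapsed=Avg("elapsed"))
for row in rows:
    value = row["avg_elapsed"]
    if isinstance(value, timedelta):   # groups can come back as None
        print(row["group_id"], convert_timedelta(value))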

How does scikit's cross validation work?

I have the following snippet:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation on older versions

print '\nfitting'
rfr = RandomForestRegressor(
    n_estimators=10,
    max_features='auto',
    criterion='mse',
    max_depth=None,
)
rfr.fit(X_train, y_train)

# scores
scores = cross_val_score(
    estimator=rfr,
    X=X_test,
    y=y_test,
    verbose=1,
    cv=10,
    n_jobs=4,
)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
1) Does running cross_val_score do more training on the regressor?
2) Do I need to pass in a trained regressor or just a new one, e.g. estimator=RandomForestRegressor()? How then do I test the accuracy of a regressor, i.e. must I use another function in scikit?
3) My accuracy is about 2%. Is that the MSE score, where lower is better, or is it the actual accuracy? If it is the actual accuracy, can you explain it, because it doesn't make sense how a regressor could accurately predict over a continuous range.
1) It re-trains the estimator, k times in fact (once per cross-validation fold).
2) Untrained (or trained, but then the model is discarded and re-fit, so you're just wasting time).
3) It's the R² score, so that's not actually 2% but 0.02; R² is capped at 1 but can be negative. Accuracy is not well defined for regression. (You can define it as for classification, but that makes no sense for continuous targets.)
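To make that concrete, a common pattern (a sketch of one reasonable workflow, not the only correct one) is to cross-validate a fresh estimator on the training data and keep the held-out test set for a final R² check:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# synthetic data so the sketch runs on its own
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rfr = RandomForestRegressor(n_estimators=10, random_state=0)

# the k=10 fits happen inside cross_val_score on clones; rfr itself stays untrained here
cv_scores = cross_val_score(rfr, X_train, y_train, cv=10)
print("CV R^2: %0.2f (+/- %0.2f)" % (cv_scores.mean(), cv_scores.std() * 2))

# final fit and evaluation on the untouched test set
rfr.fit(X_train, y_train)
print("Test R^2: %0.2f" % rfr.score(X_test, y_test))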