Using "map function" in python to reduce time of processing - python-2.7

I am trying to run a loop 100,000 times. I have used the map function as shown below to divide the work between processors and make it less time-consuming.
However, I have to pass a variable as an argument to the map function, which takes more time than defining the variable inside the main function. The problem with defining the variable inside the main function is that it is generated by a random function, so every time a different processor picks up the function it gets a new random Gaussian sample, which is not what I want.
As a solution, I defined the random Gaussian variable outside the main function and passed it in as an argument. But now map takes longer to process. Can anyone please help reduce the map processing time, or suggest where to define the random Gaussian variable so that it is calculated once and picked up by the different processors?
Defining the random Gaussian variable to pass as an argument to the map function:
import numpy as np
from multiprocessing import Pool

def E_and_P(Velocity_s, Position_s, tb):
    for index in range(0, 4000):
        ...  # loop body not shown in the post
    return X_position, Y_position, Z_position, VX_Vel, VY_Vel

if __name__ == "__main__":
    Velocity_mu = 0
    Velocity_sigma = 1*1e8  # mean and standard deviation
    Velocity_s = np.random.normal(Velocity_mu, Velocity_sigma, 100000)
    print("Velocity_s =", Velocity_s)
    Position_mu = 0
    Position_sigma = 1*1e-9  # mean and standard deviation
    Position_s = np.random.normal(Position_mu, Position_sigma, 100000)
    #print("Position_s =", Position_s)
    tb = range(100000)
    items = [(Velocity_s, Position_s, tb) for tb in range(100000)]
    p = Pool(processes=4)
    result = p.starmap(E_and_P, items)
Please help or suggest some new ways.

Based on your last comment, you could change this line:
items = [(Velocity_s, Position_s,tb) for tb in range(100000)]
to this:
items = [(Velocity_s[tb], Position_s[tb],tb) for tb in range(100000)]
Each element of items is now a simple 3-number tuple: one velocity, one position, and one index.
You will also have to change E_and_P, since its arguments are now 3 scalars: one velocity, one position, and one index.
def E_and_P(vel, pos, tb):
This should dramatically improve performance. When using multiprocessing, keep in mind that different processes do not share an address space. All data exchanged between processes must be serialized (pickled) on one end and rebuilt as Python objects on the other. As I indicated in my comment, your original implementation resulted in about 20 billion serializations. This approach still has 100000 steps, but each step needs to serialize only 3 numbers: 300000 serializations instead of 20000000000.
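To get a feel for the difference, the sketch below compares how many bytes pickle produces for the two layouts of items. It is only an illustration under some assumptions: plain Python lists stand in for the NumPy arrays, N is 1000 rather than 100000 so it runs quickly, and payload_bytes is a hypothetical helper that pickles every item on its own. Pool groups items into chunks before sending them, so the real traffic for the original layout may be lower than this per-item figure.

```python
import pickle
import random

N = 1000  # much smaller than the 100000 in the question, to keep the demo fast

random.seed(0)
velocity_s = [random.gauss(0, 1e8) for _ in range(N)]
position_s = [random.gauss(0, 1e-9) for _ in range(N)]


def payload_bytes(items):
    """Total bytes produced when every work item is pickled on its own."""
    return sum(len(pickle.dumps(item)) for item in items)


# Original layout: every work item carries both full sequences.
whole_arrays = [(velocity_s, position_s, tb) for tb in range(N)]
# Suggested layout: every work item carries one velocity, one position, one index.
scalars_only = [(velocity_s[tb], position_s[tb], tb) for tb in range(N)]

big = payload_bytes(whole_arrays)
small = payload_bytes(scalars_only)
print("whole sequences per item: %d bytes" % big)
print("scalars per item:         %d bytes" % small)
```

The first figure grows with N squared, the second with N, which is the effect described above.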


c++ Create time remaining estimate when data calcs get progressively longer?

I'm adding items to a list, so each insert takes just a bit longer than the last (this is a requirement; assume you can't change that). I've manually timed a sample dataset on MY computer, but I want a generalized way to predict the time on any computer, given ANY dataset size.
In my flailing around trying to figure this out, what I have collected is a vector, 100 long, of "how long 1/100th of the sample data" took. In my example data set I have 237,965 objects, which means each bucket in the vector of times tells how long it took to add 2,379 items.
Here's a link to the sample data of 100 items. You can see the first 2k items took about 8 seconds, and the last 2k items took about 101 seconds. All together, if you add all the time, that's 4,295 seconds, or about 1 hr 11 minutes.
So my question is: given this data set, and using it for future predictions, how do I estimate the remaining time when adding data of a different size?
In more flailing, I made some plots, wondering if they could help. The first plot is just the raw data on a log graph:
I then made a second data set based on the first, this time showing accumulated time rather than just the time for the current slice, and plotted that on a linear graph:
Notice the lovely trend line formula? That MUST be something that I just need to somehow plug into my code, but I can't for the life of me figure out how.
Should I have instead gathered the data into time-slices and not index-slices? I.e., I KNOW this data takes 1:10 to load, so take snapshots every 1/100th of that duration, instead of snapshotting every 1/100th of the data set?
Or HOW do I figure this out?
The function I need to write has this API:
CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT);
So given only those three variables (maxI, curI, and elapsedT), and knowing the trend line formula from above, I need to return "duration until maxI" (seconds).
Any ideas?
Well, it seems after much futzing around, I can just do this (note "LERP" is just linear interpolation):
#define kDataSetMax 237965

double FunctionX(int in_x)
{
    double _x(LERP(0, 100, in_x, 0, i_maxI));
    double resultF =
        (0.32031139888898874 * math_square(_x))
        + (9.609731568497784 * _x)
        - (7.527252350031663);

    if (resultF <= 1) {
        resultF = 1;
    }

    return resultF;
}

CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT)
{
    CFTimeInterval endT(FunctionX(maxI));
    return remainingT;
}
But that means I'm just ignoring curI and elapsedT?? That doesn't seem... right? What am I missing?
#define LERP(to_min, to_max, from, from_min, from_max) \
((from_max) == (from_min) ? from : \
(double)(to_min) + ((double)((to_max) - (to_min)) \
* ((double)((from) - (from_min)) \
/ (double)((from_max) - (from_min)))))
#define LERP_PERCENT(from, from_max) \
LERP(0.0f, 1.0f, from, 0.0f, from_max)
Your FunctionX is most of the way there. It's currently calculating expectedTimeToReachMaxIOnMyMachine. What you need to do is figure out how much slower the current time is relative to the expected on your machine to reach this same point, and then extrapolate that same ratio to the maximum time.
CFTimeInterval get_estimated_end_time(int maxI, int curI, CFTimeInterval elapsedT) {
    //calculate how long we expected it to take to reach this point
    CFTimeInterval expectedTimeToReachCurrentIOnMyMachine = FunctionX(curI);
    //calculate how much slower we are than the expectation
    //if this machine is faster, the math still works out.
    double slowerThanExpectedByRatio
        = double(elapsedT) / expectedTimeToReachCurrentIOnMyMachine;
    //calculate how long we expected to reach the max
    CFTimeInterval expectedTimeToReachMaxIOnMyMachine = FunctionX(maxI);
    //if we continue to be the same amount slower, we'll reach the max at:
    CFTimeInterval estimatedTimeToReachMaxI
        = expectedTimeToReachMaxIOnMyMachine * slowerThanExpectedByRatio;
    return estimatedTimeToReachMaxI;
}
Note that a smart implementation can cache and reuse expectedTimeToReachMaxIOnMyMachine and not calculate it every time.
Basically this assumes that after doing X% of the work, we can calculate how much slower we were than the expected curve, and assume we will stay approximately that same amount slower than the expected curve.
In the example below, the expected time taken is the blue line. At 4000 elements, we see that the expected time on your machine was 8,055,826, but the actual time taken on this machine was 10,472,573, which is 30% higher (slowerThanExpectedByRatio=1.3). At that point, we can extrapolate that we'll probably remain 30% higher throughout the entire process (the purple line). So if the total expected time on your machine for 10000 elements was 32,127,229, then our total estimated time on this machine for 10000 elements will be 41,765,398 (30% higher).
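Here is a rough Python sketch of the same ratio idea (a port for illustration, not the original C++). It assumes the trend line from the question, with x measured in percent of the 237,965-item reference run, and it also returns the remaining time as total minus elapsed, which is my reading of what the question's API asks for. REFERENCE_ITEMS, expected_seconds and estimate are names made up for this sketch.

```python
REFERENCE_ITEMS = 237965  # size of the data set the trend line was fitted on (assumed scale)


def expected_seconds(i):
    """Seconds the reference machine needed to reach item i, per the trend line."""
    x = 100.0 * i / REFERENCE_ITEMS  # progress in percent of the reference run
    t = 0.32031139888898874 * x * x + 9.609731568497784 * x - 7.527252350031663
    return max(t, 1.0)  # same clamp as FunctionX


def estimate(max_i, cur_i, elapsed):
    """Return (estimated total seconds, estimated remaining seconds)."""
    # How much slower (or faster) this machine has been so far.
    ratio = elapsed / expected_seconds(cur_i)
    # Assume the same ratio holds for the rest of the run.
    total = expected_seconds(max_i) * ratio
    return total, total - elapsed


total, remaining = estimate(max_i=237965, cur_i=118982, elapsed=1500.0)
print("total %.0f s, remaining %.0f s" % (total, remaining))
```

Early in a run the clamp and the noisy first samples make the ratio unreliable, so it may be worth waiting for a few percent of progress before showing the estimate.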

Parallelizing a nested Python for loop

What type of parallel Python approach would be suited to efficiently spreading the CPU-bound workload shown below? Is it feasible to parallelize the section? It looks like there is not much tight coupling between the loop iterations, i.e. portions of the loop could be handled in parallel so long as an appropriate communication step reconstructs the store variable at the end. I'm currently using Python 2.7, but if a strong case could be made that this problem can be easily handled in a newer version, then I will consider migrating the code base.
I have tried to capture the spirit of the computation with the example below. I believe that it has the same connectedness between the loops/variables as my actual code.
import numpy as np

nx = 20
ny = 30
myList1 = [0]*100
myList2 = [1]*25
value1 = np.zeros(nx)
value2 = np.zeros(ny)
store = np.zeros((nx, ny, len(myList1), len(myList2)))  # shape must be a tuple

for i in range(nx):
    for j in range(ny):
        f = calc(value1[i], value2[j])  # returns a list
        for k, data1 in enumerate(myList1):
            for p, data2 in enumerate(myList2):
                meanval = np.sum(f[:]/data1)*data2
                store[i,j,k,p] = meanval
Here are two approaches you can take. What's wise also depends on where the bottleneck is, which is something that can best be measured rather than guessed.
The ideal option would be to leave all low level optimization to Numpy. Right now you have a mix of native Python code and Numpy code. The latter doesn't play well with loops. They work, of course, but by having loops in Python, you force operations to take place sequentially in the order you specified. It's better to give Numpy operations that it can perform on as many elements at once as possible, i.e. matrix transformations. That benefits performance, not only because of automatic (partial) parallelization; even single threads will be able to get more out of the CPU. A highly recommended read to learn more about this is From Python to Numpy.
If you do need to parallelize pure Python code, you have few options but to go with multiple processes. For that, refer to the multiprocessing module. Rearrange the code into three steps:
Preparing the inputs for every job
Dividing those jobs between a pool of workers to be run in parallel (fork/map)
Collecting the results (join/reduce)
You need to strike a balance between enough processes to make parallelizing worthwhile, and not so many that they will be too short-lived. The cost of spinning up processes and communicating with them would then become significant by itself.
A simple solution would be to generate a list of (i,j) pairs, so that there will be nx*ny jobs. Then make a function that takes such a pair as input and returns a list of (i,j,k,p,meanval). Try to only use the inputs to the function and return a result. Everything local; no side effects et cetera. Read-only access to globals such as myList1 is okay, but modification requires special measures as described in the documentation. Pass the function and the list of inputs to a worker pool. Once it has finished producing partial results, combine all those into your store.
Here's an example:
from multiprocessing import Pool
import numpy as np

# Global variables are OK, as long as their contents are not modified, although
# these might just as well be moved into the worker function or an initializer
nx = 20
ny = 30
myList1 = [0]*100
myList2 = [1]*25
value1 = np.zeros(nx)
value2 = np.zeros(ny)

def calc_meanvals_for(pair):
    """Process a reasonably sized chunk of the problem"""
    i, j = pair
    f = calc(value1[i], value2[j])
    results = []
    for k, data1 in enumerate(myList1):
        for p, data2 in enumerate(myList2):
            meanval = np.sum(f[:]/data1)*data2
            results.append((i, j, k, p, meanval))
    return results

# This module will be imported by every worker - that's how they will be able
# to find the global variables and the calc function - so make sure to check
# if this is the main program, because without that, every worker will start more
# workers, each of which will start even more, and so on, in an endless loop
if __name__ == '__main__':
    # Create a pool of worker processes, each able to use a CPU core
    pool = Pool()
    # Prepare the arguments, one per function invocation (tuples to fake multiple)
    arg_pairs = [(i,j) for i in range(nx) for j in range(ny)]
    # Now comes the parallel step: given a function and a list of arguments,
    # have a worker invoke that function with one argument until all arguments
    # have been used, collecting the return values in a list
    return_values = pool.map(calc_meanvals_for, arg_pairs)
    # Since the function also returns a list, there's now a list of lists - consider
    # itertools.chain.from_iterable to flatten them - to be processed further
    store = np.zeros((nx, ny, len(myList1), len(myList2)))
    for results in return_values:
        for i, j, k, p, meanval in results:
            store[i,j,k,p] = meanval

Want to change a value continuously from min to max to min in a loop as a Sine curve

I am working on a game where I need an algorithm to vary a value in a loop. I have implemented the algorithm, but I guess it's not working the way I want. Here's what I want and what I have already implemented:
Given :
a commodity whose price I want to circulate (from min to max to min again and continuously in a loop)
I am using cocos2d-x (C++) where I have a scheduler which runs a function at a given interval say SCHEDULE_INTERVAL
MIN_PRICE and MAX_PRICE of the commodity
Time duration which it will take to complete one cycle (min-max-min)
Current Implementation :
SCHEDULE_INTERVAL = 0.3 (sec) (so the function is running every 0.3 secs)
counter = 0;
timeDuration = time to complete one cycle
_amplitude = (maxPrice - minPrice)/2;
_midValue = (maxPrice + minPrice)/2;
currentPrice = _midValue + _amplitude * sin (2*PI*counter/timeDuration)
Why I am using a sine wave: because I want the transitions to be slow at the peaks.
Problem: for some reason it's not behaving the way I want it to.
I want to continuously change currentPrice from minPrice to maxPrice and back to minPrice in timeDuration, with the loop running at SCHEDULE_INTERVAL.
Please suggest any solutions.
Thanks :)
What's not working in the above implementation is that the values are not changing according to the 'timeDuration' variable.
If the pseudocode you posted accurately mirrors the expressions you use in real code, you probably want to change the argument of sin to this:
2 * PI * (counter * SCHEDULE_INTERVAL) / timeDuration
counter is the number of executions, while timeDuration is (I presume) the desired length in seconds.
In other words, your units don't match - it's always worthwhile to perform a dimensional analysis when formulae don't work.
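A small Python sketch of the corrected formula, just to make the units explicit (the question's code is C++ with cocos2d-x; price_at and the sample numbers are made up for illustration). Multiplying counter by the schedule interval turns a tick count into seconds, so dividing by the cycle length in seconds gives a dimensionless phase.

```python
import math


def price_at(counter, schedule_interval, time_duration, min_price, max_price):
    """Price after `counter` scheduler ticks, each `schedule_interval` seconds apart."""
    amplitude = (max_price - min_price) / 2.0
    mid_value = (max_price + min_price) / 2.0
    elapsed = counter * schedule_interval  # ticks -> seconds
    return mid_value + amplitude * math.sin(2 * math.pi * elapsed / time_duration)


# Hypothetical numbers: 0.3 s ticks, a 6 s cycle, prices between 10 and 20.
for counter in (0, 5, 10, 15, 20):
    print(counter, round(price_at(counter, 0.3, 6.0, 10.0, 20.0), 3))
```

With a plain sine the cycle starts at the midpoint. Subtracting PI/2 from the argument of sin would start it at minPrice instead.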

Returning the memory used so I can predict the memory required to compute ML algorithm

I am running a Random Forest ML script on a test-size data set of 5k observations, with a set number of parameters and a varying number of forests. My real model is closer to 1 million observations with 500+ parameters. I am trying to calculate how much memory this model would require, assuming x number of forests.
To do this, I need a way to return how much memory was used during a run of the script. Is it possible to return this, so that I can calculate the RAM required to compute the full model?
I currently use the following to tell me how long it takes to compute:
global starttime
print "The whole routine took %.3f seconds" % (time() - starttime)
Edit Re to my own answer
Feel like I am conversing with myself a little, but hey ho. I tried running the following code to find out how much memory is actually being used, and why my PC runs out of memory when I increase n_estimators_value. Unfortunately, all of the memory usage percentages come back the same. I assume this is because the memory usage is being measured at the wrong time: it needs to be recorded at its peak, while the random forest is actually being fitted. See code:
psutilpercent = psutil.virtual_memory()
print "\n", " --> Memory Check 1 Percent:", str(psutilpercent.percent) + "%\n"
n_estimators_value = 500
rf = ensemble.RandomForestRegressor(n_estimators = n_estimators_value, oob_score=True, random_state = 1)
psutilpercent = psutil.virtual_memory()
print "\n", " --> Memory Check 1 Percent:", str(psutilpercent.percent) + "%\n"
Are there any methods to find out the peak memory usage? I am trying to calculate how much memory would be required to fit a rather large RF, and I can't calculate this without knowing how much memory my smaller models require.
/usr/bin/time reports peak memory usage for a program. There's also the memory_profiler for Python.
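If you would rather read the number from inside the script, the standard library's resource module (Unix only) exposes the process's maximum resident set size, which is a high-water mark and so still reflects the peak after fitting has finished. A minimal sketch, assuming a Unix-like system; peak_rss_mib is a made-up helper name and the list allocation is only a stand-in for fitting the forest:

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


before = peak_rss_mib()
data = [0.0] * 5000000  # stand-in for fitting the model (hypothetical workload)
after = peak_rss_mib()
print("peak before: %.1f MiB, peak after: %.1f MiB" % (before, after))
```

Running the small model for a few values of n_estimators and recording the peak each time gives data points to extrapolate from.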

How can I periodically execute some function if this function takes a long time to run (less than the period)

I want to run a function, for example func(), exactly once per second. However, the running time of func() is about 500 ms. How can I do that? I know that if the running time of the function is low, I can write a while loop around func() and sleep() for 1 second after each execution. But now the running time is high. What should I do to ensure func() runs exactly once per second? Thanks.
You do:
Take the current time in start_time.
Perform your job
Take the current time in end_time
Wait for (1 second + start_time - end_time)
That way, you can perform your task every second reliably. If the task takes less time, you will wait longer, and vice versa. Note, however, that this assumes your task always takes less than 1 second to execute. In real code, you want to check for that before the sleep statement.
Implementation details depend on the platform.
Note that using this method still results in a small drift due to the time it takes to compute step 4. A more accurate alternative would be to synchronize on integer multiples of one second. That way, you would not drift even over thousands of cycles.
It depends on the level of accuracy you need.
If you want a brute-force, easy-to-code solution, you can get the time before the first run of the function and save it in a variable (start_time). Create a repeat counter (repeat_number) that stores the number of the next repeat. Then you can do something like this:
1) next_run_time = ++repeat_number*1sec + start_time;
2) func();
3) wait_time = next_run_time - current_time;
4) sleep(wait_time)
5) goto 1;
This approach prevents the time error from accumulating across iterations.
But for the real application you should find some event framework or library.
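The fixed-schedule loop from the numbered steps above could look roughly like this in Python. It is a sketch rather than a production scheduler: run_periodically is a made-up name, the clock and sleep parameters exist only so the loop can be exercised without real waiting, and an overrunning func() is handled by simply skipping the sleep.

```python
import time


def run_periodically(func, period, repeats, clock=time.monotonic, sleep=time.sleep):
    """Call func() `repeats` times, aiming run n at start_time + n * period.

    Returns the offset (in seconds from the first run) at which each run began.
    """
    start_time = clock()
    started_at = []
    for n in range(repeats):
        started_at.append(clock() - start_time)
        func()
        # The target is computed from start_time, not from "now", so the
        # duration of func() and any sleep overshoot do not accumulate.
        next_run_time = start_time + (n + 1) * period
        wait_time = next_run_time - clock()
        if wait_time > 0:  # func() overran the period: do not sleep
            sleep(wait_time)
    return started_at
```

For the question's case this would be called as run_periodically(func, 1.0, repeats), with func taking about 500 ms and the loop sleeping for the remainder of each second.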