How can I execute a function on a CPU core, and get a callback when it has completed? - concurrency

How can I execute a function on a CPU core, and get a callback when it has completed?
Context
I'm receiving a stream:
symbols = ['ABC', 'DFG', ...]  # 52 of these
handlers = {symbol: Handler(symbol) for symbol in symbols}

async for symbol, payload in lines:  # 600M of these
    handlers[symbol].feed(payload)
I need to make use of multiple CPU cores to speed it up.
handlers['ABC'] (e.g.) holds state, but it is disjoint from the state of (e.g.) handlers['DFG'].
Basically, I can't have two cores simultaneously operating on e.g. handlers['ABC'].
My approach so far
I have come up with the following solution, but it's part pseudocode, as I can't see how to implement it.
NCORES = 4
symbol_curr_active_on_core = [None] * NCORES
NO_CORES_FREE = -1

def first_free_core():
    for i, symbol in enumerate(symbol_curr_active_on_core):
        if not symbol:
            return i
    return NO_CORES_FREE

for symbol, payload in lines:
    # wait for an available core to handle it
    while True:
        sleep(0.001)
        if first_free_core() == NO_CORES_FREE:
            continue
        if symbol in symbol_curr_active_on_core:
            continue
        break  # a core is free and this symbol isn't already being handled

    core = first_free_core()
    symbol_curr_active_on_core[core] = symbol

    cores[core].execute(
        processor[symbol].feed(payload),
        on_complete=lambda core_index: \
            symbol_curr_active_on_core[core_index] = None
    )
So my question is specifically: How to convert that last statement into working Python code?
cores[core].execute(
    processor[symbol].feed(payload),
    on_complete=lambda core_index: \
        symbol_curr_active_on_core[core_index] = None
)
PS More generally, is my approach optimal?

The following approach should be feasible assuming:
Your Handler class can be "pickled" and
The Handler class does not carry so much state that serializing it to and from each worker invocation becomes prohibitively expensive.
The main process creates a handlers dictionary where the key is one of the 52 symbols and the value is a dictionary with two keys: 'handler' whose value is the handler for the symbol and 'processing' whose value is either True or False according to whether a process is currently processing one or more payloads for that symbol.
Each process in the pool is initialized with another queue_dict dictionary whose key is one of the 52 symbols and whose value is a multiprocessing.Queue instance that will hold payload instances to be processed for that symbol.
The main process iterates over the input lines to get the next symbol/payload pair. The payload is enqueued onto the appropriate queue for the current symbol. The handlers dictionary is then consulted: if the processing flag for the current symbol is True, a task handling that symbol's handler has already been enqueued with the pool and nothing further needs to be done. Otherwise, the processing flag is set to True and apply_async is invoked, passing the handler for this symbol as an argument.
A count of enqueued tasks (i.e. payloads) is maintained and incremented every time the main process writes a payload to one of the 52 handler queues. The worker function specified as the argument to apply_async takes its handler argument and from that deduces the queue that requires processing. For every payload it finds on the queue, it invokes the handler's feed method, then returns a tuple consisting of the updated handler and a count of the payload messages that were removed from the queue. The callback function for apply_async (1) updates the handler in the handlers dictionary, (2) resets the processing flag for the appropriate symbol to False, and (3) decrements the count of enqueued tasks by the number of payload messages that were removed.
There is a small race window: after enqueuing a payload, the main process may see that the processing flag is True and on that basis not submit a new task via apply_async, even though the worker has already finished processing every payload on its queue and is about to return, or has already returned but the callback has not yet reset the processing flag to False. In that scenario the payload sits unprocessed on the queue until the next payload for that symbol is read from the input. If there are no further input lines for that symbol, then when all tasks have completed we will have unprocessed payloads, but we will also have a non-zero count of enqueued tasks that tells us this has happened. So rather than implementing a complicated multiprocessing synchronization protocol, it is simpler to detect this situation and handle it by creating a new pool and checking each of the 52 queues.
from multiprocessing import Pool, Queue
import time
from queue import Empty
from threading import Lock

# This class needs to be Pickle-able:
class Handler:
    def __init__(self, symbol):
        self.symbol = symbol
        self.counter = 0

    def feed(self, payload):
        # For testing just increment counter by payload:
        self.counter += payload

def init_pool(the_queue_dict):
    global queue_dict
    queue_dict = the_queue_dict

def worker(handler):
    symbol = handler.symbol
    q = queue_dict[symbol]
    tasks_removed = 0
    while True:
        try:
            payload = q.get_nowait()
            handler.feed(payload)
            tasks_removed += 1
        except Empty:
            break
    # return updated handler:
    return handler, tasks_removed

def callback_result(result):
    global queued_tasks
    global lock

    handler, tasks_removed = result
    # show done processing this symbol by updating handler state:
    d = handlers[handler.symbol]
    # The order of the next two statements matters:
    d['handler'] = handler
    d['processing'] = False
    with lock:
        queued_tasks -= tasks_removed

def main():
    global handlers
    global lock
    global queued_tasks

    symbols = [
        'A','B','C','D','E','F','G','H','I','J','K','L','M','AA','BB','CC','DD','EE','FF','GG','HH','II','JJ','KK','LL','MM',
        'a','b','c','d','e','f','g','h','i','j','k','l','m','aa','bb','cc','dd','ee','ff','gg','hh','ii','jj','kk','ll','mm'
    ]
    queue_dict = {symbol: Queue() for symbol in symbols}
    handlers = {symbol: {'processing': False, 'handler': Handler(symbol)} for symbol in symbols}

    lines = [
        ('A',1),('B',1),('C',1),('D',1),('E',1),('F',1),('G',1),('H',1),('I',1),('J',1),('K',1),('L',1),('M',1),
        ('AA',1),('BB',1),('CC',1),('DD',1),('EE',1),('FF',1),('GG',1),('HH',1),('II',1),('JJ',1),('KK',1),('LL',1),('MM',1),
        ('a',1),('b',1),('c',1),('d',1),('e',1),('f',1),('g',1),('h',1),('i',1),('j',1),('k',1),('l',1),('m',1),
        ('aa',1),('bb',1),('cc',1),('dd',1),('ee',1),('ff',1),('gg',1),('hh',1),('ii',1),('jj',1),('kk',1),('ll',1),('mm',1)
    ]

    def get_lines():
        # Emulate 520_000 lines:
        for _ in range(10_000):
            for line in lines:
                yield line

    POOL_SIZE = 4

    queued_tasks = 0
    lock = Lock()

    # Create pool of POOL_SIZE processes:
    pool = Pool(POOL_SIZE, initializer=init_pool, initargs=(queue_dict,))
    for symbol, payload in get_lines():
        # Put some limit on memory utilization:
        while queued_tasks > 10_000:
            time.sleep(.001)
        d = handlers[symbol]
        q = queue_dict[symbol]
        q.put(payload)
        with lock:
            queued_tasks += 1
        if not d['processing']:
            d['processing'] = True
            handler = d['handler']
            pool.apply_async(worker, args=(handler,), callback=callback_result)

    # Wait for all tasks to complete
    pool.close()
    pool.join()

    if queued_tasks:
        # Re-create pool:
        pool = Pool(POOL_SIZE, initializer=init_pool, initargs=(queue_dict,))
        for d in handlers.values():
            handler = d['handler']
            d['processing'] = True
            pool.apply_async(worker, args=(handler,), callback=callback_result)
        pool.close()
        pool.join()

    assert queued_tasks == 0

    # Print results:
    for d in handlers.values():
        handler = d['handler']
        print(handler.symbol, handler.counter)

if __name__ == "__main__":
    main()
Prints:
A 10000
B 10000
C 10000
D 10000
E 10000
F 10000
G 10000
H 10000
I 10000
J 10000
K 10000
L 10000
M 10000
AA 10000
BB 10000
CC 10000
DD 10000
EE 10000
FF 10000
GG 10000
HH 10000
II 10000
JJ 10000
KK 10000
LL 10000
MM 10000
a 10000
b 10000
c 10000
d 10000
e 10000
f 10000
g 10000
h 10000
i 10000
j 10000
k 10000
l 10000
m 10000
aa 10000
bb 10000
cc 10000
dd 10000
ee 10000
ff 10000
gg 10000
hh 10000
ii 10000
jj 10000
kk 10000
ll 10000
mm 10000

This is far from the only (or probably even "best") approach, but based on my comment on your other post, here's an example of having specific child processes handle specific "symbol"s
from multiprocessing import Process, Queue
from queue import Empty
from math import ceil

class STOPFLAG: pass

class Handler:
    def __init__(self, symbol):
        self.counter = 0  # maintain some state for each "Handler"
        self.symbol = symbol

    def feed(self, payload):
        self.counter += payload
        return self.counter

class Worker(Process):
    def __init__(self, out_q):
        self.handlers = {}
        self.in_q = Queue()
        self.out_q = out_q
        super().__init__()

    def run(self):
        while True:
            try:
                symbol = self.in_q.get(timeout=1)
            except Empty:
                pass  # put break here if you always expect symbols to be available and a timeout "shouldn't" happen
            else:
                if isinstance(symbol, STOPFLAG):
                    # pass back the handlers with their now modified state
                    self.out_q.put(self.handlers)
                    break
                else:
                    self.handlers[symbol[0]].feed(symbol[1])

def main():
    n_workers = 4
    # Just 8 for testing:
    symbols = ['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STU', 'VWX']
    workers = []
    out_q = Queue()
    for i in range(n_workers):
        workers.append(Worker(out_q))

    symbol_worker_mapping = {}
    for i, symbol in enumerate(symbols):
        workers[i % n_workers].handlers[symbol] = Handler(symbol)
        symbol_worker_mapping[symbol] = i % n_workers

    for worker in workers: worker.start()  # start processes

    # Just a few for testing:
    lines = [
        ('ABC', 1),
        ('DEF', 1),
        ('GHI', 1),
        ('JKL', 1),
        ('MNO', 1),
        ('PQR', 1),
        ('STU', 1),
        ('VWX', 1),
        ('ABC', 1),
        ('DEF', 1),
        ('GHI', 1),
        ('JKL', 1),
        ('MNO', 1),
        ('PQR', 1),
        ('STU', 1),
        ('VWX', 1),
    ]

    # putting this loop in a thread could allow results to be collected while inputs are still being fed in.
    for symbol, payload in lines:  # feed in tasks
        worker = workers[symbol_worker_mapping[symbol]]  # select the correct worker
        worker.in_q.put([symbol, payload])  # pass the inputs

    results = []  # results are handler dicts from each worker
    for worker in workers:
        worker.in_q.put(STOPFLAG())  # Send stop signal to each worker
        results.append(out_q.get())  # get results (may be out of order)
    for worker in workers: worker.join()  # cleanup

    for result in results:
        for symbol, handler in result.items():
            print(symbol, handler.counter)

if __name__ == "__main__":
    main()
Each child process handles a subset of "symbols", and each gets its own input queue. This is different from the normal pool, where each child is identical and they all share an input queue from which the next available child always takes the next input. They all then put results onto a shared output queue back to the main process.
An entirely different solution might be to hold all the state in the main process, maintain a lock for each symbol, and hold that lock from the time the necessary state is sent to a worker until the results are received and the state in the main process is updated.
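As a rough sketch of that last idea (an illustration only, not the answerer's code; it uses concurrent.futures, serializes the handler for every payload, and all names here are made up), the main process keeps every handler and guards each symbol with a lock that is held from dispatch until a done-callback writes the updated state back:

from concurrent.futures import ProcessPoolExecutor
from threading import Lock

def run_one(handler, payload):
    # Runs in a worker process on a pickled copy of the handler.
    handler.feed(payload)
    return handler  # ship the updated state back to the main process

def run(lines, handlers, n_workers=4):
    # One lock per symbol, held from submission until the callback has
    # stored the updated handler back into the main-process dict.
    locks = {symbol: Lock() for symbol in handlers}
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for symbol, payload in lines:
            lock = locks[symbol]
            lock.acquire()  # blocks if a task for this symbol is still in flight
            fut = pool.submit(run_one, handlers[symbol], payload)

            def done(fut, symbol=symbol, lock=lock):
                handlers[symbol] = fut.result()  # update main-process state
                lock.release()                   # the symbol may be dispatched again

            fut.add_done_callback(done)
    # Leaving the with-block waits for all outstanding tasks to finish.

The obvious drawback is that a second payload for a symbol already in flight blocks the whole feeding loop on lock.acquire(), and pickling the handler on every submit only pays off if feed() itself is expensive; per-symbol buffering, as in the answers above, avoids both problems.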

Related

How do I terminate an async scipy.optimize based on time?

Really struggling with this one... Forgive the longish post.
I have an experiment that on each trial displays some stimulus, collects a response, and then moves on to the next trial.
I would like to incorporate an optimizer that runs in between trials and therefore must have a specific time-window designated by me to run, or it should be terminated. If it's terminated, I would like to return the last set of parameters it tried so that I can use it later.
Generally speaking, here's the order of events I'd like to happen:
In between trials:
Display stimulus ("+") for some number of seconds.
While this is happening, run the optimizer.
If the time for displaying the "+" has elapsed and the optimizer has not finished, terminate the optimizer, return the most recent set of parameters it tried, and move on.
Here is some of the relevant code I'm working with so far:
do_bns() is the objective function. In it I include NLL['par'] = par or q.put(par)
from scipy.optimize import minimize
from multiprocessing import Process, Manager, Queue
from psychopy import core  # for clock, and other functionality

clock = core.Clock()

def optim(par, NLL, q):
    a = minimize(do_bns, (par), method='L-BFGS-B', args=(NLL, q),
                 bounds=[(0.2, 1.5), (0.01, 0.8), (0.001, 0.3), (0.1, 0.4), (0.1, 1), (0.001, 0.1)],
                 options={"disp": False, 'maxiter': 1, 'maxfun': 1, "eps": 0.0002}, tol=0.00002)

if __name__ == '__main__':
    print('starting optim')
    max_time = 1.57
    with Manager() as manager:
        par = manager.list([1, 0.1, 0.1, 0.1, 0.1, 0.1])
        NLL = manager.dict()
        q = Queue()
        p = Process(target=optim, args=(par, NLL, q))
        p.start()
        start = clock.getTime()
        while clock.getTime() - start < max_time:
            p.join(timeout=0)
            if not p.is_alive():
                break
        if p.is_alive():
            res = q.get()
            p.terminate()
            stop = clock.getTime()
            print(NLL['a'])
            print('killed after: ' + str(stop - start))
        else:
            res = q.get()
            stop = clock.getTime()
            print('terminated successfully after: ' + str(stop - start))
            print(NLL)
            print(res)
This code, on its own, seems to sort of do what I want. For example, the res = q.get() right above the p.terminate() actually takes something like 200ms so it will not terminate exactly at max_time if max_time < ~1.5s
If I wrap this code in a while-loop that checks to see if it's time to stop presenting the stimulus:
stim_start = clock.getTime()
stim_end = 5
print('showing stim')
textStim.setAutoDraw(True)
win.flip()
while clock.getTime() - stim_start < stim_end:
    # insert the code above
print('out of loop')
I get weird behavior such as multiple iterations of the whole code from the beginning...
showing stim
starting optim
showing stim
out of loop
showing stim
out of loop
[1.0, 0.10000000000000001, 0.10000000000000001, 0.10000000000000001, 0.10000000000000001, 0.10000000000000001]
killed after: 2.81303440395
Note the multiple 'showing stim's' and 'out of loop's.
I'm open to any solution that accomplishes my goal :|
Help and thank you!
Ben
General remark
Your solution would give me nightmares! I don't see a reason to use multiprocessing here, and I'm not even sure how you grab those updated results before termination. Maybe you have your reasons for this approach, but I highly recommend something else (which has a limitation).
Callback-based approach
The general idea I would pursue is the following:
fire up your optimizer with some additional time-limit information and some callback enforcing this
the callback is called in each iteration of this optimizer
if time-limit reached: raise a customized Exception
The limits:
as the callback is only called once in each iteration, there is some limited sequence of points in time where the optimizer might get stopped
the potential difference is highly dependent on iteration-time for your problem! (numerical-differentiation, huge-data, slow function eval; all this matters)
if not exceeding some given time limit is of highest priority, this approach might not be right, or you would need some kind of safeguarded interpolation to reason about whether one more iteration is possible in time
or: combine your kind of killing off workers with my approach of updating intermediate-results through some callback
Example code (bit hacky):
import time
import numpy as np
import scipy.sparse as sp
import scipy.optimize as opt

np.random.seed(1)

""" Fake task: sparse NNLS """
M, N, D = 2500, 2500, 0.1
A = sp.random(M, N, D)
b = np.random.random(size=M)

""" Optimization-setup """
class TimeOut(Exception):
    """Raise for my specific kind of exception"""

def opt_func(x, A, b):
    return 0.5 * np.linalg.norm(A.dot(x) - b)**2

def opt_grad(x, A, b):
    Ax = A.dot(x) - b
    grad = A.T.dot(Ax)
    return grad

def callback(x):
    time_now = time.time()  # probably not the right tool in general!
    callback.result = [np.copy(x)]  # better safe than sorry -> copy
    if time_now - callback.time_start >= callback.time_max:
        raise TimeOut("Time out")

def optimize(x0, A, b, time_max):
    result = [np.copy(x0)]  # hack: mutable type
    time_start = time.time()
    try:
        """ Add additional info to callback (only takes x as param!) """
        callback.time_start = time_start
        callback.time_max = time_max
        callback.result = result

        res = opt.minimize(opt_func, x0, jac=opt_grad,
                           bounds=[(0, np.inf) for i in range(len(x0))],  # NNLS
                           args=(A, b), callback=callback, options={'disp': False})
    except TimeOut:
        print('time out')
        return result[0], opt_func(result[0], A, b)
    return res.x, res.fun

print('experiment 1')
start_time = time.perf_counter()
x, res = optimize(np.zeros(len(b)), A, b, 0.1)  # 0.1 seconds max!
end_time = time.perf_counter()
print(res)
print('used secs: ', end_time - start_time)

print('experiment 2')
start_time = time.perf_counter()
x_, res_ = optimize(np.zeros(len(b)), A, b, 5)  # 5 seconds max!
end_time = time.perf_counter()
print(res_)
print('used secs: ', end_time - start_time)
Example output:
experiment 1
time out
422.392771467
used secs: 0.10226839151517493
experiment 2
72.8470708728
used secs: 0.3943936788825996
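If you do want to keep the kill-the-subprocess approach, combining it with the callback idea might look roughly like the sketch below (illustrative only: do_bns is stubbed out here, and the asker's bounds/options are omitted). The scipy callback pushes the latest parameter vector onto a queue after every iteration, so the parent always has the most recent point when the deadline hits:

from multiprocessing import Process, Queue
from scipy.optimize import minimize
import queue

def do_bns(par):
    # stand-in for the real objective
    return sum(v**2 for v in par)

def optim(par0, q):
    def cb(x):
        q.put(list(x))  # latest iterate, readable by the parent at any time
    minimize(do_bns, par0, method='L-BFGS-B', callback=cb)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=optim, args=([1, 0.1, 0.1, 0.1, 0.1, 0.1], q))
    p.start()
    p.join(timeout=1.57)      # the stimulus window
    if p.is_alive():
        p.terminate()         # deadline reached, kill the optimizer
    latest = None
    try:
        while True:           # drain the queue, keeping the last value seen
            latest = q.get_nowait()
    except queue.Empty:
        pass
    print('most recent parameters:', latest)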

Run Python with Tkinter (sometimes) headless OR replacement for root.after()

I have working code below.
I have a set of machines operated with Python. I have a GUI in Tkinter, but very often these machines are run headless, with the Python code auto-starting at boot.
I really like the design pattern of using root.after() to start multiple tasks and keep them going. My problem is that this comes from the Tkinter library and when running headless the line "root=Tk()" will throw an error.
I have two questions
Can I perform some trick to have the code ignore the fact there is no display?
OR
Is there a library that will match the design pattern of Tkinter's root.after(time_in_ms, function_to_call)?
I did try to poke around in the underlying code of Tkinter to see if there was simply another library wrapped by Tkinter but I don't have the skill to decode what is going on in that library.
This code works with a display connected: (it prints hello 11 times then ends)
from Tkinter import *

# def __init__(self, screenName=None, baseName=None, className='Tk', useTk=1, sync=0, use=None):
root = Tk()  # error is thrown here if starting this command in headless hardware setup

h = None
count = 0
c = None

def stop_saying_hello():
    global count
    global h
    global c
    if count > 10:
        root.after_cancel(h)
        print "counting cancelled"
    else:
        c = root.after(200, stop_saying_hello)

def hello():
    global h
    global count
    print "hello " + str(count)
    count += 1
    h = root.after(1000, hello)

h = root.after(1000, hello)  # time in ms, function
c = root.after(200, stop_saying_hello)

root.mainloop()
If this is run headless, in an SSH session from a remote computer, then this error message is returned:
Traceback (most recent call last):
  File "tkinter_headless.py", line 5, in <module>
    root = Tk()
  File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 1813, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable
You can use
threading and threading.Timer() (see the short sketch after the example below)
sched
APScheduler
or create your own task manager with its own after() and mainloop()
Simple example
import time

class TaskManager():

    def __init__(self):
        self.tasks = dict()
        self.index = 0
        self.running = True

    def after(self, delay, callback):
        # calculate time using delay
        current_time = time.time()*1000
        run_time = current_time + delay

        # add to tasks
        self.index += 1
        self.tasks[self.index] = (run_time, callback)

        # return index
        return self.index

    def after_cancel(self, index):
        if index in self.tasks:
            del self.tasks[index]

    def mainloop(self):
        self.running = True

        while self.running:
            current_time = time.time()*1000

            # check all tasks
            # Python 3 needs `list(self.tasks.keys())`
            # because `del` changes `self.tasks.keys()`
            for key in self.tasks.keys():
                if key in self.tasks:
                    run_time, callback = self.tasks[key]
                    if current_time >= run_time:
                        # execute task
                        callback()
                        # remove from list
                        del self.tasks[key]

            # to not use all CPU
            time.sleep(0.1)

    def quit(self):
        self.running = False

    def destroy(self):
        self.running = False

# --- function ---

def stop_saying_hello():
    global count
    global h
    global c

    if count > 10:
        root.after_cancel(h)
        print "counting cancelled"
    else:
        c = root.after(200, stop_saying_hello)

def hello():
    global count
    global h

    print "hello", count
    count += 1
    h = root.after(1000, hello)

# --- main ---

count = 0
h = None
c = None

root = TaskManager()

h = root.after(1000, hello)  # time in ms, function
c = root.after(200, stop_saying_hello)
d = root.after(12000, root.destroy)

root.mainloop()
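For completeness, here is a rough sketch of the threading.Timer option from the list above (Python 3 syntax; the names mirror the question but are otherwise illustrative). Each callback re-arms its own timer, which approximates the self-rescheduling root.after() pattern, with delays given in seconds rather than milliseconds:

import threading

count = 0

def hello():
    global count, h
    print("hello", count)
    count += 1
    if count <= 10:
        h = threading.Timer(1.0, hello)  # delay in seconds, not ms
        h.start()

h = threading.Timer(1.0, hello)
h.start()
# h.cancel() cancels a timer that has not fired yet,
# roughly like root.after_cancel(h).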

Run parallel op with different inputs and same placeholder

I need to calculate more than one accuracy at the same time, concurrently.
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
The piece of code is the same as the MNIST example in the TensorFlow tutorial, but instead of having:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
I have two placeholders, because I have already calculated and stored the values:
W = tf.placeholder(tf.float32, [784, 10])
b = tf.placeholder(tf.float32, [10])
I want to fill the network with the values I already have and then calculate the accuracy, and this has to happen for each network I load.
So if I load 20 networks, I want to calculate the accuracy for each one in parallel. Is there a way with session.run to execute the same operation with different inputs?
You have multiple options to make things happen in parallel:
Parallelize using multiple python threads / subprocesses. (See Python's "multiprocessing" library.)
Batch up the operations into single larger operations. (e.g. Similar to the image operations that operate on a batch of images simultaneously https://www.tensorflow.org/api_docs/python/image/resizing#resize_bilinear.)
Make a single graph that has the 20 network accuracy calculations.
I think the last one is the easiest, so I've included a bit of sample code below to get you started:
import tensorflow as tf

def construct_accuracy_calculation(i):
    W = tf.placeholder(tf.float32, [784, 10], name=("%d_W" % i))
    b = tf.placeholder(tf.float32, [10], name=("%d_b" % i))
    # ...
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    return (W, b, accuracy)

def main():
    accuracy_computations = []
    feed_dict = {}
    for i in xrange(NUM_NETWORKS):
        (W, b) = load_network(i)
        (W_op, b_op, accuracy) = construct_accuracy_calculation(i)
        feed_dict[W_op] = W
        feed_dict[b_op] = b
        accuracy_computations.append(accuracy)

    # sess = ...
    accuracy_values = sess.run(accuracy_computations, feed_dict=feed_dict)

if __name__ == "__main__":
    main()
One approach to parallelizing TF computations is to execute run calls in parallel using threads (TF is incompatible with multiprocessing). It's a bit more complicated than other approaches because you have to handle parallelism yourself on the Python side.
Here's an example that runs the same matmul op in the same session in different Python threads with different fed inputs; it runs about 4x faster with 4 threads compared to 1 thread.
import os, sys, queue, threading, time
import tensorflow as tf
import numpy as np

def p(s):
    # helper function for printing from multiple threads
    # need to append \n or results get intermixed in notebook
    print(s+"\n", flush=True, end="")

num_threads = 4
data_size = 32  # number of data points to enqueue
work_per_thread = data_size/num_threads
timeout = 10  # grace period for dequeing

input_queue = queue.Queue(data_size)
output_queue = queue.Queue(data_size)
dtype = np.float32

# use matrix vector matmul since it's compute intensive and uses single core
# see issue #6752
n = 16*1024
with tf.device("/cpu:0"):
    x = tf.placeholder(dtype)
    matrix = tf.Variable(tf.ones((n, n)))
    vector = tf.Variable(tf.ones((n, 1)))
    y = tf.matmul(matrix, vector)[0, 0] + x

# turn off graph-rewriting optimizations
sess = tf.Session(config=tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0))))
sess.run(tf.global_variables_initializer())

done = False

def runner(runner_id):
    p("Starting runner %s" % (runner_id,))
    count = 0
    while not done:
        try:
            x_val = input_queue.get(timeout=1)
        except queue.Empty:
            # retry on empty queue
            continue
        p("Start computing %d on %d" % (x_val, runner_id))
        out = sess.run(y, {x: x_val})
        count += 1
        output_queue.put(out)
        if count >= work_per_thread:
            break
    else:
        p("Stopping runner " + str(runner_id))

threads = []
print("Creating threads.")
for i in range(num_threads):
    t = threading.Thread(target=runner, args=(i,))
    threads.append(t)

for i in range(data_size):
    input_queue.put(i, timeout=timeout)

# start threads
p("Launching runners.")
start_time = time.time()
for t in threads:
    t.start()

p("Reading results.")
for i in range(data_size):
    try:
        p("Main thread: obtained %.2f" % (output_queue.get(timeout=timeout),))
    except queue.Empty:
        print("No results after %d, terminating computation." % (timeout,))
        break
else:
    p("Computed successfully.")

done = True
p("Waiting for threads to finish.")
for t in threads:
    t.join()

print("Done in %.2f seconds" % (time.time() - start_time))

Disambiguating simpy multi-event results

I am trying to code a simple multiplexer in simpy as part of a network modelling exercise. What I have is two stores, s1, and s2 and I wish to do a single yield which waits for one or both of s1 and s2 to return a 'packet' via the standard Store.get() method. This does work, but I am unable then to determine which of the two stores actually returned the packet. What is the right way to do this - by inserting the appropriate code instead of the comment in the code below?
import simpy

env = simpy.Environment()
s1 = simpy.Store(env, capacity=4)
s2 = simpy.Store(env, capacity=4)

def putpkts():
    a = 1
    b = 2
    s1.put(a)
    s2.put(b)
    yield env.timeout(40)
    s1.put(a)
    yield env.timeout(40)
    s2.put(b)
    yield env.timeout(40)

def getpkts():
    while True:
        stuff = (yield s1.get() | s2.get())
        # here, I need to put code to determine
        # whether the 'stuff' dict
        # contains an item from store s1, or store s2, or both.
        # how can I do this?

proc1 = env.process(putpkts())
proc2 = env.process(getpkts())
env.run(until=proc2)
You need to bind the Store.get() event to a name and then check if it is in the results, e.g.:
get1 = s1.get()
get2 = s2.get()
results = yield get1 | get2
item1 = results[get1] if get1 in results else None
item2 = results[get2] if get2 in results else None
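Folding that into the asker's getpkts() process might look like this sketch (untested; note that a get() request which did not fire in a given round stays pending on its store and will consume a later item, which may or may not be what you want):

def getpkts():
    while True:
        get1 = s1.get()
        get2 = s2.get()
        results = yield get1 | get2
        if get1 in results:
            print('packet from s1:', results[get1])
        if get2 in results:
            print('packet from s2:', results[get2])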

Why are changes to a list made in a sub-process not showing up in the parent process?

I am creating a sub-process for reading a growing log file. I pass a counter (inside a list) into the log_file_reader function, and append 1 to the counter list if the line is valid. I check the counter in the main process every 5 seconds. The counter increases as expected in the sub-process, but it is always 0 in the main process. I checked the id of the counter; it is identical in both the sub-process and the main process. Why isn't the counter increasing in the main process? If I change counter to counter = multiprocessing.Queue() and check qsize() in log_file_reader(...) or the main thread, everything works fine.
import subprocess
import select
import multiprocessing
import time

def log_file_reader(filename, counter):
    f = subprocess.Popen(['tail', '-F', filename], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p = select.poll()
    p.register(f.stdout)
    while True:
        if p.poll(1):
            line = f.stdout.readline().strip()
            if line:
                '''appends 1 to counter if line is valid'''
                counter.append(1)

def main():
    counter = list()  # initializes a counter in type list
    # starts up a process keep tailing file
    reader_process = multiprocessing.Process(target=log_file_reader, args=("/home/haifzhan/logfile.log", counter))
    reader_process.start()
    # main thread check the counter every 5 seconds
    while True:
        time.sleep(5)
        print "periodically check---counter:{0},id:{1}".format(len(counter), id(counter))

if __name__ == "__main__":
    # everything starts here
    main()
Plain list objects are not shared between processes, so the counter in the child process is actually a completely distinct object from the counter in the parent. Changes you make to one will not affect the other. If you want to share the list between processes, you need to use a multiprocessing.Manager().list:
import subprocess
import select
import multiprocessing
import time

def log_file_reader(filename, counter):
    f = subprocess.Popen(['tail', '-F', filename], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p = select.poll()
    p.register(f.stdout)
    while True:
        if p.poll(1):
            line = f.stdout.readline().strip()
            if line:
                '''appends 1 to counter if line is valid'''
                counter.append(1)

def main():
    m = multiprocessing.Manager()
    counter = m.list()  # initializes a counter in type list
    # starts up a process keep tailing file
    reader_process = multiprocessing.Process(target=log_file_reader, args=("/home/haifzhan/logfile.log", counter))
    reader_process.start()
    # main thread check the counter every 5 seconds
    while True:
        time.sleep(5)
        print "periodically check---counter:{0},id:{1}".format(len(counter), id(counter))

if __name__ == "__main__":
    # everything starts here
    main()
If you're just using the list as a counter, though, you might as well use a multiprocessing.Value rather than a list; it is really meant for counting purposes and doesn't require starting a Manager process:
import subprocess
import select
import multiprocessing
import time

def log_file_reader(filename, counter):
    f = subprocess.Popen(['tail', '-F', filename], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p = select.poll()
    p.register(f.stdout)
    while True:
        if p.poll(1):
            line = f.stdout.readline().strip()
            if line:
                '''appends 1 to counter if line is valid'''
                with counter.get_lock():
                    counter.value += 1

def main():
    counter = multiprocessing.Value('i', 0)  # A process-safe int, initialized to 0
    # starts up a process keep tailing file
    reader_process = multiprocessing.Process(target=log_file_reader, args=("/home/haifzhan/logfile.log", counter))
    reader_process.start()
    # main thread check the counter every 5 seconds
    while True:
        time.sleep(5)
        with counter.get_lock():
            print "periodically check---counter:{0},id:{1}".format(counter.value, id(counter))