Behaviour of Regular List passed into Process Python - python-2.7

I am just wondering why a regular list that is passed into a process as an argument (and modified in the process) doesn't behave as pass-by-reference, the way it would if I had passed it into a function normally.
The following is some example code:
from multiprocessing import Process

def appendThings(x):
    x.append(1)
    x.append(2)
    x.append(3)

x = []
p = Process(target=appendThings, args=(x,))
p.start()
p.join()
print(x)
I expected to see:
[1,2,3]
but instead got:
[]
General insight into multiprocessing is welcome too as I am currently learning :)

Two things you should note:
You should not terminate the process with p.terminate(). That just kills the process before it can run its course.
The following is some example code:
from multiprocessing import Process

def appendThings(x):
    x.append(1)
    x.append(2)
    x.append(3)
    print("I reached here!")

x = []
p = Process(target=appendThings, args=(x,))
p.start()
p.terminate()
p.join()
print(x)
It will output only the following:
[]
Processes don't share memory with each other by default.
So if you print the list inside the child process instead:
def appendThings(x):
    x.append(1)
    x.append(2)
    x.append(3)
    print(x)
It will print:
[1,2,3]
So in multiprocessing you should use a Manager to share objects between processes. A manager returned by Manager() supports the types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array. You can read more about Manager() in the official Python documentation.
Finally you should modify your code as follows to make it do what you intended:
from multiprocessing import Process, Manager

def appendThings(x):
    x.append(1)
    x.append(2)
    x.append(3)

x = Manager().list()
# x = list()  # a plain list would stay empty in the parent
p = Process(target=appendThings, args=(x,))
p.start()
p.join()
print(x)
So the output will be:
[1, 2, 3]

Multiprocessing cannot share Python objects directly; the arguments sent to the Process are pickled, and the child works on a copy.
You have at least a couple of options to return data:
Shared ctypes
multiprocessing.Manager()
A working example using the manager:
from multiprocessing import Process, Manager

def append_things(x):
    x.append(1)
    x.append(2)
    x.append(3)

if __name__ == '__main__':
    x = Manager().list([])
    p = Process(target=append_things, args=(x,))
    p.start()
    p.join()
    print(x)

Related

Using ThreadPoolExecutor under multiprocessing leads to deadlock in python2

I am trying to run functions in subprocesses, and each subprocess submits functions to a global thread pool.
from concurrent.futures import ThreadPoolExecutor
import time

executor = ThreadPoolExecutor(max_workers=200)

def fake_func():
    time.sleep(3)
    print("??")

def query():
    print("submitted fake func threadpool: {}".format(executor))
    result = executor.submit(fake_func)
    result.result()
    print("Done fake func: {}".format(result.done()))
    return []

def simple_parallel(f, args_list):
    import multiprocessing as mp
    processes = []
    print("start")
    for args in args_list:
        p = mp.Process(target=f, args=args)
        processes.append(p)
        p.start()
    for p in processes:
        p.join(timeout=10)
    return [None] * len(args_list)
The function runs fine with the following:
if __name__ == "__main__":
    data = []
    for result in simple_parallel(query, [()] * 10):
        if result:
            data.append(result)
However, if I submit a function to the thread pool before starting the parallel call, it hangs forever.
result = executor.submit(fake_func)
result.result()
print("done dry call")

data = []
for result in simple_parallel(query, [()] * 10):
    if result:
        data.append(result)
I think the reason is that the subprocess copies the thread pool object, including its records of existing worker threads, but the threads themselves are not copied into the child. So when the subprocess's thread pool tries to reuse a worker thread, that thread does not exist.
Is that statement correct?
Any suggestions on how to work around it? This is legacy code, so I cannot easily avoid multiprocessing.
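One common workaround (a sketch of my own, not guaranteed to fit the legacy code) is to create the executor lazily inside each process rather than at import time, so no process ever inherits a pool whose worker threads did not survive the fork:

```python
from concurrent.futures import ThreadPoolExecutor
import multiprocessing as mp

_executor = None  # one pool per process, created lazily after fork/spawn

def get_executor():
    global _executor
    if _executor is None:
        _executor = ThreadPoolExecutor(max_workers=4)
    return _executor

def query():
    # Each process builds its own pool on first use, so no half-copied
    # worker threads are inherited from the parent.
    future = get_executor().submit(lambda: 42)
    return future.result()

if __name__ == '__main__':
    p = mp.Process(target=query)
    p.start()
    p.join()
    print(query())  # the parent gets its own, separate pool
```

The key point is simply that the ThreadPoolExecutor is never alive at the moment a child process is created.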

Parallel processing in Python for image batching

I would like to run two functions in parallel: one for image batching (streaming all 25 images for processing) and another for processing the batched images.
So I have a main function for batching images, BatchStreaming(self), and one for processing, BatchProcessing(self, b_num). BatchStreaming is working well. After streaming 25 images, it needs to hand off to batch processing. I have two tasks that need to run in parallel:
(1) The while loop in BatchStreaming needs to continue with the next batch of images.
(2) At the same time, the current batch of images needs to be processed.
I am unsure whether I should use a process or a thread. I prefer a process, as I would like to utilise all CPU cores (Python threads run on only one core at a time).
Then I have two issues:
(1) A process has to join back into the main program before it can proceed, but I need to continue with the next batch of images.
(2) In the following program, when BatchProcessing(self, b_num) is called, I get this exception:
Caught Main Exception
(<class 'TypeError'>, TypeError("'module' object is not callable",), <traceback object at 0x7f98635dcfc8>)
What could be the issue?
The code is as follows.
import multiprocessing as MultiProcess
import time
import vid_streamv3 as vs
import cv2
import sys
import numpy as np
import os

BATCHSIZE = 25
CHANNEL = 3
HEIGHT = 480
WIDTH = 640
ORGHEIGHT = 1080
ORGWIDTH = 1920

class ProcessPipeline:
    def __init__(self):
        # Current cam
        self.camProcess = None
        self.cam_queue = MultiProcess.Queue(maxsize=100)
        self.stopbit = None
        self.camlink = 'rtsp://root:pass#192.168.0.90/axis-media/media.amp?camera=1'  # Add your RTSP cam link
        self.framerate = 25
        self.fullsize_batch1 = np.zeros((BATCHSIZE, ORGHEIGHT, ORGWIDTH, CHANNEL), dtype=np.uint8)
        self.fullsize_batch2 = np.zeros((BATCHSIZE, ORGHEIGHT, ORGWIDTH, CHANNEL), dtype=np.uint8)
        self.batch1_is_processed = False

    def BatchStreaming(self):
        # get all cams
        time.sleep(3)
        self.stopbit = MultiProcess.Event()
        self.camProcess = vs.StreamCapture(self.camlink,
                                           self.stopbit,
                                           self.cam_queue,
                                           self.framerate)
        self.camProcess.start()
        count = 0
        try:
            while True:
                if not self.cam_queue.empty():
                    cmd, val = self.cam_queue.get()
                    if cmd == vs.StreamCommands.FRAME:
                        if val is not None:
                            print('streaming starts ')
                            if self.batch1_is_processed == False:
                                self.fullsize_batch1[count] = val
                            else:
                                self.fullsize_batch2[count] = val
                            count = count + 1
                            if count >= 25:
                                if self.batch1_is_processed == False:  # start process for inference and post processing for batch 1
                                    self.batch1_is_processed = True
                                    print('batch 1 process')
                                    p = MultiProcess(target=self.BatchProcessing, args=(1,))
                                else:  # start process for inference and post processing for batch 2
                                    self.batch1_is_processed = False
                                    print('batch 2 process')
                                    p = MultiProcess(target=self.BatchProcessing, args=(2,))
                                p.start()
                                print('BatchProcessing start')
                                p.join()
                                print('BatchProcessing join')
                                count = 0
                            cv2.imshow('Cam: ' + self.camlink, val)
                            cv2.waitKey(1)
        except KeyboardInterrupt:
            print('Caught Keyboard interrupt')
        except:
            e = sys.exc_info()
            print('Caught Main Exception')
            print(e)
        self.StopStreaming()
        cv2.destroyAllWindows()

    def StopStreaming(self):
        print('in stopCamStream')
        if self.stopbit is not None:
            self.stopbit.set()
            while not self.cam_queue.empty():
                try:
                    _ = self.cam_queue.get()
                except:
                    break
            self.cam_queue.close()
            print("before camProcess.join()")
            self.camProcess.join()
            print("after camProcess.join()")

    def BatchProcessing(self, b_num):
        print('module name:', __name__)
        if hasattr(os, 'getppid'):  # only available on Unix
            print('parent process:', os.getppid())
        print('process id:', os.getpid())

if __name__ == "__main__":
    mc = ProcessPipeline()
    mc.BatchStreaming()
I ended up using event signalling, as shown below; that is more straightforward for my application.
When the batching loop has enough images, it signals the batch-processing side.
# event_tut.py
import random, time
from threading import Event, Thread

event = Event()

def waiter(event, nloops):
    count = 0
    while count < nloops:
        print("%s. Waiting for the flag to be set." % (count + 1))
        event.wait()   # Blocks until the flag becomes true.
        print("Wait complete at:", time.ctime())
        event.clear()  # Resets the flag.
        print('wait exit')
        count = count + 1

def setter(event, nloops):
    for i in range(nloops):
        time.sleep(random.randrange(2, 5))  # Sleeps for some time.
        event.set()

threads = []
nloops = 10

threads.append(Thread(target=waiter, args=(event, nloops)))
threads[-1].start()
threads.append(Thread(target=setter, args=(event, nloops)))
threads[-1].start()

for thread in threads:
    thread.join()
print("All done.")

multiprocessing Queue deadlock when spawn multi threads in one process

I created two processes: one, which spawns multiple threads, is responsible for writing data to a Queue; the other reads data from the Queue. It deadlocks frequently under load and less often otherwise, especially when you add a sleep in the run method of the write module (commented out in the code). My code is below:
environments: python2.7
main.py
from multiprocessing import Process, Queue
from write import write
from read import read

if __name__ == "__main__":
    record_queue = Queue()
    table_queue = Queue()
    pw = Process(target=write, args=[record_queue, table_queue])
    pr = Process(target=read, args=[record_queue, table_queue])
    pw.start()
    pr.start()
    pw.join()
    pr.join()
write.py
from concurrent.futures import ThreadPoolExecutor, as_completed

def write(record_queue, table_queue):
    thread_num = 3
    pool = ThreadPoolExecutor(thread_num)
    futures = [pool.submit(run, record_queue, table_queue) for _ in range(thread_num)]
    results = [r.result() for r in as_completed(futures)]

def run(record_queue, table_queue):
    while True:
        if table_queue.empty():
            break
        table = table_queue.get()
        # adding the sleep below reduces the chance of deadlock
        # import time
        # import random
        # time.sleep(random.randint(1, 3))
        process_with_table(record_queue, table_queue, table)

def process_with_table(record_queue, table_queue, table):
    # for short
    for item in [x for x in range(1000)]:
        record_queue.put(item)
read.py
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import Queue

def read(record_queue, table_queue):
    count = 0
    while True:
        item = record_queue.get()
        count += 1
        print("item: ", item)
        if count == 4:
            break
I googled it and there are similar questions on SO, but I can't see how they relate to my code. Can anyone help? Thanks.
I seem to have found a solution: change the run method in the write module to:
import multiprocessing  # needed for multiprocessing.queues.Empty

def run(record_queue, table_queue):
    while True:
        try:
            if table_queue.empty():
                break
            table = table_queue.get(timeout=3)
            process_with_table(record_queue, table_queue, table)
        except multiprocessing.queues.Empty:
            import time
            time.sleep(0.1)
and I never see a deadlock or blocking on the get method anymore.

python SIGTERM not work when I use multiprocessing Pool

Corrected code that works as expected:
from multiprocessing import Pool
import signal
import time
import os

def consumer(i):
    while True:
        # print os.getpid()
        pass

def handler(signum, frame):
    print 'Here you go'

signal.signal(signal.SIGTERM, handler)

p = Pool(5)
p.map_async(consumer, [1 for i in range(5)])
while True:
    pass
p.terminate()
# p.close()
p.join()
==================================================
I have found the problem: when I use the map function, the main function blocks, and the signal handler is only called once map finishes.
So using map_async instead fixes the problem.
Here is what I found:
A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.
==================================================
I wrote the program below, and I expected it to exit (and print the string) when I type "kill pid" in the terminal, but it does not work. Is there something that blocks SIGTERM from reaching the main function?
from multiprocessing import Pool
import signal
import time
import os

def consumer(i):
    while True:
        # print os.getpid()
        pass

def handler(signum, frame):
    print 'Here you go'

signal.signal(signal.SIGTERM, handler)

p = Pool(5)
p.map(consumer, [1 for i in range(5)])
p.terminate()
# p.close()
p.join()
You need to use the map_async method, as map blocks until the results are ready. Therefore, in your example the call to terminate is never reached.

Python multiprocessing: how to exit cleanly after an error?

I am writing some code that makes use of the multiprocessing module. However, since I am a newbie, what often happens is that some error pops up and halts the main application.
But that application's children remain running, and I end up with a long list of pythonw processes in my task manager.
After an error occurs, what can I do to make sure all the child processes are killed as well?
There are two pieces to this puzzle.
How can I detect and kill all the child processes?
How can I make a best effort to ensure my code from part 1 is run whenever one process dies?
For part 1, you can use multiprocessing.active_children() to get a list of all the active children and kill them with Process.terminate(). Note the use of Process.terminate() comes with the usual warnings.
from multiprocessing import Process
import multiprocessing

def f(name):
    print 'hello', name
    while True: pass

if __name__ == '__main__':
    for i in xrange(5):
        p = Process(target=f, args=('bob',))
        p.start()

    # At user input, terminate all processes.
    raw_input("Press Enter to terminate: ")
    for p in multiprocessing.active_children():
        p.terminate()
One solution to part 2 is to use sys.excepthook, as described in this answer. Here is a combined example.
from multiprocessing import Process
import multiprocessing
import sys
from time import sleep

def f(name):
    print 'hello', name
    while True: pass

def myexcepthook(exctype, value, traceback):
    for p in multiprocessing.active_children():
        p.terminate()

if __name__ == '__main__':
    for i in xrange(5):
        p = Process(target=f, args=('bob',))
        p.start()
    sys.excepthook = myexcepthook

    # Sleep for a bit and then force an exception by doing something stupid.
    sleep(1)
    1 / 0
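An alternative to sys.excepthook (my own sketch, not from the answer above) is atexit: its handlers also run after an unhandled exception has terminated the interpreter, so stray children are reaped on both clean and crashing exits (though not after os._exit or a fatal signal):

```python
import atexit
import multiprocessing
import time

def worker():
    while True:
        time.sleep(0.1)

def _cleanup():
    # Registered with atexit: runs on normal exit and after an unhandled
    # exception, terminating and reaping any remaining children.
    for child in multiprocessing.active_children():
        child.terminate()
        child.join()

if __name__ == '__main__':
    atexit.register(_cleanup)
    p = multiprocessing.Process(target=worker)
    p.start()
    # An error here would normally orphan the child; _cleanup catches it.
    _cleanup()  # called explicitly just to demonstrate the effect
    print(multiprocessing.active_children())  # []
```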