Weird behaviour with numpy and multiprocessing.process - python-2.7

Sorry for the long code; I have tried to make it as simple as possible while keeping it reproducible.
In short, this python script starts four processes that randomly distribute numbers into lists. Then, the result is added to a multiprocessing.Queue().
import random
import multiprocessing
import numpy
import sys

def work(subarray, queue):
    result = [numpy.array([], dtype=numpy.uint64) for i in range(0, 4)]
    for element in numpy.nditer(subarray):
        index = random.randint(0, 3)
        result[index] = numpy.append(result[index], element)
    queue.put(result)
    print "after the queue.put"

jobs = []
queue = multiprocessing.Queue()
subarray = numpy.array_split(numpy.arange(1, 10001, dtype=numpy.uint64), 4)

for i in range(0, 4):
    process = multiprocessing.Process(target=work, args=(subarray[i], queue))
    jobs.append(process)
    process.start()

for j in jobs:
    j.join()

print "the end"
All processes run the print "after the queue.put" line. However, the script never reaches the print "the end" line. Oddly enough, if I change the arange from 10001 to 1001, it does reach the end. What is happening?

Most of the child processes are blocking on the put call. From the multiprocessing Queue.put() documentation:
block if necessary until a free slot is available.
This can be avoided by adding a call to queue.get() before join (see the sketch at the end of this answer).
Also, in multiprocessing code please isolate the parent process by having:

if __name__ == '__main__':
    # main code here

Compulsory usage of if __name__ == "__main__" in Windows while using multiprocessing
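A minimal sketch combining both suggestions, assuming the work() function and the numpy/multiprocessing imports from the question above:

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    subarray = numpy.array_split(numpy.arange(1, 10001, dtype=numpy.uint64), 4)
    jobs = []
    for i in range(4):
        process = multiprocessing.Process(target=work, args=(subarray[i], queue))
        jobs.append(process)
        process.start()

    # Drain the queue first: one get() per worker, so the feeder threads
    # can flush their buffers to the underlying pipe...
    results = [queue.get() for _ in jobs]

    # ...and only then join; joining first can deadlock on large results.
    for j in jobs:
        j.join()
    print("the end")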

I will expand my comment into a short answer. As I also do not understand the weird behaviour, this is merely a workaround.
A first observation is that the code runs to the end if the queue.put line is commented out, so it must be a problem related to the queue. The results are actually added to the queue, so the problem must be in the interplay between the queue and join.
The following code works as expected:
import random
import multiprocessing
import numpy
import sys
import time

def work(subarray, queue):
    result = [numpy.array([], dtype=numpy.uint64) for i in range(4)]
    for element in numpy.nditer(subarray):
        index = random.randint(0, 3)
        result[index] = numpy.append(result[index], element)
    queue.put(result)
    print("after the queue.put")

jobs = []
queue = multiprocessing.Queue()
subarray = numpy.array_split(numpy.arange(1, 15001, dtype=numpy.uint64), 4)

for i in range(4):
    process = multiprocessing.Process(target=work, args=(subarray[i], queue))
    jobs.append(process)
    process.start()

res = []
while len(res) < 4:
    res.append(queue.get())

print("the end")

This is the reason:
Joining processes that use queues
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the cancel_join_thread() method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
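The quoted docs also mention cancel_join_thread() as an escape hatch. A sketch of what that looks like inside the worker (compute_result is a hypothetical stand-in for the loop in the question); note that it risks losing buffered data, so draining the queue as above is usually preferable:

def work(subarray, queue):
    result = compute_result(subarray)  # hypothetical stand-in for the loop above
    queue.put(result)
    # Let this process exit even if the feeder thread has not finished
    # flushing the queued data to the pipe. Caution: buffered items can
    # be lost, so draining the queue in the parent is usually safer.
    queue.cancel_join_thread()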

Related

Python - how do I display an updating value on one line and not scroll down the screen

I have a Python script which receives values in real time. My script prints the values as they are received; currently each update is printed to the screen and the output scrolls down, e.g.
Update 1
Update 2
Update 3
...
Is it possible to have each new value overwrite the previous value on the output screen so that the values don't scroll? e.g.
Update 1
When Update 2 is received the print statement would update the output screen to:
Update 2
and so on...
Thanks in advance for any pointers.
You can pass end='\r' to the print() function. '\r' represents a carriage return in Python. In Python 3 for example:
import time

for i in range(10):
    print('Update %d' % i, end='\r')
    time.sleep(5)
Here time.sleep(5) is just used to force a delay between prints. Otherwise, even though everything is being printed and then overwritten, it will happen so fast that you will only see the final value 'Update 9'. In your code it sounds like there is a natural delay while some process is running, so using time.sleep() will not be necessary.
In Python 2.7 the print statement does not accept an end parameter, and print 'Update %d\r' % i will still print a newline character.
Here is the same example #elethan gave, for Python 2.7. Note it uses the sys.stdout output stream:
import sys
import time

output_stream = sys.stdout

for i in xrange(10):
    output_stream.write('Update %s\r' % i)
    output_stream.flush()
    time.sleep(5)

# Optionally add a newline if you want to keep the last update.
output_stream.write('\n')
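Another option for Python 2.7, if you prefer print, is to enable the Python 3 print function with a __future__ import. A sketch; the explicit flush is there because '\r' alone does not flush a line-buffered terminal:

from __future__ import print_function  # must appear before other code in the module
import sys
import time

for i in range(10):
    print('Update %d' % i, end='\r')
    sys.stdout.flush()  # make the overwritten line appear immediately
    time.sleep(5)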
Assuming you have the code which prints the updated list in a write_num function, you can add a function called show_updated_list(); calling clear() every time should resolve this issue:

def show_updated_list():
    self.clear()
    write_num()

How can I keep track of files processed while using map?

I'm using the map function with pandas to read in files with multiprocessing, like:

import glob
from multiprocessing import Pool
import pandas as pd

files = glob.glob(r'C:\Desktop\Folder\*.xlsx')

def read_excel(filename):
    return pd.read_excel(filename)

file_list = [filename for filename in files]
pool = Pool(processes=4)
pool.map(read_excel, file_list)
But the problem is that before, with a for loop, I could keep a counter (count += 1 on each iteration) and print count / len(files) to get a sense of how far along the process is; I can't do that here. I realize with multiprocessing it could get a little funky, but there should be some way to implement this.
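One common way to do this (a sketch, not taken from this thread) is to swap map() for imap_unordered(), which yields results as workers finish them, so the parent process can keep a running count:

import glob
from multiprocessing import Pool

import pandas as pd

def read_excel(filename):
    return pd.read_excel(filename)

if __name__ == '__main__':
    files = glob.glob(r'C:\Desktop\Folder\*.xlsx')
    pool = Pool(processes=4)
    results = []
    # imap_unordered yields each DataFrame as soon as a worker finishes it,
    # so the parent can print progress after every completed file.
    for count, frame in enumerate(pool.imap_unordered(read_excel, files), 1):
        results.append(frame)
        print("%d / %d files processed" % (count, len(files)))
    pool.close()
    pool.join()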

How can I share an updated values in dictionary among different process?

I am a newbie here and also in Python. I have a question about dictionaries and multiprocessing. I want to run this part of the code on the second core of my Raspberry Pi (a GUI application is running on the first). So, I created a dictionary (20 keys, each with an array of length 256; the script below is just a short example). I initialized this dictionary in a separate script and set all its values to zero. Script table1.py (this dictionary should be available from both cores):
diction = {}
diction['FrameID']= [0]*10
diction['Flag']= ["Z"]*10
In the second script (which should run on the second core), I read the values that I get from the serial port and put them in this dictionary (parsing + conversion) at the appropriate place. Since I get a lot of information through the serial port, the information changes all the time. Script Readmode.py:
from multiprocessing import Process
import time
import serial
import table1

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

def hexTobin(hex_num):
    scale = 16  # equals to hexadecimal
    num_of_bits = len(hex_num) * 4
    bin_num = bin(int(hex_num, scale))[2:].zfill(num_of_bits)  # hex to binary
    return bin_num

def readSerial():
    port = "/dev/ttyAMA0"
    baudrate = 115200
    ser = serial.Serial(port, baudrate, bytesize=8, parity=serial.PARITY_NONE,
                        stopbits=1, xonxoff=False, rtscts=False)
    line = []
    for x in xrange(1):
        ser.write(":AAFF:AA\r\n:F1\r\n")  # new
    while True:
        for c in ser.read():
            line.append(c)
            a = ''.join(line[-2:])
            if a == '\r\n':
                b = ''.join(line)
                print("what I get:" + b)
                c = b[b.rfind(":"):len(b)]  # string between last ":" start delimiter and stop delimiter
                reps = {':': '', '\r\n': ''}  # throw away start and stop delimiters
                txt = replace_all(c, reps)
                print("hex num: " + txt)
                bina_num = hexTobin(txt)  # convert hex to bin
                print("bin num: " + bina_num)
                ssbit = bina_num[:3]  # first three bits
                print("select first three bits: " + ssbit)
                abit = int(ssbit, 2)  # binary to integer
                if abit == 5:
                    table1.diction['FrameID'][0] = abit
                if abit == 7:
                    table1.diction['FrameID'][5] = abit
                print("int num: ", abit)
                print(table1.diction)
                line = []
                break
    ser.close()

p1 = Process(target=readSerial)
p1.start()
During that time I want to read the information in this dictionary and use it in another process. But when I try to read those values, they are all zero.
My question is: how can I create a dictionary that will be available to both processes and can be updated based on the data received from the serial port?
Thank you in advance for your answer.
In Python, when you start two different scripts, even if they import some common modules, they share nothing (except the code). If two scripts both import table1, then they both have their own instance of the table1 module and therefore their own instance of the table1.diction variable.
If you think about it, it has to be like that. Otherwise all Python scripts would share the same sys.stdout, for example. So, more or less nothing is shared between two different scripts executing at the same time.
The multiprocessing module lets you create more than one process from the same single script. So you'll need to merge your GUI and your reading function into the same script. But then you can do what you want. Your code will look something like this:
import multiprocessing

# create shared dictionary sd
p1 = multiprocessing.Process(target=readSerial, args=(sd,))
p1.start()

# create the GUI behaviour you want
But hang on a minute. This won't work either. Because when the Process is created it starts a new instance of the Python interpreter and creates all its own instances again, just like starting a new script. So even now, by default, nothing in readSerial will be shared with the GUI process.
But fortunately the multiprocessing module provides some explicit techniques to share data between the processes. There is more than one way to do that, but here is one that works:
import multiprocessing
import time

def readSerial(d):
    d["test"] = []
    for i in range(100):
        l = d["test"]
        l.append(i)
        d["test"] = l
        time.sleep(1)

def doGUI(d):
    while True:
        print(d)
        time.sleep(.5)

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        sd = manager.dict()
        p = multiprocessing.Process(target=readSerial, args=(sd,))
        p.start()
        doGUI(sd)
You'll notice that the append to the list in the readSerial function is a bit odd. That is because the dictionary object we are working with here is not a normal dictionary. It's actually a dictionary proxy that conceals a pipe used to send the data between the two processes. When I append to the list inside the dictionary the proxy needs to be notified (by assigning to the dictionary) so it knows to send the updated data across the pipe. That is why we need the assignment (rather than simply directly mutating the list inside the dictionary). For more on this look at the docs.
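To make the proxy point concrete, here is the difference in miniature (a small sketch, with sd being a manager.dict() as in the code above):

import multiprocessing

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        sd = manager.dict()
        sd["test"] = []

        # In-place mutation only changes the local copy returned by the proxy;
        # nothing is sent back to the manager process.
        sd["test"].append(99)
        print(sd["test"])   # still []

        # Reassigning the key pushes the updated value through the proxy.
        l = sd["test"]
        l.append(99)
        sd["test"] = l
        print(sd["test"])   # [99]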

Break loop on keypress

I have a continuous loop that modifies data in an array and pauses for one second on every iteration, which is no problem. But I also need to print a specific part of the array to the screen when a specific key is pressed, without interrupting the continuous loop running at one-second intervals.
Any ideas on how to catch the keypress without disrupting the loop?
You can use either the multiprocessing or the threading library to spawn a new process/thread that will run the continuous loop, and continue the main flow with reading the user input (printing a specific part of the array to the screen, etc.).
Example:
import threading
from time import sleep

def loop():
    for i in range(3):
        print "running in a loop"
        sleep(3)
    print "success"

if __name__ == '__main__':
    t = threading.Thread(target=loop)
    t.start()
    user_input = raw_input("Please enter a value:")
    print user_input
    t.join()
You're probably looking for the select module. Here's a tutorial on waiting for I/O.
For the purpose of doing something on keypress, you could use something like:
import sys
from select import select

# Main loop
while True:
    # Check if something has been input. If so, exit.
    if sys.stdin in select([sys.stdin, ], [], [], 0)[0]:
        # Absorb the input
        inpt = sys.stdin.readline()
        # Do something...
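Putting the two pieces together for the question's scenario, a sketch (the data array and the "print part of the array" step are placeholders, and, like the snippet above, this relies on select over stdin, so it is Unix-only):

import sys
from select import select

data = [0] * 10  # hypothetical array being modified by the loop

while True:
    # Do the regular once-per-second work on the array.
    data[0] += 1
    # Wait up to one second for input instead of calling time.sleep(1),
    # so a keypress (followed by Enter) is noticed without stopping the loop.
    ready, _, _ = select([sys.stdin], [], [], 1)
    if ready:
        sys.stdin.readline()   # absorb the input
        print(data[:3])        # print the part of the array you care about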

Python multiprocess get result from queue

I'm running a multiprocessing script that is supposed to launch 2,000,000 jobs of about 0.01 seconds each. Each job puts its result in a queue imported from Queue, because the queue from the multiprocessing module couldn't handle more than 517 results in it.
My program freezes before getting the results from the queue. Here is the core of my multiprocessing function:
while argslist != []:
    p = mp.Process(target=function, args=(result_queue, argslist.pop(),))
    jobs.append(p)
    p.start()

for p in jobs:
    p.join()
print 'over'

res = [result_queue.get() for p in jobs]
print 'got it'
output: "over" but never "got it"
when I replace
result_queue.get()
by
result_queue.get_nowait()
I got the raise Empty error saying my queue is Empty...
but if I do the queue.get() just after the queue.put() in my inner function, then it works, showing me that the queue is well filed by my function..
queue.Queue is not shared between processes, so this won't work; you must use multiprocessing.Queue.
To avoid a deadlock you should not join your processes before getting the results from the queue. A multiprocessing.Queue is effectively limited by its underlying pipe's buffer, so if that fills up no more items can be flushed to the pipe and queue.put() will block until a consumer calls queue.get(); but if the consumer is joining a blocked process, then you have a deadlock.
You can avoid all of this by using a multiprocessing.Pool and its map() instead.
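A sketch of that Pool-based approach (argslist and ncpu as in the question; do_work is a hypothetical stand-in for the real job, whose return value replaces the queue.put call):

import multiprocessing as mp

def function(args):
    # Instead of queue.put(result), just return it; Pool.map collects
    # the return values for you, in the order of argslist.
    return do_work(args)

def multiprocess(argslist, ncpu):
    pool = mp.Pool(processes=ncpu)
    res = pool.map(function, argslist)  # blocks until every job has finished
    pool.close()
    pool.join()
    return res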
Thank you mata, I switched back to multiprocessing.Queue(), but I don't want to use a pool because I want to keep track of how many jobs have run. I finally added an if statement to regularly empty my queue.
def multiprocess(function, argslist, ncpu):
    total = len(argslist)
    done = 0
    result_queue = mp.Queue(0)
    jobs = []
    res = []
    while argslist != []:
        if len(mp.active_children()) < ncpu:
            p = mp.Process(target=function, args=(result_queue, argslist.pop(),))
            jobs.append(p)
            p.start()
            done += 1
            print "\r", float(done) / total * 100, "%",  # here is to keep track
            # here comes my emptying step
            if len(jobs) == 500:
                tmp = [result_queue.get() for p in jobs]
                for r in tmp:
                    res.append(r)
                result_queue = mp.Queue(0)
                jobs = []
    tmp = [result_queue.get() for p in jobs]
    for r in tmp:
        res.append(r)
    return res
This raises two questions:
Is 500 jobs the limit because of Python, or because of my machine or my system?
Will this threshold cause problems if my multiprocessing function is used under other conditions?