Python multiprocessing write to single gzip in parallel - compression

I am trying to copy a large, compressed file (.gz) to another compressed file (.gz) using Python. I will perform intermediate processing on the data that is not present in my code sample. I would like to be able to use multiprocessing with locks to write to the new gzip file in parallel from multiple processes, but I get an invalid format error on the output .gz file.
I assume that this is because a lock is not enough to support writing to a gzip file in parallel. Since compressed data requires "knowledge" of the data that came before it in order to make correct entries into the archive, I don't think Python can handle this by default. I'd guess that each process maintains its own awareness of the gzip output and that this state diverges after the first write.
If I open the target file in the script without using gzip then this all works. I could also write to multiple gzips and merge them, but prefer to avoid that if possible.
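For reference, the merge itself would just be byte-level concatenation of the part files, since a gzip file may contain multiple members. A rough sketch of that fallback, with hypothetical part file names:

import shutil

# Concatenating gzip files produces a valid gzip file with multiple members.
with open('/directory/output_records.json.gz', 'wb') as merged:
    for part in ['/directory/part1.json.gz', '/directory/part2.json.gz']:
        with open(part, 'rb') as f:
            shutil.copyfileobj(f, merged)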
Here is my source code:
#python3.8
import gzip
from itertools import islice
from multiprocessing import Process, Queue, Lock

def reader(infile, data_queue, coordinator_queue, chunk_size):
    print("Reader Started.")
    while True:
        data_chunk = list(islice(infile, chunk_size))
        data_queue.put(data_chunk)
        coordinator_queue.put('CHUNK_READ')
        if not data_chunk:
            coordinator_queue.put('READ_DONE')
            #Process exit
            break

def writer(outfile, data_queue, coordinator_queue, write_lock, ID):
    print("Writer Started.")
    while True:
        queue_message = data_queue.get()
        if (queue_message == 'DONE'):
            outfile.flush()
            coordinator_queue.put('WRITE_DONE')
            #Process exit
            break
        else:
            print("Writer",ID,"-","Write Lock:",write_lock)
            write_lock.acquire()
            print("Writer",ID,"-","Write Lock:",write_lock)
            for line in queue_message:
                print("Line write:",line)
                outfile.write(line)
            write_lock.release()
            print("Writer",ID,"-","Write Lock:",write_lock)

def coordinator(reader_procs, writer_procs, coordinator_queue, data_queue):
    print("Coordinator Started.")
    active_readers = reader_procs
    active_writers = writer_procs
    while True:
        queue_message = coordinator_queue.get()
        if queue_message == 'READ_DONE':
            active_readers = active_readers - 1
            if active_readers == 0:
                while not data_queue.qsize() == 0:
                    continue
                [data_queue.put('DONE') for x in range(writer_procs)]
        if queue_message == 'WRITE_DONE':
            active_writers = active_writers - 1
            if active_writers == 0:
                break

def main():
    reader_procs = 1
    writer_procs = 2
    chunk_size = 1
    queue_size = 96

    data_queue = Queue(queue_size)
    coordinator_queue = Queue()
    write_lock = Lock()

    infile_path = '/directory/input_records.json.gz'
    infile = gzip.open(infile_path, 'rt')

    outfile_path = '/directory/output_records.json.gz'
    outfile = gzip.open(outfile_path, 'wt')
    #Works when it is uncompressed
    #outfile = open(outfile_path, 'w')

    readers = [Process(target=reader, args=(infile, data_queue, coordinator_queue, chunk_size)) for x in range(reader_procs)]
    writers = [Process(target=writer, args=(outfile, data_queue, coordinator_queue, write_lock, x)) for x in range(writer_procs)]
    coordinator_p = Process(target=coordinator, args=(reader_procs, writer_procs, coordinator_queue, data_queue))

    coordinator_p.start()
    for process in readers:
        process.start()
    for process in writers:
        process.start()
    for process in readers:
        process.join()
    for process in writers:
        process.join()
    coordinator_p.join()

    outfile.flush()
    outfile.close()

main()
Notes about the code:
The "chunk size" determines the number of lines pulled from the input file
I have been using a much smaller test file while trying to make this work
My input file is over 200 GB when uncompressed
My output file will be over 200 GB when uncompressed
This version of the code has been trimmed and may have some mistakes but it is based directly on what I am running.
All functional areas of the script have been retained.
I would guess that I need a library that can somehow coordinate the compressed writes between the different processes. Obviously this points to using a single process to perform the write (like the coordinator process); however, that would likely introduce a bottleneck.
There are some related posts on Stack Overflow, but none that seem to specifically address what I am trying to do. I also see utilities like "mgzip", "pigz" and "migz" out there that can parallelize compression, but I don't believe they are applicable to this use case. mgzip didn't work in my testing (zero-sized file), pigz appears to consume an entire file as input on the command line, and migz is a Java library, so I'm not sure how I could integrate it into Python.
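One possibility I have not fully tested: pigz can apparently also act as a filter, compressing stdin to stdout, so piping the processed lines through it might be an alternative. A rough sketch, assuming pigz is installed and on the PATH (lines_to_write is a placeholder for the processed data):

import subprocess

# Stream processed lines through pigz, which compresses on multiple
# cores and writes a single gzip stream to the output file.
with open('/directory/output_records.json.gz', 'wb') as out:
    pigz = subprocess.Popen(['pigz', '-c', '-p', '8'],
                            stdin=subprocess.PIPE, stdout=out)
    for line in lines_to_write:  # placeholder for the processed data
        pigz.stdin.write(line.encode('utf-8'))
    pigz.stdin.close()
    pigz.wait()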
If it can't be done then so be it but any answer would be appreciated!
-------
Update and working code:
With help from Mark Adler I was able to create a multiprocessing script that compresses the data in parallel and has a single writer process to add it to the target .gz file. With the throughput of a modern NVMe drive, this reduces the likelihood of becoming CPU-bound by compression before becoming I/O-bound.
The largest changes that needed to be made to make this code work are as follows:
gzip.compress(bytes(string, 'utf-8'), compresslevel=9) was needed to compress an individual "block" or "stream" (gzip member).
file = open(outfile_path, 'wb') was needed in order to open a plain binary output file that becomes the target gzip.
The file.write() action had to take place from a single process, as it must be executed serially.
It is worth noting that this does not write to the file in parallel, but rather handles the compression in parallel. The compression is the heavy-lifting part of this process anyway.
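The same idea in miniature, using multiprocessing.Pool instead of the hand-rolled worker processes below (a simplified sketch, not the exact script that follows; compress_chunk and chunks are hypothetical helpers):

import gzip
from itertools import islice
from multiprocessing import Pool

def compress_chunk(lines):
    # Each chunk becomes an independent gzip member.
    return gzip.compress(''.join(lines).encode('utf-8'), compresslevel=9)

def chunks(fileobj, n):
    while True:
        block = list(islice(fileobj, n))
        if not block:
            return
        yield block

if __name__ == '__main__':
    with gzip.open('/directory/input_records.json.gz', 'rt') as infile, \
         open('/directory/output_records.json.gz', 'wb') as outfile:
        with Pool(processes=4) as pool:
            # imap preserves chunk order; compression runs in the worker
            # processes, while the single parent process does all the writing.
            for member in pool.imap(compress_chunk, chunks(infile, 600)):
                outfile.write(member)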
Updated code (tested and works as-is):
#python3.8
import gzip
from itertools import islice
from multiprocessing import Process, Queue

def reader(infile, data_queue, coordinator_queue, chunk_size):
    print("Reader Started.")
    while True:
        data_chunk = list(islice(infile, chunk_size))
        data_queue.put(data_chunk)
        coordinator_queue.put('CHUNK_READ')
        if not data_chunk:
            coordinator_queue.put('READ_DONE')
            #Process exit
            break

def compressor(data_queue, compressed_queue, coordinator_queue):
    print("Compressor Started.")
    while True:
        chunk = ''
        queue_message = data_queue.get()
        if (queue_message == 'DONE'):
            #Notify coordinator process of task completion
            coordinator_queue.put('COMPRESS_DONE')
            #Process exit
            break
        else:
            for line in queue_message:
                #Assemble concatenated string from list
                chunk += line
            #Encode the string as binary so that it can be compressed
            #Setting gzip compression level to 9 (highest)
            compressed_chunk = gzip.compress(bytes(chunk, 'utf-8'), compresslevel=9)
            compressed_queue.put(compressed_chunk)

def writer(outfile, compressed_queue, coordinator_queue):
    print("Writer Started.")
    while True:
        queue_message = compressed_queue.get()
        if (queue_message == 'DONE'):
            #Notify coordinator process of task completion
            coordinator_queue.put('WRITE_DONE')
            #Process exit
            break
        else:
            outfile.write(queue_message)

def coordinator(reader_procs, writer_procs, compressor_procs, coordinator_queue, data_queue, compressed_queue):
    print("Coordinator Started.")
    active_readers = reader_procs
    active_compressors = compressor_procs
    active_writers = writer_procs
    while True:
        queue_message = coordinator_queue.get()
        if queue_message == 'READ_DONE':
            active_readers = active_readers - 1
            if active_readers == 0:
                while not data_queue.qsize() == 0:
                    continue
                [data_queue.put('DONE') for x in range(compressor_procs)]
        if queue_message == 'COMPRESS_DONE':
            active_compressors = active_compressors - 1
            if active_compressors == 0:
                while not compressed_queue.qsize() == 0:
                    continue
                [compressed_queue.put('DONE') for x in range(writer_procs)]
        if queue_message == 'WRITE_DONE':
            active_writers = active_writers - 1
            if active_writers == 0:
                break

def main():
    reader_procs = 1
    compressor_procs = 2
    #writer_procs really needs to stay as 1 since writing must be done serially
    #This could probably be written out...
    writer_procs = 1
    chunk_size = 600
    queue_size = 96

    data_queue = Queue(queue_size)
    compressed_queue = Queue(queue_size)
    coordinator_queue = Queue()

    infile_path = '/directory/input_records.json.gz'
    infile = gzip.open(infile_path, 'rt')

    outfile_path = '/directory/output_records.json.gz'
    outfile = open(outfile_path, 'wb')

    readers = [Process(target=reader, args=(infile, data_queue, coordinator_queue, chunk_size)) for x in range(reader_procs)]
    compressors = [Process(target=compressor, args=(data_queue, compressed_queue, coordinator_queue)) for x in range(compressor_procs)]
    writers = [Process(target=writer, args=(outfile, compressed_queue, coordinator_queue)) for x in range(writer_procs)]
    coordinator_p = Process(target=coordinator, args=(reader_procs, writer_procs, compressor_procs, coordinator_queue, data_queue, compressed_queue))

    coordinator_p.start()
    for process in readers:
        process.start()
    for process in compressors:
        process.start()
    for process in writers:
        process.start()
    for process in compressors:
        process.join()
    for process in readers:
        process.join()
    for process in writers:
        process.join()
    coordinator_p.join()

    outfile.flush()
    outfile.close()

main()

It's actually quite straightforward to do by writing complete gzip streams from each thread to a single output file. Yes, you will need one thread that does all the writing, with each compression thread taking turns writing all of its gzip stream before another compression thread gets to write any. The compression threads can all do their compression in parallel, but the writing needs to be serialized.
The reason this works is that the gzip standard, RFC 1952, says that a gzip file consists of a series of members, where each member is a gzip header, compressed data, and a gzip trailer.
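A quick way to convince yourself of this (a minimal check, separate from the solution above):

import gzip

# Two independently compressed members, simply concatenated.
data = gzip.compress(b'first part\n') + gzip.compress(b'second part\n')
with open('demo.gz', 'wb') as f:
    f.write(data)

# gzip.open (like gzip/zcat) decompresses all members in sequence.
with gzip.open('demo.gz', 'rb') as f:
    assert f.read() == b'first part\nsecond part\n'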

Related

Parallel processing in Python for image batching

I would like to run two functions in parallel: one for image batching (streaming all 25 images for processing) and another for processing the batched images. They need to run in parallel.
So I have a main function for batching images, BatchStreaming(self), and one for processing, BatchProcessing(self, b_num). BatchStreaming is working well. After streaming 25 images, it needs to proceed to batch processing. There are two things that have to happen in parallel:
(1) The while loop in BatchStreaming needs to continue with the next batch of images.
(2) At the same time, the current batch of images needs to be processed.
I am confused about whether I should use a process or a thread. I prefer a process, as I would like to utilize all CPU cores (Python threads effectively run on only one CPU core at a time).
Then I have two issues:
(1) The process has to join back to the main program to proceed, but I need to continue with the next batch of images.
(2) In the following program, when BatchProcessing(self, b_num) is called, I get this exception:
Caught Main Exception
(<class 'TypeError'>, TypeError("'module' object is not callable",), <traceback object at 0x7f98635dcfc8>)
What could the issue be?
The code is as follows.
import multiprocessing as MultiProcess
import time
import vid_streamv3 as vs
import cv2
import sys
import numpy as np
import os

BATCHSIZE = 25
CHANNEL = 3
HEIGHT = 480
WIDTH = 640
ORGHEIGHT = 1080
ORGWIDTH = 1920

class ProcessPipeline:
    def __init__(self):
        #Current Cam
        self.camProcess = None
        self.cam_queue = MultiProcess.Queue(maxsize=100)
        self.stopbit = None
        self.camlink = 'rtsp://root:pass#192.168.0.90/axis-media/media.amp?camera=1' #Add your RTSP cam link
        self.framerate = 25
        self.fullsize_batch1 = np.zeros((BATCHSIZE, ORGHEIGHT, ORGWIDTH, CHANNEL), dtype=np.uint8)
        self.fullsize_batch2 = np.zeros((BATCHSIZE, ORGHEIGHT, ORGWIDTH, CHANNEL), dtype=np.uint8)
        self.batch1_is_processed = False

    def BatchStreaming(self):
        #get all cams
        time.sleep(3)
        self.stopbit = MultiProcess.Event()
        self.camProcess = vs.StreamCapture(self.camlink,
                                           self.stopbit,
                                           self.cam_queue,
                                           self.framerate)
        self.camProcess.start()
        count = 0
        try:
            while True:
                if not self.cam_queue.empty():
                    cmd, val = self.cam_queue.get()
                    if cmd == vs.StreamCommands.FRAME:
                        if val is not None:
                            print('streaming starts ')
                            if(self.batch1_is_processed == False):
                                self.fullsize_batch1[count] = val
                            else:
                                self.fullsize_batch2[count] = val
                            count = count + 1
                            if(count >= 25):
                                if(self.batch1_is_processed == False): #to start process for inference and post processing for batch 1
                                    self.batch1_is_processed = True
                                    print('batch 1 process')
                                    p = MultiProcess(target=self.BatchProcessing, args=(1,))
                                else: #to start process for inference and post processing for batch 2
                                    self.batch1_is_processed = False
                                    print('batch 2 process')
                                    p = MultiProcess(target=self.BatchProcessing, args=(2,))
                                p.start()
                                print('BatchProcessing start')
                                p.join()
                                print('BatchProcessing join')
                                count = 0
                            cv2.imshow('Cam: ' + self.camlink, val)
                            cv2.waitKey(1)
        except KeyboardInterrupt:
            print('Caught Keyboard interrupt')
        except:
            e = sys.exc_info()
            print('Caught Main Exception')
            print(e)
        self.StopStreaming()
        cv2.destroyAllWindows()

    def StopStreaming(self):
        print('in stopCamStream')
        if self.stopbit is not None:
            self.stopbit.set()
            while not self.cam_queue.empty():
                try:
                    _ = self.cam_queue.get()
                except:
                    break
            self.cam_queue.close()
            print("before camProcess.join()")
            self.camProcess.join()
            print("after camProcess.join()")

    def BatchProcessing(self, b_num):
        print('module name:', __name__)
        if hasattr(os, 'getppid'):  # only available on Unix
            print('parent process:', os.getppid())
        print('process id:', os.getpid())

if __name__ == "__main__":
    mc = ProcessPipeline()
    mc.BatchStreaming()
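The TypeError above is consistent with calling the multiprocessing module object itself: the code imports multiprocessing as MultiProcess and then calls MultiProcess(target=...), whereas the callable is the Process class inside that module. A minimal illustration (task is a hypothetical stand-in for BatchProcessing):

import multiprocessing as MultiProcess

def task(n):
    print('processing batch', n)

if __name__ == '__main__':
    # MultiProcess(target=task, args=(1,)) would raise
    # TypeError: 'module' object is not callable
    p = MultiProcess.Process(target=task, args=(1,))
    p.start()
    p.join()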
I used Event signalling as shown below.
That is more straightforward for my application.
When the batching loop has enough images, it signals the batch processing side.
#event_tut.py
import random, time
from threading import Event, Thread

event = Event()

def waiter(event, nloops):
    count = 0
    while(count < 10):
        print("%s. Waiting for the flag to be set." % (count+1))
        event.wait()     # Blocks until the flag becomes true.
        print("Wait complete at:", time.ctime())
        event.clear()    # Resets the flag.
        print('wait exit')
        count = count + 1

def setter(event, nloops):
    for i in range(nloops):
        time.sleep(random.randrange(2, 5))  # Sleeps for some time.
        event.set()

threads = []
nloops = 10

threads.append(Thread(target=waiter, args=(event, nloops)))
threads[-1].start()
threads.append(Thread(target=setter, args=(event, nloops)))
threads[-1].start()

for thread in threads:
    thread.join()
print("All done.")

Audio Timeout error in Speech to text API of Google Cloud

I aim to make my own Jarvis, which listens all the time and activates when I say "hello". I learned that the Google Cloud Speech-to-Text API doesn't listen for more than 60 seconds, but then I found this not-so-famous link, where the stream listens for an indefinite duration. The author of the GitHub script says he has played a trick: the script refreshes after 60 seconds, so the program doesn't crash.
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/cloud-client/transcribe_streaming_indefinite.py
Following is my modified version, since I wanted it to answer my questions only when they follow "hello", rather than answering me all the time. Now if I ask my Jarvis a question whose answer takes more than 60 seconds, so the stream doesn't get time to refresh, the program crashes :(
#!/usr/bin/env python

# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Google Cloud Speech API sample application using the streaming API.

NOTE: This module requires the additional dependency `pyaudio`. To install
using pip:

    pip install pyaudio

Example usage:
    python transcribe_streaming_indefinite.py
"""

# [START speech_transcribe_infinite_streaming]
from __future__ import division

import time
import re
import sys
import os

from google.cloud import speech
from pygame.mixer import *
from googletrans import Translator

# running=True
translator = Translator()
init()

import pyaudio
from six.moves import queue

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:\\Users\\mnauf\\Desktop\\rehandevice\\key.json"

from commands2 import commander

cmd = commander()

# Audio recording parameters
STREAMING_LIMIT = 55000
SAMPLE_RATE = 16000
CHUNK_SIZE = int(SAMPLE_RATE / 10)  # 100ms


def get_current_time():
    return int(round(time.time() * 1000))


def duration_to_secs(duration):
    return duration.seconds + (duration.nanos / float(1e9))


class ResumableMicrophoneStream:
    """Opens a recording stream as a generator yielding the audio chunks."""
    def __init__(self, rate, chunk_size):
        self._rate = rate
        self._chunk_size = chunk_size
        self._num_channels = 1
        self._max_replay_secs = 5

        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True
        self.start_time = get_current_time()

        # 2 bytes in 16 bit samples
        self._bytes_per_sample = 2 * self._num_channels
        self._bytes_per_second = self._rate * self._bytes_per_sample

        self._bytes_per_chunk = (self._chunk_size * self._bytes_per_sample)
        self._chunks_per_second = (
            self._bytes_per_second // self._bytes_per_chunk)

    def __enter__(self):
        self.closed = False

        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            channels=self._num_channels,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk_size,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )

        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, *args, **kwargs):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            if get_current_time() - self.start_time > STREAMING_LIMIT:
                self.start_time = get_current_time()
                break
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]

            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            yield b''.join(data)


def search(responses, stream, code):
    responses = (r for r in responses if (
        r.results and r.results[0].alternatives))
    num_chars_printed = 0
    for response in responses:
        if not response.results:
            continue

        # The `results` list is consecutive. For streaming, we only care about
        # the first result being considered, since once it's `is_final`, it
        # moves on to considering the next utterance.
        result = response.results[0]
        if not result.alternatives:
            continue

        # Display the transcription of the top alternative.
        top_alternative = result.alternatives[0]
        transcript = top_alternative.transcript
        # music.load("/home/pi/Desktop/rehandevice/end.mp3")
        # music.play()

        # Display interim results, but with a carriage return at the end of the
        # line, so subsequent lines will overwrite them.
        # If the previous result was longer than this one, we need to print
        # some extra spaces to overwrite the previous result
        overwrite_chars = ' ' * (num_chars_printed - len(transcript))
        if not result.is_final:
            sys.stdout.write(transcript + overwrite_chars + '\r')
            sys.stdout.flush()
            num_chars_printed = len(transcript)
        else:
            #print(transcript + overwrite_chars)
            # Exit recognition if any of the transcribed phrases could be
            # one of our keywords.
            if code == 'ur-PK':
                transcript = translator.translate(transcript).text
            print("Your command: ", transcript + overwrite_chars)
            if "hindi assistant" in (transcript + overwrite_chars).lower():
                cmd.respond("Alright. Talk to me in urdu", code=code)
                main('ur-PK')
            elif "english assistant" in (transcript + overwrite_chars).lower():
                cmd.respond("Alright. Talk to me in English", code=code)
                main('en-US')
            cmd.discover(text=transcript + overwrite_chars, code=code)
            for i in range(10):
                print("Hello world")
            break
            num_chars_printed = 0


def listen_print_loop(responses, stream, code):
    """Iterates through server responses and prints them.

    The responses passed is a generator that will block until a response
    is provided by the server.

    Each response may contain multiple results, and each result may contain
    multiple alternatives; for details, see https://cloud.google.com/speech-to-text/docs/reference/rpc/google.cloud.speech.v1#streamingrecognizeresponse. Here we
    print only the transcription for the top alternative of the top result.

    In this case, responses are provided for interim results as well. If the
    response is an interim one, print a line feed at the end of it, to allow
    the next result to overwrite it, until the response is a final one. For the
    final one, print a newline to preserve the finalized transcription.
    """
    responses = (r for r in responses if (
        r.results and r.results[0].alternatives))
    music.load(r"C:\\Users\\mnauf\\Desktop\\rehandevice\\coins.mp3")
    num_chars_printed = 0
    for response in responses:
        if not response.results:
            continue

        # The `results` list is consecutive. For streaming, we only care about
        # the first result being considered, since once it's `is_final`, it
        # moves on to considering the next utterance.
        result = response.results[0]
        if not result.alternatives:
            continue

        # Display the transcription of the top alternative.
        top_alternative = result.alternatives[0]
        transcript = top_alternative.transcript

        # Display interim results, but with a carriage return at the end of the
        # line, so subsequent lines will overwrite them.
        #
        # If the previous result was longer than this one, we need to print
        # some extra spaces to overwrite the previous result
        overwrite_chars = ' ' * (num_chars_printed - len(transcript))
        if not result.is_final:
            sys.stdout.write(transcript + overwrite_chars + '\r')
            sys.stdout.flush()
            num_chars_printed = len(transcript)
        else:
            print("Listen print loop", transcript + overwrite_chars)
            # Exit recognition if any of the transcribed phrases could be
            # one of our keywords.
            if re.search(r'\b(hello)\b', transcript.lower(), re.I):
                #print("Give me order")
                music.play()
                search(responses, stream, code)
                break
            elif re.search(r'\b(ہیلو)\b', transcript, re.I):
                music.play()
                search(responses, stream, code)
                break
            num_chars_printed = 0


def main(code):
    cmd.respond("I am Rayhaan dot A Eye. How can I help you?", code=code)
    client = speech.SpeechClient()
    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=SAMPLE_RATE,
        language_code='en-US',
        max_alternatives=1,
        enable_word_time_offsets=True)
    streaming_config = speech.types.StreamingRecognitionConfig(
        config=config,
        interim_results=True)
    mic_manager = ResumableMicrophoneStream(SAMPLE_RATE, CHUNK_SIZE)

    print('Say "Quit" or "Exit" to terminate the program.')

    with mic_manager as stream:
        while not stream.closed:
            audio_generator = stream.generator()
            requests = (speech.types.StreamingRecognizeRequest(
                audio_content=content)
                for content in audio_generator)

            responses = client.streaming_recognize(streaming_config,
                                                   requests)
            # Now, put the transcription responses to use.
            try:
                listen_print_loop(responses, stream, code)
            except:
                listen


if __name__ == '__main__':
    main('en-US')
# [END speech_transcribe_infinite_streaming]
You can call your functions after recognition in a different thread. Example:
from threading import Thread

new_thread = Thread(target=music.play)
new_thread.daemon = True  # Not always needed, read more about the daemon property
new_thread.start()
Or if you just want to prevent the exception, you can always use try/except. Example:
with mic_manager as stream:
    while not stream.closed:
        try:
            audio_generator = stream.generator()
            requests = (speech.types.StreamingRecognizeRequest(
                audio_content=content)
                for content in audio_generator)

            responses = client.streaming_recognize(streaming_config,
                                                   requests)
            # Now, put the transcription responses to use.
            listen_print_loop(responses, stream, code)
        except BaseException as e:
            print("Exception occurred - {}".format(str(e)))

multiprocessing Queue deadlock when spawning multiple threads in one process

I created two processes: one process, which spawns multiple threads, is responsible for writing data to a Queue; the other reads data from the Queue. It deadlocks very frequently, and less often when a sleep is added in the run method of the write module (see the comment in the code). Let me put my code below:
Environment: Python 2.7
main.py
from multiprocessing import Process, Queue
from write import write
from read import read

if __name__ == "__main__":
    record_queue = Queue()
    table_queue = Queue()
    pw = Process(target=write, args=[record_queue, table_queue])
    pr = Process(target=read, args=[record_queue, table_queue])

    pw.start()
    pr.start()

    pw.join()
    pr.join()
write.py
from concurrent.futures import ThreadPoolExecutor, as_completed

def write(record_queue, table_queue):
    thread_num = 3
    pool = ThreadPoolExecutor(thread_num)
    futures = [pool.submit(run, record_queue, table_queue) for _ in range(thread_num)]
    results = [r.result() for r in as_completed(futures)]

def run(record_queue, table_queue):
    while True:
        if table_queue.empty():
            break
        table = table_queue.get()
        # adding this code below reduce deadlock opportunity.
        #import time
        #import random
        #time.sleep(random.randint(1, 3))
        process_with_table(record_queue, table_queue, table)

def process_with_table(record_queue, table_queue, table):
    #for short
    for item in [x for x in range(1000)]:
        record_queue.put(item)
read.py
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import Queue

def read(record_queue, table_queue):
    count = 0
    while True:
        item = record_queue.get()
        count += 1
        print("item: ", item)
        if count == 4:
            break
I googled it and there are similar questions on SO, but I can't see the similarity compared with my code, so can anyone help with my code? Thanks...
I seem to have found a solution: change the run method in the write module to:
import multiprocessing.queues

def run(record_queue, table_queue):
    while True:
        try:
            if table_queue.empty():
                break
            table = table_queue.get(timeout=3)
            process_with_table(record_queue, table_queue, table)
        except multiprocessing.queues.Empty:
            import time
            time.sleep(0.1)
and I never see a deadlock or blocking on the get method anymore.
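For reference, the multiprocessing documentation also warns that joining a process which has put items on a queue can deadlock until all of those items are consumed. A minimal sketch of the recommended pattern, draining the queue before joining the producer (simplified, not the code above):

from multiprocessing import Process, Queue

def producer(q):
    for item in range(100000):  # enough data to fill the underlying pipe buffer
        q.put(item)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    # Consume everything that was put on the queue before joining,
    # otherwise the producer's feeder thread can block and join() hangs.
    for _ in range(100000):
        q.get()
    p.join()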

OpenCV video writer writes blank files due to memory leak?

I'm trying to save video files on a Raspberry Pi with Python 2.7 and OpenCV. The code shown below consistently saves several video files (16-18 MB each) to a USB drive, but after the first few files the file sizes drop to 6 kB and the files appear to be empty, since they won't open.
I opened a task manager to monitor Python's memory usage during the saving and noticed that the RSS memory continually increases until roughly 200 MB, which is when the video files start showing up blank.
Is that a sure indicator of a memory leak, or should I run other tests?
Is there something wrong in the below code that isn't releasing variables properly?
import cv2
import numpy as np
import datetime

dispWidth = 640
dispHeight = 480
FPS = 6

SetupNewVideoFile = True  # state variable
VidCaptureDurationMinutes = 3

filepath = '/media/pi/9CEE-5383/Videos/'
i = 1  # counter for the video file names

fourcc = cv2.cv.CV_FOURCC('X','V','I','D')

while True:
    # timer section that ends running file saves and triggers a new file save
    Now = datetime.datetime.now()  # refresh current time
    delta = Now - VidCaptureStartTime
    print('delta: ', delta.seconds, delta.days)
    if ((delta.seconds/60) >= VidCaptureDurationMinutes) and delta.days >= 0:
        print('delta: ', delta.seconds, delta.days)
        SetupNewVideoFile = True
        Vidoutput.release()
        cap.release()

    # setting up new file saves
    if SetupNewVideoFile:
        SetupNewVideoFile = False
        title = "Video_" + str(i) + ".avi"
        i += 1
        fullpath = filepath + title
        print(fullpath)
        Vidoutput = cv2.VideoWriter(fullpath, fourcc, FPS, (dispWidth, dispHeight))
        VidCaptureStartTime = datetime.datetime.now()  # updating video start time
        cap = cv2.VideoCapture(-1)  # start video capture

    ret, frame = cap.read()

    if ret:  # display and save if a frame was successfully read
        cv2.imshow('webcam', frame)
        Vidoutput.write(frame)  # save the frames

    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):  # quits program
        break

# clean up
cap.release()
Vidoutput.release()
cv2.destroyAllWindows()
cv2.waitKey(1)  # these seem to be needed to flush the cv actions
cv2.waitKey(1)
cv2.waitKey(1)
cv2.waitKey(1)
After some more troubleshooting, and with help from the pympler module's functions and tutorials (https://pythonhosted.org/Pympler/muppy.html), my problem still looked like a memory leak, but I was unable to pin down the specific error.
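For reference, the muppy usage pattern from that tutorial looks roughly like this (a sketch of the diagnostic approach, not the exact checks that were run here):

from pympler import muppy, summary

# Snapshot all tracked Python objects and print a summary of what is
# using memory; useful for spotting objects that keep accumulating.
all_objects = muppy.get_objects()
mem_summary = summary.summarize(all_objects)
summary.print_(mem_summary)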
This other S.O. post (Releasing memory in Python) mentioned improvements in memory management made in version 3.3 of Python:
"In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory."
So I switched over to Python 3.3 and now the code saves valid video files well past the error out time I saw previously.
This isn't an answer as to why the blank files were occurring, but at least it's a solution.

How does this code for parallel tasks work in Python?

I've been using the script below to run some tasks in parallel on an Ubuntu server with 16 processors; it actually works, but I have a few questions about it:
What is the code actually doing?
The more workers I set up, the faster the script runs, but what is the limit on the number of workers? I've run it with 100.
How could I improve it?
#!/usr/bin/env python
from multiprocessing import Process, Queue
from executable import run_model
from database import DB
import numpy as np

def worker(work_queue, db_conection):
    try:
        for phone in iter(work_queue.get, 'STOP'):
            registers_per_number = retrieve_CDRs(phone, db_conection)
            run_model(np.array(registers_per_number), db_conection)
            #print("The phone %s was already run" % (phone))
    except Exception:
        pass
    return True

def retrieve_CDRs(phone, db_conection):
    return db_conection.retrieve_data_by_person(phone)

def main():
    phone_numbers = np.genfromtxt("../listado.csv", dtype="int")[:2000]
    workers = 16
    work_queue = Queue()
    processes = []
    #print("Process started with %s" % (workers))
    for phone in phone_numbers:
        work_queue.put(phone)
        #print("Phone %s put at the queue" % (phone))
        #print("The queue %s" % (work_queue))

    for w in xrange(workers):
        #print("The worker %s" % (w))
        # new conection to data base
        db_conection = DB()
        p = Process(target=worker, args=(work_queue, db_conection))
        p.start()
        #print("Process %s started" % (p))
        processes.append(p)
        work_queue.put('STOP')

    for p in processes:
        p.join()

if __name__ == '__main__':
    main()
Cheers!
First, start from the main function:
It creates a NumPy array of 2000 integer phone numbers from a CSV file.
Then it creates some variables and lists.
Next, it creates a queue and puts on it all the phone numbers extracted from the CSV file.
Next, for each of the 16 workers, it creates a DB connection, sets up the process arguments, and starts the process, so every worker pulls phone numbers from the shared queue until it reaches a 'STOP' sentinel.
Hope that helps you understand the code. It is essentially a pool of worker processes (not threads) behaving as parallel processing, so up to a point, the more workers you use, the faster it runs. In principle you could go as high as 2000 workers, one per phone number; beyond that it is not meaningful in this master-worker setup. Parallel processing also suggests minimizing the number of idle processes/workers: if you have more workers than work, the idle ones just add overhead and will reduce your performance. Improving the script is mostly a matter of improving along these lines.
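As for the "how could I improve it" question: one common simplification, sketched here under the assumption that DB, retrieve_data_by_person and run_model behave as in the code above and that their arguments are picklable, is to let multiprocessing.Pool manage the worker processes and the STOP sentinels (process_phone is a hypothetical helper):

from multiprocessing import Pool

from executable import run_model
from database import DB
import numpy as np

def process_phone(phone):
    # One DB connection per task for simplicity; a per-worker connection set up
    # in a Pool initializer would avoid reconnecting for every number.
    db_conection = DB()
    registers_per_number = db_conection.retrieve_data_by_person(phone)
    run_model(np.array(registers_per_number), db_conection)

if __name__ == '__main__':
    phone_numbers = np.genfromtxt("../listado.csv", dtype="int")[:2000]
    pool = Pool(processes=16)
    pool.map(process_phone, phone_numbers)  # the Pool handles the queue and sentinels
    pool.close()
    pool.join()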
Hope that helps. Cheers!