[Image of an abstract model]
I have implemented this model using python-socketIO, but I am unable to do something similar in Twisted. I have a feed of data coming in, which I read in blocks of 8192 bytes. This is binary data, so it needs to be processed before being sent to clients, but I can't pause the incoming feed while processing. With python-socketIO I would run the function that works on this data as a background task while continuing to fetch more data. Any ideas how I can do something similar using Python's Twisted library?
By "background task" I suppose you mean a thread. You can work with threads using Twisted. The most approach with the fewest concepts involved is deferToThread:
from twisted.internet.protocol import Protocol
from twisted.internet.threads import deferToThread

class YourProtocol(Protocol):
    def dataReceived(self, data):
        d = deferToThread(your_process_data, data)
        d.addCallback(your_result_handler)
        d.addErrback(your_error_handler)
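For completeness, here is a rough sketch of what those placeholder helpers might look like; the names follow the answer's placeholders and the bodies are illustrative assumptions, not anything provided by Twisted:

def your_process_data(data):
    # CPU-heavy or blocking work; runs in the reactor's thread pool
    return data[::-1]

def your_result_handler(result):
    # Called in the reactor thread once the worker thread finishes
    print("processed %d bytes" % len(result))

def your_error_handler(failure):
    # failure is a twisted.python.failure.Failure
    failure.printTraceback()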
So I'm trying to display a preview of human detection done on a Raspberry Pi on a webpage.
I already saw and tried this proposed solution. My issue with it is that the processing is done only when the page is viewed, for obvious reasons. I want the processing to happen independently of whether the preview is active, and when the page is viewed the preview should simply be attached "on top".
Having a separate thread for processing looks like a probable solution, but due to Flask's event-driven approach I'm struggling to figure out how to safely pass the frames between threads (preprocessing takes a reasonable amount of time, and if I simply use locks to guard it, exceptions are sometimes raised), and generally I'm not sure whether that is the best way to solve the problem.
Is multithreading the way to go? Or should I maybe choose some library other than Flask for this purpose?
From the example you posted, you can use the VideoCamera class but split get_frame into two functions: one that retrieves the frame and processes it (update_frame), and another that returns the latest frame in the encoding you need for Flask (get_frame). Simply run update_frame in a separate thread and that should work.
It's probably best practice to store the new frame in a local variable first, then use a lock to read/write the instance variable holding the latest frame; a sketch of that follows the class below.
import cv2

class VideoCamera(object):
    def __init__(self):
        self.video = cv2.VideoCapture(0)
        self._last_frame = None

    def __del__(self):
        self.video.release()

    def update_frame(self):
        ret, frame = self.video.read()
        # DO WHAT YOU WANT WITH TENSORFLOW / KERAS AND OPENCV
        # Perform mutual exclusion here
        self._last_frame = frame

    def get_frame(self):
        # Perform mutual exclusion here
        frame = self._last_frame
        ret, jpeg = cv2.imencode('.jpg', frame)
        return jpeg.tobytes()
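A rough sketch of the background thread and lock mentioned above, assuming a loop function, lock name, and frame rate of my own choosing (none of these come from the original example):

import threading
import time

frame_lock = threading.Lock()   # guards reads/writes of the latest frame
camera = VideoCamera()

def camera_loop():
    # Keep processing frames regardless of whether the preview page is open
    while True:
        with frame_lock:
            camera.update_frame()
        time.sleep(0.03)        # roughly 30 fps; tune to your processing time

threading.Thread(target=camera_loop, daemon=True).start()

The Flask route that streams the preview would then acquire the same frame_lock around camera.get_frame().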
I have a rosbag file which contains messages on various topics, each topic with its own frequency. This data was captured from a hardware device streaming data, and data from all topics would "arrive" at the same time to be used by different algorithms.
I wish to simulate this using the rosbag file (think of it as every topic having an associated array of data), and it is imperative that the data-streaming processes start at the same time so that the data stays in sync.
I do this by launching different publishers on different threads (I am open to other approaches as well; this was the only one I could think of), but the threads do not start at the same time: by the time thread 3 starts, thread 1 is already considerably ahead.
How may I achieve this?
Edit - I understand that launching at exactly the same time is not possible, but maybe I can get away with launches that are extremely close to each other. Is there any way to ensure this?
Edit2 - Since the main aim is to get the data streams in sync, I was wondering about the warm-up effect of a thread (suppose thread 1 starts at 3.3 GHz and has reached 4.2 GHz by the time thread 2 starts at 3.2 GHz). Would this have a significant effect? (I can always warm them up before starting the publishing process, but I am curious whether it would have a pronounced effect.)
TIA
As others have stated in the comments, you cannot guarantee that threads launch at exactly the same time. To address your overall goal: you're going about solving this problem the wrong way, from a ROS perspective. Instead of manually publishing data and trying to get it in sync, you should be using the rosbag API. That way you can actually guarantee messages have the same timestamp. Note that this doesn't guarantee they will be sent out at the exact same time, because they won't. You can put a message into a bag file directly like this:
import rosbag
from std_msgs.msg import Int32, String

bag = rosbag.Bag('test.bag', 'w')
try:
    s = String()
    s.data = 'foo'

    i = Int32()
    i.data = 42

    bag.write('chatter', s)
    bag.write('numbers', i)
finally:
    bag.close()
For more complex types that include a Header field, simply set the header.stamp portion to keep the timestamps consistent.
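For instance, a hedged sketch of writing two Header-carrying messages with one shared timestamp; the topic names and message types here are illustrative assumptions, not from the question:

import rospy
import rosbag
from sensor_msgs.msg import Imu, NavSatFix

bag = rosbag.Bag('synced.bag', 'w')
try:
    stamp = rospy.Time(1234567890, 0)   # one shared timestamp

    imu = Imu()
    imu.header.stamp = stamp

    fix = NavSatFix()
    fix.header.stamp = stamp

    # Passing t=stamp keeps the bag's record time consistent with the headers
    bag.write('imu', imu, t=stamp)
    bag.write('gps', fix, t=stamp)
finally:
    bag.close()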
I have a for loop in Django. It loops through a list, gets the corresponding data from the database for each entry, does some calculation based on the database value, and then appends the result to another list.
def getArrayList(request):
    list_loop = [...]    # set of values to loop through
    store_array = []     # results from the for loop are stored here
    for a in list_loop:
        val_db = SomeModel.objects.filter(somefield=a).first()
        result = ...     # perform calculation on val_db
        store_array.append(result)
The list has 10,000 entries. If the user wants this request, they are prepared to wait and will be informed that it will take time.
I have tried joblib with backend=threading, but it doesn't save much time compared to the normal loop.
But when I try backend=multiprocessing, it says "Apps aren't loaded yet".
I read that multiprocessing is not possible in module-based files.
So I am looking at Celery now. I am not sure how this can be done in Celery.
Can anyone suggest how to speed up the for-loop calculation using the available multiprocessing techniques?
You're very likely looking for the wrong solution. But then again - this is pseudocode, so we can't be sure.
In either case, your pseudocode is a self-fulfilling prophecy, since you run queries in a for loop. That means network latency, result-set fetching, tying up database resources, and so on. This is never a good pattern; at best it's a last resort.
The simple solution is to get all values in one query:
list_values = [ ... ]
results = []

db_values = SomeModel.objects.filter(field__in=list_values)
for value in db_values:
    results.append(calc(value))
If for some reason you need to loop, then to do this in Celery you would mark the function as a task (there are plenty of examples to be found; a minimal sketch is shown below). That won't speed anything up by itself, but it will run in the background, so you render a "please wait" message and then somehow need to notify the user when the job is done.
I'm saying "somehow" because there isn't a really good integration package that I'm aware of that ties all the components together. There's django-notifications-hq, but if this is your only background task it's a lot of extra baggage just for that, so you may want to change the notification part to "we will send you an email when the job is done", because that's easy to achieve inside your function.
And thirdly, if this is simply creating a report that doesn't need things like automatic retries on failure, you can opt to use Django Channels and a browser-native websocket to start and report on the job (which also allows you to send an email).
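For reference, a minimal sketch of what marking the loop as a Celery task could look like; the module layout, task name, app name, and calc() helper are assumptions on my part, not from the question:

# tasks.py (assumed location, picked up by a configured Celery app)
from celery import shared_task

from myapp.models import SomeModel   # assumed app name

@shared_task
def build_report(list_values):
    results = []
    for value in SomeModel.objects.filter(field__in=list_values):
        results.append(calc(value))   # calc() stands in for the calculation
    return results

The view would then call build_report.delay(list_values), which returns immediately, and render the "please wait" page.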
You could try concurrent.futures.ProcessPoolExecutor, which is a high-level API for processing CPU-bound tasks:
import concurrent.futures

def perform_calculation(item):
    pass

# specify the number of workers (default: the number of processors on your machine)
with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
    res = executor.map(perform_calculation, tasks)
EDIT
In the case of an IO-bound operation, you could make use of ThreadPoolExecutor to open a few connections in parallel. You can wrap the pool in a context manager that handles the cleanup work for you (closing idle connections). Here is one example, although it handles the connection closing manually.
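A hedged sketch of that ThreadPoolExecutor variant, reusing the question's SomeModel and the calc() stand-in from above; the manual per-thread connection closing mirrors what the example mentioned above does and is my assumption about what is needed under Django:

from concurrent.futures import ThreadPoolExecutor

from django.db import connection

def fetch_and_calc(a):
    try:
        val_db = SomeModel.objects.filter(somefield=a).first()
        return calc(val_db)
    finally:
        connection.close()   # close this thread's connection so idle ones don't pile up

with ThreadPoolExecutor(max_workers=8) as executor:
    store_array = list(executor.map(fetch_and_calc, list_loop))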
I have a piece of software in the Qt framework (C++) that's supposed to dispatch processed (local) data to other servers and receive the same (foreign) data processed on other servers and compare it.
The problem occurs when a large amount of local data is being processed: foreign data is buffered and doesn't go into the comparison process until all the local data has been sent. I need the data to be compared within a certain time frame, so this causes a timeout.
An idea was to use one thread to dispatch local data and another thread to receive and compare foreign data. The QTcpServer will probably need a mutex to protect it from simultaneous reading and writing.
Is this possible to do with one connection, or would it be better to have one connection for dispatching and one for receiving in the Qt environment?
I checked the Fortune server example
http://doc.qt.io/qt-5/qtnetwork-threadedfortuneserver-example.html
but I need to know if it's possible and logical to use different threads for sending and receiving on the same connection.
PS. I'm new to multi-threading so I apologise if I misunderstood some concepts.
Without seeing any code, it's difficult to definitively answer this question. However, this may set you on the right track...
I wouldn't expect you'd need different threads for sending / receiving data; QTcpSocket is asynchronous.
It sounds like the architecture you're using to process the data may need revising.
foreign data is buffered and doesn't go into comparison process until all local data is sent
That sounds like more of an issue and the area where multi-threading would be beneficial. So, use multi-threading for processing the data, rather than controlling the communication between servers.
As you state you're new to multi-threading, I suggest starting by reading this article and using its examples as a template.
This is probably a question about Python callbacks as much as about using pika. I'm trying to develop some code that subscribes to a queue in RabbitMQ, processes the payload of any delivered message and then writes that payload to a series of (disk) files. So, starting from the simple "Hello World" example at http://www.rabbitmq.com/tutorials/tutorial-one-python.html, I've added logic to the callback function (which is coincidentally called "callback") to write any received message payloads to a file.
Here's the main problem: I want to write some additional code so that, if a certain time period has elapsed, for example 300 sec (5 min), the process closes the file, creates a new one and writes any subsequent messages to that. And so on...
BUT - the issue as I see it is that the callback function ONLY gets called when a message arrives in the queue. I think I need some process outside of that callback function that measures elapsed time...
The rationale is that I want to create a set of disk files (all with unique names based on a timestamp) that contain the messages received from the MQ queue. If messages are slow in coming, then I close the currently open file (so it can be processed further downstream) and open another.
I also notice that after issuing the start-consuming call (channel.start_consuming), no code below it is ever reached - why?
I've played around with Python's multiprocessing module but no luck so far.
Here's some skeleton code with pseudo-code comments:
#!/usr/bin/env python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='hello')

print ' [*] Waiting for messages. To exit press CTRL+C'

def callback(ch, method, properties, body):
    print " [x] Received %r" % (body,)
    # want to put code here to write message payloads to a file (unique name)
    # if n secs have elapsed then close the file and create a new file

channel.basic_consume(callback, queue='hello', no_ack=True)
channel.start_consuming()
Thanks!
It might be worth taking a look at an alternative implementation to Pika. As Pika is blocking by nature, it makes it difficult to create something like this. You would essentially need another thread to watch the IO, to see whether anything has been written within the last five minutes, and otherwise close the file.
You could also keep a timestamp, and once you get a new callback, close the file and create a new one if enough time has passed (a rough sketch of this is shown below). This would keep files open for longer stretches, but prevents any single file from accumulating more than five minutes' worth of data.
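A rough sketch of that timestamp check inside the callback; the 300-second window, file naming scheme, and write_payload helper are assumptions based on the question, not anything pika provides:

import time

ROTATE_AFTER = 300          # seconds, i.e. 5 minutes
current_file = None
opened_at = 0.0

def write_payload(body):
    global current_file, opened_at
    now = time.time()
    # Close the current file if the window has elapsed since it was opened
    if current_file is not None and now - opened_at >= ROTATE_AFTER:
        current_file.close()
        current_file = None
    if current_file is None:
        current_file = open('payload_%d.dat' % int(now), 'wb')
        opened_at = now
    current_file.write(body)

def callback(ch, method, properties, body):
    write_payload(body)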
However, I would recommend that you take a look at Puka instead. It is a non-blocking alternative to Pika that would make it easier to implement a solution to your problem.