display live camera feed on a webpage with python - flask

So I'm trying to display a preview of human detection done on a Raspberry Pi on a webpage.
I already saw and tried this proposed solution. My issue with it is that the processing runs only while the page is being viewed, for obvious reasons. I want the processing to happen independently of whether the preview is active, and for the page, when viewed, to simply attach "on top" of it.
Having a separate thread for processing looks like a probable solution, but due to Flask's event-driven approach I'm struggling to figure out how to safely pass frames between threads (processing takes a noticeable amount of time, and if I simply guard with locks, it sometimes raises exceptions), and generally I'm not sure that's the best way to solve the problem.
Is multithreading the way to go? Or should I choose some library other than Flask for this purpose?

From the example you posted, you can keep the VideoCamera class but split get_frame into two functions: one that retrieves the frame and processes it (update_frame), and another that returns the latest frame in the encoding Flask needs (get_frame). Then simply run update_frame in a separate thread and that should work.
It's probably best practice to store the new frame in a local variable first, then use a lock around reads and writes of the instance variable holding the latest frame. But I'll spare the example code that implementation.
class VideoCamera(object):
    def __init__(self):
        self.video = cv2.VideoCapture(0)
        self._last_frame = None

    def __del__(self):
        self.video.release()

    def update_frame(self):
        ret, frame = self.video.read()
        # DO WHAT YOU WANT WITH TENSORFLOW / KERAS AND OPENCV
        # Perform mutual exclusion here
        self._last_frame = frame

    def get_frame(self):
        # Perform mutual exclusion here
        frame = self._last_frame
        ret, jpeg = cv2.imencode('.jpg', frame)
        return jpeg.tobytes()
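The mutual-exclusion part spared above can be sketched independently of OpenCV. This is just one way to do it, with hypothetical names (`LatestFrame`, `read_frame`): a producer thread keeps refreshing the latest frame, and readers grab it under a lock.

```python
import threading
import time

class LatestFrame:
    """One thread updates the latest frame; readers fetch it under a lock.
    `read_frame` is a stand-in for camera capture plus processing."""

    def __init__(self, read_frame):
        self._read_frame = read_frame
        self._lock = threading.Lock()
        self._last = None
        self._running = True

    def update_loop(self):
        while self._running:
            frame = self._read_frame()  # capture + process OUTSIDE the lock
            with self._lock:
                self._last = frame      # critical section stays tiny

    def get(self):
        with self._lock:
            return self._last

    def stop(self):
        self._running = False
```

The key point is that the slow work (capture and inference) happens outside the lock, so the Flask generator reading frames is never blocked for long.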

Related

Launching all threads at exactly the same time in C++

I have a Rosbag file which contains messages on various topics; each topic has its own frequency. This data was captured from a hardware device streaming data, and data from all topics would "arrive" at the same time to be used by different algorithms.
I wish to simulate this using the rosbag file (think of it as every topic having an associated array of data), and it is imperative that the streaming processes start at the same time so that the data stays in sync.
I do this by launching different publishers on different threads (I am open to other approaches as well; this was the only one I could think of), but the threads do not start at the same time: by the time thread 3 starts, thread 1 is considerably ahead.
How may I achieve this?
Edit - I understand that launching at the exact same time is not possible, but maybe I can get away with launches extremely close to each other. Is there any way to ensure this?
Edit 2 - Since the main aim is to get the data streams in sync, I was wondering about the warm-up effect of a thread (suppose thread 1's core starts at 3.3 GHz and has reached 4.2 GHz by the time thread 2 starts at 3.2 GHz). Would this have a significant effect? (I can always warm them up before starting the publishing process, but I am curious whether it would have a pronounced effect.)
TIA
As others have stated in the comments, you cannot guarantee that threads launch at exactly the same time. To address your overall goal: from a ROS perspective, you're going about this the wrong way. Instead of manually publishing data and trying to keep it in sync, you should be using the rosbag API. That way you can actually guarantee messages have the same timestamp. Note that this doesn't guarantee they will be sent out at the exact same time, because they won't. You can put a message into a bag file directly like this:
import time

import rosbag
import rospy
from std_msgs.msg import Int32, String

bag = rosbag.Bag('test.bag', 'w')
try:
    s = String()
    s.data = 'foo'
    i = Int32()
    i.data = 42
    # record both messages with the same timestamp
    t = rospy.Time.from_sec(time.time())
    bag.write('chatter', s, t)
    bag.write('numbers', i, t)
finally:
    bag.close()
For more complex types that include a Header field, simply edit the header.stamp portion to keep the timestamps consistent.

django, multi-databases (writer, read-replicas) and a sync issue

So... in response to an API call I do:
i = CertainObject(paramA=1, paramB=2)
i.save()
Now my writer database has a new record.
Processing can take a while, and I do not wish to hold up my response to the API caller, so on the next line I hand the object ID to an async job using Celery:
run_async_job.delay(i.id)
Right away, or a few seconds later depending on the queue, run_async_job tries to load the record with that ID from the database. It's a gamble: sometimes it works, sometimes it doesn't, depending on whether the read replicas have caught up.
Is there a pattern to guarantee success without having to "sleep" for a few seconds before reading, or hoping for good luck?
Thanks.
The simplest way seems to be using retries, as mentioned by Greg and Elrond in their answers. If you're using the shared_task or @app.task decorators, you can use the following code snippet.
@shared_task(bind=True)
def your_task(self, certain_object_id):
    try:
        certain_obj = CertainObject.objects.get(id=certain_object_id)
        # Do your stuff
    except CertainObject.DoesNotExist as e:
        self.retry(exc=e, countdown=2 ** self.request.retries, max_retries=20)
I used an exponential countdown between retries; you can modify it according to your needs.
You can find the documentation for custom retry delays here.
There is also another document explaining exponential backoff at this link.
When you call retry, it'll send a new message using the same task id, and it'll make sure the message is delivered to the same queue as the originating task. You can read more about this in the documentation here.
Since writing and then immediately reading the record back is a high priority, why not store it in an in-memory store like Memcached or Redis? Then a periodic Celery job, running say every minute, writes it to the database; once the write is done, it deletes the keys from Redis/Memcached.
You can keep the data in the in-memory store for the window when it is needed most, say one hour. You can also create a service method which checks whether the data is in memory or not.
django-redis is a great package for connecting to Redis (if you are using it as the broker in Celery).
Here is an example based on the Django cache:
# service method
from django.core.cache import cache

def get_object(obj_id, model_cls):
    obj_dict = cache.get(obj_id, None)  # checks if obj id is in cache, O(1) complexity
    if obj_dict:
        return model_cls(**obj_dict)
    else:
        return model_cls.objects.get(id=obj_id)

# celery job
@app.task
def store_objects():
    logger.info("-" * 25)
    # you can use .bulk_create() to reduce DB hits and get faster DB entries
    for obj_id in cache.keys("foo_*"):
        CertainObject.objects.create(**cache.get(obj_id))
        cache.delete(obj_id)
    logger.info("-" * 25)
The simplest solution would be to catch any DoesNotExist error thrown at the start of the task and schedule a retry. This can be done by converting run_async_job into a bound task:
@app.task(bind=True)
def run_async_job(self, object_id):
    try:
        instance = CertainObject.objects.get(id=object_id)
    except CertainObject.DoesNotExist as e:
        raise self.retry(exc=e, countdown=2)
This article goes pretty deep into how you can handle read-after-write issues with replicated databases: https://medium.com/box-tech-blog/how-we-learned-to-stop-worrying-and-read-from-replicas-58cc43973638.
Like the author, I know of no foolproof catch-all way to handle read-after-write inconsistency.
The main strategy I've used before is to have some kind of expect_and_get(pk, max_attempts=10, delay_seconds=5) method that attempts to fetch the record, and attempts it max_attempts times, delaying delay_seconds seconds in between attempts. The idea is that it "expects" the record to exist, and so it treats a certain number of failures as just transient DB issues. It's a little more reliable than just sleeping for some time since it will pick up records quicker and hopefully delay the job execution much less often.
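That expect_and_get idea can be sketched in a few lines. Here the `fetch` callable and the use of LookupError are assumptions of this sketch, not Django specifics; in practice `fetch` would wrap `Model.objects.get` and catch DoesNotExist.

```python
import time

def expect_and_get(fetch, pk, max_attempts=10, delay_seconds=5):
    """Fetch a record we *expect* to exist, retrying through transient
    replica lag. `fetch` returns the record or raises LookupError when
    the replica has not caught up yet (hypothetical interface)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(pk)
        except LookupError:
            if attempt == max_attempts:
                raise  # genuinely missing, not just lagging
            time.sleep(delay_seconds)
```

Because it polls, it picks the record up as soon as a replica has it, instead of always paying a fixed sleep.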
Another strategy would be to delay returning from a special save_to_read method until read replicas have the value, either by synchronously pushing the new value to the read replicas somehow or just polling them all until they return the record. This way seems a little hackier IMO.
For a lot of your reads, you probably don't have to worry about read-after-write consistency:
If we’re rendering the name of the enterprise a user is part of, it’s really not that big a deal if in the incredibly rare occasion that an admin changes it, it takes a minute to have the change propagate to the enterprise’s users.

Django: for loop through parallel process and store values and return after it finishes

I have a for loop in Django. It loops through a list, gets the corresponding data from the database, does some calculation based on the database value, and then appends the result to another list.
def getArrayList(request):
    list_loop = [...]    # set of values to loop through
    store_array = []     # store values here from the for loop
    for a in list_loop:
        val_db = SomeModel.objects.filter(somefield=a).first()
        result = perform calculation on val_db  # pseudo code
        store_array.append(result)
The list has 10,000 entries. The user wanting this request is prepared to wait and will be informed that it will take time.
I have tried joblib with backend=threading; it doesn't save much time over a normal loop.
But when I try backend=multiprocessing, it says "Apps aren't loaded yet".
I read that multiprocessing is not possible in module-based files.
So I am looking at Celery now, but I am not sure how this can be done in Celery.
Can anyone advise how to speed up the for-loop calculation using the available multiprocessing techniques?
You're very likely looking for the wrong solution. But then again - this is pseudo code, so we can't be sure.
Either way, your pseudo code is a self-fulfilling prophecy, since you run queries in a for loop. That means network latency, result-set fetching, tying up database resources, and so on. This is never a good pattern; at best it's a last resort.
The simple solution is to get all values in one query:
list_values = [ ... ]
results = []
db_values = SomeModel.objects.filter(field__in=list_values)
for value in db_values:
    results.append(calc(value))
If for some reason you need to loop, then to do this in Celery you would mark the function as a task (there are plenty of examples to find). It won't speed anything up - it will just run in the background, so you render a "please wait" message and somehow notify the user when the job is done.
I say somehow because there isn't a really good integration package I'm aware of that ties all the components together. There's django-notifications-hq, but if this is your only background task it's a lot of extra baggage just for that - so you may want to change the notification part to "we will send you an email when the job is done", since that's easy to achieve inside your function.
And thirdly, if this is simply creating a report that doesn't need things like automatic retries on failure, you can opt to use Django Channels and a browser-native WebSocket to start and report on the job (which also lets you send email).
You could try concurrent.futures.ProcessPoolExecutor, which is a high-level API for processing CPU-bound tasks:
import concurrent.futures

def perform_calculation(item):
    pass

# specify the number of workers (default: number of processors on your machine)
with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
    res = executor.map(perform_calculation, tasks)
EDIT
For IO-bound operations you could use ThreadPoolExecutor instead, to open a few connections in parallel. You can wrap the pool in a context manager that handles the cleanup work for you (closing idle connections). Here is one example, though it handles the connection closing manually.
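A minimal, self-contained sketch of the ThreadPoolExecutor variant; `fetch_one` and the item list are hypothetical placeholders for your IO-bound work (a DB query or HTTP request per item):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_one(item):
    # placeholder for an IO-bound call (DB query, HTTP request, ...)
    return item * 2

items = [1, 2, 3, 4]

# threads rather than processes: appropriate when the work is mostly
# waiting on IO, since the GIL is released during blocking calls
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fetch_one, items))

print(results)  # [2, 4, 6, 8]
```

executor.map preserves input order, so results line up with items even though the calls overlap in time.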

How to run multiple threads on buffered data in twisted?

Image of an abstract model
I have implemented this model using python-socketIO, but I am unable to do something similar in Twisted. I have a feed of data coming in, and I read this data in blocks of 8192 bytes. This is binary data, so it needs to be processed before being sent to clients. However, I can't stop the input of data while processing. In python-socketIO I ran the function that works on this data as a background task while I continued to fetch more data. Any ideas how I can do something similar using Python's Twisted library?
By "background task" I suppose you mean a thread. You can work with threads using Twisted. The approach with the fewest concepts involved is deferToThread:
from twisted.internet.protocol import Protocol
from twisted.internet.threads import deferToThread

class YourProtocol(Protocol):
    def dataReceived(self, data):
        # your_process_data runs in the reactor's thread pool;
        # the callback and errback run back in the reactor thread
        d = deferToThread(your_process_data, data)
        d.addCallback(your_result_handler)
        d.addErrback(your_error_handler)

Timer object inconsistently accessible

I am starting a Python Timer in a Django view and I am using another Django view to cancel it. However, I find that I cannot access the Timer object consistently when I am trying to cancel it.
The code in my "views.py" looks like this:
import threading

myTimer = None

def f():
    pass

def startTimer(request):
    global myTimer
    myTimer = threading.Timer(10000, f)
    myTimer.start()

def stopTimer(request):
    if myTimer is not None:
        myTimer.cancel()
    else:
        print("No timer found.")
When I try to cancel the timer, much of the time I get the "No timer found." message. After some tries, seemingly at random, the Timer object is found and the cancellation succeeds. This happens only when I run the code on the server; on my local machine, the problem never occurs.
You must never use global objects like this in a server environment. Your server almost certainly runs multiple processes, each with its own namespace, so the timer won't be shared between them.
A second reason is that you will likely have multiple users for your site; all of them will have access to the same global variables in each process.
I'm not really sure what you're doing here, but one way of doing a per-user timer would be to use the session to store the current time when the user hits start, and then calculate the difference from that time when they click end.
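The session-based idea can be sketched without any Django machinery; here `session` is just a dict-like per-user store standing in for `request.session`, and the function names are hypothetical:

```python
import time

# Hypothetical sketch: in Django these would be views taking `request`
# and using request.session as the per-user store.
def start_timer(session):
    session["timer_start"] = time.time()

def stop_timer(session):
    """Return elapsed seconds, or None if no timer was started."""
    start = session.pop("timer_start", None)
    if start is None:
        return None
    return time.time() - start
```

Because the start time lives in the session rather than in process memory, it works no matter which worker process handles each request.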