Django share unpicklable objects between requests - django

I am currently working on a Django project that will hopefully make some transformations to videofiles thru the web. To make the transformation to the videos, I am using opencv's python API and I am also uing Dajax to perform ajax requests.
In the ajax requests file i have the following code:
#dajaxice_register
def transform_and_show(request, filename, folder, frame_count, img_number):
detector = Detector(filename) //Object which uses opencv API
dajax = Dajax()
generated_file = detector.detect_people_by_frame(folder, str(img_number))
dajax.assign('#video', 'src', '/media/generated'+folder+generated_file)
return dajax.json()
The idea is to tranform videos frame by frame and to display each transformed frame in the browser in an img tag giving the sensation to the user that he/she is watching the transformed video, so this method is called in a javascript loop.
The problem is that in this approach, the object "detector" is reinitialized in every iteration so it only generates the image corresponding to the first frame of the video. My idea was to workaround this issue by making "detector" persistent between requests so that the pointer to the next frame of the video wouldn't be set to 0 on every request.
The problem is that Dectector object is not picklable, meaning that it cannot be cached or saved to a session object.
Is there anything I can do to make it persistent between requests?
NOTE: I have considered using HTTP push approaches like APE or Orbit but since this is just an investigation project there is no real concern about performance.

Have you tried a module level variable to store the object?
Make "detector" a global at the file level.
detector = None
def transform():
global detector
if detector is None:
detector = Detector(filename)
file = detector.detect(....)

Related

returned array referencing the same memory spot but returning different values

I'm setting up a basic dynamic web page to display EC2 instance data and I need to be checking and passing arrays with the data inside to display with D3. Im using multiprocess to run the collection in the background.
Running python3.7 and the newest version of Flask.
app.py Code
#app.route('/experiment')
def experiment():
type = request.args.get('type')
resource = request.args.get('resource')
action = request.args.get('action')
if 'test' not in session:
thread = multiprocessing.Process(target=exp.transmitTest)
session['test'] = 'started'
thread.start()
print(f"Looking for Data at {hex(id(exp.getData()))} found {exp.getData()}")
return render_template('experiment.html', data=exp.getData(), type=request.args.get('type'), resource=request.args.get('resource'), action=request.args.get('action'))
Backend Code
def transmitTest(self):
for i in range(5):
self.data.append(random.randint(0,100))
time.sleep(4)
print(f"Data: {self.data} at {hex(id(self.data))}")
def getData(self):
return self.data
My JS scheduler runs '/experiment' every 5 seconds. The print statements show that the array im writing to and getting from the getter are at the same memory space, but one is empty and the other has the data. Can anyone help me understand this?
so I figured it out. When calling object methods in processes in flask python creates copies of the objects and then differentiates between the two copies even if they do take up the same memory space. I needed to add a backend queue through redisqueue (https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xxii-background-jobs) so that I could make asynchronous calls to my backend without disrupting flask's routing.

How to refresh training in chatterbot bot in python?

I made a simple chatbot using chatterbot library and python. The way I trained it is, I made it read a few text files containing chat examples, and it learns how to reply to messages based on those training examples. The problem I am facing is - Even if I erase the contents of the training text files, and run the application, the chatbot still behaves in the same manner as before, i.e. it's memory doesn't get refreshed. I tried starting a new file and copy pasted the same code and changed the name of the program, but it still doesn't help. How do I solve this problem? Here is the code for reference:
from chatterbot.trainers import ListTrainer
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer
import os
bot = ChatBot('trialBot')
bot.set_trainer(ListTrainer)
#directory containing training text files
mainDir = 'C:\\Users\\xyz\\Desktop\\trainfiles\\'
for _file in os .listdir(mainDir):
chats = open(mainDir + _file, 'r').readlines()
bot.train(chats)
while True:
request = raw_input('You: ')
response = bot.get_response(request)
print('Bot: ' + str(response))
It sounds like you might want to use an in-memory database so that the content is only persisted as long as the chat bot is running.
bot = ChatBot(
'trialBot',
database_uri=None
)
Setting database_uri to None will cause the chat bot to use a Sqlite database that is stored in-memory so store the knowledge that it is trained with. As a result, you will have a fresh database to work with each time you run your program.

Right way to delay file download in Django

I have a class-based view which triggers the composition and downloading of a report for a user.
Normally in def get of the class I just compile the report, add response['Content-Disposition'] = 'attachment; filename="somefilename.pdf"' and return response to a user.
The problem is that some reports are large and while they are compiling the request timeout happens.
I know that the right way of dealing with this is to delegate it to a background process (like Celery). But the problem is that it means that instead of creating a temporary file which ceases to exist the moment the user downloads a report, I have to store these reports somewhere, and write a cronjob which will regularly clean the reports directory.
Is there any more elegant way in Django to deal with this issue?
One solution less fancy than using celery is to use is Django's StreamingHttpResponse:
(https://docs.djangoproject.com/en/2.0/ref/request-response/#django.http.StreamingHttpResponse
With this, you use a generator function, which is a python function that uses yield to return its results as an iterator. This allows you to return the data as you generate it, rather than all at once at after you're finished. You can yield after each line or section of the report.. thus keeping a flow of data back to the browser.
But.. this only works if you are building up the finished file bit by bit.. for example, a CSV file. If you're returning something that you need to format all at once, for example if you're using something like wkhtmltopdf to generate a pdf file after you're done, then it's not as easy.
But there's still a solution:
What you can do in that case is, use StreamingHttpReponse along with a generator function to generate your report into a temporary file, instead of back to the browser. But as you are doing this, yield HTML snippets back to the browser which lets the user know the progress, eg:
def get(self, request, **kwargs):
# first you need a tempfile name.. do that however you like
tempfile = "kfjkdsjfksjfks"
# then you need to create a view which will open that file and serve it
# but I won't show that here.
# For security reasons it has to serve only out of one directory
# that is dedicated to this.
fetchurl = reverse('reportgetter_url') + '?file=' + tempfile
def reportgen():
yield 'Starting report generation..<br>'
# do some stuff to generate your report into the tempfile
yield 'Doing this..<br>'
# do this
yield 'Doing that..<br>'
# do that
yield 'Finished.<br>'
# when the browser receives this script, it'll go to fetchurl where
# you will send them the finished report.
yield '<script>document.location="%s";</script>' % fetchurl
return http.StreamingHttpResponse(reportgen())
That's not a complete example obviously, but should give you the idea.
When your user fetches this view, they will see the progress of the report as it comes along. At the end, you're sending the javacript which redirect the browser to the other view you will have to write which returns the response containing the finished file. When the browser gets this javacript, if the view returning the tempfile is setting the response Content-Disposition as an attachment before returning it, eg:
response['Content-Disposition'] = 'attachment; filename="%s"' % filename
..then the browser will stay on the current page showing your progress.. and simply pop up a file save dialog for the user.
For cleanup, you'll need a cron job regardless.. because if people don't wait around, they'll never pick up the report. Sometimes things don't work out... So you could just clean up files older than let's say 1 hour. For a lot of systems this is acceptable.
But if you want to clean up right away, what you can do, if you are on unix/linux, is to use an old unix filesystem trick: Files which are deleted while they are open are not really gone until they are closed. So, open your tempfile.. then delete it. Then return your response. As soon as the response has finished sending, the space used by the file will be freed.
PS: I should add.. that if you take this second approach, you could use one view to do both jobs.. just:
if `file` in request.GET:
# file= was in the url.. they are trying to get an already generated report
with open(thepathname) as f:
os.unlink(f)
# file has been 'deleted' but f is still a valid open file
response = HttpResponse( etc etc etc)
response['Content-Disposition'] = 'attachment; filename="thereport"'
response.write(f)
return response
else:
# generate the report
# as above
This is not really a Django question but a general architecture question.
You can always increase your server time outs but it would still, IMO, give you a bad user experience if the user has to sit watching the browser just spin.
Doing this on a background task is the only way to do it right. I don’t know how large the reports are, but using email can be a good solution. The background task simply generates the report, sends it via email and deletes it.
If the files are too large to send via email, then you will have to store them. Maybe send an email with a link and a message indicating the link will not work after X days/hours. Once you have a background worker, creating a daily or hourly clean up task would be very easy.
Hope it helps

PyMongo find query returns empty/partial cursor when running in a Django+uWsgi project

We developed a REST API using Django & mongoDB (PyMongo driver). The problem is that, on some requests to the API endpoints, PyMongo cursor returns a partial response which contains less documents than it should (but it’s a completely valid JSON document).
Let me explain it with an example of one of our views:
def get_data(key):
return collection.find({'key': key}, limit=24)
def my_view(request):
key = request.POST.get('key')
query = get_data(key)
res = [app for app in query]
return JsonResponse({'list': res})
We're sure that there is more than 8000 documents matching the query, but in
some calls we get less than 24 results (even zero). The first problem we've
investigated was that we had more than one MongoClient definition in our code. By resolving this, the number of occurrences of the problem decreased, but we still had it in a lot of calls.
After all of these investigations, we've designed a test in which we made 16 asynchronous requests at the same time to the server. With this approach, we could reproduce the problem. On each of these 16 requests, 6-8 of them had partial results. After running this test we reduced uWsgi’s number of processes to 6 and restarted the server. All results were good but after applying another heavy load on the server, the problem began again. At this point, we restarted uwsgi service and again everything was OK. With this last experiment we have a clue now that when the uwsgi service starts running, everything is working correctly but after a period of time and heavy load, the server begins to return partial or empty results again.
The latest investigation we had was to run the API using python manage.py with DEBUG=False, and we had the problem again after a period of time in this situation.
We can't figure out what the problem is and how to solve it. One reason that we can think of is that Django closes pymongo’s connections before completion. Because the returned result is a valid JSON.
Our stack is:
nginx (with no cache enabled)
uWsgi
MemCached (disabled during debugging procedure)
Django (v1.8 on python 3)
PyMongo (v3.0.3)
Your help is really appreciated.
Update:
Mongo version:
db version v3.0.7
git version: 6ce7cbe8c6b899552dadd907604559806aa2e9bd
We are running single mongod instance. No sharding/replicating.
We are creating connection using this snippet:
con = MongoClient('localhost', 27017)
Update 2
Subject thread in Pymongo issue tracker.
Pymongo cursors are not thread safe elements. So using them like what I did in a multi-threaded environment will cause what I've described in question. On the other hand Python's list operations are mostly thread safe, and changing snippet like this will solve the problem:
def get_data(key):
return list(collection.find({'key': key}, limit=24))
def my_view(request):
key = request.POST.get('key')
query = get_data(key)
res = [app for app in query]
return JsonResponse({'list': res})
My very speculative guess is that you are reusing a cursor somewhere in your code. Make sure you are initializing your collection within the view stack itself, and not outside of it.
For example, as written, if you are doing something like:
import ...
import con
collection = con.documents
# blah blah code
def my_view(request):
key = request.POST.get('key')
query = collection.find({'key': key}, limit=24)
res = [app for app in query]
return JsonResponse({'list': res})
You could end us reusing a cursor. Better to do something like
import ...
import con
# blah blah code
def my_view(request):
collection = con.documents
key = request.POST.get('key')
query = collection.find({'key': key}, limit=24)
res = [app for app in query]
return JsonResponse({'list': res})
EDIT at asker's request for clarification:
The reason you need to define the collection within the view stack and not when the file loads is that the collection variable has a cursor, which is basically how the database and your application talk to each other. Cursors do things like keep track of where you are in a long list of data, in addition to a bunch of other stuff, but thats the important part.
When you create the collection cursor outside the view method, it will re-use the cursor on each request if it exists. So, if you make one request, and then another really, really fast right after that (like what happened when you applied high load), the cursor might only be half way through talking to the database, and so some of your data goes to the first request, and some to the second. The reason you would get NO data in a request would be if a cursor finished fetching data but hadn't been closed yet, so the next request tried to fetch data from the cursor, and there was none left to fetch in the query.
By moving the collection definition (and by association, the cursor definition) into the view stack, you will ALWAYS get a new cursor when you process a new request. You wont get any cross talking between your cursors and different requests, as each request cycle will have its own.

Playing a mp3 file using an Embed tag where the source is ashx file and byte array from database

I currently store text to speech mp3 files as varbinary(max) in the database. what I want to do is play those audio files using the embed tag where the source is ashx file that will recieve the id of the database record and write the byte array.
My ashx file has the following code
byte[] byteArray = ttsMessage.MessageContents;
context.Response.Buffer = true;
context.Response.Clear();
context.Response.ClearContent();
context.Response.ClearHeaders();
context.Response.ContentType = "audio/mpeg";
context.Response.OutputStream.Write(byteArray, 0, byteArray.Length);
context.Response.End();
The call from the aspx page is as follows
Panel5.Controls.Add(new LiteralControl(String.Format("<embed src='/TestArea/PreviewWav.ashx?source={0}' type='audio/mpeg' height='60px' width='144px'/>", ttsMessage.Id.ToString())));
I have gotten this to work with the following
Panel5.Controls.Add(new LiteralControl(String.Format("<audio controls='controls' autoplay='autoplay'><source src='/TestArea/PreviewWav.ashx?source={0}' type='audio/x-wav' /></audio>", ttsMessage.Id.ToString())));
Using the audio tag but cannot seem to get it to work with the embed tag.
I am using IE9/VS2010
Any ideas?
I think the wrong thing with embed tag is that...
Embed tag call a plugin like winmediaplayer ocx than handler firstly called from web page than media plugin get the ashx url than it started to call handler.
But web page's request and mediaplayer plugin's requests are diffent so if you check users Authentication or some other header information it fails.
You can easily see that on fiddler utility. On fiddler top-right side shows the request info. there is a user-agent part. Look it carefully.
How many requests happen from your handler,notice them. What are the differs. for each reqs.
If you have this issue,
You may use a ticket system or redirect a safety area for download without header or other request checks. Sadly web page requst cannot complately transfer media player and others.
hope helps