How can one disable the built-in print function depending on the server environment? The code below seems to be working, but I'm looking for a cleaner way to do it. I want to use this in a Django app.
It would be nice if print kept working on localhost.
import sys

class MyFileWrapper(object):
    def write(self, *args):
        pass

    def flush(self):
        pass

if __name__ == '__main__':
    print('will be printed')
    sys.stdout = MyFileWrapper()
    print("won't be printed")
If you just want to stop all writing to stdout, you can do sys.stdout = None (or, if you want to be slightly more pedantic, sys.stdout = open(os.devnull, 'w')). As far as changing behavior depending on your environment, you may be able to distinguish between them based on the results of (say) socket.gethostname(). Alternatively, you can set an environment variable on either the server or your local box (but not both) and then test os.environ for the variable's presence or value.
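A minimal sketch of the environment-variable approach (SILENCE_STDOUT is a hypothetical variable you would set only on the server):

import os
import sys

# Hypothetical variable set only in the server environment.
if os.environ.get('SILENCE_STDOUT'):
    sys.stdout = open(os.devnull, 'w')

print('visible only when SILENCE_STDOUT is not set')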
You may be better off using the built-in logging module instead of print() calls. That will allow much more fine-grained control over which things get logged and where the logs go.
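As a rough sketch of the logging approach, one way to tie verbosity to the environment is Django's DEBUG setting (this is not a full LOGGING configuration, just the idea):

import logging

from django.conf import settings

logger = logging.getLogger(__name__)
# Chatty output locally, warnings only in production.
logging.basicConfig(level=logging.DEBUG if settings.DEBUG else logging.WARNING)

logger.debug('replaces a print() call')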
I am using Flask. I have a few routes defined that are expensive, as they need to access a database and do lengthy computations. The database connectivity relies on the Flask-MongoEngine extension, which relies on PyMongo, which is not thread-safe.
Hence my thoughts are as follows:
@blueprint.route("/refresh/data", methods=['GET'])
def refresh_data():
    cache.clear()
    with Pool(4) as p:
        print(p.map(func=f, iterable=["recently", "mtd", "ytd", "sector"]))
Get a small pool and call the function f on each name. The function f is based on:
def f(name):
    print(current_app.extensions)
    print(current_app.config)
    current_app.extensions["mongoengine"] = MongoEngine(app=current_app)
    print(current_app.extensions)
    get(address="reports/{path}/json".format(path=name))
    get(address="reports/{path}/html".format(path=name))
    return name
The problem here is that one cannot call init_app on the MongoEngine extension again. In fact, extensions can be initialized only once, but what happens if the extension is needed on multiple threads and is not thread-safe?
We need some parameters handed over via a JWT token that are evaluated in a custom SecurityManager within Apache Superset. I read that g is able to hold information like this (in newer versions also across requests).
Jinja states it can read Flask's g out of the box (see the Jinja docs on the Standard Context). This is a sample code snippet:
@expose('/login/', methods=['GET', 'POST'])
def login(self):
    g.my_value = 'bar'
    ...
Still, I have issues when trying to use the information within SQL Lab, using e.g.
--{% set x = 'foo' %}
SELECT
{{x}},
{{g.my_value}}
FROM table_xyz
This results in:
g not defined
x is evaluated correctly to 'foo' if the g reference is removed. Any hint on whether this is possible?
Accessing the flask.g context is not that straightforward, as execution happens on the Celery worker. The Celery worker has no access to the Flask context (also note that, depending on configuration, it might even run in async mode).
Internal Jinja variables like current_user needed some additional work in Superset's codebase to be made available, but there is no general mechanism to expose user-defined Jinja variables out of the box at the time of writing.
I have a class-based view which triggers the composition and downloading of a report for a user.
Normally, in the get method of the class I just compile the report, add response['Content-Disposition'] = 'attachment; filename="somefilename.pdf"' and return the response to the user.
The problem is that some reports are large, and while they are compiling the request times out.
I know that the right way of dealing with this is to delegate it to a background process (like Celery). But the problem is that it means that instead of creating a temporary file which ceases to exist the moment the user downloads a report, I have to store these reports somewhere, and write a cronjob which will regularly clean the reports directory.
Is there any more elegant way in Django to deal with this issue?
One solution, less fancy than using Celery, is Django's StreamingHttpResponse:
https://docs.djangoproject.com/en/2.0/ref/request-response/#django.http.StreamingHttpResponse
With this, you use a generator function, which is a Python function that uses yield to return its results as an iterator. This allows you to return the data as you generate it, rather than all at once after you're finished. You can yield after each line or section of the report, thus keeping a flow of data back to the browser.
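As a rough illustration of that streaming idea (the row source here is just a placeholder):

import csv
import io

from django.http import StreamingHttpResponse

def report_rows():
    # Placeholder: in a real report these rows would come from your queries.
    yield ['id', 'name']
    yield [1, 'first']
    yield [2, 'second']

def csv_report(request):
    def generate():
        for row in report_rows():
            buffer = io.StringIO()
            csv.writer(buffer).writerow(row)
            # Each chunk is sent to the browser as soon as it is yielded.
            yield buffer.getvalue()

    response = StreamingHttpResponse(generate(), content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="report.csv"'
    return response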
But this only works if you are building up the finished file bit by bit, for example a CSV file. If you're returning something that you need to format all at once, for example if you're using something like wkhtmltopdf to generate a PDF file after you're done, then it's not as easy.
But there's still a solution:
What you can do in that case is use StreamingHttpResponse along with a generator function to generate your report into a temporary file, instead of back to the browser. But as you are doing this, yield HTML snippets back to the browser that let the user know the progress, e.g.:
def get(self, request, **kwargs):
    # first you need a tempfile name.. do that however you like
    tempfile = "kfjkdsjfksjfks"

    # then you need to create a view which will open that file and serve it
    # but I won't show that here.
    # For security reasons it has to serve only out of one directory
    # that is dedicated to this.
    fetchurl = reverse('reportgetter_url') + '?file=' + tempfile

    def reportgen():
        yield 'Starting report generation..<br>'
        # do some stuff to generate your report into the tempfile
        yield 'Doing this..<br>'
        # do this
        yield 'Doing that..<br>'
        # do that
        yield 'Finished.<br>'
        # when the browser receives this script, it'll go to fetchurl where
        # you will send them the finished report.
        yield '<script>document.location="%s";</script>' % fetchurl

    return http.StreamingHttpResponse(reportgen())
That's not a complete example obviously, but should give you the idea.
When your user fetches this view, they will see the progress of the report as it comes along. At the end, you're sending the JavaScript which redirects the browser to the other view you will have to write, which returns the response containing the finished file. When the browser gets this JavaScript, if the view returning the tempfile sets the response Content-Disposition as an attachment before returning it, e.g.:
response['Content-Disposition'] = 'attachment; filename="%s"' % filename
...then the browser will stay on the current page showing your progress, and simply pop up a file-save dialog for the user.
For cleanup, you'll need a cron job regardless, because if people don't wait around, they'll never pick up the report. Sometimes things don't work out. So you could just clean up files older than, let's say, 1 hour. For a lot of systems this is acceptable.
But if you want to clean up right away, what you can do, if you are on Unix/Linux, is to use an old Unix filesystem trick: files which are deleted while they are open are not really gone until they are closed. So, open your tempfile, then delete it, then return your response. As soon as the response has finished sending, the space used by the file will be freed.
PS: I should add that if you take this second approach, you could use one view to do both jobs; just:
if 'file' in request.GET:
    # file= was in the URL: they are trying to get an already generated report.
    with open(thepathname, 'rb') as f:
        os.unlink(thepathname)
        # The file has been 'deleted', but f is still a valid open file.
        response = HttpResponse()  # set content_type etc. as appropriate
        response['Content-Disposition'] = 'attachment; filename="thereport"'
        response.write(f.read())
        return response
else:
    # generate the report
    # as above
This is not really a Django question but a general architecture question.
You can always increase your server timeouts, but it would still, IMO, give you a bad user experience if the user has to sit watching the browser just spin.
Doing this on a background task is the only way to do it right. I don’t know how large the reports are, but using email can be a good solution. The background task simply generates the report, sends it via email and deletes it.
If the files are too large to send via email, then you will have to store them. Maybe send an email with a link and a message indicating the link will not work after X days/hours. Once you have a background worker, creating a daily or hourly clean up task would be very easy.
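As a rough sketch of such a background task (assuming Celery and Django's email utilities; render_report_to_pdf is a hypothetical helper that writes the report to a temporary file and returns its path):

import os

from celery import shared_task
from django.core.mail import EmailMessage

@shared_task
def build_and_email_report(user_email, report_id):
    # Hypothetical helper that renders the report to a temporary file.
    path = render_report_to_pdf(report_id)
    try:
        message = EmailMessage(
            subject='Your report is ready',
            body='The report you requested is attached.',
            to=[user_email],
        )
        message.attach_file(path)
        message.send()
    finally:
        # Delete the temporary file once it has been sent.
        os.remove(path)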
Hope it helps
There are a few topics on this, but none with a satisfactory answer.
I have a Python application running in an IPython Qt console:
http://ipython.org/ipython-doc/dev/interactive/qtconsole.html
When I encounter an error, I'd like to be able to interact with the code at that point.
import sys

try:
    raise Exception()
except Exception as e:
    try:  # use exception trick to pick up the current frame
        raise None
    except:
        frame = sys.exc_info()[2].tb_frame.f_back
    namespace = frame.f_globals.copy()
    namespace.update(frame.f_locals)

    import IPython
    IPython.embed_kernel(local_ns=namespace)
I would think this would work, but I get an error:
RuntimeError: threads can only be started once
I just use this:
from IPython import embed; embed()
works better than anything else for me :)
Update:
In celebration of this answer receiving 50 upvotes, here are the updates I've made to this snippet in the intervening six years since it was posted.
First, I now like to import and execute in a single statement, as I use black for all my Python code these days, and it reformats the original snippet in a way that doesn't make sense in this specific and unusual context. So:
__import__("IPython").embed()
Given that I often use this inside a loop or a thread, it can be helpful to include a snippet that allows terminating the parent process (partly for convenience, and partly to remind myself of the best way to do it). os._exit is the best choice here, so my snippet includes this (same logic w/r/t using a single statement):
q = __import__("functools").partial(__import__("os")._exit, 0)
Then I can simply use q() if/when I want to exit the master process.
My full snippet (with # FIXME in case I should ever forget to remove it!) looks like this:
q = __import__("functools").partial(__import__("os")._exit, 0) # FIXME
__import__("IPython").embed() # FIXME
You can follow the following recipe to embed an IPython session into your program:
try:
    get_ipython
except NameError:
    banner = exit_msg = ''
else:
    banner = '*** Nested interpreter ***'
    exit_msg = '*** Back in main IPython ***'

# First import the embed function
# (in newer IPython versions this lives at IPython.terminal.embed).
from IPython.frontend.terminal.embed import InteractiveShellEmbed

# Now create the IPython shell instance. Put ipshell() anywhere in your code
# where you want it to open.
ipshell = InteractiveShellEmbed(banner1=banner, exit_msg=exit_msg)
Then use ipshell() whenever you want to be dropped into an IPython shell. This will allow you to embed (and even nest) IPython interpreters in your code and inspect objects or the state of the program.
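For example (just to illustrate where the call goes):

def compute(x):
    result = x * 2
    # Drop into an interactive IPython shell here to inspect `result`.
    ipshell()
    return result

compute(21)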
Introduction
I have the following code which checks to see if a similar model exists in the database, and if it does not it creates the new model:
class BookProfile():
    # ...

    def save(self, *args, **kwargs):
        uniqueConstraint = {'book_instance': self.book_instance, 'collection': self.collection}
        # Test for other objects with identical values
        profiles = BookProfile.objects.filter(Q(**uniqueConstraint) & ~Q(pk=self.pk))
        # If none are found create the object, else fail.
        if len(profiles) == 0:
            super(BookProfile, self).save(*args, **kwargs)
        else:
            raise ValidationError('A Book Profile for that book instance in that collection already exists')
I first build my constraints, then search for a model with those values, which I am enforcing must be unique (Q(**uniqueConstraint)). In addition, I ensure that if the save method is updating rather than inserting, we do not find this object when looking for other similar objects (~Q(pk=self.pk)).
I should mention that I am implementing soft delete (with a modified objects manager which only shows non-deleted objects), which is why I must check for myself rather than relying on unique_together errors.
Problem
Right, that's the introduction out of the way. My problem is that when multiple identical objects are saved in quick (or near-simultaneous) succession, sometimes both get added, even though the first being added should prevent the second.
I have tested the code in the shell and it succeeds every time I run it. Thus my assumption is that if, say, we have two objects being added, Object A and Object B: Object A runs its check upon save() being called. Then the process saving Object B gets some time on the processor. Object B runs that same test, but Object A has not yet been added, so Object B is added to the database. Then Object A regains control of the processor and has already run its test, so even though identical Object B is in the database, it is added regardless.
My Thoughts
The reason I fear multiprogramming could be involved is that Object A and Object B are each being added through an API save view, so a request to the view is made for each save, rather than a single request with multiple sequential saves on objects.
It might be the case that Apache is creating a process for each request, and thus causing the problems I think I am seeing. As you would expect, the problem only occurs sometimes, which is characteristic of multiprogramming or multiprocessing errors.
If this is the case, is there a way to make the test and set parts of the save() method a critical section, so that a process switch cannot happen between the test and the set?
Based on what you've described, it seems reasonable to assume that multiple Apache processes are a source of problems. Are you able to replicate if you limit Apache to a single worker process?
Maybe the suggestions in this thread will help: How to lock a critical section in Django?
An alternative approach could be utilizing a queue. You'd just stick your objects to be saved into the queue and have another process doing the actual save. That way you could guarantee that objects were processed sequentially. This wouldn't work well if your application depends on having the object saved by the time the response is returned unless you also had the request processes wait on the result (watching a finished queue for example).
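A minimal sketch of that queue idea using only the standard library, with a thread standing in for the separate save process (names here are placeholders; in practice a task queue such as Celery would be more typical):

import queue
import threading

save_queue = queue.Queue()

def save_worker():
    # Single consumer: queued objects are saved one at a time, in order,
    # so the test-and-set check cannot interleave between requests.
    while True:
        obj = save_queue.get()
        try:
            obj.save()
        finally:
            save_queue.task_done()

threading.Thread(target=save_worker, daemon=True).start()

# In the request handler, enqueue the instance instead of saving it directly:
# save_queue.put(book_profile)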
Updated
You may find this info useful. Mr. Dumpleton does a much better job of laying out the considerations than I could attempt to summarize here:
http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines especially the Defining Process Groups section.
http://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide Delegation to Daemon Process section
http://code.google.com/p/modwsgi/wiki/IntegrationWithDjango
Find the section of text toward the bottom of the page that begins with:
Now, traditional wisdom in respect of Django has been that it should preferably only be used on single-threaded servers. This would mean for Apache using the single-threaded 'prefork' MPM on UNIX systems and avoiding the multithreaded 'worker' MPM.
and read until the end of the page.
I have found a solution that I think might work:
import threading

def save(self, *args, **kwargs):
    lock = threading.Lock()
    lock.acquire()
    try:
        # Test and Set Code
    finally:
        lock.release()
It doesn't seem to break the save method like that decorator did, and thus far I have not seen the error again.
Unless anyone can say that this is not a correct solution, I think this works.
Update
The accepted answer was the inspiration for this change.
It seems I was under the impression that locks were some sort of special voodoo exempt from normal logic. Here lock = threading.Lock() is run each time, thus instantiating a new, unlocked lock which can always be acquired.
I needed a single central lock for the purpose, but where could that go unless I had a thread running all the time holding the lock? The answer seemed to be to use file locks, explained in this answer to the Stack Overflow question mentioned in the accepted answer.
The following is that solution modified to suit my situation:
The Code
The following is my modified DjangoLock. I wanted to keep locks relative to the Django root; to do this I put a custom variable into the settings.py file.
# locks.py
import os
import fcntl

from django.conf import settings


class DjangoLock:

    def __init__(self, filename):
        self.filename = os.path.join(settings.LOCK_DIR, filename)
        self.handle = open(self.filename, 'w')

    def acquire(self):
        fcntl.flock(self.handle, fcntl.LOCK_EX)

    def release(self):
        fcntl.flock(self.handle, fcntl.LOCK_UN)

    def __del__(self):
        self.handle.close()
And now the additional LOCK_DIR settings variable:
# settings.py
import os

PATH = os.path.abspath(os.path.dirname(__file__))
# ...
LOCK_DIR = os.path.join(PATH, 'locks')
That will now put locks in a folder named locks relative to the root of the Django project. Just make sure you give Apache write access; in my case:
sudo chown www-data locks
And finally the usage is much the same as before:
import locks

def save(self, *args, **kwargs):
    lock = locks.DjangoLock('ClassName')
    lock.acquire()
    try:
        # Test and Set Code
    finally:
        lock.release()
This is now the implementation I am using and it seems to be working really well. Thanks to all who have contributed to the process of arriving at this end.
You need to use synchronization on the save method. I haven't tried this yet, but here's a decorator that can be used to do so.
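As a rough sketch of what such a decorator could look like (note this uses a process-local threading.Lock, which only serializes calls within a single process; across multiple Apache processes a file lock, as shown above, would still be needed):

import threading
from functools import wraps

def synchronized(func):
    """Serialize calls to the decorated function within this process."""
    lock = threading.Lock()

    @wraps(func)
    def wrapper(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)
    return wrapper

# Usage:
# @synchronized
# def save(self, *args, **kwargs):
#     ...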