Accessing the properties of the mongoengine instance in a Flask app - flask

I've registered the flask-mongoengine extension in my Flask app and initialized it.
I now want to access its conn property as I want to do pure MongoDB queries too.
>>> app.extensions
{'csrf': <flask_wtf.csrf.CSRFProtect object at 0x00000213B72AB940>,
'mail': <flask_mail._Mail object at 0x00000213B72ABCF8>,
'mongoengine': {
<flask_mongoengine.MongoEngine object at 0x00000213B72ABDD8>: {
'app': <Flask 'app'>,
'conn': MongoClient(host=['xxx'], document_class=dict, tz_aware=False, connect=True, ssl=True, replicaset='Cluster0-shard-0', authsource='admin', retrywrites=True, read_preference=Primary())
}
},
'rq2': <flask_rq2.app.RQ object at 0x00000213B5DE8940>,
'security': <flask_security.core._SecurityState object at 0x00000213B734EE10>
}
Is there a way to access the conn property other than the very convoluted (and error-prone):
>>> list(app.extensions.get('mongoengine').values())[0].get('conn')
MongoClient(host=['xxx'], document_class=dict, tz_aware=False, connect=True, ssl=True, replicaset='Cluster0-shard-0', authsource='admin', retrywrites=True, read_preference=Primary())
Does flask-mongoengine have a method to access its properties?

The MongoEngine instance has a connection attribute; use that.
You don't need to use app.extensions; that's more of an internal data structure for extensions to keep track of their own state when they need to access this from the current app context.
In your own code, just keep a reference to the MongoEngine instance you created. The documentation uses:
db = MongoEngine(app)
or
db = MongoEngine()
# in an app factory, attach to an app with db.init_app(app)
and so you can use:
db.connection
Next, there is also a current_mongoengine_instance() utility function that essentially gives you the same object as what your code already achieved. Use it like this:
from flask_mongoengine import current_mongoengine_instance
current_mongoengine_instance().connection
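Since connection is the underlying pymongo MongoClient (the same object as the conn entry in your dump), you can issue your pure MongoDB queries directly on it. A minimal sketch, where the database and collection names are hypothetical:
client = db.connection
raw_doc = client['mydb']['users'].find_one({'name': 'John'})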
As a side note: the way this extension uses app.extensions is ... over-engineered and redundant. The rationale in the source code is:
# Store objects in application instance so that multiple apps do not
# end up accessing the same objects.
but multiple apps already have separate app.extensions dictionaries. While this method does let you use multiple mongoengine connections, you still then can't use this data structure to distinguish between different connections with only the current app context. The implementation for current_mongoengine_instance() only further illustrates that the extension doesn't have a proper strategy for handling multiple connections. You just get the 'first' one, whatever that may mean in the context of unordered dictionaries. The Flask SQLAlchemy extension instead uses a single extension instance to manage multiple connections via a system called binds.

You can also reach the underlying pymongo.Collection from any of your Document classes:
from mongoengine import Document, StringField

class MyDocument(Document):
    name = StringField()

MyDocument(name='John').save()

coll = MyDocument._get_collection()
print(coll.find_one({'name': 'John'}))

Related

Reinitialize flask extensions

I am using Flask. I have a few routes defined that are expensive, as they need to access a database and do lengthy computations. The database connectivity relies on the Flask-Mongoengine extension, which relies on PyMongo, which is not threadsafe.
Hence my thoughts are as follows:
@blueprint.route("/refresh/data", methods=['GET'])
def refresh_data():
    cache.clear()
    with Pool(4) as p:
        print(p.map(func=f, iterable=["recently", "mtd", "ytd", "sector"]))
Get a small pool and call the function f. The function f is based on:
def f(name):
    print(current_app.extensions)
    print(current_app.config)
    current_app.extensions["mongoengine"] = MongoEngine(app=current_app)
    print(current_app.extensions)
    get(address="reports/{path}/json".format(path=name))
    get(address="reports/{path}/html".format(path=name))
    return name
The problem here is that one cannot call init_app on the MongoEngine again. In fact, extensions can be initialized only once, but what happens if the extension is needed on multiple threads and is not threadsafe?
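For reference, a minimal sketch of the init-once pattern referred to above, with the extension instance created at module level and bound exactly once per app (the module layout is hypothetical):
# extensions.py
from flask_mongoengine import MongoEngine
db = MongoEngine()

# app.py
from flask import Flask
from extensions import db

def create_app():
    app = Flask(__name__)
    db.init_app(app)  # bind the already-created extension exactly once
    return app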

Store objects on the Flask object

I need to store an object on my flask.Flask instance. The naive approach would be just assigning an attribute like this.
app = flask.Flask(__name__)
app.my_object = MyObject()
I'm planning on referencing it later in an application context like this:
flask.current_app.my_object
I doubt this method is thread safe though. Is there a correct method to do this that is encouraged by Flask? If not, how would you safely implement the approach above?
I ended up using the config object.
app.config['MY_OBJECT'] = MyObject()
You can reference it like this in a request.
current_app.config['MY_OBJECT']
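Outside a request you need an application context to use current_app; a minimal sketch:
from flask import current_app

with app.app_context():
    my_object = current_app.config['MY_OBJECT']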

Lifetime and scope of class and instance variables in django

While there are quite a few questions and answers on here already about the lifetime of different variables within Python, I am looking for how they translate into the Django environment in terms of application scope and endpoint scope. Here is a simple version of what I am making, and I want to ensure that it will behave the way I am expecting it to.
my_cache/models/GlobalCache.py:
# This class should be global to the entire application and only
# load when the server is started.
class GlobalCache(object):
    _cache = {}

    @classmethod
    def fetch(cls):
        return cls._cache

    @classmethod
    def flush(cls):
        cls._cache = {}

    @classmethod
    def load_cache(cls, files_to_load_data_from):
        for file in files_to_load_data_from:
            # load the file and process its data into an entry
            cls._cache[file] = ...
my_cache/models/InstanceCache.py:
from .GlobalCache import GlobalCache
# This class will contain a reference to the global cache and use it to look
# up entries.
class InstanceCache(object):
    def __init__(self, name=None):
        self._name = name
        self._cache = GlobalCache.fetch()

    def fetch_file_data(self, file_name):
        cache_entry = self._cache.get(file_name, None)
        if cache_entry is None:
            raise EntryNotFoundException()
        return ReadOnlyInterfaceObject(cache_entry)
The intent is to have GlobalCache hold a cls._cache value that persists as long as the server is running. Calling GlobalCache.flush() will drop its global reference to the data it was tracking, and calling GlobalCache.load_cache(files_to_load_data_from) will repopulate it with data from those files.
The InstanceCache object is then intended to hold a reference to the current version of the data and return read-only objects for the different data sets identified by their original file name.
From my testing this seems to work, though I do not really have the InstanceCache object per se. I can load the global cache, retrieve read-only objects from it, and then flush the global cache and load it with new data. The original read-only objects still return the values they were originally loaded with; new requests will use the new data values.
What I want to confirm is that GlobalCache will exist as long as the server is running and only alter its data on direct calls to flush() and load_cache(). And that when I hit an endpoint and create an InstanceCache, it will keep a reference to the original data only as long as it exists. When execution of the endpoint is done, I would expect it to go out of scope, removing the reference to the global cache; if that was the last reference, the old data goes away and only the new/current data is kept. If it matters, I am running Python 2.7.6 and Django 1.5.12. Solutions that require an upgrade may be useful as well, but it is not an immediate option for me.
The answer here is a maybe, and it also depends a lot on which app server you are using to run django (if you are running multi-process).
So, generally speaking, yes, the GlobalCache will retain its cached contents for the lifetime of the process it is in after it has been initialized.
But InstanceCache, on the other hand, is only guaranteed to be garbage collected at some time after there are no more references to it. Garbage collection is a deep field, and there are often teams of people that work on the algorithms, so going into exact scenarios is probably outside the scope of an answer on SO. A popular implementation of Python is PyPy, and you can read more about the garbage collection it uses in the PyPy documentation.
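To see the reference semantics this relies on, here is a minimal sketch using the classes from the question:
old = GlobalCache.fetch()  # reference to the current cache dict
GlobalCache.flush()        # rebinds cls._cache to a brand-new dict
# old still points at the previous dict, so an InstanceCache created
# earlier keeps returning the data it was built with, while new
# InstanceCache instances see the fresh cache.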
That said, please remember that most app servers are multi-process. Both uwsgi and gunicorn spin up child processes to serve requests. So even though GlobalCache is a singleton in its process, there may be several processes, each with its own GlobalCache. And, this GlobalCache will ultimately be garbage collected/cleaned up when the process exits. Both uwsgi and gunicorn will usually kill child processes after the child services some number of HTTP requests.

Should a database connection be opened only once in a django app or once for every user within views.py?

I'm working on my first Django project.
I need to connect to a pre-existing key value store (in this case it is Kyoto Tycoon) for a one off task. i.e. I am not talking about the main database used by django.
Currently, I have something that works, but I don't know if what I'm doing is sensible/optimal.
views.py
from django.http import HttpResponse
from pykt import KyotoTycoon
def get_from_kv(user_input):
    kt = KyotoTycoon()
    kt.open('127.0.0.1', 1978)
    # some code to define the required key
    # my_key = ...
    my_value = kt.get(my_key)
    kt.close()
    return HttpResponse(my_value)
i.e. it opens a new connection to the database every time a user makes a query, then closes the connection again after it has finished.
Or, would something like this be better?
views.py
from django.http import HttpResponse
from pykt import KyotoTycoon
kt = KyotoTycoon()
kt.open('127.0.0.1', 1978)

def get_from_kv(user_input):
    # some code to define the required key
    # my_key = ...
    my_value = kt.get(my_key)
    return HttpResponse(my_value)
In the second approach, will Django only open the connection once when the app is first started? i.e. will all users share the same connection?
Which approach is best?
Opening a connection when it is required is likely to be the better solution. Otherwise, there is the potential that the connection is no longer open. Thus, you would need to test that the connection is still open, and restart it if it isn't before continuing anyway.
This means you could run the queries within a context manager block, which would auto-close the connection for you, even if an unhandled exception occurs.
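For example, such a context manager could look like this minimal sketch, assuming the pykt open/get/close API used in the question:
from contextlib import contextmanager
from django.http import HttpResponse
from pykt import KyotoTycoon

@contextmanager
def kyoto_connection(host='127.0.0.1', port=1978):
    kt = KyotoTycoon()
    kt.open(host, port)
    try:
        yield kt
    finally:
        kt.close()  # closes even on an unhandled exception

def get_from_kv(user_input):
    # my_key = ...
    with kyoto_connection() as kt:
        my_value = kt.get(my_key)
    return HttpResponse(my_value)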
Alternatively, you could have a pool of connections, and just grab one that is not currently in use (I don't know if this would be an issue in this case).
It all depends on just how expensive creating connections is, and whether it makes sense to be able to re-use them.
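The pooling idea could look like this naive sketch built on a thread-safe queue (pool size and reconnect handling here are assumptions, not recommendations):
import queue
from pykt import KyotoTycoon

POOL_SIZE = 4
_pool = queue.Queue()
for _ in range(POOL_SIZE):
    kt = KyotoTycoon()
    kt.open('127.0.0.1', 1978)
    _pool.put(kt)

def run_with_connection(func):
    kt = _pool.get()  # blocks until a connection is free
    try:
        return func(kt)
    finally:
        _pool.put(kt)  # always hand the connection back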

How should I do post persist/update actions in doctrine 2.1, that involves re-saving to the db?

Using Doctrine 2.1 (and Zend Framework 1.11, not that it matters for this matter), how can I do post-persist and post-update actions that involve re-saving to the db?
For example, creating a unique token based on the just-generated primary key id, or generating a thumbnail for an uploaded image (which actually doesn't require re-saving to the db, but still)?
EDIT - let's explain, shall we?
The above is actually a question regarding two scenarios. Both scenarios relate to the following state:
Let's say I have a User entity. When the object is flushed after it has been marked to be persisted, it'll have the normal auto-generated id of MySQL - meaning running numbers normally beginning at 1, 2, 3, etc.
Each user can upload an image - which he will be able to use in the application - which will have a record in the db as well. So I have another entity called Image. Each Image entity also has an auto-generated id - same methodology as the user id.
Now - here are the scenarios:
When a user uploads an image, I want to generate a thumbnail for that image right after it is saved to the db. This should happen for every new or updated image.
Since we're trying to stay smart, I don't want the code that generates the thumbnail to be written like this:
$image = new Image();
...
$entityManager->persist($image);
$entityManager->flush();
callToFunctionThatGeneratesThumbnailOnImage($image);
but rather I want it to occur automatically on the persisting of the object (well, flush of the persisted object), like the prePersist or preUpdate methods.
Since the user uploaded an image, he gets a link to it. It will probably look something like: http://www.mysite.com/showImage?id=[IMAGEID].
This allows anyone to just change the image id in this link and see other users' images.
So in order to prevent such a thing, I want to generate a unique token for every image. Since it doesn't really need to be sophisticated, I thought about using the md5 value of the image id, with some salt.
But for that, I need to have the id of that image - which I'll only have after flushing the persisted object - then generate the md5, and then save it again to the db.
Understand that the links for the images are supposed to be publicly accessible so I can't just allow an authenticated user to view them by some kind of permission rules.
You probably know already about Doctrine events. What you could do:
Use the postPersist event handler. That one occurs after the DB insert, so the auto generated ids are available.
The EventManager class can help you with this:
class MyEventListener
{
    public function postPersist(LifecycleEventArgs $eventArgs)
    {
        // in a listener you have the entity instance and the
        // EntityManager available via the event arguments
        $entity = $eventArgs->getEntity();
        $em = $eventArgs->getEntityManager();

        if ($entity instanceof User) {
            // do some stuff
        }
    }
}

$eventManager = $em->getEventManager();
$eventManager->addEventListener(Events::postPersist, new MyEventListener());
Be sure to check e. g. if the User already has an Image, otherwise if you call flush in the event listener, you might be caught in an endless loop.
Of course you could also make your User class aware of that image creation operation with an inline postPersist event handler and add @HasLifecycleCallbacks in your mapping and then always flush at the end of the request, e. g. in a shutdown function, but in my opinion this kind of stuff belongs in a separate listener. YMMV.
If you need the entity id before flushing, just after creating the object, another approach is to generate the ids for the entities within your application, e. g. using uuids.
Now you can do something like:
class Entity
{
    public function __construct()
    {
        $this->id = uuid_create();
    }
}
Now you have an id already set when you just do:
$e = new Entity();
And you only need to call EntityManager::flush at the end of the request.
In the end, I listened to @Arms, who commented on the question.
I started using a service layer for doing such things.
So now I have a method in the service layer which creates the Image entity. After it calls persist and flush, it calls the method that generates the thumbnail.
The Service Layer pattern is a good solution for such things.