How to persist a query between flask page requests? - flask

I need to run the same exact query on multiple page requests.
The first page renders the items, the second page exports the items to excel.
Storing the query directly in the session fails because BaseQuery is not JSON serializable:
session['previous_query'] = SomeModel.query
The next option is to store the query as a string:
session['previous_query'] = str(SomeModel.query)
This works but I would now need to run session.execute:
db.session.execute(session['previous_query'])
And that does not give me ORM objects but plain dicts without relationships.
Finally I can store only the ids, but that would require me to run the query on both ends multiple times and would not preserve the ordering I need.
Any suggestions?

You can serialize the query with the SQLAlchemy Serializer extension:
from sqlalchemy.ext import serializer
session['previous_query'] = serializer.dumps(SomeModel.query, -1)
then reconstitute the query with:
query = serializer.loads(session['previous_query'], db.metadata, db.session)
objects = query.all()
where db is your Flask-SQLAlchemy integration object.
Under the hood this uses the pickle module but pickling has been customized to be more compact and to omit the session and engine references; these are loaded again when loading the serialized data with serializer.loads().
For this to work on Python 2 you do need to set the protocol version (the second argument to serializer.dumps()), as the serialization won't work with the default protocol version 0. Pick version 1 or 2 instead, or use -1 to pick the highest version supported by your Python installation.
Because this uses pickle, do be careful with loading the pickle from untrusted sources; a Flask session is tamper-proof because it is cryptographically signed, but if an attacker were ever able to obtain your server-side secret, they could take over your process by sending a carefully crafted pickle for serializer.loads() to load.
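To make the round trip concrete, here is a minimal sketch of the two views, assuming a Flask app with Flask-SQLAlchemy (db, SomeModel) as in the question; the filter, ordering, template name and make_excel() helper are made up for illustration:

from flask import session, render_template
from sqlalchemy.ext import serializer

@app.route('/items')
def list_items():
    query = SomeModel.query.filter_by(active=True).order_by(SomeModel.name)
    # Pickle the query so the export view can re-run it unchanged.
    session['previous_query'] = serializer.dumps(query, -1)
    return render_template('items.html', items=query.all())

@app.route('/items/export')
def export_items():
    # Rebuild the exact same query, bound to the current database session.
    query = serializer.loads(session['previous_query'], db.metadata, db.session)
    return make_excel(query.all())  # make_excel is a placeholder for your Excel export

Keep in mind that the default cookie-based session is limited to roughly 4 KB, so a long serialized query may need a server-side session backend.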

Related

How to handle network errors when saving to Django Models

I have a Django .save() execution that loops at n times.
My concern is how to guard against network errors during saving, as some entries could be saved while others won't and there could be no telling.
What is the best way to make sure that the execution is completed?
Here's a sample of my code
# SAVE DEBIT ENTRIES
for i in range(len(debit_journals)):
    # UPDATE JOURNAL RECORD
    debit_journals[i].approval_no = journal_transaction_id
    debit_journals[i].approval_status = 'approved'
    debit_journals[i].save()
Either use bulk_create / bulk_update to execute a single DB query, or use transaction.atomic as a decorator on your function so that any error during a save rolls the database back to its state before your function ran.
Try something like the following (assuming your model is named DebitJournal and debit_journals is a list).
for debit_journal in debit_journals:
    debit_journal.approval_no = journal_transaction_id
    debit_journal.approval_status = 'approved'

DebitJournal.objects.bulk_update(debit_journals, ["approval_no", "approval_status"])
If debit_journals is a QuerySet you can also try
debit_journals.update(approval_no=journal_transaction_id, approval_status='approved').
It depends on what you mean by a network error: one between the user and the Django application, or one between the Django application and the database. If it is only between the user and the app, note that once the request has been received correctly, the objects will be created even if the user loses the connection afterwards. So the user might never see the response, but the objects will still be created.
If the error is between the Django application and the database, some objects might still be created before the failure.
Usually if you want an "all or nothing" behaviour you should use manual transactions, as described here: https://docs.djangoproject.com/en/4.1/topics/db/transactions/
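A minimal sketch of the all-or-nothing variant, reusing the DebitJournal naming from the answer above:

from django.db import transaction

@transaction.atomic
def approve_debit_journals(debit_journals, journal_transaction_id):
    # If any save() raises, every change made inside this block is rolled back.
    for debit_journal in debit_journals:
        debit_journal.approval_no = journal_transaction_id
        debit_journal.approval_status = 'approved'
        debit_journal.save()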
Note that if the creation is really long you might hit the request timeout. If the creation takes more than a few seconds you should consider making it a background task. The request is only there to create the task.
See Python task queue alternatives and frameworks for 3rd party solutions.

Does Django have a method of storage like HTML5's localStorage or sessionStorage?

Does Django have a method of storage like HTML5's localStorage or sessionStorage?
I want to use the Django/Django-Rest-Framework as the backend of my project.
But does Django have a convenient storage method to serve my project? In HTML5 there are localStorage and sessionStorage, which are very useful.
EDIT
I want a simple way to store temporary data when there is a requirement to share it.
For example, I have 3 providers (a_provider, b_provider, c_provider) that can each process an origin_data.
In a function:
def process_data():
    a_provider(get_data())  # process a
    b_provider(get_data())  # process b
    c_provider(get_data())  # process c
The get_data() call gets the shared data, rather than every provider having to return its processed data as a parameter to pass into the next provider.
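For illustration only, here is one way that shared store could look server-side with Django's cache framework; the 'origin_data' key and the set_data() helper are assumptions, not part of the original post:

from django.core.cache import cache

def set_data(value):
    # Store the shared data where any provider can read it.
    cache.set('origin_data', value)

def get_data():
    # Fetch whatever the last set_data() stored.
    return cache.get('origin_data')

def process_data():
    a_provider(get_data())  # process a
    b_provider(get_data())  # process b
    c_provider(get_data())  # process c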
There are some 'Offline Solutions' you can check out here.
However, if you are trying to run completely in local storage, Django probably isn't your choice. Some new development on this particular topic is being explored by the awesome team at BeeWare.
Hope this helps.

Multi-tenant Django applications using Mongoengine

I want to build a multi-tenant architecture for a SaaS system. We are using Django as our backend, mongoengine as our main database and gunicorn as our web server.
Our clients are a few big companies, so pre-allocating space for a database per client shouldn't be a problem.
The first approach we took was to write a middleware to determine the source of the request to properly connect to a mongoengine database. Here is the code:
class MongoConnectionMiddleware(object):
    def process_request(self, request):
        if request.user.is_authenticated():
            mongo_connect(request.user.profile.establishment)
And the mongo_connect method:
def mongo_connect(establishment):
    db_name = 'db_client_%d' % establishment.id
    connect(db_name)
This will register the "default" alias as the db_name for every mongoengine request.
But it seems that when many concurrent users from different companies are making requests, each one sets the default db_name to its own name.
As an example:
Company A makes a request and connects to database A. While A is doing its work, company B connects to database B. This makes A also connect to B's database in the process, so A fails to find some ids.
Is there a way to isolate the connection to the mongo database per request to avoid this problem?
Unfortunately MongoEngine seems to be designed around a very basic use case of a single primary connection and multiple auxiliary connections.
http://docs.mongoengine.org/en/latest/guide/connecting.html#connecting-to-mongodb
To get around the default connection logic, I define the first connection I come across as the default and also add it as a named connection. I then add any subsequent connections as named connections only.
https://github.com/MongoEngine/mongoengine/issues/607#issuecomment-38651532
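A rough sketch of that pattern, adapted to the per-establishment naming from the question (the helper and the module-level registry are my own additions, not from the linked issue):

from mongoengine import connect

_registered_aliases = set()

def register_tenant_db(establishment):
    # Register a per-tenant connection once; the first one seen also becomes 'default'.
    alias = 'db_client_%d' % establishment.id
    if alias not in _registered_aliases:
        if not _registered_aliases:
            connect(alias, alias='default')
        connect(alias, alias=alias)
        _registered_aliases.add(alias)
    return alias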
You can use the with_db decorator to switch from one connection to another, but it's a contextmanager call, which means as soon as you leave the with statement, it will revert. It also still requires a default connection.
http://docs.mongoengine.org/en/latest/guide/connecting.html#switch-database-context-manager
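For example, with the switch_db context manager from the linked docs (MyDocument and the 'db_client_42' alias are placeholders; the alias must already have been registered via connect(..., alias=...)):

from mongoengine.context_managers import switch_db

with switch_db(MyDocument, 'db_client_42') as TenantDocument:
    # Inside the block, queries go to the 'db_client_42' connection;
    # once the block exits, MyDocument reverts to its default alias.
    docs = list(TenantDocument.objects)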
You might be able to put it inside a function and then yield inside the with block to prevent it reverting immediately, though I'm not sure if this is valid.
You could use a wrapper of some kind, either a function, class or a custom QuerySet, that checks the current django/flask session and switches the db to the appropriate connection.
I'm not sure if a QuerySet can do this, but it would probably be the nicest way if it can.
http://docs.mongoengine.org/en/latest/guide/querying.html#custom-querysets
I included some code in this issue here where I change the database connection for my models.
https://github.com/MongoEngine/mongoengine/issues/605
def switch(model, db):
    model._meta['db_alias'] = db
    # must set _collection to None so it is re-evaluated
    model._collection = None
    return model

MyDocument = switch(MyDocument, 'db-alias')
You'll also want to take a look at the code that mongoengine uses to switch dbs.
Beware that MongoEngine likes to cache things, so changing a few variables here and there doesn't always have an effect. It's full of surprises like this.
Edit:
I should also add that the connect call won't pick up value changes, so calling connect with new parameters won't take effect unless it's a new alias. Even the disconnect function (which isn't exposed publicly) doesn't let you do this, as the models will cache the connection. I mention this in some of the issues linked above and also here: https://github.com/MongoEngine/mongoengine/issues/566

Updating a hit counter when an image is accessed in Django

I am working on doing some simple analytics on a Django website (v1.4.1). Seeing as this data will be gathered on pretty much every server request, I figured the right way to do this would be with a piece of custom middleware.
One important metric for the site is how often given images are accessed. Since each image is its own object, I thought about using django-hitcount, but figured that was unnecessary for what I was trying to do. If it proves easier, I may use it though.
The current conundrum I face is that I don't want to query the database and look for a given object for every HttpRequest that occurs. Instead, I would like to wait until a successful response (indicated by an HttpResponse.status of 200 or whatever), and then query the server and update a hit field for the corresponding image. The problem is that the only way to access the path of the image is in process_request, while the only way to access the status code is in process_response.
So, what do I do? Is it as simple as creating a class variable that can hold the path and then look up the file once the response code of 200 is returned, or should I just use django-hitcount?
Thanks for your help
Set up a cron task to parse your Apache/Nginx/whatever access logs on a regular basis, perhaps with something like pylogsparser.
You could use memcache to store the counters and then periodically persist them to the database. There are risks that memcache will evict the value before it's been persisted but this could be acceptable to you.
This article provides more information and highlights a risk arising when using hosted memcache with keys distributed over multiple servers. http://bjk5.com/post/36567537399/dangers-of-using-memcache-counters-for-a-b-tests
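A rough sketch of that counter-in-cache idea using Django's cache API; the Image model, its hits field and the key scheme are assumptions for illustration, not from the original answer:

from django.core.cache import cache
from django.db.models import F

def record_hit(image_id):
    key = 'image_hits_%d' % image_id
    try:
        cache.incr(key)
    except ValueError:
        # Key missing (never set, or already evicted): start the counter at 1.
        cache.set(key, 1)

def flush_hits(image_ids):
    # Run periodically (cron job or management command) to persist the counters.
    for image_id in image_ids:
        key = 'image_hits_%d' % image_id
        count = cache.get(key) or 0
        if count:
            Image.objects.filter(pk=image_id).update(hits=F('hits') + count)
            cache.decr(key, count)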

Looking for a simple and minimalistic way to store small data packets in the cloud

I'm looking for a very simple and free cloud store for small packets of data.
Basically, I want to write a Greasemonkey script that a user can run on multiple machines with a shared data set. The data is primarily just a single number; eight bytes per user should be enough.
It all boils down to the following requirements:
simple to develop for (it's a fun project for a few hours, I don't want to invest twice as much in the sync)
store eight bytes per user (or maybe a bit more, but it's really tiny)
ideally, users don't have to sign up (they just get a random key they can enter on all their machines)
I don't need to sign up (it's all Greasemonkey, so there's no way to hide a secret, like a developer key)
there is no private data in the values, so another user getting access to that information by guessing the random key is no big deal
the information is easily recreated (sharing it in the cloud is just for convenience), so another user taking over the 'key' is easily fixed as well
First ideas:
Store on Google Docs with a form as the frontend. Of course, that's kinda ugly and every user needs to set it up again.
I could set up a Google App Engine instance that allows storing a number to a key and retrieving the number by key. It wouldn't be hard, but it still sounds overkill for what I need.
I could create a Firefox add-on instead of a Greasemonkey script and use Mozilla Weave/Sync—which unfortunately doesn't support storing HTML5 local storage yet, so GM isn't enough. Of course I'd have to implement the same for Opera and Chrome then (assuming there are similar services for them), instead of just reusing the user script.
Anybody got a clever idea or a service I'm not aware of?
Update for those who are curious: I ended up going the GAE route (about half a page of Python code). I only discovered OpenKeyval afterwards (see below). The advantage is that it's pretty easy for users to connect on all their machines (just a Google account login, no other key to transfer from machine A to machine B), the disadvantage is that everybody needs a Google account.
OpenKeyval is pretty much what I was looking for.
OpenKeyval was what I was looking for but has apparently been shut down.
I think GAE will be a nice choice. With your storage requirements you will never exceed the free 500 MB of GAE's datastore. And it will be easy to port your script across browsers because of the REST nature of your service ;)
I was asked to share my GAE key/value store solution, so here it comes. Note that this code hasn't run for years, so it might be wrong and/or rely on very outdated GAE APIs:
app.yaml
application: myapp
version: 1
runtime: python
api_version: 1
handlers:
- url: /
  script: keyvaluestore.py
keyvaluestore.py
from google.appengine.api import users
from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class KeyValue(db.Model):
    v = db.StringProperty(required=True)

class KeyValueStore(webapp.RequestHandler):
    def _do_auth(self):
        user = users.get_current_user()
        if user:
            return user
        else:
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write('login_needed|' + users.create_login_url(self.request.get('uri')))

    def get(self):
        user = self._do_auth()
        callback = self.request.get('jsonp_callback')
        if user:
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.out.write(self._read_value(user.user_id()))

    def post(self):
        user = self._do_auth()
        if user:
            self._store_value(user.user_id(), self.request.body)

    def _read_value(self, key):
        result = db.get(db.Key.from_path("KeyValue", key))
        return result.v if result else 'none'

    def _store_value(self, k, v):
        kv = KeyValue(key_name=k, v=v)
        kv.put()

application = webapp.WSGIApplication([('/', KeyValueStore)], debug=True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()
The closest thing I've seen is Amazon's Simple Queue Service.
http://aws.amazon.com/sqs/
I've not used it myself so I'm not sure how the developer key aspect works, but they give you 100,000 free queries a month.