Safe way to persist multiple clients data - django

I am currently developing a computer based test web app with Django, and I am trying to figure out the best way to persist the choices users make during the course of the test.
I want to persist the choices because they might leave the page due to a crash or something else and I want them to be able to resume from exactly where they stopped.
To implement this I chose saving to Django sessions with db backend which in turns save to the database and this will resolve to a very bad design because I don't want about 2000 users hitting my db every few seconds.
So my question is are there any other ways I can go about implementing this feature that I don't know of. Thanks

If your application is being run on a browser, to be specific, if you are designing a progressive web application, you can make use of browser storage systems, such as localstorage, indexed db, cookies, etc ..
This way, you wouldn't need to send user's updated state back and forth to your backend and you can do the state update based on a specific condition or each n seconds/minutes.

Related

What exactly is caching and how do I add it to an app I have on heroku?

I have a data science type application where I am getting public information from FPDS and SAM gov't website. The site is currently on Heroku.
I would like cache views so if a person is researching more than one company they can quickly go back to earlier pages without having to fetch the results from the database every time.
Based on my limited knowledge that is what cashing does?
Second, I am looking at flash-caching and it doesn't appear to be that difficult to implement to the route's I would like to cache.
Now the question is on Heroku, you wouldn't use simplecashe would you? Would you use a different cache strategy? From the docs, the CASHE_TYPE can be simple, redis, memcached and several more. On Heroku would I need to store the cache on something like Redis or can I store it in memory? Ideally, to get everything up and running I would like the cache to be in memory.
Late answer to your question. Caching can be a number of techniques on client and server side to achieve a goal of reduced traffic, network transport, or speed.
I'll focus on one aspect from what you are asking: a redis integration with flask to achieve faster response from a flask app environment. Redis is 'blindingly' fast, imo, as an in-memory database. When I have many users asking for the same view (typically a report-style display), I can interrupt the view route to get the response from a named redis database, so that my flask server is not bound up in eternally regenerating the same contents, which in turn saves a good few cycles of the main back-end database. Of course, if the contents of that view/report change, I have to separately take care of that. Most importantly, Redis includes an expiry value for each entry, so one way of handling stale contents is to delete the redis contents ahead of the expiry time.
Let me know if you want sample code to demonstrate this.

Should I implement revisioning using database triggers or using django-reversion?

We're looking into implementing audit logs in our application and we're not sure how to do it correctly.
I know that django-reversion works and works well but there's a cost of using it.
The web server will have to make two roundtrips to the database when saving a record even if the save is in the same transaction because at least in postgres the changes are written to the database and comitting the transaction makes the changes visible.
So this will block the web server until the revision is saved to the database if we're not using async I/O which is currently the case. Even if we would use async I/O generating the revision's data takes CPU time which again blocks the web server from handling other requests.
We can use database triggers instead but our DBA claims that offloading this sort of work to the database will use resources that are meant for handling more transactions.
Is using database triggers for this sort of work a bad idea?
We can scale both the web servers using a load balancer and the database using read/write replicas.
Are there any tradeoffs we're missing here?
What would help us decide?
You need to think about the pattern of db usage in your website.
Which may be unique to you, however most web apps read much more often than they write to the db. In fact it's fairly common to see optimisations done, to help scaling a web app, which trade off more complicated 'save' operations to get faster reads. An example would be denormalisation where some data from related records is copied to the parent record on each save so as to avoid repeatedly doing complicated aggregate/join queries.
This is just an example, but unless you know your specific situation is different I'd say don't worry about doing a bit of extra work on save.
One caveat would be to consider excluding some models from the revisioning system. For example if you are using Django db-backed sessions, the session records are saved on every request. You'd want to avoid doing unnecessary work there.
As for doing it via triggers vs Django app... I think the main considerations here are not to do with performance:
Django app solution is more 'obvious' and 'maintainable'... the app will be in your pip requirements file and Django INSTALLED_APPS, it's obvious to other developers that it's there and working and doesn't need someone to remember to run the custom SQL on the db server when you move to a new server
With a db trigger solution you can be certain it will run whenever a record is changed by any means... whereas with Django app, anyone changing records via a psql console will bypass it. Even in the Django ORM, certain bulk operations bypass the model save method/save signals. Sometimes this is desirable however.
Another thing I'd point out is that your production webserver will be multiprocess/multithreaded... so although, yes, a lengthy db write will block the webserver it will only block the current process. Your webserver will have other processes which are able to server other requests concurrently. So it won't block the whole webserver.
So again, unless you have a pattern of usage where you anticipate a high frequency of concurrent writes to the db, I'd say probably don't worry about it.

Django website serving data to mobile app

I am currently trying to figure out he best practice in order to design my web services between a django administrated database (+ images) and a mobile app. My main concern is how to separate a bulk update (send every data in the database and all the files on the server) and a lighter, smaller update with only the new and / or modified objects (images or data.)
I have had access to a working code-base using a cronjob and states for each data field (new, modified, up to date) to generate either a reference data file or an update file. I find it to be very redundant and somewhat unelegant, in contradiction with the DRY spirit of Django (there are tons of lines of code, making it nearly unmaintainable.))
I find it very surprising that this aspect is almost un-documented, since web traffic is a crucial matter in mobile developpment.. Fetching everytime all the data served quickly becomes unsustainable as the database grows..
I would be very grateful for any lead or advice you could give me :-) Thx in advance !
Just have a last_modified DateTimeField in your table, and in your user's profile a last_synchronized DateTimeField. When the mobile app wants to synchronize, send the data which was modified after the last synchronization run, and update the last_synchronized field in the user's profile.

Designing backend software for multiplayer cross platform app

I am currently in the initial design phase of my first app.
In my app there will be individual sessions containing 1-5 users.
I need to be able to keep track of each users gps location and be able to push and pull them to each of the users. Each user will have the most recently reported location of every other user in the session.
There will be other calculations done on the data set but that will be client side, the server should only need to handle pushing and pulling of user locations (and the usernames).
I'm predicting due to the nature of the app 90% of sessions should not last more than 2 hours, with the possibility of the server ending sessions that are older then 24-48 hours (once real world testing of the app begins I would have a better idea of how long sessions should last).
I was thinking of using django to build an API, and to store all the data in the program itself and not to use a database as this should be faster and I don't think it is necessary to store the data since it has such a short lifetime.
Is this a good starting point? Is there anything I should be thinking about or considering? I'm completely new to designing backend software.
While performance might not even be an issue in the beginning, there are some things you can do once you hit a certain load:
Keep all your session data in one model, even if you're denormalizing (putting redundant information into your database) your database a bit. That way you only have to do one read to the database and no expensive JOINs
Use the Django caching framework (https://docs.djangoproject.com/en/dev/topics/cache/) to cache views, so multiple reads of the same data don't have to hit the database
Before you start optimizing, profile your code to see where your performance bottlenecks really are. Sometimes you'll be surprised which operations are expensive, and which aren't.

Tracing requests of users by logging their actions to DB in django

I want to trace user's actions in my web site by logging their requests to database as plain text in Django.
I consider to write a custom decorator and place it to every view that I want to trace.
However, I have some troubles in my design.
First of all, is such logging mecahinsm reasonable or because of my log table will be enlarging rapidly it causes some preformance problems ?
Secondly, how should be my log table's design ?
I want to keep keywords if the user call search view or keep the item's id if the user call details of item view.
Besides, IP addresses of user's should be kept but how can I seperate users if they connect via single IP address as in many companies.
I am glad to explain in detail if you think my question is unclear.
Thanks
I wouldn't do that. If this is a production service then you've got a proper web server running in front of it, right? Apache, or nginx or something. That can do logging, and can do it well, and can write to a form that won't bloat your database, and there's a wealth of analytical tools for log analysis.
You are going to have to duplicate a lot of that functionality in your decorator, such as when you want to switch it on or off, or change the log level. The only thing you'll get by doing it all in django is the possibility of ultra-fine control, such as only logging views of blog posts with id numbers greater than X or something. But generally you'd not want that level of detail, and you'd log everything and do any stripping at the analysis phase. You've not given any reason currently why you need to do it from Django.
If you really want it in a RDBMS, reading an apache log file into Postgres or MySQL or one of those expensive ones is fairly trivial.
One thing you should keep in mind is that SQL databases don't offer you a very good writing performance (in comparison with reading), so if you are experiencing heavy loads you should probably look for a better in-memory solution (eg. some key-value-store like redis).
But keep in mind, that, especially if you would use a non-sql solution you should be aware what you want to do with the collected data (just display something like a 'log' or do some more in-deep searching/querying on the data).
If you want to identify different users from the same IP address you should probably look for a cookie-based solution (if you are using django's session framework the session's are per default identified through a cookie - so you could just simply use sessions). Another solution could be doing the logging 'asynchronously' via javascript after the page has loaded in the browser (which could give you more possibilities in identifying the user and avoid additional load when generating the page).