Temporarily storing statistics on slave in Django - django

My slave servers collect statistics and performance metrics about visits, but eventually they would have to be sent to the master DB.
I don't want to have a permanent database connection open with the master DB server, so they would have to be temporarily stored locally and shipped over in chunks at specific intervals.
Any suggestions for tools to do this with Django? I've come up with the idea of storing the records in a local SQLite DB and sending them to the main DB server every hour for example. But maybe there are better ways than SQLite out there. Also, still not sure, for pushing the data back into the master DB server at regular intervals, would you use a direct DB connection from within Django, or design a simple API to send it over HTTPS?

I'll end up using redis and putting them in a list, prickling the objects. Main advantage: no SQLite migrations to maintain. It's really a pain though that Redis doesn't support atomic gets for more than 1 item, like LPOP but for more than 1, to retrieve them in batches...

Related

Multi database - each database writes to self and read from self

tried to google as much as I could but couldn't find what I was looking for..
So idea is: There is one Master Database from which one you read users authentication process. And there is other many databases which ones keep information(her users, files and other) just to itself(it writes to itself and reads) but Master can reach them all. And if master makes changes to lets say to database structure - all fields should change but information on those database should stay (but master can change any of those database information).
It's like Multi-master but I do not want that other masters could reach other databases, but only write to itself.
Any tips?
Question is not clear, but if you want use multiple databases with django. Look at THIShttps://docs.djangoproject.com/en/2.1/topics/db/multi-db/

Redis -- how does it improve performance?

i'm relatively new to the world of web-development and have only recently learned memory hierarchies in computer systems. I recently came across Redis and am itching to try it out in a small web-app. But before I do, I was wondering how is Redis going to improve performance? From what i've read so far, it seems that Redis is an "in-memory" data store, so does that mean that whenever a user requests a data from the server, instead of fetching from the database (given that the Redis data store is already populated with the needed data) the request can be fulfilled by accessing the data directly from the server's memory? To be specific, say if i have a web-app which back-end server is hosted on AWS, and the database is stored on MLAB, then whenever a user requests a data, instead of querying to the server which redirects the request to MLAB, it can now directly fetch the data from the server without going to MLAB ? Also, by in-memory, does that mean that the data is stored in the RAM on my AWS server?
Finally, how is this different from a cache?
Thank you so much!!
Well, Redis is used as a cache, the difference with most of the traditional cache is that you have other nice structures like hashes, sets, lists, TTL on keys, hyperlologs and so on, not only pair key:value.
You are right what you define about Redis, is but take into account that if you want to move your data from MLAB database to Redis you have to design some process to keep Redis update in each update that happens in your database. So every query from your application will use Redis to get data but apart from that you will need a process to keep update Redis with changes on your database, so if you use your application to update the database (and there are no other external parts which update your DB), every time you get an update from your web-app you have to update the DB and also Redis or having a command/script which detect every time an updated happened in the DB and update Redis properly.
AWS also provides Redis services, like ElasticCache https://aws.amazon.com/elasticache/?nc1=h_ls so basically the AWS ECS instance where you have your application doesn't use the RAM but this ElasticCache service which can live on another physical machine.
Finally, Redis store on memory the data though, it uses a dump file to save partial data in case of crashes and it also offers a persistence mode

Should I implement revisioning using database triggers or using django-reversion?

We're looking into implementing audit logs in our application and we're not sure how to do it correctly.
I know that django-reversion works and works well but there's a cost of using it.
The web server will have to make two roundtrips to the database when saving a record even if the save is in the same transaction because at least in postgres the changes are written to the database and comitting the transaction makes the changes visible.
So this will block the web server until the revision is saved to the database if we're not using async I/O which is currently the case. Even if we would use async I/O generating the revision's data takes CPU time which again blocks the web server from handling other requests.
We can use database triggers instead but our DBA claims that offloading this sort of work to the database will use resources that are meant for handling more transactions.
Is using database triggers for this sort of work a bad idea?
We can scale both the web servers using a load balancer and the database using read/write replicas.
Are there any tradeoffs we're missing here?
What would help us decide?
You need to think about the pattern of db usage in your website.
Which may be unique to you, however most web apps read much more often than they write to the db. In fact it's fairly common to see optimisations done, to help scaling a web app, which trade off more complicated 'save' operations to get faster reads. An example would be denormalisation where some data from related records is copied to the parent record on each save so as to avoid repeatedly doing complicated aggregate/join queries.
This is just an example, but unless you know your specific situation is different I'd say don't worry about doing a bit of extra work on save.
One caveat would be to consider excluding some models from the revisioning system. For example if you are using Django db-backed sessions, the session records are saved on every request. You'd want to avoid doing unnecessary work there.
As for doing it via triggers vs Django app... I think the main considerations here are not to do with performance:
Django app solution is more 'obvious' and 'maintainable'... the app will be in your pip requirements file and Django INSTALLED_APPS, it's obvious to other developers that it's there and working and doesn't need someone to remember to run the custom SQL on the db server when you move to a new server
With a db trigger solution you can be certain it will run whenever a record is changed by any means... whereas with Django app, anyone changing records via a psql console will bypass it. Even in the Django ORM, certain bulk operations bypass the model save method/save signals. Sometimes this is desirable however.
Another thing I'd point out is that your production webserver will be multiprocess/multithreaded... so although, yes, a lengthy db write will block the webserver it will only block the current process. Your webserver will have other processes which are able to server other requests concurrently. So it won't block the whole webserver.
So again, unless you have a pattern of usage where you anticipate a high frequency of concurrent writes to the db, I'd say probably don't worry about it.

Using Redis as intermediary cache for REST API

We have an iOS app that talks to a django server via REST API. Most of the data consists of rather large Item objects that involve a few related models that render into single flat dictionary, and this data changes rarely.
We've found, that querying this is not a problem for Postgres, but generating JSON responses takes a noticeable amount of time. On the other hand, item collections vary per-user.
I thought about a rendering system, where we just build a dictionary for Item object and save it into redis as JSON string, this way we can serve API directly from redis (e.g. HMGET(id of items in user library), which is fast, and makes it relatively easy to regenerate "rendered instances", basically just a couple of post_save signals.
I wonder how good this design is, are there any major flaws in it? Maybe there's a better way for the task?
Sure, we do the same at our firm, using Redis to store not JSON but large XML strings which are generated from backend databases for RESTful requests, and it saves lots of network hops and overhead.
A few things to keep in mind if this is the first time you're using Redis...
Dedicated Redis Server
Redis is single-threaded and should be deployed on a dedicated server with sufficient CPU power. Don't make the mistake of deploying it on your app or database server.
High Availability
Set up Redis with Master/Slave replication for high availability. I know there's been lots of progress with Redis cluster, so you may want to check on that too for HA.
Cache Hit/Miss
When checking Redis for a cache "hit", if the connection is dead or any exception occurs, don't fail the request, just default to the database; caching should always be 'best effort' since the database can always be used as a last resort.

Sync Framework Considerations for Smart Client app

Microsoft Sync Framework with SQL 2005? Is it possible? It seems to hint that the OOTB providers use SQL2008 functionality.
I'm looking for some quick wins in relation to a sync project.
The client app will be offline for a number of days.
There will be a central server that MUST be SQL Server 2005.
I can use .net 3.5.
Basically the client app could go offline for a week. When it comes back online it needs to sync its data. But the good thing is that the data only needs to push to the server. The stuff that syncs back to the client will just be lookup data which the client never changes. So this means I don't care about sync collisions.
To simplify the scenario for you, this smart client goes offline and the user surveys data about some observations. They enter the data into the system. When the laptop is reconnected to the network, it syncs back all that data to the server. There will be other clients doing the same thing too, but no one ever touches each other's data. Then there are some reports on the server for viewing the data that has been pushed to the server. This also needs to use ClickOnce.
My biggest concern is that there is an interim release while a client is offline. This release might require a new field in the database, and a new field to fill in on the survey.
Obviously that new field will be nullable because we can't update old data, that's fine to set as an assumption. But when the client connects up and its local data schema and the server schema don't match, will sync framework be able to handle this? After the data is pushed to the server it is discarded locally.
Hope my problem makes sense.
I've been using the Microsoft Sync framework for an application that has to deal with collisions, and it's miserable. The *.sdf file is locked for certain schema changes (like dropping of a column, etc.).
Your scenario sounds like the Sync Framwork would work for you, out of the box... just don't enforce any referential integrity, since the Sync Framework will cause insert issues in these instances.
As far as updating schema, if you follow the Sync Framework forums, you are instructed to create a temporary table the way the table should look at the end, copy your data over, and then drop the old table. Once done, then go ahead and rename the new table to what it should be, and re-hookup the stuff in SQL Server CE that allows you to handle the sync classes.
IMHO, I'm going to be working on removing the Sync functionality from my application, because it is more of a hinderance than an aid.