Multi database - each database writes to itself and reads from itself - Django

I tried to Google as much as I could, but couldn't find what I was looking for.
So the idea is: there is one master database that handles user authentication. And there are many other databases, each of which keeps its information (its users, files and so on) to itself: it writes to itself and reads from itself, but the master can reach them all. And if the master changes, say, the database structure, all fields should change in every database, while the data in those databases stays put (though the master can also change the data in any of them).
It's like multi-master, except I don't want the other masters to be able to reach each other's databases; each one should only write to itself.
Any tips?

The question is not clear, but if you want to use multiple databases with Django, look at the multi-database documentation: https://docs.djangoproject.com/en/2.1/topics/db/multi-db/
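To make the linked docs a bit more concrete, here is a minimal sketch of that kind of setup, assuming one "default" (master) database that owns authentication, one extra database per site, and a database router; the alias "site_a", the app labels and the module path are placeholders, not anything Django prescribes.

    # settings.py
    DATABASES = {
        "default": {  # the master database that owns auth
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "master_db",
        },
        "site_a": {  # one of the per-site databases
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "site_a_db",
        },
    }
    DATABASE_ROUTERS = ["myproject.routers.MasterAuthRouter"]

    # myproject/routers.py
    class MasterAuthRouter:
        """Send auth-related apps to the master; everything else stays local."""

        master_apps = {"auth", "contenttypes", "admin", "sessions"}

        def db_for_read(self, model, **hints):
            # one site alias is hard-coded here for brevity
            return "default" if model._meta.app_label in self.master_apps else "site_a"

        def db_for_write(self, model, **hints):
            return "default" if model._meta.app_label in self.master_apps else "site_a"

        def allow_migrate(self, db, app_label, model_name=None, **hints):
            # schema ("structure") changes reach every database via migrations;
            # the data already stored in each database is left alone
            if app_label in self.master_apps:
                return db == "default"
            return True

Running manage.py migrate --database=site_a for each alias is what keeps the structure in sync across databases without touching their data.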

Related

Django multi-tenant architecture options: what influence on database performance?

I'm designing a website where security of data is an issue.
I've read this book: https://books.agiliq.com/projects/django-multi-tenant/en/latest/index.html
I'm still thinking about the right database structure for users.
And I'm hesitating between a shared database with isolated schemas, isolated databases in a shared app, or completely isolated tenants using Docker.
As security of data is an issue, I would like to avoid putting all the users in the same table, or in different schemas of the same database. However, I don't really understand whether I should put each user in a separate database (create a database per user, SQLite for example, though I don't know whether that would work well alongside Postgres). What is the best practice for this in terms of security?
I'm wondering how these options affect database speed compared to a shared database with a shared schema, which was the basic configuration in the Django course I attended.
I don't have good knowledge of databases, so your help on the performance question would be very much appreciated!
Also, if I want to do some stats using tenants' data, how difficult is it to query completely isolated tenants (Docker) or isolated databases, in particular if each user is a separate Docker container or database?

Use of Redis cluster vs standalone Redis

I have a question about when it makes sense to use a Redis cluster versus standalone Redis.
Suppose one has a real-time gaming application that allows multiple instances of the game, and we wish to implement
a real-time leaderboard for each instance. (Games are created by communities of users.)
Suppose that at any time we have, say, 100 simultaneous matches running.
Based on the use cases outlined here:
https://d0.awsstatic.com/whitepapers/performance-at-scale-with-amazon-elasticache.pdf
https://redislabs.com/solutions/use-cases/leaderboards/
https://aws.amazon.com/blogs/database/building-a-real-time-gaming-leaderboard-with-amazon-elasticache-for-redis/
We can implement each leaderboard as an in-memory Sorted Set.
Now I would like to add some sort of persistence where leaderboard state is saved at the end of each
game as a snapshot, so each of these independent Sorted Sets is saved as a snapshot file.
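Concretely, with a client such as redis-py I would expect each per-match leaderboard to be just a couple of calls on a standalone instance (the key naming scheme here is only an example):

    import redis

    r = redis.Redis()  # a single standalone instance

    def record_score(match_id: str, player: str, score: float) -> None:
        # ZADD keeps members ordered by score
        r.zadd(f"leaderboard:{match_id}", {player: score})

    def top_players(match_id: str, n: int = 10):
        # highest scores first, together with their scores
        return r.zrevrange(f"leaderboard:{match_id}", 0, n - 1, withscores=True)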
I have a question about design choices:
Would a Redis cluster make sense for this scenario? Or would it make more sense to run standalone Redis instances and create a new database for each game?
As far as I know, a Redis cluster has only a single database, database 0 (https://redis.io/topics/cluster-spec).
In that case, how would snapshotting the datasets for each leaderboard at different times work?
From what I can see, a Redis cluster only makes sense for large-scale monolithic applications and may not be the best approach for the scenario described above. Is that the case?
Or, if one goes with AWS ElastiCache for Redis in cluster mode, can I configure snapshotting for individual datasets?
You are correct: clustering is a way of scaling out to handle really high request loads and to store tons of data.
It really doesn't sound like you need to bother with a cluster.
I'd be quite surprised if a standalone Redis setup were your bottleneck before you had several tens of thousands of simultaneous players.
If you are unsure, you can probably mock up some simulated load and see what it can handle. My guess is that you are better off focusing on other complexities of your game until you start reaching quite serious usage. Which is a good problem to have. :)
You might however want to consider having one or two replica instances, which is a different thing.
Secondly, regardless of whether you cluster or not, why do you want to use snapshots (SAVE or BGSAVE) to persist your scoreboards?
If you want individual snapshots per game, and it's only a few keys per game, why not just have your application read those keys and persist them to a traditional database when needed? You can, for example, use MULTI, DUMP and RESTORE to achieve something very similar to snapshotting, but on the specific keys you want.
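As a rough illustration of that per-game approach with redis-py (the key name and the archive target are assumptions for the sketch):

    import redis

    live = redis.Redis(host="redis-live")
    archive = redis.Redis(host="redis-archive")

    def snapshot_game(match_id: str) -> None:
        key = f"leaderboard:{match_id}"
        payload = live.dump(key)  # Redis' own serialized form of just this key
        if payload is not None:
            # ttl=0 means no expiry; replace=True overwrites an older snapshot
            archive.restore(key, 0, payload, replace=True)

The same function could just as well read the Sorted Set with ZRANGE and write rows to a traditional database instead of restoring into another Redis.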
It doesn't sound like multiple databases are warranted for this.
Multiple databases on clustered Redis are only supported in the Enterprise version, so not on ElastiCache. But the approach mentioned above should work just fine.

Databases on multiple servers for writes and on a single server for reads

I have a database with a single table, busy_table1, on a server, say server_1.
There is another database with a single table, busy_table2, on a server, say server_2.
Those databases are just for writing, and I would like to use AWS Aurora.
Is it possible to use read servers with both databases on the same server?
My goal is to use databases on different servers for writes, but for reads all those databases should be on the same server. I should also be able to scale the read server, and to use Aurora.
Is it possible?
What you are looking for is Aurora Multi-Master. You may want to try out the preview.
https://aws.amazon.com/about-aws/whats-new/2017/11/sign-up-for-the-preview-of-amazon-aurora-multi-master/
You could also try your databases against a single master, as long as you choose one of the biggest instance types. For example, an 8XL or 16XL is large enough that you may not actually need to split your writes across two servers. I would highly encourage profiling single-master as well.

Use Redis for log caching: is it possible to create an eviction policy that evicts to PostgreSQL?

I have a newly written system (in C++) where I expect lots of logging to be done, at least in the beginning until the system proves reliable. I'm planning to store the log messages in a PostgreSQL server, but for efficiency I'd like to cache them in Redis first: I write to Redis, and then once the messages exceed some size I dump them to the persistent database, where they can be browsed later.
I read about Redis' LRU caching and it seems suitable; however, LRU caching is oriented more towards reading data than writing it. In other words, the scenario described there is: if I want to read something from a persistent database, then to avoid hitting that persistent database many times, I take the value, write it to the Redis cache, and reuse it from there. But I'd like to do the opposite: my logging system will write log messages to Redis, and then I'd like them to be "evicted" to PostgreSQL in a predefined schema.
Is there a way for me to write a Redis plugin that would make this possible? I can't seem to find any literature or examples of this.
PS: Please feel free to suggest a better mechanism for log caching.
If you want write efficiency, I suggest LevelDB or RocksDB; both follow the Log-Structured Merge-tree (LSM) design, which has excellent write performance and also good read performance.
Google LevelDB
Facebook RocksDB
If you want to use Redis and Postgres, I think you can use Redis as a job queue.
Write your log messages to the queue, and set up some workers to retrieve the log messages from the queue and write them to Postgres.
In this case, you may consider:
Celery
Or implement a job queue yourself with a Redis List: use LPUSH to store log messages and RPOP (or, better, a blocking BRPOP) to retrieve them in FIFO order, then write them to Postgres.
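A minimal sketch of that list-based queue (the queue name, table and connection settings are made up; psycopg2 is just one way to reach Postgres):

    import json

    import psycopg2
    import redis

    r = redis.Redis()
    pg = psycopg2.connect(dbname="logs")  # assumed connection settings

    def push_log(message: dict) -> None:
        # producer side: called wherever the log messages are received
        r.lpush("log_queue", json.dumps(message))

    def worker() -> None:
        # consumer side: runs as a long-lived process
        cur = pg.cursor()
        while True:
            _, raw = r.brpop("log_queue")  # blocks until a message is available
            msg = json.loads(raw)
            cur.execute(
                "INSERT INTO app_log (level, message) VALUES (%s, %s)",
                (msg.get("level"), msg.get("message")),
            )
            pg.commit()

For more throughput the worker could pop and insert messages in batches, but the overall shape stays the same.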

Should I implement revisioning using database triggers or using django-reversion?

We're looking into implementing audit logs in our application and we're not sure how to do it correctly.
I know that django-reversion works and works well, but there's a cost to using it.
The web server has to make two round trips to the database when saving a record, even if the save is in the same transaction, because (at least in Postgres) the changes are written to the database and committing the transaction is what makes them visible.
So this blocks the web server until the revision is saved to the database if we're not using async I/O, which is currently the case. And even if we did use async I/O, generating the revision's data takes CPU time, which again keeps the web server from handling other requests.
We can use database triggers instead, but our DBA claims that offloading this sort of work to the database would use resources that are meant for handling more transactions.
Is using database triggers for this sort of work a bad idea?
We can scale both the web servers using a load balancer and the database using read/write replicas.
Are there any tradeoffs we're missing here?
What would help us decide?
You need to think about the pattern of DB usage on your website.
That may be unique to you; however, most web apps read from the DB much more often than they write to it. In fact it's fairly common to see optimisations, done to help scale a web app, which trade off more complicated 'save' operations for faster reads. An example would be denormalisation, where some data from related records is copied to the parent record on each save, so as to avoid repeatedly doing complicated aggregate/join queries.
This is just an example, but unless you know your specific situation is different I'd say don't worry about doing a bit of extra work on save.
One caveat would be to consider excluding some models from the revisioning system. For example, if you are using Django's db-backed sessions, the session records are saved on every request, and you'd want to avoid doing unnecessary work there.
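For example, with django-reversion you opt models in explicitly, so high-churn tables such as sessions simply never get revisions; the model and helper below are placeholders:

    import reversion
    from django.db import models

    @reversion.register()  # only registered models get version records
    class Invoice(models.Model):
        number = models.CharField(max_length=32)

    def rename_invoice(invoice, new_number):
        # only writes wrapped in create_revision() produce revision rows
        with reversion.create_revision():
            invoice.number = new_number
            invoice.save()
            reversion.set_comment("Renamed invoice")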
As for doing it via triggers vs a Django app... I think the main considerations here are not to do with performance:
The Django app solution is more 'obvious' and 'maintainable': the app will be in your pip requirements file and in Django's INSTALLED_APPS, it's obvious to other developers that it's there and working, and it doesn't need someone to remember to run custom SQL on the db server when you move to a new server.
With a db trigger solution you can be certain it will run whenever a record is changed by any means, whereas with a Django app, anyone changing records via a psql console will bypass it. Even in the Django ORM, certain bulk operations bypass the model save method and save signals. Sometimes that is desirable, however.
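If you do go the trigger route, you can soften the maintainability concern by shipping the SQL in a migration, so it lives in version control and is applied by manage.py migrate. A rough sketch, assuming PostgreSQL 11+ and placeholder app/table names:

    from django.db import migrations

    AUDIT_SQL = """
    CREATE TABLE IF NOT EXISTS myapp_audit (
        id bigserial PRIMARY KEY,
        table_name text NOT NULL,
        operation text NOT NULL,
        row_data jsonb,
        changed_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION myapp_audit_fn() RETURNS trigger AS $$
    BEGIN
        INSERT INTO myapp_audit (table_name, operation, row_data)
        VALUES (
            TG_TABLE_NAME,
            TG_OP,
            CASE WHEN TG_OP = 'DELETE' THEN to_jsonb(OLD) ELSE to_jsonb(NEW) END
        );
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER myapp_mymodel_audit
    AFTER INSERT OR UPDATE OR DELETE ON myapp_mymodel
    FOR EACH ROW EXECUTE FUNCTION myapp_audit_fn();
    """

    class Migration(migrations.Migration):
        dependencies = [("myapp", "0001_initial")]
        operations = [migrations.RunSQL(AUDIT_SQL, reverse_sql=migrations.RunSQL.noop)]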
Another thing I'd point out is that your production web server will be multi-process/multi-threaded, so although a lengthy db write will block, it only blocks the current process. Your web server will have other processes which are able to serve other requests concurrently, so it won't block the whole web server.
So again, unless you have a pattern of usage where you anticipate a high frequency of concurrent writes to the db, I'd say probably don't worry about it.