using database routers to shard a table - django

I am trying to use django's database routers to shard my database, but I am not able to find a solution for that.
I'd like to define two databases, create the same table in both and then save the even rows in one db, the odd ones in the other one. The examples in the documentation show how to write to a master db and to read from the readonly slaves, which is not what I want, because I don't want to store the whole dataset in both dbs.
Do know any webpage explaining what I am trying to do?
Thank you
PS: I am using Postgresql and I know there are tools to achieve the same goal at DB level. My goal is to study if it can also be done in django and to explore if there are some advantages by doing this.

Related

Archive data after every year

I have lots of models in my project like Advertisements, UserDetails etc. Currently I have to delete the entire database every year so as to not create any conflicts between this year data and previous year data.
I want to implement a feature that can allow me to switch between different years. What can be the best way to implement this?
I think you could switch schemas in PostgreSQL. It's not completely straightforward. There are several ways to do that you can look into. The way I did it was to use a default search path for the Django database user account (e.g. user2018, user2019, etc) that only included the schema I wanted to use. I can't check the exact settings right now because my office network is down. You can also do it in settings.py or in each individual model using db_table according to what I've read, although both those solutions seem more convoluted that using the search path.
You would have to shutdown, change the database username in settings.py (or change the search path in PostgreSQL, change the schema over to a new one, and then run migrate to create the tables again. If you have reference data in any of the tables then schema-to-schema copies are easy to do.
Try searching for change django database schema postgresql to see what options there are for specifying the schema.

Need guidance with creating Django based dashboard

I'm a beginner at Django, and as a practice project I would like to create a webpage with a dashboard to track investments in a particular p2p platform. They do not have a nice dashboard (but provide excel file with all data). As I see it, main steps that I need to do in this project are as follow:
Create login so that users would have account where they upload their excel files.
Make it possible to import excel file to a database
Manipulate/calculate data for it to be later used in dashboard
Create dashboard.
Host webpage.
After some struggle I have implemented point no. 2, and will deal with 1 and 5 later. But number 3 is my biggest issue now.
I'm completely unsure what I need to do, and google did not help. I need to calculate data before I can make dashboard from it. Union two of the tables, and then join them together with a third table, creating some additional needed calculated fields. Do I create a view in the database and somehow fetch this data to Django? Or do I need to create some rules so that new table would be created at the time of the import? I think having table instead of a view would have better performance. Or maybe I'm doing it completely wrong, and should take completely different approach for this kind of task? Also, is SQLite a good database for a task (I'm using it, because it was a default in Django)?
I assume for vizualization part I will need to do it with some JavaScript library, such as D3? Which then would use data from step 3.
For part 3 there is 2 way, either do these stuff and save the result in your database or you can do it when you need it using django model features like annotation, aggregation and etc.
Option 1 requires to add a table for you calculation which is Models in django.
Option 2 requires to create a doing the annotations in a view or model managers and then using them in views.
Django docs: Aggregation
Which is the best is depended on how big your data is, how complicated the calculation is and how often you need them.
And for database; SQLite is just a database for development use not the production and surly not with a lot of data and a lot of calculations. The recommended database for django is postgresql which is pretty good at handling millions and even billions of data and doing heavy calculation.
And for vizualization you should handle it on the template side which is basically HTML, CSS and JS.

One table per month in django

I'm looking for a way to divide tables with logs of user actions in Django. The site features e-books, with reader and I need to track views in catalog, what users read and what they buy. I'm afraid that these tables might be too big to be useful.
For example Piwik analytics solves this with database tables where each one table corresponds to one month. I would like to do the same with Django. But as I can see there is no way to do it. Is there some similar way to Django ORM to accomplish this?

A simple file for saving a class of vectors or a SQL database

I have a database that is made of sorted data from the user activities. If I wanted to keep a record of each users that which record belong to which user (like a class of vectors of numbers for each users), what is the best database type that I can use here? The speed is important and the database is very large (9 Gig ~ 700 million record).The number of users is around 2 million, so I don't think that a relational connection in SQL would be a good suggestion. (Coding are in C++).
I am going to provide an answer now based on our conversation in the comments as I have too much to write in a comment.
First of all, I would use a full RDBMS for this rather than SQLite. The Lite part of the name should serve as an indicator that it isn't trying to be a full strength database. I am just saying this because if SQLite does not perform well enough on your large database, I don't want you to blame it on RDBMS technology, but on the weak database that you are using. Choose PostgreSQL or MySQL as they have better optimizers (you don't have to code it).
Second your database should provide the features to join the tables together. It would look something like:
Select *
From users
Join activity on users.id = activity.user_id
Where users.id = ###
That combined with the appropriate indexes should give you what you need.
As far as indexes, your primary keys should produce the appropriate indexes for this join. You can also create a foreign key definition so that the database knows the relationship between the tables, and can enforce it. Some databases do not support foreign key constraints, but that is not critical.
A relational SQL database can handle this just well.
Use PostGreSQL
You can use ODBC from C, that way you can change the database should the need arise.
If your data is not really relational, you can also use redis.
http://code.google.com/p/credis/
Since its a sorted set of data, you can event go for a NoSQL or Bigtable database. HBase, Hadoop, etc are provided OpenSouce resources for you.

Coldfusion: Move data from one datasource to another

I need to move a series of tables from one datasource to another. Our hosting company doesn't give shared passwords amongst the databases so I can't write a SQL script to handle it.
The best option is just writing a little coldfusion scripty that takes care of it.
Ordinarily I would do something like:
SELECT * INTO database.table FROM database.table
The only problem with this is that cfquery's don't allow you to use two datasources in the same query.
I don't think I could use a QoQ's either because you can't tell it to use the second datasource, but to have a dbType of 'Query'.
Can anyone think of any intelligent ways of getting this done? Or is the only option to just loop over each line in the first query adding them individually to the second?
My problem with that is that it will take much longer. We have a lot of tables to move.
Ok, so you don't have a shared password between the databases, but you do seem to have the passwords for each individual database (since you have datasources set up). So, can you create a linked server definition from database 1 to database 2? User credentials can be saved against the linked server, so they don't have to be the same as the source DB. Once that's set up, you can definitely move data between the two DBs.
We use this all the time to sync data from our live database into our test environment. I can provide more specific SQL if this would work for you.
You CAN access two databases, but not two datasources in the same query.
I wrote something a few years ago called "DataSynch" for just this sort of thing.
http://www.bryantwebconsulting.com/blog/index.cfm/2006/9/20/database_synchronization
Everything you need for this to work is included in my free "com.sebtools" package:
http://sebtools.riaforge.org/
I haven't actually used this in a few years, but I can't think of any reason why it wouldn't still work.
Henry - why do any of this? Why not just use SQL manager to move over the selected tables usign the "import data" function? (right click on your dB and choose "import" - then use the native client and permissions for the "other" database to specify the tables. Your SQL manager will need to have access to both DBs, but the db servers themselves do not need access to each other. Your manager studio will serve as a conduit.