Suggestions for splitting up a Django model of 1000+ fields - django

I am looking for suggestions on how to deal with a database table that has 1000 or more columns, which I am trying to translate into one or many Django models. The data also needs to be filterable through API calls via the URL, and every field needs to be usable to filter the rest of the data.
I have come up with a few solutions and would like input or resources related to them:
Just have a model with 1000+ fields - This seems like it would be a nightmare to maintain and would require a lot of brute-force coding, but would likely work fine if data were selectively returned.
Use a JSON field to store all less frequently accessed data - The issue here would be difficulty in filtering the data using Django Filters.
Split the data into related models connected by one-to-one relationships (as I understand it, this cuts down on join operations). - This would seem to require more coding than the first option but would be more maintainable.
Does anyone have any information or resources on dealing with database tables of this size?

You should absolutely split the model into multiple linked models.
Because of how Django models data in the database, you should generally have a 1:1 relationship between models and tables, and your tables should be normalized to at least third normal form.
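As a rough illustration of such a split (the model and field names here are hypothetical, not taken from the question), less frequently used columns can be pushed into a second model linked with a OneToOneField, so the wide table becomes several narrow, normalized ones:

from django.db import models

class Measurement(models.Model):
    # Frequently queried / filtered columns stay on the main model.
    station_id = models.CharField(max_length=32, db_index=True)
    recorded_at = models.DateTimeField(db_index=True)
    temperature = models.FloatField(null=True)

class MeasurementDetails(models.Model):
    # Less frequently used columns live in a separate table and are
    # joined only when explicitly requested.
    measurement = models.OneToOneField(
        Measurement,
        on_delete=models.CASCADE,
        related_name="details",
    )
    humidity = models.FloatField(null=True)
    pressure = models.FloatField(null=True)

# The join happens only when asked for, and related fields remain filterable:
# Measurement.objects.select_related("details").filter(details__humidity__gt=50)

Because filtering across the link uses the ordinary double-underscore syntax shown in the comment, URL-driven filters (e.g. via django-filter) can still target fields that live on the related model.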

Related

Handle large amounts of time series data in Django while preserving Django's ORM

We are using Django with its ORM in connection with an underlying PostgreSQL database and want to extend the data model and technology stack to store massive amounts of time series data (~5 million entries per day and up).
The closest questions I found were this and this, which propose combining Django with databases such as TimescaleDB or InfluxDB. But this creates parallel structures to Django's built-in ORM and thus does not seem to be straightforward.
How can we handle large amounts of time series data while preserving or staying really close to Django's ORM?
Any hints on proven technology stacks and implementation patterns are welcome!
Your best option is to keep your relational data in Postgres and your time series data in a separate database, combining them when needed in your code.
With InfluxDB you can do this join with a Flux script by passing it the SQL that Django's ORM would execute, along with your database connection info. This will return your data in InfluxDB's format though, not Django models.
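A rough sketch of that approach, assuming the influxdb-client Python package and a Flux sql.from() join (the exact Flux functions and parameters vary by InfluxDB version, so treat this as illustrative only):

from influxdb_client import InfluxDBClient  # assumes the influxdb-client package
from myapp.models import Sensor  # hypothetical Django model

# The SQL Django's ORM would execute for a queryset:
orm_sql = str(Sensor.objects.filter(active=True).query)
# Note: this string quotes identifiers with double quotes, so it needs escaping
# (or simplifying by hand) before being embedded in the Flux source below.

flux = '''
import "sql"

pg = sql.from(
    driverName: "postgres",
    dataSourceName: "postgresql://user:pass@localhost/mydb",
    query: "SELECT id AS sensor_id, name FROM myapp_sensor WHERE active"  // ORM SQL goes here
)

ts = from(bucket: "sensors") |> range(start: -1d)

join(tables: {meta: pg, data: ts}, on: ["sensor_id"])
'''

client = InfluxDBClient(url="http://localhost:8086", token="...", org="my-org")
tables = client.query_api().query(flux)  # returns Flux tables, not Django model instances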
Why not use TimescaleDB in parallel with your existing Postgres for the time series data, and use this Django integration for it: https://pypi.org/project/django-timescaledb/
Using multiple databases in Django is possible, although I have not done it myself so far. Have a look here for a convenient way to do it (routing certain models to another database instead of the default Postgres one):
Using Multiple Databases with django
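A minimal sketch of that routing idea, assuming a second entry named "timeseries" in settings.DATABASES and a hypothetical app label "metrics" for the time series models:

# routers.py -- send the metrics app's models to the "timeseries" database,
# everything else to the default Postgres connection.
class TimeseriesRouter:
    route_app_labels = {"metrics"}

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return "timeseries"
        return None  # fall through to "default"

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return "timeseries"
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label in self.route_app_labels:
            return db == "timeseries"
        return db == "default"

# settings.py
# DATABASE_ROUTERS = ["myproject.routers.TimeseriesRouter"]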

preferred way of designing model in Django

I have a very simple model in Django with 3 fields. The issue is that I have tens of billions of rows that need to be stored in the table associated with that model in a PostgreSQL database. I stored around 100 million rows, and then my server first got sluggish and then gave me an Nginx "502 Bad Gateway" error when clicking on that table in the Django admin or requesting data through the API.
I was wondering what the best way is to handle this simple scenario of having massive amounts of data in Django. What's the best model design? Should I split my model in order to split my table?
Thanks,

Need guidance with creating Django based dashboard

I'm a beginner at Django, and as a practice project I would like to create a webpage with a dashboard to track investments in a particular P2P platform. They do not have a nice dashboard (but they provide an Excel file with all the data). As I see it, the main steps I need to take in this project are as follows:
Create a login so that users have an account where they can upload their Excel files.
Make it possible to import the Excel file into a database.
Manipulate/calculate the data so it can later be used in the dashboard.
Create the dashboard.
Host the webpage.
After some struggle I have implemented point no. 2, and will deal with 1 and 5 later. But number 3 is my biggest issue now.
I'm completely unsure what I need to do, and Google did not help. I need to calculate the data before I can build a dashboard from it: union two of the tables, then join them with a third table, creating some additional calculated fields that I need. Do I create a view in the database and somehow fetch this data into Django? Or do I need to create some rules so that a new table is created at the time of the import? I think having a table instead of a view would give better performance. Or maybe I'm doing this completely wrong and should take a completely different approach for this kind of task? Also, is SQLite a good database for the task (I'm using it because it was the default in Django)?
I assume for the visualization part I will need to use some JavaScript library, such as D3, which would then use the data from step 3.
For part 3 there are two ways: either do these calculations and save the results in your database, or do them when you need them using Django model features like annotation and aggregation.
Option 1 requires adding a table for your calculations, which in Django means another model.
Option 2 requires doing the annotations in a view or in model managers and then using them in your views.
Django docs: Aggregation
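For example (the model and field names below are made up for illustration, not part of the question), option 2 could look roughly like this:

from django.db.models import F, Sum
from myapp.models import Loan  # hypothetical model holding the imported Excel rows

def portfolio_summary(user):
    # Computed on demand at query time -- nothing extra is stored in the database.
    return (
        Loan.objects.filter(owner=user)
        .values("platform")
        .annotate(
            invested=Sum("amount"),
            earned=Sum(F("repaid") - F("amount")),
        )
        .order_by("platform")
    )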
Which one is best depends on how big your data is, how complicated the calculations are, and how often you need them.
As for the database: SQLite is a database for development use, not for production, and surely not with a lot of data and a lot of calculations. The recommended database for Django is PostgreSQL, which is pretty good at handling millions and even billions of rows and doing heavy calculations.
And for visualization, you should handle it on the template side, which is basically HTML, CSS, and JS.

Django, each user having their own table of a model

A little background: I've been developing the core code of an application in Python, and now I want to implement it as a website for the user, so I've been learning Django and have come across a problem I'm not sure how to approach. I also have little experience dealing with databases.
Each user would be able to populate their own list, each with the same attributes. What seems to be the solution is to create a single model defining the attributes etc., have users save records to it, and very frequently change the values of the attributes of the records they have added (maybe every 5-10 seconds or so), using filters to filter down to their user ID. Each user would add on average 4,000 records to this model, so for just 1,000 users this table would have 4 million rows, and for 10,000 users we get 40 million rows. To me it seems this would impact the speed of content delivery a lot?
To me a faster solution would be to define the model and then have each user get their own instance of this table of roughly 4,000 records. From what I'm learning this would use more memory and disk space, but a faster user experience is my primary goal.
Is this just my thinking because I don't have experience with databases? Or are my concerns warranted, and should I find a way to do the latter?
This post asked the same question, I believe, but offers no solution on how to achieve it: How to create one Model (table) for each user on django?
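For reference, the single shared table described in the question would usually be modeled with a ForeignKey to the user plus an index, roughly like this (field names are made up for illustration):

from django.conf import settings
from django.db import models

class Record(models.Model):
    owner = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        related_name="records",  # ForeignKey columns are indexed by default
    )
    name = models.CharField(max_length=100)
    value = models.FloatField()

    class Meta:
        indexes = [
            models.Index(fields=["owner", "name"]),  # composite index for per-user lookups
        ]

# Fetching one user's ~4,000 rows is an indexed lookup, regardless of the table's total size:
# Record.objects.filter(owner=request.user)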

Warehousing records from a flat item table: Django Signals or PostgreSQL Triggers?

I have a Django website with a PostgreSQL database. There is a Django app and model for a 'flat' item table with many records being inserted regularly, up to millions of inserts per month. I would like to use these records to automatically populate a star schema of fact and dimension tables (initially also modeled in the Django models.py), in order to efficiently do complex queries on the records, and present data from them on the Django site.
Two main options keep coming up:
1) PostgreSQL Triggers: Configure the database directly to insert the appropriate rows into fact and dimension tables, based on creation or update of a record, possibly using PL/Python or PL/pgSQL and row-level AFTER triggers. Pros: Works with inputs outside Django; might be expected to be more efficient. Cons: Splits business logic to another location; triggered inserts may not be expected by other input sources.
2) Django Signals: Use the Signals feature to do the inserts upon creation or update of a record, with the built-in signal django.db.models.signals.post_save. Pros: easier to build and maintain. Cons: Have to repeat some code or stay inside the Django site/app environment to support new input sources.
Am I correct in thinking that Django's built-in signals are the way to go for maintaining the fact table and the dimension tables? Or is there some other, significant option that is being missed?
I ended up using Django Signals. With a flat table "item_record" containing fields "title" and "description", the code in models.py looks like this:
from django.db.models.signals import post_save

def create_item_record_history(sender, instance, created, **kwargs):
    # Only copy the record into the history table when it is first created,
    # not on subsequent updates.
    if created:
        ItemRecordHistory.objects.create(
            title=instance.title,
            description=instance.description,
            created_at=instance.created_at,
        )

post_save.connect(create_item_record_history, sender=ItemRecord)
It is running well for my purposes. Although it's just creating an annotated flat table (new field "created_at"), the same method could be used to build out a star schema.
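For comparison, the trigger-based route (option 1) could be installed from a Django migration so the SQL still lives in the codebase; a rough sketch, with hypothetical table and column names (the EXECUTE FUNCTION syntax requires PostgreSQL 11+):

from django.db import migrations

TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION copy_item_record_to_history() RETURNS trigger AS $$
BEGIN
    INSERT INTO myapp_itemrecordhistory (title, description, created_at)
    VALUES (NEW.title, NEW.description, NEW.created_at);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER item_record_history_trigger
AFTER INSERT ON myapp_itemrecord
FOR EACH ROW EXECUTE FUNCTION copy_item_record_to_history();
"""

class Migration(migrations.Migration):
    dependencies = [("myapp", "0002_itemrecordhistory")]

    operations = [
        migrations.RunSQL(
            sql=TRIGGER_SQL,
            reverse_sql=(
                "DROP TRIGGER IF EXISTS item_record_history_trigger ON myapp_itemrecord; "
                "DROP FUNCTION IF EXISTS copy_item_record_to_history();"
            ),
        ),
    ]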