I am using Celery for my async tasks, with Redis as the backend/broker. My current plan is to filter and manipulate the data inside that backend and then store the results in django-db for viewing etc.
Is this the recommended way? Or should I use the Django DB as the results backend, store all the raw data there, and then filter and manipulate it into different tables?
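For reference, a minimal sketch of the second option (Redis as broker, django-celery-results providing the django-db result backend); this assumes the standard namespace='CELERY' Celery setup, and the task below is purely illustrative:

# settings.py
CELERY_BROKER_URL = "redis://localhost:6379/0"   # Redis only carries the task messages
CELERY_RESULT_BACKEND = "django-db"              # raw results go into the Django DB
INSTALLED_APPS = [
    # ...
    "django_celery_results",
]

# tasks.py
from celery import shared_task

@shared_task
def process_raw_rows(raw_rows):
    # Filter/manipulate here, save the cleaned data into your own Django models,
    # and return only a small summary so the result backend stays lightweight.
    cleaned = [row for row in raw_rows if row.get("valid")]   # hypothetical filter
    return len(cleaned)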
We are using Django with its ORM in connection with an underlying PostgreSQL database and want to extend the data model and technology stack to store massive amounts of time series data (~5 million entries per day onwards).
The closest questions I found were this and this, which propose combining Django with databases such as TimescaleDB or InfluxDB. But this creates parallel structures to Django's built-in ORM and thus does not seem to be straightforward.
How can we handle large amounts of time series data while preserving or staying really close to Django's ORM?
Any hints on proven technology stacks and implementation patterns are welcome!
Your best option is to keep your relational data in Postgres and your time series data in a separate database, and combine them when needed in your code.
With InfluxDB you can do this join with a Flux script by passing it the SQL that Django's ORM would execute, along with your database connection info. Note that this returns your data in InfluxDB's format, not as Django models.
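A rough sketch of that approach, using the influxdb-client Python package and Flux's sql.from(); the bucket, measurement, table and credentials below are illustrative assumptions:

from influxdb_client import InfluxDBClient

flux = '''
import "sql"

sensors = sql.from(
  driverName: "postgres",
  dataSourceName: "postgresql://user:pass@localhost/mydb",
  query: "SELECT id, name FROM myapp_sensor"  // the SQL Django's ORM would run
)

readings = from(bucket: "telemetry")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "reading")

join(tables: {meta: sensors, data: readings}, on: ["id"])
'''

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org") as client:
    tables = client.query_api().query(flux)  # Flux tables, not Django model instances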
Why not run a TimescaleDB in parallel to your existing Postgres for the time series data, and use this Django integration for the latter: https://pypi.org/project/django-timescaledb/.
Using multiple databases in Django is possible, although I have not done it myself so far. Have a look here for a convenient way to do it (reroute certain models to another database instead of the default Postgres one):
Using Multiple Databases with Django
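A minimal sketch of that setup, with a second DATABASES alias named "timescale" and a hypothetical Metric model (check the django-timescaledb docs for the exact engine path):

# settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
        # ...
    },
    "timescale": {
        "ENGINE": "timescale.db.backends.postgresql",  # engine shipped by django-timescaledb
        "NAME": "metrics",
        # ...
    },
}

# Until a database router is configured, query the time series DB explicitly:
# Metric.objects.using("timescale").filter(time__gte=some_cutoff)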
I am connecting to a legacy MySQL database in the cloud from my Django project.
I need to fetch data and insert data into the DB tables if required.
Do I need to write models for the tables which are already present in the DB?
If so, there are 90-plus tables. So what happens if I create a model for a single table?
How do I talk to the database other than by creating models and migrating? Is there a better way in Django, or are models the better way?
When I create a model, what happens in the backend? Does it create those tables again in the same database?
There are several ways to connect to a legacy database; the two I use are either by creating a model for the data you need from the legacy database, or using raw SQL.
For example, if I'm going to be connecting to the legacy database for the foreseeable future, and performing both reads and writes, I'll create a model containing only the fields I need from the foreign table, pointed at a secondary database. That method is well documented and a bit more time consuming.
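A rough sketch of such a model; the table and field names are made up, and managed = False keeps Django's migrations from trying to create or drop the existing table (which also answers the question above about tables being created again):

# models.py
from django.db import models

class LegacyCustomer(models.Model):
    customer_id = models.IntegerField(primary_key=True)
    email = models.CharField(max_length=255)

    class Meta:
        managed = False            # Django will not create, alter or drop this table
        db_table = "customers"     # the existing table in the legacy database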
However, if I'm only reading data from a legacy database which will be retired, I'll create a read-only user on the legacy database, and use raw SQL to grab what I need like so:
from django.db import connections

# Open a cursor against the secondary database defined in settings.DATABASES
cursor = connections["my_secondary_db"].cursor()
cursor.execute("SELECT * FROM my_table")
for row in cursor.fetchall():
    insert_data_into_my_new_system_model(row)
I'm doing this right now with a legacy SQL Server database from our old website as we migrate our user and product data to Django/PostgreSQL. This has served me well over the years and saved a lot of time. I've used this to create sync routines all within a single app as Django management commands, and then when the legacy database is done being migrated to Django, I've completely deleted the single app containing all of the sync routines, for a clean break. Good luck!
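As a sketch, one of those sync routines can live in a management command; the command name and the helper reused from above are illustrative:

# myapp/management/commands/sync_legacy_data.py
from django.core.management.base import BaseCommand
from django.db import connections

class Command(BaseCommand):
    help = "One-off sync of rows from the legacy database"

    def handle(self, *args, **options):
        cursor = connections["my_secondary_db"].cursor()
        cursor.execute("SELECT * FROM my_table")
        for row in cursor.fetchall():
            insert_data_into_my_new_system_model(row)  # hypothetical helper from above
        self.stdout.write(self.style.SUCCESS("Legacy sync complete"))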
For the app I'm building I need to be able to create a new data model in models.py as fast as possible automatically.
I created a way to do this by making a separate Python program that opens models.py, edits it, closes it, and runs the migrations automatically, but there must be a better way.
Edit: my method works on my local server but not on PythonAnywhere.
In the Django documentation, I found SchemaEditor, which is exactly what you want. Using the SchemaEditor, you can create models, delete models, add fields, delete fields, etc.
Here's an excerpt:
Django’s migration system is split into two parts; the logic for calculating and storing what operations should be run (django.db.migrations), and the database abstraction layer that turns things like “create a model” or “delete a field” into SQL - which is the job of the SchemaEditor.
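A minimal sketch of using the SchemaEditor directly; the dynamically built model here is a made-up example:

from django.db import connection, models

# Build a model class at runtime instead of editing models.py
DynamicThing = type(
    "DynamicThing",
    (models.Model,),
    {
        "__module__": "myapp.models",        # hypothetical app
        "name": models.CharField(max_length=100),
    },
)

# Create the backing table directly, without writing a migration file
with connection.schema_editor() as schema_editor:
    schema_editor.create_model(DynamicThing)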
Don't rewrite your models.py file automatically, that is not how it's meant to work. When you need more flexibility in the way you store data, you should do the following:
Think hard about what kind of data you want to store and make your data model more abstract to fit more cases, if needed.
Use JSON fields to store arbitrary JSON data with your model (e.g. on a Postgres database); see the sketch after this list.
If that's not a fit, don't use Django's ORM and use a different store (e.g. Redis for key-value data or MongoDB for JSON documents).
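A short sketch of the JSONField option, assuming Django 3.1+ where models.JSONField works on all supported backends (the model and field names are invented):

from django.db import models

class Measurement(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    # Arbitrary, schema-less payload lives in a single JSON column
    payload = models.JSONField(default=dict)

# Usage:
# Measurement.objects.create(payload={"sensor": "A1", "value": 3.7})
# Measurement.objects.filter(payload__sensor="A1")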
I will add a procedure to a Django app where I need to store data, but only for a few hours, and I don't want to add another table to my DB schema (which is kind of big), so I'm thinking of using Redis for the task. In the end, what I want to achieve is to have a Transfer model that always uses another database for its CRUD operations.
Example:
Transfer.objects.all() # Always be the same as Transfer.objects.using('redis').all()
OtherModel.objects.all() # Always use default DB
# Same for save
transfer_instance.save() # Always translate to transfer_instance.save(using='redis')
other_instance.save() # Work as usual using the default DB
How can I achieve this? I don't mind using obscure trickery as long as it works.
Thanks!
You will need to use a Database Router to achieve what you need.
Here is the official documentation for Using Database Routers
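A minimal sketch of such a router, assuming the Transfer model lives in an app labelled "transfers" and the secondary connection is configured under a "transfer_db" alias in settings.DATABASES (note that the ORM needs a real database backend behind that alias):

# routers.py
class TransferRouter:
    route_app_labels = {"transfers"}      # hypothetical app label

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return "transfer_db"
        return None                       # fall through to the default database

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return "transfer_db"
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if app_label in self.route_app_labels:
            return db == "transfer_db"
        return None

# settings.py
# DATABASE_ROUTERS = ["myproject.routers.TransferRouter"]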
I have a pandas DataFrame with a loose wrapper class around it that provides metadata for my Django/DRF application. The application is basically a user-friendly (non-programmer) way to do some data analysis and validation. Between requests I want to be able to save the state of the DataFrame so I can have a series of interactions with the data, but it does not need to be saved in a database (it only needs to survive as long as the browser session). From this it was logical to check out Django's session framework, but from what I've heard session data should be lightweight, and the DataFrame object does not serialize to JSON.
Because I don't have a ton of users, and I want the app to feel like a desktop site, I was thinking of using the Django cache as a way to keep the DataFrame object in memory. Putting the data in the cache would go something like this:
>>> from django.core.cache import caches
>>> cache1 = caches['default']
>>> cache1.set(request.session.session_key, dataframe_object)
and then using get in the same way in the following requests to access it.
Is this a good way to handle this workflow, or is there another system I should use to keep rather large data (5 MB to 100 MB) in memory?
If you are running your application on a modern server, then 100 MB is not a huge amount of memory. However, if you have more than a couple of dozen simultaneous users, each requiring 100 MB of cache, this could add up to more memory than your server can handle. Your cache and server should be configured appropriately, and you may want to limit the total number of cached DataFrames in your Python code.
Since it does appear that Django needs to serialize session data, your choice is either to use sessions with the PickleSerializer or to use the cache. According to the documentation, PickleSerializer is not recommended for security reasons, so your choice to use the cache is a good one.
The default cache backend in Django (local memory) does not share entries across processes, so you would get better memory and time efficiency by installing memcached and enabling the memcached.MemcachedCache backend.
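For completeness, a sketch of what that configuration could look like; the MemcachedCache backend path matches the one mentioned above (recent Django versions replace it with PyMemcacheCache or PyLibMCCache), and the view helpers are illustrative:

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
        "TIMEOUT": 60 * 60,   # keep entries for roughly a browser session
    },
}
# Note: memcached's default 1 MB item size limit usually needs raising
# (memcached -I) before 5-100 MB DataFrames will fit.

# somewhere in a view
from django.core.cache import caches

def save_frame(request, dataframe_object):
    caches["default"].set(request.session.session_key, dataframe_object)

def load_frame(request):
    return caches["default"].get(request.session.session_key)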