Determining popularity of an item within a given date/time range - django

I'm working on a Django website that needs to track the popularity of items within a given date/time range. I'll need the ability to show most viewed today, this week, all time, etc.
There is a "django-popularity" app on GitHub that looks promising, but it only works with MySQL (I'm using PostgreSQL).
My initial thoughts are to create a generic ViewCounter model that logs views for all the objects that are tracked, and then run a cron job that crunches those numbers into the relevant time-based statistics for each item.
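For concreteness, here is a rough sketch of the kind of model I have in mind, using Django's contenttypes framework so any object can be tracked. All names here are placeholders, not an existing app's API:

    from django.contrib.contenttypes.fields import GenericForeignKey
    from django.contrib.contenttypes.models import ContentType
    from django.db import models


    class ViewCounter(models.Model):
        """One row per view of any tracked object (sketch only)."""
        content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
        object_id = models.PositiveIntegerField()
        content_object = GenericForeignKey("content_type", "object_id")
        viewed_at = models.DateTimeField(auto_now_add=True)

        @classmethod
        def add_view(cls, obj):
            """Log a single view of any object."""
            return cls.objects.create(content_object=obj)

        @classmethod
        def views_since(cls, obj, since):
            """Count views of obj since a given datetime (today, this week, ...)."""
            return cls.objects.filter(
                content_type=ContentType.objects.get_for_model(obj),
                object_id=obj.pk,
                viewed_at__gte=since,
            ).count()

The cron job would then aggregate these rows into per-item daily/weekly/all-time totals so the listing pages don't have to count raw rows on every request.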
Looking forward to hearing your ideas.

Did you try django-popularity with Postgres? The GitHub page just says that the developer has not tested it with anything other than MySQL.

The app has only been tested with MySQL, but it should fully work for Postgres with a few adjustments. If you do manage to get it to work, please inform me. (I'm the developer.) I would love to be able to tell people this product is usable with Postgres as well.
Moreover, all the functionality relying on raw SQL checks whether there is actually a MySQL database in use. If not, it should throw an assertion error.
Also, the generic view counter is already in my package (it's called ViewTracker, but hell). The cron job seems like too much of a hassle to me when we could do either SQL or Django caching instead.
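To make that concrete, the guard can be as simple as checking which backend Django is talking to before running backend-specific SQL. This is only an illustration of the idea, not the actual django-popularity code, and the table name below is made up:

    from django.db import connection


    def top_viewed_raw(limit=10):
        # The raw-SQL shortcut is only tested against MySQL; anything else
        # should fail loudly instead of producing wrong results.
        assert connection.vendor == "mysql", "Raw-SQL path requires MySQL"
        with connection.cursor() as cursor:
            cursor.execute(
                "SELECT object_id, COUNT(*) AS views "
                "FROM popularity_viewtracker "  # assumed table name
                "GROUP BY object_id ORDER BY views DESC LIMIT %s",
                [limit],
            )
            return cursor.fetchall()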

Related

Database content doesn't show on production

I am struggling to understand what is missing from my application. Sorry if this seems like a silly question; I am sure it is something quite simple that I am just not seeing.
I created an API using Django REST Framework on my machine and uploaded it to production, but the content of my database didn't come through.
As you can see in the picture, the product list appears as empty,
but on my machine it actually has some information.
It would be redundant for the local and production environments to share the same database. That is why, by default, they have separate database files/services and have to be filled independently.
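A minimal way to see (and manage) this is that each environment reads its own connection details in settings.py, usually from environment variables; the engine and variable names below are just an example, not your actual setup:

    # settings.py (sketch): local and production each point at their own database
    import os

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": os.environ.get("DB_NAME", "local_dev_db"),
            "USER": os.environ.get("DB_USER", "postgres"),
            "PASSWORD": os.environ.get("DB_PASSWORD", ""),
            "HOST": os.environ.get("DB_HOST", "localhost"),
            "PORT": os.environ.get("DB_PORT", "5432"),
        }
    }

If you really want production to start with your local data, the usual route is to export it with manage.py dumpdata and import it on the server with manage.py loaddata.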

Django + PostgreSQL with bi-directional replication

Firstly, let me introduce my use case: I am working on a Django application (a GraphQL API using Graphene) which runs in the cloud but also has local instances on customers' local networks.
For example: one application in the cloud and 3 instances (each a local Django app instance with a PostgreSQL server with BDR enabled) on local networks. If there is a network connection we use bi-directional replication to have fresh data; if there is no connectivity we use the local instances. Here is the simplified infrastructure diagram for illustration.
So, if I want to use BDR I can't do DELETE and UPDATE operations through the ORM. I have to generate UUIDs for my entities, and every change is just a new record with updated data for the same UUID. The latest record for a given UUID is my valid record, and removal is just another flag. Up to this point everything seems fine; the problem starts when I want to use, for example, a many-to-many relationship. The relationship relies on the database primary keys, and I have to handle removal somehow. Can you please help me find the best way to solve this issue? I have a few ideas, but I do not want to make a bad decision:
I can try to override ManyToManyField to work with my UUIDs and a special removal flag. It looks like a nice idea because everything should work as before (Graphene will find the relations etc.), but I am afraid of "invisible" consequences.
Create my own models to simulate the many-to-many relationship (a rough sketch of this approach follows below). It's much more work, but it should work just fine.
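To illustrate idea 2, this is roughly what I have in mind: an explicit link model keyed by my application-level UUIDs, with a removal flag, where the newest row per pair wins. All names here are made up:

    import uuid

    from django.db import models


    class ArticleTagLink(models.Model):
        """Replaces a ManyToManyField between two UUID-identified entities."""
        id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
        article_uuid = models.UUIDField(db_index=True)  # logical id of one side
        tag_uuid = models.UUIDField(db_index=True)      # logical id of the other side
        removed = models.BooleanField(default=False)    # soft-delete flag
        created_at = models.DateTimeField(auto_now_add=True)


    def current_tag_uuids(article_uuid):
        """Newest row per (article, tag) wins; drop pairs whose newest row is removed."""
        links = (
            ArticleTagLink.objects
            .filter(article_uuid=article_uuid)
            .order_by("tag_uuid", "-created_at")
        )
        latest = {}
        for link in links:
            latest.setdefault(link.tag_uuid, link)  # first seen = newest per tag
        return [tag for tag, link in latest.items() if not link.removed]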
Did you have to solve a similar issue before? Is there some kind of good practice, or is this just building a highway to hell (AC/DC is pretty cool)?
Or, if you think there is a better way to build the service architecture, I would love to hear your ideas.
Thanks in advance.

Django Moving lookup table to Redis

I have a Django app with Redis, which is currently used as the broker for Celery and nothing beyond that.
I would like to utilize it further for lookup caching.
Let's say I had a widely used table in my database that I keep hitting for lookups. For the sake of example, let's say it's a mapping of U.S. zip codes to city/state names, or any lookup that is important to my application and may actually change over time.
My questions are:
Once the server starts (in my case, Gunicorn), how do I do a one-time load of the data from the database table into Redis? I mean, where and how do I make this one-time call? Is there a place in the Django framework for such "on load" calls? Or do I simply trigger it lazy-style, upon the first request, which will be served from the database but also trigger a Redis load of the entire table?
What about updates? If the database table is updated somehow (e.g. a row deleted, updated, or added), how do I catch that in order to update the Redis representation of it?
Is there a best-practice or library already geared toward exactly that?
how do I one-time load
For the one-time load you can find an answer here (of those answers, only the urls.py approach worked for me). But I prefer another scenario: I would create a management command and add that command to the script you use to start Gunicorn. For example, if you're using systemd, you could add it to the service config. You can also combine the two, i.e. add the command and also call it from urls.py.
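Here is a minimal sketch of such a management command, assuming a ZipCode(zip, city, state) model and the plain redis-py client; all names are illustrative:

    # myapp/management/commands/load_zipcodes_to_redis.py
    import json

    import redis
    from django.core.management.base import BaseCommand

    from myapp.models import ZipCode


    class Command(BaseCommand):
        help = "One-time load of the zip code lookup table into Redis."

        def handle(self, *args, **options):
            client = redis.Redis(host="localhost", port=6379, db=0)
            pipe = client.pipeline()
            for row in ZipCode.objects.all().iterator():
                pipe.set(
                    "zip:%s" % row.zip,
                    json.dumps({"city": row.city, "state": row.state}),
                )
            pipe.execute()
            self.stdout.write("Loaded zip codes into Redis.")

You would then run manage.py load_zipcodes_to_redis from the same systemd unit (or startup script) that launches Gunicorn.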
What about updates
It really depends on your database. For example, if you use PostgreSQL, you can create a trigger for update/insert/delete with Redis exposed as an external (foreign) table. Django also has a signal mechanism, so you can implement this in Django as well. You could also write your own custom wrapper: in the wrapper you implement your operations plus the syncing with Redis, and you call the wrapper instead of the model directly. But I prefer the first scenario.
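For the signal route, a hedged sketch (same assumed ZipCode model and "zip:<code>" keys as above) would look like this; note that signals only fire for ORM saves and deletes, not for bulk updates or raw SQL, which is one reason the database-trigger route can be safer:

    import json

    import redis
    from django.db.models.signals import post_delete, post_save
    from django.dispatch import receiver

    from myapp.models import ZipCode

    client = redis.Redis(host="localhost", port=6379, db=0)


    @receiver(post_save, sender=ZipCode)
    def update_redis_on_save(sender, instance, **kwargs):
        # Keep the cached entry in step with the row that was just saved.
        client.set(
            "zip:%s" % instance.zip,
            json.dumps({"city": instance.city, "state": instance.state}),
        )


    @receiver(post_delete, sender=ZipCode)
    def remove_from_redis_on_delete(sender, instance, **kwargs):
        # Drop the cached entry when the row goes away.
        client.delete("zip:%s" % instance.zip)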
Is there a best-practice or library already geared toward exactly that?
Sorry I can't help you with this one.

Django: "show databases" functionality?

I have a weird legacy database use-case: I have multiple databases, with (1) exactly the same schema, but (2) very different datasets. Databases, entire databases, with this schema, are being added to the total dataset every week.
Is there a way to (1) introspect the server to find out what databases are available, and if so, is there a way to (2) route to the correct database by URL, rather than by the current per-model solution (since my models don't change, only the associated underlying tables)?
Can this introspection be made dynamic, so every time someone hits the home page I can show them the list of available databases?
A generic solution is preferable, of course, but a MySQL-only solution is currently acceptable.
(The use case is the European Molecular Biology Lab's genome library, which is published every few months as a suite of MySQL database dumps, one database per species, with a core schema of about twenty tables which map nicely to six or so apps. The schema is stable and hasn't changed in years.)
Yes, you are able to run any raw SQL, and show databases is no exception. But it will be hard to change the list of available databases and to switch between them; I'm afraid this would require modification or monkey patching of Django's internals.
Update: Wait! I've looked into the code behind django.db.connections and found that if you just extend settings.DATABASES at runtime, you'll be able to use SomeModel.objects.using('some-new-database').all() in your code. I have not tested it, but I believe this should work!
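An untested sketch of both steps (the introspection is MySQL-only, which the question says is acceptable; the alias, database and model names are made up):

    import copy

    from django.conf import settings
    from django.db import connection, connections


    def list_databases():
        # Raw "show databases" against the default connection.
        with connection.cursor() as cursor:
            cursor.execute("SHOW DATABASES")
            return [row[0] for row in cursor.fetchall()]


    def register_database(alias, db_name):
        # Clone the default settings and point them at another database.
        conf = copy.deepcopy(settings.DATABASES["default"])
        conf["NAME"] = db_name
        settings.DATABASES[alias] = conf       # what the update above suggests
        connections.databases[alias] = conf    # some Django versions keep a separate runtime copy

    # Usage, e.g. for one species database:
    # register_database("homo_sapiens", "homo_sapiens_core_48")
    # Gene.objects.using("homo_sapiens").all()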

Why does the Django community encourage the use of Postgres over MySQL?

The Django community seems to encourage the use of Postgres. I understand that in a big project you probably don't want to use SQLite or the like, but I don't know why they don't like MySQL that much.
Just a quick example, from the Django Book, page 9:
We’re quite fond of PostgreSQL ourselves, for reasons outside the scope of this book, so we mention it first. However, all those engines will work equally well with Django.
The Postgres name is always associated with Django, just like MySQL is always associated with PHP.
The best answer to this question would come from those who write the framework...
But here is what you can get from the documentation:
By default, Django starts a transaction when a database connection is first used and commits the result at the end of the request/response handling. The PostgreSQL backends normally operate the same as any other Django backend in this respect.
Django has strong transaction control, and using a powerful free DBMS that has strong transaction support is a plus...
Also, earlier versions of MySQL (before MySQL 5.0) had some integrity problems, and MySQL's fast storage engine, MyISAM, does not support foreign keys or transactions. So using MyISAM has important minuses alongside its pluses...
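As a small illustration of why that matters, here is a hedged sketch using Django's modern transaction.atomic API (the names are made up); on an InnoDB or Postgres table the two updates commit or roll back together, while on MyISAM the rollback simply cannot happen:

    from django.db import transaction


    def transfer(from_account, to_account, amount):
        # Either both saves are committed, or neither is (requires a
        # transactional storage engine; MyISAM would ignore the rollback).
        with transaction.atomic():
            from_account.balance -= amount
            from_account.save()
            to_account.balance += amount
            to_account.save()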
There is also a wiki on why Postgres is better than MySQL. It's quite old, but it is good.
I have never used PostgreSQL myself, so I can't really say much about the advantages it is supposed to have over MySQL.
But from what I've gathered, transaction handling is better supported by PostgreSQL out of the box. If you use MySQL, chances are that you will use MyISAM as the storage engine, which doesn't support transactions.
https://docs.djangoproject.com/en/dev/topics/db/transactions/#transactions-in-mysql
Maybe the Django devs just got sick and tired of having to deal with bug reports where transactions didn't work, but the problem was due to MyISAM and not Django.
The developers of South (the most widely used database schema migration framework for Django) apparently aren't too fond of MySQL either, as this message, which I've seen quite often with MySQL, suggests:
! Since you have a database that does not support running
! schema-altering statements in transactions, we have had to
! leave it in an interim state between migrations.
[...]
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS.
I've always used Postgres whenever possible, partly because of its maturity and partly because of the PostGIS extension that adds spatial data capabilities to the database. Even if I don't think I'm going to want spatial data in my application at the beginning, it's much easier to add it later if your DB supports it, rather than having to tear out MySQL at a late stage and replace it with Postgres/PostGIS.
I think there is a spatial extension for MySQL now, so you might be able to do spatial operations there too. But Postgres just does it, and has been doing it for years.
Or I could spend $$$$$ for Oracle Spatial, I suppose...
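For a sense of what "just does it" looks like, here is a tiny GeoDjango sketch of the kind of query PostGIS makes easy; the model and field names are only illustrative and assume the postgis database backend plus the GEOS/GDAL libraries:

    from django.contrib.gis.db import models
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D


    class Shop(models.Model):
        name = models.CharField(max_length=100)
        location = models.PointField(geography=True)


    def shops_near(lon, lat, km=5):
        """All shops within `km` kilometres of the given longitude/latitude."""
        here = Point(lon, lat, srid=4326)
        return Shop.objects.filter(location__distance_lte=(here, D(km=km)))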