Synchronized Model instances in Django

I'm building a model for a Django project (my first Django project) and noticed
that instances of a Django model are not synchronized.
a_note = Notes.objects.create(message="Hello") # pk=1
same_note = Notes.objects.get(pk=1)
same_note.message = "Good day"
same_note.save()
a_note.message # Still is "Hello"
a_note is same_note # False
Is there a built-in way to make model instances with the same primary key be
the same object? If so, how does this maintain a globally consistent state of all
model objects, even in the case of bulk updates or changes to foreign keys
that make items enter or exit related sets?
I can imagine some sort of registry in the model class, which could at least handle simple cases (i.e. it would fail in cases of bulk updates or a change in foreign keys). However, the static registry makes testing more difficult.
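The registry idea could be sketched roughly as follows. This is plain Python, not Django: IdentityMap, Note, load_note and the rows dict are hypothetical names invented for the illustration. It only covers the simple case of fetching by primary key, and it would not notice bulk updates or foreign key changes.

```python
import weakref


class IdentityMap:
    """Hypothetical per-process registry: one live object per (class, pk)."""

    def __init__(self):
        # Weak references let instances nobody uses be garbage collected.
        self._instances = weakref.WeakValueDictionary()

    def get(self, cls, pk, loader):
        key = (cls, pk)
        obj = self._instances.get(key)
        if obj is None:
            obj = loader(pk)  # in real code this would be a database fetch
            self._instances[key] = obj
        return obj


class Note:
    """Stand-in for a model instance."""

    def __init__(self, pk, message):
        self.pk = pk
        self.message = message


rows = {1: "Hello"}  # stand-in for the database table


def load_note(pk):
    return Note(pk, rows[pk])


registry = IdentityMap()
a_note = registry.get(Note, 1, load_note)
same_note = registry.get(Note, 1, load_note)
same_note.message = "Good day"
```

Because the registry here is an ordinary object rather than static class state, each test could create its own fresh IdentityMap, which might ease the testing concern.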
I intend to build the (domain) model with high-level functions for complex
operations that go beyond the simple CRUD
actions of Django's Model class. (Some classes of my model hold an instance
of a Django Model subclass, as opposed to being an instance of a subclass. This
is by design, to prevent direct database access that might break consistency and to separate the business logic from the purely data-access-related Django Model.) A complex operation might touch and modify several components. As a developer
using the model API, it's impossible to know which components are out of date after
calling a complex operation. Automatically synchronized instances would mitigate this issue. Are there other ways to overcome this?

TL;DR "Is there a built-in way to make model instances with the same primary key to be the same object?" No.
A Python object in memory isn't the same thing as a row in your database. So when you create a_note and then fetch same_note from the db, those are two different objects in memory, even though they both represent the same underlying row in your database. When you fetch same_note, in fact, you instantiate a new Notes object and initialise it with the values fetched from the database.
Then you change and save same_note, but the a_note object in memory isn't changed. If you did a_note.refresh_from_db() you would see that a_note.message was changed.
Now a_note is same_note will always be False because the location in memory of these two objects will always be different. Two variables are the same (is is True) if they point to the same object in memory.
But a_note == same_note will return True, since Django defines two instances of the same model to be equal if their pk is the same.
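To make the difference between the row, the in-memory copies, equality and identity concrete, here is a minimal sketch using only the standard library. The Note class, its get/save/refresh methods and the notes table are hypothetical stand-ins that mimic the behaviour described above; they are not Django's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, message TEXT)")
conn.execute("INSERT INTO notes (id, message) VALUES (1, 'Hello')")


class Note:
    """Hypothetical stand-in for a model instance: an in-memory copy of one row."""

    def __init__(self, pk, message):
        self.pk = pk
        self.message = message

    @classmethod
    def get(cls, pk):
        # Every call builds a brand new object from the row.
        row = conn.execute(
            "SELECT id, message FROM notes WHERE id = ?", (pk,)
        ).fetchone()
        return cls(*row)

    def save(self):
        conn.execute(
            "UPDATE notes SET message = ? WHERE id = ?", (self.message, self.pk)
        )

    def refresh(self):
        # Re-read the row and overwrite the in-memory copy.
        self.message = Note.get(self.pk).message

    def __eq__(self, other):
        # Equality by primary key, as described for model instances.
        return isinstance(other, Note) and self.pk == other.pk

    def __hash__(self):
        return hash(self.pk)


a_note = Note.get(1)
same_note = Note.get(1)
same_note.message = "Good day"
same_note.save()

stale = a_note.message  # the copy read earlier is unchanged
a_note.refresh()
fresh = a_note.message  # now matches the row again
```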
Note that if the complexity you're talking about is that, with multiple requests, one request might change underlying values that are being used by another request, then use F() expressions so the new value is computed by the database, which avoids race conditions.
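The race that F() expressions avoid can be illustrated at the SQL level with the standard library alone. The notes table and its views column are hypothetical. In Django, the second variant would correspond roughly to Notes.objects.filter(pk=1).update(views=F("views") + 1), assuming the model had such a field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, views INTEGER NOT NULL)")
conn.execute("INSERT INTO notes (id, views) VALUES (1, 0)")

# Read-modify-write: two "requests" read the same value before either writes.
seen_by_a = conn.execute("SELECT views FROM notes WHERE id = 1").fetchone()[0]
seen_by_b = conn.execute("SELECT views FROM notes WHERE id = 1").fetchone()[0]
conn.execute("UPDATE notes SET views = ? WHERE id = 1", (seen_by_a + 1,))
conn.execute("UPDATE notes SET views = ? WHERE id = 1", (seen_by_b + 1,))
lost_update = conn.execute("SELECT views FROM notes WHERE id = 1").fetchone()[0]

# Database-side increment: each statement works on the currently stored value.
conn.execute("UPDATE notes SET views = 0 WHERE id = 1")
conn.execute("UPDATE notes SET views = views + 1 WHERE id = 1")
conn.execute("UPDATE notes SET views = views + 1 WHERE id = 1")
atomic_update = conn.execute("SELECT views FROM notes WHERE id = 1").fetchone()[0]
```

With the first approach one of the two increments is lost; with the second both are applied.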
Within one request, since everything is sequential and single-threaded, there's no risk of variables going out of sync: you know the order in which things are done and can therefore always call refresh_from_db() when you know a previous method call might have changed the database value.
Note also: Having two variables holding the same row means you'll have performed two queries to your db, which is the one thing you want to avoid at all cost. So you should think why you have this situation in the first place.

Related

What is the most Django-appropriate way to combine multiple database columns into one model field?

I have several times come across a want to have a Django model field that comprises multiple database columns, and am wondering what the most Django way to do it would be.
Three use cases come specifically to mind.
I want to provide a field that wraps another field, keeping record of whether the wrapped field has been set or not. A use case for this particular field would be for dynamic configuration. A new configuration value is introduced, and a view marks itself as dependent upon a configuration value, redirecting if the value isn't set. Storing whether it's been set yet or not allows for easy indefinite caching of the state. This also lets the configuration value itself be not-nullable, and the application can ignore any value it might have when unset.
I want to provide a money field that combines a decimal (or integer) value, and a currency.
I want to provide a file field with a link to some manner of access rule to determine whether the request should include it/a request for it should succeed.
For each of the use cases, there exists a workaround, that in each case seems less elegant.
Define the configuration fields as nullable. This is undesirable for a few reasons: it removes the validity of NULL as a value for the configuration itself, so tristates and other valid use cases for NULL have to become a pair of fields, a different data type, or an edge case; null=True on the fields allows them to be set back to None in modelforms and the admin unless a custom FormField is written for them every time; and every nullable column in a database is arguably bad design.
Define the field as a subclass of DecimalField with an argument accepting a string, and use that to contribute another field to the model. (This is what django-money does). Again, this is undesirable: fields are appearing "as if by magic" on the model; and configuring the currency field becomes not obvious.
Define the combined file+rule field instead as an entire model, and one-to-one to it from the model where you want to have the field. This is a solution to all use cases, but again comes with downsides: there's an extra JOIN required for every instance of the field - one can imagine a User with profile_picture, cv, passport, private_key etc.; there's an implicit requirement to .select_related(*fields) on every query that would ever want to access the fields; and the layout of the related model is going to have cold data interleaved with hot data all over the place given that it's reused everywhere.
In addition to solution 3., there's also the option to define a mixin factory that produces the multiple fields with matching names and whatever desired properties and methods. Again this isn't perfect because the user ends up with fields being defined in the model body, but also above that in the inheritance list.
I think the main reason this keeps sending me in circles is because custom Django model fields are always defined in terms of a single base field, because it's done by inheritance.
What is the accepted way to achieve this end?

Should I use JSONField over ForeignKey to store data?

I'm facing a dilemma: I'm creating a new product and I don't want to mess up the way I organise the information in my database.
I have two choices for my models. The first is to use foreign keys to link them together:
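from django.db import models

class Page(models.Model):
    data = models.JSONField()

class Image(models.Model):
    # on_delete is required since Django 2.0; CASCADE is an assumption here
    page = models.ForeignKey(Page, on_delete=models.CASCADE)
    data = models.JSONField()

class Video(models.Model):
    page = models.ForeignKey(Page, on_delete=models.CASCADE)
    data = models.JSONField()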
etc...
The second is to keep everything in Page's JSONField:
class Page(models.Model):
    data = models.JSONField()  # videos, pictures, etc. are stored here
Is one better than the other, and why? This would be a huge help for the way I organise my databases in the future.
I thought the second option might be slower, since every time something changes the whole JSON document would be overwritten. Does that make a big difference, or am I wrong about this?
A JSONField obfuscates the underlying data, making it difficult to write readable code and fully use Django's built-in ORM, validations and other niceties (ModelForms for example). While it gives flexibility to save anything you want to the db (e.g. no need to migrate the db when adding new fields), it takes away the clarity of explicit fields and makes it easy to introduce errors later on.
For example, if you start saving a new key in your data and then try to access that key in your code, older objects won't have it and you might find your app crashing depending on which object you're accessing. That can't happen if you use a separate field.
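A small plain-Python illustration of that failure mode, assuming a hypothetical "caption" key that was added to the JSON after some rows had already been saved:

```python
import json

# Hypothetical stored values: the first row predates the "caption" key.
old_row = json.loads('{"url": "a.png"}')
new_row = json.loads('{"url": "b.png", "caption": "Sunset"}')


def caption_strict(data):
    # Raises KeyError for rows saved before the key existed.
    return data["caption"]


def caption_safe(data, default=""):
    # Every reader of the JSON has to remember to do this.
    return data.get("caption", default)
```

With an explicit model field, the migration that adds the column would give older rows a value, so no such guard is needed in every place that reads it.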
I would always try to avoid it unless there's no other way.
Typically I use a JSONField in two cases:
To save a response from 3rd party APIs (e.g. as an audit trail)
To save references to archived objects (e.g. when the live products in my db change but I still have orders referencing the product).
PostgreSQL, being a relational database, is optimised to perform very well on JOINs, so using ForeignKeys is actually a good thing. Use select_related and prefetch_related in your code to reduce the number of queries made; the queries themselves will scale well even for millions of entries.

Where to clear unused fields: in model or in form?

Where should I clear unused data fields (for example set organization_name to empty string if the Contract model is not related to an organization but is a personal contract)?
Should I do it in model or in form/modelform?
I want to clear unused data fields, among other things to ease equality comparison of two model instances (so that a cleared field would always compare equal).
Which method(s) should I override to clear the unused data? Should I override the Model.save() method?
That depends on your expected behavior and it is for you to choose.
For example, you may want to keep the ability to create, manually from the console or some other interface, an instance whose organization_name is a non-empty string even though it is a personal contract. Putting the clearing logic into Model.save() would prevent that.
If you do want to rule that out, Model.save() is the best place for it. But don't depend on it 100%: such a value can still make its way into the database (for example through a bulk update, which does not call save()) unless you also enforce the rule at the database level.
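As a rough sketch of the save()-based option, here is a plain-Python stand-in. The dataclass, the is_personal flag and the normalize() helper are hypothetical; a real model would override Model.save() and call super().save() after clearing the fields.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    """Hypothetical plain-Python stand-in for the Contract model."""

    is_personal: bool
    organization_name: str = ""

    def normalize(self):
        # Clear fields that carry no meaning for a personal contract.
        if self.is_personal:
            self.organization_name = ""

    def save(self):
        self.normalize()
        # A real model would call super().save(*args, **kwargs) here.


first = Contract(is_personal=True, organization_name="Acme")
second = Contract(is_personal=True)
equal_before = first == second  # differs only in a field that should be empty
first.save()
second.save()
equal_after = first == second
```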

Django default caching of foreign key models

For the life of me I can't find the answer to this question online, although it's a basic one.
I have two models, one referencing the other:
class A(models.Model):
    name = models.CharField(...)
    ...

class B(models.Model):
    a = models.ForeignKey(A, on_delete=models.CASCADE)
Now, I'm keeping an instance of B in memory, and every so often I access b.a.name. Does accessing b.a.name cause a database query every time, so that changes in a.name (made by another process) are seen in my process? Or do I have to query a explicitly each time?
I'm surprised you haven't been able to find any information on this. It's fairly well documented that forward relationships are cached on first usage - what happens is that a cache attribute is created, and subsequent lookups will check this first.
So yes, this means that changes made to that object by another process will not be seen, and you will need to re-query explicitly whenever you want the current value. (Note also that, depending on your database's transaction isolation level, you might not even see the new value on re-querying; you may need to commit the current transaction first.)
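The caching behaviour described above can be mimicked in plain Python. This is only an illustration of the idea, not Django's actual implementation; CachedRelation, load_a, table_a and the queries list are hypothetical names.

```python
class CachedRelation:
    """Hypothetical descriptor: load the related object once, then reuse it."""

    def __init__(self, loader):
        self.loader = loader

    def __set_name__(self, owner, name):
        self.cache_name = "_" + name + "_cache"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Only the first access runs the loader (the "query").
        if self.cache_name not in instance.__dict__:
            instance.__dict__[self.cache_name] = self.loader(instance)
        return instance.__dict__[self.cache_name]


table_a = {1: {"name": "first"}}  # stand-in for the table behind model A
queries = []  # records every simulated database hit


def load_a(b):
    queries.append(b.a_id)
    return dict(table_a[b.a_id])  # a copy, like a freshly built instance


class B:
    a = CachedRelation(load_a)

    def __init__(self, a_id):
        self.a_id = a_id


b = B(a_id=1)
first_read = b.a["name"]
table_a[1]["name"] = "changed elsewhere"  # another process updates the row
second_read = b.a["name"]  # served from the cache, so still the old value
```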

How to modify a queryset and save it as new objects?

I need to query for a set of objects for a particular Model, change a single attribute/column ("account"), and then save the entire queryset's objects as new objects/rows. In other words, I want to duplicate the objects, with a single attribute ("account") changed on the duplicates. I'm basically creating a new account and then going through each model and copying a previous account's objects to the new account, so I'll be doing this repeatedly, with different models, probably using django shell. How should I approach this? Can it be done at the queryset level or do I need to loop through all the objects?
i.e.,
MyModel.objects.filter(account="acct_1")
# Now I need to set account = "acct_2" for the entire queryset,
# and save as new rows in the database
From the docs:
If the object’s primary key attribute is not set, or if it’s set but a
record doesn’t exist, Django executes an INSERT.
So if you set the id or pk to None it should work, but I've seen conflicting responses to this solution on SO: "Duplicating model instances and their related objects in Django / Algorithm for recursively duplicating an object".
This solution should work (thanks @JoshSmeaton for the fix):
models = MyModel.objects.filter(account="acct_1")
for model in models:
    model.id = None
    model.account = "acct_2"
    model.save()
I think in my case the model I'm testing on has a OneToOneField, so it makes sense that my test wouldn't work with this basic solution (a plain copy would point at the same related row and violate the field's uniqueness). But I believe it should work, so long as you take care of any OneToOneFields.
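The rule quoted from the docs (INSERT when the primary key is not set) can be illustrated with a small standard-library sketch. The items table, its label column and the save() helper are hypothetical stand-ins. The sketch also leaves out the second half of the rule, where a key is set but no matching record exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, account TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO items (account, label) VALUES (?, ?)",
    [("acct_1", "x"), ("acct_1", "y")],
)


def save(row):
    """INSERT when the row has no id, otherwise UPDATE the existing record."""
    if row["id"] is None:
        cur = conn.execute(
            "INSERT INTO items (account, label) VALUES (?, ?)",
            (row["account"], row["label"]),
        )
        row["id"] = cur.lastrowid
    else:
        conn.execute(
            "UPDATE items SET account = ?, label = ? WHERE id = ?",
            (row["account"], row["label"], row["id"]),
        )


# Load the source rows fully before writing anything back.
rows = [
    {"id": r[0], "account": r[1], "label": r[2]}
    for r in conn.execute(
        "SELECT id, account, label FROM items WHERE account = 'acct_1'"
    )
]
for row in rows:
    row["id"] = None  # forget the key so save() inserts a new record
    row["account"] = "acct_2"
    save(row)

counts = dict(conn.execute("SELECT account, COUNT(*) FROM items GROUP BY account"))
```

The original rows stay untouched, and each copy gets a fresh id from the database.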