Django - Merge JSONField's new data with the old data

So let's assume I have a model with a JSONField, using Postgres as the database:
class Baum(models.Model):
    myjson = models.JSONField(...)
Now I'd like to know the best way to customize the field's saving behaviour/interaction with the database.
myjson stores nested API responses, so dicts and lists.
When new data comes into myjson, don't delete/overwrite the old data by calling save().
Instead, keep the old data and just add the new data (if a function that checks whether the data is new returns True).
I need the data together, in one myjson field, with new pairs landing on top of it from time to time. I am thankful for tips!
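One possible approach, sketched below as a starting point rather than a definitive answer: keep the merge logic on the model itself, so a plain save() never wipes the old pairs. The default=dict argument and the is_new_data() check are placeholder assumptions; adapt them to however your dicts and lists are nested.
from django.db import models

class Baum(models.Model):
    myjson = models.JSONField(default=dict)  # default=dict is an assumption for this sketch

    def merge_json(self, incoming):
        """Merge incoming top-level pairs into myjson instead of overwriting it."""
        current = self.myjson or {}
        for key, value in incoming.items():
            if self.is_new_data(key, value, current):
                current[key] = value  # keep old pairs, add/replace only the new ones
        self.myjson = current
        self.save(update_fields=["myjson"])

    def is_new_data(self, key, value, current):
        # Placeholder check: treat the pair as new when the key is missing
        # or the stored value differs. Replace with your own proof-of-newness logic.
        return current.get(key) != value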

Related

Bulk delete Django by ids

I'm writing a project using Django REST Framework, Django, and Postgres as the database. I want to bulk delete in one query. Is it possible to do this without writing pure SQL?
Here is an example, but the number of executed queries equals the length of the list of ids (for example, if delete_ids holds 2 ids, Django will execute 2 queries):
delete_ids = [...]
MyModel.objects.filter(id__in=delete_ids).delete()
It's not possible using filter and delete together; use a raw SQL query instead:
https://docs.djangoproject.com/en/2.1/topics/db/sql/
MyModel.objects.raw('DELETE FROM my_model WHERE id IN (%s)', [','.join([list_of_ids])])
I wouldn't advise it for fast deletes, but you can also use
sql.DeleteQuery(MyModel).delete_qs(qs, using=qs.db)
jackotyne's answer is incorrect as a DELETE statement cannot be run with django raw. The idea behind django raw is that it returns a queryset, but DELETE won't do that.
Please read the reply to this answer.
You will need a database cursor as stated in the django documentation.
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute(
        'DELETE FROM "appname_modelname" WHERE id IN (%s)'
        % ', '.join(str(pk) for pk in delete_ids)  # ids must be trusted values
    )
Of course it is better to filter with Django, get a queryset, and do a bulk delete with queryset.delete(), but that is not always possible depending on the data's logic.
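If the ids come from request data, a variant of the cursor approach above that passes them as query parameters instead of interpolating them into the SQL string might look roughly like this (table name is illustrative, as before):
from django.db import connection

delete_ids = [1, 2, 3]

# One placeholder per id; the database driver handles quoting.
placeholders = ', '.join(['%s'] * len(delete_ids))
query = 'DELETE FROM "appname_modelname" WHERE id IN ({})'.format(placeholders)
with connection.cursor() as cursor:
    cursor.execute(query, delete_ids)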

How to save object in cfwheels and save data to an associated table with comma separated values out of an input field?

How do I save my object in cfwheels and also take data from a form input field, in the form of a comma-separated list, and save it to a related table?
My form passes the data to the controller in the params struct.
There is a field "tags" which holds values like "Apple,Pear,Banana".
How can I save this data to a second, related table? Is it possible without a second query and without a transaction?
This is the simplified controller:
public void function create() {
    news = model("News").new(params.news);
    news.save();
}
The normal object data should go to the news table, and the related data to the tags table.
I created the associations in both models.
Based on the info you have provided, it's hard to give more feedback than this:
public void function create() {
    news = model("News").new(params.news); // <- this is enough
    // news.save(); // <- not needed
}
You may need to include the POST/GET data and the models, as well as a description of the actual result versus the expected result.

Django Unique Bulk Inserts

I need to be able to bulk insert large amounts of records quickly, while still ensuring uniqueness in the database. The new records to be inserted have already been parsed and are unique. I'm hoping there is a way to enforce uniqueness at the database level, and not in the code itself.
I'm using MySQL as the database backend. If Django supports this functionality with any other database, I am flexible about changing the backend, as this is a requirement.
Bulk inserts in Django don't use the save method, so how can I insert several hundred to several thousand records at a time, while still respecting unique fields and unique_together fields?
My model structures, simplified, look something like this:
class Example(models.Model):
    class Meta:
        unique_together = (('name', 'number'),)

    name = models.CharField(max_length=50)
    number = models.CharField(max_length=10)
    ...
    fk = models.ForeignKey(OtherModel)
Edit:
The records that aren't already in the database should be inserted, and the records that already exist should be ignored.
As miki725 mentioned, you don't have a problem with your current code.
I'm assuming you are using the bulk_create method. It is true that the save() method is not called when using bulk_create, but the uniqueness of fields is not enforced inside the save() method. When you use unique_together, a unique constraint is added to the underlying table in MySQL when the table is created:
Django:
unique_together = (('name', 'number'),)
MySQL:
UNIQUE KEY `name` (`name`,`number`)
So if you insert a duplicate value into the table using any method (save, bulk_create or even raw SQL), you will get this exception from MySQL:
Duplicate entry 'value1-value2' for key 'name'
UPDATE:
What bulk_create does is build one big query that inserts all the data at once. So if one of the entries is a duplicate, it throws an exception and none of the data is inserted.
1- One option is to use the batch_size parameter of bulk_create and insert the data in a number of batches, so that if one of them fails you only lose the rest of the data in that batch. (It depends how important it is to insert all the data and how frequent the duplicate entries are.)
2- Another option is to write a for loop over the bulk data and insert it one row at a time (see the sketch after this list). This way the exception is thrown for that one row only and the rest of the data is inserted. This queries the db every time and is of course a lot slower.
3- A third option is to lift the unique constraint, insert the data using bulk_create, and then write a simple query that deletes the duplicate rows.
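A rough sketch of option 2, assuming duplicates should simply be skipped; the helper name and the shape of the input rows are made up for illustration, while the model matches the example above:
from django.db import IntegrityError, transaction

def insert_ignoring_duplicates(rows, other):
    """Insert rows one by one, skipping any that violate the unique constraint."""
    inserted, skipped = 0, 0
    for row in rows:  # rows: iterable of dicts like {"name": ..., "number": ...}
        try:
            with transaction.atomic():  # keeps any outer transaction usable after an error
                Example.objects.create(name=row["name"], number=row["number"], fk=other)
            inserted += 1
        except IntegrityError:
            skipped += 1  # (name, number) pair already exists
    return inserted, skipped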
Django itself does not enforce the unique_together meta attribute. This is enforced by the database using the UNIQUE clause. You can insert as much data as you want and you are guaranteed that the specified fields will be unique. If not, then an exception will be raised (not sure which one). More about unique_together in the docs.

Django append to JSON after serializers.serialize has been run on a queryset

I am returning a JSON serialized queryset using the following queryset:
genome_parents = Genome.objects.filter(genes=cus_id)
where cus_id is the FK pointing to a companies table so I am retrieving all Genome objects related to the current working company. I return this data after a form has been posted via:
genome_parents = serializers.serialize('json', genome_parents, use_natural_keys=True)
However, I need the natural key for one of my foreign keys, but the id for another (both on the same model). So one field is displayed nicely, but the other isn't. This does what I need except for one little thing: I need the plain id number so that I can pre-populate my FK form field.
One thought I had was to just append something like
genome_parents.append({'id':gene.id})
but that obviously doesn't work. Is there any way I can augment the JSON so that I can include one more little piece of data (or change how I format the JSON)?
Greg
Just switch the order of the operations, and put the entire gene object into the list so it is properly serialized.
genome_parents = list( Genome.objects.filter(genes=cus_id) )
genome_parents.append(gene)
json_genome_parents = serializers.serialize('json', genome_parents, use_natural_keys=True)
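If the missing piece is specifically the plain id of one of the foreign keys, another option (sketched here with an illustrative some_fk field name) is to parse the serialized JSON, inject the raw FK id from each instance, and dump it again:
import json
from django.core import serializers

genome_parents = list(Genome.objects.filter(genes=cus_id))
genome_parents.append(gene)
raw = serializers.serialize('json', genome_parents, use_natural_keys=True)

# Keep the natural-key output, but also expose the raw id of one FK.
data = json.loads(raw)
for serialized, instance in zip(data, genome_parents):
    serialized['fields']['some_fk_id'] = instance.some_fk_id  # "some_fk" is illustrative
json_genome_parents = json.dumps(data)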

Automate the generation of natural keys

I'm studying a way to serialize part of the data in database A and deserialize it in database B (a sort of save/restore between different installations), and I've had a look at Django natural keys to avoid problems due to duplicated IDs.
The only issue is that I would have to add a custom manager and a new method to all my models. Is there a way to make Django automatically generate natural keys by looking at unique=True or unique_together fields?
Please note this answer has nothing to do with Django, but hopefully it gives you another alternative to think about.
You didn't mention your database; however, in SQL Server there is a BINARY_CHECKSUM() keyword you can use to give you a unique value for the data held in a row. Think of it as a hash of all the fields in the row.
This checksum method can be used to update a database from another by checking whether local row checksum <> remote row checksum.
The SQL below will update a local database from a remote database. It won't insert new rows; for that you use insert ... where id > @MaxLocalID.
SELECT delivery_item_id, BINARY_CHECKSUM(*) AS bc
INTO #DI
FROM [REMOTE.NETWORK.LOCAL].YourDatabase.dbo.delivery_item di

SELECT delivery_item_id, BINARY_CHECKSUM(*) AS bc
INTO #DI_local
FROM delivery_item di

-- Get rid of items that already match
DELETE FROM #DI_local
WHERE delivery_item_id IN (SELECT l.delivery_item_id
                           FROM #DI x, #DI_local l
                           WHERE l.delivery_item_id = x.delivery_item_id
                             AND l.bc = x.bc)

DROP TABLE #DI

UPDATE DI
SET engineer_id = X.engineer_id,
    ... -- Set other fields here
FROM delivery_item DI,
     [REMOTE.NETWORK.LOCAL].YourDatabase.dbo.delivery_item x,
     #DI_local L
WHERE x.delivery_item_id = L.delivery_item_id
  AND DI.delivery_item_id = L.delivery_item_id

DROP TABLE #DI_local
For the above to work, you will need a linked server between your local database and the remote database:
-- Create the linked server if you don't have one already
IF NOT EXISTS ( SELECT srv.name
                FROM sys.servers srv
                WHERE srv.server_id != 0
                  AND srv.name = N'REMOTE.NETWORK.LOCAL' )
BEGIN
    EXEC master.dbo.sp_addlinkedserver @server = N'REMOTE.NETWORK.LOCAL',
        @srvproduct = N'SQL Server'
    EXEC master.dbo.sp_addlinkedsrvlogin
        @rmtsrvname = N'REMOTE.NETWORK.LOCAL',
        @useself = N'False', @locallogin = NULL,
        @rmtuser = N'your user name',
        @rmtpassword = 'your password'
END
GO
In that case you should use a GUID as your key. The database can automatically generate these for you. Google uniqueidentifier. We have 50+ warehouses all inserting data remotely and sending their data up to our primary database using SQL Server replication. They all use a GUID as the primary key, as this is guaranteed to be unique. It works very well.
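In Django terms that idea maps onto a UUIDField primary key, which the application generates itself so rows created on different installations never collide; a minimal sketch (model name is illustrative):
import uuid
from django.db import models

class Record(models.Model):  # illustrative model name
    # Application-generated, globally unique primary key.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=100)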
My solution has nothing to do with natural keys but uses pickle/unpickle.
It's not the most efficient way, but it's simple and easy to adapt to your code. I don't know if it works with a complex db structure, but if that is not your case, give it a try!
When connected to db A:
import pickle
records_a = your_model.objects.filter(...)
f = open("pickled.records_a.txt", 'wb')
pickle.dump(records_a, f)
f.close()
Then move the file and, when connected to db B, run:
import pickle
records_a = pickle.load(open('pickled.records_a.txt', 'rb'))
for r in records_a:
    r.id = None
    r.save()
Hope this helps
Make a custom base model by extending the models.Model class, and write your generic manager inside it along with a custom .save() method; then edit your models to extend the custom base model. This will have no side effect on your db table structure or old saved data, except when you update some old rows. If you have old data, try to make a fake update to all your records.
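A loose sketch of that idea, deriving the natural key automatically from unique_together or unique=True fields; names are illustrative, corner cases (several unique constraints, no unique fields at all) are ignored, and the custom .save() mentioned above is left out:
from django.db import models

class NaturalKeyManager(models.Manager):
    def get_by_natural_key(self, *key):
        # Look the object up by the same fields natural_key() returns.
        fields = self.model.natural_key_fields()
        return self.get(**dict(zip(fields, key)))

class NaturalKeyModel(models.Model):
    objects = NaturalKeyManager()

    class Meta:
        abstract = True

    @classmethod
    def natural_key_fields(cls):
        # Prefer the first unique_together group, fall back to any unique field.
        if cls._meta.unique_together:
            return list(cls._meta.unique_together[0])
        return [f.name for f in cls._meta.fields if f.unique and not f.primary_key]

    def natural_key(self):
        return tuple(getattr(self, field) for field in self.natural_key_fields())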