Using Django with a PostgreSQL (8.x) backend, I have a model where I need to skip a block of ids, e.g. after giving out 49999 I want the next id to be 70000 not 50000 (because that block is reserved for another source where the instances are added explicitly with id - I know that's not a great design but it's what I have to work with).
What is the correct/safest place for doing this?
I know I can set the sequence with
SELECT SETVAL(
(SELECT pg_get_serial_sequence('myapp_mymodel', 'id')),
70000,
false
);
but when does Django actually pull a number from the sequence?
Do I override MyModel.save(), call its super and then grab me a cursor and check with
SELECT currval(
(SELECT pg_get_serial_sequence('myapp_mymodel', 'id'))
);
?
I believe that a sequence may be advanced by django even if saving the model fails, so I want to make sure whenever it hits that number it advances - is there a better place than save()?
P.S.: Even if that was the way to go - can I actually figure out the currval for save()'s session like this? if I grab me a connection and cursor, and execute that second SQL statement, wouldn't I be in another session and therefore not get a currval?
Thank you for any pointers.
EDIT: I have a feeling that this must be done at database level (concurrency issues) and posted a corresponding PostgreSQL question - How can I forward a primary key sequence in PostgreSQL safely?
As I haven't found an "automated" way of doing this yet, I'm thinking of the following workaround - it would be feasible for my particular situation:
Set the sequence with a MAXVALUE 49999 NO CYCLE
When 49999 is reached, the next save() will run into a postgres error
Catch that exception and reraise as a form error "you've run out of numbers, please reset to the next block then try again"
Provide a view where the user can activate the next block, i.e. execute "ALTER SEQUENCE my_seq RESTART WITH 70000 MAXVALUE 89999"
I'm uneasy about doing the restart automatically when catching the exception:
try:
instance.save()
except RunOutOfIdsException:
restart_id_sequence()
instance.save()
as I fear two concurrent save()'s running out of ids will lead to two separate restarts, and a subsequent violation of the unique constraint. (basically same concept as original problem)
My next thought was to not use a sequence for the primary key, but rather always specify the id explicitly from a separate counter table which I check/update before using its latest number - that should be safe from concurrency issues. The only problem is that although I have a single place where I add model instances, other parts of django or third-party apps may still rely on an implicit id, which I don't want to break.
But that same mechanism happens to be easily implemented on postgres level - I believe this is the solution:
Don't use SERIAL for the primary key, use DEFAULT my_next_id()
Follow the same logic as for "single level gapless sequence" - http://www.varlena.com/GeneralBits/130.php - my_next_id() does an update followed by a select
Instead of just increasing by 1, check if a boundary was crossed and if so, increase even further
Related
I'm using Django 2.2 and my question is: does transaction.atomic roll back increments to a pk sequence?
Below is the background bug I wrote up that led me to this issue
I'm facing a really weird issue that I can't figure out and I'm hoping someone has faced a similar issue.
An insert using the django ORM .create() function is returning django.db.utils.IntegrityError: duplicate key value violates unique constraint "my_table_pkey" DETAIL: Key (id)=(5795) already exists.
Fine. But then I look at the table and no record with id=5795 exists!
SELECT * from my_table where id=5795;
shows (0 rows)
A look at the sequence my_table_id_seq shows that it has nonetheless incremented to show last_value = 5795 as if the above record was inserted. Moreover the issue does not always occur. A successful insert with different data is inserted at id=5796. (I tried reset the pk sequence but that didn't do anything, since it doesnt seem to be the problem anyway)
I'm quite stumped by this and it has caused us a lot of issues on one specific table. Finally I realize the call is wrapped in transaction.atomic and that a particular scenario may be causing a double insert with the same pk.
So my theory is: The transaction atomic is not rolling back the increment of the
Postgres sequences do not roll back. Every time they are touched by a statement they advance whether the statement succeeds or not. For more information see Notes section here Create Sequence.
Very basic setup: source-to-target - wanted to replicate the MERGE behavior.
Removed the update strategy, activated "update then insert" rule on target within the session. Doesn't work as described, always attempts to insert into the primary key column, even though the same key arrives, which should have triggered an "update" statement. Tried other target methods - always attempts to insert. Attached is the mapping pic.
basic merge attempt
Finally figured this out. You have to make edits in 3 places: a) mapping - remove update strategy b) session::target properties - set the "update then insert" method c) session's own properties - "treat source rows as"
In the third case you have to switch "treat source rows as" from insert to update.
Which will then allow both - updates and inserts.
Why is it set like this is beyond me. But it works.
I'll make an attempt to clarify this a bit.
First of all, using Update Strategy in the mapping requires the session Treat source rows as property to be set to Data driven. This is slowest possible option as it means it will be set on row-by-row basis within the mapping - but that's exactly what you need if using the Update Strategy transformation. So in order to mirror MERGE, you need to remove it.
And tell the session not to expect this in the mapping anymore - so the property needs to be set to one of the remaining ones. There are two options:
set Treat source rows as to Insert - this means all the rows will be inserted each time. If there are no errors (e.g. caused by unique index), the data will be multiplied. In order to mimic MERGE behavior, you'd need to add the unique index that would prevent inserts and tell the target connector to insert else update. This way in case the insert fails it will make an update attempt.
set Treat source rows as to Update - now this will tell PowerCenter to try updates for each and every input row. Now, using update else insert will cause that in case of failure (i.e. no row to update) there will be no error - instead an insert attempt will be made. Here there's no need for unique index. That's one difference.
Additional difference - although both solutions will reflect the MERGE operation - might be observed in performance. In the environment where new data is very rare, the first approach will be slow: each time an insert attempt will be made just to fail and do an update operation then. Just a few times it will succeed at first attempt. Second approach will be faster: updates will succeed most of the time and just on a rare occasion it will fail and result in an insert operation.
Of course, if updates are not often expected, it will be exactly the opposite.
This can be seen as complex solution for a simple merge. But it also lets the developer to influence the performance.
Hope this sheds some light!
Let's say i want to implement a "Like/Unlike" system in my app. I need to count each like for sorting purposes later. Can i simply insert the current value + 1 ? I think it's too simple.
What if two user click on the same time ? How to prevent my counter to be disturbed ?
I read i need to implement transactions by a simple decorator #transaction.atomic but i am wonder if this can handle my concern.
Transactions are designed to execute a "bloc" of operations triggered by one user, whereas in my case i need be able to handle multiple request at the same time and safely update the counter.
Any advise ?
You can use F() expression, eg.
content.likes_count = F('likes_count') + 1
content.save()
So the operation will be excuted in database not in python.
From the django documentation.
Another useful benefit of F() is that having the database - rather
than Python - update a field’s value avoids a race condition.
If two Python threads execute the code in the first example above, one
thread could retrieve, increment, and save a field’s value after the
other has retrieved it from the database. The value that the second
thread saves will be based on the original value; the work of the
first thread will simply be lost.
If the database is responsible for updating the field, the process is
more robust: it will only ever update the field based on the value of
the field in the database when the save() or update() is executed,
rather than based on its value when the instance was retrieved.
I'm using oracle 12c database and want to test out one problem.
When carrying out web service request it returns underlying ORA-02292 error on constraint name (YYY.FK_L_TILSYNSOBJEKT_BEGRENSNING).
Here is SQL of the table with the constraint:
CONSTRAINT "FK_L_TILSYNSOBJEKT_BEGRENSNING" FOREIGN KEY ("BEGRENSNING")
REFERENCES "XXX"."BEGRENSNING" ("IDSTRING") DEFERRABLE INITIALLY DEFERRED ENABLE NOVALIDATE
The problem is, that when I try to delete the row manually with valid IDSTRING (in both tables) from parent table - it successfully does it.
What cause it to behave this way? Is there any other info I should give?
Not sure if it helps someone, since it was fairly stupid mistake, but i'll try to make it useful since people demand answers.
Keyword DEFERRABLE INITIALLY DEFERRED means that constraint is enforced at commit not at query run time as opposed to INITIALLY IMMEDIATE, which does the check right after you issue the query, however this keyword makes database bulk updates a bit slower (since every query in a transaction has to be checked by constraint, meanwhile in initial deference if it turns out there is an issue - whole bulk is rolled back, no additional unnecessary queries are issued and something can be done about it), hence used less often than initial deference.
Error ORA-02292 however is shown only for DELETE statements, knowing that its a bit easier to debug your statements.
Each time the save() method is called on a Django object, Django executes two queries one INSERT and one SELECT. In my case this is usefull except for some specific places where each query is expensive. Any ideas on how to sometimes state that no object needs to be returned - no SELECT needed.
Also I'm using django-mssql to connect to, this problem doesn't seem to exist on MySQL.
EDIT : A better explanation
h = Human()
h.name='John Foo'
print h.id # Returns None, No insert has been done therefore no id is available
h.save()
print h.id # Returns the ID, an insert has taken place and also a select statement to return the id
Sometimes I don't the need the retruning ID, just insert
40ins's answer was right, but probably it might have higher costs...
When django execustes a save(), it needed to be sure if the object is a new one or an existing one. So it hits the database to check if related objext exists. If yes, it executes an UPDATE, orherwise it executes an ISERT
Check documentatin from here...
You can use force_insert or force_update ,but that might be cause serious data integrity problems, like creating a duplicate entry instead of updating the existing one...
So, if you wish to use force , you must be sure whether it will be an INSERT or an UPDATE...
Try to use save() method with force_insert or force_update attributes. With this attributes django knows about record existence and don't make additional query.
The additional select is the django-mssql backend getting the identity value from the table to determine the ID that was just inserted. If this select is slow, then something is wrong with your SQL server/configuration because it is only doing SELECT CAST(IDENT_CURRENT(*table_name*) as bigint) call.