Sync Framework Deletes not being applied on client - microsoft-sync-framework

Here's my scenario: I was concerned about the SqlCeSyncClient applying deletes, then inserts, then updates. I have cases where a row may be de-referenced from another table and then deleted. For example, imagine this:
I have two tables, Customer and Area, where Customer.Area references Area.Name with a foreign key constraint:
insert into Area values('Australia')
insert into Customer values('customer1','Australia')
-- Sync happens. Client gets 2 inserts.
update Customer set Area = 'New Zealand' where Area = 'Australia'
delete from Area where Name = 'Australia'
-- Sync happens. Client gets 1 update and 1 delete
The SqlCeClientSyncProvider tries to apply the delete first, which it fails to do because of referential integrity constraints on the client.
My first question is: Why on earth did the boys at Microsoft code the SyncClient to process deletes FIRST when it breaks all referential integrity rules? Shouldn't they apply deletes LAST????
My next question is: I have managed to reverse the order by inspecting the code and writing the whole ApplyChanges method myself... but even when I do that the deletes are not applied. Is there some internal thing with datasets that means you can't change the order of processing?

The problem is not the order of operations (deletes, updates, inserts, ...), but the order in which you added your sync tables.
You should sync the Area table first and the Customer table after.
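With the tables in that order, the change applier can run the second sync's changes in a constraint-safe order; conceptually, the client ends up executing (a sketch using the tables above):
update Customer set Area = 'New Zealand' where Area = 'Australia' -- update applied first
delete from Area where Name = 'Australia' -- delete applied last; nothing references 'Australia' any more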

Related

Undoing cascade deletions in a dense database

I have a fairly large production database system, based on a large hierarchy of nodes, each with 10+ associated models. If someone deletes a node fairly high in the tree, there can be thousands of models deleted, and if that deletion was a mistake, restoring them can be very difficult. I'm looking for a way to give me an easy 'undo' option.
I've tried using Django-reversion, but it seems like in order to get the functionality I want (easily reverting a large cascade delete) it needs to store a bunch of information with each revision. When I created initial revisions, the process is less than 10% done and it's already using 8GB in my database, which is not going to work for me.
So, is there a standard solution for this problem? Or a way to customize Django-reversions to fit my use case?
What you're looking for is called a soft delete. Add a column named deleted, defaulting to false, to the table. Now when you want to do a "delete", instead change the deleted column to true. Update all the code not to show rows marked as deleted (or move the database table aside and replace it with a view that filters them out). Change all the unique constraints to filtered ones with WHERE deleted = false so you won't have a problem adding something similar to a row the user can't see in the system.
As for the cascades, you have two options. Either write an ON UPDATE trigger that will update the child rows, or add the deleted column to the foreign key and define it as ON UPDATE CASCADE.
You'll get the whole reverse functionality at the cost of one extra column (and not being able to delete stuff to save space unless you do it manually).
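A minimal sketch of the pattern in PostgreSQL, assuming a node table with a unique name column (all names here are placeholders for your schema):
-- Add the flag; existing rows stay live.
ALTER TABLE node ADD COLUMN deleted boolean NOT NULL DEFAULT false;

-- Hide soft-deleted rows behind a view with the old table name.
ALTER TABLE node RENAME TO node_all;
CREATE VIEW node AS SELECT * FROM node_all WHERE deleted = false;

-- Unique constraints become partial indexes so a live row may reuse
-- a value that only soft-deleted rows still hold.
CREATE UNIQUE INDEX node_name_live ON node_all (name) WHERE deleted = false;

-- "Deleting" is now an update; a trigger (or a foreign key that
-- includes deleted with ON UPDATE CASCADE) propagates the flag to
-- dependent rows.
UPDATE node_all SET deleted = true WHERE id = 42;

-- Undo is the same statement in reverse.
UPDATE node_all SET deleted = false WHERE id = 42;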

Database polling, prevent duplicate fetches

I have a system whereby a central MSSQL database keeps in a table a queue of jobs that need to be done.
Because processing requirements would not be that high, and requests would not be particularly frequent (once every few seconds at most), we decided to have the applications that use the queue simply query the database whenever a job is needed; there is no message queue service at this time.
A single fetch is performed by having the client application run a stored procedure, which performs the query(ies) involved and returns a job ID. The client application then fetches the job information by querying by ID and sets the job as handled.
Performance is fine; the only snag we have hit is that, because the client application has to query for the details and perform a check before the job is marked as handled, on very rare occasions (once every few thousand jobs) two clients pick up the same job.
As a way of solving this problem, I suggested having the initial stored procedure "tag" the record it pulls with the current date and time. When querying for records, the procedure will only pull records whose tag is a certain amount of time, say 5 seconds, in the past. That way, if the stored procedure runs twice within 5 seconds, the second instance will not pick up the same job.
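Roughly, the proposal in T-SQL (PickedUpAt is a hypothetical datetime column on the queue table):
-- Claim the oldest job that is unclaimed or whose tag is stale.
UPDATE tbl WITH (ROWLOCK)
SET PickedUpAt = GETDATE()
OUTPUT inserted.Id
WHERE Id = (
    SELECT TOP (1) Id
    FROM tbl WITH (UPDLOCK, READPAST)
    WHERE PickedUpAt IS NULL
       OR PickedUpAt < DATEADD(SECOND, -5, GETDATE())
    ORDER BY Id
);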
Can anyone foresee any problems with fixing the problem this way or offer an alternative solution?
Use a UNIQUEIDENTIFIER field as your marker. When the stored procedure runs, lock the rows you're reading and update the field with a NEWID(). You can mark your polling statement with WITH (READPAST) if you're worried about deadlocking issues.
The reason to use a GUID here is to have a unique identifier that will serve to mark a batch. Your NEWID() call is guaranteed to give you a unique value, which will be used to prevent you from accidentally picking up the same data twice. GETDATE() wouldn't work here because you could end up having two calls that resolve to the same time; BIT wouldn't work because it wouldn't uniquely mark off batches for picking up or reporting.
For example,
DECLARE @ReadID uniqueidentifier;
DECLARE @BatchSize int = 20; -- make this a parameter of your procedure
SET @ReadID = NEWID();

-- Claim a batch: stamp up to @BatchSize unread rows with this call's GUID.
UPDATE tbl WITH (ROWLOCK)
SET HasBeenRead = @ReadID -- your UNIQUEIDENTIFIER field
FROM (
    SELECT TOP (@BatchSize) Id
    FROM tbl WITH (UPDLOCK, ROWLOCK, READPAST)
    WHERE HasBeenRead IS NULL
    ORDER BY [Id]
) AS t1
WHERE tbl.Id = t1.Id;

-- Return only the rows this call claimed.
SELECT Id, OtherCol, OtherCol2
FROM tbl WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE HasBeenRead = @ReadID;
And then you can use a polling statement like
SELECT COUNT(*) FROM tbl WITH(READPAST) WHERE HasBeenRead IS NULL
Adapted from here: https://msdn.microsoft.com/en-us/library/cc507804%28v=bts.10%29.aspx

How do you handle "Sync Framework does not automatically handle the deletion of rows that no longer satisfy a filter condition"

http://msdn.microsoft.com/en-us/library/dd918848.aspx
"It is important to understand that a scope is the combination of tables and filters. For example, you could define a filtered scope named sales-WA that contains only the sales data for the state of Washington from the customer_sales table. If you define another filter on the same table, such as sales-OR, this is a different scope. If you define filters, be aware that Sync Framework does not automatically handle the deletion of rows that no longer satisfy a filter condition. For example, if a user or application updates a value in a column that is used for filtering, a row moves from one scope to another. The row is sent to the new scope that the row now belongs to, but the row is not deleted from the old scope. Your application must handle this situation."
I am just wondering if someone can shed some light on how to handle "Sync Framework does not automatically handle the deletion of rows that no longer satisfy a filter condition"?
Many thanks.
The sync providers will (as part of the provisioning step) automatically create tombstone tables and triggers to track row deletions. When rows are not deleted, but updated in such a way as to fall out of the scope, the automatically generated schema won't log these as deletions; it will log them as updates. So to extend the Microsoft example, assume your application is syncing only Washington data to Washington sales reps. Some sales that were originally entered as Washington sales are corrected and moved to Oregon. The sync framework won't know that it should remove these now-Oregon records from the Washington reps' local databases.
You have a couple of options to solve this:
Modify the provisioning tools to generate triggers that handle the situation, instead of the default triggers that don't. Look into extending SqlSyncScopeProvisioning to accomplish this. If done correctly, this is probably the most scalable/extensible solution.
Modify your application to detect the attempt to move a row out of a scope and have the application delete the row and re-insert it instead of just updating it (probably in a stored procedure; see the sketch below). If you already use stored procedures to handle updates, this might be a good option.
Add a background service or process that goes through and looks for records that don't match the scope and delete them. This may end up being the easiest solution - especially if your application is already deployed.
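A sketch of the second option as a stored procedure (table and column names follow the customer_sales example from the quote and are illustrative only):
-- Turn a change to the filter column into a delete plus an insert, so
-- the tracking tables record a tombstone in the old scope and an
-- insert in the new one.
CREATE PROCEDURE dbo.move_sale_to_state
    @sale_id int,
    @new_state nvarchar(2)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    DECLARE @customer_id int, @amount money;
    SELECT @customer_id = customer_id, @amount = amount
    FROM dbo.customer_sales
    WHERE sale_id = @sale_id;

    -- Logged as a delete; sales-WA clients will remove the row.
    DELETE FROM dbo.customer_sales WHERE sale_id = @sale_id;

    -- Logged as an insert; sales-OR clients will pick the row up.
    INSERT INTO dbo.customer_sales (sale_id, customer_id, amount, state)
    VALUES (@sale_id, @customer_id, @amount, @new_state);

    COMMIT TRANSACTION;
END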

Django 1.2 PostgreSQL cascading delete for keys with ON DELETE NO ACTION

I have a postgresql database with about 150 tables (it's a Django 1.2 project). Django adds ON DELETE NO ACTION and ON UPDATE NO ACTION to foreign keys at the time of table creation.
Now I need to bulk delete data (about 800,000 records) from a bunch of tables based on certain condition.
Using Model.objects.filter().delete() is not an option because the data is huge and it takes a lot of time.
The only sane option seems to be a cascading delete, but since Django has added ON DELETE NO ACTION, that doesn't seem possible.
So my question: is there any way to change all the foreign keys to ON DELETE CASCADE in an easy way (there are many of them), or something similar?
(I am aware that I can manually write the SQL queries for each table, but that would be a monumental and difficult to maintain task.)
https://docs.djangoproject.com/en/dev/ref/models/fields/#django.db.models.ForeignKey.on_delete
As pointed out in the link which comprises Andrew's answer, if you set this to CASCADE in Django, then Django will go and do the deletes "retail". If it is set to NO ACTION you can create a database-level foreign key definition to handle things. That sounds like a reasonable plan to me.
Be sure you have an index defined on the referencing set of columns for every foreign key; otherwise you're going to see very slow performance. Some database products will automatically create such an index when you define a foreign key, but there are situations where that is not advantageous, so PostgreSQL puts the matter in your hands to optimize as you see fit. (Just as one example, it might not be worth the cost of maintaining the index during normal operations, but be worth building it before a purge and dropping it after.)
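For example, a single Django-generated constraint can be replaced with a cascading, database-level one, plus a supporting index (constraint and table names here are illustrative; the real ones can be read from pg_constraint, so statements for all 150 tables can be generated by a query):
-- Replace the NO ACTION constraint with a cascading one.
ALTER TABLE child_table
    DROP CONSTRAINT child_table_parent_id_fkey;
ALTER TABLE child_table
    ADD CONSTRAINT child_table_parent_id_fkey
    FOREIGN KEY (parent_id) REFERENCES parent_table (id)
    ON DELETE CASCADE;

-- Index the referencing column so each cascade is an index scan,
-- not a sequential scan of child_table.
CREATE INDEX child_table_parent_id_idx ON child_table (parent_id);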
One note: ON DELETE CASCADE performs miserably on bulk operations. The reason is that this is done as a trigger. Consequently the way it looks from an algorithmic perspective is:
for row in delete_set:
    for dependent_row in (scan for referencing rows):
        delete dependent_row
If you are deleting 800,000 rows in a parent table, this translates into 800,000 separate delete scans on the dependent tables. Even in your best case, with usable indexes, 800,000 separate index scans will be much slower than one sequential scan.
A better way to do this is to use a writeable common table expression in 9.1 or higher, or to just do separate delete statements in the same transaction. Something like:
WITH rows_to_delete (id) AS (
    SELECT id FROM mytable WHERE where_condition
),
deleted_rows (id) AS (
    -- Remove all dependents in one scan. A data-modifying CTE runs
    -- exactly once, even though the outer DELETE doesn't read from it.
    DELETE FROM referencing_table
    WHERE mytable_id IN (SELECT id FROM rows_to_delete)
    RETURNING mytable_id
)
DELETE FROM mytable WHERE id IN (SELECT id FROM rows_to_delete);
Algorithmically, this reduces to something like:
scan for rows to delete as delete_set
for dependent in (one scan for rows referencing delete_set):
    delete dependent
for to_delete in delete_set:
    delete to_delete
Getting rid of the forced nested loop scan will greatly speed things up.

Should I use cflock or not?

I would like to know if locking my table is necessary in this situation (I'm using ColdFusion and MySQL):
I have a table called wishlists(memberId, gameId, rank, updateAt) where members store games in a personal list.
wishlists has a many-to-many with a members table, and many-to-many with a games table. Many members have many games stored in this list. These games are ranked using an int field. Ranks are particular to a member, so:
Member 1 can have itemId=1, rank=1; itemId=2, rank=2
Member 2 can have itemId=1, rank=1, itemId=2, rank=2
etc...
Each member can modify his or her list by deleting an item, or changing the rank of an item (sendGameToTopOfList(), deleteItemFromList(), for example). This means each time a change is made to a list, the list must be ranked again. I wrote a separate function called rankList() to handle re-ranking. It is fired after deleteGameFromList() or sendGameToTopOfList(); it does the following:
1. Gets a memberWishlist query of all records for #memberId, ordered first by **rank ASC**, then **updateAt ASC**
2. Loops through the memberWishlist query and sets each row's **rank=memberWishlist.CurrentRow**
The updateAt field is necessary because if a game was moved to the top of the list, we would have two items ranked number 1, and to differentiate them I use updateAt.
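For reference, the re-rank pass could also be done as a single MySQL statement instead of a query-and-loop (a sketch; 123 stands in for the member's id, and rank is backquoted because it is a reserved word in newer MySQL versions):
SET @r := 0;
UPDATE wishlists
SET `rank` = (@r := @r + 1)
WHERE memberId = 123
ORDER BY `rank` ASC, updateAt ASC;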
Scenario One: A member has 100 games in their list:
Member moves an item to top; rankList() is called after the operation is completed.
While rankList() is still re-ranking the items, member deletes a game
In a normal page request this is fine, as the page will not reload until rankList() is done. But if it were ajax, or if we were using cfthread, the member could delete 10 games in 5 seconds by clicking through really quickly. Then again, the list will be re-ranked after the delete anyway, so it may not matter; but it seems like something I should protect...
Scenario 2:
Some of these wishlist items can turn into orders by using an additional field called "queuedForShipping." If queuedForShipping is 1, the rankList() function ignores them. What if an admin was creating a shipment when a member had just deleted an item or moved one to the top?
Your thoughts are appreciated.
Additional information: New items are automatically ranked last at insert
No. CFLock isn't going to have any effect on how MySQL handles things.
However, you might want a transaction. Wrapping your multiple operations in a transaction block will tell MySQL that you want to guarantee that all the operations complete before storing the changes permanently.
This assumes that your multiple queries are generated in CF, like this:
<cffunction name="sendToTopOfList">
    <cfquery>
        send to top
    </cfquery>
    <cfquery>
        resort everything
    </cfquery>
</cffunction>
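On the MySQL side, the transaction wrapper amounts to something like this (a sketch; the member and game ids are placeholders for the values your queries pass in):
START TRANSACTION;

-- "send to top": rank 1 plus a fresh timestamp wins the tie-break
UPDATE wishlists
SET `rank` = 1, updateAt = NOW()
WHERE memberId = 123 AND gameId = 42;

-- "resort everything": the rankList() re-rank runs here, inside the
-- same transaction, so a concurrent delete for this member waits
-- until COMMIT before it can touch these rows.

COMMIT;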
If you are using a stored procedure on the db server, then (1) I probably would not be too concerned about race conditions here, as you will likely apply row locks as the server makes changes, and (2) look into using transactions on the db server if you are still concerned.
If you still have questions, I'll try to answer them, although I'm not really a MySQL expert and haven't used it much in the last few years.