I would like to know if locking my table is necessary in this situation (I'm using ColdFusion and MySQL):
I have a table called wishlists(memberId, gameId, rank, updateAt) where members store games in a personal list.
wishlists has a many-to-many with a members table, and a many-to-many with a games table. Many members have many games stored in this list. These games are ranked using an int field. Ranks are particular to a member, so:
Member 1 can have gameId=1, rank=1; gameId=2, rank=2
Member 2 can have gameId=1, rank=1; gameId=2, rank=2
etc...
Each member can modify his or her list by deleting an item or changing the rank of an item (sendGameToTopOfList(), deleteGameFromList(), for example). This means each time a change is made to a list, the list must be re-ranked. I wrote a separate function called rankList() to handle re-ranking. It is fired after deleteGameFromList() or sendGameToTopOfList(); it does the following:
1. Gets a memberWishlist query of all records for #memberId, ordered first by **rank ASC**, then by **updateAt ASC**
2. Loops through the memberWishlist query and sets each row's **rank=memberWishlist.CurrentRow**
The updateAt field is necessary because if a game is moved to the top of the list, we would have two items ranked number 1, and to differentiate them I use updateAt.
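(For clarity, here is a minimal sketch of that re-ranking logic, shown in Python only because it is compact; the real code is ColdFusion/MySQL, and the field names simply mirror the table above.)

def rank_list(rows):
    # rows: one member's wishlist records, e.g.
    # [{'gameId': 7, 'rank': 1, 'updateAt': ...}, ...]
    # Order by current rank, breaking ties (two rank-1 rows after a
    # "send to top") by updateAt, as described above.
    rows.sort(key=lambda r: (r['rank'], r['updateAt']))
    # Reassign a clean 1..N ranking.
    for position, row in enumerate(rows, start=1):
        row['rank'] = position
    return rows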
Scenario One: A member has 100 games in their list:
Member moves an item to top; rankList() is called after the operation is completed.
While rankList() is still re-ranking the items, the member deletes a game.
In a normal page request this is fine, as the page will not reload until rankList() is done. But if it were AJAX, or if I were using cfthread, the member could delete 10 games in 5 seconds by clicking through really quickly. Then again, the list will be re-ranked after the delete anyway, so it may not matter; but it seems like something I should protect against...
Scenario Two:
Some of these wishlist items can turn into orders by using an additional field called "queuedForShipping." If queuedForShipping is 1, the rankList() function ignores them. What if an admin were creating a shipment when a member had just deleted an item or moved one to the top?
Your thoughts are appreciated.
Additional information: New items are automatically ranked last at insert
No. CFLock isn't going to have any effect on how MySQL handles things.
However, you might want a transaction. Wrapping your multiple operations in a transaction block will tell MySQL that you want to guarantee that all the operations complete before storing the changes permanently.
This assumes that your multiple queries are generated in CF, so you could wrap them like this:
<cffunction name="sendToTopOfList">
    <cftransaction>
        <cfquery>
            <!--- send the selected game to the top --->
        </cfquery>
        <cfquery>
            <!--- re-rank everything --->
        </cfquery>
    </cftransaction>
</cffunction>
If you are using a stored procedure on the db server, then (1) I probably would not be too concerned about race conditions here, as the server will likely apply row locks as it makes changes, and (2) look into using transactions on the db server if you are still concerned.
If you still have questions, I'll try to answer them, although I'm not really a MySQL expert and haven't used it much in the last few years.
I'm trying to loop through all the records displayed on a page, from the selected one to the end of the rows.
For example, if I select only the 5th row, it should loop through the 5th and 6th rows (as there are no more rows below).
What I've been trying is this:
ProdOrderLine := Rec;
REPEAT
UNTIL ProdOrderLine.NEXT = 0;
But it loops through all the records in the table, even ones that are not displayed on the page...
How can I loop through only the page's records, from the selected one to the last?
Try Copy instead of assignment. Assignment only copies the field values from one instance of a record variable to another; it does not copy filters or keys (sort order).
Alas, I have to mention that handling records like this is an uncommon scenario in BC. The general best-practice approach would be to ask the user to select all the records he or she needs with Shift+click, Ctrl+click, or by dragging the mouse. In that case you would use SetSelectionFilter to instantly grab all the selected records.
This is how it works across the system, and this is how users should be taught to work. It is a bad idea to add a way of interacting with records that only works on one page in the whole system, even if users are asking for it bursting into tears. They probably just had this type of interaction in some other system they worked with before. I know this is a tough fight, but it is worth it. It is for the sake of the system's stability (less code = fewer bugs) and predictability (a given way of interacting works across all pages).
Basically my whole career is based on reading questions here, but now I'm stuck, since I don't even know how to ask this correctly.
I'm designing a SQLite database which is meant for constructing data sheets out of existing data sheets. People like reusing stuff, and I want to manage this with a DB and an interface. A data sheet has reusable elements like pictures, text, formulas, sections, lists, front pages and variables. Sections can contain elements; this can be handled with recursive CTEs (thanks "mu is too short" for that hint). Texts, formulas, lists etc. can contain variables. In the end I want to be able to manage variables, which must be unique per data sheet, and manage elements, which form an ordered list making up the data sheet. So when selecting a data sheet I must know which elements it contains and which variables are used within those elements. I must be able to create a new data sheet by re-using elements and/or creating new ones if desired.
So far I have come up with (see also the link to the screenshot at the bottom):
a list of variables, several of which can be contained in elements
a list of elements, which make up
a list of data sheets
Reading examples like
Store array in SQLite that is referenced in another table
How to store a list in a column of a database table
already give me helpful hints, such as that I need to create, for each data sheet, a new atomic list containing its elements and their positions. The same goes for the variables referenced by each element. But the trouble starts when I want to keep it consistent, and I'm unsure how to actually query it.
How do I connect the variables contained within elements to the elements contained within the data sheets? And how do I check, when one element or variable is modified, which data sheets need to be recompiled because they use the same variables and/or elements?
The more I think about this, the more it sounds like I need to write my own search tree based on an object-oriented inheritance class structure and not use a database at all. Can somebody convince me that a database is the right tool for my problem?
I learned databases once, but that was quite some time ago, and to be honest the university lectures weren't great, since we never created a database of our own but only worked on existing ones.
To be more specific, my knowledge has led me to the solution below, but I don't know how to correctly query for the list of affected data sheets when the content of one value changes, since the reference is text containing the name of a table:
screen shot since I'm a greenhorn
Update:
I think I have to model the unique connections, so it would end up as many-to-many tables. I'm not perfectly happy with it, but I think I can go on with it.
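As a rough sketch of that many-to-many direction (all table and column names here are my own assumptions, not taken from the screenshot), the junction tables and the "which data sheets are affected when a variable changes?" query could look something like this in SQLite:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE data_sheets (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE elements    (id INTEGER PRIMARY KEY, kind TEXT, content TEXT);
    CREATE TABLE variables   (id INTEGER PRIMARY KEY, name TEXT, value TEXT);

    -- Junction table: which elements make up which sheet, and in what order.
    CREATE TABLE sheet_elements (
        sheet_id   INTEGER REFERENCES data_sheets(id),
        element_id INTEGER REFERENCES elements(id),
        position   INTEGER,
        PRIMARY KEY (sheet_id, position)
    );

    -- Junction table: which variables are used inside which element.
    CREATE TABLE element_variables (
        element_id  INTEGER REFERENCES elements(id),
        variable_id INTEGER REFERENCES variables(id),
        PRIMARY KEY (element_id, variable_id)
    );
""")

# "Variable 42 changed -- which data sheets need to be recompiled?"
affected = conn.execute("""
    SELECT DISTINCT ds.id, ds.name
    FROM data_sheets ds
    JOIN sheet_elements    se ON se.sheet_id   = ds.id
    JOIN element_variables ev ON ev.element_id = se.element_id
    WHERE ev.variable_id = ?
""", (42,)).fetchall()

The same join, driven from elements instead of variables, answers the "which sheets reuse this element?" question.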
Still a greenhorn: how are you guys getting correct syntax highlighting for SQL?
In a Django project, I'm refreshing tens of thousands of lines of data from an external API on a daily basis. The problem is that since I don't know if the data is new or just an update, I can't do a bulk_create operation.
Note: Some, or perhaps many, of the rows do not actually change on a daily basis, but I don't know which, or how many, ahead of time.
So for now I do:
for row in csv_data:
    try:
        MyModel.objects.update_or_create(id=row['id'], defaults={'field1': row['value1']....})
    except:
        print 'error!'
And it takes.... forever! One or two lines a second at max speed, sometimes several seconds per line. Each model I'm refreshing has one or more other models connected to it through a foreign key, so I can't just delete everything and reinsert every day. I can't wrap my head around this one -- how can I significantly cut down the number of database operations so the refresh doesn't take hours and hours?
Thanks for any help.
The problem is that you are doing a database action for each data row you grabbed from the API. You can avoid that by working out which rows are new (and doing a bulk insert of all new rows), which rows actually need an update, and which didn't change.
To elaborate:
Grab all the relevant rows from the database (meaning all the rows that could possibly be updated):
old_data = MyModel.objects.all()  # if possible, do MyModel.objects.filter(...) instead
Grab all the API data you need to insert or update:
api_data = [...]
For each row of data, determine whether it's new (and put it in an array) or whether it needs to update the DB:
new_rows_array = []

for row in api_data:
    if is_new_row(row, old_data):
        new_rows_array.append(row)
    elif is_data_modified(row, old_data):
        # do the update
        ...
    else:
        continue

MyModel.objects.bulk_create(new_rows_array)
is_new_row - determines whether the row is new; if so, it is added to an array that will be bulk-created
is_data_modified - looks up the row in the old data and determines whether that row's data has changed, so you only update it if it has actually changed
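A minimal sketch of how those two helpers might look (the names come from the pseudocode above; indexing old_data by id and the field names field1/value1 are assumptions borrowed from the question):

# Build the index once, so each lookup is O(1) instead of scanning
# the whole queryset for every API row.
old_by_id = {obj.id: obj for obj in old_data}

def is_new_row(row, old_by_id):
    # Assumes the API row carries the model's primary key under 'id'.
    return row['id'] not in old_by_id

def is_data_modified(row, old_by_id):
    # Compare only the fields you actually refresh.
    obj = old_by_id[row['id']]
    return obj.field1 != row['value1']

In the loop above you would then pass old_by_id instead of old_data.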
If you look at the source code for update_or_create(), you'll see that it's hitting the database multiple times for each call (either a get() followed by a save(), or a get() followed by a create()). It does things this way to maximize internal consistency - for example, this ensures that your model's save() method is called in either case.
But you might well be able to do better, depending on your specific models and the nature of your data. For example, if you don't have a custom save() method, aren't relying on signals, and know that most of your incoming data maps to existing rows, you could instead try an update() followed by a bulk_create() if the row doesn't exist. Leaving aside related models, that would result in one query in most cases, and two queries at the most. Something like:
updated = MyModel.objects.filter(field1="stuff").update(field2="other")
if not updated:
    MyModel.objects.bulk_create([MyModel(field1="stuff", field2="other")])
(Note that this simplified example has a race condition, see the Django source for how to deal with it.)
In the future there will probably be support for PostgreSQL's UPSERT functionality, but of course that won't help you now.
Finally, as mentioned in the comment above, the slowness might just be a function of your database structure and not anything Django-specific.
Just to add to the accepted answer: one way of recognizing whether the operation is an update or a create is to ask the API owner to include a last-updated timestamp with each row (if possible) and store it in your DB for each row. That way you only have to touch the rows where this timestamp differs from the one in the API.
I faced exactly this issue, where I was updating every existing row and creating new ones. It took a whole minute to update 8000-odd rows. With selective updates, I cut that down to just 10-15 seconds, depending on how many rows had actually changed.
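A minimal sketch of that selective-update idea, assuming the model has a last_updated field and each API row carries an updated_at value (both names are hypothetical):

# One query: map existing primary keys to their stored timestamps.
stored = dict(MyModel.objects.values_list('id', 'last_updated'))

to_create, to_update = [], []
for row in api_data:
    if row['id'] not in stored:
        to_create.append(MyModel(id=row['id'], field1=row['value1'],
                                 last_updated=row['updated_at']))
    elif row['updated_at'] != stored[row['id']]:
        # Only rows whose timestamp actually changed get written back.
        to_update.append(row)

MyModel.objects.bulk_create(to_create)
for row in to_update:
    MyModel.objects.filter(id=row['id']).update(
        field1=row['value1'], last_updated=row['updated_at'])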
I think the code below, used together, can do the same thing as update_or_create:
MyModel.objects.filter(...).update()
MyModel.objects.get_or_create()
I am using a static class in my application. It basically reads an Access database and copies the data into various lists.
When the user modifies some data, the data is updated in the list using LINQ; if there is no entry in the list for the modification, a new item is added to the list.
This all works fine.
However, on the first data interrogation I create the original list (basically all records in the users table), so I have a list lstDATABASERECORDS.
After populating this list I do lstDATABASERECORDSCOMPARISON = lstDATABASERECORDS;
this enables me to quickly check whether to use an update or an append query.
However, when I add to lstDATABASERECORDS, a record is added to lstDATABASERECORDSCOMPARISON too.
Can anyone advise?
You are assigning two variables to refer to the same instance of a list. Instead, you may want to try generating a clone of your list to keep for deltas (ICloneable is unfortunately not that useful without additional work to define cloneable semantics for your objects), or use objects that implement IEditableObject and probably INotifyPropertyChanged for change tracking (there are a few options there, including rolling your own).
There's nothing built into the framework (until EF) that replicates the old ADO recordset's ability to auto-magically generate update queries that only attempt to modify changed columns.
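The underlying behaviour is language-agnostic: assignment copies the reference, not the list. A minimal illustration (in Python only to keep it short and self-contained; in .NET a shallow copy would be something like new List<T>(lstDATABASERECORDS)):

import copy

records = [{'id': 1, 'value': 'original'}]

# Assignment: both names now point at the same list object.
comparison = records
records.append({'id': 2, 'value': 'added later'})
print(len(records), len(comparison))   # 2 2 -- the comparison list changed too

# Copying gives an independent snapshot to diff against.
comparison = copy.deepcopy(records)
records.append({'id': 3, 'value': 'added after the copy'})
print(len(records), len(comparison))   # 3 2 -- the snapshot is unaffected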
http://msdn.microsoft.com/en-us/library/dd918848.aspx
"It is important to understand that a scope is the combination of tables and filters. For example, you could define a filtered scope named sales-WA that contains only the sales data for the state of Washington from the customer_sales table. If you define another filter on the same table, such as sales-OR, this is a different scope. If you define filters, be aware that Sync Framework does not automatically handle the deletion of rows that no longer satisfy a filter condition. For example, if a user or application updates a value in a column that is used for filtering, a row moves from one scope to another. The row is sent to the new scope that the row now belongs to, but the row is not deleted from the old scope. Your application must handle this situation."
I am just wondering if someone can shed some light on how to handle "Sync Framework does not automatically handle the deletion of rows that no longer satisfy a filter condition"?
Many thanks.
The sync providers will (as part of the provisioning step) automatically create tombstone tables and triggers to track row deletions. When rows are not deleted but are updated in such a way as to fall out of the scope, the automatically generated schema won't log these as deletions; it will log them as updates. So, to extend the Microsoft example, assume your application syncs only Washington data to Washington sales reps. Some sales that were originally entered as Washington sales are corrected and moved to Oregon. The Sync Framework won't know that it should remove these now-Oregon records from the Washington reps' local databases.
You have a couple of options to solve this:
Modify the provisioning tools to generate triggers that handle this situation, instead of the default triggers that don't. Look into extending SqlSyncScopeProvisioning to accomplish this. If done correctly, this is probably the most scalable/extensible solution.
Modify your application to detect the attempt to move a row out of a scope and have the application delete the row and re-insert it instead of just updating it (probably in a stored procedure). If you already use stored procedures to handle updates, this might be a good option.
Add a background service or process that goes through and looks for records that don't match the scope and delete them. This may end up being the easiest solution - especially if your application is already deployed.