Disable QSql(Relational)TableModel's prefetch/caching behaviour - c++

For some (well, performance) reason, Qt's "model" classes only fetch 256 rows from the database, so if you want to append a row to the end of the record set, you apparently must do something along the lines of
while (model->canFetchMore()) {
    model->fetchMore();
}
This does work, and when you do model->insertRow(model->rowCount()) afterwards, the row is indeed appended after the last row of the record set.
There are various other problems related to this behaviour, such as when you insert or remove rows from the model: the view that renders it gets redrawn with only 256 rows showing, and you must manually make sure that the missing rows are fetched again.
Is there a way to bypass this behaviour altogether? My model is very unlikely to display more than, say, 1000 rows, but getting it to retrieve those 1000 rows seems to be a royal pain. I understand that this is a great performance optimization if you have to deal with larger record sets, but for me it is a burden rather than a boon.
The model needs to be writable so I can't simply use QSqlQueryModel instead of QSqlRelationalTableModel.

From the QSqlTableModel documentation:
bool QSqlTableModel::insertRecord ( int row, const QSqlRecord & record )
Inserts the record after row. If row is negative, the record will be appended to the end.
Calls insertRows() and setRecord() internally.
Returns true if the row could be inserted, otherwise false.
See also insertRows() and removeRows().
I haven't tried it yet, but I don't think it is necessary to fetch the complete dataset to insert a record at the end.
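Something like this should work (untested; the "name" column is made up):
QSqlRecord record = model->record();  // an empty record with the table's field layout
record.setValue("name", QStringLiteral("new entry"));
// A negative row appends the record to the end, as documented above.
if (!model->insertRecord(-1, record))
    qWarning() << model->lastError().text();
model->submitAll();  // needed when the edit strategy is OnManualSubmit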

Related

Performance optimization on Django update or create

In a Django project, I'm refreshing tens of thousands of lines of data from an external API on a daily basis. The problem is that since I don't know if the data is new or just an update, I can't do a bulk_create operation.
Note: Some, or perhaps many, of the rows do not actually change on a daily basis, but I don't know which, or how many, ahead of time.
So for now I do:
for row in csv_data:
    try:
        MyModel.objects.update_or_create(id=row['id'], defaults={'field1': row['value1']....})
    except:
        print 'error!'
And it takes.... forever! One or two lines a second, max speed, sometimes several seconds per line. Each model I'm refreshing has one or more other models connected to it through a foreign key, so I can't just delete them all and reinsert every day. I can't wrap my head around this one: how can I significantly cut down the number of database operations so the refresh doesn't take hours and hours?
Thanks for any help.
The problem is that you are doing a database action on each data row you grabbed from the API. You can avoid that by working out which of the rows are new (and doing a bulk insert of all new rows), which of the rows actually need an update, and which didn't change.
To elaborate:
Grab all the relevant rows from the database (meaning all the rows that can possibly be updated):
old_data = MyModel.objects.all()  # if possible, do MyModel.objects.filter(...) instead
Grab all the API data you need to insert or update:
api_data = [...]
For each row of data, determine whether it is new (and put it in an array) or whether it needs to update the DB:
for row in api_data:
    if is_new_row(row, old_data):
        new_rows_array.append(row)
    elif is_data_modified(row, old_data):
        ...
        # do the update
MyModel.objects.bulk_create(new_rows_array)
is_new_row - determines whether the row is new, so it can be added to the array that will be bulk-created
is_data_modified - looks the row up in the old data and determines whether that row's data has changed; the update runs only if it has changed
If you look at the source code for update_or_create(), you'll see that it's hitting the database multiple times for each call (either a get() followed by a save(), or a get() followed by a create()). It does things this way to maximize internal consistency - for example, this ensures that your model's save() method is called in either case.
But you might well be able to do better, depending on your specific models and the nature of your data. For example, if you don't have a custom save() method, aren't relying on signals, and know that most of your incoming data maps to existing rows, you could instead try an update() followed by a bulk_create() if the row doesn't exist. Leaving aside related models, that would result in one query in most cases, and two queries at the most. Something like:
updated = MyModel.objects.filter(field1="stuff").update(field2="other")
if not updated:
    MyModel.objects.bulk_create([MyModel(field1="stuff", field2="other")])
(Note that this simplified example has a race condition, see the Django source for how to deal with it.)
In the future there will probably be support for PostgreSQL's UPSERT functionality, but of course that won't help you now.
Finally, as mentioned in the comment above, the slowness might just be a function of your database structure and not anything Django-specific.
Just to add to the accepted answer: one way of recognizing whether the operation is an update or a create is to ask the API owner to include a last-updated timestamp with each row (if possible) and store it in your DB for each row. That way you only have to check the rows where this timestamp differs from the one in the API.
I faced exactly this issue when I was updating every existing row and creating new ones. It took a whole minute to update some 8000 rows. With selective updates, I cut that down to just 10-15 seconds, depending on how many rows had actually changed.
I think the two calls below can together do the same thing as update_or_create:
MyModel.objects.filter(...).update()
MyModel.objects.get_or_create()

Qt Delete selected row in QTableView

I want to delete a selected row from the table when I click on the delete button.
But I can't find anything regarding deleting rows in the Qt documentation. Any ideas?
You can use bool QAbstractItemModel::removeRow(int row, const QModelIndex & parent = QModelIndex()) for this.
Here you can find an example for all this.
Also, here is an inline quote from that documentation:
removeRows()
Used to remove rows and the items of data they contain
from all types of model. Implementations must call beginRemoveRows()
before removing rows from any underlying data structures, and
call endRemoveRows() immediately afterwards.
The second part of the task would be to connect the button's clicked signal to the slot executing the removal for you.
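For example (deleteButton, tableView and removeSelectedRow are placeholder names):
connect(deleteButton, &QPushButton::clicked,
        this, &MyWidget::removeSelectedRow);

void MyWidget::removeSelectedRow()
{
    // Remove the row under the current selection, if any.
    const QModelIndex index = tableView->selectionModel()->currentIndex();
    if (index.isValid())
        tableView->model()->removeRow(index.row(), index.parent());
}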
If you are removing multiple rows you can run into some complications using the removeRow() call. This operates on the row index, so you need to remove rows from the bottom up to keep the row indices from shifting as you remove them. This is how I did it in PyQt, don't know C++ but I imagine it is quite similar:
rows = set()
for index in self.table.selectedIndexes():
    rows.add(index.row())
for row in sorted(rows, reverse=True):
    self.table.removeRow(row)
Works perfectly for me! However, one thing to know: in my case this function gets called when a user clicks on a specific cell (which has a pushbutton with an 'X'). Unfortunately, when they click on that pushbutton it deselects the row, which then prevents it from getting removed. To fix this I just captured the row of the sender and added it to the set of rows at the very beginning, before the for loops. That looks like this:
rows.add(self.table.indexAt(self.sender().pos()).row())
Another way is to delete the row from the database, then clear the model and fill it again. This solution is also safe when you are removing multiple rows.

Displaying big data using Qt Table View

I am not sure how to go about this big data problem. Instead of using a Table Widget, it is suggested to use a Table View for big data for speed. The data will be at most 200000 rows by 300 columns of strings. Also, it might have gaps in the structure. For the View, it is needed to define a Model. I was not able to do this yet and I wonder the following:
Will it really be faster to use a View if the user only wants to replace some values in a column and inspect the data? Is a QList a smart choice given the speed needs, or are there better options? And how would I implement this as a Model?
A QTableWidget uses its own Model as backend (container for its data). One can't (or it is not intended that one should) customize a QTableWidget's model. Each cell in a QTableWidget is represented by a QTableWidgetItem. They are written to solve general problems and are not that fast.
When one decides to use a QTableView with its own model deriving from QAbstractTableModel as backend, one can speed things up a lot. I've found that one of the things that took the most processing time was the size calculation of rows. I overloaded sizeHintForRow in my table like this:
int MyTable::sizeHintForRow(int row) const
{
    // All rows have the same height in MyTable, hence we use the height of the
    // first row for performance reasons.
    // Note: cachedRowSizeHint_ must be declared mutable to be assignable here.
    if (cachedRowSizeHint_ == -1)
    {
        cachedRowSizeHint_ =
            model()->headerData(0, Qt::Vertical, Qt::SizeHintRole).toSize().height();
    }
    return cachedRowSizeHint_;
}
Implementing a simple table model is quite easy (the Qt Help files are adequate...). One can start by providing the data for the minimum roles (Edit and Display). TableModel data can always be modified by using QIdentityProxyModels where necessary. If your models don't work, post your code. Remember not to return invalid data for roles, e.g.:
QVariant IO_PluginMngr::data(const QModelIndex & index, int role) const
{
    if (index.isValid())
    {
        if (role == Qt::DisplayRole)
        {
            return sequence_[index.row()][index.column()];
        }
    }
    return QVariant();
}
Returning erroneous data, especially for the SizeHintRole, will cause havoc in the views.
I've noticed that one should, when implementing models, be diligent in calling begin/endInsertRows when appropriate, otherwise persistent model indexes don't work.
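For example, a minimal sketch (rows_ and RowData are invented names):
bool MyModel::insertRows(int row, int count, const QModelIndex &parent)
{
    // Tell attached views and persistent indexes about the change before it happens...
    beginInsertRows(parent, row, row + count - 1);
    for (int i = 0; i < count; ++i)
        rows_.insert(row, RowData());  // rows_ could be a QList<RowData>
    // ...and confirm it immediately afterwards.
    endInsertRows();
    return true;
}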
We've written more complicated tables where the data is modified in threads in the background and swapped out prior to updating the table. One area in which one can gain speed is to never order the data in your model, but rather use an ordered reference sequence that references the real (unordered) data. We have tables of about a thousand rows and 50 columns that get updated once a second (with many views), and our processor is sitting at about 20%, but the greatest speed improvement was the caching of the sizeHint.
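A rough sketch of the ordered-reference idea (data_, sortedIndex_ and the Row type are invented):
// data_ holds rows in arrival order; sortedIndex_ maps a view row to a data_ row.
QVariant MyModel::data(const QModelIndex &index, int role) const
{
    if (!index.isValid() || role != Qt::DisplayRole)
        return QVariant();
    const Row &row = data_.at(sortedIndex_.at(index.row()));
    return row.value(index.column());  // hypothetical accessor
}
// A background re-sort only shuffles the small sortedIndex_ vector; the rows
// in data_ never move.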

Returning objects that work with parent object's data and later invalidate them

I'm working on an implementation of tables in a Qt app, but having a bit of trouble getting the design right.
I have two classes, Table and Cell.
Cell has API for setting cell properties such as borders and paddings, and getting the row and column of the cell using int Cell::row() and int Cell::column(). It is an explicitly shared class, using QExplicitlySharedDataPointer for its data. It also has an isValid() API to query if the cell is valid or not.
Table has API for inserting/removing rows and columns and merging areas of cells. A Cell may be retrieved from a table using Table::cellAt(int row, int column). Rows of cells are kept as a QList<QList<Cell>>. When rows and columns are removed, the removed cells are marked as invalid by the table, which makes calls to Cell::isValid on any previously returned cells from the removed rows/columns return false.
Now to the tricky part: since calculating the row and column number of a cell when you haven't already got them is an expensive operation, the Table::cellAt(int row, int column) method sets the row/column explicitly on the Cell before returning it, and the Cell keeps them as simple int members. This means a Cell can reply fast when queried for its row/column.
But here comes the problem: this also means that the values of Cell::row() and Cell::column() will be incorrect if rows or columns are removed/inserted before the row/column that the cell is in.
I can't mark the affected cells as invalid in the same way I do when the actual row/column they are part of is removed, since later on someone might again retrieve a cell at that row/column with cellAt(int, int), and that cell should not be invalid.
Does anyone have some advice on a better design here?
You could do a lazy update. That is, instead of updating the cell's position information every time the table changes, only update it when Cell::row() or Cell::column() is called, and only if the table has changed since the last time the cell's position was updated. That would require you to keep a version stamp or something in Table and Cell. Every time the table gets updated, bump the version. Cell::row() and Cell::column() would first check whether the Cell's version is older than the Table's, and if so, recalculate the position.
Whether that extra work is worth it versus just always recalculating position or recalculating on every change, I can't say.
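A rough sketch of that check (this assumes row_, column_ and version_ are mutable members so the recalculation can happen inside a const getter):
int Cell::row() const
{
    // Lazily recalculate only when the table changed since we last looked.
    if (version_ != table_->version())
        recalculatePosition();  // refreshes row_, column_ and version_
    return row_;
}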
I discussed this problem with a friend because it seemed too much like a Programming Pearls exercise. This is what we came up with:
Create a new structure for counting indices. It could be something like struct Index { int n; };.
Store row and column indices in two QList<Index*>. Let's call each of these a Dimension. Using pointers is crucial, as you'll see later.
Cells do not store their row and column values anymore. They point to an element in each of the two Dimensions. When queried for their row and column, there is now an extra pointer dereference, which should not be too expensive.
When the table has rows and columns added or removed, you add items to or remove items from the corresponding Dimension and update the n value of the following items. Storing pointers is necessary because QList copies its values.
Instead of renumbering every Cell affected by a table manipulation, an operation that has a cost of O(rows x columns), you'll now update only the affected rows or columns, which has a cost of O(rows + columns). The downside is added code complexity and two dereferenced accesses when a Cell is queried for its row or column.
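A compact sketch of this design (rowDim_, rowIndex_ and the other member names are invented):
struct Index { int n; };

// A cell points into the table's Dimensions instead of storing plain ints.
int Cell::row() const    { return rowIndex_->n; }
int Cell::column() const { return columnIndex_->n; }

// Removing table row r: drop its Index and renumber only the ones that follow.
void Table::removeRow(int r)
{
    delete rowDim_.takeAt(r);  // rowDim_ is one of the QList<Index*> Dimensions
    for (int i = r; i < rowDim_.size(); ++i)
        rowDim_[i]->n = i;
}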

Supporting multi-add/delete (and undo/redo) with a QAbstractItemModel (C++)

Greetings,
I've been writing some nasty code to support the undo/redo of deletion of an arbitrary set of objects from my model. I feel like I'm going about this correctly, as all the other mutators (adding/copy-pasting) are subsets of this functionality.
The code is nastier than it needs to be, mostly because the only way to mutate the model involves calling beginInsertRows/beginRemoveRows and removing the rows in a range (just doing 1 row at a time; no need to optimize "neighbors" into a single call yet).
The problem with beginInsertRows/beginRemoveRows is that removal of a row could affect another QModelIndex (say, one cached in a list). For instance:
ParentObj
->ChildObj1
->ChildObj2
->ChildObj3
Say I select ChildObj1 and ChildObj3 and delete them, if I remove ChildObj1 first I've changed ChildObj3's QModelIndex (row is now different). Similar issues occur if I delete a parent object (but I've fixed this by "pruning" children from the list of objects).
Here are the ways I've thought of working around this interface limitation, but I thought I'd ask for a better one before forging ahead:
Move "backwards", assuming a provided list of QModelIndices is orderered from top to bottom just go from bottom up. This really requires sorting to be reliable, and the sort would probably be something naive and slow (maybe there's a smart way of sorting a collection of QModelIndexes? Or does QItemSelectionModel provide good (ordered) lists?)
Update other QModelIndeces each time an object is removed/added (can't think of a non-naive solution, search the list, get new QModelIndeces where needed)
Since updating the actual data is easy, just update the data and rebuild the model. This seems grotesque, and I can imagine it getting quite slow with large sets of data.
Those are the ideas I've got currently. I'm working on option 1 right now.
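Something along these lines is what I have in mind for option 1 (untested; view and model are placeholders):
QModelIndexList indexes = view->selectionModel()->selectedRows();
// Sort by row, descending, so earlier removals don't shift later indexes.
std::sort(indexes.begin(), indexes.end(),
          [](const QModelIndex &a, const QModelIndex &b) { return a.row() > b.row(); });
for (const QModelIndex &index : indexes)
    model->removeRow(index.row(), index.parent());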
Regards,
Dan O
Think of beginRemoveRows/endRemoveRows, etc. as methods that ask the QAbstractItemModel base class to fix up your persistent model indexes for you, not just as a way of updating views, and try not to confuse the base class in its work on those indexes. Check out http://labs.trolltech.com/page/Projects/Itemview/Modeltest to exercise your model and see if you are keeping the QAbstractItemModel base class happy.
Where QPersistentModelIndex does not help is if you want to keep your undo/redo data outside of the model. I built a model that is heavily edited, and I did not want to try keeping everything in the model. I store the undo/redo data on the undo stack. The problem is that if you edit a column, storing the persistent index of that column on the undo stack, and then delete the row holding that column, the column's persistent index becomes invalid.
What I do is keep both a persistent model index, and a "historical" regular QModelIndex. When it's time to undo/redo, I check if the persistent index has become invalid. If it has, I pass the historical QModelIndex to a special method of my model to ask it to recreate the index, based on the row, column, and internalPointer. Since all of my edits are on the undo stack, by the time I've backed up to that column edit on the undo stack, the row is sure to be there in the model. I keep enough state in the internalPointer to recreate the original index.
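In sketch form (recreateIndex is the special model method described above; the member names are invented):
QModelIndex index = persistent_;  // persistent_ is the QPersistentModelIndex
if (!index.isValid()) {
    // Fall back to the stored historical index and ask the model to rebuild it.
    index = model_->recreateIndex(historical_.row(), historical_.column(),
                                  historical_.internalPointer());
}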
I would consider using a "all-data" model and a filter proxy model with the data model. Data would only be added to the all-data model, and never removed. That way, you could store your undo/redo information with references to that model. (May I suggest QPersistentModelIndex?). Your data model could also track what should be shown, somehow. The filter model would then only return information for the items that should be shown at a given time.
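The filtering side might look roughly like this (isVisible is a hypothetical flag exposed by the all-data model):
bool VisibleRowsProxy::filterAcceptsRow(int sourceRow, const QModelIndex &sourceParent) const
{
    // Rows are never removed from the all-data model, only hidden from views.
    Q_UNUSED(sourceParent);
    const auto *source = static_cast<const AllDataModel *>(sourceModel());
    return source->isVisible(sourceRow);
}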