QAbstractItemModel - Should QModelIndex objects be cached when created? - c++

When subclassing a QAbstractItemModel and re-implementing the index() method, I had been simply returning a new index each time with createIndex(). But I noticed that the index() method gets called thousands of times when the model is used in conjunction with a view, for all sorts of paint events and whatnot.
Should I instead be caching the QModelIndex object after I generate it the first time in index(), and then be returning the cached index when index() is subsequently called on the same row/col? It's not mentioned in the documentation, and it seems that indexes themselves can become invalidated under certain circumstances, so I am unsure of what to do here.
In my particular case, I'm working with PySide6, but I imagine this could apply to any implementation of the Qt framework.

If your model supports inserting or removing rows, your indexes are not persistent. You can still use a cache, but you must invalidate it every time the model's shape changes.
If the logic for creating an index is complicated, there may be a benefit to caching.
A QModelIndex is about the size of four ints (row, column, an internal pointer/id, and a pointer to the model), so it's relatively lightweight; creating one and moving it around is cheap.
Either way, there's only one way to be sure: try caching and measure the performance gain.
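The invalidate-on-shape-change idea can be sketched without Qt. The names below (CachingModel, Index, cacheSize, insertRow) are hypothetical stand-ins for illustration, not Qt API; the point is only that any structural change must clear the cache:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for QModelIndex: just (row, column).
struct Index { int row; int column; };

// Sketch of a model that caches created indexes and clears the cache
// whenever the model's shape changes (the analogue of rows being
// inserted or removed in a QAbstractItemModel subclass).
class CachingModel {
public:
    // Returns a cached entry when one exists, mirroring the idea of
    // caching createIndex() results keyed on (row, column).
    const Index& index(int row, int column) {
        auto key = std::make_pair(row, column);
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, Index{row, column}).first;
        return it->second;
    }

    std::size_t cacheSize() const { return cache_.size(); }

    // Any structural change must invalidate the cache, because cached
    // indexes may now refer to the wrong rows.
    void insertRow(int /*row*/) { cache_.clear(); }

private:
    std::map<std::pair<int, int>, Index> cache_;
};
```

Whether the map lookup is actually cheaper than just calling createIndex() again is exactly what you'd have to measure.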

Related

In Django does .get() have better performance than .first()?

The Django implementation of .first() seems to get all items into a list and then return the first one.
Is .get() more performant? Surely the database can just return one item; the implementation of .first() seems suboptimal.
I see no reason to think so, although I have not actually profiled it.
Slicing on Django querysets is implemented by modifying the query to use LIMIT and OFFSET terms to retrieve only the necessary number of elements. This means the first() implementation only fetches a single element from the database.

Nested operations in QAbstractItemModel

When creating item models, such as by subclassing QAbstractItemModel, are the basic operations like row insertion and removal intended to be nested?
In practice, for example, must a call to ::beginInsertRows() be immediately followed by a call to ::endInsertRows()? Or, in contrast, is it allowed to call ::beginInsertRows() twice with distinct arguments, then do the insertions, and then call the corresponding ::endInsertRows() twice?
I am wondering because, when reading the QAbstractItemModel sources, I observed that operations are done on a stack basis. Note the d->changes.push... On the other hand, d->changes is not used anywhere other than in the beginInsert/endInsert... etc. pairs of functions.
In my case, I broke the insertion down so that I can first do the beginInsertRows / insertion / endInsertRows sequence for the parent items and then the beginInsertRows / insertion / endInsertRows sequence for child items.
It seemed to work out fine at first, but I had some very strange bugs when I used my custom model with a QSortFilterProxyModel. After quite a few hours of fixing tiny inconsistencies in my model and getting increasingly better (or just more sane) results, I was stuck with a last strange behavior, but ran out of ideas of where to look for errors on my part.
Then I thought I'd give a shot to another approach: one beginInsertRows, insert both the parents and their children and then one endInsertRows. To my surprise, it works very well. Finally I can tick this bloody model off my task list.
@vahancho, sorry for my previous comment. I was exhausted from trying to get my custom model to behave and misunderstood your question due to my impatience. Your comment actually holds the correct answer.

Which data structure is sorted by insertion and has fast "contains" check?

I am looking for a data structure that preserves the order in which the elements were inserted and offers a fast "contains" predicate. I also need iterator and random access. Performance during insertion or deletion is not relevant. I am also willing to accept overhead in terms of memory consumption.
Background: I need to store a list of objects. The objects are instances of a class called Neuron and stored in a Layer. The Layer object has the following public interface:
class Layer {
public:
    Neuron *neuronAt(const size_t &index) const;
    NeuronIterator begin();
    NeuronIterator end();
    bool contains(const Neuron *const &neuron) const;
    void addNeuron(Neuron *const &neuron);
};
The contains() method is called quite often when the software runs; I've verified that using callgrind. I tried to circumvent some of the calls to contains(), but it is still a hot spot. Now I hope to optimize exactly this method.
I thought of using std::set, using the template argument to provide my own comparator struct. But the Neuron class itself does not give its position in the Layer away. Additionally, I'd like to have *someNeuronIterator = anotherNeuron to work without screwing up the order.
Another idea was to use a plain old C array. Since I do not care about the performance of adding a new Neuron object, I thought I could make sure that the Neuron objects are always stored linearly in memory. But that would invalidate the pointer I pass to addNeuron(); at least I'd have to change it to point to the new copy I created to keep things linearly aligned. Right?
Another idea was to use two data structures in the Layer object: A vector/list for the order, and a map/hash for lookup. But that would contradict my wish for an iterator that allowed operator* without a const reference, wouldn't it?
I hope somebody can hint an idea for a data structure or a concept that would satisfy my needs, or at least give me an idea for an alternative. Thanks!
If this contains check is really where you need the fastest execution, and assuming you can be a little intrusive with the source code, the fastest way to check if a Neuron belongs in a layer is to simply flag it when you insert it into a layer (ex: bit flag).
You have guaranteed O(1) checks at that point to see if a Neuron belongs in a layer and it's also fast at the micro-level.
If there can be numerous layer objects, this can get a little trickier, as you'll need a separate bit for each potential layer a neuron can belong to, unless a Neuron can only belong to a single layer at once. This is reasonably manageable, however, if the number of layers is relatively fixed.
In the latter case, where a Neuron can only belong to one layer at once, all you need is a Layer* back pointer. To see if a Neuron belongs to a layer, simply check whether that back pointer points to the layer object.
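A minimal sketch of the back-pointer idea, assuming a Neuron belongs to at most one Layer (the member and method names here are invented for illustration):

```cpp
#include <cassert>

class Layer; // forward declaration so Neuron can hold a Layer*

// Each Neuron stores a back pointer to the Layer that owns it.
struct Neuron {
    Layer* layer = nullptr;
};

class Layer {
public:
    // Inserting a neuron sets its back pointer.
    void addNeuron(Neuron* n) { n->layer = this; }

    // O(1) membership check: just compare the back pointer.
    bool contains(const Neuron* n) const { return n->layer == this; }
};
```

This couples Neuron to Layer, which is the intrusiveness the answer warns about, but the contains check collapses to a single pointer comparison.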
If a Neuron can belong to multiple layers at once, but not too many at one time, then you could store a small array of back pointers, like so:
struct Neuron
{
    ...
    Layer* layers[4]; // inline storage; use whatever small size usually fits the common case
    Layer** ptr;      // points to layers, or to a heap allocation when more are needed
    int num_layers;
};
Initialize ptr to point to layers if there are 4 or fewer layers to which the Neuron belongs. If there are more, allocate the array on the free store. In the destructor, free the memory if ptr != layers. You can also optimize away num_layers if the common case is a single layer, in which case a null-terminated solution might work better. To see if a Neuron belongs to a layer, simply do a linear search through the array ptr points to. That's practically constant-time complexity with respect to the number of Neurons, provided they don't belong to a massive number of layers at once.
You can also use a vector here but you might reduce cache hits on those common case scenarios since it'll always put its contents in a separate block, even if the Neuron only belongs to like 1 or 2 layers.
This might be a bit different from what you were looking for with a general-purpose, non-intrusive data structure, but if your performance needs are really skewed towards these kinds of set operations, an intrusive solution is going to be the fastest in general. It's not quite as pretty and couples your element to the container, but hey, if you need max performance...
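As a rough sketch of the small-buffer idea above: the names addToLayer and inLayer are invented for illustration, and the heap-fallback path for more than four layers is omitted, so this covers only the common case the answer describes:

```cpp
#include <cassert>

class Layer {};

// Small-buffer membership: a fixed inline array holds the common case,
// and the contains check is a short linear search over at most 4 slots.
struct Neuron {
    Layer* layers[4] = {};  // inline storage for the common case
    int num_layers = 0;

    // Assumes at most 4 layers; a real implementation would spill to
    // the free store beyond that, as described above.
    void addToLayer(Layer* l) { layers[num_layers++] = l; }

    bool inLayer(const Layer* l) const {
        for (int i = 0; i < num_layers; ++i)
            if (layers[i] == l) return true;  // linear search, tiny N
        return false;
    }
};
```

For such a small N, the linear scan typically stays within one cache line, which is why it can beat a hash lookup in practice.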
Another idea was to use a plain old C array. Since I do not care about the performance of adding a new Neuron object, I thought I could make sure that the Neuron objects are always stored linearly in memory. But that would invalidate the pointer I pass to addNeuron(); [...]
Yes, but it won't invalidate indices. While not as convenient to use as pointers, if you're working with mass data like vertices of a mesh or particles of an emitter, it's common to use indices here to avoid the invalidation, and possibly to save an extra 32 bits per entry on 64-bit systems.
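A small sketch of why indices survive where pointers don't (Layer, addNeuron, and at are hypothetical names): the vector may reallocate and move every Neuron, but the index handed out at insertion still refers to the same logical element:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Neuron { int value = 0; };

// Neurons stored by value, referred to by index rather than pointer.
class Layer {
public:
    std::size_t addNeuron(Neuron n) {
        neurons_.push_back(n);      // may reallocate and move everything
        return neurons_.size() - 1; // but this handle stays valid
    }
    Neuron& at(std::size_t i) { return neurons_[i]; }
private:
    std::vector<Neuron> neurons_;
};
```

A pointer taken before the reallocation would be dangling; the index is not.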
Update
Given that Neurons only exist in one Layer at a time, I'd go with the back pointer approach. Seeing if a neuron belongs to a layer becomes a simple matter of checking if the back pointer points to the same layer.
Since there's an API involved, I'd suggest, just because it sounds like you're pushing around a lot of data and have already profiled it, that you focus on an interface which revolves around aggregates (layers, e.g.) rather than individual elements (neurons). It'll just leave you a lot of room to swap out underlying representations when your clients aren't performing operations at the individual scalar element-type interface.
With the O(1) contains implementation and no sorted-order requirement, I'd go with a simple contiguous structure like std::vector. However, you do expose yourself to potential invalidation on insertion.
Because of that, if you can, I'd suggest working with indices here. However, that becomes a little unwieldy, since it requires your clients to store both a pointer to the layer in which a neuron belongs and its index (though if you do this, the back pointer becomes unnecessary, as the client is tracking where things belong).
One way to mitigate this is to simply use something like std::vector<Neuron*> or ptr_vector if available. However, that can expose you to cache misses and heap overhead, and if you want to optimize that, this is where the fixed allocator comes in handy. However, that's a bit of a pain with alignment issues and a bit of a research topic, and so far it seems like your main goal is not to optimize insertion or sequential access quite as much as this contains check, so I'd start with the std::vector<Neuron*>.
You can get an O(1) contains check and O(1) insert while preserving insertion order. If you are using Java, look at LinkedHashMap. If you are not using Java, look at LinkedHashMap anyway and work out a parallel data structure, or implement it yourself.
It's just a hash map combined with a doubly linked list. The linked list preserves order and the hash map allows O(1) access. When you insert an element, the map entry for the key points to the node in the linked list where your data resides. To look up, you go to the hash table to find the pointer directly to your linked-list node (not the head) and get the value in O(1). To access the elements sequentially, you just traverse the linked list.
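A C++ analogue of this LinkedHashMap idea, specialized to a set of ints for brevity (the class name LinkedHashSet is invented): std::list preserves insertion order, and an unordered_map from value to list iterator gives average O(1) contains and insert:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <unordered_map>

// Insertion-ordered set: doubly linked list for order, hash map from
// value to list node for O(1) average membership checks.
class LinkedHashSet {
public:
    bool contains(int v) const { return index_.count(v) != 0; }

    void insert(int v) {
        if (contains(v)) return;           // keep first insertion's position
        order_.push_back(v);
        index_[v] = std::prev(order_.end());
    }

    // Iteration follows insertion order.
    std::list<int>::const_iterator begin() const { return order_.begin(); }
    std::list<int>::const_iterator end() const { return order_.end(); }

private:
    std::list<int> order_;
    std::unordered_map<int, std::list<int>::iterator> index_;
};
```

Note this gives ordered iteration but not the random access the question also asked for; swapping the list for a vector restores random access at the cost of stable iterators.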
A heap sounds like it could be useful to you. It's like a tree, but the newest element is inserted at the bottom and then works its way up based on its value. Note, however, that a heap only makes the minimum (or maximum) element fast to find; a general contains check still requires a linear scan.
Otherwise, you could store a hash table (a quick way to check whether the neuron is contained) keyed by the neuron, with the neuron itself and the timestep at which it was inserted (to check its chronological insertion time) as the values.

How to let Qt Tree model work with QSet?

I want to build a tree whose items are always automatically sorted under a tree node when adding, renaming, and performing other operations. std::set seems a good candidate for my data container. However, it seems the Qt tree favors vectors or QList (a pointer vector), since the tree items are accessed, inserted, or removed via their indices or row numbers. I am using a std::distance-like function to calculate the index of an item in the set, but I suspect it is very slow (not tested). Is there a good way to make a Qt tree model work with std::set, or do I need to use vectors for my data, or develop a new container? Thanks a lot!
Found an answer: use boost::container::flat_set. Thanks.
QSet is not a good idea for this purpose. As well as not providing index-based access, it's completely unordered; that is, there are absolutely no guarantees on the order of traversal. There's no sensible way it can be made to work with a Qt item model. You'd be much better off just using QList and ensuring the values are unique yourself.
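The "ordered list with uniqueness enforced yourself" idea can be sketched in plain C++, with std::vector standing in for QList and ints standing in for the item values (UniqueList and rowOf are invented names):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion-ordered container that rejects duplicates, the way the
// answer suggests doing manually with QList. Rows map directly to
// positions, which is what an item model needs.
class UniqueList {
public:
    bool insert(int v) {
        if (std::find(items_.begin(), items_.end(), v) != items_.end())
            return false;            // already present; order unchanged
        items_.push_back(v);
        return true;
    }

    // Index-based access, mirroring a model's row numbers.
    int rowOf(int v) const {
        auto it = std::find(items_.begin(), items_.end(), v);
        return int(it - items_.begin());
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;
};
```

The linear duplicate check is O(n) per insert; if inserts are frequent, pairing this with a hash set (as in the LinkedHashMap answer above) removes that cost.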
You should probably use QTreeView and subclass QAbstractItemModel; this way you can use any data source you want.
See this question: Creating Qt models for tree views

Selection appropriate STL container for logging Data

I require a logging and filtering mechanism in my client-server application, where the client may request log data based on certain parameters.
The log will have MAC ID, date and time, command type, and direction as fields.
The server can filter log data based on these parameters as well.
The size of the log is 10 MB; after that, the log is overwritten with messages from the beginning.
My approach is to log data to a file as well as into an STL container "in memory", so that when the client requests data, the server can filter the log data based on any criteria.
The process is that the server will first sort the vector<> on the particular criterion and then filter it using binary search.
I am planning to use vector as the STL container for the in-memory logging data.
I am a bit confused about whether a vector is appropriate in this situation or not, since the size of the data can be up to 10 MB in the vector.
My question: is a vector good enough for this case?
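The sort-then-binary-search plan described above can be sketched like this; LogRecord and filterByMac are hypothetical names, and a real record would carry all four fields, not just the two shown:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct LogRecord {
    std::string macId;
    int timestamp;
};

// Sort a copy of the log by the filter key, then use equal_range
// (binary search) to extract all matching records.
std::vector<LogRecord> filterByMac(std::vector<LogRecord> log,
                                   const std::string& mac) {
    auto byMac = [](const LogRecord& a, const LogRecord& b) {
        return a.macId < b.macId;
    };
    std::sort(log.begin(), log.end(), byMac);
    auto range = std::equal_range(log.begin(), log.end(),
                                  LogRecord{mac, 0}, byMac);
    return {range.first, range.second};
}
```

Taking the vector by value keeps the original log in time order; the O(n log n) sort dominates, so for repeated queries on the same key it pays to sort once and reuse the sorted copy.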
I'd go with a deque, double ended queue. It's like a vector but you can add/remove elements from both ends.
I would first say that I would use a logging library, since there are many, and I assure you they will do a better job (log4cxx, for example). If you insist on doing this yourself, a vector is an appropriate mechanism, but you will have to manually sort the data based upon user requests. One other idea is to use SQLite and let it manage storing, sorting, and filtering your data.
The actual response will depend a lot on the usage pattern and interface. If you are using a graphical UI, then chances are that there is already a widget that implements that feature to some extent (ability to sort by different columns, and even filter). If you really want to implement this outside of the UI, then it will depend on the usage pattern, will the user want a particular view more than others? does she need only filtering, or also sorting?
If there is one view of the data that will be used in most cases, and you only need to show a different order a few times, I would keep a std::vector or std::deque of the elements and filter with remove_copy_if when needed. If a different sort is required, I would copy and sort the copy, to avoid having to re-sort back to time order to continue adding elements to the log. Beware that if the application keeps pushing data, you will need to update the copy with the new elements in place (or provide a fixed view and rerun the operation periodically).
If there is no particular view that occurs much more often than the rest, or if you don't want to go through the pain of implementing the above, take a look at Boost multi-index containers. They keep synchronized views of the same data with different criteria. That will probably be the most efficient choice in this last case, and even if it might be less efficient in the general case of a dominating view, it might make things simpler, so it could still be worth it.
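The remove_copy_if approach from the first suggestion looks roughly like this; filterBelow is an invented name, with ints standing in for log records:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Produce a filtered copy on demand, leaving the time-ordered log
// untouched. remove_copy_if copies every element that does NOT
// satisfy the predicate.
std::vector<int> filterBelow(const std::vector<int>& log, int threshold) {
    std::vector<int> out;
    std::remove_copy_if(log.begin(), log.end(), std::back_inserter(out),
                        [threshold](int v) { return v < threshold; });
    return out;
}
```

In C++11 and later, std::copy_if with the inverted predicate reads more naturally, but the effect is the same.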