STL: What are random access and sequential access? - c++

So I am curious to know, what is random access?
I searched a little and couldn't find much. My current understanding is that the "blocks" in the container are placed randomly. Random access would then mean I can access every block of the container no matter its position (so I can read what is at position 5 without going through all the blocks before it), while with sequential access I have to go through the 1st, 2nd, 3rd and 4th blocks to get to the 5th.
Am I right? If not, can someone explain what random access and sequential access are?

Sequential access means the cost of accessing the 5th element is roughly 5 times the cost of accessing the 1st element, or at least that the cost grows with an element's position in the set. That is because to reach the 5th element of the set, you must first step through the 1st, 2nd, 3rd, and 4th elements, so accessing the 5th element takes 5 operations.
Random access means that accessing any element in the set has the same cost as any other element in the set. Finding the 5th element of a set is still only a single operation.
So accessing a random element in a random access data structure has O(1) cost, whereas accessing a random element in a sequential data structure has O(n/2) -> O(n) cost. The n/2 comes from the fact that if you access a random element many times, its average position is about halfway through the set. For a set of n elements, that comes out to n/2 operations (which in big O notation simplifies to n).
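To make the cost difference concrete, here's a minimal sketch (assuming C++11) contrasting O(1) indexed access into a std::vector with the O(n) walk a std::list needs to reach the same position:
#include <iostream>
#include <iterator>
#include <list>
#include <vector>

int main()
{
    std::vector<int> v = {10, 20, 30, 40, 50};
    std::list<int>   l = {10, 20, 30, 40, 50};

    // Random access: one pointer calculation, O(1).
    std::cout << v[4] << '\n';

    // Sequential access: std::advance walks 4 hops, O(n).
    auto it = l.begin();
    std::advance(it, 4);
    std::cout << *it << '\n';
}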
Something you might find cool:
Hash maps are an example of a data structure that implements random access. A cool thing to note is that on a hash collision, the collided elements are stored in a linked list inside that bucket, which is sequential storage. So if you have 100% collisions in a hash map, you actually end up with sequential storage.
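As an illustration, a minimal sketch (assuming C++11; the ConstantHash functor is deliberately pathological) that forces 100% collisions in std::unordered_map, so every lookup degrades to a linear scan of one bucket's chain:
#include <cstddef>
#include <iostream>
#include <unordered_map>

// Pathological hash: every key lands in the same bucket.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

int main()
{
    std::unordered_map<int, int, ConstantHash> m;
    for (int i = 0; i < 1000; ++i)
        m[i] = i * i;

    // Still correct, but each find() now walks one long chain: O(n).
    std::cout << m.find(999)->second << '\n';
}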
This means the worst case for a hash map is actually O(n) for accessing an element, the same as the average case for sequential storage. Put more carefully, finding an element in a hash map is O(n) in the worst case (every key collides) but O(1) in the average and best cases.
So:
Sequential access: finding a random element in a set of n elements is O(n) in the worst case, about n/2 operations on average (which is still O(n)), and O(1) in the best case.
Random access: finding a random element in a set of n elements is O(1) in the average and best cases; for a hash map, the worst case (all keys colliding) degrades to O(n).
So random access has the benefit of giving better performance for accessing elements, however sequential storage data structures provide benefits in other areas.
Second edit, for @sumsar1812:
I want to preface this by saying it is how I understand the advantages/use cases of sequential storage, but I am not as certain about my understanding of the benefits of sequential containers as I am about my answer above. So please correct me anywhere I am mistaken.
Sequential storage is useful because the data will actually be stored sequentially in memory.
You can actually access the next member of a sequentially stored data set by offsetting a pointer to the previous element by the number of bytes needed to store a single element of that type.
So since a signed int typically requires 4 bytes of storage, if you have a fixed array of integers, with someInts acting as a pointer to its first element:
int someInts[5];
someInts[1] = 5;
someInts decays to a pointer to the first element of the array. Adding 1 to that pointer offsets where it points in memory by sizeof(int) bytes (4 on most platforms):
*(someInts + 1) // yields 5
This means that if you need to access every element in your data structure in that specific order, it's going to be much faster, since each lookup in sequential storage is just adding a constant value to a pointer.
For random access storage, there is no guarantee that the elements are stored anywhere near one another. This means each lookup can be more expensive than adding a constant amount.
Random access storage containers can still present what looks like an ordered list of elements by using an iterator. However, there is no guarantee that those elements are stored sequentially in memory. This means that even though a container can exhibit the behavior of both a random access container and a sequential container, it will not necessarily exhibit the memory layout benefits of a sequential container.
So if the order of the elements in your container is supposed to be meaningful, or you plan on iterating and operating on every element in a data set then you might benefit from a sequential container.
In truth it gets a little more complicated still: a linked list, which is a sequential container, doesn't actually store its elements sequentially in memory, whereas a vector, another sequential container, does. There are good articles out there that explain the use cases for each specific container better than I can.

There are two main aspects to this, and it's unclear which of the two is more relevant to your question. One of those aspects is accessing the content of an STL container via iterators, where those iterators allow either random access or forward (sequential) access. The other aspect is that of accessing a container or even just memory itself in random or sequential orders.
Iterators - Random Access vs. Sequential Access
To start with iterators, take two examples: std::vector<T> and std::list<T>. A vector stores an array of values, whereas a list stores a linked list of values. The former is stored sequentially in memory, and this allows arbitrary random access: calculating the location of any element is just as fast as calculating the location of the next element. Thus the sequential storage gives you efficient random access, and the iterator is a random access iterator.
By contrast, a list performs a separate allocation for each node, and each node only knows where its neighbors are. Thus calculating the location of a random non-neighbor node cannot be done directly. Any attempt to do so must traverse all the intermediate nodes, and thus algorithms that attempt to skip nodes may perform badly. The non-sequential storage yields randomized locations and thus only efficient sequential access. Thus the iterator that list provides is a bidirectional iterator, one of a few different sequential iterators.
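A minimal sketch of the consequence (assuming C++11): iterator arithmetic like it + 5 compiles for a vector's random access iterator but not for a list's bidirectional iterator, which has to be advanced node by node:
#include <iostream>
#include <iterator>
#include <list>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3, 4, 5, 6};
    std::list<int>   l = {1, 2, 3, 4, 5, 6};

    auto vit = v.begin() + 5;            // random access iterator: O(1)
    std::cout << *vit << '\n';

    // auto bad = l.begin() + 5;         // does not compile for a list
    auto lit = std::next(l.begin(), 5);  // walks 5 nodes: O(n)
    std::cout << *lit << '\n';
}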
Memory - Random Access vs. Sequential Access
However there's another wrinkle in your question. The iterator parts only address the traversal of the container. Underneath that, however, the CPU will be accessing memory itself in a particular pattern. While at a high level the CPU is capable of addressing any random address with no overhead of calculating where it is (it's like a big vector), in practice reading memory involves caching and lots of subtleties that make accessing different parts of memory take different amounts of time.
For example, once you start working with a rather large data set, even if you're working with a vector, it's more efficient to access all elements in sequential order than in some random order. By contrast, a list doesn't even make this possible: since the nodes of a list aren't necessarily located at sequential memory locations, even a sequential traversal of the list's items may not read memory sequentially, and can be more expensive because of this.
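A rough sketch of how one might observe this (assuming C++11; absolute timings vary by machine): sum the same vector once in sequential index order and once in shuffled order, and the shuffled pass is typically much slower purely because of cache misses:
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main()
{
    const std::size_t n = 1 << 24;            // ~16M elements
    std::vector<int> data(n, 1);

    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0); // 0, 1, 2, ...
    auto shuffled = order;
    std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937{42});

    auto sum_in = [&](const std::vector<std::size_t>& idx) {
        const auto t0 = std::chrono::steady_clock::now();
        long long s = 0;
        for (std::size_t i : idx) s += data[i];
        const auto t1 = std::chrono::steady_clock::now();
        std::cout << s << " summed in "
                  << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    };

    sum_in(order);    // sequential walk: cache friendly
    sum_in(shuffled); // random walk: many cache misses
}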

The terms themselves don't imply any performance characteristics, as @echochamber says. The terms only refer to a method of access.
"Random Access" refers to accessing elements in a container in an arbitrary order. std::vector is an example of a C++ container that performs great for random access. std::stack is an example of a C++ container adaptor that doesn't allow random access at all.
"Sequential Access" refers to accessing elements in order. This is only relevant for ordered containers. Some containers are optimized better for sequential access than random access, for example std::list.
Here's some code to show the difference:
// random access. picking elements out, regardless of ordering or sequencing.
// The important factor is that we are selecting items by some kind of
// index.
auto a = some_container[25];
auto b = some_container[1];
auto c = some_container["hi"];
// sequential access. Here, there is no need for an index.
// This implies that the container has a concept of ordering, where an
// element has neighbors
for (container_type::iterator it = some_container.begin();
     it != some_container.end();
     ++it)
{
    auto d = *it;
}

Related

What container to choose for fast search/insert with huge amounts of data?

So it's a thought experiment. I want to have a huge collection of structures such as:
struct
{
    KeyType key;
    ValueType value;
};
And I need fast access by a key and fast insertion of new values.
I would not use std::map because it has too big a memory overhead per element, and for huge amounts of data that might be drastic. Right?
So next I would consider using a sorted std::vector and binary_search. That's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value at the beginning of the sorted array: you'd have to move data right a LOT!
What if I use a deque? As far as I know it has O(1) push_back/push_front, but still O(n) insertion in the middle (as it would have to move data anyway, though less of it).
The questions are:
1) Is O(n) insertion into a deque much faster in practice than O(n) insertion into a vector?
2) What happens when you insert a value into a deque and the bucket it should go into is full?
3) Is there another, preferable type of container when you need to store lots of data and need two fast operations: search and insertion?
Thanks!
I would not use std::map because it has too big a memory overhead per element, and for huge amounts of data that might be drastic. Right?
That depends on the size of your structs... the bigger they are, the smaller the overhead as a proportion of overall memory use. For example, a std::map implementation might average, say, 20 bytes of housekeeping data per element (I just made that up - measure on your own system), so if your struct size is in the hundreds of bytes, who cares...? But if the struct holds two ints, it's a big proportion.
So next I would consider using a sorted std::vector and binary_search. That's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value at the beginning of the sorted array: you'd have to move data right a LOT!
Totally unsuitable....
1) Is O(n) insertion into a deque much faster in practice than O(n) insertion into a vector?
As deque is likely implemented as a vector of fixed-size arrays, insertion implies shuffling all elements towards the nearest end of the container. The shuffling is probably a little less cache efficient, but when inserting nearer the front of the container it would likely still end up faster.
2) What happens when you insert a value into a deque and the bucket it should go into is full?
As above, it'll need to shuffle, overflowing either:
the last element, which becomes the first element of the next "bucket", moving all of those elements along and overflowing into the next bucket, etc.; or
the first element, which becomes the last element of the previous bucket, moving all of those elements along and overflowing into the previous bucket, etc.
3) Is there another, preferable type of container when you need to store lots of data and need two fast operations: search and insertion?
unordered_map, which is implemented as a hash map. If you have small objects (e.g. less than 20 or 30 bytes) or a firm cap on the number of elements, you can normally outperform unordered_map with custom code, but it's rarely worth the effort unless table access dominates your application's performance, and that performance is critical.
3) Is there another, preferable type of container when you need to store lots of data and need two fast operations: search and insertion?
Consider using std::unordered_map, which is an implementation of a hash map. Insertion, lookup, and removal are all O(1) in the average case. This assumes that you will only ever look for an item based on its exact key; if your searches can have different constraints then you either need a different structure, or you need multiple maps to map the various keys you will search for to the corresponding object.
This requires that a hash function is available for KeyType, either as part of the standard library or provided by you.
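A minimal sketch of that suggestion (assuming C++11; KeyType and ValueType here are stand-ins for the asker's unspecified types, and std::string already has a standard hash):
#include <string>
#include <unordered_map>

// Stand-ins for the asker's types.
using KeyType   = std::string;
using ValueType = int;

int main()
{
    // For a user-defined KeyType you would pass a custom hash functor
    // as the third template argument; std::hash covers std::string.
    std::unordered_map<KeyType, ValueType> table;

    table["answer"] = 42;            // average-case O(1) insertion
    auto it = table.find("answer");  // average-case O(1) lookup
    if (it != table.end())
        it->second += 1;
}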
There's no container that gives you the best of all worlds. As you say, you want the best lookup/insertion with the minimum amount of space needed for storing elements.
Below is a list of containers you could consider for your implementation:
Vector:
Strengths:
1) Space is allocated only for holding data.
2) Good for random access.
3) The container of choice if insertions/deletions are not in the middle of the container.
Weaknesses:
1) Poor performance if insertions/deletions are in the middle.
2) Reallocations happen if reserve is not used properly.
Deque:
Choose deque over vector if insertions/deletions are at the beginning as well as the end of the container.
Map:
Disadvantage over vector:
1) More space is allocated for holding pointers.
Advantages over vector:
1) Better insertions/deletions/lookup compared to vector.
If std::unordered_map is used instead, these dictionary operations are average-case O(1).
Firstly, in order to directly answer your questions:
1) Is O(n) insertion into a deque much faster in practice than O(n) insertion into a vector?
The number of elements that have to be moved is (on average) only half of what a vector would move. However, it can actually perform worse, as the data is stored in non-contiguous memory, so copying/moving the same number of elements is much less efficient (it cannot, for example, be implemented as a single memcpy).
2) What happens when you insert a value into a deque and the bucket it should go into is full?
At least in the GNU libstdc++ implementation, every bucket except the first and last is always full. I believe that inserting in the middle means all elements are moved/copied one slot towards the closer end (front or back), and the effect ripples through the buckets until the first or last one is reached.
In summary, the only scenario where std::deque is consistently better than vector is when you use it as (surprise) a queue (only inserting and removing elements at the front or back), which is what the implementation is optimized for. It is not optimized for insertions in the middle.
3) Is there another, preferable type of container when you need to store lots of data and need two fast operations: search and insertion?
As already stated by others: A hash table like std::unordered_map is the data structure you are looking for.
From what I've heard, however, std::unordered_map is a slightly suboptimal implementation of one, as it uses buckets to resolve hash collisions and those buckets are implemented as linked lists (there is a very interesting talk by Chandler Carruth on the general topic of the performance of different data structures). For random access on big data structures, cache locality should matter a lot less, so this is probably not such a big issue in your case.
Finally, I'd like to mention that if your value and key types are small PODs, then depending on how big your huge collection is (are we talking about a few million or rather billions of elements) and how often you actually insert/remove elements, there may still be cases where a plain std::vector outperforms any other STL container. So, as always: if your thought experiment ever becomes reality, try it out and measure.

Is it possible to directly access a position (not a key) in a multimap

Question: How do you access a value at a specified position in a key range without loops?
The only way I know to get at this data is by incrementing an iterator however many times the position is from the beginning or end of a key range.
edit: The reason I am reluctant to use loops is to reduce processing time when the position of the wanted value within an index is known.
As properly stated in the comments, you basically cannot do this with a multimap. Or with a map. Or with any container that does not support random access. The simple answer to "Why can't I do this?" is "Because it is not in the interface."
The longer answer requires a minimal understanding of how different containers are implemented. The elements of, say, a vector are stored contiguously in memory: knowing the address of the i-th element, you can add k to obtain the address of the (i+k)-th element.
Maps (and multimaps) are different. To allow efficient associative access they use some kind of tree as the underlying data structure, the simplest being a binary search tree. All the nodes of the tree are allocated on the heap, and you don't know where until you actually reach them; you can access a node only through other nodes.
What you can do is walk through all the elements and store their addresses (or iterators) in a vector, so that they can then be accessed "randomly". However, this vector is invalidated as soon as an element is added to or removed from the map. There is no magic data structure that gives you both efficient associative access and efficient random access.
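A sketch of that workaround (assuming C++11): snapshot the iterators of a key range into a vector, then index the snapshot, bearing in mind that the snapshot's positions go stale as soon as the multimap is modified:
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    std::multimap<int, std::string> mm = {
        {1, "a"}, {1, "b"}, {1, "c"}, {2, "d"}};

    // Copy the iterators of key 1's range into a random access container.
    auto range = mm.equal_range(1);
    std::vector<std::multimap<int, std::string>::iterator> snapshot;
    for (auto it = range.first; it != range.second; ++it)
        snapshot.push_back(it);

    // Position 2 within the key range is now a single indexed access...
    std::cout << snapshot[2]->second << '\n'; // prints "c"
    // ...but the snapshot no longer reflects positions once mm is modified.
}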

Appropriate data structure for returning all subsets of a set

I understand the algorithm for doing this but I don't know what data structure (array, linked list, vector, other?) would be best for returning the final set of sets since every example I see just asks to print the sets.
Can someone explain the thought process for deciding between the 3 data structures I mentioned?
Also, are vectors even used anymore? I heard they were obsolete but still see many recent examples.
To be clear, I mean ALL subsets, so they have different sizes.
The decision of which data structure to use depends on:
Type of data to be stored
Operations that you intend to perform on the data
A plain array would give you a contiguous block of memory and random access to the elements; however, you need to know the exact number of elements beforehand so that you can allocate an array of the appropriate size.
With std::vector you get random access and contiguous storage just like an array, but a vector is a dynamic array: it grows as you add new elements, with amortized constant cost per append. Insertion/deletion is fast only at the end, since anywhere else all subsequent elements have to be moved.
With std::list you don't get random access, but insertion and deletion of elements are fast because they only involve relinking pointers.
Also, are vectors even used anymore?
That is not true at all.
They are very much in use and one of the most widely used data structures provided by the Standard Library.
Once I used bit masks to enumerate the subsets: if the i-th bit is 1, then the i-th element of the set is selected in the subset, and it is left out otherwise. This requires a stored ordering of the elements. The same can be done with vectors of bool, I think.
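A sketch of the bitmask idea (assuming C++11 and that the elements live in a std::vector): bit i of a counter decides whether element i is in the current subset, and the result is a vector of vectors since the subsets have different sizes:
#include <iostream>
#include <vector>

// All 2^n subsets of `set`, for small n (n <= ~20 keeps this tractable).
std::vector<std::vector<int>> all_subsets(const std::vector<int>& set)
{
    const std::size_t n = set.size();
    std::vector<std::vector<int>> result;
    for (std::size_t mask = 0; mask < (std::size_t{1} << n); ++mask) {
        std::vector<int> subset;
        for (std::size_t i = 0; i < n; ++i)
            if (mask & (std::size_t{1} << i))
                subset.push_back(set[i]);
        result.push_back(std::move(subset));
    }
    return result;
}

int main()
{
    for (const auto& s : all_subsets({1, 2, 3})) {
        for (int x : s) std::cout << x << ' ';
        std::cout << '\n';
    }
}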

Does binary search have logarithmic performance of deque C++ data structure?

The standard says that std::binary_search(...) and the two related functions std::lower_bound(...) and std::upper_bound(...) are O(log n) if the data structure has random access. So, given that, I presume that these algorithms have O(log n) performance on std::deque (assuming its contents are kept sorted by the user).
However, the internal representation of std::deque is tricky (it's broken into chunks), so I was wondering: does the O(log n) bound hold for std::deque?
Yes, it still holds for deque, because the container is required to provide access to any element in constant time (just with a slightly higher constant than vector).
That doesn't relieve you of the obligation to keep the deque sorted, though.
Yes. A deque has constant-time access by index. It is organized in pages of equal size: what you have is something like a vector of pointers to pages. Say you have 2 pages of 100 elements each and you want the element at index 103: the page is 103 / 100 = 1, and the index within that page is 103 % 100 = 3. Two constant-time accesses then get you the element.
Here the statement from http://www.cplusplus.com/reference/stl/deque/:
Deques may be implemented by specific libraries in different ways, but in all cases they allow for the individual elements to be accessed through random access iterators, with storage always handled automatically (expanding and contracting as needed).
Just write a program to test it. I think deque's binary_search will be slower than vector's, but its complexity is still O(log n).
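Such a test is easy to write; a minimal sketch (assuming C++11): std::lower_bound over a sorted deque compiles and performs O(log n) comparisons because deque iterators are random access:
#include <algorithm>
#include <deque>
#include <iostream>

int main()
{
    std::deque<int> d = {1, 3, 5, 7, 9, 11};   // kept sorted by the user

    // Deque iterators are random access, so this is O(log n).
    auto it = std::lower_bound(d.begin(), d.end(), 7);
    if (it != d.end() && *it == 7)
        std::cout << "found 7 at position " << (it - d.begin()) << '\n';
}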

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than with an array since it goes through a call to operator[].
stl list - insert and delete take linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than with an array since it requires a linear traversal of the linked list.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list and then inserting a node, or shifting items in an array to open an empty position and then inserting the item into that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?
Use STL vector. It provides an interface as rich as list, and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I don't have numbers to give you, but I remember reading a performance analysis describing how vector<int> was faster than list<int> even for inserts and deletes (under a certain size, of course). The truth of the matter is that these processors we use are very fast, and if your vector fits in the L2 cache it's going to go really, really fast. Lists, on the other hand, have to manage heap objects, which will kill your L2.
Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.
It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
A list contains elements which store the addresses of the previous and next elements. This means that you can INSERT or DELETE an element anywhere in the list at constant speed, O(1), regardless of the list size, once you have an iterator to the position. You can also splice (insert another list) into the existing list anywhere at constant speed, because only two pointers (the previous and next) need to change for the element being inserted.
Lists are not good if you need random access: to access the nth element you have to traverse the list node by node, at O(n) cost.
- Vectors -
A vector contains its elements in sequence, just like an array, which is very convenient for random access: accessing the nth element in a vector is a simple pointer calculation (O(1)). Adding elements to a vector is, however, different. If you want to insert an element in the middle of a vector, all the elements that come after it have to be shifted along to make room for the new entry. The speed depends on the vector size and on the position of the new element: the worst case is inserting an element at the front, the best is appending a new element at the end. Therefore insert is O(n), where n is the number of elements that need to be moved, not necessarily the size of the vector.
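For instance, a tiny sketch (assuming C++11): inserting at the front of a vector shifts every existing element, while appending at the end normally shifts nothing:
#include <vector>

int main()
{
    std::vector<int> v = {2, 3, 4};
    v.insert(v.begin(), 1); // shifts 2, 3, 4 one slot right: O(n)
    v.push_back(5);         // amortized O(1), nothing shifts
}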
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always... "Premature optimization is the root of all evil", so first consider what is more convenient and make things work exactly the way you want them, then optimize. For the 10 entries you mention, it really does not matter what you use: you will never be able to see any performance difference whichever method you use.
Prefer a std::vector over an array. Some advantages of vector are:
They allocate memory from the free store when growing.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.
You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]". I can understand the logic behind it, but neither you nor I can know until we profile it.
If you do, you'll find there is no overhead at all when optimizations are turned on: the compiler inlines the function calls. Where there is a difference is memory behavior. An array is statically allocated, while a vector allocates dynamically. A list allocates per node, which can thrash the cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.
First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports the insert() and erase() operations which you would essentially have to write yourself if you used an array. Basically it boils down to the fact that a vector is an upgraded array which already supports all the operations you need.
I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
Based on this description, it seems like list might be the better choice, since inserting and deleting in the middle of the structure is O(1) once you are at the position. Unfortunately you cannot use numeric positions to do inserts and deletes on a list the way you can on arrays/vectors. This dilemma leads to a few questions which can be used to make an initial decision about which structure to use; the structure can later be changed if testing clearly shows it was the wrong choice.
The questions you need to ask are threefold. First, how often do you plan to delete/insert in the middle relative to random reads? Second, how important is using a numeric position compared to an iterator? Finally, is order in your structure important?
If the answer to the first question is that random reads will be more prevalent, a vector/array will probably work well. Note that iterating through a data structure is not considered a random read even if operator[] notation is used. For the second question, if you absolutely require numeric positions, then a vector/array will be required even though this may lead to a performance hit; later testing may show this hit is negligible relative to the easier coding with numeric positions. Finally, if order is unimportant, you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
#include <vector>
using std::vector;

template <class T>
void erase(vector<T>& vect, int index) // note: vect is modified, so it cannot be const
{
    vect[index] = vect.back(); // overwrite the item at index with the last item
    vect.pop_back();           // remove the now-duplicated last item
}

template <class T>
void insert(vector<T>& vect, int index, T value) // note: vect is modified, so it cannot be const
{
    vect.push_back(vect[index]); // copy the item at index to the back of the vector
    vect[index] = value;         // replace the item at index with the new value
}
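For completeness, a small usage sketch of those helpers (my own example, assuming C++11 and compiled together with the templates above); note that both deliberately sacrifice the vector's ordering in exchange for O(1) cost:
#include <iostream>

int main()
{
    vector<int> v = {10, 20, 30, 40};

    erase(v, 1);     // v becomes {10, 40, 30}: the old back moved into slot 1
    insert(v, 0, 5); // v becomes {5, 40, 30, 10}: the old v[0] moved to the back

    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}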
I believe it depends on your needs: if you need to insert/delete mostly at the beginning or in the middle, use a list (doubly linked internally); if you need random access to the data and additions at the end, use an array, or a vector if you also want dynamic allocation and operations such as sort, resize, etc.