Removing elements from a vector with O(1) runtime - C++

"Write a function which takes as an input an object of vector type
removes an element at the rank k in the constant time, O(1) [constant]. Assume that the order of elements does not matter."
I thought I might have had an idea about this. But as I started to try .erase(), I looked up its big-O complexity and found out it was O(n), i.e. linear. I can't think of any other way at the moment. I don't want any code, but pseudocode will at least point me in the right direction if anyone can help.

Assume that the order of elements does not matter.
This is what you need to pay attention to.
Suppose you have a vector
0 1 2 3 4 5 6
and you want to remove the 3. You can turn this into
0 1 2 6 4 5
in O(1) without any issues.

Actually, there is a way to do it. Here is the pseudocode:
If the element you are trying to remove is the last element in the vector, remove it, done.
Read the last element of the vector and write it over the element-to-be-removed.
Remove the last element of the vector.

You can swap and pop_back in constant time.
std::swap(vec.back(), vec[rank]);
vec.pop_back();
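Wrapped up as a reusable function, a minimal sketch might look like this (the remove_at name and the assert are my additions, not part of the answer):

#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// O(1) removal: swap the target with the last element, then shrink.
// The relative order of the remaining elements is not preserved.
template <typename T>
void remove_at(std::vector<T> &vec, std::size_t rank)
{
    assert(rank < vec.size());
    std::swap(vec[rank], vec.back());
    vec.pop_back();
}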

Related

Fastest way for getting last index of item in vector in C++? [duplicate]

This question was closed as a duplicate of: C++ How to find position of last occurrence of element in vector
Let's say you have a vector of ints, unsorted and with multiple repeating items, like so:
vector<int> myVec{1, 0, 0, 0, 1, 1, 0, 1, 0, 0};
What is the fastest way to get the last index of 1 (which is 7 in this example), other than looping through it from the end?
Would this be different if the vector contained items other than 0 and 1?
What is the fastest way to do this in C++?
Later edit: I have seen the duplicate topic suggestions, and although they partially cover what I am looking for, my question has nothing to do with the minimum element in a vector, so I will keep it; maybe it will help someone else too.
That depends on whether you are stuck with vector<int>. If you could store the bits in a bitset or an unsigned integer instead, you could find the right- or leftmost set bit (depending on how you pack the bits) through bitwise operations: Efficient bitwise operations for counting bits or find the right|left most ones
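As a rough sketch of that route (assuming C++20's <bit> header, and assuming bit i of the word holds element i, so the last index of a 1 is the most significant set bit):

#include <bit>
#include <cstdint>

// Returns the last index holding a 1, or -1 if there is none.
int last_index_of_one(std::uint64_t bits)
{
    if (bits == 0) return -1;
    return 63 - std::countl_zero(bits); // position of the most significant set bit
}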
The only faster way I can think of would be to save the last index as you populate the vector... It would add extra time to insertion, but lookup would be faster.
If that is acceptable for your use case, you might also want to consider the number of unique values in your vector: in your example this is feasible, but if most values are unique your memory usage would grow quickly.
You might want to inherit from std::vector and implement your own insert as well as constructors if you want to go this way.
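A sketch of that bookkeeping idea (all names here are hypothetical; I've used composition rather than inheritance since std::vector has no virtual destructor, and this only stays correct while elements are appended, never removed or overwritten):

#include <cstddef>
#include <unordered_map>
#include <vector>

class TrackedVector {
    std::vector<int> data_;
    std::unordered_map<int, std::size_t> last_; // value -> last index seen so far
public:
    void push_back(int value)
    {
        last_[value] = data_.size(); // record the index before appending
        data_.push_back(value);
    }
    // Returns true and sets `index` if `value` occurs at least once.
    bool last_index(int value, std::size_t &index) const
    {
        auto it = last_.find(value);
        if (it == last_.end()) return false;
        index = it->second;
        return true;
    }
};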
Use std::max_element and reverse iterators. That is still looping through the vector, though; if it is unsorted, there is no faster way.
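For reference, the reverse-iterator scan might look like this sketch (it uses std::find to target a specific value; for a 0/1 vector, std::max_element over the same reverse range would locate the last 1 as well):

#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the last index of `value`, or -1 if it is absent.
std::ptrdiff_t last_index_of(const std::vector<int> &v, int value)
{
    auto rit = std::find(v.rbegin(), v.rend(), value);
    if (rit == v.rend()) return -1;
    // rit.base() points one past the found element in forward order.
    return (rit.base() - v.begin()) - 1;
}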

Optimal data structure (in C++) for random access and looping through elements

I have the following problem: I have a set of N elements (N being somewhere between several hundred and several thousand, let's say between 500 and 3000). Out of these elements, a small percentage will have some property "X", but the elements "gain" and "lose" this property in a semi-random fashion; so if I store them all in an array, assigning 1 to elements with property X and zero otherwise, this array of N elements will have n ones and N-n zeros (n being small, in the 20-50 range).
The problem is the following: these elements change very frequently in a semi-random way (meaning that any element can flip from 0 to 1 and vice versa, but the process that controls that is somewhat stable, so the total number n fluctuates a bit but stays roughly in the 20-50 range); and I frequently need all the "X" elements of the set (in other words, the indices of the array where the value is 1), to perform some task on them.
One simple but slow way to achieve this is to loop through the array and, if index k has value 1, perform the task; but this is slow because well over 95% of the elements have value 0. The solution would be to put all the 1s into a different structure (with n elements) and loop through that structure instead of through all N elements. The question is what's the best structure to use?
Elements will flip from 0 to 1 and vice versa randomly (from several different threads), so there's no ordering of any sort (the time an element flips from 0 to 1 has nothing to do with the time it will flip back), and when I loop through them (from another thread) I don't need any particular order (in other words, I just need to get them all, and it's not relevant in which order).
Any suggestions for the optimal structure for this? std::map comes to mind, but since the keys of a std::map are sorted (a feature I don't need), the question is whether there is anything faster?
EDIT: To clarify, the array example is just one (slow) way to solve the problem. The essence of the problem is that out of one big set "S" with "N" elements there is a continuously changing subset "s" of "n" elements (with n much smaller than N), and I need to loop through that subset "s". Speed is of the essence, both for adding/removing elements to "s" and for looping through them. So while suggestions like having two arrays and moving elements between them would be fast from the iteration perspective, adding and removing elements from an array would be prohibitively slow. It sounds like some hash-based approach like std::unordered_set would work reasonably fast on both the iteration and the addition/removal fronts; the question is whether there is something better than that. Reading the documentation on "unordered_map" and "unordered_set" doesn't really clarify how much faster addition/removal of elements is relative to std::map and std::set, nor how much slower iterating through them would be. Another thing to keep in mind is that I don't need a generic solution that works best in all cases; I need one that works best when N is in the 500-3000 range and n is in the 20-50 range. Finally, speed is really of the essence; there are plenty of slow ways of doing it, so I'm looking for the fastest.
Since order doesn't appear to be important, you can use a single array and keep the elements with property X at the front. You will also need an index or iterator to the point in the array that is the transition from X set to unset.
To set X, increment the index/iterator and swap that element with the one you want to change.
To unset X, do the opposite: decrement the index/iterator and swap that element with the one you want to change.
Naturally with multiple threads you will need some sort of mutex to protect the array and index.
Edit: to keep a half-open range, as iterators are normally used, you should reverse the order of the operations above: swap first, then increment/decrement. If you keep an index instead of an iterator, the index does double duty as the count of elements with X set.
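A minimal sketch of this scheme (the names and the pos[] position map are my assumptions; the answer only describes the swap-and-boundary idea, and the mutex is omitted for brevity):

#include <cstddef>
#include <utility>
#include <vector>

// Element ids 0..N-1 live in `order`; the first `count` of them currently
// have property X. `pos` records where each id sits inside `order`.
class Partitioned {
    std::vector<std::size_t> order, pos;
    std::size_t count = 0; // elements order[0..count) have X set
public:
    explicit Partitioned(std::size_t n) : order(n), pos(n)
    {
        for (std::size_t i = 0; i < n; ++i) order[i] = pos[i] = i;
    }
    void set_x(std::size_t id)
    {
        if (pos[id] >= count) move_to(id, count++); // swap into the X region
    }
    void unset_x(std::size_t id)
    {
        if (pos[id] < count) move_to(id, --count);  // swap out of the X region
    }
    // Loop over the X elements as order[0] .. order[size()-1].
    std::size_t size() const { return count; }
    std::size_t operator[](std::size_t i) const { return order[i]; }
private:
    void move_to(std::size_t id, std::size_t dst)
    {
        std::size_t src = pos[id];
        std::swap(order[src], order[dst]);
        std::swap(pos[order[src]], pos[order[dst]]);
    }
};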
N=3000 isn't really much. If you use a single bit for each element, you have a structure smaller than 400 bytes; you can use std::bitset for that. If you use an unordered_set or a set, however, be mindful that you'll spend many more bytes for each of the n elements in your list: just one pointer per element on a 64-bit architecture is already 8*50 = 400 bytes, more than the whole bitset.
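For illustration, a sketch of the bitset variant (N and the helper are my own; scanning all N bits is O(N), but at ~375 bytes the whole structure stays in cache):

#include <bitset>
#include <cstddef>

constexpr std::size_t N = 3000; // the question's upper bound

template <typename Visit>
void for_each_flagged(const std::bitset<N> &flags, Visit visit)
{
    for (std::size_t i = 0; i < N; ++i)
        if (flags.test(i)) visit(i); // visit each element with property X
}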
@geza: perhaps I misunderstood what you meant by two arrays; I assume you meant something like having one std::vector (or something similar) in which I store all elements with property X, and another where I store the rest? In reality I only care about the first one, so I really need one array. Adding an element is obviously simple if I can just add it to the end of the array; but correct me if I'm wrong here: finding an element in that array is an O(n) operation (since the array is unsorted), and removing it then requires shifting all the later elements by one place, so on average n/2 operations. If I use a linked list instead of a vector, deleting an element is faster, but finding it still takes O(n). That's what I meant when I said it would be prohibitively slow; if I misunderstood you, please do clarify.
It sounds like std::unordered_set or std::unordered_map would be fastest at adding/deleting elements, since finding an element is O(1), but it's unclear to me how fast one can loop through all the keys; the documentation clearly states that iterating through the keys of a std::unordered_map is slower than iterating through the keys of a std::map, but it doesn't quantify just how much slower "slower" is, or how much faster "faster" is.
And finally, to repeat one more time, I'm not interested in a general solution; I'm interested in one for small "n". If, for example, I have two solutions, one that's k_1*log(n) and another that's k_2*n^2, the first might be faster in principle (and for large n), but if k_1 >> k_2 (say k_1 = 1000, k_2 = 2 and n = 20), the second can still be faster for relatively small "n" (1000*log(20) is still larger than 2*20^2 = 800). So even if addition/deletion in std::unordered_map is done in constant time O(1), for small "n" it still matters whether that constant time is 1 nanosecond, 1 microsecond or 1 millisecond. So I'm really looking for suggestions that work best for small "n", not in the asymptotic limit of large "n".
An alternative approach (in my opinion worth it only if the number of elements grows at least tenfold) might be keeping a double index:
#include <cstddef> // ptrdiff_t, size_t
#include <utility> // std::swap
#include <vector>

class didx {
    // v == indexes[i] && v > 0  <==>  flagged[v-1] == i
    std::vector<ptrdiff_t> indexes;
    std::vector<ptrdiff_t> flagged;
public:
    didx(size_t size) : indexes(size) {}

    // loop through flagged items using iterators
    auto begin() { return flagged.begin(); }
    auto end() { return flagged.end(); }

    void flag(ptrdiff_t index) {
        if (!isflagged(index)) {
            flagged.push_back(index);
            indexes[index] = flagged.size();
        }
    }

    void unflag(ptrdiff_t index) {
        if (isflagged(index)) {
            // in "flagged" we swap the last element with the element to be
            // removed, updating "indexes" accordingly
            auto idx = indexes[index] - 1;
            auto last_element = flagged.back();
            std::swap(flagged.back(), flagged[idx]);
            std::swap(indexes[index], indexes[last_element]);
            // remove the element, which is now last in "flagged"
            flagged.pop_back();
            indexes[index] = 0;
        }
    }

    bool isflagged(ptrdiff_t index) {
        return indexes[index] > 0;
    }
};
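Usage might look like this (a hypothetical sketch; the size follows the question's N):

didx d(3000); // one slot per element of the big set
d.flag(42);
d.flag(7);
d.unflag(42);
for (auto i : d) {
    // visits each currently flagged index; here only 7
}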

Binary Search an object in a sorted vector with some meaningless entries as interference

I have a vector vector<int> A where all non-zero values are sorted. However, there are also some 0s stored inside, which we treat as meaningless. There are no duplicates among the non-zero elements. For example, A could be a vector like [0,0,5,0,6,9,10,21,0,40,0]. Now, given a number x (x != 0), we want the first position in this vector whose value is >= x. For example, if x == 10 the return value should be 6, since A[6] == 10; if x == 23 the return value should be 9, since A[9] == 40 and it's the first element >= 23.
I know how to write my own binary search that accounts for the meaningless 0s. However, I'm wondering how to implement it with existing STL functions. A straightforward idea is to extract all non-zero elements into another vector<int> B, recording each element's original position in A, apply std::lower_bound to B, and finally map back to the position in A. But since the extraction step has complexity O(N), this idea defeats the purpose even though it works.
Could anyone suggest another approach using STL functions? The complexity needs to be no larger than O(log N).
Note: some people have pointed out that if the vector contains all zeros, there is no way to do this in O(log N). Please treat that as a worst case. What I mean by O(log N) is: if A contains no zeros, it should be O(log N); of course, as the number of zeros grows, the efficiency will necessarily decrease.
You can't do this in O(log n). Consider the case where all the elements in the vector are zero: any unexamined position could still hold a non-zero value, so no algorithm can do this in less than O(n) time.

Listing specific subsets using STL

Say I have a range of numbers, say {2,3,4,5}, stored in this order in a std::vector v, and I want to list all possible subsets which end with 5 using the STL... that is:
2 3 4 5
2 3 5
2 4 5
3 4 5
2 5
3 5
4 5
5
(I hope I didn't forget any :))
I tried using while(next_permutation(v.begin(), v.end())) but didn't get the wanted result :)
Does anyone have an idea?
PS : those who have done the archives of google code jam 2010 may recognize this :)
Let's focus on the problem of printing all subsets. As you know, if you have a vector of n elements, you'll have 2^n possible subsets. It's no coincidence that an n-bit integer can store exactly 2^n distinct values. If you consider each integer as a vector of bits, then iterating over all possible values gives all possible subsets of bits. Well, we get subsets for free by iterating an integer!
Assuming the vector has no more than 32 elements (over 4 billion possible subsets!), this piece of code will print all subsets of vector v (excluding the empty one):
// note: the 1u << v.size() shift requires v.size() <= 31;
// switch to a 64-bit mask type to go higher
for (std::uint32_t mask = 1; mask < (1u << v.size()); ++mask)
{
    std::vector<int>::const_iterator it = v.begin();
    for (std::uint32_t m = mask; m; m >>= 1, ++it)
    {
        if (m & 1) std::cout << *it << " ";
    }
    std::cout << std::endl;
}
I just create all possible bit masks for the size of the vector and iterate through every bit of each mask; if the bit is set, I print the appropriate element.
Now applying the rule of ending with some specific number is a piece of cake (check an additional condition while looping through the masks). Even better, if there is only one 5 in your vector, you can swap it to the end and print only the subsets that include the last element, as sketched below.
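For instance, a sketch of that variant (my own illustration, not from the original answer): once the 5 is the last element, every mask in [top, 2*top) has the top bit set, so these masks are exactly the subsets containing the 5, and the printing order makes each subset end with it. Again this assumes v.size() <= 31.

#include <cstdint>
#include <iostream>
#include <vector>

// Print only the subsets that contain the last element of v (the 5),
// each subset listed in vector order and therefore ending with it.
void print_subsets_ending_with_last(const std::vector<int> &v)
{
    const std::uint32_t top = 1u << (v.size() - 1);
    for (std::uint32_t mask = top; mask < (top << 1); ++mask)
    {
        std::vector<int>::const_iterator it = v.begin();
        for (std::uint32_t m = mask; m; m >>= 1, ++it)
        {
            if (m & 1) std::cout << *it << " ";
        }
        std::cout << "\n";
    }
}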
I'm effectively using std::vector, const_iterator and std::cout, so you might consider this solved using the STL. If I come up with something more STL-ish, I'll let you know (but how? it's just iterating). You can use this function as a benchmark for your STL solutions, though ;-)
EDIT: As pointed out by Jørgen Fogh, this doesn't solve your subset blues if you want to operate on large vectors. Indeed, printing all subsets of 32 elements would generate terabytes of data. You could use a 64-bit integer if the constant 32 feels limiting, but you would never finish iterating through all the numbers. If your problem is just counting the desired subsets, you definitely need another approach, and the STL won't be much help there either ;-)
As you can use any container, I would use std::set, because it is closest to what we want to represent.
Now your task is to find all subsets ending with 5 so we take our initial set and remove 5 from it.
Now we want to have all subsets of this new set and append 5 to them at the end.
#include <set>

void subsets(std::set<std::set<int>> &sets, std::set<int> initial)
{
    if (initial.empty())
        return;
    sets.insert(initial); // save the current set in the set of sets
    for (std::set<int>::iterator i = initial.begin(); i != initial.end(); ++i) // for each item in the set
    {
        std::set<int> new_set(initial);  // copy the set
        new_set.erase(new_set.find(*i)); // remove the current item
        subsets(sets, new_set);          // recursion ...
    }
}
sets is a set that contains all subsets you want.
initial is the set that you want to have the subsets of.
Finally call this with subsets(all_subsets, initial_list_without_5);
This should create the subsets, and at the end you can append 5 to all of them. Btw, don't forget the empty set :)
Also note that creating and erasing all these sets is not very efficient. If you want it faster, the final set should hold pointers to sets, and new_set should be allocated dynamically...
tomasz describes a solution which is workable as long as n <= 32, although it will take a very long time to print 2^32 different subsets. Since the bounds for the large dataset are 2 <= n <= 500, generating all the subsets is definitely not the way to go. You need to come up with some clever way to avoid generating them at all. In fact, this is the whole point of the problem.
You can probably find solutions by googling the problem if you want. My hint is that you need to look at the structure of the sets and avoid generating them at all. You should only calculate how many there are.
Use permutations to create a vector of vectors. Then use std::partition with a predicate to separate the vectors that end with 5 from those that don't.

Efficient way to sort a concatenation of lists (STL), merge sort hint, partially sorted

I have a situation where I get a list of values that are already partially sorted. There are N blocks in my final list, each block is sorted. So I end up having a list of data like this (slashes are just for emphasis):
1 2 3 4 5 6 7 8 / 1 2 3 4 5 / 2 3 4 5 6 7 8 9 / 1 2 3 4
I have these in a vector as a series of pointers to the objects. Currently I just use std::sort with a custom comparator for the sorting. I would guess this is sub-optimal, as my sequence is a somewhat degenerate case.
Are there any other stl functions, hints, or otherwise that I could use to provide an optimal sort of such data? (Boost libraries are also fine).
Though I can't easily break up the input data I certainly can determine where the sub-sequences start.
You could try std::merge, although this algorithm can only merge two sorted collections at a time, so you would have to call it in a loop. Also note that std::list provides merge as a member function.
EDIT Actually std::inplace_merge might be an even better candidate.
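A minimal sketch of the std::inplace_merge approach, assuming you know the offsets where each sorted block starts (the bounds vector below holds each block's start offset plus the final end offset; the names are my own):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

void merge_runs(std::vector<int> &data, std::vector<std::size_t> bounds)
{
    // Repeatedly merge pairs of adjacent runs until a single run remains.
    while (bounds.size() > 2)
    {
        std::vector<std::size_t> next;
        next.push_back(bounds.front());
        for (std::size_t i = 0; i + 2 < bounds.size(); i += 2)
        {
            std::inplace_merge(data.begin() + bounds[i],
                               data.begin() + bounds[i + 1],
                               data.begin() + bounds[i + 2]);
            next.push_back(bounds[i + 2]);
        }
        if (bounds.size() % 2 == 0)        // odd run count: last run is carried over
            next.push_back(bounds.back());
        bounds = std::move(next);
    }
}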
This calls for a “multiway merge”. The standard library doesn’t have an appropriate algorithm for that. However, the parallel extension of the GCC standard library does:
__gnu_parallel::multiway_merge.
You can iterate over all of the lists at once, keeping an index into each list and comparing only the items at those indices.
This can be significantly faster than a regular sort: O(n*k) for k lists (or O(n log k) if you keep the current heads in a min-heap) versus O(n*log(n)), where n is the total number of items in all the lists.
See the Wikipedia article on k-way merging.
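A sketch of that k-way merge with a min-heap over the current heads (assuming C++17 for the structured bindings; the container layout is my own):

#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

std::vector<int> kway_merge(const std::vector<std::vector<int>> &runs)
{
    // Heap entries: (current value, run index, offset within that run).
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty())
            heap.emplace(runs[r][0], r, 0);

    std::vector<int> out;
    while (!heap.empty())
    {
        auto [value, r, i] = heap.top(); // smallest head across all runs
        heap.pop();
        out.push_back(value);
        if (i + 1 < runs[r].size())
            heap.emplace(runs[r][i + 1], r, i + 1);
    }
    return out;
}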
C++ has std::merge for this, but it will not handle multiple lists at once, so you may want to craft your own version that does.
If you can spare the memory, mergesort will perform very well for this. For best results, merge the smallest two chains at a time, until you only have one.