Issue with CUDA array compaction using thrust zip_iterator [duplicate] - c++

I have two integer arrays, dmap and dflag, of the same length on the device, and I have wrapped them with thrust device pointers dmapt and dflagt.
Some elements in the dmap array have the value -1. I want to remove these -1's and the corresponding values from the dflag array.
I am using the remove_if function to do this, but I cannot figure out what the return value of this call is or how I should use it to access the compacted arrays.
(I want to pass these reduced arrays to the reduce_by_key function, where dflagt will be used as the keys.)
I am using the following call for doing the reduction. Please let me know how I can store the returned value in a variable and use it to address the individual arrays dflag and dmap:
thrust::remove_if(
    thrust::make_zip_iterator(thrust::make_tuple(dmapt, dflagt)),
    thrust::make_zip_iterator(thrust::make_tuple(dmapt+numindices, dflagt+numindices)),
    minus_one_equality_test()
);
where the predicate functor used above is defined as
struct minus_one_equality_test
{
    typedef typename thrust::tuple<int,int> Tuple;
    __host__ __device__
    bool operator()(const Tuple& a)
    {
        return thrust::get<0>(a) == (-1);
    }
}

The return value is a zip_iterator marking the new end of the compacted sequence of tuples, i.e. the elements for which your predicate returned false and which were therefore kept. To access the new end iterator of each underlying array, you need to retrieve the iterator tuple from that zip_iterator; the contents of that tuple are then the new end iterators of the original arrays you used to build the zip_iterator. It is a lot more convoluted in words than in code:
#include <thrust/tuple.h>
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/remove.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/copy.h>
#include <iostream>
struct minus_one_equality_test
{
    typedef thrust::tuple<int,int> Tuple;
    __host__ __device__
    bool operator()(const Tuple& a)
    {
        return thrust::get<0>(a) == (-1);
    }
};

int main(void)
{
    const int numindices = 10;
    int mapt[numindices] = { 1, 2, -1, 4, 5, -1, 7, 8, -1, 10 };
    int flagt[numindices] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    thrust::device_vector<int> vmapt(10);
    thrust::device_vector<int> vflagt(10);
    thrust::copy(mapt, mapt+numindices, vmapt.begin());
    thrust::copy(flagt, flagt+numindices, vflagt.begin());

    thrust::device_ptr<int> dmapt = vmapt.data();
    thrust::device_ptr<int> dflagt = vflagt.data();

    typedef thrust::device_vector< int >::iterator VIt;
    typedef thrust::tuple< VIt, VIt > TupleIt;
    typedef thrust::zip_iterator< TupleIt > ZipIt;

    ZipIt Zend = thrust::remove_if(
        thrust::make_zip_iterator(thrust::make_tuple(dmapt, dflagt)),
        thrust::make_zip_iterator(thrust::make_tuple(dmapt+numindices, dflagt+numindices)),
        minus_one_equality_test()
    );

    TupleIt Tend = Zend.get_iterator_tuple();
    VIt vmapt_end = thrust::get<0>(Tend);
    for(VIt x = vmapt.begin(); x != vmapt_end; x++) {
        std::cout << *x << std::endl;
    }
    return 0;
}
If you compile this and run it, you should see something like this:
$ nvcc -arch=sm_12 remove_if.cu
$ ./a.out
1
2
4
5
7
8
10
In this example I only "retrieve" the shortened contents of the first element of the tuple; the second is accessed in the same way, i.e. the iterator marking the new end of that vector is thrust::get<1>(Tend).
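Since the question mentions passing the compacted arrays to reduce_by_key, here is a hypothetical continuation of the main() above (new_len, out_keys, out_vals and num_runs are illustrative names; it assumes the compacted keys are already sorted/grouped, which reduce_by_key requires, and needs #include <thrust/reduce.h>):
    // Length of the compacted range, taken from the returned end iterator.
    size_t new_len = vmapt_end - vmapt.begin();

    thrust::device_vector<int> out_keys(new_len);
    thrust::device_vector<int> out_vals(new_len);

    // reduce_by_key collapses runs of equal consecutive keys, so the compacted
    // dflag values (the keys) must already be sorted or grouped.
    thrust::pair<VIt, VIt> ends = thrust::reduce_by_key(
        vflagt.begin(), vflagt.begin() + new_len,   // keys:   compacted dflag
        vmapt.begin(),                              // values: compacted dmap
        out_keys.begin(), out_vals.begin());

    size_t num_runs = ends.first - out_keys.begin();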

Related

How to efficiently merge k sorted pairwise key/value vectors by keys?

I want to merge k sorted pairwise key/value vectors by keys. Typically, the size n of the vectors is very large (e.g., n >= 4,000,000,000).
Consider the following example for k = 2:
// Input
keys_1 = [1, 2, 3, 4], values_1 = [11, 12, 13, 14]
keys_2 = [3, 4, 5, 6], values_2 = [23, 24, 25, 26]
// Output
merged_keys = [1, 2, 3, 3, 4, 4, 5, 6], merged_values = [11, 12, 13, 23, 14, 24, 25, 26]
Since __gnu_parallel::multiway_merge is a highly efficient k-way merge algorithm, I tried to utilize a state-of-the-art zip iterator (https://github.com/dpellegr/ZipIterator) to "combine" the key-value pair vectors.
#include <iostream>
#include <vector>
#include <parallel/algorithm>
#include "ZipIterator.hpp"
int main(int argc, char* argv[]) {
    std::vector<int> keys_1 = {1, 2, 3, 4};
    std::vector<int> values_1 = {11, 12, 13, 14};
    std::vector<int> keys_2 = {3, 4, 5, 6};
    std::vector<int> values_2 = {23, 24, 25, 26};
    std::vector<int> merged_keys(8);
    std::vector<int> merged_values(8);

    auto kv_it_1 = Zip(keys_1, values_1);
    auto kv_it_2 = Zip(keys_2, values_2);
    auto mkv_it = Zip(merged_keys, merged_values);

    auto it_pairs = {std::make_pair(kv_it_1.begin(), kv_it_1.end()),
                     std::make_pair(kv_it_2.begin(), kv_it_2.end())};

    __gnu_parallel::multiway_merge(it_pairs.begin(), it_pairs.end(), mkv_it.begin(), 8, std::less<>());

    for (size_t i = 0; i < 8; ++i) {
        std::cout << merged_keys[i] << ":" << merged_values[i] << (i == 7 ? "\n" : ", ");
    }
    return 0;
}
However, I get various compilation errors (building with -O3):
error: cannot bind non-const lvalue reference of type ‘std::__iterator_traits<ZipIter<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >, void>::value_type&’ {aka ‘std::tuple<int, int>&’} to an rvalue of type ‘std::tuple<int, int>’
error: cannot convert ‘ZipIter<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >::reference*’ {aka ‘ZipRef<int, int>*’} to ‘_ValueType*’ {aka ‘std::tuple<int, int>*’}
Is it possible to modify the ZipIterator to make it work?
Or is there a more efficient way of merging k sorted pairwise key/value vectors by keys?
Considered Alternatives
Define a KeyValuePair struct with int key and int value members as well as operator< and operator<= operators. Move the elements of the key/value vectors into std::vector<KeyValuePair>s. Call __gnu_parallel::multiway_merge on the std::vector<KeyValuePair>s. Move the merged elements back into the key/value vectors.
[Verdict: slow execution, high memory overhead, even with -O3]
Use std::merge(std::execution::par_unseq, kv_it_1.begin(), kv_it_1.end(), kv_it_2.begin(), kv_it_2.end(), mkv_it.begin()); instead of __gnu_parallel::multiway_merge.
[Verdict: supports only two key/value vectors]
Is it possible to modify the ZipIterator to make it work?
Yes, but it would require patching __gnu_parallel::multiway_merge. The source of error is this line:
/** @brief Dereference operator.
 *  @return Referenced element. */
typename std::iterator_traits<_RAIter>::value_type&
operator*() const
{ return *_M_current; }
This is a member function of _GuardedIterator, an auxiliary structure used in the multiway_merge implementation. It wraps the _RAIter class, which in your case is ZipIter. By definition, when an iterator is dereferenced (*_M_current), the type of the returned expression is supposed to be its reference type. However, this code expects it to be value_type&. In most cases these are the same type: when you dereference an item you expect to get a reference to that very item. But that is impossible with a zip iterator, because its elements are virtual and created on the fly. That's why the reference type of ZipIter is not a reference type at all; it is actually a value type called ZipRef:
using reference = ZipRef<std::remove_reference_t<typename std::iterator_traits<IT>::reference>...>;
Kind of the same practice that is used with (much hated) vector<bool>.
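For comparison, here is a tiny standalone illustration of that proxy-reference situation with std::vector<bool> (just a sketch; nothing here is specific to ZipIterator):
#include <type_traits>
#include <vector>

int main() {
    std::vector<bool> v{true, false};
    // Dereferencing the iterator yields a proxy object
    // (std::vector<bool>::reference), not a bool&.
    auto ref = *v.begin();
    static_assert(!std::is_same<decltype(ref), bool&>::value,
                  "dereferencing gives a proxy, not a real reference");
    ref = false;   // still writes through to the underlying bit
    return v[0];   // returns 0, since the proxy write took effect
}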
So there is no problem with ZipIterator or with how you use the algorithm; it is simply a non-trivial requirement of the algorithm itself. The next question is: can we get rid of it?
And the answer is yes. You can change _GuardedIterator::operator*() to return reference instead of value_type&. Then you will have a compile error in this line:
// Default value for potentially non-default-constructible types.
_ValueType* __arbitrary_element = 0;

for (_SeqNumber __t = 0; __t < __k; ++__t)
{
    if(!__arbitrary_element
       && _GLIBCXX_PARALLEL_LENGTH(__seqs_begin[__t]) > 0)
        __arbitrary_element = &(*__seqs_begin[__t].first);
}
Here the address of an element is taken to initialize __arbitrary_element. We can store a copy of this element instead, since we know ZipRef is cheap to copy and default-constructible:
// Local copy of the element
_ValueType __arbitrary_element_val;
_ValueType* __arbitrary_element = 0;

for (_SeqNumber __t = 0; __t < __k; ++__t)
{
    if(!__arbitrary_element
       && _GLIBCXX_PARALLEL_LENGTH(__seqs_begin[__t]) > 0) {
        __arbitrary_element_val = *__seqs_begin[__t].first;
        __arbitrary_element = &__arbitrary_element_val;
    }
}
The same errors will appear in several places in the file multiseq_selection.h, e.g. here and here. Fix all of them using a similar technique.
Then you will see multiple errors like this one:
./parallel/multiway_merge.h:879:29: error: passing ‘const ZipIter<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > > >’ as ‘this’ argument discards qualifiers [-fpermissive]
They are about const correctness. They arise because you declared it_pairs as auto, which in this particular scenario deduces the type to be std::initializer_list. This is a very peculiar type. For instance, it provides only constant access to its members, even though it itself is not declared const. That's the source of these errors. Change auto to e.g. std::vector and these errors are gone.
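As a small aside, a sketch of that std::initializer_list quirk (hypothetical snippet, only to illustrate the const-only access):
#include <initializer_list>
#include <utility>
#include <vector>

int main() {
    // auto deduces std::initializer_list here; its begin() returns a
    // pointer to const, so the pairs cannot be modified through it.
    auto il = { std::make_pair(1, 2), std::make_pair(3, 4) };
    // il.begin()->first = 5;   // error: assignment of read-only member

    // A std::vector of the same pairs gives ordinary mutable access.
    std::vector<std::pair<int, int>> v = { std::make_pair(1, 2), std::make_pair(3, 4) };
    v.begin()->first = 5;       // fine
}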
It should compile fine at this point. Just don't forget to compile with -fopenmp, or you will get an "undefined reference to `omp_get_thread_num'" error.
Here is the output that I see:
$ ./a.out
1:11, 2:12, 3:13, 3:23, 4:14, 4:24, 5:25, 6:26
Since you need low memory overhead, one possible solution is to have the multiway_merge algorithm only operate on unique range identifiers and range indices and to supply the comparison and copy operators as lambda functions.
That way the merge algorithm is completely independent of the actual container types and key and value types used.
Here is a C++17 solution which is based on the heap based algorithm described here:
#include <cassert>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <iostream>
#include <iterator>
#include <queue>
#include <vector>
using range_type = std::pair<std::uint32_t, std::size_t>;

void multiway_merge(
    std::initializer_list<std::size_t> range_sizes,
    std::function<bool(const range_type&, const range_type&)> compare_func,
    std::function<void(const range_type&)> copy_func)
{
    // lambda compare function for priority queue of ranges
    auto queue_less = [&](const range_type& range1, const range_type& range2) {
        // reverse comparison order of range1 and range2 here,
        // because we require the smallest element to be on top
        return compare_func(range2, range1);
    };

    // create priority queue from all non-empty ranges
    std::priority_queue<
        range_type, std::vector<range_type>,
        decltype(queue_less)> queue{ queue_less };

    for (std::uint32_t range_id = 0; range_id < range_sizes.size(); ++range_id) {
        if (std::data(range_sizes)[range_id] > 0) {
            queue.emplace(range_id, 0);
        }
    }

    // merge ranges until priority queue is empty
    while (!queue.empty()) {
        range_type top_range = queue.top();
        queue.pop();
        copy_func(top_range);
        if (++top_range.second != std::data(range_sizes)[top_range.first]) {
            // re-insert non-empty range
            queue.push(top_range);
        }
    }
}

int main() {
    std::vector<int> keys_1 = { 1, 2, 3, 4 };
    std::vector<int> values_1 = { 11, 12, 13, 14 };
    std::vector<int> keys_2 = { 3, 4, 5, 6, 7 };
    std::vector<int> values_2 = { 23, 24, 25, 26, 27 };
    std::vector<int> merged_keys;
    std::vector<int> merged_values;

    multiway_merge(
        { keys_1.size(), keys_2.size() },
        [&](const range_type& left, const range_type& right) {
            if (left == right) return false;
            switch (left.first) {
                case 0:
                    assert(right.first == 1);
                    return keys_1[left.second] < keys_2[right.second];
                case 1:
                    assert(right.first == 0);
                    return keys_2[left.second] < keys_1[right.second];
            }
            return false;
        },
        [&](const range_type& range) {
            switch (range.first) {
                case 0:
                    merged_keys.push_back(keys_1[range.second]);
                    merged_values.push_back(values_1[range.second]);
                    break;
                case 1:
                    merged_keys.push_back(keys_2[range.second]);
                    merged_values.push_back(values_2[range.second]);
                    break;
            }
        });

    // copy result to stdout
    std::cout << "keys: ";
    std::copy(
        merged_keys.cbegin(), merged_keys.cend(),
        std::ostream_iterator<int>(std::cout, " "));
    std::cout << "\nvalues: ";
    std::copy(
        merged_values.cbegin(), merged_values.cend(),
        std::ostream_iterator<int>(std::cout, " "));
    std::cout << "\n";
}
The algorithm has a time complexity of O(n log(k)) and a space complexity of O(k), where n is the total size of all ranges and k is the number of ranges.
The sizes of all input ranges need to be passed as an initializer list.
The example only passes the two input ranges from your example. Extending the example for more than two ranges is straightforward.
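For instance, a sketch of one way to generalize it (my own illustration; key_ranges and value_ranges are hypothetical names, and it reuses multiway_merge, range_type and the vectors from the example above):
    // Keep pointers to the key/value vectors in arrays indexed by range id,
    // instead of switching on 0/1; extend the lists for more ranges.
    std::vector<const std::vector<int>*> key_ranges   = { &keys_1, &keys_2 };
    std::vector<const std::vector<int>*> value_ranges = { &values_1, &values_2 };

    multiway_merge(
        { keys_1.size(), keys_2.size() },
        [&](const range_type& left, const range_type& right) {
            if (left == right) return false;
            return (*key_ranges[left.first])[left.second] <
                   (*key_ranges[right.first])[right.second];
        },
        [&](const range_type& range) {
            merged_keys.push_back((*key_ranges[range.first])[range.second]);
            merged_values.push_back((*value_ranges[range.first])[range.second]);
        });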
You will have to implement one that fits your exact case, and with arrays this large multithreading may not help much. If you can afford to allocate a full (or close to full) copy of the arrays, one optimization you can do is to use large pages and ensure that the memory you are accessing is not paged out (not having swap is not ideal if you plan to run at capacity).
This simple low-memory example works just fine; it is hard to beat sequential I/O. Its main bottleneck is reallocation: when displacing the used values from the arrs to the ret, multiple reallocations happen at every step_size, but only one of them is expensive. ret.reserve() can consume a "large" amount of time simply because shrinking a buffer is always possible in place, while extending one might not be, and the OS may have to perform several memory moves.
#include <vector>
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <stdio.h>

template<typename Pair, bool REVERSED = true>
std::vector<Pair> multi_merge_lm(std::vector<std::vector<Pair>>& arrs, float step){
    size_t final_size = 0, max, i;
    for (i = 0; i < arrs.size(); i++){
        final_size += arrs[i].size();
    }
    float original = (float)final_size;
    size_t step_size = (size_t)((float)(final_size) * step);

    printf("Merge of %zi (%zi bytes) with %zi step size \n",
        final_size, sizeof(Pair), step_size
    );
    printf("Merge operation size %.*f mb + %.*f mb \n",
        3, ((float)(sizeof(Pair) * (float)final_size) / 1000000),
        3, ((float)(sizeof(Pair) * (float)final_size * step) / 1000000)
    );

    std::vector<Pair> ret;
    while (final_size --> 0){
        for (max = 0, i = 0; i < arrs.size(); i++){
            // select the next biggest item from all the arrays
            if (arrs[i].back().first > arrs[max].back().first){
                max = i;
            }
        }

        // This does not actually reallocate the vector
        // unless the capacity is too small
        ret.push_back(arrs[max].back());
        arrs[max].pop_back();

        // This check could be hoisted out of the while loop
        // with an unroll and a sort, for little gain
        for (i = 0; i < arrs.size(); i++){
            if (arrs[i].empty()){
                arrs[i] = arrs.back();
                arrs.pop_back();
                break;
            }
        }

        if (ret.size() == ret.capacity()) {
            // Remove the used memory from the arrs and
            // realloc more to the ret
            for (std::vector<Pair>& chunk : arrs){
                chunk.shrink_to_fit();
            }
            ret.reserve(ret.size() + step_size);

            // Don't move this into the while loop body, it will slow down
            // the execution; leave it just for debugging
            printf("\rProgress %i%c / Merge size %zi",
                (int)((1 - ((float)final_size / original) ) * 100),
                '%', ret.size()
            );
        }
    }
    printf("\r%*c\r", 40, ' ');

    ret.shrink_to_fit();
    arrs.clear();

    if (REVERSED){
        std::reverse(ret.begin(), ret.end());
    }
    return ret;
}
int main(void) {
    typedef std::pair<uint64_t, uint64_t> Pair;

    int inc = 1;
    int increment = 100000;
    int test_size = 40000000;
    float step_size = 0.05f;

    auto arrs = std::vector<std::vector<Pair>>(5);
    for (auto& chunk : arrs){
        // makes the arrays big and asymmetric and adds
        // some data to check if it works
        chunk.resize(test_size + increment * inc++);
        for (int i = 0; i < chunk.size(); i++){
            chunk[i] = std::make_pair(i, i * -1);
        }
    }
    printf("Generation done \n");

    auto start = std::chrono::steady_clock::now();
    auto merged = multi_merge_lm<Pair>(arrs, step_size);
    auto end = std::chrono::steady_clock::now();

    printf("Time taken: %lfs \n",
        (std::chrono::duration<double>(end - start)).count()
    );

    for (size_t i = 1; i < merged.size(); i++){
        if (merged[i - 1] > merged[i]){
            printf("Misplaced at index: %zi \n", i - 1);
        }
    }

    merged.clear();
    return 0;
}
Merge of 201500000 (16 bytes) with 10075000 step size
Merge operation size 3224.000 mb + 161.200 mb
Time taken: 166.197639s
Running this through a profiler (AMD uProf in my case) shows that the resizing is quite expensive; the larger you make the step_size, the more efficient it becomes.
(Names are duplicated because they come from different parts of the code that call the same functions, and in this case from calls that the std functions make.)
This rerun is with a step of 0.5; it is ~2x faster, but now the function consumes 10x more memory than before. Keep in mind that these values are not generic, they might change depending on what hardware you are running on, but the proportions are not going to change that much.
Merge of 201500000 (16 bytes) with 100750000 step size
Merge operation size 3224.000 mb + 1612.000 mb
Time taken: 72.062857s
Two other things you shouldn't forget are that std::vector is dynamic and its actual size might be bigger, and that -O2 can't really do much to optimize heap memory access: if you can't make it sequential, the instructions can only wait.
I barely remember this, but you might find it helpful: I'm pretty sure I have seen the "merge K sorted linked lists" problem. It used something similar to divide and conquer and was close to logarithmic time complexity. I doubt it's possible to get a better time complexity.
The logic behind this was to minimize iterations over already-merged lists. If you merge the 1st and 2nd lists, then merging the result with the 3rd involves going through the longer, merged list. This method avoids that by merging all the little lists first, then moving to (what I like to call) '2nd layer merging' by merging the once-merged lists.
This way, if your lists' length on average is n, you have to do at most log n rounds of merging, resulting in K*log(n) complexity, where K is the number of lists you have.
Sorry for being a little 'not-so-precise', but I think you might find this piece of information helpful. Then again, I'm not familiar with GNU's multiway_merge, so whatever I said might be quite useless too.
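For illustration, a rough sketch of that layered pairwise merging (my own code, not GNU's multiway_merge): merge the lists two at a time per round, for roughly log k rounds in total:
#include <algorithm>
#include <iterator>
#include <vector>

// Merge k sorted vectors by repeatedly merging pairs (divide-and-conquer style).
std::vector<int> merge_k_sorted(std::vector<std::vector<int>> lists) {
    if (lists.empty()) return {};
    while (lists.size() > 1) {
        std::vector<std::vector<int>> next;
        for (std::size_t i = 0; i + 1 < lists.size(); i += 2) {
            std::vector<int> merged;
            merged.reserve(lists[i].size() + lists[i + 1].size());
            std::merge(lists[i].begin(), lists[i].end(),
                       lists[i + 1].begin(), lists[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        if (lists.size() % 2 == 1) {
            next.push_back(std::move(lists.back()));  // odd list passes through
        }
        lists = std::move(next);
    }
    return std::move(lists.front());
}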

How to remove non contiguous elements from a vector in c++

I have a vector std::vector<inputInfo> inputList and another vector std::vector<int> selection.
inputInfo is a struct that has some information stored.
The vector selection corresponds to positions inside inputList vector.
I need to remove elements from inputList which correspond to entries in the selection vector.
Here's my attempt at this removal algorithm.
Assuming the selection vector is sorted and using some (unavoidable ?) pointer arithmetic, this can be done in one line:
template <class T>
inline void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
    v.resize(std::distance(
        v.begin(),
        std::stable_partition(v.begin(), v.end(),
            [&selection, &v](const T& item) {
                return !std::binary_search(
                    selection.begin(), selection.end(),
                    static_cast<int>(static_cast<const T*>(&item) - &v[0]));
            })));
}
This is based on an idea of Sean Parent (see this C++ Seasoning video) to use std::stable_partition ("stable" keeps elements sorted in the output array) to move all selected items to the end of an array.
The line with pointer arithmetic
static_cast<int>(static_cast<const T*>(&item) - &v[0])
can, in principle, be replaced with STL algorithms and the index-free expression
std::distance(std::begin(v), std::find(v.begin(), v.end(), item))
but this way we have to spend O(n) in std::find.
The shortest way to remove non-contiguous elements:
template <class T> void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
    std::vector<int> sorted_sel = selection;
    std::sort(sorted_sel.begin(), sorted_sel.end());

    // 1) Define checker lambda
    // 'filter' is called only once for every element,
    // all the calls respect the original order of the array
    // We manually keep track of the index of the item being filtered,
    // so we can look this index up in the 'sorted_sel' array
    int itemIndex = 0;
    auto filter = [&itemIndex, &sorted_sel](const T& item) {
        return !std::binary_search(
            sorted_sel.begin(),
            sorted_sel.end(),
            itemIndex++);
    };

    // 2) Move all 'not-selected' to the end
    auto end_of_selected = std::stable_partition(
        v.begin(),
        v.end(),
        filter);

    // 3) Cut off the end of the std::vector
    v.resize(std::distance(v.begin(), end_of_selected));
}
Original code & test
If for some reason the code above does not work due to strangely behaving std::stable_partition(), then below is a workaround (wrapping the input array values with selected flags).
I do not assume that inputInfo structure contains the selected flag, so I wrap all the items in the T_withFlag structure which keeps pointers to original items.
#include <algorithm>
#include <iostream>
#include <vector>

template <class T>
std::vector<T> erase_selected(const std::vector<T>& v, const std::vector<int>& selection)
{
    std::vector<int> sorted_sel = selection;
    std::sort(sorted_sel.begin(), sorted_sel.end());

    // Packed (data+flag) array
    struct T_withFlag
    {
        T_withFlag(const T* ref = nullptr, bool sel = false): src(ref), selected(sel) {}
        const T* src;
        bool selected;
    };
    std::vector<T_withFlag> v_with_flags;
    // should be like
    // { {0, true}, {0, true}, {3, false},
    //   {0, true}, {2, false}, {4, false},
    //   {5, false}, {0, true}, {7, false} };
    // for the input data in main()
    v_with_flags.reserve(v.size());

    // No "beautiful" way to iterate a vector
    // and keep track of element index
    // We need the index to check if it is selected
    // The check takes O(log(n)), so the loop is O(n * log(n))
    int itemIndex = 0;
    for (auto& ii: v)
        v_with_flags.emplace_back(
            T_withFlag(&ii,
                std::binary_search(
                    sorted_sel.begin(),
                    sorted_sel.end(),
                    itemIndex++)
            ));

    // I. (The bulk of) Removal algorithm
    //   a) Define checker lambda
    auto filter = [](const T_withFlag& ii) { return !ii.selected; };
    //   b) Move every item marked as 'not-selected'
    //      to the end of an array
    auto end_of_selected = std::stable_partition(
        v_with_flags.begin(),
        v_with_flags.end(),
        filter);
    //   c) Cut off the end of the std::vector
    v_with_flags.resize(
        std::distance(v_with_flags.begin(), end_of_selected));

    // II. Output
    std::vector<T> v_out(v_with_flags.size());
    std::transform(
        // for C++20 you can parallelize this
        // with 'std::execution::par' as first parameter
        v_with_flags.begin(),
        v_with_flags.end(),
        v_out.begin(),
        [](const T_withFlag& ii) { return *(ii.src); });
    return v_out;
}
The test function is
int main()
{
    // Obviously, I do not know the structure
    // used by the topic starter,
    // so I just declare a small structure for a test
    // The 'erase_selected' does not assume
    // this structure to be 'light-weight'
    struct inputInfo
    {
        int data;
        inputInfo(int v = 0): data(v) {}
    };

    // Source selection indices
    std::vector<int> selection { 0, 1, 3, 7 };

    // Source data array
    std::vector<inputInfo> v{ 0, 0, 3, 0, 2, 4, 5, 0, 7 };

    // Output array
    auto v_out = erase_selected(v, selection);

    for (auto ii : v_out)
        std::cout << ii.data << ' ';
    std::cout << std::endl;
}

Implementing partition_unique and stable_partition_unique algorithms

I'm looking for a way to partition a set of ordered elements such that all unique elements occur before their respective duplicates. std::unique is not applicable, since the duplicate elements are overwritten, so I thought of using std::partition. Calling this algorithm partition_unique, I also need the corresponding stable_partition_unique (i.e. like stable_partition).
A basic implementation of partition_unique is:
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <functional>

template <typename BidirIt, typename BinaryPredicate = std::equal_to<void>>
BidirIt partition_unique(BidirIt first, BidirIt last, BinaryPredicate p = BinaryPredicate {})
{
    using ValueTp = typename std::iterator_traits<BidirIt>::value_type;
    std::unordered_set<ValueTp, std::hash<ValueTp>, BinaryPredicate> seen {};
    seen.reserve(std::distance(first, last));
    return std::partition(first, last,
        [&p, &seen] (const ValueTp& value) {
            return seen.insert(value).second;
        });
}
Which can be used like:
#include <vector>
#include <iostream>

int main()
{
    std::vector<int> vals {1, 1, 2, 4, 5, 5, 5, 7, 7, 9, 10};
    const auto it = partition_unique(std::begin(vals), std::end(vals));

    std::cout << "Unique values: ";
    std::copy(std::begin(vals), it, std::ostream_iterator<int> {std::cout, " "});   // Unique values: 1 10 2 4 5 9 7
    std::cout << '\n' << "Duplicate values: ";
    std::copy(it, std::end(vals), std::ostream_iterator<int> {std::cout, " "});     // Duplicate values: 7 5 5 1
}
The corresponding stable_partition_unique can be achieved by replacing std::partition with std::stable_partition.
The problem with these approaches is that they unnecessarily buffer all unique values in the std::unordered_set (which also adds a hash function requirement), which shouldn't be required as the elements are sorted. It's not too much work to come up with a better implementation for partition_unique, but an implementation of stable_partition_unique seems considerably more difficult, and I'd rather not implement this myself if possible.
Is there a way to use existing algorithms to achieve optimal partition_unique and stable_partition_unique algorithms?
Create a queue to hold the duplicates. Then, initialize two indexes, src and dest, starting at index 1, and go through the list. If the current item (list[src]) is equal to the previous item (list[dest-1]), then copy it to the queue. Otherwise, copy it to list[dest] and increment dest.
When you've exhausted the list, copy items from the queue to the tail of the original list.
Something like:
Queue dupQueue
int src = 1
int dest = 1
while (src < list.count)
{
    if (list[src] == list[dest-1])
    {
        // it's a duplicate.
        dupQueue.push(list[src])
    }
    else
    {
        list[dest] = list[src]
        ++dest
    }
    ++src
}
while (!dupQueue.IsEmpty)
{
    list[dest] = dupQueue.pop()
    ++dest
}
I know the STL has a queue. Whether it has an algorithm similar to the above, I don't know.
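For what it's worth, here is a minimal C++ translation of that pseudocode (my own sketch; stable_partition_unique_sorted is a made-up name, and it assumes a sorted std::vector<int>). It returns the index of the first duplicate, which plays the role of the iterator returned by partition_unique:
#include <cstddef>
#include <queue>
#include <vector>

std::size_t stable_partition_unique_sorted(std::vector<int>& list)
{
    if (list.empty()) return 0;

    std::queue<int> dupQueue;          // holds the duplicates in order
    std::size_t dest = 1;
    for (std::size_t src = 1; src < list.size(); ++src) {
        if (list[src] == list[dest - 1]) {
            dupQueue.push(list[src]);  // duplicate of the last kept value
        } else {
            list[dest] = list[src];    // keep the first occurrence
            ++dest;
        }
    }
    std::size_t partition_point = dest;
    while (!dupQueue.empty()) {        // append the duplicates at the tail
        list[dest++] = dupQueue.front();
        dupQueue.pop();
    }
    return partition_point;
}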

Compare two dynamically allocated arrays in C++

#include <iostream>
using namespace std;

int main()
{
    int *array1 = new int [5]();
    int *array2 = new int [7]();
    array1[2] = 3; // or any change
    array2[2] = 3; // to both arrays
    if (array1 == array2)
    {
        // if all values of the both arrays are equal
    }
    else
    {
        // if all values of the both arrays are not equal
    }
    return 0;
}
I have two dynamically allocated arrays created with new (the sizes may or may not be the same). Now I want to compare all elements of the arrays (if the sizes and elements are the same, then true; if either differs, then false).
How do I do this in C++? (I'm not interested in using vector in my problem scenario.)
First off, I would like to encourage you to use std::vector for dynamically allocated arrays. They will free the allocated memory safely and automatically and you can always retrieve their size without extra manual book-keeping.
Once you have that, you can compare the two arrays in the following way:
#include <vector>

int main()
{
    std::vector<int> v1 = { 1, 2, 3 };
    std::vector<int> v2 = { 1, 2, 3, 4 };
    const bool theyAreEqual = v1 == v2;
}
Comparing two pointers as you did only compares the addresses of the first elements, not the sizes or the contents of the dynamic arrays elementwise. That's one of the reasons it's much safer to use std::vector instead of C-style arrays.
array1 == array2 compares pointers. They will never be equal. Furthermore, you can't know how many elements are in a dynamically allocated array, unless you're:
having its size stored separately
using a sentinel value to determine its end - you choose a value (e.g. -1) to represent the end of the array (like C-style strings usually use \0)
Then you'll be able to know how many elements to iterate over when comparing the elements of both arrays.
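For instance, a minimal sketch assuming the sizes were stored at allocation time (arrays_equal, len1 and len2 are illustrative names):
#include <algorithm>
#include <cstddef>

// True when both arrays have the same length and the same elements.
bool arrays_equal(const int* a, std::size_t size_a,
                  const int* b, std::size_t size_b)
{
    return size_a == size_b && std::equal(a, a + size_a, b);
}

int main()
{
    std::size_t len1 = 5, len2 = 5;
    int* array1 = new int[len1]();
    int* array2 = new int[len2]();
    array1[2] = 3;
    array2[2] = 3;

    bool same = arrays_equal(array1, len1, array2, len2);  // true here

    delete[] array1;
    delete[] array2;
    return same ? 0 : 1;
}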
Here is a way to resolve it, but I highly recommend vectors in cases like this.
You need the lengths and a bool for checking. check is true by default, and the arrays are assumed to be allocated with sizes length1 and length2.
//...
if (length1 != length2) check = false;
else for (int i = 0; i < length1; i++)
{
    if (array1[i] != array2[i])
    {
        check = false;
        break;
    }
}
if (check)
//...
I followed up on Ralph's comment, because I also wanted to see what std::equal did. The == operator of std::vector does the right thing and is surprisingly simpler to use than std::equal. If you use the latter, you need to make sure to use begin()/end() for both arrays (that is the C++14 overload of std::equal), or add v1.size() == v2.size() && ...
#include <algorithm>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> v1 = { 1, 2, 3 };
    std::vector<int> v2 = { 1, 2, 3, 4 };
    std::vector<int> v3 = { 1, 2, 3 };

    const bool theyAreEqualv1v2 = v1 == v2;
    const bool theyAreEqualv1v3 = v1 == v3;
    const bool theyAreEqualStdv1v2 = std::equal(v1.begin(), v1.end(), v2.begin(), v2.end());
    const bool theyAreEqualStdv1v2bad = std::equal(v1.begin(), v1.end(), v2.begin());
    const bool theyAreEqualStdv1v3 = std::equal(v1.begin(), v1.end(), v3.begin(), v3.end());
    // std::equal according to http://en.cppreference.com/w/cpp/algorithm/equal actually
    // only compares the first range, thus you would really need begin()/end() for both arrays

    printf("equal v1v2: %d\n", theyAreEqualv1v2);
    printf("equal v1v3: %d\n", theyAreEqualv1v3);
    printf("std::equal v1v2: %d\n", theyAreEqualStdv1v2);
    printf("std::equal v1v2 bad: %d\n", theyAreEqualStdv1v2bad);
    printf("std::equal v1v3: %d\n", theyAreEqualStdv1v3);
    return 0;
}
clang++ -std=c++14 -stdlib=libc++ c.cpp
output:
equal v1v2: 0
equal v1v3: 1
std::equal v1v2: 0
std::equal v1v2 bad: 1
std::equal v1v3: 1

C++ Easiest most efficient way to move a single element to a new position within a vector

Sorry for my potential nOOb'ness, but I have been trying to get this for hours and can't seem to find an elegant solution for C++98.
My question is: say I have a vector of strings { a, b, c, d, e, f } and I want to move 'e' to the 2nd element, how would I do so? Obviously the expected output would now print out { a, e, b, c, d, f }.
Ideally I'm looking for a single operation that lets me do this, just for efficiency reasons, but I would love to hear some suggestions on how to achieve this.
Thanks.
It's not possible to do this "efficiently" with std::vector<>, because it is stored in contiguous memory and you must therefore move everything between the old and new locations by one element. So it's linear time in the length of the vector (or at least the distance moved).
The naive solution would be to insert() then erase(), but that requires moving everything after the rightmost location you modified, twice! So instead you can do it "by hand", by copying b through d one position to the right (e.g. with std::copy_backward(), since the source and destination ranges overlap), then overwriting b. At least then you avoid shifting anything outside the modified range. It looks like you may be able to make std::rotate() do this, as @WhozCraig mentioned in a comment.
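A minimal sketch of that by-hand shift (my own illustration, C++98-compatible; the saved copy of 'e' and the copy_backward bounds are the assumptions here):
#include <algorithm>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> v;
    v.push_back("a"); v.push_back("b"); v.push_back("c");
    v.push_back("d"); v.push_back("e"); v.push_back("f");

    // Save the element to move, shift b..d one slot to the right
    // (copy_backward handles the overlapping ranges), then drop it in.
    std::string e = v[4];
    std::copy_backward(v.begin() + 1, v.begin() + 4, v.begin() + 5);
    v[1] = e;
    // v is now { a, e, b, c, d, f }
    return 0;
}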
I'd try with std::rotate first and only try other manual stuff (or a container other than vector) if that turns out not be efficient enough:
#include <vector>
#include <iostream>
#include <algorithm>

int main()
{
    // move 5 from 4th to 1st index
    std::vector<int> v {1,2,3,4,5,6};
    // position:       0 1 2 3 4 5
    std::size_t i_old = 4;
    std::size_t i_new = 1;

    auto it = v.begin();
    std::rotate(it + i_new, it + i_old, it + i_old + 1);

    for (int i : v) std::cout << i << ' ';
}
Live demo.
EDIT As noted in the comments, the below code actually mimics std::rotate, which is of course preferred above my hand-rolled code in all cases.
You can accomplish this with K swaps where K is the distance between the elements:
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string v = "abcdef";                  // use string here so output is trivial
    string::size_type insert_index = 1;   // at the location of 'b'
    string::size_type move_index = 4;     // at the location of 'e'

    while (move_index > insert_index)
    {
        std::swap(v[move_index], v[move_index-1]);
        --move_index;
    }
    std::cout << v;
}
Live demo here. Note I used std::string, but the algorithm remains the same for std::vector. The same can be done with iterators, so you can generalize to containers that don't have operator[].
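To illustrate that last point, here is a rough iterator-based variant of the swap loop (my own sketch, using C++11 and a std::list to show it works without operator[]; move_element_toward_front is a made-up name):
#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>

// Bubble the element at move_pos toward the front until it sits at insert_pos.
template <class BidirIt>
void move_element_toward_front(BidirIt insert_pos, BidirIt move_pos)
{
    while (move_pos != insert_pos) {
        BidirIt prev = std::prev(move_pos);
        std::iter_swap(move_pos, prev);
        move_pos = prev;
    }
}

int main()
{
    std::list<char> l = {'a', 'b', 'c', 'd', 'e', 'f'};
    // move 'e' (index 4) so it ends up at index 1
    move_element_toward_front(std::next(l.begin(), 1), std::next(l.begin(), 4));
    for (char c : l) std::cout << c;   // prints aebcdf
    std::cout << '\n';
}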
Expanding on jrok's answer, here's a wrapper around std::rotate() for moving a single element around. This is more general than jrok's example, in that it supports moving an element forward in the vector too (rather than only backward).
See the comments within rotate_single() explaining how you have to swap the logic around when moving the element forward versus back.
#include <vector>
#include <stdexcept>   // for std::domain_error in range-checking assertion
#include <algorithm>   // for std::rotate()

template<class ContiguousContainer>
void assert_valid_idx(ContiguousContainer & v, size_t index)
{
    // You probably have a preferred assertion mechanism in your code base...
    // This is just a sample.
    if(index >= v.size())
    {
        throw std::domain_error("Invalid index");
    }
}

template<class ContiguousContainer>
void rotate_single(ContiguousContainer & v, size_t from_index, size_t to_index)
{
    assert_valid_idx(v, from_index);
    assert_valid_idx(v, to_index);
    const auto from_it = v.begin() + from_index;
    const auto to_it = v.begin() + to_index;

    if(from_index < to_index)
    {
        // We're rotating the element toward the back, so we want the new
        // front of our range to be the element just after the "from" iterator
        // (thereby making our "from" iterator the new end of the range).
        std::rotate(from_it, from_it + 1, to_it + 1);
    }
    else if(to_index < from_index)
    {
        // We're rotating the element toward the front,
        // so we want the new front of the range to be the "from" iterator.
        std::rotate(to_it, from_it, from_it + 1);
    }
    // else the indices were equal, no rotate necessary
}
You can play with this in Compiler Explorer—there are (extensive) unit tests there, but here's an illustrative sample:
TEST_CASE("Handful of elements in the vector")
{
    std::vector<int> v{1, 2, 3, 4, 5, 6}; // Note: this gets recreated for each SECTION() below
    // position:      0  1  2  3  4  5

    SECTION("Interior moves")
    {
        SECTION("Move 5 from 4th to 1st index")
        {
            rotate_single(v, 4, 1);
            CHECK(v == std::vector<int>{1, 5, 2, 3, 4, 6});
        }
        SECTION("Move 2 from 1st to 4th index")
        {
            rotate_single(v, 1, 4);
            CHECK(v == std::vector<int>{1, 3, 4, 5, 2, 6});
        }
    }
    SECTION("Swap adjacent")
    {
        rotate_single(v, 4, 5);
        rotate_single(v, 0, 1);
        CHECK(v == std::vector<int>{2, 1, 3, 4, 6, 5});
    }
}