generic slicing (views) of multidimensional array in C++20 using ranges


In Python, accessing a subset of a multidimensional NumPy array is normally done using the slicing syntax [bx:ex] for a 1D array, [bx:ex,by:ey] for a 2D array, and so on. It is also possible to write generic code such as
def foo(Vin,Vout,lows,highs):
# Vin and Vout are numpys with dimension len(lows)
# and len(lows)=len(highs)
S=tuple(slice(l,h) for l,h in zip(lows,highs))
Vout[S]=Vin[S]
I would like to achieve something similar in C++, where the data is stored in a std::vector, with the same performance as (or better than) a bunch of nested for-loops, which for a 3D array would look like
for (int k=lz; k<hz; ++k)
for (int j=ly; j<hy; ++j)
for (int i=lx; i<hx; ++i)
Vout[i+nx*(j+ny*k)] = Vin[i+nx*(j+ny*k)];
Could this be done using C++20 ranges?
The long term goal is to generate lazily evaluated views of subsets of multidimensional arrays that can be combined together. In other words, being able to fuse loops without creating intermediate arrays.

I am not sure about the performance, but here is one option.
You create a templated struct MD<N,M,L> that takes array dimensions N,M,L and has a static function slice.
slice takes a flat input range and one Slice instance per dimension and returns a corresponding multidimensional range over the elements of the flat input range.
The Slice instances are just structs containing a start index and an optional end index.
You can use deep_flatten from this SO answer (flat in the code below) to avoid nested for loops over the multidimensional range. Note that the returned range is just an input_range, which does not have a rich interface.
#include <vector>
#include <ranges>
#include <cassert>
#include <iostream>
template <size_t dim>
struct Slice {
// default constructor leaves start at zero and end at dim. Corresponds to the whole dimension
constexpr Slice() = default;
// Create a slice with a single index
constexpr Slice(size_t i) : begin(i), end(i+1) {
assert( (0 <= i) && (i < dim));
}
// Create a slice with a start and an end index
constexpr Slice(size_t s, size_t e) : begin(s), end(e+1) {
assert( (0 <= s) && (s <= e) && (e < dim) );
}
size_t begin {0};
size_t end {dim};
};
// An adaptor object to interpret a flat range as a multidimensional array
template <size_t dim, size_t... dims>
struct MD {
constexpr static auto dimensions = std::make_tuple(dim, dims...);
consteval static size_t size(){
if constexpr (sizeof...(dims) > 0) {
return dim*(dims * ...);
}
else {
return dim;
}
}
// returns a multidimensional range over the elements in the flat array
template <typename Rng>
constexpr static auto slice(
Rng&& range,
Slice<dim> const& slice,
Slice<dims> const&... slices
)
{
return slice_impl(range, 0, slice, slices...);
}
template <typename Rng>
constexpr static auto slice_impl(
Rng&& range,
size_t flat_index,
Slice<dim> const& slice,
Slice<dims> const&... slices
)
{
if constexpr (std::ranges::sized_range<Rng>) { assert(std::size(range) >= size()); }
static_assert(sizeof...(slices) == sizeof...(dims), "wrong number of slice arguments.");
if constexpr (sizeof...(slices) == 0)
{
// end recursion at inner most range
return range | std::views::drop(flat_index*dim + slice.begin) | std::views::take(slice.end - slice.begin);
}
else
{
// for every index to be kept in this dimension, recurse to the next dimension and increment the flat_index
return std::views::iota(slice.begin, slice.end) | std::views::transform(
[&range, flat_index, slices...](size_t i){
return MD<dims...>::slice_impl(range, flat_index*dim + i, slices...);
}
);
}
}
// convenience function for the full view
template <typename Rng>
constexpr static auto as_range(Rng&& range){
return slice(range, Slice<dim>{}, Slice<dims>{}...);
}
};
// recursively join a range of ranges
// https://stackoverflow.com/questions/63249315/use-of-auto-before-deduction-of-auto-with-recursive-concept-based-fun
template <typename Rng>
auto flat(Rng&& rng) {
using namespace std::ranges;
auto joined = rng | views::join;
if constexpr (range<range_value_t<decltype(joined)>>) {
return flat(joined);
} else {
return joined;
}
}
int main()
{
static_assert(MD<2,3,2>::size() == 12);
static_assert(std::get<0>(MD<2,3,2>::dimensions) == 2);
static_assert(std::get<1>(MD<2,3,2>::dimensions) == 3);
static_assert(std::get<2>(MD<2,3,2>::dimensions) == 2);
std::vector v = {1,2,3,4,5,6,7,8,9,10,11,12};
// obtain the full view of the data, interpreted as a 2x3x2 array
auto full = MD<2,3,2>::as_range(v);
// print the full view
std::cout << "data interpreted as 2x3x2 array:\n";
for (size_t i=0; i < full.size(); i++) {
std::cout << "index " << i << ":\n";
for (auto const& d3 : full[i]) {
for (auto const& val : d3) {
std::cout << val << " ";
}
std::cout << "\n";
}
}
std::cout << "\n";
auto sliced = MD<2,3,2>::slice(
v,
{}, // 1st dim: take all elements along this dim
{1,2}, // 2nd dim: take indices 1:2
{0} // 3rd dim: take only index 0
);
std::cout << "2x2x1 Slice with indices {{}, {1,2}, {0}} of the 2x3x2 data:\n";
for(size_t i=0; i < 2; ++i){ // index-based loop
for (size_t j=0; j < 2; ++j){
std::cout << sliced[i][j][0] << " ";
}
std::cout << "\n";
}
std::cout << "\n";
for(auto& val : flat(sliced)){
val *= val;
}
// print the whole flat data
std::cout << "\nThe whole data, after squaring all elements in sliced view:\n";
for (auto const& val : v){
std::cout << val << " ";
}
}
Output:
data interpreted as 2x3x2 array:
index 0:
1 2
3 4
5 6
index 1:
7 8
9 10
11 12
2x2x1 Slice with indices {{}, {1,2}, {0}} of the 2x3x2 data:
3 5
9 11
The whole data, after squaring all elements in sliced view:
1 2 9 4 25 6 7 8 81 10 121 12
Live Demo on godbolt compiler explorer
This is a prototype. I am sure the ergonomics can be improved.
Edit
A first quick and dirty benchmark of assigning a 6x6x6 view with another 6x6x6 view out of a 10x10x10:
Quickbench
A nested for loop over the multidimensional range is about 3 times slower than the traditional nested for-loop. Flattening the view using deep_flatten/std::views::join seems to make it 20-30 times slower. Apparently the compiler is having a hard time optimizing here.
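For reference, the hand-written nested loops that serve as the baseline can be generalised to any number of dimensions without ranges. The following is only a sketch under stated assumptions: copy_box and its parameter names are hypothetical, both arrays have the same shape dims, index 0 varies fastest (as in i+nx*(j+ny*k)), and lows/highs describe a half-open box that lies inside dims.

```cpp
#include <cstddef>
#include <vector>

// Copy the half-open box [lows, highs) from vin to vout.
// Both are flat arrays of identical shape `dims`; index 0 varies fastest.
// Hypothetical helper: no bounds checking beyond the empty-box test.
template <typename T>
void copy_box(const std::vector<T>& vin, std::vector<T>& vout,
              const std::vector<std::size_t>& dims,
              const std::vector<std::size_t>& lows,
              const std::vector<std::size_t>& highs)
{
    const std::size_t nd = dims.size();
    if (nd == 0) return;
    for (std::size_t d = 0; d < nd; ++d)
        if (lows[d] >= highs[d]) return;  // empty box: nothing to copy

    // strides[d] = product of dims[0..d-1]
    std::vector<std::size_t> strides(nd, 1);
    for (std::size_t d = 1; d < nd; ++d) strides[d] = strides[d - 1] * dims[d - 1];

    std::vector<std::size_t> idx(lows);  // current multi-index ("odometer")
    for (;;) {
        // flat offset contributed by the outer dimensions 1..nd-1
        std::size_t base = 0;
        for (std::size_t d = 1; d < nd; ++d) base += idx[d] * strides[d];

        // contiguous inner run along dimension 0
        for (std::size_t i = lows[0]; i < highs[0]; ++i) vout[base + i] = vin[base + i];

        // advance the odometer over the outer dimensions
        std::size_t d = 1;
        for (; d < nd; ++d) {
            if (++idx[d] < highs[d]) break;
            idx[d] = lows[d];  // wrap this digit and carry into the next one
        }
        if (d == nd) return;  // every outer index wrapped: done
    }
}
```

The inner loop stays a plain contiguous copy, which is the part compilers tend to vectorise well; only the bookkeeping for the outer indices is generic. Whether this matches the fixed-depth nested loops in practice would have to be measured.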

Related

C++ EIGEN: How to create triangular matrix map from a vector?

I would like to use data stored into an Eigen (https://eigen.tuxfamily.org) vector
Eigen::Vector<double, 6> vec({1,2,3,4,5,6});
as if they were a triangular matrix
1 2 3
0 4 5
0 0 6
I know how to do it for a full matrix using Eigen's Map
Eigen::Vector<double, 9> vec({1,2,3,4,5,6,7,8,9});
std::cout << Eigen::Map<Eigen::Matrix<double, 3, 3, Eigen::RowMajor>>(vec.data());
which produces
1 2 3
4 5 6
7 8 9
However I do not know how to make a Map to a triangular matrix.
Is it possible?
Thanks!
[Edited for clarity]

In my opinion this cannot be done using Map only: The implementation of Map as it is relies on stride sizes that remain constant no matter their index positions, see https://eigen.tuxfamily.org/dox/classEigen_1_1Stride.html.
To implement a triangular matrix map you would have to have a Map that changes its inner stride depending on the actual column number. The interfaces in Eigen do not allow that at the moment, see https://eigen.tuxfamily.org/dox/Map_8h_source.html.
But if you are just concerned about the extra memory you can just use Eigen's sparse matrix representation:
https://eigen.tuxfamily.org/dox/group__TutorialSparse.html
(Refer to section "Filling a sparse matrix".)

This is not a direct solution to your problem, but a way to compute a std::vector with the zeros filled in at the correct places. I think it could also be computed as a std::array if needed. I am not sure if that helps, but I guess you could use the computed vector to fill the Eigen::Map.
#include <array>
#include <cstddef>
#include <iostream>
#include <vector>
template<typename T, size_t N>
class EigenVector
{
static constexpr int CalculateRowColSize(size_t n)
{
size_t i = 1;
size_t inc = 1;
do
{
if (inc == n)
{
return static_cast<int>(i);
}
i++;
inc += i;
} while (i < n);
return -1;
}
static constexpr bool IsValid(size_t n)
{
if(CalculateRowColSize(n) == -1)
{
return false;
}
return true;
}
static_assert(IsValid(N));
public:
EigenVector() = delete;
static std::vector<T> Calculate(std::array<T, N> values)
{
constexpr size_t mRowColSize = CalculateRowColSize(N);
std::vector<T> ret;
auto count = 0;
auto valueCounter = 0;
for (size_t i = 0; i < mRowColSize; i++)
{
for (auto j = 0; j < count; j++)
{
ret.push_back(T());
}
for (size_t j = 0; j < mRowColSize - count; j++)
{
ret.push_back(values[valueCounter]);
valueCounter++;
}
count++;
}
return ret;
}
};
int main()
{
{
const std::array<int, 6> arr{ 1,2,3,4,5,6 };
const auto values = EigenVector<int, 6>::Calculate(arr);
for (auto& val : values)
{
std::cout << val << " ";
}
}
std::cout << std::endl << std::endl;
{
const std::array<int, 10> arr{ 1,2,3,4,5,6,7,8,9,10 };
const auto values = EigenVector<int, 10>::Calculate(arr);
for (auto& val : values)
{
std::cout << val << " ";
}
}
return 0;
}
Output:
1 2 3 0 4 5 0 0 6
1 2 3 4 0 5 6 7 0 0 8 9 0 0 0 10
Note that the algorithm only accepts input sizes that correspond to a triangular matrix (1, 3, 6, 10, ...); any other size fails the static_assert.

Is there a way to set an offset to a range-based for loop in C++?

Suppose that these classes have an inner array of AnotherObjectClass type that can be accessed through a function called GetAnotherObjectClassTerms.
#include<iostream>
int main() {
// Suppose a default constructor that assigns values to the arrays
AnObjectClass obj1;
AnObjectClass obj2;
for (AnotherObjectClass term1 : obj1.GetAnotherObjectClassTerms()) {
for (AnotherObjectClass term2 : obj2.GetAnotherObjectClassTerms()) {
if (term1 > term2) {
std::cout << "Term 1 is greater than term 2" << std::endl;
} else {
std::cout << "Term 1 is not greater than term 2" << std::endl;
// OFFSET THIS INNER LOOP so it doesn't iterate through all the items again.
break;
}
}
}
}
I can do this in a normal for loop by creating a variable that holds that last index, so that when it starts iterating, it starts from that specific position.
The following code is to show my problem using a traditional for-loop as requested.
for (unsigned short i = 0, temp = 0; i < 3; i++) {
for (unsigned short j = temp; j < 3; j++) {
// This traditional for-loop iterates through a different range
// whenever the 'temp' value is increased
if (j > i) {
std::cout << j << " > " << i << std::endl;
}
else {
std::cout << j << " <= " << i << std::endl;
temp++;
}
}
}
The output of the code above would be:
Output #1: 0 <= 0
Output #2: 1 > 0
Output #3: 2 > 0
Output #4: 1 <= 1
Output #5: 2 > 1
Output #6: 2 <= 2
As you can see, the inner loop doesn't iterate from the "beginning" whenever the variable temp is incremented.
So my question is: can this be done in a range-based for loop? If so, how can this offset be applied to a range-based for loop? Or should I avoid this completely and go with a normal for-loop?
The real problem I'm having is that I need the inner loop to start at an offset of +1 whenever the break statement is reached.
Take into account that the range-based for is looping through the elements of an array.

If you can use Boost, you can use boost::adaptors::sliced to get a slice of your range. The following is a complete example:
#include <cstddef> // std::size_t
#include <iostream>
#include <iterator>
#include <boost/range/adaptor/sliced.hpp>
int main() {
int a[] = {0,1,2}, b[] = {0,1,2};
std::size_t off = 0;
for (int term1 : a) {
for (int term2 : b | boost::adaptors::sliced(off, std::size(b))) {
// ^^^^^^^^^ C++17 feature
if (term1 > term2) {
std::cout << "Term 1 is greater than term 2" << std::endl;
} else {
std::cout << "Term 1 is not greater than term 2" << std::endl;
++off;
break;
}
}
}
}
Output:
Term 1 is not greater than term 2
Term 1 is not greater than term 2
Term 1 is not greater than term 2
You can also implement a simple class, say Offset. When combined with a range, it returns a proxy class, say View, with begin() and end() member functions (so that it behaves like a range), which represents the offset range.
Example:
#include <cstddef> // std::size_t
#include <iostream>
#include <iterator>
#include <utility>
struct Offset {
std::size_t _offset;
};
constexpr Offset offset(std::size_t s)
{
return {s};
}
template <typename Iterator>
struct View {
constexpr Iterator begin() const {return _begin;}
constexpr Iterator end() const {return _end;}
Iterator _begin;
Iterator _end;
};
// Combine a range with an Offset using operator| like boost::adaptors::sliced
template <typename Range>
auto operator |(Range &&range, const Offset &offset)
-> View<decltype(std::begin(std::forward<Range>(range)))>
{
return {
std::begin(std::forward<Range>(range)) + offset._offset,
std::end(std::forward<Range>(range))
};
}
int main() {
int a[] = {0,1,2}, b[] = {0,1,2};
std::size_t off = 0;
for (int term1 : a) {
for (int term2 : b | offset(off)) {
if (term1 > term2) {
std::cout << "Term 1 is greater than term 2" << std::endl;
} else {
std::cout << "Term 1 is not greater than term 2" << std::endl;
++off;
break;
}
}
}
}

Iterate through different subset of size k

I have an array of n integers (not necessarily distinct!) and I would like to iterate over all subsets of size k. However I'd like to exclude all duplicate subsets.
e.g.
array = {1,2,2,3,3,3,3}, n = 7, k = 2
then the subsets I want to iterate over (each once) are:
{1,2},{1,3},{2,2},{2,3},{3,3}
What is an efficient algorithm for doing this?
Is a recursive approach the most efficient/elegant?
In case you have a language-specific answer, I'm using C++.

The same (or almost the same) algorithm which is used to generate combinations of a set of unique values in lexicographical order can be used to generate combinations of a multiset in lexicographical order. Doing it this way avoids the necessity to deduplicate, which is horribly expensive, and also avoids the necessity of maintaining all the generated combinations. It does require that the original list of values be sorted.
The following simple implementation finds the next k-combination of a multiset of n values in average (and worst-case) time O(n). It expects two ranges: the first range is a sorted k-combination, and the second range is the sorted multiset. (If either range is unsorted or the values in first range do not constitute a sub(multi)set of the second range, then the behaviour is undefined; no sanity checks are made.)
Only the end iterator from the second range is actually used, but I thought that made the calling convention a bit odd.
template<typename BidiIter, typename CBidiIter,
typename Compare = std::less<typename BidiIter::value_type>>
bool next_comb(BidiIter first, BidiIter last,
CBidiIter /* first_value */, CBidiIter last_value,
Compare comp=Compare()) {
/* 1. Find the rightmost value which could be advanced, if any */
auto p = last;
while (p != first && !comp(*(p - 1), *--last_value)) --p;
if (p == first) return false;
/* 2. Find the smallest value which is greater than the selected value */
for (--p; comp(*p, *(last_value - 1)); --last_value) { }
/* 3. Overwrite the suffix of the subset with the lexicographically smallest
* sequence starting with the new value */
while (p != last) *p++ = *last_value++;
return true;
}
It should be clear that steps 1 and 2 combined make at most O(n) comparisons, because each of the n values is used in at most one comparison. Step 3 copies at most O(k) values, and we know that k≤n.
This could be improved to O(k) in the case where no values are repeated, by maintaining the current combination as a container of iterators into the value list rather than actual values. This would also avoid copying values, at the cost of extra dereferences. If in addition we cache the function which associates each value iterator with an iterator to the first instance of next largest value, we could eliminate Step 2 and reduce the algorithm to O(k) even for repeated values. That might be worthwhile if there are a large number of repeats and comparisons are expensive.
Here's a simple use example:
std::vector<int> values = {1,2,2,3,3,3,3};
/* Since that's sorted, the first subset is just the first k values */
const int k = 2;
std::vector<int> subset{values.cbegin(), values.cbegin() + k};
/* Print each combination */
do {
for (auto const& v : subset) std::cout << v << ' ';
std::cout << '\n';
} while (next_comb(subset.begin(), subset.end(),
values.cbegin(), values.cend()));
Live on coliru
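For comparison, the recursive formulation the question asks about can be sketched as follows. This is a different technique from next_comb: it first groups the sorted values into (value, multiplicity) pairs and then decides, group by group, how many copies to take. All names here (combos_rec, multiset_combinations) are hypothetical, and unlike next_comb it stores every combination.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Recursive step: choose how many copies of groups[g] to take, then recurse.
void combos_rec(const std::vector<std::pair<int, std::size_t>>& groups,
                std::size_t g, std::size_t k,
                std::vector<int>& current,
                std::vector<std::vector<int>>& out)
{
    if (k == 0) { out.push_back(current); return; }  // combination complete
    if (g == groups.size()) return;                  // ran out of values
    const std::size_t max_take = std::min(k, groups[g].second);
    // Taking the most copies first yields lexicographic order.
    for (std::size_t take = max_take + 1; take-- > 0; ) {
        current.insert(current.end(), take, groups[g].first);
        combos_rec(groups, g + 1, k - take, current, out);
        current.resize(current.size() - take);       // undo the choice
    }
}

// All distinct k-combinations of `values` (duplicates allowed in the input).
std::vector<std::vector<int>> multiset_combinations(std::vector<int> values,
                                                    std::size_t k)
{
    std::sort(values.begin(), values.end());
    std::vector<std::pair<int, std::size_t>> groups;  // (value, multiplicity)
    for (int v : values) {
        if (!groups.empty() && groups.back().first == v) ++groups.back().second;
        else groups.emplace_back(v, 1);
    }
    std::vector<std::vector<int>> out;
    std::vector<int> current;
    combos_rec(groups, 0, k, current, out);
    return out;
}
```

For {1,2,2,3,3,3,3} and k = 2 this yields {1,2}, {1,3}, {2,2}, {2,3}, {3,3}. Because each combination is built exactly once, no deduplication is needed, but the iterative next_comb is the better fit when the combinations should be consumed one at a time.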

I like bit-twiddling for this problem. Sure, it limits you to only 32 elements in your vector, but it's still cool.
First, given a bit mask, determine the next bitmask permutation (source):
uint32_t next(uint32_t v) {
uint32_t t = v | (v - 1);
return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}
Next, given a vector and a bitmask, give a new vector based on that mask:
std::vector<int> filter(const std::vector<int>& v, uint32_t mask) {
std::vector<int> res;
while (mask) {
res.push_back(v[__builtin_ctz(mask)]);
mask &= mask - 1;
}
return res;
}
And with that, we just need a loop:
std::set<std::vector<int>> get_subsets(const std::vector<int>& arr, uint32_t k) {
std::set<std::vector<int>> s;
uint32_t max = (1 << arr.size());
for (uint32_t v = (1 << k) - 1; v < max; v = next(v)) {
s.insert(filter(arr, v));
}
return s;
}
int main()
{
auto s = get_subsets({1, 2, 2, 3, 3, 3, 3}, 2);
std::cout << s.size() << std::endl; // prints 5
}
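To see what next does, it helps to list the masks it visits. The sketch below uses the same formula but replaces __builtin_ctz with a small portable loop; ctz32, next_mask and masks are hypothetical names, and the code assumes 0 < k <= n < 32.

```cpp
#include <cstdint>
#include <vector>

// Portable count of trailing zero bits; v must be non-zero.
inline unsigned ctz32(std::uint32_t v) {
    unsigned n = 0;
    while ((v & 1u) == 0) { v >>= 1; ++n; }
    return n;
}

// Next larger integer with the same number of set bits
// (same formula as next() above, written without compiler builtins).
inline std::uint32_t next_mask(std::uint32_t v) {
    const std::uint32_t t = v | (v - 1);      // set all bits below the lowest set bit
    const std::uint32_t carry = ~t & (t + 1); // lowest zero bit of t, equal to ~t & -~t
    return (t + 1) | ((carry - 1) >> (ctz32(v) + 1));
}

// All n-bit masks with exactly k bits set, in increasing numeric order.
inline std::vector<std::uint32_t> masks(unsigned n, unsigned k) {
    std::vector<std::uint32_t> out;
    const std::uint32_t limit = std::uint32_t{1} << n;
    for (std::uint32_t v = (std::uint32_t{1} << k) - 1; v < limit; v = next_mask(v))
        out.push_back(v);
    return out;
}
```

For n = 4 and k = 2 the sequence is 0011, 0101, 0110, 1001, 1010, 1100, that is 3, 5, 6, 9, 10, 12. Each mask selects one pair of positions, which is why duplicate values in the array still have to be filtered by the std::set.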

The basic idea of this solution is a function like next_permutation but which generates the next ascending sequence of "digits". Here called ascend_ordered.
template< class It >
auto ascend_ordered( const int n_digits, const It begin, const It end )
-> bool
{
using R_it = reverse_iterator< It >;
const R_it r_begin = R_it( end );
const R_it r_end = R_it( begin );
int max_digit = n_digits - 1;
for( R_it it = r_begin ; it != r_end; ++it )
{
if( *it < max_digit )
{
++*it;
const int n_further_items = it - r_begin;
for( It it2 = end - n_further_items; it2 != end; ++it2 )
{
*it2 = *(it2 - 1) + 1;
}
return true;
}
--max_digit;
}
return false;
}
Main program for the case at hand:
auto main() -> int
{
vector<int> a = {1,2,2,3,3,3,3};
assert( is_sorted( begin( a ), end( a ) ) );
const int k = 2;
const int n = a.size();
vector<int> indices( k );
iota( indices.begin(), indices.end(), 0 ); // Fill with 0, 1, 2 ...
set<vector<int>> encountered;
for( ;; )
{
vector<int> current;
for( int const i : indices ) { current.push_back( a[i] ); }
if( encountered.count( current ) == 0 )
{
cout << "Indices " << indices << " -> values " << current << endl;
encountered.insert( current );
}
if( not ascend_ordered( n, begin( indices ), end( indices ) ) )
{
break;
}
}
}
Supporting includes and i/o:
#include <algorithm>
using std::is_sorted;
#include <assert.h>
#include <iterator>
using std::reverse_iterator;
#include <iostream>
using std::ostream; using std::cout; using std::endl;
#include <numeric>
using std::iota;
#include <set>
using std::set;
#include <utility>
using std::begin; using std::end;
#include <vector>
using std::vector;
template< class Container, class Enable_if = typename Container::value_type >
auto operator<<( ostream& stream, const Container& c )
-> ostream&
{
stream << "{";
int n_items_outputted = 0;
for( const int x : c )
{
if( n_items_outputted >= 1 ) { stream << ", "; }
stream << x;
++n_items_outputted;
}
stream << "}";
return stream;
}

Unlike the previous answer, this is not as efficient and doesn't do anything as fancy as a lot of the bit twiddling. However it does not limit the size of your array or the size of the subset.
This solution uses std::next_permutation to generate the combinations, and takes advantage of std::set's uniqueness property.
#include <algorithm>
#include <vector>
#include <set>
#include <iostream>
#include <iterator>
using namespace std;
std::set<std::vector<int>> getSubsets(const std::vector<int>& vect, size_t numToChoose)
{
std::set<std::vector<int>> returnVal;
// return the whole thing if we want to
// choose everything
if (numToChoose >= vect.size())
{
returnVal.insert(vect);
return returnVal;
}
// set up bool vector for combination processing
std::vector<bool> bVect(vect.size() - numToChoose, false);
// stick the true values at the end of the vector
bVect.resize(bVect.size() + numToChoose, true);
// select where the ones are set in the bool vector and populate
// the combination vector
do
{
std::vector<int> combination;
for (size_t i = 0; i < bVect.size() && combination.size() <= numToChoose; ++i)
{
if (bVect[i])
combination.push_back(vect[i]);
}
// sort the combinations
std::sort(combination.begin(), combination.end());
// insert this new combination in the set
returnVal.insert(combination);
} while (next_permutation(bVect.begin(), bVect.end()));
return returnVal;
}
int main()
{
std::vector<int> myVect = {1,2,2,3,3,3,3};
// number to select
size_t numToSelect = 3;
// get the subsets
std::set<std::vector<int>> subSets = getSubsets(myVect, numToSelect);
// output the results
for_each(subSets.begin(), subSets.end(), [] (const vector<int>& v)
{ cout << "subset "; copy(v.begin(), v.end(), ostream_iterator<int>(cout, " ")); cout << "\n"; });
}
Live example: http://coliru.stacked-crooked.com/a/beb800809d78db1a
Basically we set up a bool vector and populate a vector with the values that correspond with the position of the true items in the bool vector. Then we sort and insert this into a set. The std::next_permutation shuffles the true values in the bool array around and we just repeat.
Admittedly, not as sophisticated and more than likely slower than the previous answer, but it should do the job.

Using binary counting to count all subsets of an array

So if I am given an array such as
a = {1, 2, 3}
We know that the given subarrays (non contiguous included) are (this represents the power set)
{1} {2} {3} {1,2,3} {1,2} {1,3} {2,3}
I also know that these subsets can be represented by counting in binary from
000 -> 111 (0 to 7), where each 1 bit means we 'use' this value from the array
e.g. 001 corresponds to the subset {3}
I know that this method can somehow be used to generate all subsets, but I'm not really sure how this can be implemented in C++.
So basically what I am asking is how can (if it can) binary counting be used to generate power sets?
Any other methods for generating a power set are also much appreciated!

For your example with 3 set elements you can just do this:
for (int s = 1; s <= 7; ++s)
{
// ...
}
Here's a demo program:
#include <iostream>
int main()
{
const int num_elems = 3; // number of set elements
const int elems[num_elems] = { 1, 2, 3 }; // mapping of set element positions to values
for (int s = 1; s < (1 << num_elems); ++s) // iterate through all non-null sets
{
// print the set
std::cout << "{";
for (int e = 0; e < num_elems; ++e) // for each set element
{
if (s & (1 << e)) // test for membership of set
{
std::cout << " " << elems[e];
}
}
std::cout << " }" << std::endl;
}
return 0;
}
Compile and test:
$ g++ -Wall sets.cpp && ./a.out
{ 1 }
{ 2 }
{ 1 2 }
{ 3 }
{ 1 3 }
{ 2 3 }
{ 1 2 3 }
Note that it's a common convention to make the least significant bit correspond to the first set element.
Note also that we are omitting the null set, s = 0, as you don't seem to want to include this.
If you need to work with sets larger than the bit width of your integer type (e.g. 64 elements for uint64_t) then you'll need a better approach - you can either expand the above method to use multiple integer elements, or use std::bitset or std::vector<bool>, or use something like @Yochai's answer (using std::next_permutation).

Actually creating the sets is pretty easy - just use the bitwise operations >>= and & to test one bit at a time. Assuming an input vector/array a[] known to have 3 elements, and therefore producing 7 output vectors:
std::vector<std::vector<T>> v(7);
for (int n = 1; n <= 7; ++n) // each output set...
for (int i = 0, j = n; j; j >>= 1, ++i) // i moves through a[i],
// j helps extract bits in n
if (j & 1)
v[n-1].push_back(a[i]);
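The snippet above hard-codes 3 elements and 7 subsets. A minimal sketch that generalises it to any input size could look like this; power_set is a hypothetical name, the empty set is included (mask 0), and the input is assumed to have fewer than 64 elements so that the counter fits in a uint64_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Power set by binary counting: bit i of `mask` decides whether a[i] is used.
// Assumes a.size() < 64; the empty set (mask 0) is included.
template <typename T>
std::vector<std::vector<T>> power_set(const std::vector<T>& a)
{
    const std::size_t n = a.size();
    const std::uint64_t count = std::uint64_t{1} << n;  // 2^n subsets
    std::vector<std::vector<T>> out;
    out.reserve(count);
    for (std::uint64_t mask = 0; mask < count; ++mask) {
        std::vector<T> subset;
        for (std::size_t i = 0; i < n; ++i)
            if (mask & (std::uint64_t{1} << i))  // is element i a member?
                subset.push_back(a[i]);
        out.push_back(std::move(subset));
    }
    return out;
}
```

With a = {1, 2, 3} the result has 8 entries; entry 5 (binary 101) is {1, 3}, following the convention that the least significant bit corresponds to the first element. Since the output grows as 2^n, this is only practical for small n.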

For compile time size, you may use bitset, something like:
template <std::size_t N>
bool increase(std::bitset<N>& bs)
{
for (std::size_t i = 0; i != bs.size(); ++i) {
if (bs.flip(i).test(i) == true) {
return true;
}
}
return false; // overflow
}
template <typename T, std::size_t N>
void display(const std::array<T, N>& a, const std::bitset<N>& bs)
{
std::cout << '{';
const char* sep = "";
for (std::size_t i = 0; i != bs.size(); ++i) {
if (bs.test(i)) {
std::cout << sep << a[i];
sep = ", ";
}
}
std::cout << '}' << std::endl;
}
template <typename T, std::size_t N>
void display_all_subsets(const std::array<T, N>& a)
{
std::bitset<N> bs;
do {
display(a, bs);
} while (increase(bs));
}
Live example

Sorting characters in a string first by frequency and then alphabetically

Given a string, I'm trying to count the occurrences of each letter in the string and then sort the letters by frequency from highest to lowest. Then, letters that have the same number of occurrences have to be sorted alphabetically.
Here is what I have been able to do so far:
I created an int array of size 26 corresponding to the 26 letters of the alphabet with individual values representing the number of times it appeared in the sentence
I pushed the contents of this array into a vector of pairs, v, of int and char (int for the frequency, and char for the actual letter)
I sorted this vector of pairs using std::sort(v.begin(), v.end());
In displaying the frequency count, I just used a for loop starting from the last index to display the result from highest to lowest. I am having problems, however, with regard to those letters having similar frequencies, because I need them displayed in alphabetical order. I tried using a nested for loop with the inner loop starting with the lowest index and using a conditional statement to check if its frequency is the same as the outer loop. This seemed to work, but my problem is that I can't seem to figure out how to control these loops so that redundant outputs will be avoided. To understand what I'm saying, please see this example output:
Enter a string: hello world
Pushing the array into a vector pair v:
d = 1
e = 1
h = 1
l = 3
o = 2
r = 1
w = 1
Sorted first according to frequency then alphabetically:
l = 3
o = 2
d = 1
e = 1
h = 1
r = 1
w = 1
d = 1
e = 1
h = 1
r = 1
d = 1
e = 1
h = 1
d = 1
e = 1
d = 1
Press any key to continue . . .
As you can see, it would have been fine if it wasn't for the redundant outputs brought about by the incorrect for loops.
If you can suggest more efficient or better implementations with regard to my concern, then I would highly appreciate it as long as they're not too complicated or too advanced as I am just a C++ beginner.
If you need to see my code, here it is:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
int main() {
cout<<"Enter a string: ";
string input;
getline(cin, input);
int letters[26]= {0};
for (int x = 0; x < input.length(); x++) {
if (isalpha(input[x])) {
int c = tolower(input[x]) - 'a';
letters[c]++;
}
}
cout<<"\nPushing the array into a vector pair v: \n";
vector<pair<int, char> > v;
for (int x = 0; x < 26; x++) {
if (letters[x] > 0) {
char c = x + 'a';
cout << c << " = " << letters[x] << "\n";
v.push_back(std::make_pair(letters[x], c));
}
}
// Sort the vector of pairs.
std::sort(v.begin(), v.end());
// I need help here!
cout<<"\n\nSorted first according to frequency then alphabetically: \n";
for (int x = v.size() - 1 ; x >= 0; x--) {
for (int y = 0; y < x; y++) {
if (v[x].first == v[y].first) {
cout << v[y].second<< " = " << v[y].first<<endl;
}
}
cout << v[x].second<< " = " << v[x].first<<endl;
}
system("pause");
return 0;
}

You could simplify this a lot, in two steps:
First use a map to count the number of occurrences of each character in the string:
std::unordered_map<char, unsigned int> count;
for( char character : string )
count[character]++;
Use the values of that map as comparison criteria:
std::sort( std::begin( string ) , std::end( string ) ,
[&]( char lhs , char rhs )
{
return count[lhs] < count[rhs];
}
);
Here is a working example running at ideone.

If you want highest frequency then lowest letter, an easy way would be to store negative values for frequency, then negate it after you sort. A more efficient way would be to change the function used for sorting, but that is a touch trickier:
struct sort_helper {
bool operator()(std::pair<int,char> lhs, std::pair<int,char> rhs) const{
return std::make_pair(-lhs.first,lhs.second)<std::make_pair(-rhs.first,rhs.second);
}
};
std::sort(vec.begin(),vec.end(),sort_helper());
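Putting the counting step and the changed comparison together, a minimal self-contained sketch might look like the following. The function name letter_frequencies is hypothetical, only the ASCII letters a-z are counted, and a lambda is used in place of the sort_helper struct.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Count ASCII letters case-insensitively and return (letter, count) pairs,
// ordered by count descending, then by letter ascending.
std::vector<std::pair<char, int>> letter_frequencies(const std::string& text)
{
    int counts[26] = {0};
    for (unsigned char ch : text) {
        const int lower = std::tolower(ch);
        if (lower >= 'a' && lower <= 'z') ++counts[lower - 'a'];
    }

    std::vector<std::pair<char, int>> result;
    for (int i = 0; i < 26; ++i)
        if (counts[i] > 0) result.emplace_back(static_cast<char>('a' + i), counts[i]);

    std::sort(result.begin(), result.end(),
              [](const std::pair<char, int>& a, const std::pair<char, int>& b) {
                  if (a.second != b.second) return a.second > b.second;  // higher count first
                  return a.first < b.first;                              // ties: alphabetical
              });
    return result;
}
```

For "hello world" this gives l = 3, o = 2, followed by d, e, h, r, w with one occurrence each, which is the order the question asks for.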

(Posted on behalf of the OP.)
Thanks to the responses of the awesome people here at Stack Overflow, I was finally able to fix my problem. Here is my final code in case anyone is interested or for future references of people who might be stuck in the same boat:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
struct Letters
{
Letters() : freq(0){}
Letters(char letter,int freq) {
this->freq = freq;
this->letter = letter;
}
char letter;
int freq;
};
bool Greater(const Letters& a, const Letters& b)
{
if(a.freq == b.freq)
return a.letter < b.letter;
return a.freq > b.freq;
}
int main () {
cout<<"Enter a string: ";
string input;
getline(cin, input);
vector<Letters> count;
int letters[26]= {0};
for (int x = 0; x < input.length(); x++) {
if (isalpha(input[x])) {
int c = tolower(input[x]) - 'a';
letters[c]++;
}
}
for (int x = 0; x < 26; x++) {
if (letters[x] > 0) {
char c = x + 'a';
count.push_back(Letters(c, letters[x]));
}
}
cout<<"\nUnsorted list..\n";
for (int x = 0 ; x < count.size(); x++) {
cout<<count[x].letter<< " = "<< count[x].freq<<"\n";
}
std::sort(count.begin(),count.end(),Greater);
cout<<"\nSorted list according to frequency then alphabetically..\n";
for (int x = 0 ; x < count.size(); x++) {
cout<<count[x].letter<< " = "<< count[x].freq<<"\n";
}
system("pause");
return 0;
}
Example output:
Enter a string: hello world
Unsorted list..
d = 1
e = 1
h = 1
l = 3
o = 2
r = 1
w = 1
Sorted list according to frequency then alphabetically..
l = 3
o = 2
d = 1
e = 1
h = 1
r = 1
w = 1
Press any key to continue . . .
I basically just followed the advice of @OliCharlesworth and implemented a custom comparator with the help of this guide: A Function Pointer as Comparison Function.
Although I'm pretty sure that my code can still be made more efficient, I'm still pretty happy with the results.

// CODE BY VIJAY JANGID in C language
// Using arrays, Time complexity - ( O(N) * distinct characters )
// Efficient answer
#include <stdio.h>
int main() {
int iSizeFrequencyArray= 58;
// 'z' (122) - 'A' (65) + 1 = 58 slots for A to z
int frequencyArray[iSizeFrequencyArray];
int iIndex = 0;
// Initializing frequency to zero for all
for (iIndex = 0; iIndex < iSizeFrequencyArray; iIndex++) {
frequencyArray[iIndex] = 0;
}
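int iMyStringLength = 1000;
char chMyString[iMyStringLength];
// take input for the string (at most 999 characters, leaving room for '\0')
if (scanf("%999s", chMyString) != 1) {
return 1;
}
// calculating length
int iSizeMyString = 0;
while (chMyString[iSizeMyString]) iSizeMyString++;
// saving each character frequency in the freq. array
for (iIndex = 0; iIndex < iSizeMyString; iIndex++) {
int currentChar = chMyString[iIndex];
// skip anything outside 'A' (65) .. 'z' (122) to stay inside the array
if (currentChar >= 65 && currentChar < 65 + iSizeFrequencyArray) {
frequencyArray[currentChar - 65]++;
}
}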
/* // To print the frequency of each alphabet
for (iIndex = 0; iIndex < iSizeFrequencyArray; iIndex++) {
char currentChar = iIndex + 65;
printf("\n%c - %d", currentChar, frequencyArray[iIndex ]);
}
*/
int lowestDone = 0, lowest = 0, highestSeen = 0;
for( iIndex = 0; iIndex < iSizeFrequencyArray; iIndex++ ) {
if(frequencyArray[iIndex] > highestSeen) {
highestSeen = frequencyArray[iIndex];
}
}
// printing the characters, lowest frequency first
while (lowestDone != highestSeen) {
// start above the highest value, to get the next lowest one
lowest = highestSeen + 1;
// calculating the lowest frequency not handled yet
for( iIndex = 0; iIndex < iSizeFrequencyArray; iIndex++ ) {
if( frequencyArray[iIndex] > lowestDone &&
frequencyArray[iIndex] < lowest) {
lowest = frequencyArray[iIndex]; // taking lowest value
}
}
// printing the characters with that frequency
for( iIndex =0; iIndex < iSizeFrequencyArray; iIndex++ ) {
// print each such character as many times as it occurs
if(frequencyArray[iIndex] == lowest){
char currentChar = iIndex + 65;
int iIndex3;
for(iIndex3 = 0; iIndex3 < lowest; iIndex3++){
printf("%c", currentChar);
}
}
}
// now that is done, move to next lowest
lowestDone = lowest;
}
return 0;
}
Explanation:
First, create a frequency array of size 58 (122 - 65 + 1) that covers the ASCII characters from A to z.
Store the frequency of each character by incrementing its entry at each occurrence.
Now find the highest frequency.
Run a loop until the highest frequency has been processed, starting from a lowest frequency of 0.
In each iteration, print every character whose frequency is equal to the current lowest. They come out in alphabetical order automatically.
Then find the next higher frequency, print its characters, and so on.
When the lowest frequency reaches the highest, the loop ends.

Using an unordered_map for counting characters as suggested by @Manu343726 is a good idea. However, in order to produce your sorted output, another step is required.
My solution is also in C++11 and uses a lambda expression. This way you neither need to define a custom struct nor a comparison function. The code is almost complete, I just skipped reading the input:
#include <iostream>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
using namespace std;
int main() {
string input = "hello world";
unordered_map<char, unsigned int> count;
for (char character : input)
if (character >= 'a' && character <= 'z')
count[character]++;
cout << "Unsorted list:" << endl;
for (auto const &kv : count)
cout << kv.first << " = " << kv.second << endl;
using myPair = pair<char, unsigned int>;
auto comp = [](const myPair& a, const myPair& b) {
return a.second > b.second || (a.second == b.second && a.first < b.first);
};
set<myPair, decltype(comp)> sorted(comp);
for(auto const &kv : count)
sorted.insert(kv);
cout << "Sorted list according to frequency then alphabetically:" << endl;
for (auto const &kv : sorted)
cout << kv.first << " = " << kv.second << endl;
return 0;
}
Output:
Unsorted list:
r = 1
h = 1
e = 1
d = 1
o = 2
w = 1
l = 3
Sorted list according to frequency then alphabetically:
l = 3
o = 2
d = 1
e = 1
h = 1
r = 1
w = 1
Note 1: Instead of inserting each element from the unordered_map into the set, it might be more efficient to use the function std::transform or std::copy, but my code is at least short.
Note 2: Instead of using a custom sorted set which maintains the order you want, it might be more efficient to use a vector of pairs and sort it once in the end, but your solution is already similar to this.
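The alternative from Note 2 could look roughly like the sketch below: the counts are copied from the unordered_map into a vector of pairs, which is then sorted once. The function name rankLetters is made up for illustration, and whether this is actually faster than the set was not measured here.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: count lowercase letters, then sort the pairs once.
std::vector<std::pair<char, unsigned int>> rankLetters(const std::string& input) {
    std::unordered_map<char, unsigned int> count;
    for (char character : input)
        if (character >= 'a' && character <= 'z')
            ++count[character];

    // The range constructor copies all (letter, count) pairs in one step.
    std::vector<std::pair<char, unsigned int>> sorted(count.begin(), count.end());

    // Higher count first; equal counts are ordered alphabetically.
    std::sort(sorted.begin(), sorted.end(),
              [](const std::pair<char, unsigned int>& a,
                 const std::pair<char, unsigned int>& b) {
                  return a.second > b.second ||
                         (a.second == b.second && a.first < b.first);
              });
    return sorted;
}
```

For the input "hello world" this yields the same order as the set-based version above, starting with l = 3 and o = 2.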
Code on Ideone

#include<stdio.h>
// CODE BY AKSHAY BHADERIYA
char iFrequencySort (char iString[]);
void vSort (int arr[], int arr1[], int len);
int
main ()
{
char iString[100];
printf ("Enter a string : ");
if (scanf ("%99s", iString) != 1)
return 1;
iFrequencySort (iString);
return 0;
}
char
iFrequencySort (char iString[])
{
int iFreq[100] = { 0 };
int iI, iJ, iK, iAsc, iLen1 = 0, iLen = 0;
while (iString[iLen]) iLen++;
int iOccurrence[94];
int iCharacter[94];
for (iI = 0; iI < iLen; iI++)
{ //frequency of the characters
iAsc = (int) iString[iI];
iFreq[iAsc - 32]++;
}
for (iI = 0, iJ = 0; iI < 94; iI++)
{ //the characters and occurrence arrays
if (iFreq[iI] != 0)
{
iCharacter[iJ] = iI;
iOccurrence[iJ] = iFreq[iI];
iJ++;
}
}
iLen1 = iJ;
vSort (iOccurrence, iCharacter, iLen1); //sorting both arrays
/*letter array consists only the index of iFreq array.
Converting it to the ASCII value of corresponding character */
for (iI = 0; iI < iLen1; iI++)
{
iCharacter[iI] += 32;
}
iK = 0;
for (iI = 0; iI < iLen1; iI++)
{ //characters into original string
for (iJ = 0; iJ < iOccurrence[iI]; iJ++)
{
iString[iK++] = (char) iCharacter[iI];
}
}
printf ("%s", iString);
}
void
vSort (int iOccurrence[], int iCharacter[], int len)
{
int iI, iJ, iTemp;
for (iI = 0; iI < len - 1; iI++)
{
for (iJ = iI + 1; iJ < len; iJ++)
{
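if (iOccurrence[iI] > iOccurrence[iJ] ||
(iOccurrence[iI] == iOccurrence[iJ] && iCharacter[iI] > iCharacter[iJ])) //ties stay alphabetical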
{
iTemp = iOccurrence[iI];
iOccurrence[iI] = iOccurrence[iJ];
iOccurrence[iJ] = iTemp;
iTemp = iCharacter[iI];
iCharacter[iI] = iCharacter[iJ];
iCharacter[iJ] = iTemp;
}
}
}
}

Answers have been given and one is accepted. I would like to add an answer showing the standard approach for this task.
There is often a requirement to first count things and then retrieve their rank, the topmost value, or some other information.
One of the most common solutions is to use a so-called associative container, here specifically a std::map or, even better, a std::unordered_map. The reason is that we need a key, in this case a letter, and an associated value, here the count for that letter. The key is unique: the same letter cannot appear in the container more than once, which would not make sense anyway.
Associative containers are very efficient at accessing their elements by key.
There are two of them: std::map and std::unordered_map. The first uses a tree to store the keys in sorted order; the second uses hashing to access the keys. Since we are not interested in sorted keys later, but in sorted counts of occurrences, we can choose std::unordered_map. As a further benefit, it uses the fast hashing mentioned above to access a key.
The maps have an additional huge advantage. They have an index operator [] that looks up a key very quickly. If the key is found, it returns a reference to the value associated with it. If not, it creates the key and initializes its value with the default (0 in our case). Counting any key is then as simple as map[key]++.
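A minimal sketch of that counting idiom, assuming a hypothetical helper named countChars: the index operator inserts a missing key with a count of 0, so the increment works for both new and existing keys.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: count every character of the text.
std::unordered_map<char, unsigned int> countChars(const std::string& text) {
    std::unordered_map<char, unsigned int> counter;
    for (char c : text) {
        // operator[] creates the key with a value-initialized count (0) if it
        // is absent and returns a reference to the stored count either way.
        ++counter[c];
    }
    return counter;
}
```

Note that merely reading counter[key] for an absent key also inserts it; find or count can be used when the lookup should not modify the map.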
But then we often hear: it must be sorted by the count. That of course does not work directly, because the counts may contain duplicates and the map can only hold unique keys. So that is impossible with the map alone.
The solution is to use a second associative container, a std::multiset, which can hold several equal keys, together with a custom sort operator that orders by the value. In it we store not a key and a value as two separate elements, but a std::pair holding both, and we sort by the second part of the pair.
We cannot use a std::multiset in the first place, because we need the unique key (in this case the letter).
The approach described above gives us great flexibility and ease of use. We can count basically anything with this algorithm.
It could, for example, look like the compact code below:
#include <iostream>
#include <string>
#include <utility>
#include <set>
#include <unordered_map>
#include <type_traits>
#include <cctype>
// ------------------------------------------------------------
// Create aliases. Save typing work and make code more readable
using Pair = std::pair<char, unsigned int>;
// Standard approach for counter
using Counter = std::unordered_map<Pair::first_type, Pair::second_type>;
// Sorted values will be stored in a multiset
struct Comp { bool operator ()(const Pair& p1, const Pair& p2) const { return (p1.second == p2.second) ? p1.first<p2.first : p1.second>p2.second; } };
using Rank = std::multiset<Pair, Comp>;
// ------------------------------------------------------------
// --------------------------------------------------------------------------------------
// Compact function to calculate the frequency of characters and then get their rank
Rank getRank(const std::string& text) {
// Definition of our counter
Counter counter{};
// Iterate over all characters in text and count their frequency
// (unsigned char avoids undefined behavior in <cctype> for negative char values)
for (const unsigned char c : text) if (std::isalpha(c)) counter[char(std::tolower(c))]++;
// Return ranks,sorted by frequency and then sorted by character
return { counter.begin(), counter.end() };
}
// --------------------------------------------------------------------------------------
// Test, driver code
int main() {
// Get a string from the user
if (std::string text{}; std::getline(std::cin, text))
// Calculate rank and show result
for (const auto& [letter, count] : getRank(text))
std::cout << letter << " = " << count << '\n';
}
Note how few statements are needed. Very elegant.
But we often see arrays used as an associative container. They also have an index (a key) and a value. A disadvantage may be a tiny space overhead for unused keys. Additionally, they only work for something with a known magnitude, for example 26 letters. Other countries' alphabets may have more or fewer letters, and then this kind of solution is not that flexible. Anyway, it is also often used and OK.
So, your solution may be a little bit more complex, but will of course still work.
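The array-based counting mentioned above could be sketched as follows. This assumes only the 26 letters a to z are of interest; the name countLetters is hypothetical.

```cpp
#include <array>
#include <cctype>
#include <string>

// Hypothetical helper: the array index (letter - 'a') acts as the key and the
// stored number is the count. Only the letters a..z are handled.
std::array<unsigned int, 26> countLetters(const std::string& text) {
    std::array<unsigned int, 26> counts{};  // value-initialized to all zeros
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            const int lower = std::tolower(c);
            // Guard against locale-specific letters outside a..z.
            if (lower >= 'a' && lower <= 'z') ++counts[lower - 'a'];
        }
    }
    return counts;
}
```

The fixed size is exactly the limitation described above: unused letters still occupy a slot, and anything outside the assumed range is ignored.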
Let me give you an additional example for getting the topmost value of any container. Here you will see, how flexible such a solution can be.
I am sorry, but it is a little bit advanced. . .
#include <iostream>
#include <utility>
#include <unordered_map>
#include <queue>
#include <vector>
#include <iterator>
#include <type_traits>
#include <string>
// Helper for type trait We want to identify an iterable container ----------------------------------------------------
template <typename Container>
auto isIterableHelper(int) -> decltype (
std::begin(std::declval<Container&>()) != std::end(std::declval<Container&>()), // begin/end and operator !=
++std::declval<decltype(std::begin(std::declval<Container&>()))&>(), // operator ++
void(*std::begin(std::declval<Container&>())), // operator*
void(), // Handle potential operator ,
std::true_type{});
template <typename T>
std::false_type isIterableHelper(...);
// The type trait -----------------------------------------------------------------------------------------------------
template <typename Container>
using is_iterable = decltype(isIterableHelper<Container>(0));
// Some Alias names for later easier reading --------------------------------------------------------------------------
template <typename Container>
using ValueType = std::decay_t<decltype(*std::begin(std::declval<Container&>()))>;
template <typename Container>
using Pair = std::pair<ValueType<Container>, size_t>;
template <typename Container>
using Counter = std::unordered_map<ValueType<Container>, size_t>;
template <typename Container>
using UnderlyingContainer = std::vector<Pair<Container>>;
// Predicate Functor
template <class Container> struct LessForSecondOfPair {
bool operator () (const Pair<Container>& p1, const Pair<Container>& p2) const { return p1.second < p2.second; }
};
template <typename Container>
using MaxHeap = std::priority_queue<Pair<Container>, UnderlyingContainer<Container>, LessForSecondOfPair<Container>>;
// Function to get most frequent used number in any Container ---------------------------------------------------------
template <class Container>
auto topFrequent(const Container& data) {
if constexpr (is_iterable<Container>::value) {
// Count all occurrences of data
Counter<Container> counter{};
for (const auto& d : data) counter[d]++;
// Build a Max-Heap
MaxHeap<Container> maxHeap(counter.begin(), counter.end());
// Return most frequent number
return maxHeap.top().first;
}
else
return data;
}
// Test
int main() {
std::vector testVector{ 1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,6,7 };
std::cout << "Most frequent is: " << topFrequent(testVector) << "\n";
double cStyleArray[] = { 1.1, 2.2, 2.2, 3.3, 3.3, 3.3 };
std::cout << "Most frequent is: " << topFrequent(cStyleArray) << "\n";
std::string s{ "abbcccddddeeeeeffffffggggggg" };
std::cout << "Most frequent is: " << topFrequent(s) << "\n";
double value = 12.34;
std::cout << "Most frequent is: " << topFrequent(value) << "\n";
return 0;
}
