Use for_each or accumulate to calculate frequencies

I've been playing with a simple example using C++11 and some standard algorithms, and I'm not sure whether to use std::accumulate or std::for_each. The problem is to count letters in a word, so, for example, for an input of "abracadabra", you get
'a' => 5
'b' => 2
'c' => 1
'd' => 1
'r' => 2
My first cut was to use std::accumulate. The reason this seemed natural is that we're really accumulating a value (a set of frequencies). Also I've been doing some functional programming recently and accumulate seemed to be the natural translation of folding a list.
vector<int> charsInWord(const string& text)
{
return
std::accumulate(text.begin(), text.end(), vector<int>(256),
[] (const vector<int>&v, char c)
{
vector<int> v2(v);
v2[c]++;
return v2;
} );
}
However this solution seemed rather cumbersome and took a little while to get right. Moreover, even with the new move semantics I couldn't quite convince myself that there wouldn't be any unnecessary copying.
So I went for for_each instead.
vector<int> charsInWord2(const string& text)
{
vector<int> charCounts(256);
std::for_each(text.begin(), text.end(),
[&] (char c)
{
charCounts[c]++;
} );
return charCounts;
}
This is probably easier to write and understand, and I certainly feel happier about its efficiency (although I miss the declarative, functional style of accumulate).
Is there any good reason to prefer one over the other in examples like these? From the comments and answers so far, it seems like if the value I am accumulating is non-trivial, say an stl container rather than an int, I should always prefer for_each, even when I am really "accumulating".
For the sake of completeness, the rest of the code to get this to compile and test is below
#include <string>
#include <vector>
#include <numeric> // accumulate
#include <algorithm> // for_each
using std::string;
using std::vector;
#include <iostream>
// ... insert code above ...
int main(int argc, char* argv[])
{
const vector<int> charCounts = charsInWord("abracadabra");
for(size_t c=0; c<charCounts.size(); ++c) {
const int count = charCounts[c];
if (count > 0) {
std::cout << "'" << static_cast<char>(c) << "'" << " => " << count << "\n";
}
}
return 0;
}

Personally I would not have written the accumulate like that:
vector<int> charsInWord(const string& text)
{
std::vector<int> result(256); // One version never copied.
int count = std::accumulate(text.begin(), text.end(), 0,
[&result] (int count, char c)
// ^^^^^^^^^ capture
{
result[c]++;
return count+1;
} );
// Might use count in the log file.
return result;
}
But if I am doing that, it seems just as easy to use for_each():
vector<int> charsInWord2(const string& text)
{
vector<int> result(256);
std::for_each(text.begin(), text.end(),
[&result] (char c)
{
result[c]++;
} );
return result;
}
I don't see anything wrong with the for_each version.
But why not go with a simple for() loop?
vector<int> charsInWord2(const string& text)
{
vector<int> result(256);
for(char c : text) {result[c]++;}
return result;
}
There was some discussion about using std::map in the comments (and then in some deleted questions). Just to capture that here and expand.
We could have used std::map<char,int> instead of vector<int>. The differences are:
From @Dave: std::map has O(log n) lookup time while vector is O(1), so there is a performance consideration. Note also that the fixed cost for map will be higher than for vector. This is small but worth noting.
From @Dave: std::vector has a fixed size of approx 256*4 (1024) bytes, while map has a size of approx 12 bytes * number of unique characters (min 12, max 3072). So there is no real space consideration on a modern machine, but it may be worth optimizing on phones and such.
From @POW: The third point is that std::map makes printing the result much easier, as you do not need to check for empty values.
Vector print
for(size_t c=0; c<charCounts.size(); ++c) {
if (charCounts[c] > 0) {
std::cout << "'" << static_cast<char>(c) << "' => " << charCounts[c] << "\n";
}
}
Map Print
for(auto loop: charCounts) {
std::cout << "'" << loop.first << "' => " << loop.second << "\n";
}
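For comparison, here is a minimal sketch of the std::map variant discussed above. The function name charsInWordMap is hypothetical and does not appear in the original answer; the sketch only illustrates that a map stores entries for the characters that actually occur.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of the original answer): counts characters
// with std::map, so only characters that actually occur get an entry.
std::map<char, int> charsInWordMap(const std::string& text)
{
    std::map<char, int> counts;
    for (char c : text) {
        ++counts[c];  // operator[] value-initializes a missing entry to 0
    }
    return counts;
}
```

The result can be printed with the "Map Print" loop above, and the keys come out in sorted order because std::map is an ordered container.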

Related

In C++11, how to find and return all the item(s) in a vector of strings that start with a given string?

(Note: When I refer to vectors, I'm referring to the vector class provided by <vector>.)
The problem
Given a string x and a vector of strings, how can I retrieve the string(s) in the vector that start with x? Preferably in a way that is time-efficient?
That is, if x is "apple" and the vector is vector<string> foods = {"apple pie","blueberry tarts","cherry cobbler"}, then it should return "apple pie" in some capacity.
I am using C++11 and I'm not an expert on it, so simple answers with explanations would be much appreciated. Forgive me if the answer is obvious - I am relatively new to the language.
Possible solutions I've considered
The obvious solution would be to just create an iterator and iterate through each string in the vector, pulling out all items that start with the given string using the overloaded version of rfind that has the pos parameter. (That is, like this: str.rfind("start",0))
However, with a large vector this is time-inefficient, so I'm wondering if there is a better way to do this, i.e. sorting the vector and using some kind of binary search, or perhaps modifying the find method from <algorithm>?

The simplest way to copy the desired strings would be a simple linear scan. For example, use the standard library's std::copy_if to perform the copying and a lambda to encapsulate the "starts with" string comparison.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::vector<std::string> foods = { "apple pie","blueberry tarts","cherry cobbler" };
std::string prefix{ "apple" };
auto starts_with = [&prefix](const std::string &str) {
return str.compare(0, prefix.size(), prefix) == 0;
};
std::vector<std::string> result;
std::copy_if(begin(foods), end(foods), back_inserter(result), starts_with);
for (const auto &str : result) {
std::cout << str << '\n';
}
}

A good way to solve your problem would be to use binary search. Note that this requires sorting the vector of strings first, which gives the algorithm a time complexity of O(N log N).
vector <string> v = {"a", "apple b", "apple c", "d"}; // stuff
string find = "apple";
// create a second vector that contains the substrings of the first vector
vector <pair<string, string>> v2;
for(string item : v){
v2.push_back({item.substr(0, find.size()), item});
}
sort(v2.begin(), v2.end());
// binary search to find the leftmost and rightmost occurrence of find
int l = v.size()-1, r = 0;
for(int i = v.size()/2; i >= 1; i /= 2){
while(l-i >= 0 && v2[l-i].first >= find){l -= i;}
while(r+i < v.size() && v2[r+i].first <= find){r += i;}
}
if(v2[l].first == find){
for(int i = l; i <= r; ++i){
cout << v2[i].second << endl;
}
}
else{
cout << "No matches were found." << endl;
}
In my code, we first create a second vector called v2 to store pairs of strings. After sorting it, we implement binary search by jumps to find the leftmost and rightmost occurrences of find. Lastly, we check if there are any occurrences at all (this is an edge case), and print all the found strings if occurrences exist.
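The same sort-then-search idea can also be sketched with std::lower_bound instead of a hand-written jump search. This is only an illustration under the assumption that the vector is already sorted with the default operator<; the helper name withPrefix is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns every string in `sorted` that starts with
// `prefix`. Assumes `sorted` is already sorted with the default operator<.
std::vector<std::string> withPrefix(const std::vector<std::string>& sorted,
                                    const std::string& prefix)
{
    // Binary search for the first element that is not less than the prefix.
    auto first = std::lower_bound(sorted.begin(), sorted.end(), prefix);

    // In sorted order all strings sharing the prefix are contiguous,
    // so advance until the first element that no longer starts with it.
    auto last = first;
    while (last != sorted.end() &&
           last->compare(0, prefix.size(), prefix) == 0) {
        ++last;
    }
    return std::vector<std::string>(first, last);
}
```

Finding the start of the range is logarithmic; collecting the matches is linear in the number of matches. This only pays off if the vector is kept sorted across many queries.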

You can do this in a single pass over the vector. This is the best you'll do unless the vector is pre-sorted, since the cost of sorting will outweigh any gain you get from using a binary search.
Using std::copy_if makes this pretty simple:
#include <string>
#include <vector>
#include <algorithm>
int main() {
std::vector<std::string> v = {
"apple pie",
"blueberry tarts",
"apple",
"cherry cobbler",
"pie"
};
std::vector<std::string> v2;
std::string to_find = "apple";
std::copy_if(
v.begin(),
v.end(),
std::back_inserter(v2),
[&to_find](const std::string& el) {
return el.compare(0, to_find.size(), to_find) == 0;
}
);
}
This will copy all elements from v that match the predicate function into v2. The predicate simply checks that the first to_find.size() characters of each element match the string to find, using the std::string::compare overload that takes a position, a length, and a string.

How to search in a Set of vectors of integers

I am trying to build a set of vectors of integers, but when I check whether the same solution already exists in the set, I do not get the correct answer.
This is with C++11. I posted a similar query earlier but did not get any meaningful replies.
Why is it that whenever we form a map or set of vectors, it is not able to recognize when I insert a vector that is identical to one I have already inserted?
I have been searching for an answer for months. Also, since this behavior is supported in other languages like Java, there must be a workaround. It would be great if someone could point out why this isn't working the way I expect and what the probable solution is.
The code below is a solution to 3Sum problem on Leetcode, but doesn't work because of exactly what I have explained above.
vector<vector<int>> threeSum(vector<int>& nums) {
vector<vector<int>>result;
unordered_map<int,int>m;
set<vector<int>>res;
bool flag=false;
if(nums.size()<=2)
return result;
vector<int>temp;
for(int i=0;i<nums.size()-1;i++)
{
int comp=-(nums[i]+nums[i+1]);
if(m.find(comp)!=m.end())
{
auto location=m.find(comp);
temp.push_back(comp);
temp.push_back(nums[i]);
temp.push_back(nums[i+1]);
if(res.find(temp)==res.end())
{
res.insert(temp);
result.push_back(temp);
}
temp.clear();
}
else
{
m[nums[i]]=i+1;
m[nums[i+1]]=i+2;
}
}
return result;
}
On giving input as
[0,0,0,0]
Answer should be:
[0,0,0]
Whereas I get :
[[0,0,0], [0,0,0]]

You could use tuples in the set instead of vectors.
#include <tuple>
#include <set>
#include <iostream>
using std::get;
int main(int argc, char* argv[]) {
std::set<std::tuple<int,int,int>> sums;
auto tup1 = std::make_tuple(0, 0, 0);
sums.insert(tup1);
auto tup2 = std::make_tuple(0,0,0);
sums.insert(tup2);
std::cout << sums.size() << std::endl;
for (auto& item : sums) {
std::cout << "(" << get<0>(item) << "," << get<1>(item) << "," << get<2>(item) << ")\n";
}
return 0;
}

How do you perform transformation to each element and append the result in c++?

I have a set of integers {1,2}. I want to produce "Transform#1, Transform#2", where each element is transformed and the results are then accumulated with a delimiter.
What would be the easiest way to accomplish this? Do we have "folds" and "maps" in C++?
We don't use Boost.

You can use std::transform and std::accumulate
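#include <algorithm> // transform
#include <iostream>
#include <iterator> // back_inserter, next
#include <numeric> // accumulate
#include <string>
#include <vector>
// Note: the generic lambdas (auto parameters) below require C++14.
int main()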
{
std::vector<int> v1 {1,2,3};
std::vector<std::string> v2;
std::transform(begin(v1), end(v1), std::back_inserter(v2), [](auto const& i) {
return std::string("Transform#") + std::to_string(i);
});
std::string s = std::accumulate(std::next(begin(v2)), end(v2), v2.at(0), [](auto const& a, auto const& b) {
return a + ", " + b;
});
std::cout << s;
}
prints Transform#1, Transform#2, Transform#3

You may want to use Range Adaptors. Boost already has them and they are coming to the standard with C++20.
Take a look at the boost::adaptors::transformed example in the Boost.Range documentation.
Also, check out the reference to get a better picture of what operations are supported by adaptors.
In the end, you can achieve much cleaner code and the performance difference is negligible (unlike in some other languages, where using this style of programming incurs heavy performance costs).

If you can stand a trailing separator, the following function can transform any iterable range of data { X, ..., Z } to the string "<tag>X<sep>...<sep><tag>Z<sep>".
Code
template <class InputIt>
std::string f(InputIt begin, InputIt end, std::string_view separator = ", ", std::string_view tag = "Transform#")
{
std::stringstream output;
std::transform(begin, end,
std::ostream_iterator<std::string>(output, separator.data()),
[tag](auto const& element){ return std::string{tag} + std::to_string(element); }
);
return output.str();
}
It works by transforming each element from the range into a stream iterator.
Usage
int main()
{
std::set<int> const data{1, 2, 3}; // works with vector, string, list, C-arrays, etc.
std::cout << f(begin(data), end(data)) << '\n';
// prints Transform#1, Transform#2, Transform#3,
}

You can perform a fold using simply std::accumulate
#include <set>
#include <string>
#include <iostream>
#include <numeric>
int main()
{
auto transformation = [](int number) { return "Transform#" + std::to_string(number); };
auto transform_and_fold = [&transformation](std::string init, int number) { return std::move(init) + ", " + transformation(number); };
std::set<int> numbers{1, 2};
std::cout << std::accumulate(std::next(numbers.begin()), numbers.end(), transformation(*numbers.begin()), transform_and_fold);
}
Outputs
Transform#1, Transform#2

Assuming that I correctly understand the problem, the following straightforward implementation also looks very simple and easy.
This function works in C++11 and later:
std::string concatenate(
const std::vector<int>& indecies,
const std::string& delimiter = ", ",
const std::string& tag = "Transform#")
{
if(indecies.empty()){
return "";
}
std::string s(tag + std::to_string(indecies[0]));
for(auto it = indecies.begin()+1; it != indecies.cend(); ++it){
s += (delimiter + tag + std::to_string(*it));
}
return s;
}
(Incidentally, if indecies is empty, concatenate returns an empty string rather than throwing an exception (as AndreasDM's answer does) or invoking undefined behavior (as Everlight's does).
If indecies has only a single element, for instance indecies={1}, the result is "Transform#1", not "Transform#1, " (YSC's answer) or ", Transform#1" (sakra's answer).
This differs from the other answers, and the function would be simpler if this handling were removed.)
Although performance may not be the focus here, the above function can be slightly optimized by using std::basic_string::reserve to reserve the minimum capacity needed for the resulting string, as follows.
Here the +1 in *.size()+1 is the minimum number of characters in a number.
I also hoisted the delimiter+tag concatenation out of the for-loop.
This still looks simple:
std::string concatenate_fast(
const std::vector<int>& indecies,
std::string delimiter = ", ",
const std::string& tag = "Transform#")
{
if(indecies.empty()){
return "";
}
std::string s(tag + std::to_string(indecies[0]));
delimiter += tag;
s.reserve((tag.size()+1) + (indecies.size()-1)*(delimiter.size()+1));
for(auto it = indecies.begin()+1; it != indecies.cend(); ++it){
s += (delimiter + std::to_string(*it));
}
return s;
}
I have also tested the performance of these functions and some proposed answers as follows.
These tests are done by Quick C++ Benchmark within gcc-8.2, C++17 and O3 optimization.
Since std::transform_reduce is still not available in Quick C++ Benchmark, I haven’t tested it.
The above concatenate_fast shows the best performance, at least in these cases, and concatenate is second best.
Finally, weighing readability against performance, I would personally propose the above concatenate as a solution:
- Performance test with size 2 and 8. (DEMO)
- Performance test with size 16 and 32. (DEMO)

Unless you have some other requirement to preserve the intermediate transformed list, storing it is suboptimal. You can just call std::accumulate and do both operations on the fly:
#include <cstdio>
#include <iterator>
#include <numeric>
int main ( )
{
int const input [] = { 1, 2, 3, 4, 5, 6 };
// computes sum of squares
auto const add_square = [] ( int x, int y ) { return x + y * y; };
int result = std::accumulate
( std::cbegin (input)
, std::cend (input)
, 0
, add_square
);
std::printf ( "\n%i\n", result );
return 0;
}

If you have the luxury of using C++17, there is a standard library algorithm which does exactly what you need. Here is an example:
#include <iterator>
#include <iostream>
#include <numeric>
#include <string>
int main()
{
auto input = {1, 2, 3};
std::cout << std::transform_reduce(
std::cbegin(input), std::cend(input),
std::string("Result:"),
[](const std::string & left, const std::string & right) { return left + " " + right; },
[](int value) { return "Transform#" + std::to_string(value); }
) << "\n";
}

Assign numbers to characters C++

I need a way to assign numbers to letters in C++, for example, '$' would represent the number 1. I obviously need to be able to obtain the number from the character with something like a function, e.g. getNumFromChar('$') would return 1 and getNumFromChar('#') would return 2. Is there an easy and fast way to do this in C++?

The fastest way is to write a 256 entry lookup table, containing the mapped values in the character's ASCII index. This is similar to how isdigit and tolower work, for example:
int getNumFromChar(char c)
{
static const int table[256] = {/* table entries */};
return table[c & 0xff];
}

If you would like to assign the values yourself, use a map and store your letter-to-number pairs. If you are OK with preassigned unique values for each letter, and are only using ASCII characters, then cast them to integers, e.g. static_cast<int>('$');

Create a vector std::vector<int> v(256,0); that is indexed by your characters. Initially all of the numbers are zero, which you can treat as invalid. Then assign a number to each 'numbered' character, e.g. v['$'] = 1; v['#'] = 2;, using the fact that characters are small integers (0 to 255 once converted to unsigned char).
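A minimal sketch of that table, assuming 8-bit characters and treating 0 as "no number assigned". The function name follows the question; the initializer lambda and the cast through unsigned char are additions made here so that a negative char value cannot index out of range.

```cpp
#include <vector>

// Sketch of the lookup table described above; 0 means "no number assigned".
// Assumes 8-bit characters, hence 256 entries.
int getNumFromChar(char c)
{
    static const std::vector<int> table = []() -> std::vector<int> {
        std::vector<int> v(256, 0);
        v[static_cast<unsigned char>('$')] = 1;
        v[static_cast<unsigned char>('#')] = 2;
        return v;
    }();
    // Convert through unsigned char so that a negative char value
    // (possible where char is signed) cannot index before the table.
    return table[static_cast<unsigned char>(c)];
}
```

The table is built once, on the first call, and each later lookup is a single indexed read.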

As pointed out in the comments, you can use a std::map in the following way:
#include <iostream>
#include <map>
#include <cstring>
struct myComp
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) < 0;
}
};
int main()
{
std::map<const char*, int, myComp> test;
test["$"] = 1;
test["#"] = 2;
std::cout << "$ -> " << test["$"] <<"\n";
std::cout << "# -> " << test["#"] <<"\n";
return 0;
}
Most of the other answers will work only if you have at most 256 values to store. With a map, however, you can store any number of elements.

A lot of people are suggesting std::map<char,int>, which is fine and works, but a faster (?) way of doing this with no dependencies is to just use a massive switch statement
int getNumFromChar(char c){
switch(c){
case '$':
return 1;
case '#':
return 2;
//etc
}
return -1; //just in case of error, for style points
}
Depending on how much you care about performance/memory usage and how many case statements you'd have to write, this may or may not be a viable way to do what you want. Just thought I'd throw it out there since at the time of this writing I don't believe anyone has.
EDIT: Also, depending on the frequency of use of each individual character, and if you know the entire mapping before using this function or if you ever change the mapping, a std::map is way better, but I believe this is faster otherwise.

You could do something like this:
#include <map>
#include <iostream>
#include <exception>
typedef std::map<char, int> easymap_type;
class EasyMap {
public:
EasyMap() {}
virtual ~EasyMap() {}
void assign_int_to_char(const int& i, const char& c)
{
_map[c] = i;
}
int get_int_from_char(const char& c) const
{
easymap_type::const_iterator it = _map.find(c);
if (it == _map.end())
{
std::cerr << "EasyMap Error: uninitialized key - '" << c << "'" << std::endl;
throw std::exception();
}
return it->second;
}
private:
easymap_type _map;
};
int main()
{
EasyMap ezmap;
ezmap.assign_int_to_char(42, 'a');
std::cout << "EasyMap[a] = " << ezmap.get_int_from_char('a') << std::endl;
std::cout << "EasyMap[b] = " << ezmap.get_int_from_char('b') << std::endl;
return 0;
}
I handled an uninitialized key by throwing an exception, but you could handle it in other ways.

If your compiler supports C++11, you can use std::unordered_map as the container, for example std::unordered_map<char,double> to map a char to a double.
An unordered map is an associative container that contains key-value pairs with unique keys. Search, insertion, and removal of elements have average constant-time complexity. In your problem, the char is the key and the double is the value.
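A small sketch of that approach follows. It uses int values rather than double, since the question maps characters to whole numbers; the name numFromChar and the -1 "not found" result are assumptions made for illustration.

```cpp
#include <unordered_map>

// Hypothetical lookup built on std::unordered_map (C++11).
// Returns -1 for characters that have no assigned number.
int numFromChar(char c)
{
    static const std::unordered_map<char, int> table = {
        {'$', 1},
        {'#', 2},
    };
    auto it = table.find(c);  // average constant-time lookup
    return it == table.end() ? -1 : it->second;
}
```

Using find instead of operator[] keeps the table const and avoids inserting an entry for every character that is looked up.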

There are already a lot of reasonable answers... I prefer the static_cast<int>('#')
And there always has to be the most stupid useless compile time template idea about how to solve a problem.
I know it's stupid, and I'm not good at this kind of things, but here is my shot at it in c++11. Don't take me seriously. I just had to do something dumb.
#include <string>
#include <array>
#include <utility>
#include <iostream>
constexpr uint kNbChars {3};
constexpr std::array<std::pair<char, int>, kNbChars> kCharToInt {
std::make_pair('$', 1)
, std::make_pair('#', 2)
, std::make_pair('#', 3)
};
template <char c>
int getInt()
{
for (auto pair : kCharToInt)
{
if (pair.first == c)
{ return pair.second; }
}
return -1;
}
int main()
{
std::cout << getInt<'#'>() << std::endl;
std::cout << getInt<'g'>() << std::endl;
}
I think you can make getInt() constexpr too in c++14, but I may be wrong and cannot check it right now.
Of course it really is useless since you have to know the letter at compile time, but you could work around that by, well, just not making getInt a template function...

C++ empty and array index

Is it possible to do something like:
string word = "Hello";
word[3] = null;
if(word[3] == null){/.../}
in C++, basically making an array element empty. For example if I wanted to remove the duplicate characters from the array I'd set them to null first and then shifted the array to the left every time I found an array index that contained null.
If this is not possible what's a good way of doing something like this in C++ ?

If you want to remove adjacent duplicate characters, you can do this:
std::string::iterator new_end = std::unique(word.begin(), word.end());
word.erase(new_end, word.end());
If you want to mark arbitrary characters for removal, you can skip the marking and just provide the appropriate predicate to std::remove_if:
new_end = std::remove_if(word.begin(), word.end(), IsDuplicate);
word.erase(new_end, word.end());
However, I can't think of an appropriate predicate to use here that doesn't exhibit undefined behavior. I would just write my own algorithm:
template<typename IteratorT>
IteratorT RemoveDuplicates(IteratorT first, IteratorT last)
{
typedef typename std::iterator_traits<IteratorT>::value_type
ValueT;
std::map<ValueT, int> counts;
for (auto scan=first; scan!=last; ++scan)
{
++counts[*scan];
if(counts[*scan] == 1)
{
*first = std::move(*scan);
++first;
}
}
return first;
}
Or, if you don't care about the order of the elements, you could simply sort it, then use the first solution.
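That last option can be sketched as follows. The helper name uniqueChars is hypothetical; it takes the string by value so the caller's copy is left untouched.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes all duplicate characters.
// The original order of the characters is NOT preserved.
std::string uniqueChars(std::string word)
{
    std::sort(word.begin(), word.end());  // brings equal characters together
    // std::unique moves the first of each run to the front and returns
    // the new logical end; erase drops the leftover tail.
    word.erase(std::unique(word.begin(), word.end()), word.end());
    return word;
}
```

Sorting costs O(N log N), but no auxiliary map is needed.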

This is possible, since a single element of a string is an element within a char array and thus addressable through a pointer, i.e. you can retrieve the address of the element. Therefore you can set word[3] = NULL. Your if-construct is valid, but the compiler prints a warning because NULL is only a pointer constant. Alternatives would be: if (!word[3]) or if (word[3] == 0).
But in any case you should consider using STL algorithms for removing duplicates.

I think you should take a look at the algorithms in the STL.
You are not very specific about what you want to remove but maybe this helps:
std::string string_with_dup("AABBCCDD");
std::string string_without_dup;
std::cout << string_with_dup << std::endl;
// with copy
std::unique_copy(string_with_dup.begin(), string_with_dup.end(), std::back_inserter(string_without_dup));
std::cout << string_without_dup << std::endl;
// or inplace
string_with_dup.erase(std::unique(string_with_dup.begin(), string_with_dup.end()), string_with_dup.end());
std::cout << string_with_dup << std::endl;

If you want to remove all duplicates (not only the adjacent ones), you should use the erase-remove idiom with something like this:
#include <iostream>
#include <map>
#include <string>
#include <algorithm>
using namespace std;
struct is_repeated {
is_repeated( map<char,int>& x ) :r(&x) {};
map<char,int>* r;
bool operator()( char c ) {
(*r)[c]++;
if( (*r)[c] > 1 )
return true;
return false;
}
};
int main (int argc, char**argv)
{
map<char,int> counter_map;
string v = "hello hello hello hello hello hello hello";
cout << v << endl;
is_repeated counter(counter_map);
v.erase( remove_if(v.begin(), v.end(), counter ), v.end() );
cout << v << endl;
}
This outputs:
hello hello hello hello hello hello hello
helo
