I need to count the letters in a string, sort them by count, and print the results. For this purpose I'm trying to use a vector and a struct. Here is part of my code, but it's not working, because I don't know how to implement some parts:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
struct int_pair{
int key;
int value;
};
bool sort_by_value(int_pair left, int_pair right){
return left.value < right.value;
}
int main() {
string characters = "aasa asdfs dfh f ukjyhkh k wse f sdf sdfsdf";
vector<int_pair> most_frequent;
for (string::size_type i = 0; i <= characters.length(); i++) {
int int_char = (int)characters[i];
most_frequent[int_char]++; <-- I want to do something like this, but it's not working
}
sort(most_frequent.begin(), most_frequent.end(), sort_by_value);
for (vector<int_pair>::iterator it = most_frequent.begin(); it != most_frequent.end(); ++it) <-- is this call correct?
cout << " " << it->key << ":" << it->value << endl;
return 0;
}
There are two parts of this code that I don't know how to deal with:
most_frequent[int_char]++; <-- I want to do something like this, but it's not working
and
for (vector<int_pair>::iterator it = most_frequent.begin(); it != most_frequent.end(); ++it) <-- is this call correct?
Maybe you can also see other mistakes and potential issues in this code.
I would use a std::map to determine the frequency of each letter, then copy that into a multimap while reversing the key and value to get them in order.
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <algorithm>
template<class T, class U>
std::pair<U,T> flip_pair(const std::pair<T,U>& p) {
return std::make_pair(p.second,p.first);
}
int main(){
std::string characters = "zxcvopqiuweriuzxchajksdui";
std::map<char,int> freq;
std::multimap<int,char> rev_freq;
// Calculate the frequency of each letter.
for(char c: characters){
freq[c]++;
}
// Copy the results into a multimap with the key and value flipped
std::transform(std::begin(freq), std::end(freq),
std::inserter(rev_freq, rev_freq.begin()),
flip_pair<char,int>);
// Print out the results in order.
for(std::pair<int,char> p : rev_freq){
std::cout << p.first << ": " << p.second << std::endl;
}
}
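This should do what you need, provided the vector has first been given enough elements to index by character code (for example vector<int_pair> most_frequent(256);):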
most_frequent[int_char].key = int_char;
most_frequent[int_char].value++;
Yes, it sets the key many times, even though it doesn't need to.
When you access the container by key (a vector is indexed with an integer, which is "the key" in your case), you don't have to store the key again in the value field of the container.
So you don't need your struct: you only need the value field, and you can store the number of occurrences directly in the vector.
The idea is to fill the vector with 256 integers in the beginning, all initialized to zero. Then, use the vector index as your "key" (character code) to access the elements (number of occurrences).
This will result in a code similar to this:
// initialize with 256 entries, one for each character:
vector<int> counts(256);
for (string::size_type i = 0; i < characters.length(); i++) {
// for each occurrence of a character, increase the value in the vector;
// go through unsigned char so the index is never negative:
int int_char = (unsigned char)characters[i];
counts[int_char]++;
}
Once filling of the vector is done, you can find the maximum value (not only the value but also the key where it is stored) using the std::max_element algorithm:
vector<int>::iterator most_frequent =
std::max_element(counts.begin(), counts.end());
// getting the character (index within the container, "key"):
std::cout << (char)(most_frequent - counts.begin());
// the number of occurrences ("value"):
std::cout << (*most_frequent);
Here is your example with the changes (only printing the most frequent character, here it is the space so you don't see it): http://ideone.com/94GfZz
You can sort this vector; however, you will of course lose the key, since the elements will move and change their indices. There is a nice trick for processing statistics like that: use a reversed (multi)map (key and value swapped):
multimap<int,int> keyForOccurrence;
for (vector<int>::iterator i = counts.begin(); i != counts.end(); ++i) {
int occurrences = *i;
int character = i - counts.begin();
keyForOccurrence.insert(std::pair<int,int>(occurrences, character));
}
Updated code: http://ideone.com/Ub5rnL
The last thing you should now sort out by yourself is how to access and process the data within this map. The fancy thing about this reversed map is that it is now automatically sorted by occurrence, since maps are sorted by key.
I find it more natural to use a std::map container to store each character's occurrences. The character is the map's key, and its occurrence count is the map's value.
It's easy to scan the source string and build this map using std::map::operator[], and ++ to increase the occurrence count.
Then, you can build a second map from the above map, with key and value inverted: so this map will be sorted by occurrences, and then you can print this second map.
Note that you have to use a std::multimap as this second map, since its keys (i.e. the occurrences) can be repeated.
Sample code follows (I tested it with VS2010 SP1/VC10):
#include <stddef.h> // for size_t
#include <algorithm> // for std::transform
#include <functional> // for std::greater
#include <iostream> // for std::cout
#include <iterator> // for std::inserter
#include <map> // for std::map, std::multimap
#include <ostream> // for std::endl
#include <string> // for std::string
#include <utility> // for std::pair
using namespace std;
int main()
{
string str = "aasa asdfs dfh f ukjyhkh k wse f sdf sdfsdf";
// Build the occurrences map (char -> occurrences)
map<char, size_t> freq;
for (size_t i = 0; i < str.length(); ++i)
freq[ str[i] ]++;
// Build a new map from previous map with inverted <key, value> pairs,
// so this new map will be sorted by old map's value (i.e. char's
// occurrences), which is new map's key.
// Use the std::greater comparator to sort in descending order.
multimap<size_t, char, greater<size_t>> sorted_freq;
transform(
freq.begin(), freq.end(), // source
inserter(sorted_freq, sorted_freq.begin()), // destination
[](const pair<char, size_t>& p) // invert key<->value
{
return pair<size_t, char>(p.second, p.first);
}
);
// Print results
for (auto it = sorted_freq.begin(); it != sorted_freq.end(); ++it)
cout << it->second << ": " << it->first << endl;
}
Output:
: 9
s: 7
f: 7
d: 5
a: 4
k: 3
h: 3
u: 1
w: 1
y: 1
j: 1
e: 1
If you don't want to print the space character occurrences, you can easily filter that out.
Note that using std::map/std::multimap will also scale better than std::vector for non-ASCII characters, e.g. if you use Unicode UTF-32 (since there are far more than 256 Unicode characters).
I am trying to reorder the map in descending order of its values. I have been trying to create a new map and insert the entry with the biggest value first, but it keeps ordering the map by the keys.
I have also tried to reorder it by value by swapping the roles of key and value in the map, but then I lose some data because I have more than one key with the same value.
#include <iostream>
#include "SymbolFreq.h"
#include <string>
#include <fstream>
#include <streambuf>
#include <map>
using namespace std;
int main()
{
map <char, int> mymap;
map <char, int> realmap;
ifstream infile{ "ToCompress.txt" };
std::string str((std::istreambuf_iterator<char>(infile)),
std::istreambuf_iterator<char>());
std::map<char, int>::iterator itera;
for (auto it = str.begin(); it != str.end(); ++it)
{
itera = mymap.find(*it);
if (itera != mymap.end())
{
itera->second++;
}
else
{
mymap.insert({ *it, 1 });
}
}
int max = 0;
char provisionalChar;
int provisionalInt;
while (mymap.empty() == false)
{
for (auto it = mymap.cbegin(); it != mymap.cend(); ++it)
{
if (it->second > max)
{
max = it->second;
provisionalChar = it->first;
provisionalInt = it->second;
}
//cout << it->first << "\t" << it->second << "\n";
}
mymap.erase(provisionalChar);
realmap.insert({ provisionalChar, provisionalInt });
max = 0;
}
for (auto it = realmap.cbegin(); it != realmap.cend(); ++it)
{
cout << it->first << "\t" << it->second << "\n";
}
return 0;
}
If I understand the question properly, you'd like to count how many times each char appears in the file and then produce a map sorted so that the char that appeared the most times comes first.
Here's one idea:
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <functional>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <unordered_map>
int main() {
std::ifstream infile{"ToCompress.txt"};
// "mymap" is only used for counting how many times each char appears.
std::unordered_map<char, std::uintmax_t> mymap;
// Loop directly over the file. No vector needed:
std::for_each(std::istreambuf_iterator<char>(infile),
std::istreambuf_iterator<char>(), [&mymap](char ch) {
// No need to find first - operator[] inserts an element
// for the key ("ch") if it's missing.
++mymap[ch];
});
// Transform the unordered_map into a multimap where the count is the key
// and in which we use a descending sort order (std::greater):
std::multimap<std::uintmax_t, char, std::greater<std::uintmax_t>> realmap;
std::transform(mymap.begin(), mymap.end(),
std::inserter(realmap, realmap.end()),
[](const auto& ch_co) -> std::pair<std::uintmax_t, char> {
// return a pair with key and value swapped
return {ch_co.second, ch_co.first};
});
// Print the result
for(auto& [count, ch] : realmap) {
std::cout << count << '\t' << ch << '\n';
}
}
Possible output:
537479
120204 t
113285 e
80681
80670 i
79862 n
77984 r
77464 s
69994 o
67377 a
...
Apparently, <space>, t, e and \n are the most common characters in my C++ programs (which is what I used as input).
Your question may be ill-posed; take a step back and state what you are really trying to accomplish.
That said, I'll attempt an answer based on what you've written.
It looks like you're trying to sort an std::map by value, in which case your question is a duplicate of either this or this question.
Regarding your initial attempt:
Take a look at this table. Only sequence containers allow you to directly influence order. As with a priority queue, you have limited control over the order of associative containers and almost none over unordered containers.
(Note: When I refer to vectors, I'm referring to the vector class provided by <vector>.)
The problem
Given a string x and a vector of strings, how can I retrieve the string(s) in the vector that start with x? Preferably in a way that is time-efficient?
That is, if x is "apple" and the vector is vector<string> foods = {"apple pie","blueberry tarts","cherry cobbler"}, then it should return "apple pie" in some capacity.
I am using C++11 and I'm not an expert on it, so simple answers with explanations would be much appreciated. Forgive me if the answer is obvious - I am relatively new to the language.
Possible solutions I've considered
The obvious solution would be to just create an iterator and iterate through each string in the vector, pulling out all items that start with the given string using the overloaded version of rfind that has the pos parameter. (That is, like this: str.rfind("start",0))
However, with a large vector this is time-inefficient, so I'm wondering if there is a better way to do this, i.e. sorting the vector and using some kind of binary search, or perhaps modifying the find method from <algorithm>?
The simplest way to copy desired strings would be a simple linear scan. For example, using the standard library std::copy_if to perform the copying and a lambda to encapsulate the "starts with" string comparison.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
int main()
{
std::vector<std::string> foods = { "apple pie","blueberry tarts","cherry cobbler" };
std::string prefix{ "apple" };
auto starts_with = [&prefix](const std::string &str) {
return str.compare(0, prefix.size(), prefix) == 0;
};
std::vector<std::string> result;
std::copy_if(begin(foods), end(foods), back_inserter(result), starts_with);
for (const auto &str : result) {
std::cout << str << '\n';
}
}
A good way to solve your problem would be to use binary search. Note that this requires sorting the vector of strings first, which gives the algorithm a time complexity of O(N log N).
vector <string> v = {"a", "apple b", "apple c", "d"}; // stuff
string find = "apple";
// create a second vector that contains the substrings of the first vector
vector <pair<string, string>> v2;
for(string item : v){
v2.push_back({item.substr(0, find.size()), item});
}
sort(v2.begin(), v2.end());
// binary search to find the leftmost and rightmost occurrence of find
int l = v.size()-1, r = 0;
for(int i = v.size()/2; i >= 1; i /= 2){
while(l-i >= 0 && v2[l-i].first >= find){l -= i;}
while(r+i < v.size() && v2[r+i].first <= find){r += i;}
}
if(v2[l].first == find){
for(int i = l; i <= r; ++i){
cout << v2[i].second << endl;
}
}
else{
cout << "No matches were found." << endl;
}
In my code, we first create a second vector called v2 to store pairs of strings. After sorting it, we implement binary search by jumps to find the leftmost and rightmost occurrences of find. Lastly, we check if there are any occurrences at all (this is an edge case), and print all the found strings if occurrences exist.
You can do this in a single pass over the vector. This is the best you'll do unless the vector is pre-sorted, since the cost of sorting will outweigh any gain you get from using a binary search.
Using std::copy_if makes this pretty simple:
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
int main() {
std::vector<std::string> v = {
"apple pie",
"blueberry tarts",
"apple",
"cherry cobbler",
"pie"
};
std::vector<std::string> v2;
std::string to_find = "apple";
std::copy_if(
v.begin(),
v.end(),
std::back_inserter(v2),
[&to_find](const std::string& el) {
return el.compare(0, to_find.size(), to_find) == 0;
}
);
}
This will copy all elements from v that match the predicate function into v2. The predicate simply checks that the first to_find.size() characters of each element match the string to find, using the std::string::compare overload that takes a position, a length, and a string.
I have inserted some elements into my unordered_multimap and I am finding all the values mapped to a key k using equal_range. Now I want to traverse these mapped values in their order of insertion. See the code below.
#include <iostream>
#include <unordered_map>
using namespace std;
int main()
{
unordered_multimap<int,int> m;
m.insert(make_pair(1,2));
m.insert(make_pair(1,3));
m.insert(make_pair(1,4));
auto it = m.equal_range(1);
for(auto it1 = it.first; it1 != it.second; it1++) {
cout<<it1->second<<endl;
}
}
output:
4
3
2
But I want to traverse in the order in which keys and mapped values were inserted. So, I want to traverse in order 2,3,4. Is it possible?
There is no straightforward way to do what you are asking for with an unordered_multimap. When elements are inserted they are placed in the container's internal hash structure, and the order among elements with the same key is unspecified. (An ordered std::multimap does keep elements with equivalent keys in insertion order since C++11.)
You could keep an auxiliary container, e.g. a std::queue, to which you append the iterator to each inserted element. The iterator can be obtained from the insertion as:
auto inserted_pos = m.insert(make_pair(1,4));
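Keep in mind that for std::multimap iterators are not invalidated by insertion. For std::unordered_multimap an insertion that triggers a rehash does invalidate iterators, although pointers and references to the elements stay valid, so store pointers or reserve enough buckets up front. Removing an element invalidates only the iterators, pointers, and references to that element.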
Here's one way to achieve what you want.
It uses a few techniques made available by boost::multi_index.
Note the use of project to convert iterators in one index to iterators in another.
#include <iostream>
#include <vector>
#include <algorithm>
#include <utility>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/sequenced_index.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <boost/multi_index/member.hpp>
namespace example {
struct by_id {};
struct by_insertion_order {};
using namespace boost;
using namespace boost::multi_index;
using item_type = std::pair<int, int>;
typedef multi_index_container<
item_type, // what we are storing
indexed_by<
// unordered multimap-type index
hashed_non_unique<tag<by_id>, member<item_type, int, &item_type::first> >,
// sequence-type index - records insertion order
sequenced<tag<by_insertion_order>>
>
> my_store;
using const_insertion_sequence_iterator = decltype(std::declval<my_store>().get<by_insertion_order>().cbegin());
using const_by_id_iterator = decltype(std::declval<my_store>().get<by_id>().cbegin());
// convert a range of 'by_id' iterators to an ordered vector 'by_insertion_sequence' iterators
// #param store is a reference to the store for which the iterators are valid
// #param first is the first by_id iterator in the filtered range
// #param last is the 'one past the end' iterator of the filtered range
// #returns a vector of iterators to items ordered by insertion sequence
auto
projected_to_insertion_order(const my_store& store,
const_by_id_iterator first,
const_by_id_iterator last)
-> std::vector<const_insertion_sequence_iterator>
{
std::vector<const_insertion_sequence_iterator> result;
for ( ; first != last ; ++first) {
result.push_back(store.project<by_insertion_order>(first));
}
sort(result.begin(),
result.end(),
[&store](const auto& il, const auto& ir) {
return distance(store.get<by_insertion_order>().cbegin(), il)
< distance(store.get<by_insertion_order>().cbegin(), ir);
});
return result;
}
}
int main()
{
using namespace std;
using example::my_store;
using example::by_id;
using example::by_insertion_order;
using example::projected_to_insertion_order;
// define store
my_store m;
// add some items
m.get<by_id>().emplace(1,2);
m.get<by_id>().emplace(3,6);
m.get<by_id>().emplace(1,3);
m.get<by_id>().emplace(2,5);
m.get<by_id>().emplace(1,4);
// get range of items filtered by id
auto ip = m.get<by_id>().equal_range(1);
cout << "filtered but unordered\n";
for (auto it = ip.first ; it != ip.second ; ++it) {
cout << it->first << ":" << it->second << endl;
}
// project that to a vector of iterators to items ordered by insertion sequence
cout << "filtered and ordered by insertion sequence\n";
for (const auto& it : projected_to_insertion_order(m, ip.first, ip.second)) {
cout << it->first << ":" << it->second << endl;
}
}
expected output:
filtered but unordered
1:4
1:3
1:2
filtered and ordered by insertion sequence
1:2
1:3
1:4
map<int, string>::reverse_iterator& it = temp.rbegin();
it -> points to garbage key value
it++ -> points to the correct key value
map<int, string>::iterator& it = temp.begin();
it-> points to the correct key value from beginning.
Please assist.
Your statements are incorrect. If temp is not empty, then *temp.rbegin() is indeed the last value in the map, and *temp.begin() is the first value.
(However, the underlying iterator of the reverse begin is the ordinary end iterator - but you don't see that unless you call base() on the reverse iterator.)
You must have an error in your code that's filling the map. You can verify this by testing a trivial example such as
#include <algorithm>
#include <map>
#include <iostream>
using namespace std;
int main()
{
map<int, char> coll;
// insert elements from 1 to 9
for (int i=1; i<=9; ++i) {
coll[i] = static_cast<char>(i+'a'-1); // i+'a'-1 makes it clearer than adding 96 that we index from the letter a
}
// print all element in reverse order
for_each (coll.rbegin(), coll.rend(),
[]( pair<int, char> mapinfo ) { cout << mapinfo.second << " "; } );
cout << endl;
}
Is it possible to do something like:
string word = "Hello";
word[3] = null;
if(word[3] == null){/.../}
in C++, basically making an array element empty. For example if I wanted to remove the duplicate characters from the array I'd set them to null first and then shifted the array to the left every time I found an array index that contained null.
If this is not possible what's a good way of doing something like this in C++ ?
If you want to remove adjacent duplicate characters, you can do this:
std::string::iterator new_end = std::unique(word.begin(), word.end());
word.erase(new_end, word.end());
If you want to mark arbitrary characters for removal, you can skip the marking and just provide the appropriate predicate to std::remove_if:
new_end = std::remove_if(word.begin(), word.end(), IsDuplicate);
word.erase(new_end, word.end());
However, I can't think of an appropriate predicate to use here that doesn't exhibit undefined behavior. I would just write my own algorithm:
template<typename IteratorT>
IteratorT RemoveDuplicates(IteratorT first, IteratorT last)
{
typedef typename std::iterator_traits<IteratorT>::value_type
ValueT;
std::map<ValueT, int> counts;
for (auto scan=first; scan!=last; ++scan)
{
// keep only the first occurrence of each value
if(++counts[*scan] == 1)
{
if(first != scan) // avoid self-move-assignment
*first = std::move(*scan);
++first;
}
}
return first;
}
Or, if you don't care about the order of the elements, you could simply sort it, then use the first solution.
This is possible in the sense that a single element of a string is a char, and you can overwrite it with the null character: word[3] = '\0'. C++ has no null keyword, though. The NULL macro is a null pointer constant, so word[3] = NULL and if (word[3] == NULL) compile on many implementations but typically produce a warning. Alternatives would be: if (!word[3]) or if (word[3] == '\0'). Keep in mind that a std::string may contain embedded null characters, so this marks the element but does not shorten the string.
But in any case you should consider using STL algorithms for removing duplicates.
I think you should take a look at the algorithms in the STL.
You are not very specific about what you want to remove but maybe this helps:
std::string string_with_dup("AABBCCDD");
std::string string_without_dup;
std::cout << string_with_dup << std::endl;
// with copy
std::unique_copy(string_with_dup.begin(), string_with_dup.end(), std::back_inserter(string_without_dup));
std::cout << string_without_dup << std::endl;
// or inplace
string_with_dup.erase(std::unique(string_with_dup.begin(), string_with_dup.end()), string_with_dup.end());
std::cout << string_with_dup << std::endl;
If you want to remove all duplicates (not only the adjacent ones), you should use the erase-remove idiom with something like this:
#include <iostream>
#include <map>
#include <string>
#include <algorithm>
using namespace std;
struct is_repeated {
is_repeated( map<char,int>& x ) :r(&x) {}
map<char,int>* r;
bool operator()( char c ) {
(*r)[c]++;
if( (*r)[c] > 1 )
return true;
return false;
}
};
int main (int argc, char**argv)
{
map<char,int> counter_map;
string v = "hello hello hello hello hello hello hello";
cout << v << endl;
is_repeated counter(counter_map);
v.erase( remove_if(v.begin(), v.end(), counter ), v.end() );
cout << v << endl;
}
Output:
hello hello hello hello hello hello hello
helo