How to convert a string to a dictionary of letters? - c++

I need to convert letters into a dictionary of characters.
Here's an example:
letter
l: 1
e: 2
t: 2
r: 1
I did some research and found this helpful answer, but that was using getline() and separating words by spaces. Since I am trying to split by character I don't think I can use getline() since '' isn't a valid split character. I could convert to a char* array but I wasn't sure where that would get me.
This is fairly easy in other languages so I thought it wouldn't be too bad in C++. I was hoping there would be something like a my_map[key]++ or something. In Go I would write this as
// Word map of string: int values
var wordMap = make(map[string]int)
// For each letter, add to that key
for i := 0; i < len(word); i++ {
wordMap[string(word[i])]++
}
// In the end you have a map of each letter.
How could I apply this in C++?

How could I apply this in C++?
It could look rather similar to your Go code.
// Word map of char: int values
// (strings would be overkill, since you know they are a single character)
auto wordMap = std::map<char,int>{};
// For each letter, add to that key
for ( char c : word )
wordMap[c]++;
}

Here is the unicode version of Drew Dormann's answer:
#include <locale>
#include <codecvt>
std::string word = "some unicode: こんにちは世界";
std::map<char32_t, uint> wordMap;
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
for (auto c : converter.from_bytes(word)) {
wordMap[c]++;
}
for (const auto [c, v] : wordMap) {
std::cout << converter.to_bytes(c) << " : " << v << std::endl;
}

I wrote an article about this which can be checked out here. Below i have given 2 versions of the program. Version 1 keeps track of the character count in alphabetical order. But sometimes(in case) you want the character count in insertion order for which you can use Version 2.
Version 1: Get character count in ͟a͟l͟p͟h͟a͟b͟e͟t͟i͟c͟a͟l͟ ͟o͟r͟d͟e͟r͟
#include <iostream> //needed for std::cout, std::cin
#include <map> //needed for std::map
#include <iomanip> //needed for formating the output (std::setw)
int main()
{
std::string inputString; //user input will be read into this string variable
std::cout << "Enter a string: " << std::endl;
std::getline(std::cin, inputString);
//this map maps the char to their respective count
std::map < char, int > charCount;
//iterate through the inputString
for (char & c: inputString)
{
charCount[c]++;//increment the count for character c
}
std::cout << "Total unique characters are: " << charCount.size() << std::endl;
std::cout << "------------------------------------" << std::endl;
std::cout << "Character" << std::setw(10) << "Count" << std::endl;
std::cout << "------------------------------------" << std::endl;
for (std::pair < char, int > pairElement: charCount)
{
std::cout << std::setw(4) << pairElement.first << std::setw(13) << pairElement.second << std::endl;
}
return 0;
}
Version 2: Get character count in i͟n͟s͟e͟r͟t͟i͟o͟n͟ ͟o͟r͟d͟e͟r͟
#include <iostream>
#include <map>
#include <iomanip>
int main()
{
std::string inputString;
std::cout << "Enter a string: " << std::endl;
std::getline(std::cin, inputString);
std::map < char, int > charCount;
for (char & c: inputString)
{
charCount[c]++;
}
std::cout << "Total unique characters are: " << charCount.size() << std::endl;
std::cout << "------------------------------------" << std::endl;
std::cout << "Character" << std::setw(10) << "Count" << std::endl;
std::cout << "------------------------------------" << std::endl;
std::size_t i = 0;
//just go through the inputString instead of map
for(char &c: inputString)
{
std::size_t index = inputString.find(c);
if(index != inputString.npos && (index == i)){
std::cout << std::setw(4) << c << std::setw(13) << charCount.at(c)<<std::endl;
}
++i;
}
return 0;
}

Related

C++ output of string.size() not correct

I wrote the following code:
#include <iostream>
#include <time.h>
#include <string.h>
int main()
{
std::string alphabet = "abcdefghijklmnopqrstuvwxyz";
std::string word;
unsigned long long int i = 0;
srand(time(NULL));
while(true)
{
int randomIndex = rand() % 27;
word += alphabet[randomIndex];
// If the word might be the desired word, "hi"
if (word.size() == 2)
{
// The word should only be printed if its length is equal to 2
std::cout << i << " : " << word << word.size() << std::endl;
if (word == "hi")
{
break;
}
else // The word isn't "hi". We reset the variable word and continue looping
{
word = "";
}
i += 1;
}
}
return 0;
}
It is supposed to put together random letters until word is equal to "hi".
Until that, only 2-character words should be printed, but for some reason the program seems to be thinking that 1-character words have a length of 2. Therefore it also prints 1-character words.
Can anyone please help me?
Let's change the print debug line as follows.
std::cout << randomIndex << '\t' << i << " : " << word << " --> " << word.size() << std::endl;
Here, when we check the strings that look like a single character but 2 characters, we come across them in a random number of 26.
The length of the string "abcdefghijklmnopqrstuvwxyz" is 26. Since the indices of the arrays start from 0, they must be in the range of random numbers [0, 25] we produce.
Then let's update the line of code where we generate the random number as follows.
int randomIndex = rand() % 26
In the wrong code, the 26th character corresponds to the memory cell at the end of the string. Let's take a different example to understand.
int main()
{
char arr[5];
std::string a = "a";
a+=arr[1];
std::cout << a << " " << a.size() << '\n'; // 2
a+=arr[5];
std::cout << a << " " << a.size() << '\n'; // 3
a+=arr[6];
std::cout << a << " " << a.size() << '\n'; // 4
return 0;
}

String is out of range

The task is interchange two parts of a word, which contains the dash (i.e we have 1237-456 but should transform it into 456-1237). Here`s my code, it runs but doesnt shows results as a string is out of range and i dk why. It happens in the 1st for, the second iteration ends in the error+ it happens when strlen is 5 and more. The code:
#include <iostream>
#include <cstdlib>
#include <string>
using namespace std;
int main()
{
int u = 0, y = 0;
string first, second;
int i = 0;
string word;
cout << "Enter the text: " << endl;
getline(cin, word);
int l = size(word);
int f = word.find('-');
cout << "The word has " << l << " characters" << endl << endl;
for (int i = 0; i < f; i++) {
first[i] = word[i];
}
for (int i = f + 1; i < l; i++) {
second[y] = word[i];
y++;
}
cout << endl << second << " - " << first << endl;
}
first and second will not have memory allocated to them. They are initialized as strings of size 0. And for this case I would just use iterators instead of indices (though they could work too, but then you need more manual work to allocate enough room for the target strings and all).
All in all I think your code is mixing 'c' and 'c++' style a bit so here is my example:
#include <algorithm> // for find
#include <iostream>
// #include <cstdlib> // <<== this is "c" not C++
// using namespace std; <<== unlearn this
int main()
{
std::string word{ "Mississippi-delta"};
// std::string has a lenght function use that
std::cout << "The word has " << word.length() << " characters\n";
// "it" will be an iterator to the location of '-' (if any)
auto it = std::find(word.begin(), word.end(), '-');
// it points (beyond) the end of the word if no '-' is found
if (it == word.end())
{
std::cout << "no '-' found in word";
}
else
{
std::string first{ word.begin(),it };
++it; // skip '-'
std::string second{ it,word.end() };
std::cout << second << "-" << first << "\n";
}
return 0;
}
Instead of accessing the elements of first and second, just try using .push_back() to add characters from word.

find the longest word in a vector c++

So i have a c++ vector which contains about 106 thousand words which are stored on vector<string>words i need to find the longest word in this vector and i also need to get the location at which the word is, for example (1,2,3) in my vector. I need this location because i have two other vectors which has the meaning and the type for the words. vector<string>definition, vector<string>type
Please help
My current code
this code is not working at all
copy_if(words.begin(), words.end(), back_inserter(length), [](const string& x) { return x.length() > 40; });// looks for words longer than 7 letters
for (const string& s : length)
{
cout << "found!!" << endl;
auto i = find(words.begin(), words.end(), s);//looks for the word in the words vector
if (i != words.end())
{
auto pos = i - words.begin();
//displays the word, type and the definition of the word that the user has entered
cout << "Word : " << words[pos] << '\n';
cout << "Type : " << definitions[pos] << '\n';
cout << "Definition: " << types[pos] << '\n';
cout << '\n';
}
else
cout << "word not found" << endl;
}
You could use the standard algorithm std::max_element to search through the vector<string>.
Example:
#include <algorithm> // max_element
#include <iostream>
#include <iterator> // distance
#include <string>
#include <vector>
int main() {
std::vector<std::string> words{"a", "bb", "ccc"};
auto it = std::max_element(words.begin(), words.end(),
[](const auto& a, const auto& b) {
return a.size() < b.size();
});
std::cout << "The longest word is " << *it << " at (zero-based) pos "
<< std::distance(words.begin(), it) << '\n';
}
Output:
The longest word is ccc at (zero-based) pos 2
I would prefer thinking simply: just check length of elements according to each indice and update information according to that.
std::vector<std::string> length;
// initialize the vector length
size_t max_length = 0; // the length of longest word(s)
std::vector<size_t> max_indice; // the indice of longest word(s)
for (size_t i = 0; i < length.size(); i++) {
size_t this_len = length[i].length();
if (this_len > max_length) {
// new record
max_length = this_len;
max_indice.clear();
max_indice.push_back(i);
} else if (this_len == max_length) {
// tie
max_indice.push_back(i);
}
}
for (size_t pos : max_indice) {
cout << "Word : " << words[pos] << '\n';
cout << "Type : " << definitions[pos] << '\n';
cout << "Definition: " << types[pos] << '\n';
cout << '\n';
}

Finding Top Word Count in .txt file - while loop going extremely slow and not working properly

I'm trying to essentially iterate through every word in the .txt file and when I find a word (from my words map) with more than the maxwordcount variable I add it into the front of the topwords vector
int main(int argc, char** argv) {
fstream txtfile;
string filename = argv[1];
string word, tempword;
int maxwordcount = 0;
int wordcount = 0;
int uniquewordcount = 0;
vector<pair <string, int> > topwords;
map<string, int> words;
if (argc != 2) {
cout << "Incorrect number of arguments on the command line bud" << endl;
}else{
txtfile.open(filename.c_str());
if (txtfile.is_open()) {
while (txtfile >> word){
//removePunctuation(word);
//transform(word.begin(), word.end(), word.begin(), [](unsigned char c){ return::tolower(c); }); //makes string lowercase using iterator
if (words.find(word) == words.end()) {
words[word] = 1; //adds word into the map as a pair starting with a word count of 1
if (words[word] > maxwordcount) { //For case if word is the first word added to the map
maxwordcount = words[word]; //change maxwordcount
topwords.insert( topwords.begin(), make_pair(word, words[word]) ); //insert word into the front of the top words vector
cout << "word: '" << word << "' word-count: " << words[word] << endl;
}
uniquewordcount++;
}else{ //the word is found
words[word]++; //increment count for word by 1
if (words[word] > maxwordcount) { //check if wordcount > maxwordcount
topwords.insert( topwords.begin(), make_pair(word, words[word]) ); //insert word into the front of the top words vector
}
}
wordcount++;
}
At the end of the program I want to display top 10 or so words from the txt file. I tested whether the while loop was running by displaying a live wordcount (cout). The number was going up, but it was going up extremely slow. Also I'm using huge books for my txt files.
Image of results when running
I also don't completely understand inserting variables into maps and vectors, so something might be going wrong there.
I've hit a dead-end, so anything will help at this point.
I used a smaller text file too to test:
This is a small sentence to test test test
hey hey
Results:
word: 'This' word-count: 1
1
2
3
4
5
6
7
7
7
8
8
There were 11 words in the file.
There were 8 unique words in the file.
Top 20 words in little.txt:
hey 2
test 3
test 2
This 1
Segmentation fault
I know I'm doing something wrong, but I don't have a clue where to look next or what to test. Still an amateur at C++ and C too.
you should read the file by lines, process line by line
read file file by line: https://www.systutorials.com/how-to-process-a-file-line-by-line-in-c/
https://www.geeksforgeeks.org/split-a-sentence-into-words-in-cpp/
https://www.w3schools.com/cpp/cpp_functions.asp
https://www.w3schools.com/cpp/cpp_function_param.asp
https://www.w3schools.com/cpp/cpp_function_return.asp
https://www.w3schools.com/cpp/cpp_pointers.asp <--- very importend
https://www.w3schools.com/cpp/cpp_references.asp <--- likewise importend
#include <bits/stdc++.h>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <string>
#include <algorithm>
#include <sstream>
#include <regex>
#include <iterator>
using namespace std;
vector<std::pair <string, int> > topwords;
void store(vector<pair <string, int> > &topwords, string str){
auto pos = std::find_if(topwords.begin(), topwords.end(),
[str](std::pair<string, int> const &b) {
return b.first == str;
});
//std::cout<< pos->first << endl;
if(pos != topwords.end()){
std::cout << "word: " << pos->first << " " << pos->second << " found" << endl;
pos->second++;
}
else{
std::cout << "not found" << endl;
topwords.push_back( make_pair(str,1) );
}
}
void removeDupWord(string str)
{
// Used to split string around spaces.
istringstream ss(str);
// Traverse through all words
/*
do {
// Read a word
string word;
ss >> word;
// Print the read word
cout << word << endl;
store(topwords, word );
// While there is more to read
} while (ss);
*/
string word;
while (ss >> word) {
//cout << word << endl;
const std::regex sanitized{ R"([-[\]{}()*+?.,\^$|#\s])" };
std::stringstream result;
std::regex_replace(std::ostream_iterator<char>(result), word.begin(), word.end(), sanitized, "");
//store(topwords, word );
store(topwords, result.str() );
}
}
void readReadFile(string &fileName){
std::cout << "fileName" << fileName << endl;
std::ifstream file(fileName);
std::string str;
while (std::getline(file, str)) {
//std::cout << str << "\n";
removeDupWord(str);
//store(topwords, str);
}
}
bool compareFunction (const std::pair<std::string, int> &a, const std::pair<std::string, int> &b) {
return a.first<b.first; // sort by letter
}
bool compareFunction2 (const std::pair<std::string, int> &a, const std::pair<std::string, int> &b) {
return a.second>b.second; // sort by count
}
bool cmp(pair<string, int> &A, pair<string, int> &B) {
return A.second < B.second;
}
void check(vector<pair <string, int> > &topwords){
std::pair<string, int> mostUsedWord = make_pair("",0);
for(auto ii : topwords){
std::cout << "word: " << ii.first << " count: " << ii.second << endl;
if(ii.second > mostUsedWord.second){
mostUsedWord.first = ii.first;
mostUsedWord.second = ii.second;
}
}
std::cout << "most used Word: " << mostUsedWord.first << " x " << mostUsedWord.second << " Times." << endl;
}
void get_higestTopTenValues(vector<pair <string, int> > &topwords){
std::sort(topwords.begin(),topwords.end(),compareFunction2);//sort the vector by count
int MAX = std::max_element(topwords.begin(), topwords.end(), cmp)->second;
std::cout << "max: " << MAX << endl;
for(auto ii : topwords){
//std::cout << "word: " << ii.first << " count: " << ii.second << endl;
if(ii.second >= (MAX - 10)){
std::cout << ii.first << " " << ii.second << endl;
}
}
}
void get_LowestTopTenValues(vector<pair <string, int> > &topwords){
std::sort(topwords.begin(),topwords.end(),compareFunction2);//sort the vector by count
int MIN = std::min_element(topwords.begin(), topwords.end(), cmp)->second;
std::cout << "min: " << MIN << endl;
for(auto ii : topwords){
//std::cout << "word: " << ii.first << " count: " << ii.second << endl;
if(ii.second <= (MIN + 9)){
std::cout << ii.first << " " << ii.second << endl;
}
}
}
int main ()
{
std::string word, fileName;
fileName = "input.txt";
readReadFile(fileName);
topwords.push_back( make_pair("ba",1) );
topwords.push_back( make_pair("bu",1) );
topwords.push_back( make_pair("hmmm",1) );
topwords.push_back( make_pair("what",1) );
topwords.push_back( make_pair("and",1) );
topwords.push_back( make_pair("hello",1) );
word = "hellos";
store(topwords, word);
store(topwords, word);
store(topwords, word);
store(topwords, word);
word = "hello";
store(topwords, word);
store(topwords, word);
store(topwords, word);
store(topwords, word);
store(topwords, word);
store(topwords, word);
std::sort(topwords.begin(),topwords.end(),compareFunction);//sort the vector by letter
// or
//std::sort(topwords.begin(),topwords.end(),compareFunction2);//sort the vector by count
std::cout << "---------------------------------------" << endl;
std::cout << " get all values" << endl;
std::cout << "---------------------------------------" << endl;
check(topwords);
std::cout << "---------------------------------------" << endl;
std::cout << " get the top 10 highest values" << endl;
std::cout << "---------------------------------------" << endl;
get_higestTopTenValues(topwords);
std::cout << "---------------------------------------" << endl;
std::cout << " get the top 10 lowest values" << endl;
std::cout << "---------------------------------------" << endl;
get_LowestTopTenValues(topwords);
}
This question has his been aswered long time ago. I stumbled over the question and asnwer and I find everything overly complicated.
Therefore I would like to add a more modern C++ solution making use of existing STL elements.
This makes the code more compact.
Please see below:
#include <iostream>
#include <utility>
#include <unordered_map>
#include <vector>
#include <algorithm>
#include <string>
#include <fstream>
const std::string fileName{"r:\\loremipsum.txt"};
int main() {
if (std::ifstream textFileStream{ fileName }; textFileStream) {
// Here we store the count of all words
std::unordered_map<std::string, size_t> counter{};
size_t countOfOverallWords{}; // Counter for the number of all words
// Read all words from file, remove punctuation, and count teh occurence
for (std::string word; textFileStream >> word; counter[word]++) {
word.erase(std::remove_if(word.begin(), word.end(), ispunct), word.end());
++countOfOverallWords;
}
// For storing the top 10
std::vector<std::pair<std::string, size_t>> top(10);
// Get top 10
std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
[](const std::pair<std::string, size_t >& p1, const std::pair<std::string, size_t>& p2) { return p1.second > p2.second; });
// Now show result
std::cout << "Count of overall words:\t " << countOfOverallWords << "\nCount of unique words:\t " << counter.size() << "\n\nTop 10:\n";
for (const auto& t : top) std::cout << "Value: " << t.first << "\t Count: " << t.second << '\n';
}
else std::cerr << "\n\nError: Could not open source file '" << fileName << "'\n\n";
return 0;
}
Developed and tested with Microsoft Visual Studio Community 2019, Version 16.8.2.
Additionally compiled and tested with clang11.0 and gcc10.2 with flags --std=c++17 -Wall -Wextra -Wpedantic
Language: C++17

C++ stringstream value extraction

I am trying to extract values from myString1 using std::stringstream like shown below:
// Example program
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
string myString1 = "+50years";
string myString2 = "+50years-4months+3weeks+5minutes";
stringstream ss (myString1);
char mathOperator;
int value;
string timeUnit;
ss >> mathOperator >> value >> timeUnit;
cout << "mathOperator: " << mathOperator << endl;
cout << "value: " << value << endl;
cout << "timeUnit: " << timeUnit << endl;
}
Output:
mathOperator: +
value: 50
timeUnit: years
In the output you can see me successfully extract the values I need, the math operator, the value and the time unit.
Is there a way to do the same with myString2? Perhaps in a loop? I can extract the math operator, the value, but the time unit simply extracts everything else, and I cannot think of a way to get around that. Much appreciated.
The problem is that timeUnit is a string, so >> will extract anything until the first space, which you haven't in your string.
Alternatives:
you could extract parts using getline(), which extracts strings until it finds a separator. Unfortunately, you don't have one potential separator, but 2 (+ and -).
you could opt for using regex directly on the string
you could finally split the strings using find_first_of() and substr().
As an illustration, here the example with regex:
regex rg("([\\+-][0-9]+[A-Za-z]+)", regex::extended);
smatch sm;
while (regex_search(myString2, sm, rg)) {
cout <<"Found:"<<sm[0]<<endl;
myString2 = sm.suffix().str();
//... process sstring sm[0]
}
Here a live demo applying your code to extract ALL the elements.
You could something more robust like <regex> like in the example below:
#include <iostream>
#include <regex>
#include <string>
int main () {
std::regex e ("(\\+|\\-)((\\d)+)(years|months|weeks|minutes|seconds)");
std::string str("+50years-4months+3weeks+5minutes");
std::sregex_iterator next(str.begin(), str.end(), e);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << "Expression: " << match.str() << "\n";
std::cout << " mathOperator : " << match[1] << std::endl;
std::cout << " value : " << match[2] << std::endl;
std::cout << " timeUnit : " << match[4] << std::endl;
++next;
}
}
Output:
Expression: +50years
mathOperator : +
value : 50
timeUnit : years
Expression: -4months
mathOperator : -
value : 4
timeUnit : months
Expression: +3weeks
mathOperator : +
value : 3
timeUnit : weeks
Expression: +5minutes
mathOperator : +
value : 5
timeUnit : minutes
LIVE DEMO
I'd use getline for the timeUnit, but since getline can take only one delimiter, I'd search the string separately for mathOperator and use that:
string myString2 = "+50years-4months+3weeks+5minutes";
stringstream ss (myString2);
size_t pos=0;
ss >> mathOperator;
do
{
cout << "mathOperator: " << mathOperator << endl;
ss >> value;
cout << "value: " << value << endl;
pos = myString2.find_first_of("+-", pos+1);
mathOperator = myString2[pos];
getline(ss, timeUnit, mathOperator);
cout << "timeUnit: " << timeUnit << endl;
}
while(pos!=string::npos);