Find the longest word in a vector (C++)

I have a C++ vector, vector<string> words, that contains about 106 thousand words. I need to find the longest word in this vector, and I also need its position (index) in the vector, e.g. 1, 2, 3 and so on. I need the position because I have two other vectors, vector<string> definition and vector<string> type, which hold the meaning and the type of each word at the same index.
Please help.
My current code (it is not working at all):
copy_if(words.begin(), words.end(), back_inserter(length), [](const string& x) { return x.length() > 40; });// looks for words longer than 40 characters
for (const string& s : length)
{
cout << "found!!" << endl;
auto i = find(words.begin(), words.end(), s);//looks for the word in the words vector
if (i != words.end())
{
auto pos = i - words.begin();
//displays the word, type and the definition of the word that the user has entered
cout << "Word : " << words[pos] << '\n';
cout << "Type : " << types[pos] << '\n';
cout << "Definition: " << definitions[pos] << '\n';
cout << '\n';
}
else
cout << "word not found" << endl;
}

You could use the standard algorithm std::max_element to search through the vector<string>.
Example:
#include <algorithm> // max_element
#include <iostream>
#include <iterator> // distance
#include <string>
#include <vector>
int main() {
std::vector<std::string> words{"a", "bb", "ccc"};
auto it = std::max_element(words.begin(), words.end(),
[](const auto& a, const auto& b) {
return a.size() < b.size();
});
std::cout << "The longest word is " << *it << " at (zero-based) pos "
<< std::distance(words.begin(), it) << '\n';
}
Output:
The longest word is ccc at (zero-based) pos 2
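Applied to the question's data, a minimal sketch might look like the following. It assumes words, types and definitions are parallel vectors of equal size, and it checks for an empty vector first, because std::max_element returns the end iterator for an empty range and dereferencing that is undefined behavior. The function names longestWordIndex and describeEntry are hypothetical.

```cpp
#include <algorithm>  // std::max_element
#include <cstddef>    // std::ptrdiff_t, std::size_t
#include <iterator>   // std::distance
#include <string>
#include <vector>

// Returns the zero-based index of the first longest word, or -1 if `words` is empty.
// std::max_element returns the first of several equally long words.
std::ptrdiff_t longestWordIndex(const std::vector<std::string>& words) {
    if (words.empty()) {
        return -1;  // max_element would return end(), which must not be dereferenced
    }
    auto it = std::max_element(words.begin(), words.end(),
                               [](const std::string& a, const std::string& b) {
                                   return a.size() < b.size();
                               });
    return std::distance(words.begin(), it);
}

// Hypothetical helper: formats the entry at `pos` from three parallel vectors
// that hold each word's type and definition at the same index as the word.
std::string describeEntry(const std::vector<std::string>& words,
                          const std::vector<std::string>& types,
                          const std::vector<std::string>& definitions,
                          std::size_t pos) {
    return "Word : " + words[pos] + "\nType : " + types[pos] +
           "\nDefinition: " + definitions[pos] + "\n";
}
```

With the question's vectors you could call auto i = longestWordIndex(words); and, if i >= 0, print describeEntry(words, types, definitions, i).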

I would prefer to keep it simple: check the length of the element at each index and update the record accordingly.
// words, types and definitions are the parallel vectors from the question
size_t max_length = 0;           // the length of the longest word(s)
std::vector<size_t> max_indices; // the indices of the longest word(s)
for (size_t i = 0; i < words.size(); i++) {
    size_t this_len = words[i].length();
    if (this_len > max_length) {
        // new record
        max_length = this_len;
        max_indices.clear();
        max_indices.push_back(i);
    } else if (this_len == max_length) {
        // tie
        max_indices.push_back(i);
    }
}
for (size_t pos : max_indices) {
    std::cout << "Word : " << words[pos] << '\n';
    std::cout << "Type : " << types[pos] << '\n';
    std::cout << "Definition: " << definitions[pos] << '\n';
    std::cout << '\n';
}

Related

How to convert a string to a dictionary of letters?

I need to count the letters of a word into a dictionary (map) of characters.
Here's an example:
letter
l: 1
e: 2
t: 2
r: 1
I did some research and found a helpful answer, but it used getline() to separate words by spaces. Since I am trying to split by character, I don't think I can use getline(), because '' isn't a valid delimiter. I could convert to a char* array, but I wasn't sure where that would get me.
This is fairly easy in other languages, so I thought it wouldn't be too bad in C++. I was hoping there would be something like my_map[key]++. In Go I would write this as
// Word map of string: int values
var wordMap = make(map[string]int)
// For each letter, add to that key
for i := 0; i < len(word); i++ {
wordMap[string(word[i])]++
}
// In the end you have a map of each letter.
How could I apply this in C++?
It could look rather similar to your Go code.
// Word map of char: int values
// (strings would be overkill, since you know each key is a single character)
auto wordMap = std::map<char,int>{};
// For each letter, add to that key
for ( char c : word )
wordMap[c]++;
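Wrapped in a function, the snippet above becomes a small self-contained sketch (letterCounts is a hypothetical name). It relies on std::map::operator[] value-initializing a missing count to 0, which is what makes ++ behave like the Go version; unlike Go's map, a std::map iterates its keys in sorted order.

```cpp
#include <map>
#include <string>

// Counts how often each character occurs in `word`.
// operator[] inserts a count of 0 for a character seen for the first time.
std::map<char, int> letterCounts(const std::string& word) {
    std::map<char, int> counts;
    for (char c : word) {
        ++counts[c];
    }
    return counts;
}
```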
Here is a Unicode version of Drew Dormann's answer (note that std::wstring_convert and std::codecvt_utf8 are deprecated since C++17):
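#include <codecvt>
#include <iostream>
#include <locale>
#include <map>
#include <string>
std::string word = "some unicode: こんにちは世界";
std::map<char32_t, unsigned> wordMap; // 'uint' is not a standard C++ type
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
// decode the UTF-8 bytes into code points, then count each code point
for (auto c : converter.from_bytes(word)) {
wordMap[c]++;
}
for (const auto& [c, v] : wordMap) { // structured bindings require C++17
std::cout << converter.to_bytes(c) << " : " << v << std::endl;
}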
I wrote an article about this. Below I give two versions of the program. Version 1 reports the character counts in sorted order (std::map sorts by character code, so, for example, uppercase letters come before lowercase ones). If you instead want the counts in insertion order (order of first appearance), use Version 2.
Version 1: Get character count in alphabetical order
#include <iostream> //needed for std::cout, std::cin
#include <map> //needed for std::map
#include <iomanip> //needed for formatting the output (std::setw)
#include <string> //needed for std::string, std::getline
int main()
{
std::string inputString; //user input will be read into this string variable
std::cout << "Enter a string: " << std::endl;
std::getline(std::cin, inputString);
//this map maps the char to their respective count
std::map < char, int > charCount;
//iterate through the inputString
for (char & c: inputString)
{
charCount[c]++;//increment the count for character c
}
std::cout << "Total unique characters are: " << charCount.size() << std::endl;
std::cout << "------------------------------------" << std::endl;
std::cout << "Character" << std::setw(10) << "Count" << std::endl;
std::cout << "------------------------------------" << std::endl;
for (std::pair < char, int > pairElement: charCount)
{
std::cout << std::setw(4) << pairElement.first << std::setw(13) << pairElement.second << std::endl;
}
return 0;
}
Version 2: Get character count in insertion order
#include <iostream>
#include <map>
#include <iomanip>
#include <string>
int main()
{
std::string inputString;
std::cout << "Enter a string: " << std::endl;
std::getline(std::cin, inputString);
std::map < char, int > charCount;
for (char & c: inputString)
{
charCount[c]++;
}
std::cout << "Total unique characters are: " << charCount.size() << std::endl;
std::cout << "------------------------------------" << std::endl;
std::cout << "Character" << std::setw(10) << "Count" << std::endl;
std::cout << "------------------------------------" << std::endl;
std::size_t i = 0;
//go through the inputString instead of the map; print a character only at its first occurrence
for(char &c: inputString)
{
std::size_t index = inputString.find(c);
if(index != inputString.npos && (index == i)){
std::cout << std::setw(4) << c << std::setw(13) << charCount.at(c)<<std::endl;
}
++i;
}
return 0;
}
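Version 2 calls inputString.find for every character, so it does quadratic work on long inputs. A minimal alternative sketch, assuming you only need the (character, count) pairs, records each character the first time it is seen and tallies counts in a std::unordered_map, which is linear on average. countInInsertionOrder is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns (character, count) pairs in order of each character's first appearance.
// One pass: `order` records a character the first time it is seen, `counts` tallies it.
std::vector<std::pair<char, std::size_t>> countInInsertionOrder(const std::string& input) {
    std::unordered_map<char, std::size_t> counts;
    std::vector<char> order;
    for (char c : input) {
        if (counts[c]++ == 0) {  // the old count was 0, so this is the first occurrence
            order.push_back(c);
        }
    }
    std::vector<std::pair<char, std::size_t>> result;
    result.reserve(order.size());
    for (char c : order) {
        result.emplace_back(c, counts[c]);
    }
    return result;
}
```

The printing loop from Version 2 can then iterate over the returned vector instead of rescanning the input string.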

How can I print the elements of a vector as an ordered list using a range-based loop?

Example:
vector<string> myVec = {"apple", "banana", "grape"};
How can I print these elements as an ordered list using a range-based loop?
Output:
1 apple
2 banana
3 grape
A variation of Jeffrey's answer, but without an additional variable:
for (const auto& s : myVec)
{
std::cout << &s - &myVec[0] << " " << s << "\n";
}
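This, of course, prints a "correct" 0-based index. Feel free to add 1 to it :) Note that it relies on std::vector storing its elements contiguously, so it won't work for containers such as std::list.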
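If you want the numbering without pointer arithmetic, a small generic helper can keep its own counter. This is a sketch and numberedList is a hypothetical name. Because it never looks at element addresses, it works for any range that supports a range-based for loop, including built-in arrays, std::list and std::set.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Formats each element of any iterable range as "<position> <element>" on its own
// line, counting from `first` (1 gives an ordered list, 0 gives zero-based indices).
template <typename Range>
std::string numberedList(const Range& items, std::size_t first = 1) {
    std::ostringstream out;
    std::size_t position = first;
    for (const auto& item : items) {
        out << position++ << ' ' << item << '\n';
    }
    return out.str();
}
```

For the question, std::cout << numberedList(myVec); prints the 1-based list; pass 0 as the second argument for 0-based numbering.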
Using Boost ranges really simplifies things:
#include <boost/range/adaptor/indexed.hpp>
using namespace boost::adaptors;
std::vector<std::string> myVec = {"apple", "banana", "grape"};
for (const auto& element : myVec | indexed(1))
{
std::cout << element.index() << " " << element.value() << "\n";
}
Produces
1 apple
2 banana
3 grape
You are looking for
size_t position = 1;
for(const auto& s: myVec)
{
std::cout << position << " " << s << "\n";
position++;
}
as in
#include <iostream>
#include <string>
#include <vector>
using std::vector;
using std::string;
vector<string> myVec = {"apple", "banana", "grape"};
int main()
{
size_t position = 1;
for(const auto& s: myVec)
{
std::cout << position << " " << s << "\n";
position++;
}
}
With range-v3, you could write:
for (auto [i, val] : myVec | ranges::views::enumerate)
{
std::cout << i << ' ' << val << "\n";
}
This is a good case for the classic index-based for loop:
const size_t quantity = myVec.size();
for (size_t i = 0; i < quantity; ++i)
{
std::cout << (i + 1) << " " << myVec[i] << "\n";
}
Simple, effective. Don't knock the old stuff. :-)

Finding Top Word Count in .txt file - while loop going extremely slow and not working properly

I'm trying to iterate through every word in the .txt file, and whenever a word (from my words map) has a count greater than the maxwordcount variable, I insert it at the front of the topwords vector.
int main(int argc, char** argv) {
fstream txtfile;
string filename = argv[1];
string word, tempword;
int maxwordcount = 0;
int wordcount = 0;
int uniquewordcount = 0;
vector<pair <string, int> > topwords;
map<string, int> words;
if (argc != 2) {
cout << "Incorrect number of arguments on the command line bud" << endl;
}else{
txtfile.open(filename.c_str());
if (txtfile.is_open()) {
while (txtfile >> word){
//removePunctuation(word);
//transform(word.begin(), word.end(), word.begin(), [](unsigned char c){ return::tolower(c); }); //makes string lowercase using iterator
if (words.find(word) == words.end()) {
words[word] = 1; //adds word into the map as a pair starting with a word count of 1
if (words[word] > maxwordcount) { //For case if word is the first word added to the map
maxwordcount = words[word]; //change maxwordcount
topwords.insert( topwords.begin(), make_pair(word, words[word]) ); //insert word into the front of the top words vector
cout << "word: '" << word << "' word-count: " << words[word] << endl;
}
uniquewordcount++;
}else{ //the word is found
words[word]++; //increment count for word by 1
if (words[word] > maxwordcount) { //check if wordcount > maxwordcount
topwords.insert( topwords.begin(), make_pair(word, words[word]) ); //insert word into the front of the top words vector
}
}
wordcount++;
}
At the end of the program I want to display the top 10 or so words from the txt file. I checked whether the while loop was running by printing a live word count (cout). The number was going up, but extremely slowly. Also, I'm using huge books as my txt files.
I also don't completely understand inserting variables into maps and vectors, so something might be going wrong there.
I've hit a dead-end, so anything will help at this point.
I also tested with a smaller text file:
This is a small sentence to test test test
hey hey
Results:
word: 'This' word-count: 1
1
2
3
4
5
6
7
7
7
8
8
There were 11 words in the file.
There were 8 unique words in the file.
Top 20 words in little.txt:
hey 2
test 3
test 2
This 1
Segmentation fault
I know I'm doing something wrong, but I have no clue where to look next or what to test. I'm still an amateur at C++ (and at C too).
You should read the file line by line and process each line.
Read a file line by line: https://www.systutorials.com/how-to-process-a-file-line-by-line-in-c/
Split a sentence into words: https://www.geeksforgeeks.org/split-a-sentence-into-words-in-cpp/
https://www.w3schools.com/cpp/cpp_functions.asp
https://www.w3schools.com/cpp/cpp_function_param.asp
https://www.w3schools.com/cpp/cpp_function_return.asp
https://www.w3schools.com/cpp/cpp_pointers.asp <--- very important
https://www.w3schools.com/cpp/cpp_references.asp <--- likewise important
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
vector<std::pair <string, int> > topwords;
void store(vector<pair <string, int> > &topwords, string str){
auto pos = std::find_if(topwords.begin(), topwords.end(),
[str](std::pair<string, int> const &b) {
return b.first == str;
});
//std::cout<< pos->first << endl;
if(pos != topwords.end()){
std::cout << "word: " << pos->first << " " << pos->second << " found" << endl;
pos->second++;
}
else{
std::cout << "not found" << endl;
topwords.push_back( make_pair(str,1) );
}
}
void removeDupWord(string str)
{
// Used to split string around spaces.
istringstream ss(str);
// Traverse through all words
/*
do {
// Read a word
string word;
ss >> word;
// Print the read word
cout << word << endl;
store(topwords, word );
// While there is more to read
} while (ss);
*/
string word;
while (ss >> word) {
//cout << word << endl;
static const std::regex sanitized{ R"([-[\]{}()*+?.,\^$|#\s])" }; // compiled once, not on every word
std::stringstream result;
std::regex_replace(std::ostream_iterator<char>(result), word.begin(), word.end(), sanitized, "");
//store(topwords, word );
store(topwords, result.str() );
}
}
void readReadFile(string &fileName){
std::cout << "fileName: " << fileName << endl;
std::ifstream file(fileName);
std::string str;
while (std::getline(file, str)) {
//std::cout << str << "\n";
removeDupWord(str);
//store(topwords, str);
}
}
bool compareFunction (const std::pair<std::string, int> &a, const std::pair<std::string, int> &b) {
return a.first<b.first; // sort by letter
}
bool compareFunction2 (const std::pair<std::string, int> &a, const std::pair<std::string, int> &b) {
return a.second>b.second; // sort by count
}
bool cmp(pair<string, int> &A, pair<string, int> &B) {
return A.second < B.second;
}
void check(vector<pair <string, int> > &topwords){
std::pair<string, int> mostUsedWord = make_pair("",0);
for(auto ii : topwords){
std::cout << "word: " << ii.first << " count: " << ii.second << endl;
if(ii.second > mostUsedWord.second){
mostUsedWord.first = ii.first;
mostUsedWord.second = ii.second;
}
}
std::cout << "most used Word: " << mostUsedWord.first << " x " << mostUsedWord.second << " Times." << endl;
}
void get_higestTopTenValues(vector<pair <string, int> > &topwords){
std::sort(topwords.begin(),topwords.end(),compareFunction2);//sort the vector by count, highest first
if(topwords.empty()) return; // nothing to report (and front() would be undefined)
std::cout << "max: " << topwords.front().second << endl;
// after sorting, the first (at most) 10 entries are the most frequent words
for(size_t i = 0; i < topwords.size() && i < 10; ++i){
std::cout << topwords[i].first << " " << topwords[i].second << endl;
}
}
void get_LowestTopTenValues(vector<pair <string, int> > &topwords){
std::sort(topwords.begin(),topwords.end(),compareFunction2);//sort the vector by count, highest first
if(topwords.empty()) return; // nothing to report (and back() would be undefined)
std::cout << "min: " << topwords.back().second << endl;
// after sorting, the last (at most) 10 entries are the least frequent words
size_t start = topwords.size() > 10 ? topwords.size() - 10 : 0;
for(size_t i = start; i < topwords.size(); ++i){
std::cout << topwords[i].first << " " << topwords[i].second << endl;
}
}
int main ()
{
std::string word, fileName;
fileName = "input.txt";
readReadFile(fileName);
topwords.push_back( make_pair("ba",1) );
topwords.push_back( make_pair("bu",1) );
topwords.push_back( make_pair("hmmm",1) );
topwords.push_back( make_pair("what",1) );
topwords.push_back( make_pair("and",1) );
topwords.push_back( make_pair("hello",1) );
word = "hellos";
store(topwords, word);
store(topwords, word);
store(topwords, word);
store(topwords, word);
word = "hello";
store(topwords, word);
store(topwords, word);
store(topwords, word);
store(topwords, word);
store(topwords, word);
store(topwords, word);
std::sort(topwords.begin(),topwords.end(),compareFunction);//sort the vector by letter
// or
//std::sort(topwords.begin(),topwords.end(),compareFunction2);//sort the vector by count
std::cout << "---------------------------------------" << endl;
std::cout << " get all values" << endl;
std::cout << "---------------------------------------" << endl;
check(topwords);
std::cout << "---------------------------------------" << endl;
std::cout << " get the top 10 highest values" << endl;
std::cout << "---------------------------------------" << endl;
get_higestTopTenValues(topwords);
std::cout << "---------------------------------------" << endl;
std::cout << " get the top 10 lowest values" << endl;
std::cout << "---------------------------------------" << endl;
get_LowestTopTenValues(topwords);
}
This question was answered a long time ago. I stumbled over the question and its answer and found everything overly complicated.
Therefore I would like to add a more modern C++ solution making use of existing STL elements.
This makes the code more compact.
Please see below:
#include <iostream>
#include <utility>
#include <unordered_map>
#include <vector>
#include <algorithm>
#include <string>
#include <fstream>
#include <cctype>
const std::string fileName{"r:\\loremipsum.txt"};
int main() {
if (std::ifstream textFileStream{ fileName }; textFileStream) {
// Here we store the count of all words
std::unordered_map<std::string, size_t> counter{};
size_t countOfOverallWords{}; // Counter for the number of all words
// Read all words from file, remove punctuation, and count the occurrences
// (the loop's increment expression counts the word after the body has cleaned it)
for (std::string word; textFileStream >> word; counter[word]++) {
// cast to unsigned char: passing a negative char to std::ispunct is undefined behavior
word.erase(std::remove_if(word.begin(), word.end(), [](unsigned char c) { return std::ispunct(c) != 0; }), word.end());
++countOfOverallWords;
}
// For storing the top 10 (or fewer, if the file has fewer unique words)
std::vector<std::pair<std::string, size_t>> top(std::min<size_t>(10, counter.size()));
// Get top 10
std::partial_sort_copy(counter.begin(), counter.end(), top.begin(), top.end(),
[](const auto& p1, const auto& p2) { return p1.second > p2.second; });
// Now show result
std::cout << "Count of overall words:\t " << countOfOverallWords << "\nCount of unique words:\t " << counter.size() << "\n\nTop 10:\n";
for (const auto& t : top) std::cout << "Value: " << t.first << "\t Count: " << t.second << '\n';
}
else std::cerr << "\n\nError: Could not open source file '" << fileName << "'\n\n";
return 0;
}
Developed and tested with Microsoft Visual Studio Community 2019, Version 16.8.2.
Additionally compiled and tested with clang11.0 and gcc10.2 with flags --std=c++17 -Wall -Wextra -Wpedantic
Language: C++17
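As for why the original loop is so slow: maxwordcount is never updated in the else branch, so nearly every repeated word triggers topwords.insert(topwords.begin(), ...), and each such insert shifts every element already in the vector, which adds up to quadratic work on a large book. A sketch of the count-first, rank-once approach follows, under some assumptions: words are split on whitespace only, punctuation and case are left alone, and ties are broken alphabetically so the result is deterministic. topWords is a hypothetical name, and std::map is used instead of std::unordered_map only to keep the example predictable.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using WordCount = std::pair<std::string, std::size_t>;

// Reads whitespace-separated words from `in`, counts them, and returns at most `n`
// (word, count) pairs: highest count first, ties broken alphabetically.
std::vector<WordCount> topWords(std::istream& in, std::size_t n) {
    std::map<std::string, std::size_t> counts;
    for (std::string word; in >> word; ) {
        ++counts[word];  // no vector insertion inside the read loop
    }
    std::vector<WordCount> ranked(counts.begin(), counts.end());
    const std::size_t k = std::min(n, ranked.size());
    // only the first k positions need to be fully sorted
    std::partial_sort(ranked.begin(), ranked.begin() + k, ranked.end(),
                      [](const WordCount& a, const WordCount& b) {
                          if (a.second != b.second) return a.second > b.second;
                          return a.first < b.first;
                      });
    ranked.resize(k);
    return ranked;
}

// Convenience overload for in-memory text (handy for quick tests).
std::vector<WordCount> topWords(const std::string& text, std::size_t n) {
    std::istringstream in(text);
    return topWords(in, n);
}
```

For a file, you would pass an opened std::ifstream, e.g. topWords(txtfile, 10).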

How can I find the sequence number (index) of each word in a paragraph in C++?

I'm working on a project that needs to find the number of words in a paragraph and the index of each word. I have written code that counts the number of words in a string, but I'm stuck on finding the indices of the words.
For example: Hi John How are you I miss you ..
I need to print the indices: 0 1 2 3 4 5 6 7
Here is the code:
int _tmain(int argc, _TCHAR* argv[])
{
int count_words(std::string);
std::string input_text;
std::cout<< "Enter a text: ";
std::getline(std::cin,input_text);
int number_of_words=1;
int counter []={0};
for(int i = 0; i < input_text.length();i++)
if(input_text[i] == ' ')
number_of_words++;
std::cout << "Number of words: " << number_of_words << std::endl;
//std:: cout << number_of_words << std::endl;
system ("PAUSE");
}
Hopefully this helps. Edited to include use of count_words function.
#include <cctype>   // std::isalpha
#include <cstdlib>  // std::system
#include <iostream>
#include <sstream>
#include <string>
void count_words(std::string);
int main(){
std::string input_text;
std::cout<< "Enter a text: ";
std::getline(std::cin,input_text);
count_words(input_text);
std::system("PAUSE"); // Windows-only: keeps the console window open
return 0; // optional in main (reaching its end returns 0), but explicit here
}
void count_words(std::string inputString){
std::string output_text;
std::stringstream indexes; // character positions at which each word starts
int number_of_words=0; // start at 0 so that empty input reports 0 words
//int counter []={0}; //This serves no purpose.
if(!inputString.empty()){// test to make sure it isn't empty.
number_of_words++;
for(std::size_t i = 0; i < inputString.length();i++){ // For loops should have curly braces {} containing their statement.
if(inputString[i] == ' '){
number_of_words++;
}
// a word starts at a letter at position 0 or right after a space
// (checking i == 0 first avoids reading inputString[-1])
if(std::isalpha(static_cast<unsigned char>(inputString[i])) && (i == 0 || inputString[i-1] == ' ')){
indexes << i << " ";
}
}
}
output_text = indexes.str(); //convert stringstream to string
std::cout << "Number of words: " << number_of_words << std::endl;
//std:: cout << number_of_words << std::endl; //duplicate info
std::cout << "Indexes: " << output_text << std::endl;
}
I'm not sure I understand the question. Do you only need to print the indices, like this? (Based on your own code)
#include <iostream>
#include <vector>
#include <string>
void stringTokenizer(const std::string& str, const std::string& delimiter, std::vector<std::string>& tokens) {
size_t prev = 0, next = 0, len;
while ((next = str.find(delimiter, prev)) != std::string::npos) {
len = next - prev;
if (len > 0) {
tokens.push_back(str.substr(prev, len));
}
prev = next + delimiter.size();
}
if (prev < str.size()) {
tokens.push_back(str.substr(prev));
}
}
int main()
{
std::vector <std::string> split;
std::string input_text;
std::cout<< "Enter a text: ";
std::getline(std::cin,input_text);
stringTokenizer(input_text, " ", split);
int number_of_words = 0;
for (std::vector<std::string>::iterator it = split.begin(); it != split.end(); it++, number_of_words++) {
std::cout << *it << " " << number_of_words << std::endl;
}
}
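If what you need is each word's sequence number (0, 1, 2, ...) rather than its character position, a minimal sketch can let std::istringstream do the splitting, which also copes with repeated spaces. IndexedWord and indexWords are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One word found in the text: its zero-based sequence number and the word itself.
struct IndexedWord {
    std::size_t index;
    std::string word;
};

// Splits `text` on whitespace (runs of spaces count once) and numbers the words from 0.
std::vector<IndexedWord> indexWords(const std::string& text) {
    std::istringstream in(text);
    std::vector<IndexedWord> result;
    for (std::string word; in >> word; ) {
        result.push_back({result.size(), word});  // index = number of words stored so far
    }
    return result;
}
```

For the question's example, printing each element's index field gives 0 1 2 3 4 5 6 7, and the size of the returned vector is the word count.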

Counting how many times each word occurs in a file using map. (c++)

#include <iostream>
#include <fstream>
#include <cstdlib>
#include <string>
#include <map>
using namespace std;
int main()
{
ifstream fin;
fin.open("myTextFile.txt");
if ( fin.fail()){
cout << "Could not open input file.";
exit(1);
}
string next;
map <string, int> words;
while (fin >> next){
words[next]++;
}
cout << "\n\n" << "Number of words: " << words[next] << endl;
fin.close();
fin.open("myTextFile.txt");
while (fin >> next){
cout << next << ": " << words[next] << endl;
}
fin.close();
return 0;
}
My main problem is that when a word occurs more than once, it is also listed more than once, i.e. if the text starts with "hello hello", then cout produces:
"hello: 2" '\n' "hello: 2"
Also, I'd like not to have to close and then reopen the file for the second while loop to work. It seems like the stream is still at the end of the file from the last while loop.
You need to iterate through the map, and not open the file a second time.
Look at the code sample provided here.
EDIT: here is a code sample that iterates through a map:
// map::begin/end
#include <iostream>
#include <map>
int main ()
{
std::map<char,int> mymap;
mymap['b'] = 100;
mymap['a'] = 200;
mymap['c'] = 300;
// show content:
for (std::map<char,int>::iterator it=mymap.begin(); it!=mymap.end(); ++it)
std::cout << it->first << " => " << it->second << '\n';
return 0;
}
Here is the output:
a => 200
b => 100
c => 300
You don't need to reopen the file:
for (auto i = words.begin(); i != words.end(); i++)
{
cout << i->first << " : " << i->second << endl;
}
or simpler:
for (const auto &i : words)
{
cout << i.first << " : " << i.second << endl;
}
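Putting these answers together, a sketch that counts in a single pass and prints each distinct word once might look like this. It takes any std::istream, so you could pass the question's fin directly; wordReport is a hypothetical name, and the words come out in the map's sorted (lexicographic) order.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Counts each whitespace-separated word in `in` with a single read, then formats one
// "word: count" line per distinct word by iterating the map, so no word is repeated
// and the file never has to be reopened.
std::string wordReport(std::istream& in) {
    std::map<std::string, int> words;
    for (std::string next; in >> next; ) {
        ++words[next];
    }
    std::ostringstream out;
    for (const auto& entry : words) {
        out << entry.first << ": " << entry.second << '\n';
    }
    return out.str();
}
```

In the question's program, std::cout << wordReport(fin); would replace the second loop.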
You need to iterate over the map after filling it; then you do not need to open the file again. Here is a trivial example:
#include <iostream>
#include <map>
#include <string>
int main()
{
std::map<std::string, int> m1 ;
m1["hello"] = 2 ;
m1["world"] = 4 ;
for( const auto &entry : m1 )
{
std::cout << entry.first << " : " << entry.second << std::endl ;
}
}
The expected output would be:
hello : 2
world : 4