I am trying to optimize a text search, where I am searching for multiple words. I want to know the frequency of all the words, per line.
I have tried to make it as fast as I can, as I want to run the search many times, with multiple keywords, on the same data.
I still think, though, that there should be a more efficient way to solve this, so does anybody have good suggestions?
I have put up a simple demo to show the POC on gitlab:
https://gitlab.com/dkruithof/textfind
My current search time is 410 ms for 6 keywords in a dataset of 408 MB.
Also, the source of the demo is this:
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <map>
#include <algorithm>
#include <vector>
#include <chrono>
using namespace std;
unsigned int addWord(std::map<std::string, unsigned int>& wordLookup, std::string word)
{
std::transform(word.begin(), word.end(), word.begin(), ::tolower);
auto it = wordLookup.find(word);
unsigned int id;
if (it == wordLookup.end())
{
id = wordLookup.size(); //assign consecutive numbers using size()
wordLookup[word] = id;
}
else
{
id = it->second;
}
return id;
}
void tokenizeWords(std::map<std::string, unsigned int>& wordLookup, std::vector<unsigned int>& wordList, std::string& line)
{
static const char newsDelimiters[] = "., !?\"()'\n\r\t<>/\\";
// Copy into a mutable, null-terminated buffer for strtok
std::vector<char> str(line.size() + 1);
std::strcpy(str.data(), line.c_str());
// Getting the first token
char *token = strtok(str.data(), newsDelimiters);
while (token != NULL)
{
//finding a word:
unsigned int id = addWord(wordLookup, token);
wordList.push_back(id);
// Getting the next token
// If there are no tokens left, NULL is returned
token = strtok(NULL, newsDelimiters);
}
}
int main()
{
std::vector<std::vector<unsigned int>> textAsNumbers;
std::map<std::string, unsigned int> wordLookup;
std::vector<std::string> searchWords = {"this", "blog", "political", "debate", "climate", "iphone"};
unsigned int searchLength = searchWords.size();
std::vector<unsigned int> searchWordIds(searchLength);
//convert searchWords
unsigned int i = 0;
for(const std::string& word : searchWords)
{
searchWordIds[i] = addWord(wordLookup, word);
++i;
}
//#### This part is not time critical ####
//reading file and convert words to numbers
fstream newsFile;
newsFile.open("news.txt",ios::in);
if (newsFile.is_open())
{
string line;
while(getline(newsFile, line))
{
textAsNumbers.push_back(std::vector<unsigned int>());
std::vector<unsigned int>& wordList = *textAsNumbers.rbegin();
tokenizeWords(wordLookup, wordList, line);
}
newsFile.close();
}
//#### This part should be fast ####
auto start = std::chrono::system_clock::now();
std::vector<unsigned int> counts; //end result
counts.reserve(textAsNumbers.size());
for(std::vector<unsigned int>& line : textAsNumbers)
{
unsigned int count = 0;
for(unsigned int word : line)
{
for(unsigned int s = 0; s < searchLength; ++s)
{
unsigned int searchWord = searchWordIds[s];
if(word == searchWord)
{
++count;
}
}
}
counts.push_back(count);
}
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
cout << elapsed.count() << "ms" << endl;
//#### Print for checking result, time insensitive :)
int n = 0;
for(unsigned int count : counts)
{
cout << "Count[" << n << "]: " << count << endl;
++n;
if(n > 100)
{
break;
}
}
}
End results
I tried multiple approaches, and the scores are as follows:

Approach                    | User                     | Time
----------------------------|--------------------------|-------
Encoding words              | kcid42                   | 410 ms
Hash tables                 | Öö Tiib & Jérôme Richard | 135 ms
Ordered & encoded words     | A M                      | 13 ms
Hash tables & encoded words | Everybody                | 72 ms
I committed the results to my gitlab as well, if you want to check for yourself.
Analysis
Using hash tables to speed up the search is smart, and does indeed reduce the search time; better than my blunt approach, at least. But it still uses strings, and string comparison, construction, and hashing are rather slow.
A M's approach of speeding up the encoded word search is, I think, faster because of that.
I also tried combining the approaches, using hash tables and encoded words together, but that was still slower than A M's custom search.
So I think we learned that A M is pretty good at searching stuff.
Thanks everybody for your input!
If you just want to speed up the part that you marked, then you can get a drastic improvement by sorting all vectors before you enter this loop.
The searching will then be really fast.
The runtime of the loop is reduced from 490 ms to 10 ms.
Can you please check and give feedback?
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <map>
#include <algorithm>
#include <vector>
#include <chrono>
unsigned int addWord(std::map<std::string, unsigned int>& wordLookup, std::string word)
{
std::transform(word.begin(), word.end(), word.begin(), ::tolower);
auto it = wordLookup.find(word);
unsigned int id;
if (it == wordLookup.end())
{
id = wordLookup.size(); //assign consecutive numbers using size()
wordLookup[word] = id;
}
else
{
id = it->second;
}
return id;
}
void tokenizeWords(std::map<std::string, unsigned int>& wordLookup, std::vector<unsigned int>& wordList, std::string line)
{
static const char newsDelimiters[] = "., !?\"()'\n\r\t<>/\\";
// Getting the first token
#pragma warning(suppress : 4996)
char* token = strtok(line.data(), newsDelimiters);
while (token != NULL)
{
//finding a word:
unsigned int id = addWord(wordLookup, token);
wordList.push_back(id);
// Getting the next token
// If there are no tokens left, NULL is returned
#pragma warning(suppress : 4996)
token = strtok(NULL, newsDelimiters);
}
}
int main()
{
std::vector<std::vector<unsigned int>> textAsNumbers;
std::map<std::string, unsigned int> wordLookup;
std::vector<std::string> searchWords = { "this", "blog", "political", "debate", "climate", "iphone" };
unsigned int searchLength = searchWords.size();
std::vector<unsigned int> searchWordIds(searchLength);
//convert searchWords
unsigned int i = 0;
for (const std::string& word : searchWords)
{
searchWordIds[i] = addWord(wordLookup, word);
++i;
}
std::sort(searchWordIds.begin(), searchWordIds.end());
//#### This part is not time critical ####
//reading file and convert words to numbers
std::fstream newsFile;
newsFile.open("r:\\news.txt", std::ios::in);
if (newsFile.is_open())
{
std::string line;
while (std::getline(newsFile, line))
{
textAsNumbers.push_back(std::vector<unsigned int>());
std::vector<unsigned int>& wordList = *textAsNumbers.rbegin();
tokenizeWords(wordLookup, wordList, line);
std::sort(textAsNumbers.back().begin(), textAsNumbers.back().end());
}
newsFile.close();
}
#if 1
std::vector<unsigned int>::iterator last2 = searchWordIds.end();
//#### This part should be fast ####
auto start = std::chrono::system_clock::now();
std::vector<unsigned int> counts; //end result
counts.reserve(textAsNumbers.size());
for (std::vector<unsigned int>& line : textAsNumbers)
{
unsigned int count = 0;
std::vector<unsigned int>::iterator first1 = line.begin();
std::vector<unsigned int>::iterator last1 = line.end();
std::vector<unsigned int>::iterator first2 = searchWordIds.begin();
while (first1 != last1 && first2 != last2) {
if (*first1 < *first2) {
++first1;
}
else {
if (!(*first2 < *first1)) {
++count;
++first1;
}
else
++first2;
}
}
counts.push_back(count);
}
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << elapsed.count() << "ms\n";
#else
auto start = std::chrono::system_clock::now();
std::vector<unsigned int> counts; //end result
counts.reserve(textAsNumbers.size());
for ( std::vector<unsigned int>& line : textAsNumbers)
{
unsigned int count = 0;
for (unsigned int word : line)
{
for (unsigned int s = 0; s < searchLength; ++s)
{
unsigned int searchWord = searchWordIds[s];
if (word == searchWord)
{
++count;
}
}
}
counts.push_back(count);
}
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << elapsed.count() << "ms\n";
#endif
//#### Print for checking result, time insensitive :)
int n = 0;
for (unsigned int count : counts)
{
std::cout << "Count[" << n << "]: " << count << '\n';
++n;
if (n > 100)
{
break;
}
}
}
Edit:
We can make the overall program much faster by optimizing the design:
increase the IO buffer size
read the whole file in one shot (not line by line)
use a special encoding for the characters: convert all non-essential characters to a SPACE, which makes comparison really fast
use a special identifier for end-of-line, count it, and with that get the number of lines
store all words as std::string_view
use std::string_view also as the key for the dictionary's hash map
build the hash map in the same loop in which words and end-of-lines are identified, which avoids duplicating work
build rows of IDs for the words, so that we can compare single integers instead of strings
sort all those rows with all encoded words, which makes comparing very fast
use an optimized search-and-compare algorithm to count the matches per line
All this reduces the runtime of the whole program from the original roughly 40 s to ~4.5 s, so nearly ten times faster.
We can see some astonishing results here:
Reading 430 MB in 189 ms
Converting all this data in 90 ms
Counting the number of lines in 80 ms
Building a hash map with 284k entries in 3.6 s
Sorting 5000 lines, each with many entries, in an unbelievable 367 ms
Doing the matching and counting in 13 ms
Please see an example of the output. I use an 11-year-old Windows 7 machine.
And the code:
#include <iostream>
#include <fstream>
#include <string>
#include <chrono>
#include <filesystem>
#include <cstdint>
#include <array>
#include <execution>
#include <unordered_map>
#include <string_view>
// Basic definitions for data types
using MyChar = uint8_t;
using EncoderType = unsigned int;
// Dependent data types
using String = std::basic_string<MyChar, std::char_traits<MyChar>, std::allocator<MyChar>>;
using StringView = std::basic_string_view<MyChar, std::char_traits<MyChar>>;
using IFStream = std::basic_ifstream<MyChar, std::char_traits<MyChar>>;
using Dictionary = std::unordered_map<StringView, EncoderType>;
using DictionaryIter = Dictionary::iterator;
using EncodedLine = std::vector<EncoderType>;
using EncodedLineIter = EncodedLine::iterator;
using EncodedLines = std::vector<EncodedLine>;
using SearchWords = std::vector<StringView>;
using SearchWordsEncoded = EncodedLine;
using CounterForMatchesInOneLine = std::size_t;
using CounterForMatchesForEachLineLine = std::vector<CounterForMatchesInOneLine>;
StringView operator"" _msv(const char* str, std::size_t len) { return StringView{ reinterpret_cast<const MyChar*>(str), len }; };
// Special encoding of values in text
constexpr MyChar SPACE = 254;
constexpr MyChar EOL = 255;
constexpr std::array<MyChar, 256> Convert{ SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,EOL,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,48,49,50,51,52,53,54,55,56,57,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE
,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE
,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE
,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE,SPACE };
// Speed up reading of file by using larger input buffer
constexpr std::size_t IOBufSize = 5'000'000u;
static MyChar ioBuf[IOBufSize];
// For measuring durations
struct Timer {
std::chrono::time_point<std::chrono::high_resolution_clock> startTime{};
long long elapsedTime{};
void start() { startTime = std::chrono::high_resolution_clock::now(); }
void stop() { elapsedTime = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - startTime).count(); }
friend std::ostream& operator << (std::ostream& os, const Timer& t) { return os << t.elapsedTime << " ms "; }
};
// Main Programm
int main() {
Timer t{}, tAll{}; tAll.start(); // Define Timers
Dictionary dictionary(300000); // The dictionary for words and their encoded IDs
EncoderType encodedWordIdentifier{}; // This is for encoding strings. It will be simply incremented for each new word
// The words that we want to search. We use string_views for more efficient processing
SearchWords searchWords{ "this"_msv, "blog"_msv, "political"_msv, "debate"_msv, "climate"_msv, "iphone"_msv };
// And here we will store the encoded search words
SearchWordsEncoded searchWordsEncoded{};
// Add words to dictionary
for (const StringView& searchWord : searchWords) {
dictionary[searchWord] = encodedWordIdentifier;
searchWordsEncoded.push_back(encodedWordIdentifier++);
}
// Now read the complete text file and start all data processing
// Open file and check, if it could be opened
if (IFStream ifs{ "r:\\news.txt",std::ios::binary }; ifs) {
// To speed up reading of the file, we will set a bigger input buffer
ifs.rdbuf()->pubsetbuf(ioBuf, IOBufSize);
// Here we will store the complete file, all data
String text{};
// Get number of bytes in file
const std::uintmax_t size = std::filesystem::file_size("r:\\news.txt");
text.resize(size);
// Read the whole file with one statement. Will be ultrafast
t.start();
ifs.read(text.data(), size);
t.stop(); std::cout << "Duration for reading complete file:\t\t\t\t" << t << "\tData read: " << ifs.gcount() << " bytes\n";
// Now convert characters: set all non-essential characters to SPACE, build lowercase text, use a special mark for end of line
t.start();
std::transform(std::execution::par, text.begin(), text.end(), text.begin(), [&](const MyChar c) {return Convert[c]; });
t.stop(); std::cout << "Duration for converting all text data:\t\t\t\t" << t << '\n';
// Count the number of lines. We need this to pre-allocate space for our vectors
t.start();
std::size_t numberOfLines = std::count(std::execution::par, text.begin(), text.end(), EOL);
if (text.back() == EOL) ++numberOfLines;
t.stop(); std::cout << "Duration for counting number of lines:\t\t\t\t" << t << "\tNumber of lines identified: " <<numberOfLines << '\n';
// Now we can define the vector for the encoded lines with the exact needed size
EncodedLines encodedLines(numberOfLines);
// Start building the hash map. We will store string_views to optimize space
std::size_t wordLength{}; // Length of word that will be added to the hash map
MyChar* startWord{}; // Start position (in the overall text) of the word to be added
bool waitForWord{ true }; // Mini state machine. Either we wait for start of word or its end
std::size_t index{}; // This will be used for addressing the current line
t.start();
// Iterate over all characters from the text file
for (MyChar& c : text) {
if (waitForWord) { // If we are in state of waiting for the beginning of the next word
if (c & 0b1000'0000) { // if the character is either SPACE or EOL, continue to wait
if (c == EOL) ++index; // If we found an end of line, then we will address the next line from now on
}
else { // Else, we found a character, so the beginning of a new word
startWord = &c; // Remember start position (in complete text file) of word
wordLength = 1; // The word length is now already 1, because we have found the first character
waitForWord = false; // From now on we are "in" a word and wait for the end of the word, the next SPACE or EOL
}
}
else { // If we are in state of waiting for the end of the word
if (c & 0b1000'0000) { // If we have found a SPACE or EOL, then we found the end of a word
const StringView wordAsStringView{ startWord, wordLength }; // Build a string_view of the word
EncoderType currentEncodedWordIdentifier{ encodedWordIdentifier }; // Temporary; holds the ID that will be used for this word
// Either add to the dictionary or use the existing encoding ID
if (DictionaryIter entry = dictionary.find(wordAsStringView); entry != dictionary.end())
currentEncodedWordIdentifier = entry->second; // Already existing ID found. use it
else
dictionary[wordAsStringView] = encodedWordIdentifier++; // Create new entry in the hash map
encodedLines[index].push_back(currentEncodedWordIdentifier);
if (c == EOL) ++index; // If we have read an EOL, we will now address the next line
waitForWord = true; // We will change the state and from now on wait for the beginning of the next word again
}
else
++wordLength; // If we are in state of waiting for the end of the word and found a normal character, increment word length counter
}
}
t.stop(); std::cout << "Duration for building the dictionary and encode the lines:\t" << t << "Number of hashes : " << dictionary.size() << '\n';
// Sort all rows with encoded word IDs. Will be very fast
t.start();
std::for_each(std::execution::par, encodedLines.begin(), encodedLines.end(), [](std::vector<unsigned int>& encodedLine) { std::sort(encodedLine.begin(), encodedLine.end()); });
t.stop(); std::cout << "Duration for sorting all line id encodings:\t\t\t" << t << '\n';
// Now, we will count, how often a search word appears in a line
CounterForMatchesForEachLineLine counterForMatchesForEachLineLine{}; // Vector of match-counters, one per line
counterForMatchesForEachLineLine.reserve(numberOfLines); // Preallocate memory
const EncodedLineIter searchWordsEnd = searchWordsEncoded.end(); // Pointer to search word vector end
t.start();
for (EncodedLine& encodedLine : encodedLines) // For all lines
{
CounterForMatchesInOneLine counterForMatchesInOneLine{}; // Counter for matches in current line
EncodedLineIter encodedLineCurrent = encodedLine.begin(); // Pointer to encoded value for current line
const EncodedLineIter encodedLineEnd = encodedLine.end(); // Pointer to last encoded value for current line
EncodedLineIter searchWordCurrent = searchWordsEncoded.begin(); // Pointer to beginning of search word IDs
// Compare and search. Take advantage of sorted IDs
while (encodedLineCurrent != encodedLineEnd && searchWordCurrent != searchWordsEnd) {
if (*encodedLineCurrent < *searchWordCurrent) {
++encodedLineCurrent;
}
else {
if (!(*searchWordCurrent < *encodedLineCurrent)) {
++counterForMatchesInOneLine;
++encodedLineCurrent;
}
else
++searchWordCurrent;
}
}
// Number of matches in this line has been detected. Store count for this line and continue with next line
counterForMatchesForEachLineLine.push_back(counterForMatchesInOneLine);
}
t.stop(); std::cout << "Duration for searching, comparing and counting:\t\t\t" << t << '\n';
tAll.stop(); std::cout << "\n\nDuration Program processing overall: " << tAll << '\n';
// Debug output
std::cout << "\n\nDemo Result. First 100 counts of matches:\n";
int lineCounter{};
for (CounterForMatchesInOneLine counterForMatchesInOneLine : counterForMatchesForEachLineLine)
{
std::cout << "Count[" << lineCounter++ << "]: " << counterForMatchesInOneLine << '\n';
if (lineCounter > 100) break;
}
}
else
std::cerr << "\n***Error: Could not open file\n";
}
I'd try building a radix tree (https://en.wikipedia.org/wiki/Radix_tree) that contains all your search words. When processing each line of text you then only need to maintain one pointer into the radix tree per character position, and advance all of them with every additionally consumed character (or remove a pointer if its character sequence can no longer reach a valid word). Whenever an advanced pointer reaches the end of a word, you increment your counter.
This shouldn't require any tokenization.
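For illustration, here is a minimal sketch of that multi-cursor scan. It uses a plain trie (one node per character) rather than a compressed radix tree, assumes lowercase ASCII input, and counts substring matches (so "this" would also be counted inside "thistle"); all names are illustrative and not taken from the question's code:

#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    std::unique_ptr<TrieNode> child[26]{};
    bool isWord = false;
};

void insert(TrieNode& root, const std::string& word) {
    TrieNode* node = &root;
    for (char c : word) {
        auto& next = node->child[c - 'a'];
        if (!next) next = std::make_unique<TrieNode>();
        node = next.get();
    }
    node->isWord = true;
}

// One cursor per possible match start; all cursors advance together and are
// dropped once their character sequence can no longer reach a valid word.
unsigned int countMatches(const TrieNode& root, const std::string& line) {
    unsigned int count = 0;
    std::vector<const TrieNode*> cursors;
    for (char c : line) {
        cursors.push_back(&root); // a new match may start at this position
        std::vector<const TrieNode*> alive;
        if (c >= 'a' && c <= 'z') {
            for (const TrieNode* node : cursors) {
                if (const TrieNode* next = node->child[c - 'a'].get()) {
                    if (next->isWord) ++count; // a search word ends here
                    alive.push_back(next);
                }
            }
        } // any other character drops all cursors, acting as a delimiter
        cursors.swap(alive);
    }
    return count;
}

int main() {
    TrieNode root;
    const std::vector<std::string> words{ "this", "blog", "political", "debate", "climate", "iphone" };
    for (const std::string& w : words) insert(root, w);
    std::cout << countMatches(root, "this blog covers the climate debate") << '\n'; // prints 4
}

For a large number of patterns, an Aho-Corasick automaton would avoid keeping one cursor per position altogether.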
You do not need to iterate over all the searchWordIds items. Assuming this array does not contain any duplicates, you can use a hash table so that the algorithm runs in O(n²) time rather than O(n³) time (thanks to an O(1) search in searchWordIds). More specifically, an std::unordered_set<int> can be used to check whether word is in searchWordIds in constant time. You need to convert searchWordIds to an std::unordered_set<int> first. If the array does have duplicates, then you can use an std::unordered_map<int, int> to store the number of duplicates associated with a given word; in that case the two nested loops reduce to doing count += duplicateCounts[word] with that map.
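As a concrete sketch (assuming the question's textAsNumbers and searchWordIds are already filled), the marked loop then becomes:

#include <unordered_set>
#include <vector>

// Build the set once, outside the timed loop; lookups are O(1) on average.
std::unordered_set<unsigned int> searchSet(searchWordIds.begin(), searchWordIds.end());

std::vector<unsigned int> counts; // end result, one count per line
counts.reserve(textAsNumbers.size());
for (const std::vector<unsigned int>& line : textAsNumbers)
{
    unsigned int count = 0;
    for (unsigned int word : line)
        count += searchSet.count(word); // 1 if word is a search word, 0 otherwise
    counts.push_back(count);
}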
If this is not enough, you can use a Bloom filter to speed up the lookup in searchWordIds. Indeed, this probabilistic data structure can very quickly tell when word is not in searchWordIds (100% sure) or say that it is probably in it (with good accuracy, assuming the Bloom filter is sufficiently large). This should be at least twice as fast, possibly even more (unordered_set and unordered_map are generally not very efficient, partially due to the use of linked-list-based buckets and slow hash management).
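A hedged sketch of such a filter over the integer word IDs; the bit count and the two mixing functions (splitmix64-style constants) are arbitrary choices for illustration, not something prescribed by this answer:

#include <bitset>
#include <cstdint>

struct BloomFilter {
    static constexpr std::size_t kBits = 1u << 16; // size/accuracy trade-off, tune as needed
    std::bitset<kBits> bits;

    // Two cheap integer mixers used as independent hash functions (illustrative)
    static std::size_t h1(std::uint64_t x) {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL; x ^= x >> 33;
        return static_cast<std::size_t>(x % kBits);
    }
    static std::size_t h2(std::uint64_t x) {
        x ^= x >> 29; x *= 0x94d049bb133111ebULL; x ^= x >> 32;
        return static_cast<std::size_t>(x % kBits);
    }

    void insert(unsigned int id) { bits.set(h1(id)); bits.set(h2(id)); }
    // false => definitely not present; true => possibly present
    bool maybeContains(unsigned int id) const { return bits.test(h1(id)) && bits.test(h2(id)); }
};

// Usage: reject most non-matching words cheaply, confirm the rest exactly.
// for (unsigned int word : line)
//     if (bloom.maybeContains(word) && searchSet.count(word) != 0)
//         ++count;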
If this is still not enough, you can parallelize the outermost loop. The idea is to compute a local count value for each section of the textAsNumbers array and then perform a final reduction. This assumes the size of the sub-arrays is relatively uniform (it will not scale well if one line is much, much bigger than all the others). You can flatten the vector<vector<int>> to better load-balance the work, and likely even improve sequential performance (due to fewer indirections and likely fewer cache misses).
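For instance, with C++17 parallel algorithms (a sketch reusing the searchSet from the snippet above; since each line's count goes into its own slot, no reduction over shared state is needed here):

#include <algorithm>
#include <execution>
#include <unordered_set>
#include <vector>

std::vector<unsigned int> counts(textAsNumbers.size()); // one independent slot per line
std::transform(std::execution::par,
               textAsNumbers.begin(), textAsNumbers.end(), counts.begin(),
               [&searchSet](const std::vector<unsigned int>& line) {
                   unsigned int count = 0;
                   for (unsigned int word : line)
                       count += searchSet.count(word); // concurrent reads of the set are safe
                   return count;
               });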
In practice I would perhaps serialize the whole text into an std::unordered_map<std::string, int>, where the string is a word and the int is the count of that word in the text. That operation is about O(X), where X is the count of all words in the text, assuming individual words are short enough that hashing them does not dominate. You said it is not time critical ... but just for the record.
After that, searching for a word in it is O(1), assuming again that a "word" means a relatively short string, and we already have the counts of those words. If you have a list of words to search, then it is O(N), where N is the length of the list.
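A minimal, self-contained sketch of that idea; the whitespace-only tokenization via std::istringstream is a simplification, not the question's delimiter set:

#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    const std::string text = "this blog debates climate this iphone blog this";

    // One pass over the text: word -> number of occurrences, roughly O(X).
    std::unordered_map<std::string, int> wordCounts;
    std::istringstream stream(text);
    std::string word;
    while (stream >> word)
        ++wordCounts[word];

    // Each query word is then an O(1) average-case lookup.
    const std::vector<std::string> queries{ "this", "blog", "iphone", "debate" };
    for (const std::string& q : queries) {
        auto it = wordCounts.find(q);
        std::cout << q << ": " << (it == wordCounts.end() ? 0 : it->second) << '\n';
    }
}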