String.erase giving out_of_range exception - c++

I was supposed to write a program that reads text from a text file and erases given words.
Unfortunately, something is wrong with this particular part of the code; I get the following output and exception:
This text is just a sample, based on other text
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::erase
I guess there is something wrong with the way I use erase. I'm trying to use a do-while loop: on each pass, find the beginning of the word that should be erased, and then erase the text starting at that position, using the word's length as the number of characters to erase.
#include <iostream>
#include <string>
using namespace std;
void eraseString(string &str1, string &str2) // str1 - text, str2 - phrase
{
size_t positionOfPhrase = str1.find(str2);
if(positionOfPhrase == string::npos)
{
cout <<"Phrase hasn't been found... at all"<< endl;
}
else
{
do{
positionOfPhrase = str1.find(str2, positionOfPhrase + str2.size());
str1.erase(positionOfPhrase, str2.size());//**IT's PROBABLY THE SOURCE OF PROBLEM**
}while(positionOfPhrase != string::npos);
}
}
int main(void)
{
string str("This text is just a sample text, based on other text");
string str0("text");
cout << str;
eraseString(str, str0);
cout << str;
}

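Your function is wrong. It is unclear why you call find twice in a row: the second call skips the occurrence that was just found, and when it returns string::npos, that value is passed straight to erase, which throws std::out_of_range.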
Try the following code.
#include <iostream>
#include <string>
std::string & eraseString( std::string &s1, const std::string &s2 )
{
std::string::size_type pos = 0;
while ( ( pos = s1.find( s2, pos ) ) != std::string::npos )
{
s1.erase( pos, s2.size() );
}
return s1;
}
int main()
{
std::string s1( "This text is just a sample text, based on other text" );
std::string s2( "text" );
std::cout << s1 << std::endl;
std::cout << eraseString( s1, s2 ) << std::endl;
return 0;
}
The program output is
This text is just a sample text, based on other text
This  is just a sample , based on other 

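I think your trouble is that positionOfPhrase inside the do loop can be string::npos, in which case erase throws an exception. This can be fixed by checking the result of find before calling erase. Note also that the second find skips the occurrence that was already found, and that erase shifts the rest of the string to the left, so the next search should start at positionOfPhrase itself:
while (true) {
    str1.erase(positionOfPhrase, str2.size()); // positionOfPhrase is a valid index here
    positionOfPhrase = str1.find(str2, positionOfPhrase);
    if (positionOfPhrase == string::npos) break;
}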

Related

How to remove stopwords from a vector of sentences?

I am working on some code that requires stopwords to be removed from sentences. My current solution does not work.
I have a vector of two test sentences:
std::vector<std::string> sentences = {"this is a test", "another a test"};
I have an unordered set of strings containing stopwords:
std::unordered_set<std::string> stopwords;
Now I tried to loop over the sentences in the vector and compare each word with the stopwords; if a word is a stopword, it should be removed.
sentences.erase(std::remove_if(sentences.begin(), sentences.end(),
[](const std::string &s){return stopwords.find(s) != stopwords.end();}),
sentences.end());
The idea is that, after removing the stopwords, my vector contains the sentences without the stopwords, but right now I get exactly the same sentences back. Any idea why?
My unordered set is filled with the following function:
void load() {
std::ifstream file;
file.open ("stopwords.txt");
if(!file.is_open()) {return;}
std::string stopword;
while (file >> stopword) {
stopwords.insert(stopword);
}
}
Your current code cannot work, since you are not deleting words from each individual string. Your erase/remove_if call takes each entire sentence and checks whether that whole sentence is in the set, so a sentence is removed only if it is exactly equal to a stopword.
First, you should write a simple function that, given a std::string and a set of words to delete, returns the string with those words removed.
Here is a small function using std::istringstream that can do this:
#include <unordered_set>
#include <sstream>
#include <string>
#include <iostream>
std::string remove_stop_words(const std::string& src, const std::unordered_set<std::string>& stops)
{
std::string retval;
std::istringstream strm(src);
std::string word;
while (strm >> word)
{
if ( !stops.count(word) )
retval += word + " ";
}
if ( !retval.empty())
retval.pop_back();
return retval;
}
int main()
{
std::string test = "this is a test";
std::unordered_set<std::string> stops = {"is", "test"};
std::cout << "Changed word:\n" << remove_stop_words(test, stops) << "\n";
}
Output:
Changed word:
this a
So once you have this working correctly, the std::vector version is nothing more than looping through each item in the vector and calling the remove_stop_words function:
#include <vector>
// remove_stop_words() is defined in the previous example
int main()
{
std::vector<std::string> test = {"this is a test", "another a test"};
std::unordered_set<std::string> stops = {"is", "test"};
for (size_t i = 0; i < test.size(); ++i)
test[i] = remove_stop_words(test[i], stops);
std::cout << "Changed words:\n";
for ( auto& s : test )
std::cout << s << "\n";
}
Output:
Changed words:
this a
another a
Note that you can use std::transform to replace the hand-rolled loop in the above example:
#include <algorithm>
//...
int main()
{
std::vector<std::string> test = {"this is a test", "another a test"};
std::unordered_set<std::string> stops = {"is", "test"};
// Use std::transform
std::transform(test.begin(), test.end(), test.begin(),
[&](const std::string& s){return remove_stop_words(s, stops);});
std::cout << "Changed words:\n";
for ( auto& s : test )
std::cout << s << "\n";
}

How to run a string search algorithm through whole body of text

I am using a brute-force string search algorithm to search a short sentence. However, I want the algorithm to report every occurrence of the string instead of finding it once and then stopping.
//Declare and initialise variables
string pat, text;
text = "This is a test sentence, find test within this string";
cout << text << endl;
//User input for pat
cout << "Please enter the string you want to search for" << endl;
cin >> pat;
//Set the length of the pat and text
int patLength = pat.size();
int textLength = text.size();
//Algorithm
for (int i = 0; i < textLength - patLength; ++i)
{
//Do while loop to run through the whole text
do
{
int j;
for (j = 0; j < patLength; j++)
{
if (text[i + j] != pat[j])
break; // Doesn't match here.
}
if (j == patLength)
{
finds.push(i); // Matched here.
}
} while (i < textLength);
}
//Print output
cout << "String: " << pat << " was found at positions: " << finds.top();
The program stores each match in a queue. When I run it, it asks for pat and then does nothing. I have done a bit of debugging and found that the problem is probably the do-while loop, but I can't find a fix.
You could use the std::string::find function combined with a function that you call for each find.
#include <iostream>
#include <functional>
#include <vector>
#include <string>
void Algorithm(
const std::string& text, const std::string& pat,
std::function<void(const std::string&,size_t)> f, std::vector<size_t>& positions)
{
size_t pos=0;
while((pos=text.find(pat, pos)) != std::string::npos) {
// store the position
positions.push_back(pos);
// call the supplied function
f(text, pos++);
}
}
// function to call for each position in which the pattern is found
void gotit(const std::string& found_in, size_t pos) {
std::cout << "Found in \"" << found_in << "\" # " << pos << "\n";
}
int main(int argc, char* argv[]) {
std::vector<std::string> args(argv+1, argv+argc);
if(args.size()==0)
args.push_back("This is a test sentence, find test within this string");
for(const auto& text : args) {
std::vector<size_t> found_at;
std::cout << "Please enter the string you want to search for: ";
std::string pat;
std::cin >> pat;
Algorithm(text, pat, gotit, found_at);
std::cout << "collected positions:\n";
for(size_t pos : found_at) {
std::cout << pos << "\n";
}
}
}
My first bit of advice would be to structure your code into separate functions.
Let's say you have a function that returns the position of the pattern's first occurrence in a sequence of characters:
using position = typename std::string::const_iterator;
position first_occurrence(position text_begin, position text_end, const std::string& pattern);
If there is no more occurrence of the pattern, it returns text_end.
You can now write a very simple loop:
auto occurrence = first_occurrence(text_begin, text_end, pattern);
while (occurrence != text_end) {
    occurrences.push_back(occurrence);
    occurrence = first_occurrence(occurrence + 1, text_end, pattern);
}
to accumulate all the occurrences of the pattern.
The first_occurrence function already exists in the standard library under the name std::search. Since C++17, you can customize it with specialized pattern searchers such as std::boyer_moore_searcher, which preprocesses the pattern so that it can be found faster in the string. Here's an example applied to your problem:
#include <algorithm>
#include <string>
#include <vector>
#include <functional>
using occurrence = typename std::string::const_iterator;
std::vector<occurrence> find_occurrences(const std::string& input, const std::string& pattern) {
auto engine = std::boyer_moore_searcher(pattern.begin(), pattern.end());
std::vector<occurrence> occurrences;
auto it = std::search(input.begin(), input.end(), engine);
while (it != input.end()) {
occurrences.push_back(it);
it = std::search(std::next(it), input.end(), engine);
}
return occurrences;
}
#include <iostream>
int main() {
std::string text = "This is a test sentence, find test within this string";
std::string pattern = "st";
auto occs = find_occurrences(text, pattern);
for (auto occ: occs) std::cout << std::string(occ, std::next(occ, pattern.size())) << std::endl;
}

Difficulties with string declaration/reference parameters (c++)

Last week I got a homework assignment to write a function that takes a string and a char value and divides the string into two parts: before and after the first occurrence of that char.
The code worked, but my teacher told me to do it again because it is not well-written code, and I don't understand how to make it better. I understand that initializing the two strings with spaces is not good, but otherwise I get out-of-bounds exceptions. Since the input string changes, its size is different every time.
#include <iostream>
#include <string>
using namespace std;
void divide(char search, string text, string& first_part, string& sec_part)
{
bool firstc = true;
int counter = 0;
for (int i = 0; i < text.size(); i++) {
if (text.at(i) != search && firstc) {
first_part.at(i) = text.at(i);
}
else if (text.at(i) == search&& firstc == true) {
firstc = false;
sec_part.at(counter) = text.at(i);
}
else {
sec_part.at(counter) = text.at(i);
counter++;
}
}
}
int main() {
string text;
string part1=" ";
string part2=" ";
char search_char;
cout << "Please enter text? ";
getline(cin, text);
cout << "Please enter a char: ? ";
cin >> search_char;
divide(search_char,text,part1,part2);
cout << "First string: " << part1 <<endl;
cout << "Second string: " << part2 << endl;
system("PAUSE");
return 0;
}
I would suggest that you learn to use the C++ standard library functions. There are plenty of utility functions that can help you in programming.
#include <algorithm>
#include <string>
void divide(const std::string& text, char search, std::string& first_part, std::string& sec_part)
{
std::string::const_iterator pos = std::find(text.begin(), text.end(), search);
first_part.append(text, 0, pos - text.begin());
sec_part.append(text, pos - text.begin());
}
int main()
{
std::string text = "thisisfirst";
char search = 'f';
std::string first;
std::string second;
divide(text, search, first, second);
}
Here I used std::find; it is worth reading its documentation, and about iterators in general.
You have some other mistakes. You are passing text by value, which makes a copy every time you call your function. Pass it by reference instead, and qualify it with const to indicate that it is an input parameter, not an output.
Why is your teacher right?
The fact that you need to initialize your destination strings with spaces is terrible:
If the input string is longer, you'll get out-of-bounds errors.
If it's shorter, you get a wrong answer, because in IT and programming, "It works " is not the same as "It works".
In addition, your code does not meet the specification: it should work every time, regardless of what is currently stored in your output strings.
Alternative 1: your code but working
Just clear the destination strings at the beginning. Then iterate as you did, but use += or push_back() to add chars at the end of the string.
void divide(char search, string text, string& first_part, string& sec_part)
{
bool firstc = true;
first_part.clear(); // make destinations strings empty
sec_part.clear();
for (int i = 0; i < text.size(); i++) {
char c = text.at(i);
if (firstc && c != search) {
first_part += c;
}
else if (firstc && c == search) {
firstc = false;
sec_part += c;
}
else {
sec_part += c;
}
}
}
I used a temporary c instead of text.at(i) or text[i] to avoid indexing the string several times. But this is not really required: nowadays, optimizing compilers should produce equivalent code whichever variant you use here.
Alternative 2: use string member functions
This alternative uses the find() function and then constructs one string from the start up to that position and another from that position to the end. There is a special case when the character is not found.
void divide(char search, string text, string& first_part, string& sec_part)
{
auto pos = text.find(search);
first_part = string(text, 0, pos);
if (pos== string::npos)
sec_part.clear();
else sec_part = string(text, pos, string::npos);
}
As you noticed yourself, these declarations
string part1=" ";
string part2=" ";
do not make sense, because the string entered into text can easily be longer than both initialized strings, in which case the at method throws an exception; if it is shorter, the strings end up with trailing spaces.
From the description of the assignment, it is not clear whether the searched character should be included in either of the strings. You assume that the character should be included in the second string.
Note that the parameter text should be declared as a constant reference.
Also, instead of using loops, it is better to use member functions of std::string such as find.
The function can look like this:
#include <iostream>
#include <string>
void divide(const std::string &text, char search, std::string &first_part, std::string &sec_part)
{
std::string::size_type pos = text.find(search);
first_part = text.substr(0, pos);
if (pos == std::string::npos)
{
sec_part.clear();
}
else
{
sec_part = text.substr(pos);
}
}
int main()
{
std::string text("Hello World");
std::string first_part;
std::string sec_part;
divide(text, ' ', first_part, sec_part);
std::cout << "\"" << text << "\"\n";
std::cout << "\"" << first_part << "\"\n";
std::cout << "\"" << sec_part << "\"\n";
}
The program output is
"Hello World"
"Hello"
" World"
As you can see, the separating character is included in the second string, though I think it might be better to exclude it from both strings.
An alternative, and in my opinion clearer, approach can look like this:
#include <iostream>
#include <string>
#include <utility>
std::pair<std::string, std::string> divide(const std::string &s, char c)
{
std::string::size_type pos = s.find(c);
return { s.substr(0, pos), pos == std::string::npos ? "" : s.substr(pos) };
}
int main()
{
std::string text("Hello World");
auto p = divide(text, ' ');
std::cout << "\"" << text << "\"\n";
std::cout << "\"" << p.first << "\"\n";
std::cout << "\"" << p.second << "\"\n";
}
Your code will only work as long as the character is found within the first part1.length() characters of the text. You need something similar to this:
void string_split_once(const char s, const string & text, string & first, string & second) {
    first.clear();
    second.clear();
    std::size_t pos = text.find(s);
    if (pos != string::npos) {
        first = text.substr(0, pos);
        second = text.substr(pos);
    }
}
The biggest problem I see is that you are using at where you should be using push_back. See std::basic_string::push_back. at is designed to access an existing character to read or modify it. push_back appends a new character to the string.
divide could look like this:
void divide(char search, string text, string& first_part,
string& sec_part)
{
bool firstc = true;
for (int i = 0; i < text.size(); i++) {
if (text.at(i) != search && firstc) {
first_part.push_back(text.at(i));
}
else if (text.at(i) == search&& firstc == true) {
firstc = false;
sec_part.push_back(text.at(i));
}
else {
sec_part.push_back(text.at(i));
}
}
}
Since you aren't handling exceptions, consider using text[i] rather than text.at(i).

C++ Finding an incomplete string from a text file

I have a program which reads text files and parses information from them, and I am trying to accomplish a task like this:
A text file contains about 500 characters of data, and somewhere in that data lies a user name, like so:
this_just_some_random_data_in_the_file_hdfhehhr2342t543t3y3y
_please_don't_mind_about_me(username: "sara123452")reldgfhfh
2134242gt3gfd2342353ggf43t436tygrghrhtyj7i6789679jhkjhkuklll
The thing is that we only need to find sara123452 in that text file and write it to a string. The user name is unknown, of course, and does not have a fixed length.
Here is what I have managed to do so far:
std::string Profile = "http://something.com/all_users/data.txt";
std::string FileName = "profileInfo.txt";
std::string Buffer, ProfileName;
std::ifstream FileReader;
DeleteUrlCacheEntryA(Profile .c_str());
URLDownloadToFileA(0, Profile .c_str(), FileName.c_str(), 0, 0);
FileReader.open(FileName);
if (FileReader.is_open())
{
std::ostringstream FileBuffer;
FileBuffer << FileReader.rdbuf();
Buffer= FileBuffer.str();
if (Buffer.find("(username: ") != std::string::npos) {
cout << "dont know how to continue" << endl;
}
FileReader.close();
DeleteFileA(FileName.c_str());
}
else {
}
cin.get();
So how can I get the user name string and assign/copy it to ProfileName string?
I believe what you're looking for is something like the code below, possibly with minor tweaks to account for the username being quoted. The key here is to remember that your Buffer variable is a std::string, so you can use substr once you have a definite start and end. Note that the npos check must be done on the result of find before adding the length of the marker; otherwise the check never detects a missing marker.
std::size_t userNameStartIndex, userNameEndIndex;
...
userNameStartIndex = Buffer.find("(username: ");
if (userNameStartIndex != std::string::npos) {
    userNameStartIndex += 11; // skip past "(username: " (11 characters)
    userNameEndIndex = Buffer.find(")", userNameStartIndex);
    if (userNameEndIndex != std::string::npos)
    {
        ProfileName = Buffer.substr(userNameStartIndex, userNameEndIndex - userNameStartIndex);
    }
}
There are many other ways to do it, but I guess this one would be the least painful:
#include <regex>
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
using namespace std;
struct Profile
{ // ...
string name;
};
int main(int argc, const char * argv[])
{
std::cout.sync_with_stdio(false); // optional
// read from file
string filename {"data1.txt"};
ifstream in {filename};
vector<Profile> profiles;
// tweaks this pattern in case you're not happy with it
regex pat {R"(\(username:\s*\"(.*?)\"\))"};
for (string line; getline(in,line); ) {
Profile p;
sregex_iterator first(cbegin(line), cend(line), pat);
const sregex_iterator last;
while (first != last) {
// By dereferencing a first, you get a smatch object.
// [1] gives you the matched sub-string:
p.name = (*first)[1]; // (*first)[1] = sara123452
profiles.push_back(p);
++first;
}
}
// Test
for (const auto& p : profiles)
cout << p.name << '\n';
}

Print out the words of a line in reverse order through recursion

I'm trying not to use any storage containers, and I don't know if that's even possible. Here is what I have so far (I'm getting a segmentation fault):
#include <iostream>
#include <string>
using namespace std;
void foo(string s)
{
size_t pos;
pos = s.find(' ');
if(pos == string::npos)
return;
foo(s.erase(0, pos));
cout << s.substr(0, pos) << " ";
}
int main()
{
foo("hello world");
return 0;
}
I know there are probably many things wrong with this code, so rip away; I'm eager to learn. I'm trying to imitate a post-order print, as you would do when printing a singly linked list in reverse. Thanks.
EDIT:
An example:
"You are amazing" becomes "amazing are You"
The segfault is a stack overflow.
foo( "hello world" ) erases everything up to the first space (" world") and recurses.
foo( " world" ) erases everything up to the first space (" world") and recurses.
foo( " world" )... you get the idea.
Also, foo( s.erase( 0, pos ) ) modifies s before the recursive call, so trying to print s.substr( 0, pos ) after the recursion returns does not make sense. You need to save the substring somewhere before you erase it, so that you still have it to print afterwards.
void foo(string s)
{
size_t pos = s.find(' '); // declare-and-use in one line
string out = s.substr( 0, pos ); // saving the substring
if ( pos != string::npos )
{
foo( s.erase( 0, pos + 1 ) ); // recurse, skipping the space...
cout << " "; // ...but *print* the space
}
cout << out; // print the saved substring
}
The problem is that your recursion continues until you run out of stack space.
Pay attention to this line:
if(pos == string::npos)
When you erase the substring, you don't erase the space, so in the next recursion s.find returns pos = 0, which means that your recursion never ends.
Here is code that works. Note that I also added a level parameter l to control the behaviour at the top level (in this case, to print a newline with endl).
#include <iostream>
#include <string>
using namespace std;
void foo(string s, int l)
{
size_t pos;
pos = s.find(' ');
if(pos == string::npos){
cout << s << " ";
return;
}
string temp = s.substr(0, pos);
foo(s.erase(0, pos+1),l+1);
cout << temp << " ";
if(l == 0)
cout << endl;
}
int main()
{
foo("hello world", 0);
return 0;
}
One approach to recursion, which may let your compiler turn it into iteration automatically (tail-call optimization), is to accumulate the result in the function arguments. This will be familiar if you've written recursive functions in any of the Lisp family of languages:
#include <iostream>
#include <string>
std::string reverse_words(const std::string& s, const std::string& o = {})
{
using std::string;
const auto npos = string::npos;
static const string whitespace(" \n\r\t");
// find start and end of the first whitespace block
auto start = s.find_first_of(whitespace);
if (start == npos)
return s + o;
auto end = s.find_first_not_of(whitespace, start);
if (end == npos)
return s + o;
auto word = s.substr(0, start);
auto space = s.substr(start, end-start);
auto rest = s.substr(end);
return reverse_words(rest, space + word + o);
}
int main()
{
std::cout << reverse_words("hello to all the world") << std::endl;
std::cout << reverse_words(" a more difficult\n testcase ") << std::endl;
return 0;
}
I tried to make a brief example using standard algorithms. It also handles more kinds of whitespace than just the space character (tabs, for instance).
#include <cctype>
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
void print_reverse(string words) {
// Termination condition
if(words.empty())
return;
auto predicate = (int(*)(int))isspace;
auto sit = begin(words);
auto wit = find_if_not(sit, end(words), predicate);
auto nit = find_if (wit, end(words), predicate);
print_reverse(string(nit, end(words)));
// word spaces
cout << string(wit, nit) << string(sit, wit);
}
int main() {
string line;
getline(cin, line);
print_reverse(line);
cout << endl;
}
Here is an example run:
$ ./print-out-the-words-of-a-line-in-reverse-order-through-recursion
You are amazing
amazing are You
The key is adding 1 to pos in the erase call, so that the space itself is erased too.
So try:
#include <iostream>
#include <string>
using namespace std;
void foo(string s)
{
size_t pos;
pos = s.find(' ');
if(pos == string::npos)
{
cout << s << " ";
return;
}
string out = s.substr(0, pos);
foo(s.erase(0, pos+1));
cout << out << " ";
}
int main()
{
foo("hello world");
cout << endl;
return 0;
}
EDIT
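Alternatively, you could use a char* instead of a std::string; then you do not need a temporary variable. Note that this version writes into the character array (*next = 0), so the array must be writable, not a string literal.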
#include <iostream>
#include <cstring>
void foo(char* s)
{
char* next = std::strchr(s, ' ');
if(next != nullptr)
{
foo(next + 1);
*next = 0;
}
std::cout << s << " ";
}
int main()
{
char s[] = "You are amazing";
foo(s);
std::cout << std::endl;
}
The problem is that you're not doing anything with the last word, and you're not doing anything with the remaining chunk.
If you have a recursive reverse printer, you'll want something like this (pseudocode):
def recursive-reverse(string) {
pos = string.find-last(" ");
if pos doesn't exist {
print string;
return;
} else {
print string.create-substring(pos+1, string.end);
recursive-reverse(string.create-substring(0, pos));
}
}
To implement this in C++:
#include <iostream>
#include <string>
void recursive_reverse(std::string &s) {
// find the last space
size_t pos = s.find_last_of(" ");
// base case - there's no space
if(pos == std::string::npos) {
// print only word in the string
std::cout << s << std::endl;
// end of recursion
return;
} else {
// grab everything after the space
std::string substring = s.substr(pos+1);
// print it
std::cout << substring << std::endl;
// grab everything before the space
std::string rest = s.substr(0, pos);
// recursive call on everything before the space
recursive_reverse(rest);
}
}
int main() {
std::string s("Hello World!");
recursive_reverse(s);
return 0;
}