Bug with Iterating over a string c++ - c++

So I have a function called split_alpha() that takes a std::string and splits it into words, using any non-alphanumeric character as a delimiter. It also converts the words to lower case.
vector<string> split_alpha(string to_split) {
vector<string> results;
string::iterator start = to_split.begin();
string::iterator it = start;
++it;
//get rid of any non-alphanumeric chars at the front of the string
while (!isalnum(*start)) {
++start;
++it;
}
while (it != to_split.end()) {
if (!isalnum(*it)) {
string to_add = string(start, it);
lower_alpha(to_add);
results.push_back(to_add);
++it;
if (it == to_split.end()) { break; }
while (!isalnum(*it)) {
++it;
if (it == to_split.end()) { break; }
}
start = it;
++it;
}
else {
++it;
if (it == to_split.end()) { break; }
}
}
//adds the last word
string to_add = string(start, it);
lower_alpha(to_add);
results.push_back(to_add);
return results;
}
The function works fine 99% of the time, but when I give it the string "Sending query: “SELECT * FROM users”" (not including the quotations around the whole string), it does something really weird. It essentially goes into an infinite loop (within that while loop) and never finds the end of the string. Instead it keeps reading random characters/strings from somewhere?? My vector ends up with a size of about 200 before it finally segfaults. Anyone know what could be causing this? I tried printing out the string and it seems perfectly fine. Once again, the code works on every other string I've tried.
Thanks!!

isn't the while loop doing that?
Yes, but you can have several ++it triggering before the while loop check, and in any one of those cases the iterator could already be at the end of the string. Most likely the other strings you tried did not cause a failure because they all end with an alphanumeric character.
Invert the order of the ++it and the check:
if (it == to_split.end()) { break; }
++it;
Explanation: the following assert will fail, as the iterator will no longer be pointing to the end of the string (but one character further):
if (it == to_split.end())
{
++it;
assert(it == to_split.end());
}
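One way to avoid the problem entirely is to never advance an iterator without first comparing it to end(). The following is a minimal sketch, not the original poster's code: split_alpha_sketch is a hypothetical name, and it swaps the hand-written loop for std::find_if and std::find_if_not. It also casts each character to unsigned char before calling isalnum and tolower, because (assuming the input is UTF-8) the curly quotes in the failing string are multi-byte characters whose bytes can be negative as plain char, and passing negative values to those functions is undefined behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical replacement for split_alpha(): splits on any non-alphanumeric
// byte and lower-cases each word. No iterator is ever moved past end().
std::vector<std::string> split_alpha_sketch(const std::string& text) {
    // Taking unsigned char keeps the <cctype> calls well defined for bytes >= 0x80.
    auto is_alnum = [](unsigned char c) { return std::isalnum(c) != 0; };

    std::vector<std::string> words;
    auto it = text.begin();
    while (it != text.end()) {
        // Skip delimiters, then find the end of the word that starts at 'it'.
        it = std::find_if(it, text.end(), is_alnum);
        auto word_end = std::find_if_not(it, text.end(), is_alnum);
        if (it != word_end) {
            std::string word(it, word_end);
            std::transform(word.begin(), word.end(), word.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
            words.push_back(word);
        }
        it = word_end;
    }
    return words;
}
```

Both searches stop at text.end() at the latest, so a string that ends with a delimiter (as the failing one does) needs no special case.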

Since the origin of the bug in your function has already been pointed out, may I suggest a slightly different approach to your word splitting, using regex:
#include <iostream>
#include <regex>
#include <vector>
#include <string>
#include <cctype>
std::vector<std::string> split_alpha(std::string str)
{
std::regex RE{ "([a-zA-Z0-9]+)" }; // isalnum equivalent
std::vector<std::string> result;
// find every word
for (std::smatch matches; std::regex_search(str, matches, RE); str = matches.suffix())
{
//push word to the vector
result.push_back(matches[1].str());
//transform to lower
for (char &c : result.back())
c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
return result;
}
int main()
{
// test the function
for (auto &word : split_alpha("Sending query: “SELECT * FROM users”"))
std::cout << word << std::endl;
return 0;
}
Result:
sending
query
select
from
users

Related

What is wrong in function

This function is supposed to return a letter that occurs exactly once, in upper case.
I have been trying to find the mistake for 30 minutes and I seriously don't know what the problem is. Could someone take a look at this?
#include <algorithm>
#include <string>
using namespace std;
char singleOccurrence(string str) {
transform(str.begin(), str.end(), str.begin(), ::toupper);
sort(str.begin(), str.end());
for(int i=0; i<str.length(); i++) {
if(i==str.length()-1)
return str[i];
else if(str[i] != str[i+1])
return str[i];
}
}
int main()
{
string str = "ala";
cout << singleOccurrence(str);
}
This if-else statement
if(i==str.length()-1)
{
return str[i];
}
else
if(str[i] != str[i+1])
{
return str[i];
}
does not make sense.
For example consider string s = "aab". As s[1] != s[2] your function will return the letter 'a' though it is present in the string more than one time.
Or consider another example of a string like "aaa". In this case your function again will return the letter 'a'.
If you want to use an approach based on a sorted sequence of characters, you could use, for example, the standard container std::map<char, size_t> inside the function, without sorting the passed string itself.
Otherwise you could use nested for loops over the original string.
In both cases the function should be declared like
char singleOccurrence( const std::string &str );
or
char singleOccurrence( std::string_view str );
If, for example, there is no such character in the string, you could return the character '\0'.
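The std::map suggestion could look roughly like the following. This is a sketch under assumptions, not the answerer's code: single_occurrence is a hypothetical name, letters are upper-cased as in the question, and when several letters occur once the smallest one in map order is returned.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: returns the first (in sorted order) upper-cased
// character that occurs exactly once in str, or '\0' if there is none.
char single_occurrence(const std::string& str) {
    std::map<char, std::size_t> counts;
    for (unsigned char c : str) {
        // Count each character under its upper-case form.
        ++counts[static_cast<char>(std::toupper(c))];
    }
    for (const auto& entry : counts) {
        if (entry.second == 1) {
            return entry.first;
        }
    }
    return '\0';  // no character occurs exactly once
}
```

With this version "ala" yields 'L', "aab" yields 'B', and "aaa" yields '\0', which covers the two counterexamples given above.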
if(i==str.length()-1)
{
return str[i];
}
else
if(str[i] != str[i+1])
{
return str[i];
}
}
You should put a "{" after the "else", though I don't know whether that is the actual problem. Second, the if/else does not distinguish anything: both branches return str[i]. Third, I don't see the iostream header included; don't forget it if you want to use cout or cin (for reading input). Also, you call this function once in main, and it will always give you the first letter of the sorted string. You can try using std::map from the standard library (header <map>). Use each letter of the string as a key, and the mapped value will be its number of occurrences.
#include <algorithm>
#include <iostream>
#include <string>
#include <map>
using namespace std;
// char singleOccurrence(string str)
// {
// transform(str.begin(), str.end(), str.begin(), ::toupper);
// sort(str.begin(), str.end());
// for(int i=0; i<str.length(); i++)
// {
// if(i==str.length()-1)
// {
// return str[i];
// }
// else
// if(str[i] != str[i+1])
// {
// return str[i];
// }
// }
// }
int main()
{
map<char, int> occurrences;//declare a map: the key is a char and the mapped value is an int
string str="ala";
for(int i = 0; i < str.length(); i++){
occurrences[str[i]]++;
}
for(auto n : occurrences){
cout<<n.first<<" "<<n.second<<endl;//n is a copy of a key/value pair: n.first is the letter, n.second its count
}//to show only the letters with a single occurrence, add an if statement that checks whether n.second is equal to one
}
I hope this helps! :)

Function to separate each word from a string and put them into a vector, without using auto keyword?

I'm really stuck here. I can't edit the main function, and inside it there is a function call whose only parameter is the string. How can I make this function put each word from the string into a vector without using the auto keyword? I realize that this code is probably really wrong, but it's my best attempt at what it should look like.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
vector<string> extract_words(const char * sentence[])
{
string word = "";
vector<string> list;
for (int i = 0; i < sentence.size(); ++i)
{
while (sentence[i] != ' ')
{
word = word + sentence[i];
}
list.push_back(word);
}
}
int main()
{
sentence = "Help me please" /*In the actual code a function call is here that gets input sentence.*/
if (sentence.length() > 0)
{
words = extract_words(sentence);
}
}
Do you know how to read "words" from std::cin?
Then you can put that string in a std::istringstream which works like std::cin but for "reading" strings instead.
Use the stream extract operator >> in a loop to get all the words one by one, and add them to the vector.
Perhaps something like:
std::vector<std::string> get_all_words(std::string const& string)
{
std::vector<std::string> words;
std::istringstream in(string);
std::string word;
while (in >> word)
{
words.push_back(word);
}
return words;
}
With a little more knowledge of C++ and its standard classes and functions, you can actually make the function a lot shorter:
std::vector<std::string> get_all_words(std::string const& string)
{
std::istringstream in(string);
return std::vector<std::string>(std::istream_iterator<std::string>(in),
std::istream_iterator<std::string>());
}
I recommend making the argument to the function a const std::string& instead of const char * sentence[]. A std::string has many member functions, like find_first_of, find_first_not_of and substr and more that could help a lot.
Here's an example using those mentioned:
std::vector<std::string> extract_words(const std::string& sentence)
{
/* Control char's, "whitespaces", that we don't want in our words:
\a audible bell
\b backspace
\f form feed
\n line feed
\r carriage return
\t horizontal tab
\v vertical tab
*/
static const char whitespaces[] = " \t\n\r\a\b\f\v";
std::vector<std::string> list;
std::size_t begin = 0;
while(true)
{
// Skip whitespaces by finding the first non-whitespace, starting at
// "begin":
begin = sentence.find_first_not_of(whitespaces, begin);
// If no non-whitespace char was found, break out:
if(begin == std::string::npos) break;
// Search for a whitespace starting at "begin + 1":
std::size_t end = sentence.find_first_of(whitespaces, begin + 1);
// Store the result by creating a substring from "begin" with the
// length "end - begin":
list.push_back(sentence.substr(begin, end - begin));
// If no whitespace was found, break out:
if(end == std::string::npos) break;
// Set "begin" to the char after the found whitespace before the loop
// makes another lap:
begin = end + 1;
}
return list;
}
With the added restriction "no breaks", this could be a variant. It does exactly the same as the above, but without using break:
std::vector<std::string> extract_words(const std::string& sentence)
{
static const char whitespaces[] = " \t\n\r\a\b\f\v";
std::vector<std::string> list;
std::size_t begin = 0;
bool loop = true;
while(loop)
{
begin = sentence.find_first_not_of(whitespaces, begin);
if(begin == std::string::npos) {
loop = false;
} else {
std::size_t end = sentence.find_first_of(whitespaces, begin + 1);
list.push_back(sentence.substr(begin, end - begin));
if(end == std::string::npos) {
loop = false;
} else {
begin = end + 1;
}
}
}
return list;
}

c++ Find string when given only part of it inside an array like structure

It's in the form of a word so let's say I'm given the string "foo", and inside my array there are words like "food", "fool", "foo". All three of them should be printed out.
I haven't made a solid attempt at it yet cause I don't know how to wrap my head around it. Any idea?
Assuming you're using std::string, you could use string::find to see if one string is contained in another.
If you have a vector of strings, you might use that along with (for example) std::remove_copy_if to print out all the words from the vector that contain the chosen word:
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <iostream>
int main() {
std::vector<std::string> words{"food", "fool", "foo", "tofoo", "lood", "flood"};
std::string word = "foo";
std::remove_copy_if(words.begin(), words.end(),
std::ostream_iterator<std::string>(std::cout, "\n"),
[&](std::string const &s) {
return s.find(word) == std::string::npos;
});
}
Result:
food
fool
foo
tofoo
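Note that string::find matches the word anywhere, which is why "tofoo" shows up in the result. If only words that start with the given text should match, as the examples in the question suggest, a prefix comparison can be used instead. A minimal sketch, assuming the words are held in a std::vector; words_with_prefix is a hypothetical name:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: keeps only the words that begin with 'prefix'.
std::vector<std::string> words_with_prefix(const std::vector<std::string>& words,
                                           const std::string& prefix) {
    std::vector<std::string> matches;
    for (const std::string& word : words) {
        // compare(0, n, prefix) looks at no more than the first n characters
        // of word, so a word shorter than the prefix never compares equal.
        if (word.compare(0, prefix.size(), prefix) == 0) {
            matches.push_back(word);
        }
    }
    return matches;
}
```

For the vector used above and the prefix "foo", this keeps "food", "fool" and "foo" and drops "tofoo".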
You could do something simple, like iterating through each character in the string and checking it against the characters in the string you are trying to match using a separate function. If three characters in a row match the string you are searching for, add it to a vector or something and display them.
#include <string>
#include <vector>
using namespace std;
// Returns true if 'candidate' contains 'target' as a run of consecutive characters
bool containsRun(const string &candidate, const string &target)
{
if (target.empty())
return true;
// Try every position in the candidate as the start of a match
for (size_t start = 0; start + target.length() <= candidate.length(); ++start)
{
size_t counter = 0; // Number of characters matched in a row so far
while (counter < target.length() && candidate[start + counter] == target[counter])
counter++;
if (counter == target.length())
return true; // Every character of the target matched
}
return false;
}
// Collects every string that contains the string you are trying to match
vector<string> findMatches(const vector<string> &strings, const string &str)
{
vector<string> stringVec;
for (const string &s : strings) // For each candidate string
{
if (containsRun(s, str))
stringVec.push_back(s);
}
return stringVec;
}
You will have to tweak it a little to work with an array the way you want it to but something like this should do what you want. I did not run this through a compiler, I wrote it in Notepad so there may be small syntax errors. I hope this helps!

Getting the words from a sentence and storing them in a vector of strings

Alright, guys ...
Here's my set that has all the letters. I'm defining a word as consisting of consecutive letters from the set.
const char LETTERS_ARR[] = {"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::set<char> LETTERS_SET(LETTERS_ARR, LETTERS_ARR + sizeof(LETTERS_ARR)/sizeof(char));
I was hoping that this function would take in a string representing a sentence and return a vector of strings that are the individual words in the sentence.
std::vector<std::string> get_sntnc_wrds(std::string S) {
std::vector<std::string> retvec;
std::string::iterator it = S.begin();
while (it != S.end()) {
if (LETTERS_SET.count(*it) == 1) {
std::string str(1,*it);
int k(0);
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1) == 1))) {
str.push_back(*(it + (++k)));
}
retvec.push_back(str);
it += k;
}
else {
++it;
}
}
return retvec;
}
For instance, the following call should return a vector of the strings "Yo", "dawg", etc.
std::string mystring("Yo, dawg, I heard you life functions, so we put a function inside your function so you can derive while you derive.");
std::vector<std::string> mystringvec = get_sntnc_wrds(mystring);
But everything isn't going as planned. I tried running my code and it was putting the entire sentence into the first and only element of the vector. My function is very messy code and perhaps you can help me come up with a simpler version. I don't expect you to be able to trace my thought process in my pitiful attempt at writing that function.
Try this instead:
#include <vector>
#include <cctype>
#include <string>
#include <algorithm>
using std::vector;
using std::string;
using std::find_if;
// true if the argument is whitespace, false otherwise
bool space(char c)
{
return std::isspace(static_cast<unsigned char>(c)) != 0;
}
// false if the argument is whitespace, true otherwise
bool not_space(char c)
{
return !space(c);
}
vector<string> split(const string& str)
{
typedef string::const_iterator iter;
vector<string> ret;
iter i = str.begin();
while (i != str.end())
{
// ignore leading blanks
i = find_if(i, str.end(), not_space);
// find end of next word
iter j = find_if(i, str.end(), space);
// copy the characters in [i, j)
if (i != str.end())
ret.push_back(string(i, j));
i = j;
}
return ret;
}
The split function will return a vector of strings, each element containing one word.
This code is taken from the Accelerated C++ book, so it's not mine, but it works. There are other superb examples of using containers and algorithms for solving every-day problems in this book. I could even get a one-liner to show the contents of a file at the output console. Highly recommended.
It's just a bracketing issue. My advice is to (almost) never put in more brackets than are necessary; it only confuses things.
while (it+k+1 != S.end() && LETTERS_SET.count(*(it+k+1)) == 1) {
Your code compares the character with 1 not the return value of count.
Also, although count does return an integer, in this context I would simplify further and treat the return value as a boolean:
while (it+k+1 != S.end() && LETTERS_SET.count(*(it+k+1))) {
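It may help to spell out why the misplaced bracket swallowed the whole sentence. The expression *(it+k+1) == 1 is a bool, which is false for every ordinary character and converts to the character '\0'. Because LETTERS_SET is built using sizeof(LETTERS_ARR), it also contains the string literal's terminating '\0', so the lookup always succeeds and the inner loop only stops at S.end(). A small sketch of the difference; buggy_check and fixed_check are hypothetical helper names:

```cpp
#include <set>

const char LETTERS_ARR[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
// Same construction as in the question: sizeof counts the trailing '\0' too,
// so the set holds 52 letters plus the null character.
const std::set<char> LETTERS_SET(LETTERS_ARR, LETTERS_ARR + sizeof(LETTERS_ARR) / sizeof(char));

// Mirrors the question's bracket placement: the comparison happens first, and
// its bool result (converted to char) is what gets looked up in the set.
bool buggy_check(char c) { return LETTERS_SET.count(c == 1) != 0; }

// Mirrors the corrected condition: look up the character, then compare the count.
bool fixed_check(char c) { return LETTERS_SET.count(c) == 1; }
```

Here buggy_check(',') is true even though a comma is not a letter, while fixed_check(',') is false.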
You should use a string stream with std::copy, like so:
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>
int main() {
std::string sentence = "And I feel fine...";
std::istringstream iss(sentence);
std::vector<std::string> split;
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(split));
// This is to print the vector
for(auto iter = split.begin();
iter != split.end();
++iter)
{
std::cout << *iter << "\n";
}
}
I would use another, simpler approach based on member functions of class std::string. For example:
const char LETTERS[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
std::string s( "This12 34is 56a78 test." );
std::vector<std::string> v;
for ( std::string::size_type first = s.find_first_of( LETTERS, 0 );
first != std::string::npos;
first = s.find_first_of( LETTERS, first ) )
{
std::string::size_type last = s.find_first_not_of( LETTERS, first );
v.push_back(
std::string( s, first, last == std::string::npos ? std::string::npos : last - first ) );
first = last;
}
for ( const std::string &s : v ) std::cout << s << ' ';
std::cout << std::endl;
You made two mistakes here, which I have corrected in the following code.
First, the condition should be
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1)) == 1))
and second, the iterator should move on to the next position with
it += (k+1);
The complete code is
std::vector<std::string> get_sntnc_wrds(std::string S) {
std::vector<std::string> retvec;
std::string::iterator it = S.begin();
while (it != S.end()) {
if (LETTERS_SET.count(*it) == 1) {
std::string str(1,*it);
int k(0);
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1)) == 1)) {
str.push_back(*(it + (++k)));
}
retvec.push_back(str);
it += (k+1);
}
else {
++it;
}
}
return retvec;
}
The output has been tested.

Selective iterator

FYI: no boost, yes it has this, I want to reinvent the wheel ;)
Is there some form of a selective iterator (possible) in C++? What I want is to separate strings like this:
some:word{or other
to a form like this:
some : word { or other
I can do that with two loops and find_first_of(":") and ("{") but this seems (very) inefficient to me. I thought that maybe there would be a way to create/define/write an iterator that would iterate over all these values with for_each. I fear this will have me writing a full-fledged custom way-too-complex iterator class for a std::string.
So I thought maybe this would do:
std::vector<size_t> list;
size_t index = mystring.find(":");
while( index != std::string::npos )
{
list.push_back(index);
index = mystring.find(":", list.back());
}
std::for_each(list.begin(), list.end(), addSpaces(mystring));
This looks messy to me, and I'm quite sure a more elegant way of doing this exists. But I can't think of it. Anyone have a bright idea? Thanks
PS: I did not test the code posted, just a quick write-up of what I would try
UPDATE: after taking all your answers into account, I came up with this, and it works to my liking :). this does assume the last char is a newline or something, otherwise an ending {,}, or : won't get processed.
void tokenize( string &line )
{
char oneBack = ' ';
char twoBack = ' ';
char current = ' ';
size_t length = line.size();
for( size_t index = 0; index<length; ++index )
{
twoBack = oneBack;
oneBack = current;
current = line.at( index );
if( isSpecial(oneBack) )
{
if( !isspace(twoBack) ) // insert before
{
line.insert(index-1, " ");
++index;
++length;
}
if( !isspace(current) ) // insert after
{
line.insert(index, " ");
++index;
++length;
}
}
}
}
Comments are welcome as always :)
That's relatively easy using the std::istream_iterator.
What you need to do is define your own class (say Term). Then define how to read a single "word" (term) from the stream using the operator >>.
I don't know what your exact definition of a word is, so I am using the following definition:
Any consecutive sequence of alphanumeric characters is a term.
Any single non-whitespace character that is also not alphanumeric is a term.
Try this:
#include <string>
#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <cctype>
class Term
{
public:
// This cast operator is not required but makes it easy to use
// a Term anywhere that a string can normally be used.
operator std::string const&() const {return value;}
private:
// A term is just a string
// And we friend the operator >> to make sure we can read it.
friend std::istream& operator>>(std::istream& inStr,Term& dst);
std::string value;
};
Now all we have to do is define an operator >> that reads a word according to the rules:
// This function could be a lot neater using some boost regular expressions.
// I just do it manually to show it can be done without boost (as requested)
std::istream& operator>>(std::istream& inStr,Term& dst)
{
// Note the >> operator drops all preceding white space.
// So we get the first non white space
char first;
inStr >> first;
// If the stream is in any bad state then stop processing.
if (inStr)
{
if(std::isalnum(static_cast<unsigned char>(first)))
{
// Alpha Numeric so read a sequence of characters
dst.value = first;
// This is ugly. And needs re-factoring.
while((first = inStr.get(), inStr) && std::isalnum(static_cast<unsigned char>(first)))
{
dst.value += first;
}
// Take into account the special case of EOF.
// And bad stream states.
if (inStr)
{
// The last character read was not EOF and not part of the word
// So put it back for use by the next call to read from the stream.
inStr.putback(first);
}
else
{
// We know that we have a word so clear any errors to make sure it
// is used. Let the next attempt to read a word (term) fail at the outer if.
inStr.clear();
}
}
else
{
// It was not alpha numeric so it is a one character word.
dst.value = first;
}
}
return inStr;
}
So now we can use it in standard algorithms by just employing the istream_iterator
int main()
{
std::string data = "some:word{or other";
std::stringstream dataStream(data);
std::copy( // Read the stream one Term at a time.
std::istream_iterator<Term>(dataStream),
std::istream_iterator<Term>(),
// Note the ostream_iterator is using a std::string
// This works because a Term can be converted into a string.
std::ostream_iterator<std::string>(std::cout, "\n")
);
}
The output:
> ./a.exe
some
:
word
{
or
other
std::string const str = "some:word{or other";
std::string result;
result.reserve(str.size());
for (std::string::const_iterator it = str.begin(), end = str.end();
it != end; ++it)
{
if (isalnum(*it))
{
result.push_back(*it);
}
else
{
result.push_back(' '); result.push_back(*it); result.push_back(' ');
}
}
Insert version for speed-up
std::string str = "some:word{or other";
for (std::string::iterator it = str.begin(), end = str.end(); it != end; ++it)
{
if (!isalnum(*it))
{
it = str.insert(it, ' ') + 2;
it = str.insert(it, ' ');
end = str.end();
}
}
Note that std::string::insert inserts BEFORE the iterator passed and returns an iterator to the newly inserted character. Assigning is important since the buffer may have been reallocated at another memory location (the iterators are invalidated by the insertion). Also note that you can't keep end for the whole loop, each time you insert you need to recompute it.
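An index-based variant sidesteps the iterator invalidation described above, since a position stays meaningful even after the buffer has been reallocated. A minimal sketch, assuming the same rule as the two snippets above (every non-alphanumeric character gets a space on each side); pad_symbols is a hypothetical name:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of 'text' in which every
// non-alphanumeric character is surrounded by single spaces.
std::string pad_symbols(std::string text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (!std::isalnum(static_cast<unsigned char>(text[i]))) {
            text.insert(i, 1, ' ');      // space before; the symbol is now at i + 1
            text.insert(i + 2, 1, ' ');  // space after the symbol
            i += 2;                      // continue after the trailing space
        }
    }
    return text;
}
```

The loop re-reads text.size() on every pass, so there is no cached end position to refresh after an insertion.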
a more elegant way of doing this exists.
I do not know how Boost implements that, but the traditional way is to feed the input string character by character into a finite-state machine (FSM), which detects where tokens (words, symbols) start and end.
I can do that with two loops and find_first_of(":") and ("{")
One loop with std::find_first_of() should suffice.
Though I'm still a huge fan of FSMs for such parsing tasks.
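As a rough illustration of the FSM idea, here is a sketch with just two states. The token rules are an assumption borrowed from the question (runs of alphanumeric characters form a word, any other non-whitespace character is a token of its own), and fsm_tokenize is a hypothetical name:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical two-state tokenizer: the machine is either inside a word
// or between tokens, and each input character drives one transition.
std::vector<std::string> fsm_tokenize(const std::string& input) {
    enum class State { Between, InWord };
    State state = State::Between;
    std::vector<std::string> tokens;
    std::string current;  // the word being collected while in State::InWord

    for (unsigned char c : input) {
        if (std::isalnum(c)) {
            // Alphanumeric input: start or extend a word.
            current.push_back(static_cast<char>(c));
            state = State::InWord;
            continue;
        }
        if (state == State::InWord) {
            // Any other input ends the current word.
            tokens.push_back(current);
            current.clear();
            state = State::Between;
        }
        if (!std::isspace(c)) {
            // A symbol is emitted as a one-character token; whitespace is dropped.
            tokens.push_back(std::string(1, static_cast<char>(c)));
        }
    }
    if (state == State::InWord) {
        tokens.push_back(current);  // flush a word that runs to the end of the input
    }
    return tokens;
}
```

For "some:word{or other" this produces the tokens some, :, word, {, or, other in a single pass; joining them with spaces gives the form asked for in the question.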
How about something like:
std::string::const_iterator it, end = mystring.end();
for(it = mystring.begin(); it != end; ++it) {
if ( !isalnum( *it ))
list.push_back(it);
}
This way, you'll only iterate once through the string, and isalnum from ctype.h seems to do what you want. Of course, the code above is very simplistic and incomplete and only suggests a solution.
Are you looking to tokenize the input string, ala strtok?
If so, here is a tokenizing function that you can use. It takes an input string and a string of delimiters (each char in the string is a possible delimiter), and it returns a vector of tokens. Each token is a pair holding the delimited string and the delimiter used in that case:
#include <cstdlib>
#include <vector>
#include <string>
#include <functional>
#include <iostream>
#include <algorithm>
using namespace std;
// FUNCTION : tokenize(const string& str, const string& delims)
// PARAMETERS : str The string to be tokenized.
// delims string of individual delimiter characters -- each character in the string is a delimiter
// DESCRIPTION : Tokenizes a string, much in the same way as strtok does. The input string is not modified. The
// function is called once to tokenize a string, and all the tokens are returned at once.
// RETURNS : Returns a vector of tokens. Each element in the vector is one token. The delimiter character is
// not included in the token's string. The number of elements in the vector is N+1, where N is the number
// of times a delimiter character is found in the string. If one token is an empty string (as with the
// string "string1##string3", where the delimiter character is '#'), then that element in the vector
// holds an empty string.
// NOTES :
//
typedef pair<char,string> token; // first = delimiter, second = data
inline vector<token> tokenize(const string& str, const string& delims, bool bCaseSensitive=false) // tokenizes a string, returns a vector of tokens
{
bCaseSensitive;
// prologue
vector<token> vRet;
// tokenize input string
for( string::const_iterator itA = str.begin(), it=itA; it != str.end(); it = find_first_of(++it,str.end(),delims.begin(),delims.end()) )
{
// prologue
// find end of token
string::const_iterator itEnd = find_first_of(it+1,str.end(),delims.begin(),delims.end());
// add string to output
if( it == itA ) vRet.push_back(make_pair(0,string(it,itEnd)));
else vRet.push_back(make_pair(*it,string(it+1,itEnd)));
// epilogue
}
// epilogue
return vRet;
}
using namespace std;
int main()
{
string input = "some:word{or other";
typedef vector<token> tokens;
tokens toks = tokenize(input.c_str(), " :{");
cout << "Input: '" << input << "' # Tokens: " << toks.size() << "\n";
for( tokens::iterator it = toks.begin(); it != toks.end(); ++it )
{
cout << " Token : '" << it->second << "', Delimiter: '" << it->first << "'\n";
}
return 0;
}