String Tokenization with " ", breaking at "."


I'm having trouble tokenizing a string that was read from the console with getline. It breaks at "." as well as at " ", when I only want it to break at " ".
Here is the method that I've created for this problem:
queue<string> getStringList(string list, string delimiter) {
queue<string> values;
size_t pos = 0;
string token;
while ((pos = list.find(delimiter) != string::npos))
{
token = list.substr(0, pos);
std::cout << token << std::endl;
values.push(token);
list.erase(0, pos + delimiter.length());
}
std::cout << list << endl;
return values;
}
Sample Input:
cid arrival_time dir_original dir_target
0 1.1 ^ ^
1 2.0 ^ ^
2 3.3 ^ <
3 3.5 v v
4 4.2 v >
5 4.4 ^ ^
6 5.7 > ^
7 5.9 < ^
This is how the data from the previous line is being broken up.

while ((pos = list.find(delimiter) != string::npos))
You have the parentheses in the wrong place, meaning what you have is the equivalent of:
while ((pos = (list.find(delimiter) != string::npos) ))
// (compare first-----------------------)
// (then assign----------------------------------)
In terms of operator precedence, assignment is way down the list, long after comparisons.
So pos will always be either zero (if you don't find the space) or one (if you do find it). It will never be equal to the position of that space in the string, except for the pathological case where the space is the second character.
That's why everything works for your one-character fields, but not for the three-character 1.1 field. find correctly locates the space at position 3 but, because pos has been set to one, substr takes only the first character (1) as the token and erase removes the first two characters (1.), so the . is lost.
That line should be:
while ((pos = list.find(delimiter)) != string::npos)
// (assign first--------------)
// (then compare-------------------------------)
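To see the effect of the parentheses in isolation, here is a minimal sketch. The two helper names are hypothetical and exist only for illustration; each returns the value that pos ends up holding for the same input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: mimics the buggy condition.
// The comparison runs first, so pos receives a bool converted to 0 or 1.
std::size_t pos_from_buggy_condition(const std::string& list, const std::string& delimiter) {
    std::size_t pos = 0;
    pos = list.find(delimiter) != std::string::npos;
    return pos;
}

// Hypothetical helper: mimics the fixed condition.
// The assignment runs first, so pos receives the real index (or npos).
std::size_t pos_from_fixed_condition(const std::string& list, const std::string& delimiter) {
    std::size_t pos = 0;
    pos = list.find(delimiter);
    return pos;
}
```

For the input "1.1 ^ ^" with delimiter " ", the first helper returns 1 and the second returns 3, which is the actual index of the space.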
With a full program and the fix mentioned above:
#include <iostream>
#include <queue>
#include <string>
using namespace std;
queue<string> getStringList(string list, string delimiter) {
queue<string> values;
size_t pos = 0;
string token;
while ((pos = list.find(delimiter)) != string::npos) {
token = list.substr(0, pos);
std::cout << token << std::endl;
values.push(token);
list.erase(0, pos + delimiter.length());
}
values.push(list);
std::cout << list << endl;
return values;
}
int main () {
string x = "0 1.1 ^ ^";
string delim = " ";
auto y = getStringList(x, delim);
return 0;
}
You can see the output is now correct:
0
1.1
^
^
And, as per a comment from Remy Lebeau, you print the final element but do not add it to the list. Hence I've added the line that appends that last token to the list of tokens.
One other thing to watch out for. While this works well with single-space separators, multi-space ones may cause issues. If that becomes an issue, you may need a more "bullet-proof" tokenising routine.
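One possible way to cope with repeated spaces is to let a string stream do the splitting. This is only a sketch under the assumption that any whitespace (spaces, tabs) should separate tokens; the function name splitOnWhitespace is made up for this example.

```cpp
#include <queue>
#include <sstream>
#include <string>

// Sketch: split on runs of whitespace instead of a single " ".
// operator>> skips leading whitespace, so repeated spaces yield no empty tokens.
std::queue<std::string> splitOnWhitespace(const std::string& line) {
    std::queue<std::string> values;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        values.push(token);
    }
    return values;
}
```

With the input "0   1.1  ^ ^" this yields the four tokens 0, 1.1, ^ and ^, the same as for the single-spaced line.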

Related

Cannot understand the for loops used in this code

I was trying out a solution to a question, when I came across this code snippet written in C++:
string s;
cin >> s;
vector<int> r;
for (string t: {"twone", "one", "two"}) {
for (size_t pos = 0; (pos = s.find(t, pos)) != string::npos;) {
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
}
}
cout << r.size() << endl;
for (auto rr: r)
cout << rr + 1 << " ";
cout << endl;
I am new to the language and was unable to understand what is happening in the second (nested) for loop and the 3rd for loop. Can someone help me to understand?

The first and the third loops are range-based for loops.
The first loop iterates over a container of strings. So t takes successively the value "twone", "one", and "two"
The second loop searches for all the occurrences of t in the string s (each search starts from the position pos of the previous occurrence found). As long as an occurrence is found, it does:
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
The push_back() stores the position of the middle of each occurrence found in a vector of integers.
The third loop iterates over this vector of stored positions and prints the elements (the positions count starts at 0, the +1 shifts the printed positions as if the count would start with 1).

One of the main ways to try and understand complex code is to try and simplify it. It also helps to know what the involved functions do, so a reference to std::string::find is helpful to read.
First of all, let's skip the body and concentrate only on the loop itself:
for (size_t pos = 0; (pos = s.find(t, pos)) != string::npos;) {
}
All for loops could be seen as a while loop, and while loops could be somewhat easier to understand and follow, so we convert it to such a while loop:
size_t pos = 0;
while ((pos = s.find(t, pos)) != string::npos)
{
}
This might not help so much as it's the condition that is most likely the hard part to understand, so then we simplify that as well:
size_t pos = 0;
pos = s.find(t, pos);
while (pos != string::npos)
{
pos = s.find(t, pos);
}
The initialization of pos could then be further simplified:
size_t pos = s.find(t);
while (pos != string::npos)
{
pos = s.find(t, pos);
}
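Now the loop itself is as simple as it can be, and looking at it we see that it basically attempts to find the sub-string t inside the string s. The loop continues as long as the sub-string t is found inside s. (With an empty body the same occurrence would be found over and over; the real loop body changes each match, so the next find moves on to a later occurrence.)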
Now that we deconstructed the loop itself, let's take a look at the loop-body, and what it does:
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
First of all, let's pull out the common sub-expression into a temporary variable:
auto new_pos = pos + t.length() / 2;
s[new_pos] = '?';
r.push_back(new_pos);
The first statement
s[new_pos] = '?';
replaces the middle character of the sub-string t inside s with the character '?'.
The second statement
r.push_back(new_pos);
pushes the position of the '?' into the vector r.
Now lastly we put the inner loop (explained above) into the context of the outer loop:
for (string t: {"twone", "one", "two"})
This is a range-based for loop which loops over all elements in the container on the right-hand side of the :. That is, the loop will iterate three times, with t being equal to "twone", "one" and "two" in that order.
So the loops will search for "twone", "one" and "two" inside the string s, replace the middle character of each sub-string found ("twone", "one" or "two") inside s with a single '?' character, and push the position of that '?' character into the vector r.
For example, if the input in s is "someone with the number two" then the result will be the string "someo?e with the number t?o", and the vector r should contain the values 5 and 25 (which will be printed as 6 and 26 because of the + 1).
Here's an example showing exactly that.
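As a minimal sketch, the two loops can be wrapped in a function so the result is easy to inspect. The name markMiddles and the returned pair are assumptions made for this example; the loops themselves are the ones from the question.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Sketch: same loops as the snippet, wrapped in a hypothetical function
// that returns the modified string and the recorded positions.
std::pair<std::string, std::vector<std::size_t>> markMiddles(std::string s) {
    std::vector<std::size_t> r;
    for (const std::string t : {"twone", "one", "two"}) {
        for (std::size_t pos = 0; (pos = s.find(t, pos)) != std::string::npos;) {
            const std::size_t middle = pos + t.length() / 2;
            s[middle] = '?';       // breaks this match, so the next find() moves on
            r.push_back(middle);   // remember where the '?' was written
        }
    }
    return {s, r};
}
```

Calling markMiddles("someone with the number two") returns the string "someo?e with the number t?o" together with the positions 5 and 25. Note that the original snippet reads its input with cin >> s, which stops at the first space, so a whole sentence would have to be read with std::getline instead.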

Just run the code, inserting an output of intermediate results into it.
Here is a demonstrative program.
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string s;
std::cin >> s;
std::vector<int> r;
for ( const std::string &t : { "twone", "one", "two" } )
{
for ( std::string::size_type pos = 0; (pos = s.find( t, pos ) ) != std::string::npos; )
{
s[pos + t.length() / 2] = '?';
std::cout << pos << ": " << s << '\n';
r.push_back( pos + t.length() / 2 );
}
}
std::cout << r.size() << '\n';
for ( const auto &rr: r ) std::cout << rr + 1 << " ";
std::cout << '\n';
}
Let's assume that the user entered string onetwoone. So the inner loop searches in the entered string all occurrences of words "twone", "one", "two" sequentially.
For the given string the word "twone" is not found.
The word "one" is found at position 0. This statement
s[pos + t.length() / 2] = '?';
replaces the middle character of the found word in the entered string with the sign '?'.
Thus this added statement
std::cout << pos << ": " << s << '\n';
outputs
0: o?etwoone
The position of the sign '?' (the number 1) is stored in the vector.
Then within the loop the word "one" is found a second time. And again the middle character of the found word is replaced with '?'. So this statement
std::cout << pos << ": " << s << '\n';
outputs
6: o?etwoo?e
The position of the sign '?' (the number 7) is stored in the vector.
So at this moment we have the following output
0: o?etwoone
6: o?etwoo?e
The word "one" is not found any more.
The word "two" occurs only once in the given string. So the output is
3: o?et?oo?e
The position of '?' equal to 4 is stored in the vector.
Now at this moment we have the following output
0: o?etwoone
6: o?etwoo?e
3: o?et?oo?e
produced by the inner loop.
So as a result three occurrences of the words are found in the entered string.
Thus these statements
std::cout << r.size() << '\n';
for ( const auto &rr: r ) std::cout << rr + 1 << " ";
output
3
2 8 5
The last values correspond to expressions rr + 1 that is to stored positions of the sign '?' plus 1.

How to remove all double spaces from string

I am attempting to remove all double spaces from my string so that only single spaces remain:
while (doublespace != -1) {
kstring.replace(doublespace, 1, " ") ;
doublespace = kstring.find_first_of("  ") ; }
It finds the first double space, triggering the while statement. It then takes the first space, adds 1 to it and sets the two spaces to one space. Then it checks again.
The problem is that the loop never ends - for example if I put "hello " doubleSpace would never be set to -1.

std::string::find_first_of only searches until it finds one of the characters in the input string, so as you pass it "  " (two spaces), it'll effectively only search for " " (a single space) - see the documentation here:
Searches the string for the first character that matches any of the characters specified in its arguments.
You should use std::string::find instead, which searches for the first instance of the entire substring:
Notice that unlike member find_first_of, whenever more than one character is being searched for, it is not enough that just one of these characters match, but the entire sequence must match.
You're also replacing only the first space with a space (sString.replace(doubleSpace, 1, " ")), which means your output will still contain double spaces. Just use std::string::erase instead, to erase just the first space.
This means your code snippet should look more like:
std::size_t doubleSpace = sString.find("  ");
while (doubleSpace != std::string::npos)
{
    sString.erase(doubleSpace, 1);
    doubleSpace = sString.find("  ");
}
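The difference between find_first_of and find described above can be checked with a small sketch. The two function names are hypothetical; both search for the same two-space argument, but only find treats it as a sequence.

```cpp
#include <cstddef>
#include <string>

// find_first_of treats its argument as a set of characters:
// "  " (two spaces) is the same set as " ", so any single space matches.
std::size_t firstOfAnySpace(const std::string& s) {
    return s.find_first_of("  ");
}

// find looks for the whole two-character sequence.
std::size_t firstDoubleSpace(const std::string& s) {
    return s.find("  ");
}
```

For the string "a b  c" (one space, then two), firstOfAnySpace returns 1 while firstDoubleSpace returns 3. For "a b", firstDoubleSpace returns std::string::npos, which is what lets the loop terminate.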

Here is an alternative take that returns copy with only single spaces in it:
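#include <iostream>
#include <string>
int main()
{
    std::string str = "  hello   -   h  e  l  l  o  ";
    std::string newstr;
    size_t beg = 0;
    size_t len = str.length();
    while (beg < len)
    {
        // find the next space; if there is none, copy the rest and stop
        size_t end = str.find(' ', beg);
        if (end == std::string::npos)
        {
            newstr += str.substr(beg);
            break;
        }
        ++end; // keep exactly one space
        newstr += str.substr(beg, end - beg);
        // skip the remaining spaces of this run
        beg = str.find_first_not_of(' ', end);
    }
    std::cout << newstr << std::endl;
    return 0;
}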
Result:
hello - h e l l o
As suggested by @hnefatl, this approach could also be more efficient (see comment below).

I see two errors in your code. The first is that find_first_of() only searches for one of the characters you supply so, in your case, it will only be looking for single spaces. Secondly you only replace one space, not two.
This should fix both those problems:
std::string& reduce_double_spaces(std::string& s)
{
    std::string::size_type pos = s.find("  ");
    while (pos != std::string::npos) {
        // replace BOTH spaces with one space
        s.replace(pos, 2, " ");
        // start searching again, where you left off
        // rather than going back to the beginning
        pos = s.find("  ", pos);
    }
    return s;
}
NOTE: By beginning the subsequent searches from the place you found your last space, this version should be much more efficient. The longer the string the bigger the savings.
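A different technique, not used in the answers here, is the erase/std::unique idiom, which collapses every run of spaces in a single pass. This is only a sketch; the function name collapseSpaces is made up for the example.

```cpp
#include <algorithm>
#include <string>

// Sketch: collapse every run of spaces to one space in a single pass.
std::string collapseSpaces(std::string s) {
    // Two neighbours count as "equal" only when both are spaces,
    // so std::unique keeps the first space of each run and drops the rest.
    auto bothSpaces = [](char a, char b) { return a == ' ' && b == ' '; };
    s.erase(std::unique(s.begin(), s.end(), bothSpaces), s.end());
    return s;
}
```

For example, collapseSpaces("hello    world  !") returns "hello world !". Unlike the loop above, it works on a copy and leaves the caller's string untouched.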

This alternative uses the constant-time operations back(), pop_back(), push_back(), empty(), and size():
std::string str = "  hello   -   h  e  l  l  o  ";
std::string newStr = str; // diag only
std::string Miss; Miss.reserve(str.size());
while ( str.size() > 1 ) // note: str.back() undefined when str.empty()
{
// fetch copy and remove last element
char aKar = str.back(); str.pop_back();
if (! ((' ' == aKar) && // space
( ' ' == str.back()))) // space
{
Miss.push_back(aKar); // not double-space
}
}
assert(1 == str.size()); // optional
while (! Miss.empty() ) // restore str to original order
{
str.push_back (Miss.back()); // copy last element
Miss.pop_back(); // remove last element
}
assert(Miss.empty()); // optional
std::cout << "\n " << __FUNCTION__
<< "\n in: " << newStr
<< "\n out: " << str << std::endl;

C++ String insert iteration

I have the string:
string str = "1234567890";
//Magic code
cout<<str<<endl;
which I want to output as: 12 34 56 78 90
I assume std has some neat feature/function to help solve this. How do I do that in the most convenient way?

std::string::insert with a for loop can help you insert spaces into a std::string very easily:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string str = "1234567890";
for(auto it = str.begin(); it != str.end(); it += min<int>(str.end() - it, 2))
it = (it != str.begin() ? str.insert(it, ' ') + 1 : it);
cout << str << endl;
}
The std::string::insert returns an iterator pointing to the character inserted, therefore it has to be incremented to step over the inserted character.
Because std::string has a random-access iterator it can be incremented or decremented more than by one. The min<int>(str.end() - it, 2) ensures that the next step won't go out of bounds.

A more general approach. Define a function that inserts a given character char_to_insert into a given string s every interval characters, excluding beginning and end of string:
std::string insert_char(const std::string& s, char char_to_insert, size_t interval)
{
// corner cases
if (!interval || !s.length()) return s;
// compute number of characters to insert
auto number_of_chars_to_insert = (s.length()-1)/interval;
// compute total length
auto output_length = s.length() + number_of_chars_to_insert;
// pre-allocate output string with all characters equal to char_to_insert
std::string retval(output_length, char_to_insert);
// cycle input string, keeping track of position in input and output strings
size_t pos = 0, pos_in_input = 0;
for (const auto& c : s)
{
// copy from input to output
retval[pos++] = c;
// advance in output every interval chars
if ((++pos_in_input) % interval == 0)
++pos;
}
return retval;
}
Then:
int main()
{
std::string s = "1234567890";
for (size_t i = 1; i != 5; ++i)
std::cout << insert_char(s, ' ', i) << std::endl;
return 0;
}
The output:
1 2 3 4 5 6 7 8 9 0
12 34 56 78 90
123 456 789 0
1234 5678 90

There is no built-in to do what you want.
Instead, the most convenient solution is probably to iterate through the string and output pairs of digits as you go:
string str = "1234567890";
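for (auto it = str.begin(); it != str.end(); ){
// print the first digit of the pair and step past it
std::cout << *it++;
// print the second digit only if there is one (odd-length strings)
if (it != str.end()){
std::cout << *it++ << " ";
}
}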
std::cout << std::endl;
Or non-iterator version:
string str = "1234567890";
for (std::size_t idx = 0; idx < str.length(); idx += 2){
std::cout << str.substr(idx, 2) << " ";
}
std::cout << std::endl;
Both of these examples will have a trailing space on the line, but I've left it in to keep the examples simpler.
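If the trailing space matters, one option is to build the result as a new string and only put a separator in front of every group except the first. This is a minimal sketch; groupChars and its width parameter are names invented for this example.

```cpp
#include <cstddef>
#include <string>

// Sketch: build a new string with a space between groups of `width` characters.
std::string groupChars(const std::string& s, std::size_t width) {
    if (width == 0) return s;  // nothing sensible to group by
    std::string out;
    for (std::size_t idx = 0; idx < s.length(); idx += width) {
        if (idx != 0) out += ' ';     // separator goes before every group but the first
        out += s.substr(idx, width);  // substr clamps at the end of the string
    }
    return out;
}
```

groupChars("1234567890", 2) returns "12 34 56 78 90" with no trailing space, and an odd-length input such as "12345" gives "12 34 5".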

std::out_of_range when using string find and replace inside while loop

So I have a task to convert all occurrences of some word in one string to another string. But there is problem with condition of while loop which makes this error
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::replace
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. Process returned 3 (0x3) execution time : 2.751 s
My code is:
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string str2("three");
string str("one three two four three three");
while ( str.find(str2) != NULL ){
str.replace(str.find(str2),str2.length(),"five");
cout << str << endl; // i put it inside loop to see output
}
cout << str << endl;
return 0;
}
Any suggestions?

You are checking if str.find(str2) had an occurrence comparing it to NULL, but this is wrong, because NULL is a macro that isn't meant for that and often expands into 0, which can be a valid index. You should compare it to std::string::npos. After doing this change, your code will work.
Edit : std::string::npos corresponds to 18446744073709551615 when testing on coliru. So that clearly isn't a valid index in your string.

This condition
while ( str.find(str2) != NULL ){
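does not make sense because a call of find returns std::string::npos, which is not equal to zero, when nothing is found. In that case the loop body still runs, replace is called with a position past the end of the string, and it throws std::out_of_range.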
You can apply the following approach
std::string str2("three");
std::string str("one three two four three three");
const char *five = "five";
size_t n = std::strlen(five);
for (std::string::size_type pos = 0;
( pos = str.find(str2, pos) ) != std::string::npos; pos += n)
{
str.replace(pos, str2.length(), five);
}

This happens because str.find(str2) returns std::string::npos (the largest size_t value, which typically becomes -1 when stored in an int) if str2 does not exist in str. You can use a variable pos to save the found position, so that you do not need to call find a second time inside replace. A possible solution is the following:
#include <iostream>
#include <string>
using namespace std;
int main () {
string str2("three");
string str("one three two four three three");
string::size_type pos = str.find(str2); // index 0 is a valid match, so compare against npos
while (pos != string::npos) {
str.replace(pos, str2.length(), "five");
pos = str.find(str2);
cout << str << endl; // i put it inside loop to see output
}
cout << str << endl;
return 0;
}
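The approaches above can be folded into a small helper, which also makes the index-0 case easy to check. This is a sketch only; replaceAll is a hypothetical name, and the guard for an empty search string is an addition that none of the answers discuss.

```cpp
#include <string>

// Sketch: replace every occurrence of `from` in a copy of `str` with `to`.
std::string replaceAll(std::string str, const std::string& from, const std::string& to) {
    if (from.empty()) return str;  // find("") matches everywhere; avoid an endless loop
    for (std::string::size_type pos = 0;
         (pos = str.find(from, pos)) != std::string::npos;
         pos += to.length()) {     // continue after the inserted text
        str.replace(pos, from.length(), to);
    }
    return str;
}
```

replaceAll("one three two four three three", "three", "five") returns "one five two four five five", and replaceAll("three one", "three", "five") returns "five one", showing that a match at position 0 is handled. Skipping past the replacement also keeps the loop finite when `to` contains `from`.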

How to make String::Find(is) omit this

If I have a list, which contains the 4 nodes ("this"; "test example"; "is something of"; "a small") and I want to find every string that has "is" (only 1 positive with this list). This topic has been posted a large number of times, which I have used to help get me this far. However, I can't see anywhere how I omit "this" from a positive result. I could probably use string::c_str, then find it myself, after I've reduced my much larger list. Or is there a way I could use string::find_first_of? It would seem there's a better way. Thanks.
EDIT: I know that I can omit a particular string, but I'm looking for bigger picture b/c my list is quite large (ex: poem).
for(it = phrases.begin(); it != phrases.end(); ++it)
{
found = it->find(look);
if(found != string::npos)
cout << i++ << ". " << *it << endl;
else
{
i++;
insert++;
}
}

Just to clarify: what are you struggling with?
What you want to do is check whether what you have found is both the start of a word (or of the phrase) and the end of a word (or of the phrase),
i.e. check if:
found is equal to 0 (the start of the phrase) OR the character preceding found is a space
AND the character just after the match is a space OR the match ends at the end of the phrase
EDIT: You can access the character that was found by using found (replace X with the length of the string you're finding, look.length()):
found = it->find(look);
if(found!=string::npos)
{
if((found==0 || it->at(found-1)==' ')
&& (found==it->length()-X || it->at(found+X)==' '))
{
// Actually found it
}
} else {
// Do whatever
}
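The check above only looks at the first occurrence, so a phrase such as "this is" would be rejected because the first "is" sits inside "this". A minimal sketch that keeps searching after a failed boundary check could look like this; findWholeWord is a hypothetical name and only the space character is treated as a word separator.

```cpp
#include <string>

// Sketch: index of the first occurrence of `word` that is delimited by
// spaces or by the ends of the phrase; npos if there is none.
std::string::size_type findWholeWord(const std::string& phrase, const std::string& word) {
    if (word.empty()) return std::string::npos;
    for (std::string::size_type found = phrase.find(word);
         found != std::string::npos;
         found = phrase.find(word, found + 1)) {
        const bool startsWord = (found == 0) || (phrase[found - 1] == ' ');
        const std::string::size_type end = found + word.length();
        const bool endsWord = (end == phrase.length()) || (phrase[end] == ' ');
        if (startsWord && endsWord) return found;
    }
    return std::string::npos;
}
```

With the list from the question, findWholeWord("this", "is") returns npos and findWholeWord("is something of", "is") returns 0; for "this is" it returns 5.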

We can use Boost.Regex to search with regular expressions. Below is some example code. Complex search patterns can be created using regular expressions.
#include <boost/regex.hpp>
#include <string>
#include <iostream>
#include <boost/tokenizer.hpp>
using namespace boost;
using namespace std;
int main()
{
std::string list[4] = {"this","hi how r u ","is this fun is","no"};
regex ex("^is");
for(int x =0;x<4;++x)
{
string::const_iterator start, end;
boost::char_separator<char> sep(" ");
boost::tokenizer<boost::char_separator<char> > token(list[x],sep);
cout << "Search string: " << list[x] <<"\n"<< endl;
int count = 0; // separate counter: must not redeclare the loop index x
for(boost::tokenizer<boost::char_separator<char> >::iterator itr = token.begin();
itr!=token.end();++itr)
{
start = (*itr).begin();
end = (*itr).end();
boost::match_results<std::string::const_iterator> what;
boost::match_flag_type flags = boost::match_default;
if(boost::regex_search(start, end, what, ex, flags))
{
++count;
cout << "Found--> " << what.str() << endl;
}
}
cout<<"found pattern "<<count <<" times."<<endl<<endl;
}
return 0;
}
Output:
Search string: this
found pattern 0 times.
Search string: hi how r u
found pattern 0 times.
Search string: is this fun is
Found--> is
Found--> is
found pattern 2 times.
Search string: no
found pattern 0 times.
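If Boost is not available, a similar count can be sketched with the standard library's std::regex and a word-boundary pattern. This swaps in a different technique from the answer above (no tokenizer, \b instead of ^), and it assumes the search word contains only letters and digits so that it needs no escaping; countWholeWord is a hypothetical name.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Sketch: count whole-word matches using std::regex instead of Boost.
// Assumes `word` contains only letters and digits, so it needs no regex escaping.
std::size_t countWholeWord(const std::string& phrase, const std::string& word) {
    const std::regex pattern("\\b" + word + "\\b");
    const auto first = std::sregex_iterator(phrase.begin(), phrase.end(), pattern);
    const auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```

For the sample list, countWholeWord("is this fun is", "is") gives 2 and countWholeWord("this", "is") gives 0, matching the counts printed above.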

I didn't realize you only wanted to match "is". You can do this by using an std::istringstream to tokenize it for you:
std::string term("is");
for(std::list<std::string>::const_iterator it = phrases.begin();
it != phrases.end(); ++it)
{
std::istringstream ss(*it);
std::string token;
while(ss >> token)
{
if(token == term)
std::cout << "Found " << token << "\n";
}
}
