I was trying out a solution to a question, when I came across this code snippet written in C++:
string s;
cin >> s;
vector<int> r;
for (string t: {"twone", "one", "two"}) {
for (size_t pos = 0; (pos = s.find(t, pos)) != string::npos;) {
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
}
}
cout << r.size() << endl;
for (auto rr: r)
cout << rr + 1 << " ";
cout << endl;
I am new to the language and was unable to understand what is happening in the second (nested) for loop and the 3rd for loop. Can someone help me to understand?
The first and the third loops are range-based for loops.
The first loop iterates over a container of strings. So t takes successively the values "twone", "one", and "two".
The second loop searches for all the occurrences of t in the string s (each search starts from the position pos of the previous occurrence found). As long as an occurrence is found, it does:
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
The push_back() stores the position of the middle of each occurrence found in a vector of integers.
The third loop iterates over this vector of stored positions and prints the elements (the positions are counted from 0; the +1 shifts the printed positions as if counting started at 1).
One of the main ways to try and understand complex code is to try and simplify it. It also helps to know what the involved functions do, so a reference to std::string::find is helpful to read.
First of all, let's skip the body and concentrate only on the loop itself:
for (size_t pos = 0; (pos = s.find(t, pos)) != string::npos;) {
}
All for loops could be seen as a while loop, and while loops could be somewhat easier to understand and follow, so we convert it to such a while loop:
size_t pos = 0;
while ((pos = s.find(t, pos)) != string::npos)
{
}
This might not help so much as it's the condition that is most likely the hard part to understand, so then we simplify that as well:
size_t pos = 0;
pos = s.find(t, pos);
while (pos != string::npos)
{
pos = s.find(t, pos);
}
The initialization of pos could then be further simplified:
size_t pos = s.find(t);
while (pos != string::npos)
{
pos = s.find(t, pos);
}
Now the loop itself is as simple as it can be, and looking at it we see that it basically attempts to find the sub-string t inside the string s. The loop continues as long as the sub-string t is found inside s.
Now that we deconstructed the loop itself, let's take a look at the loop-body, and what it does:
s[pos + t.length() / 2] = '?';
r.push_back(pos + t.length() / 2);
First of all, let's pull out the common sub-expression into a temporary variable:
auto new_pos = pos + t.length() / 2;
s[new_pos] = '?';
r.push_back(new_pos);
The first statement
s[new_pos] = '?';
replaces the middle character of the sub-string t inside s with the character '?'.
The second statement
r.push_back(new_pos);
pushes the position of the '?' into the vector r.
Now lastly we put the inner loop (explained above) into the context of the outer loop:
for (string t: {"twone", "one", "two"})
This is a range-based for loop which loops over all elements in the container on the right-hand side of the :. That is, the loop will iterate three times, with t being equal to "twone", "one" and "two" in that order.
So the loops will search for "twone", "one" and "two" inside the string s, replace the middle character of each sub-string found ("twone", "one" or "two") inside s with a single '?' character, and push the position of that '?' character into the vector r.
For example, if s contains "someone with the number two" (assigned directly, since cin >> s would stop reading at the first space), then the result will be the string "someo?e with the number t?o", and the vector r should contain the values 5 and 25 (which will be printed as 6 and 26 because of the + 1).
Here's an example showing exactly that.
Just run the code after inserting into it an output of intermediate results.
Here is a demonstrative program.
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string s;
std::cin >> s;
std::vector<int> r;
for ( const std::string &t : { "twone", "one", "two" } )
{
for ( std::string::size_type pos = 0; (pos = s.find( t, pos ) ) != std::string::npos; )
{
s[pos + t.length() / 2] = '?';
std::cout << pos << ": " << s << '\n';
r.push_back( pos + t.length() / 2 );
}
}
std::cout << r.size() << '\n';
for ( const auto &rr: r ) std::cout << rr + 1 << " ";
std::cout << '\n';
}
Let's assume that the user entered the string onetwoone. The inner loop then searches the entered string for all occurrences of the words "twone", "one" and "two", one word after another.
For the given string the word "twone" is not found.
The word "one" is found at position 0. This statement
s[pos + t.length() / 2] = '?';
replaces the middle character of the found word in the entered string with the sign '?'.
Thus this added statement
std::cout << pos << ": " << s << '\n';
outputs
0: o?etwoone
The position of the sign '?' (the number 1) is stored in the vector.
Then within the loop the word "one" is found a second time. And again the middle character of the found word is replaced with '?'. So this statement
std::cout << pos << ": " << s << '\n';
outputs
6: o?etwoo?e
The position of the sign '?' (the number 7) is stored in the vector.
So at this moment we have the following output
0: o?etwoone
6: o?etwoo?e
The word "one" is not found any more.
The word "two" occurs only once in the given string. So the output is
3: o?et?oo?e
The position of '?' equal to 4 is stored in the vector.
Now at this moment we have the following output
0: o?etwoone
6: o?etwoo?e
3: o?et?oo?e
produced by the inner loop.
So as a result three occurrences of the words are found in the entered string.
Thus these statements
std::cout << r.size() << '\n';
for ( const auto &rr: r ) std::cout << rr + 1 << " ";
output
3
2 8 5
The last values correspond to expressions rr + 1 that is to stored positions of the sign '?' plus 1.
Encountering an issue where I am trying to tokenize a string that came from a getline from console. It's breaking at " " as well as "." when I only want it to break at " ".
Here is the method that I've created for this problem:
queue<string> getStringList(string list, string delimiter) {
queue<string> values;
size_t pos = 0;
string token;
while ((pos = list.find(delimiter) != string::npos))
{
token = list.substr(0, pos);
std::cout << token << std::endl;
values.push(token);
list.erase(0, pos + delimiter.length());
}
std::cout << list << endl;
return values;
}
Sample Input:
cid arrival_time dir_original dir_target
0 1.1 ^ ^
1 2.0 ^ ^
2 3.3 ^ <
3 3.5 v v
4 4.2 v >
5 4.4 ^ ^
6 5.7 > ^
7 5.9 < ^
This is how the data from the previous line is being broken up.
while ((pos = list.find(delimiter) != string::npos))
You have the parentheses in the wrong place, meaning what you have is the equivalent of:
while ((pos = (list.find(delimiter) != string::npos) ))
// (compare first-----------------------)
// (then assign----------------------------------)
In terms of operator precedence, assignment is way down the list, long after comparisons.
So pos will always be either zero (if you don't find the space) or one (if you do find it). It will never be equal to the position of that space in the string, except for the pathological case where the space is the second character.
That's why everything works for your one-character fields, but not for the three-character 1.1 field. It correctly finds the space at position 3 but, because pos has been set to one, substr extracts only the 1 and erase then deletes the '.' along with it.
That line should be:
while ((pos = list.find(delimiter)) != string::npos)
// (assign first--------------)
// (then compare-------------------------------)
With a full program and the fix mentioned above:
#include <iostream>
#include <queue>
#include <string>
using namespace std;
queue<string> getStringList(string list, string delimiter) {
queue<string> values;
size_t pos = 0;
string token;
while ((pos = list.find(delimiter)) != string::npos) {
token = list.substr(0, pos);
std::cout << token << std::endl;
values.push(token);
list.erase(0, pos + delimiter.length());
}
values.push(list);
std::cout << list << endl;
return values;
}
int main () {
string x = "0 1.1 ^ ^";
string delim = " ";
auto y = getStringList(x, delim);
return 0;
}
You can see the output is now correct:
0
1.1
^
^
And, as per a comment from Remy Lebeau, you print the final element but do not add it to the list. Hence I've added the line that appends that last token to the list of tokens.
One other thing to watch out for. While this works well with single-space separators, multi-space ones may cause issues. If that becomes an issue, you may need a more "bullet-proof" tokenising routine.
I am attempting to remove all double spaces from my string so that only single spaces remain:
while (doublespace != -1) {
kstring.replace(doublespace, 1, " ") ;
doublespace = kstring.find_first_of("  ") ; }
it finds the first double space, triggering the while statement. It then takes the first space, adds 1 to it and sets the two spaces to one space. Then it checks again.
The problem is that the loop never ends - for example if I put "hello " doubleSpace would never be set to -1.
std::string::find_first_of only searches until it finds one of the characters in the input string, so as you pass it "  " (two spaces), it'll effectively only search for " " (a single space) - see the documentation here:
Searches the string for the first character that matches any of the characters specified in its arguments.
You should use std::string::find instead, which searches for the first instance of the entire substring:
Notice that unlike member find_first_of, whenever more than one character is being searched for, it is not enough that just one of these characters match, but the entire sequence must match.
You're also replacing only the first space with a space (sString.replace(doubleSpace, 1, " ")), which means your output will still contain double spaces. Just use std::string::erase instead, to erase just the first space.
This means your code snippet should look more like:
std::size_t doubleSpace = sString.find("  ");   // two spaces
while (doubleSpace != std::string::npos)
{
    sString.erase(doubleSpace, 1);              // drop one of the two spaces
    doubleSpace = sString.find("  ");           // two spaces
}
Here is an alternative take that returns copy with only single spaces in it:
#include <iostream>
#include <string>
int main()
{
std::string str = " hello - h e l l o ";
std::string newstr;
size_t beg = 0;
size_t len = str.length();
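while (beg < len)
{
size_t end = str.find(' ', beg);
if (end == std::string::npos)
{
// no more spaces: copy the rest and stop (avoids wrapping npos + 1 to 0)
newstr += str.substr(beg);
break;
}
++end; // keep exactly one space after the word
newstr += str.substr(beg, end - beg);
beg = str.find_first_not_of(' ', end);
}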
std::cout << newstr << std::endl;
return 0;
}
Result:
hello - h e l l o
As suggested by @hnefatl, this approach could also be more efficient (see comment below).
I see two errors in your code. The first is that find_first_of() only searches for one of the characters you supply so, in your case, it will only be looking for single spaces. Secondly you only replace one space, not two.
This should fix both those problems:
std::string& reduce_double_spaces(std::string& s)
{
    std::string::size_type pos = s.find("  ");   // two spaces
    while (pos != std::string::npos) {
        // replace BOTH spaces with one space
        s.replace(pos, 2, " ");
        // start searching again, where you left off
        // rather than going back to the beginning
        pos = s.find("  ", pos);                 // two spaces
    }
    return s;
}
NOTE: By beginning the subsequent searches from the place you found your last space, this version should be much more efficient. The longer the string the bigger the savings.
This alternative uses the constant-time operations back(), pop_back(), push_back(), empty(), and size():
std::string str = " hello - h e l l o ";
std::string newStr = str; // diag only
std::string Miss; Miss.reserve(str.size());
while ( str.size() > 1 ) // note: str.back() undefined when str.empty()
{
// fetch copy and remove last element
char aKar = str.back(); str.pop_back();
if (! ((' ' == aKar) && // space
( ' ' == str.back()))) // space
{
Miss.push_back(aKar); // not double-space
}
}
assert(1 == str.size()); // optional
while (! Miss.empty() ) // restore str to original order
{
str.push_back (Miss.back()); // copy last element
Miss.pop_back(); // remove last element
}
assert(Miss.empty()); // optional
std::cout << "\n " << __FUNCTION__
<< "\n in: " << newStr
<< "\n out: " << str << std::endl;
Let's say I have the following string:
aabccd
I'd like to find and remove all patterns that are repeated. In this example, one a is followed by another one, which is a repetition. Same thing about cc.
The final string would be:
bd
Another example: banana => ba (because two an or two na).
Here's the algorithm I've come up with:
Splitting the string in two halves and taking the biggest one
Let's say it has a length of 4; I look in the other half if it's there
Next turn of loop, I shift my string from one character (but still of
length 4) until I loop all the possible patterns
Then I reduce the half by one character (so length 3) and start again
// etc.
It would do something like this:
helloguys -> hell -> ello -> llog
And some code, not working as it should:
std::string processWord(std::string word)
{
// Split word in the two halves
std::string firstHalf = word.substr(0, word.size() / 2);
std::string secondHalf = word.substr(firstHalf.size());
// Check if the tiniest half is present in the biggest one (firstHalf will be eitheir lower or equal to secondHalf)
if (secondHalf.find(firstHalf) != std::string::npos)
{
std::cout << firstHalf << " is found in " << secondHalf << std::endl;
// Remove firstHalf from secondHalf
word.replace(word.find(firstHalf), firstHalf.size(), "");
std::cout << word << std::endl;
return word;
}
for (size_t i = 1; i < secondHalf.size(); ++i)
{
// Get secondHalf minus one character at each loop turn to check occurences
std::string occurence = secondHalf.substr(0, secondHalf.size() - i);
// Mark the first occurence
size_t startIndex = indexOf(word, occurence);
// Mark the last occurence
size_t lastIndex = startIndex;
int totalOccurences = 1;
// As long as there are occurences following each other, we continue
// Example: "anandgdfgd". We see "an" twice. But it would not work with "anshfsduihsuan" since they are not following each other
while (word.find(occurence, lastIndex + occurence.size()) != std::string::npos)
{
lastIndex += occurence.size();
totalOccurences++;
}
std::ostringstream oss;
oss << "[" << totalOccurences << "*" << occurence << "]";
word.replace(startIndex, lastIndex, oss.str());
}
// Some code is missing here
return word;
}
I'm sure this problem has already been solved but I can't find anything. Any tips?
Clear rules have not been defined yet, but I think that building a suffix array is a good starting point. You can compare neighbouring suffixes, find the length of their common prefix, and decide which substrings are the best candidates for removal.
My code counts spaces in string temp, which works fine. Later in the code I have a similar loop that does not end as expected.
Based on my debugging and some research, I’ve determined that the while loop isn’t stopping when it is supposed to the second time around, so an exception is thrown (std::out_of_range)… Why? Both while loops’ conditions are identical, so they should behave the same, right? That is, they should both stop looping when find() didn’t find a space…
First part of the code:
// Just find out how many spaces are in the temp string for now
// Find index (position in the string) of the first space in temp string
_index = temp.find( " " );
// Loop as long as there are still spaces in temp string
while (_index != string::npos) {
// Find the index of the next space
_index = temp.find(" ", _index + 1);
// Keep track of how many spaces are in temp string
count++;
}
Later (don’t mind my debugging code interspersed in there):
// If the amount of spaces is just right, i.e. there are exactly SIZE - 1 spaces
if (count == SIZE - 1) {
cout << "\n\n - Inside the if (count == SIZE - 1) block\n"
<< " - temp = \"" << temp << "\", temp's size = " << temp.length()
<< "\n\n";
// This time, we're removing the spaces from the temp string in order to
// be able to search it for any non-number characters
// Find index of the first space in the temp string
_index = temp.find(" ");
cout << " - The first temp.find(\" \") yields: " << _index << "\n\n";
// Loop as long as there are still spaces in temp string
while (_index != string::npos) {
cout << " - Right before temp.replace() in the loop\n"
<< " - _index = " << _index << "\n\n";
// Remove the space in temp string
temp.replace(_index, 1, "");
// find the index of the next space
_index = input.find(" ", _index + 1);
}
// …
}
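Within the second loop you reference the variable input instead of temp:
_index = input.find( " ", _index + 1 );
Therefore the loop does not terminate when you expect it to: the indices come from input, which this loop never modifies, while temp keeps shrinking. Once an index lies past the end of temp, temp.replace() throws std::out_of_range. The first while loop does not have this problem because it searches temp.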
I want to search a large string for all the locations of a string.
The two other answers are correct but they are very slow and have O(N^2) complexity. But there is the Knuth-Morris-Pratt algorithm, which finds all substrings in O(N) complexity.
Edit:
Also, there is another algorithm: the so-called "Z-function", with O(N) complexity, but I couldn't find an English source for this algorithm (maybe because there is another, more famous function with a similar name, the Riemann zeta function), so I will just put its code here and explain what it does.
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void calc_z (string &s, vector<int> & z)
{
int len = s.size();
z.resize (len);
int l = 0, r = 0;
for (int i=1; i<len; ++i)
if (z[i-l]+i <= r)
z[i] = z[i-l];
else
{
l = i;
if (i > r) r = i;
for (z[i] = r-i; r<len; ++r, ++z[i])
if (s[r] != s[z[i]])
break;
--r;
}
}
int main()
{
string main_string = "some string where we want to find substring or sub of string or just sub";
string substring = "sub";
string working_string = substring + main_string;
vector<int> z;
calc_z(working_string, z);
//after this, z[i] is the length of the longest prefix of working_string
//that matches the part of working_string starting at the i-th position.
//So the positions where z[i] >= substring.size()
//are the positions where the substring occurs.
for(int i = substring.size(); i < working_string.size(); ++i)
if(z[i] >=substring.size())
cout << i - substring.size() << endl; //to get position in main_string
}
Using std::string::find. You can do something like:
std::string::size_type start_pos = 0;
while( std::string::npos !=
( start_pos = mystring.find( my_sub_string, start_pos ) ) )
{
// do something with start_pos or store it in a container
++start_pos;
}
EDIT: Doh! Thanks for the remark, Nawaz! Better?
I'll add for completeness, there is another approach that is possible with std::search, works like std::string::find, difference is that you work with iterators, something like:
std::string::iterator it(str.begin()), end(str.end());
std::string::iterator s_it(search_str.begin()), s_end(search_str.end());
it = std::search(it, end, s_it, s_end);
while(it != end)
{
// do something with this position..
// a tiny optimisation could be to buffer the result of the std::distance - heyho..
// std::advance modifies it in place and returns void, so it cannot be passed as an argument
std::advance(it, std::distance(s_it, s_end));
it = std::search(it, end, s_it, s_end);
}
I find that this sometimes outperforms std::string::find, esp. if you represent your string as a vector<char>.
Simply use std::string::find() which returns the position at which the substring was found, or std::string::npos if none was found.
Here is the documentation.
And here is the example taken from this documentation:
// string::find
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string str ("There are two needles in this haystack with needles.");
string str2 ("needle");
size_t found;
// different member versions of find in the same order as above:
found=str.find(str2);
if (found!=string::npos)
cout << "first 'needle' found at: " << int(found) << endl;
found=str.find("needles are small",found+1,6);
if (found!=string::npos)
cout << "second 'needle' found at: " << int(found) << endl;
found=str.find("haystack");
if (found!=string::npos)
cout << "'haystack' also found at: " << int(found) << endl;
found=str.find('.');
if (found!=string::npos)
cout << "Period found at: " << int(found) << endl;
// let's replace the first needle:
str.replace(str.find(str2),str2.length(),"preposition");
cout << str << endl;
return 0;
}