Using strtok/strtok_r in a while loop in C++

I'm getting unexpected behavior from the strtok and strtok_r functions:
queue<string> tks;
char line[1024];
char *savePtr = 0;
while(true)
{
//get input from user store in line
tks.push(strtok_r(line, " \n", &savePtr)); //initial push only works right during first loop
char *p = nullptr;
for (...)
{
p = strtok_r(NULL, " \n", &savePtr);
if (p == NULL)
{
break;
}
tks.push(p);
}
delete p;
savePtr = NULL;
//do stuff, clear out tks before looping again
}
I've tried using strtok and realized that during the second loop, the initial push is not occurring. I attempted to use the reentrant version strtok_r in order to control what the saved pointer is pointing to during the second loop by making sure it is null before looping again.
tks is only correctly populated during the first time through the loop - subsequent loops give varying results depending on the length of line
What am I missing here?

Just focusing on the inner loop and chopping off all of the stuff I don't see as necessary.
#include <iostream>
#include <queue>
#include <string>
#include <cstring>
using namespace std;
int main()
{
std::queue<std::string> tks;
while(true)
{
char line[1024];
char *savePtr;
char *p;
cin.getline(line, sizeof(line));
p = strtok_r(line, " \n", &savePtr); // initial read. contents of savePtr ignored
while (p != NULL) // exit when no more data, which includes an empty line
{
tks.push(p); // got data, store it
p = strtok_r(NULL, " \n", &savePtr); // get next token
}
// consume tks
}
}
I prefer the while loop over the for loop used by Toby Speight in his answer because I think it is more transparent and easier to read. Your mileage may vary. By the time the compiler is done with it they will be identical.
There is no need to delete any memory. Nothing here was allocated with new: line is a local array, and the pointers strtok_r returns point into it. There is no need to clear anything before the next round except for tks. savePtr will be reset by the first strtok_r.
There is a failure case if the user enters a line too long to fit in the 1024-byte buffer: cin.getline sets failbit, and later reads fail until the stream state is cleared, but this will not crash. If this still doesn't work, look into how you're consuming tks. It's not posted so we can't troubleshoot that portion.
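One way to sidestep the fixed-size buffer while still using strtok_r is to tokenize a private copy of a std::string. This is only a sketch: the helper name tokenize_line is made up for illustration, and it assumes a POSIX system where strtok_r is available.

```cpp
#include <cstring>   // strtok_r (POSIX, not standard C++)
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper: split `line` on spaces and newlines.
std::queue<std::string> tokenize_line(const std::string& line)
{
    // strtok_r writes '\0' into its input, so work on a mutable copy.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::queue<std::string> tokens;
    char* savePtr = nullptr;   // set by the first strtok_r call
    for (char* p = strtok_r(buffer.data(), " \n", &savePtr); p != nullptr;
         p = strtok_r(nullptr, " \n", &savePtr))
    {
        tokens.push(p);        // std::string copies the characters out of buffer
    }
    return tokens;
}
```

Because each call builds its own buffer and its own savePtr, nothing carries over from one line to the next.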
Wholeheartedly recommend changing to a string-based solution if possible. This is a really simple, easy to write, but slow, one:
#include <iostream>
#include <queue>
#include <string>
#include <sstream>
int main()
{
std::queue<std::string> tks;
while(true)
{
std::string line;
std::getline(std::cin, line);
std::stringstream linestream(line);
std::string word;
// parse only on ' ', not on the usual all whitespace of >>
while (std::getline(linestream, word, ' '))
{
tks.push(word);
}
// consume tks
}
}
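One difference from strtok is worth knowing: std::getline with ' ' as the delimiter produces an empty token for every extra space, so "a  b" yields "a", "" and "b". If splitting on any run of whitespace is acceptable, operator>> avoids that. A minimal sketch, with split_whitespace as a made-up helper name:

```cpp
#include <queue>
#include <sstream>
#include <string>

// Hypothetical helper: split on any run of whitespace using operator>>.
std::queue<std::string> split_whitespace(const std::string& line)
{
    std::queue<std::string> tokens;
    std::istringstream linestream(line);
    std::string word;
    // operator>> skips leading whitespace, so repeated spaces and tabs
    // never produce empty tokens.
    while (linestream >> word)
    {
        tokens.push(word);
    }
    return tokens;
}
```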

Your code wouldn't compile for me, so I fixed it:
#include <iostream>
#include <queue>
#include <string>
#include <cstring>
std::queue<std::string> tks;
int main() {
char line[1024] = "one \ntwo \nthree\n";
char *savePtr = 0;
for (char *p = strtok_r(line, " \n", &savePtr); p;
p = strtok_r(nullptr, " \n", &savePtr))
tks.push(p);
// Did we read it correctly?
for (; tks.size() > 0; tks.pop())
std::cout << ">" << tks.front() << "<" << std::endl;
}
This produces the expected output:
>one<
>two<
>three<
So your problem isn't with the code you posted.

If you have the option to use Boost, try this to tokenize a string, supplying your own string and delimiters.
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
int main()
{
std::string str = "Any\nString\nYou want";
std::vector< std::string > results;
boost::split( results, str, boost::is_any_of( "\n" ) );
}

Related

Trouble getting two variables to update in C++ for loop

I am creating a function that splits a sentence into words, and believe the way to do this is to use str.substr, starting at str[0] and then using str.find to find the index of the first " " character. Then update the starting position parameter of str.find to start at the index of that " " character, until the end of str.length().
I am using two variables to mark the beginning position and end position of the word, and update the beginning position variable with the ending position of the last. But it is not updating as desired in the loop as I currently have it, and cannot figure out why.
#include <iostream>
#include <string>
using namespace std;
void splitInWords(string str);
int main() {
string testString("This is a test string");
splitInWords(testString);
return 0;
}
void splitInWords(string str) {
int i;
int beginWord, endWord, tempWord;
string wordDelim = " ";
string testWord;
beginWord = 0;
for (i = 0; i < str.length(); i += 1) {
endWord = str.find(wordDelim, beginWord);
testWord = str.substr(beginWord, endWord);
beginWord = endWord;
cout << testWord << " ";
}
}

It is easier to use a string stream.
#include <vector>
#include <string>
#include <sstream>
using namespace std;
vector<string> split(const string& s, char delimiter)
{
vector<string> tokens;
string token;
istringstream tokenStream(s);
while (getline(tokenStream, token, delimiter))
{
tokens.push_back(token);
}
return tokens;
}
int main() {
string testString("This is a test string");
vector<string> result=split(testString,' ');
return 0;
}
You can write it using the existing C++ libraries:
#include <string>
#include <vector>
#include <iterator>
#include <sstream>
int main()
{
std::string testString("This is a test string");
std::istringstream wordStream(testString);
std::vector<std::string> result(std::istream_iterator<std::string>{wordStream},
std::istream_iterator<std::string>{});
}

Couple of issues:
The substr() method second parameter is a length (not a position).
// Here you are using `endWord` which is a position in the string.
// This only works when beginWord is 0
// for all other values you are providing an incorrect len.
testWord = str.substr(beginWord, endWord);
The find() method starts searching at the position given by its second parameter.
// If str[beginWord] contains one of the delimiter characters
// Then it will return beginWord
// i.e. you are not moving forward.
endWord = str.find(wordDelim, beginWord);
// So you end up stuck on the first space.
Assuming you got the above fixed, you would still be adding a space at the front of each word.
// You need to actively search and remove the spaces
// before reading the words.
Nice things you could do:
Here:
void splitInWords(string str) {
You are passing the parameter by value. This means you are making a copy. A better technique would be to pass by const reference (you are not modifying the original or the copy).
void splitInWords(string const& str) {
An Alternative
You can use the stream functionality.
void split(std::istream& stream)
{
std::string word;
stream >> word; // This drops leading space.
// Then reads characters into `word`
// until a "white space" character is
// found.
// Note: it empties `word` before adding any characters
}
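Putting the substr() and find() points together, a corrected find-based loop might look like the sketch below. The name splitIntoWords and the choice to return a vector instead of printing are my own; it assumes words are separated only by spaces.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the words instead of printing them.
std::vector<std::string> splitIntoWords(const std::string& str)
{
    std::vector<std::string> words;
    // Start at the first non-space character (npos if there is none).
    std::string::size_type beginWord = str.find_first_not_of(' ');
    while (beginWord != std::string::npos)
    {
        // find() returns npos when there is no further space.
        std::string::size_type endWord = str.find(' ', beginWord);
        if (endWord == std::string::npos)
        {
            words.push_back(str.substr(beginWord)); // last word: take the rest
            break;
        }
        // substr() takes a length, so pass the distance between the positions.
        words.push_back(str.substr(beginWord, endWord - beginWord));
        // Skip the space(s) so the next find() does not return the same spot.
        beginWord = str.find_first_not_of(' ', endWord);
    }
    return words;
}
```

Calling splitIntoWords("This is a test string") should give five words with no leading spaces.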

Using strtok() to parse text file

I've been trying to make a program that parses a text file and feeds 6 pieces of information into an array of objects. The problem for me is that I'm having issues figuring out how to process the text file. I was told that the first step I needed to do was to write some code that counted how many letters long each entry was. The txt file is in this format:
"thing1","thing2","thing3","thing4","thing5","thing6"
This is the current version of my code:
#include<iostream>
#include<string>
#include<fstream>
#include<cstring>
using namespace std;
int main()
{
ifstream myFile("Book List.txt");
while(myFile.good())
{
string line;
getline(myFile, line);
char *sArr = new char[line.length() + 1];
strcpy(sArr, line.c_str());
char *sPtr;
sPtr = strtok(sArr, " ");
while(sPtr != NULL)
{
cout << strlen(sPtr) << " ";
sPtr = strtok(NULL, " ");
}
cout << endl;
}
myFile.close();
return 0;
}
So there are two things making it hard for me right now.
1) How do I deal with the delimiters?
2) How do I deal with "skipping" the first quotation mark in each line?

Read in a string instead of a c-style string. This means that you can use the handy std methods.
The std::string::find() method should help you out with finding each thing that you want to parse.
http://www.cplusplus.com/reference/string/string/find/
You can use this to find all the commas, which will give you the starts of all the things.
Then you can use std::string::substr() to cut up the string into each piece.
http://www.cplusplus.com/reference/string/string/substr/
You can manage to get rid of the quotation marks by passing in 1 more than the start and 1 less than the length of the thing, you can also use
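The find()/substr() approach could be sketched roughly as follows. The function name split_quoted_fields is invented for this example, and it assumes that fields never contain commas or quotation marks themselves.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a line like "a","b","c" into a, b, c.
std::vector<std::string> split_quoted_fields(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (start <= line.size())
    {
        // Find the comma that ends this field, or use the end of the line.
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos)
        {
            comma = line.size();
        }
        // substr() takes a start position and a length.
        std::string field = line.substr(start, comma - start);
        // Drop one pair of surrounding quotation marks if present.
        if (field.size() >= 2 && field.front() == '"' && field.back() == '"')
        {
            field = field.substr(1, field.size() - 2);
        }
        fields.push_back(field);
        start = comma + 1; // continue just past the comma
    }
    return fields;
}
```

Note that an empty line produces a single empty field with this version.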

If you have to use strtok then this code snippet should give enough to modify your program to parse your data:
#include <cstdio>
#include <cstring>
int main ()
{
char str[] ="\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str,"\",");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ",\"");
}
return 0;
}
If you do not have to use strtok then you should use std::string as others have advised. Using std::string and std::istringstream:
#include <string>
#include <sstream>
#include <vector>
#include <iostream>
int main ()
{
std::string str2( "\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"" ) ;
std::istringstream is(str2);
std::string part;
while (getline(is, part, ','))
std::cout << part.substr(1,part.length()-2) << std::endl;
return 0;
}

For starters, don't use strtok if you can avoid it (and you easily can here - and you can even avoid using the find series of functions as well).
If you want to read in the whole line and then parse it:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>
// defines a new ctype that treats commas as whitespace
struct csv_reader : std::ctype<char>
{
csv_reader() : std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
rc['\n'] = std::ctype_base::space;
rc[','] = std::ctype_base::space;
return &rc[0];
}
};
int main()
{
std::ifstream fin("yourFile.txt");
std::string line;
std::vector<std::vector<std::string>> values;
while (std::getline(fin, line))
{
std::istringstream iss(line);
// the locale takes ownership of the facet and deletes it when it is no longer used
iss.imbue(std::locale(std::locale(), new csv_reader()));
std::vector<std::string> vec;
std::copy(std::istream_iterator<std::string>(iss), std::istream_iterator<std::string>(), std::back_inserter(vec));
values.push_back(vec);
}
// values now contains a vector for each line that has the strings split by their commas
fin.close();
return 0;
}
That answers your first question. For your second, you can skip all the quotation marks by adding them to the rc mask (also treating them as whitespace) or you can strip them out afterwards (either directly or by using a transform):
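std::transform(vec.begin(), vec.end(), vec.begin(), [](std::string s)
{
// remove_if shifts the kept characters forward; erase trims the leftover tail
std::string::iterator pend = std::remove_if(s.begin(), s.end(), [](char c)
{
return c == '"';
});
s.erase(pend, s.end());
return s; // transform assigns the returned string back into vec
});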
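The other option, adding quotation marks to the mask, might look like this sketch. The names quoted_csv_ctype and parse_quoted_line are made up here. Since both commas and quotes count as whitespace, an empty field such as "" simply disappears, so this only suits data without empty fields.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: commas, quotation marks and newlines count as whitespace.
struct quoted_csv_ctype : std::ctype<char>
{
    quoted_csv_ctype() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc['\n'] = std::ctype_base::space;
        rc[','] = std::ctype_base::space;
        rc['"'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical helper: split one line into its unquoted fields.
std::vector<std::string> parse_quoted_line(const std::string& line)
{
    std::istringstream iss(line);
    // The locale takes ownership of the facet allocated with new.
    iss.imbue(std::locale(iss.getloc(), new quoted_csv_ctype));
    return std::vector<std::string>(std::istream_iterator<std::string>(iss),
                                    std::istream_iterator<std::string>());
}
```

Ordinary spaces are not in the table, so a field like "thing 1" stays in one piece.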

Random ascii char's appearing

I was trying to write a program that stores the message in a string backwards into a character array, and whenever I run it sometimes it successfully writes it backwards but other times it will add random characters to the end like this:
input: write this backwards
sdrawkcab siht etirwˇ
#include <iostream>
#include <string>
using namespace std;
int main()
{
string message;
getline(cin, message);
int howLong = message.length() - 1;
char reverse[howLong];
for(int spot = 0; howLong >= 0; howLong--)
{
reverse[spot] = message.at(howLong);
spot++;
}
cout << reverse;
return 0;
}

The buffer reverse needs to be message.length() + 1 in length so that it can store a null termination byte. (And the null termination byte needs to be placed in the last position in that buffer.)
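As a rough illustration of that sizing rule, the sketch below reverses into a character buffer that has room for the terminator. The function name reverse_to_buffer is invented, and it uses std::vector<char> as the buffer because standard C++ has no runtime-sized arrays.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: reverse `message` into a null-terminated buffer.
std::vector<char> reverse_to_buffer(const std::string& message)
{
    const std::string::size_type length = message.length();
    // One extra element for the terminating '\0'.
    std::vector<char> reversed(length + 1);
    for (std::string::size_type i = 0; i < length; ++i)
    {
        reversed[i] = message[length - 1 - i];
    }
    reversed[length] = '\0'; // terminator goes in the last position
    return reversed;
}
```

The result can be printed with std::cout << reverse_to_buffer(message).data(), and printing stops at the terminator instead of running into whatever follows in memory.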

Since standard C++ doesn't allow an array whose length is only known at runtime (some compilers accept it as an extension), you should use a container instead.
std::vector<char> reverse(message.length());
Or better, use std::string. The standard library also offers some nice functions, for example building the reversed string directly in the constructor call:
std::string reverse(message.rbegin(), message.rend());

Instead of reversing into a character buffer, you should build a new string. It's easier and less prone to bugs.
string reverse;
for (; howLong >= 0; howLong--)
{
reverse.push_back(message.at(howLong));
}

Use a proper C++ solution.
Inline reverse the message:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string message;
getline(cin, message);
//inline reverse the message
reverse(message.begin(),message.end());
//print the reversed message:
cout << message << endl;
return 0;
}
Reverse a copy of the message string:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string message, reversed_message;
getline(cin, message);
//reverse message
reversed_message = message;
reverse(reversed_message.begin(), reversed_message.end());
//print the reversed message:
cout << reversed_message << endl;
return 0;
}
If you really need the reversed string as a C string, you can get a read-only pointer to it:
const char *msg = reversed_message.c_str();
but, as a rule of thumb, use C++ strings if you can.

To store tokens into an array

A novice at C++, I am trying to create a stats program to practice coding. I am hoping to read a text file and store its values into arrays on which I can perform mathematical operations. I am stuck here:
main ()
{
char output[100];
char *charptr;
int age[100];
ifstream inFile;
inFile.open("data.txt");
if(!inFile)
{
cout<<"didn't work";
cin.get();
exit (1);
}
inFile.getline(output,100);
charptr = strtok(output," ");
for (int x=0;x<105;x++)
{
age[x] = atoi(charptr);
cout<<*age<<endl;
}
cin.get();
}
In the code above, I am trying to store subject ages into the int array 'age', keeping ages in the first line of the file. I intend to use strtok as mentioned, but I am unable to convert the tokens into the array.
As you can obviously see, I am a complete noob, so please bear with me as I am learning this on my own. :)
Thanks
P.S: I have read similar threads but am unable to follow the detailed code given there.

There are a few issues with the for loop:
Possibility of going out-of-bounds due to age having 100 elements, but terminating condition in for loop is x < 105
No check on charptr being NULL prior to use
No subsequent call to strtok() inside for loop
Printing of age elements is incorrect
The following would be example fix of the for loop:
charptr = strtok(output, " ");
int x = 0;
while (charptr && x < sizeof(age)/sizeof(age[0]))
{
age[x] = atoi(charptr);
cout << age[x] << endl;
charptr = strtok(NULL, " ");
x++;
}
As this is C++, I suggest:
using std::vector<int> instead of a fixed-size array
using std::getline() to avoid specifying a fixed-size buffer for reading a line
using std::copy() with istream_iterator for parsing the line of integers
For example:
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <cstdlib> // for exit()
int main ()
{
std::vector<int> ages;
std::ifstream inFile;
inFile.open("data.txt");
if(!inFile)
{
std::cout<<"didn't work";
std::cin.get();
exit (1);
}
std::string line;
std::getline(inFile, line);
std::istringstream in(line);
std::copy(std::istream_iterator<int>(in),
std::istream_iterator<int>(),
std::back_inserter(ages));
return 0;
}

Why isn't C++ strtok() working for me?

The program is supposed to receive an input through cin, tokenize it, and then output each one to show me that it worked properly. It did not.
The program compiles with no errors, and takes an input, but fails to output anything.
What am I doing wrong?
int main(int argc, char* argv[])
{
string input_line;
while(std::cin >> input_line){
char* pch = (char*)malloc( sizeof( char ) *(input_line.length() +1) );
char *p = strtok(pch, " ");
while (p != NULL) {
printf ("Token: %s\n", p);
p = strtok(NULL, " ");
}
}
return 0;
}
I followed the code example here: http://www.cplusplus.com/reference/clibrary/cstring/strtok/
Thanks.

Looks like you forgot to copy the contents of input_line to pch:
strcpy(pch, input_line.c_str());
But I'm not sure why you're doing string tokenization anyway. Doing cin >> input_line will not read a line, but a single token, so you get tokens anyway.

This is more of a correctness post; Hans has already identified your problem.
The correct way to get a line of input is with getline:
std::string s;
std::getline(std::cin, s);
operator>> on std::cin breaks at whitespace anyway, so if you typed asd 123 and ran your code, input_line would first be "asd", then the second time in the loop "123" (without waiting for enter).
That said, an easy way to get your result is with a stringstream. Any time you explicitly allocate memory, especially with malloc, you're probably doing something the hard way. Here's one possible solution to tokenizing a string:
#include <sstream>
#include <string>
#include <iostream>
int main(void)
{
std::string input;
std::getline(std::cin, input);
std::stringstream ss(input);
std::string token;
while(std::getline(ss, token, ' '))
{
std::cout << token << "...";
}
std::cout << std::endl;
}
If you really want to use strtok, you might do something like this:
#include <cstring>
#include <string>
#include <iostream>
#include <vector>
int main(void)
{
std::string input;
std::getline(std::cin, input);
std::vector<char> buffer(input.begin(), input.end());
buffer.push_back('\0');
char* token = strtok(&buffer[0], " ");
for (; token; token = strtok(0, " "))
{
std::cout << token << "...";
}
std::cout << std::endl;
}
Remember, manual memory management is bad. Use a vector for arrays, and you avoid leaks. (Which your code has!)

You didn't initialize your string. Insert
strcpy(pch, input_line.c_str());
after the malloc line.

GMan's answer is probably better and more purely c++. This is more of a mix which specifically uses strtok(), since I think that was your goal.
I used strdup()/free() since it was the easiest way to copy the string. In the question you were leaking memory since you'd malloc() with no matching free().
Also, operator>> with a string breaks on whitespace and so is inappropriate for reading whole lines. Use getline() instead.
token.cpp
#include <iostream>
#include <string>
#include <cstring> /* for strtok() and strdup() */
#include <cstdlib> /* for free() */
int main(int argc, char * argv[]){
std::string line;
while(getline(std::cin, line)){
char *pch = strdup(line.c_str());
char *p = strtok(pch, " ");
while(p){
std::cout<<"Token: "<<p<<std::endl;
p = strtok(NULL, " ");
}
std::cout <<"End of line"<<std::endl;
free(pch);
}
return 0;
}
When you run this, you get what appears to be the correct result:
$ printf 'Hi there, I like tokens\nOn new lines too\n\nBlanks are fine'|./token
Token: Hi
Token: there,
Token: I
Token: like
Token: tokens
End of line
Token: On
Token: new
Token: lines
Token: too
End of line
End of line
Token: Blanks
Token: are
Token: fine
End of line

Or use this:
pch = strdup(input_line.c_str());
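Keep in mind that memory returned by strdup() must be released with free(). One way to make that automatic is to hand the pointer to a std::unique_ptr with free as its deleter. This is only a sketch: the helper name tokenize_copy is made up, and strdup() is a POSIX function, not part of standard C++.

```cpp
#include <cstdlib>   // std::free
#include <cstring>   // std::strtok, strdup (POSIX)
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: tokenize a strdup() copy that is freed automatically.
std::vector<std::string> tokenize_copy(const std::string& input_line)
{
    // unique_ptr calls free() on the copy when it goes out of scope.
    std::unique_ptr<char, void (*)(void*)> pch(strdup(input_line.c_str()), std::free);

    std::vector<std::string> tokens;
    if (!pch)
    {
        return tokens; // strdup() failed to allocate
    }
    for (char* p = std::strtok(pch.get(), " "); p != nullptr;
         p = std::strtok(nullptr, " "))
    {
        tokens.push_back(p); // copy each token before the buffer is freed
    }
    return tokens;
}
```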
