Trying to understand strtok - c++

Consider the following snippet that uses strtok to split the string Madddy.
char* str = (char*) malloc(sizeof("Madddy"));
strcpy(str,"Madddy");
char* tmp = strtok(str,"d");
do
{
    std::cout<<tmp;
    tmp=strtok(NULL, "dddy");
}while(tmp!=NULL);
It works fine, the output is Ma. But by modifying the strtok to the following,
tmp=strtok(NULL, "ay");
The output becomes Madd. So how exactly does strtok work? I ask because I expected strtok to treat each and every character in the delimiter string as a delimiter. In some cases it does, but in others it gives unexpected results. Could anyone help me understand this?

"Trying to understand strtok" Good luck!
Anyway, we're in 2011. Tokenise properly:
std::string str("abc:def");
char split_char = ':';
std::istringstream split(str);
std::vector<std::string> token;
for (std::string each; std::getline(split, each, split_char); token.push_back(each));
:D

Fred Flintstone probably used strtok(). It predates multi threaded environments and beats up (modifies) the source string.
When called with NULL for the first parameter, it continues parsing the last string. This feature was convenient, but a bit unusual even in its day.

It seems you forgot that you call strtok the first time (outside the loop) with the delimiter "d".
strtok is working fine. You should have a reference here.
For the second example (strtok(NULL, "ay")):
First, you call strtok(str, "d"). It looks for the first "d" and splits your string there by overwriting that "d" with '\0'. Specifically, it returns tmp = "Ma" and remembers internally that the rest of the string is "ddy" (the first "d" is dropped).
Then, you call strtok(NULL, "ay"). It looks for an "a" or a "y" in the remainder "ddy". There is no "a", but the "y" matches, so it is overwritten with '\0'. Now tmp = "dd" and nothing is left to scan, so the next call returns NULL.
It prints "Madd" as you saw.
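The call sequence described above can be checked with a small sketch. The helper name trace_tokens is made up for this illustration; it works on a private copy of the input, because strtok writes '\0' into the buffer it scans.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: replays the question's call pattern.
// The first call uses `first_delims`; every following call passes NULL
// (continue with the remembered string) and uses `rest_delims`.
std::vector<std::string> trace_tokens(const char* input,
                                      const char* first_delims,
                                      const char* rest_delims)
{
    // Writable, null-terminated copy: strtok modifies what it scans.
    std::vector<char> buf(input, input + std::strlen(input) + 1);

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buf[0], first_delims);
         tok != NULL;
         tok = std::strtok(NULL, rest_delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Under these assumptions, trace_tokens("Madddy", "d", "dddy") yields the single token Ma, while trace_tokens("Madddy", "d", "ay") yields Ma followed by dd, which matches the two outputs in the question.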

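One thing to watch in your code, although it is not the cause of the output you see:
char* str = (char*) malloc(sizeof("Madddy"));
only works because sizeof is applied to a string literal, which is an array of 7 chars including the terminator. Applied to a char* it would give the size of the pointer instead. The more robust form is
char* str = (char*) malloc(strlen("Madddy") + 1);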

I asked a question inspired from another question about functions causing security problems/bad practise functions and the c standard library.
To quote the answer given to me from there:
A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.
Also, strtok() is used by making subsequent calls to it, until the entire string is tokenized. Some library implementations store strtok()'s internal status in a global variable, which may induce some nasty surprises if strtok() is called from multiple threads at the same time.
As you've tagged your question C++, use something else! If you want to use C, I'd suggest implementing your own tokenizer that works in a safe fashion.
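As a rough idea of what such a "safe" tokenizer could look like, here is a minimal sketch. The name split_any is hypothetical; it follows strtok's convention of treating every character of the delimiter string as a separator and skipping empty tokens, but it leaves the input untouched and keeps no hidden state between calls.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at any character found in `delims`.
// Runs of delimiters produce no empty tokens (same as strtok).
// The input string is only read, never modified.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // `end` is npos when the last token runs to the end of the text;
        // substr then simply takes everything that is left.
        std::string::size_type end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Because all state lives in local variables, two threads can call split_any at the same time, and split_any("Madddy", "d") leaves the original string intact while returning Ma and y.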

Since you changed your tag to C and not C++, I rewrote your function to use printf so that you can see what is happening. Hoang is correct. You are seeing correct output, but I think that because you are printing everything on the same line, you got confused by it. Look at Hoang's answer, as he explains what is happening correctly. Also, as others have noted, strtok destroys the input string, so you have to be careful about that, and it's not thread safe. But if you need a quick and dirty tokenizer, it works. Also, I changed the code to use strlen rather than sizeof, as suggested by Anders.
Here is your code modified to be more C-like:
char* str = (char*) malloc(strlen("Madddy") + 1);
strcpy(str,"Madddy");
char* tmp = strtok(str,"d");
printf ("first token: %s\n", tmp);
do
{
    tmp=strtok(NULL, "ay");
    if (tmp != NULL ) {
        printf ("next token: %s\n", tmp);
    }
} while(tmp != NULL);

Related

Qt - C++ string concatenation segmentation fault

I'm new to C++, so I guess I fell into a newbie's C++ pitfall.
I tried to do the following:
QString sdkInstallationDirectory=getenv("somEnv");
QString someSourceDir=sdkInstallationDirectory+"\\Data\\"+someReference+ "\\src";
and I get a segmentation fault.
I guess this is because of the concatenation of the const chars and insufficient memory allocated to the someSourceDir QString.
What exactly is my mistake? How can I do this concatenation?
char * getenv ( const char * name );
A null-terminated string with the value of the requested environment
variable, or NULL if that environment variable does not exist.
Why don't you check the result?
EDIT:
So, with QString, checking the pointer is not actually necessary:
For historical reasons, QString distinguishes between a null string
and an empty string. A null string is a string that is initialized
using QString's default constructor or by passing (const char *)0 to
the constructor.
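Outside of Qt the check does matter: constructing a std::string from a null pointer is undefined behaviour. A minimal sketch for the plain standard-library case follows; the helper name env_or and any variable name passed to it are made up for this illustration.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of the environment variable
// `name`, or `fallback` when std::getenv returns NULL because the
// variable does not exist.
std::string env_or(const char* name, const std::string& fallback)
{
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}
```

A call such as env_or("somEnv", "") then always produces a valid string, which can be concatenated without further checks.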
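You can't add two plain string literals (const char *) together with a +; at least one operand has to be a string object. If that is the problem, try using a stringstream.
Something like:
// assumes the operands are std::string; call toStdString() on a QString first
stringstream ss;
ss << sdkInstallationDirectory << "\\Data\\" << someReference << "\\src";
string str = ss.str();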
Although, if you are using Qt, you shouldn't be joining paths as strings.
See How to build a full path string (safely) from separate strings?
Thank you all for your answers.
It appears that I was wrong, and the segmentation fault was caused a line before, where I created the reference I mentioned in the question.
I discovered it with further debugging.
Sorry for the confusion, and thank you again!

How convert type from const char * to char *

I'm trying to create a simple application in C++. This application has to read from a file and display the data. I've written this function:
std::vector <AndroidApplication> AndroidApplication::getAllApp(){
std::vector<AndroidApplication> allApp;
std::fstream f;
f.open("freeApps.txt");
std::string line;
if(f.is_open()){
while(getline(f, line)) {
std::string myLine = "";
char * line2 = line.c_str();
myLine = strtok(line2,"\t");
AndroidApplication * tmpApp = new AndroidApplication(myLine[1], myLine[2], myLine[4]);
tmpApp->Developer = myLine[0];
tmpApp->Pop = myLine[3];
tmpApp->Type = myLine[5];
allApp->pushBack(tmpApp);
}
}
return allApp;
}
It throws me an error in line:
myLine = strtok(line2,"\t");
An error:
cannot convert from 'const char *' to 'char *'
Could you tell me how can I deal with it?
Don't use strtok. std::string has its own functions for string-scanning, e.g., find.
To use strtok, you'll need a writeable copy of the string. c_str() returns a read-only pointer.
You can't just "convert it" and forget about it. The pointer you get from .c_str() is to a read-only buffer. You need to copy it into a new buffer to work with: ideally, by avoiding using antiquated functions like strtok in the first place.
(I'm not quite sure what you're doing with that tokenisation, actually; you're just indexing into characters in the once-tokenised string, not indexing tokens.)
You're also confusing dynamic and automatic storage.
std::vector<AndroidApplication> AndroidApplication::getAllApp()
{
    std::vector<AndroidApplication> allApp;

    // Your use of fstreams can be simplified
    std::fstream f("freeApps.txt");
    if (!f.is_open())
        return allApp;

    std::string line;
    while (getline(f, line)) {
        // This is how you tokenise a string in C++
        std::istringstream split(line);
        std::vector<std::string> tokens;
        for (std::string each;
             std::getline(split, each, '\t');
             tokens.push_back(each));

        // No need for dynamic allocation here,
        // and I'm assuming you wanted tokens ("words"), not characters.
        AndroidApplication tmpApp(tokens[1], tokens[2], tokens[4]);
        tmpApp.Developer = tokens[0];
        tmpApp.Pop = tokens[3];
        tmpApp.Type = tokens[5];

        // The vector contains objects, not pointers
        allApp.push_back(tmpApp);
    }
    return allApp;
}
I suspect the error is actually on the previous line,
char * line2 = line.c_str();
This is because c_str() gives a read-only pointer to the string contents. There is no standard way to get a modifiable C-style string from a C++ string (at least not before C++17, which added a non-const data()).
The easiest option to read space-separated words from a string (assuming that's what you're trying to do) is to use a string stream:
std::vector<std::string> words;
std::istringstream stream(line);
std::copy(std::istream_iterator<std::string>(stream),
std::istream_iterator<std::string>(),
back_inserter(words));
If you really want to use strtok, then you'll need a writable copy of the string, with a C-style terminator; one way to do this is to copy it into a vector:
std::vector<char> writable(line.c_str(), line.c_str() + line.length() + 1);
std::vector<char *> words;
while (char * word = strtok(words.empty() ? &writable[0] : NULL, " ")) {
words.push_back(word);
}
Bear in mind that strtok is quite difficult to use correctly; you need to call it once for each token, not once to create an array of tokens, and make sure nothing else (such as another thread) calls it until you've finished with the string. I'm not sure that my code is entirely correct; I haven't tried to use this particular form of evil in a long time.
Since you asked for it:
Theoretically you could use const_cast<char*>(line.c_str()) to get a char*. However, giving the result of this to strtok (which modifies its parameter) is IIRC not valid C++ (you may cast away constness, but you may not modify a const object). So it might or might not work for your specific platform/compiler (and even if it works, it might break at any time).
The other way is to create a modifiable copy that is filled with the contents of the string:
std::vector<char> tmp_str(line.begin(), line.end());
tmp_str.push_back('\0'); // strtok needs a null-terminated buffer
myLine = strtok(&tmp_str[0],"\t");
Of course, as the other answers tell you in great detail, you really should avoid using functions like strtok in C++ in favour of functionality that works directly on std::string. The exception is when you have a firm grasp of C++, have high performance requirements, and know from profiling that the C API function is faster in your specific case.

Concatenate Strings in C/C++

How do I concatenate Strings with C/C++?
I tried the following ways:
PS: errorInfo is a char * I should return it.
errorInfo = strcat("Workflow: ", strcat(
workflowToString(workflow).utf8(), strcat(" ERROR: ",
errorCode.utf8)));
sprintf(errorInfo, "Workflow %s ERROR: %s",
workflowToString(workflow).utf8(), errorCode.utf8());
errorInfo = "Workflow: " + workflowToString(workflow).utf8() + " ERROR: " + errorCode.utf8;
Only the sprintf version compiles, but my application crashes when I run it.
PS: I'm using NDK from Android
There ISN'T such a language as C/C++. There is C, and there is C++.
In C++ you concatenate std::string's by using operator+
In C, you use strcat
I know this doesn't quite answer your question, this is just an outcry :)
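For the C++ side, a minimal sketch of the operator+ approach might look like this. The function name make_error_info is hypothetical, and the two parameters stand in for the results of workflowToString(workflow).utf8() and errorCode.utf8() from the question, assuming those can be converted to std::string.

```cpp
#include <string>

// Hypothetical helper: builds the message with std::string's operator+.
// Each + needs at least one std::string operand; here the left-hand
// literal is combined with `workflow` first, and the result stays a
// std::string for the rest of the chain.
std::string make_error_info(const std::string& workflow,
                            const std::string& error_code)
{
    return "Workflow: " + workflow + " ERROR: " + error_code;
}
```

If a char * is required at the end, the pointer from c_str() is only valid while the std::string it came from is alive and unmodified, so the string object has to outlive every use of that pointer.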
According to this page strcat does the following:
Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a new null-character is appended at the end of the new string formed by the concatenation of both in destination.
In your implementation, however, "Workflow: " is a constant string. You cannot modify that string, which is what strcat would do. In order to do that, create a string like:
char message[1000];
strcpy(message, "Workflow: ");
strcat(message, "other string");
....
However, be careful about the utf8 character encoding because one utf8 code point could be multiple chars long.
Concatenation is almost always the wrong idiom for string building, especially in C. It's error-prone, clutters your code, and has extremely bad asymptotic performance (i.e. O(n^2) instead of O(n) for building a string of length n).
Instead you should use the snprintf function, as in:
snprintf(buf, sizeof buf, "Workflow: %s ERROR: %s", workflow, error);
or if you're writing to a file/socket/etc. and don't need to keep the resulting string in memory, simply use fprintf to begin with.
With string literals you can simply use:
char str[] = "foo" " bar";
const char *s = " 1 " " 2 ";
s = " 3 " " 4 ";
By using strcat(), you are working in C, not C++.
C is not going to automatically manage memory for you.
C can be confusing, since sometimes it seems like it has a string data type when all it is doing is providing you a string interface to arrays of characters.
For one thing, the first argument to strcat() has to be writable and have enough room to add the second string.
char *out = strcat("This", "nThat");
is asking c to stomp on string literal memory.
In general, you should NEVER use strcat()/sprintf, as in the above "chosen" answer. You can overwrite memory that way. Use strncat()/snprintf() instead to avoid buffer overruns. If you don't know the size to pass to "n" in strncat(), you're likely doing something wrong.
One way to do this in c would be:
#define ERROR_BUF_SIZE 2048 // or something big enough, you have to know in c
char errorInfo[ERROR_BUF_SIZE];
snprintf(errorInfo, ERROR_BUF_SIZE, "Workflow %s ERROR: %s",
workflowToString(workflow).utf8(), errorCode.utf8());
or similarly using strncpy/strncat
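The strncat variant has one trap: its size argument is the maximum number of characters to append, not the total size of the destination. A minimal sketch under that reading, with append_bounded as a made-up helper name:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: appends `src` to the null-terminated string in
// `dst`, whose buffer holds `dst_size` bytes in total.
// Returns false when `src` did not fit completely.
bool append_bounded(char* dst, std::size_t dst_size, const char* src)
{
    std::size_t used = std::strlen(dst);
    if (used >= dst_size)
        return false;                      // no room even for the terminator

    // strncat writes up to `room` characters plus a terminating '\0',
    // so one byte has to be reserved for that terminator.
    std::size_t room = dst_size - used - 1;
    std::strncat(dst, src, room);
    return std::strlen(src) <= room;
}
```

Starting from a 16-byte buffer holding "Workflow: ", appending "sync" fits, while appending " ERROR" afterwards is cut down to the single character that still has room.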
There are many ways you can concatenate in C while using the Android NDK.
Two ways I have used are:
strcat
sprintf
Here is an example:
strcat
char* buffer1=(char*)malloc(250000);
char* buffer2=(char*)malloc(250000);
char* buffer3=(char*)malloc(250000);
strcpy(buffer1, "first");   // strcat needs initialized, null-terminated strings
strcpy(buffer2, "second");
buffer1 = strcat(buffer1, buffer2);
sprintf
sprintf(buffer3,"this is buffer1: %s and this is buffer2:%s",buffer1,buffer2);
sprintf returns the length of the string it wrote.
strcat is not recommended, because it has to scan the destination for its end on every call and does no bounds checking.
You can use sprintf or others like strcpy instead.
Hope it helps.

What use is there for 'ends' these days?

I came across a subtle bug a couple of days ago where the code looked something like this:
ostringstream ss;
int anInt( 7 );
ss << anInt << "HABITS";
ss << ends;
string theWholeLot = ss.str();
The problem was that the ends was sticking a '\0' into the ostringstream so theWholeLot actually looked like "7HABITS\0" (i.e. a null at the end)
Now this hadn't shown up because theWholeLot was then being used to take the const char * portion using string::c_str() That meant that the null was masked as it became just a delimiter. However, when this changed to use strings throughout, the null suddenly meant something and comparisons such as:
if ( theWholeLot == "7HABITS" )
would fail. This got me thinking: Presumably the reason for ends is a throwback to the days of ostrstream when the stream was not normally terminated with a null and had to be so that str() (which then cast out not a string but a char *) would work correctly.
However, now that it's not possible to cast out a char * from a ostringstream, using ends is not only superfluous, but potentially dangerous and I'm considering removing them all from my clients code.
Can anyone see an obvious reason to use ends in a std::string only environment?
You've essentially answered your own question in as much detail as is needed. I certainly can't think of any reason to use std::ends when std::string and std::stringstream handle all that for you.
So, to answer your question explicitly, no, there is no reason to use std::ends in a std::string only environment.
There are some APIs that expect a "string array" with multiple zero-terminated strings and a double zero to mark the end. Raymond Chen blogged about it just recently, mostly to demonstrate how often this gets fumbled.
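Such a block can be built in a std::string precisely because std::string may contain embedded nulls. A minimal sketch, where make_multi_string is a made-up name and the items are assumed to be non-empty (an empty item could not be told apart from the end marker):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs `items` as "one\0two\0\0", i.e. each item
// followed by a null, plus one extra null that marks the end of the list.
// Assumes every item is non-empty.
std::string make_multi_string(const std::vector<std::string>& items)
{
    std::string block;
    for (std::size_t i = 0; i < items.size(); ++i) {
        block += items[i];
        block.push_back('\0');   // terminator of this item
    }
    block.push_back('\0');       // end-of-list marker
    return block;
}
```

The nulls are appended with push_back here, but streaming std::ends into an ostringstream between the items would be an equivalent use of the manipulator.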

Why does std::ends cause string comparison to fail?

I spent about 4 hours yesterday trying to fix this issue in my code. I simplified the problem to the example below.
The idea is to store a string in a stringstream ending with std::ends, then retrieve it later and compare it to the original string.
#include <sstream>
#include <iostream>
#include <string>
int main( int argc, char** argv )
{
const std::string HELLO( "hello" );
std::stringstream testStream;
testStream << HELLO << std::ends;
std::string hi = testStream.str();
if( HELLO == hi )
{
std::cout << HELLO << "==" << hi << std::endl;
}
return 0;
}
As you can probably guess, the above code when executed will not print anything out.
Although, if printed out, or looked at in the debugger (VS2005), HELLO and hi look identical, their .length() in fact differs by 1. That's what I am guessing is causing the == operator to fail.
My question is why. I do not understand why std::ends is an invisible character added to string hi, making hi and HELLO different lengths even though they have identical content. Moreover, this invisible character does not get trimmed with boost trim. However, if you use strcmp to compare .c_str() of the two strings, the comparison works correctly.
The reason I used std::ends in the first place is because I've had issues in the past with stringstream retaining garbage data at the end of the stream. std::ends solved that for me.
std::ends inserts a null character into the stream. Getting the content as a std::string will retain that null character and create a string with that null character at the respective positions.
So indeed a std::string can contain embedded null characters. The following std::string contents are different:
ABC
ABC\0
A binary zero is not whitespace. But it's also not printable, so you won't see it (unless your terminal displays it specially).
Comparing using strcmp will interpret the content of a std::string as a C string when you pass .c_str(). It will say
Hmm, characters before the first \0 (terminating null character) are ABC, so i take it the string is ABC
And thus, it will not see any difference between the two above. You are probably having this issue:
std::stringstream s;
s << "hello";
s.seekp(0);
s << "b";
assert(s.str() == "b"); // will fail!
The assert will fail, because the sequence that the stringstream uses is still the old one that contains "hello". What you did is just overwriting the first character. You want to do this:
std::stringstream s;
s << "hello";
s.str(""); // reset the sequence
s << "b";
assert(s.str() == "b"); // will succeed!
Also read this answer: How to reuse an ostringstream
std::ends simply inserts a null character. Traditionally, strings in C and C++ are terminated with a null (ASCII 0) character; however, it turns out that std::string doesn't really require this. Anyway, stepping through your code point by point, we see a few interesting things going on:
int main( int argc, char** argv )
{
The string literal "hello" is a traditional zero-terminated string constant. Its five characters are copied into the std::string HELLO; the terminator is not part of the string's contents.
const std::string HELLO( "hello" );
std::stringstream testStream;
We now put the five characters of HELLO into the stream, followed by a null character which is put there by std::ends.
testStream << HELLO << std::ends;
We extract a copy of the stuff we put into the stream (the characters "hello" plus the one null character).
std::string hi = testStream.str();
We then compare the two strings using operator == on the std::string class. This operator compares the lengths of the string objects as well as their contents, including any trailing null characters. Note that the std::string class does not treat a null as a terminator; put another way, it allows the string to contain null characters, so the trailing null character is treated as part of the string hi.
Since hi has a trailing null that HELLO lacks, the comparison fails.
if( HELLO == hi )
{
std::cout << HELLO << "==" << hi << std::endl;
}
return 0;
}
Although, if printed out, or looked at in the debugger (VS2005), HELLO and hi look identical, their .length() in fact differs by 1. That's what I am guessing is causing the "==" operator to fail.
Reason being, the length is different by one trailing null character.
My question is why. I do not understand why std::ends is an invisible character added to string hi, making hi and HELLO different lengths even though they have identical content. Moreover, this invisible character does not get trimmed with boost trim. However, if you use strcmp to compare .c_str() of the two strings, the comparison works correctly.
strcmp is different from std::string: it dates back to the early days when strings were terminated with a null, so when it gets to the trailing null in hi it stops looking.
The reason I used std::ends in the first place is because I've had issues in the past with stringstream retaining garbage data at the end of the stream. std::ends solved that for me.
Sometimes it is a good idea to understand the underlying representation.
You're adding a null char to the stream with std::ends. When you initialize hi with str(), that null char is kept, so hi is one character longer than HELLO. The strings are different. strcmp doesn't compare std::strings, it compares const char* (it's a C function), and it stops at the first null.
std::ends adds a null terminator, (char)'\0'. You'd use it with the deprecated strstream classes, to add the null terminator.
You don't need it with stringstream, and in fact it screws things up, because the null terminator isn't "the special null terminator that ends a string" to stringstream. To stringstream it's just another character, the zeroth character. stringstream just adds it, and that increases the character count (in your case) to six, and makes the comparison to "hello" fail.
I think a good way to compare strings is to use std::string's own methods, such as find. Do not mix C functions and std::string ones!