I've written a simple string tokenizing program using pointers for a recent school project. However, I'm having trouble with my StringTokenizer::Next() method, which, when called, is supposed to return a pointer to the first letter of the next word in the char array. I get no compile-time errors, but I get a runtime error which states:
Unhandled exception at 0x012c240f in Project 5.exe: 0xC0000005: Access violation reading location 0x002b0000.
The program currently tokenizes the char array, but then stops and this error pops up. I have a feeling it has to do with the NULL checking I'm doing in my Next() method.
So how can I fix this?
Also, if you notice anything I could do more efficiently or with better practice, please let me know.
Thanks!!
StringTokenizer.h:
#pragma once
class StringTokenizer
{
public:
StringTokenizer(void);
StringTokenizer(char* const, char);
char* Next(void);
~StringTokenizer(void);
private:
char* pStart;
char* pNextWord;
char delim;
};
StringTokenizer.cpp:
#include "stringtokenizer.h"
#include <iostream>
using namespace std;
StringTokenizer::StringTokenizer(void)
{
pStart = NULL;
pNextWord = NULL;
delim = 'n';
}
StringTokenizer::StringTokenizer(char* const pArray, char d)
{
pStart = pArray;
delim = d;
}
char* StringTokenizer::Next(void)
{
pNextWord = pStart;
if (pStart == NULL) { return NULL; }
while (*pStart != delim) // access violation error here
{
pStart++;
}
if (pStart == NULL) { return NULL; }
*pStart = '\0'; // sometimes the access violation error occurs here
pStart++;
return pNextWord;
}
StringTokenizer::~StringTokenizer(void)
{
delete pStart;
delete pNextWord;
}
Main.cpp:
// The PrintHeader function prints out my
// student info in header form
// Parameters - none
// Pre-conditions - none
// Post-conditions - none
// Returns - void
void PrintHeader();
int main ( )
{
const int CHAR_ARRAY_CAPACITY = 128;
const int CHAR_ARRAY_CAPCITY_MINUS_ONE = 127;
// create a place to hold the user's input
// and a char pointer to use with the next( ) function
char words[CHAR_ARRAY_CAPACITY];
char* nextWord;
PrintHeader();
cout << "\nString Tokenizer Project";
cout << "\nyour name\n\n";
cout << "Enter in a short string of words:";
cin.getline ( words, CHAR_ARRAY_CAPCITY_MINUS_ONE );
// create a tokenizer object, pass in the char array
// and a space character for the delimiter
StringTokenizer tk( words, ' ' );
// this loop will display the tokens
while ( ( nextWord = tk.Next ( ) ) != NULL )
{
cout << nextWord << endl;
}
system("PAUSE");
return 0;
}
EDIT:
Okay, I've got the program working fine now, as long as the delimiter is a space. But if I pass it a '/' as a delim, it comes up with the access violation error again. Any ideas?
Function that works with spaces:
char* StringTokenizer::Next(void)
{
pNextWord = pStart;
if (*pStart == '\0') { return NULL; }
while (*pStart != delim)
{
pStart++;
}
if (*pStart = '\0') { return NULL; }
*pStart = '\0';
pStart++;
return pNextWord;
}
An access violation (or "segmentation fault" on some OSes) means you've attempted to read or write to a position in memory that you never allocated.
Consider the while loop in Next():
while (*pStart != delim) // access violation error here
{
pStart++;
}
Let's say the string is "blah\0". Note that I've included the terminating null. Now, ask yourself: how does that loop know to stop when it reaches the end of the string?
More importantly: what happens with *pStart if the loop fails to stop at the end of the string?
This answer is provided based on the edited question and various comments/observations in other answers...
First, what are the possible states for pStart when Next() is called?
1. pStart is NULL (default constructor or otherwise set to NULL)
2. *pStart is '\0' (empty string at end of string)
3. *pStart is delim (empty string at an adjacent delimiter)
4. *pStart is anything else (non-empty-string token)
At this point we only need to worry about the first option. Therefore, I would use the original "if" check here:
if (pStart == NULL) { return NULL; }
Why don't we need to worry about cases 2 or 3 yet? You probably want to treat adjacent delimiters as having an empty-string token between them, including at the start and end of the string. (If not, adjust to taste.) The while loop will handle that for us, provided you also add the '\0' check (needed regardless):
while (*pStart != delim && *pStart != '\0')
After the while loop is where you need to be careful. What are the possible states now?
1. *pStart is '\0' (token ends at end of string)
2. *pStart is delim (token ends at next delimiter)
Note that pStart itself cannot be NULL here.
You need to return pNextWord (the current token) for both of these conditions so you don't drop the last token (i.e., when *pStart is '\0'). The code handles case 2 correctly but not case 1 (the original code dangerously incremented pStart past '\0'; the new code returned NULL). In addition, it is important to reset pStart correctly for case 1, so that the next call to Next() returns NULL. I'll leave the exact code as an exercise for the reader, since it is homework after all ;)
It's a good exercise to outline the possible states of data throughout a function in order to determine the correct action for each state, similar to formally defining base cases vs. recursive cases for recursive functions.
Finally, I noticed you have delete calls on both pStart and pNextWord in your destructor. First, to delete arrays, you need to use delete [] ptr; (i.e., array delete). Second, you wouldn't delete both pStart and pNextWord because pNextWord points into the pStart array. Third, by the end, pStart no longer points to the start of the memory, so you would need a separate member to store the original start for the delete [] call. Lastly, these arrays are allocated on the stack and not the heap (i.e., using char var[], not char* var = new char[]), and therefore they shouldn't be deleted. Therefore, you should simply use an empty destructor.
Another useful tip is to count the number of new and delete calls; there should be the same number of each. In this case, you have zero new calls, and two delete calls, indicating a serious issue. If it was the opposite, it would indicate a memory leak.
Inside ::Next you need to check for the delim character, but you also need to check for the end of the buffer, (which I'm guessing is indicated by a \0).
while (*pStart != '\0' && *pStart != delim) // access violation error here
{
pStart++;
}
And I think that these tests in ::Next
if (pStart == NULL) { return NULL; }
Should be this instead.
if (*pStart == '\0') { return NULL; }
That is, you should be checking for a NUL character, not a null pointer. It's not clear whether you intend these tests to detect an uninitialized pStart pointer or the end of the buffer.
An access violation usually means a bad pointer.
In this case, the most likely cause is running out of string before you find your delimiter.
Related
I was working on a system that split a sentence to a 2D pointer.
I don't wanna use any kind of library or another ways like string, because I want to practice pointers and learn them.
char** sscanf(char* hstring)
{
int count = 0;
char* current = hstring;
while (*current)
{
if (*current == ' ')
{
count++;
}
while (*current == ' ')
{
current++;
}
if (*current)
break;
current++;
}
char** result = new char*[count];
current = hstring;
char* nstr = new char;
int c = 0, i = 0;
while (*current)
{
if (!*current) break;
cout << "t1";
if (*current == ' ')
{
*(++result) = nstr;
nstr = nullptr;
nstr = new char;
}
cout << "t2";
while (*current != '/0' && *current == ' ')
{
current++;
}
cout << "t3";
while (*current != '/0' && *current != ' ')
{
if (!*current) break;
*(++nstr) = *current;
current++;
}
cout << "t4";
*nstr = '/0';
cout << "t5";
}
return result;
}
But it's very strange, sometimes redirects me to
static size_t __CLRCALL_OR_CDECL length(_In_z_ const _Elem * const _First) _NOEXCEPT // strengthened
{ // find length of null-terminated string
return (_CSTD strlen(_First));
}
with error: Acces Violation, other times, choose a random line and call it Acces Breakout(sorry if I spelled wrong)
What I want from you is not to repair my code simply, I want some explanations, because I want to learn this stuff.
First, some advice
I understand that you are making this function as an exercise, but being C++ I'd like to warn you that things like new char*[count] are bad practices and that's why std::vector or std::array were created.
You seem confused about how dynamic allocation works. The statement char* nstr = new char; will create just one byte (char) in heap memory, and nothing is guaranteed to be adjacent to it. This means that ++nstr is an invalid operation: it makes nstr point to the byte after the allocated one, which can be some random invalid location.
There are a whole lot of other dangerous operations in your code, like calling new several times (which reserves memory) and not calling delete when you no longer use the reserved memory (i.e., memory leaks). Having said that, I strongly encourage you to study this subject, for example starting with the ISO C++ FAQ on memory management.
Also, before digging into pointers and dynamic allocation, you should be more comfortable with statements and flow control. I say this because I see some clear misunderstandings, like:
while (*current) {
if (!*current) break;
...
}
The check inside the if statement will certainly be false, because the while check is executed just before it and guarantees that the opposite condition is true. This means that this if is never evaluated to true and it's completely useless.
Another remark: don't give your functions the same names as standard library ones. sscanf is already taken, so choose another (and more meaningful) name. This will save you some headaches in the future; get used to naming your own functions properly.
A guided solution
I'm in a good mood, so I'll go through some steps here. Anyway, if someone is looking for an optimized and ready to go solution, see Split a String in C++.
0. Define the steps
Reading your code, I could guess some of your desired steps:
char** split_string(char* sentence)
{
// Count the number of words in the sentence
// Allocate memory for the answer (a 2D buffer)
// Write each word in the output
}
Instead of trying to get them right all at once, why don't you try one by one? (Notice the function's and parameter's names, clearer in my opinion).
1. Count the words
You could start with a simple main(), testing your solution. Here is mine (sorry, I couldn't just adapt yours). For those who are optimization-addicted, this is not an optimized solution, but a simple snippet for the OP.
// I'll be using this header and namespace on the next snippets too.
#include <iostream>
using namespace std;
int main()
{
char sentence[] = " This is my sentence ";
int n_words = 0;
char *p = sentence;
bool was_space = true; // see logic below
// Reading the whole sentence
while (*p) {
// Check if it's a space and advance pointer
bool is_space = (*p++ == ' ');
if (was_space && !is_space)
n_words++; // count as word a 'rising edge'
was_space = is_space;
}
cout << n_words;
}
Test it, make sure you understand why it works. Now, you can move to the next step.
2. Allocate the buffer
Well, you want to allocate one buffer for each word, so we need to know the size of each one of them (I'll not discuss whether or not this is a good approach to the split sentence problem..). This was not calculated on the previous step, so we might do it now.
int main()
{
char sentence[] = " This is my sentence ";
///// Count the number of words in the sentence
int n_words = 0;
char *p = sentence;
bool was_space = true; // see logic below
// Reading the whole sentence
while (*p) {
// Check if it's a space and advance pointer
bool is_space = (*p++ == ' ');
if (was_space && !is_space)
n_words++; // count as word a 'rising edge'
was_space = is_space;
}
///// Allocate memory for the answer (a 2D buffer)
// This is more like C than C++, but you asked for it
char **words = new char*[n_words];
char *ini = sentence; // the initial char of each word
for (int i = 0; i < n_words; ++i) {
while (*ini == ' ') ini++; // search next non-space char
char *end = ini + 1; // pointer to the end of the word
while (*end && *end != ' ') end++; // search for \0 or space
int word_size = end - ini; // find out the word size by address offset
ini = end; // next for-loop iteration starts
// at the next word
words[i] = new char[word_size + 1]; // a whole buffer for one word, plus room for '\0'
cout << i << ": " << word_size << endl; // debugging
}
// Deleting it all, one buffer at a time
for (int i = 0; i < n_words; ++i) {
delete[] words[i]; // delete[] is the syntax to delete an array
}
delete[] words; // then the array of pointers itself
}
Notice that I'm deleting the allocated buffers inside the main(). When you move this logic to your function, this deallocation will be performed by the caller of the function, since it will probably use the buffers before deleting them.
3. Assigning each word to its buffer
I think you got the idea. Assign the words and move the logic to the separated function. Update your question with a Minimal, Complete, and Verifiable example if you still have troubles.
I know this is a Q&A forum, but I think this is already a healthy answer to the OP and to others that may pass here. Let me know if I should answer differently.
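For readers who want to compare their own attempt at step 3, here is one possible sketch of the finished function. The names count_words and free_words and the n_words out-parameter are assumptions introduced for this example; they are not part of the original question or answer.

```cpp
#include <cstddef>
#include <cstring>

// Count words using the same "rising edge" logic as step 1.
static int count_words(const char* p) {
    int n_words = 0;
    bool was_space = true;
    while (*p) {
        bool is_space = (*p++ == ' ');
        if (was_space && !is_space) { n_words++; }
        was_space = is_space;
    }
    return n_words;
}

// Splits sentence on spaces. The caller owns the result and must
// release it with free_words(). n_words receives the word count.
char** split_string(const char* sentence, int& n_words) {
    n_words = count_words(sentence);
    char** words = new char*[n_words];

    const char* ini = sentence;               // start of the current word
    for (int i = 0; i < n_words; ++i) {
        while (*ini == ' ') { ini++; }        // skip leading spaces
        const char* end = ini;
        while (*end && *end != ' ') { end++; }  // find '\0' or a space

        std::size_t len = static_cast<std::size_t>(end - ini);
        words[i] = new char[len + 1];         // +1 for the terminator
        std::memcpy(words[i], ini, len);
        words[i][len] = '\0';
        ini = end;                            // next word starts after this one
    }
    return words;
}

// Releases everything split_string() allocated.
void free_words(char** words, int n_words) {
    for (int i = 0; i < n_words; ++i) { delete[] words[i]; }
    delete[] words;
}
```

Each word gets its own buffer sized len + 1 so there is room for the terminating '\0', and one new[] is matched by exactly one delete[].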
#include <iostream>
#include <cstdlib>
#include <string>
using namespace std;
void reverse(char* str){
char *end = str;
char tmp;
if(str){
cout << "hello" << endl;
while(*end){
cout << end << endl;
++end;
}
--end;
while (str < end){
tmp = *str;
*str++ = *end;
*end-- = tmp;
}
}
}
int main(){
char str[] = "helloyouarefunny";
string input = str;
reverse(str);
for(int i = 0; i < input.length(); i++) {
cout << str[i];
}
}
1. Is if(str){} equivalent to if(str == NULL){}?
2. What does while(*end){} mean and what is it exactly doing? I think I have a general understanding that the while loop will continue to be executed as long as it does not "see" a '\0'. But I am not sure what is exactly going on with this line of code.
3. Given that if(str){} is an equivalent statement to if(str == NULL){}, what would you pass into a function to make str = NULL? For example, in my main(){}, I tried to do char str[] = NULL, thereby attempting to pass a NULL so that it wouldn't go inside the code if(str == NULL){}. But I get an error saying I cannot make this declaration char str[] = NULL. So my question is why am I getting this error and what can I pass through the reverse() function in order to make the code inside of if(str){} not execute? I hope this question made sense.
4. And the code ++end is doing pointer arithmetic, correct? So every time it is incremented, the address is moving to the address right next to it?
5. I'm a little confused by while(str < end){}. What is the difference between just str and *str? I understand that cout << str << endl; has to do with overloading the operator << and therefore prints the entire string that is passed through the argument. But why, when I cout << *end << endl;, does it only print the character at that memory address? So my question is, what's the difference between the two? Is it just dereferencing when I do *str? I might actually be asking more than that question in this question. I hope I don't confuse you guys >_<.
Is if(str){} equivalent to if(str == NULL){}?
No, if(str){} is equivalent to if(str != NULL){}
What does while(*end){} mean and what is it exactly doing?
Since the type of end is char*, while(*end){} is equivalent to while (*end != '\0'). The loop is executed for all the characters of the input string. When the end of the string is reached, the loop stops.
Given that if(str){} is an equivalent statement to if(str == NULL){}
That is not correct. I did not read rest of the paragraph since you start out with an incorrect statement.
And the code ++end is doing pointer arithmetic correct? So every time it is incremented, the address is moving to the address right next to it?
Sort of. The value of end is incremented. It now points to the object immediately after the one it pointed to before the operation.
I'm a little confused while(str < end){}
In the previous while loop, end was incremented starting from str until it reached the end of the string. In this loop, str is incremented and end is decremented, swapping the characters they point to, until the two pointers meet in the middle. Once str is no longer less than end, the conditional of the while statement evaluates to false and the loop ends.
Update
Regarding
what would you pass into a function to make str = NULL?
You could simply call
reverse(NULL);
I tried to do char str[] = NULL;
str is an array of characters. It can be initialized in a couple of ways:
// This is what you have done.
char str[] = "helloyouarefunny";
// Another, more tedious way:
char str[] = {'h','e','l','l','o','y','o','u','a','r','e','f','u','n','n','y', '\0'};
Notice the presence of an explicitly specified null character in the second method.
You cannot initialize a variable of type array of chars to NULL. The language does not allow that. You can initialize a pointer to NULL but not an array.
char* s1 = NULL; // OK
reverse(s1); // Call the function
s1 = (char*)malloc(10); // Allocate memory for the pointer (the cast is required in C++).
strcpy(s1, "test"); // Put some content in the allocated memory
reverse(s1); // Call the function, this time with some content.
These are pretty standard C programming idioms.
1. No, in fact if (str) ... is equivalent to if (str != NULL) ...
2. C character strings are null terminated, meaning that "Hello" is represented in memory as the character array {'H', 'e', 'l', 'l', 'o', '\0'}. As with pointers, the 0 or NULL value is considered false in a logical expression. Thus while (*end) ... will execute the body of the while loop so long as end has not reached the null character.
3. N/A
4. Correct - this advances to the next character in the string, or to the null terminator.
5. This is the reverse algorithm. After the first loop, end points at the null terminator; the --end that follows moves it back to the last character, while str points to the beginning. Now we work these two pointers toward each other, swapping characters.
1/2) In C and C++, whatever is in the if or while is evaluated as a boolean. 0 is evaluated to false while any other value is evaluated to true. Given that NULL is equivalent to 0, if(str) and if(str != NULL) do the same things.
Likewise, while(*end) will only loop so long as the value end is pointing to does not evaluate to 0.
3) If you pass a char pointer to this function, it could be the null pointer (char *str = 0), so you're checking to make sure str is not null.
4) Yes, the pointer is then pointing to the next location in memory until eventually you find the null at the end of the string.
5) Perhaps your confusion comes from the missing parentheses. They are optional, because postfix ++ and -- bind more tightly than *, but with them the loop reads:
while (str < end){
tmp = *str;
*(str++) = *end;
*(end--) = tmp;
}
This way the two pointers continue to make their way towards each other until they cross paths (at which point str will no longer be less than end).
I want to create a function that can read a file char by char continuously until some specific char encountered.
This is my method in a class FileHandler.
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
char* text = new char;
if(this->inputFileStream != nullptr)
{
bool goOn = true;
size_t seekPos = start;
while (goOn)
{
this->inputFileStream->seekg(seekPos);
char* buffer = new char;
this->inputFileStream->read(buffer, 1);
if(strcmp(buffer, &seek) != 0)
{
strcat(text, buffer); // Execution stops here
seekPos++;
}
else
{
goOn = false;
}
}
}
//printf("%s\n", text);
return text;
}
I test this function and it actually works. This is an example to read a file content until new line character '\n' found.
size_t startPosition = 0;
char* text = this->fileHandler->readUntilCharFound(startPosition, '\n');
However, I am sure that something is not right somewhere in the code, because if I use this method in a loop, the app just hangs. I guess the 'not right' things are about pointers, but I don't know exactly where. Could you please point them out for me?
C++ provides some easy-to-use solutions. For instance:
istream& getline (istream& is, string& str, char delim);
In your case, the str parameter would be the equivalent of your text variable and delim would be the equivalent of your seek parameter. Also, the return value of getline would in some way be the equivalent of your goOn flag (there are good FAQs regarding the right patterns to check for EOF and IO errors using the return value of getline).
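As a rough sketch of how std::getline could replace the hand-written loop: the free function readUntilChar below is hypothetical, and it takes any std::istream& rather than the FileHandler member from the question, so it can be tried with a string stream as well as a file.

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the characters from offset start up to
// (not including) the first occurrence of seek, or up to end of input.
std::string readUntilChar(std::istream& in, std::streamoff start, char seek) {
    std::string text;
    in.clear();                        // drop any EOF/fail state from earlier reads
    in.seekg(start, std::ios::beg);    // position at the requested offset
    std::getline(in, text, seek);      // reads until seek or EOF; seek is discarded
    return text;                       // std::string manages its own memory
}
```

Returning std::string by value avoids the manual new/delete bookkeeping entirely. The same function works unchanged with a std::ifstream.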
The lines
if(strcmp(buffer, &seek) != 0)
and
strcat(text, buffer); // Execution stops here
are causes for undefined behavior. strcmp and strcat expect null terminated strings.
Here's an updated version, with appropriate comments.
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
// Collect the characters in a std::string (needs <string>), which grows
// as needed. Appending to a fixed-size char buffer would overflow it.
std::string collected;
if(this->inputFileStream != nullptr)
{
bool goOn = true;
size_t seekPos = start;
while (goOn)
{
this->inputFileStream->seekg(seekPos);
// No need to allocate memory from the heap for a single character.
char c;
// Keep going only if the read succeeded and c is not the
// character we are looking for. Compare chars directly, not with strcmp.
if( this->inputFileStream->get(c) && c != seek )
{
collected += c;
seekPos++;
}
else
{
goOn = false;
}
}
}
// Copy the result, including the terminating '\0', into a buffer
// that the caller must release with delete[].
char* text = new char[collected.size() + 1];
strcpy(text, collected.c_str());
return text;
}
You can further simplify the function to:
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
// Collect the characters in a std::string (needs <string>), which grows
// as needed.
std::string collected;
if(this->inputFileStream != nullptr)
{
// Seek once; get() advances the read position by itself.
this->inputFileStream->seekg(start);
// Keep reading from the stream until we find the character
// we are looking for or EOF is reached.
int c;
while ( (c = this->inputFileStream->get()) != EOF && c != seek )
{
collected += static_cast<char>(c);
}
}
// Copy the result, including the terminating '\0', into a buffer
// that the caller must release with delete[].
char* text = new char[collected.size() + 1];
strcpy(text, collected.c_str());
return text;
}
this->inputFileStream->read(buffer, 1);
No error checking.
if(strcmp(buffer, &seek) != 0)
The strcmp function is used to compare strings. Here you just want to compare two characters.
I've got a function that splits up a string into various sections and then parses them, but when converting a string to char* I get a malformed output.
int parseJob(char * buffer)
{ // Parse raw data, should return individual jobs
const char* p;
int rows = 0;
for (p = strtok( buffer, "~" ); p; p = strtok( NULL, "~" )) {
string jobR(p);
char* job = &jobR[0];
parseJobParameters(job); // At this point, the data is still in good condition
}
return (1);
}
int parseJobParameters(char * buffer)
{ // Parse raw data, should return individual job parameters
const char* p;
int rows = 0;
for (p = strtok( buffer, "|" ); p; p = strtok( NULL, "|" )) { cout<<p; } // At this point, the data is malformed.
return (1);
}
I don't know what happens between the first function calling the second one, but it malforms the data.
As you can see from the code example given, the same method to convert string to char* is used and it works fine.
I'm using Visual Studio 2012/C++, any guidance and code examples will be greatly appreciated.
The "physical" reason your code does not work has nothing to do with std::string or C++. It wouldn't work in pure C as well. strtok is a function that stores its intermediate parsing state in some global variable. This immediately means that you cannot use strtok to parse more than one string at a time. Starting the second parse session before finishing the first would override the internal data stored by the first parse session, thus ruining it beyond repair. In other words, strtok parse sessions must not overlap. In your code they do overlap.
Also, in C++03 the idea of using std::string with strtok directly is doomed from the start. The internal sequence stored in std::string is not guaranteed to be null-terminated. This means that generally &jobR[0] is not a C-string. It can't be used with strtok. To convert a std::string to a C-string you have to use c_str(). But C-string returned by c_str() is non-modifiable.
In C++11 the situation is better: the standard requires data() + i == &operator[](i) for every i in [0, size()], so the characters and the terminating null are stored contiguously and &jobR[0] does point to a null-terminated sequence. The C-string returned by c_str() or data() is still non-modifiable, and the terminator itself must never be overwritten.
You cannot use strtok() to parse multiple strings at the same time, like you are doing. The first call to parseJobParameters() in the first loop iteration of parseJob() will alter the internal buffer that strtok() points to, thus the second loop iteration of parseJob() will not be processing the original data anymore. You need to rewrite your code to not use nested calls to strtok() anymore, eg:
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
// s is taken by value, so strtok() modifies a private copy
void split(std::string s, const char *delims, std::vector<std::string> &vec)
{
// alternatively, use s.find_first_of() and s.substr() instead...
if (s.empty()) return;
// strtok() needs a modifiable buffer, so pass &s[0] rather than s.c_str()
for (const char* p = strtok(&s[0], delims); p != NULL; p = strtok(NULL, delims))
{
vec.push_back(p);
}
}
int parseJobParameters(const char * buffer); // defined below
int parseJob(char * buffer)
{
std::vector<std::string> jobs;
split(buffer, "~", jobs);
for (std::vector<std::string>::iterator i = jobs.begin(); i != jobs.end(); ++i)
{
parseJobParameters(i->c_str());
}
return (1);
}
int parseJobParameters(const char * buffer)
{
std::vector<std::string> params;
split(buffer, "|", params);
for (std::vector<std::string>::iterator i = params.begin(); i != params.end(); ++i)
{
std::cout << *i;
}
return (1);
}
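The s.find_first_of() and s.substr() alternative mentioned in the code comment can be sketched as follows. It keeps no hidden global state, so nested use (jobs, then parameters) is safe. The name split_tokens and its by-value return are choices made for this example.

```cpp
#include <string>
#include <vector>

// Splits s on any character in delims, skipping empty tokens
// (the same convention strtok uses).
std::vector<std::string> split_tokens(const std::string& s,
                                      const std::string& delims) {
    std::vector<std::string> out;
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        // end is npos when the token runs to the end of s;
        // substr() then simply takes the rest of the string.
        std::string::size_type end = s.find_first_of(delims, start);
        out.push_back(s.substr(start, end - start));
        start = s.find_first_not_of(delims, end);
    }
    return out;
}
```

For example, splitting "a|b~c|d" on "~" yields "a|b" and "c|d", and each of those can then be split on "|" without disturbing the outer loop.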
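Whilst char* job = &jobR[0]; will give you the address of the first character in the string, it is not guaranteed (before C++11) to give you a valid C-style string. You should use jobR.c_str() to get one; note, though, that it returns a const char*, so the result cannot be handed to strtok, which modifies its input.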
I'm fairly sure that will solve your problem, but there could of course be something wrong with the way you read the buffer that is passed to parseJob in as well.
Edit: of course, you are also calling strtok from a function that uses strtok. Inside strtok looks a bit like this:
char *strtok(char *str, char *separators)
{
static char *last;
char *found = NULL;
if (!str) str = last;
... do searching for needle, set found to beginning of non-separators ...
if (found)
{
*str = 0; // mark end of string.
}
last = str;
return found;
}
Since "last" gets overwritten when you call parseJobParameters, you can't use strtok(NULL, ... ) when you get back to parseJob.
I just wrote a program that tokenizes a char array using pointers. The program only needed to work with a space as the delimiter character. I just turned it in and got full credit, but after turning it in, I realized that this program worked only if the delimiter character was a space.
My question is, how could I make this program work with an arbitrary delimiter character?
The function I've shown you below returns a pointer to the next word in the char array. This is what I believe I need to change for it to work with any delimiter character.
Thanks!
Code:
char* StringTokenizer::Next(void) {
pNextWord = pStart;
if (*pStart == '\0') { return NULL; }
while (*pStart != delim) {
pStart++;
}
if (*pStart == '\0') { return NULL; }
*pStart = '\0';
pStart++;
return pNextWord;
}
The printing loop in main():
while ((nextWord = tk.Next()) != NULL) {
cout << nextWord << endl;
}
The simplest way is to change your
while (*pStart != delim)
to something like
while (*pStart != '\0' && *pStart != ' ' && *pStart != '\n' && *pStart != '\t')
Or, you could make delim a string, and create a function that checks if a char is in the string:
bool isDelim(char c, const char *delim) {
while (*delim) {
if (*delim == c)
return true;
delim++;
}
return false;
}
while ( *pStart != '\0' && !isDelim(*pStart, " \n\t") )
Or, perhaps the best solution is to use one of the prebuilt functions for doing all this, such as strtok.
Just change the line
while (*pStart != delim)
as follows:
while (*pStart != '\0' && strchr(" \t\n", *pStart) == NULL)
The standard strchr function (declared in the string.h header)
looks for a character (given in the second argument) in a C-string
(given in the first argument) and returns a pointer to the position
where that character occurs for the first time. Hence, the expression
strchr(" \t\n", *pStart) == NULL is true if the current character
(*pStart) cannot be found in the string " \t\n" and, therefore,
is not a delimiter. (Modify the delimiter string to adapt it to your
needs, of course.)
This approach provides a short and simple way to test whether a given
character belongs to a (small) set of characters of interest. And it
uses a standard function.
By the way, you can do this using not only a C-string, but with
a std::string, too. All you need is to declare a const std::string
with " \t\n"-like value and then replace the call to the strchr
function with the find method of the declared delimiter string.
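A minimal sketch of the std::string variant, assuming a delimiter set of space, tab and newline (the names kDelims and isDelimiter are made up for this example):

```cpp
#include <string>

// Assumed delimiter set; adjust to taste.
const std::string kDelims = " \t\n";

// True if c is one of the delimiter characters.
bool isDelimiter(char c) {
    return kDelims.find(c) != std::string::npos;
}
```

The scanning loop would then read while (*pStart != '\0' && !isDelimiter(*pStart)). The explicit '\0' test is still required: find reports '\0' as not found here, so isDelimiter alone would never stop at the end of the input.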
Hmm...this doesn't look quite right:
if (*pStart = '\0')
The condition can never be true. I'm guessing you intended == instead of =? You also have a bit of a problem here:
while (*pStart != delim)
If the last word in the string isn't followed by a delimiter, this is going to run off the end of the string, which will cause serious problems.
Edit: Unless you really need to do this on your own, consider using a stringstream for the job. It already has all the right mechanism in place and quite heavily tested. It does add overhead, but it's quite acceptable in a lot of cases.
Not compiled. but I'd do something like this.
const char delimList[] = {' ',',','.',';', '|', '!', '$', '\n'};//all delims here.
const int N = sizeof(delimList) / sizeof(delimList[0]);
char* StringTokenizer::Next(void)
{
if (*pStart == '\0') { return NULL; }
pNextWord = pStart;
while (1){
for (int x = 0; x < N; x++){
if (*pStart == delimList[x]){ //this is it.
*pStart = '\0';
pStart++;
return pNextWord;
}
}
if ('\0' == *pStart){ //last word.. maybe.
return pNextWord;
}
pStart++;
}
}
// (!compiled).
I assume that we want to stick to C instead of C++. The functions strspn and strcspn are good for tokenizing by a set of delimiters. You can use strcspn to find where the next separator begins (i.e. where the current token ends) and then strspn to find where the separator run ends (i.e. where the next token begins). Loop until you reach the end.
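The loop described above might be sketched like this. The two C library functions are used through <cstring>; collecting the tokens in a std::vector is a convenience for this example, and the function name tokenize is made up. Unlike strtok, the input is never modified.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Splits text on any character in delims without modifying text.
std::vector<std::string> tokenize(const char* text, const char* delims) {
    std::vector<std::string> tokens;
    const char* p = text;
    while (true) {
        p += std::strspn(p, delims);        // skip a run of delimiters
        if (*p == '\0') { break; }          // nothing but delimiters left
        std::size_t len = std::strcspn(p, delims);  // length of the token
        tokens.push_back(std::string(p, len));
        p += len;                           // now at a delimiter or '\0'
    }
    return tokens;
}
```

Because strcspn also stops at the terminating '\0', the loop cannot run past the end of the string, which was the cause of the original access violation.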