How do i store an (int)string into an int array - c++

int ascii[1000] = {0};
string *data = (string*)malloc ( 1000*sizeof( string));
char *text = (char*)malloc ( 1000 *sizeof( char));
cout << "Enter the first arrangement of data." << endl;
cin.getline(text, 1000);
char *token = strtok(text, " ");
while ( token != NULL )
{
if ( strlen(token) > 0)
{
cout << "The tokens are: " << token << endl;
data[Tcount++] = *token;
}
token = strtok(NULL, " ");
for(i=0; i < (Tcount); i++)
{
ascii[i] = (int)data[i]; // error here
}
Im using this code to build a parser and i want to store the ascii values of the tokens which are stored in 'data' into an array named 'ascii'.
When i run the program i get the error message, "error: assigning to 'int' from incompatible type 'string' (aka 'basic_string, allocator >')
Any help would be appreciated.

One thing before the main event here. Obviously you're allowed to use std::string so, let's get the data in a more civilized fashion.
std::vector<std::string> data;
std::string line;
std::getline(cin, line); //read a whole line
std::stringstream tokenizer(line); // stuff the line into an object that's
// really good at tokenizing
std::string token;
while (tokenizer >> token) // one by one push a word out of the tokenizer
{
data.push_back(token); //and stuff it into a vector
}
we now have all of the individual words on the line packed into a nice resizable container, a vector. No messy dynamic memory to clean up.
Step 2: turn those strings into ints. 'Fraid you can't do that, ace. You could take a string that represents a number and turn it into an int. That's easy. Dozens of ways to do it. I like strtol.
But the ascii values are character by character. A string is a variable number of characters. You can pack one into an int, shift the int over by the width of one character and stuff in another, but you're going to run out of space after probably 4 or 8 characters.
Let's go with that, shall we? And we'll do it the old way without an iterator.
std::string data;
int ascii = 0;
if (data.length() > 0)
{
ascii |= data[index];
for(size_t index = 0; index < data.length(); index++)
{
ascii <<= 8; //we're talking ascii here so no unicode bit counting games
ascii |= data[index];
}
}
Done. Not very useful unless all the strings are pretty short, but done.
Instead if you're going to do a parser why not go full geek and try this:
typedef void handlerfunc();
std::map<std::string, handlerfunc> parser;
parser["do something"] = somethingfunc;
parser["do something else"] = somethingelsefunc;
Where somethingfunc is a function that looks like void somethingfunc() that, obviously, does something. Dito somethingelsefunc. Only it does somethingelse.
Usage could be as simple as:
parser[token]();
But it's not. Sigh.
It's more like
found = parser.find(token)
if (found != parser.end())
{
found->second();
return CMD_OK;
}
else
{
return CMD_NOT_FOUND;
}
But seriously, look into some of the fun stuff a good container can do for you. Save a ton of time.
I crapped out all of the code without a compiler. Please let me know if I borked any of it.

Related

C++ store consecutive chars into an array

This could be a very trivial question, but I have been searching how to get around it without much luck. I have a function to read from the serial port using the libserial function, the response I will get always finishes with a carriage return or a "\r" so, in order to read it, I was thinking in reading character by character comparing if it is not a \r and then storing each character into an array for later usage. My function is as follows:
void serial_read()
{
char character;
int numCharacter = 0;
char data[256];
while(character != '\r')
{
serial_port >> character;
numCharacter++;
character >> data[numCharacter];
}
cout << data;
}
In summary, probably my question should be how to store consecutive chars into an array. Thank you very much for your valuable insight.
I guess you intended
void serial_read()
{
char character = 0;
int numCharacter = 0;
char data[256];
while(character != '\r' && numCharacter < 255)
{
serial_port >> character;
data [numCharacter ++] = character;
}
data [numCharacter] = 0; // close "string"
cout << data;
}

Split a sentence in words using char pointers

I was working on a system that split a sentence to a 2D pointer.
I don't wanna use any kind of library or another ways like string, because I want to practice pointers and learn them.
char** sscanf(char* hstring)
{
int count = 0;
char* current = hstring;
while (*current)
{
if (*current == ' ')
{
count++;
}
while (*current == ' ')
{
current++;
}
if (*current)
break;
current++;
}
char** result = new char*[count];
current = hstring;
char* nstr = new char;
int c = 0, i = 0;
while (*current)
{
if (!*current) break;
cout << "t1";
if (*current == ' ')
{
*(++result) = nstr;
nstr = nullptr;
nstr = new char;
}
cout << "t2";
while (*current != '/0' && *current == ' ')
{
current++;
}
cout << "t3";
while (*current != '/0' && *current != ' ')
{
if (!*current) break;
*(++nstr) = *current;
current++;
}
cout << "t4";
*nstr = '/0';
cout << "t5";
}
return result;
}
But it's very strange, sometimes redirects me to
static size_t __CLRCALL_OR_CDECL length(_In_z_ const _Elem * const _First) _NOEXCEPT // strengthened
{ // find length of null-terminated string
return (_CSTD strlen(_First));
}
with error: Acces Violation, other times, choose a random line and call it Acces Breakout(sorry if I spelled wrong)
What I want from you is not to repair my code simply, I want some explanations, because I want to learn this stuff.
First, some advice
I understand that you are making this function as an exercise, but being C++ I'd like to warn you that things like new char*[count] are bad practices and that's why std::vector or std::array were created.
You seem confused about how dynamic allocation works. The statement char* nstr = new char; will create just one byte (char) in heap memory, and nothing is guaranteed to be adjacent to it. This means that ++nstr is a "invalid" operation, I mean, it's making the nstr point to the next byte after the allocated one, which can be some random invalid location.
There is a whole lot of other dangerous operations in your code, like calling new several times (which reserves memory) and not calling delete on them when you no longer use the reserved memory (aka. memory leaks). Having said that, I strongly encourage you to study this subject, for example starting with the ISO C++ FAQ on memory management.
Also, before digging into pointers and dynamic allocation, you should be more confortable with statements and flow control. I say this because I see some clear misunderstandings, like:
while (*current) {
if (!*current) break;
...
}
The check inside the if statement will certainly be false, because the while check is executed just before it and guarantees that the opposite condition is true. This means that this if is never evaluated to true and it's completely useless.
Another remark is: don't name your functions the same as standard libraries ones. sscanf is already taken, choose another (and more meaningful) one. This will save you some headaches in the future; be used to name your own functions properly.
A guided solution
I'm in a good mood, so I'll go through some steps here. Anyway, if someone is looking for an optimized and ready to go solution, see Split a String in C++.
0. Define the steps
Reading your code, I could guess some of your desired steps:
char** split_string(char* sentence)
{
// Count the number of words in the sentence
// Allocate memory for the answer (a 2D buffer)
// Write each word in the output
}
Instead of trying to get them right all at once, why don't you try one by one? (Notice the function's and parameter's names, clearer in my opinion).
1. Count the words
You could start with a simple main(), testing your solution. Here is mine (sorry, I couldn't just adapt yours). For those who are optimization-addicted, this is not an optimized solution, but a simple snippet for the OP.
// I'll be using this header and namespace on the next snippets too.
#include <iostream>
using namespace std;
int main()
{
char sentence[] = " This is my sentence ";
int n_words = 0;
char *p = sentence;
bool was_space = true; // see logic below
// Reading the whole sentence
while (*p) {
// Check if it's a space and advance pointer
bool is_space = (*p++ == ' ');
if (was_space && !is_space)
n_words++; // count as word a 'rising edge'
was_space = is_space;
}
cout << n_words;
}
Test it, make sure you understand why it works. Now, you can move to the next step.
2. Allocate the buffer
Well, you want to allocate one buffer for each word, so we need to know the size of each one of them (I'll not discuss whether or not this is a good approach to the split sentence problem..). This was not calculated on the previous step, so we might do it now.
int main()
{
char sentence[] = " This is my sentence ";
///// Count the number of words in the sentence
int n_words = 0;
char *p = sentence;
bool was_space = true; // see logic below
// Reading the whole sentence
while (*p) {
// Check if it's a space and advance pointer
bool is_space = (*p++ == ' ');
if (was_space && !is_space)
n_words++; // count as word a 'rising edge'
was_space = is_space;
}
///// Allocate memory for the answer (a 2D buffer)
// This is more like C than C++, but you asked for it
char **words = new char*[n_words];
char *ini = sentence; // the initial char of each word
for (int i = 0; i < n_words; ++i) {
while (*ini == ' ') ini++; // search next non-space char
char *end = ini + 1; // pointer to the end of the word
while (*end && *end != ' ') end++; // search for \0 or space
int word_size = end - ini; // find out the word size by address offset
ini = end; // next for-loop iteration starts
// at the next word
words[i] = new char[word_size]; // a whole buffer for one word
cout << i << ": " << word_size << endl; // debugging
}
// Deleting it all, one buffer at a time
for (int i = 0; i < n_words; ++i) {
delete[] words[i]; // delete[] is the syntax to delete an array
}
}
Notice that I'm deleting the allocated buffers inside the main(). When you move this logic to your function, this deallocation will be performed by the caller of the function, since it will probably use the buffers before deleting them.
3. Assigning each word to its buffer
I think you got the idea. Assign the words and move the logic to the separated function. Update your question with a Minimal, Complete, and Verifiable example if you still have troubles.
I know this is a Q&A forum, but I think this is already a healthy answer to the OP and to others that may pass here. Let me know if I should answer differently.

A local array repeats inside a loop! C++

The current_name is a local char array inside the following loop. I declared it inside the loop so it changes every time I read a new line from a file. But, for some reason the previous data is not removed from the current_name! It prints old data out if it wasn't overridden by new characters from the next line.
ANY IDEAS?
while (isOpen && !file.eof()) {
char current_line[LINE];
char current_name[NAME];
file.getline(current_line, LINE);
int i = 0;
while (current_line[i] != ';') {
current_name[i] = current_line[i];
i++;
}
cout << current_name << endl;
}
You're not terminating current_name after filling it. Add current_name[i] = 0 after the inner loop just before your cout. You're probably seeing this if you read abcdef then read jkl and probably get jkldef for output
UPDATE
You wanted to know if there is a better way. There is--and we'll get to it. But, coming from Java, your question and followup identified some larger issues that I believe you should be aware of. Be careful what you wish for--you may actually get it [and more] :-). All of the following is based on love ...
Attention All Java Programmers! Welcome to "A Brave New World"!
Basic Concepts
Before we even get to C the language, we need to talk about a few concepts first.
Computer Architecture:
https://en.wikipedia.org/wiki/Computer_architecture
https://en.wikipedia.org/wiki/Instruction_set
Memory Layout of Computer Programs:
http://www.geeksforgeeks.org/memory-layout-of-c-program/
Differences between Memory Addresses/Pointers and Java References:
Is Java "pass-by-reference" or "pass-by-value"?
https://softwareengineering.stackexchange.com/questions/141834/how-is-a-java-reference-different-from-a-c-pointer
Concepts Alien to Java Programmers
The C language gives you direct access the underlying computer architecture. It will not do anything that you don't explicitly specify. Herein, I'm mentioning C [for brevity] but what I'm really talking about is a combination of the memory layout and the computer architecture.
If you read memory that you didn't initialize, you will see seemingly random data.
If you allocate something from the heap, you must explicitly free it. It doesn't magically get marked for deletion by a garbage collector when it "goes out of scope".
There is no garbage collector in C
C pointers are far more powerful that Java references. You can add and subtract values to pointers. You can subtract two pointers and use the difference as an index value. You can loop through an array without using index variables--you just deference a pointer and increment the pointer.
The data of automatic variables in Java are stored in the heap. Each variable requires a separate heap allocation. This is slow and time consuming.
In C, the data of automatic variables in stored in the stack frame. The stack frame is a contiguous area of bytes. To allocate space for the stack frame, C simply subtracts the desired size from the stack pointer [hardware register]. The size of the stack frame is the sum of all variables within a given function's scope, regardless of whether they're declared inside a loop inside the function.
Its initial value depends upon what previous function used that area for and what byte values it stored there. Thus, if main calls function fnca, it will fill the stack with whatever data. If then main calls fncb it will see fnca's values, which are semi-random as far as fncb is concerned. Both fnca and fncb must initialize stack variables before they are used.
Declaration of a C variable without an initializer clause does not initialize the variable. For the bss area, it will be zero. For a stack variable, you must do that explicitly.
There is no range checking of array indexes in C [or pointers to arrays or array elements for that matter]. If you write beyond the defined area, you will write into whatever has been mapped/linked into the memory region next. For example, if you have a memory area: int x[10]; int y; and you [inadvertently] write to x[10] [one beyond the end] you will corrupt y
This is true regardless of which memory section (e.g. data, bss, heap, or stack) your array is in.
C has no concept of a string. When people talk about a "c string" what they're really talking about is a char array that has an "end of string" (aka EOS) sentinel character at the end of the useful data. The "standard" EOS char is almost universally defined as 0x00 [since ~1970]
The only intrinsic types supported by an architecture are: char, short, int, long/pointer, long long, and float/double. There may be some others on a given arch, but that's the usual list. Everything else (e.g. a class or struct is "built up" by the compiler as a convenience to the programmer from the arch intrinsic types)
Here are some things that are about C [and C++]:
- C has preprocessor macros. Java has no concept of macros. Preprocessor macros can be thought of as a crude form of metaprogramming.
- C has inline functions. They look just like regular functions, but the compiler will attempt to insert their code directly into any function that calls one. This is handy if the function is cleanly defined but small (e.g. a few lines). It saves the overhead of actually calling the function.
Examples
Here are several versions of your original program as an example:
// myfnc1 -- original
void
myfnc1(void)
{
istream file;
while (isOpen && !file.eof()) {
char current_line[LINE];
char current_name[NAME];
file.getline(current_line, LINE);
int i = 0;
while (current_line[i] != ';') {
current_name[i] = current_line[i];
i++;
}
current_name[i] = 0;
cout << current_name << endl;
}
}
// myfnc2 -- moved definitions to function scope
void
myfnc2(void)
{
istream file;
int i;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
i = 0;
while (current_line[i] != ';') {
current_name[i] = current_line[i];
i++;
}
current_name[i] = 0;
cout << current_name << endl;
}
}
// myfnc3 -- converted to for loop
void
myfnc(void)
{
istream file;
int i;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
for (i = 0; current_line[i] != ';'; ++i)
current_name[i] = current_line[i];
current_name[i] = 0;
cout << current_name << endl;
}
}
// myfnc4 -- converted to use pointers
void
myfnc4(void)
{
istream file;
const char *line;
char *name;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
name = current_name;
for (line = current_line; *line != ';'; ++line, ++name)
*name = *line;
*name = 0;
cout << current_name << endl;
}
}
// myfnc5 -- more efficient use of pointers
void
myfnc5(void)
{
istream file;
const char *line;
char *name;
int chr;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
name = current_name;
line = current_line;
for (chr = *line++; chr != ';'; chr = *line++, ++name)
*name = chr;
*name = 0;
cout << current_name << endl;
}
}
// myfnc6 -- fixes bug if line has no semicolon
void
myfnc6(void)
{
istream file;
const char *line;
char *name;
int chr;
char current_line[LINE];
char current_name[NAME];
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
name = current_name;
line = current_line;
for (chr = *line++; chr != 0; chr = *line++, ++name) {
if (chr == ';')
break;
*name = chr;
}
*name = 0;
cout << current_name << endl;
}
}
// myfnc7 -- recoded to use "smart" string
void
myfnc7(void)
{
istream file;
const char *line;
char *name;
int chr;
char current_line[LINE];
xstr_t current_name;
xstr_t *name;
name = &current_name;
xstrinit(name);
while (isOpen && !file.eof()) {
file.getline(current_line, LINE);
xstragain(name);
line = current_line;
for (chr = *line++; chr != 0; chr = *line++) {
if (chr == ';')
break;
xstraddchar(name,chr);
}
cout << xstrcstr(name) << endl;
}
xstrfree(name);
}
Here is a "smart" string [buffer] class similar to what you're used to:
// xstr -- "smart" string "class" for C
typedef struct {
size_t xstr_maxlen; // maximum space in string buffer
char *xstr_lhs; // pointer to start of string
char *xstr_rhs; // pointer to start of string
} xstr_t;
// xstrinit -- reset string buffer
void
xstrinit(xstr_t *xstr)
{
memset(xstr,0,sizeof(xstr));
}
// xstragain -- reset string buffer
void
xstragain(xstr_t xstr)
{
xstr->xstr_rhs = xstr->xstr_lhs;
}
// xstrgrow -- grow string buffer
void
xstrgrow(xstr_t *xstr,size_t needlen)
{
size_t curlen;
size_t newlen;
char *lhs;
lhs = xstr->xstr_lhs;
// get amount we're currently using
curlen = xstr->xstr_rhs - lhs;
// get amount we'll need after adding the whatever
newlen = curlen + needlen + 1;
// allocate more if we need it
if ((newlen + 1) >= xstr->xstr_maxlen) {
// allocate what we'll need plus a bit more so we're not called on
// each add operation
xstr->xstr_maxlen = newlen + 100;
// get more memory
lhs = realloc(lhs,xstr->xstr_maxlen);
xstr->xstr_lhs = lhs;
// adjust the append pointer
xstr->xstr_rhs = lhs + curlen;
}
}
// xstraddchar -- add character to string
void
xstraddchar(xstr_t *xstr,int chr)
{
// get more space in string buffer if we need it
xstrgrow(xstr,1);
// add the character
*xstr->xstr_rhs++ = chr;
// maintain the sentinel/EOS as we go along
*xstr->xstr_rhs = 0;
}
// xstraddstr -- add string to string
void
xstraddstr(xstr_t *xstr,const char *str)
{
size_t len;
len = strlen(str);
// get more space in string buffer if we need it
xstrgrow(xstr,len);
// add the string
memcpy(xstr->xstr_rhs,str,len);
*xstr->xstr_rhs += len;
// maintain the sentinel/EOS as we go along
*xstr->xstr_rhs = 0;
}
// xstrcstr -- get the "c string" value
char *
xstrcstr(xstr_t *xstr,int chr)
{
return xstr->xstr_lhs;
}
// xstrfree -- release string buffer data
void
xstrfree(xstr_t *xstr)
{
char *lhs;
lhs = xstr->xstr_lhs;
if (lhs != NULL)
free(lhs);
xstrinit(xstr);
}
Recommendations
Before you try to "get around" a "c string", embrace it. You'll encounter it in many places. It's unavoidable.
Learn how to manipulate pointers as easily as index variables. They're more flexible and [once you get the hang of them] easier to use. I've seen code written by programmers who didn't learn this and their code is always more complex than it needs to be [and usually full of bugs that I've needed to fix].
Good commenting is important in any language but, perhaps, more so in C than Java for certain things.
Always compile with -Wall -Werror and fix any warnings. You have been warned :-)
I'd play around a bit with the myfnc examples I gave you. This can help.
Get a firm grasp of the basics before you ...
And now, a word about C++ ...
Most of the above was about architecture, memory layout, and C. All of that still applies to C++.
C++ does do a more limited reclamation of stack variables when the function returns and they go out of scope. This has its pluses and minuses.
C++ has many classes to alleviate the tedium of common functions/idioms/boilerplate. It has the std standard template library. It also has boost. For example, std::string will probably do what you want. But, compare it against my xstr first.
But, once again, I wish to caution you. At your present level, work from the fundamentals, not around them.
Adding current_name[i] = 0; as described did not work for me.
Also, I got an error on isOpen as shown in the question.
Therefore, I freehanded a revised program beginning with the code presented in the question, and making adjustments until it worked properly given an input file having two rows of text in groups of three alpha characters that were delimited with " ; " without the quotes. That is, the delimiting code was space, semicolon, space. This code works.
Here is my code.
#define LINE 1000
int j = 0;
while (!file1.eof()) {
j++;
if( j > 20){break;} // back up escape for testing, in the event of an endless loop
char current_line[LINE];
//string current_name = ""; // see redefinition below
file1.getline(current_line, LINE, '\n');
stringstream ss(current_line); // stringstream works better in this case
while (!ss.eof()) {
string current_name;
ss >> current_name;
if (current_name != ";")
{
cout << current_name << endl;
} // End if(current_name....
} // End while (!ss.eof...
} // End while(!file1.eof() ...
file1.close();
cout << "Done \n";

How can I tokenize a c-style string from a file without making a copy?

Let's say I have a constant c-style string say
const char* msg = "fred,jim,345,7665";
I'd like to tokenize this and read out the individual fields but for performance reasons I don't want to make a copy. How can I do this?
Obviously strtok takes a non-constant pointer and boost::tokenizer is an option but I am unsure what is doing behind the scenes.
Inevitably you will require some copy of the string, even if it is a substring being copied.
If you have a strtok_r function, you can use that, but it will still require a mutable string to do its work. Beware, however, as not all systems provide the function (e.g. Windows), which is why I've provided an implementation here. It works by requiring an additional parameter: a pointer to a C string to save the address of the next match. This allows for it to be more reentrant (thread-safe) in theory. However, you'll still be mutating the value. You could modify it to suit your needs if you like, perhaps copying N bytes into a destination buffer and null-terminating that buffer to avoid the need to modify the source string.
/*
Usage:
char *tok;
char *savep;
tok = mystrtok_r (somestr, ",", &savep);
while (NULL != tok)
{
/* Do something with `tok'. */
tok = mystrtok_r (NULL, ",", &savep);
}
*/
char *
mystrtok_r (char *str, const char *delims, char **nextp)
{
if (str == NULL)
str = *nextp;
str += strspn (str, delims);
*nextp = str + strcspn (str, delims);
**nextp = 0;
if (*str == 0)
return NULL;
++*nextp;
return str;
}
It depends on how you're going to use it.
If you want to get the next token, and then the next (like an iteration over the string, then you only really need to copy the current token into memory.
long strtok2( char *strDest, const char *strSrc, const char cTok, long lOffset, long lMax)
{
if(lMax > 0)
{
strSrc += lOffset;
char * start = strDest;
while(--lMax && *strSrc != cTok && (*strDest++ = * strSrc++) );
*strDest = 0; //for when the token was found, not the null.
return strDest - start - 1; //the length of the token
}
return 0;
}
I snagged a simple strcpy from http://vijayinterviewquestions.blogspot.com.au/2007/07/implement-strcpy-function.html
const char* msg = "fred,jim,345,7665";
char * buffer[20];
long offset = 0;
while(length = strtok2(buffer, msg, ',', offset, 20))
{
cout << buffer;
offset += (length+1);
}
Well, without a little more detail it's hard to know exactly what you want. I'll guess you are parsing delimited items where consecutive delimiters should be treated as zero length tokens (which is usually correct for comma separated elements). I'm also assuming a blank line counts as a single zero length token. This is how I'd approach it:
const char *token_begin = msg;
int length;
for(;;)
{
length = 0;
while(!isDelimiter(token_begin[length])) //< must include \0 as delimiter
++length;
//..do something here with token. token is at: token_begin[0..length)
if ( token_begin[length] != 0 )
token_begin = &token_begin[length+1]; //skip beyond non-null delimiter
else
break; //token null terminated. exit
}
If you are going to store the tokens somewhere then a copy will be necessary in any case and strtok does this nicely by using the string a placing null terminating character inside it.
The only other option I see to avoid copying it is a lexer which reads the string and through a state machine produces tokens by scanning the string and storing the partial results in a buffer but every token should in any case be stored at least in a null terminated string to you are not really saving anything.
Here is my proposal, my code is structured and use a global variable pos(I know global variable are a bad practice but is only to give you the idea), you can replace it with a data member if you need OOP.
int position, messageLength;
char token[MAX]; // MAX = Value greater than the maximum length
// of the tokens(e.g. 1,000);
bool hasNext()
{
return position < messageLength;
}
char* next(const char* message)
{
int i = 0;
while (position < messageLength && message[position] != ',') {
token[i++] = message[position];
position++;
}
position++; // ',' found
token[i] = '\0';
return token;
}
int main(int argc, char **argv)
{
const char* msg = "fred,jim,345,7665";
position = 0;
messageLength = strlen(msg);
while (hasNext())
cout << next(msg) << endl;
return EXIT_SUCCESS;
}

How to tokenize (words) classifying punctuation as space

Based on this question which was closed rather quickly:
Trying to create a program to read a users input then break the array into seperate words are my pointers all valid?
Rather than closing I think some extra work could have gone into helping the OP to clarify the question.
The Question:
I want to tokenize user input and store the tokens into an array of words.
I want to use punctuation (.,-) as delimiter and thus removed it from the token stream.
In C I would use strtok() to break an array into tokens and then manually build an array.
Like this:
The main Function:
char **findwords(char *str);
int main()
{
int test;
char words[100]; //an array of chars to hold the string given by the user
char **word; //pointer to a list of words
int index = 0; //index of the current word we are printing
char c;
cout << "die monster !";
//a loop to place the charecters that the user put in into the array
do
{
c = getchar();
words[index] = c;
}
while (words[index] != '\n');
word = findwords(words);
while (word[index] != 0) //loop through the list of words until the end of the list
{
printf("%s\n", word[index]); // while the words are going through the list print them out
index ++; //move on to the next word
}
//free it from the list since it was dynamically allocated
free(word);
cin >> test;
return 0;
}
The line tokenizer:
char **findwords(char *str)
{
int size = 20; //original size of the list
char *newword; //pointer to the new word from strok
int index = 0; //our current location in words
char **words = (char **)malloc(sizeof(char *) * (size +1)); //this is the actual list of words
/* Get the initial word, and pass in the original string we want strtok() *
* to work on. Here, we are seperating words based on spaces, commas, *
* periods, and dashes. IE, if they are found, a new word is created. */
newword = strtok(str, " ,.-");
while (newword != 0) //create a loop that goes through the string until it gets to the end
{
if (index == size)
{
//if the string is larger than the array increase the maximum size of the array
size += 10;
//resize the array
char **words = (char **)malloc(sizeof(char *) * (size +1));
}
//asign words to its proper value
words[index] = newword;
//get the next word in the string
newword = strtok(0, " ,.-");
//increment the index to get to the next word
++index;
}
words[index] = 0;
return words;
}
Any comments on the above code would be appreciated.
But, additionally, what is the best technique for achieving this goal in C++?
Have a look at boost tokenizer for something that's much better in a C++ context than strtok().
Already covered by a lot of questions is how to tokenize a stream in C++.
Example: How to read a file and get words in C++
But what is harder to find is how get the same functionality as strtok():
Basically strtok() allows you to split the string on a whole bunch of user defined characters, while the C++ stream only allows you to use white space as a separator. Fortunately the definition of white space is defined by the locale so we can modify the locale to treat other characters as space and this will then allow us to tokenize the stream in a more natural fashion.
#include <locale>
#include <string>
#include <sstream>
#include <iostream>
// This is my facet that will treat the ,.- as space characters and thus ignore them.
class WordSplitterFacet: public std::ctype<char>
{
public:
typedef std::ctype<char> base;
typedef base::char_type char_type;
WordSplitterFacet(std::locale const& l)
: base(table)
{
std::ctype<char> const& defaultCType = std::use_facet<std::ctype<char> >(l);
// Copy the default value from the provided locale
static char data[256];
for(int loop = 0;loop < 256;++loop) { data[loop] = loop;}
defaultCType.is(data, data+256, table);
// Modifications to default to include extra space types.
table[','] |= base::space;
table['.'] |= base::space;
table['-'] |= base::space;
}
private:
base::mask table[256];
};
We can then use this facet in a local like this:
std::ctype<char>* wordSplitter(new WordSplitterFacet(std::locale()));
<stream>.imbue(std::locale(std::locale(), wordSplitter));
The next part of your question is how would I store these words in an array. Well, in C++ you would not. You would delegate this functionality to the std::vector/std::string. By reading your code you will see that your code is doing two major things in the same part of the code.
It is managing memory.
It is tokenizing the data.
There is basic principle Separation of Concerns where your code should only try and do one of two things. It should either do resource management (memory management in this case) or it should do business logic (tokenization of the data). By separating these into different parts of the code you make the code more generally easier to use and easier to write. Fortunately in this example all the resource management is already done by the std::vector/std::string thus allowing us to concentrate on the business logic.
As has been shown many times the easy way to tokenize a stream is using operator >> and a string. This will break the stream into words. You can then use iterators to automatically loop across the stream tokenizing the stream.
std::vector<std::string> data;
for(std::istream_iterator<std::string> loop(<stream>); loop != std::istream_iterator<std::string>(); ++loop)
{
// In here loop is an iterator that has tokenized the stream using the
// operator >> (which for std::string reads one space separated word.
data.push_back(*loop);
}
If we combine this with some standard algorithms to simplify the code.
std::copy(std::istream_iterator<std::string>(<stream>), std::istream_iterator<std::string>(), std::back_inserter(data));
Now combining all the above into a single application
int main()
{
// Create the facet.
std::ctype<char>* wordSplitter(new WordSplitterFacet(std::locale()));
// Here I am using a string stream.
// But any stream can be used. Note you must imbue a stream before it is used.
// Otherwise the imbue() will silently fail.
std::stringstream teststr;
teststr.imbue(std::locale(std::locale(), wordSplitter));
// Now that it is imbued we can use it.
// If this was a file stream then you could open it here.
teststr << "This, stri,plop";
cout << "die monster !";
std::vector<std::string> data;
std::copy(std::istream_iterator<std::string>(teststr), std::istream_iterator<std::string>(), std::back_inserter(data));
// Copy the array to cout one word per line
std::copy(data.begin(), data.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}