C++ splitting string using strtok ignores the last entry [duplicate]


This question already has answers here:
How do I iterate over the words of a string?
(84 answers)
Closed 3 years ago.
I have the following code that is supposed to split a string and insert the pieces into an array:
char *t;
char *tmpLine = (char *)line.c_str();
t = strtok(tmpLine, "\t");
int counter = 0;
while(t != NULL) {
tempGrade[counter] = atoi(t);
counter++;
t = strtok(NULL, "\t");
}
But for some reason the last entry of line is ignored and not inserted. Here is line:
string line = "1 90 74 84 48 76 76 80 85";
NOTE : the spaces are tabspaces (\t) in the original file.

Your code is so wrong it burns my eyes.
What is so wrong?
char *tmpLine = (char *)line.c_str();
First, you are (unless I'm wrong) casting away the constness of a std::string, which is a very bad idea. Anyway, casting away the constness smells of modification attempt...
t = strtok(tmpLine, "\t");
And here we are...
You are modifying the buffer provided by the std::string, as strtok destroys the string given to it.
Bad idea: You are not the owner of the std::string's internal buffer, so you should not modify it (you don't want to break your string and provoke a bug, do you?).
Back to the code:
t = strtok(tmpLine, "\t");
Ok, so, you're using strtok, which is not a reentrant function.
I don't know what compiler you're using, but I guess most would have a safer (i.e. less stupid) alternative. For example, Visual C++ provides strtok_s, and POSIX systems provide strtok_r. If you don't have one provided, then the best solution is to write your own strtok, or use another tokenization API.
In fact, strtok is one of the few counter-examples of "don't reinvent the wheel": In that case, rewriting your own will always be better than the original standard C function, if you have some experience in C or C++.
Ok, but, about the original problem?
Others have suggested alternatives, or pointed out that the incomplete code sample you provided seems to work as-is, so I'll stop my analysis here and let you consider their answers.
I don't know what your code is for, but it is plain wrong, and would be even if the bug you reported (the missing token) didn't exist. Don't use that on production code.
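As a rough illustration of the "write your own strtok" suggestion, here is a minimal sketch of a tokenizer that never touches the std::string's buffer and keeps no hidden state between calls. The name split_tokens is made up for this example; like strtok, it skips empty tokens between consecutive delimiters.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): splits `text` on any
// character found in `delims`, skipping empty tokens the way strtok does.
// `text` is only read, never modified, and there is no static state.
std::vector<std::string> split_tokens(const std::string& text,
                                      const std::string& delims)
{
    std::vector<std::string> tokens;
    // Position of the first character that is not a delimiter.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // End of the current token: next delimiter, or npos at end of text.
        std::string::size_type end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        if (end == std::string::npos)
            break;
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Calling split_tokens(line, "\t") on the line from the question should give nine tokens, each of which can then be converted with atoi or std::stoi.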

But can you give me a quick hack around this? This is not an important project, and I would have to change a lot to do it like in that question.
No, you don't. This does the same thing assuming tempGrade is an int array:
istringstream iss(line);
int counter = copy(istream_iterator<int>(iss),
istream_iterator<int>(),
tempGrade) - tempGrade;
But it would be better to change tempGrade to vector and use the code from the answer Moo-Juice linked to.
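For what the vector variant might look like, here is a small sketch. The function name parse_grades is an assumption made for this example; it builds the vector directly from the same pair of stream iterators, so there is no fixed array size and no counter to maintain.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every whitespace-separated integer in `line`
// (tabs count as whitespace) and returns them in order.
std::vector<int> parse_grades(const std::string& line)
{
    std::istringstream iss(line);
    // The vector's range constructor consumes the stream until extraction fails.
    return std::vector<int>(std::istream_iterator<int>(iss),
                            std::istream_iterator<int>());
}
```

The number of grades read is then simply the size() of the returned vector.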

As other people mention, I suggest you take the recommended C++ approach to this problem.
But I tried out your code, and I can't tell what is wrong with it. I don't repro your issue on my end:
1
90
74
84
48
76
76
80
85
Perhaps your iteration to print out the loop is too short, or your tempGrade array is too small? Or perhaps your final "tab" isn't really a tab character?
Here's the code I compiled to check this code.
#include <cstdlib>  // atoi
#include <cstring>  // strtok
#include <iostream>
#include <string>
int main(int argc, char* argv[])
{
std::string line = "1\t90\t74\t84\t48\t76\t76\t80\t85";
char *t;
char *tmpLine = (char *)line.c_str();
t = strtok(tmpLine, "\t");
int counter = 0;
int tempGrade[9];
while(t != NULL) {
tempGrade[counter] = atoi(t);
counter++;
t = strtok(NULL, "\t");
}
for(int i = 0; i < 9; ++i) {
std::cout << tempGrade[i] << "\n";
}
}

Here is a very simple example showing how to make use of stream extraction operators and vector to do this in a more idiomatic C++ manner:
#include <string>
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;
void parse(string line)
{
stringstream stream(line);
int score;
vector<int> scores;
while (stream >> score)
scores.push_back(score);
for (vector<int>::iterator itr = scores.begin(); itr != scores.end(); ++itr)
cout << *itr << endl;
}
int main(int argc, char* argv[])
{
parse("1\t90\t74\t84\t48\t76\t76\t80\t85");
return 0;
}

Related

while loop inside for loop

I read this sample code in a book. I can't figure out why this part of the following sample code's function declaration is necessary:
while (i <= n)
p[i++] = '\0'; // set rest of string to '\0'
Here is the whole code:
#include <iostream>
const int ArSize = 80;
char * left(const char * str, int n = 1);
int main()
{
using namespace std;
char sample[ArSize];
cout << "Enter a string:\n";
cin.get(sample,ArSize);
char *ps = left(sample, 4);
cout << ps << endl;
delete [] ps; // free old string
ps = left(sample);
cout << ps << endl;
delete [] ps; // free new string
return 0;
}
// This function returns a pointer to a new string
// consisting of the first n characters in the str string.
char * left(const char * str, int n)
{
if(n < 0)
n = 0;
char * p = new char[n+1];
int i;
for (i = 0; i < n && str[i]; i++)
p[i] = str[i]; // copy characters
while (i <= n)
p[i++] = '\0'; // set rest of string to '\0'
return p;
}
I ran the code after I erased it and there was no problem.

The loop is unnecessary. Null-terminated strings end at the first null byte. If more memory was allocated than the actual string needs, it does not matter what’s in those extra bytes. All non-broken C-string handling code stops at the first null terminator. All that’s required is a single
p[i] = '\0';
after the for loop. However, that one null byte is mandatory. C-string functions depend on it and will happily overrun the allocated memory if it’s missing. Essentially they’ll (try to) keep going until they stumble upon the next null byte in memory. If that is past the allocated memory it causes undefined behaviour, resulting in a crash if you’re lucky; or corrupted data if you’re less lucky.
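To make that concrete, here is a sketch of the book's left() with the padding loop swapped for the single terminator. Everything else is kept as in the question, including the raw new[], so the caller still has to delete[] the result.

```cpp
// Sketch of the book's left() with one terminator instead of the while loop.
// Returns a new[]-allocated string; the caller must delete[] it.
char* left(const char* str, int n)
{
    if (n < 0)
        n = 0;
    char* p = new char[n + 1];
    int i;
    for (i = 0; i < n && str[i]; i++)
        p[i] = str[i];   // copy at most n characters
    p[i] = '\0';         // a single null byte terminates the string
    return p;
}
```

Any bytes after that terminator are left uninitialized, which is harmless as long as the buffer is only ever used as a C string.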
That said: Throw away that book yesterday. The code is a catastrophe from the first to the last line. It barely qualifies as C++. Most of it is plain C. And even as C code it’s highly questionable.
Why to avoid using namespace std. @vol7ron pointed out in the comments that the major complaint is against using namespace std in headers. Here it’s used inside a function in a .cpp file, which lessens the impact significantly. Although in my opinion it is still worth avoiding. If you don’t know the implementation of your standard library in depth, you don’t really have an idea about all the symbols you pull into your scope. If you need it for readability, pulling in specific symbols (e.g. using std::cout;) is a better choice. Also, I’m confident I’m not alone in kind of expecting the std:: prefix. For example, std::string is what I expect to see. string looks slightly off. There’s always a lingering doubt that it might not be the std library string, but a custom string type. So, including the prefix can benefit readability as well.
Why all the C-string pain? We’ve had std::string for a while now …
Copying characters in a loop? Seriously? That’s what std::strcpy() is for.
Raw new and delete everywhere: error prone because you have to keep track of the new/delete pairs manually to avoid memory leaks.
Even worse: asymmetric owning raw pointers. left() allocates and returns a pointer; and it’s the caller’s responsibility to delete it. It doesn’t get more error prone than that.
… And these are only the problems that stick out on first glance.
What that piece of code should look like:
#include <iostream>
#include <string>
std::string left(const std::string& str, std::size_t len = 1);
int main()
{
// getline can fail. If that happens we get an empty string.
std::string sample;
std::getline(std::cin, sample);
auto ps = left(sample, 4);
std::cout << ps << '\n';
ps = left(sample);
std::cout << ps << '\n';
return 0;
}
// `len` may be longer than the string. In that case a copy
// of the complete input string is returned.
std::string left(const std::string& str, std::size_t len)
{
return str.substr(0, len);
}

C++ Put string in array

I keep seeing similar questions to mine, however, I can't seem to find one that helps my situation. Honestly, it seems like such a mundane question that I shouldn't be asking it, but here I am 2 weeks later, still with no answer.
{
string word;
ArrayWithWords[d] = word;
d++;
}
Every time this loop runs, I want to put word in position d of the array. Other examples I've found only turn the string into char*.
The array will be used more than once and having a solid value, if that's what it's called, is far more preferred. I'd like to avoid using a pointer.

Just use a vector of strings.
#include <string>
#include <vector>
int main()
{
std::vector<std::string> ArrayWithWords(10);
size_t d = 5; // something between 0 and 9
std::string word;
ArrayWithWords[d] = word;
d++;
}
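If the number of words is not known in advance, one option is to let the vector grow with push_back instead of indexing with d. The sketch below assumes the words come from whitespace-separated text; the function name words_of is invented for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: stores each whitespace-separated word of `text`
// as its own std::string value (a copy, not a pointer).
std::vector<std::string> words_of(const std::string& text)
{
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string word;
    while (in >> word)
        words.push_back(word);   // appends at the next free position
    return words;
}
```

Here words.size() plays the role of d, and words[i] gives the stored string back by value or reference as needed.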

Pretty much the same thing that was just posted but a little bit more old school.
#include <string>
using namespace std;
int main()
{
string stringArray[10];
string word;
word = "hello";
for (int i = 0; i < 10; i++)
{
stringArray[i] = string(word);
}
}

Remove character from array where spaces and punctuation marks are found [duplicate]

This question already has answers here:
C++ Remove punctuation from String
(12 answers)
Closed 9 years ago.
In my program, I am checking a whole cstring; if any spaces or punctuation marks are found, I just want to put an empty character at that location, but the compiler is giving me an error: empty character constant.
Please help me out. In my loop I am checking like this:
if(ispunct(str1[start])) {
str1[start]=''; // << empty character constant.
}
if(isspace(str1[start])) {
str1[start]=''; // << empty character constant.
}
This is where my errors are please correct me.
For example, if the word is str,, ing, the output should be string.

There is no such thing as an empty character.
If you mean a space then change '' to ' ' (with a space in it).
If you mean NUL then change it to '\0'.

Edit: the answer is no longer relevant now that the OP has edited the question. Leaving up for posterity's sake.
If you want to add a null character, use '\0'. If you want to use a different character, use the appropriate character for that. You can't assign it nothing. That's meaningless. That's like saying
int myHexInt = 0x;
or
long long myIndeger = L;
The compiler will error. Put in the value you wanted. In the char case, that's a value from 0 to 255.

UPDATE:
From the edit to OP's question, it's apparent that he/she wanted to trim a string of punctuation and space characters.
As detailed in the flagged possible duplicate, one way is to use remove_copy_if:
string test = "THisisa test;;';';';";
string temp, finalresult;
remove_copy_if(test.begin(), test.end(), std::back_inserter(temp), ptr_fun<int, int>(&ispunct));
remove_copy_if(temp.begin(), temp.end(), std::back_inserter(finalresult), ptr_fun<int, int>(&isspace));
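Note that ptr_fun was deprecated in C++11 and removed in C++17. With a newer compiler, a single pass with a lambda and the erase-remove idiom should do the same job. This is only a sketch, and the name strip_punct_and_space is made up here:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `text` with all punctuation and
// whitespace characters removed.
std::string strip_punct_and_space(std::string text)
{
    text.erase(std::remove_if(text.begin(), text.end(),
                              // Taking unsigned char avoids passing a
                              // negative value to the <cctype> functions.
                              [](unsigned char c) {
                                  return std::ispunct(c) || std::isspace(c);
                              }),
               text.end());
    return text;
}
```

remove_if only shifts the kept characters to the front; the erase call is what actually shortens the string.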
ORIGINAL
Examining your question, replacing spaces with spaces is redundant, so you really need to figure out how to replace punctuation characters with spaces. You can do so using a comparison function (by wrapping std::ispunct) in tandem with std::replace_if from the STL:
#include <string>
#include <algorithm>
#include <iostream>
#include <cctype>
using namespace std;
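bool is_punct(const char& c) {
// Cast first: passing a negative char to ispunct is undefined behaviour.
return ispunct(static_cast<unsigned char>(c)) != 0;
}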
int main() {
string test = "THisisa test;;';';';";
char test2[] = "THisisa test;;';';'; another";
size_t size = sizeof(test2)/sizeof(test2[0]);
replace_if(test.begin(), test.end(), is_punct, ' ');//for C++ strings
replace_if(&test2[0], &test2[size-1], is_punct, ' ');//for c-strings
cout << test << endl;
cout << test2 << endl;
}
This outputs:
THisisa test
THisisa test another

Try this (as you asked for cstring explicitly):
char str1[100] = "str,, ing";
if(ispunct(str1[start]) || isspace(str1[start])) {
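// memmove (from <cstring>) is safe for overlapping ranges, strncpy is not.
// The count covers the remaining characters plus the closing '\0'.
memmove(str1 + start, str1 + start + 1, strlen(str1) - start);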
}
Well, doing this in pure C, there are more efficient solutions (have a look at @MichaelPlotke's answer for details).
But as you also explicitly ask for c++, I'd recommend a solution as follows:
Note you can use the standard c++ algorithms for 'plain' c-style character arrays also. You just have to place your predicate conditions for removal into a small helper functor and use it with the std::remove_if() algorithm:
struct is_char_category_in_question {
bool operator()(const char& c) const;
};
And later use it like:
#include <string>
#include <algorithm>
#include <iostream>
#include <cctype>
#include <cstring>
// Best chance to have the predicate elided to be inlined, when writing
// the functor like this:
struct is_char_category_in_question {
bool operator()(const char& c) const {
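// Cast to unsigned char: negative values are undefined for <cctype> functions.
return std::ispunct(static_cast<unsigned char>(c)) || std::isspace(static_cast<unsigned char>(c));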
}
};
int main() {
static char str1[100] = "str,, ing";
size_t size = strlen(str1);
// Using std::remove_if() is likely to provide the best balance of performance
// and code size efficiency you can expect from your compiler
// implementation.
std::remove_if(&str1[0], &str1[size + 1], is_char_category_in_question());
// Regarding the end of the range in the above statement, note we have to
// add 1 to the strlen() calculated size, to catch the
// closing `\0` character of the c-style string being copied correctly and
// terminate the result as well!
std::cout << str1 << std::endl; // Prints: string
}
See this compilable and working sample also here.

As I don't like the accepted answer, here's mine:
#include <stdio.h>
#include <string.h>
#include <cctype>
int main() {
char str[100] = "str,, ing";
int bad = 0;
int cur = 0;
while (str[cur] != '\0') {
if (bad < cur && !ispunct(str[cur]) && !isspace(str[cur])) {
str[bad] = str[cur];
}
if (ispunct(str[cur]) || isspace(str[cur])) {
cur++;
}
else {
cur++;
bad++;
}
}
str[bad] = '\0';
fprintf(stdout, "cur = %d; bad = %d; str = %s\n", cur, bad, str);
return 0;
}
For the input shown, this outputs cur = 9; bad = 6; str = string
This has the advantage of being more efficient and more readable, hm, well, in a style I happen to like better (see comments for a lengthy debate / explanation).

strchr not working with char[]

I am working on ROT13 for C++ practice. However, this bit of code returns an error and fails to compile, and I do not understand why! Here is a snippet of the code:
string encode(string &x)
{
char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
for (size_t l=0;l<x.size();++l){
cout<<x[l];
cout<< strchr(alphabet,x[l]);
}
return x;
}
Q2. Also, help me return the index of the matching letter from alphabet[] (e.g., 5 for 'f'), to which I can add 13 and append that to x, and so on.
Q3. Besides practice, which course in CS would help me develop more efficient algorithms? Is it theory of computation, discrete mathematics, or algorithms ?

In order, starting with question 1:
The following compiles fine for me:
#include <iostream>
#include <cstring>
#include <string>
std::string encode(std::string &x)
{
char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
char *ptr;
for (size_t l=0;l<x.size();++l){
std::cout<<x[l];
std::cout<< std::strchr(alphabet,x[l]);
}
return x;
}
int main (int argc, char* argv []) {
return 0;
}
Make sure:
you include the headers given, for cout and strchr.
use std:: prefixes unless you're using the std namespace.
fix that ptr problem.
Question 2:
If you're looking for a handy ROT-13 method, consider using two C strings, one for the source and one for the translation:
char from[] = "abcdefghijklmnopqrstuvwxyz";
char to [] = "nopqrstuvwxyzabcdefghijklm";
Then you can use strchr to look it up in the first one and use that pointer to find the equivalent in the second.
char src = 'j';
char *p = strchr (from, src);
if (p == NULL)
std::cout << src;
else
std::cout << to[p - from];
That would output the character as-is if it wasn't found or look up the translation if it was found. You may also want to put the capital letters in there as well.
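If you would rather work with the index itself, as Q2 asks, the pointer difference from strchr gives it directly. The sketch below is one way to do that for lowercase letters only; alphabet_index and rot13_lower are names invented for this example.

```cpp
#include <cstring>

static const char kAlphabet[] = "abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper: 0-based index of `c` in the lowercase alphabet
// (5 for 'f'), or -1 if `c` is not a lowercase letter.
int alphabet_index(char c)
{
    // strchr also matches the terminating '\0', so reject it explicitly.
    if (c == '\0')
        return -1;
    const char* p = std::strchr(kAlphabet, c);
    return p ? static_cast<int>(p - kAlphabet) : -1;
}

// Rotates a lowercase letter by 13 places, wrapping around with % 26.
// Any other character is returned unchanged.
char rot13_lower(char c)
{
    int idx = alphabet_index(c);
    return idx < 0 ? c : kAlphabet[(idx + 13) % 26];
}
```

The modulo is what makes 'n' through 'z' wrap back to 'a' through 'm'; without it, adding 13 would index past the end of the array.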
Question 3:
If you want to learn about efficient algorithms, I'd go for, surprisingly enough, an algorithms course :-)
Theory of computation sounds a little dry, though it may well cover the theoretical basis behind algorithms. Discrete mathematics has applicability to algorithms but, again, it's probably very theoretical. That's all based on what the words mean, of course, the actual subject areas covered may be totally different, so you should probably take it up with the people offering the courses.
Extra bit:
If you're looking for something to compare your own work to, here's one I put together based on my suggestions above:
#include <iostream>
#include <cstring>
#include <string>
std::string rot13 (std::string x)
{
char from[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
char to [] = "nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM";
std::string retstr = "";
for (size_t i = 0; i < x.size(); ++i) {
char *p = std::strchr (from, x[i]);
if (p == 0)
retstr += x[i];
else
retstr += to[p - from];
}
return retstr;
}
int main (int argc, char* argv []) {
std::string one = "This string contains 47 and 53.";
std::string two = rot13 (one);
std::string three = rot13 (two);
std::cout << one << '\n';
std::cout << two << '\n';
std::cout << three << '\n';
return 0;
}
The building of the return string could have probably been done more efficiently (such as a new'ed character array which becomes a string only at the end) but it illustrates the "lookup" part of the method well.
The output is:
This string contains 47 and 53.
Guvf fgevat pbagnvaf 47 naq 53.
This string contains 47 and 53.
which you can verify here, if necessary.

Cast alphabet to a const char*, it should work afterwards. Keep in mind that type[] is different from type *.

Finding all occurrences of a character in a string

I have comma delimited strings I need to pull values from. The problem is these strings will never be a fixed size. So I decided to iterate through the groups of commas and read what is in between. In order to do that I made a function that returns every occurrence's position in a sample string.
Is this a smart way to do it? Is this considered bad code?
#include <string>
#include <iostream>
#include <vector>
#include <Windows.h>
using namespace std;
vector<int> findLocation(string sample, char findIt);
int main()
{
string test = "19,,112456.0,a,34656";
char findIt = ',';
vector<int> results = findLocation(test,findIt);
return 0;
}
vector<int> findLocation(string sample, char findIt)
{
vector<int> characterLocations;
for(int i =0; i < sample.size(); i++)
if(sample[i] == findIt)
characterLocations.push_back(sample[i]);
return characterLocations;
}

vector<int> findLocation(string sample, char findIt)
{
vector<int> characterLocations;
for(int i =0; i < sample.size(); i++)
if(sample[i] == findIt)
characterLocations.push_back(sample[i]);
return characterLocations;
}
As currently written, this will simply return a vector containing the int representations of the characters themselves, not their positions, which is what you really want, if I read your question correctly.
Replace this line:
characterLocations.push_back(sample[i]);
with this line:
characterLocations.push_back(i);
And that should give you the vector you want.
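As an alternative to the index loop, std::string::find can jump from one match to the next. This is a sketch rather than a drop-in replacement: the name find_positions is hypothetical, and it returns std::size_t positions instead of int to match what find reports.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the position of every occurrence of
// `target` in `sample`, in increasing order.
std::vector<std::size_t> find_positions(const std::string& sample, char target)
{
    std::vector<std::size_t> positions;
    for (std::size_t pos = sample.find(target);
         pos != std::string::npos;
         pos = sample.find(target, pos + 1))   // resume just past the last hit
        positions.push_back(pos);
    return positions;
}
```

For the test string in the question, "19,,112456.0,a,34656", the commas sit at positions 2, 3, 12 and 14.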

If I were reviewing this, I would see this and assume that what you're really trying to do is tokenize a string, and there's already good ways to do that.
Best way I've seen to do this is with boost::tokenizer. It lets you specify how the string is delimited and then gives you a nice iterator interface to iterate through each value.
using namespace boost;
string sample = "Hello,My,Name,Is,Doug";
escaped_list_separator<char> sep("" /*escape char*/, "," /*separator*/, "" /*quotes*/);
tokenizer<escaped_list_separator<char> > myTokens(sample, sep);
//iterate through the contents
for (tokenizer<escaped_list_separator<char> >::iterator iter = myTokens.begin();
iter != myTokens.end();
++iter)
{
std::cout << *iter << std::endl;
}
Output:
Hello
My
Name
Is
Doug
Edit If you don't want a dependency on boost, you can also use getline with an istringstream as in this answer. To copy somewhat from that answer:
std::string str = "Hello,My,Name,Is,Doug";
std::istringstream stream(str);
std::string tok1;
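// Test the result of getline itself; checking the stream before reading
// would print one extra, empty token after the last field.
while (std::getline(stream, tok1, ','))
{
std::cout << tok1 << std::endl;
}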
Output:
Hello
My
Name
Is
Doug
This may not be directly what you're asking but I think it gets at your overall problem you're trying to solve.
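Applied to the asker's kind of data, the getline approach might be wrapped up as below. This is only a sketch and split_fields is a made-up name. Unlike strtok, getline keeps empty fields between two adjacent delimiters, which matters for input like "19,,112456.0,a,34656".

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on `delim` and returns the fields,
// including empty ones between consecutive delimiters.
std::vector<std::string> split_fields(const std::string& text, char delim)
{
    std::vector<std::string> fields;
    std::istringstream stream(text);
    std::string field;
    // The loop ends when getline fails to extract anything more.
    while (std::getline(stream, field, delim))
        fields.push_back(field);
    return fields;
}
```

One caveat: a delimiter at the very end of the text does not produce a trailing empty field with this loop.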

Looks good to me too, one comment is with the naming of your variables and types. You call the vector you are going to return characterLocations which is of type int when really you are pushing back the character itself (which is type char) not its location. I am not sure what the greater application is for, but I think it would make more sense to pass back the locations. Or do a more cookie cutter string tokenize.

Well, if your purpose is to find the indices of the occurrences, the following code avoids some copies. In C++, passing objects by value copies them, which is less efficient for large strings, so here the string is taken by const reference. The results are written into a vector passed by reference, so nothing has to be copied on the way out either (although modern compilers usually elide or move a returned vector anyway).
#include <string>
#include <iostream>
#include <vector>
#include <Windows.h>
using namespace std;
void findLocation(const string& sample, const char findIt, vector<int>& resultList);
int main()
{
string test = "19,,112456.0,a,34656";
char findIt = ',';
vector<int> results;
findLocation(test,findIt, results);
return 0;
}
void findLocation(const string& sample, const char findIt, vector<int>& resultList)
{
const int sz = sample.size();
for(int i =0; i < sz; i++)
{
if(sample[i] == findIt)
{
resultList.push_back(i);
}
}
}

How smart it is also depends on what you do with those substrings delimited by commas. In some cases it may be better (e.g. faster, with smaller memory requirements) to avoid searching and splitting and just parse and process the string at the same time, possibly using a state machine.
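As a toy illustration of processing in a single pass, the sketch below counts the non-empty comma-separated fields without storing positions or building substrings. The task and the name count_nonempty_fields are assumptions chosen for the example; a real parser would do its actual work in the state transitions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical example: counts non-empty comma-separated fields in one pass.
// The state machine has two states: between fields, or inside a field.
std::size_t count_nonempty_fields(const std::string& text)
{
    enum class State { BetweenFields, InField };
    State state = State::BetweenFields;
    std::size_t count = 0;
    for (char c : text) {
        if (c == ',') {
            state = State::BetweenFields;      // a comma always ends a field
        } else if (state == State::BetweenFields) {
            state = State::InField;            // first character of a new field
            ++count;
        }
        // Otherwise we are already inside a field: nothing to do.
    }
    return count;
}
```

No extra memory is allocated here, whatever the length of the input.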
