Split string by delimiter strtok weird behaviour

I am trying to split a string, but unfortunately strtok behaves weirdly.
I have the following string: get|user=password|23|info|hello. I have tried the widely used method with strtok, but unfortunately it treats = as a delimiter and I cannot parse my string.
So get is parsed correctly, but the next token is only user, not user=password.
Please help me find the problem or suggest another way to split the string.
I am programming for Arduino.
Thanks
Code
const char delimeter = '|';
char *token;
token = strtok(requestString, &delimeter);
// Handle parsed
token = strtok(NULL, &delimeter);

From cppreference,
delim - pointer to the null-terminated byte string identifying delimiters
The requirement your approach fails to meet is "null-terminated". You take the address of a single char, so nothing past that one character may be accessed. strtok, however, keeps reading until it finds the \0 character that terminates the string, so you are entering undefined-behaviour land.
Instead, use
const char* delimiter = "|";
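For completeness, a minimal sketch of the whole loop with a properly terminated delimiter string. The function name splitRequest is made up for illustration, and the sketch assumes std::string and std::vector are available; on a small Arduino board you might store the token pointers in a fixed-size array instead.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Splits a mutable, NUL-terminated buffer on '|' and returns copies of the tokens.
// strtok overwrites each delimiter with '\0', so the buffer is modified.
// splitRequest is a hypothetical helper name, not part of any library.
std::vector<std::string> splitRequest(char *request)
{
    assert(request != nullptr);
    const char *delimiters = "|"; // a string literal, so it is NUL-terminated
    std::vector<std::string> tokens;
    for (char *token = std::strtok(request, delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Called on a char array holding get|user=password|23|info|hello, this should yield five tokens, with user=password kept in one piece. The input has to be a writable array, not a string literal, because strtok modifies it.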

Change this:
const char delimeter = '|';
to this:
const char * delimeter = "|"; // note the double quotes

Related

How to get multiple characters from a string

I am currently trying to learn c++, and I was informed that this website is a great place to start getting involved in.
I was just wondering if it were possible to retrieve multiple characters from a string rather than repeating multiple lines of code.
string lname = "";
char l = lname.at(0);
char a = lname.at(1);

A C-style string is essentially a char array terminated by a NUL character ('\0'); std::string manages such an array for you and tracks its length.
The string class reference will tell you what you can and cannot do with a string variable. You didn't provide enough detail for us to answer your question thoroughly, but if you are looking for a substring, try the substr member function of the string class.
For instance:
string tempString = "Hello, my name is brw59";
// at character 7 (starts at 0) print two characters
cout << tempString.substr(7, 2); // output == "my"

Why does a C++ string need a \0?

I was hoping that I could get some further explanation. I was told that I need to explicitly add \0 to the end of a string. Apparently this is for the C++ string class and that it is actually an array of characters that seems to be parsed under the hood. I was told that we must use the \0 in order to tell where the end of the string is as seen below:
int main()
{
char str[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
cout << str << endl;
return 0;
}
However, if I have a user input their name, for example, I don't believe that C++ automatically uses the \0 to terminate the string. So the argument that the \0 must be there to know where the string ends makes no sense. Why can't we use the .length() function to account for the length of the string?
I wrote the following program to illustrate that the length of the input can be found from the .length() function.
int main()
{
string firstName;
cout << "Enter your first name: ";
cin >> firstName;
cout << "First Name = " << firstName << endl;
cout << "String Length = " << firstName.length() << endl;
return 0;
}
So, if the user inputs the name "Tom". Then the output would be the following:
First Name = Tom
String Length = 3
I brought this, along with this article, to my professor's attention: http://www.cplusplus.com/reference/string/string/length/
and I was told that this is why I am in college, because it cannot be done this way. Can anyone offer any insight, since I don't understand what I am missing?

The "C string" was adopted into C++ from the C language. The C language did not have a string type. Strings in C were represented as an array of char, and the string was terminated with the NUL byte (\0). A plain string literal in C++ still has these semantics.
The C++ string type maintains the length within the object, as you say, so in a string, the NUL is not required. To get a "C string" from a string, you can use the c_str() method on the string. This is useful if you need to pass the contents of the C++ string to a function that only understands the NUL terminated variety.
std::string s("a string"); // s is initialized,
// the length is computed when \0 is encountered.
assert(s.size() == sizeof("a string")-1);
// sizeof string literal includes the \0
assert(s.c_str()[s.size()] == '\0');
// c_str() includes the \0
In your first program, you are initializing an array of char with an initializer list. The initialization is equivalent to the following:
char str[6] = "Hello";
This style of initializing an array of char is a special allowance that C++ provides since it is the syntax accepted by C.
In your second program, you are getting the name from the standard input. When the stream populates the string argument, it essentially reads byte by byte until it encounters a separator (whitespace characters, by default). The string tracks its own length, so it does not rely on a NUL byte to find the end (since C++11, c_str() and data() nevertheless return a NUL-terminated buffer).

You're not missing anything per se. The null terminator is used on character arrays to indicate the end. However, the string class takes care of all of that for you. The length() member function is a perfectly acceptable way of doing it since you're using strings.
However, if you're using a character array, then yes, you would need to check whether you've reached the null terminator, as you may not know the length of your string.
The following will give you no issues.
int length = 2;
char str[] = "AB";
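However, try the following, and you'll see the difference.
const int length = 5; // const, so it can be used as an array size
char str[length + 1] = "ABCDE"; // +1 makes room for automatic \0
char str2[length + 1] = "ABC";
Try the second snippet using your for-loop method with the known length: the first array gives you ABCDE, but the second gives you "ABC" followed by the characters that sit after the terminator, because the array holds [A][B][C][\0][?][?]. With an initializer like this the leftover elements are zero-filled; in a buffer that was filled some other way (strcpy into an uninitialized array, for instance) they would be junk. Make length larger and you'll see more of them.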
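A character array carries no length of its own, so the usual way to find the end is to walk to the terminator. A minimal sketch of that walk follows; lengthUntilTerminator is a made-up name, and the standard strlen does the same job.

```cpp
#include <cassert>
#include <cstddef>

// Counts the characters before the first '\0', which is what strlen does.
// lengthUntilTerminator is a hypothetical name used only for illustration.
std::size_t lengthUntilTerminator(const char *text)
{
    assert(text != nullptr);
    std::size_t length = 0;
    while (text[length] != '\0') {
        ++length;
    }
    return length;
}
```

For an array declared as char str2[6] = "ABC", this returns 3 even though sizeof str2 is 6, which is exactly the gap between the stored string and the storage behind it.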

How could I copy data that contain '\0' character

I'm trying to copy data that contains '\0'. I'm using C++.
When my search turned up nothing, I decided to write my own function to copy data from one char* to another char*. But it doesn't return the wanted result!
My attempt is the following :
#include <iostream>
char* my_strcpy( char* arr_out, char* arr_in, int bloc )
{
char* pc= arr_out;
for(size_t i=0;i<bloc;++i)
{
*arr_out++ = *arr_in++ ;
}
*arr_out = '\0';
return pc;
}
int main()
{
char * out= new char[20];
my_strcpy(out,"12345aa\0aaaaa AA",20);
std::cout<<"output data: "<< out << std::endl;
std::cout<< "the length of my output data: " << strlen(out)<<std::endl;
system("pause");
return 0;
}
the result is here:
I don't understand what is wrong with my code.
Thank you for help in advance.

Your my_strcpy is working fine; when you write a char* to cout or calculate its length with strlen, they stop at \0, as per C string behaviour. By the way, you can use memcpy to copy a block of char regardless of \0.

If you know the length of the 'string', then use memcpy. strcpy halts its copy when it meets the string terminator, \0. memcpy does not; it copies the \0 and anything that follows, up to the length you give it.
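A small sketch of the memcpy approach, assuming the caller knows the byte count. The helper name copyBytes and the use of std::vector<char> as the destination are choices made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies exactly `count` bytes from `source`, embedded '\0' bytes included.
// The caller supplies the length, because strlen would stop at the first '\0'.
// copyBytes is a hypothetical helper name.
std::vector<char> copyBytes(const char *source, std::size_t count)
{
    assert(source != nullptr);
    std::vector<char> destination(count);
    if (count > 0) {
        std::memcpy(destination.data(), source, count);
    }
    return destination;
}
```

With const char data[] = "ab\0cd", calling copyBytes(data, sizeof data - 1) should return five bytes with the zero byte in the middle, whereas strlen(data) reports only 2.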

(Note: For any readers who are unaware that \0 is a single-character byte with value zero in string literals in C and C++, not to be confused with the \\0 expression that results in a two-byte sequence of an actual backslash followed by an actual zero in the string... I will direct you to Dr. Rebmu's explanation of how to split a string in C for further misinformation.)
C++ strings can maintain their length independent of any embedded \0. They copy their contents based on this length. The only thing is that the constructor that takes a C-string and no length will be guided by the null terminator as to what you wanted the length to be.
To override this, you can pass in a length explicitly. Make sure the length is accurate, though. You have 17 bytes of data, and 18 if you want the null terminator in the string literal to make it into your string as part of the data.
#include <iostream>
using namespace std;
int main() {
string str ("12345aa\0aaaaa AA", 18);
string str2 = str;
cout << str;
cout << str2;
return 0;
}
(Try not to hardcode such lengths if you can avoid it. Note that you didn't count it right, and when I corrected another answer here they got it wrong as well. It's error prone.)
On my terminal that outputs:
12345aaaaaaa AA
12345aaaaaaa AA
But note that what you're doing here is actually streaming a 0 byte to the stdout. I'm not sure how formalized the behavior of different terminal standards are for dealing with that. Things outside of the printable range can be used for all kinds of purposes depending on the kind of terminal you're running... positioning the cursor on the screen, changing the color, etc. I wouldn't write out strings with embedded zeros like that unless I knew what the semantics were going to be on the stream receiving them.
If what you're dealing with is really bytes rather than text, consider using a std::vector<char> instead, so as not to confuse the issue. Many libraries offer alternatives, such as Qt's QByteArray.
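One way to avoid hardcoding the length is to let the compiler deduce it from the array type. The sketch below is only an illustration (fromLiteral is a made-up name) and assumes the argument is a string literal or another NUL-terminated char array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a std::string from a string literal (or char array), keeping any
// embedded '\0' bytes. N is deduced by the compiler and counts the trailing
// '\0', which is left out of the resulting string.
// fromLiteral is a hypothetical helper name.
template <std::size_t N>
std::string fromLiteral(const char (&literal)[N])
{
    assert(literal[N - 1] == '\0'); // expects a NUL-terminated array
    return std::string(literal, N - 1);
}
```

For example, fromLiteral("ab\0cd") should give a string of size 5, while std::string("ab\0cd") stops at the embedded terminator and has size 2. Copies of the resulting string keep all five bytes.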

Your function is fine (except that you should pass to it 17 instead of 20). If you need to output null characters, one way is to convert the data to std::string:
std::string outStr(out, out + 17);
std::cout<< "output data: "<< outStr << std::endl;
std::cout<< "the length of my output data: " << outStr.length() <<std::endl;

I don't understand what is wrong with my code.
my_strcpy(out,"12345aa\0aaaaa AA",20);
Your string contains the character '\', which is interpreted as the start of an escape sequence. To prevent this you have to double the backslash:
my_strcpy(out,"12345aa\\0aaaaa AA",20);
Test
output data: 12345aa\0aaaaa AA
the length of my output data: 18

Your string is already terminated midway.
my_strcpy(out,"12345aa\0aaaaa AA",20);
Why do you intend to have \0 in the middle like that? Use some other delimiter if you so desire.
Otherwise, since std::cout and strlen interpret a \0 as a string terminator, you get surprises.
What I mean is: follow the convention, i.e. '\0' as the string terminator.

sscanf for this type of string

I'm not quite sure even after reading the documentation how to do this with sscanf.
Here is what I want to do:
given a string of text:
Read up to the first 64 chars or until space is reached
Then there will be a space, an = and then another space.
Following that I want to extract another string either until the end of the string or if 8192 chars are reached. I would also like it to change any occurrences in the second string of "\n" to the actual newline character.
I have: "%64s = %8192s" but I do not think this is correct.
Thanks
Ex:
element.name = hello\nworld
Would have string 1 with element.name and string2 as
hello
world

I do recommend std::regex for this, but apart from that, you should be fine with a little error checking:
#include <cstdio>
int main(int argc, const char *argv[])
{
char s1[65];
char s2[8193];
if (2!=std::scanf("%64s = %8192s", s1, s2))
std::puts("oops");
else
std::printf("s1 = '%s', s2 = '%s'\n", s1, s2);
return 0;
}

Your format string looks right to me; however, sscanf will not change occurrences of "\n" to anything else. To do that you would need to write a loop that uses strtok, or even just a simple for loop that evaluates each character in the string and swaps it for whatever character you prefer. You will also need to check the sscanf return value to determine whether the 2 strings were indeed scanned correctly. sscanf returns the number of fields successfully scanned according to your format string.
@sehe shows the correct usage (with scanf, which takes the same format), including the check for the proper return value.
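Putting the two steps together, here is a hedged sketch that parses from a string with sscanf and then replaces the two-character sequence backslash + n with a real newline. The function name parseAssignment is invented for this example. It also swaps %8192s for %8192[^\n], on the assumption that the value should run to the end of the line, since %s stops at the first whitespace.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parses "name = value" from `line`; returns true when both parts were read.
// %64s reads up to 64 non-space characters; %8192[^\n] reads the rest of the
// line, spaces included. parseAssignment is a hypothetical helper name.
bool parseAssignment(const char *line, std::string &name, std::string &value)
{
    assert(line != nullptr);
    char nameBuffer[65];    // 64 characters + terminator
    char valueBuffer[8193]; // 8192 characters + terminator
    if (std::sscanf(line, "%64s = %8192[^\n]", nameBuffer, valueBuffer) != 2) {
        return false;
    }
    name = nameBuffer;
    value.clear();
    // Turn each backslash followed by 'n' into an actual newline character.
    for (const char *p = valueBuffer; *p != '\0'; ++p) {
        if (p[0] == '\\' && p[1] == 'n') {
            value += '\n';
            ++p; // skip the 'n' as well
        } else {
            value += *p;
        }
    }
    return true;
}
```

Given the example line element.name = hello\nworld (with a literal backslash and n), name should come out as element.name and value as hello and world on separate lines. A line with an empty value fails the second conversion, so the function returns false.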

Split a wstring by specified separator

I have a std::wstring variable that contains some text, and I need to split it by a separator. How could I do this? I would rather not use Boost, which generates some warnings. Thank you
EDIT 1
this is an example text:
hi how are you?
and this is the code:
typedef boost::tokenizer<boost::char_separator<wchar_t>, std::wstring::const_iterator, std::wstring> Tok;
boost::char_separator<wchar_t> sep;
Tok tok(this->m_inputText, sep);
for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
{
cout << *tok_iter;
}
the results are:
hi
how
are
you
?
I don't understand why the last character is always split off into another token...

In your code, question mark appears on a separate line because that's how boost::tokenizer works by default.
If your desired output is four tokens ("hi", "how", "are", and "you?"), you could
a) change char_separator you're using to
boost::char_separator<wchar_t> sep(L" ", L"");
b) use boost::split which, I think, is the most direct answer to "split a wstring by specified character"
#include <string>
#include <iostream>
#include <vector>
#include <boost/algorithm/string.hpp>
int main()
{
std::wstring m_inputText = L"hi how are you?";
std::vector<std::wstring> tok;
split(tok, m_inputText, boost::is_any_of(L" "));
for(std::vector<std::wstring>::iterator tok_iter = tok.begin();
tok_iter != tok.end(); ++tok_iter)
{
std::wcout << *tok_iter << '\n';
}
}
test run: https://ideone.com/jOeH9

You're default constructing boost::char_separator. The documentation says:
The function std::isspace() is used to identify dropped delimiters and std::ispunct() is used to identify kept delimiters. In addition, empty tokens are dropped.
Since std::ispunct(L'?') is true, it is treated as a "kept" delimiter, and reported as a separate token.

Hi, you can use the wcstok function.
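A minimal sketch of that suggestion, using the standard three-argument std::wcstok from <cwchar>. The wrapper name splitWide is made up, and some older compilers ship a two-argument variant of wcstok, so treat the exact call as an assumption about your toolchain.

```cpp
#include <cassert>
#include <cwchar>
#include <string>
#include <vector>

// Splits `text` on any of the characters in `separators`; empty tokens are
// dropped. `text` is taken by value because wcstok writes into its buffer.
// splitWide is a hypothetical helper name.
std::vector<std::wstring> splitWide(std::wstring text, const wchar_t *separators)
{
    assert(separators != nullptr);
    std::vector<std::wstring> tokens;
    wchar_t *state = nullptr; // wcstok keeps its position here between calls
    for (wchar_t *token = std::wcstok(&text[0], separators, &state);
         token != nullptr;
         token = std::wcstok(nullptr, separators, &state)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Calling splitWide(L"hi how are you?", L" ") should return four tokens, the last being you? with the question mark attached, since only the space is listed as a separator.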

You said you don't want boost so...
This is maybe a weird approach to use in C++, but I used it once in a MUD where I needed a lot of tokenization in C.
Take this block of memory, the array chars:
char chars[] = "I like to fiddle with memory";
If you need to tokenize on a space character:
create an array of char* called splitvalues, big enough to store all tokens
walk a pointer along chars until the value it points at is '\0'
if no token is in progress, set splitvalues[counter] to the current address and increment counter
if the value is ' ', write 0 there and mark the token as finished
When you finish, the original string is destroyed, so do not use it; instead you have the array of strings pointing to the tokens. The count of tokens is the counter variable (the upper bound of the array).
The approach is this:
iterate over the string and, on the first character of each token, update the token start pointer
convert each char you need to split on to zero, which means string termination in C
count how many times you did this
PS. Not sure if you can use a similar approach in a Unicode environment though.
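The idea can be sketched roughly as follows. This is one possible reading of the steps, not the code from the MUD; splitInPlace is a made-up name, and a std::vector<char *> stands in for the splitvalues array and its counter.

```cpp
#include <cassert>
#include <vector>

// Splits `text` in place on `separator`: every separator is overwritten with
// '\0' and a pointer to the first character of each token is collected.
// The original buffer is modified and must outlive the returned pointers.
// splitInPlace is a hypothetical helper name.
std::vector<char *> splitInPlace(char *text, char separator)
{
    assert(text != nullptr);
    std::vector<char *> tokens;
    bool insideToken = false;
    for (char *p = text; *p != '\0'; ++p) {
        if (*p == separator) {
            *p = '\0';            // terminate the token that just ended
            insideToken = false;
        } else if (!insideToken) {
            tokens.push_back(p);  // first character of a new token
            insideToken = true;
        }
    }
    return tokens;
}
```

For the chars array above, splitInPlace(chars, ' ') should return six pointers, each to a NUL-terminated word inside the original buffer; tokens.size() plays the role of the counter.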
