Split Function c++ [duplicate] - c++

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Splitting a string in C++
I need a split function.
It has to work like this:
buffer = split(str, ' ');
I searched for split functions and tried the Boost libraries, but nothing worked well :/

strtok() from the standard C library is pretty good and does what you are looking for, unless you plan to call it from multiple threads and are worried about the function not being reentrant, which I don't suspect is the case here.
P.S. The above assumes you have a character array as input. If it is a C++ string, note that string.c_str() returns a const char* and strtok() modifies its input, so copy the characters into a writable buffer before calling strtok().
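The copy step mentioned above can be sketched as follows. This is a minimal sketch, not a definitive implementation; the helper name split_with_strtok is hypothetical and it assumes a C++11 compiler.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on any character in `delims` using strtok().
// strtok() modifies its argument, so we tokenize a writable copy, not c_str().
std::vector<std::string> split_with_strtok(const std::string& input, const char* delims) {
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');  // strtok() needs a NUL-terminated C string

    std::vector<std::string> tokens;
    for (char* part = std::strtok(buffer.data(), delims); part != nullptr;
         part = std::strtok(nullptr, delims)) {
        tokens.push_back(part);  // copies the token out of the temporary buffer
    }
    return tokens;
}
```

Because strtok() skips runs of delimiters, consecutive spaces do not produce empty tokens with this approach.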

The boost lib is supposed to work as well.
Use it like so:
vector <string> buffer;
boost::split(buffer, str_to_split, boost::is_any_of(" "));
Added:
Make sure to include the algorithm:
#include <boost/algorithm/string.hpp>
Print it to the std::cout like so:
vector<string>::size_type sz = buffer.size();
cout << "buffer contains:";
for (vector<string>::size_type i = 0; i < sz; i++)
cout << ' ' << buffer[i];
cout << '\n';

I guess strtok() is what you're looking for.
Each call returns the next substring delimited by the given character(s):
char string[] = "Hello World!"; // must be a writable array: strtok() modifies its input
char *part = strtok(string, " "); // passing a string starts a new iteration
while (part) {
// do something with part
part = strtok(NULL, " "); // passing NULL continues with the last string
}
Note that this version must not be used in several threads at once. There are also reentrant variants (strtok_s() on Microsoft compilers, strtok_r() on POSIX systems) that take an additional parameter to store the position between calls. The same restriction applies when you'd like to split a substring inside a loop that is itself tokenizing with strtok().
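For the call shape the question asks for, buffer = split(str, ' '), a standard-library-only alternative can be sketched with std::getline(). This is a swapped-in technique, not something the answers above propose, and the function name split simply mirrors the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper matching the call shape in the question: split(str, ' ').
// Unlike strtok(), std::getline() keeps empty tokens between adjacent delimiters.
std::vector<std::string> split(const std::string& str, char delim) {
    std::vector<std::string> tokens;
    std::istringstream stream(str);
    std::string token;
    while (std::getline(stream, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

This version has no shared state, so it avoids the threading and nesting concerns raised for strtok().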

Related

Boost xpressive regex results in garbage character

I am trying to write some code that changes a string like "/path/file.extension" to another specified extension. I am trying to use boost::xpressive to do so. But, I am having problems. It appears that a garbage character appears in the output:
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
using namespace std;
int main()
{
std::string str( "xml.xml.xml.xml");
sregex date = sregex::compile( "(\\.*)(\\.xml)$");
std::string format( "\1.zipxml");
std::string str2 = regex_replace( str, date, format );
std::cout << "str = " << str << "\n";
std::cout << "str2 = " << str2 << "\n";
return 0;
}
Now compile and run it:
[bitdiot@kantpute foodir]$ g++ badregex.cpp
[bitdiot@kantpute foodir]$ ./a.out > output
[bitdiot@kantpute foodir]$ less output
[bitdiot@kantpute foodir]$ cat -vte output
str = xml.xml.xml.xml$
str2 = xml.xml.xml^A.zipxml$
In the above example, I redirect output to a file, and use cat to print out the non-printable character. Notice the ctrl-A in the str2.
Anyway, am I using the Boost libraries incorrectly? Is this a Boost bug? Is there another regular expression I can use that will let me replace the ".tail" with some other string? (It's fixed in my example.)
thanks.
At least as I'm reading things, the culprit is right here: std::string format( "\1.zipxml");.
You forgot to escape the backslash, so \1 is giving you a control-A. You almost certainly want \\1.
Alternatively (if your compiler is new enough) you could use a raw string instead, so it would be something like: R"(\1.zipxml)", and you wouldn't have to escape your backslashes. I probably wouldn't bother to mention this, except for the fact that if you're writing REs in C++ strings, raw strings are pretty much your new best friend (IMO, anyway).
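To see the difference this answer describes, a small sketch compares the three spellings of the format string. The function names are hypothetical and exist only for illustration; the raw string literal assumes C++11.

```cpp
#include <string>

// "\1" is an octal escape: a single character with value 1 (Ctrl-A).
inline std::string unescaped() { return "\1.zipxml"; }

// "\\1" is a backslash followed by the digit '1'.
inline std::string escaped() { return "\\1.zipxml"; }

// A raw string literal keeps the backslash without doubling it.
inline std::string raw() { return R"(\1.zipxml)"; }
```

The unescaped version is one character shorter than the other two, which are identical to each other.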
As Jerry Coffin pointed out to me, it was a stupid mistake on my part.
The errant code is the following:
std::string format( "\1.zipxml");
This should be replaced with:
std::string format( "$1.zipxml");
Thanks for your help everyone.

Simple Sentence Reverser in C++

I'm trying to build a program to solve a problem in a text book I bought recently and it's just driving me crazy.
I have to build a sentence reverser so I get the following:
Input = "Do or do not, there is no try."
Output = "try. no is there not, do or Do"
Here's what I've got so far:
void ReverseString::reversalOperation(char str[]) {
char* buffer;
int stringReadPos, wordReadPos, writePos = 0;
// Position of the last character is length -1
stringReadPos = strlen(str) - 1;
buffer = new char[stringReadPos+1];
while (stringReadPos >= 0) {
if (str[stringReadPos] == ' ') {
wordReadPos = stringReadPos + 1;
buffer[writePos++] = str[stringReadPos--];
while (str[wordReadPos] != ' ') {
buffer[writePos] = str[wordReadPos];
writePos++;
wordReadPos++;
}
} else {
stringReadPos--;
}
}
cout << str << endl;
cout << buffer << endl;
}
I was sure I was on the right track, but all I get for output is the very first word ("try."). I've been staring at this code so long I can't make any headway. Initially I was also checking for a '\0' character in the inner while loop, but it didn't seem to like that, so I took it out.
Unless you're feeling masochistic, throw your existing code away, and start with std::vector and std::string (preferably an std::vector<std::string>). Add in std::copy with the vector's rbegin and rend, and you're pretty much done.
This is utterly easy in C++, with help from the standard library:
std::vector< std::string > sentence;
std::istringstream input( str );
// copy each word from input to sentence
std::copy(
(std::istream_iterator< std::string >( input )), std::istream_iterator< std::string >()
, std::back_inserter( sentence )
);
// print to cout sentence in reverse order, separated by space
std::copy(
sentence.rbegin(), sentence.rend()
, (std::ostream_iterator< std::string >( std::cout, " " ))
);
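A self-contained variant of the same idea, returning the result instead of printing it, might look like the sketch below. The function name reverse_words is hypothetical, and joining by hand is a design choice made here to avoid the trailing separator that std::ostream_iterator would emit.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the words of `sentence` in reverse order,
// joined by single spaces (runs of whitespace in the input collapse).
std::string reverse_words(const std::string& sentence) {
    std::istringstream input(sentence);
    // The extra parentheses keep this from being parsed as a function declaration.
    std::vector<std::string> words((std::istream_iterator<std::string>(input)),
                                   std::istream_iterator<std::string>());
    std::reverse(words.begin(), words.end());

    std::string result;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i != 0) result += ' ';  // separator only between words
        result += words[i];
    }
    return result;
}
```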
In the interest of science, I tried to make your code work as is. Yeah, it's not really the C++ way to do things, but instructive nonetheless.
Of course this is only one of a million ways to get the job done. I'll leave it as an exercise for you to remove the trailing space this code leaves in the output ;)
I commented my changes with "EDIT".
char* buffer;
int stringReadPos, wordReadPos, writePos = 0;
// Position of the last character is length -1
stringReadPos = strlen(str) - 1;
buffer = new char[stringReadPos + 3]; // EDIT: room for every character, one trailing space and the '\0'
while (stringReadPos >= 0) {
if ((str[stringReadPos] == ' ')
|| (stringReadPos == 0)) // EDIT: Need to check for hitting the beginning of the string
{
wordReadPos = stringReadPos + (stringReadPos ? 1 : 0); // EDIT: In the case we hit the beginning of the string, don't skip past the space
//buffer[writePos++] = str[stringReadPos--]; // EDIT: This is just to grab the space - don't need it here
while ((str[wordReadPos] != ' ')
&& (str[wordReadPos] != '\0')) // EDIT: Need to check for hitting the end of the string
{
buffer[writePos] = str[wordReadPos];
writePos++;
wordReadPos++;
}
buffer[writePos++] = ' '; // EDIT: Add a space after words
}
stringReadPos--; // EDIT: Decrement the read pos every time
}
buffer[writePos] = '\0'; // EDIT: nul-terminate the string
cout << str << endl;
cout << buffer << endl;
I see the following errors in your code:
the last char of buffer is not set to 0 (this will cause a failure in cout << buffer)
in the inner loop you have to check for str[wordReadPos] != ' ' && str[wordReadPos] != 0 otherwise while scanning the first word it will never find the terminating space
Since you are using a char array, you can use C string library. It will be much easier if you use strtok: http://www.cplusplus.com/reference/clibrary/cstring/strtok/
It will require pointer use, but it will make your life much easier. Your delimiter will be " ".
What the problems with your code were, and what more C++-ish ways of doing this are, has already been well covered. I would, however, like to add that the methodology
write a function/program to implement algorithm;
see if it works;
if it doesn't, look at code until you get where the problem is
is not too productive. What can help you resolve this problem, and many other problems in the future, is the debugger (and the poor man's debugger, printf). It lets you see how your program actually works step by step, what happens to the data, and so on. In other words, you will be able to see which parts of it work as you expect and which behave differently. If you're on *nix, don't hesitate to try gdb.
Here is a more C++ version. Though I think the simplicity is more important than style in this instance. The basic algorithm is simple enough, reverse the words then reverse the whole string.
You could write C code that was just as evident as the C++ version. I don't think it's necessarily wrong to write code that isn't ostentatiously C++ here.
void word_reverse(std::string &val) {
size_t b = 0;
for (size_t i = 0; i < val.size(); i++) {
if (val[i] == ' ') {
std::reverse(&val[b], &val[b]+(i - b));
b = ++i;
}
}
std::reverse(&val[b], &val[b]+(val.size() - b));
std::reverse(&val[0], &val[0]+val.size());
}
TEST(basic) {
std::string o = "Do or do not, there is no try.";
std::string e = "try. no is there not, do or Do";
std::string a = o;
word_reverse(a);
CHECK_EQUAL( e , a );
}
Multiple, leading, or trailing spaces may be degenerate cases, depending on how you actually want them to behave.

Split a wstring by specified separator

I have a std::wstring variable that contains some text and I need to split it by a separator. How could I do this? I would rather not use Boost, because it generates some warnings. Thank you
EDIT 1
this is an example text:
hi how are you?
and this is the code:
typedef boost::tokenizer<boost::char_separator<wchar_t>, std::wstring::const_iterator, std::wstring> Tok;
boost::char_separator<wchar_t> sep;
Tok tok(this->m_inputText, sep);
for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
{
wcout << *tok_iter << endl;
}
the results are:
hi
how
are
you
?
I don't understand why the last character is always split off into a separate token...
In your code, the question mark appears on a separate line because that's how boost::tokenizer works by default.
If your desired output is four tokens ("hi", "how", "are", and "you?"), you could
a) change char_separator you're using to
boost::char_separator<wchar_t> sep(L" ", L"");
b) use boost::split which, I think, is the most direct answer to "split a wstring by specified character"
#include <string>
#include <iostream>
#include <vector>
#include <boost/algorithm/string.hpp>
int main()
{
std::wstring m_inputText = L"hi how are you?";
std::vector<std::wstring> tok;
split(tok, m_inputText, boost::is_any_of(L" "));
for(std::vector<std::wstring>::iterator tok_iter = tok.begin();
tok_iter != tok.end(); ++tok_iter)
{
std::wcout << *tok_iter << '\n';
}
}
test run: https://ideone.com/jOeH9
You're default constructing boost::char_separator. The documentation says:
The function std::isspace() is used to identify dropped delimiters and std::ispunct() is used to identify kept delimiters. In addition, empty tokens are dropped.
Since std::ispunct(L'?') is true, it is treated as a "kept" delimiter, and reported as a separate token.
Hi, you can use the wcstok function.
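A minimal sketch of that suggestion, assuming the standard three-argument form of std::wcstok() (some older compilers ship a two-argument variant instead). The helper name split_with_wcstok is hypothetical.

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on any character in `delims` with std::wcstok().
// wcstok() writes into its input, so we tokenize a writable, terminated copy.
std::vector<std::wstring> split_with_wcstok(const std::wstring& input, const wchar_t* delims) {
    std::vector<wchar_t> buffer(input.begin(), input.end());
    buffer.push_back(L'\0');

    std::vector<std::wstring> tokens;
    wchar_t* state = nullptr;  // position saved between calls
    for (wchar_t* part = std::wcstok(buffer.data(), delims, &state); part != nullptr;
         part = std::wcstok(nullptr, delims, &state)) {
        tokens.push_back(part);
    }
    return tokens;
}
```

Splitting L"hi how are you?" on L" " this way keeps the question mark attached to the last token, because only the space is treated as a delimiter.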
You said you don't want boost so...
This is maybe a weird approach to use in C++, but I used it once in a MUD where I needed a lot of tokenization in C.
Take this block of memory assigned to the array chars:
char chars[] = "I like to fiddle with memory";
If you need to tokenize on a space character:
create an array of char* called splitvalues, big enough to store all tokens
walk a pointer over chars until the value it points to is '\0'
if the start of the current token is not already set, set splitvalues[counter] to the current memory address
if the value is ' ', write 0 there
increment counter
When you finish, the original string is destroyed, so do not use it as a whole; instead you have the array of strings pointing to the tokens. The count of tokens is the counter variable (the upper bound of the array).
the approach is this:
iterate over the string and, on the first occurrence, update the token start pointer
convert the char you need to split on to zeroes that mean string termination in C
count how many times you did this
PS. Not sure if you can use a similar approach in a Unicode environment, though.

C++ - string.compare issues when output to text file is different to console output?

I'm trying to find out if two strings I have are the same, for the purpose of unit testing. The first is a predefined string, hard-coded into the program. The second is read in from a text file with an ifstream using std::getline(), and then taken as a substring. Both values are stored as C++ strings.
When I output both of the strings to the console using cout for testing, they both appear to be identical:
ThisIsATestStringOutputtedToAFile
ThisIsATestStringOutputtedToAFile
However, the string.compare returns stating they are not equal. When outputting to a text file, the two strings appear as follows:
ThisIsATestStringOutputtedToAFile
T^@h^@i^@s^@I^@s^@A^@T^@e^@s^@t^@S^@t^@r^@i^@n^@g^@O^@u^@t^@p^@u^@t^@t^@e^@d^@T^@o^@A^@F^@i^@l^@e
I'm guessing this is some kind of encoding problem, and if I was in my native language (good old C#), I wouldn't have too many problems. As it is I'm with C/C++ and Vi, and frankly don't really know where to go from here! I've tried looking at maybe converting to/from ansi/unicode, and also removing the odd characters, but I'm not even sure if they really exist or not..
Thanks in advance for any suggestions.
EDIT
Apologies, this is my first time posting here. The code below is how I'm going through the process:
ifstream myInput;
ofstream myOutput;
myInput.open(fileLocation.c_str());
myOutput.open("test.txt");
TEST_ASSERT(myInput.is_open() == 1);
string compare1 = "ThisIsATestStringOutputtedToAFile";
string fileBuffer;
std::getline(myInput, fileBuffer);
string compare2 = fileBuffer.substr(400,100);
cout << compare1 + "\n";
cout << compare2 + "\n";
myOutput << compare1 + "\n";
myOutput << compare2 + "\n";
cin.get();
myInput.close();
myOutput.close();
TEST_ASSERT(compare1.compare(compare2) == 0);
How did you create the content of myInput? I would guess that this file is created in two-byte encoding. You can use hex-dump to verify this theory, or use a different editor to create this file.
The simplest way would be to launch cmd.exe and type
echo "ThisIsATestStringOutputtedToAFile" > test.txt
UPDATE:
If you cannot change the encoding of the myInput file, you can try to use wide-chars in your program. I.e. use wstring instead of string, wifstream instead of ifstream, wofstream, wcout, etc.
The following works for me and writes the text pasted below into the file. Note the '\0' character embedded into the string.
#include <iostream>
#include <fstream>
#include <sstream>
int main()
{
std::istringstream myInput("0123456789ThisIsATestStringOutputtedToAFile\x0 12ou 9 21 3r8f8 reohb jfbhv jshdbv coerbgf vibdfjchbv jdfhbv jdfhbvg jhbdfejh vbfjdsb vjdfvb jfvfdhjs jfhbsd jkefhsv gjhvbdfsjh jdsfhb vjhdfbs vjhdsfg kbhjsadlj bckslASB VBAK VKLFB VLHBFDSL VHBDFSLHVGFDJSHBVG LFS1BDV LH1BJDFLV HBDSH VBLDFSHB VGLDFKHB KAPBLKFBSV LFHBV YBlkjb dflkvb sfvbsljbv sldb fvlfs1hbd vljkh1ykcvb skdfbv nkldsbf vsgdb lkjhbsgd lkdcfb vlkbsdc xlkvbxkclbklxcbv");
std::ofstream myOutput("test.txt");
//std::ostringstream myOutput;
std::string str1 = "ThisIsATestStringOutputtedToAFile";
std::string fileBuffer;
std::getline(myInput, fileBuffer);
std::string str2 = fileBuffer.substr(10,100);
std::cout << str1 + "\n";
std::cout << str2 + "\n";
myOutput << str1 + "\n";
myOutput << str2 + "\n";
std::cout << str1.compare(str2) << '\n';
//std::cout << myOutput.str() << '\n';
return 0;
}
Output:
ThisIsATestStringOutputtedToAFile
ThisIsATestStringOutputtedToAFile
It turns out that the problem was that the file encoding of myInput was UTF-16, whereas the comparison string was UTF-8. The way to convert them with the OS limitations I had for this project (Linux, C/C++ code), was to use the iconv() functions. To keep the compatibility of the C++ strings I'd been using, I ended up saving the string to a new text file, then running iconv through the system() command.
system("iconv -f UTF-16 -t UTF-8 subStr.txt -o convertedSubStr.txt");
Reading the outputted string back in then gave me the string in the format I needed for the comparison to work properly.
NOTE
I'm aware that this is not the most efficient way to do this. If I'd had the luxury of a Windows environment and the windows.h libraries, things would have been a lot easier. In this case though, the code was in some rarely used unit tests and didn't need to be highly optimized, so the creation, destruction and I/O operations of some text files weren't an issue.
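For comparison, the conversion can also be done in-process without temporary files. The sketch below is not what this answer did (it shelled out to iconv); it is a hand-written decoder that assumes little-endian UTF-16 input without a byte order mark, and the names append_utf8 and utf16le_to_utf8 are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of code point `cp` to `out`.
static void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Reads one little-endian 16-bit code unit starting at byte offset `i`.
static unsigned long read_unit(const std::string& bytes, std::size_t i) {
    return static_cast<unsigned char>(bytes[i]) |
           (static_cast<unsigned long>(static_cast<unsigned char>(bytes[i + 1])) << 8);
}

// Hypothetical helper: converts UTF-16LE bytes to UTF-8.
// A trailing odd byte is ignored and unpaired surrogates are not rejected,
// which a production converter should handle more strictly.
std::string utf16le_to_utf8(const std::string& bytes) {
    std::string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned long unit = read_unit(bytes, i);
        // Combine a high surrogate with the low surrogate that follows it.
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 3 < bytes.size()) {
            unsigned long low = read_unit(bytes, i + 2);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            }
        }
        append_utf8(out, unit);
    }
    return out;
}
```

For plain ASCII text such as the test string, this simply drops the interleaved zero bytes, which is why the raw UTF-16 string looked identical on the console yet compared unequal.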

tokenizing and converting to pig latin

This looks like homework stuff but please be assured that it isn't homework. Just an exercise in the book we use in our c++ course, I'm trying to read ahead on pointers..
The exercise in the book tells me to split a sentence into tokens and then convert each of them into pig latin then display them..
pig latin here is basically like this: ball becomes allbay in pig latin.. boy becomes oybay.. take the first letter out, put it at the end then add "ay"..
so far this is what i have:
#include <iostream>
using std::cout;
using std::cin;
using std::endl;
#include <cstring>
using std::strtok;
using std::strcat;
using std::strcpy;
void printPigLatin( char * );
int main()
{
char sentence[500];
char *token;
cout << "Enter string to tokenize and convert: ";
cin.getline( sentence, 500 );
token = strtok( sentence, " " );
cout << "\nPig latin for each token will be: " << endl;
while( token != NULL )
{
printPigLatin( token );
token = strtok( NULL, " " );
}
return 0;
}
void printPigLatin( char *word )
{
char temp[50];
for( int i = 0; *word != '\0'; i++ )
{
temp[i] = word[i + 1];
}
strcat( temp, "ay" );
cout << temp << endl;
}
I understand the tokenizing part quite clearly but I'm not sure how to do the pig latin.. i tried to start by simply adding "ay" to the token and see what the results will be .. not sure why the program goes into an infinite loop and keeps on displaying "ayay" .. any tips?
EDIT: this one works fine now but im not sure how to add the first letter of the token before adding the "ay"
EDIT: this is how i "see" it done but not sure how to correctly implement it ..
You're running over your input string with strcat. You need to either create a new string for each token, copying the token and "ay", or simply print the token and then "ay". However, if you're using C++ why not use istream iterators and STL algorithms?
To be honest, I severely doubt the quality of the C++ book, judging from your example. The “basic stuff” in C++ isn't the C pointer style programming. Rather, it's applying high-level library functionality. As “On Freund” pointed out, the C++ standard library provides excellent features to tackle your task. You might want to search for recommendations of better C++ books.
Concerning the problem: your printPigLatin could use the existing function strcpy (or better: strncpy which is safer in regards to buffer overflows). Your manual copy omits the first character from the input because you're using the i + 1st position. You also have a broken loop condition which always tests the same (first) character. Additionally, this should result in an overflow anyway.
As the people before me pointed out, there are several other methods of achieving what you want to do.
However, the actual problem with your code seems to be the use of strcat; I see that you changed it a bit in the edit. Here is an explanation of why the initial one did not work: "char* and size issues".
Basically, the pointer does not allocate enough memory to add the "ay" to the string provided. If you create a pointer using the technique shown in the link, it should work fine.
I got your program to work, taking the strcat out and using
cout << word << "ay" << endl;
Your loop is infinite because of *word != '\0'.
The word pointer is not changed at any time in the loop.
This seemed to have worked:
void printPigLatin( char *word )
{
cout << word + 1 << word[0] << "ay" << endl;
}
Just not sure if it's a good idea to do that.
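Following the suggestions above to prefer std::string over raw character buffers, a minimal sketch of the same rule (first letter moved to the end, then "ay") could look like this. Both function names are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: moves the first letter of `word` to the end and appends "ay".
std::string to_pig_latin(const std::string& word) {
    if (word.empty()) return word;
    return word.substr(1) + word[0] + "ay";
}

// Hypothetical helper: converts every whitespace-separated word of `sentence`.
std::string sentence_to_pig_latin(const std::string& sentence) {
    std::istringstream input(sentence);
    std::string word, result;
    while (input >> word) {
        if (!result.empty()) result += ' ';  // separator only between words
        result += to_pig_latin(word);
    }
    return result;
}
```

Because std::string manages its own storage, there is no fixed-size temp buffer to overflow and no loop condition to get wrong.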