I was writing code to tokenize a string with respect to the delimiter ",".
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void Tokenize(const string& str, vector<string>& tokens, const string& delimiters)
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
int main()
{
string str;
int test_case;
cin>>test_case;
while(test_case--)
{
vector<string> tokens;
getline(cin, str);
Tokenize(str, tokens, ",");
// Parsing the input string
cout<<tokens[0]<<endl;
}
return 0;
}
It gives a segmentation fault when I run it. When I debugged it, the line
cout<<tokens[0]<<endl
was the cause of the problem. I can't understand why, because cplusplus.com uses the [] operator for accessing the values of vectors.
cin>>test_case; // this leaves a newline in the input buffer
while(test_case--)
{
vector<string> tokens;
getline(cin, str); // already found newline
Tokenize(str, tokens, ","); // passing empty string
Without looking at your Tokenize function, I would guess that an empty string results in an empty vector, which means that when you print tokens[0], that element does not actually exist. You need to make sure the newline left in the input buffer is consumed before you call getline. You could put a call to cin.ignore() right after your number input, for example.
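A minimal sketch of the cin.ignore() fix, assuming input of the form "N" followed by N lines. To keep it self-contained and testable, the reading loop is wrapped in a hypothetical helper, firstTokens, that takes any std::istream (std::cin in the real program, or an std::istringstream to experiment with), and the question's tokenizer is repeated with its parameter name fixed. It also guards tokens[0], because a line made only of commas still yields an empty vector.

```cpp
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for trying firstTokens without a terminal
#include <string>
#include <vector>

// Skip-empty tokenizer from the question (parameter name fixed).
void Tokenize(const std::string& str, std::vector<std::string>& tokens,
              const std::string& delimiters)
{
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);
    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
}

// Hypothetical helper: reads "N" and then N lines, returning the first
// comma-separated token of each line ("" when a line has no tokens).
std::vector<std::string> firstTokens(std::istream& in)
{
    std::vector<std::string> result;
    int test_case = 0;
    if (!(in >> test_case))
        return result;
    // Discard everything up to and including the newline left by operator>>.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::string str;
    while (test_case-- > 0 && std::getline(in, str))
    {
        std::vector<std::string> tokens;
        Tokenize(str, tokens, ",");
        // Never index an empty vector.
        result.push_back(tokens.empty() ? std::string() : tokens[0]);
    }
    return result;
}
```

In the original main() this would be called as firstTokens(std::cin) and each returned string printed.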
You could also forgo operator>>, and only use getline. Then do string to number conversions with your favorite method.
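A sketch of the getline-only approach, again as a hypothetical helper (readCases) that takes an std::istream. std::stoi is just one possible "favorite method" for the conversion; it throws std::invalid_argument or std::out_of_range if the first line isn't a usable number.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, for trying readCases without a terminal
#include <string>
#include <vector>

// Hypothetical helper: the first line holds the number of test cases, and the
// following lines are returned unchanged. Only std::getline reads from the
// stream, so no stray newline is ever left behind.
std::vector<std::string> readCases(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    if (!std::getline(in, line))
        return lines;
    int test_case = std::stoi(line);  // string-to-number conversion
    while (test_case-- > 0 && std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Each returned line can then be passed to Tokenize as before.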
Is it possible that the read using std::getline() wasn't successful? In that case the string would be empty, Tokenize would return no tokens, and using the subscript operator on the empty vector would be undefined behavior (typically a crash). You should always test whether reading was successful after trying to read, e.g.:
if (std::getline(std::cin, str)) {
// process the read string
}
I understand how to split a string by a delimiter in C++, but how do you split a string whose pieces are wrapped in a multi-character delimiter? E.g. how do I split "~!hello~! random junk... ~!world~!" by the string "~!" into an array of ["hello", " random junk...", "world"]? Are there any C++ standard library functions for this, or, if not, an algorithm that could achieve it?
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> split(string s,string delimiter){
vector<string> res;
string word;
string::size_type pos = s.find(delimiter); // size_type, not int, so the npos comparison is exact
while (pos != string::npos) {
word = s.substr(0, pos); // The word that comes before the delimiter
res.push_back(word); // Push the word to our final vector
s.erase(0, pos + delimiter.length()); // Delete the word and the delimiter, then repeat until no delimiter is left
pos = s.find(delimiter); // Update pos to hold the position of the next delimiter in our string
}
res.push_back(s); // Push the last word, which comes after the final delimiter (may be empty)
return res;
}
int main() {
string s="~!hello~!random junk... ~!world~!";
vector<string>words = split(s,"~!");
int n=words.size();
for(int i=0;i<n;i++)
std::cout<<words[i]<<std::endl;
return 0;
}
The above program finds all the words that occur before, in between and after the delimiters that you specify. With minor changes to the function, you can make it suit your needs (for example, if you don't want the empty word that occurs before the first delimiter or after the last one).
But for your needs, the given function does the word splitting correctly according to the delimiter you provide.
I hope this answers your question!
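One such "minor change", sketched under the assumption that the empty pieces should be dropped to get exactly ["hello", " random junk... ", "world"]: instead of erasing from the front of a copy, it walks the string with std::string::find and a start offset. The name split_nonempty is hypothetical.

```cpp
#include <string>
#include <vector>

// Split s on every occurrence of the multi-character delimiter delim,
// dropping the empty pieces before a leading delimiter, after a trailing
// one, and between two adjacent delimiters.
std::vector<std::string> split_nonempty(const std::string& s, const std::string& delim)
{
    std::vector<std::string> out;
    if (delim.empty()) {  // an empty delimiter would otherwise loop forever
        if (!s.empty())
            out.push_back(s);
        return out;
    }
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = s.find(delim, start)) != std::string::npos) {
        if (pos > start)  // skip empty pieces
            out.push_back(s.substr(start, pos - start));
        start = pos + delim.size();  // jump over the whole delimiter
    }
    if (start < s.size())  // text after the last delimiter
        out.push_back(s.substr(start));
    return out;
}
```

Note that the middle piece keeps its surrounding spaces (" random junk... "), since only the delimiter itself is removed.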
I need to create a string parser in C++. I tried using:
vector<string> Tokenize(const string& strInput, const string& strDelims)
{
vector<string> vS;
string strOne = strInput;
string delimiters = strDelims;
int startpos = 0;
int pos = strOne.find_first_of(delimiters, startpos);
while (string::npos != pos || string::npos != startpos)
{
if(strOne.substr(startpos, pos - startpos) != "")
vS.push_back(strOne.substr(startpos, pos - startpos));
// if delimiter is a new line (\n) then add new line
if(strOne.substr(pos, 1) == "\n")
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne.substr(pos, 1) != " ")
vS.push_back(strOne.substr(pos, 1));
if( string::npos == strOne.find_first_not_of(delimiters, pos) )
startpos = strOne.find_first_not_of(delimiters, pos);
else
startpos = pos + 1;
pos = strOne.find_first_of(delimiters, startpos);
}
return vS;
}
This works for 2X+7cos(3Y)
(Tokenize("2X+7cos(3Y)","+-/^() \t");)
but gives a runtime error for 2X.
I need a non-Boost solution.
I also tried using the C++ String Toolkit (StrTk) tokenizer:
std::vector<std::string> results;
strtk::split(delimiter, source,
strtk::range_to_type_back_inserter(results),
strtk::tokenize_options::include_all_delimiters);
return results;
but it doesn't give the delimiters as separate tokens.
E.g. if I give the input as 2X+3Y, the output vector contains
2X+
3Y
What's probably happening is that this line crashes when it is passed npos:
lastPos = str.find_first_not_of(delimiters, pos);
Just add breaks to your loop instead of relying on the while clause to break out of it.
if (pos == string::npos)
break;
lastPos = str.find_first_not_of(delimiters, pos);
if (lastPos == string::npos)
break;
pos = str.find_first_of(delimiters, lastPos);
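Applied to the find_first_not_of/find_first_of tokenizer from the first question, that advice might look like the sketch below (written with std:: qualifiers so it compiles on its own). Whether or not the npos call is the culprit (std::string::find_first_not_of is specified to return npos, not throw, when the start position is past the end), the explicit breaks make each exit condition easy to see.

```cpp
#include <string>
#include <vector>

// Skip-empty tokenizer whose loop exits through explicit breaks
// as soon as either search comes back as npos.
void Tokenize(const std::string& str, std::vector<std::string>& tokens,
              const std::string& delimiters)
{
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    if (lastPos == std::string::npos)
        return;  // empty string, or nothing but delimiters
    for (;;)
    {
        std::string::size_type pos = str.find_first_of(delimiters, lastPos);
        // If pos is npos, the count is huge and substr stops at the end of str.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        if (pos == std::string::npos)
            break;
        lastPos = str.find_first_not_of(delimiters, pos);
        if (lastPos == std::string::npos)
            break;
    }
}
```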
Loop exit condition is broken:
while (string::npos != pos || string::npos != startpos)
It allows entry with, say, pos = npos and startpos = 1.
So
strOne.substr(startpos, pos - startpos)
strOne.substr(1, npos - 1)
substr clamps an oversized count to the end of the string, so this call actually survives; the real trouble starts with the calls that use pos as a start position.
If pos = npos and startpos = 0,
strOne.substr(startpos, pos - startpos)
lives, but
strOne.substr(pos, 1) == "\n"
strOne.substr(npos, 1) == "\n"
dies: substr throws std::out_of_range when its start position is past the end of the string. So does
strOne.substr(pos, 1) != " "
Sadly I'm out of time and can't solve this right now, but QuestionC has the right idea: better filtering. Something along the lines of:
if (string::npos != pos)
{
if (strOne.substr(pos, 1) == "\n") // can possibly simplify this with strOne[pos] == '\n'
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne[pos] != ' ')
vS.push_back(strOne.substr(pos, 1));
}
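Putting that filtering together with a loop that stops once no delimiter is left gives something like the following. This is a minimal sketch, not QuestionC's code: it keeps the question's Tokenize name and its treatment of spaces and newlines, stores positions in std::string::size_type instead of int, and only reads strInput[pos] when pos is known to be inside the string.

```cpp
#include <string>
#include <vector>

// Split strInput on any character in strDelims, keeping each delimiter as its
// own token. Spaces are dropped, a newline is stored as a backslash followed
// by 'n' (as in the question), and empty runs between delimiters are skipped.
std::vector<std::string> Tokenize(const std::string& strInput, const std::string& strDelims)
{
    std::vector<std::string> vS;
    std::string::size_type startpos = 0;  // size_type, not int, so npos is not truncated
    while (startpos < strInput.size())
    {
        std::string::size_type pos = strInput.find_first_of(strDelims, startpos);
        if (pos == std::string::npos)
        {
            // No delimiter left: the rest of the string is the last token.
            vS.push_back(strInput.substr(startpos));
            break;
        }
        if (pos > startpos)
            vS.push_back(strInput.substr(startpos, pos - startpos));
        const char d = strInput[pos];  // safe: pos < strInput.size() here
        if (d == '\n')
            vS.push_back("\\n");
        else if (d != ' ')
            vS.push_back(std::string(1, d));
        startpos = pos + 1;  // step past this single delimiter
    }
    return vS;
}
```

With "+-/^() \t" as the delimiters, "2X" now yields the single token 2X, and "2X+7cos(3Y)" yields 2X, +, 7cos, (, 3Y, ).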
Would be great if you could share some info on your environment. Your program ran fine with an input value of 2X on my Fedora 20 using g++.
I created a little function that splits a string into substrings (which are stored in a vector) and lets you set which characters you want to treat as whitespace. Normal whitespace is still treated as whitespace, so you don't have to define that. All it actually does is turn each character you defined as whitespace into an actual space (' '), then read the result through a stringstream to separate the substrings and store them in a vector. This may not be what you need for this particular problem, but maybe it can give you some ideas.
#include <sstream>
#include <string>
#include <vector>
using namespace std;
// split a string into its whitespace-separated substrings and store
// each substring in a vector<string>. Whitespace can be defined in argument
// w as a string (e.g. ".;,?-'")
vector<string> split(const string& s, const string& w)
{
string temp{ s };
// go through each char in temp (or s)
for (char& ch : temp) {
// check if any characters in temp (s) are whitespace defined in w
for (char white : w) {
if (ch == white)
ch = ' '; // if so, replace them with a space char (' ')
}
}
vector<string> substrings;
stringstream ss{ temp };
for (string buffer; ss >> buffer;) {
substrings.push_back(buffer);
}
return substrings;
}
I am trying to split a string and put the pieces into a vector.
However, I also want to keep an empty token whenever there are consecutive delimiters.
For example:
string mystring = "::aa;;bb;cc;;c"
I would like to tokenize this string on the : and ; delimiters,
but between consecutive delimiters such as :: and ;;
I would like to push an empty string into my vector.
So my desired output for this string is:
"" (empty)
aa
"" (empty)
bb
cc
"" (empty)
c
Also, my requirement is not to use the Boost library.
If anyone could lend me an idea, I'd appreciate it.
Thanks
Code that tokenizes a string but does not include the empty tokens:
void Tokenize(const string& str,vector<string>& tokens, const string& delimiters)
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}
You can make your algorithm work with some simple changes. First, don't skip delimiters at the beginning. Then, instead of skipping runs of delimiters in the middle of the string, just move the position one past each delimiter. Also, your npos check should ensure that neither position is npos, so it should be && instead of ||.
void Tokenize(const string& str,vector<string>& tokens, const string& delimiters)
{
// Start at the beginning
string::size_type lastPos = 0;
// Find position of the first delimiter
string::size_type pos = str.find_first_of(delimiters, lastPos);
// While we still have string to read
while (string::npos != pos && string::npos != lastPos)
{
// Found a token, add it to the vector
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Look at the next token instead of skipping delimiters
lastPos = pos+1;
// Find the position of the next delimiter
pos = str.find_first_of(delimiters, lastPos);
}
// Push the last token
tokens.push_back(str.substr(lastPos, pos - lastPos));
}
I have a version using iterators:
#include <algorithm>
#include <string>
#include <vector>
std::vector<std::string> split_from(const std::string& s
, const std::string& d, unsigned r = 20)
{
std::vector<std::string> v;
v.reserve(r); // r is only a capacity hint
auto pos = s.begin();
auto end = pos;
while(end != s.end())
{
end = std::find_first_of(pos, s.end(), d.begin(), d.end());
v.emplace_back(pos, end);
if(end != s.end()) // never step past s.end()
pos = end + 1;
}
return v;
}
Using your interface:
void Tokenize(const std::string& s, std::vector<std::string>& tokens
, const std::string& delims)
{
auto pos = s.begin();
auto end = pos;
while(end != s.end())
{
end = std::find_first_of(pos, s.end(), delims.begin(), delims.end());
tokens.emplace_back(pos, end);
if(end != s.end()) // never step past s.end()
pos = end + 1;
}
}
This is my partial code:
if(action=="auth")
{
myfile.open("account.txt");
while(!myfile.eof())
{
getline(myfile,sline);
vector<string> y = split(sline, ':');
logincheck = "";
logincheck = y[0] + ":" + y[3];
if (sline==actionvalue)
{
sendClient = "login done#Successfully Login.";
break;
}
else
{
sendClient = "fail login#Invalid username/password.";
}
y.clear();
}
myfile.close();
}
If I don't have this line:
logincheck = y[0] + ":" + y[3];
the code runs without any segmentation fault (core dumped), but when I add that line, it goes totally wrong.
My account.txt is as follows:
admin:PeterSmite:hr:password
cktang:TangCK:normal:password
The split function:
std::vector<std::string> split(std::string const& str, std::string const& delimiters = "#") {
std::vector<std::string> tokens;
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos) {
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
return tokens;
}
std::vector<std::string> split(std::string const& str, char const delimiter) {
return split(str,std::string(1,delimiter));
}
You should do some basic input checking before you blithely assume that the vector contains at least 4 elements, otherwise y[3] will explode when you parse a line of input without three colons:
if (y.size() >= 4) {
// Do login check
} else {
// Invalid input
}
I'd guess that you probably have a blank line in your input.
Wrap the whole section of code that relies on reading an "a:b:c:d" line of input:
if(action=="auth") {
myfile.open("account.txt");
while(getline(myfile,sline))
{
vector<string> y = split(sline, ':');
if (y.size() >= 4) {
logincheck = "";
logincheck = y[0] + ":" + y[3];
if (sline==actionvalue) {
sendClient = "login done#Successfully Login.";
break;
} else {
sendClient = "fail login#Invalid username/password.";
}
}
}
myfile.close();
}
The problem is the structure of your loop:
while(!myfile.eof())
{
getline(myfile,sline);
istream::eof() isn't guaranteed to return true until you attempt to read past the end of the stream. So what happens is that you read 2 lines and eof() still hasn't returned true. Then you enter the loop for the 3rd time. Since you don't check for errors after the getline call, you happily access sline when its content is unspecified: it could be empty, it could still carry the content from the previous iteration, or it could contain something else.
You always need to check whether the getline() call was successful before you attempt to access the string. The idiomatic way is to put it in the condition of the loop:
while (getline(myfile, sline)) { /* do your stuff */ }
This way you only enter the loop body if the read is successful.
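To see the difference concretely, here is a small sketch with two hypothetical helpers that only count loop iterations over the same two-line input. With the eof() check the body runs a third time after the last line has been read, which is the extra pass that hands split an unusable sline.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, for trying the helpers on in-memory text
#include <string>

// Loop shape from the question: eof() is tested before reading.
int iterationsWithEofCheck(std::istream& in)
{
    int n = 0;
    std::string line;
    while (!in.eof())  // eof() only becomes true after a read hits the end
    {
        std::getline(in, line);
        ++n;  // also counts the final, failed read
    }
    return n;
}

// Idiomatic loop shape: the read itself is the loop condition.
int iterationsWithGetlineCheck(std::istream& in)
{
    int n = 0;
    std::string line;
    while (std::getline(in, line))  // body runs only after a successful read
        ++n;
    return n;
}
```

For the text "a\nb\n", the first helper returns 3 and the second returns 2.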
The problem is that the call to getline that pulls the last usable line isn't setting EOF, so you do one extra loop iteration after you have gotten the last usable line. That loop operation is running on an empty sline, which causes bad things to happen, namely, split doesn't return a vector with four elements, but then you try to access those elements.
You can just use
while (getline(myfile,sline))
{
// do stuff
}
in place of
while(!myfile.eof())
{
getline(myfile,sline);
// do stuff
}
Basically, I know virtually nothing about C++ and have only programmed briefly in Visual Basic.
I want a bunch of numbers from a csv file to be stored as a float array. Here is some code:
string stropenprice[702];
float openprice[702];
int x=0;
ifstream myfile ("open.csv");
if (myfile.is_open())
{
while ( myfile.good() )
{
x=x+1;
getline (myfile,stropenprice[x]);
openprice[x] = atof(stropenprice[x]);
...
}
...
}
Anyways it says:
error C2664: 'atof' : cannot convert parameter 1 from 'std::string' to 'const char *'
Well, you'd have to say atof(stropenprice[x].c_str()), because atof() only operates on C-style strings, not std::string objects, but that's not enough. You still have to tokenize the line into comma-separated pieces. find() and substr() may be a good start, though perhaps a more general tokenization function would be more elegant.
Here's a tokenizer function that I stole from somewhere so long ago I can't remember, so apologies for the plagiarism:
std::vector<std::string> tokenize(const std::string & str, const std::string & delimiters)
{
std::vector<std::string> tokens;
// Skip delimiters at beginning.
std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
std::string::size_type pos = str.find_first_of(delimiters, lastPos);
while (std::string::npos != pos || std::string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
return tokens;
}
Usage: std::vector<std::string> v = tokenize(line, ","); Now call std::atof(s.c_str()) (or std::strtod()) on each string s in the vector.
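The same idea as a self-contained sketch. Instead of the tokenize function above, it splits each row with std::getline on a std::istringstream using ',' as the delimiter (a swapped-in technique), then converts each field with std::strtod. The helper name readCsvNumbers is hypothetical, and skipping fields that strtod cannot parse at all (such as empty fields) is an assumed policy.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every comma-separated number from in into a vector<double>.
// Fields that don't start with a number (e.g. empty fields) are skipped.
std::vector<double> readCsvNumbers(std::istream& in)
{
    std::vector<double> values;
    std::string line, field;
    while (std::getline(in, line))  // one CSV row at a time
    {
        std::istringstream row(line);
        while (std::getline(row, field, ','))  // one field at a time
        {
            const char* begin = field.c_str();
            char* end = nullptr;
            double d = std::strtod(begin, &end);
            if (end != begin)  // strtod consumed at least one character
                values.push_back(d);
        }
    }
    return values;
}
```

For the float array in the question, each value can be narrowed with static_cast<float>, or the function can be adapted to fill a std::vector<float>.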
Here's a suggestion, just to give you some idea how one typically writes such code in C++:
#include <string>
#include <fstream>
#include <vector>
#include <cstdlib>
// ...
std::vector<double> v;
std::ifstream infile("thefile.txt");
std::string line;
while (std::getline(infile, line))
{
v.push_back(std::strtod(line.c_str(), NULL)); // or std::atof(line.c_str())
}
// we ended up reading v.size() lines