c++ char* parsing

c++ char* parsing - c++

SO, I am trying to write a function that will return a vector<char**>, as such:
vector<char**> test(string mystr) {
char*temp=new char[mystr.size()+1];
strcpy(temp,mystr.c_str());
char*subStr=strtok(temp,":");
while(subStr!=NULL) {
int i=0;
char**args=new char*[200];
char*tempsta=newchar[strlen(subStr)+1];
strcpy(tempsta, subStr);
args[i]=strtok(tempsta," ");
while(args[i]!=NULL) {
i++;
args[i]=strtok(NULL," ");
}
fullVec.push_back(args);
//cout<<subStr<<endl;
subStr=strtok(NULL,":");
}
return fullVec;
}
so I want to split the parameter string up with ":" delimeter, then with " " delimeter. On the cout<<subStr call I get what is expected if I comment out everything from int i=0 to fullVec.push_back(args). If I do not comment out all of those lines I only get the first substring (until the first ":") is encountered, and then the largest while loop exits.
for what is expected I mean; let us assume the parameter is "my name is: bon jovi: xxx ab"
if everything is commented out, the following lines will be printed:
my name is
bon jovi
xxx ab
if I leave it as is, what will happen is only
my name is
will print, and the large loop will exit
any assistance is appreciated, thanks! (Yes, I am aware that this seems like a silly exercise which can be done much more elegantly/easily...however I would like to get this solution to work before I entertain using string etc.)

Your problem is that strtok() maintains state between calls.
If the first parameter is not NULL then it uses to reset the state otherwise it uses the state it has saved to continue parsing from where it left off.
Since you have two nested calls to strtok() the second call is messing with state of the outer call.
This call:
args[i]=strtok(tempsta," ");
Is resetting the internal state of strtok(). It now no longer knows anything about the state from your outer call. Thus when you get to the end of the string in your inner loop.
This call:
subStr=strtok(NULL,":");
Is now using the saved state of the inner loop. So it basically just terminates as you have already reached the end of that tokenizing stream.

As it was perfectly pointed out by Loki already, you should not mix C and C++. In case you want C++ solution for your problem, then it's better to stick with STL classes that will take care of ugly memory management for you (see RAII idiom) such as std::string, std::vector, std::istringstream.
This is how your function could look like:
typedef std::vector<std::string> Line;
std::vector<Line> parse(std::string inputString)
{
std::vector<Line> lines;
std::istringstream inputStream(inputString);
for (std::string line; std::getline(inputStream, line, ':'); )
{
if (!line.empty())
{
lines.push_back(Line());
std::istringstream lineStream(line);
for (std::string word; std::getline(lineStream, word, ' '); )
{
if (!word.empty())
lines.back().push_back(word);
}
}
}
return lines;
}
Example of usage:
std::vector<Line> lines = parse("my name is: bon jovi: xxx ab");
for (int li = 0; li < lines.size(); ++li)
{
for (int wi = 0; wi < lines[li].size(); ++wi)
std::cout << lines[li][wi] << "_";
std::cout << std::endl;
}
outputs
my_name_is_
bon_jovi_
xxx_ab_
Hope this helps :)

As mentioned in comments, you are mixing C-style and C++-style code which results in quite a mess. Unless you have "good reason" to resort to char* instead of std::string then it's better practice to make use of the stl or boost.
The boostway:
std::string delims = " :";
boost::split(vector, mystr, boost::is_any_of(delims));
The stl way:
vector<string> result;
std::string delims = " :";
std::istringstream ss( mystr );
while (!ss.eof())
{
getline( ss, field, delims);
if ((empties == split::no_empties) && field.empty()) continue;
result.push_back( field );
}
For more approaches and a good comparison, see this cplusplus article

Related

How can I split a getline() into array in c++

I have an input getline:
man,meal,moon;fat,food,feel;cat,coat,cook;love,leg,lunch
And I want to split this into an array when it sees a ;, it can store all values before the ; in an array.
For example:
array[0]=man,meal,moon
array[1]=fat,food,feel
And so on...
How can I do it? I tried many times but I failed!😒
Can anyone help?
Thanks in advance.

You can use std::stringstream and std::getline.
I also suggest that you use std::vector as it's resizeable.
In the example below, we get input line and store it into a std::string, then we create a std::stringstream to hold that data. And you can use std::getline with ; as delimiter to store the string data between the semicolon into the variable word as seen below, each "word" which is pushed back into a vector:
int main()
{
string line;
string word;
getline(cin, line);
stringstream ss(line);
vector<string> vec;
while (getline(ss, word, ';')) {
vec.emplace_back(word);
}
for (auto i : vec) // Use regular for loop if you can't use c++11/14
cout << i << '\n';
Alternatively, if you can't use std::vector:
string arr[256];
int count = 0;
while (getline(ss, word, ';') && count < 256) {
arr[count++] = word;
}
Live demo
Outputs:
man,meal,moon
fat,food,feel
cat,coat,cook
love,leg,lunch

I don't want to give you some code because you must be new at C++ and you have to learn by yourself but I can give an hint: use substring to store it into a vector of string.

Splitting sentences and placing in vector

I was given a code from my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input and separate them into sentences from periods and put those strings into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.

This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.

You have:
string l; // string to hold a line
cin >> l; // get line
The last line does not read a line unless the entire line has non-white space characters. To read a line of text, use:
std::getline(std::cin, l);
It's hard telling whether that is tripping your code up since you haven't posted any sample input.

I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
while (std::getline(std::cin, temp, '.'))
s.push_back(temp);
std::transform(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"),
[](std::string const &s) { return s.substr(s.find_first_not_of(" \t\n")); });
}
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

first string in array of strings is being skipped

Somehow when I run this code and it comes to inputting strings, the first string where i=0 is being skipped and it starts entering strings from A[1]. So I end up with A[0] filled with random stuff from memory. Can someone please point at the problem?
cin>>s;
char** A;
A = new char *[s];
cout<<"now please fill the strings"<<endl;
for (int i=0;i<s;i++)
{
A[i] = new char[100];
cout<<"string "<<i<<": ";
gets(A[i]);
}

That code is horrible. Here's how it should look like in real C++:
#include <string>
#include <iostream>
#include <vector>
int main()
{
std::cout << "Please start entering lines. A blank line or "
<< "EOF (Ctrl-D) will terminate the input.\n";
std::vector<std::string> lines;
for (std::string line; std::getline(std::cin, line) && !line.empty(); )
{
lines.push_back(line);
}
std::cout << "Thank you, goodbye.\n";
}
Note the absence of any pointers or new expressions.
If you like you can add a little prompt print by adding std::cout << "> " && at the beginning of the conditional check in the for loop.

Probably because you're using gets()... never use gets()
Use fgets() instead.
gets vs fgets

The problem is that cin>>s; just picks up the number you want and leaves a \n (newline from the enter press) on stdin that gets() picks up in the first iteration. This is not the nicest way to fix it, but to prove it write this line after that line:
int a = fgetc(stdin);
Check out a afterwards to confirm it has a newline.

Well, you probably get an empty string: when reading s you use formatted input which stops as soon as a non-digit is encountered, e.g., the newline used to indicate its input is finished. gets(), thus, immediately finds a newline, terminating the first string read.
That said, you shall never use gets(): It is a primary security problem and the root cause of many potential attack! You should, instead, use fgets() or, better, yet, std::getline() together with std::strings and a std::vector<std::string> >. Aslo, you should always verify that the attempt to input was successful:
if ((std::cin >> s).ignore(std::numeric_limits<std::streamsize>::max(), `\n`)) {
std::string line;
for (int i(0); i != s && std::getline(std::cin, line); ) {
A.push_back(line);
}
}

How do I split a user-defined sentence into words in C++ using substr and find?

I used this function but it is wrong.
for (int i=0; i<sen.length(); i++) {
if (sen.find (' ') != string::npos) {
string new = sen.substr(0,i);
}
cout << "Substrings:" << new << endl;
}
Thank you! Any kind of help is appreciated!

new is a keyword in C++, so first step is to not use that as a variable name.
After that, you need to put your output statement in the "if" block, so that it can actually be allowed to access the substring. Scoping is critical in C++.

First: this cannot compile because new is a language keyword.
Then you have a loop running through every character in the string so you shouldn't need to use std::string::find. I would use std::string::find, but then the loop condition should be different.

This doesn't use substr and find, so if this is homework and you have to use that then this won't be a good answer... but I do believe it's the better way to do what you're asking in C++. It's untested but should work fine.
//Create stringstream and insert your whole sentence into it.
std::stringstream ss;
ss << sen;
//Read out words one by one into a string - stringstream will tokenize them
//by the ASCII space character for you.
std::string myWord;
while (ss >> myWord)
std::cout << myWord << std::endl; //You can save it however you like here.
If it is homework you should tag it as such so people stick to the assignment and know how much to help and/or not help you so they don't give it away :)

No need to iterate over the string, find already does this. It starts to search from the beginning by default, so once we found a space, we need to start the next search from this found space:
std::vector<std::string> words;
//find first space
size_t start = 0, end = sen.find(' ');
//as long as there are spaces
while(end != std::string::npos)
{
//get word
words.push_back(sen.substr(start, end-start));
//search next space (of course only after already found space)
start = end + 1;
end = sen.find(' ', start);
}
//last word
words.push_back(sen.substr(start));
Of course this doesn't handle duplicate spaces, starting or trailing spaces and other special cases. You would actually be better off using a stringstream:
#include <sstream>
#include <algorithm>
#include <iterator>
std::istringstream stream(sen);
std::vector<std::string> words(std::istream_iterator<std::string>(stream),
std::istream_iterator<std::string>());
You can then just put these out however you like or just do it directly in the loops without using a vector:
for(std::vector<std::string>::const_iterator iter=
words.begin(); iter!=words.end(); ++iter)
std::cout << "found word: " << *iter << '\n';

std::string manipulation: whitespace, "newline escapes '\'" and comments #

Kind of looking for affirmation here. I have some hand-written code, which I'm not shy to say I'm proud of, which reads a file, removes leading whitespace, processes newline escapes '\' and removes comments starting with #. It also removes all empty lines (also whitespace-only ones). Any thoughts/recommendations? I could probably replace some std::cout's with std::runtime_errors... but that's not a priority here :)
const int RecipeReader::readRecipe()
{
ifstream is_recipe(s_buffer.c_str());
if (!is_recipe)
cout << "unable to open file" << endl;
while (getline(is_recipe, s_buffer))
{
// whitespace+comment
removeLeadingWhitespace(s_buffer);
processComment(s_buffer);
// newline escapes + append all subsequent lines with '\'
processNewlineEscapes(s_buffer, is_recipe);
// store the real text line
if (!s_buffer.empty())
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}
is_recipe.close();
return 0;
}
void RecipeReader::processNewlineEscapes(string &s_string, ifstream &is_stream)
{
string s_temp;
size_t sz_index = s_string.find_first_of("\\");
while (sz_index <= s_string.length())
{
if (getline(is_stream,s_temp))
{
removeLeadingWhitespace(s_temp);
processComment(s_temp);
s_string = s_string.substr(0,sz_index-1) + " " + s_temp;
}
else
cout << "Error: newline escape '\' found at EOF" << endl;
sz_index = s_string.find_first_of("\\");
}
}
void RecipeReader::processComment(string &s_string)
{
size_t sz_index = s_string.find_first_of("#");
s_string = s_string.substr(0,sz_index);
}
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
const size_t sz_length = s_string.size();
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index <= sz_length)
s_string = s_string.substr(sz_index);
else if ((sz_index > sz_length) && (sz_length != 0)) // "empty" lines with only whitespace
s_string.clear();
}
Some extra info: the first s_buffer passed to the ifstream contains the filename, std::string s_buffer is a class data member, so is std::vector v_s_recipe. Any comment is welcome :)
UPDATE: for the sake of not being ungrateful, here is my replacement, all-in-one function that does what I want for now (future holds: parenthesis, maybe quotes...):
void readRecipe(const std::string &filename)
{
string buffer;
string line;
size_t index;
ifstream file(filename.c_str());
if (!file)
throw runtime_error("Unable to open file.");
while (getline(file, line))
{
// whitespace removal
line.erase(0, line.find_first_not_of(" \t\r\n\v\f"));
// comment removal TODO: store these for later output
index = line.find_first_of("#");
if (index != string::npos)
line.erase(index, string::npos);
// ignore empty buffer
if (line.empty())
continue;
// process newline escapes
index = line.find_first_of("\\");
if (index != string::npos)
{
line.erase(index,string::npos); // ignore everything after '\'
buffer += line;
continue; // read next line
}
else // no newline escapes found
{
buffer += line;
recipe.push_back(buffer);
buffer.clear();
}
}
}

Definitely ditch the hungarian notation.

It's not bad, but I think you're thinking of std::basic_string<T> too much as a string and not enough as an STL container. For example:
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
s_string.erase(s_string.begin(),
std::find_if(s_string.begin(), s_string.end(), std::not1(isspace)));
}

A few comments:
As another answer (+1 from me) said - ditch the hungarian notation. It really doesn't do anything but add unimportant trash to every line. In addition, ifstream yielding an is_ prefix is ugly. is_ usually indicates a boolean.
Naming a function with processXXX gives very very little information on what it is actually doing. Use removeXXX, like you did with the RemoveLeadingWhitespace function.
The processComment function does an unnecessary copy and assignment. Use s.erase(index, string::npos); (npos is default, but this is more obvious).
It's not clear what your program does, but you might consider storing a different file format (like html or xml) if you need to post-process your files like this. That would depend on the goal.
using find_first_of('#') may give you some false positives. If it's present in quotes, it's not necessarily indicating a comment. (But again, this depends on your file format)
using find_first_of(c) with one character can be simplified to find(c).
The processNewlineEscapes function duplicates some functionality from the readRecipe function. You may consider refactoring to something like this:
-
string s_buffer;
string s_line;
while (getline(is_recipe, s_line)) {
// Sanitize the raw line.
removeLeadingWhitespace(s_line);
removeComments(s_line);
// Skip empty lines.
if (s_line.empty()) continue;
// Add the raw line to the buffer.
s_buffer += s_line;
// Collect buffer across all escaped lines.
if (*s_line.rbegin() == '\\') continue;
// This line is not escaped, now I can process the buffer.
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}

I'm not big on methods that modify the parameters. Why not return strings rather than modifying the input arguments? For example:
string RecipeReader::processComment(const string &s)
{
size_t index = s.find_first_of("#");
return s_string.substr(0, index);
}
I personally feel this clarifies intent and makes it more obvious what the method is doing.

I'd consider replacing all your processing code (almost everything you've written) with boost::regex code.

A few comments:
If s_buffer contains the file name to be opened, it should have a better name like s_filename.
The s_buffer member variable should not be reused to store temporary data from reading the file. A local variable in the function would do as well, no need for the buffer to be a member variable.
If there is not need to have the filename stored as a member variable it could just be passed as a parameter to readRecipe()
processNewlineEscapes() should check that the found backslash is at the end of the line before appending the next line. At the moment any backslash at any position triggers adding of the next line at the position of the backslash. Also, if there are several backslashes, find_last_of() would probably easier to use than find_first_of().
When checking the result of find_first_of() in processNewlineEscapes() and removeLeadingWhitespace() it would be cleaner to compare against string::npos to check if anything was found.
The logic at the end of removeLeadingWhitespace() could be simplified:
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index != s_string.npos)
s_string = s_string.substr(sz_index);
else // "empty" lines with only whitespace
s_string.clear();

You might wish to have a look at Boost.String. It's a simple collection of algorithms to work with streams, and notably features trim methods :)
Now, on to the review itself:
Don't bother to remove the hungarian notation, if it's your style then use it, however you should try and improve the names of methods and variables. processXXX is definitely not indicating anything useful...
Functionally, I am worried about your assumptions: the main issue here is that you do not care for espace sequences (\n uses a backslash for example) and you do not worry for the presence of strings of charachters: std::cout << "Process #" << pid << std::endl; would yield an invalid line because of your "comment" preprocessing
Furthermore, since you remove the comments before processing the newline escapes:
i = 3; # comment \
running comment
will be parsed as
i = 3; running comment
which is syntactically incorrect.
From an interface point of view: there is not benefit in having the methods being class members here, you don't need an instance of RecipeReader really...
And finally, I find it awkward that two methods would read from the stream.
Little peeve of mine: returning by const value does not serve any purpose.
Here is my own version, as I believe than showing is easier than discussing:
// header file
std::vector<std::string> readRecipe(const std::string& fileName);
std::string extractLine(std::ifstream& file);
std::pair<std:string,bool> removeNewlineEscape(const std::string& line);
std::string removeComment(const std::string& line);
// source file
#include <boost/algorithm/string.hpp>
std::vector<std::string> readRecipe(const std::string& fileName)
{
std::vector<std::string> result;
ifstream file(fileName.c_str());
if (!file) std::cout << "Could not open: " << fileName << std::endl;
std::string line = extractLine(file);
while(!line.empty())
{
result.push_back(line);
line = extractLine(file);
} // looping on the lines
return result;
} // readRecipe
std::string extractLine(std::ifstream& file)
{
std::string line, buffer;
while(getline(file, buffer))
{
std::pair<std::string,bool> r = removeNewlineEscape(buffer);
line += boost::trim_left_copy(r.first); // remove leading whitespace
// based on the current locale
if (!r.second) break;
line += " "; // as we append, we insert a whitespace
// in order unintended token concatenation
}
return removeComment(line);
} // extractLine
//< Returns the line, minus the '\' character
//< if it was the last significant one
//< Returns a boolean indicating whether or not the line continue
//< (true if it's necessary to concatenate with the next line)
std::pair<std:string,bool> removeNewlineEscape(const std::string& line)
{
std::pair<std::string,bool> result;
result.second = false;
size_t pos = line.find_last_not_of(" \t");
if (std::string::npos != pos && line[pos] == '\')
{
result.second = true;
--pos; // we don't want to have this '\' character in the string
}
result.first = line.substr(0, pos);
return result;
} // checkNewlineEscape
//< The main difficulty here is NOT to confuse a # inside a string
//< with a # signalling a comment
//< assuming strings are contained within "", let's roll
std::string removeComment(const std::string& line)
{
size_t pos = line.find_first_of("\"#");
while(std::string::npos != pos)
{
if (line[pos] == '"')
{
// We have detected the beginning of a string, we move pos to its end
// beware of the tricky presence of a '\' right before '"'...
pos = line.find_first_of("\"", pos+1);
while (std::string::npos != pos && line[pos-1] == '\')
pos = line.find_first_of("\"", pos+1);
}
else // line[pos] == '#'
{
// We have found the comment marker in a significant position
break;
}
pos = line.find_first_of("\"#", pos+1);
} // looking for comment marker
return line.substr(0, pos);
} // removeComment
It is fairly inefficient (but I trust the compiler for optmizations), but I believe it behaves correctly though it's untested so take it with a grain of salt. I have focused mainly on solving the functional issues, the naming convention I follow is different from yours but I don't think it should matter.

I want to point out a small and sweet version which lacks \ support but skips whitespace-lines and comments. (Note the std::ws in the call to std::getline.
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::stringstream input(
" # blub\n"
"# foo bar\n"
" foo# foo bar\n"
"bar\n"
);
std::string line;
while (std::getline(input >> std::ws, line)) {
line.erase(std::find(line.begin(), line.end(), '#'), line.end());
if (line.empty()) {
continue;
}
std::cout << "line: \"" << line << "\"\n";
}
}
Output:
line: "foo"
line: "bar"

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

c++ char* parsing - c++

Related

How can I split a getline() into array in c++

Splitting sentences and placing in vector

first string in array of strings is being skipped

How do I split a user-defined sentence into words in C++ using substr and find?

std::string manipulation: whitespace, "newline escapes '\'" and comments #

Categories

Resources