I have an input stream with a series of bytecode-like instructions
function foo
push x
pop y
...
return
function bar
...
return
function other
...
I.e. a series of function declarations back-to-back. Each function is defined from one "function" until the next. There may be multiple "returns" within a function so I cannot use that as a delimiter. All instructions must be inside a function (i.e. the first line of the stream is always a "function" and the last line is always a "return").
I want to basically remove certain functions from the list. I have a list of the functions I want to keep and I thought about copying to an output stream, skipping over any function not on the list, something like
vector<string> wanted_functions = { "foo", "other" };
ostringstream oss;
bool skip = false;
for (string line; getline(input_stream, line);) {
istringstream iss(line);
string command;
iss >> command;
if (command == "function") {
skip = false;
string function_name;
iss >> function_name;
if (std::find(wanted_functions.begin(), wanted_functions.end(), function_name)
== wanted_functions.end()) {
skip = true;
}
}
if (!skip) oss << line << '\n';
}
I haven't tested the above solution; it looks like it may work but I don't think it's very elegant.
I feel like stream iterators would be good here but I don't know how to use them. How can I achieve the skipping behavior using iterators, or maybe native stream methods like ignore() or seekg()?
Bonus: If there's a better way to read the first two words in the line than creating a new stream just for them, I'd also like to know please.
Edit: Functions are always sequential. There are no nested functions. I.e. "function" is always immediately preceded by "return".
If it's text, you can't easily just jump/skip (seekg) without actually reading it, since you don't have a known offset to go to (many binary file formats will contain such information). But you can filter what you do read; the code in your question nearly does this.
istream_iterator<std::string> will give you each whitespace-delimited word, but you can't tell where the newlines are. You can make an istream_iterator that reads lines instead; the simplest way involves sub-classing std::string to redefine operator >>, but that gives you basically what getline does anyway. Or you might make your own type containing more useful information (below).
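For the line-reading idea just mentioned, a minimal sketch might look like the following (the type name line is illustrative, not something from the question; it simply derives from std::string and swaps operator >> for getline):
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
struct line : std::string {
    // Read a whole line instead of a single word.
    friend std::istream &operator>>(std::istream &is, line &l) {
        return std::getline(is, l);
    }
};
int main() {
    std::istringstream input("function foo\npush x\nreturn\n");
    for (std::istream_iterator<line> it(input), end; it != end; ++it)
        std::cout << *it << '\n'; // each *it is one full line
}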
You might use std::unordered_set<std::string> wanted_functions, since checking whether an item exists is easier than searching a std::vector (with std::find or similar). skip also ends up reading slightly oddly, as you set it for "unwanted" functions and then test if (!unwanted).
unordered_set<string> wanted_functions = { "foo", "other" };
bool is_wanted_function = false;
for (string line; getline(input_stream, line);) {
istringstream iss(line);
string command;
iss >> command;
if (command == "function") {
string function_name;
iss >> function_name;
is_wanted_function = wanted_functions.count(function_name) != 0;
}
if (is_wanted_function) {
oss << line << std::endl;
}
}
An alternative to the is_wanted_function flag would be to consume the entire function within the if (command == "function") { block. This needs slightly more careful management of reading the next line, so as not to accidentally skip the line following the inner loop:
unordered_set<string> wanted_functions = { "foo", "other" };
string line;
getline(input_stream, line);
while (input_stream) {
istringstream iss(line);
string command;
iss >> command;
if (command == "function") {
string function_name;
iss >> function_name;
if (wanted_functions.count(function_name)) {
oss << line << std::endl;
while (getline(input_stream, line) && line.rfind("function", 0) != 0) {
oss << line << std::endl;
}
continue; // already have a line
}
}
getline(input_stream, line); // next line
}
As is, I don't think that is much of an improvement, but if the actual parsing (iss >> command;, iss >> function_name;, etc.) were refactored out elsewhere, it would be somewhat simpler.
You might make the actual parsing (getting the command name like "function" and arguments like "foo") its own class, which tidies up having the istringstream iss(line); iss >> command; etc. directly in this code.
istream_iterator basically just uses operator >> to get the next item until the stream is in a failure state, so it can be used with your own types, although you can get something very similar by doing largely the same yourself without istream_iterator (a sketch of that manual loop follows the example below).
class command
{
public:
const std::string &cmd()const { return _cmd; }
const std::string &source_line()const { return _source_line; }
const std::string &arg(size_t i)const
{
if (i < _args.size()) return _args[i];
else throw std::out_of_range("Command does not have this many arguments.");
}
friend std::istream &operator >> (std::istream &is, command &cmd)
{
if (std::getline(is, cmd._source_line))
{
std::stringstream ss(cmd._source_line);
ss >> cmd._cmd;
cmd._args.clear(); // istream_iterator uses the same command object every time
while (true)
{
std::string val;
ss >> val;
if (!ss) break;
cmd._args.push_back(std::move(val));
}
}
return is;
}
private:
std::string _source_line;
std::string _cmd;
std::vector<std::string> _args;
};
int main()
{
using namespace std;
std::stringstream input_stream(
"function foo\n"
"push x\n"
"pop y\n"
"...\n"
"return\n"
"function bar\n"
"...\n"
"return\n"
"function other\n"
"...\n"
"return\n");
std::ostream &oss = std::cout;
std::unordered_set<string> wanted_functions = { "foo", "other" };
std::istream_iterator<command> eos; // end of stream
std::istream_iterator<command> it(input_stream); // iterator
while (it != eos)
{
if (it->cmd() == "function" && wanted_functions.count(it->arg(0)))
{
do
{
oss << it->source_line() << std::endl;
} while (++it != eos && it->cmd() != "function");
}
else ++it; // on true the while loop already advanced
}
}
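For comparison, and as mentioned above, roughly the same filtering loop can be written without istream_iterator by applying operator >> to the command type directly. A sketch, reusing the names from the example above:
command cmd;
bool keep = false;
while (input_stream >> cmd)
{
    if (cmd.cmd() == "function")
        keep = wanted_functions.count(cmd.arg(0)) != 0;
    if (keep)
        oss << cmd.source_line() << std::endl;
}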
istream_iterator of course also brings compatibility with the other iterator-based algorithms and constructors (std::find, etc.), and you can build more complex things out of that. For example, if you add another layer on top of this to create an istream_iterator<function>, then maybe you could use the Boost C++ filter_iterator, and then you would have an iterator over just the functions you want.
Note that if you need to start dealing with any nested constructs (like if (...) { ... } else if (...) { ... }), you might find parsing into a tree structure more convenient to do operations on than a flat sequence. See Abstract Syntax Tree. This somewhat depends on your syntax, e.g. if you use just goto if offset/label instead of while(expr), if(expr), else if, else, etc. type constructs.
Related
I read configuration files of the following format into my C++ code:
# name score
Marc 19.7
Alex 3.0
Julia 21.2
So far, I have adapted a solution found here: Parse (split) a string in C++ using string delimiter (standard C++). For example, the following code snippet reads the file line by line, discards the first (comment) line, and for each remaining line calls parseDictionaryLine, which splits the string as described in the original thread and inserts the values into a (self-implemented) hash table.
void parseDictionaryLine(std::string &line, std::string &delimiter, hash_table &table) {
size_t position = 0;
std::string name;
float score;
while((position = line.find(delimiter)) != std::string::npos) {
name = line.substr(0, position);
line.erase(0, position + delimiter.length());
score = stof(line);
table.hinsert(name, score);
}
}
void loadDictionary(const std::string &path, hash_table &table) {
std::string line;
std::ifstream fin(path);
std::string delimiter = " ";
int lineNumber = 0;
if(fin.is_open()) {
while(getline(fin, line)) {
if(lineNumber++ < 1) {
continue; // first line
}
parseDictionaryLine(line, delimiter, table);
}
fin.close();
}
else {
std::cerr << "Unable to open file." << std::endl;
}
}
My question would be: is there a more elegant way in C++ to achieve this task? In particular, is there (1) a better split function, as for example in Python, (2) a better method to test whether a line is a comment line (starting with #), like startsWith, (3) potentially even an iterator that handles files similar to a context manager in Python and makes sure the file will actually be closed? My solution works for the simple case shown here but becomes clunky with more complicated variations such as several comment lines at unpredictable positions and more parameters. Also, it worries me that my solution does not check whether the file actually agrees with the prescribed format (two values per line, first a string, second a float). Implementing these checks with my method seems very cumbersome.
I understand there is JSON and other file formats with libraries made for this use case, but I am dealing with legacy code and cannot go there.
I will try to answer all your questions.
First, for splitting a string, you should not use the linked question/answer. It is from 2010 and rather outdated; or you need to scroll to the very bottom, where you will find more modern answers.
In C++ many things are done with iterators, because a lot of algorithms and constructors in C++ work with iterators. So the better approach for splitting a string is to use iterators. This will then always result in a one-liner.
Background: a std::string is also a container, and you can iterate over the elements in it, for example words or values. In case of space-separated values you can use std::istream_iterator on a std::istringstream. But for years there has been a dedicated iterator for iterating over patterns in a string:
The std::sregex_token_iterator. And because it is specifically designed for that purpose, it should be used.
And if it is used just for splitting strings, the overhead of using regexes is minimal. So you may split on semicolons, commas, colons, or whatever. Example:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
const std::regex re(";");
int main() {
// Some test string to be splitted
std::string test{ "Label;42;string;3.14" };
// Split and store whatever number of elements in the vector. One Liner
std::vector data(std::sregex_token_iterator(test.begin(), test.end(), re, -1), {});
// Some debug output
for (const std::string& s : data) std::cout << s << '\n';
}
So, regardless of the number of patterns, it will copy all data parts into the std::vector.
So, now you have a one liner solution for splitting strings.
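For the space-separated data in the question, the std::istream_iterator variant mentioned earlier is an equally short alternative; a sketch:
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
int main() {
    std::istringstream iss("Marc 19.7");
    // Split on whitespace into a vector of strings. Also a one liner.
    std::vector<std::string> parts(std::istream_iterator<std::string>{iss},
                                   std::istream_iterator<std::string>{});
    for (const std::string& s : parts) std::cout << s << '\n';
}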
For checking whether the first character is a '#', you may use
the index operator (if (string[0] == '#'))
or, the std::string's front function (if (string.front() == '#'))
or again a regex
But, here you need to be careful. The string must not be empty, so, better write:
if (not string.empty() and string.front() == '#')
Closing file or iterating over files.
If you use a std::ifstream then the constructor will open the file for you and the destructor will automatically close it when the stream variable runs out of scope. The typical pattern here is:
// Open the file and check if it could be opened
if (std::ifstream fileStream{"test.txt"}; fileStream) {
// Do things
} // <-- This will close the file automatically for you
Then, in general, you should use a more object-oriented approach. Data, and the methods operating on that data, should be encapsulated in one class. You would then overwrite the extractor operator >> and the inserter operator << to read and write the data, because only the class should know how to handle its data. And if you decide to use a different mechanism, modify your class and the rest of the outside world will still work.
In your example, input and output are so simple that the easiest IO will work. No splitting of strings is necessary.
Please see the following example.
And note especially how few statements there are in main. If you change something inside the classes, it will simply continue to work.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
// Data in one line
struct Data {
// Name and score
std::string name{};
double score{};
// Extractor and inserter
friend std::istream& operator >> (std::istream& is, Data& d) { return is >> d.name >> d.score; }
friend std::ostream& operator << (std::ostream& os, const Data& d) { return os << d.name << '\t' << d.score; }
};
// Database, i.e. all data from the source file
struct DataBase {
std::vector<Data> data{};
// Extractor
friend std::istream& operator >> (std::istream& is, DataBase& d) {
// Clear old data
d.data.clear(); Data element{};
// Read all lines from source stream
for (std::string line{}; std::getline(is, line);) {
// Ignore empty and comment lines
if (not line.empty() and line.front() != '#') {
// Call extractor from Data class and get the data
std::istringstream(line) >> element;
// And save the new data in the database
d.data.push_back(std::move(element));
}
}
return is;
}
// Inserter. Output all data
friend std::ostream& operator << (std::ostream& os, const DataBase& d) {
std::copy(d.data.begin(), d.data.end(), std::ostream_iterator<Data>(os, "\n"));
return os;
}
};
int main() {
// Open file and check, if it is open
if (std::ifstream ifs{ "test.txt" }; ifs) {
// Our database
DataBase db{};
// Read all data
ifs >> db;
// Debug output show all data
std::cout << db;
}
else std::cerr << "\nError: Could not open source file\n";
}
You can use operator>> to split at delimiters for you, like this:
#include <iostream>
#include <sstream>
#include <unordered_map>
std::istringstream input{
"# name score\n"
"Marc 19.7\n"
"Alex 3.0\n"
"Julia 21.2\n"
};
auto ReadDictionary(std::istream& stream)
{
// unordered_map has O(1) lookup, map has O(log n) lookup
// so I prefer unordered maps as dictionaries.
std::unordered_map<std::string, double> dictionary;
std::string header;
// read the first line from input (the comment line or header)
std::getline(stream, header);
std::string name;
std::string score;
// read name and score from line (>> will split at delimiters for you)
while (stream >> name >> score)
{
dictionary.insert({ name, std::stod(score) });
}
return dictionary;
}
int main()
{
auto dictionary = ReadDictionary(input); // todo replace with file stream
// range based for loop : https://en.cppreference.com/w/cpp/language/range-for
// captured binding : https://en.cppreference.com/w/cpp/language/structured_binding
for (const auto& [name, score] : dictionary)
{
std::cout << name << ": " << score << "\n";
}
return 0;
}
Writing a program to read a text file and storing it in a struct. An example of the text file:
chicken
dog
car
765
When there is some text in the line, it will get store into the struct. I have tried the following:
getline(file, aLine);
Info.animalchicken = aLine;
getline(file, aLine);
Info.animaldog = aLine;
getline(file, aLine);
Info.car = aLine;
getline(file, aLine);
Info.number = aLine;
I realised that getline is literally getting every single line. When I run my program, chicken will be stored in Info.animalchicken, the next line, which is empty, will be stored in Info.animaldog, dog will be stored in Info.car, and so on.
I think a control loop is required here but can't think of a good one. How can I ignore the empty line so my text can enter into the struct correctly?
This is my struct
struct Info {
string animalchicken;
string animaldog;
string car;
int number;
};
The loop idea, while quite primitive, should do the trick; the easiest way would be to wrap the logic in a separate function:
std::string getlineFilterEmpty(std::istream& s) {
std::string line;
do {
if (!s) {
throw std::runtime_error("End of stream");
}
getline(s, line);
} while(line.size() == 0);
return line;
}
Then getting your values is as simple as:
Info.animalchicken = getlineFilterEmpty(file);
Info.animaldog = getlineFilterEmpty(file);
Info.car = getlineFilterEmpty(file);
The number member will require parsing the string to an integer, the code for which you'll find elsewhere on SO.
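For completeness, one way that conversion might look with std::stoi (a sketch; std::stoi throws std::invalid_argument or std::out_of_range on malformed input, so you may want to catch that):
try {
    Info.number = std::stoi(getlineFilterEmpty(file));
} catch (const std::exception& e) {
    std::cerr << "Bad number in file: " << e.what() << '\n';
}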
The logic needs to go something like,
Read a line.
If read succeeded
If line not empty
Provide line
Else
Try again
Else
Handle error
Translating that into code and bundling it into a function for easy reuse, we get
std::string getNotEmptyLine(std::istream & in)
{
while (true) // repeat forever!
{
std::string temp;
std::getline(in, temp); // get a line
if (in) // test the line
{
if (temp.size() != 0) // line not empty
{
return temp; //give it to caller
}
}
else
{
// handle error. We'll throw an exception, but this isn't the best solution
throw std::runtime_error("Couldn't read a line!");
}
}
}
As with all literal translations, it needs a bit of work. It would also be helpful to make this function work exactly like getline so the caller can use it as a drop-in replacement.
std::istream & getNotEmptyLine(std::istream & in, // stream to read
std::string & line, // somewhere to put the string
char delim = '\n') // allow different delimiters
{
while (true) // repeat forever!
{
if (std::getline(in, line, delim)) // get a line right in line and test that we got it.
{
if (line.size() != 0) // line not empty
{
break; // success. exit.
}
}
else
{
// line will contain whatever this implementation of `getline` puts or
// leaves in the string on failure.
break; // fail. Let the caller decide what to do
}
}
return in;
}
Usage:
Info info;
std::string aLine;
if (getNotEmptyLine(in, info.animalchicken) &&
getNotEmptyLine(in, info.animaldog) &&
getNotEmptyLine(in, info.car) &&
getNotEmptyLine(in, aLine))
{
info.number = std::stoi(aLine);
}
else
{
// handle error
}
Note: even this may be too simplistic. It can't handle a line that contains nothing but whitespace. A single misplaced and nigh-invisible space will wreak havoc. If this is a concern, add more logic to if (line.size() != 0)
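One possible extension (a sketch): replace the emptiness test with a check for any non-whitespace character, so that whitespace-only lines are also skipped.
// Line counts as non-empty only if it has at least one non-whitespace character.
if (line.find_first_not_of(" \t\r") != std::string::npos)
{
    break; // success. exit.
}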
Here's an option adding stream operators and a helper function to skip empty lines.
#include <iostream>
#include <limits>
#include <sstream>
#include <string>
struct Info {
std::string animalchicken;
std::string animaldog;
std::string car;
int number;
};
// a helper function to do getline but skip empty lines
std::istream& getline_with_content(std::istream& is, std::string& s) {
while(std::getline(is, s)) if(not s.empty()) break;
return is;
}
// an istream operator to read one Info
std::istream& operator>>(std::istream& is, Info& i) {
getline_with_content(
getline_with_content(
getline_with_content(is,
i.animalchicken),
i.animaldog),
i.car);
is >> i.number;
// ignore everything after the number until a newline appears:
is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
return is;
}
// an ostream operator to print one Info
std::ostream& operator<<(std::ostream& os, const Info& i) {
return os << i.animalchicken << '\n'
<< i.animaldog << '\n'
<< i.car << '\n'
<< i.number << '\n';
}
int main() {
// an example istream with a lot of blank lines:
std::istringstream file(
"chicken\n\n"
"dog\n\n"
"car\n\n\n"
"765\n");
Info i;
file >> i; // read one Info from the stream
std::cout << i; // print one Info
}
So I need some ideas on how nicely parse a text file in C++. The files that I am parsing have the following format :
Command_A list of arguments
Command_B list of arguments
etc etc
Right now I am using an ifstream to open up the file and then I have this super long series of if-else statements to determine what to do for each type of command. This is proving to be a bit unwieldy (especially since some of the commands are for parsing other files...so I have nested if-elses with multiple ifstreams for the different files).
I was looking for another way of doing this but am not really sure what is the best approach. I was thinking about using a std::map where the keys are the command strings and the values are the function pointers but I am not familiar with storing function pointers in a map (especially if the different functions are of different return types, etc).
Below is basically what I am currently doing. I loop through the file and use "getline" to get the current line. Then I use a stringstream to parse the command. Then I use a very long list of if-elses to determine which function to call. Each line in the file also comes with a list of arguments so I use the stringstream to parse those and then pass those parameters into the function I call.
The problem here is two-fold
1) I have a very very large number of if-elses (around 50)
2) Some of the commands require me to parse new files and thus I have to open up another ifstream within the current ifstream. (see command_c)
So I'm looking for an easier/more efficient/prettier looking way to do this.
/*Open file and verify validity*/
std::ifstream parseFile(filename.c_str());
if(!parseFile.good())
{
cerr<<"ERROR: File is either corrupt or does not exist."<<endl;
exit(1); //Terminate program
}
//Loop over file line by line
std::string line;
while(!parseFile.eof())
{
std::getline(parseFile, line);
std::stringstream ss;
std::string command;
ss.str(line);
ss >> command;
if(command == "COMMAND_A")
{
float x,y,z;
ss >> x >> y >> z;
FunctionA(x,y,z);
}
else if(command == "COMMAND_B")
{
float a,b,c,d,e,f;
ss >> a >> b >> c >> d >> e >> f;
FunctionB(a,b,c,d,e,f);
}
else if(command == "Command_C")
{
string nextFile;
ss >> nextFile;
ParseFile(nextFile); //This is not recursive...this is another function
}
else if(...)
{
...
}
// etc, etc (this continues on for a long time)
}
parseFile.close();
You could have a command map and register a bunch of functions:
#include<fstream>
#include<functional>
#include<iostream>
#include<map>
#include<sstream>
int main() {
typedef std::function<bool (std::istringstream&)> command_function;
typedef std::map<std::string, command_function> command_map;
command_map map;
// register commands
map.insert(command_map::value_type("print", [](std::istringstream& s) {
std::string line;
if( ! getline(s, line)) return false;
std::cout << line << '\n';
return true;
}));
map.insert(command_map::value_type("add", [](std::istringstream& s) {
double a;
double b;
if( ! (s >> a >> b)) return false;
std::cout << "a + b = " << a + b << '\n';
return true;
}));
// sample data
std::istringstream file(
"print Hello World\n"
"add 1 2\n");
// command parsing
std::string line;
while(getline(file, line)) {
std::istringstream line_stream(line);
std::string command;
if(line_stream >> command >> std::ws) {
auto pos = map.find(command);
if(pos != map.end())
pos->second(line_stream);
}
}
return 0;
}
I've written many types of parsers, and I find that it's often a good idea to write a fairly generic function that takes a line and produces a list of strings (e.g. a std::vector<std::string>), then process the first element in that list as "what do we do next", and let each command use the arguments as it likes (e.g. translate to float, use as a filename, etc.).
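A minimal sketch of such a line-splitting helper, assuming plain whitespace-separated tokens (this is one possible implementation of the parseLine used in the code below):
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
// Split a line into whitespace-separated tokens; element 0 is the command name.
std::vector<std::string> parseLine(const std::string& line)
{
    std::istringstream iss(line);
    return { std::istream_iterator<std::string>(iss),
             std::istream_iterator<std::string>() };
}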
This can then be combined with a table-based system, where a function [or object] is associated with the string. For example std::map<std::string, BaseCommand*> table;.
Then you end up with something like this:
class CommandA : public BaseCommand
{
public:
virtual int Run(const std::vector<std::string>& argv);
};
table["CommandA"] = new CommandA;
table["CommandB"] = new CommandB;
...
std::vector<std::string> argv = parseLine(line);
if (table.find(argv[0]) != table.end())
{
int result = table[argv[0]]->Run(argv);
if (result < 0)
{
... do error handling here...
}
}
Of course, there are many different ways you COULD do this, and this is just one possible solution.
Yes, put the functions in a map. The key to doing this is std::function. A plain std::function<void()> holds functions that take no parameters and return nothing, but your functions obviously have parameters. So what we do is store functions that each take a std::stringstream& (the line), parse out the parameters they need, and then call the real function. The easiest way to do this is simply to use inline lambdas. Lambdas that take a stringstream and return nothing look like this: [](std::stringstream& ss) {code}.
Additionally, I use this function for easy retrieval of your parameters:
template<class T>
T get(std::stringstream& ss)
{
T t;
ss >> t;
if (!ss) // if it failed to read
throw std::runtime_error("could not parse parameter");
return t;
}
Here's the map:
std::unordered_map<std::string, std::function<void(std::stringstream&)>> cmd_map = {
{"COMMAND_A", [](std::stringstream& ss)
{FunctionA(get<float>(ss), get<float>(ss), get<float>(ss));}},
{"COMMAND_B", [](std::stringstream& ss)
{FunctionB(get<float>(ss), get<float>(ss), get<float>(ss), get<float>(ss), get<float>(ss), get<float>(ss));}},
{"COMMAND_C", [](std::stringstream& ss)
{ParseFile(get<std::string>(ss));}},
};
And here's the parser:
//Loop over file line by line
std::string line;
while(std::getline(parseFile, line)) //use this instead of eof
{
std::stringstream ss(line);
std::string command;
ss >> command;
auto it = cmd_map.find(command);
if (it != cmd_map.end())
{
try
{
it->second(ss); //call the function
} catch(std::runtime_error& err) {
std::cout << "ERROR: " << err.what() << '\n';
}
} else {
std::cout << "command " << command << " not found";
}
}
parseFile.close();
I have an input stream containing integers and special meaning characters '#'. It looks as follows:
... 12 18 16 # 22 24 26 15 # 17 # 32 35 33 ...
The tokens are separated by space. There's no pattern for the position of '#'.
I was trying to tokenize the input stream like this:
int value;
std::ifstream input("data");
if (input.good()) {
string line;
while(getline(data, line) != EOF) {
if (!line.empty()) {
sstream ss(line);
while (ss >> value) {
//process value ...
}
}
}
}
The problem with this code is that the processing stops when the first '#' is encountered.
The only solution I can think of is to extract each individual token into a string and, if it is not '#', use the atoi() function to convert the string to an integer. However, that seems very inefficient, as the majority of tokens are integers; calling atoi() on every token introduces a big overhead.
Is there a way I can parse the individual token by its type? ie, for integers, parse it as integers while for '#', skip it. Thanks!
One possibility would be to explicitly skip whitespace (ss >> std::ws), and then to use ss.peek() to find out if a # follows. If yes, use ss.get() to read it and continue, otherwise use ss >> value to read the value.
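A sketch of that peek-based approach, assuming the '#' markers are single characters separated by whitespace as in the question:
#include <iostream>
#include <sstream>
int main() {
    std::istringstream ss("12 18 16 # 22 24 26 15 # 17");
    int value;
    while (ss >> std::ws && !ss.eof()) {
        if (ss.peek() == '#') {
            ss.get();                   // consume and skip the marker
        } else if (ss >> value) {
            std::cout << value << '\n'; // process the integer
        } else {
            break;                      // neither '#' nor an integer
        }
    }
}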
If the positions of # don't matter, you could also remove all '#' from the line before initializing the stringstream with it.
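The removal idea might look like this before building the stringstream (a sketch; std::remove needs <algorithm>):
line.erase(std::remove(line.begin(), line.end(), '#'), line.end());
std::istringstream ss(line); // now contains only the integers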
Usually not worth testing against good()
if (input.good()) {
Unless your next operation is generating an error message or exception. If it is not good all further operations will fail anyway.
Don't test against EOF.
while(getline(data, line) != EOF) {
The result of std::getline() is not an integer. It is a reference to the input stream. The input stream is convertible to a bool-like object that can be used in a bool context (like while, if, etc.). So what you want to do is:
while(getline(data, line)) {
I am not sure I would read a line. You could just read a word (since the input is space separated). Using the >> operator on string
std::string word;
while(data >> word) { // reads one space separated word
Now you can test the word to see if it is your special character:
if (word[0] == "#")
If not convert the word into a number.
This is what I would do:
// define a class that will read either value from a stream
class MyValue
{
public:
bool isSpec() const {return isSpecial;}
int value() const {return intValue;}
friend std::istream& operator>>(std::istream& stream, MyValue& data)
{
std::string item;
stream >> item;
if (item[0] == '#') {
data.isSpecial = true;
} else
{ data.isSpecial = false;
data.intValue = atoi(&item[0]);
}
return stream;
}
private:
bool isSpecial;
int intValue;
};
// Now your loop becomes:
MyValue val;
while(file >> val)
{
if (val.isSpec()) { /* Special processing */ }
else { /* We have an integer */ }
}
Maybe you can read all values as std::string and then check if it's "#" or not (and if not - convert to int)
int value;
std::ifstream input("data");
if (input.good()) {
string line;
std::stringstream ss(std::stringstream::in | std::stringstream::out);
std::stringstream ss2(std::stringstream::in | std::stringstream::out);
while(getline(input, line, '#')) {
ss << line;
while(getline(ss, line, ' ')) {
ss2 << line;
if (ss2 >> value) {
//process value ...
}
ss2.str("");
ss2.clear(); // reset state so ss2 can be reused for the next token
}
ss.str("");
ss.clear(); // reset state so ss can be reused for the next chunk
}
}
Here we first split the input by the token '#' in the first while loop, then in the second while loop we split each piece by ' '.
Personally, if your separator is always going to be space regardless of what follows, I'd recommend you just take the input as string and parse from there. That way, you can take the string, see if it's a number or a # and whatnot.
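A rough sketch of that idea: read whitespace-separated tokens as strings and decide per token what to do (the example values are illustrative):
#include <iostream>
#include <sstream>
#include <string>
int main() {
    std::istringstream input("12 18 16 # 22");
    for (std::string token; input >> token;) {
        if (token == "#")
            std::cout << "marker\n";                // handle the special character
        else
            std::cout << std::stoi(token) << '\n';  // convert and process the number
    }
}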
I think you should re-examine your premise that "Calling atoi() on the tokens introduces big overhead."
There is no magic to std::cin >> val. Under the hood, it ends up calling (something very similar to) atoi.
If your tokens are huge, there might be some overhead to creating a std::string but as you say, the vast majority are numbers (and the rest are #'s) so they should mostly be short.
Here is a sample program that uses stringstream. The goal is to accept lines from the user (standard input) and print each word on a separate line.
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::istringstream currentline;
std::string eachword;
std::string line;
// Accept line from the standard input till EOF is reached
while ( std::getline(std::cin,line) )
{
currentline.str(line); // Convert the input to stringstream
while ( currentline >> eachword ) // Convert from the entire line to individual word
{
std::cout << eachword << std::endl;
}
currentline.clear();
}
return 0;
}
I'm wondering: is there a way I can avoid the intermediate string variable (object) line, and directly store the user input into currentline (the istringstream object)?
Note:
I already know the following solution:
while ( std::cin >> eachword)
{
std::cout << eachword << std::endl;
}
std::getline needs a string reference argument, and that's where it places the line it has obtained, so of course you can't avoid passing such an argument (and still use that function). You could elegantly encapsulate the construct, if you need it often -- e.g.:
bool getline(std::istream& i, std::istringstream& current)
{
std::string line;
if ( std::getline(i, line) ) {
current.str(line);
return true;
}
return false;
}
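A possible call site for that wrapper (a sketch; note the clear() call, which is needed because the wrapper replaces the buffer but does not reset the stream's error state from the previous iteration):
std::istringstream currentline;
std::string eachword;
while (getline(std::cin, currentline)) {
    currentline.clear(); // drop eof/fail flags left over from the previous line
    while (currentline >> eachword)
        std::cout << eachword << '\n';
}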
If you want to simplify the first solution,
while ( currentline(line) >> eachword )
I assume you want to not use an intermediate object to prevent unnecessary copying?
You can achieve the same effect by explicitly setting the stream buffer's buffer area.
int main()
{
std::string line;
std::istringstream currentline;
std::string eachword;
// Accept line from the standard input till EOF is reached
while ( std::getline(std::cin,line) )
{
// Set the buffer without copying.
currentline.clear();
currentline.rdbuf()->pubsetbuf(&line[0], line.length() );
while ( currentline >> eachword )
{
std::cout << eachword << std::endl;
}
}
return 0;
}
This works because of the order of destruction: you just need to make sure the istringstream is destroyed before the object you are using as a buffer. So you need to re-arrange the declarations at the top of main() to make sure that line is created first and thus will be destroyed last (otherwise the destructor of the istringstream has the potential of accessing the memory of a freed object).