std::getline removes whitespaces? - c++

So I am creating a command line application and I am trying to allow commands with parameters, or if the parameter is enclosed with quotations, it will be treated as 1 parameter.
Example: test "1 2"
"test" will be the command, "1 2" will be a single parameter passed.
Using the following code snippet:
while(getline(t, param, ' ')) {
if (param.find("\"") != string::npos) {
ss += param;
if (glue) {
glue = false;
params.push_back(ss);
ss = "";
}
else {
glue = true;
}
}
else {
params.push_back(param);
}
}
However std::getline seems to auto remove whitespace which is causing my parameters to change from "1 2" to "12"
I've looked around but results are flooded with "How to remove whitespace" answers rather than "How to not remove whitespace"
Anybody have any suggestions?

However std::getline seems to auto remove whitespace
That's exactly what you are telling getline to do:
getline(t, param, ' ');
The third argument in getline is the delimiter. If you want to parse the input line, you should read it until '\n' is found and then process it:
while(getline(t, param)) {
/* .. */
}
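Once the whole line has been read, one way to process it is to walk over it character by character and only treat a space as a separator when you are not inside quotes. This is just a minimal sketch; split_command is a made-up helper name, not something from the question, and it does not handle escaped quotes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the question): splits a command line on
// spaces, but keeps text between double quotes together as one parameter.
std::vector<std::string> split_command(const std::string& line) {
    std::vector<std::string> params;
    std::string current;
    bool in_quotes = false;  // true while we are between two quote characters
    bool has_token = false;  // true once the current parameter has started

    for (char c : line) {
        if (c == '"') {
            in_quotes = !in_quotes;  // the quote itself is not stored
            has_token = true;        // so that "" yields an empty parameter
        } else if (c == ' ' && !in_quotes) {
            if (has_token) {         // a space outside quotes ends the parameter
                params.push_back(current);
                current.clear();
                has_token = false;
            }
        } else {
            current += c;            // includes spaces that are inside quotes
            has_token = true;
        }
    }
    if (has_token) {
        params.push_back(current);   // the last parameter has no trailing space
    }
    return params;
}
```

With the input test "1 2" this gives the two parameters test and 1 2, with the space preserved.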

Umm, you are telling it to use ' ' as a delimiter in std::getline. Of course it's going to strip the whitespace.
http://www.cplusplus.com/reference/string/getline/

Related

Parsing tokens issue during interpreter development

I'm building a code interpreter in C++ and while I have the whole token logic working, I ran into an unexpected issue.
The user inputs a string into the console and the program parses that string into objects of type Token. The problem is the way I do this, which is the following:
void splitLine(string aLine) {
stringstream ss(aLine);
string stringToken, outp;
char delim = ' ';
// Break input string aLine into tokens and store them in rTokenBag
while (getline(ss, stringToken, delim)) {
// assign the value of the parsed stringToken to t; this labels invalid tokens
Token t (readToken(stringToken));
R_Tokens.push_back(t);
}
}
The issue here is that if the parser receives a string, say Hello World!, it will split it into 2 tokens, Hello and World!
The main goal is for the code to recognize double quotes as the start of a string Token and store it whole (from " to ") as a single Token.
So if I type in x = "hello world" it will store x as a token, then = as a token, and then hello world as a single token instead of splitting it.
You can use the C++14 std::quoted manipulator.
#include <string>
#include <sstream>
#include <iomanip>
#include <iostream>
void splitLine(std::string aLine) {
std::istringstream iss(aLine);
std::string stringToken;
// Break input string aLine into tokens and print them; quoted text is read as a single token
while(iss >> std::quoted(stringToken)) {
std::cout << stringToken << "\n";
}
}
int main() {
splitLine("Heloo world \"single token\" new tokens");
}
You really don't want to tokenize a programming language by splitting at a delimiter.
A proper tokenizer will switch on the first character to decide what kind of token to read and then keep reading as long as it finds characters that fit that token type and then emit that token when it finds the first non-matching character (which will then be used as the starting point for the next token).
That could look something like this (assume the variable it is a std::istreambuf_iterator or some other iterator that walks over the input character by character):
Token Tokenizer::next_token() {
if (isalpha(*it)) {
return read_identifier();
} else if(isdigit(*it)) {
return read_number();
} else if(*it == '"') {
return read_string();
} /* ... */
}
Token Tokenizer::read_string() {
// This should only be called when the current character is a "
assert(*it == '"');
it++;
string contents;
while(*it != '"') {
contents.push_back(*it);
it++;
}
// Skip the closing " so that the next token starts after it.
it++;
return Token(TokenKind::StringToken, contents);
}
What this doesn't handle are escape sequences or the case where we reach the end of file without seeing a second ", but it should give you the basic idea.
Something like std::quoted might solve your immediate problem with string literals, but it won't help you if you want x="hello world" to be tokenized the same way as x = "hello world" (which you almost certainly do).
PS: You can also read the whole source into memory first and then let your tokens contain indices or pointers into the source rather than strings (so instead of the contents variable, you'd just save the start index before the loop and then return Token(TokenKind::StringToken, start_index, current_index)). Which one's better depends partly on what you do in the parser. If your parser directly produces results and you don't need to keep the tokens around after processing them, the first one is more memory-efficient because you never need to hold the whole source in memory. If you create an AST, the memory consumption will be about the same either way, but the second version will allow you to have one big string instead of many small ones.
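To make the idea a bit more concrete, here is a small self-contained sketch of such a tokenizer that works on a std::string with an index instead of an iterator. The TokenKind values, the Token struct and the tokenize function are assumptions made up for this example, not the asker's actual types, and escape sequences are still not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative token types; the real interpreter will have its own.
enum class TokenKind { Identifier, Number, String, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens, it is never required
        } else if (std::isalpha(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else if (c == '"') {
            std::size_t start = ++i;  // skip the opening quote
            while (i < src.size() && src[i] != '"') ++i;
            tokens.push_back({TokenKind::String, src.substr(start, i - start)});
            if (i < src.size()) ++i;  // skip the closing quote, if there is one
        } else {
            // Anything else (such as '=') becomes a one-character symbol token.
            tokens.push_back({TokenKind::Symbol, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

Because the first character decides the token type, x="hello world" and x = "hello world" both produce the same three tokens: x, = and hello world.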
So I finally figured it out, and I CAN use getline() to achieve my goals.
This new code runs and parses the way I need it to be:
void splitLine(string aLine) {
stringstream ss(aLine);
string stringToken, outp;
char delim = ' ';
while (getline(ss, stringToken, delim)) { // Break line into tokens and store them in rTokenBag
//new code starts here
// if the current parse sub string starts with double quotes
if (stringToken[0] == '"' ) {
string torzen;
// parse me the rest of ss until you find another double quotes
getline(ss, torzen, '"' );
// Give back the space cut from the initial getline(), add the parsed sub string from the second getline(), and add a double quote at the end that was cut by the second getline()
stringToken += ' ' + torzen + '"';
}
// And we can all continue with our lives
Token t (readToken(stringToken)); // assign the value of the parsed stringToken to t; this labels invalid tokens
R_Tokens.push_back(t);
}
}
Thanks to everyone who answered and commented, you were all of great help!

std::getline not producing correct output (c++)

I'm doing a homework assignment in C++ and I could use a little help. I don't understand why the following code isn't working the way I want. The goal of the function I'm creating is to load a file and parse it into keys and values for a map while skipping blank lines and lines where the first character is a hashtag. The file I'm reading from is below.
The problem is that my nextToken variable is not being delimited by the '=' character. I mean, when I cout nextToken, it doesn't equal the string before the '=' character. For example, the first two lines of the data file are
# Sample configuration/initialization file
DetailedLog=1
I thought that the code I have should skip all the lines that begin with a hashtag (but it's only skipping the first line) and that nextToken would equal just DetailedLog (as opposed to DetailedLog=1 or just equal to 1).
In my output, some lines with a hashtag are skipped while others are not, and I can't understand where cout is printing from. The cout statement I have should print "nextToken: " and then nextToken, but it prints nextToken, then "nextToken: ", then whatever comes after the '=' character in the data file.
Here's my code:
bool loadFile (string filename){
ifstream forIceCream(filename);
string nextToken;
if (forIceCream.is_open()){
while (getline(forIceCream, nextToken, '=')) {
if (nextToken.empty() || nextToken[0] == '#') {
continue;
}
cout << "nextToken: " << nextToken << endl;
}
}
}
Data file reading from:
# Sample configuration/initialization file
DetailedLog=1
RunStatus=1
StatusPort=6090
StatusRefresh=10
Archive=1
LogFile=/tmp/logfile.txt
Version=0.1
ServerName=Unknown
FileServer=0
# IP addresses
PrimaryIP=192.168.0.13
SecondaryIP=192.168.0.10
# Random comment
If the first two lines of your input file are:
# Sample configuration/initialization file
DetailedLog=1
Then, the call
getline(forIceCream, nextToken, '=')
will read everything up to the first = into nextToken. After that call, the value of nextToken will be:
# Sample configuration/initialization file
DetailedLog
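The next call then reads from just after that = up to the following one, which gives 1, a newline and RunStatus. That is why your output looks interleaved. See the documentation of std::getline and pay attention to the first overload.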
You need to change your strategy for processing the contents of the file a little bit.
Read the contents of the file line by line.
Process each line as you see fit.
Here's an updated version of your function.
bool loadFile (string filename)
{
ifstream forIceCream(filename);
if (forIceCream.is_open())
{
// Read the file line by line.
string line;
while ( getline(forIceCream, line) )
{
// Discard empty lines and lines starting with #.
if (line.empty() || line[0] == '#')
{
continue;
}
// Now process the line using a istringstream.
std::istringstream str(line);
string nextToken;
if ( getline(str, nextToken, '=') )
{
cout << "nextToken: " << nextToken << endl;
}
}
return true;
}
// The file could not be opened.
return false;
}
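If you also want the value and the map the assignment asks for, the same line-by-line idea can be extended roughly like this. It is only a sketch: parseConfig is a hypothetical name, and it takes a std::istream so that it works with an ifstream as well as a string stream.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "key=value" lines from any input stream into a
// map, skipping blank lines and lines that start with '#'.
std::map<std::string, std::string> parseConfig(std::istream& in) {
    std::map<std::string, std::string> settings;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') {
            continue;
        }
        std::istringstream fields(line);
        std::string key, value;
        // The first getline stops at '=', the second takes the rest of the line.
        // Lines with no '=' or with nothing after it are skipped.
        if (std::getline(fields, key, '=') && std::getline(fields, value)) {
            settings[key] = value;
        }
    }
    return settings;
}
```

You would call it as parseConfig(forIceCream) after checking that the file opened.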

Can I use 2 or more delimiters in C++ function getline? [duplicate]

I would like to know how I can use 2 or more delimiters in the getline function. This is my problem:
The program reads a text file; each line is going to be like:
New Your, Paris, 100
CityA, CityB, 200
I am using getline(file, line), but that gives me the whole line, when I want to get CityA, then CityB, and then the number. If I use ',' as the delimiter, I won't know where the next line starts, so I'm trying to figure out a solution.
So, how could I use both comma and \n as delimiters?
By the way, I'm manipulating the string type, not char, so strtok is not possible :/
some scratch:
string line;
ifstream file("text.txt");
if(file.is_open())
while(!file.eof()){
getline(file, line);
// here I need to get each string before comma and \n
}
You can read a line using std::getline, then pass the line to a std::stringstream and read the comma separated values off it
string line;
ifstream file("text.txt");
if(file.is_open()){
while(getline(file, line)){ // get a whole line
std::stringstream ss(line);
string field;
while(getline(ss, field, ',')){
// field now holds one comma-separated entry of the current line
}
}
}
No, std::getline() only accepts a single character, to override the default delimiter. std::getline() does not have an option for multiple alternate delimiters.
The correct way to parse this kind of input is to use the default std::getline() to read the entire line into a std::string, then construct a std::istringstream from it, and then parse that further into comma-separated values.
However, if you are truly parsing comma-separated values, you should be using a proper CSV parser.
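For simple input like the lines in the question, the second step could look something like the sketch below. splitFields is a hypothetical helper name; std::ws is used to drop the space that follows each comma. It does not understand quoted fields, so it is not a replacement for a real CSV parser.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line such as "CityA, CityB, 200" at commas.
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    // std::ws discards leading whitespace before each field is read.
    while (std::getline(ss >> std::ws, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}
```

Call it once per line read with the default std::getline; the number can then be converted with std::stoi(fields[2]).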
Often, it is more intuitive and efficient to parse character input in a hierarchical, tree-like manner, where you start by splitting the string into its major blocks, then go on to process each of the blocks, splitting them up into smaller parts, and so on.
An alternative to this is to tokenize like strtok does -- from the beginning of input, handling one token at a time until the end of input is encountered. This may be preferred when parsing simple inputs, because it is straightforward to implement. This style can also be used when parsing inputs with nested structure, but this requires maintaining some kind of context information, which might grow too complex to maintain inside a single function or limited region of code.
Someone relying on the C++ std library usually ends up using a std::stringstream, along with std::getline to tokenize string input. But, this only gives you one delimiter. They would never consider using strtok, because it is a non-reentrant piece of junk from the C runtime library. So, they end up using streams, and with only one delimiter, one is obligated to use a hierarchical parsing style.
But zneak brought up std::string::find_first_of, which takes a set of characters and returns the position nearest to the beginning of the string containing a character from the set. And there are other member functions: find_last_of, find_first_not_of, and more, which seem to exist for the sole purpose of parsing strings. But std::string stops short of providing useful tokenizing functions.
Another option is the <regex> library, which can do anything you want, but it is new and you will need to get used to its syntax.
But, with very little effort, you can leverage existing functions in std::string to perform tokenizing tasks, and without resorting to streams. Here is a simple example. get_to() is the tokenizing function and tokenize demonstrates how it is used.
The code in this example will be slower than strtok, because it constantly erases characters from the beginning of the string being parsed, and also copies and returns substrings. This makes the code easy to understand, but it does not mean more efficient tokenizing is impossible. It wouldn't even be that much more complicated than this -- you would just keep track of your current position, use this as the start argument in std::string member functions, and never alter the source string. And even better techniques exist, no doubt.
To understand the example's code, start at the bottom, where main() is and where you can see how the functions are used. The top of this code is dominated by basic utility functions and dumb comments.
#include <iostream>
#include <string>
#include <utility>
namespace string_parsing {
// in-place trim whitespace off ends of a std::string
inline void trim(std::string &str) {
auto space_is_it = [] (char c) {
// A few asks:
// * Suppress criticism WRT localization concerns
// * Avoid jumping to conclusions! And seeing monsters everywhere!
// Things like...ah! Believing "thoughts" that assumptions were made
// regarding character encoding.
// * If an obvious, portable alternative exists within the C++ Standard Library,
// you will see it in 2.0, so no new defect tickets, please.
// * Go ahead and ignore the rumor that using lambdas just to get
// local function definitions is "cheap" or "dumb" or "ignorant."
// That's the latest round of FUD from...*mumble*.
return c > '\0' && c <= ' ';
};
for(auto rit = str.rbegin(); rit != str.rend(); ++rit) {
if(!space_is_it(*rit)) {
if(rit != str.rbegin()) {
str.erase(&*rit - &*str.begin() + 1);
}
for(auto fit=str.begin(); fit != str.end(); ++fit) {
if(!space_is_it(*fit)) {
if(fit != str.begin()) {
str.erase(str.begin(), fit);
}
return;
} } } }
str.clear();
}
// get_to(string, <delimiter set> [, delimiter])
// The input+output argument "string" is searched for the first occurrence of one
// from a set of delimiters. All characters to the left of, and the delimiter itself
// are deleted in-place, and the substring which was to the left of the delimiter is
// returned, with whitespace trimmed.
// <delimiter set> is forwarded to std::string::find_first_of, so its type may match
// whatever this function's overloads accept, but this is usually expressed
// as a string literal: ", \n" matches commas, spaces and linefeeds.
// The optional output argument "found_delimiter" receives the delimiter character just found.
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters, char& found_delimiter) {
const auto pos = str.find_first_of(std::forward<D>(delimiters));
if(pos == std::string::npos) {
// When none of the delimiters are present,
// clear the string and return its last value.
// This effectively makes the end of a string an
// implied delimiter.
// This behavior is convenient for parsers which
// consume chunks of a string, looping until
// the string is empty.
// Without this feature, it would be possible to
// continue looping forever, when an iteration
// leaves the string unchanged, usually caused by
// a syntax error in the source string.
// So the implied end-of-string delimiter takes
// away the caller's burden of anticipating and
// handling the range of possible errors.
found_delimiter = '\0';
std::string result;
std::swap(result, str);
trim(result);
return result;
}
found_delimiter = str[pos];
auto left = str.substr(0, pos);
trim(left);
str.erase(0, pos + 1);
return left;
}
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters) {
char discarded_delimiter;
return get_to(str, std::forward<D>(delimiters), discarded_delimiter);
}
inline std::string pad_right(const std::string& str,
std::string::size_type min_length,
char pad_char=' ')
{
if(str.length() >= min_length ) return str;
return str + std::string(min_length - str.length(), pad_char);
}
inline void tokenize(std::string source) {
std::cout << source << "\n\n";
bool quote_opened = false;
while(!source.empty()) {
// If we just encountered an open-quote, only include the quote character
// in the delimiter set, so that a quoted token may contain any of the
// other delimiters.
const char* delimiter_set = quote_opened ? "'" : ",'{}";
char delimiter;
auto token = get_to(source, delimiter_set, delimiter);
quote_opened = delimiter == '\'' && !quote_opened;
std::cout << " " << pad_right('[' + token + ']', 16)
<< " " << delimiter << '\n';
}
std::cout << '\n';
}
}
int main() {
string_parsing::tokenize("{1.5, null, 88, 'hi, {there}!'}");
}
This outputs:
{1.5, null, 88, 'hi, {there}!'}
[] {
[1.5] ,
[null] ,
[88] ,
[] '
[hi, {there}!] '
[] }
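The more efficient variant mentioned earlier, where you keep track of the current position and never alter the source string, might look roughly like this. split_any is a hypothetical name; unlike get_to it does not trim whitespace and has no quote handling.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `source` at any character in `delimiters`
// without modifying or erasing from the source string.
std::vector<std::string> split_any(const std::string& source,
                                   const std::string& delimiters) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    while (start <= source.size()) {
        // Search from the current position instead of erasing consumed text.
        std::size_t pos = source.find_first_of(delimiters, start);
        if (pos == std::string::npos) {
            pos = source.size();  // end of string acts as the final delimiter
        }
        tokens.push_back(source.substr(start, pos - start));
        start = pos + 1;          // continue just past the delimiter
    }
    return tokens;
}
```

For the original question, split_any(text, ",\n") treats both the comma and the newline as delimiters in one pass.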
I don't think that's how you should attack the problem (even if you could do it); instead:
Use what you have to read in each line
Then split up that line by the commas to get the pieces that you want.
If strtok will do the job for #2, you can always convert your string into a char array.

Parsing a csv with comma in field

I'm trying to create an object using a csv with the below data
Alonso,Fernando,21,31,29,2,Racing
Dhoni,Mahendra Singh,22,30,4,26,Cricket
Wade,Dwyane,23,29.9,18.9,11,Basketball
Anthony,Carmelo,24,29.4,21.4,8,Basketball
Klitschko,Wladimir,25,28,24,4,Boxing
Manning,Peyton,26,27.1,15.1,12,Football
Stoudemire,Amar'e,27,26.7,21.7,5,Basketball
"Earnhardt, Jr.",Dale,28,25.9,14.9,11,Racing
Howard,Dwight,29,25.5,20.5,5,Basketball
Lee,Cliff,30,25.3,25.1,0.2,Baseball
Mauer,Joe,31,24.8,23,1.8,Baseball
Cabrera,Miguel,32,24.6,22.6,2,Baseball
Greinke,Zack,33,24.5,24.4,50,Baseball
Sharapova,Maria,34,24.4,2.4,22,Tennis
Jeter,Derek,35,24.3,15.3,9,Baseball
I'm using the following code to parse it:
void AthleteDatabase::createDatabase(void)
{
ifstream inFile(INPUT_FILE.c_str());
string inputString;
if(!inFile)
{
cout << "Error opening file for input: " << INPUT_FILE << endl;
}
else
{
getline(inFile, inputString);
while(inFile)
{
istringstream s(inputString);
string field;
string athleteArray[7];
int counter = 0;
while(getline(s, field, ','))
{
athleteArray[counter] = field;
counter++;
}
string lastName = athleteArray[0];
string firstName = athleteArray[1];
int rank = atoi(athleteArray[2].c_str());
float totalEarnings = strtof(athleteArray[3].c_str(), NULL);
float salary = strtof(athleteArray[4].c_str(), NULL);
float endorsements = strtof(athleteArray[5].c_str(), NULL);
string sport = athleteArray[6];
Athlete anAthlete(lastName, firstName, rank,
totalEarnings, salary, endorsements, sport);
athleteDatabaseBST.add(anAthlete);
display(anAthlete);
getline(inFile, inputString);
}
inFile.close();
}
}
My code breaks on the line:
"Earnhardt, Jr.",Dale,28,25.9,14.9,11,Racing
obviously because of the quotes. Is there a better way to handle this? I'm still extremely new to C++ so any assistance would be greatly appreciated.
I'd recommend just using a proper CSV parser. You can find some in the answers to this earlier question, or just search for one on Google.
If you insist on rolling your own, it's probably easiest to just get down to the basics and design it as a finite state machine that processes the input one character at a time. With a one-character look-ahead, you basically need two states: "reading normal input" and "reading a quoted string". If you don't want to use look-ahead, you can do this with a couple more states, e.g. like this:
initial state: If next character is a quote, switch to state quoted field; else behave as if in state unquoted field.
unquoted field: If next character is EOF, end parsing; else, if it is a newline, start a new row and switch to initial state; else, if it is a separator (comma), start a new field in the same row and switch to initial state; else append the character to the current field and remain in state unquoted field. (Optionally, if the character is a quote, signal a parse error.)
quoted field: If next character is EOF, signal parse error; else, if it is a quote, switch to state end quote; else append the character to the current field and remain in state quoted field.
end quote: If next character is a quote, append it to the current field and return to state quoted field; else, if it is a comma or a newline or EOF, behave as if in state unquoted field; else signal parse error.
(This is for "traditional" CSV, as described e.g. in RFC 4180, where quotes in quoted fields are escaped by doubling them. Adding support for backslash-escapes, which are used in some fairly common variants of the CSV format, is left as an exercise. It requires one or two more states, depending on whether you want to support backslashes in quoted or unquoted strings or both, and whether you want to support both traditional and backslash escapes at the same time.)
In a high-level scripting language, such character-by-character iteration would be really inefficient, but since you're writing C++, all it needs to be blazing fast is some half-decent I/O buffering and a reasonably efficient string append operation.
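As a rough illustration, the four states described above could be written like this. The State enum mirrors the state names from the description; parseCsv and the Row alias are names made up for this sketch. It assumes '\n' line endings, does not treat '\r' specially, and takes the lenient option of not rejecting a quote inside an unquoted field.

```cpp
#include <string>
#include <vector>

using Row = std::vector<std::string>;

enum class State { Initial, Unquoted, Quoted, EndQuote };

// Parses CSV text into rows; returns false on a parse error.
bool parseCsv(const std::string& text, std::vector<Row>& rows) {
    State state = State::Initial;
    Row row;
    std::string field;

    for (char c : text) {
        if (state == State::Quoted) {
            if (c == '"') state = State::EndQuote;
            else field += c;          // commas and newlines are plain text here
            continue;
        }
        if (state == State::EndQuote && c == '"') {
            field += '"';             // a doubled quote is an escaped quote
            state = State::Quoted;
            continue;
        }
        if (state == State::Initial && c == '"') {
            state = State::Quoted;
            continue;
        }
        // From here on: Initial, Unquoted, or EndQuote behaving as Unquoted.
        if (c == ',') {
            row.push_back(field);
            field.clear();
            state = State::Initial;
        } else if (c == '\n') {
            row.push_back(field);
            field.clear();
            rows.push_back(row);
            row.clear();
            state = State::Initial;
        } else if (state == State::EndQuote) {
            return false;             // text directly after a closing quote
        } else {
            field += c;
            state = State::Unquoted;
        }
    }
    if (state == State::Quoted) return false;  // EOF inside a quoted field
    // Flush a final row that is not terminated by a newline.
    if (state != State::Initial || !row.empty()) {
        row.push_back(field);
        rows.push_back(row);
    }
    return true;
}
```

For the problem line from the question, the first field comes out as Earnhardt, Jr. without the surrounding quotes.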
You have to parse each line character by character, using a bool flag, and a std::string that accumulates the contents of the next field; instead of just plowing ahead to the next comma, as you did.
Initially, the bool flag is false, and you iterate over the entire line, character by character. The quote character flips the bool flag. The comma character, but only when the bool flag is false, takes the accumulated contents of the std::string, saves it as the next field on the line, and clears the std::string so that it is ready for the next field. Any other character, including a comma inside quotes, gets appended to the std::string.
This is a basic outline of the algorithm, with some minor details that you should be able to flesh out by yourself. There are a couple of other ways to do this, that are slightly more efficient, but for a beginner like yourself this kind of an approach would be the easiest to implement.
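A minimal sketch of that outline, for a single line, might be the following. splitCsvLine is a hypothetical name, and doubled quotes inside a quoted field are not handled.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line, treating commas inside double
// quotes as ordinary characters.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string buffer;       // accumulates the contents of the current field
    bool in_quotes = false;   // the bool flag; flipped by every quote character

    for (char c : line) {
        if (c == '"') {
            in_quotes = !in_quotes;
        } else if (c == ',' && !in_quotes) {
            fields.push_back(buffer);  // a comma outside quotes ends the field
            buffer.clear();
        } else {
            buffer += c;
        }
    }
    fields.push_back(buffer);  // the last field has no trailing comma
    return fields;
}
```

In the question's code this would replace the inner while(getline(s, field, ',')) loop.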
Simple answer: use a different delimiter. Everything's a lot easier to parse if you use something like '|' instead:
Stoudemire,Amar'e|27|26.7|21.7|5|Basketball
Earnhardt, Jr.|Dale|28|25.9|14.9|11|Racing
The advantage there being any other app that might need to parse your file can also do it just as cleanly.
If sticking with commas is a requirement, then you'd have to conditionally grab a field based on its first char:
std::istream& nextField(std::istringstream& s, std::string& field)
{
char c;
if (s >> c) {
if (c == '"') {
// using " as the delimiter
getline(s, field, '"');
return s >> c; // for the subsequent comma
// could potentially assert for error-checking
}
else if (c == ',') {
// handle empty field case
field = "";
}
else {
// normal case, but prepend c
getline(s, field, ',');
field = c + field;
}
}
return s;
}
Used as a substitute for where you have getline:
while (nextField(s, field)) {
athleteVec.push_back(field); // prefer vector to array
}
Could even simplify that logic a bit by just continuing to use getline if we have an unterminated quoted string:
std::istream& nextField(std::istringstream& s, std::string& field)
{
if (std::getline(s, field, ',')) {
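// A field consisting of a single " (e.g. from ",abc") is still an open quote.
while (s && field[0] == '"' && (field.size() == 1 || field[field.size() - 1] != '"')) {
std::string next;
std::getline(s, next, ',');
field += ',' + next;
}
if (field.size() >= 2 && field[0] == '"' && field[field.size() - 1] == '"') {
field = field.substr(1, field.size() - 2);
}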
}
return s;
}
I agree with Imari's answer, why re-invent the wheel? That being said, have you considered regex? I believe this answer can be used to accomplish what you want and then some.

Read File line by line using C++

I am trying to read a file line by line using the code below :
void main()
{
cout << "b";
getGrades("C:\Users\TOUCHMATE\Documents\VS projects\GradeSystem\input.txt");
}
void getGrades(string file){
string buf;
string line;
ifstream in(file);
if (in.fail())
{
cout << "Input file error !!!\n";
return;
}
while(getline(in, line))
{
cout << "read : " << buf << "\n";
}
}
For some reason it keeps printing "Input file error !!!". I have tried the full path and a relative path (just the name of the file, as it's located in the same folder as the project). What am I doing wrong?
You did not escape the backslashes in the string. Change the call to:
getGrades("C:\\Users\\TOUCHMATE\\Documents\\VS projects\\GradeSystem\\input.txt");
Otherwise every \something sequence is interpreted as an escape sequence.
As Felice said, '\' is the escape character, so you need two of them.
Alternatively, you can use the '/' character, since Windows has accepted it as a directory separator for a decade or more:
getGrades("C:/Users/TOUCHMATE/Documents/VS projects/GradeSystem/input.txt");
This has the advantage that it looks much neater.
First, to put a '\' in a string you have to write '\\'; that is the path issue.
Second, the string buf is never filled from your file: getline reads into line, but you print buf.
The backslash in C strings is used for escape sequences (e.g. \n is newline, \r is carriage return, \t is a tab, ...), so your string is getting garbled: the compiler replaces each backslash+character sequence with the character that the escape sequence stands for. To enter backslashes in a C string you have to escape them, using \\:
getGrades("C:\\Users\\TOUCHMATE\\Documents\\VS projects\\GradeSystem\\input.txt");
By the way, it's int main, not void main, and you should return an exit code (usually 0 if everything went fine).
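To see that the different spellings really describe the same path, you can compare them. The sketch below uses the path from the question; the raw string literal R"(...)" is a C++11 feature that none of the answers mention, so treat it as an extra option, and the variable names are made up for the example.

```cpp
#include <string>

// Three ways to spell the Windows path from the question in source code.
// Escaped backslashes: each \\ becomes a single backslash in the string.
const std::string escaped = "C:\\Users\\TOUCHMATE\\Documents\\VS projects\\GradeSystem\\input.txt";
// Raw string literal (C++11): backslashes are taken literally.
const std::string raw = R"(C:\Users\TOUCHMATE\Documents\VS projects\GradeSystem\input.txt)";
// Forward slashes need no escaping at all.
const std::string forward = "C:/Users/TOUCHMATE/Documents/VS projects/GradeSystem/input.txt";
```

The first two produce the same string; the third differs only in the separator character.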