C++ reading CSVs

I'm having a bit of trouble reading CSVs. I have multiple types of data, so I am not sure how to get this to work:
string, string, bool, bool, int
I can't simply use >> to read in the data since the delimiter is not whitespace. scanf doesn't work, since it needs human input, not file input. getline only reads in strings and also includes the \n char for some reason.
How can I read my CSV properly?

You CAN use getline. There's an overload whose third argument is a char to use as the delimiter. Just put the calls in a loop.
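A minimal sketch of that loop, assuming the five-column layout from the question (string, string, bool, bool, int), booleans written as 0 or 1, and no quoted fields or embedded commas. The names Record, parseRecord and readCsv are made up for illustration.

```cpp
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type matching the layout: string, string, bool, bool, int.
struct Record {
    std::string first;
    std::string second;
    bool flagA = false;
    bool flagB = false;
    int count = 0;
};

// Parses one line such as "alice,bob,1,0,42".
// Returns false if a field is missing or the last field is not an integer.
bool parseRecord(const std::string& line, Record& out) {
    std::istringstream fields(line);
    std::string flagA, flagB, count;
    // The three-argument std::getline stops at ',' instead of '\n'.
    if (!std::getline(fields, out.first, ',')) return false;
    if (!std::getline(fields, out.second, ',')) return false;
    if (!std::getline(fields, flagA, ',')) return false;
    if (!std::getline(fields, flagB, ',')) return false;
    if (!std::getline(fields, count)) return false;  // rest of the line
    out.flagA = (flagA == "1");  // assumption: anything but "1" means false
    out.flagB = (flagB == "1");
    try {
        std::size_t used = 0;
        out.count = std::stoi(count, &used);
        return used == count.size();  // reject trailing junk like "42x"
    } catch (const std::exception&) {
        return false;  // not a number, or out of range for int
    }
}

// Reads every well-formed line from the stream and skips the rest.
std::vector<Record> readCsv(std::istream& in) {
    std::vector<Record> records;
    std::string line;
    while (std::getline(in, line)) {  // std::getline drops the '\n'
        Record r;
        if (parseRecord(line, r)) records.push_back(r);
    }
    return records;
}
```

To read a file, open a std::ifstream on it and pass it to readCsv. Reading whole lines first and splitting them afterwards keeps a malformed row from desynchronizing the rows that follow it.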

Another option (which isn't typically recommended for C++, though), is fscanf. You're right that scanf is no good for you, but fscanf is its file-based equivalent.
Another canonical solution typically employed in C, but which isn't so strongly recommended in C++, is to go ahead and use getline, and then use strtok or a simple parser to parse each line.

Related

The best method for filling char array (gets vs cin.getline)

I'm using C++11. I'm wondering if there are any advantages to using cin.getline() compared to gets().
I need to fill a char array.
Also, should I use fgets or getline for files?
I'm wondering if there are any advantages to using cin.getline() compared to gets().
I am assuming you really mean gets, not fgets.
Yes, there definitely is. gets is known to be a security problem. cin.getline() does not suffer from that problem.
It's worth comparing fgets and cin.getline.
The only difference that I see is that fgets will include the newline character in the output while cin.getline won't.
Most of the time, the newline character is ignored by application code. Hence, it is better to use cin.getline() or istream::getline() in general. If presence of the newline character in the output is important to you for some reason, you should use fgets.
Another reason to prefer istream::getline is that you can specify a character for the delimiter. If you need to parse a comma separated values (CSV) file, you can use:
std::ifstream fstr("some file name.csv");
fstr.getline(data, data_size, ',');
Of course.
First of all, gets doesn't check the length of the input, so if the input is longer than the char array, you get a buffer overflow.
cin.getline, on the other hand, lets you specify the size of the buffer.
That said, the consensus among C++ programmers is that you should avoid raw arrays anyway.

scanf on an istream object

NOTE: I saw the post "What is the cin analougus of scanf formatted input?" before asking this question, and it doesn't solve my problem. That post asks for a C++ way to do it, but as I mentioned already, sticking to the C++ way is sometimes inconvenient, and I have clear examples of that.
I am trying to read data from an istream object, and sometimes it is inconvenient to just use C++-style ways such as operator>>, e.g. the data are in special form 123:456 so you have to imbue to make ':' as space (which is very hacky, as opposed to %d:%d in scanf), or 00123 where you want to read as string and convert decimal instead of octal (as opposed to %d in scanf), and possibly many other cases.
The reason I chose istream as the interface is that it can be derived from and is therefore more flexible. For example, we can create in-memory streams, or customized streams that are generated on the fly, etc. C-style FILE*, on the other hand, is very limited, at least in a standard-compliant way, when it comes to creating customized streams.
So my question is: is there a way to do scanf-like data extraction on an istream object? I think fscanf internally reads character by character from FILE* using fgetc, and istream provides such an interface too. So it would be possible to copy the code of fscanf and replace the FILE* with the istream object, but that's very hacky. Is there a smarter and cleaner way, or is there some existing work on this?
Thanks.
You should never, under any circumstances, use scanf or its relatives for anything, for three reasons:
Many format strings, including for instance all the simple uses of %s, are just as dangerous as gets.
It is almost impossible to recover from malformed input, because scanf does not tell you how far in characters into the input it got when it hit something unexpected.
Numeric overflow triggers undefined behavior: yes, that means scanf is allowed to crash the entire program if a numeric field in the input has too many digits.
Prior to C++11, the C++ specification defined istream formatted input of numbers in terms of scanf, which means that last objection is very likely to apply to them as well! (In C++11 the specification is changed to use strto* instead and to do something predictable if that detects overflow.)
What you should do instead is: read entire lines of input into std::string objects with getline, hand-code logic to split them up into fields (I don't remember off the top of my head what the C++-string equivalent of strsep is, but I'm sure it exists) and then convert numeric strings to machine numbers with the strtol/strtod family of functions.
I cannot emphasize this enough: THE ONLY 100% RELIABLE WAY TO CONVERT STRINGS TO NUMBERS IN C OR C++, unless you are lucky enough to have a C++ runtime that is already C++11-conformant in this regard, IS WITH THE strto* FUNCTIONS, and you must use them correctly:
errno = 0;
result = strtoX(s, &ends, 10); // omit 10 for floats
if (s == ends || *ends || errno)
parse_error();
(The OpenBSD manpages, linked above, explain why you have to do this fairly convoluted thing.)
(If you're clever, you can use ends and some manual logic to skip that colon, instead of strsep.)
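To make the recipe above concrete, here is a small sketch that splits the question's 123:456 format on the colon and converts each half with strtol, checking for empty input, trailing junk and overflow. The function names toLong and parsePair are illustrative, not part of any library.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Converts the whole of text to a long in base 10.
// Returns false on empty input, trailing junk, or overflow.
bool toLong(const std::string& text, long& out) {
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;  // strtol only sets errno on failure, so reset it first
    long value = std::strtol(begin, &end, 10);
    if (begin == end || *end != '\0' || errno == ERANGE) return false;
    out = value;
    return true;
}

// Parses input of the form "123:456" into two numbers.
bool parsePair(const std::string& line, long& a, long& b) {
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) return false;
    return toLong(line.substr(0, colon), a) &&
           toLong(line.substr(colon + 1), b);
}
```

Because the base is passed explicitly as 10, an input such as 00123 is read as the decimal number 123 rather than as octal.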
I do not recommend mixing C++ input/output and C input/output. Not that they are really incompatible, but they can interoperate incorrectly.
For example Oracle docs recommend not to mix it http://www.oracle.com/technetwork/articles/servers-storage-dev/mixingcandcpluspluscode-305840.html
But no one stops you from reading data into the buffer and parsing it with standard c functions like sscanf.
...
string curString;
int a, b;
...
std::getline(inputStream, curString);
int sscanfResult = sscanf(curString.c_str(), "%d:%d", &a, &b);
if (2 != sscanfResult)
throw "error";
...
But that won't help when your stream is just one long contiguous sequence of symbols (like a string turned into a memory stream).
Writing your own fscanf from scratch, or porting the original CRT function, actually isn't the worst possible idea. Just make sure you test it thoroughly (low-level custom char manipulation has always been a source of pain in C).
I've never really tried Boost.Spirit, and that kind of parsing infrastructure could be overkill for your project. But Boost libraries are usually well tested and well designed, so you could at least give it a try.
Based on @tmyklebu's comment, I implemented streamScanf, which wraps an istream as a FILE* via fopencookie: https://github.com/likan999/codejam/blob/master/Common/StreamScanf.cpp

C++ tokenization

I am writing a lexer in C++ and I am reading from a file character by character. How do you do tokenization in this case? I can't use strtok since I have a single character, not a string. Do I somehow need to keep reading until I reach a delimiter?
The answer is Yes. You need to keep reading until you hit a delimiter.
There are multiple solutions.
The simplest thing to do is exactly that: keep a buffer (std::string) of the characters you already read until you reach a delimiter. At that point, you build a token from the accumulated characters in the buffer, clear the buffer, and push the delimiter (if necessary) in the buffer.
Another solution would be to read ahead of the time: ie, pick up the entire line with std::getline (for example), and then check what's on this line. In general the end-of-line is a natural token delimiter.
This works well... when delimiters are easy.
Unfortunately some languages, like C++, have awkward grammars. For example, in C++ >> can be either:
the operator >> (for right-shift and stream extraction)
the end of two nested templates (ie could be rewritten as > >)
In those cases... well, just don't bother with the difference in the tokenizer, and let your AST building pass disambiguate, it's got more information.
Based on the information you provided:
If you want to read up to a delimiter from a file, use the getline(char *, int, char) member function.
getline() reads up to n - 1 characters or up to a delimiter, whichever comes first.
Example:
#include <fstream>
#include <iostream>
using namespace std;
int main()
{
    ifstream f("test.cpp");
    char c[2]; // room for 1 character plus the terminating '\0'
    f.getline(c, 2, ' ');
    cout << c; // up to 1 char or until a space
}

Is there any way to read characters that satisfy certain conditions only from stdin in C++?

I am trying to read only the characters that satisfy a certain condition from stdin with the iostream library, while leaving those that don't satisfy the condition in stdin so that the skipped characters can be read later. Is it possible?
For example, I want characters in a-c only and the input stream is abdddcxa.
First, read in all characters in a-c: abca. After that read has finished, read the remaining characters: dddx. (These two reads can't happen simultaneously. They might be in two different functions.)
Wouldn't it be simpler to read everything, then split the input into the two parts you need and finally send each part to the function that needs to process it?
Keeping the data in the stdin buffer is akin to using globals, it makes your program harder to understand and leaves the risk of other code (or the user) changing what is in the buffer while you process it.
On the other hand, dividing your program into "the part that reads the data", "the part that parses the data and divides the workload" and the "part that does the work" makes for a better structured program which is easy to understand and test.
You can probably use regex to do the actual split.
What you're asking for is the putback method (for more details see: http://www.cplusplus.com/reference/istream/istream/putback/). You would have to read everything, filter the part that you don't want to keep out, and put it back into the stream. So for instance:
cin >> myString;
// Do stuff to fill putbackBuf[] with characters in reverse order to be put back
pPutbackBuf = &putbackBuf[0];
do {
    cin.putback(*(pPutbackBuf++));
} while (*pPutbackBuf);
Another solution (which is not exactly what you're asking for) would be to split the input into two strings and then feed the "non-inputted" string into a stringstream and pass that to whatever function needs to do something with the rest of the characters.
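A sketch of that split-then-stringstream idea, using the a-c example from the question. The names splitByRange and readWord are hypothetical; readWord merely stands in for whatever function later consumes the leftover characters.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Splits input into (characters in 'a'..'c', everything else),
// preserving the original order within each part.
std::pair<std::string, std::string> splitByRange(const std::string& input) {
    std::string wanted, rest;
    for (char ch : input) {
        if (ch >= 'a' && ch <= 'c') wanted += ch;
        else rest += ch;
    }
    return {wanted, rest};
}

// Stand-in for the code that reads the remaining characters later on.
std::string readWord(std::istream& in) {
    std::string word;
    in >> word;
    return word;
}
```

After auto parts = splitByRange(line), the first function works on parts.first, and the second can be handed std::istringstream later(parts.second) as if it were reading from stdin.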
What you want to do is not possible in general; ungetc and putback exist, but they're not guaranteed to work for more than one character. They don't actually change stdin; they just push back on an input buffer.
What you could do instead is to explicitly keep a buffer of your own, by reading the input into a string and processing that string. Streams don't let you safely rewind in many cases, though.
No, random access is not possible for streams (except for fstream and stringstream). You will have to read in the whole line/input and process the resulting string (which you could, however, do using iostreams/std::stringstream if you think it is the best tool for that; I don't think so, but iostreams gurus may differ).

gets (variable)

Can anyone tell me why gets(abc) works with char[] but not with int?
int abc;
char name[] = "lolrofl";
printf("Hello %s.\n",name);
printf("\n >> ");
fflush(stdin);
gets (abc);
printf("\n die zahl ist %i.\n",abc);
system("Pause");
return(0);
The prototype for gets() is:
char* gets(char *s);
Note that the function DOES NOT read just a single character and place it in s; it actually reads an entire string into s. However, since gets() does not provide a way of specifying the maximum number of characters to read, this can actually read more characters into s than there are bytes allocated for s. Thus, this function is a serious buffer overflow vulnerability, and you should not use this function, ever. There are alternative, safer functions which allow you to read input from the user such as fgets() and getc().
If you are using C++, then using the C++ I/O Stream Library (std::cin, std::cout, std::ostream, std::istream, std::fstream, etc.) is a far better way to perform input/output than using these other functions.
The function gets() is so dangerous, in fact, that in my development and coding custom search engine, I have taken out a promotion on gets and several other such functions warning not to use it!
Because it only reads characters. Use scanf() for formatted reading.
By the way, since you appear to be using C++ (or at least your choice of tags says so), perhaps you should try std::cin/std::cout.
If you take a look at the C Reference your question will be answered. I'll paste it for you:
char *gets( char *str );
The gets() function reads characters from stdin and loads them into str, until a newline or EOF is reached. The newline character is translated into a null termination. The return value of gets() is the read-in string, or NULL if there is an error. Note that gets() does not perform bounds checking, and thus risks overrunning str. For a similar (and safer) function that includes bounds checking, see fgets().
So you won't be able to cast a whole string to an integer.
First, the gets function is for reading strings or text, not numbers.
Second, don't use gets as it has buffer overrun errors. See C Language FAQ for more information. The function fgets is a safer alternative.
Third, you may want to switch to C++ streams and std::string. The C++ streams are more type friendly than C streams.
Fourth, fflush does not function on input streams. The fflush function is for writing the remaining data in stream buffers to the output stream. In C++, there is a method, ignore, which will ignore incoming characters until a newline (default) or a specified character is read (or a limit is reached).
Hope that helps.