"Input is provided as CSV format via STDIN" - c++

I'm working on a programming problem in C++, where I need to write a CSV parser. I've written this before for files, but the instructions state:
Input is provided as CSV format via STDIN. The first line is a header. The subsequent lines are data.
Here's an example of the input "file":
#id,time,amount
0,4,5
2,8,3
8,1,2
...
Now I'm a bit confused by this because I haven't worked with STDIN much since picking up C++ a few years ago. How exactly does one read in a csv file through STDIN? When I've used std::cin in the past, if I try to paste multiple lines, only the first line will get read.
The instructions of the programming problem did not make it clear how the input "file" will be fed in through STDIN, or perhaps there's some classical way it's done and my lack of knowledge makes me think it's unclear? Is there some standard way a CSV file is read in through STDIN?
All I am tasked to do is to process what comes through STDIN, and I'm not given how things are passed into STDIN. I feel like I need to know how things are passed in to know what I'm supposed to do? Like it could be passed in character by character, line by line, entry by entry, or the entire file at a time?
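Here is a minimal sketch of the usual approach. It assumes simple CSV (no quoted fields containing commas or newlines) and exactly one header line, as in the example. std::cin is an ordinary std::istream, so you do not need to know how the data is delivered. Calling std::getline in a loop reads line after line until end-of-file, whether the input is redirected (./prog < input.csv), piped (cat input.csv | ./prog) or pasted. A single std::cin >> x or std::getline call consumes only one token or one line, which may be why only the first line seemed to be read. When pasting interactively, signal end-of-file with Ctrl-D on Unix or Ctrl-Z followed by Enter on Windows. Row and read_rows are hypothetical names.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type matching the "#id,time,amount" header.
struct Row {
    long id;
    long time;
    long amount;
};

// Skips the header line, then parses each "id,time,amount" line into a Row.
// Assumes unquoted fields; lines with fewer than three fields are ignored,
// and a non-numeric field makes std::stol throw std::invalid_argument.
std::vector<Row> read_rows(std::istream& in) {
    std::vector<Row> rows;
    std::string line;
    if (!std::getline(in, line)) return rows;  // header (or empty input)
    while (std::getline(in, line)) {           // stops cleanly at end-of-file
        if (line.empty()) continue;            // tolerate blank lines
        std::istringstream fields(line);
        std::string id_s, time_s, amount_s;
        if (std::getline(fields, id_s, ',') && std::getline(fields, time_s, ',') &&
            std::getline(fields, amount_s, ',')) {
            // std::stol stops at the first non-digit, so a trailing '\r'
            // from Windows line endings is harmless here.
            rows.push_back({std::stol(id_s), std::stol(time_s), std::stol(amount_s)});
        }
    }
    return rows;
}

// In main(), read straight from standard input:
//   for (const Row& r : read_rows(std::cin)) { /* use r.id, r.time, r.amount */ }
```

Taking a std::istream& instead of using std::cin directly keeps the parser testable with std::istringstream. It also lets the same function read from a file later.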

Related

How to pass array of strings to a main()

I have not had any success searching for this topic. I want to pass an array of strings to a C++ console app. The closest I have found is using argv, but the number of strings is variable and may be 50, which would be ugly on the calling side.
Is it possible to pass an array, or a structure to main()? I am totally open to which way to go, I have almost no experience with interprocess communication.
The conventional approach is just STDIN, as then you can send in whatever using pipes or redirection. As in: program < input
The second option is to make your first argument a file to read this data from, as in: program input.file
There are conventions that accommodate both, like where - as a filename is presumed to mean "read STDIN", or where no filename given means read from STDIN (e.g. grep), so you can have it both ways.
If your strings contain newlines, which complicate framing, you may want to use a format like INI, JSON, or YAML to read in the data.
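The grep-style convention described above, where no filename or the filename "-" means "read STDIN", can be sketched as follows. choose_input is a hypothetical helper. The caller owns the std::ifstream so that it stays alive while the returned pointer is in use.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: no argument, or "-", means standard input;
// otherwise open the named file. Returns nullptr if the file can't be opened.
std::istream* choose_input(int argc, char* argv[], std::ifstream& file) {
    if (argc < 2 || std::string(argv[1]) == "-") {
        return &std::cin;
    }
    file.open(argv[1]);
    if (!file) {
        return nullptr;
    }
    return &file;
}

// Usage inside main(int argc, char* argv[]):
//   std::ifstream file;
//   std::istream* in = choose_input(argc, argv, file);
//   if (!in) { std::cerr << "cannot open " << argv[1] << '\n'; return 1; }
//   for (std::string s; std::getline(*in, s);) { /* one string per line */ }
```

With this, both program < input and program input.file feed the same loop. Reading one string per line also avoids the 50-argument command line.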

How to read .inp file in c++?

I have a dataset in a ".inp" format file, and I need to read this file in C++. However, the fopen()/fread() approach seemed to fail and read the wrong data (e.g. the first integer should be 262144, but fread yields a much larger integer).
To be more specific, my ".inp" file contains a few integers and floating-point numbers. How can I read them successfully in C++?
This is the screenshot of the "*.inp" file from Notepad++. Basically this is a text file.
I solved it by copying the data into a .txt file. However, I still don't know how to read "*.inp" files directly.
I found some info about the INP file extension. It seems there are multiple variants of it, each meant for a different purpose. Where is your file coming from? As for a solution, if you can't open the file normally using fopen/fstream, you could treat it as binary and read each value in the way you specify. Other than that, I could think of calling system functions to get the file contents (like cat on Linux, for example); then, if there are some random characters, you could parse your string to omit them.
Here is an example of how to call cat from C++:
Simple way to call 'cat' from c++?
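The screenshot shows a plain text file. A likely explanation (an assumption, not confirmed in the thread) is that fread() copies raw bytes, so the characters "2621..." are reinterpreted as a binary int, which gives a much larger value. A minimal sketch that parses the text with formatted extraction (>>) instead, assuming whitespace-separated numbers. read_numbers and data.inp are hypothetical names.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <vector>

// Assumes the file is plain text with numbers separated by whitespace.
// operator>> parses the digits as text, unlike fread(), which copies raw bytes.
std::vector<double> read_numbers(std::istream& in) {
    std::vector<double> values;
    double v;
    while (in >> v) {  // stops at end-of-file or at the first non-numeric token
        values.push_back(v);
    }
    return values;
}

// Usage (hypothetical file name):
//   std::ifstream f("data.inp");
//   std::vector<double> nums = read_numbers(f);
// Any std::istream works, e.g. std::istringstream("262144 1.5") for a quick test.
```

Reading every value as double keeps the sketch independent of the file's layout. If the layout is known (say, an integer count followed by floats), extract each field into a variable of the right type instead.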

Retrieving file from .dat via getline() w/ c++

I posted this over at Code Review Beta but noticed that there is much less activity there.
I have the following code and it works just fine. Its function is to grab the input from a file and display it (to confirm that it's been grabbed). My task is to write a program that counts how many times a certain word (string), "abc", is found in the input file.
Is it better to store the input as a single string, or in arrays/vectors with each line stored separately (a[1], a[2], etc.)? Perhaps someone could also point me to a resource that I can use to learn how to filter through the input data.
Thanks.
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
    ifstream input_file("in.dat");
    string STRING;
    while (getline(input_file, STRING)) // Reads lines until end of file; testing eof() first runs one iteration too many.
        cout << STRING << '\n';         // Prints each line on its own line.
    input_file.close();
}
Reading large chunks of the file into memory is generally more efficient than reading one letter or one text line at a time. Disk drives take a lot of time to spin up and seek to a sector, so your program will run faster if you minimize the number of reads from the file.
Memory is fast to search.
My recommendation is to read the entire file, or as much of it as you can, into memory, then search the memory for a "word". Remember that in English, words can contain hyphens ('-') and single quotes ("don't"). Word recognition may become more difficult if a word is split across a line or if you include abbreviations (with periods).
Good luck.
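Here is a minimal sketch of the read-everything-then-search approach. It assumes a plain substring count is acceptable, so "abc" inside "xabcx" also counts; the word-boundary rules mentioned above would need extra checks. slurp and count_occurrences are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Reads everything left in the stream into one string.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Counts non-overlapping occurrences of `needle` in `text`.
std::size_t count_occurrences(const std::string& text, const std::string& needle) {
    if (needle.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(needle); pos != std::string::npos;
         pos = text.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}

// Usage:
//   std::ifstream input_file("in.dat");
//   std::size_t n = count_occurrences(slurp(input_file), "abc");
```

Searching one string also handles a match that a line-by-line search could miss if the data were split across reads. It answers the "string vs. vector of lines" question in favour of a single string for this task.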

getline() text with UNIX formatting characters

I am writing a C++ program which reads lines of text from a .txt file. Unfortunately the text file is generated by a twenty-something year old UNIX program and it contains a lot of bizarre formatting characters.
The first few lines of the file are plain, English text and these are read with no problems. However, whenever a line contains one or more of these strange characters mixed in with the text, that entire line is read as characters and the data is lost.
The really confusing part is that if I manually delete the first couple of lines so that the very first character in the file is one of these unusual characters, then everything in the file is read perfectly. The unusual characters just display as little ASCII squiggles (arrows, smiley faces, etc.), which is fine. It seems as though a decision is being made automatically, without my knowledge or consent, based on the first line read.
Based on some googling, I suspected that the issue might be the locale, but according to the Visual Studio debugger, the locale property of the ifstream object is "C" in both scenarios.
The code which reads the data is as follows:
//Function to open file at location specified by inFilePath, load and process data
int OpenFile(const char* inFilePath)
{
    string line;
    ifstream codeFile;
    //open text file
    codeFile.open(inFilePath, ios::in);
    //read file line by line; testing getline() itself stops cleanly at end of file
    while (getline(codeFile, line))
    {
        //check non-zero length
        if (!line.empty())
            ProcessLine(&line[0]);
    }
    //close file
    codeFile.close();
    return 1;
}
If anyone has any suggestions as to what might be going on or how to fix it, they would be very welcome.
From reading about your issues it sounds like you are reading in binary data, which will cause getline() to throw out content or simply skip over the line.
You have a couple of choices:
1. If you simply need lines from the data file, you can first sanitise them by removing all non-printable characters (that is the "official" name for those weird ASCII characters). On UNIX, a tool such as strings would help you with that process.
2. You can of course also do this programmatically in your code by reading in X amount of data, storing it in a string, and then removing the characters that fall outside the standard printable ASCII range. This will most likely cause you to lose any Unicode text that may be stored in the file.
3. Change your program to understand the format; basically, write a parser that lets you parse the document in a more sane way.
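Option 2 might look roughly like the following sketch. It assumes the default "C" locale and keeps tab and newline so the line structure survives. strip_non_printable is a hypothetical name. Bytes of 0x80 and above are dropped as well, which is where any Unicode (e.g. UTF-8) text would be lost.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Removes every byte that is not printable ASCII, except '\n' and '\t'.
// The lambda takes unsigned char so std::isprint never sees a negative value.
std::string strip_non_printable(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) {
                               return !(std::isprint(c) || c == '\n' || c == '\t');
                           }),
            s.end());
    return s;
}
```

Apply it to each chunk returned by read() before splitting into lines, so stray control bytes never reach getline-style processing.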
If you can, I would suggest trying solution number 1, simply to see whether the results are sane and can still be used. You mention that this is medical data; do you perchance know what file format it is? If you are trying to find out and have access to a UNIX/Linux machine, you can use the file utility, which may give you a clue (worst case, it will tell you it is simply data).
If possible, try getting a "clean" file whose hex dump you can post, so that we can provide better help than we currently can. By clean I mean that there is no personally identifying information in the file.
For number 2, open the file in binary mode. You mentioned using Windows, where binary and text-mode files in std::fstream objects are handled differently (text mode translates "\r\n" line endings to "\n"), whereas on UNIX systems there is no difference (on most systems, at least; I'm sure I'll get a comment regarding the one system that doesn't match this description).
codeFile.open(inFilePath,ios::in);
would become
codeFile.open(inFilePath, ios::in | ios::binary);
Instead of getline() you will want to become intimately familiar with .read() which will allow unformatted operations on the ifstream.
Reading will be like this:
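// This code has not been tested!
char input[1024];
// read() fails on the final partial block, so also check gcount() to process it.
while (codeFile.read(input, sizeof input) || codeFile.gcount() > 0) {
    std::streamsize actual_read = codeFile.gcount();
    // Here you can process input, up to a maximum of actual_read characters.
    // ProcessLine() // We didn't necessarily read a line!
    ProcessData(input, actual_read);
}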
The other option, as mentioned, is to change the locale for the current stream and change the separator it considers a newline. Maybe this will fix your issue without requiring the unformatted operations: imbue the stream with a new locale that only knows about the newline. This method may or may not let your getline() work without issues.

Reading a file without reading the whole thing into memory

I am trying to read an extremely large text file. I want to write a program (C++) to read it line by line until I reach a certain set of characters, then begin to write the following text into a string until it reaches another set of characters.
It is an XML file, so I'm looking at
<flag>info</flag>
I need my program to read the file until it reaches <flag>, put "info" into a string, and note that </flag> is the point to stop putting stuff into the string. What tools could I use that can actually read the file? As for detecting the <flag>, I can do that.
Use an XML SAX parser such as Xerces; they will allow you to parse the XML file in a streaming fashion, so you don't need to load it into memory all at once. Reading line-by-line will not give you correct results on general XML files.
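For comparison, here is a naive sketch of what the question literally asks for. It streams through the file one character at a time (ifstream buffers the underlying reads) and reports the text between each literal <flag> and </flag>. It is not an XML parser: it ignores comments, CDATA, attributes and entities, so the advice to use a SAX parser still applies to general XML. for_each_between is a hypothetical name.

```cpp
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Naive streaming scan that matches the literal strings `open` and `close`.
// Only a small window and the current value are kept in memory.
void for_each_between(std::istream& in, const std::string& open,
                      const std::string& close,
                      const std::function<void(const std::string&)>& on_value) {
    std::string window;   // the last open.size() characters, to spot `open`
    std::string current;  // text collected since the last `open`
    bool inside = false;
    char c;
    while (in.get(c)) {
        if (!inside) {
            window += c;
            if (window.size() > open.size()) window.erase(0, 1);
            if (window == open) {
                inside = true;
                window.clear();
            }
            continue;
        }
        current += c;
        // When `current` ends with `close`, strip the tag and report the value.
        if (current.size() >= close.size() &&
            current.compare(current.size() - close.size(), close.size(), close) == 0) {
            current.erase(current.size() - close.size());
            on_value(current);
            current.clear();
            inside = false;
        }
    }
}

// Usage (hypothetical file name; std::istringstream also works for testing):
//   std::ifstream xml("big.xml");
//   for_each_between(xml, "<flag>", "</flag>",
//                    [](const std::string& v) { /* use v */ });
```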