Reading a single character from an fstream?

I'm trying to move from stdio to iostream, which is proving very difficult. I've got the basics of opening and closing a file, but I really don't have a clue yet as to what a stream even is, or how streams work.
In stdio everything is relatively easy and straightforward compared to this. What I need to be able to do is:
Read a single character from a text file.
Call a function based on what that character is.
Repeat till I've read all the characters in the file.
What I have so far is.. not much:
int main()
{
std::ifstream("sometextfile.txt", std::ios::in);
// this is SUPPOSED to be the while loop for reading. I got here and realized I have
//no idea how to even read a file
while()
{
}
return 0;
}
What I need to know is how to get a single character and how that character is actually stored. (Is it a string? An int? A char? Can I decide for myself how to store it?)
Once I know that I think I can handle the rest. I'll store the character in an appropriate container, then use a switch to do things based on what that character actually is. It'd look something like this.
int main()
{
std::ifstream textFile("sometextfile.txt", std::ios::in);
while(..able to read?)
{
char/int/string readItem;
//this is where the fstream would get the character and I assume stick it into readItem?
switch(readItem)
{
case 1:
//dosomething
break;
case ' ':
//dosomething etc etc
break;
case '\n':
}
}
return 0;
}
Notice that I need to be able to check for white space and new lines, hopefully it's possible. It would also be handy if instead of one generic container I could store numbers in an int and chars in a char. I can work around it if not though.
Thanks to anyone who can explain to me how streams work and what all is possible with them.

You also can abstract away the whole idea of getting a single character with streambuf_iterators, if you want to use any algorithms:
#include <iterator>
#include <fstream>
int main(){
typedef std::istreambuf_iterator<char> buf_iter;
std::fstream file("name");
for(buf_iter i(file), e; i != e; ++i){
char c = *i;
}
}

You can also use the standard for_each algorithm:
#include <iterator>
#include <algorithm>
#include <fstream>
void handleChar(const char& c)
{
switch (c) {
case 'a': // do something
break;
case 'b': // do something else
break;
// etc.
}
}
int main()
{
std::ifstream file("file.txt");
if (file)
std::for_each(std::istream_iterator<char>(file),
std::istream_iterator<char>(),
handleChar);
else {
// couldn't open the file
}
}
istream_iterator skips whitespace characters. If those are meaningful in your file use istreambuf_iterator instead.

This has already been answered but whatever.
You can use the comma operator to create a loop that behaves like a for-each loop: it goes through the entire file, reads every character one by one, and stops when it's done.
char c;
while((file.get(c), file.eof()) == false){
/*Your switch statement with c*/
}
Explanation:
The first part of the while condition, (file.get(c), file.eof()),
works as follows. First file.get(c) is executed, which reads a character and stores it in c. Then, due to the comma operator, its return value is discarded and file.eof() is executed, which returns a bool indicating whether the end of the file has been reached. This value is then compared with false. Note that eof() only detects the end of the file: if the read fails for any other reason (for example, because the file was never opened), the loop does not terminate, so while(file.get(c)) is the more robust form.
Side Note:
ifstream::get() always reads the next character, which means that calling it twice reads the first two characters in the file.

fstream::get
Next time you have a similar problem, go to cplusplusreference or a similar site, locate the class you have a problem with and read the description of every method. Normally, this solves the problem. Googling also works.

I would honestly just avoid iterators here since it's just hurting readability. Instead, consider:
int main()
{
std::ifstream file("sometextfile.txt");
char c;
while(file >> c) {
// do something with c
}
// file reached EOF
return 0;
}
This works because the stream implements operator bool, which yields true as long as no read has failed and false once one has (at EOF, for example); and because file >> c returns the stream itself, it can be used as the while condition. Note that operator>> skips whitespace by default, so spaces and newlines never reach c unless you set std::noskipws or read with get() instead.
Using an iterator is only really useful if you intend to use other functions from <algorithm>, but for plain reading, using the stream operator is simpler and easier to read.

char a;
while (textFile.get(a)) { // the loop ends as soon as a read fails, so a is always valid here
switch(a)
{
case 1:
//dosomething
break;
case ' ':
//dosomething etc etc
break;
case '\n':
break;
}
}

Related

How to read double digits and single digits in C++

I have an issue where I cannot get my C++ program to read double digit integers.
My idea is to read it as string and then somehow parse it into separate integers and insert them into an array, but I am stuck on getting the code to read digits properly.
Sample Output:
i: 0 codeColumn 0
i: 1 codeColumn 1
i: 2 codeColumn 0 0
i: 3 codeColumn 0
i: 4 codeColumn 31 0
i: 5 codeColumn 1
i: 6 codeColumn 43 0
i: 7 codeColumn 3
i: 8 codeColumn 9 0
So the file is basically a line of triplets delimited by a comma:
0,1,0 0,0,31 0,0,18 0,0,8 0,11,0
My question is how do you get the trailing zeroes (see above) to move to a new line? I tried using "char" and a bunch of if statements to concatenate the single digits into double digits, but I feel like that's not really efficient or ideal. Any ideas?
My code:
#include <iostream> // Basic I/O
#include <string> // string classes
#include <fstream> // file stream classes
#include <sstream>
#include <vector>
using namespace std;
int main()
{
ifstream fCode;
fCode.open("code.txt");
string codeLine; // receives each comma-delimited token
vector<string> codeColumn;
while (getline(fCode, codeLine, ',')) {
codeColumn.push_back(codeLine);
}
for (size_t i = 0; i < codeColumn.size(); ++i) {
cout << " i: " << i << " codeColumn " << codeColumn[i] << endl;
}
fCode.close();
}

getline(fCode, codeLine, ',')
is going to read between commas, so 0,1,0 0,0,31 will split up exactly as you have seen.
0,1,0 0,0,31
 ^ ^   ^ ^
The tokens collected are everything between the ^s.
You have two delimiters to take into account: comma and space. The easiest way to handle the space is with dumb old >>.
std::string triplet;
while (fCode >> triplet)
{
// do stuff with triplet. Maybe something like
std::istringstream strm(triplet); // make a stream out of the triplet
int a;
int b;
int c;
char sep1;
char sep2;
if (strm >> a >> sep1 >> b >> sep2 >> c // read all the tokens we want from triplet
&& sep1 == ',' && sep2 == ',') // and both separators are commas. Triplet is valid
{
// do something with a, b, and c
}
}
Documentation for std::istringstream.
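Putting the pieces together, a self-contained sketch of this approach could look like the following. The function name readTriplets and the choice to silently skip malformed tokens are assumptions made for the example, not something the question specifies:

```cpp
#include <array>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated "a,b,c" triplets; malformed tokens are skipped.
std::vector<std::array<int, 3>> readTriplets(std::istream& in) {
    std::vector<std::array<int, 3>> result;
    std::string token;
    while (in >> token) {                 // >> splits on whitespace
        std::istringstream strm(token);   // parse one triplet on its own
        int a, b, c;
        char sep1, sep2;
        if (strm >> a >> sep1 >> b >> sep2 >> c && sep1 == ',' && sep2 == ',') {
            result.push_back(std::array<int, 3>{{a, b, c}});
        }
    }
    return result;
}
```

With the sample line 0,1,0 0,0,31 0,0,18 0,0,8 0,11,0 this should yield five triplets, with 31 and 11 read as whole numbers rather than as separate digits.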

So, I will show you 3 solutions: first easy-to-understand C-style code, then more modern C++ code using the standard library's algorithms and iterators, and finally an object-oriented C++ solution.
I will also explain why std::getline can be, but should not be, used for splitting strings into tokens.
I saw from your question that you had difficulty understanding that, and I understand your concern.
But let's start with an easy solution. I will show the code and then explain it to you:
#include <iostream>
#include <fstream>
#include <string>
int main() {
// Open the source text file, and check, if there was no failure
if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {
size_t tripletCounter{ 0 };
// Now, read all triplets from the file in a simple for loop
for (std::string triplet{}; fCode >> triplet; ) {
// Prepare output
std::cout << "\ni:\t" << tripletCounter++ << "\tcodeColumn:\t";
// Go through the triplet, search for comma, then output the parts
for (size_t i{ 0U }, startpos{ 0U }; i <= triplet.size(); ++i) {
// So, if there is a comma or the end of the string
if ((triplet[i] == ',') || (i == (triplet.size()))) {
// Print substring
std::cout << (triplet.substr(startpos, i - startpos)) << ' ';
startpos = i + 1;
}
}
}
}
else {
std::cerr << "\n*** Error, Could not open source file\n";
}
return 0;
}
You see, we need just a few lines of easy-to-understand code that fulfill your requirements and produce the desired output.
Some features that may be new to you:
The if statement with initializer. This is available since C++17. In addition to the condition, you can define a variable and initialize it. So, in
if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {
we first define a variable named "fCode" of type std::ifstream. We use the uniform initializer "{}" to initialize it with the input file name.
This calls the constructor for the variable "fCode" and opens the file (this is what this constructor does). After the closing "}" of the if statement, the variable "fCode" falls out of scope and the destructor of the std::ifstream is called. This closes the file automatically.
This type of if statement was introduced to help prevent namespace pollution. The variable should only be visible in the scope where it is used. Without it, you would have to define the std::ifstream outside (before) the if, where it would be visible to the outer context, and the file would be closed at a very late time. So, please get acquainted with it.
Next we define a "tripletCounter". It is needed only for the output; there is no other usage.
Then comes a for loop that uses the same idea of an initializer. We first define an empty std::string "triplet" and then use the extraction operator to read text up to the next white space. This is how the extractor (>>) works. We use the whole expression as the condition, to check whether the extraction worked or whether we hit the end of file (or some other error). This works because the extraction operator returns the stream it was working on, so a reference to "fCode", and the stream has an overloaded bool conversion operator (and operator !) that reports the state of the stream.
You should always check, for every IO operation, whether it worked or not.
So, next we split the triplet (e.g. "0,1,0") into its substrings with a very easy for loop. We go through all characters in the string and check whether the current character is a comma or the end of the string. In that case, we output the characters before the delimiter.
Very simple and easy to understand. std::getline is not needed here.
So, next solution, more advanced:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
std::regex re(",");
int main() {
// Open the source text file, and check, if there was no failure
if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {
size_t tripletCounter{ 0 };
// Now, read all triplets from the file into a vector
std::vector triplets(std::istream_iterator<std::string>(fCode), {});
// Next, go through all triplets
for (const std::string &triplet : triplets) {
// Prepare output
std::cout << "\ni:\t" << tripletCounter++ << "\tcodeColumn:\t";
// Split triplet into code column. All codes are in vector codeColums
std::vector codeColumns(std::sregex_token_iterator(triplet.begin(), triplet.end(), re, -1), {});
//Show codes
for (const std::string& code : codeColumns) std::cout << code << ' ';
}
}
else {
std::cerr << "\n*** Error, Could not open source file\n";
}
return 0;
}
The beginning is the same. But then:
// Now, read all triplets from the file into a vector
std::vector triplets(std::istream_iterator<std::string>(fCode), {});
Uh-oh, what's that? Let's start with the std::istream_iterator. If you read its description, you will find that it basically calls the extraction operator >> for the specified type. And since it is an iterator, it calls it again and again, each time the iterator is incremented. OK, understandable, but then:
We define the variable triplets as a std::vector and call its constructor with 2 arguments. That constructor is the so-called range constructor of std::vector. Please see the description for constructor 5. Aha, it gets a "begin()" iterator and an "end()" iterator. But what is this strange {} instead of the "end()" iterator? This is the default initializer. And if we look at the description of std::istream_iterator, we can see that the default-constructed iterator is the end iterator. OK, understood.
I assume that you know about the range-based for, which comes next. Good. But now we come to the most difficult point: splitting a string with delimiters. People are using std::getline. But why? Why are people doing such strange stuff?
What do people expect from the function, when they read
getline ?
Most people would say, Hm, I guess it will read a complete line from somewhere. And guess what, that was the basic intention for this function. Read a line from a stream and put it into a string.
As its documentation shows, std::getline has some additional functionality: it accepts an optional delimiter character.
And this has led to major misuse of this function for splitting std::strings into tokens.
Splitting strings into tokens is a very old task. In very early C there was the function strtok, which still exists, even in C++. Please see std::strtok.
But because of the additional functionality of std::getline, it has been heavily misused for tokenizing strings. If you look at the top question/answer regarding how to parse a CSV file, you will see what I mean.
People are using std::getline to read a text line, a string, from the original stream, then stuffing it into an std::istringstream again and use std::getline with delimiter again to parse the string into tokens.
Weird.
Because for many, many years we have had a dedicated, special facility for tokenizing strings, explicitly designed for that purpose. It is the
std::sregex_token_iterator
And since we have such a dedicated facility, we should simply use it.
This thing is an iterator for iterating over a string, hence the name starts with an s. The first two arguments (begin(), end()) define the range of input to operate on; then comes a std::regex for what should be matched, or not matched, in the input string. The matching strategy is given by the last parameter:
0 --> give me the stuff that I defined in the regex and
-1 --> give me that what is NOT matched based on the regex.
We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameter, and copies the data between the first iterator and 2nd iterator to the std::vector. The statement
std::vector tokens(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {});
defines a variable “tokens” as a std::vector and uses again the range-constructor of the std::vector. Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given function parameters. This feature is called CTAD ("class template argument deduction"). I also used that for the vector above.
Additionally, you can see that I do not use the "end()"-iterator explicitly.
This iterator will be constructed from the empty brace-enclosed default initializer with the correct type, because it will be deduced to be the same as the type of the first argument due to the std::vector constructor requiring that, as already described.
You can read any number of tokens in a line and put them into the std::vector.
But you can do even more. You can validate your input. If you use 0 as the last parameter, you can define a std::regex that even validates your input, and you get only valid tokens.
Overall, using dedicated functionality is superior to the misused std::getline, and people should simply use it.
Some people may complain about the function overhead, but how many of them are working with big data? And even then, the approach would probably be to use std::string::find and std::string::substr, or std::string_view, or whatever.
So, somewhat advanced, but you will eventually learn it.
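As a compact illustration of the two modes of the last parameter, here is a sketch with two small helpers. The names splitOnComma and digitRuns, and the digit-only pattern, are made up for this example:

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits s on commas: submatch index -1 yields the text between the matches.
std::vector<std::string> splitOnComma(const std::string& s) {
    static const std::regex comma(",");
    return std::vector<std::string>(
        std::sregex_token_iterator(s.begin(), s.end(), comma, -1),
        std::sregex_token_iterator());
}

// Keeps only runs of digits: submatch index 0 yields the matches themselves,
// so anything that is not a number is dropped.
std::vector<std::string> digitRuns(const std::string& s) {
    static const std::regex digits("[0-9]+");
    return std::vector<std::string>(
        std::sregex_token_iterator(s.begin(), s.end(), digits, 0),
        std::sregex_token_iterator());
}
```

For "0,0,31" the first helper should return the three tokens 0, 0 and 31; for "0,x,31" the second one returns only 0 and 31, which is the validating behaviour mentioned above.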
And now we will use an object-oriented approach. As you know, C++ supports object-oriented programming.
We can put data, and methods working with that data, in a class (struct). The functionality is encapsulated. Only the class should know how to operate on its data. So, we will define a class "Code". It contains a std::array consisting of 3 std::strings, and associated functions. For the array we made a type alias for easier writing. The functions that we need are input and output. So, we will overload the extraction and the insertion operators.
In these operators, we use the functions described above.
And as a result of all this work, we get an elegant main function, where all the work is done in 3 lines of code.
Please see:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <array>
#include <algorithm>
using Triplet = std::array<std::string, 3>;
std::regex re(",");
struct Code {
// Our Data
Triplet triplet{};
// Overload extractor operator for easier input
friend std::istream& operator >> (std::istream& is, Code& c) {
// Read a triplet with commas
if (std::string s{}; is >> s) {
// Copy the single columns of the triplet in to our internal Data structure
std::copy(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {}, c.triplet.begin());
}
return is;
}
// Overload inserter for easier output
friend std::ostream& operator << (std::ostream& os, const Code& c) {
return os << c.triplet[0] << ' ' << c.triplet[1] << ' ' << c.triplet[2];
}
};
int main() {
// Open the source text file, and check, if there was no failure
if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {
// Now, read all triplets from the file, split it and put the Codes into a vector
std::vector code(std::istream_iterator<Code>(fCode), {});
// Show output
for (size_t tripletCounter{ 0U }; tripletCounter < code.size(); tripletCounter++)
std::cout << "\ni:\t" << tripletCounter << "\tcodeColumn:\t" << code[tripletCounter];
}
else {
std::cerr << "\n*** Error, Could not open source file\n";
}
return 0;
}

Iterating a file over all characters in C++

Newb question: what is the best way to get a char iterator over a text file?
I tried:
std::fstream csvSource (fileName);
auto aChar = csvSource.begin();
while (aChar != csvSource.end())
{
switch (*aChar)
{
case '"':
//and so on
but the compiler complains that fstream doesn't have a begin method.
Note that I can't do it line-by-line, because newline characters that are within quotes are treated differently (literally) than the other newline characters.

Use the >> operator from the ifstream class
std::ifstream csvSource (fileName);
csvSource >> std::noskipws;
char c;
while (csvSource>>c)
{
switch (c)
{
case '"':
//and so on
If you don't want to do fancy stuff with your iterators, that's the simplest way.

If you have to use iterators, your best bet is istreambuf_iterator which is more optimal than istream_iterator when iterating chars.
Is there any particular reason though why you need to use iterators at all? They are there for the benefit of when you want to invoke an algorithm that requires them. But that isn't the case here as you are just looping.
You could just read in a char with get(). This might be better than operator>> which does a formatted read in, and will skip whitespace (which you might not want) unless you set skipws flag to false (I think it's noskipws) and may well be slightly less efficient.

Try std::istream_iterator. A very low level approach using raw loops could look like this:
std::fstream csvSource (fileName);
typedef std::istream_iterator<char> CharIter;
for (CharIter it(csvSource); it != CharIter(); ++it)
{
/* process *it */
char c = *it;
doSomething(c);
}
You can also use those iterators in all standard algorithms and handcrafted algorithms, which is preferable to raw loops. For example, to print all characters in the file to std::cout you can use
std::copy(CharIter{csvSource}, CharIter{}, std::ostream_iterator<char>(std::cout));

I think you're looking for std::istreambuf_iterator as
#include <iterator> //for std::istreambuf_iterator
#include <algorithm> //for std::for_each
std::istreambuf_iterator<char> begin(csvSource), end;
std::for_each(begin, end, [](char c)
{
switch(c)
{
//your cases and code
}
});
Or you can simply write csvSource >> c to read char-by-char. Both approaches are good. Which one to use, depends on situation.

C++: cin.peek(), cin >> char, cin.get(char)

I've got this code with use of cin.peek() method. I noticed strange behaviour, when input to program looks like qwertyu$[Enter] everything works fine, but when it looks like qwerty[Enter]$ it works only when I type double dollar sign qwerty[Enter]$$. On the other hand when I use cin.get(char) everything works also fine.
#include <iostream>
#include <cstdlib>
using namespace std;
int main()
{
char ch;
int count = 0;
while ( cin.peek() != '$' )
{
cin >> ch; //cin.get(ch);
count++;
}
cout << count << " liter(a/y)\n";
system("pause");
return 0;
}
//Input:
// qwerty$<Enter> It's ok
//////////////////////////
//qwerty<Enter>
//$ Doesn't work
/////////////////////////////
//qwerty<Enter>
//$$ works(?)

It's because your program won't get input from the console until the user presses the ENTER key (and then it won't see anything typed on the next line until ENTER is pressed again, and so on). This is normal behavior, there's nothing you can do about it. If you want more control, create a UI.

Honestly I don't think the currently accepted answer is that good.
Hmm, looking at it again, I think that since operator>> is a formatted input operation and get() a plain unformatted one, the formatted version could be waiting for more input than one character in order to do some formatting magic.
I presume it is way more complicated than get() if you look at what it can do. I think >> will hang until it is absolutely sure it has read a char according to all the flags set, and will then return. Hence it can wait for more input than just one character. For example, you can specify skipws.
It clearly would need to look at more than one character of input to get a char from \t\t\t test.
I think get() is unaffected by such flags and will just extract a character from the stream, which is why it is easier for get() to behave in a non-blocking fashion.
The reason why I consider the currently accepted answer wrong is that it states that the program will not get any input until [Enter] or some other flush-like thing. In my opinion this is obviously not the case, since the get() version works. Why would it, if it did not get the input?
It probably still can block due to buffering, but I think it far less likely, and it is not the case in your example.

C++: std::istream check for EOF without reading / consuming tokens / using operator>>

I would like to test if a std::istream has reached the end without reading from it.
I know that I can check for EOF like this:
if (is >> something)
but this has a series of problems. Imagine there are many, possibly virtual, methods/functions which expect std::istream& passed as an argument.
This would mean I have to do the "housework" of checking for EOF in each of them, possibly with different type of something variable, or create some weird wrapper which would handle the scenario of calling the input methods.
All I need to do is:
if (!IsEof(is)) Input(is);
the method IsEof should guarantee that the stream is not changed for reading, so that the above line is equivalent to:
Input(is)
as regards the data read in the Input method.
If there is no generic solution which would work for any std::istream, is there any way to do this for std::ifstream or cin?
EDIT:
In other words, the following assert should always pass:
while (!IsEof(is)) {
int something;
assert(is >> something);
}

The istream class has an eof bit that can be checked by using the is.eof() member.
Edit: So you want to see if the next character is the EOF marker without removing it from the stream? if (is.peek() == EOF) is probably what you want then. See the documentation for istream::peek

That's impossible. How is the IsEof function supposed to know that the next item you intend to read is an int?
Should the following also not trigger any asserts?
while(!IsEof(in))
{
int x;
double y;
if( rand() % 2 == 0 )
{
assert(in >> x);
} else {
assert(in >> y);
}
}
That said, you can use the exceptions method to keep the "housekeeping" in one place.
Instead of
if(!IsEof(is)) Input(is)
try
is.exceptions( ifstream::eofbit /* | ifstream::failbit etc. if you like */ );
try {
Input(is);
} catch(const ifstream::failure& ) {
}
It doesn't stop you from reading before it's "too late", but it does obviate the need to have if(is >> x) if(is >> y) etc. in all the functions.

Normally,
if (is)
{
}
is enough. There are also .good(), .bad() and .fail() for more exact information.
Here is a reference link: http://www.cplusplus.com/reference/iostream/istream/

There are good reasons why there is no isEof function: it is hard to specify in a usable way. For instance, operator>> usually begins by skipping white space (depending on a flag), while some other input functions are able to read spaces. How would your isEof() handle the situation? Begin by skipping spaces or not? Would it depend on the flag used by operator>> or not? Would it restore the white space in the stream or not?
My advice is to use the standard idiom and characterize input failure instead of trying to predict only one cause of it: you'd still need to characterize and handle the others.

No, in the general case there is no way of knowing if the next read operation will reach eof.
If the stream is connected to a keyboard, the EOF condition is that I will type Ctrl+Z/Ctrl+D at the next prompt. How would IsEof(is) detect that?

Return value of ifstream.peek() when it reaches the end of the file

I was looking at this article on Cplusplus.com, http://www.cplusplus.com/reference/iostream/istream/peek/
I'm still not sure what peek() returns if it reaches the end of the file.
In my code, a part of the program is supposed to run as long as this statement is true
(sourcefile.peek() != EOF)
where sourcefile is my ifstream.
However, it never stops looping, even though it has reached the end of the file.
Does EOF not mean "End of File"? Or was I using it wrong?

Consulting the Standard,
Returns: traits::eof() if good() is false. Otherwise, returns rdbuf()->sgetc().
As for sgetc(),
Returns: If the input sequence read position is not available, returns underflow().
And underflow,
If the pending sequence is null then the function returns traits::eof() to indicate failure.
So yep, returns EOF on end of file.
An easier way to tell is that it returns int_type. Since the values of int_type are just those of char_type plus EOF, it would probably return char_type if EOF weren't possible.
As others mentioned, peek doesn't advance the file position. It's generally easiest and best to just loop on while ( input_stream ) and let failure to obtain additional input kill the parsing process.

Things that come to mind (without seeing your code).
EOF could be defined differently than you expect
sourcefile.peek() doesn't advance the file pointer. Are you advancing it manually somehow, or are you perhaps constantly looking at the same character?

EOF is for the older C-style functions. You should use istream::traits_type::eof().
Edit: viewing the comments convinces me that istream::traits_type::eof() is guaranteed to return the same value as EOF, unless by chance EOF has been redefined in the context of your source block. While the advice is still OK, this is not the answer to the question as posted.

#include <iostream>
#include <fstream>
using namespace std;
//myifstream_peek1.cpp
int main()
{
char ch;
// First pass: echo the whole file.
ifstream readtext2("mypeek.txt");
while (readtext2.get(ch)) // stops cleanly at end of file
{
cout << ch;
}
readtext2.close();
// Second pass: echo up to the first ';', then show the character after it.
ifstream readtext1("mypeek.txt");
while (readtext1.get(ch))
{
if (ch == ';')
{
int next = readtext1.peek(); // peek() returns an int so it can signal EOF
if (next != ifstream::traits_type::eof()) cout << static_cast<char>(next);
return 1; // the original exits with status 1 at this point
}
cout << ch;
}
cout << "\n end of ifstream peeking";
return 0;
}

While this technically works, using ifstream::eof() would be preferable
as in
(!sourcefile.eof())
