How to signal EOF on a file the program opens - c++

How would I signal an EOF when reading in a file in C++? I'm writing a direct coded scanner, as a part of a compiler design, that reads in a file and splits it up into tokens for a language.
I am to read in the whole program, strip out the comments, and compress the whitespace, then put the resulting program char by char into a buffer with a maximum size of 1024 chars. When the buffer is empty, we refill it.
To open the file I have this written:
// Open source file.
source_file.open(filename);
if (source_file.fail()) {
    // Failed to open source file.
    cerr << "Can't open source file " << filename << endl;
    buffer_fatal_error();
}
To fill the buffer, I am wanting to use a while loop and iterate like
int i = 0;
// Iterate through the whole file
while(source_file.at(i) != EOF)
{
// If not a tab or newline add to buffer
if (source_file.at(i) != "\n" || source_file.at(i) != "\t")
{
bufferList.add(source_file.at(i));
}
i++;
}
Would there be a way to signal EOF like that for the file that I am opening?
This is more or less a general outline for what to do. I will need to figure out how to refill the buffer once it is empty, or to use dual buffering. I also need to figure out how to strip out a comment, which would begin with #. For instance: # This is a comment. My scanner would see # and remove everything after that until it gets to the next newline char.

Try this:
char c;
std::vector<char> buffer;
buffer.reserve(1024);
while (source_file.get(c))
{
    // Keep the character only if it is neither a newline nor a tab.
    if ((c != '\n') && (c != '\t'))
    {
        buffer.push_back(c);
    }
}
The standard method for reading data is to test for the result of the read operation in a while loop.
For block reading, you could do something like this:
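char buffer[1024];
// read() reports failure on the final, partially filled block,
// so also accept any iteration that extracted at least one character.
while (source_file.read(buffer, sizeof(buffer)) || source_file.gcount() > 0)
{
    // Process the first source_file.gcount() characters of the buffer here
}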
You should also use std::istream::gcount() to get the number of characters read from the file, as it could be less than the buffer size.

Related

Recover ifstream from failed read

I currently have a piece of code that reads in text from a file using an ifstream. Each line of this file corresponds to a different piece of data that must be encoded into a struct. My "encodeLine" function takes care of this.
For safety, I want my system to be able to handle data that is too big to fit into its variable. For example, if the number 999999999 is read into a short, I want the program to be able to continue on reading the rest of the lines.
Currently, when I encounter data like this, I print out "ERROR" and clear the stream. However, when I perform more reads, the data that is read is corrupted. For example, on the next line the number "1" should be read, but instead something like 27021 is read.
How can the ifstream be reset to continue with valid reads?
Here is my code:
ifstream inputStream;
inputStream.open("foo.txt");
char token[64];
int totalSize = 0;
// Priming read
inputStream >> token;
while(inputStream.good())
{
    // Read and encode line of data from file
    totalSize = totalSize + encodeLine(inputStream, &recordPtr, header, filetypeChar);
    if(!inputStream.eof() && !inputStream.good())
    {
        printf("ERROR");
        inputStream.clear();
    }
    else if(inputStream.eof())
    {
        break;
    }
    inputStream >> token;
}
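One possible way to keep a single bad value from desynchronising later reads, sketched under assumptions: read each line with std::getline and parse it from its own std::istringstream, so a failed extraction only puts that temporary stream into a failed state. The function `read_shorts` and its one-value-per-line format are hypothetical stand-ins for the question's encodeLine, whose details are not shown.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's per-line encoding: each line
// holds one value that must fit in a short.
std::vector<short> read_shorts(std::istream& in, int& bad_lines)
{
    std::vector<short> values;
    std::string line;
    bad_lines = 0;
    while (std::getline(in, line))
    {
        std::istringstream fields(line);  // fresh stream for every line
        short value = 0;
        if (fields >> value)
            values.push_back(value);
        else
            ++bad_lines;  // only `fields` failed; `in` is still usable
    }
    return values;
}
```

With this layout there is nothing to clear on the file stream after an out-of-range value, because the file stream itself never saw the failed extraction.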

Read a file one char at a time until newline character using read(2) and write(2)

I'm trying to read a text file until the first newline character is found and print out the first line up until that first newline character.
I am using the read(2) and write(2) Unix system calls to complete this, and I do not want to use the getline() function. I wanted to go through each character in the buffer array and print the elements out until I get to the first newline character in the first line. I am also trying to figure out how to do the same for the second line and the last line in the text file. Each line will be displayed by its own separate program, if that makes sense.
Here is what I have so far for the first line program....
#define BUFFSIZE 4
int main(int argc, char** argv)
{
if(argc != 2)
{
cout << argv[0] << "need filename" << endl;
exit(4);
}
int n = 0;
char buf[BUFFSIZE];
int fd = open(argv[1], O_RDONLY);
while((n = read(fd, buf, BUFFSIZE))>0)
{
if[buf[0] != '\n')
{
cout << buf[0];
}
if[buf[1] != '\n')
{
cout << buf[1];
}
if[buf[2] != '\n')
{
cout << buf[2];
}
if[buf[3] != '\n')
{
cout << buf[3];
}
}
close(fd);
}
I am taking an input file as a command argument. I have several lines in the input text file that go like...
Purple is a cool color
Blue is the color of the sky
Green is the color of the grass
When I run this program, it prints all the lines into one long line in command window and with no space in between each sentence.
Output
Purple is a cool colorBlue is the color of the skyGreen is the color of the grass
I want to be able to print the first line only when the first newline character is found and I am having a hard time figuring out how to print each element in the array until the newline character is read.
The way your program is formed you are literally telling it to print everything until a newline char is found, and when one is found to omit it.
Add an else branch to each of those != checks and print whatever you want in place of the newline (for example, two newlines).
The listing also seems malformed (conditionals followed by a square bracket), but I assume that's typos. It's always best to post code that will compile when asking for help.
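Also, as pointed out in the comments, you print four bytes per read without checking n, the number of bytes read() actually returned. When the last read is short, you will print stale bytes left over in the buffer, which gives you garbage output.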

How can I convert a char array to a string in C++?

I have a char array called firstFileStream[50], which is being written to from an infile using fstream.
I want to convert this char array into a string called firstFileAsString. If I write string firstFileAsString = firstFileStream; it only writes the first word within the array and stops at the first space, or empty character. If I write firstFileAsString(firstFileStream) I get the same output.
How do I write the whole char array, so all words within it, to a string?
Here is the code to read in and write:
string firstInputFile = "inputText1.txt";
char firstFileStream[50];
ifstream openFileStream;
openFileStream.open(firstInputFile);
if (strlen(firstFileStream) == 0) { // If the array is empty
cout << "First File Stream: " << endl;
while (openFileStream.good()) { // While we haven't reached the end of the file
openFileStream >> firstFileStream;
}
string firstFileAsString = firstFileStream;
}
My problem, as zdan pointed out, is I was only reading the first word of the file, so instead I've used istreambuf_iterator<char> to assign the content directly to the string rather than the character array first. This can then be broken down into a character array, rather than the other way around.
openFileStream >> firstFileStream;
reads only one word from the file.
A simple example of reading the whole file (at least up to the buffering capacity) looks like this:
openFileStream.read(firstFileStream, sizeof(firstFileStream) - 1);
// sizeof(firstFileStream) - 1 so we have space for the string terminator
int bytesread = 0; // stays 0 if the read fails outright, so the terminator below is still in bounds
if (openFileStream.eof()) // read whole file
{
bytesread = openFileStream.gcount(); // read whatever gcount returns
}
else if (openFileStream) // no error. stopped reading before buffer overflow or end of file
{
bytesread = sizeof(firstFileStream) - 1; //read full buffer
}
else // file read error
{
// handle file error here. Maybe gcount, maybe return.
}
firstFileStream[bytesread] = '\0'; // null terminate string

Decoding / Encoding Text File using Stack Library - Can't Encode Large Files C++

I am working on a program that can encode and then decode text in C++. I am using the stack library. The way the program works is that it first asks you for a cypher key, which you put in manually. It then asks for the file name, which is a text file. If it is a normal txt file, it encodes the message to a new file and adds a .iia file extension. If the text file already has a .iia file extension, then it decodes the message, as long as the cypher key is the same as the one used to encode it.
My program does encode and decode, but how many characters it decodes is determined by temp.size() % cypher.length() that is in the while loop in the readFileEncode() function. I think this is what is keeping the entire file from being encoded and then decoded correctly. In other words, the resulting file, after it has been decoded from, say, "example.txt.iia" back to "example.txt", is missing a large portion of the text from the original "example.txt" file. I tried just cypher.length(), but of course that does not encode or decode anything. The entire process is determined by that argument for the decoding and encoding.
I cannot seem to find out the exact logic for this to encode and decode all the characters in any size file. Here is the following code for the function that does the decoding and encoding:
EDIT: Using WhozCraig's code that he edited for me:
void readFileEncode(string fileName, stack<char> &text, string cypher)
{
std::ifstream file(fileName, std::ios::in|std::ios::binary);
stack<char> temp;
char ch;
while (file.get(ch))
temp.push(ch ^ cypher[temp.size() % cypher.length()]);
while (!temp.empty())
{
text.push(temp.top());
temp.pop();
}
}
EDIT: A stack is required. I am going to implement my own stack class, but I am trying to get this to work first with the stack library. Also, if there is a better way of implementing this, please let me know. Otherwise, I believe that there is not much wrong with this except to get it to go through the loop to encode and decode the entire file. I am just unsure as to why it stops at, say 20 characters sometimes, or ten characters. I know it has to do with how long the cypher is too, so I believe it is in the % (mod). Just not sure how to rewrite.
EDIT: Ok, tried WhozCraig's solution and I don't get the desired output, so the error now must be in my main. Here is my code for the main:
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <cstdlib>
#include <cctype>
#include <stack>
using namespace std;
void readFileEncode(string fileName, stack<char> &text, string cypher);
int main()
{
stack<char> text; // allows me to use stack from standard library
string cypher;
string inputFileName;
string outputFileName;
int position;
cout << "Enter a cypher code" << endl;
cin >> cypher;
cout << "Enter the name of the input file" << endl;
cin >> inputFileName;
position = inputFileName.find(".iia");//checks to see if the input file has the iia extension
if (position > 1){
outputFileName = inputFileName;
outputFileName.erase(position, position + 3);// if input file has the .iia extension it is erased
}
else
//outputFileName.erase(position, position + 3);// remove the .txt extension and
outputFileName = inputFileName + ".iia";// add the .iia extension to file if it does not have it
cout << "Here is the new name of the inputfile " << outputFileName << endl; // shows you that it did actually put the .iia on or erase it depending on the situation
system("pause");
readFileEncode(inputFileName, text, cypher); //calls function
std::ofstream file(outputFileName); // calling function
while (text.size()){// goes through text file
file << text.top();
text.pop(); //clears pop
}
system("pause");
}
Basically, I am reading .txt file to encrypt and then put a .iia file extension on the filename. Then I go back through, enter the file back with the .iia extension to decode it back. When I decode it back it is gibberish after about the first ten words.
@WhozCraig Does it matter what white space, newlines, or punctuation is in the file? Maybe with the full solution here you can direct me at what is wrong.
Just for information: never read a file char by char; it will take you hours to get through 100 MB.
Read at least 512 bytes at a time (in my case I read 1 or 2 MB directly, store it in a char *, and then process it).
If I understand what you're trying to do correctly, you want the entire file rotationally XOR'd with the chars in the cipher key. If that is the case, you can probably address your immediate error by simply doing this:
void readFileEncode(string fileName, stack<char> &text, string cypher)
{
std::ifstream file(fileName, std::ios::in|std::ios::binary);
stack<char> temp;
char ch;
while (file.get(ch))
temp.push(ch ^ cypher[temp.size() % cypher.length()]);
while (!temp.empty())
{
text.push(temp.top());
temp.pop();
}
}
The most notable changes are
Opening the file in binary mode using std::ios::in|std::ios::binary for the open mode. This will eliminate the need to invoke the noskipws manipulator (which is usually a function call) for every character extracted.
Using file.get(ch) to extract the next character. The member will pull the next char from the file buffer directly if one is available, otherwise load the next buffer and try again.
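To see why the rotating XOR should round-trip regardless of whitespace, newlines, or punctuation, here is a small in-memory sketch. The helper name `xor_with_key` is hypothetical and leaves out the file and stack handling entirely.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: XORs each byte with the key, cycling through the key.
// Applying it twice with the same key returns the original data.
std::string xor_with_key(std::string data, const std::string& key)
{
    if (key.empty())
        return data;  // avoid a modulo by zero
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] ^= key[i % key.size()];
    return data;
}
```

The round trip only holds if every byte keeps its position. Separately, and only as something to check: the question's main opens the output file without std::ios::binary, and on platforms where text mode translates line endings (Windows, for instance) that changes the bytes written, which would throw the key alignment off when decoding.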
Alternative
A character-by-character approach is going to be expensive any way you slice it. That this is going through a stack<>, which will be backed by a vector or deque, isn't going to do you any favors. That it is going through two of them just compounds the agony. You may as well load the whole file in one shot, compute all the XORs directly, then push them onto your stack via a reverse iterator:
// Requires <fstream>, <stack>, <string>, and <vector>.
void readFileEncode
(
    const std::string& fileName,
    std::stack<char> &text,
    const std::string& cypher
)
{
    std::ifstream file(fileName, std::ios::in|std::ios::binary);
    // retrieve file size
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    file.seekg(0, std::ios::beg);
    // early exit on a zero-length file, a failed open (tellg() returns -1),
    // or an empty key (which would make the modulo below divide by zero)
    if (size <= 0 || cypher.empty())
        return;
    // make space for a full read
    std::vector<char> temp(static_cast<size_t>(size));
    file.read(temp.data(), size);
    const size_t c_len = cypher.length();
    for (size_t i=0; i<temp.size(); ++i)
        temp[i] ^= cypher[i % c_len];
    for (auto it=temp.rbegin(); it!=temp.rend(); ++it)
        text.push(*it);
}
You still get your stack on the caller-side, but I think you'll be considerably happier with the performance.

Avoid \r\n while reading text from a binary file

I have a binary file packing lots of files (something like a .tar), where I can find both binary and text files.
When processing in-memory strings, line endings are usually '\n', but if I read the text part from this packed file, I get "\r\n". Therefore processing this text gives me errors.
Here is the code for reading the text from a binary file:
FILE* _fileDescriptor; // it's always open to improve performance
fopen_s(&_fileDescriptor, _filePath.string().c_str(), "rb");
char* data = new char[size + 1]; // size is a known and correct value
fseek(_fileDescriptor, begin, SEEK_SET); // begin is another known value, where the file starts inside the packed one
fread(data, sizeof(char), size, _fileDescriptor);
data[it->second.size] = '\0';
This gives me the right text into data, but the following code gives me error when reading an empty line:
istringstream ss(data); // create a stringstream to process it in another function
delete[] data; // free the data buffer
// start processing the file
string line;
getline(ss, line); // read an empty line
if(line.size() > 0) {
/*
enters here, because the "empty" line was "\r\n", and now the value of line is '\r', therefore line.size() == 1
*/
...
So, any advice to avoid the '\r'?
I edited it in Notepad++. Changing its configuration to use '\n' instead of '\r\n' as the line ending works, but I don't want to depend on this because other people can miss that, and it would be very hard to spot the problem if that happens.
Probably easiest to trim the '\r' characters out of your string and then discard blank lines. See this answer for approaches to trimming a std::string (I'm assuming that's what 'line' is):
What's the best way to trim std::string?