Can I open a .xls or .pdf file using the open() function in C++ in binary mode and read its contents? If not, how can I build an application that can read the contents of files in such formats (and maybe more)?
Yes, you can open any file in your filesystem as a binary file, and you can read it too (as long as your operating system allows the file to be opened given its access rights, no other application holds a lock on it, and so on).
Next you'll probably ask "How do I interpret a PDF or XLS file?", and that's a whole other kettle of fish, as they say here in England. Neither PDF nor XLS files are straightforward to "understand". A PDF library that I looked at recently contains several dozen files and several megabytes of source code. I've worked with XLS files in Python, and the code there ran to a few thousand lines.
Simple reading would be:
#include <iostream>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Read the entire file into a vector of raw bytes.
std::vector<char> readfile(std::string const& fname)
{
    std::ifstream ifs(fname.c_str(), std::ios::binary);
    std::istreambuf_iterator<char> f(ifs.rdbuf()), l;
    std::vector<char> bytes;
    std::copy(f, l, std::back_inserter(bytes));
    return bytes;
}

int main()
{
    auto bytes = readfile("my.pdf");
}
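Interpreting those bytes is the hard part, but even without a full parser you can at least recognise the format from its signature. Here is a minimal sketch, assuming the readfile() helper above and the well-known magic numbers (PDF files begin with "%PDF"; legacy .xls files are OLE2 compound documents beginning with D0 CF 11 E0 A1 B1 1A E1):
#include <cstring>
#include <vector>

// Rough format detection from the first bytes of the file.
bool looksLikePdf(std::vector<char> const& bytes)
{
    return bytes.size() >= 4 && std::memcmp(bytes.data(), "%PDF", 4) == 0;
}

bool looksLikeOle2(std::vector<char> const& bytes)
{
    static const unsigned char sig[8] =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    return bytes.size() >= 8 && std::memcmp(bytes.data(), sig, 8) == 0;
}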
For reference, from the fopen(3) manual page: the argument mode points to a string beginning with one of the following sequences (additional characters may follow these sequences):
"r"   Open text file for reading. The stream is positioned at the beginning of the file.
"r+"  Open for reading and writing. The stream is positioned at the beginning of the file.
"w"   Truncate file to zero length or create text file for writing. The stream is positioned at the beginning of the file.
"w+"  Open for reading and writing. The file is created if it does not exist, otherwise it is truncated. The stream is positioned at the beginning of the file.
"a"   Open for writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.
"a+"  Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.
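Note that these modes open the file in text mode; the C standard also allows a "b" to be appended to the mode string to request binary mode. A minimal C-style sketch of reading a file in binary mode (the file name is just an example):
#include <cstdio>

int main()
{
    std::FILE* fp = std::fopen("my.pdf", "rb");  // "rb" = read, binary
    if (fp) {
        char buf[4096];
        std::size_t n = std::fread(buf, 1, sizeof buf, fp);
        // ... use the first n bytes ...
        (void)n;
        std::fclose(fp);
    }
}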
Related
I am using the C++ streams to read in a bunch of files in a directory and then write them to another directory. Since these files may be of different types, I am using the generic ios::binary flag when reading/writing these files.
Example code below:
std::fstream inf("ex.txt", std::ios::in | std::ios::binary);
char c;
while (inf >> c) {
    // writing to another file in binary format
}
The issue I have is that in the case of files containing text, the end of line characters in these text files are not being written to the output file.
Edit: Or at least they do not appear to be: when the newly written file is opened, there is only a single continuous line of characters.
Edit again: The problem (of the continuous string) appears to persist even when the read / write is made in text mode.
Thus, I was wondering if there was a way to check if a file has text or binary and then read/write it appropriately. Else, is there any way to preserve the end of line characters even when opening the file in binary format?
Edit: I am using the g++ 4.8.2 compiler
When you want to manipulate raw bytes, you need to use the read() and write() methods, not the >> and << operators.
You can get the intended behaviour with inf.flags(inf.flags() & ~std::ios_base::skipws); though, which stops operator>> from skipping whitespace (including newlines).
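A rough sketch of the read()/write() approach, copying one file to another byte for byte (the file names are illustrative):
#include <fstream>
#include <string>

// read() and write() move raw bytes and never skip whitespace,
// unlike formatted extraction with operator>>.
void copyBinary(const std::string& from, const std::string& to)
{
    std::ifstream in(from.c_str(), std::ios::binary);
    std::ofstream out(to.c_str(), std::ios::binary);
    char buf[4096];
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        out.write(buf, in.gcount());
    }
}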
I am using ofstream to output some text to a file in ios::app mode within a loop. But after some step, I need to clear the contents of the file. I know I can do it by either deleting the file and opening it again, or by opening it again with ios::trunc, but is there any way to do it without closing and reopening the file?
If you have opened it in ios::app mode, there's no way to clear the contents without opening it again. An ofstream can only put data into a file, and since the stream is sequential, you can't directly erase data through it.
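A sketch of the close-and-reopen approach the question already mentions (the stream and file name are placeholders):
#include <fstream>
#include <string>

// Wipe the file by reopening it with ios::trunc; reopen with ios::app
// afterwards if append semantics are still needed.
void clearFile(std::ofstream& out, const std::string& fname)
{
    out.close();
    out.open(fname.c_str(), std::ios::out | std::ios::trunc);
}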
Not sure if it is possible with iostreams, but in general you can truncate an open file by setting its current position to 0 and then setting the end-of-file marker there. In the Win32 API, for instance, you can do that with SetFilePointer() and SetEndOfFile().
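A rough Win32 sketch of that idea; hFile is assumed to be a HANDLE opened with write access via CreateFile():
#include <windows.h>

// Truncate an already-open file to zero length.
bool truncateOpenFile(HANDLE hFile)
{
    if (SetFilePointer(hFile, 0, NULL, FILE_BEGIN) == INVALID_SET_FILE_POINTER)
        return false;                 // could not seek back to the start
    return SetEndOfFile(hFile) != 0;  // cut the file off at the current position
}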
I am polling a directory constantly for files and every time I see a file that meets some certain criteria for reading, I open the file and parse it.
string newLine;
ifstream fileReader;
fileReader.open(filename.c_str());
while (getline(fileReader, newLine))
{
    // do some stuff with the line...
}
fileReader.close();
The above code is in a loop that runs every second, checking a directory for new files. My issue is that as I am transferring files into the folder for processing, my loop finds a file and passes its name to the ifstream, which then opens it and tries to parse an incomplete file. How do I make ifstream wait until the file is done being written before it tries to parse the file?
EDIT:
Wanted to word the issue better here, since a replier seems to have misunderstood it. I have two directories:
mydirectory/
mydirectoryParsed/
The way my code works is that my program checks for files in mydirectory/ and, when it finds them, parses them and uses the information in the files. No writing to the files is done. Once I am done parsing a file, it is moved to mydirectoryParsed/.
The issue is that when I transfer files over the network into mydirectory/, the ifstream sees these files mid-transfer and starts reading them before they have finished being written to the directory. How do I make ifstream wait until the file is completely written before parsing it?
Don't transfer the files directly into the directory that your program is watching; instead, transfer them into a different directory on the same drive, and then when the transfer is done, move them into the watched directory. That way, the complete file appears in the watched directory in a single atomic operation.
Alternatively, you could use a naming convention in the watched directory — append a suffix like ".partial" to files that are being transferred, and then rename the file to remove the suffix when the transfer is done. Have your program ignore files whose names end with the suffix.
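A tiny sketch of that suffix check (".partial" is only an example name):
#include <string>

// Returns true when the name does not end in ".partial",
// i.e. the transfer is assumed to be finished.
bool isCompleteFile(const std::string& name)
{
    const std::string suffix = ".partial";
    if (name.size() < suffix.size())
        return true;
    return name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0;
}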
You're not supposed to open the file every time you write to it. Open it once!
Some pseudo-code for this would be :
1- Open file
2- Get the data you want to write, treat that data
3- Call the write to file function
4- Loop until you have nothing left to write
5- Close the file
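A minimal sketch of that sequence; the data source here is just a vector of strings standing in for whatever you actually produce:
#include <fstream>
#include <string>
#include <vector>

void writeAll(const std::string& fname, const std::vector<std::string>& lines)
{
    std::ofstream out(fname.c_str());   // 1 - open the file once
    for (const auto& line : lines)      // 4 - loop over the data
        out << line << '\n';            // 2/3 - treat and write each item
}                                       // 5 - the file is closed here automatically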
I'm trying to write a Huffman encoder but I'm getting some compression errors. I identified the problem as mismatches between characters that were put() to the ofstream and the characters read() from the same file.
One specific instance of this problem :
The put() writes ASCII character 10 (Line feed)
The read() reads ASCII character 13 (Carriage return)
I thought read() and put() read and write raw data (no character translations), so I'm not sure why this is happening. Can someone help me out?
Here is the ofstream instance for writing the compressed file:
std::ofstream compressedFileStream(getCompressedFileName(),std::ios::binary||std::ios::ate);
and the ifstream instance for reading the same
std::ifstream fileInput(getFileName()+".huf",std::ios::binary);
The code is running on Windows 7 and all streams in the program are opened in binary mode.
Not opening in binary mode due to a typo:
std::ofstream compressedFileStream(getCompressedFileName(),std::ios::binary||std::ios::ate)
should be:
std::ofstream compressedFileStream(getCompressedFileName(), std::ios::binary | std::ios::ate);
A single bitwise |, not a logical ||: the logical OR yields a bool rather than the combined openmode flags, so the binary flag is lost.
The symptoms show that you are creating the ofstream in text mode, or creating it from a file descriptor that was opened in text mode.
You will want to pass ios::binary to it at construction time, or it may run in text mode on Windows.
After you added the code, the reason proved to be a typo:
std::ios::binary||std::ios::ate
should be
std::ios::binary|std::ios::ate
On Windows, if you are writing binary data, you need to open the output file in binary mode (std::ios::binary).
Similarly, if you are reading binary data back, the input stream needs to be opened in binary mode too; otherwise newline translation will alter the bytes.
When I construct an iostream when say opening a file will this always read the entire file from the hard disk and then put it into memory, or is it streamed in and buffered by the OS on demand?
I ask because one way to check whether a file exists is to see if opening it fails, but I fear that if the files I am opening are very large, this could take a long time if iostream must read the entire file on open.
To check whether a file exists can be done like this if you want to use boost.
#include <boost/filesystem.hpp>
bool fileExists = boost::filesystem::exists("foo.txt");
No, it will not read the entire file into memory when you open it. It will read your file in chunks though, but I believe this process will not start until you read the first byte. Also these chunks are relatively small (on the order of 4-128 kibibytes in size), and the fact it does this will speed things up greatly if you are reading the file sequentially.
In a test on my Linux box (well, a Linux VM), simply opening the file results only in the OS open system call, with no read system call. It doesn't start reading anything from the file until the first attempt to read from the stream, and then it reads 8191-byte chunks (why 8191? that seems a very strange number) as I read through the file.
Opening a file is a bad way of testing if the file exists - all it does is tell you if you can open it. Opening might fail for a number of reasons, typically because you don't have read permission, but the file will still exist. It is usually better to use an operating system specific function to test for existence. And no, opening an fstream will not cause the contents to be read.
What I think is: when you open a file, the corresponding data structures for the process opening the file are populated, which include the file pointer, file descriptor, vnode, etc.
Now one can read and write to a file using buffered streams (fwrite, fread) or using system calls (read and write).
When we use buffered streams, the data is buffered and then written or read (this is done for efficiency purposes). This in itself means that the whole file is not read into memory; rather, some bytes are read into a buffer and then made available.
In the case of system calls such as read and write, kernel-level buffering is done (using fsync one can flush the kernel buffer too), but the data is actually read from and written to the device file.
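A rough, POSIX-only sketch contrasting the two approaches (buffered stdio versus raw system calls); the file name is just an example:
#include <cstdio>    // buffered I/O: fopen/fread
#include <fcntl.h>   // POSIX open()
#include <unistd.h>  // POSIX read()/close()

int main()
{
    char buf[4096];
    // Buffered: the C library reads ahead in larger chunks behind the scenes.
    if (std::FILE* fp = std::fopen("myfile.txt", "rb")) {
        std::size_t n = std::fread(buf, 1, sizeof buf, fp);
        (void)n;
        std::fclose(fp);
    }
    // Unbuffered: each read() is a system call that goes straight to the kernel.
    int fd = open("myfile.txt", O_RDONLY);
    if (fd != -1) {
        ssize_t n = read(fd, buf, sizeof buf);
        (void)n;
        close(fd);
    }
}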
Checking the existence of a file:
#include <sys/stat.h>
#include <iostream>
#include <string>

int main()
{
    struct stat file_i;
    std::string f("myfile.txt");
    if (stat(f.c_str(), &file_i) != 0) {
        std::cout << "File not found" << std::endl;
    }
    return 0;
}
Hope this clarifies a bit.