Opening an existing .doc file using ofstream in C++ - c++

Assuming I have a file with .doc extension in Windows platform, how can I open the the file for outputting its contents on the screen using the ofstream object in C++? I am aware that the object can be used to open files in text and binary modes. But I would like to know if a .doc (or even .pdf) file can be opened and its contents read.

I've never actually done this before, but after reading up on it, I think I might have a suggestion. The .docx format is actually just XML that is zipped up. After unzipping, the file is located at word/document.xml. Doing this in a program is where it gets fun.
Two options: If you're using C++ CLR (.NET) then Microsoft has an SDK for you. It should make it pretty easy to open Office documents.
Otherwise if you're just using regular C++, you might have to do some extra work.
Open the file and unzip it using a library like zlib
Find the document.xml file inside
Parse the XML document. You'll probably want to use some kind of XML parsing library for this. You'll have to look up the specs for the XML to figure out how to get the text you want.

C++ std library has ifstream class that can be used to read simple text files, and for read binary files too.
It is up to you to interpret these bytes in the file. To proper interpret the binary file you need to know the format of the file.
If you think of MS Word files then I would start from here: http://en.wikipedia.org/wiki/Office_Open_XML to understand MS Word 2007 format.
You might find the Boost Iostreams library ( http://www.boost.org/doc/libs/1_52_0/libs/iostreams/doc/home.html ) somehow useful if you want to make some filter by yourself.

Related

How to convert .trc file type to text file using C++?

I have got a trace file that is binary in nature. I want to convert it to a text file and convert the data inside it to decimal form. I mean I am not sure, how to do this. This .trc file contains data in the form of telegrams and I want to extract particular kind of telegram and save them in text file which is readable in nature. I have to do all of this using C++.
Do you suggest any other language for it or does anyone has any idea about doing this in C++?
Binary trace files are usually encoded in proprietary formats. And there are applications or profilers specifically built to parse them.
Unless you know the file format, the only way to decode it is through reverse engineering. And in most cases it's not worth the effort.
Try to find documentation about it. Or maybe an application or utility that loads the file and exports data that is easier to read.
In case you are speaking about .trc binary files from Teledyne Lecroy Oscilloscopes, I would suggest to any of the following libraries out there for that:
https://pypi.org/project/lecroyparser/
https://github.com/jneer/lecroy-reader
https://github.com/yetifrisstlama/readTrc
https://igit.ific.uv.es/ferhue/lecroyparser

Does fin in C++ work with .doc files?

I used fin to read in a .doc file, and then store all the text in a string. When I tried printing the string, I just saw unknown characters.
When I copied the contents of the .doc file into a .txt file and then read the .txt file in using fin, everything worked fine.
My question is whether fin works with complex files (such as .doc) or just with .txt files. I only had text in my .doc file (no graphics or anything), but the font was Calibri, which is not the font that fout uses to print text to a .doc file.
If by fin you mean an fistream yes it will work to read the file contents, however in the case of complex files you have to deal with the file format, the c++ library will not automatically extract just the text contents. In the case where you saved the file as text that's all that is left and so that's all a stream would read.
fstream by default does all operations in text mode and .doc files use MS-DOC binary file format. So probably when you tried to read the doc file and print it, it showed characters that you couldn't understand (probably that was binary).
If you try to read any file in fstream, it does read it.
I tried reading a .mp4 file in binary using fstream and it did read the file( i can assure that because i pasted the read contents in another file and that file turned out to be the same video).
So answer to your question is you can read any file in fstream but fstream does all this operations in only two ways, either text or binary.
So reading just any file won't do much good unless you want to do something like copying the file contents to another.
You first need to understand the .doc file format. Read first the doc (computing) wikipage. It is very complex (so you'll need months of work at least) but more or less documented.
You could consider a different approach to your overall goal. For example, if you need to parse a .doc file (provided by some Microsoft Word software), you might use libreoffice which provides some library to parse it, or you could find another library (e.g. DocxFactory, wvware, ...), or you could use some COM interface to Word (on a Microsoft Windows operating system with MicroSoft Word installed).
If your goal is to generate some document, you might consider the PDF format (which is a standard), perhaps using some text formatter like LaTeX or Lout to generate it, or some library (e.g. cairo, PoDoFo, etc ...).
My question is whether fin works with complex files (such as .doc)
BTW, C++ standard IO is capable of reading binary files, but you need to write your parser for them (so you need to understand precisely your file format). You should prefer open formats to proprietary formats.

How do I input and output various file types in c++

I've seen a lot of examples of i/o with text files I'm just wondering if you can do the same with other file types like mp3's, jpg's, zip files, etc..?
Will iostream and fstream work for all of these or do I need another library? Do I need a new sdk?
It's all binary data so I'd think it would be that simple. But I've been unpleasently surprised before.
Could I convert all files to text or binary?
It depend on what you mean by "work"
You can think of those files as a book written in Greek.
If you want to just mess with binary representation (display text in Greek on screen) then yes, you can do that.
If you want to actually extract some info: edit video stream, remove voice from audio (actually understand what is written), then you would need to either parse file format yourself (learn Greek) or use some specialized library (hire a translator).
Either way, filestreams are suited to actually access those files data (and many libraries do use them under the hood)
You can work on binary streams by opening them with openmode binary :
ifstream ifs("mydata.mp3", ios_base::binary);
Then you read and write any binary content. However, if you need to generate or modify such content, play a video or display a piture, the you you need to know the inner details of the format you are using. This can be exremely complex, so a library would be recomended. And even with a library, advanced programming skills are required.
Examples of open source libraries: ffmpeg for usual audio/video format, portaudio for audio, CImg for image processing (in C++), libpng for png graphic format, lipjpeg for jpeg. Note that most libraries offer a C api.
Some OS also supports some native file types (example, windows bitmaps).
You can open these files using fstream, but the important thing to note is you must be intricately aware of what is contained within the file in order to process it.
If you just want to open it and spit out junk, then you can definitely just start at the first line of the file and exhaustively push all data into your console.
If you know what the file looks like on the inside, then you can process it just as you would any other file.
There may be specific libraries for processing specific files, but the fstream library will allow you to access any file you'd like.
All files are just bytes. There's nothing stopping you from reading/writing those bytes however you see fit.
The trick is doing something useful with those bytes. You could read the bytes from a .jpg file, for example, but you have to know what those bytes mean, and that's complicated. Usually it's best to use libraries written by people who know about the format in question, and let them deal with that complexity.

Difference between .dat and .txt in c++

I am using .txt files in my program for reading and writing records (records contains both text and numerals). Recently i came to know that .dat file also can be used like .txt for file operations. I would like to know the difference between the two and the advantages and disadvantages of one over another.
Text files or .txt files are a bit hard to parse in programs and easy to read. whereas .dat is usually used to store data that is not just plain text.
Generally .txt files contains letters, characters and symbols which is readable.
.dat is binary text file in which data is not always printable on screen.
The extension of a file is a helper so that the operating system (or user) can choose the appropriate program to open it. The actual file contents do not matter. There are some conventions what extensions to use but there is nothing from keeping you to use any arbitrary extension for your files. For instance you can rename a .jar file to .zip-file and be able to open the file with pkunzip.
So for C++ the extension does not matter, but for you as a programmer it may give a hint of the file contents i.e. open it in text or binary mode.
In most languages like C/C++ there is no difference what is your file type in file operations(Read, Write or Edit).
just if you want to work with binary files you should open them in binary format because if you reached \0 in text file it's file end. Dat files are binary too!
If you want to store and read some data, XML file and somtimes DAT files are better because of good libraries to read them. they don't need hard parsing of Text files

How to read and write doc, pdf files using files in c++

I m writing a c++ program using files and i need to take the input from existing files such as doc files and pdf files. how to program it in c++? And after getting the inputs, how can i write those details into a new doc or pdf files? Can anyone explain me with an example?
C++ as a language doesn't equip you with such features as "write to DOC file" or "read from PDF file". The only staff available to you a a programmer is raw byte-by-byte reading or writing. To make your new brand file as PDF/DOC/etc compatible you have to conform the chosen file format. The same about reading - you should understand which portions of raw byte array are responsible for what.
In common, this task named as "parsing" or "serialization". And it's a good idea to use one of existing parsers for particular file format instead of reinventing the wheel. Moreover, some file formats can be patent-pending so you may be not allowed to deal with it without license purchase.
Some clues so far:
PDF parsing in C++ (PoDoFo)
Microsoft word Text Parser in "C"
There are some libraries available on the web now(the question is from 2013, maybe that time there weren't many).
Apart from the links in selected answer, you can try PDFTron. It also supports new features, eg. Linearization.
Here is one of their samples is ->
https://www.pdftron.com/documentation/samples/cpp/TextExtractTest
(That program itself contains 4 if blocks, with slightly different features of the library/SDK, to try)
There should be more, search on the web for PDF parsing libraries.