C++: Text File's Extensions? [closed] - c++

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
When I use text files for input and output with fstream, the file extension I use is .txt.
I have seen people use other extensions instead of .txt.
They use .DAT and still open the file in a text editor as if it were a text file.
So is DAT a text file extension, and which extensions can I use with text files?

The short answer: text files can have any extension you want, including NO extension. You can take somefile.txt and rename it to somefile.XYZPDQ if you feel like it. It will still be a text file.
That's from a pure C++ language perspective. At the operating system level, a file extension may be associated with a certain program type (you might have .mp4 videos open in a video player, for example). But you can still call any text file anything you want. Nothing stops you from doing this.
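To make the point concrete, here is a minimal sketch. The helper name roundTrip and the file names are made up for illustration. It writes one line to a file and reads it back, and the stream behaves the same whatever the name ends with:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes one line of text to `path`, then reads it back.
// The extension in `path` plays no role in how the streams behave.
std::string roundTrip(const std::string& path, const std::string& line) {
    {
        std::ofstream out(path);   // text mode by default
        out << line << '\n';
    }                              // the output file is closed here
    std::ifstream in(path);
    std::string readBack;
    std::getline(in, readBack);
    return readBack;
}
```

Calling roundTrip("demo.txt", "hello"), roundTrip("demo.dat", "hello") or roundTrip("demo", "hello") should each give back the same line.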

An extension is just part of the file name. It makes no difference whether you use DAT or TXT. Extensions only help people recognize the file type.

You can use any extension, but:
using .txt, you usually hint that the file can be opened in a text editor (like vim or Notepad) and will be readable by humans.
using .dat, you usually hint that the file is binary and is not meant to be opened with a text editor. One should use a special program (maybe yours ;) or a hex editor, and its content will not be easily readable or modifiable.
another common extension is .csv for comma-separated-values files (even when the separator is a tab or something else rather than a comma), which can be opened either in a text editor or in a spreadsheet app like OpenOffice or Excel.
Windows users often use the .ini extension to hint that the file is a text file (viewable in a text editor) containing key/value parameters like ConfirmBeforeExit=true. By extension, it is used for any text file containing parameters.
another one is .log, hinting that the file is a text file containing the execution log of something. A Linux user will then immediately run tail -f foo.log while the app is running to look for problems.
By the way, upper-case extensions like TXT or DAT are a holdover from the old DOS days and are now considered bad style. Just use lower case.

You can use any extension, because it does not matter. The .dat extension is usually used for binary data, so it may not be obvious for users of your program that it is in fact an editable text file.

The extension has no effect on what type of data you can put in the file. For example, you can use TXT, DAT, and even (not recommended) EXE. It's best to stick with one extension. If something is meant to be read by a human, I would use TXT, and DAT or the like to indicate otherwise.

Related

Does fin in C++ work with .doc files?

I used fin to read in a .doc file, and then store all the text in a string. When I tried printing the string, I just saw unknown characters.
When I copied the contents of the .doc file into a .txt file and then read the .txt file in using fin, everything worked fine.
My question is whether fin works with complex files (such as .doc) or just with .txt files. I only had text in my .doc file (no graphics or anything), but the font was Calibri, which is not the font that fout uses to print text to a .doc file.
If by fin you mean an ifstream, then yes, it will read the file contents. However, with complex files you have to deal with the file format yourself; the C++ library will not automatically extract just the text. When you saved the file as text, the text was all that was left, so that is all the stream read.
fstream does all operations in text mode by default, and .doc files use the MS-DOC binary file format. So when you read the .doc file and printed it, you probably saw characters you couldn't understand because they were binary data.
If you try to read any file with fstream, it does read it.
I tried reading an .mp4 file in binary mode using fstream and it did read the file (I am sure of this because I wrote the contents I had read to another file, and that file turned out to be the same video).
So the answer to your question is that you can read any file with fstream, but fstream does this in only two ways: text or binary.
So reading an arbitrary file won't do much good unless you want to do something like copy its contents to another file.
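The copy experiment described above can be sketched as follows. The function name copyFile and the 4096-byte buffer size are assumptions for illustration, not something from the original answer. Both streams are opened with std::ios::binary so that no newline translation touches the bytes:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies `from` to `to` byte for byte.
// Returns false if either file cannot be opened or a write fails.
bool copyFile(const std::string& from, const std::string& to) {
    std::ifstream in(from, std::ios::binary);
    std::ofstream out(to, std::ios::binary);
    if (!in || !out) return false;

    char buffer[4096];
    // read() fails on the last, partial chunk, so gcount() is checked as well.
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        out.write(buffer, in.gcount());
    }
    return static_cast<bool>(out);
}
```

The same function works for a .doc, an .mp4 or a .txt file, because nothing in it interprets the contents.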
You first need to understand the .doc file format. Start with the doc (computing) Wikipedia page. The format is very complex (expect months of work at least) but more or less documented.
You could consider a different approach to your overall goal. For example, if you need to parse a .doc file (produced by Microsoft Word), you might use LibreOffice, which provides a library to parse it, or you could find another library (e.g. DocxFactory, wvware, ...), or you could use a COM interface to Word (on a Microsoft Windows system with Microsoft Word installed).
If your goal is to generate a document, you might consider the PDF format (which is a standard), perhaps using a text formatter like LaTeX or Lout to generate it, or a library (e.g. cairo, PoDoFo, etc.).
My question is whether fin works with complex files (such as .doc)
BTW, C++ standard IO is capable of reading binary files, but you need to write your own parser for them (so you need to understand your file format precisely). You should prefer open formats to proprietary formats.

Opening an existing .doc file using ofstream in C++

Assuming I have a file with a .doc extension on the Windows platform, how can I open the file and output its contents on the screen using the ofstream object in C++? I am aware that the object can be used to open files in text and binary modes. But I would like to know if a .doc (or even .pdf) file can be opened and its contents read.
I've never actually done this before, but after reading up on it, I think I might have a suggestion. The .docx format is actually just XML that is zipped up. After unzipping, the file is located at word/document.xml. Doing this in a program is where it gets fun.
Two options: If you're using C++/CLI (.NET), then Microsoft has an SDK for you. It should make it pretty easy to open Office documents.
Otherwise, if you're just using regular C++, you might have to do some extra work.
Open the file and unzip it using a ZIP library (zlib itself only provides the decompression; the archive container needs something like its contrib minizip code or libzip)
Find the document.xml file inside
Parse the XML document. You'll probably want to use some kind of XML parsing library for this. You'll have to look up the specs for the XML to figure out how to get the text you want.
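As a rough illustration of the last step, the sketch below assumes the contents of word/document.xml are already in a std::string (the unzipping is not shown) and relies on the fact that WordprocessingML keeps run text in w:t elements. The function name extractRunText is hypothetical, and this is a naive string scan rather than a real XML parser: it does not decode entities such as &amp; and it ignores paragraph boundaries.

```cpp
#include <string>

// Hypothetical helper: collects the character data of every <w:t> element in `xml`.
// Naive scan only; a real program should use an XML parsing library.
std::string extractRunText(const std::string& xml) {
    std::string text;
    std::size_t pos = 0;
    while ((pos = xml.find("<w:t", pos)) != std::string::npos) {
        const std::size_t nameEnd = pos + 4;
        if (nameEnd >= xml.size()) break;
        const char next = xml[nameEnd];
        // Skip elements whose names merely start with "w:t", e.g. <w:tbl> or <w:tab/>.
        if (next != '>' && next != ' ') { pos = nameEnd; continue; }
        const std::size_t open = xml.find('>', nameEnd);
        if (open == std::string::npos) break;
        if (xml[open - 1] == '/') { pos = open + 1; continue; }  // empty <w:t />
        const std::size_t close = xml.find("</w:t>", open);
        if (close == std::string::npos) break;
        text.append(xml, open + 1, close - open - 1);
        pos = close + 6;  // length of "</w:t>"
    }
    return text;
}
```

For anything beyond a quick experiment, an XML library is the safer route, as the answer says.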
The C++ standard library has the ifstream class, which can be used to read simple text files and binary files too.
It is up to you to interpret the bytes in the file. To interpret a binary file properly you need to know its format.
If you are thinking of MS Word files, then I would start here: http://en.wikipedia.org/wiki/Office_Open_XML to understand the MS Word 2007 format.
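A small example of interpreting bytes yourself: the sketch below reads the first eight bytes of a file in binary mode and compares them with two well-known signatures, the OLE compound file header used by legacy .doc files and the "PK" header of ZIP archives such as .docx. The signatures are not from the original answer, and the names Container and sniffContainer are made up for illustration.

```cpp
#include <fstream>
#include <string>

enum class Container { OleCompound, Zip, Unknown };

// Hypothetical helper: guesses the container format from the leading bytes,
// ignoring the file extension entirely.
Container sniffContainer(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char head[8] = {};
    in.read(head, sizeof head);
    const std::string bytes(head, static_cast<std::size_t>(in.gcount()));

    const std::string ole("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8);  // legacy .doc, .xls, ...
    const std::string zip("PK\x03\x04", 4);                        // .docx, .zip, ...

    if (bytes == ole) return Container::OleCompound;
    if (bytes.compare(0, zip.size(), zip) == 0) return Container::Zip;
    return Container::Unknown;
}
```

This only identifies the outer container; getting at the text still means parsing the format inside it.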
You might find the Boost Iostreams library ( http://www.boost.org/doc/libs/1_52_0/libs/iostreams/doc/home.html ) somehow useful if you want to make some filter by yourself.

Turn .txt file into .pdf file on the fly? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 months ago.
We're trying to figure out if there is a way to convert a .txt file to a .pdf file. Here's the catch: this needs to be done behind the scenes, and on the fly. Meaning, with a radio control selected, OnOK would create a .txt file. Behind the scenes, at run time, we would like the .txt file to be converted to a .pdf file. Ideally we would like this to be done by running an executable in the background. The executable would take input "File.txt" and output "File.pdf". We're using C++ and Visual Studio 6.
Does anyone have any experience on this? Is this possible?
libHaru may do what you want. Demo.
This a2pdf tool will probably do the trick with minimal effort. Just be sure to turn off perl syntax highlighting.
http://perl.jonallen.info/projects/a2pdf
I recommend using this open source library.
Once you have the base for generating PDF documents programmatically, you still need a method for converting the text to PDF elements while keeping the text flow and word wrapping. This article may help. Please pay attention to the DoText(StreamReader sr) function. It takes text and splits it into separate lines within the PDF document, keeping the rendered text within the margins.
One of the simpler methods, which has worked for three decades, is to place a PostScript header before the text and then run the result through Ghostscript (ps2pdf). It is the same method used by some commercial apps such as Acrobat.
At its most basic:
copy /b heading.ps+file.txt printfile.ps
GS -sDEVICE=pdfwrite -o printfile.pdf printfile.ps
Master Example can be seen here
How to modify this plaintext-to-PDF-converting PostScript from 1992 to actually specify a page size?

how to search for a word in a docx file in c++?

I am writing a search program in C++ which will search for a set of words in a set of files. These files are either text files or docx files. The problem is: how can I search a docx file in C++? I cannot even open it. If I need to convert it to a text file, what is the procedure, and how will I search it?
.docx is zip with a bunch of XML files in it. It's documented at http://openxmldeveloper.org/articles/GuidedTourOfSpecPart1.aspx
The OOXML file formats are officially documented in ECMA-376. There's an equivalent ISO standard (29500, if memory serves), but I believe you have to pay to get it, and the two are identical [1]. As a warning, however, these are huge documents, and the file formats themselves are definitely non-trivial to deal with. Just getting at the raw text is a relatively easy task, but still not exactly trivial.
[1] The ECMA standard was accepted by the ISO under its "fast track" program, where they accept an existing standard intact, even in some cases where it doesn't completely follow the normal ISO guidelines.
If writing your own OOXML parser is not an option, you could convert your docx files with docx2txt .

Should the text in a C++ text based game be in the code or in external files? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am creating a text-based game using C++ for a school project. The game works by allowing the user to pick a choice from a list of options in each scene, similar to how the games hosted by Choice of Games work. As a result I have a large amount of text that must be displayed in my game, but I am unsure of the proper conventions for working with large amounts of text in a program. Should I simply use std::cout and write the text directly into the code, or should I write it into text files and use std::ifstream to read the text?
My only major concern regarding the use of files to hold the text is that each choice the user makes results in a different paragraph being displayed and as a result I believe that I would need to create a text file for each paragraph, which seems like it will lead to more issues (such as using the wrong file name or mistyping my code leading to the game reading from the wrong file) than writing the text straight into the code could. If there is a way to read particular sections of a text file then this would be useful to know, however I am currently unaware of any such method. However I am new to C++ and I am certain that there is plenty that I have yet to learn so I would not be surprised if such a method did exist.
Any help is greatly appreciated, be it anything from simply telling me if I should enter text into my code or into files, to telling me if there is a way to read text from specific sections of a text file. And once again, I am very grateful for any help you can provide.
Please don't put displayed text into code. That's an antipattern. You have to recompile your game for every minor text change like fixing typos, and for major changes like translating into other languages.
Convention for most programming languages is to put all the displayed text into (a few) resource files or properties files as key-value pairs, where the code only references the key of the paragraph to be displayed and the value will be loaded from that external file. (Usually once during startup.) No need to use one file per paragraph, but the kv pairs have to be parsed. There'll be utilities for you to reuse.
I recommend using external files. It makes changing the content much easier and doesn't require recompiling the entire program for a simple typo.
You can use one file and just separate each paragraph with a blank line. Grabbing "all text between blank lines" at that point is trivial.
If the choices cause the paragraph choices to jump around the file you can give them IDs and load them on-the-fly by searching linearly through the file for a given ID.
--EDIT--
As per the request here is an algorithm or two:
Algorithm 1:
Give each paragraph an ID, usually a simple number on the line immediately above the paragraph.
Separate each number-paragraph pair by blank lines.
Parse the file line-by-line looking for a "line" that contains only a number.
Once you have found the number you are looking for, all lines until the next blank line are the content of that paragraph.
Display to user.
Algorithm 2 (recommended):
Use XML to store your paragraphs and their IDs.
Use TinyXML2 to parse the file: http://www.grinninglizard.com/tinyxml2/index.html
If you do not plan to translate your game to other languages, the choice is yours; both approaches have their pros and cons:
text in source: easy to write, text is near the place where it is used.
text in resource files: easier to remove duplicate strings, forces a better structure of text data.
If you can even imagine that your application might be translated, then you should put all text in resource files. You can find frameworks that will assist you with translations, such as GNU gettext, and there are others; for example, Qt has its own translation tools.
Storing text in the program files is not a good coding practice. This would result in unnecessary code bloat (it's not even code) and the need to recompile if you need to change the text.
A simple solution would be to create a text file with careful formatting like line numbers or whitespace that would allow you to pull out the desired text.
A more elegant solution would be to put the necessary text in xml or json files, and read them into your program when necessary. This would be a great choice.