I m writing a c++ program using files and i need to take the input from existing files such as doc files and pdf files. how to program it in c++? And after getting the inputs, how can i write those details into a new doc or pdf files? Can anyone explain me with an example?
C++ as a language doesn't equip you with such features as "write to DOC file" or "read from PDF file". The only staff available to you a a programmer is raw byte-by-byte reading or writing. To make your new brand file as PDF/DOC/etc compatible you have to conform the chosen file format. The same about reading - you should understand which portions of raw byte array are responsible for what.
In common, this task named as "parsing" or "serialization". And it's a good idea to use one of existing parsers for particular file format instead of reinventing the wheel. Moreover, some file formats can be patent-pending so you may be not allowed to deal with it without license purchase.
Some clues so far:
PDF parsing in C++ (PoDoFo)
Microsoft word Text Parser in "C"
There are some libraries available on the web now(the question is from 2013, maybe that time there weren't many).
Apart from the links in selected answer, you can try PDFTron. It also supports new features, eg. Linearization.
Here is one of their samples is ->
https://www.pdftron.com/documentation/samples/cpp/TextExtractTest
(That program itself contains 4 if blocks, with slightly different features of the library/SDK, to try)
There should be more, search on the web for PDF parsing libraries.
Related
I've seen a lot of examples of i/o with text files I'm just wondering if you can do the same with other file types like mp3's, jpg's, zip files, etc..?
Will iostream and fstream work for all of these or do I need another library? Do I need a new sdk?
It's all binary data so I'd think it would be that simple. But I've been unpleasently surprised before.
Could I convert all files to text or binary?
It depend on what you mean by "work"
You can think of those files as a book written in Greek.
If you want to just mess with binary representation (display text in Greek on screen) then yes, you can do that.
If you want to actually extract some info: edit video stream, remove voice from audio (actually understand what is written), then you would need to either parse file format yourself (learn Greek) or use some specialized library (hire a translator).
Either way, filestreams are suited to actually access those files data (and many libraries do use them under the hood)
You can work on binary streams by opening them with openmode binary :
ifstream ifs("mydata.mp3", ios_base::binary);
Then you read and write any binary content. However, if you need to generate or modify such content, play a video or display a piture, the you you need to know the inner details of the format you are using. This can be exremely complex, so a library would be recomended. And even with a library, advanced programming skills are required.
Examples of open source libraries: ffmpeg for usual audio/video format, portaudio for audio, CImg for image processing (in C++), libpng for png graphic format, lipjpeg for jpeg. Note that most libraries offer a C api.
Some OS also supports some native file types (example, windows bitmaps).
You can open these files using fstream, but the important thing to note is you must be intricately aware of what is contained within the file in order to process it.
If you just want to open it and spit out junk, then you can definitely just start at the first line of the file and exhaustively push all data into your console.
If you know what the file looks like on the inside, then you can process it just as you would any other file.
There may be specific libraries for processing specific files, but the fstream library will allow you to access any file you'd like.
All files are just bytes. There's nothing stopping you from reading/writing those bytes however you see fit.
The trick is doing something useful with those bytes. You could read the bytes from a .jpg file, for example, but you have to know what those bytes mean, and that's complicated. Usually it's best to use libraries written by people who know about the format in question, and let them deal with that complexity.
I am creating an OpenGL game and I would like to make it open to more languages than just English for obvious reasons. From looking around and fiddling around with the games installed on my computer I can see that locales play a big part in this and that .lang files, such as en-US.lang that is shipped with minecraft, are basically text documents with a language code, "item.iron.ingot" for example, an equal sign, and then what it means for that given language, English as per en-US, so in this case would be, "Iron Ingot". Well I created a file that I named en-US.lang and this is its contents:
item.iron.ingot=Iron Ingot
In my C++ main method I put:
setlocale(LC_ALL, "en-US");
After including the locale header file. So I suppose the part that I am confused by is how to use the locales to read from the .lang file? Please help SO and some example code would be appreciated.
C++ Does not come with a built-in support for resource files / internationalization. However there is a huge variety of solutions.
To support multi-language messages, you should have some basic understanding of how such strings are encoded in files and read to memory. Here is a basic introduction if you are not familiar:
"http://www.joelonsoftware.com/articles/Unicode.html"
To keep and load the correct text at runtime you need to use a third party library: GNU gettext http://www.gnu.org/software/gettext/ is one such example. However there are other solutions out there.
I am a beginner in visual studio and has only code C and C++ in command line settings.
Currently, I am taking a module(software development) which requires me to come up with an expense tracker - a program which helps user tracks his/her daily expenses. Therefore, at the end of each individual day, or after a user uses finishes the program, we would have to perform data storage to store all the info in one place which we would export it during the next usage.
My constraint include not using any relational database(although i have no idea what it is :( ). Data storage must be done using XML or text files. Following this, I have several questions regarding data storage:
1) If data is stored successfully, do we export it everytime we start the program? And everytime after the user closes the program, we overwrite the existing data file and then store it accordingly?
2) I have heard from some people that using text file may be easier. Searching on the internet and library only provides me with information regarding XML and not text. Would anyone be able to help me with it? Like tutorials link and stuff?
Thank you very much!
File writing/handling works similar to every other buffer in c++.
you can enable file handling using the fstream header. You can create a file, write to it and over-write every time the program is run, or can even create a file the first time the program is run and then append to it every subsequent time the program runs.
Ive only ever done text files, never tried XML, but Im guessing they're similar.
http://www.cplusplus.com/doc/tutorial/files/ should give you everything you need to know.
Your choice of XML vs plain text depends on the kind of data that you'll be storing.
The reason why you'll only find XML libraries on the internet is because XML is a lot more complicated than plain text. If you don't know what XML is or if the data that you're storing isn't very complex, then I would suggest going with plain text.
For example, to track expenses, you might store a file like this:
sandwich 5.00
coffee 2.30
soft drink 1.50
...
It's very easy to read/write lines like this to/from a file in C++.
i am writing a search program in c++ which will search for a set of words in a set of files.. these files are either text files or docx files.The problem is how can i search a docx file in c++, i cannot open it even,if i need to convert it to text file, what is the procedure and how will i search it?
.docx is zip with a bunch of XML files in it. It's documented at http://openxmldeveloper.org/articles/GuidedTourOfSpecPart1.aspx
The OOXML file formats are officially documented in ECMA-376. There's an equivalent ISO standard (29500, if memory serves), but I believe you have to pay to get it, and the two are identical1. As a warning, however, these are huge documents, and the file formats themselves are definitely non-trivial to deal with. Just getting at the raw text is a relatively easy task, but still not exactly trivial.
1 The ECMA standard was accepted by the ISO under its "fast track" program, where they accept an existing standard intact, even in some cases where it doesn't completely follow the normal ISO guidelines.
If writing your own OOXML parser is not an option, you could convert your docx files with docx2txt .
I've seen a lot of games use something similar to a .DAT file or a specific file type that the game has for itself. I'm just beginning with C++ and DirectX and I was interested in keeping my information in something similar to a .DAT.
My initial conception was that it would hold information on the files you wanted to store within the .DAT file. Something similar to a .RAR file. Unfortunately, my googleing skills did not help me in finding the answers.
Right now I'm simply loading textures and sound files from a folder called Data.
EDIT: While I understand that .DAT is short for data, and I've found that a .DAT file generally contains any assortment of information, I'm still unsure about how to go about doing something as packing images and sound files into any type of file and being able to read them.
I'm not sure about using fstreams to achieve my task, however I will look into streams related to storing data and how to properly read from that data. Meanwhile if anyone has another answer to offer based on this new information, it would be appreciated.
EDIT: Thanks to the answers, I stumbled across a similar question on stackoverflow and felt I'd share it here. Combining resources into a single binary file
I don't think there is really such thing as .dat file format. It's short for "data," and different applications just put in some proprietary stuff in it and call it ".dat." You can read up on fstream classes to do file IO in C++. See Input/Output with files.
What you then do is make up your own file format. For example, first 4 byte is int that indicates the number of blocks in the .dat and for each block, you have 4 byte indicating the length of each block, 4 byte indicating the type of the block, the variable length data itself .. something like that.
DAT obviously stands for data, and there is no real or de facto standard on what that extension actually refers to. Your decisions on the best file formats should be based on technical considerations, not pointless attempts at security through obscurity.
Professional games use a technique where they put all the needed resources (models, textures, sounds, ai, config, etc) zipped/packed into a single file thus making it faster to manage, harder to change (some even make use of a virtual filing system from what's inside the data file). Now, for what's inside the file is different depending on the needs of the game and the data structures that you use.
If you're just starting into gamedev, i recommend you stick with keeping all you assets separate and don't bother too much about packing them into a single file.
Now if you really want to start using a packed format here's a good pointer:
Creating a PAK File Format
Here's a link which claims that .dat is a movie format, 'DAT' being short for Digital Audio Tape.
I'm not sure I believe the link, but I do remember something about a Microsoft supported format called DAT, from long ago, when I used an earlier version of Windows.
It makes more sense as a logical extension for a DATA file of some kind.
.dat, as others have said, is literally just a data file. In reality, the file extension means nothing other than association with a program. For example, I could make a word processor that saves all the documents with the .mp3 file extension. These files wouldn't be playable in any media software, but the software might try. File extensions are used to help programs know what types of files they can and cannot open--however those rules don't have to be followed.
Anyway, you can dump any sort of information to a file. Programmers/software writers will often choose .dat as the extension of that file because it has become the standard to signify 'this file just holds a ton of data' and that the data doesn't necessarily hold any standardized headers, footers, or formatting.
A dat file could really contain anything. It might be as simple as a zip archive with the extension changed, or it could be a completely custom file type. If you're just starting out, you probably don't want to write your own file format, although doing so can be fun and educational. If you want to encapsulate your data files into some kind of container, you should probably go with a zip, paq, or maybe tar.gz.