I was trying to write a string to a file using C++ file I/O, and I decided to change the extension of the output file to .pdf, .docx and so on.
None of the resulting files will open.
So is some further internal translation required to make the file a proper PDF or docx (one that actually opens)?
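#include <fstream>
#include <iostream>
using namespace std;

void convert(const char *file){
    // Binary mode copies the bytes unchanged; the result is still not a PDF.
    ifstream in(file, ios::binary);
    ofstream out("out.pdf", ios::binary);
    char ch;
    // Test the read itself so the last character is not written twice.
    while(in.get(ch)){
        out.put(ch);
    }
    in.close();
    out.close();
    cout<<"Done!!!";
}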
PDF is a file format, and quite a complex one (the same goes for docx, which is actually OOXML, and for OpenDocument and DocBook). It is specified in a long English document, the ISO 32000 standard, which defines which sequences of bytes are valid and what they represent.
Roughly speaking, a PDF file contains elementary steps to draw ink on paper or pixels on screen. With gross simplification, these steps include things like "move the pen to position x=23 y=50", "choose the Arial font at 10pt", "draw the word abc in that font at that position", "choose the pink color for the pen", and so on, but the details are much more complex.
File extensions don't mean anything (except as an important convention).
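To illustrate why the extension alone does nothing, here is a minimal sketch that checks whether a file starts with the "%PDF-" marker that PDF files begin with. The function name looks_like_pdf is made up for this example, and a real validity check would need far more than the first five bytes.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: true when the file at `path` begins with "%PDF-".
// This only inspects the first five bytes; it does not validate the file.
bool looks_like_pdf(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char header[5] = {};
    in.read(header, sizeof header);
    // gcount() is 0 when the file could not be opened, so that case is false.
    return in.gcount() == 5 && std::memcmp(header, "%PDF-", 5) == 0;
}
```

A text file renamed to out.pdf fails this check, while a genuine PDF saved under any name passes it.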
To generate a PDF file, you either need to spend weeks or months studying the specification of the PDF format, or use a library for that (see this question, and also podofo, poppler and several other libraries). Even with the help of a library, you need to understand something about the format.
Are there any more internal translations required in order to make that file
I'm not sure I understand what you mean by translation (a better word would be "converter"). You could generate PDF by sending some other content to a suitable program that emits PDF. You might consider using a document formatter (like LaTeX through pdflatex, or Lout, or some older variant of troff, etc.), which you would feed with a higher-level file format containing text formatting directives mixed with (some encoding of) the text to be formatted. On Unix-like systems you might even use a command pipeline (to avoid writing a "temporary" file), perhaps using popen(3) and related functions.
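As a rough sketch of the "feed a formatter" approach, the code below turns plain text into a minimal LaTeX source string. The helper names latex_escape and latex_document are invented for this example, and the escaping covers only the usual special characters in running text.

```cpp
#include <string>

// Hypothetical helper: escapes characters LaTeX treats specially in text.
std::string latex_escape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': case '%': case '$': case '#':
            case '_': case '{': case '}':
                out += '\\';
                out += c;
                break;
            case '\\': out += "\\textbackslash{}"; break;
            case '~':  out += "\\textasciitilde{}"; break;
            case '^':  out += "\\textasciicircum{}"; break;
            default:   out += c;
        }
    }
    return out;
}

// Hypothetical helper: wraps the escaped text in a minimal article document.
std::string latex_document(const std::string& text) {
    return "\\documentclass{article}\n\\begin{document}\n" +
           latex_escape(text) + "\n\\end{document}\n";
}
```

Writing the returned string to a file such as out.tex and running pdflatex on it would then produce the PDF, assuming a LaTeX installation is available.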
To write a pdf, docx, etc. you need to set up a file header and format the data correctly. Changing the extension does not really do much, if anything at all.
You will have to use an outside library or produce the format yourself. Below are a couple of examples of libraries you can use:
PDF
docx
Related
I used fin to read in a .doc file, and then store all the text in a string. When I tried printing the string, I just saw unknown characters.
When I copied the contents of the .doc file into a .txt file and then read the .txt file in using fin, everything worked fine.
My question is whether fin works with complex files (such as .doc) or just with .txt files. I only had text in my .doc file (no graphics or anything), but the font was Calibri, which is not the font that fout uses to print text to a .doc file.
If by fin you mean an ifstream, then yes, it will read the file contents. However, with complex files you have to deal with the file format yourself; the C++ library will not automatically extract just the text. When you saved the file as text, the text was all that was left, so that is all the stream read.
fstream by default does all operations in text mode, and .doc files use the binary MS-DOC file format. So when you read the .doc file and printed it, you probably saw characters you couldn't understand because the content was binary.
If you try to read any file with fstream, it does read it.
I tried reading a .mp4 file in binary mode using fstream and it did read the file (I can be sure of that because I wrote the contents to another file and that file turned out to be the same video).
So the answer to your question is that you can read any file with fstream, but fstream does this in only two ways, either text or binary.
So reading an arbitrary file won't do much good unless you want to do something like copying its contents to another file.
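A minimal sketch of reading a whole file in binary mode, so that no newline translation happens and embedded zero bytes survive. The name read_all_bytes is made up here, and loading everything into memory is only reasonable for files that fit comfortably.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns every byte of the file, unmodified.
// An unreadable or missing file yields an empty string.
std::string read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The returned std::string is just a byte container; printing it for a .doc or .mp4 file will still show unreadable characters.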
You first need to understand the .doc file format. Start by reading the Wikipedia page on doc (computing). The format is very complex (so you'll need months of work at least) but more or less documented.
You could consider a different approach to your overall goal. For example, if you need to parse a .doc file (produced by Microsoft Word), you might use libreoffice, which provides a library to parse it, or you could find another library (e.g. DocxFactory, wvware, ...), or you could use a COM interface to Word (on a Microsoft Windows operating system with Microsoft Word installed).
If your goal is to generate some document, you might consider the PDF format (which is a standard), perhaps using some text formatter like LaTeX or Lout to generate it, or some library (e.g. cairo, PoDoFo, etc ...).
My question is whether fin works with complex files (such as .doc)
BTW, C++ standard IO is capable of reading binary files, but you need to write your parser for them (so you need to understand precisely your file format). You should prefer open formats to proprietary formats.
I've seen a lot of examples of i/o with text files I'm just wondering if you can do the same with other file types like mp3's, jpg's, zip files, etc..?
Will iostream and fstream work for all of these or do I need another library? Do I need a new sdk?
It's all binary data, so I'd think it would be that simple. But I've been unpleasantly surprised before.
Could I convert all files to text or binary?
It depends on what you mean by "work".
You can think of those files as a book written in Greek.
If you want to just mess with binary representation (display text in Greek on screen) then yes, you can do that.
If you want to actually extract some info: edit video stream, remove voice from audio (actually understand what is written), then you would need to either parse file format yourself (learn Greek) or use some specialized library (hire a translator).
Either way, file streams are well suited to accessing the data in those files (and many libraries do use them under the hood).
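To make the "display the Greek text on screen" case concrete, here is a small sketch that formats raw bytes as hexadecimal, 16 per line. The function name hex_dump is an assumption for this example; it shows the bytes without understanding them.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders bytes as two-digit lowercase hex values,
// separated by spaces, with a line break after every 16 bytes.
std::string hex_dump(const std::string& bytes) {
    std::string out;
    char cell[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(cell, sizeof cell, "%02x",
                      static_cast<unsigned char>(bytes[i]));
        if (i != 0) out += (i % 16 == 0) ? '\n' : ' ';
        out += cell;
    }
    return out;
}
```

For example, hex_dump("AB\n") gives "41 42 0a".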
You can work on binary streams by opening them with openmode binary :
ifstream ifs("mydata.mp3", ios_base::binary);
Then you can read and write any binary content. However, if you need to generate or modify such content, play a video or display a picture, then you need to know the inner details of the format you are using. This can be extremely complex, so a library is recommended. And even with a library, advanced programming skills are required.
Examples of open source libraries: ffmpeg for the usual audio/video formats, portaudio for audio, CImg for image processing (in C++), libpng for the png graphic format, libjpeg for jpeg. Note that most libraries offer a C API.
Some operating systems also support some file types natively (for example, Windows bitmaps).
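As a small taste of what "knowing the inner details" means, the sketch below reads the width and height from the start of a PNG file: an 8-byte signature, then the IHDR chunk whose first two fields are big-endian 32-bit integers. The names PngSize, read_be32 and png_size are made up here, and nothing beyond the first 24 bytes is checked.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical result type for this example.
struct PngSize {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    bool ok = false;
};

// Decodes a 32-bit big-endian integer from four bytes.
static std::uint32_t read_be32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Hypothetical helper: reads the image size from the PNG header only.
PngSize png_size(const std::string& path) {
    static const unsigned char sig[8] = {0x89, 'P', 'N', 'G',
                                         0x0D, 0x0A, 0x1A, 0x0A};
    unsigned char head[24] = {};
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(head), sizeof head);
    PngSize result;
    if (in.gcount() != 24) return result;                      // too short
    if (std::memcmp(head, sig, 8) != 0) return result;         // not a PNG
    if (std::memcmp(head + 12, "IHDR", 4) != 0) return result; // bad chunk
    result.width = read_be32(head + 16);
    result.height = read_be32(head + 20);
    result.ok = true;
    return result;
}
```

Getting at the actual pixels is a different matter, since the image data is compressed; that is where a library such as libpng comes in.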
You can open these files using fstream, but the important thing to note is you must be intricately aware of what is contained within the file in order to process it.
If you just want to open it and spit out junk, then you can definitely just start at the first line of the file and exhaustively push all data into your console.
If you know what the file looks like on the inside, then you can process it just as you would any other file.
There may be specific libraries for processing specific files, but the fstream library will allow you to access any file you'd like.
All files are just bytes. There's nothing stopping you from reading/writing those bytes however you see fit.
The trick is doing something useful with those bytes. You could read the bytes from a .jpg file, for example, but you have to know what those bytes mean, and that's complicated. Usually it's best to use libraries written by people who know about the format in question, and let them deal with that complexity.
I'm writing a markdown editor (C++/Qt) and I'm using the discount library for that purpose.
Documentation: http://www.pell.portland.or.us/~orc/Code/discount/
I wrote this code to convert markdown to HTML.
#include <mkdio.h>
#include <stdio.h>
#include <string.h>
int main()
{
    FILE *out = fopen("/home/abdeljalil/test.html","w");
    if (out == NULL)
        return 1;   /* could not open the output file */
    const char* mkdown= "__hello__";
    int flags = MKD_TOC|MKD_SAFELINK|MKD_EXTRA_FOOTNOTE;
    MMIOT *doc = mkd_string(mkdown,strlen(mkdown),flags);
    mkd_compile(doc,flags);
    mkd_generatehtml(doc,out);
    mkd_cleanup(doc);
    fclose(out);    /* flush and close the file */
    return 0;
}
Is using an output file an efficient method? (I will update the GUI every time the markdown is changed in the editor.)
Can I write the HTML directly to a string instead of a file? (I can't find such a function.)
Are there any other notes on optimizing the code?
Markdown is sort of notorious for being a bit hacked together, nonstandard, and contradictory. Anyone (including myself) who has tried to write a Markdown-to-visual system can tell you just how puzzling/maddening it is. I don't know about "discount" but see CommonMark.org for some current state of thinking from formerly-of-StackOverflow-Jeff and others.
Doing a full reformat of the document on each edit (on entry to idle so as not to block user input) to produce a markdown preview is probably okay for modestly sized documents. Haven't looked at the StackOverflow JavaScript but it is probably doing precisely that.
Your library documentation says:
There are 17 public functions in the markdown library, broken into three categories:
Those functions are file based. As far as I know, you aren't going to find any platform-independent convenience layer allowing you to pass a std::stringstream or otherwise as a C stream FILE *:
cstdio streams vs iostream streams?
You could look into fmemopen to avoid the file creation and write to a buffer, though:
http://www.gnu.org/software/libc/manual/html_node/String-Streams.html
So perhaps investigate that.
Finding the size of a file created by fmemopen
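A minimal sketch of the fmemopen idea, assuming a POSIX system: any function that writes to a FILE * is pointed at a memory buffer, and the bytes written are returned as a std::string. The name capture_file_output and the fixed capacity are choices made for this example; output larger than the buffer would be cut off.

```cpp
#include <stdio.h>   // fmemopen is POSIX, not standard C++
#include <string>
#include <vector>

// Hypothetical helper: runs `writer` against an in-memory FILE* and
// returns what it wrote. `capacity` must be large enough for the output.
std::string capture_file_output(void (*writer)(FILE*),
                                std::size_t capacity = 1 << 16) {
    std::vector<char> buffer(capacity);
    FILE* mem = fmemopen(buffer.data(), buffer.size(), "w");
    if (mem == nullptr) return std::string();
    writer(mem);
    fflush(mem);               // push buffered output into `buffer`
    long used = ftell(mem);    // number of bytes written so far
    fclose(mem);
    if (used < 0) used = 0;
    if (static_cast<std::size_t>(used) > capacity)
        used = static_cast<long>(capacity);
    return std::string(buffer.data(), static_cast<std::size_t>(used));
}
```

In the question's code, the in-memory FILE * would take the place of out in the mkd_generatehtml(doc,out) call, so no file on disk is involved.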
More generally, I might suggest that starting from scratch to wrap a random C-based FILE stream Markdown library up in a Qt editor is a bit of a fool's errand. Either embrace an existing project like CuteMarkEd or embed a JavaScript engine to run the common markdown code, or... something.
I m writing a c++ program using files and i need to take the input from existing files such as doc files and pdf files. how to program it in c++? And after getting the inputs, how can i write those details into a new doc or pdf files? Can anyone explain me with an example?
C++ as a language doesn't equip you with features such as "write to DOC file" or "read from PDF file". The only thing available to you as a programmer is raw byte-by-byte reading and writing. To make your new file PDF/DOC/etc. compatible, you have to conform to the chosen file format. The same applies to reading: you have to understand which portions of the raw byte array are responsible for what.
In general, this task is called "parsing" or "serialization". It's a good idea to use an existing parser for the particular file format instead of reinventing the wheel. Moreover, some file formats can be patent-encumbered, so you may not be allowed to work with them without purchasing a license.
Some clues so far:
PDF parsing in C++ (PoDoFo)
Microsoft word Text Parser in "C"
There are some libraries available on the web now (the question is from 2013; maybe at that time there weren't many).
Apart from the links in selected answer, you can try PDFTron. It also supports new features, eg. Linearization.
Here is one of their samples:
https://www.pdftron.com/documentation/samples/cpp/TextExtractTest
(That program itself contains 4 if blocks, with slightly different features of the library/SDK, to try)
There should be more, search on the web for PDF parsing libraries.
Greetings all,
I am currently a rising Sophomore (CS major), and this summer, I'm trying to teach myself C++ (my school codes mainly in Java).
I have read many guides on C++ and gotten to the part with ofstream, saving and editing .txt files.
Now, I am interested in simply importing an image (jpeg, bitmap, not really important) and renaming the aforementioned image.
I have googled, asked around but to no avail.
Is this process possible without the download of external libraries (I dled CImg)?
Any hints or tips on how to expedite my goal would be much appreciated
Renaming an image is typically about the same as renaming any other file.
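For instance, a minimal sketch using std::rename from the standard library, which needs no image library at all. The wrapper name rename_file is made up for this example.

```cpp
#include <cstdio>
#include <string>

// Hypothetical wrapper: renames a file of any type; true on success.
// The file's contents are never read or decoded.
bool rename_file(const std::string& from, const std::string& to) {
    return std::rename(from.c_str(), to.c_str()) == 0;
}
```

A call such as rename_file("photo.jpg", "holiday.jpg") works the same way for a JPEG as for a .txt file.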
If you want to do more than that, you can also change the data in the Title field of the IPTC metadata. This does not require JPEG decoding, or anything like that -- you need to know the file format well enough to be able to find the IPTC metadata, and study the IPTC format well enough to find the Title field, but that's about all. Exactly how you'll get to the IPTC metadata will vary -- navigating a TIFF (for one example) takes a fair amount of code all by itself.
When you say "renaming the aforementioned image," do you mean changing metadata in the image file, or just changing the file name? If you are referring to metadata, then you need to either understand the file format or use a library that understands the file format. It's going to be different for each type of image file. If you basically just want to copy a file, you can either stream the contents from one file stream to another, or use a file system API.
std::ifstream infs("input.txt", std::ios::binary);
std::ofstream outfs("output.txt", std::ios::binary);
outfs << infs.rdbuf();
An example of a file system API is CopyFile on Win32.
It's possible without libraries: you just need the image specs and 'C'. The question is why?
Targa or bmp are probably the easiest; each is just a header followed by the image data as a binary block of values.
Gif, jpeg and png are more complex because the data is compressed.
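To show how little a bmp needs, here is a sketch that writes an uncompressed 24-bit bitmap by hand: a 14-byte file header, a 40-byte info header, then the pixel rows stored bottom-up in BGR order and padded to 4 bytes. The names put_le and write_bmp and the RGB, top-row-first input layout are choices made for this example.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Writes the low `bytes` bytes of `value` in little-endian order.
static void put_le(std::ostream& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.put(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: `rgb` holds width*height*3 bytes, top row first.
bool write_bmp(const std::string& path, std::uint32_t width,
               std::uint32_t height, const std::vector<std::uint8_t>& rgb) {
    if (rgb.size() != std::size_t(width) * height * 3) return false;
    const std::uint32_t row = (width * 3 + 3) / 4 * 4;  // padded row size
    const std::uint32_t data = row * height;
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    // File header (14 bytes)
    out.put('B'); out.put('M');
    put_le(out, 54 + data, 4);  // total file size
    put_le(out, 0, 4);          // reserved
    put_le(out, 54, 4);         // offset of the pixel data
    // Info header (40 bytes)
    put_le(out, 40, 4);         // size of this header
    put_le(out, width, 4);
    put_le(out, height, 4);     // positive height: rows stored bottom-up
    put_le(out, 1, 2);          // colour planes
    put_le(out, 24, 2);         // bits per pixel
    put_le(out, 0, 4);          // no compression
    put_le(out, data, 4);       // size of the pixel data
    put_le(out, 2835, 4);       // horizontal pixels per metre (about 72 DPI)
    put_le(out, 2835, 4);       // vertical pixels per metre
    put_le(out, 0, 4);          // palette colours (none)
    put_le(out, 0, 4);          // important colours (all)
    for (std::uint32_t y = height; y-- > 0;) {           // bottom row first
        for (std::uint32_t x = 0; x < width; ++x) {
            const std::uint8_t* p = &rgb[(std::size_t(y) * width + x) * 3];
            out.put(static_cast<char>(p[2]));            // blue
            out.put(static_cast<char>(p[1]));            // green
            out.put(static_cast<char>(p[0]));            // red
        }
        for (std::uint32_t pad = width * 3; pad < row; ++pad) out.put(0);
    }
    return static_cast<bool>(out);
}
```

A 2x2 image written this way is 70 bytes long: 54 bytes of headers plus two rows padded from 6 to 8 bytes each.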