Parsing markdown with C discount library - c++

I'm writing a Markdown editor (C++/Qt) and I'm using the discount library for that purpose.
Documentation: http://www.pell.portland.or.us/~orc/Code/discount/
I wrote this code to convert Markdown to HTML:
#include <mkdio.h>
#include <stdio.h>
#include <string.h>
int main()
{
    FILE *out = fopen("/home/abdeljalil/test.html", "w");
    if (!out)
        return 1;
    const char *mkdown = "__hello__";
    int flags = MKD_TOC|MKD_SAFELINK|MKD_EXTRA_FOOTNOTE;
    MMIOT *doc = mkd_string(mkdown, strlen(mkdown), flags);
    mkd_compile(doc, flags);
    mkd_generatehtml(doc, out);
    mkd_cleanup(doc);
    fclose(out); /* flush and close the output file */
    return 0;
}
Is using an output file an efficient method? (I will update the GUI every time the Markdown changes in the editor.)
Can I write the HTML directly to a string instead of a file? (I can't find such a function.)
Are there any other notes on optimizing the code?

Markdown is sort of notorious for being a bit hacked together, nonstandard, and contradictory. Anyone (including myself) who has tried to write a Markdown-to-visual system can tell you just how puzzling/maddening it is. I don't know about "discount" but see CommonMark.org for some current state of thinking from formerly-of-StackOverflow-Jeff and others.
Doing a full reformat of the document on each edit (on entry to idle so as not to block user input) to produce a markdown preview is probably okay for modestly sized documents. Haven't looked at the StackOverflow JavaScript but it is probably doing precisely that.
Your library documentation says:
There are 17 public functions in the markdown library, broken into three categories:
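Most of those functions are file based (the same list does include mkd_document(), which hands back a pointer to the generated HTML in memory, so it is worth checking first). As far as I know, you aren't going to find any platform-independent convenience layer allowing you to pass a std::stringstream or similar as a C stream FILE *: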
cstdio streams vs iostream streams?
You could look into fmemopen to avoid the file creation and write to a buffer, though:
http://www.gnu.org/software/libc/manual/html_node/String-Streams.html
So perhaps investigate that.
Finding the size of a file created by fmemopen
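A minimal sketch of the fmemopen idea, assuming a POSIX system. write_html() is a hypothetical stand-in for any function that only accepts a FILE * (such as mkd_generatehtml), and the 4096-byte capacity is an arbitrary assumption; output larger than the buffer makes the function return an empty string.

```cpp
#include <stdio.h>   // fmemopen is POSIX, not standard C++
#include <string>
#include <vector>

// Hypothetical stand-in for a library call that can only write to a FILE*.
static void write_html(FILE *out, const std::string &text) {
    fprintf(out, "<p>%s</p>\n", text.c_str());
}

// Lets the FILE*-based writer fill a memory buffer, then copies the
// written bytes into a std::string. Returns an empty string on failure.
std::string render_to_string(const std::string &text, std::size_t capacity = 4096) {
    std::vector<char> buffer(capacity);
    FILE *mem = fmemopen(buffer.data(), buffer.size(), "w");
    if (!mem)
        return std::string();
    write_html(mem, text);
    // Flush first so that ftell() reports the number of bytes in the buffer.
    bool ok = fflush(mem) == 0;
    long written = ftell(mem);
    fclose(mem);
    if (!ok || written < 0)
        return std::string();
    return std::string(buffer.data(), static_cast<std::size_t>(written));
}
```

The returned string could then be handed to the preview widget without touching the disk.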
More generally, I might suggest that starting from scratch to wrap a random C-based FILE stream Markdown library up in a Qt editor is a bit of a fool's errand. Either embrace an existing project like CuteMarkEd or embed a JavaScript engine to run the common markdown code, or... something.

Related

"Format" CSV output in Qt

Hi, I have a C++ program that generates a CSV file. It all works fine, but when I open the CSV file it looks rather messy and I have to manually expand the columns to read all the text.
My question is: is there a way in Qt to do any kind of formatting when generating a CSV file, e.g. make columns a certain width?
The QString class has several methods that you can use to format the fields of a CSV file.
As an example, consider the QString::leftJustified method. You may also want to check its counterpart, QString::rightJustified.
The main advantage of using Qt APIs, in this case, is that you do not have to split formatting and arguments as with standard C++ APIs based on streams.
Anyhow the best API, among those of the QString class, depends on your specs. Check the Qt docs to learn more.
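To show what that padding does without pulling in Qt, here is a rough sketch in standard C++. left_justified() is a hypothetical helper that imitates QString::leftJustified(width, fill) with truncation left off; padded_csv_line() and the column widths are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Pads `field` on the right with `fill` up to `width` characters.
// Fields already at least `width` long are returned unchanged.
std::string left_justified(const std::string &field, std::size_t width, char fill = ' ') {
    if (field.size() >= width)
        return field;
    return field + std::string(width - field.size(), fill);
}

// Joins the fields into one CSV line, padding each to its column width.
// Columns without a width entry are written unpadded.
std::string padded_csv_line(const std::vector<std::string> &fields,
                            const std::vector<std::size_t> &widths) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0)
            line += ',';
        std::size_t width = i < widths.size() ? widths[i] : 0;
        line += left_justified(fields[i], width);
    }
    return line;
}
```

Keep in mind that the padding characters become part of each field's value, so a program reading the file back will see the trailing spaces.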

outputting pdf files using C++ file I/O

I was trying to output a string to a file using C++ file I/O, and I decided to change the extension of the output file to .pdf, .docx and so on.
None of them seem to open.
So are there any more internal translations required in order to make that file a proper pdf or docx (one that actually opens)?
void convert(char *file){
    ifstream in(file);
    ofstream out("out.pdf",ios::out);
    char str;
    // get() is the loop condition so the last character is not written twice
    while(in.get(str)){
        out.put(str);
    }
    in.close();
    out.close();
    cout<<"Done!!!";
}
PDF is a file format, and quite a complex one (so is docx, which is actually OOXML, and so are OpenDocument and DocBook). PDF is specified in a long English document (the ISO 32000 standard) that defines which sequences of bytes are valid and what they represent.
Grossly speaking a PDF file contains elementary steps to draw ink on paper or pixels on screen. With gross simplification, these steps include things like "move the pen to position x=23 y=50", "choose the Arial font of 10pt", "draw the word abc in that font at that position", "choose the pink color for the pen" etc etc, but the details are much more complex.
File extensions don't mean anything (except as an important convention).
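As a small illustration, a PDF normally begins with the bytes "%PDF-" followed by a version number, whatever the file is called. The sketch below checks only that signature; looks_like_pdf() is a hypothetical helper, and passing the check says nothing about whether the rest of the file is valid.

```cpp
#include <string>

// Returns true when the data starts with the "%PDF-" signature.
// This is a quick sanity check, not a validation of the format.
bool looks_like_pdf(const std::string &bytes) {
    static const std::string magic = "%PDF-";
    return bytes.compare(0, magic.size(), magic) == 0;
}
```

A text file renamed to out.pdf fails this check, which is why a viewer refuses to open it.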
To generate a PDF file, you either need to spend weeks or months studying the specification of the PDF format, or use some library for that (see this question, and also podofo, poppler and several other libraries). Even with the help of a library, you need to understand something about the format.
Are there any more internal translations required in order to make that file
I'm not sure I understand what you mean by translation (a better word would be "converter"). You could generate PDF by sending some other content to a suitable program that emits PDF. You might consider using a document formatter (like LaTeX through pdflatex, or Lout, or some older variant of troff, etc.), which you would feed with a higher-level file format containing text formatting directives mixed with (some encoding of) the text to be formatted. On Unix-like systems you might even use a command pipeline (to avoid writing a "temporary" file), perhaps using popen(3) and related functions.
To write a pdf, docx, etc. you need to set up a file header and format the data correctly. Changing the extension does not really do much, if anything at all.
You will have to use an outside library or format the code yourself. Below are a couple of examples of extensions you can use:
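A rough sketch of the popen(3) route, assuming a POSIX system. pipe_to_command() is a hypothetical helper, and the command string is whatever formatter pipeline you choose; nothing here is specific to a particular tool.

```cpp
#include <stdio.h>   // popen and pclose are POSIX
#include <string>

// Writes `input` to the standard input of `command`.
// Returns the value of pclose(), or -1 if the pipe could not be opened.
int pipe_to_command(const std::string &command, const std::string &input) {
    FILE *pipe = popen(command.c_str(), "w");
    if (!pipe)
        return -1;
    fwrite(input.data(), 1, input.size(), pipe);
    // pclose() waits for the command to finish and reports its status.
    return pclose(pipe);
}
```

The command itself decides where the generated document ends up, for example by redirecting its output to a file.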
PDF
docx

Output a COLLADA document as a string using COLLADA DOM

I'm working on a project to add COLLADA export functionality to an existing program (PyMOL), and trying to use the COLLADA DOM library to help write the output file. There is an existing structure for the various "save" functions that I would like to follow, wherein the text to be exported is appended to a variable-length array, which is then written to disk by the parent function.
The trouble I'm having is this: I haven't found a way to output the COLLADA file as a string so I can append it to the VLA. All the DOM examples I've found work directly with files, reading from and saving to them, and after spending several days combing through the source, I can't find a function that will return the XML string for a DAE (COLLADA) object.
For example, it's possible to write a file to disk using the DOM's write() or writeAll() functions, like this example from the DOM Guide:
#include <dae.h>
int main() {
DAE dae;
dae.add("simple.dae");
dae.writeAll();
return 0;
}
What I'd like to do instead is something like this:
string generateXmlString() {
DAE dae;
dae.add("simple.dae");
string output = dae.getXml("simple.dae"); // this function doesn't exist
return output;
}
where the XML string is generated and then either assigned to a variable or returned directly, instead of being written to a file.
Is there a way to do this using COLLADA DOM? If so, I would love to see an example.
Alternatively, is there another library that would allow me to accomplish this in a more straightforward manner?
Thanks!
I tried to get COLLADA DOM working for quite a while, but eventually gave up and decided to use libxml2 instead. It's well documented and just as easy to use, and doesn't add much in the way of extra dependencies, as it's preinstalled on many UNIX-like systems.

How to read and write doc, pdf files using files in c++

I'm writing a C++ program that works with files, and I need to take input from existing files such as doc and pdf files. How do I program that in C++? And after getting the input, how can I write those details into a new doc or pdf file? Can anyone explain with an example?
C++ as a language doesn't equip you with features such as "write to DOC file" or "read from PDF file". The only thing available to you as a programmer is raw byte-by-byte reading and writing. To make your brand-new file PDF/DOC/etc. compatible, you have to conform to the chosen file format. The same goes for reading: you need to understand which portions of the raw byte array are responsible for what.
In general, this task is called "parsing" or "serialization", and it's a good idea to use an existing parser for the particular file format instead of reinventing the wheel. Moreover, some file formats may be covered by patents, so you may not be allowed to work with them without purchasing a license.
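For reference, raw byte-level reading in standard C++ looks roughly like the sketch below. read_all_bytes() is a hypothetical helper; it only loads the bytes, and interpreting them according to the file format is the hard part.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads the whole file in binary mode. Returns an empty vector if the
// file cannot be opened.
std::vector<unsigned char> read_all_bytes(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {};
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```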
Some clues so far:
PDF parsing in C++ (PoDoFo)
Microsoft word Text Parser in "C"
There are some libraries available on the web now (the question is from 2013; maybe there weren't many at that time).
Apart from the links in the selected answer, you can try PDFTron. It also supports newer features, e.g. linearization.
Here is one of their samples:
https://www.pdftron.com/documentation/samples/cpp/TextExtractTest
(That program itself contains 4 if blocks, with slightly different features of the library/SDK, to try)
There should be more, search on the web for PDF parsing libraries.

How to embed resources into a single executable?

If you've ever used the tool Game Maker, it's a bit like that. I want to be able to take all my sounds, images, and everything else of the like and embed them into a single C++ executable. Game Maker would have a built-in editor, and would have the images embedded into the .gmk file, and when you'd open it it would read the images, and display them in the game. I'm thinking he had the images saved not as images, but as pure data stored in the .gmk file and interpreted by the editor or by some interpreter written into the .exe. How would I go about making something similar?
The Windows resource system works like this, so if you make a WinAPI or MFC application, you can use it. Qt provides the same functionality, but in a platform-independent way. They just write the files in raw binary format into a byte array in a normal C++ file, so they get compiled as data into the exe. Then they provide functions for accessing these data blocks like normal files, although I don't know how they really work. Probably a special implementation of their file class that just accesses those byte array variables.
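A stripped-down sketch of the byte-array idea, not Qt's actual implementation. The array contents, the ":/greeting.txt" name and find_resource() are all made up for illustration; in a real build the array would be generated from the resource file.

```cpp
#include <cstddef>
#include <string>

// Bytes of the embedded "file" (here the ASCII text "hello").
static const unsigned char kGreetingData[] = {0x68, 0x65, 0x6c, 0x6c, 0x6f};

// A view of an embedded resource: a pointer into the executable's data
// plus its length.
struct Resource {
    const unsigned char *data;
    std::size_t size;
};

// Looks up an embedded resource by name. Unknown names yield {nullptr, 0}.
Resource find_resource(const std::string &name) {
    if (name == ":/greeting.txt")
        return {kGreetingData, sizeof(kGreetingData)};
    return {nullptr, 0};
}
```

A file-like class can then read from the returned pointer instead of from disk.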
For images only, a very simple approach is to use the XPM format.
This format is a valid C/C++ header, so you can include it directly into a C++ source file and use it directly.
The main issue with this approach is that XPM is not a compressed format, so uses a lot of storage.
As a consequence, in practice I have only seen this used for icons and small graphical objects, but in principle you could do more.
The other cool thing about XPM is that it's human readable - again great for designing small and simple icons.
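For a feel of the format, here is a tiny hand-written icon. The image data and read_xpm_header() are made up for illustration; files written by image tools usually declare the array as static char *name[] rather than with const.

```cpp
#include <cstdio>

// A 4x4 two-color icon in XPM form.
static const char *const cross_xpm[] = {
    "4 4 2 1",      // width, height, number of colors, characters per pixel
    ". c None",     // '.' is transparent
    "# c #000000",  // '#' is black
    "#..#",
    ".##.",
    ".##.",
    "#..#",
};

struct XpmHeader {
    int width, height, colors, chars_per_pixel;
};

// Parses the first string of the array, which describes the image.
bool read_xpm_header(const char *const *xpm, XpmHeader &header) {
    return std::sscanf(xpm[0], "%d %d %d %d", &header.width, &header.height,
                       &header.colors, &header.chars_per_pixel) == 4;
}
```

Toolkits that understand XPM can take such an array directly; Qt's QPixmap, for instance, has a constructor that accepts one.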
To generalize this idea to other formats, what you could do is to create a compile chain that:
Encodes the target file as ASCII (uuencode or similar)
Turns that into a single named C string in a source file
Creates a header that just declares the name
Defines a function that recovers the binary form from the string
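One way to sketch the generator step of such a chain is shown below. It swaps the uuencoded string for a plain byte-array literal, so no decoding function is needed at run time; to_c_array() and the emitted naming scheme are assumptions, not an existing tool.

```cpp
#include <cstdio>
#include <string>

// Produces C source text that defines `name` as a byte array holding
// `bytes`, plus a `name_size` constant with its length.
std::string to_c_array(const std::string &name, const std::string &bytes) {
    std::string out = "const unsigned char " + name + "[] = {";
    char buf[8];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes above 0x7f are not sign-extended.
        unsigned value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "0x%02x", value);
        if (i > 0)
            out += ", ";
        out += buf;
    }
    out += "};\n";
    out += "const unsigned long " + name + "_size = " +
           std::to_string(bytes.size()) + ";\n";
    return out;
}
```

A build step would write this text to a .cpp file and compile it in with the rest of the program, with a matching header that declares the two names as extern.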
For the Windows OS I have a solution if you are willing to use another tool and possibly framework. Try the "peresembed" tool. It embeds files into PE image sections so you can have all your images, sounds and configuration files in just one EXE file. Supports compression too, although you do need a ZIP in-memory reading framework then. Can even embed files into the PE resource tree based on their relative file paths.
Example usage:
peresembed -file content.txt _export_to_resolve input.exe output.exe
In your C++ file you have:
struct embedded_data
{
    void *dataloc;
    size_t datasize;
};
extern "C" __declspec(dllexport) const volatile embedded_data _export_to_resolve = { 0 };
Get peresembed from: https://osdn.net/projects/pefrm-units/releases/70746
Showcase video: https://www.youtube.com/watch?v=1uYdjiZc5XI