I'm looking for a C/C++ functional equivalent to HTML::Defang, and my Google-fu has not been able to uncover anything. I want to keep any benign tags and strip out/defang everything else. Lacking an actual library, any pointers to complete lists of tags/attributes/etc to defang would be appreciated. I know of http://en.wikipedia.org/wiki/DOM_Events. Thanks.
In Java, I use JTidy to clean up HTML. I'm not sure if it would suit your needs, but if you Google for JTidy you can follow the link to a C/C++ implementation as well, and see if it does what you want.
As for what to defang: Look at the W3C specs for HTML; any tag not in there doesn't belong in HTML. But again, I could be misunderstanding your "defang" concept.
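If no library turns up, the "keep benign tags, defang the rest" idea can be sketched with an allowlist. This is only a minimal illustration, not a vetted sanitizer: the tag list in `kAllowedTags` and the function name `defang` are assumptions made up for this example, all attributes are dropped from kept tags (which also removes `on*` event handlers), and a `>` inside an attribute value is not handled specially.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical allowlist for illustration only; NOT a complete or vetted list.
static const std::set<std::string> kAllowedTags = {"b", "i", "em", "strong", "p", "br"};

// Append a character, turning '<' and '>' into entities so they render as text.
static void appendEscaped(std::string& out, char c) {
    if (c == '<') out += "&lt;";
    else if (c == '>') out += "&gt;";
    else out += c;
}

// Keep allowlisted tags (with every attribute dropped); escape all other markup.
std::string defang(const std::string& html) {
    std::string out;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') { appendEscaped(out, html[i++]); continue; }
        std::size_t close = html.find('>', i);
        if (close == std::string::npos) { appendEscaped(out, html[i++]); continue; }

        // Read the tag name, lowercased, skipping a leading '/' on closing tags.
        std::size_t p = i + 1;
        bool closing = (p < close && html[p] == '/');
        if (closing) ++p;
        std::string name;
        while (p < close && std::isalnum(static_cast<unsigned char>(html[p])))
            name += static_cast<char>(std::tolower(static_cast<unsigned char>(html[p++])));

        if (!name.empty() && kAllowedTags.count(name)) {
            out += closing ? "</" + name + ">" : "<" + name + ">";
            i = close + 1;  // skip the whole original tag, attributes included
        } else {
            appendEscaped(out, html[i++]);  // defang: the '<' becomes &lt;
        }
    }
    return out;
}
```

Calling `defang` on input containing a `<script>` element leaves the text visible but inert, while `<b>` and `<p>` survive without their attributes. A real deployment would want a proper HTML parser in front of this.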
libxml2 is free and should do what you want.
http://www.xmlsoft.org/
See this part of the API: http://www.xmlsoft.org/html/libxml-HTMLparser.html
The htmlReadFile() function might do the trick.
To get you started with libxml2 some examples can be found here:
http://www.xmlsoft.org/examples/index.html
I have a set of .groovy files (Java). All of these files have the same comment format.
I developed a tool with which I can read those files and apply a regex to collect all the comments in a list. (In the end I just have to copy and paste these comments into an .html file.)
I would like to know whether this is good practice for generating an HTML page from the comments (a kind of documentation). If not, what would you recommend?
I read about Doxygen and Javadoc, but I'm not sure about using them (whether they would really be useful in my case, since the comments are already written).
Can you suggest a library for easily generating an HTML web page, or offer any other advice?
Any help is appreciated.
There exists Groovydoc which is roughly the equivalent of Javadoc, just for Groovy.
As your setup is not that (you already have comments, probably not in Groovydoc format, and you have half the tooling), there are still multiple ways open to you. As you already extract the documentation from groovy, if I were you, I would do a minimal post-formatting, if necessary, and output the documentation as markdown (e.g., github markdown) or asciidoc (e.g., asciidoctor). Then you can use any preferred tool to convert the post-formatted documentation into HTML.
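The post-formatting step could look roughly like the sketch below. It is written in C++ only to have something concrete; the `DocEntry` struct and the `cleanLine` and `toMarkdown` functions are hypothetical names, and it assumes each extracted comment has already been paired with the name of the thing it documents. Note that `cleanLine` strips whitespace, `*` and `/` from both ends of a line, so a line ending in a legitimate `/` would lose it.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one extracted comment and the item it documents.
struct DocEntry {
    std::string name;
    std::string comment;
};

// Strip whitespace, '*' and '/' from both ends of a comment line.
std::string cleanLine(const std::string& line) {
    const char* decoration = " \t*/";
    std::size_t start = line.find_first_not_of(decoration);
    if (start == std::string::npos) return "";
    std::size_t end = line.find_last_not_of(decoration);
    return line.substr(start, end - start + 1);
}

// Emit one Markdown section per entry: a level-2 heading, then the comment text.
std::string toMarkdown(const std::vector<DocEntry>& entries) {
    std::ostringstream md;
    for (const DocEntry& e : entries) {
        md << "## " << e.name << "\n\n";
        std::istringstream lines(e.comment);
        std::string line;
        while (std::getline(lines, line)) {
            std::string text = cleanLine(line);
            if (!text.empty()) md << text << "\n";
        }
        md << "\n";
    }
    return md.str();
}
```

The resulting string can be written to a `.md` file and handed to whichever Markdown-to-HTML converter you prefer.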
To answer the question "How to parse the Java comments": you shouldn't. If possible, especially in a new project, stick with the standard tooling. In the case of Groovy that's Groovydoc. You should never need to extract the normal (non-Javadoc/Groovydoc-style) comments from the source code. They should be so context-specific that they are useless without the corresponding code anyway.
I'm starting to create a small language (in VB.NET, yes I know, maybe not a good idea).
I have already started following tutorials about regex, but apparently regex is telling me to get out.
I want to add commands that take arguments, such as a /print command, something like:
/PRINT["Hello world";"blue";propety:{bold;italic}]
So, for me, the regex is :
"{{^\^{\|^#\^~\{}~\^]|\~^[}^\}^#~\[}~^\}^##{\~{^}^#\#~#}\^#}^]|\|}]#\|{"
So you understand that's not something I like writing.
Could you show me how to construct the regex for the command I gave above?
Regex alone isn't the best way to create a language that, well, actually works.
Read this article for more info. I'm sure you can find a better way to write a language if you really need to write one. In VB.NET...
Anyway, if you insist on writing it in vb, I found a video that will help you with it.
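To illustrate why a small hand-written parser is easier than one giant regex, here is a minimal sketch that splits the `/PRINT[...]` example into a command name and its arguments. It is C++ rather than VB.NET, and the `Command` struct and `parseCommand` function are made-up names. The grammar is an assumption read off the single example: arguments are separated by `;`, except inside double quotes or `{...}`.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

struct Command {
    std::string name;               // e.g. "PRINT"
    std::vector<std::string> args;  // raw argument text, quotes and braces kept
};

// Parse "/NAME[arg;arg;...]" by walking the characters and tracking nesting state.
Command parseCommand(const std::string& src) {
    if (src.empty() || src[0] != '/') throw std::runtime_error("expected '/'");
    std::size_t open = src.find('[');
    if (open == std::string::npos || src.back() != ']')
        throw std::runtime_error("expected '[' ... ']'");

    Command cmd;
    cmd.name = src.substr(1, open - 1);

    std::string current;
    bool inQuotes = false;
    int braceDepth = 0;
    // Walk the text between '[' and the final ']'.
    for (std::size_t i = open + 1; i + 1 < src.size(); ++i) {
        char c = src[i];
        if (c == '"') inQuotes = !inQuotes;
        else if (!inQuotes && c == '{') ++braceDepth;
        else if (!inQuotes && c == '}') --braceDepth;

        // Only a top-level ';' ends an argument.
        if (c == ';' && !inQuotes && braceDepth == 0) {
            cmd.args.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (inQuotes || braceDepth != 0)
        throw std::runtime_error("unbalanced quotes or braces");
    cmd.args.push_back(current);
    return cmd;
}
```

Each raw argument can then be parsed further (for example, splitting `propety:{bold;italic}` at the `:`) with the same character-by-character approach. An empty argument list such as `/X[]` yields a single empty argument in this sketch.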
I'm looking for a faq or overview on C/C++ template-file variables in Netbeans (7.0).
(Not to be confused with the template technique). Those you see under Tools > Templates > C++ templates.
e.g.
%<%CLASSNAME%>% %<%DEFAULT_HEADER_EXT%>% %<%DATE%>%
which are automatically filled when you create a new cpp/header file out of that file-template.
The help for the Java template-variables with Freemarker is very extensive, but I found nothing for the C++ equivalent.
When I did a search on CLASSNAME DEFAULT_HEADER_EXT, google gave me 5 results... which were not helpful. So if there is a reference or api, it seems to be hidden somewhere... Not even the netbeans site had any information about that.
And if there is nothing, maybe someone can at least tell me if there is a way to format the %DATE% variable (like this in Java's Freemarker format: ${date?date?string("yyyy")} ).
Still no luck... can't believe that such a feature is not documented... Any help would be appreciated :)
Thanks
I know it's an old question but just stumbled on it and think it's good to have it mentioned:
The documentation of all predefined template variables including the date formatting may be found here: http://wiki.netbeans.org/FaqTemplateVariables
What is the best (and preferably lightweight) library out there to programmatically build HTML documents from C/C++? I have used TinyXML before, but I thought there must be some lib aimed more specifically at HTML.
EDIT: I was unclear. I did not mean documenting the C++ code, but rather creating HTML documents from scratch by creating tags and attributes. In my case, by "best" I mean a lightweight lib that gives me better error checking than "my_file << strBodyStartTag << endl;" style programming.
I am not quite sure what you mean by "building html docs from C/C++", but if your purpose is to create function/library reference documentation from source code, Doxygen should be ideal for that. It is widely used and well supported.
If you stick with xhtml, you should be able to keep using TinyXML.
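For comparison with raw stream output, here is a minimal sketch of what "creating tags and attributes" through a tiny builder could look like. This is not TinyXML's API; the `Element` class and `escapeHtml` function are hypothetical names invented for the example. The point is that closing tags and escaping are handled in one place instead of by hand. Every element gets an explicit closing tag, which suits the XHTML route; void elements such as `br` are not special-cased.

```cpp
#include <string>
#include <utility>
#include <vector>

// Escape the characters that are significant in HTML text and attribute values.
std::string escapeHtml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical element builder: tags are always closed, text is always escaped.
class Element {
public:
    explicit Element(std::string tag) : tag_(std::move(tag)) {}

    Element& attr(const std::string& key, const std::string& value) {
        attrs_.emplace_back(key, value);
        return *this;
    }
    Element& text(const std::string& t) { body_ += escapeHtml(t); return *this; }
    Element& child(const Element& e) { body_ += e.str(); return *this; }

    // Serialize as <tag attr="value">body</tag>.
    std::string str() const {
        std::string out = "<" + tag_;
        for (const auto& a : attrs_)
            out += " " + a.first + "=\"" + escapeHtml(a.second) + "\"";
        return out + ">" + body_ + "</" + tag_ + ">";
    }

private:
    std::string tag_;
    std::vector<std::pair<std::string, std::string>> attrs_;
    std::string body_;
};
```

A document is then built by nesting, for example `Element("body").child(Element("p").text("hi")).str()`, and the result can be streamed to a file in one go.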
I want my program to search Wikipedia, get the information it finds, put it into a large string, and write that to a file. How can I do that in C++? Any information would help. More answers would be welcome.
Use wget with the query URL
wget --output-document=result.html 'http://en.wikipedia.org/wiki/Special:Search?search=jon+skeet&go=Go'
This searches for jon skeet and stores the result in result.html
To use it from C++ you can, for example, use the system() call to execute wget in a separate process.
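A minimal sketch of the system() approach follows. It assumes a POSIX shell and that wget is installed and on the PATH; the helper names `shellQuote`, `buildWgetCommand` and `fetchToFile` are made up for this example. The quoting matters because a search URL contains `&`, which an unquoted shell command would treat as "run in the background".

```cpp
#include <cstdlib>
#include <string>

// Wrap a string in single quotes for a POSIX shell, escaping embedded single quotes.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    return out + "'";
}

// Build the wget command line that saves `url` into `outFile`.
std::string buildWgetCommand(const std::string& url, const std::string& outFile) {
    return "wget --output-document=" + shellQuote(outFile) + " " + shellQuote(url);
}

// Run wget in a separate process; returns true if the shell reported success.
bool fetchToFile(const std::string& url, const std::string& outFile) {
    return std::system(buildWgetCommand(url, outFile).c_str()) == 0;
}
```

After `fetchToFile` succeeds, the saved file can be read back into a `std::string` with an `std::ifstream` if the page is needed in memory.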
libcURL is pretty popular. I don't know that the interface is especially object-oriented, but it's certainly usable from C++.
There are a number of client APIs for MediaWiki (the wiki engine that powers Wikipedia). Here's a listing. They provide the ability to create/delete/edit/search articles. Nothing in straight C++ but it still may be useful.
DotNetWikiBot was quite useful on one project that I had...