I decided to try http://www.screwturn.eu/ wiki as a code snippet storage utility. So far I am very impressed, but what irkes me is that when I copy paste my code that I want to save, '<'s and '[' (http://en.wikipedia.org/wiki/Character_encodings_in_HTML#Character_references) invariably screw up the output as the wiki interprets them as either wiki or HTML tags.
Does anyone know a way around this? Or failing that, know of a simple utility that would take C++ code and convert it to HTML safe code?
You can use the ##...## tag to escape the code and automatically wrap it in PRE tags.
Surround your code in <nowiki> .. </nowiki> tags.
I don't know of utilities, but I'm sure you could write a very simple app that does a find/replace. To display angle brackets, you just need to replace them with > and < respectively. As for the square brackets, that is a wiki specific problem with the markdown methinks.
Dario Solera wrote "You can use the ##...## tag to escape the code and automatically wrap it in PRE tags."
If you don't want it wrapped just use: <esc></esc>
List of characters that need escaping:
< (less-than sign)
& (ampersand)
[ (opening square bracket)
Have you tried wrapping your code in html pre or code tags before pasting? Both allow any special characters (such as '<') to be used without being interpreted as html. pre also honors the formatting of the contents.
example
<pre>
if (foo <= bar) {
do_something();
}
</pre>
To post C++ code on a web page, you should convert it to valid HTML first, which will usually require the use of HTML character entities, as others have noted. This is not limited to replacing < and > with < and >. Consider the following code:
unsigned int maskedValue = value&mask;
Uh-oh, does the HTML DTD contain an entity called &mask;? Better replace & with & as well.
Going in an alternate direction, you can get rid of [ and ] by replacing them with the trigraphs ??( and ??). In C++, trigraphs and digraphs are sequences of characters that can be used to represent specific characters that are not available in all character sets. They are unlikely to be recognized by most C++ programmers though.
Related
I've been using Doxygen to document my project but I've ran into some problems.
My documentation is written in a language which apostrophes are often used. Although my language config parameter is properly set, when Doxygen generates the HTML output, it can't parse apostrophes so the code is shown instead of the correct character.
So, in the HTML documentation:
This should be the text: Vector d'Individus
But instead, it shows this: Vector d'Individus
That's strange, but searching the code in the HTML file, I found that what happens is that instead of using an ampersand to write the ' code, it uses the ampersand code. Well, seeing the code is easier to see:
<div class="ttdoc">Vector d'Individus ... </div>
One other thing is to note that this only happens with the text inside tooltips...
But not on other places (same code, same class)...
What can I do to solve this?
Thanks!
Apostrophes in code comments must be encoded with the correct glyph for doxygen to parse it correctly. This seems particularly true for the SOURCE_TOOLTIPS popups. The correct glyph is \u2019, standing for RIGHT SINGLE QUOTATION MARK. If the keyboard you are using is not providing this glyph, you may write a temporary symbol (e.g. ') and batch replace it afterwards with an unicode capable auxiliary tool, for example: perl -pC -e "s/'/\x{2019}/g" < infile > outfile. Hope it helps.
Regarding the answer from ramkinobit, this is not necessary, doxygen can use for e.g. the Right Single quote: ’ (see doxygen documentation chapter "HTML commands").
Regarding the apostrophe the OP asks for one can use (the doxygen extension) ' (see also doxygen documentation chapter "HTML commands")).
There was a double 'HTML escape' in doxygen resulting in the behavior as observed for the single quote i.e. displaying '.
I've just pushed a proposed patch to github (pull request 784, https://github.com/doxygen/doxygen/pull/784).
EDIT 07/07/2018 (alternative) patch has been integrated in main branch on github.
I have inherited a rather large/ugly php codebase (language is unimportant, this is a generic vim question) , where nothing is quoted properly (old php doesn't mind, but new php versions throw warnings).
I'd like to turn $something[somekey] into $something['somekey'], only if its not already quoted or contain the character $
I was trying to build a regular expression to quote the keys, but just cant seem to be able get it to cooperate.
This is what i have so far, which doesn't work but maybe will help explain my question better. And to show that i have actually tried.
:%s/\v\$(.{-})\[(['"$]#<!.{-})\]/$\1['\2']/
My goal is to have something like this:
$something[somekey] = $something['somekey']
$somethingelse[someotherthing] = $something['someotherthing']
$another['key'] = $another['key'] (is ignored)
$yetanother["keykey"] = $yetanother["keykey"] (is ignored)
$derp[$herp] = $derp[$herp] (is ignored)
$array[3] = $array[3] (is ignored)
These can appear anywhere in text, even multiple on the same line, and even touching each other like $something[key]$something[key2], which i would like to be replaced with $something['key']$something['key2']
Another problem, there seems to be random javascript arrays in some files.. which have [] square brackets. So the regex needs to check to see if it starts with $ and text before the brackets.
Im probably asking for the impossible, but any help on this would be great before i go insane editing each file one by one manually.
EDIT: forgot that keys can be numeric, and shouldn't be quoted.
I tried the following, which processed everything from your question correctly:
:%s/\[\(\I\i*\)\]/['\1']/g
Or, with optional white spaces inside the parens:
:%s/\[\s*\(\I\i*\)\s*\]/['\1']/g
And also checking for $identifier before the parens:
:%s/\(\$\i\+\)\[\s*\(\I\i*\)\s*\]/\1['\2']/g
Question
How to minify HTML using C++?
Resources
An external library could be the answer, but I'm more looking for improvements of my current code. Although I'm all ears for other possibilities.
Current code
This is my interpretation in c++ of the following answer.
The only part I had to change from the original post is this part on top: "(?ix)"
...and a few escape signs
#include <boost/regex.hpp>
void minifyhtml(string* s) {
boost::regex nowhitespace(
"(?ix)"
"(?>" // Match all whitespans other than single space.
"[^\\S ]\\s*" // Either one [\t\r\n\f\v] and zero or more ws,
"| \\s{2,}" // or two or more consecutive-any-whitespace.
")" // Note: The remaining regex consumes no text at all...
"(?=" // Ensure we are not in a blacklist tag.
"[^<]*+" // Either zero or more non-"<" {normal*}
"(?:" // Begin {(special normal*)*} construct
"<" // or a < starting a non-blacklist tag.
"(?!/?(?:textarea|pre|script)\\b)"
"[^<]*+" // more non-"<" {normal*}
")*+" // Finish "unrolling-the-loop"
"(?:" // Begin alternation group.
"<" // Either a blacklist start tag.
"(?>textarea|pre|script)\\b"
"| \\z" // or end of file.
")" // End alternation group.
")" // If we made it here, we are not in a blacklist tag.
);
// #todo Don't remove conditional html comments
boost::regex nocomments("<!--(.*)-->");
*s = boost::regex_replace(*s, nowhitespace, " ");
*s = boost::regex_replace(*s, nocomments, "");
}
Only the first regex is from the original post, the other one is something I'm working on and should be considered far from complete. It should hopefully give a good idea of what I try to accomplish though.
Regexps are a powerful tool, but I think that using them in this case will be a bad idea. For example, regexp you provided is maintenance nightmare. By looking at this regexp you can't quickly understand what the heck it is supposed to match.
You need a html parser that would tokenize input file, or allow you to access tokens either as a stream or as an object tree. Basically read tokens, discards those tokens and attributes you don't need, then write what remains into output. Using something like this would allow you to develop solution faster than if you tried to tackle it using regexps.
I think you might be able to use xml parser or you could search for xml parser with html support.
In C++, libxml (which might have HTML support module), Qt 4, tinyxml, plus libstrophe uses some kind of xml parser that could work.
Please note that C++ (especially C++03) might not be the best language for this kind of program. Although I strongly dislike python, python has "Beautiful Soup" module that would work very well for this kind of problem.
Qt 4 might work because it provides decent unicode string type (and you'll need it if you're going to parse html).
Background
I use JScript (Microsoft's ECMAScript implementation) for a lot of Windows system administration needs. This means I use a lot of ActiveX (Automated COM) objects. The methods of these objects often expect Number or Boolean arguments. For example:
var fso = new ActiveXObject("Scripting.FileSystemObject");
var a = fso.CreateTextFile("c:\\testfile.txt", true);
a.WriteLine("This is a test.");
a.Close();
(CreateTextFile Method on MSDN)
On the second line you see that the second argument is one that I'm talking about. A Boolean of "true" doesn't really describe how the method's behavior will change. This isn't a problem for me, but my automation-shy coworkers are easily spooked. Not knowing what an argument does spooks them. Unfortunately a long list of constants (not real constants, of course, since current JScript versions don't support them) will also spook them. So I've taken to documenting some of these basic function calls with inline block comments. The second line in the above example would be written as such:
var a = fso.CreateTextFile("c:\\testfile.txt", /*overwrite*/ true, /*unicode*/ false);
That ends up with a small syntax highlighting dilemma for me, though. I like my comments highlighted vibrantly; both block and line comments. These tiny inline block comments mean little to me, personally, however. I'd like to highlight those particular comments in a more muted fashion (light gray on white, for example). Which brings me to my dilemma.
Dilemma
I'd like to override the default syntax highlighting for block comments when both the beginning and end marks are on the same line. Ideally this is done solely in my vimrc file, and not in a superseding personal copy of the javascript.vim syntax. My initial attempt is pathetic:
hi inlineComment guifg=#bbbbbb
match inlineComment "\/\*.*\*\/"
Straight away you can see the first problem with this regular expression pattern is that it's a greedy search. It's going to match from the first "/*" to the last "*/" on the line, meaning everything between two inline block comments will get this highlight style as well. I can fix that, but I'm really not sure how to deal with my second concern.
Comments can't be defined inside of String literals in ECMAScript. So this syntax highlighting will override String highlighting as well. I've never had a problem with this in system administration scripts, but it does often bite me when I'm examining the source of many javascript libraries intended for browsers (less.js for example).
What regex pattern, syntax definition, or other solution would the amazing StackOverflow community recommend to restore my vimrc zen?
I'm not sure, but from your description it sounds like you don't need a new syntax definition. Vim syntax files usually let you override a particular syntax item with your own choice of highlighting. In this case, the item you want is called javaScriptComment, so a command like this will set its highlighting:-
hi javaScriptComment guifg=#bbbbbb
but you have to do this in your .vimrc file (or somewhere that's sourced from there), so it's evaluated before the syntax file. The syntax file uses the highlight default command, so the syntax file's choice of highlighting only affects syntax items with no highlighting set. See :help :hi-default for more details on that. BTW, it only works on Vim 5.8 and later.
The above command will change all inline /* */ comments, and leave // line comments with their default setting, because line comments are a different syntax item (javaScriptLineComment). You can find the names of all these groups by looking at the javascript.vim file. (The easiest way to do this is :e $VIMRUNTIME/syntax/javascript.vim .)
If you only want to change some inline comments, it's a little more complicated, but still easy to see what to do by looking at javascript.vim . If you do that, you can see that block comments are defined like this:-
syn region javaScriptComment start="/\*" end="\*/" contains=#Spell,javaScriptCommentTodo
See that you can use separate regexes for begin and end markers: you don't need to worry about matching the stuff in between with non-greedy quantifiers, or anything like that. To have a syntax item that works similarly but only on one line, try adding the oneline option (:h :syn-oneline for more details):-
syn region myOnelineComment start="/\*" end="\*/" oneline
I've removed the two contains groups because (1) if you're only using it for parameter names, you probably don't want spell-checking turned on inside these comments, and (2) contained sections that aren't oneline override the oneline in the container region, so you would still match all TODO comments with this region.
You can define this new kind of comment region in your .vimrc, and set the highlighting how you like: it looks like you already know how to do that, so I won't go into more details on that. I haven't tried out this particular example, so you may still need a bit of fiddling to make it work. Give it a try and let me know how it goes.
Why don't you simply add a comment line above the call?
I think that
// fso.CreateTextFile(filename:String, overwrite:Boolean, unicode:Boolean)
var a = fso.CreateTextFile("c:\\testfile.txt", true, false);
is a lot more readable and informative than
var a = fso.CreateTextFile("c:\\testfile.txt", /*overwrite*/ true, /*unicode*/ false);
I am using tinyxml to save input from a text ctrl. The user can copy whatever they like into the text box and it gets written to an xml file. I'm finding that the new lines don't get saved and neither do & characters. The weird part is that tinyxml just discards them completely without any warning. If I put a & into the textbox and save, the tag will look like:
<textboxtext></textboxtext>
newlines completely disappear as well. No characters whatsoever are stored. What's going on? Even if I need to escape them with & or something, why does it just discard everything? Also, I can't find anything on google regarding this topic. Any help?
EDIT:
I found this topic which suggest the discarding of these characters may be a bug.
TinyXML and preserving HTML Entities
It is, apparently, a bug in TinyXml.
The simple workaround is to escape anything that it might not like:
&, ", ', < and > got their regular xml entities encoding
strange characters (read non-alphanumerical / regular punctuation) are best translated to their unicode codepoint: &#....;
Remember that TinyXml is before all a lightweight xml library, not a full-fledged beast.