Minimal noweb example with cross referencing - literate-programming

I'm trying to find a good literate programming tool. Let's just say it's not an easy decision. (The generic ones are too generic and the specific ones are too specific :) )
Among others, I've got noweb up and running but I'm having trouble getting output like I'd expect. The noweb wikipedia page has a minimal example that builds correctly with
noweave -index -latex hello.noweb > hello.tex && pdflatex hello && pdflatex hello
but there are no cross-references at the end of each chunk. For example, CWEB has pointers such as "This code is used in section 12." and "See also sections 5 and 7." Is this feature simply missing from noweb or am I missing a step in the compilation?

Is this feature simply missing from noweb or am I missing a step in the compilation?
Neither: noweb uses much more subtle markers than you may be used to from CWEB. In the case of the Wikipedia example, the third chunk (1c) appears in chunks 1a and 1b. This information is condensed into a single '1', which appears in parentheses at the right-hand side of the definition.
To get long cross-references in the CWEB style use
\noweboptions{longxref}

You should get, for that Wikipedia example, a reference to the license section from each of the two chunks in the Hello World section. You won't get any lists of chunks or identifiers at the end unless you tell LaTeX about them.
To get a list of web chunks, try putting \nowebchunks near the end of the document (i.e., after the last chunk appears), and adding a -x switch to the noweb invocation.
To get a list of identifiers, try putting \nowebindex in a similar place.
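For the Wikipedia hello.noweb, a minimal sketch of the tail of the file with both of those macros added (assuming the example's preamble already loads the noweb LaTeX package):
\nowebchunks
\nowebindex
\end{document}
Rebuilding with the original command should then produce both lists; the -index switch it already passes should make noweave emit the cross-reference data these macros consume.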

Related

Extract comments written in Chinese and translate them into English using some script

I have a C++ project in which the source code comments are in Chinese, and I want to convert them into English.
I tried using Google Translate but ran into issues: whole .cpp or header files didn't get converted, and I found that the names of structs, classes, etc. got changed. Sometimes the code itself also got modified.
Note: each .cpp or .h file is less than 1000 lines of code, but there are multiple C++ projects, each with around 10 files. In total I have around 50 files for which I need to translate the Chinese text to English.
Well, what did you expect? Google Translate doesn't know what a CPP file is and how to treat it. You'll have to write your own program that extracts comments from them (not that hard), runs just those through Google Translate, and then puts them back in.
Mind you, if there is commented-out code, or the comments reference variable names, those will get translated too. Detecting and handling these cases is already a lot harder.
Extracting comments is a lexical issue, and mostly a quite simple one.
In a few hours you could write (e.g. with flex) a simple command-line program that extracts them. A good editor (such as GNU Emacs) could even be configured to run that filter on selected code chunks.
(handling a few corner cases, such as raw string literals, might be slightly more difficult, but these don't happen often and you might handle them manually)
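If you prefer Python to flex, a rough sketch of such an extraction filter might look like this; it deliberately knows nothing about comment markers inside string literals or raw strings, which is exactly the kind of corner case mentioned above:
import re
import sys

# Match // line comments and /* ... */ block comments.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

with open(sys.argv[1], encoding="utf-8") as f:
    source = f.read()

for comment in COMMENT.findall(source):
    print(comment)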
BTW, if you are assigned to work on that code, you'll need to understand it, and that takes much more time than copy-pasting or editing each comment manually.
Finally, I am not sure about the quality of automatic translation of code comments; you might be disappointed. Also, the code names (of functions, classes, variables, etc.) matter a lot more.
Perhaps adding your own comments in English would be wiser.
Don't forget to use some version control system; you really need one (e.g. git).
(I am not convinced that extracting comments for automatic translation would help your work)
First, separate the comments and the code into different files using a Python script like the one below:
import sys

# Split the source file line by line: any line containing "//" goes to
# comment.txt and everything else goes to code.txt. A blank line is
# written to the other file each time, so both files keep the same line
# count, which is what lets `paste` recombine them later.
# Note: a line mixing code and a trailing // comment is written entirely
# to comment.txt; such lines may need manual attention afterwards.
filename = sys.argv[1]
with open(filename, "r") as f:
    lines = f.readlines()

comment = open("comment.txt", "w+")
code = open("code.txt", "w+")
for l in lines:
    if "//" in l:
        comment.write(l)
        code.write("\n")
    else:
        code.write(l)
        comment.write("\n")
comment.close()
code.close()
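Save it as, say, split_comments.py (the name is arbitrary) and run it once per file:
python split_comments.py myfile.cpp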
Now translate comment.txt with Google Translate, and then recombine the two files line by line with
paste code.txt comment_en > source
where comment_en is the file containing the translated English comments.

How do syntax highlighting tools implement automated testing?

How do syntax highlighting tools such as Pygments and TextMate bundles do automated testing?
Tools like this often simply resort to a large collection of text snippets pairing a chosen input with its expected output. For instance, if you look at the Pygments GitHub repository, you can see they have large collections of text files divided into an input section and a tokens section, like so:
---input---
f'{"quoted string"}'
---tokens---
'f' Literal.String.Affix
"'" Literal.String.Single
'{' Literal.String.Interpol
'"' Literal.String.Double
'quoted string' Literal.String.Double
'"' Literal.String.Double
'}' Literal.String.Interpol
"'" Literal.String.Single
'\n' Text
Since a highlighting tool reads a piece of code and has to identify which bits of text are parts of which bits of code (is this the start of a function? is this a comment? is it a variable name?), it usually performs several processing steps that result in a list of tokens like the one above. That list can then be fed into the next step (insert highlights from the first Literal.String.Interpol to the next, bold any Literal.String.Single, etc., by generating the appropriate HTML, CSS, or other markup relevant to the system). Checking that these tokens are generated properly from the input text is key.
Then, depending on the language the tool is built in, you might use an existing testing framework or build your own (Pygments uses the Python tool pytest). The suite essentially runs each of the inputs through the tool in a loop, reads the output, and compares it to the expected values. If the output doesn't match, it can display a message showing which test failed and what the input, output, expected, and error values were; if a test passes, it can simply signal with a happy green checkmark. When the run finishes, the developer can hopefully reason out what they broke by looking over the results.
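As a rough sketch of what such a loop can look like with pytest (the snippet layout mirrors the ---input---/---tokens--- format above; tokenize() stands in for whatever entry point the highlighter under test actually exposes):
import pathlib
import pytest

def parse_snippet(path):
    # Split a snippet file into its source and expected-token halves.
    text = path.read_text()
    _, _, rest = text.partition("---input---\n")
    source, _, expected = rest.partition("---tokens---\n")
    return source, expected

# Collect every snippet file and run each one as its own test case.
@pytest.mark.parametrize("path", sorted(pathlib.Path("tests").glob("*.txt")))
def test_snippet(path):
    source, expected = parse_snippet(path)
    assert tokenize(source) == expected  # tokenize() is hypothetical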
It is often a good idea to randomize the order in which these inputs are run, so that you can be sure no step in the test has side effects that leak into the next test and cause it to pass or fail incorrectly. It can also be a good idea to time the complete run: if the whole thing took 12 seconds yesterday but takes two minutes today, we may have broken something even if all the tests technically "pass".
In tools like a code highlighter, you often have a good idea of what many of the inputs and outputs will look like before everything is coded up, for instance when a spec document already exists. In that case, it may be a good idea to include tests that you know won't pass right away, mark them with some tag (perhaps a text marker within the file that says "NOT PASSING", or a file-naming convention), and tell your testing suite to expect those tests to fail. Then, as you fix bugs and add features, say you fix bug X in your attempt to make test #144 pass. Now when you run the tests, the suite also alerts you that 10 other tests that should be failing are now passing. Congrats! You just saved yourself a lot of work trying to fix several seemingly separate problems that were actually caused by the same root issue.
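pytest, for instance, has this marking built in as xfail; a hypothetical not-yet-passing test might look like:
import pytest

@pytest.mark.xfail(reason="f-string interpolation not implemented yet")
def test_fstring_interpolation():
    source = """f'{"quoted string"}'"""
    assert tokenize(source) == EXPECTED_TOKENS  # both names are hypothetical
Once the underlying bug is fixed, pytest reports such tests as XPASS (unexpectedly passing), which is exactly the alert described above.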
As the codebase is updated, developers rerun the tests to ensure that their changes don't break tests that were working before, and then add new tests to the collection to verify that each new feature, fixed edge case, etc. now has a known expected output that no one can accidentally break in the future.

Following the flow of code

I'm trying to learn the level format of one of my favourite games, which is almost totally undocumented. Basically, the only document that describes the level format does so by saying things like "first 12 bytes: header; following 4 bytes: number of materials; next x bytes: array of materials", and so on.
I'm very inexperienced with hex and don't completely understand what they're saying. However, there is a level editor, and its source is freely available on Google Code. I was thinking of pulling it into Visual Studio and trying to learn the level format by reading how the level editor opens the files.
However, there's another problem: I don't know C++ (I know Python). This means I probably won't be able to locate which part of the code reads the bytes and so on.
What I'm looking for, is something that will allow me to follow the flow of the code, in its execution. Essentially something that acts similar to setting a breakpoint on every line, and having it show me what specific portion of code is executing when reading the file contents.
However, obviously setting breakpoints on every line is very messy and slow. I'm looking for something that will simply show me what code is being run when I open the file in the editor.
Does anyone know what I could do? Thanks.
You're looking for a feature to step from one statement to the next; every debugger I know has such a feature. You start by setting a single breakpoint at the beginning of the interesting region, and from there you "step" through the code.
E.g. in Visual C++ 2010, the F10 key steps over one statement; you can also "step into" the next statement (e.g. a method call) with F11.
In your case, set the breakpoint where the reading of the level file starts, and continue from there. Finding the place where the file is read can itself be a hard problem, depending on how clear the code is; but if it's well-written code, there should be a method with "read" or "load" or something similar in its name - you'll figure it out!
You might have to know at least some basic C++ syntax to be able to follow what's going on, though.
I would also recommend reading up on debugging how-tos (e.g. this one).
The document which you find so obscure is just the level format specification, and in most cases the specification is all you need, along with a little experience in reading files.
When reading a file you have to worry about a few things:
1) When reading byte by byte (8 bits), the byte order does not change.
2) When reading 32 bits at a time, the byte order can change according to the endianness of the machine (for example, 0x12345678 becomes 0x78563412 when the endianness changes).
There is a very old tutorial on loading 3D models that helped me start working with files:
http://www.spacesimulator.net/wiki/index.php?title=Tutorials:3ds_Loader
It is useful because you have part of the specification (as in the original documentation) and it shows how you can create a loader starting from just that specification. That's all you need. It's in C, but there is no big difference from C++ in this case.
If you need another simple file format specification with a related loader to make things clearer, you can also look at libktx and the KTX specification:
http://www.khronos.org/opengles/sdk/tools/KTX/file_format_spec/
If I remember correctly, there's also an unofficial C++ KTX loader you can look at if you intend to write C++ OOP code rather than C.

Associate text from source code line to line - too fragile

I need to associate textual data with lines in a source code file, something like "these lines are to create a Myclass object" -> lines 20 to 32.
The problem is that this kind of line tracking is highly fragile: it is enough for someone to add a single newline to break the correspondence between the associated text and the lines.
I need an idea to make this link a bit stronger (not unbreakable, but at least resistant to a few line shifts); suggestions are greatly welcome.
An easy solution would be to hash the lines (MD5 is pretty easy and widely available) and store the hash along with the data.
You can then check the hash against the possibly modified file. If it matches, great; otherwise begin checking the previous/next lines for a match.
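A minimal sketch of that idea in Python, assuming each annotation is stored as a start line, an end line, and the MD5 of those lines:
import hashlib

def block_hash(lines, start, end):
    # Hash the half-open block of lines [start, end) as one string.
    return hashlib.md5("".join(lines[start:end]).encode("utf-8")).hexdigest()

def relocate(lines, start, end, expected, window=5):
    # Try the old position first, then shifts of +/-1, +/-2, ... so the
    # annotation survives a few lines being inserted or deleted nearby.
    length = end - start
    for shift in sorted(range(-window, window + 1), key=abs):
        s = start + shift
        if 0 <= s and s + length <= len(lines) and \
           block_hash(lines, s, s + length) == expected:
            return s, s + length
    return None  # the block itself changed; the annotation is stale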
One approach might be to enlist the help of a source control system. For example, using Git, you could associate textual data with a specific version of the source code. If the source code is changed, you can use a "diff" algorithm to discover which line(s) have been added or removed. Using that delta information, you can then update your annotation lines (for example, adding a line at the top of the file would cause your 20-32 annotation to move to 21-33).
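If the previous version of the file is kept around, Python's difflib can stand in for the diff step. A rough sketch that only handles edits falling entirely above the annotated range (overlapping edits would need extra care):
import difflib

def shift_range(old_lines, new_lines, start, end):
    # Sum the net line-count change of every edit above the range,
    # then shift the range (0-based line indices) by that amount.
    delta = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal" and i2 <= start:
            delta += (j2 - j1) - (i2 - i1)
    return start + delta, end + delta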
Are you trying to implement some form of automatic documentation system? If so, then basing this around line numbering is indeed fragile. I would suggest using some sort of markup to associate the text with semantic blocks of code that are robust when moved or altered. Perhaps something along the lines of doxygen might be what you are looking for.
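For instance, the annotations could be keyed to named markers placed in comments rather than to line numbers; the @note: syntax below is invented for illustration:
import re

MARKER = re.compile(r"//\s*@note:(\w+)")

def find_markers(lines):
    # Map each marker name to the line it currently sits on, so the
    # associated text follows the code wherever it moves.
    anchors = {}
    for number, line in enumerate(lines, start=1):
        m = MARKER.search(line)
        if m:
            anchors[m.group(1)] = number
    return anchors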

Need inputs for making a tool to insert preformatted comments in C/C++ source/header file

I am trying to develop a tool that inserts comments into C/C++ source files in pre-defined formats.
The comments could be:
file headers <-> file names required
class comments <-> class name required
function comments <-> function name required
The following points need to be kept in mind:
If the comments are already there in the right format, leave them intact.
If the comments are broken, fix them and reinsert them.
Some desirable but non-essential features:
Check and fix the indentation.
Check whether any break statements are missing from their respective switch cases.
Please suggest open-source / free libraries / logic to aid in this.
I guess you've got two choices:
Generate the whole C/C++ code and headers from a template or scripting language, and use that to insert the preformatted comments. This is of course not an option if you already have a lot of existing code.
Or you need a tool to parse the code into something you can process further. You could try doxygen to generate HTML, XML, or some other format. The problem would still be how to get the generated documentation back into your sources...
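If full parsing turns out to be overkill for the file-header case, a crude Python sketch of the "insert only if missing" logic (the header template and the leading-block-comment heuristic are both assumptions; class and function comments really do need a parser such as doxygen or libclang):
import pathlib

HEADER = """/*
 * File: {name}
 * Description: TODO
 */
"""

def ensure_file_header(path):
    p = pathlib.Path(path)
    text = p.read_text()
    # Crude heuristic: treat any leading block comment as an existing header.
    if not text.lstrip().startswith("/*"):
        p.write_text(HEADER.format(name=p.name) + text)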