Associate text from source code line to line - too fragile - c++

I need to associate textual data with the lines in a source code file. Something like "these lines are to create a Myclass object" -> lines from 20 to 32.
The problem is that this kind of line tracking is highly fragile: it is sufficient that someone adds a newline to break my correspondence between associated text and lines.
I need an idea to make this link a bit stronger (not too much but at least resisting to a few line shifts), suggestions are greatly welcome.

An easy solution would be to hash (md5 is pretty easy and accessible) the lines and store the hash along the data.
You can then check the hash against the possibly modified file. If it matches, great, otherwise begin checking previous/next lines for a match.

One approach might be to enlist the help of a source control system. For example, using Git, you could associate textual data with a specific version of the source code. If the source code is changed, you can use a "diff" algorithm to discover which line(s) have been added or removed. Using that delta information, you can then update your annotation lines (for example, adding a line at the top of the file would cause your 20-32 annotation to move to 21-33).

Are you trying to implement some form of automatic documentation system? If so, then basing this around line numbering is indeed fragile. I would suggest using some sort of markup to associate the text with semantic blocks of code that are robust when moved or altered. Perhaps something along the lines of doxygen might be what you are looking for.

Related

In a text file with linked structures, how do I quickly follow those links in C++, without running through the file multiple times?

I'm about to start a project that requires me to load specific information from an IFC file into classes or structs. I'm using C++, but it's been some years since I last used it so I'm a bit rusty.
The IFC file has a linked structure, where an element in a line might refer to a different line, which in turn links to another. I've included a short example where the initial "#xxx" is the line index and any other "#xxx" in the line is a link to a different line.
#170=IFCAXIS2PLACEMENT3D(#168,$,$);
#171=IFCLOCALPLACEMENT(#32,#170);
#172=IFCBUILDINGSTOREY("GlobalId", #41, "Name", "Description", "ObjectType", #171"...);
In this example I would need to search for "IFCBULDINGSTOREY", and then follow the links backwards through the file, jumping around storing the important bits of information I need.
The main problem is that my test file has 273480 lines (18MB), and links can jump from one end of the file to the other - and I'll likely have to handle larger files than this.
In this file I need to populate about 500 objects, so that's a lot of jumping around the file to grap the relevant information.
What's a performance-friendly method of jumping around a file like that?
(Disclosure - I help out with a .NET IFC implementation)
I'd question what it is you're doing that means you can't use one of the many existing implementations of the IFC schema. Parsing the IFC models is generally the simple part of the problem. If you want to visualise the geometry or take measurements from the geometry primitives there's a whole another level of complexity... E.g. Just one particular geometry type out of dozens: https://standards.buildingsmart.org/IFC/DEV/IFC4_3/RC2/HTML/link/ifcadvancedbrep.htm
If you go to BuildingSmart's software implementations list and search for 'development' you'll get a good list of them for various technologies/languages.
If you're sure you want to implement yourself, the typical approaches are to build some kind of dictionary/map holding the entities based on their key. Naively you can run an initial pass through with a Lexer, and build the map in memory. But as IFC models can be over a GB, you may need a more sophisticated approach where you build some kind of persisted index - and maybe even put it into some kind of database with indexes (maybe some flavour of a document database). This is going to be more important if you want to support 'random access' to the data over multiple sessions.

Extract comment written in Chinese and translate them into English using some script

I have a C++ project in which comments of source code are in Chinese language, now I want to convert them into English.
I tried to solve using google translator but got an Issue: Whole CPP files or header didn't get converted, also I have found the name of the struct, class etc gets changed. Sometimes code also gets modified.
Note: Each .cpp or .h file is less than 1000 lines of code.But there are multiple C++ projects each having around 10 files. Thus I have around 50 files for which I need to translate Chinese text to English.
Well, what did you expect? Google Translate doesn't know what a CPP file is and how to treat it. You'll have to write your own program that extracts comments from them (not that hard), runs just those through Google Translate, and then puts them back in.
Mind you, if there is commented out code, or the comments reference variable names, those will get translated too. Detecting and handling these cases is a lot harder already.
Extracting comments is a lexical issue, and mostly a quite simple one.
In a few hours, you could write (e.g. with flex) some simple command line program extracting them. And a good editor (such as GNU emacs) could even be configured to run that filter on selected code chunks.
(handling a few corner cases, such as raw string literals, might be slightly more difficult, but these don't happen often and you might handle them manually)
BTW, if you are assigned to work on that code, you'll need to understand it, and that takes much more time than copy&pasting or editing each comments manually.
At last, I am not sure of the quality of automatic translation of code comments. You might be disappointed. Also, the code names (of functions, of classes, of variables, etc...) matter a lot more.
Perhaps adding your comments in English could be wiser.
Don't forget to use some version control system. You really need one (e.g. git)
(I am not convinced that extracting comments for automatic translation would help your work)
First separate both comment and code part in different file using python script as below,
import sys
file=sys.argv[1]
f=open(file,"r")
lines=f.readlines()
f.close()
comment=open("comment.txt","w+")
code=open("code.txt","w+")
for l in lines:
if "//" in l:
comment.write(l)
code.write("\n")
else:
code.write(l)
comment.write("\n")
comment.close()
code.close()
Now translate comment.txt with google translator and then use
paste code.txt comment_en > source
where comment_en is translated comment in english.

Boost Spirit.Lex re-lexing altered lines using state from previous line

I'm considering writing some simple lexers with Boost's Spirit.Lex, but I can't seem to find any examples of what I'd like to do.
More or less, I'd like to lex an entire text file (this is easy). But, once the entire file has been processed, I would like to be able to "re-lex" an arbitrary line (e.g. if its contents have changed), using the state from the previous line to avoid lexing the entire file again.
I have seen related resources like this question as well as the Spirit.Lex API documentation (of course), but a simple, concise example of what I'm talking about would be very helpful.
Does such an example exist and/or is this even feasible with Sprit.Lex?
The following page documents API functions letting you specify the initial lexer state : Boost spirit API documentation.

Loading a text file in to memory and analyze its contents

For educational purposes, I would like to build an IDE for PHP coding.
I made a form app and added OpenFileDialog ..(my c# knowledge was useful, because it was easy ... even though without intelisense!)
Loading a file and reading lines from it is basically the same in every language (even PERL).
But my goal is to write homemade intelisense. I don't need info on the richtextBox and the events it generates, endline, EOF, etc, etc.
The problem I have is, how do I handle the data? line for line?
a struct for each line of text file?
looping all the structs in a linked list? ...
while updating the richtextBox?
searching for opening and closing brackets, variables, etc, etc
I think Microsoft stores a SQL type of database in the app project folders.
But how would you keep track of the variables and simulate them in some sort of form?
I would like to know how to handle this efficiently on dynamic text.
Having never thought this through before, it sounds like an interesting challenge.
Personally, I think you'll have to implement a lexical scanner, tokenizing the entire source file into a source tree, with each token also having information about it mapping the token to a line/character inside of the source file.
From there you can see how far you want to go with it - when someone hovers over a token, it can use the context of the code around it to be more intelligent about the "intellisense" you are providing.
Hovering over something would map back to your source tree, which (as you are building it) you would load up with any information that you want to display.
Maybe it's overkill, but it sounds like a fun project.
This sounds to be related to this question:
https://softwareengineering.stackexchange.com/questions/189471/how-do-ide-s-provide-auto-completion-instant-error-checking-and-debugging
The accepted answer of that question recommends this link which I found very interesting:
http://msdn.microsoft.com/en-us/magazine/cc163781.aspx
In a nutshell, most IDEs generate the parse tree from the code and that is what they stores and manage.

Need inputs for making a tool to insert preformatted comments in C/C++ source/header file

I am trying to develop a tool that inserts comments in C/C++ source files in pre-defined formats.
The comments could be:
file headers <-> file names required
class comments <-> class name required
function comments <-> function name required
Following points are required to be taken mind:
If the comments are already there in right format then leave them intact.
If the comments are broken them fix them and insert them.
Some desirable but non important features:
Check and fix the indentation.
Check if any breaks are missing in their respective cases.
Please suggest open-source / free libraries / logic to aid in this.
I guess you 've got two choices:
Generate the whole c/c++ code and headers from a template or scripting language and use this one to insert the preformated comments. This is of course not an option if you still got a lot of code.
Or you need a tool to parse the code into sth. you can further use. You could try doxygen to generate html, xml or some other format. Problem would still be, how to get the generated documentation back into your sources...