Debugging parser generated code

Debugging parser generated code - c++

I have generated a parser code using Lemon Parser. I am not able to debug the generated code. Control shows some other source code than the currently executing statement. Breakpoints are displaced. I tried on gdb and Visual C++. Both have the same problem. Please tell me the way to debug it.

Let's say your input file is named mylexer.y in which case Lemon will generate myparser.c and myparser.h
Inside of myparser.c you will see lines such as this
#line 1 "myparser.y"
These are line directives. They are good for tracing syntax errors back to the file that was used to generate the code. They are not good for debugging.
To suppress them invoke Lemon with the -l option.
lemon -l myparser.y
To see other options not mentioned in the documentation use -?
lemon -?

The following is a certified WAG (Wild Ass Guess):
I would recommend looking at all of the macros being used by the parser generator and see if there are any escaped newlines in them. If there are, try removing all of them (by joining the lines together) and then recompile the file. Then looked at the code in the debugger -- things suddenly may be back to where they should be.
Backstory: Back in the 80's I developed and marketed a debugger called CDB. As I ported it to anything that had U*NX in it's name I became intimately familiar with the idiosyncrasies of the various compilers and how they emitted debug info in certain situations.
One widespread problem had to do with macros that had escaped newlines them. E.g.
#define foo(bar) bar + \
snort + something_else
x = foo(5);
y = 2;
If the line number for y = 2; should have been 5, many symbol tables would end up showing it as 6, and every line after it would be off-by-one. And each use of such a macro would throw the line numbers farther and farther off.

Check optimization, debug information options if you are building them as lib/dll.

Related

Tell LLDB to ignore files

Is there a way to tell LLDB to ignore a file, i.e. step over code in that file when debugging?
(This could be used as a workaround for 1, 2, 3)

There is a setting to avoid stepping into functions whose name match a regular expression,
(lldb) set list target.process.thread.step-avoid-regexp
step-avoid-regexp -- A regular expression defining functions step-in won't stop in.
e.g. put this in your ~/.lldbinit file
settings set target.process.thread.step-avoid-regexp ^[^ ]+ std::|^std::
but in Xcode 4.5.x that's the best that can be done. I mentioned in #2 of your links that inlined stepping support has been added to the LLDB sources at http://lldb.llvm.org/ but that won't be in Xcode until the next release.

Creating a simple parser in (V)C++ (2010) similar to PEG

For an school project, I need to parse a text/source file containing a simplified "fake" programming language to build an AST. I've looked at boost::spirit, however since this is a group project and most seems reluctant to learn extra libraries, plus the lecturer/TA recommended leaning to create a simple one on C++. I thought of going that route. Is there some examples out there or ideas on how to start? I have a few attempts but not really successful yet ...
parsing line by line
Test each line with a bunch of regex (1 for procedure/function declaration), one for assignment, one for while etc...
But I will need to assume there are no multiple statements in one line: eg. a=b;x=1;
When I reach a container statement, procedures, whiles etc, I will increase the indent. So all nested statements will go under this
When I reach a } I will decrement indent
Any better ideas or suggestions? Example code I need to parse (very simplified here ...)
procedure Hello {
a = 1;
while a {
b = a + 1 + z;
}
}
Another idea was to read whole file into a string, and go top down. Match all procedures, then capture everything in { ... } then start matching statements (end with ;) or containers while { ... }. This is similar to how PEG does things? But I will need to read entire file

Multipass makes things easier. On a first pass, split things into tokens, like "=", or "abababa", or a quote-delimited string, or a block of whitespace. Don't be destructive (keep the original data), but break things down to simple chunks, and maybe have a little struct or enum that describes what the token is (ie, whitespace, a string literal, an identifier type thing, etc).
So your sample code gets turned into:
identifier(procedure) whitespace( ) identifier(Hello) whitespace( ) operation({) whitespace(\n\t) identifier(a) whitespace( ) operation(=) whitespace( ) number(1) operation(;) whitespace(\n\t) etc.
In those tokens, you might also want to store line number and offset on the line (this will help with error message generation later).
A quick test would be to turn this back into the original text. Another quick test might be to dump out pretty-printed version in html or something (where you color whitespace to have a pink background, identifiers as light blue, operations as light green, numbers as light orange), and see if your tokenizer is making sense.
Now, your language may be whitespace insensitive. So discard the whitespace if that is the case! (C++ isn't, because you need newlines to learn when // comments end)
(Note: a professional language parser will be as close to one-pass as possible, because it is faster. But you are a student, and your goal should be to get it to work.)
So now you have a stream of such tokens. There are a bunch of approaches at this point. You could pull out some serious parsing chops and build a CFG to parse them. (Do you know what a CFG is? LR(1)? LL(1)?)
An easier method might be to do it a bit more ad-hoc. Look for operator({) and find the matching operator(}) by counting up and down. Look for language keywords (like procedure), which then expects a name (the next token), then a block (a {). An ad-hoc parser for a really simple language may work fine.
I've done exactly this for a ridiculously simple language, where the parser consisted of a really simple PDA. It might work for you guys. Or it might not.

Since you mentioned PEG i'll like to throw in my open source project : https://github.com/leblancmeneses/NPEG/tree/master/Languages/npeg_c++
Here is a visual tool that can export C++ version: http://www.robusthaven.com/blog/parsing-expression-grammar/npeg-language-workbench
Documentation for rule grammar: http://www.robusthaven.com/blog/parsing-expression-grammar/npeg-dsl-documentation
If i was writing my own language I would probably look at the terminals/non-terminals found in System.Linq.Expressions as these would be a great start for your grammar rules.
http://msdn.microsoft.com/en-us/library/system.linq.expressions.aspx
System.Linq.Expressions.Expression
System.Linq.Expressions.BinaryExpression
System.Linq.Expressions.BlockExpression
System.Linq.Expressions.ConditionalExpression
System.Linq.Expressions.ConstantExpression
System.Linq.Expressions.DebugInfoExpression
System.Linq.Expressions.DefaultExpression
System.Linq.Expressions.DynamicExpression
System.Linq.Expressions.GotoExpression
System.Linq.Expressions.IndexExpression
System.Linq.Expressions.InvocationExpression
System.Linq.Expressions.LabelExpression
System.Linq.Expressions.LambdaExpression
System.Linq.Expressions.ListInitExpression
System.Linq.Expressions.LoopExpression
System.Linq.Expressions.MemberExpression
System.Linq.Expressions.MemberInitExpression
System.Linq.Expressions.MethodCallExpression
System.Linq.Expressions.NewArrayExpression
System.Linq.Expressions.NewExpression
System.Linq.Expressions.ParameterExpression
System.Linq.Expressions.RuntimeVariablesExpression
System.Linq.Expressions.SwitchExpression
System.Linq.Expressions.TryExpression
System.Linq.Expressions.TypeBinaryExpression
System.Linq.Expressions.UnaryExpression

Why is this program erroneously rejected by three C++ compilers?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I am having some difficulty compiling a C++ program that I've written.
This program is very simple and, to the best of my knowledge, conforms to all the rules set forth in the C++ Standard. I've read over the entirety of ISO/IEC 14882:2003 twice to be sure.
The program is as follows:
Here is the output I received when trying to compile this program with Visual C++ 2010:
c:\dev>cl /nologo helloworld.png
cl : Command line warning D9024 : unrecognized source file type 'helloworld.png', object file assumed
helloworld.png : fatal error LNK1107: invalid or corrupt file: cannot read at 0x5172
Dismayed, I tried g++ 4.5.2, but it was equally unhelpful:
c:\dev>g++ helloworld.png
helloworld.png: file not recognized: File format not recognized
collect2: ld returned 1 exit status
I figured that Clang (version 3.0 trunk 127530) must work, since it is so highly praised for its standards conformance. Unfortunately, it didn't even give me one of its pretty, highlighted error messages:
c:\dev>clang++ helloworld.png
helloworld.png: file not recognized: File format not recognized
collect2: ld returned 1 exit status
clang++: error: linker (via gcc) command failed with exit code 1 (use -v to see invocation)
To be honest, I don't really know what any of these error message mean.
Many other C++ programs have source files with a .cpp extension, so I thought perhaps I needed to rename my file. I changed its name to helloworld.cpp, but that didn't help. I think there is a very serious bug in Clang because when I tried using it to compile the renamed program, it flipped out, printed "84 warnings and 20 errors generated." and made my computer beep a lot!
What have I done wrong here? Have I missed some critical part of the C++ Standard? Or are all three compilers really just so broken that they can't compile this simple program?

Originally from Overv # reddit.

Try this way:

Your < and >, ( and ), { and } don't seem to match very well; Try drawing them better.

In the standard, §2.1/1 specifies:
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary.
Your compiler doesn't support that format (aka cannot map it to the basic source character set), so it cannot move into further processing stages, hence the error. It is entirely possible that your compiler support a mapping from image to basic source character set, but is not required to.
Since this mapping is implementation-defined, you'll need to look at your implementations documentation to see the file formats it supports. Typically, every major compiler vendor supports (canonically defined) text files: any file produced by a text editor, typically a series of characters.
Note that the C++ standard is based off the C standard (§1.1/2), and the C(99) standard says, in §1.2:
This International Standard does not specify
— the mechanism by which C programs are transformed for use by a data-processing
system;
— the mechanism by which C programs are invoked for use by a data-processing
system;
— the mechanism by which input data are transformed for use by a C program;
So, again, the treatment of source files is something you need to find in your compilers documentation.

You could try the following python script. Note that you need to install PIL and pytesser.
from pytesser import *
image = Image.open('helloworld.png') # Open image object using PIL
print image_to_string(image) # Run tesseract.exe on image
To use it, do:
python script.py > helloworld.cpp; g++ helloworld.cpp

You forgot to use Comic Sans as a font, that's why its erroring.

I can't see a new-line after that last brace.
As you know: "If a source file that is not empty does not end in a new-line character, ... the behavior is undefined".

This program is valid -- I can find no errors.
My guess is you have a virus on your machine. It would be best if you reformat your drive, and reinstall the operating system.
Let us know how that works out, or if you need help with the reinstall.
I hate viruses.

I've found it helps to not write my code on my monitor's glass with a magic marker, even though it looks nice when its really black. The screen fills up too fast and then the people who give me a clean monitor call me names each week.
A couple of my employees (I'm a manager) are chipping in to buy me one of those red pad computers with the knobs. They said that I won't need markers and I can clean the screen myself when it's full but I have to be careful shaking it. I supposed it's delicate that way.
That's why I hire the smart people.

File format not recognized You need to properly format your file. That means using the right colors and fonts for your code. See the specific documentations for each compiler as these colors vary between compiler ;)

You forgot the pre-processor. Try this:
pngtopnm helloworld.png | ocrad | g++ -x 'c++' -

Did you handwrite the program and then scan it into the computer? That is what is implied by "helloworld.png". If that is the case, you need to be aware that the C++ standard (even in its newest edition) does not require the presence of optical character recognition, and unfortunately it is not included as an optional feature in any current compiler.
You may want to consider transposing the graphics to a textual format. Any plain-text editor may be used; the use of a word processor, while capable of generating a pretty printout, will most likely result in the same error that you get while trying to scan.
If you are truly adventurous, you may attempt to write your code into a word processor. Print it, preferably using a font like OCR-A. Then, take your printout and scan it back in. The scan can then be run through a third-party OCR package to generate a text form. The text form may then be compiled using one of many standard compilers.
Beware, however, of the great cost of paper that this will incur during the debugging phase.

Draw the include below to make it compile:
#include <ChuckNorris>
I hear he can compile syntax errors...

Unfortunately, you have selected three compilers that all support multiple languages, not just C++. They all have to guess at the programming language you used. As you probably already know, the PNG format is suitable for all programming languages, not just C++.
Usually the compiler can figure out the language itself. For instance, if the PNG is obviously drawn with crayons, the compiler will know it contains Visual Basic. If it looks like it's drawn with a mechanical pencil, it's easy to recognize the engineer at work, writing FORTRAN code.
This second step doesn't help the compiler either, in this case. C and C++ just look too similar, down to the #include. Therefore, you must help the compiler decide what language it really is. Now, you could use non-standard means. For instance, the Visual Studio compiler accepts the /TC and /TP command-line arguments, or you could use the "Compile as: C++" option in the project file. GCC and CLang have their own mechanisms, which I don't know.
Therefore, I'd recommend using the standard method instead to tell your compiler that the code following is in C++. As you've discovered by now, C++ compilers are very picky about what they accept. Therefore the standard way to identify C++ is by the intimidation programmers add to their C++ code. For instance, the following line will clarify to your compiler that what follows is C++ (and he'd better compile it without complaints).
// To the compiler: I know where you are installed. No funny games, capice?

Try this one:

Is your compiler set in expert mode?! If yes, it shouldn't compile. Modern compilers are tired of "Hello World!"

OCR Says:
N lml_�e <loJ+_e__}
.lnt Mk.,n ( ln+ _rSC Lhc_yh )
h_S_
_l
s_l . co__ <, " H llo uo/_d ! '` << s l� . ena_ .
TP__rn _ |
_|
Which is pretty damn good, to be fair.

helloworld.png: file not recognized: File format not recognized
Obviously, you should format your hard drive.
Really, these errors aren't that hard to read.

I did convert your program from PNG to ASCII, but it does not compile yet. For your information, I did try with line width 100 and 250 characters but both yield in comparable results.
` ` . `. ` ...
+:: ..-.. --.:`:. `-` .....:`../--`.. `-
` ` ````
`
` `` .` `` .` `. `` . -``- ..
.`--`:` :::.-``-. : ``.-`- `-.-`:.-` :-`/.-..` ` `-..`...- :
.` ` ` ` .` ````:`` - ` ``-.` `
`- .. ``
. ` .`. ` ` `. ` . . ` . ` . . .` .` ` ` `` ` `
`:`.`:` ` -..-`.`- .-`-. /.-/.-`.-. -...-..`- :``` `-`-` :`..`-` ` :`.`:`- `
`` ` ```. `` ```` ` ` ` ` ` ` ` .
: -...`.- .` .:/ `
- ` `` .
-`
`

The first problem is, that you are trying to return an incorrect value at the end of the main function. C++ standard dictates that the return type of main() is int, but instead you are trying to return the empty set.
The other problem is - at least with g++ - that the compiler deduces the language used from the file suffix. From g++(1):
For any given input file, the file
name suffix determines what kind of
compilation is done:
file.cc file.cp file.cxx file.cpp file.CPP file.c++ file.C
C ++ source code which must be preprocessed. Note that in .cxx, the
last two letters must both be literally x. Likewise, .C refers to a
literal capital C.
Fixing these should leave you with a fully working Hello World application, as can be seen from the demo here.

Your font sucks, how should a parser ever be able to read that? Take a calligraphy course.

Your compilers are expecting ASCII, but that program is obviously written using EBCDIC.

You're trying to compile an image.
Type out what you've hand written into a document called main.cpp, run that file through your compiler, then run the output file.

You need to specify the precision of your output preceded by a colon immediately before the final closing brace. Since the output is not numeric, the precision is zero, so you need this-
:0}

add :
using namespace std;
right after include :P:D

Seems that your compiler doesn't support files in such hmm... encoding. Try to convert it to ASCII.

The problem lies with the syntax definition, try using ruler and compasses for a more classical description!
Cheers,

Try switching input interface. C++ expects a keyboard to be plugged in to your computer, not a scanner. There may be peripherals conflict issues here. I didn't check in ISO Standard if keyboard input interface is mandatory, but that is true for all compilers I ever used. But maybe scanner input is now available in C99, and in this case your program should indeed work. If not you'll have to wait the next standard release and upgrade of compilers.

You could try different colors for brackets, maybe some green or red would help ?
I think your compiler can't rcognize black ink :P

Am I the only one who can't recognize the character between 'return' and the semicolon? That could be it!

Easily comment (C++) code in vim

I have looked at the following question:
How to comment out a block of Python code in Vim
But that does not seem to work for me. How do I comment code easily without resorting to plugins/scripts?

Use ctrl-V to do a block selection and then hit I followed by //[ESC].
Alternatively, use shift-V to do a line-based select and then type :s:^://[Enter]. The latter part could easily go into a mapping. eg:
:vmap // :s:^://<CR>
Then you just shift-V, select the range, and type // (or whatever you bind it to).

You can add this to your .vimrc file
map <C-c> :s/^/\/\//<Enter>
Then when you need to comment a section just select all lines (Shift-V + movement) and then press CtrlC.
To un-comment you can define in a similar way
map <C-u> :s/^\/\///<Enter>
that removes a // at begin of line from the selected range when pressing CtrlU.

You can use the NERD commenter plugin for vim, which has support for a whole bunch of languages (I'm sure C++ is one of them). With this installed, to comment/uncomment any line, use <Leader>ci. To do the same for a block of text, select text by entering the visual mode and use the same command as above.
There are other features in this such as comment n lines by supplying a count before the command, yank before comment with <Leader>cy, comment to end of line with <Leader>c$, and many others, which you can read about in the link. I've found this plugin to be extremely useful and is one of my 'must have' plugins.

There's always #ifdef CHECK_THIS_LATER ... #endif which has the advantage of not causing problems with nested C-style comments (if you use them) and is easy to find and either uncomment or remove completely later.

Hints and tools for finding unmatched braces / preprocessor directives

This is one of my most dreaded C/C++ compiler errors:
file.cpp(3124) : fatal error C1004: unexpected end-of-file found
file.cpp includes almost a hundred header files, which in turn include other header files. It's over 3000 lines. The code should be modularized and structured, the source files smaller. We should refactor it. As a programmer there's always a wish list for improving things.
But right now, the code is a mess and the deadline is around the corner. Somewhere among all these lines—quite possibly in one of the included header files, not in the source file itself—there's apparently an unmatched brace, unmatched #ifdef or similar.
The problem is that when something is missing, the compiler can't really tell me where it is missing. It just knows that when it reached end of the file it wasn't in the right parser state.
Can you offer some tools or other hints / methodologies to help me find the cause for the error?

If the #includes are all in one place in the source file, you could try putting a stray closing brace in between the #includes. If you get an 'unmatched closing brace' error when you compile, you know it all balances up to that point. It's a slow method, but it might help you pinpoint the problem.

One approach: if you have Notepad++, open all of the files (no problem opening 100 files), search for { and } (Search -> Find -> Find in All Open Documents), note the difference in count (should be 1). Randomly close 10 files and see if the difference in count is still 1, if so continue, else the problem is in one of those files. Repeat.

Handy tip:
For each header file, auto-generate a source file which includes it, then optionally contains an empty main method, and does nothing else. Compile all of these files as test cases, although there's no point running them.
Provided that each header includes its own dependencies (which is a big "provided"), this should give you a better idea which header is causing the problem.
This tip is adapted from Google's published C++ style guide, which says that each component's source files should include the interface header for that component before any other header. This has the same effect, of ensuring that there is at least one source file which will fail to compile, and implicate that header, if there's anything wrong with it.
Of course it won't catch unmatched braces in macros, so if you use much in the way of macros, you may have to resort to running the preprocessor over your source file, and examining the result by hand and IDE.
Another handy tip, which is too late now:
Check in more often (on a private branch to keep unfinished code out of everyone else's way). That way, when things get nasty you can diff against the last thing that compiled, and focus on the changed lines.

Hints:
make small changes and recompile after each small change
for unmatched braces, use an editor/IDE that supports brace match hilighting
if all else failds, Ye Olde method of commenting out blocks of code using a binary chop approach works for me

Very late to the party, but on linux you can use fgrep -o
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
So if you fgrep -o { then you'll get a list of all the opening braces in your file.
You can then pipe that to wc -l and that will give you the number of opening braces in your file.
With a bit of bash arithmetic, you can do the same with closing braces, and then print the difference.
In the below example I'm using git status -s to get the short-format output of all modified files from my git repo, and then I'm iterating over them to find which files may have mismatched braces
for i in $(git status -s | awk '{print $2}'); do
open=$(fgrep -o { $i | wc -l); # count number of opening braces
close=$(fgrep -o } $i | wc -l); # count number of closing braces
diff=$((open-close)); # find difference
echo "$diff : $i"; # print difference for filename
done

I think using some editor with brace highlighting will help. There should also be some tools around that do automatic indention on code.
EDIT: Does this vim script help? It seems to do #ifdef highlighting.

Can you create a script that actually does all the including, and then write the whole stuff to a temporary file? Or try the compiler for helping you with that? Then you can use the bracket highlighting features of various editors to find the problem and you'll probably be able to identify the file. (An extra help can be to have your script add a comment around every included file).

This might not be relevant to your problem, but I was getting an "Unexpected #else" error when framing some header files in an #if/#else/#endif block.
I found that if I set the problem modules to not use pre-compiled headers, the problem went away. Something to do with the "#pragma hdrstop" should not be within an #if/#endif.

Have a look at this question (Highlighting unmatched brackets in vim)

Pre-compile your code first, this will create a big chunk with the include files stuffed into the same file. Then you can use those brace-matching scripts.

I recently ran into this situation while refactoring some code and I'm not a fan of any of the answers above. The problem is that they neglect to incorporate a pretty basic assumption:
Any given file (most likely) will have matching braces and if/endif macros.
While it is true one can have a C++ source file that opens a bracket or an if block and includes another module that closes it, I have never seen this in practice. Something like the following:
foo.cpp
namespace Blah{
#include "bar.h"
bar.h:
}; /// namespace Blah
So with the assumption that any given file / module contains matching sets of braces and preprocessor directives, this problem is now trivial to solve. We just need a script to read the files / count the brackets & directives that it finds and report mismatches.
I've implemented one in Ruby here, feel free to take and adapt to your needs:
https://gist.github.com/movitto/6c6d187f7a350c2d71480834892552ab

I just spent an hour with a problem like this. It was very hard to spot, but in the end, I had typed a double # on one line:
##ifdef FEATURE_X
...
That broke the world.
Easy one to search for though.

The simplest and fastest solution is to comment file from the end by consistent blocks. If the error still exists, then you not yet comment the open brace.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js