Parsing a C++ source file after preprocessing - c++

I am trying to analyze C++ files using my custom-made parser (written in C++). Before parsing, I would like to get rid of all #define directives. I want the source file to be compilable after preprocessing, so the best way would be to run the C preprocessor on the file.
cpp myfile.cpp temp.cpp
// or
g++ -E myfile.cpp > temp.cpp
[New suggestions are welcome.]
But this loses the original lines and their line numbers, because the output also contains everything pulled in from the headers, and I want to retain the line numbers. So the way out I have decided on is:
Add a special symbol before every line in the source file (except preprocessor directives)
Run the preprocessor
Extract the lines with that special symbol and analyze them
For example, a typical source file will look like:
#include<iostream>
#include"xyz.h"
int x;
#define SOME value
/*
** This is a test file
*/
typedef char* cp;
void myFunc (int* i, ABC<int, X<double> > o)
{
//...
}
class B {
};
After adding symbol it will be like,
#include<iostream>
#include"xyz.h"
#3#int x;
#define SOME value
#5#/*
#6#** This is a test file
#7#*/
#8#typedef char* cp;
#9#
#10#void myFunc (int* i, ABC<int, X<double> > o)
#11#{
#12# //...
#13#}
#14#
#15#class B {
#16#};
Once all the macros and comments are removed, I will be left with thousands of lines, of which a few hundred will be the original source code.
Is this approach correct? Am I missing any corner cases?

You realize that g++ -E adds some of its own lines to its output which indicate line numbers in the original file? You'll find lines like
# 2 "foo.cc" 2
which indicate that you're looking at line 2 of file foo.cc. These lines are inserted whenever the regular sequence of lines is disrupted.

The imake program that used to come with X11 sources used a faintly similar system, marking the ends of lines with ## so that it could post-process them properly.
The output from gcc -E usually includes #line directives; you could perhaps use those instead of your symbols.
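As a rough illustration of using those markers instead of custom symbols, here is a minimal sketch that walks preprocessed text and keeps only the lines attributed to one file, together with their original line numbers. The names LineMarker, parse_linemarker and lines_from_file are made up for this example, and it assumes the marker shape shown above (# number "file" flags, or #line number "file") with no escaped quotes in the file name.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: the line number and file name carried by a marker.
struct LineMarker {
    int line;
    std::string file;
};

// Parses `# 2 "foo.cc" 2` or `#line 2 "foo.cc"`. Returns false for any other text.
// Assumption: the file name contains no escaped quote characters.
bool parse_linemarker(const std::string& text, LineMarker& out) {
    if (text.empty() || text[0] != '#') return false;
    std::size_t i = 1;
    while (i < text.size() && text[i] == ' ') ++i;
    if (text.compare(i, 4, "line") == 0) {
        i += 4;
        while (i < text.size() && text[i] == ' ') ++i;
    }
    const std::size_t digits_start = i;
    int line = 0;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
        line = line * 10 + (text[i] - '0');
        ++i;
    }
    if (i == digits_start) return false;  // e.g. "#pragma once"
    while (i < text.size() && text[i] == ' ') ++i;
    if (i >= text.size() || text[i] != '"') return false;
    const std::size_t name_start = i + 1;
    const std::size_t name_end = text.find('"', name_start);
    if (name_end == std::string::npos) return false;
    out.line = line;
    out.file = text.substr(name_start, name_end - name_start);
    return true;
}

// Returns (original line number, text) for every output line that the
// markers attribute to `wanted`.
std::vector<std::pair<int, std::string> > lines_from_file(
        const std::string& preprocessed, const std::string& wanted) {
    std::vector<std::pair<int, std::string> > result;
    std::istringstream in(preprocessed);
    std::string text, current_file;
    int current_line = 0;
    LineMarker marker;
    while (std::getline(in, text)) {
        if (parse_linemarker(text, marker)) {
            // A marker gives the number of the line that follows it.
            current_file = marker.file;
            current_line = marker.line;
            continue;
        }
        if (current_file == wanted) result.push_back(std::make_pair(current_line, text));
        ++current_line;
    }
    return result;
}
```

Feeding it the output of g++ -E myfile.cpp with wanted set to "myfile.cpp" should leave only the lines that came from the original file, each tagged with its original line number.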

Related

Include string in file on compilation

I work on a team project using a Teensy and MATLAB, and to avoid version differences (e.g., one person loads the Teensy with version A, and the person now using it with MATLAB has version B of the code), I'd like to send a version string on pairing.
However, I want the version string to sit in a file shared between the MATLAB code and the Teensy, and every time the program is loaded onto the Teensy, have it included at compile time as a constant.
Sort of like:
const string version = "<included file content>";
The matlab on its part can read it at runtime.
I thought of using a file whose contents are an assignment to a variable whose name is shared both by teensy and matlab, however I would prefer a more elegant solution if such exists, especially one that doesn't include executing code from an external file at runtime.
One way is just to have a simple setup like so:
version.inc:
"1.0.0rc1";
main.cpp:
const string version =
#include "version.inc"
...
Note that the newline between the = and the #include is in place to keep the compiler happy. Also, if you don't want to include the semicolon in the .inc file, you can do this:
main.cpp:
const string version =
#include "version.inc"
; // Put the semicolon on a newline, again to keep the compiler happy
EDIT: Instead of a .inc file, you can use any file extension you like. It's purely a matter of taste.
EDIT: If you really wanted to, you could omit the quotes from the .inc file, but that would lead to messy code like this:
version.inc:
STRINGIFY(
1.0.0rc1
);
main.cpp:
#define STRINGIFY(X) #X
const string version =
#include "version.inc"
...
EDIT:
As @Ôrel pointed out, you could handle the generation of a version.h or similar in your Makefile. Assuming you're running a *nix system, you could try a setup like this:
Makefile:
...
# "1.0.0rc1"; > version.h
echo \"`cat version.inc`\"\; > version.h
...
version.inc:
1.0.0rc1
main.cpp:
const string version =
#include "version.h"
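If the build machine has no shell with echo and cat available, the same generation step could be done by a tiny helper program. This is only a sketch of that idea: make_version_header is a hypothetical name, and it assumes version.inc holds the bare version text, as in the Makefile variant above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turns the raw contents of version.inc (e.g. "1.0.0rc1\n")
// into the text of version.h, i.e. a quoted string literal followed by ';'.
// Trailing spaces, tabs and newlines in the input are dropped.
std::string make_version_header(const std::string& raw) {
    const std::size_t last = raw.find_last_not_of(" \t\r\n");
    const std::string version =
        (last == std::string::npos) ? std::string() : raw.substr(0, last + 1);
    return "\"" + version + "\";\n";
}
```

A small main that reads version.inc, calls this function and writes the result to version.h would then stand in for the echo line.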

Find function/variable definition for a reference in source code

My project contains many files.
Sometimes I need to know where a particular function is defined (implemented) in source code. What I currently do is text search within source files for the function name, which is very time consuming.
My question is: is there a better way (a compiler or linker flag) to find that function definition in the source files, given that the linker has already gone to the trouble of resolving all these references?
I am hoping for a method better than stepping into a function call in a debugger, since a function can be buried within many calls.
Try the cscope utility.
From the manual:
Allows searching code for:
all references to a symbol
global definitions
functions called by a function
functions calling a function
text string
regular expression pattern
a file
files including a file
Curses based (text screen)
An information database is generated for faster searches and later reference
The fuzzy parser supports C, but is flexible enough to be useful for C++ and Java, and for use as a generalized 'grep database' (use it to browse large text documents!)
Has a command line mode for inclusion in scripts or as a backend to a GUI/frontend
Runs on all flavors of Unix, plus most monopoly-controlled operating systems.
A "screenshot":
C symbol: atoi
File Function Line
0 stdlib.h <global> 86 extern int atoi (const char *nptr);
1 dir.c makefilelist 336 dispcomponents = atoi(s);
2 invlib.c invdump 793 j = atoi(term + 1);
3 invlib.c invdump 804 j = atoi(term + 1);
4 main.c main 287 dispcomponents = atoi(s);
5 main.c main 500 dispcomponents = atoi(s);
6 stdlib.h atoi 309 int atoi (const char *nptr) __THROW
Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
If the symbol is exported, then you could wire up objdump or nm and look at the .o files. This is not useful for finding things in header files though.
My suggestion would be to put your project in git (which carries numerous other advantages) and use git grep which looks only at those files under git's revision control (meaning you don't grep object files and other irrelevances). git grep is also nice and quick.

Compiling arrays stored in external text files (C++ compiled using command line g++)

I am a novice c++ programmer so please forgive me if this is a naive question. I have files containing large arrays holding tens-of-thousands of strings that I have used previously in javascript applications. Is there some way to include these into C++ source code so that the arrays are compiled along with the code?
At present, the files are formatted as functions that return (javascript) literal arrays, like this:
// javascript array stored in .js text file
function returnMyArray()
{
return ["string1", "string2", "string3", ... "stringBigNumber"];
} // eof returnMyArray()
I 'include' the external file with the usual javascript script & src tags and assign the array with something like:
myArray = returnMyArray();
I want to achieve the equivalent in c++, i.e. assign an array stored in a file to an array in my c++ source code so that the data is available for execution when compiled.
I suppose in theory I could copy and paste (suitably formatted) arrays from files into my C++ source code, but they are too large for this to be practical.
I can easily re-write the files to whatever format would be easiest to have c++ access the data - either in c++ array syntax or one string per line to be read into an array.
In a similar vein, is there an easy way to include files containing custom function libraries when compiling with g++ in terminal? (my web searches show plenty of ways for various IDE applications but I am writing source in vim and compiling with g++ on the command line).
I am sorry if this is trivial and I have missed it but I am stumped!
Thank you.
Here's how I'd structure this:
file: data.array
/* C++ style comments are ok in this file and will be ignored
* both single and multiline comments will work */
// the data in the array is a comma separated list, lines can be any length
1, 2, 3, 4,
5, 6, 7, 8,
9, 10, 11, 12,
// more comma separated data
9996, 9997, 9998, 9999
file: class.h
extern int myArray[]; // you should fill in the size if you can
// more stuff here
file: class.cpp
// if you have an editor that highlights syntax and errors, it may not like this
// however, #include is handled before compiling and performs a blind substitution
// so this is perfectly legal and should compile.
// Visual C++ 2010 highlights this as an error, but the project builds fine.
int myArray[] =
{
#include "data.array"
};
// other definitions of stuff in class.h
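Since the question is about strings rather than integers, and the files can be rewritten into any format, one option is a one-off converter that turns a one-string-per-line file into a comma-separated list of string literals suitable for the #include trick above. The sketch below shows only the formatting step. escape_for_cpp and to_cpp_initializer are names invented for this example, and it assumes the strings contain no control characters other than what the caller has already split on.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: escapes backslashes and double quotes so the text can
// sit inside a C++ string literal.
std::string escape_for_cpp(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '"' || text[i] == '\\') out += '\\';
        out += text[i];
    }
    return out;
}

// Hypothetical helper: produces the body of a data file such as data.array,
// one quoted string per line, separated by commas.
std::string to_cpp_initializer(const std::vector<std::string>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        out += "\"" + escape_for_cpp(items[i]) + "\"";
        out += (i + 1 < items.size()) ? ",\n" : "\n";
    }
    return out;
}
```

The generated file could then be included between the braces of a definition such as const char* myStrings[] = { ... };, in the same way as data.array.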

Reset the C/C++ preprocessor #line to the physical file/line

I have a code generator that's going to take some user-written code and embed chunks of it in a larger generated file. I want the underlying compiler to provide good diagnostics when there are defects in the user's code, but I also don't want defects in the generated code to be misattributed to the source when they shouldn't be.
I intend to emit #line lineNum "sourceFile" directives at the beginning of each chunk of user-written code. However, I can't find any documentation of the #line directive that mentions a technique for 'resetting' __LINE__ and __FILE__ back to the actual line in the generated file once I leave the user-provided code. The ideal solution would be analogous to the C# preprocessor's #line default directive.
Do I just need to keep track of how many lines I've written and manually reset that myself? Or is there a better way, some sort of reset directive or sentinel value I can pass to #line to erase the association with the user's code?
It looks like this may have been posed before, though there's no solid answer there. To distinguish this from that, I'll additionally ask whether the lack of answer there has changed with C++11.
A technique I've used before is to have my code generator output a # by itself on a line when it wants to reset the line directives, and then use a simple awk script to postprocess the file and change those to correct line directives:
#!/bin/awk -f
/^#$/ { printf "#line %d \"%s\"\n", NR+1, FILENAME; next; }
{ print; }
Yes, you need to keep track of the number of lines you've output, and you need to know the name of the file you're outputting into. Remember that the line number you specify is the line number of the next line. So if you've written 12 lines so far, you need to output #line 14 "filename", since the #line directive will go on line 13, and so the next line is 14.
There's no difference between the #line preprocessor directive in C and C++.
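The bookkeeping described above can be kept in one place inside the generator. Here is a minimal sketch, assuming the generator builds its output line by line. GeneratedFile and its member names are hypothetical and not taken from any existing tool.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical writer that counts the physical lines it has emitted so it can
// point the compiler back at the generated file after a chunk of user code.
class GeneratedFile {
public:
    explicit GeneratedFile(std::string name) : name_(std::move(name)), lines_written_(0) {}

    // Appends one physical line; `text` is assumed to contain no newline.
    void write_line(const std::string& text) {
        out_ << text << '\n';
        ++lines_written_;
    }

    // Attributes the following lines to `file`, starting at `line`.
    void enter_user_code(const std::string& file, int line) {
        write_line("#line " + std::to_string(line) + " \"" + file + "\"");
    }

    // The directive itself lands on physical line lines_written_ + 1, so the
    // line after it is lines_written_ + 2.
    void leave_user_code() {
        write_line("#line " + std::to_string(lines_written_ + 2) + " \"" + name_ + "\"");
    }

    std::string str() const { return out_.str(); }

private:
    std::string name_;
    std::ostringstream out_;
    int lines_written_;
};
```

With 12 lines written so far, leave_user_code() emits #line 14 followed by the file name, matching the arithmetic above.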
Suppose the input to the code generator, "user.code", contains the following:
int foo () {
return error1 ();
}

int bar () {
return error2 ();
}
Suppose you want to augment this so that it basically looks like this:
int foo () {
return error1 ();
}
int generated_foo () {
return generated_error1 ();
}
int bar () {
return error2 ();
}
int generated_bar () {
return generated_error2 ();
}
Except you don't want that. You want to add #line directives to the generated code so that the compiler messages indicate whether the errors / warnings are from the user code or the autogenerated code. The #line directive indicates the source of the next line of code (rather than the line containing the #line directive).
#line 1 "user.code"
int foo () {
return error1 ();
}

#line 7 "generated_code.cpp" // NOTE: This is line #6 of generated_code.cpp
int generated_foo () {
return generated_error1 ();
}

#line 5 "user.code"
int bar () {
return error2 ();
}

#line 17 "generated_code.cpp" // NOTE: This is line #16 of generated_code.cpp
int generated_bar () {
return generated_error2 ();
}
@Novelocrat,
I had asked this question here before, and no solid answers were posted, but I figured out that if line directives that point to the user code are inserted in the auto-generated code, then the auto-generated code becomes hard to relocate. You have to keep the auto-generated and user code in locations where the compiler can find them for reporting errors. I thought it was better to simply insert the file name and line numbers of the user code in the generated code. In a good text editor it takes only a couple of keystrokes to jump to a line in a file by placing the cursor on the file name.
E.g., in vim, placing the cursor on the file name and pressing gf takes you to the file, and :42 takes you to line 42 (say) that had the error.
Just posting this bit here, so that someone else coming up with the same questions might consider this alternative too.
Have you tried what __LINE__ and __FILE__ give you? I believe they are taken from your #line directives (what would be the point if not?).
(A quick test with gcc-4.7.2 and clang-3.1 confirms my hunch).
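That behaviour is easy to check for yourself. Here is a minimal sketch: the SourceLocation type, the where_am_i function and the line number 100 are arbitrary choices for this example.

```cpp
#include <string>

// Hypothetical helper type for this sketch.
struct SourceLocation {
    std::string file;
    int line;
};

// The directive below renumbers the line that follows it and renames the file,
// so __FILE__ and __LINE__ inside where_am_i reflect the directive rather than
// the physical position in this source file.
#line 100 "user.code"
inline SourceLocation where_am_i() { return SourceLocation{__FILE__, __LINE__}; }
```

Calling where_am_i() yields the file name "user.code" and line 100, whatever the physical file is called.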

Compile a program with local file embedded as a string variable?

Question should say it all.
Let's say there's a local file "mydefaultvalues.txt", separated from the main project. In the main project I want to have something like this:
char * defaultvalues = " ... "; // here should be the contents of mydefaultvalues.txt
And let the compiler swap " ... " with the actual contents of mydefaultvalues.txt. Can this be done? Is there like a compiler directive or something?
Not exactly, but you could do something like this:
defaults.h:
#define DEFAULT_VALUES "something something something"
code.c:
#include "defaults.h"
char *defaultvalues = DEFAULT_VALUES;
Where defaults.h could be generated, or otherwise created however you were planning to do it. The pre-processor can only do so much. Making your files in a form that it will understand will make things much easier.
The trick I did, on Linux, was to have in the Makefile this line:
defaultvalues.h: defaultvalues.txt
xxd -i defaultvalues.txt > defaultvalues.h
Then you could include:
#include "defaultvalues.h"
That header defines both unsigned char defaultvalues_txt[], holding the contents of the file, and unsigned int defaultvalues_txt_len, holding the size of the file.
Note that defaultvalues_txt is not null-terminated, thus, not considered a C string. But since you also have the size, this should not be a problem.
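In C++, the array and its length can be combined into a std::string without needing a terminator. The sketch below uses a hand-written stand-in for the generated header so that it is self-contained. The three bytes (the text "hi" plus a newline) and the defaults_as_string name are made up for this example.

```cpp
#include <string>

// Stand-in for what `xxd -i defaultvalues.txt` would generate; in a real build
// these two definitions would come from #include "defaultvalues.h".
unsigned char defaultvalues_txt[] = { 0x68, 0x69, 0x0a };
unsigned int defaultvalues_txt_len = 3;

// Hypothetical helper: copies exactly defaultvalues_txt_len bytes, so the
// missing null terminator does not matter.
std::string defaults_as_string() {
    return std::string(reinterpret_cast<const char*>(defaultvalues_txt),
                       defaultvalues_txt_len);
}
```

The resulting std::string owns a copy of the data and supplies a terminator through c_str() when a C string is needed.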
EDIT:
A small variation would allow me to have a null-terminated string:
echo "char defaultvalues[] = { " `xxd -i < defaultvalues.txt` ", 0x00 };" > defaultvalues.h
Obviously will not work very well if the null character is present inside the file defaultvalues.txt, but that won't happen if it is plain text.
One way to achieve compile-time trickery like this is to write a simple script in an interpreted language (e.g. Python, Ruby or Perl will do great) which does a simple search and replace. Then just run the script before compiling.
Define your own #pragma XYZ directive which the script looks for and replaces with code that declares the variable with the file contents in a string.
char * defaultvalues = ...
where ... contains the text string read from the given text file. Be sure to compensate for line length, new lines, string formatting characters and other special characters.
Edit: lvella beat me to it with a far superior approach: embrace the tools your environment supplies. In this case, that means a tool which does the string search and replace for you once you feed a file to it.
Late answer I know but I don't think any of the current answers address what the OP is trying to accomplish although zxcdw came really close.
All any 7 year old has to do is load your program into a hex editor and hit CTRL-S. If the text is in your executable code (or vicinity) or application resource they can find it and edit it.
If you want to prevent the general public from changing a resource or static data just encrypt it, stuff it in a resource then decrypt it at runtime. Try DES for something small to start with.