Any utility to test expand C/C++ #define macros?

It seems I often spend way too much time trying to get a #define macro to do exactly what I want. I'll post my current dilemma below, and any help is appreciated. But the bigger question is whether there is any utility someone could recommend that quickly displays what a macro is actually doing. Even the slow trial-and-error process would go much faster if I could see what is wrong.
Currently, I'm dynamically loading a long list of functions from a DLL I made. The way I've set things up, the function pointers have the same names as the exported functions, and the typedefs used to prototype them have the same names but with a prepended underscore. So I want to use a define to simplify the assignment of a long list of function pointers.
For example, in the code statement below, 'hexdump' is the name of a typedef'd function pointer and is also the name of the exported function, while _hexdump is the name of the typedef. If GetProcAddress() fails, a failure counter is incremented.
if (!(hexdump = (_hexdump)GetProcAddress(h, "hexdump"))) --iFail;
So let's say I'd like to replace each line like the above with a macro, like this...
GETADDR_FOR(hexdump )
Well this is the best I've come up with so far. It doesn't work (my // comment is just to prevent text formatting in the message)...
// #define GETADDR_FOR(a) if (!(a = (#_#a)GetProcAddress(h, "/""#a"/""))) --iFail;
And again, while I'd APPRECIATE an insight into what silly mistake I've made, it would make my day to have a utility that would show me the error of my ways, by simply plugging in my macro.

Go to https://godbolt.org/. Enter your code in the left pane, select gcc as the compiler, and add -E as a compiler argument in the right pane. Your preprocessed code will appear on the right.

You can just run your code through the preprocessor, which will show you what it will be expanded into (or spit out errors as necessary):
$ cat a.c
#define GETADDR_FOR(a) if (!(a = (#_#a)GetProcAddress(h, "/""#a"/"")))
GETADDR_FOR(hexdump)
$ gcc -E a.c
# 1 "a.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "a.c"
a.c:1:36: error: '#' is not followed by a macro parameter
GETADDR_FOR(hexdump)
In GCC, it's gcc -E foo.c to only preprocess the file.
Visual Studio uses the /P argument.
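For example (a sketch, assuming cl.exe is on your PATH), this writes the preprocessed source to a.i next to a.c, with /C keeping the comments:
cl /P /C a.c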

http://visualstudiogallery.msdn.microsoft.com/59a2438f-ba4a-4945-a407-a1a295598088 - a Visual Studio plugin to expand macros

You appear to be confused about what the exact syntax is for stringifying or token pasting in C preprocessor macros.
You might find this page about C preprocessor macros in general helpful.
In particular, I think this macro should read like this:
#define GETADDR_FOR(a) if (!(a = (_##a)GetProcAddress(h, #a))) --iFail
The trailing ; should be skipped because you will likely be typing this as GETADDR_FOR(hexdump);, and if you don't it will look very strange in your C code and confuse many syntax highlighters.
And as someone else mentioned gcc -E will run the preprocessor and skip the other compilation steps. This is useful for debugging preprocessor problems.
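For what it's worth, here is a minimal, self-contained sketch of the corrected macro in context (the typedef's signature, the handle name h, and the loader function are made up for illustration); running it through gcc -E should show the expansion given in the comment:
#include <windows.h>
#include <stddef.h>

typedef void (*_hexdump)(const void *buf, size_t len);  /* hypothetical signature */
static _hexdump hexdump;   /* function pointer named after the exported function */
static int iFail;

#define GETADDR_FOR(a) if (!(a = (_##a)GetProcAddress(h, #a))) --iFail

static void load_functions(HMODULE h)
{
    /* expands to: if (!(hexdump = (_hexdump)GetProcAddress(h, "hexdump"))) --iFail; */
    GETADDR_FOR(hexdump);
}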

You might want to take a look at Boost Wave. Like most of Boost, it's really more a library than a utility, but it does have a driver to act as a complete preprocessor.

Related

How do I strip out inactive #if directives with the gcc/g++ preprocessor?

I am using a third-party open source project and need to strip out the inactive #ifs, #ifdefs, etc. to better understand the code flow.
Is there a way to use make to produce versions of the source files without these directives? I'd like to avoid expanding macros, just remove directives.
I was looking at
https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html
and it seems like -dD and -fdirectives-only are good options to start.
Where will these preprocessed files appear? Where do I add these commands for use with a Makefile and "make"?
I tried running "make -n" to produce a script and adding options to the g++ and gcc calls in the script (after -Wformat, among other things), but I don't notice any difference.
I'm not sure if this complicates anything, but I am also using avr-gcc and avr-g++.
I have looked at coan, which does not support #included #defines, so it would not work for this purpose, and I could not get sunifdef to work. Is there a way of doing this with the preprocessor?
The defines are scattered among the current file, the included files, and included makefiles that specify -Dfoo=opt options.
You're on the right track with your preprocessor options. -D will define a macro (with a value of 1 if you don't give one), -U will cancel any previous definition (it becomes undefined), and -fdirectives-only will suppress macro expansion. In addition to those, you can use the -E flag to tell gcc to stop after preprocessing and emit the preprocessed output for your examination. However, I don't think it will be quite what you expect. The CPP (C preprocessor) output may have other things added to it, as suggested by this SO question, and you should check the GNU CPP output manual page. That is what you will get from the CPP.
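For reference, a run that emits the preprocessed output while leaving macros unexpanded might look like this (foo.c and foo.i are placeholder names):
gcc -E -fdirectives-only foo.c -o foo.i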
It sounds like you want to be able to strip this extraneous code once and develop from there. To do that, I would encourage you to give unifdef another try. This is what unifdef was designed to do, while the CPP was designed to prepare code for compilation. They're different tasks, so you should use the right tool for each. It is available as a standalone application at http://dotat.at/prog/unifdef/ and is packaged by most Linux distributions.
It allows you to specify macros that you want it to consider defined or undefined, and it removes blocks of code where the conditional directive would evaluate to false. For example, you can run it like this:
unifdef -I<path> -DMACRO1 -UMACRO2
It will search through the C/C++ source files in the directory specified by <path>, looking for #if, #ifdef, #ifndef, etc. When it encounters them, it will evaluate the conditional expression and selectively remove the code controlled by that expression. Consider an input file with this code:
int i = 0;
#ifdef MACRO1
int j = 0;
#endif /* ifdef MACRO1 */
int k = 0;
int m = 0;
#if (MACRO1 && MACRO2)
int n = 0;
#endif /* if (MACRO1 && MACRO2) */
int p = 0;
int q = 0;
#ifdef MACRO3
int r = 0;
#endif /* ifdef MACRO3 */
int t = 0;
If we call unifdef like my example above, the output will be this:
int i = 0;
int j = 0;
int k = 0;
int m = 0;
int p = 0;
int q = 0;
#ifdef MACRO3
int r = 0;
#endif /* ifdef MACRO3 */
int t = 0;
Notice that the declaration of n has been removed, because it was contained in a preprocessor #if/#endif block whose controlling expression evaluated to false (we told unifdef to consider MACRO2 undefined). The declaration of j remains, but the #ifdef and #endif statements were removed because the controlling expression was known to be true.
The block that depends on MACRO3 is left untouched because its state is unknown.
There is a significant amount of flexibility and control over how this runs, too.
If you decided you do want it to be part of your build process, you can always add it to your makefile.
If you do not have a list of which macros should be defined or undefined available, you can use the "unifdefall" script provided with unifdef and it will use the CPP to discover macro definitions in the source code on its own, and remove/keep code blocks according to the definitions contained in the source code.
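I believe the basic invocation is along these lines (treat it as a sketch and check the unifdefall man page; output goes to standard output):
unifdefall foo.c > foo.stripped.c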
TL;DR
Yes you can (sort of) do it with the preprocessor. But unifdef and sunifdef are tools that are made to do exactly this, so you should use them instead.
Assumptions
The aim of the exercise is to produce a body of C/C++ source code with most of the conditional compilation removed, and which compiles to the identical binaries.
This is third party source code, and you are aware of the problems of merging subsequent updates.
This is open source, but you have no intention of ever distributing modified source code.
The programs are arbitrarily complex and are built by arbitrarily complex makefiles or similar tools, with command-line symbol definitions and/or configuration include files.
My strategy is to use a program like unifdef. The first time I did this I wrote my own, and you may have to modify the program to produce the desired results.
The core strategy is:
Identify a single likely defined symbol (experimentation or trial and error required).
Run the code through unifdef.
Optionally, compare before and after source visually to spot obvious problems.
Build the after version to ensure it builds correctly.
Compile the before and after versions to produce pre-processed output using the same makefiles.
Compare pairs of before and after pre-processed source. They should be identical, give or take some white space.
Resolve issues by editing either before or after version as required.
Optionally, remove all references to the symbol from all makefiles. [It should make no difference.]
Repeat, using the after version and a different symbol.
One symbol at a time, testing thoroughly every time. Some symbols may turn out to be too hard, and if you have much more than a million lines of source code and a hundred or so symbols it can all get out of hand.
Final step: if you modify unifdef then feel free to contribute your changes back to the community. This is a seriously challenging task to do well!
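As a concrete sketch of that loop for a single symbol (FOO, the directory names, and the single-file gcc calls are placeholders; a real build would go through the makefiles as described above):
unifdef -DFOO before/file.c > after/file.c
gcc -E -DFOO before/file.c -o before.i
gcc -E -DFOO after/file.c -o after.i
diff -w before.i after.i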
Use make -n to create the shell script produced by the makefile.
Go to the line where it runs avr-g++ and add -dM -E before all the rest of the options.
Open the file named after the -o option; the list of #defines will be in it (it will probably be called something.o).
Use unifdef -f definesFile.o filename
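Put together, the sequence might look roughly like this (file names are placeholders, and you would reuse whatever -D and -I options the makefile already passes to avr-g++):
avr-g++ -dM -E main.cpp > defines.h
unifdef -f defines.h main.cpp > main.stripped.cpp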

Can I get the C++ preprocessor to send output during compilation?

I have been debugging a particularly insidious bug which I now believe to be caused by unexpected changes which stem from different behavior when different headers are included (or not).
This is not exactly the structure of my code but let's just take a look at this scenario:
#include "Newly_created_header_which_accidentally_undefines_SOME_DEFINE.h"
// ...
#ifdef SOME_DEFINE
code_which_i_believe_i_am_always_running();
#else
code_which_fails_which_i_have_forgotten_about(); // runtime error stack traces back here, but I don't know this... or maybe it's some strange linker error
#endif
I search through my git commits and narrow down the cause of the bug, compiling and running my code countless times, only to find after several hours that the only difference required for causing the bug is the inclusion of what appears to be a completely benign and unrelated header.
Perhaps this is a great argument for why the preprocessor basically just sucks.
But I like it. The preprocessor is cool because it lets us make shortcuts. It's only that some of these shortcuts, when not used carefully, bite us in the butt pretty hard.
So at this juncture it would have helped if I could use a directive like #echo "Running old crashy code" where I'll be able to see this during compilation so I could be tipped off immediately to start investigating why SOME_DEFINE was not defined.
As far as I know the straightforward way of determining if SOME_DEFINE is defined is to do something like
#ifndef SOME_DEFINE
printf("SOME_DEFINE not defined!!\n");
#endif
This will surely get the job done but there is no good reason for this task to be performed at runtime because it is entirely determined at compile-time. This is simply something I'd like to see at compile-time.
That being said, in this situation, using the print (or log or even throwing an exception) may be an acceptable thing to do because I won't really care about slowing down or cluttering up the questionable code. But that doesn't apply if I have for instance two code paths both of which are important, and I just want to know at compile-time which one is being activated. I'd have to worry about running the code that does the preprocessor-conditioned print at the beginning of the program.
This is really just a long-winded way of asking the question, "Can I echo a string to the output during compilation by using a preprocessor directive?"
If you use the #error directive, the output will be printed directly and the compilation will stop:
$ make days_in_month
cc days_in_month.c -o days_in_month
days_in_month.c:2:2: error: #error "ugly!"
make: *** [days_in_month] Error 1
$
This might not be quite what you wanted, but it gets the job done quickly.
$ cat days_in_month.c
#include <stdio.h>
#error "ugly!"
...
If you wish processing to continue, you can use #warning:
$ make days_in_month
cc days_in_month.c -o days_in_month
days_in_month.c:2:2: warning: #warning "ugly!" [-Wcpp]
$ head days_in_month.c
#include <stdio.h>
#warning "ugly!"
An answer more in line with what I was looking for is here: https://stackoverflow.com/a/3826876/340947
Sorry @sarnold

How To Extract Function Name From Main() Function Of C Source

I just want to ask your ideas regarding this matter. For a certain important reason, I must extract/acquire the names of all functions that are called inside the "main()" function of a C source file (ex: main.c).
Example source code:
int main()
{
int a = functionA(); // functionA must be extracted
int b = functionB(); // functionB must be extracted
}
As you know, the only thing that I can use as a marker/sign to identify these function calls is the parentheses "()". I've already considered several factors in implementing this function name extraction. These are:
1. functions may have parameters. Ex: functionA(100)
2. Loop operators. Ex: while()
3. Other operators. Ex: if(), else if()
4. Other operators between function calls with no spaces. Ex: functionA()+functionB()
As of this moment I know what you're saying, this is a pain in the $$$... So please share your thoughts and ideas... and bear with me on this one...
Note: this is in C++ language...
You can write a small C++ parser by combining FLEX (or LEX) and BISON (or YACC).
Take C++'s grammar
Generate a C++ program parser with the mentioned tools
Make that program count the function calls you are mentioning
Maybe a little bit too complicated for what you need to do, but it should certainly work. And LEX/YACC are amazing tools!
One option is to write your own C tokenizer (simple: just be careful enough to skip over strings, character constants and comments), and to write a simple parser, which counts the number of {s open, and finds instances of identifier + ( within. However, this won't be 100% correct. The disadvantage of this option is that it's cumbersome to implement preprocessor directives (e.g. #include and #define): there can be a function called from a macro (e.g. getchar) defined in an #include file.
An option that works 100% of the time is compiling your .c file to an assembly file, e.g. gcc -S file.c, and finding the call instructions in file.s. A similar option is compiling your .c file to an object file, e.g. gcc -c file.c, generating a disassembly dump with objdump -d file.o, and searching for call instructions.
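For example, the object-file variant might look like this (file names are placeholders; on x86 the calls show up as call/callq instructions):
gcc -c file.c
objdump -d file.o | grep call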
Another option is finding a parser using Clang / LLVM.
GNU cflow might be helpful.
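For instance, with the main.c from the question, GNU cflow prints an indented call tree rooted at main(), so the direct callees (functionA, functionB) appear one level below it:
cflow main.c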

Declaring the Unix flavour in C/C++

How do I declare in C/C++ that the code that is written is to be built in either HP-UX or Solaris or AIX?
I found that a good way to figure out this kind of question, at least with gcc, is to use this makefile:
defs:
g++ -E -dM - < /dev/null
Then:
$ make defs
should output all the definitions you have available.
So:
$ make defs | grep -i AIX
$ make defs | grep -i HP
should give you the answer. Example for Linux:
$ make defs | grep -i LINUX
#define __linux 1
#define __linux__ 1
#define __gnu_linux__ 1
#define linux 1
Once you have found the define you are looking for, you can add this at the beginning of your code:
#if !(defined(HP_DEFINE) || defined(AIX_DEFINE) || defined(SOLARIS_DEFINE))
# error This file cannot be compiled for your platform
#endif
How about a macro passed to the compiler?
i.e. gcc -Dmacro[=defn]
Then test for the macro in your code with a simple #ifdef or #if (if you've given it a value). There may already be a predefined macro for your target platform as well.
[EDIT: Put some of my comments here in my answer that explain how -D works]
-Dmacro[=defn] on the command line for the compiler is the same as having #define macro defn in the code. You expand it out like this: -Dfoo=bar is equivalent to #define foo bar. Also, the definition is optional so -Dfoo is equivalent to #define foo.
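A minimal sketch of that approach (FOR_AIX is a made-up macro name here, supplied on the command line rather than in a header):
// build with: g++ -DFOR_AIX main.cpp
#ifdef FOR_AIX
    // AIX-specific code path
#else
    // generic code path
#endif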
Be careful about how you handle this. You should identify the features of the O/S that you want to use by feature, not by O/S, and write your code accordingly. Then, in one header, you can identify which of the features are available on the O/S that you are compiling on. This is the technique used by autoconf, and even if you do not use autoconf itself, the technique it espouses is better than the platform-based technique. Remember, the features found on one O/S often migrate and become available on others too, so if you work by features, you can adapt to the future more easily than if you work solely on the O/S.
You also have to write your code appropriately, and portably. Isolate the O/S dependencies in separate files whenever possible, and code to an abstract O/S interface that does what you need. Taken to an extreme, you end up with a Java JVM; you don't need to go that far, but you can obviate most of the problems.
Take a look at portable libraries like the Apache Portable Runtime (APR) library.
And write your code along the lines of:
#ifdef HAVE_PWRITE
...code using pread() and pwrite()...
#else
...code using plain old read() and write()...
#endif
This is a grossly over-simplified example - there could be a number of fallbacks before you use plain read() and write(). Nevertheless, this is the concept used in the most portable code - things like GCC and Apache and so on.
Perhaps a less convoluted solution than some of those suggested is to consult Pre-defined C/C++ Compiler Macros. This site provides an extensive list of compiler macros for a large number of compiler/OS/architecture combinations.
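For example, a sketch using macros commonly listed on that site (double-check them against your own compilers with the g++ -E -dM trick shown above):
#if defined(_AIX)
    /* AIX-specific code */
#elif defined(__hpux)
    /* HP-UX-specific code */
#elif defined(__sun) && defined(__SVR4)
    /* Solaris-specific code */
#else
#error Unsupported platform
#endif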

finding a function name and counting its LOC

So you know off the bat, this is a project I've been assigned. I'm not looking for an answer in code, but more a direction.
What I've been told to do is go through a file and count the actual lines of code while at the same time recording the function names and individual lines of code for the functions. The problem I am having is determining a way when reading from the file to determine if the line is the start of a function.
So far, I can only think of maybe having a string array of data types (int, double, char, etc.), searching for those in the line, then searching for the parenthesis, and then checking for the absence of a semicolon (so I know it isn't just a declaration of the function).
So my question is, is this how I should go about this, or are there other methods in which you would recommend?
The code in which I will be counting will be in C++.
Three approaches come to mind.
Use regular expressions. This is fairly similar to what you're thinking of. Look for lines that look like function definitions. This is fairly quick to do, but can go wrong in many ways.
char *s = "int main() {"
is not a function definition, but sure looks like one.
char
* /* eh? */
s
(
int /* comment? // */ a
)
// hello, world /* of confusion
{
is a function definition, but doesn't look like one.
Good: quick to write, can work even in the face of syntax errors; bad: can easily misfire on things that look like (or fail to look like) the "normal" case.
Variant: First run the code through, e.g., GNU indent. This will take care of some (but not all) of the misfires.
Use a proper lexer and parser. This is a much more thorough approach, but you may be able to re-use an open source lexer/parser (e.g., from gcc).
Good: Will be 100% accurate (will never misfire). Bad: One missing semicolon and it spews errors.
See if your compiler has some debug output that might help. This is a variant of (2), but using your compiler's lexer/parser instead of your own.
Your idea can work in 99% (or more) of the cases. Only a real C++ compiler can do 100%, in which case I'd compile with debug information (g++ -g -S prog.cpp) and get the function names and line numbers from the debug information in the assembly output (prog.s).
My thoughts for the 99% solution:
Ignore comments and strings.
Document that you ignore preprocessor directives (#include, #define, #if).
Anything between a toplevel { and } is a function body, except after typedef, class, struct, union, namespace and enum.
If you have a class, struct or union, you should be looking for method bodies inside it.
The function name is sometimes tricky to find, e.g. in long (*f(int))(char); the name f is buried in the middle of the declarator.
Make sure your parser works with template functions and template classes.
For recording function names I use PCRE and the regex
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
and then filter out names like "if", "while", "do", "for", "switch". Note that the function name is (\w+), group 1.
Of course it's not a perfect solution but a good one.
I feel that doing the parsing manually is going to be quite a difficult task. I would probably use an existing tool such as RSM, redirect the output to a CSV file (assuming you are on Windows), and then parse the CSV file to gather the required information.
Find a decent SLOC count program, e.g., SLOCCounter. Not only can you count SLOC, but you have something against which to compare your results. (Update: here's a long list of them.)
Interestingly, the number of non-comment semicolons in a C/C++ program is a decent SLOC count.
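As a rough sketch of that heuristic (it strips // and /* */ comments but deliberately ignores string literals, so treat the count as an approximation):
#include <fstream>
#include <iostream>
#include <string>

// Count semicolons that are outside // and /* */ comments.
int main(int argc, char **argv)
{
    if (argc < 2) { std::cerr << "usage: sloc <file>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::string line;
    bool inBlockComment = false;
    long count = 0;
    while (std::getline(in, line)) {
        for (std::size_t i = 0; i < line.size(); ++i) {
            if (inBlockComment) {
                if (line[i] == '*' && i + 1 < line.size() && line[i + 1] == '/') {
                    inBlockComment = false;
                    ++i;
                }
            } else if (line[i] == '/' && i + 1 < line.size() && line[i + 1] == '/') {
                break;  // rest of the line is a // comment
            } else if (line[i] == '/' && i + 1 < line.size() && line[i + 1] == '*') {
                inBlockComment = true;
                ++i;
            } else if (line[i] == ';') {
                ++count;
            }
        }
    }
    std::cout << count << '\n';
    return 0;
}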
How about writing a shell script to do this? An AWK program perhaps.