Pycparser not working on preprocessed code - python-2.7

I need to use pycparser on preprocessed C code (the output of 'gcc -E'). However, I am running into an issue that I can't understand or solve.
I am using the provided samples year2.c and func_defs.py, which I modified to use a variety of preprocessors and fake libraries, to no avail. Maybe some of you can look into this and see if you can reproduce or solve the issue. I will append all necessary code.
The errors were generated using year2.c (regular sample file) and year2.i ('gcc -E' output). There was no useable result for the latter while the former worked with both preprocessor/fakelib variants.
I have created a bitbucket repo with all relevant errors, the script used (albeit only its last variation) and the year2.c and year2.i files.
Error & Sample Repo
Thanks for your time.

The error you're getting is:
pycparser.plyparser.ParseError: /usr/lib/gcc/x86_64-linux-gnu/4.8/include/stdarg.h:40:27: before: __gnuc_va_list
The line indicated as causing the error (stdarg.h:40):
typedef __builtin_va_list __gnuc_va_list;
In gcc, __builtin_va_list is, as its name indicates, built in to the compiler. Consequently, no declaration of that type is necessary (or allowed).
It's pretty common for C compilers to use a symbol-table-based technique to parse typenames, since there are a number of ambiguities in the grammar if you cannot distinguish a typename from another identifier. Such a parser will assume that an undeclared identifier is not a typename, and if __builtin_va_list is not a typename, that typedef is a syntax error.
So I suppose that the pycparser grammar you're using doesn't know about gcc builtin types (and why should it?).
Your fakelib seems to be including the same header file. That's not surprising since it is hard to fake stdarg.h; although technically a library header, it is part of the small set of headers which must be provided by the compiler even in a freestanding (no standard library) implementation: <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h> (C11 standard, clause 4, paragraph 6). These must be implemented by the compiler because there is no way an external library can know enough about the nature of the compiled code to properly define them.
Depending on what you require from the parsed output, you may be able to work around this in pycparser by including a definition of __builtin_va_list, such as:
typedef struct __builtin_va_list { } __builtin_va_list;
__builtin_va_list is not the only builtin gcc datatype, although you may not run into the other ones. So you might have to iterate this solution a few times until you achieve whatever it is you are trying to achieve.
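A minimal sketch of that workaround in Python, assuming the preprocessed text is read from a file: it prepends placeholder typedefs for compiler builtins before the text is handed to pycparser. The helper names and the `int` placeholder are illustrative assumptions, not part of pycparser; the placeholder only keeps the parser going and says nothing about the real type.

```python
# Placeholder typedefs for compiler builtins that pycparser does not know.
# The names in this tuple and the `int` stand-in are assumptions for
# illustration; extend the tuple as further builtins cause parse errors.
BUILTIN_TYPES = ("__builtin_va_list",)


def prepend_builtin_typedefs(preprocessed_text, builtin_types=BUILTIN_TYPES):
    """Return the text with one dummy typedef per builtin type on top."""
    typedefs = "".join(
        "typedef int {0};\n".format(name) for name in builtin_types
    )
    return typedefs + preprocessed_text


def parse_preprocessed(path):
    """Parse a `gcc -E` output file after adding the dummy typedefs."""
    # Imported here so the helper above stays usable without pycparser.
    from pycparser import c_parser

    with open(path) as handle:
        text = prepend_builtin_typedefs(handle.read())
    return c_parser.CParser().parse(text, path)
```

With the files from the question this would be called as `parse_preprocessed('year2.i')`. If another builtin type trips the parser, add its name to `BUILTIN_TYPES` and try again.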

As @rici has explained the cause of the error, I'll focus on how to solve it. I've taken my answer from the pycparser author's blog:
http://eli.thegreenplace.net/2015/on-parsing-c-type-declarations-and-fake-headers
The idea is that pycparser needs to know what anyheader.h contains so it can properly parse the code. Since actually parsing anyheader.h and all the other headers it transitively includes could be very time consuming and is perhaps not required for your task, fake headers can be used instead. A fake anyheader.h contains only the parts of the original that are necessary for parsing: the #defines and the typedefs.
gcc -nostdinc -E -I/home/rg/pycparser-master/utils/fake_libc_include test.c > testPP.c
The above command preprocesses test.c, which includes <stdio.h>, using the fake headers provided with the pycparser package. The -nostdinc flag blocks the pre-set system header directories that gcc would otherwise search automatically. Now, parsing the preprocessed file, using e.g. the code below
import pycparser
pycparser.parse_file('testPP.c')
should work in most cases. If it doesn't, make sure you provide all the dependencies for preprocessing.
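If no fake is provided for some header, you can neutralize the typedef that causes the error with a -D option. For example, to resolve an error caused by __builtin_va_list, you can try defining it as an ordinary type (an object-like macro is needed here, since a function-like macro such as '__builtin_va_list(x)=' is only expanded when the name is followed by a parenthesis):
gcc -nostdinc -E -D__builtin_va_list=int -I/home/rg/pycparser-master/utils/fake_libc_include test.c > testPP.c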
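The preprocess-then-parse steps above can also be driven from Python in one call. This is a hedged sketch, assuming gcc is on the PATH. `FAKE_LIBC` reuses the path from the command above, and `build_cpp_args` and `parse_with_fake_headers` are hypothetical helper names. `parse_file` with the `use_cpp`, `cpp_path` and `cpp_args` parameters is pycparser's own interface.

```python
FAKE_LIBC = "/home/rg/pycparser-master/utils/fake_libc_include"


def build_cpp_args(fake_include_dir=FAKE_LIBC, defines=()):
    """Build the argument list passed to gcc for preprocessing only."""
    args = ["-E", "-nostdinc", "-I" + fake_include_dir]
    # Each entry becomes a -D option, e.g. "FOO=1" turns into "-DFOO=1".
    args.extend("-D" + define for define in defines)
    return args


def parse_with_fake_headers(c_file, defines=()):
    """Preprocess c_file with gcc and the fake headers, then parse it."""
    import pycparser  # imported lazily; requires the pycparser package

    return pycparser.parse_file(
        c_file,
        use_cpp=True,
        cpp_path="gcc",
        cpp_args=build_cpp_args(defines=defines),
    )
```

Passing the arguments as a list avoids shell quoting problems with -D options that contain parentheses.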

Related

How to enable _USE_UNIX98 (Gcc/C++ v2.96)

I have a C++ application that works with current compilers (I compile it with Eclipse).
Now I need to compile it with a very old compiler version (gcc/c++ v2.96) on Red Hat 7.3 with KDevelop.
When I compile the app it gives the following error: swprintf undeclared.
The wchar.h header is included, but I looked at this file on RH 7.3 and it only declares this function if __USE_UNIX98 or __USE_ISOC99 is defined.
How can I enable __USE_UNIX98?
GNU libc defines the features that should be enabled in all of its headers using a special system header <features.h>. If you define the appropriate macros, <features.h> will define __USE_UNIX98 for you.
The typical way to get all functions, regardless of what standard (if any) covers them, is by adding -D_GNU_SOURCE on the command-line. Getting only the functions covered by a specific standard requires defining the macro as specified in that standard using the value specified in that standard, such as -D_POSIX_C_SOURCE=200112L. The precise values that are supported on your particular implementation are probably easiest found by inspecting /usr/include/features.h manually.
From inspection of <features.h>, defining _XOPEN_SOURCE to 500 or greater will cause __USE_UNIX98 to be defined.

gcc Make sure that no c++ isms are compiled?

If I use gcc as a driver and name all my source files .c and .h, can I be sure that I won't have any C++ in my sources? Are there any gcc parameters that make it throw errors if any C++ is encountered in the source?
I am especially paranoid about include files, because I am not 100% sure whether I include C headers or C++ headers.
Some examples I ran into in the past:
trying to use the type bool
using wrong includes cstdio vs. stdio.h
trouble with the struct keyword
I just want to make 100% sure that my source is only C and has no C++ in it.
GCC figures out by itself whether it is dealing with C or C++ source code. How? It looks at the extension of the file you passed.
These are the extensions accepted.
In case you want to force a specific language, use the -x flag (documented in the link above). Furthermore, you may check whether the macro __cplusplus is defined.

How can I ensure no code uses an API?

I want to ban use of iostreams in a code base I have (for various reasons). Is there a way I can inspect symbol files or force the compiler to emit an error when that API is used?
A simple approach is to provide a dummy iostream implementation that does nothing but throw a compile-time error.
The following example assumes a GCC toolchain - I imagine the process is similar with other compilers.
First, create your dummy iostream file:
#error 'Use of iostream is prohibited'
Some dummy application code to demonstrate:
#include <iostream>
int main (int argc, char** argv) {
    std::cout << "foo!";
    return 0;
}
Compile as follows (assuming the dummy iostream and main.cpp are in the working directory):
g++ -I. main.cpp
Compilation fails with the following errors:
In file included from main.cpp:2:0:
./iostream:1:2: error: #error 'Use of iostream is prohibited'
main.cpp: In function 'int main(int, char**)':
main.cpp:4:2: error: 'cout' is not a member of 'std'
Added bonus: symbols usually declared in that file (e.g. cout here) are undefined, and so get flagged in the compiler output as well. As such, you also get pointers to exactly where you're using your prohibited API.
UPDATE: Instructions for Visual C++ 2012.
As @RaymondChen points out in the comments below, a solution tailored to Visual C++ is likely more useful to the OP. As such, the following outlines the process I went through to achieve the same as the above under Visual C++ 2012.
First, create a new console project, using the above C++ code. Also create the dummy iostream header I described above, and place it in a directory somewhere easy to find (I put mine in the main project source directory).
Now, in the Solution Explorer, right click on the project node and select "Properties" from the drop-down list. In the dialog that appears, select "VC++ Directories" from the tree on the left. Prepend the directory containing the dummy iostream file into the list of include directories that appears on the right, separated from the other directories with a semicolon:
Here, my project was called TestApp1, and I just prepended its main directory to the $(IncludePath) that was already there. Note that it is important to prepend rather than append - the order of the directories in the list determines the search order, so if $(IncludePath) appears before your custom directory, the system header will be used in preference to your dummy header - not what you want.
Click OK, and rebuild the project.
For me, doing so resulted in the following errors in the VC++ console (edited slightly for brevity):
error C1189: #error : 'Use of iostream is prohibited'
IntelliSense: #error directive: 'Use of iostream is prohibited'
IntelliSense: namespace "std" has no member "cout"
Note that IntelliSense also picks up the (now) illegal use of cout - it is highlighted with an error mark in the editor.
This is a nasty hack, but it should work.
The C standard (and consequently the C++ standard as well) allows preprocessor tokens in #include directives. This is also known as "computed includes".
Thus, adding something like -Diostream to CFLAGS inside your makefile (or to compiler options in your IDE's project settings) should reliably break the build if someone tries to use iostream.
Of course, with an empty macro, the error message will not be very informative, but you could instead use something like -Diostream=DUDE_DONT_USE_IOSTREAM, which will show an error like: DUDE_DONT_USE_IOSTREAM: file not found.
It's also something that you can turn off again without much hassle if you change your mind later. Just remove the build option.
Your idea to inspect symbol files is feasible and very realistic. virtual ~ios_base(); is a single method that all streams will inherit, and which can't easily be inlined since it's virtual and non-trivial. Its presence in an object file is therefore a very strong indication of IOstream use.
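As a rough sketch of that symbol check, the following Python filters the text output of a symbol-listing tool such as `nm` for undefined symbols that mention `ios_base`. The sample input, the function name and the assumed `nm` layout (optional address, type letter, name) are illustrative assumptions; real output varies by platform and toolchain.

```python
def find_iostream_symbols(nm_output, marker="ios_base"):
    """Return undefined symbols whose (mangled) name contains marker."""
    hits = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined symbols have no address: the line is "U <name>".
        if len(fields) == 2 and fields[0] == "U" and marker in fields[1]:
            hits.append(fields[1])
    return hits


# Hypothetical sample of `nm main.o` output, for illustration only.
SAMPLE = """\
0000000000000000 T main
                 U _ZNSt8ios_base4InitC1Ev
                 U puts
"""
print(find_iostream_symbols(SAMPLE))
```

An empty result for every object file suggests that nothing pulls in iostreams, subject to the static-linking and inlining caveats raised elsewhere in this thread.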
In addition to the compiler-assisted method mentioned by Mac, you can use generic search tools. For example (I assume the zsh shell; bash only supports ** with the globstar option enabled, and on Windows you would need to find the PowerShell equivalent):
# Find all mentions of `iostream` and `cin` in all files ending in .cc in all subdirectories of the current directory
grep iostream **/*.cc
grep cin **/*.cc
If you don't want to/can't use command line you can use your favourite editor and search for unwanted symbols.
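Where no suitable shell is available, the same search can be sketched in Python. This is only an illustration: the function name, the default extension list and the choice to match `#include` lines rather than every mention of `iostream` are assumptions.

```python
import io
import os
import re

# Matches `#include <iostream>` or `#include "iostream"`, allowing spaces.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]iostream[>"]')


def find_iostream_includes(root, extensions=(".cc", ".cpp", ".h")):
    """Yield (path, line_number) for each iostream include under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced so odd files do not abort the scan.
            with io.open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if INCLUDE_RE.match(line):
                        yield path, number
```

Calling `list(find_iostream_includes('.'))` from the project root lists each offending file and line. Like grep, this is a textual check and will miss includes hidden behind macros or other headers.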
I usually combine both methods:
Compilation, especially of a large project with a large number of templates, is slow, while searching is fast, so you're more productive with search.
On the other hand, search is not exact and might miss something, so I'd use the header tricks to verify the result of the previous step.
As a final verification you can search for symbols after compilation. This is especially useful if you compile with no optimization. You can use objdump or similar (depending on platform) and watch for imported symbols (this works as long as you don't, say, link statically to something that uses iostreams).
No, not at all. For a very limited subset, you could provide your own definitions, causing the linker to error at the duplicates. This would be very undefined behaviour though. And a good portion is templates that aren't susceptible to this. Without doing drastic things like deleting the iostream header, or using a compiler like Clang and modifying the source code, there's really nothing you can do.

How to detect if errno_t is defined?

I'm compiling code using gcc that comes from Visual C++ 2008. The code is using errno_t, but in some versions of gcc headers including <errno.h> doesn't define the type. How do I detect if the type is defined? Is there a define that signals that the type was defined? In the case it isn't defined I'd like to provide the typedef to let the code compile correctly on all platforms.
Microsoft's errno_t is redundant. errno is defined by the ISO C standard to be a modifiable lvalue of type int. If your code needs to store errno values, then you should put them into an int.
Do a global search and replace s/errno_t/int/ and you're done.
Edit: Also, you shouldn't be providing a typedef int errno_t in your code, because all names that end with _t are reserved.
You can't check for a typedef the way you can for a macro, so this is a bit on the tricky side. If you're using autoconf, this patch shows the minimum changes that you need to have autoconf check for the presence of errno_t and define it if it's missing (the typedef would be placed in a file that includes your generated config.h and is included by all files that need errno_t). If you're not using autoconf you need to come up with some way to do the same thing within your build system, or a very clever set of tests against compiler version macros.
This is typically the case where GNU autoconf comes to the rescue. Basically autoconf will generate a configure script that can detect various system-dependent features such as whether this type exists and how it is defined. You then include the generated C header file within your application.
If you know which versions of GCC are giving you trouble, you can test for them. You can check for versions of GCC using something like:
#if __GNUC__ == 3
...
#else
...
#endif

How should I detect unnecessary #include files in a large C++ project?

I am working on a large C++ project in Visual Studio 2008, and there are a lot of files with unnecessary #include directives. Sometimes the #includes are just artifacts and everything will compile fine with them removed, and in other cases classes could be forward declared and the #include could be moved to the .cpp file. Are there any good tools for detecting both of these cases?
While it won't reveal unneeded include files, Visual Studio has a setting /showIncludes (right-click on a .cpp file, Properties->C/C++->Advanced) that will output a tree of all included files at compile time. This can help in identifying files that shouldn't need to be included.
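To work with that output outside the build log, a small Python sketch can turn it into (depth, path) pairs. It assumes the English-language line format `Note: including file:` followed by one space per nesting level, which is how the tree is indented; the prefix is localized in other language versions, and the sample paths below are made up.

```python
PREFIX = "Note: including file:"


def parse_show_includes(build_output):
    """Return (depth, path) pairs from /showIncludes compiler output."""
    entries = []
    for line in build_output.splitlines():
        if not line.startswith(PREFIX):
            continue  # ordinary compiler output, not an include note
        rest = line[len(PREFIX):]
        path = rest.lstrip(" ")
        # One space after the colon is depth 1; each nesting adds a space.
        depth = len(rest) - len(path)
        entries.append((depth, path))
    return entries


# Hypothetical build output, for illustration only.
SAMPLE = (
    "main.cpp\n"
    "Note: including file: C:\\proj\\a.h\n"
    "Note: including file:  C:\\proj\\b.h\n"
)
```

Here `parse_show_includes(SAMPLE)` reports a.h at depth 1 and b.h at depth 2, meaning b.h was brought in by a.h.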
You can also take a look at the pimpl idiom to let you get away with fewer header file dependencies to make it easier to see the cruft that you can remove.
PC Lint works quite well for this, and it finds all sorts of other goofy problems for you too. It has command line options that can be used to create External Tools in Visual Studio, but I've found that the Visual Lint addin is easier to work with. Even the free version of Visual Lint helps. But give PC-Lint a shot. Configuring it so it doesn't give you too many warnings takes a bit of time, but you'll be amazed at what it turns up.
There's a new Clang-based tool, include-what-you-use, that aims to do this.
!!DISCLAIMER!! I work on a commercial static analysis tool (not PC Lint). !!DISCLAIMER!!
There are several issues with a simple non parsing approach:
1) Overload Sets:
It's possible that an overloaded function has declarations that come from different files. It might be that removing one header file results in a different overload being chosen rather than a compile error! The result will be a silent change in semantics that may be very difficult to track down afterwards.
2) Template specializations:
Similar to the overload example, if you have partial or explicit specializations for a template you want them all to be visible when the template is used. It might be that specializations for the primary template are in different header files. Removing the header with the specialization will not cause a compile error, but may result in undefined behaviour if that specialization would have been selected. (See: Visibility of template specialization of C++ function)
As pointed out by 'msalters', performing a full analysis of the code also allows for analysis of class usage. By checking how a class is used through a specific path of files, it is possible that the definition of the class (and therefore all of its dependencies) can be removed completely or at least moved to a level closer to the main source in the include tree.
I don't know of any such tools, and I have thought about writing one in the past, but it turns out that this is a difficult problem to solve.
Say your source file includes a.h and b.h; a.h contains #define USE_FEATURE_X and b.h uses #ifdef USE_FEATURE_X. If #include "a.h" is commented out, your file may still compile, but may not do what you expect. Detecting this programmatically is non-trivial.
Whatever tool does this would need to know your build environment as well. If a.h looks like:
#if defined( WINNT )
#define USE_FEATURE_X
#endif
Then USE_FEATURE_X is only defined if WINNT is defined, so the tool would need to know what directives are generated by the compiler itself as well as which ones are specified in the compile command rather than in a header file.
Like Timmermans, I'm not familiar with any tools for this. But I have known programmers who wrote a Perl (or Python) script to try commenting out each include line one at a time and then compile each file.
It appears that now Eric Raymond has a tool for this.
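The script approach described above can be sketched in Python roughly as follows. The function names are hypothetical, and the actual compile step is left to a caller-supplied `compiles` callable because it depends on the build environment.

```python
import re

INCLUDE_RE = re.compile(r"^\s*#\s*include\b")


def variants_without_one_include(source_text):
    """Yield (line_number, text) with one #include commented out each time."""
    lines = source_text.splitlines(True)  # keep line endings
    for index, line in enumerate(lines):
        if INCLUDE_RE.match(line):
            trial = list(lines)
            trial[index] = "// " + line
            yield index + 1, "".join(trial)


def removable_includes(source_text, compiles):
    """Return line numbers whose #include can go without breaking the build.

    `compiles` is a caller-supplied callable (hypothetical here) that takes
    the modified source text and returns True if it still builds.
    """
    return [
        number
        for number, text in variants_without_one_include(source_text)
        if compiles(text)
    ]
```

A successful compile only shows that the include is not needed to build. As other answers here point out, removing a header can silently change overload resolution or macro-controlled behavior, so each candidate still needs a manual look.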
Google's cpplint.py has an "include what you use" rule (among many others), but as far as I can tell, no "include only what you use." Even so, it can be useful.
If you're interested in this topic in general, you might want to check out Lakos' Large Scale C++ Software Design. It's a bit dated, but goes into lots of "physical design" issues like finding the absolute minimum of headers that need to be included. I haven't really seen this sort of thing discussed anywhere else.
Give Include Manager a try. It integrates easily in Visual Studio and visualizes your include paths which helps you to find unnecessary stuff.
Internally it uses Graphviz but there are many more cool features. And although it is a commercial product it has a very low price.
You can build an include graph using C/C++ Include File Dependencies Watcher, and find unneeded includes visually.
If your header files generally start with
#ifndef __SOMEHEADER_H__
#define __SOMEHEADER_H__
// header contents
#endif
(as opposed to using #pragma once) you could change that to:
#ifndef __SOMEHEADER_H__
#define __SOMEHEADER_H__
// header contents
#else
#pragma message("Someheader.h superfluously included")
#endif
And since the compiler outputs the name of the cpp file being compiled, that would let you know at least which cpp file is causing the header to be brought in multiple times.
PC-Lint can indeed do this. One easy way to do this is to configure it to detect just unused include files and ignore all other issues. This is pretty straightforward - to enable just message 766 ("Header file not used in module"), just include the options -w0 +e766 on the command line.
The same approach can also be used with related messages such as 964 ("Header file not directly used in module") and 966 ("Indirectly included header file not used in module").
FWIW I wrote about this in more detail in a blog post last week at http://www.riverblade.co.uk/blog.php?archive=2008_09_01_archive.xml#3575027665614976318.
Adding one or both of the following #defines will exclude often unnecessary header files and may substantially improve compile times, especially if the code is not using Windows API functions.
#define WIN32_LEAN_AND_MEAN
#define VC_EXTRALEAN
See http://support.microsoft.com/kb/166474
If you are looking to remove unnecessary #include files in order to decrease build times, your time and money might be better spent parallelizing your build process using cl.exe /MP, make -j, Xoreax IncrediBuild, distcc/icecream, etc.
Of course, if you already have a parallel build process and you're still trying to speed it up, then by all means clean up your #include directives and remove those unnecessary dependencies.
Start with each include file, and ensure that each include file only includes what is necessary to compile itself. Any include files that are then missing for the C++ files, can be added to the C++ files themselves.
For each include and source file, comment out each include file one at a time and see if it compiles.
It is also a good idea to sort the include files alphabetically, and where this is not possible, add a comment.
If you aren't already, using a precompiled header to include everything that you're not going to change (platform headers, external SDK headers, or static already completed pieces of your project) will make a huge difference in build times.
http://msdn.microsoft.com/en-us/library/szfdksca(VS.71).aspx
Also, although it may be too late for your project, organizing your project into sections and not lumping all local headers to one big main header is a good practice, although it takes a little extra work.
If you work with Eclipse CDT, you could try out http://includator.com to optimize your include structure. However, Includator might not know enough about VC++'s predefined includes, and setting up CDT to use VC++ with correct includes is not built into CDT yet.
The latest Jetbrains IDE, CLion, automatically shows (in gray) the includes that are not used in the current file.
It is also possible to have the list of all the unused includes (and also functions, methods, etc...) from the IDE.
Some of the existing answers state that it's hard. That's indeed true, because you need a full compiler to detect the cases in which a forward declaration would be appropriate. You can't parse C++ without knowing what the symbols mean; the grammar is simply too ambiguous for that. You must know whether a certain name names a class (could be forward-declared) or a variable (can't). Also, you need to be namespace-aware.
Maybe a little late, but I once found a WebKit perl script that did just what you wanted. It'll need some adapting I believe (I'm not well versed in perl), but it should do the trick:
http://trac.webkit.org/browser/branches/old/safari-3-2-branch/WebKitTools/Scripts/find-extra-includes
(this is an old branch because trunk doesn't have the file anymore)
If there's a particular header that you think isn't needed anymore (say
string.h), you can comment out that include then put this below all the
includes:
#ifdef _STRING_H_
# error string.h is included indirectly
#endif
Of course your interface headers might use a different #define convention
to record their inclusion in CPP memory. Or no convention, in which case
this approach won't work.
Then rebuild. There are three possibilities:
It builds ok. string.h wasn't compile-critical, and the include for it
can be removed.
The #error trips. string.h was included indirectly somehow.
You still don't know if string.h is required. If it is required, you
should directly #include it (see below).
You get some other compilation error. string.h was needed and isn't being
included indirectly, so the include was correct to begin with.
Note that depending on indirect inclusion when your .h or .c directly uses
another .h is almost certainly a bug: you are in effect promising that your
code will only require that header as long as some other header you're using
requires it, which probably isn't what you meant.
The caveats mentioned in other answers about headers that modify behavior
rather than declaring things which cause build failures apply here as well.