Tool to Verify Format Strings in C/C++ source - c++

I have updated the contents and arguments of printf-style format strings in a large C/C++ code base. The code compiles, but it is hard to actually trigger the errors that use those strings and so verify that my changes were right.
Is there a tool or compiler option that can validate that the format strings have the right number of arguments? It would be nice if it didn't try to compile the whole thing, because then the dependencies etc. must be present in the expected places.
I could write a quick script, but I would rather reuse something that already exists and handles the corner cases too.
Something like :-
% cat test.c
#include <iostream>
#include "dependency2.h"
int main()
{
function2(log, "You encountered a common error %s: %d", error);
}
% somenicetool test.c
5: too few arguments
I tried clang but the first error it gives is this :-
% clang -fsyntax-only test.c
#include "dependency2.h"
^
file not found
1 error generated

If you are using gcc with -Wformat and you have your own functions, you will need to add __attribute__((format(printf, format_argno, first_var_arg))) after the function declaration.
For example:
void log_print(FILE *logfile, int level, const char *format, ...)
__attribute__((format(printf, 3, 4)));
gcc also understands "scanf", "strfmon" and "strftime" format specifications, just replace "printf" with whatever suits your function.
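To make the attribute concrete, here is a minimal sketch of such a wrapper, assuming GCC or Clang. The LOG_PRINTF_LIKE macro and the int return value are my own additions for illustration; only the log_print signature and the (3, 4) indices come from the answer above.

```cpp
#include <cstdarg>
#include <cstdio>

// The format attribute is a GCC/Clang extension, so hide it behind a
// (hypothetical) macro that expands to nothing on other compilers.
#if defined(__GNUC__)
#define LOG_PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define LOG_PRINTF_LIKE(fmt_idx, first_arg)
#endif

// 'format' is parameter 3 and the variadic arguments start at parameter 4.
int log_print(std::FILE *logfile, int level, const char *format, ...)
    LOG_PRINTF_LIKE(3, 4);

// Writes "[level] " followed by the formatted message.
// Returns the number of characters written, or a negative value on error.
int log_print(std::FILE *logfile, int level, const char *format, ...)
{
    std::va_list args;
    va_start(args, format);
    int written = std::fprintf(logfile, "[%d] ", level);
    if (written >= 0) {
        const int n = std::vfprintf(logfile, format, args);
        written = (n < 0) ? n : written + n;
    }
    va_end(args);
    return written;
}
```

With the declaration in place, compiling a call such as log_print(stderr, 1, "error %s: %d\n", "disk full") with -Wformat (it is part of -Wall) should produce a warning about the missing argument for %d.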

With gcc, you can use the -Wformat option:
Check calls to printf and scanf, etc., to make sure that the arguments
supplied have types appropriate to the format string specified, and
that the conversions specified in the format string make sense. This
includes standard functions, and others specified by format attributes
(see Function Attributes), in the printf, scanf, strftime and strfmon
(an X/Open extension, not in the C standard) families (or other
target-specific families).
See the gcc documentation for more details.
Edit: Looking at this more carefully, it looks like you want to check calls to your own functions which possibly forward the calls to printf and friends. You probably have to decorate your function with the format function attribute in order to get the warnings from gcc.

I think CPPCheck picks up those, and plenty, plenty more...
Edit: Hmm. I have a feeling it only works for standard library functions, and doesn't understand the 'Function Attributes' you can use to tell GCC that your 'own' functions use format strings.

One thing to remember about the __attribute__ ((format (printf, n, m))) solution: if your function is a non-static member method of a class, you have to add 1 to both n and m since it's compiled as a simple function with the this pointer as the actual first parameter.
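A short sketch of the member-function case, assuming GCC or Clang. The Logger class and format_line are hypothetical names used only for illustration.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical class used only to show the index shift.
class Logger {
public:
    // 'this' counts as parameter 1, so 'format' is parameter 2 and the
    // variadic arguments start at parameter 3.
    std::string format_line(const char *format, ...)
        __attribute__((format(printf, 2, 3)));
};

// Formats into a fixed-size buffer; longer output is truncated.
std::string Logger::format_line(const char *format, ...)
{
    char buffer[256];
    std::va_list args;
    va_start(args, format);
    std::vsnprintf(buffer, sizeof buffer, format, args);
    va_end(args);
    return std::string(buffer);
}
```

Writing (1, 2) here instead would make the compiler treat the this pointer as the format string, and GCC rejects that because the argument is not a string type.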

Looks like cppcheck can do it! See manual, section "3.1.4 Format string".
Also, ReSharper can do it.

Related

How can I find all places a given member function or ctor is called in g++ code?

I am trying to find all places in a large and old code base where certain constructors or functions are called. Specifically, these are certain constructors and member functions in the std::string class (that is, basic_string<char>). For example, suppose there is a line of code:
std::string foo(fiddle->faddle(k, 9).snark);
In this example, it is not obvious looking at this that snark may be a char *, which is what I'm interested in.
Attempts To Solve This So Far
I've looked into some of the dump features of gcc, and generated some of them, but I haven't been able to find any that tell me that the given line of code will generate a call to the string constructor taking a const char *. I've also compiled some code with -S to save the generated equivalent assembly code. But this suffers from two things: the function names are "mangled," so it's impossible to know what is being called in C++ terms; and there are no line numbers of any sort, so even finding the equivalent place in the source file would be tough.
Motivation and Background
In my project, we're porting a large, old code base from HP-UX (and their aCC C++ compiler) to RedHat Linux and gcc/g++ v.4.8.5. The HP tool chain allowed one to initialize a string with a NULL pointer, treating it as an empty string. The Gnu tools' generated code fails with some flavor of a null dereference error. So we need to find all of the potential cases of this, and remedy them. (For example, by adding code to check for NULL and using a pointer to a "" string instead.)
So if anyone out there has had to deal with the base problem and can offer other suggestions, those, too, would be welcomed.
Have you considered using static analysis?
Clang has one, the Clang Static Analyzer, and it is extensible.
You can write a custom plugin that checks for this particular behavior by implementing a Clang AST visitor that looks for string variable declarations and checks whether they are initialized from a null pointer.
There is a manual for that here.
See also: https://github.com/facebook/facebook-clang-plugins/blob/master/analyzer/DanglingDelegateFactFinder.cpp
First I'd create a header like this:
#include <string>
class dbg_string : public std::string {
public:
using std::string::string;
dbg_string(const char*) = delete;
};
#define string dbg_string
Then modify your makefile and add "-include dbg_string.h" to cflags to force include on each source file without modification.
You could also check how NULL is defined on your platform and add a specific overload for it (e.g. dbg_string(int)).
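Once the deleted constructor has flagged the call sites, the remedy mentioned in the question (substituting "" for NULL) could be wrapped in a small helper. This is only a sketch; null_to_empty and string_or_empty are hypothetical names, not part of any library.

```cpp
#include <string>

// Maps a null pointer to an empty C string, leaving other pointers alone.
inline const char *null_to_empty(const char *s)
{
    return s ? s : "";
}

// Builds a std::string from a possibly-null pointer, mimicking the
// behaviour the HP-UX tool chain reportedly had.
inline std::string string_or_empty(const char *s)
{
    return std::string(null_to_empty(s));
}
```

A flagged line such as std::string foo(fiddle->faddle(k, 9).snark); would then become std::string foo(string_or_empty(fiddle->faddle(k, 9).snark));.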
You can try CppDepend and its CQLinq, a powerful code query language, to detect where some constructors/methods/fields/types are used.
from m in Methods where m.IsUsing ("CClassView.CClassView()") select new { m, m.NbLinesOfCode }

Pass CString to fprintf

I have run the code analyzer in Visual Studio on a large code base and I got a huge number of instances of this warning:
warning C6284: Object passed as parameter '3' when string is required in call to 'fprintf'
According to http://msdn.microsoft.com/en-us/library/ta308ywy.aspx "This defect might produce incorrect output or crashes." My colleague however states that we can just ignore all these errors without any problems. So one of my questions is do we need to do anything about this or can we just leave it as is?
If these errors need to be solved what is the nicest approach to solve it?
Would it work to do like this:
static_cast<const char*>(someCString)
Is there a better or more correct approach for this?
The following lines generate this warning:
CString str;
fprintf(pFile, "text %s", str);
I'm assuming that you're passing a Microsoft "CString" object to a printf()-family function where the corresponding format specifier is %s. If I'm right, then your answer is here: How can CString be passed to format string %s? (in short, your code is OK).
It seems that originally an implementation detail allowed CString to be passed directly to printf(), and later it was made part of the contract. So you're good to go as far as your program being correct, but if you want to avoid the static analysis warning, you may indeed need to use the static_cast to a char pointer. I'm not sure it's worth it here...maybe there's some other way to make these tools play nice together, since they're all from Microsoft.
Following the MSDN suggestions in C6284, you may cast the warnings away. Using C++ casts will be the most maintainable option to do this. Your example above would change to
fprintf(pFile, "text %s", static_cast<const TCHAR*>(str));
or, just another spelling of the same, to
fprintf(pFile, "text %s", static_cast<LPCTSTR>(str));
The most convincing option (100% cast-free, see Edits section) is
fprintf(pFile, "text %s", str.GetString());
Of course, following any of these change patterns will be a first porting step, and if nothing indicates a need for it, this may be harmful (not only for your team atmosphere).
Edits: (according to the comment of xMRi)
1) I added const because the argument is read-only for fprintf
2) notes to the cast-free solution CSimpleStringT::GetString: the CSimpleStringT class template is used for the definition of CStringT which again is used to typedef the class CString used in the original question
3) reworked answer to remove noise.
4) reduced the intro about the casting option
Technically speaking it is OK, because the C string is stored in CString in such a way that you can use it as stated, but it is not good to rely on how CString is implemented as a shortcut. printf is a C runtime function and knows nothing about C++ objects, but here one is relying on the string pointer being stored first in the CString, which is an implementation detail.
If I recall correctly originally CString could not be used that way and one had to cast the CString to a c-string to print it out but in later versions MS changed the implementation to allow for it to be treated as a c-string.
Another breaking issue is UNICODE: it will definitely not work if you one day decide to compile the program with the UNICODE character set, since even if you changed all string formatters to %ls, embedded 0s will sometimes prevent the string from being printed.
The actual problem is rather why are you using printf instead of C++ to print/write files?

Forcing sscanf to return more than number of arguments satisfied

My software validation group is testing a piece of code like the following:
unsigned int alarm_id;
char alarm_text[16];
static const char text_string[] = "105, Water_Boiling";
signed int arguments_satisfied =
sscanf(text_string,
"%3d, %16s",
&alarm_id, &alarm_text[0]);
if (arguments_satisfied < 2)
{
system_failure();
}
Using the code fragment above, is there a way to get sscanf to return a value greater than 2 without changing the format specifier or changing the arguments to sscanf?
They are exercising the if statement expression, using a unit testing tool.
For C++, are there any differences with the above fragment when compiling as C++?
(We plan to use the same code, but compile as C++.)
FYI, we are using an ARM7 processor with IAR Embedded Workbench.
sscanf returns the number of arguments converted. It cannot convert more arguments than you have told it about. Therefore, unless the format string is changed, sscanf cannot return a value greater than 2. The only other value it can return is EOF, which is negative: that happens when the input runs out before the first argument is converted, and for sscanf this means an empty or all-whitespace string.
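To see the possible return values, here is a small sketch. parse_alarm is a hypothetical wrapper, and the format differs slightly from the question's: %3u matches the unsigned argument, and the width is 15 so the terminating '\0' fits in a 16-byte buffer.

```cpp
#include <cstdio>

// Hypothetical wrapper around a call like the one in the question.
// Returns whatever sscanf returns: the number of items assigned, or EOF.
int parse_alarm(const char *text, unsigned int *alarm_id, char alarm_text[16])
{
    // Two conversions in the format, so the result can never exceed 2.
    return std::sscanf(text, "%3u, %15s", alarm_id, alarm_text);
}
```

Called with "105, Water_Boiling" this returns 2, with "105" it returns 1, with "abc" it returns 0, and with "" it returns EOF. None of these inputs, nor any other, yields a value above 2.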
For many toolchains (and I'm pretty sure IAR is one), if you have a symbol in an object file and a library, the linker will link to the one in an object file in preference to the one in the library.
So you may be able to provide your own sscanf() function to link during tests and have it return whatever you like.
If the linker has a problem with a symbol conflict between your sscanf() implementation and the one in the library, an alternative that may work is to have your unit-test sscanf() use a different name (such as unittest_sscanf) and have the build system define a macro to rename sscanf() during the build using something like /Dsscanf=unittest_sscanf for the module under test.
Of course, it may be tricky to make sure that other sscanf() calls that aren't under test don't cause problems.
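A rough sketch of the renaming approach, squeezed into one file for illustration. forced_sscanf_result and alarm_parse_ok are hypothetical names, and the #define stands in for what the build system would pass on the command line.

```cpp
#include <cstdio>

// Value the test wants the fake sscanf to report.
static int forced_sscanf_result = 3;

// Test double: ignores its arguments and returns the forced value.
int unittest_sscanf(const char *, const char *, ...)
{
    return forced_sscanf_result;
}

// In a real build this rename would come from the compiler command line
// for the module under test only.
#undef sscanf
#define sscanf unittest_sscanf

// Stand-in for the code under test, using the call from the question.
bool alarm_parse_ok(const char *text_string)
{
    unsigned int alarm_id = 0;
    char alarm_text[16] = "";
    const int arguments_satisfied =
        sscanf(text_string, "%3d, %16s", &alarm_id, &alarm_text[0]);
    return arguments_satisfied >= 2;
}

// Keep the rename from leaking into code that follows.
#undef sscanf
```

The test can now set forced_sscanf_result to 3, 1 or EOF and check which branch the code under test takes, without touching the format string or the arguments.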

get the value of a c constant

I have a .h file in which hundreds of constants are defined as macros:
#define C_CONST_NAME Value
What I need is a function that can dynamically get the value of one of these constants.
needed function header :
int getConstValue(char * constName);
Is that even possible in the C language?
---- EDIT
Thanks for the help, that was quick :)
As I was thinking, there is no miracle solution for my needs.
In fact the header file I use is generated by "SCADE : http://www.esterel-technologies.com/products/scade-suite/"
One of the solutions I got from @Chris is to use some Python to generate C code that does the work.
Now it's up to me to make some optimizations in order to find the constant name. I have more than 5000 constants O(500^2)
I'm also looking at "X-Macros". It's the first time I've heard of them; I hope it works in C because I'm not allowed to use C++.
Thanks
C can't do this for you. You will need to store them in a different structure, or use a preprocessor to build the hundreds of if statements you would need. Something like Cogflect could help.
Here you go. You will need to add a line for each new constant, but it should give you an idea about how macros work:
#include <stdio.h>
#include <string.h>
#define C_TEN 10
#define C_TWENTY 20
#define C_THIRTY 30
#define IFCONST(charstar, define) if(strcmp((charstar), #define) == 0) { \
return (define); \
}
int getConstValue(const char* constName)
{
IFCONST(constName, C_TEN);
IFCONST(constName, C_TWENTY);
IFCONST(constName, C_THIRTY);
// No match
return -1;
}
int main(int argc, char **argv)
{
printf("C_TEN is %d\n", getConstValue("C_TEN"));
return 0;
}
I suggest you run gcc -E filename.c to see what gcc does with this code.
A C preprocessor macro (that is, something named by a #define statement) ceases to exist after preprocessing completes. A program has no knowledge of the names of those macros, nor any way to refer back to them.
If you tell us what task you're trying to perform, we may be able to suggest an alternate approach.
This is what X-Macros are used for:
https://secure.wikimedia.org/wikipedia/en/wiki/C_preprocessor#X-Macros
But if you need to map a string to a constant, you will have to search for the string in the array of string representations, which is O(n) per lookup with a linear search.
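A minimal X-macro sketch for this lookup, kept to the subset shared by C and C++ so it should also compile as plain C. The three constants and the names CONSTANT_LIST, constant_table and get_const_value are made up for illustration; with a generated header, the list itself would have to be generated as well.

```cpp
#include <string.h>

/* Stand-ins for the constants from the generated header. */
#define C_TEN 10
#define C_TWENTY 20
#define C_THIRTY 30

/* The X-macro: one entry per constant, listed exactly once. */
#define CONSTANT_LIST(X) \
    X(C_TEN)             \
    X(C_TWENTY)          \
    X(C_THIRTY)

struct constant_entry {
    const char *name;
    int value;
};

/* Expand each entry to { "NAME", NAME }, so the name is stringified
   and the value comes from the original #define. */
#define AS_TABLE_ENTRY(name) { #name, name },
static const struct constant_entry constant_table[] = {
    CONSTANT_LIST(AS_TABLE_ENTRY)
};
#undef AS_TABLE_ENTRY

/* Linear search; returns 1 and stores the value on success, 0 otherwise. */
int get_const_value(const char *name, int *value_out)
{
    size_t i;
    for (i = 0; i < sizeof constant_table / sizeof constant_table[0]; ++i) {
        if (strcmp(constant_table[i].name, name) == 0) {
            *value_out = constant_table[i].value;
            return 1;
        }
    }
    return 0;
}
```

Returning the value through a pointer avoids reserving -1 (or any other number) as a "not found" marker, which matters if one of the constants can legitimately have that value.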
You can probably do this with gperf, which generates a lookup function that uses a perfect hash function.
Create a file similar to the following and run gperf with the -t option:
struct constant { char *name; int value; };
%%
C_CONST_NAME1, 1
C_CONST_NAME2, 2
gperf will output C (or C++) code that does the lookup in constant time, returning a pointer to the key/value pair, or NULL.
If you find that your keyword set is too large for gperf, consider using cmph instead.
There's no such capability built into C. However, you can use a tool such as doxygen to extract all #defines from your source code into a data structure that can be read at runtime (doxygen can store all macro definitions to XML).

How To Extract Function Name From Main() Function Of C Source

I just want to ask your ideas regarding this matter. For a certain important reason, I must extract/acquire all function names of functions that were called inside a "main()" function of a C source file (ex: main.c).
Example source code:
int main()
{
int a = functionA(); // functionA must be extracted
int b = functionB(); // functionB must be extracted
}
As you know, the only thing that I can use as a marker/sign to identify these function calls is their parentheses "()". I've already considered several factors in implementing this function name extraction. These are:
1. functions may have parameters. Ex: functionA(100)
2. Loop operators. Ex: while()
3. Other operators. Ex: if(), else if()
4. Other operator between function calls with no spaces. Ex: functionA()+functionB()
As of this moment I know what you're saying, this is a pain in the $$$... So please share your thoughts and ideas... and bear with me on this one...
Note: this is in C++ language...
You can write a Small C++ parser by combining FLEX (or LEX) and BISON (or YACC).
Take C++'s grammar
Generate a C++ program parser with the mentioned tools
Make that program count the function calls you are mentioning
Maybe a little bit too complicated for what you need to do, but it should certainly work. And LEX/YACC are amazing tools!
One option is to write your own C tokenizer (simple: just be careful enough to skip over strings, character constants and comments), and to write a simple parser, which counts the number of {s open, and finds instances of identifier + ( within. However, this won't be 100% correct. The disadvantage of this option is that it's cumbersome to implement preprocessor directives (e.g. #include and #define): there can be a function called from a macro (e.g. getchar) defined in an #include file.
A more reliable option is compiling your .c file to an assembly file, e.g. gcc -S file.c, and finding the call instructions in the resulting file.s (compile without optimization, since inlined calls will not show up). A similar option is compiling your .c file to an object file, e.g. gcc -c file.c, generating a disassembly dump with objdump -d file.o, and searching for call instructions.
Another option is finding a parser using Clang / LLVM.
gnu cflow might be helpful