I'm dealing with a bunch of legacy C code whose formatting is all over the place and I would like to clean it up as I work. However, there are some sections that I do not want touched (or I'd rather format manually (such as structures being used as bitfields with an explanation as to what each bit represents in comments). I'm working in an embedded environment, so understanding what each bit represents is important and keeping that nicely formatted is important, however, the C-lang formatter in VS Code wants to do its own thing with this kind of stuff.
How can I exclude sections of a file from autoformat?
Are you using the clang-format extension clang-format ? If so, see disable formatting on a piece of code.
int formatted_code;
// clang-format off
void unformatted_code ;
// clang-format on
void formatted_code_again;
Related
I have a project originally written for Windows, and I am currently in the process of porting it over to Linux. Most of the platform specific code has been #ifdef'ed or wrapped, so it's been easy so far.
This project has about 2000 instances of gettext() scattered throughout about 200 source files (.cpp and .c compiled as C++). The intended function call is:
std::string boost::locale::gettext(const char*);
This works in Windows, but in Linux builds, it resolves to:
char * gettext (const char * msgid);
Which I assume it's resolving from <libintl.h>, which is interesting, since I'm not including it.
What I need to do is to do the following:
Find in all my source files (ignoring the .svn directories):
1.1. Lines containing gettext(.*).c_str() and modify them to become boost::locale::gettext(.*).c_str().
1.2. Lines containing gettext(.*) and modify them to become boost::locale::gettext(.*).c_str().
What's the best way to accomplish this, preferably using BASh and sed, or some command-line-fu in general? The requirements for 1.1 I could probably do easily enough, but 1.2 is a bit more complex, and I'm not sure how to have it know which right parentheses ) to append .c_str() to correctly.
Thank you.
This problem is not solvable with a regex in the general case, since you cannot find the matching closing parenthesis of the gettext()-call with it if other calls are nested in its argument list.
But if usually no nested calls are made, it might be an option to just fix these cases automatically and do the rest by hand.
This sed expression
sed -r "s/gettext\(([^()]*)\)(\.c_str\(\))?/boost::locale::gettext(\1).c_str()/g"
should leave invocations with nested calls untouched and replace the rest.
I am reading through a .cpp trying to figure out some things and came across code like this:
some_function()
{
CustomClass some_sort_of_list;
string sample;
if (sample != "") {
some_sort_of_list = #BOING(args);
}
}
Has anyone seen the # operator before, or is it just #define used somewhere in one of the header files? I do not have access to the headers.
Since #Captain Obvlious mentioned early versions of Visual C++, I will look there to see what is going on...
PS: I should mention also, in case it's not obvious enough, that the names have been changed since I don't know if I have the license to share this source. The main issue is the #SOMETHING.
PPS: the comments are in Japanese, and I have limited access to the original authors.
That's not standard C++, it's not even legal as a #define since they're not permitted to start with #.
It's probably something that gets run through a pre-processor of some sort, like Oracle's Pro*C compiler which can turn EXEC SQL into C function calls, before passing to an actual C compiler.
Your best bet would be to think about the environment that this code runs in, such as "is it an internationalised application where #GEN may retrieve a locale-specific string for output?".
And, since, you mention that the comments are in Japanese, you should at least give Google Translate a try. It can sometimes result in hilarity for complex phrases but it may well give you a needed clue.
I have following requirement:
Adding text at the entry and exit point of any function.
Not altering the source code, beside inserting from above (so no pre-processor or anything)
For example:
void fn(param-list)
{
ENTRY_TEXT (param-list)
//some code
EXIT_TEXT
}
But not only in such a simple case, it'd also run with pre-processor directives!
Example:
void fn(param-list)
#ifdef __WIN__
{
ENTRY_TEXT (param-list)
//some windows code
EXIT_TEXT
}
#else
{
ENTRY_TEXT (param-list)
//some any-os code
if (condition)
{
return; //should become EXIT_TEXT
}
EXIT_TEXT
}
So my question is: Is there a proper way doing this?
I already tried some work with parsers used by compilers but since they all rely on running a pre-processor before parsing, they are useless to me.
Also some of the token generating parser, which do not need a pre-processor are somewhat useless because they generate a memory-mapping of tokens, which then leads to a complete new source code, instead of just inserting the text.
One thing I am working on is to try it with FLEX (or JFlex), if this is a valid option, I would appreciate some input on it. ;-)
EDIT:
To clarify a little bit: The purpose is to allow something like a stack trace.
I want to trace every function call, and in order to follow the call-hierachy, I need to place a macro at the entry-point of a function and at the exit point of a function.
This builds a function-call trace. :-)
EDIT2: Compiler-specific options are not quite suitable since we have many different compilers to use, and many that are propably not well supported by any tools out there.
Unfortunately, your idea is not only impractical (C++ is complex to parse), it's also doomed to fail.
The main issue you have is that exceptions will bypass your EXIT_TEXT macro entirely.
You have several solutions.
As has been noted, the first solution would be to use a platform dependent way of computing the stack trace. It can be somewhat imprecise, especially because of inlining: ie, small functions being inlined in their callers, they do not appear in the stack trace as no function call was generated at assembly level. On the other hand, it's widely available, does not require any surgery of the code and does not affect performance.
A second solution would be to only introduce something on entry and use RAII to do the exit work. Much better than your scheme as it automatically deals with multiple returns and exceptions, it suffers from the same issue: how to perform the insertion automatically. For this you will probably want to operate at the AST level, and modify the AST to introduce your little gem. You could do it with Clang (look up the c++11 migration tool for examples of rewrites at large) or with gcc (using plugins).
Finally, you also have manual annotations. While it may seem underpowered (and a lot of work), I would highlight that you do not leave logging to a tool... I see 3 advantages to doing it manually: you can avoid introducing this overhead in performance sensitive parts, you can retain only a "summary" of big arguments and you can customize the summary based on what's interesting for the current function.
I would suggest using LLVM libraries & Clang to get started.
You could also leverage the C++ language to simplify your process. If you just insert a small object into the code that is constructed on function scope entrance & rely on the fact that it will be destroyed on exit. That should massively simplify recording the 'exit' of the function.
This does not really answer you question, however, for your initial need, you may use the backtrace() function from execinfo.h (if you are using GCC).
How to generate a stacktrace when my gcc C++ app crashes
I have code that currently passes around a lot of (sometimes nested) C (or C++ Plain Old Data) structs and arrays.
I would like to convert these to/from google protobufs. I could manually write code that converts between these two formats, but it would be less error prone to auto-generate such code. What is the best way to do this? (This would be easy in a language with enough introspection to iterate over the names of member variables, but this is C++ code we're talking about)
One thing I'm considering is writing python code that parses the C structs and then spits out a .proto file, along with C code that copies from member to member (in either direction) for all of the types, but maybe there is a better way... or maybe there is another IDL that already can generate:
.h file containing all of nested types
.proto file containing equivalents
.c file with functions that copy either direction between the C++ structs that the .proto file generates and the structs defined in the .h file
I could not find a ready solution for this problem, if there is one, please let me know!
If you decide to roll your own in python, the python bindings for gdb might be useful. You could then read the symbol table, find all structs defined in specified file, and iterate all struct members.
Then use <gdbtype>.strip_typedefs() to get the primitive type of each member and translate it to appropriate protobuf type.
This is probably safer then a text parsers as it will handle types that depends on architecture, compiler flags, preprocessor macros, etc.
I guess the code to convert to and from protobuf also could be generated from the struct member to message field relation, but does not sound easy.
Protocol buffers can be built by parsing an ASCII representation using TextFormat. So one option would be to add a method dumpAsciiProtoBuf to each of your structs. The method would dump any simple fields (like strings, bools, etc) and call dumpAsciiProtoBuf recursively on nested structs fields. You would then have to make sure that the concatenated result is a valid ASCII protocol buffer which can be parsed using TextFormat.
Note though that this might have some performance implications (since parsing the ASCII representation could be expensive). However, this would save you the trouble of writing a converter in a different language, so it seems to be a convenient solution.
I would not parse the C source code myself, instead I would use the LibClang to parse C files into an AST and my own AST walker to generate the Protobuf and the transcoders as necessary. Googling for "libclang walk AST" should give something to start with, like ast-walker.cc and ast-dumper.cc from this github repository, for example.
The question brought up is the age old challenge with "C" (and C++) code - No easy (or standard) way to reflect on c "struct" (or classes). Just search stack overflow on C reflection, and you will see lot of unsuccessful attempts. My first advice will be NOT to try to build another solution (in python, etc.).
One simple approach: Consider using gdb ptype to get structured output for you structures, which you can use to create the .proto file. The advantage is that there is no need to handle the full syntax of the C language (#define, line breaks, ...). See How do I show what fields a struct has in GDB?
From the gdb ptype, it's a short trip to protobuf '.proto' file.
You can get similar result from libCLang (and I believe there is comparable gcc plugin, but I can not locate it). However, you will have to write some non-trivial "C" code.
Another approach - will be to use 'swig' (https://www.swig.org), and process the swig xml output (or the -xmlout option) to dump the parse tree into XML. While this approach will require a little bit of digging to locate the structure that are needed, the information in XML format is complete, easy to parse (using whatever XML parser you want - python, perl). If you are brave enough, you can use xslt to generate the output.
It should turn this
int Yada (int yada)
{
return yada;
}
into this
int Yada (int yada)
{
SOME_HEIDEGGER_QUOTE;
return yada;
}
but for all (or at least a big bunch of) syntactically legal C/C++ - function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a view lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
Try Aspect C++ (www.aspectc.org). You can define an Aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf
If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that this should be passed only in the build phase, not link, or GCC will add in the profiling libraries that normally implement mcount.
I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?
I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
With that regex I also get the function name (group 1) and the arguments (group 2).
Note: you must filter out names like, "while", "do", "for", "switch".
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using full langauge front end, in this case for C++),
builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C# program into another with whatever properties you wish. THe transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
"\visibility \returntype \fnname(int \parametername)
{ \body } "
->
"\visibility \returntype \fnname(int \parametername)
{ Heidigger(\CppString\(\methodname\),
\CppString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.