Extract references/compile a single function in C/C++

I recently got into a situation where I had access to a huge code base which I couldn't build, and I needed to test a couple of its functions.
However, those functions reference functions and variables defined in other files, so it is a big mess trying to extract them manually.
Is there a way to do this automatically?
For example, I want to test the function foo in test.c, but foo depends on the function bar found in test2.c. bar could in turn depend on booz, which is found in test3.c.
So in the case above, one could gather foo, bar, and booz in one file and compile, as sketched below.
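A minimal sketch of that end state, with made-up bodies and signatures: everything foo needs is gathered into one translation unit so it can be compiled and tested in isolation.

/* gathered_test.c -- hypothetical single-file test harness for foo */
#include <stdio.h>

static int booz(int x) { return x + 1; }       /* originally in test3.c */
static int bar(int x)  { return booz(x) * 2; } /* originally in test2.c */
static int foo(int x)  { return bar(x) - 3; }  /* originally in test.c  */

int main(void) { printf("foo(5) = %d\n", foo(5)); return 0; }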

To generate cross references you can use Doxygen.
As an example, you can have a look at the Doxygen-generated PostgreSQL C source code here and search for the definition of, say, the main function.

Related

Using clang tools for adding auto-generated code to source files

I'm working on a logging mechanism for a project, and currently each API function needs to add
start_api_call(***)
at the beginning of the function,
where *** for the function do_something(int foo, int bar) is "foo", foo, "bar", bar
(the log function takes the parameters and forms the desired message).
I would like to make this line auto-generated; the vision is that somehow (maybe via clang tools?) the compiler checks, for each function in a .cpp, whether it is an API function (more details later) and, if it is, adds the start_api_call(***) line to the code.
I have 2 major problems regarding which direction I should go:
1) I have never written code whose goal is to parse source code, so I don't know which direction to head. I've read some documentation about the clang tools, but maybe a plain Python script would be better here?
2) Our design is as follows:

// object_foo.h (inside include/)
class foo {
    // API functions
};

// object_foo_impl.h (inside src/)
#include "object_foo.h"
class foo_impl : public foo {
    // foo API functions
    // a lot more functions
};

// object_foo_impl.cpp
#include "object_foo_impl.h"
// implementation of all foo_impl functions
The start_api_call should be inserted only for the API functions,
so I need a way to determine, inside the .cpp file, whether a given function came from foo.h or from foo_impl.h.
I have a somewhat working concept of how to do it using Python scripts that parse all of our source code, using regexes to identify which text inside an include/*.h file is a function, and then finding all the functions inside the .cpp files that implement those functions,
but (if possible) adding a rule at compilation/preprocessor time is much more attractive.
Any help would be very much appreciated.
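For the clang-tools direction, here is a minimal LibTooling sketch. It is only a sketch under assumptions: the LibTooling API differs between Clang versions, and it assumes the API functions are virtuals declared in foo and overridden in foo_impl, so that isOverride() can identify them. It only prints the matches; a real tool would use clang::Rewriter to insert start_api_call(...) after each body's opening brace.

// api_logger.cpp -- hedged LibTooling sketch (details vary by Clang version)
#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h"

using namespace clang;
using namespace clang::ast_matchers;
using namespace clang::tooling;

static llvm::cl::OptionCategory Cat("api-logger options");

class ApiMethodPrinter : public MatchFinder::MatchCallback {
public:
    void run(const MatchFinder::MatchResult& Result) override {
        const auto* M = Result.Nodes.getNodeAs<CXXMethodDecl>("apiMethod");
        if (!M || !M->hasBody()) return;
        llvm::outs() << "API method: " << M->getQualifiedNameAsString() << "\n";
        // A real tool would rewrite the source here with clang::Rewriter.
    }
};

int main(int argc, const char** argv) {
    auto Options = CommonOptionsParser::create(argc, argv, Cat);
    if (!Options) return 1;
    ClangTool Tool(Options->getCompilations(), Options->getSourcePathList());
    ApiMethodPrinter Printer;
    MatchFinder Finder;
    // Match method *definitions* that override a virtual from a base class
    // (i.e. foo's API reaching foo_impl in the layout described above).
    Finder.addMatcher(
        cxxMethodDecl(isOverride(), isDefinition()).bind("apiMethod"),
        &Printer);
    return Tool.run(newFrontendActionFactory(&Finder).get());
}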

Call function from arbitrary source .cpp file using the file name as a string

I am going through coding problems online, and I thought of a system that could make testing really efficient. My plan is to:
give each problem a distinct .cpp file with one function that solves the problem and outputs the results to a .txt file;
create a .cpp file with a main function that takes 2 command line arguments: a string with the file name of the problem I am currently solving, and a string with the file name of the test case to be used for the problem.
The main function first creates a FILE* to store the test case and another FILE* to create an empty output file. Then, it calls the function from the problem .cpp file specified in the command line.
The function prototype for each problem file looks like this:
static void Problem(FILE* test_case, FILE* output);
The idea is to pass in the test case to a specific problem and have the "Problem" function write the results into an output text file.
I've succeeded in running test cases for a specific problem and getting unique output files for each run.
However, my current issue is enabling the main function to use the .cpp file name as an argument. It seems there is no direct way in C++ to call a function from a file using only the file name. I know that the compiler parses .cpp files and turns function names into addresses. Is there a way I can do this at runtime? In other words, can I stream the .cpp file, search for the line that matches my function prototype, and return a pointer to that function? Is there some sort of macro I could use to cue the preprocessor to associate a file's name with its function?
From my search for answers thus far, the simplest solution seems to be to create a map that matches function names to functions. This seems tedious and inelegant to me; I figure I might as well just change the function call manually in my main file every time I switch problems, instead of setting the file name on the command line. I prefer not to do either.
In Standard C++ there is no direct means of doing this. However, most implementations provide facilities for loading libraries, and calling named functions from those libraries, at run-time. For example, under Windows, you can call LoadLibrary to load a named DLL, and then call GetProcAddress to get a pointer to a named callable (C) function. Other operating systems provide similar features. This is not as useful a feature as it may seem, and tends to result in fragile code.
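A minimal Windows sketch of that approach, with assumptions flagged: each problem is built as its own DLL (problem042.dll is a made-up name), and the Problem function must be exported with external linkage (so not static, e.g. via __declspec(dllexport)) and extern "C" so GetProcAddress can find it by name.

// runner.cpp -- hedged sketch, Windows-only
#include <windows.h>
#include <cstdio>

typedef void (*ProblemFn)(FILE*, FILE*);

int main(int argc, char** argv) {
    if (argc < 3) { std::fprintf(stderr, "usage: runner <dll> <test case>\n"); return 1; }
    HMODULE lib = LoadLibraryA(argv[1]);  // e.g. "problem042.dll"
    if (!lib) { std::fprintf(stderr, "cannot load %s\n", argv[1]); return 1; }
    // The problem DLL must export: extern "C" void Problem(FILE*, FILE*);
    ProblemFn problem = (ProblemFn)GetProcAddress(lib, "Problem");
    if (!problem) { FreeLibrary(lib); return 1; }
    FILE* test_case = std::fopen(argv[2], "r");
    FILE* output = std::fopen("output.txt", "w");
    problem(test_case, output);
    std::fclose(test_case);
    std::fclose(output);
    FreeLibrary(lib);
    return 0;
}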
Your files could each have a class with a static global instance. In its constructor, pass it the function's address; the constructor appends to a global vector of functions that the main framework can call. If you need input and output file names, add them to the constructor's argument list and to the vector.
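A portable sketch of this self-registration idea; every name here (registry, Registrar, REGISTER_PROBLEM, Problem42) is illustrative rather than from the answer.

#include <cstdio>
#include <map>
#include <string>

using ProblemFn = void (*)(FILE*, FILE*);

// Function-local static sidesteps the static initialization order problem.
std::map<std::string, ProblemFn>& registry() {
    static std::map<std::string, ProblemFn> r;
    return r;
}

struct Registrar {
    Registrar(const char* name, ProblemFn fn) { registry()[name] = fn; }
};

#define REGISTER_PROBLEM(name, fn) static Registrar registrar_##fn(name, fn)

// In each problem's .cpp file:
static void Problem42(FILE* test_case, FILE* output) { /* solve, write */ }
REGISTER_PROBLEM("problem42", Problem42);

// main() then looks the function up by the name given on the command line:
// auto it = registry().find(argv[1]);
// if (it != registry().end()) it->second(test_case, output);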

Vim C++: generate source file based on header file

I do a lot of C++ programming in Vim and I was wondering if there are any plugins or snippets out there that can generate a source file based on the contents of the header file.
E.g., test.h:
class test {
public:
test();
};
and then going into the test.cpp file, typing "src" and expanding it (using some sort of snippet plugin like UltiSnips), it would look in the test.h file for the functions and (in this case) make:
test::test() {
//code
}
I got this idea from Derek Wyatt's blog and he does this using XPTemplate so I thought it would be great to do the same in UltiSnips.
Use the xptemplate plugin.
Examples:
http://www.derekwyatt.org/wp-content/uploads/2009/08/my.cpp.xpt.vim
http://www.derekwyatt.org/vim/working-with-vim-and-cpp/cpp-snippets
lh-cpp offers a :GOTOIMPL command that analyses the prototype of a given function and either jumps to the associated definition or generates it on the fly. [NB: it knows what to do with virtual, static, namespace/embedded classes, return types, modifiers, and so on (except templates, yet).]
Regarding how to parse a header file and generate all associated functions, the exact same question was asked on the vim mailing list 2-3 weeks ago, where another solution was given (protodef, which you have read about).

Is it possible to write COM code in a static library and then link it to a DLL?

I am currently working on a project that has a number of COM objects written in C++ with ATL.
Currently, they are all defined in .cpp and .idl files that are directly compiled into the COM DLL.
To allow unit tests to be written easier, I am planning on moving the implementation of the COM objects out into a separate static library. That library can then be linked in to the main DLL, and the separate unit test project.
I am assuming that there's nothing particularly special about the code generated by ATL, and that this will work much like all other C++ code when it comes to linking with static libraries. However, I don't have too much actual knowledge of ATL myself so don't know if this is really the case.
Will this work as I'm expecting? Or are there pitfalls that I should look out for?
There are gotchas since LIBs are pulled in only if they are referenced, as opposed to OBJs which are explicitly included.
Larry Osterman discussed some of the subtleties a few years ago:
When I moved my code into a library, what happened to my ATL COM
objects?
A caveat: this post discusses details of how ATL7 works. For other
versions of ATL, YMMV. The general principles apply to all
versions, but the details are likely to be different.
My group’s recently been working on reducing the number of DLLs
that make up the feature we’re working on (going from somewhere
around 8 to 4). As a part of this, I’ve spent the past couple of
weeks consolidating a bunch of ATL COM DLL’s.
To do this, I first changed the DLLs to build libraries, and then
linked the libraries together with a dummy DllInit routine (which
basically just called CComDllModule::DllInit()) to make the DLL.
So far so good. Everything linked, and I got ready to test the new
DLL.
For some reason, when I attempted to register the DLL, the
registration didn’t actually register the COM objects. At that
point, I started kicking my self for forgetting one of the
fundamental differences between linking objects together to make an
executable and linking libraries together to make an executable.
To explain, I’ve got to go into a bit of how the linker works. When
you link an executable (of any kind), the linker loads all the
sections in the object files that make up the executable. For each
extdef symbol in the object files, it starts looking for a public
symbol that matches the symbol.
Once all of the symbols are matched, the linker then makes a second
pass combining all the .code sections that have identical contents
(this has the effect of collapsing template methods that expand into
the same code (this happens a lot with CComPtr)).
Then a third pass is run. The third pass discards all of the
sections that have not yet been referenced. Since the sections
aren’t referenced, they’re not going to be used in the resulting
executable, so to include them would just bloat the executable.
Ok, so why didn’t my ATL based COM objects get registered? Well,
it’s time to play detective.
Well, it turns out that you’ve got to dig a bit into the ATL code to
figure it out.
The ATL COM registration logic lives in the CComModule
object. Within that object, there’s a method
RegisterClassObjects, which redirects to
AtlComModuleRegisterClassObjects. This function walks a list of
_ATL_OBJMAP_ENTRY structures and calls the RegisterClassObject
on each structure. The list is retrieved from the
m_ppAutoObjMapFirst member of the CComModule (ok, it’s really a
member of the _ATL_COM_MODULE70, which is a base class for the
CComModule). So where did that field come from?
It’s initialized in the constructor of the CAtlComModule, which
gets it from the __pobjMapEntryFirst global variable. So where does
the __pobjMapEntryFirst field come from?
Well, there are actually two fields of relevance,
__pobjMapEntryFirst and __pobjMapEntryLast.
Here’s the definition for the __pobjMapEntryFirst:
__declspec(selectany) __declspec(allocate("ATL$__a")) _ATL_OBJMAP_ENTRY* __pobjMapEntryFirst = NULL;
And here’s the definition for __pobjMapEntryLast:
__declspec(selectany) __declspec(allocate("ATL$__z")) _ATL_OBJMAP_ENTRY* __pobjMapEntryLast = NULL;
Let’s break this one down:
__declspec(selectany): __declspec(selectany) is a directive to
the linker to pick any of the similarly named items from the section
– in other words, if a __declspec(selectany) item is found
in multiple object files, just pick one, don’t complain about it
being multiply defined.
__declspec(allocate("ATL$__a")): This one’s the one that makes
the magic work. This is a declaration to the compiler, it tells the
compiler to put the variable in a section named "ATL$__a" (or
"ATL$__z").
Ok, that’s nice, but how does it work?
Well, to get my ATL based COM object declared, I included the
following line in my header file:
OBJECT_ENTRY_AUTO(<my classid>, <my class>)
OBJECT_ENTRY_AUTO expands into:
#define OBJECT_ENTRY_AUTO(clsid, class) \
__declspec(selectany) ATL::_ATL_OBJMAP_ENTRY __objMap_##class = {&clsid, class::UpdateRegistry, class::_ClassFactoryCreatorClass::CreateInstance, class::_CreatorClass::CreateInstance, NULL, 0, class::GetObjectDescription, class::GetCategoryMap, class::ObjectMain }; \
extern "C" __declspec(allocate("ATL$__m")) __declspec(selectany) ATL::_ATL_OBJMAP_ENTRY* const __pobjMap_##class = &__objMap_##class; \
OBJECT_ENTRY_PRAGMA(class)
Notice the declaration of __pobjMap_##class above – there’s
that __declspec(allocate("ATL$__m")) thingy again. And that’s where
the magic lies. When the linker’s laying out the code, it sorts
these sections alphabetically – so variables in the ATL$__a
section will occur before the variables in the ATL$__z section.
So what’s happening under the covers is that ATL’s asking the linker
to place all the __pobjMap_<class name> variables in the
executable between __pobjMapEntryFirst and __pobjMapEntryLast.
And that’s the crux of the problem. Remember my comment above about
how the linker works resolving symbols? It first loads all the items
(code and data) from the OBJ files passed in, and resolves all the
external definitions for them. But none of the files in the wrapper
directory (which are the ones that are explicitly linked) reference
any of the code in the DLL (remember, the wrapper doesn’t do much more
than simply call into ATL’s wrapper functions – it doesn’t
reference any of the code in the other files).
So how did I fix the problem? Simple. I knew that as soon as the
linker pulled in the module that contained my COM class definition,
it'd start resolving all the items in that module – including the
__objMap_<class>, which would then be added in the right location so that ATL would be able to pick it up. I put a dummy function
called ForceLoad<MyClass> inside the module in the library, and
then added a function called CallForceLoad<MyClass> to my DLL
entry point file (note: I just added the function – I didn’t
call it from any code).
And voila, the code was loaded, and the class factories for my COM
objects were now auto-registered.
What was even cooler about this was that since no live code called
the two dummy functions that were used to pull in the library, pass
three of the linker discarded the code!
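A bare-bones sketch of the force-load trick the post describes (the ForceLoad/CallForceLoad names are the post's placeholders):

// In the static library, in the same source file as the OBJECT_ENTRY_AUTO:
void ForceLoadMyClass() {}   // empty; exists only to be referenced

// In the DLL's entry point file (always explicitly linked):
void ForceLoadMyClass();
void CallForceLoadMyClass()  // never called; its mere presence makes the
{                            // linker pull in the library member above,
    ForceLoadMyClass();      // and with it the ATL$__m object map entry
}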

Should I put many functions into one file? Or, more or less, one function per file?

I love to organize my code, so ideally I want one class per file or, when I have non-member functions, one function per file.
The reasons are:
1. When I read the code I will always know in what file I should find a certain function or class.
2. If it's one class or one non-member function per header file, then I won't include a whole mess when I include a header file.
3. If I make a small change in a function, then only that function will have to be recompiled.
However, splitting everything up into many header and many implementation files can considerably slow down compilation. In my project, most functions access a certain number of other, templated library functions, so that code will be compiled over and over, once for each implementation file. Compiling my whole project currently takes 45 minutes or so on one machine. There are about 50 object files, and each one uses the same expensive-to-compile headers.
Maybe it is acceptable to have one class (or non-member function) per header file, while putting the implementations of many or all of these functions into one implementation file, like in the following example?
// foo.h
void foo(int n);
// bar.h
void bar(double d);
// foobar.cpp
#include <vector>
void foo(int n) { std::vector<int> v; ... }
void bar(double d) { std::vector<int> w; ... }
Again, the advantage would be that I can include just the foo function or just the bar function, and compilation of the whole project will be faster because foobar.cpp is one file, so the std::vector<int> (which is just a stand-in here for some other expensive-to-compile templated construct) has to be compiled only once, as opposed to twice if I compiled foo.cpp and bar.cpp separately. Of course, my reason (3) above is no longer valid for this scenario: after just changing foo(){...} I have to recompile the whole, potentially big, file foobar.cpp.
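For completeness, the client side of that include-granularity point might look like this (a hypothetical caller that needs only foo):

// main.cpp -- pulls in only foo's declaration, not bar's, and not <vector>
#include "foo.h"
int main() { foo(42); }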
I'm curious what your opinions are!
IMHO, you should combine items into logical groupings and create your files based on that.
When I'm writing functions, there are often half a dozen or so that are tightly related to each other. I tend to put them together in a single header and implementation file.
When I write classes, I usually limit myself to one heavyweight class per header and implementation file. I might add in some convenience functions or tiny helper classes.
If I find that an implementation file is thousands of lines long, that's usually a sign that there's too much there and I need to break it up.
One function per file could get messy, in my opinion. Imagine if the POSIX and ANSI C headers were made the same way:
#include <strlen.h>
#include <strcpy.h>
#include <strncpy.h>
#include <strchr.h>
#include <strstr.h>
#include <malloc.h>
#include <calloc.h>
#include <free.h>
#include <printf.h>
#include <fprintf.h>
#include <vprintf.h>
#include <snprintf.h>
One class per file is a good idea though.
We use the principle of one external function per file. However, within this file there may be several other "helper" functions in unnamed namespaces that are used to implement that function.
In our experience, contrary to some other comments, this has had two main benefits. The first is that build times are faster, as modules only need to be rebuilt when their specific APIs are modified. The second is that by using a common naming scheme, it is never necessary to spend time searching for the header that contains the function you wish to call:
// getShapeColor.h
Color getShapeColor(Shape);
// getTextColor.h
Color getTextColor(Text);
I disagree that the standard library is a good example for not using one (external) function per file. Standard libraries never change and have well-defined interfaces, so neither of the points above applies to them.
That being said, even in the case of the standard library there are some potential benefits in splitting out the individual functions. The first is that compilers could generate a helpful warning when unsafe versions of functions are used, e.g. strcpy vs. strncpy, in a similar way to how g++ used to warn for inclusion of <iostream.h> vs. <iostream>.
Another advantage is that I would no longer be caught out by including memory when I want to use memmove!
One function per file has a technical advantage if you're making a static library (which I guess is one of the reasons why projects like musl libc follow this pattern).
Static libraries are linked with object-file granularity, so if you have a static library libfoobar.a composed of*:
foo.o
    foo1
    foo2
bar.o
    bar
then if you link against the lib for the bar function, the bar.o archive member will get linked but not the foo.o member. If you link for foo1, then the foo.o member will get linked, bringing in the possibly unnecessary foo2 function.
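A concrete sketch of that layout, with illustrative build commands in the comments:

// foo.cpp -> foo.o inside libfoobar.a
void foo1() { /* ... */ }
void foo2() { /* ... */ }

// bar.cpp -> bar.o inside libfoobar.a
void bar() { /* ... */ }

// main.cpp -- references only bar, so only the bar.o member is linked in;
// foo1 and foo2 never reach the executable.
void bar();
int main() { bar(); }

// Illustrative build:
//   g++ -c foo.cpp bar.cpp
//   ar rcs libfoobar.a foo.o bar.o
//   g++ main.cpp -L. -lfoobar -o main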
There are possibly other ways of preventing unneeded functions from being linked in (-ffunction-sections -fdata-sections and --gc-sections), but one function per file is probably the most reliable.
There's also the middle ground of putting a small number of related functions/data objects in a file. That way the compiler can better optimize intersymbol references compared to -ffunction-sections/-fdata-sections, and you still get at least some granularity for static libs.
* I'm ignoring C++ name mangling here for the sake of simplicity.
I can see some advantages to your approach, but there are several disadvantages:
Including a package is a nightmare. You can end up with 10-20 includes to get the functions you need. For example, imagine if stdio or stdlib were implemented this way.
Browsing the code will be a bit of a pain, since in general it is easier to scroll through a file than to switch files. Obviously too big a file is hard, but even there, with modern IDEs it is pretty easy to collapse the file down to what you need, and a lot of them have function shortcut lists.
Makefile maintenance is a pain.
I am a huge fan of small functions and refactoring. When you add overhead (making a new file, adding it to source control, ...) it encourages people to write longer functions where, instead of breaking one function into three parts, you just make one big one.
You can redeclare some of your functions as being static methods of one or more classes: this gives you an opportunity (and a good excuse) to group several of them into a single source file.
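A tiny sketch of that grouping (all names made up):

// string_utils.h -- free functions regrouped as static members of one class
#include <string>
struct StringUtils {
    static std::string trim(const std::string& s);
    static bool startsWith(const std::string& s, const std::string& prefix);
};
// string_utils.cpp can then implement both in a single source file.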
One good reason for having, or not having, several functions in one source file: if source files map one-to-one with object files, and the linker links entire object files, then when an executable might want one function but not another, put them in separate source files (so that the linker can link one without the other).
An old programming professor of mine suggested breaking up modules every several hundred lines of code for maintainability. I don't develop in C++ anymore, but in C# I restrict myself to one class per file, and the size of the file doesn't matter as long as there's nothing unrelated to my object. You can make use of #pragma region to gracefully reduce editor space; I'm not sure if the C++ compiler has it, but if it does then definitely make use of it.
If I were still programming in C++ I would group functions by usage, with multiple functions per file. So I might have a file called 'Service.cpp' with a few functions that define that "service". Having one function per file will, in turn, cause regret to find its way back into your project somehow, someway.
You don't need several thousand lines of code per file, though. Functions themselves should never be much more than a few hundred lines of code at most. Always remember that a function should only do one thing and be kept minimal. If a function does more than one thing, it should be refactored into helper methods.
It never hurts to have multiple source files that define a single entity either, e.g. 'ServiceConnection.cpp', 'ServiceSettings.cpp', and so on and so forth.
Sometimes, if I make a single object that owns other objects, I will combine multiple classes into a single file. For example, for a button control that contains 'ButtonLink' objects, I might combine those into the Button class's file. Sometimes I don't, but that's a "preference of the moment" decision.
Do what works best for you. Experimenting a little with different styles on smaller projects can help. Hope this helps you out a bit.
I also tried splitting files into one function per file, but it had some drawbacks. Sometimes functions tend to get larger than they need to (you don't want to add a new .c file every time) unless you are diligent about refactoring your code (I am not).
Currently I put one to three functions in each .c file and group all the .c files for a piece of functionality in a directory. For header files, I have Functionality.h and Subfunctionality.h, so that I can include all the functions at once when needed, or just a small utility function if the whole package is not needed.
For the header part, you should combine items into logical groupings and create your header files based on that. This seems, and is, very logical IMHO.
For the source part, you should put each function implementation in a separate source file (static functions are exceptions in this case). This may not seem logical at first, but remember: a compiler knows about the functions, whereas a linker knows only about the .o and .obj files and their exported symbols. This may change the size of the output file considerably, which is a very important issue for embedded systems.
Check out glibc or the Visual C++ CRT source tree...