"Templates" and Elaboration in Verilog?

"Templates" and Elaboration in Verilog? - templates

I'm reading through and trying to understand some verilog, and sprinkled through is the compiler directive:
// synopsys template
But I do not know what this is or what it does. My Google Fu in researching variations of 'verilog templates' has lead to more example verilog code than answers.
I did find this synopsis user guide: http://acms.ucsd.edu/info/documents/dc/syn3.pdf, which on p282 provides some information, the directive seems to affect this variable:
hdlin_auto_save_templates
Controls whether HDL designs containing parameters are read in as templates.
...
It goes on to imply this directive affects "elaboration" (perhaps delaying it? to what end?), which my current understanding is loosely analogous to the code emission step of traditional compilation, when the verilog is converted to an "actual" hardware representation?
I would appreciate an explanation of what templates are / do in Verilog and perhaps a correction on my understanding of 'elaboration' in this context - Thanks!

It goes on to imply this directive affects "elaboration" (perhaps delaying it? to what
end?), which my current understanding is loosely analogous to the code
emission step of traditional compilation, when the verilog is
converted to an "actual" hardware representation?
Not really. Elaboration is part of the language specification and is a required step to process a design. Processing Verilog usually requires two distinct steps which the specification describes as parsing and elaboration. SystemVerilog more precisely defines these and calls them compilation and elaboration.
1364-2005:
Elaboration is the process that occurs between parsing and simulation.
It binds modules to module instances, builds the model hierarchy,
computes parameter values, resolves hierarchical names, establishes
net connectivit, and prepares all of this for simulation. With the
addition of generate constructs, the order in which these tasks occur
becomes significant.
Verilog contains some constructs that makes it impossible to completely build a module then 'link' it to a larger design later. Consider the following code:
module mod1 #(parameter WIDTH = 0) (output [WIDTH:0] port1);
generate
if(WIDTH > 3)
assign port1 = {WIDTH{1'b1}};
else
assign port1 = {WIDTH{1'b0}};
endgenerate
endmodule
When the above module is read, the parser has no idea what WIDTH will be because the value given can be overridden in the instantiation. This prevents it from resolving the code inside the generate block until the entire Verilog source text is read. It gets more complicated with defparams, forward declarations of functions and hierarchical references.
The command // synopsys template and the term 'templates' are not part of verilog. Given toolic's answer and the doc you linked, it appears to tell the tool that any module read after the command will need a parameter definition so it should not be elaborated when read. For instance a netlist will not have any parameter overrides in the instantiations so if you want to place an RTL instance in a netlist, you would need to tell the synthesis tool directly what the parameters should be.

The // synopsys template comment is used by the Synopsys Design Compiler tool when synthesizing Verilog RTL code into a gate netlist. The pragma only affects Verilog code which contains parameters. It is especially useful when synthesizing several instances of the same Verilog module, each having a different passed parameter value.
For example, if you have a FIFO that uses a parameter to specify its word depth:
parameter DEPTH = 4;
you can synthesize two FIFOs, overriding the default depth, if desired.
Using the template pragma also allows control over the synthesized instance names, using other Design Compiler script variables.
The template pragma only has meaning to the synthesis tool; it does not affect simulation.

Related

ELF INIT section code to prepopulate objects used at runtime

I'm fairly new to c++ and am really interested in learning more. Have been reading quite a bit. Recently discovered the init/fini elf sections.
I started to wonder if & how one would use the init section to prepopulate objects that would be used at runtime. Say for example you wanted
to add performance measurements to your code, recording the time, filename, linenumber, and maybe some ID (monotonic increasing int for ex) or name.
You would place for example:
PROBE(0,"EventProcessing",__FILE__,__LINE__)
...... //process event
PROBE(1,"EventProcessing",__FILE__,__LINE__)
......//different processing on same event
PROBE(2,"EventProcessing",__FILE__,__LINE__)
The PROBE could be some macro that populates a struct containing this data (maybe on an array/list, etc using the id as an indexer).
Would it be possible to have code in the init section that could prepopulate all of this data for each PROBE (except for the time of course), so only the time would need to be retrieved/copied at runtime?
As far as I know the __attribute__((constructor)) can not be applied to member functions?
My initial idea was to create some kind of
linked list with each node pointing to each probe and code in the init secction could iterate it populating the id, file, line, etc, but
that idea assumed I could use a member function that could run in the "init" section, but that does not seem possible. Any tips appreciated!

As far as I understand it, you do not actually need an ELF constructor here. Instead, you could emit descriptors for your probes using extended asm statements (using data, instead of code). This also involves switching to a dedicated ELF section for the probe descriptors, say __probes.
The linker will concatenate all the probes and in an array, and generate special symbols __start___probes and __stop___probes, which you can use from your program to access thes probes. See the last paragraph in Input Section Example.
Systemtap implements something quite similar for its userspace probes:
User Space Probe Implementation
Adding User Space Probing to an Application (heapsort example)
Similar constructs are also used within the Linux kernel for its self-patching mechanism.

There's a pretty simple way to have code run on module load time: Use the constructor of a global variable:
struct RunMeSomeCode
{
RunMeSomeCode()
{
// your code goes here
}
} do_it;
The .init/.fini sections basically exist to implement global constructors/destructors as part of the ABI on some platforms. Other platforms may use different mechanisms such as _start and _init functions or .init_array/.deinit_array and .preinit_array. There are lots of subtle differences between all these methods and which one to use for what is a question that can really only be answered by the documentation of your target platform. Not all platforms use ELF to begin with…
The main point to understand is that things like the .init/.fini sections in an ELF binary happen way below the level of C++ as a language. A C++ compiler may use these things to implement certain behavior on a certain target platform. On a different platform, a C++ compiler will probably have to use different mechanisms to implement that same behavior. Many compilers will give you tools in the form of language extensions like __attributes__ or #pragmas to control such platform-specific details. But those generally only make sense and will only work with that particular compiler on that particular platform.

You don't need a member function (which gets a this pointer passed as an arg); instead you can simply create constructor-like functions that reference a global array, like
#define PROBE(id, stuff, more_stuff) \
__attribute__((constructor)) void \
probeinit##id(){ probes[id] = {id, stuff, 0/*to be written later*/, more_stuff}; }
The trick is having this macro work in the middle of another function. GNU C / C++ allows nested functions, but IDK if you can make them constructors.
You don't want to declare a static int dummy#id = something because then you're adding overhead to the function you profile. (gcc has to emit a thread-safe run-once locking mechanism.)
Really what you'd like is some kind of separate pass over the source that identifies all the PROBE macros and collects up their args to declare
struct probe global_probes[] = {
{0, "EventName", 0 /*placeholder*/, filename, linenum},
{1, "EventName", 0 /*placeholder*/, filename, linenum},
...
};
I'm not confident you can make that happen with CPP macros; I don't think it's possible to #define PROBE such that every time it expands, it redefines another macro to tack on more stuff.
But you could easily do that with an awk/perl/python / your fave scripting language program that scans your program and constructs a .c that declares an array with static storage.
Or better (for a single-threaded program): keep the runtime timestamps in one array, and the names and stuff in a separate array. So the cache footprint of the probes is smaller. For a multi-threaded program, stores to the same cache line from different threads is called false sharing, and creates cache-line ping-pong.
So you'd have #define PROBE(id, evname, blah blah) do { probe_times[id] = now(); }while(0)
and leave the handling of the later args to your separate preprocessing.

How to modify C++ code from user-input

I am currently writing a program that sits on top of a C++ interpreter. The user inputs C++ commands at runtime, which are then passed into the interpreter. For certain patterns, I want to replace the command given with a modified form, so that I can provide additional functionality.
I want to replace anything of the form
A->Draw(B1, B2)
with
MyFunc(A, B1, B2).
My first thought was regular expressions, but that would be rather error-prone, as any of A, B1, or B2 could be arbitrary C++ expressions. As these expressions could themselves contain quoted strings or parentheses, it would be quite difficult to match all cases with a regular expression. In addition, there may be multiple, nested forms of this expression
My next thought was to call clang as a subprocess, use "-dump-ast" to get the abstract syntax tree, modify that, then rebuild it into a command to be passed to the C++ interpreter. However, this would require keeping track of any environment changes, such as include files and forward declarations, in order to give clang enough information to parse the expression. As the interpreter does not expose this information, this seems infeasible as well.
The third thought was to use the C++ interpreter's own internal parsing to convert to an abstract syntax tree, then build from there. However, this interpreter does not expose the ast in any way that I was able to find.
Are there any suggestions as to how to proceed, either along one of the stated routes, or along a different route entirely?

What you want is a Program Transformation System.
These are tools that generally let you express changes to source code, written in source level patterns that essentially say:
if you see *this*, replace it by *that*
but operating on Abstract Syntax Trees so the matching and replacement process is
far more trustworthy than what you get with string hacking.
Such tools have to have parsers for the source language of interest.
The source language being C++ makes this fairly difficult.
Clang sort of qualifies; after all it can parse C++. OP objects
it cannot do so without all the environment context. To the extent
that OP is typing (well-formed) program fragments (statements, etc,.)
into the interpreter, Clang may [I don't have much experience with it
myself] have trouble getting focused on what the fragment is (statement? expression? declaration? ...). Finally, Clang isn't really a PTS; its tree modification procedures are not source-to-source transforms. That matters for convenience but might not stop OP from using it; surface syntax rewrite rule are convenient but you can always substitute procedural tree hacking with more effort. When there are more than a few rules, this starts to matter a lot.
GCC with Melt sort of qualifies in the same way that Clang does.
I'm under the impression that Melt makes GCC at best a bit less
intolerable for this kind of work. YMMV.
Our DMS Software Reengineering Toolkit with its full C++14 [EDIT July 2018: C++17] front end absolutely qualifies. DMS has been used to carry out massive transformations
on large scale C++ code bases.
DMS can parse arbitrary (well-formed) fragments of C++ without being told in advance what the syntax category is, and return an AST of the proper grammar nonterminal type, using its pattern-parsing machinery. [You may end up with multiple parses, e.g. ambiguities, that you'll have decide how to resolve, see Why can't C++ be parsed with a LR(1) parser? for more discussion] It can do this without resorting to "the environment" if you are willing to live without macro expansion while parsing, and insist the preprocessor directives (they get parsed too) are nicely structured with respect to the code fragment (#if foo{#endif not allowed) but that's unlikely a real problem for interactively entered code fragments.
DMS then offers a complete procedural AST library for manipulating the parsed trees (search, inspect, modify, build, replace) and can then regenerate surface source code from the modified tree, giving OP text
to feed to the interpreter.
Where it shines in this case is OP can likely write most of his modifications directly as source-to-source syntax rules. For his
example, he can provide DMS with a rewrite rule (untested but pretty close to right):
rule replace_Draw(A:primary,B1:expression,B2:expression):
primary->primary
"\A->Draw(\B1, \B2)" -- pattern
rewrites to
"MyFunc(\A, \B1, \B2)"; -- replacement
and DMS will take any parsed AST containing the left hand side "...Draw..." pattern and replace that subtree with the right hand side, after substituting the matches for A, B1 and B2. The quote marks are metaquotes and are used to distinguish C++ text from rule-syntax text; the backslash is a metaescape used inside metaquotes to name metavariables. For more details of what you can say in the rule syntax, see DMS Rewrite Rules.
If OP provides a set of such rules, DMS can be asked to apply the entire set.
So I think this would work just fine for OP. It is a rather heavyweight mechanism to "add" to the package he wants to provide to a 3rd party; DMS and its C++ front end are hardly "small" programs. But then modern machines have lots of resources so I think its a question of how badly does OP need to do this.

Try modify the headers to supress the method, then compiling you'll find the errors and will be able to replace all core.
As far as you have a C++ interpreter (as CERN's Root) I guess you must use the compiler to intercept all the Draw, an easy and clean way to do that is declare in the headers the Draw method as private, using some defines
class ItemWithDrawMehtod
{
....
public:
#ifdef CATCHTHEMETHOD
private:
#endif
void Draw(A,B);
#ifdef CATCHTHEMETHOD
public:
#endif
....
};
Then compile as:
gcc -DCATCHTHEMETHOD=1 yourfilein.cpp

In case, user want to input complex algorithms to the application, what I suggest is to integrate a scripting language to the app. So that the user can write code [function/algorithm in defined way] so the app can execute it in the interpreter and get the final results. Ex: Python, Perl, JS, etc.
Since you need C++ in the interpreter http://chaiscript.com/ would be a suggestion.

What happens when someone gets ahold of the Draw member function (auto draw = &A::Draw;) and then starts using draw? Presumably you'd want the same improved Draw-functionality to be called in this case too. Thus I think we can conclude that what you really want is to replace the Draw member function with a function of your own.
Since it seems you are not in a position to modify the class containing Draw directly, a solution could be to derive your own class from A and override Draw in there. Then your problem reduces to having your users use your new improved class.
You may again consider the problem of automatically translating uses of class A to your new derived class, but this still seems pretty difficult without the help of a full C++ implementation. Perhaps there is a way to hide the old definition of A and present your replacement under that name instead, via clever use of header files, but I cannot determine whether that's the case from what you've told us.
Another possibility might be to use some dynamic linker hackery using LD_PRELOAD to replace the function Draw that gets called at runtime.

There may be a way to accomplish this mostly with regular expressions.
Since anything that appears after Draw( is already formatted correctly as parameters, you don't need to fully parse them for the purpose you have outlined.
Fundamentally, the part that matters is the "SYMBOL->Draw("
SYMBOL could be any expression that resolves to an object that overloads -> or to a pointer of a type that implements Draw(...). If you reduce this to two cases, you can short-cut the parsing.
For the first case, a simple regular expression that searches for any valid C++ symbol, something similar to "[A-Za-z_][A-Za-z0-9_\.]", along with the literal expression "->Draw(". This will give you the portion that must be rewritten, since the code following this part is already formatted as valid C++ parameters.
The second case is for complex expressions that return an overloaded object or pointer. This requires a bit more effort, but a short parsing routine to walk backward through just a complex expression can be written surprisingly easily, since you don't have to support blocks (blocks in C++ cannot return objects, since lambda definitions do not call the lambda themselves, and actual nested code blocks {...} can't return anything directly inline that would apply here). Note that if the expression doesn't end in ) then it has to be a valid symbol in this context, so if you find a ) just match nested ) with ( and extract the symbol preceding the nested SYMBOL(...(...)...)->Draw() pattern. This may be possible with regular expressions, but should be fairly easy in normal code as well.
As soon as you have the symbol or expression, the replacement is trivial, going from
SYMBOL->Draw(...
to
YourFunction(SYMBOL, ...
without having to deal with the additional parameters to Draw().
As an added benefit, chained function calls are parsed for free with this model, since you can recursively iterate over the code such as
A->Draw(B...)->Draw(C...)
The first iteration identifies the first A->Draw( and rewrites the whole statement as
YourFunction(A, B...)->Draw(C...)
which then identifies the second ->Draw with an expression "YourFunction(A, ...)->" preceding it, and rewrites it as
YourFunction(YourFunction(A, B...), C...)
where B... and C... are well-formed C++ parameters, including nested calls.
Without knowing the C++ version that your interpreter supports, or the kind of code you will be rewriting, I really can't provide any sample code that is likely to be worthwhile.

One way is to load user code as a DLL, (something like plugins,)
this way, you don't need to compile your actual application, just the user code will be compiled, and you application will load it dynamically.

How does to!string(enum.member) works?

How does std.conv.to!string(enum.member) work? How is it possible that a function takes an enum member and returns its name? Does it use a compiler extension or something similar? It's a bit usual to me since I came from C/C++ world.

What it does is use compile time reflection on the enum type to get a list of members (the names as strings) and their values. It constructs a switch statement out of this information for a fast lookup to get the name from a value. to!SomeEnum("a_string") uses the same principle, just in the other direction.
The compile time reflection info is accessed with __traits(allMembers, TheEnumType), which returns a list of strings that can be looped over to build the switch statement. Then __traits(getMember, TheEnumType, memberName) is used to fetch the body.
Traits can be seen more of here: http://dlang.org/traits.html#allMembers
That allMembers one works on many types, not just classes as seen in the example, but also structs, enums, and more, even modules.
The phobos source code has some examples like EnumMembers in std.traits: https://github.com/D-Programming-Language/phobos/blob/master/std/traits.d#L3360
though the phobos source is kinda hard to read, but on line 3399, at the bottom of that function, you can see it using __traits(allMembers) as its data source. std.conv.to is implemented in terms of many std.traits functions.
You can also check out the sample chapter tab to get the Reflection chapter out of my D cookbook which discusses this stuff too:
http://www.packtpub.com/discover-advantages-of-programming-in-d-cookbook/book
The final example in that chapter shows how to use several of the reflection capabilities to build a little function dispatcher based on strings. The following chapter (not available for free though) shows how to build a switch out of it for better efficiency too. It's actually pretty easy: just put the case statements inside a foreach over the compile time data and the D compiler will unroll then optimize the lookup table for you!

Looking for a C++ implementation of the C4.5 algorithm

I've been looking for a C++ implementation of the C4.5 algorithm, but I haven't been able to find one yet. I found Quinlan's C4.5 Release 8, but it's written in C... has anybody seen any open source C++ implementations of the C4.5 algorithm?
I'm thinking about porting the J48 source code (or simply writing a wrapper around the C version) if I can't find an open source C++ implementation out there, but I hope I don't have to do that! Please let me know if you have come across a C++ implementation of the algorithm.
Update
I've been considering the option of writing a thin C++ wrapper around the C implementation of the C5.0 algorithm (C5.0 is the improved version of C4.5). I downloaded and compiled the C implementation of the C5.0 algorithm, but it doesn't look like it's easily portable to C++. The C implementation uses a lot of global variables and simply writing a thin C++ wrapper around the C functions will not result in an object oriented design because each class instance will be modifying the same global parameters. In other words: I will have no encapsulation and that's a pretty basic thing that I need.
In order to get encapsulation I will need to make a full blown port of the C code into C++, which is about the same as porting the Java version (J48) into C++.
Update 2.0
Here are some specific requirements:
Each classifier instance must encapsulate its own data (i.e. no global variables aside from constant ones).
Support the concurrent training of classifiers and the concurrent evaluation of the classifiers.
Here is a good scenario: suppose I'm doing 10-fold cross-validation, I would like to concurrently train 10 decision trees with their respective slice of the training set. If I just run the C program for each slice, I would have to run 10 processes, which is not horrible. However, if I need to classify thousands of data samples in real time, then I would have to start a new process for each sample I want to classify and that's not very efficient.

A C++ implementation for C4.5 called YaDT is available here, in the "Decision Trees" section:
http://www.di.unipi.it/~ruggieri/software.html
This is the source code for the last version:
http://www.di.unipi.it/~ruggieri/YaDT/YaDT1.2.5.zip
From the paper where the tool is described:
[...] In this paper, we describe a new from-scratch C++ implementation of a decision tree induction algorithm, which yields entropy-based decision trees in the style of C4.5. The implementation is called YaDT, an acronym for Yet another Decision Tree builder. The intended contribution of this paper is to present the design principles of the implementation that allowed for obtaining a highly efficient system. We discuss our choices on memory representation and modelling of data and metadata,on the algorithmic optimizations and their effect on memory and time performances, and on the trade-off between efficiency and accuracy of pruning heuristics. [...]
The paper is available here.

I may have found a possible C++ "implementation" of C5.0 (See5.0), but I haven't been able to dig into the source code enough to determine if it really works as advertised.
To reiterate my original concerns, the author of the port states the following about the C5.0 algorithm:
Another drawback with See5Sam [C5.0] is the impossibility to have more than
one application tree at the same time. An application is read from
files each time the executable is run and is stored in global
variables here and there.
I will update my answer as soon as I get some time to look into the source code.
Update
It's looking pretty good, here is the C++ interface:
class CMee5
{
public:
/**
Create a See 5 engine from tree/rules files.
\param pcFileStem The stem of the See 5 file system. The engine
initialisation will look for the following files:
- pcFileStem.names Vanilla See 5 names file (mandatory)
- pcFileStem.tree or pcFileStem.rules Vanilla See 5 tree or rules
file (mandatory)
- pcFileStem.costs Vanilla See 5 costs file (mandatory)
*/
inline CMee5(const char* pcFileStem, bool bUseRules);
/**
Release allocated memory for this engine.
*/
inline ~CMee5();
/**
General classification routine accepting a data record.
*/
inline unsigned int classifyDataRec(DataRec Case, float* pOutConfidence);
/**
Show rules that were used to classify the last case.
Classify() will have set RulesUsed[] to
number of active rules for trial 0,
first active rule, second active rule, ..., last active rule,
number of active rules for trial 1,
first active rule, second active rule, ..., last active rule,
and so on.
*/
inline void showRules(int Spaces);
/**
Open file with given extension for read/write with the actual file stem.
*/
inline FILE* GetFile(String Extension, String RW);
/**
Read a raw case from file Df.
For each attribute, read the attribute value from the file.
If it is a discrete valued attribute, find the associated no.
of this attribute value (if the value is unknown this is 0).
Returns the array of attribute values.
*/
inline DataRec GetDataRec(FILE *Df, Boolean Train);
inline DataRec GetDataRecFromVec(float* pfVals, Boolean Train);
inline float TranslateStringField(int Att, const char* Name);
inline void Error(int ErrNo, String S1, String S2);
inline int getMaxClass() const;
inline int getClassAtt() const;
inline int getLabelAtt() const;
inline int getCWtAtt() const;
inline unsigned int getMaxAtt() const;
inline const char* getClassName(int nClassNo) const;
inline char* getIgnoredVals();
inline void FreeLastCase(void* DVec);
}
I would say that this is the best alternative I've found so far.

If I'm reading this correctly...it appears not to be organized as a C API, but as a C program. A data set is fed in, then it runs an algorithm and gives you back some rule descriptions.
I'd think the path you should take depends on whether you:
merely want a C++ interface for supplying data and retrieving rules from the existing engine, or...
want a C++ implementation that you can tinker with in order to tweak the algorithm to your own ends
If what you want is (1) then you could really just spawn the program as a process, feed it input as strings, and take the output as strings. That would probably be the easiest and most future-proof way of developing a "wrapper", and then you'd only have to develop C++ classes to represent the inputs and model the rule results (or match existing classes to these abstractions).
But if what you want is (2)...then I'd suggest trying whatever hacks you have in mind on top of the existing code in either C or Java--whichever you are most comfortable. You'll get to know the code that way, and if you have any improvements you may be able to feed them upstream to the author. If you build a relationship over the longer term then maybe you could collaborate and bring the C codebase slowly forward to C++, one aspect at a time, as the language was designed for.
Guess I just think the "When in Rome" philosophy usually works better than Port-In-One-Go, especially at the outset.
RESPONSE TO UPDATE: Process isolation takes care of your global variable issue. As for performance and data set size, you only have as many cores/CPUs and memory as you have. Whether you're using processes or threads usually isn't the issue when you're talking about matters of scale at that level. The overhead you encounter is if the marshalling is too expensive.
Prove the marshalling is the bottleneck, and to what extent... and you can build a case for why a process is a problem over a thread. But, there may be small tweaks to existing code to make marshalling cheaper which don't require a rewrite.

Is there a tool that enables me to insert one line of code into all functions and methods in a C++-source file?

It should turn this
int Yada (int yada)
{
return yada;
}
into this
int Yada (int yada)
{
SOME_HEIDEGGER_QUOTE;
return yada;
}
but for all (or at least a big bunch of) syntactically legal C/C++ - function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a view lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.

Try Aspect C++ (www.aspectc.org). You can define an Aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf

If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that this should be passed only in the build phase, not link, or GCC will add in the profiling libraries that normally implement mcount.

I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.

There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?

I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
With that regex I also get the function name (group 1) and the arguments (group 2).
Note: you must filter out names like, "while", "do", "for", "switch".

This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using full langauge front end, in this case for C++),
builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C# program into another with whatever properties you wish. THe transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
"\visibility \returntype \fnname(int \parametername)
{ \body } "
->
"\visibility \returntype \fnname(int \parametername)
{ Heidigger(\CppString\(\methodname\),
\CppString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js