I am reading through a .cpp file trying to figure out some things and came across code like this:
some_function()
{
    CustomClass some_sort_of_list;
    string sample;
    if (sample != "") {
        some_sort_of_list = #BOING(args);
    }
}
Has anyone seen the # operator used like this before, or is it just a #define somewhere in one of the header files? I do not have access to the headers.
Since Captain Obvlious mentioned early versions of Visual C++, I will look there to see what is going on...
PS: I should mention also, in case it's not obvious enough, that the names have been changed since I don't know if I have the license to share this source. The main issue is the #SOMETHING.
PPS: the comments are in Japanese, and I have limited access to the original authors.
That's not standard C++. It's not even legal as a #define, since macro names are not permitted to start with #.
It's probably something that gets run through a preprocessor of some sort, like Oracle's Pro*C compiler (which can turn EXEC SQL statements into C function calls), before being passed to an actual C compiler.
Your best bet would be to think about the environment that this code runs in, such as "is it an internationalised application where #GEN may retrieve a locale-specific string for output?".
And since you mention that the comments are in Japanese, you should at least give Google Translate a try. It can sometimes produce hilarious results for complex phrases, but it may well give you a needed clue.
struct Foo {
    Bar get() {
    }
}
auto f = Foo();
f.get();
For example, you decide that get was a very poor choice for a name, but you have already used it in many different files, and manually changing every occurrence is very annoying.
You also can't really make a global substitution because other types may also have a method called get.
Is there anything for D to help refactor names for types, functions, variables etc?
Here's how I do it:
1. Change the name in the definition.
2. Recompile.
3. Go to the first error line reported and replace old with new.
4. Go to step 2.
That's semi-manual, but I find it pretty easy, and it goes quickly: the compiler error message brings you right to where you need to be, most editors can read those error messages well enough to drop you on the correct line, and then it is a simple matter of telling the editor to repeat the last replacement. (In my vim setup with my hotkeys, I hit F4 for the next error message, then . (dot) to repeat the last change, until it is done. Even a function with a hundred uses can be changed reliably* in a couple of minutes.)
You could probably write a script that handles 90% of cases automatically too by just looking for ": Error: " in the compiler's output, extracting the file/line number, and running a plain text replace there. If the word shows up only once and outside a string literal, you can automatically replace it, and if not, ask the user to handle the remaining 10% of cases manually.
But I think it is easy enough to do with my editor hotkeys that I've never bothered trying to script it.
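For anyone who does want to script it, here is a minimal sketch of the two halves described above, written in C++. It assumes DMD-style error lines of the form path(line): Error: message (an optional ",column" is tolerated), and it uses a deliberately crude "no string literal on this line" check. The names parse_error_line and replace_if_unambiguous are hypothetical, not part of any D tool.

```cpp
#include <cassert>
#include <iterator>
#include <optional>
#include <regex>
#include <string>

// One compiler error location: file path and 1-based line number.
struct ErrorLocation {
    std::string file;
    int line;
};

// Parses one line of compiler output. ASSUMPTION: DMD-style messages of the
// form "path(line): Error: message"; an optional ",column" is tolerated.
// Any other line yields std::nullopt.
std::optional<ErrorLocation> parse_error_line(const std::string& text) {
    static const std::regex pattern(R"(^(.+?)\((\d+)(?:,\d+)?\): Error: )");
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) return std::nullopt;
    return ErrorLocation{m[1].str(), std::stoi(m[2].str())};
}

// Renames old_name to new_name on one source line, but only in the easy case:
// exactly one whole-word occurrence and no string literal anywhere on the line
// (a crude check). Returns false when a human should handle the line instead.
bool replace_if_unambiguous(std::string& line, const std::string& old_name,
                            const std::string& new_name) {
    assert(!old_name.empty() && "old_name must be an identifier");
    if (line.find('"') != std::string::npos) return false;
    // Identifiers contain only \w characters, so no regex escaping is needed.
    const std::regex word("\\b" + old_name + "\\b");
    const auto hits = std::distance(
        std::sregex_iterator(line.cbegin(), line.cend(), word), std::sregex_iterator());
    if (hits != 1) return false;
    line = std::regex_replace(line, word, new_name);
    return true;
}
```

A driver would read the compiler output line by line, call parse_error_line, open the reported file at that line, and fall back to asking the user whenever replace_if_unambiguous returns false.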
* The one case this doesn't catch is if there's another function with the same name that might still compile. That should never happen if you do this change in isolation, because an ambiguous name wouldn't compile without it.
In that case, you could probably do a three-step compiler-assisted change:
1. Make sure your code compiles beforehand. Then add @disable to the thing you want to rename.
2. Compile. Every place where it complains about the symbol being unusable because it is disabled, do the find/replace.
3. Remove @disable and rename the definition. Recompile again to make sure there's nothing you missed, like child classes (the compiler will then complain "method foo does not override any function", so they stand right out too).
So yeah, it isn't fully automated, but just changing it and having the compiler errors help find what's left is good enough for me.
Some limited refactoring support can be found in major IDE plugins like Mono-D or VisualD. I remember that Brian Schott had plans to add similar functionality to his dfix tool by adding a dependency on dsymbol, but it doesn't seem to be implemented yet.
Note, however, that all such options are of very limited robustness right now. This is because figuring out the fully qualified name of any given symbol is a very complex task in D, one that requires full semantic analysis to be done 100% correctly. Think about local imports, templates, function overloading, mixins, and how they all affect identifying the symbol.
In the long run, we will most likely need to wait until the reference D compiler frontend becomes available as a library to implement such a refactoring tool in a clean and truly reliable way.
A good find-all feature can be better than a bad refactoring, which, as mentioned previously, requires semantic analysis.
Personally, I have a find-all feature in Coedit, which displays the context of each match and works on all the project sources.
That makes it quick to go through the results.
I have the following requirements:
Adding text at the entry and exit points of any function.
Not altering the source code besides inserting the text above (so no preprocessor or anything).
For example:
void fn(param-list)
{
    ENTRY_TEXT (param-list)
    //some code
    EXIT_TEXT
}
And not only in such a simple case: it should also work with preprocessor directives!
Example:
void fn(param-list)
#ifdef __WIN__
{
    ENTRY_TEXT (param-list)
    //some windows code
    EXIT_TEXT
}
#else
{
    ENTRY_TEXT (param-list)
    //some any-os code
    if (condition)
    {
        return; //should become EXIT_TEXT
    }
    EXIT_TEXT
}
#endif
So my question is: is there a proper way of doing this?
I already tried working with the parsers used by compilers, but since they all rely on running a preprocessor before parsing, they are useless to me.
Also, the token-generating parsers that do not need a preprocessor are somewhat useless, because they build an in-memory representation of the tokens, which then leads to completely new source code instead of just inserting the text.
One thing I am working on is trying this with Flex (or JFlex); if this is a valid option, I would appreciate some input on it. ;-)
EDIT:
To clarify a little bit: The purpose is to allow something like a stack trace.
I want to trace every function call, and in order to follow the call hierarchy, I need to place a macro at the entry point and at the exit point of each function.
This builds a function-call trace. :-)
EDIT2: Compiler-specific options are not really suitable, since we use many different compilers, and many of them are probably not well supported by any tools out there.
Unfortunately, your idea is not only impractical (C++ is complex to parse), it's also doomed to fail.
The main issue you have is that exceptions will bypass your EXIT_TEXT macro entirely.
You have several solutions.
As has been noted, the first solution would be to use a platform-dependent way of computing the stack trace. It can be somewhat imprecise, especially because of inlining: small functions that are inlined into their callers do not appear in the stack trace, since no function call was generated at the assembly level. On the other hand, it's widely available, does not require any surgery on the code, and does not affect performance.
A second solution would be to insert something only on entry and use RAII to do the exit work. This is much better than your scheme, as it automatically deals with multiple returns and exceptions, but it suffers from the same issue: how to perform the insertion automatically. For this you will probably want to operate at the AST level and modify the AST to introduce your little gem. You could do it with Clang (look up the C++11 migration tool for examples of large-scale rewrites) or with GCC (using plugins).
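A minimal sketch of the RAII idea: a guard object records the entry in its constructor and the exit in its destructor, so early returns and exceptions are covered without any EXIT_TEXT. ScopeTracer, TRACE_FUNCTION, g_trace and g_depth are hypothetical names; a real tracer would probably write to a file or ring buffer rather than a global vector.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory trace log and current call depth.
std::vector<std::string> g_trace;
int g_depth = 0;

// Records "enter <name>" on construction and "exit <name>" on destruction.
// Destructors run on every path out of a scope (each return statement and
// stack unwinding caused by an exception), so one object at the top of a
// function covers all of its exit points.
class ScopeTracer {
public:
    explicit ScopeTracer(const char* name) : name_(name) {
        ++g_depth;
        g_trace.push_back(std::string("enter ") + name_);
    }
    ~ScopeTracer() {
        assert(g_depth > 0);
        --g_depth;
        g_trace.push_back(std::string("exit ") + name_);
    }
    ScopeTracer(const ScopeTracer&) = delete;
    ScopeTracer& operator=(const ScopeTracer&) = delete;

private:
    const char* name_;  // __func__ has static storage, so storing the pointer is safe
};

// The only line a tool would need to insert after each opening brace.
#define TRACE_FUNCTION() ScopeTracer scope_tracer_instance_(__func__)

int divide(int a, int b) {
    TRACE_FUNCTION();
    if (b == 0) {
        throw std::invalid_argument("division by zero");  // exit is still recorded
    }
    if (a == 0) {
        return 0;  // early return: exit is still recorded
    }
    return a / b;
}
```

Because only one insertion point per function is needed, this also sidesteps the problem of finding every return statement.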
Finally, you also have manual annotations. While it may seem underpowered (and a lot of work), I would highlight that you do not leave logging to a tool... I see 3 advantages to doing it manually: you can avoid introducing this overhead in performance sensitive parts, you can retain only a "summary" of big arguments and you can customize the summary based on what's interesting for the current function.
I would suggest using LLVM libraries & Clang to get started.
You could also leverage the C++ language to simplify your process: insert a small object into the code that is constructed on entry to the function scope, and rely on the fact that it will be destroyed on exit. That should massively simplify recording the 'exit' of the function.
This does not really answer your question; however, for your initial need, you may use the backtrace() function from execinfo.h (if you are using GCC).
How to generate a stacktrace when my gcc C++ app crashes
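A minimal sketch of that approach. backtrace() and backtrace_symbols_fd() are declared in <execinfo.h> on glibc-based systems such as Linux, so this is not portable; print_stack_trace is a made-up wrapper name, and readable function names in the output generally require linking with -rdynamic.

```cpp
#include <cassert>
#include <execinfo.h>
#include <unistd.h>

// Prints the current call stack to stderr and returns the number of frames
// captured (glibc-specific).
int print_stack_trace() {
    constexpr int kMaxFrames = 64;
    void* frames[kMaxFrames];
    const int depth = backtrace(frames, kMaxFrames);
    assert(depth >= 0 && depth <= kMaxFrames);
    // backtrace_symbols_fd() writes straight to the file descriptor and does
    // not call malloc(), unlike backtrace_symbols().
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    return depth;
}
```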
I've been looking for a language that provides the same functionality that CoffeeScript has, but for C/C++. I mean a language that compiles to readable C, just as CoffeeScript compiles to readable JavaScript.
I think this is possible, and even desirable (I grudgingly deal with C++ when writing Node.js native modules), but more challenging than with a higher-level language like JavaScript.
What you're asking for is a language that would provide syntactic sugar without sacrificing performance or flexibility. Some syntactic sugars (say, syntactic whitespace or Ruby-style def/end blocks instead of curly braces) would be trivial to add. But adding anything more advanced, you'd run into two major hurdles: static typing, and garbage collection.
For instance, let's say that you wanted to add implicit returns. It seems like a small feature, but think about it: In order for the feature to be useful, you'd have to—at the very least—throw a compile-time error when the value of the last expression doesn't match the function's return type. That means that your compiler needs to inspect a line like
a->b
and figure out what type it is. That's possible in principle, but it's a heck of a lot more work than the CoffeeScript compiler does.
Or say you added list comprehensions. That means you're allocating an array whose length isn't known at compile-time, which means you'll need to later deallocate it yourself. So the syntactic sugar could actually hurt you. The rule "If you malloc it, you free it" doesn't work if the compiler is adding in the malloc for you, unless it can figure out where to put the free (which, again, is generally possible but would take a lot of work).
So, while I'd love to see someone give C++ the CoffeeScript treatment, I don't expect it to happen any time soon, if ever. I think it's more likely that the world will eventually move on to something like D or Go for systems programming.
I think OOC is probably the closest thing to Coffeescript for C. It's a programming language with a lot of the features you'd expect from dynamic languages (objects, first class functions, clean syntax) that compiles directly into C99.
http://ooc-lang.org/
One item missing from Jacinda's list that you might want to know about: Vala/Genie is a compiler targeting C, with the GObject library implementing objects, written by the GNOME project. Vala has a C#-like syntax and Genie a Python-like syntax, but otherwise they are the same system. It was actually created because bare C + GObject had become too much of a pain to work with for the GNOME developers. Vala does objects and automatic memory management based on reference counting or ownership tracking, and a lot of other things you'd expect in a C#-like language.
As for the CoffeeScript-like property, I just saw that there was an experimental feature to disable the dependency of the generated code on GObject, so it generates just plain C without any runtime dependencies. Doing so disables a number of more advanced OO features, but it still leaves you with a better syntax, a basic object system, and (semi-)automatic memory management.
I don't know how readable the output is, but if you run it through a pretty printer it might be very close to what you're looking for.
SugarCpp is a language which can compile to C++11. It should be what you are looking for. Visit https://github.com/curimit/SugarCpp for more details.
For Python specifically, take a look at this question:
Convert Python program to C/C++ code?
They mention Shed Skin, which will take a subset of pure python and convert to standalone C++ code.
Cython is typically used to create Python extension modules, but can create standalone programs if the Python interpreter is embedded. This doesn't sound like what you're looking for, though.
Cython is based on Pyrex, and they are compatible with each other in many ways.
For some of the other languages you mentioned there seem to be similar projects: Ruby and PHP. Toba for Java (though no longer maintained), Marst for Algol, BCX for BASIC, COB2C, PtoC for Pascal and I should probably stop there before this turns into "List of Converters from Foo to C/C++."
Hope that helps!
Take a look at this fresh new project: https://bixense.com/coffeepp/
Coffee++
Coffee++ is a little language that compiles into C++. It has been created to have something similar to CoffeeScript for C++. Currently Coffee++ is in an alpha state and not at all usable or final. Check out the source on GitHub to get involved.
The golden rule of Coffee++ is: "It's just C++". The code compiles one-to-one into the equivalent C++, and there is no runtime library. You can use any existing C++ library seamlessly from Coffee++ (and vice-versa).
Overview:
source file Test.cf++
include iostream

int main():
    age := 5
    dog := Dog(age)
    if age != 7:
        dog.bark()

class Dog:
    public Dog(int age):
        this->age := age
    public void bark():
        std::cout << "Woof!\n"
    private int age
};
compiled Test.hpp
#pragma once

int main();

class Dog {
public:
    Dog(int age);
    void bark();
private:
    int age;
};
compiled Test.cpp
#include "Test.hpp"
#include <iostream>

int main() {
    auto age = 5;
    auto dog = Dog(age);
    if (age != 7) {
        dog.bark();
    }
}

Dog::Dog(int age) : age(age) {
}

void Dog::bark() {
    std::cout << "Woof!\n";
}
Since Vala and Genie were already mentioned, I'll put BaCon (Basic Converter) out there for those who reminisce about hand-coding programs from a monthly print publication but want to use them with a modern GUI.
Must run on each Unix/Linux/BSD platform, including MacOSX
Converted sourcecode must be compilable with GCC
Must resemble genuine BASIC with implicit variable declarations
Spoken language constructs are preferred
The website http://www.basic-converter.org/ has lots of examples (some of them pretty complex for "BASIC") and plugins for nearly every open-source IDE, or you can use the BaCon IDE.
This is not quite what you want, but: http://www.campbell.nu/oscar/cython/index.html. This cython/cytoc is a significant-whitespace (Pythonish) transpiler for C/C++ that I coded around 1999/2000; it has no relation to the Cython project that arrived seven years later.
Frankly, I wrote it in Perl, and it's heuristic, using regular expressions. I used it for an entire Game Boy Color game project (regular ANSI C). But I wouldn't trust it, which is why I'm looking around too instead of using that dusty old bugger ;)
Follow up:
I've been working on Onyx (https://github.com/ozra/onyx-lang) for over a year now, and finally realized that the obvious thing to do is to rewrite it to compile to C++ instead of LLVM IR. The retargeting idea is brand new, so the rewrite is still vaporware. But your input would be put to good use in the RFCs; if you like the idea of the language, this is your chance to shape it.
It should turn this
int Yada (int yada)
{
    return yada;
}
into this
int Yada (int yada)
{
    SOME_HEIDEGGER_QUOTE;
    return yada;
}
but for all (or at least a large share of) syntactically legal C/C++ function and method constructs.
Maybe you've heard of some Perl library that will allow me to perform these kinds of operations in a few lines of code.
My goal is to add a tracer to an old, but big C++ project in order to be able to debug it without a debugger.
Try AspectC++ (www.aspectc.org). You can define an aspect that will pick up every method execution.
In fact, the quickstart has pretty much exactly what you are after defined as an example:
http://www.aspectc.org/fileadmin/documentation/ac-quickref.pdf
If you build using GCC and the -pg flag, GCC will automatically issue a call to the mcount() function at the start of every function. In this function you can then inspect the return address to figure out where you were called from. This approach is used by the linux kernel function tracer (CONFIG_FUNCTION_TRACER). Note that this function should be written in assembler, and be careful to preserve all registers!
Also, note that -pg should be passed only in the compile phase, not when linking, or GCC will add in the profiling libraries that normally implement mcount.
I would suggest using the gcc flag "-finstrument-functions". Basically, it automatically calls a specific function ("__cyg_profile_func_enter") upon entry to each function, and another function is called ("__cyg_profile_func_exit") upon exit of the function. Each function is passed a pointer to the function being entered/exited, and the function which called that one.
You can turn instrumenting off on a per-function or per-file basis... see the docs for details.
The feature goes back at least as far as version 3.0.4 (from February 2002).
This is intended to support profiling, but it does not appear to have side effects like -pg does (which compiles code suitable for profiling).
This could work quite well for your problem (tracing execution of a large program), but, unfortunately, it isn't as general purpose as it would have been if you could specify a macro. On the plus side, you don't need to worry about remembering to add your new code into the beginning of all new functions that are written.
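A minimal sketch of the two hooks, assuming GCC (Clang also accepts -finstrument-functions). The hook names and their (void *this_fn, void *call_site) signature are the ones the compiler calls; the depth counter and the stderr output format are illustrative only. The hooks are marked no_instrument_function so they are not instrumented themselves, and everything else in the program is built with g++ -finstrument-functions. The printed addresses can be mapped back to function names afterwards with a tool such as addr2line (for position-independent executables you may first need to subtract the load address).

```cpp
#include <cassert>
#include <cstdio>

// Current nesting depth, used only to indent the trace.
static int g_call_depth = 0;

extern "C" {

// Called on entry to every instrumented function.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    std::fprintf(stderr, "%*senter %p (called from %p)\n",
                 g_call_depth * 2, "", this_fn, call_site);
    ++g_call_depth;
}

// Called on exit from every instrumented function, including exits caused
// by exceptions propagating through it.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    assert(g_call_depth > 0);
    --g_call_depth;
    std::fprintf(stderr, "%*sexit  %p (called from %p)\n",
                 g_call_depth * 2, "", this_fn, call_site);
}

}  // extern "C"
```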
There is no such tool that I am aware of. In order to recognise the correct insertion point, the tool would have to include a complete C++ parser - regular expressions are not enough to accomplish this.
But as there are a number of FOSS C++ parsers out there, such a tool could certainly be written - a sort of intelligent sed for C++ code. The biggest problem would probably be designing the specification language for the insert/update/delete operation - regexes are obviously not the answer, though they should certainly be included in the language somehow.
People are always asking here for ideas for projects - how about this for one?
I use this regex,
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
to locate the functions and add extra lines of code.
With that regex I also get the function name (group 1); group 2 captures an optional trailing const.
Note: you must filter out keywords that also match, like "if", "while", "do", "for", "switch".
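A rough C++ sketch of that approach, applied to the SOME_HEIDEGGER_QUOTE example (an adaptation, not the answerer's code). std::regex uses ECMAScript syntax, which has no lookbehind, so the leading (?<=[\s:~]) is replaced by a consumed (?:^|[\s:~]) prefix. The helper names are hypothetical, and "catch" is added to the keyword filter because the pattern also matches catch (...) {.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Group 1 = function name, group 2 = optional trailing "const".
const std::regex kFunctionHead(
    R"((?:^|[\s:~])(\w+)\s*\([\w\s,<>\[\].=&':/*]*?\)\s*(const)?\s*\{)");

// Control-flow keywords the pattern also matches, e.g. "if (x) {".
bool is_control_keyword(const std::string& name) {
    static const std::set<std::string> keywords = {"if", "while", "do", "for", "switch", "catch"};
    return keywords.count(name) != 0;
}

// Names of all function definitions the heuristic recognises, in order.
std::vector<std::string> function_names(const std::string& src) {
    std::vector<std::string> names;
    for (std::sregex_iterator it(src.begin(), src.end(), kFunctionHead), end; it != end; ++it) {
        const std::string name = (*it)[1].str();
        if (!is_control_keyword(name)) names.push_back(name);
    }
    return names;
}

// Inserts `line` on a new line right after the opening brace of each
// recognised function. Heuristic only: text inside comments or string
// literals that looks like a function head will also be rewritten.
std::string insert_after_function_heads(const std::string& src, const std::string& line) {
    assert(line.find('\n') == std::string::npos && "insert a single line");
    std::string out;
    auto copied_up_to = src.cbegin();
    for (std::sregex_iterator it(src.begin(), src.end(), kFunctionHead), end; it != end; ++it) {
        if (is_control_keyword((*it)[1].str())) continue;
        const auto after_brace = (*it)[0].second;  // one past the '{'
        out.append(copied_up_to, after_brace);
        out += '\n';
        out += line;
        copied_up_to = after_brace;
    }
    out.append(copied_up_to, src.cend());
    return out;
}
```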
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C#, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C++.
DMS parses source code (using a full language front end, in this case for C++),
builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C++ program into another with whatever properties you wish. The transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():function->function
   "\visibility \returntype \fnname(int \parametername)
    { \body } "
->
   "\visibility \returntype \fnname(int \parametername)
    { Heidigger(\CppString\(\fnname\),
                \CppString\(\parametername\),
                \parametername);
      \body } "
The quote marks (") are not C++ quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is C++ syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the function, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.
So you know off the bat, this is a project I've been assigned. I'm not looking for an answer in code, but more a direction.
What I've been told to do is go through a file and count the actual lines of code while at the same time recording the function names and individual lines of code for the functions. The problem I am having is determining a way when reading from the file to determine if the line is the start of a function.
So far, I can only think of having a string array of data types (int, double, char, etc.), searching for one of them in the line, then searching for the parenthesis, and then checking for the absence of a semicolon (so I know it isn't just a declaration of the function).
So my question is: is this how I should go about this, or are there other methods you would recommend?
The code I will be counting is C++.
Three approaches come to mind.
1. Use regular expressions. This is fairly similar to what you're thinking of: look for lines that look like function definitions. This is fairly quick to do, but can go wrong in many ways.
char *s = "int main() {";
is not a function definition, but sure looks like one.
char
* /* eh? */
s
(
int /* comment? // */ a
)
// hello, world /* of confusion
{
is a function definition, but doesn't look like one.
Good: quick to write, can work even in the face of syntax errors; bad: can easily misfire on things that look like (or fail to look like) the "normal" case.
Variant: First run the code through, e.g., GNU indent. This will take care of some (but not all) of the misfires.
2. Use a proper lexer and parser. This is a much more thorough approach, but you may be able to reuse an open-source lexer/parser (e.g., from GCC).
Good: Will be 100% accurate (will never misfire). Bad: One missing semicolon and it spews errors.
3. See if your compiler has some debug output that might help. This is a variant of (2), but using your compiler's lexer/parser instead of your own.
Your idea can work in 99% (or more) of the cases. Only a real C++ compiler can do 100%, in which case I'd compile with debug information (g++ -g -S prog.cpp) and get the function names and line numbers from the debug information in the assembly output (prog.s).
My thoughts for the 99% solution:
Ignore comments and strings.
Document that you ignore preprocessor directives (#include, #define, #if).
Anything between a toplevel { and } is a function body, except after typedef, class, struct, union, namespace and enum.
If you have a class, struct or union, you should be looking for method bodies inside it.
The function name is sometimes tricky to find, e.g. in long (*f(int))(char); (a function f taking an int and returning a pointer to a function).
Make sure your parser works with template functions and template classes.
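The rules above can be sketched as a brace-depth scanner. This is a hypothetical illustration, not a complete tool: it blanks out comments, string/char literal contents and preprocessor lines (keeping newlines so line numbers survive), then treats each top-level { ... } whose header contains a '(' and none of class/struct/union/namespace/enum/typedef as a function body. Unlike the rules above, it does not descend into classes or namespaces to find method bodies, and raw strings, digit separators and multi-line macros are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct FunctionBody {
    std::string header;  // text between the previous top-level ';' or '}' and the '{'
    int first_line;      // 1-based line of the opening '{'
    int last_line;       // 1-based line of the matching '}'
};

// Replaces comments, literal contents and preprocessor lines with spaces.
std::string strip_comments_and_strings(const std::string& src) {
    enum State { Code, LineComment, BlockComment, String, Char };
    State state = Code;
    std::string out = src;
    bool at_line_start = true;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = i + 1 < src.size() ? src[i + 1] : '\0';
        if (c == '\n') {  // newlines are always kept
            at_line_start = true;
            if (state == LineComment) state = Code;
            continue;
        }
        switch (state) {
        case Code:
            if ((c == '#' && at_line_start) || (c == '/' && next == '/')) {
                state = LineComment;  // directives are ignored like comments
                out[i] = ' ';
            } else if (c == '/' && next == '*') {
                state = BlockComment;
                out[i] = out[i + 1] = ' ';
                ++i;
            } else if (c == '"') {
                state = String;
            } else if (c == '\'') {
                state = Char;
            }
            if (!std::isspace(static_cast<unsigned char>(c))) at_line_start = false;
            break;
        case LineComment:
            out[i] = ' ';
            break;
        case BlockComment:
            out[i] = ' ';
            if (c == '*' && next == '/') { out[i + 1] = ' '; ++i; state = Code; }
            break;
        case String:
        case Char:
            if (c == (state == String ? '"' : '\'')) { state = Code; break; }  // keep the quote
            out[i] = ' ';
            if (c == '\\' && i + 1 < src.size() && next != '\n') { out[i + 1] = ' '; ++i; }
            break;
        }
    }
    assert(out.size() == src.size());  // blanking never shifts offsets or lines
    return out;
}

std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(" \t\r\n") - first + 1);
}

std::vector<FunctionBody> find_function_bodies(const std::string& src) {
    static const std::regex not_a_function(R"(\b(class|struct|union|namespace|enum|typedef)\b)");
    const std::string code = strip_comments_and_strings(src);
    std::vector<FunctionBody> bodies;
    std::string header;
    int depth = 0, line = 1, open_line = 0;
    for (const char c : code) {
        if (c == '\n') ++line;
        if (depth > 0) {
            if (c == '{') {
                ++depth;
            } else if (c == '}' && --depth == 0) {
                const std::string h = trim(header);
                // Requiring '(' skips initialisers such as "int t[] = {1, 2};".
                if (h.find('(') != std::string::npos && !std::regex_search(h, not_a_function))
                    bodies.push_back({h, open_line, line});
                header.clear();
            }
        } else if (c == '{') {
            depth = 1;
            open_line = line;
        } else if (c == ';' || c == '}') {
            header.clear();
        } else {
            header += c;
        }
    }
    return bodies;
}
```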
For recording function names I use PCRE and the regex
"(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{"
and then filter out names like "if", "while", "do", "for", "switch". Note that the function name is (\w+), group 1.
Of course, it's not a perfect solution, but it's a good one.
I feel that doing the parsing manually is going to be quite a difficult task. I would probably use an existing tool such as RSM, redirect its output to a CSV file (assuming you are on Windows), and then parse the CSV file to gather the required information.
Find a decent SLOC counting program, e.g., SLOCCounter. Not only can you count SLOC, but you also have something against which to compare your results. (Update: here's a long list of them.)
Interestingly, the number of non-comment semicolons in a C/C++ program is a decent SLOC count.
How about writing a shell script to do this? An AWK program, perhaps.