Expanding macro inside raw string - c++

I would like to do some debugging of my crazy macros, but there's no way to do it because macros generate code, not strings. I'd have to change the macros to emit strings in order for my program to print out the code that it would otherwise produce.
New in C++11 are raw string literals, R"delim("Raw Strings")delim", and I was hoping there is some way to interpolate macros inside one of these to turn the code they produce into a string literal.

Raw string literals concatenate the same way as normal string literals.
#define MYMACRO "hello"
std::string blah = R"(first part -)" MYMACRO R"(- second part)";
std::cout << blah;
will output first part -hello- second part
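Below is a self-contained version of the snippet. It also shows why the original hope does not work: a macro name written inside a raw string is just text and is never expanded. The name `inside` is introduced only for illustration.

```cpp
#include <cassert>
#include <string>

#define MYMACRO "hello"

// Adjacent string literals, raw or not, are concatenated after macro
// replacement has run, so MYMACRO contributes "hello" in the middle.
const std::string blah = R"(first part -)" MYMACRO R"(- second part)";

// Inside a raw string the macro name is plain characters: it is not expanded.
const std::string inside = R"(- MYMACRO -)";
```

Printing `blah` gives the output quoted above, while `inside` still contains the literal text `MYMACRO`.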

If you want to debug your crazy macros, you'd probably get more mileage out of directly examining the preprocessed output. Every mainstream C/C++ compiler has an option for this. In GCC it's -E; in MSVC, /E writes the preprocessed output to standard output and /P writes it to a .i file (in the IDE, this is the "Preprocess to a File" property under C/C++ > Preprocessor). When you do this, keep your #includes to a minimum, especially standard-library #includes; these can add hundreds or thousands of lines of code to the top of the preprocessed output.
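If you only need to see what a single macro call expands to, a common complementary trick (not mentioned in the answer) is two-level stringizing. `WIDTH` and `AREA` below are hypothetical macros. The exact spacing inside the expanded string can differ between compilers, so treat it as a debugging aid rather than a stable format.

```cpp
#include <cassert>
#include <cstring>

// STR stringizes its argument exactly as written. XSTR's parameter is not an
// operand of #, so its argument is macro-expanded first and then handed to STR.
#define STR(x) #x
#define XSTR(x) STR(x)

// Hypothetical macros to inspect.
#define WIDTH 80
#define AREA(w, h) ((w)*(h))

const char* as_written = STR(AREA(WIDTH, 25));   // the call itself, unexpanded
const char* expanded   = XSTR(AREA(WIDTH, 25));  // the text the macro produced
```

`std::puts(expanded)` then shows the expansion, with `80` substituted for `WIDTH`, while `as_written` is simply `AREA(WIDTH, 25)`.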

Related

How to clean up the results of preprocessing by removing some or all #include

I have a lot of preprocessing magic happening in header files. I'd like to view the results of such magic on my source file, but without all the #include stuff there.
For example, when I run the preprocessor on
#include<stdio.h>
#define astring "hello world"
int main()
{
printf("%s\n",astring);
return 0;
}
I get 27k lines of output. I'd like just the last 7 or so lines in this case.
Sometimes I want the results of certain include directives; others (nearly always the system headers) I'd like to ignore.
Occasionally I have include directives in unusual spots; for these, too, I'd like the option of omitting or keeping their output.
Are there any tools/methods out there to help me?
EDIT: The goal is not to get compilable code. In my case the preprocessor functionally changes the source code (for example, I use the preprocessor to implement template-like functionality in C), and viewing the preprocessed source is useful for debugging. I'm using the GNU C preprocessor through gcc/g++, which as far as I know calls cpp.
I came up with this just playing around with gcc -E and awk. I noticed that gcc -E outputs lines of the format # <linenum> "<filename>" <other stuff> whenever it switches to a different file. So basically, if I keep track of that filename, I can print out just the lines I care about.
I'm not skilled in awk so there may be a more efficient way to do this.
gcc -E example.c | awk '
/^#/ { filename = $3 }
!/^#/ {
if (filename == "\"example.c\"")
print $0
}'
If the line starts with #, then store the filename. Otherwise, if the filename is the one I care about, print out the line. Replace example.c with your filename.
For your example this outputs:
int main()
{
printf("%s\n","hello world");
return 0;
}
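I'm not sure it's 100% correct: other lines can start with # (for example, many #pragma directives are passed through to the gcc -E output), and the script would mistake their third field for a filename, which throws off the tracking. You can play around with it to get to something you want.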
A possible simple solution is to insert start and end markers around the program text whose expansion you care about. While you cannot insert markers using comments, you can achieve the same effect with #pragma directives, at least with common C compilers.
According to the standard (§6.10.6) a pragma directive is either recognised by the implementation and then has implementation-defined behaviour (which might cause the compilation to fail) or it is not recognised by the implementation, in which case it is ignored. Since implementation-defined behaviour must be documented by a conforming implementation, it should, in theory, be possible to ascertain which pragmas are recognised by the implementation, and then you can use anything which doesn't match that pattern. In practice, it is rarely that simple, but in general the first token following "#pragma" will identify the compiler or subsystem, so most pragmas recognised by gcc will start with the token GCC. (There are lots of legacy pragmas, though.)
So you might have to experiment a bit, but at least on the compilers I had kicking around, the lines
#pragma X_PPTRACE 0
and
#pragma X_PPTRACE 1
were just passed through by the preprocessor (albeit with a warning enabled by -Wall), allowing for a very simple awk program:
gcc -Wall -Wno-unknown-pragmas -E ... |
awk '/#pragma[[:space:]]+X_PPTRACE/{trace=$3;}trace'
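The same marker idea can be sketched in C++ for environments without awk. This is an illustrative assumption, not a drop-in replacement. It only recognizes the exact spelling `#pragma X_PPTRACE 0`/`1` as it appears in gcc -E output, and unlike the awk one-liner it also drops the start marker line. `between_markers` is a made-up name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: return the lines between "#pragma X_PPTRACE 1" and
// "#pragma X_PPTRACE 0" in preprocessed output; the markers themselves are dropped.
std::string between_markers(std::istream& in) {
    std::string line, out;
    bool trace = false;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string directive, name, value;
        words >> directive >> name >> value;
        if (directive == "#pragma" && name == "X_PPTRACE") {
            trace = (value == "1");  // 1 starts tracing, anything else stops it
            continue;
        }
        if (trace) {
            out += line;
            out += '\n';
        }
    }
    return out;
}
```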

How does GCC know what line an error is on when the compiler takes all whitespace and comments out of the code?

I'm sure this applies to other compilers as well, but I've only used GCC. If the compiler optimizes the code by removing everything extraneous that isn't code (comments, whitespace, etc.), how does it correctly show what line an error is on in the original file? Does it only optimize the code after checking for errors? Or does it somehow insert tags so that if an error is found it knows what line it's on?
mycode.cpp: In function ‘foo(int bar)’:
mycode.cpp:59: error: no matching function for call to ‘bla(int bar)’
The compiler converts source code to an object format, or more correctly, here, an intermediate format which will later be used to generate object format. I've not looked into the internals of g++, but typically, a compiler will tokenize the input and build a tree structure. When doing so, it will annotate the nodes of the tree with the position in the file where it read the token which the node represents. Many errors are detected during this very parsing, but for those that aren't, the compiler will use the information on the annotated node in the error message.
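Here is a toy illustration of that annotation step. It is a hypothetical lexer and nowhere near what g++ actually does. Each token records the line it started on, so whitespace and // comments can be discarded while the position needed for an error message survives.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer: every token remembers the line it started on.
struct Token {
    std::string text;
    int line;
};

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    int line = 1;
    std::size_t i = 0;
    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    while (i < src.size()) {
        char c = src[i];
        if (c == '\n') {                                   // dropped, but counted
            ++line;
            ++i;
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;                                           // other whitespace: dropped
        } else if (src.compare(i, 2, "//") == 0) {         // comment: skip to end of line
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (is_word(c)) {                           // identifier, keyword or number
            std::size_t start = i;
            while (i < src.size() && is_word(src[i])) ++i;
            tokens.push_back({src.substr(start, i - start), line});
        } else {                                           // anything else: one-char token
            tokens.push_back({std::string(1, c), line});
            ++i;
        }
    }
    return tokens;
}
```

For the input `"int x; // note\n\n  bla(x);\n"`, the token `bla` is recorded on line 3, even though the comment and blank line that came before it are gone.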
With regards to "removing everything extraneous that isn't code", that's true in the sense that the compiler tokenizes the input and converts it into the tree. But when doing so, it is reading the files; at every point, it is either reading the file, or accessing a node which was annotated while the file was being read.
The preprocessor (conceptually) adds #line directives, to tell the compiler which source file and line number correspond to each line of preprocessed source. They look like
// set the current line number to 100, in the current source file
#line 100
// set the current line number to 1, in a header file
#line 1 "header.h"
(Of course, a modern preprocessor usually isn't a separate program, and usually doesn't generate an intermediate text representation, so these are actually some kind of metadata passed to the compiler along with the stream of preprocessed tokens; but it may be simpler, and not significantly incorrect, to think in terms of preprocessed source).
You can add these yourself if you want. Possible uses are testing macros that use the __FILE__ and __LINE__ definitions, and laying traps for maintenance programmers.
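Here is a minimal check of the __FILE__/__LINE__ use case, relying only on standard #line semantics. The function names are made up. Note that the directive renumbers everything after it in the file, which is exactly what makes it a trap for maintenance programmers.

```cpp
#include <cassert>
#include <cstring>

// The line right after #line is treated as line 100 of "header.h",
// the next as line 101, and so on for the rest of the file.
#line 100 "header.h"
inline int reported_line() { return __LINE__; }
inline const char* reported_file() { return __FILE__; }
```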

Large Integer Literal Source Formatting in C++

I'm working with very large integer literal defines, e.g.:
#define X 999999999999
To improve readability I tried changing this to:
#define X 999/**/999/**/999/**/999
But the compiler was like "nah bru.."
Is there any way to make these more readable?
Just to clarify, this question is asking only about the appearance of the values in the source code. I'm not asking how to format these values in a printf or anything.
You can do this in a define (but not outside of a define):
#define X 999##111##333##444
I'm not sure that I'd recommend it, but it's legal. (## is the preprocessor token concatenation operator.)
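Here is a small check that the pasted macro really is a single integer literal. The digit-separator line is an addition, not part of the answer. It assumes a compiler in C++14 mode or later, where `'` may separate digits in any integer literal, inside a macro or not.

```cpp
#include <cassert>

// Token pasting inside a #define: the four pp-numbers are glued into one.
#define X 999##111##333##444

// C++14 digit separators: the same value, readable anywhere in the source.
constexpr long long Y = 999'111'333'444;

static_assert(X == Y, "both spell the same value");
```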
You explicitly didn't ask about output formatting, so you're probably not interested in input formatting either, but both of them can be made locale-aware, which includes allowing locale-specific grouping characters.
You could do this:
#include <boost/preprocessor.hpp>
BOOST_PP_SEQ_CAT((345)(678)(901))
Which, after preprocessing, expands to:
345678901

Complicated multi-argument #define macro for strings

I'm working on a project and have a problem that I believe can be solved with macros, but given the nature of the issue I don't have the experience to write one myself.
Here's what I would expect as input and output of the #define macro:
Inputting code such as this
printf(foobar(Hello World.));
Should result in the preprocessor producing code that reads:
printf((char *)(std::string("")+'H'+'e'+'l'+'l'+'o'+' '+'W'+'o'+'r'+'l'+'d'+'.').c_str());
I'm assuming something this complicated is possible, and I hope one of you guys can help me out.
I NEED IT TO BE A MACRO, I DO NOT want a string constant anywhere.
The only solution I can think of is to run your code through a suitable script (probably just some light awk) that does the substitution before your code reaches the preprocessor.
Depending on your environment you could do this as a "Pre-Build Event" in Visual Studio, or just add a step directly into your makefile.
Uh, I fear it is impossible (unless I don't know something).
I believe there is no macro that splits a given input token (e.g. Hello) into the characters it is made of (H e l l o).
There were some attempts to do such a thing, but I fear they are not exactly what you need:
C++: Can a macro expand "abc" into 'a', 'b', 'c'?
"More powerful precompiler" ?
Try this topic: Replacements for the C preprocessor
Macros are basically substitution or concatenation of tokens.
You could do this with a preprocessor of your own, but the standard preprocessor won't split strings into their component parts.
How about this:
Put all these 'macros' (assuming there is more than one) in a separate file. Write a program that translates them into the expansion you require, and then include THAT file in your C program. You could then make the expansion program part of your makefile so it's always up to date.
Using a separate file makes the expansion program much easier to write than one that has to parse a C/C++ file.
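Here is a rough sketch of what the translation step could produce for one foobar(...) argument. It assumes the generator has already extracted the argument text. `char_literal` and `expand_foobar` are hypothetical names, and only a few characters that need escaping are handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: spell one character as a C++ character literal.
std::string char_literal(char c) {
    switch (c) {
        case '\'': return R"('\'')";
        case '\\': return R"('\\')";
        case '\n': return R"('\n')";
        default:   return std::string("'") + c + "'";
    }
}

// Hypothetical generator step: build the expression the question asks for,
// one character literal at a time, so no string constant holds the text.
std::string expand_foobar(const std::string& text) {
    std::string out = "(char *)(std::string(\"\")";
    for (char c : text) {
        out += '+';
        out += char_literal(c);
    }
    out += ").c_str()";
    return out;
}
```

For example, `expand_foobar("Hi.")` returns `(char *)(std::string("")+'H'+'i'+'.').c_str()`, which the generated file can then contain in place of the `foobar(...)` call.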
Since you're looking for a narrow, direct answer to your question without suggestions, here goes:
This is impossible. You must find a different solution to whatever it is you're trying to achieve.
Have you tried:
#define toString(x) #x
You can use it after like this:
printf("%s", toString(hello world));
Don't pass the string to printf directly as the format string, because the string could contain a format specifier.
printf(toString(hello world)); //wrong, you can have for example %d in the string
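Here is a small check of the stringizing behavior described above. Runs of whitespace between the argument's tokens collapse to a single space, and a `%` in the argument is just an ordinary character in the resulting literal. `print_text` is a hypothetical helper that applies the "%s" advice.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

#define toString(x) #x

// The stringized text is passed as an argument to "%s", so a '%' inside it
// is never interpreted as a conversion specifier.
void print_text(const char* text) {
    std::printf("%s\n", text);
}
```

For instance, `toString(hello   world)` yields `"hello world"`, and `print_text(toString(50%d off))` prints `50%d off` unchanged.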

Macro Replacement during Code Generation

Presently I have some legacy code which generates opcodes. If the code has many macros, code generation takes a very long time (hours!!).
I have gone through the logic: it handles each macro by searching for it and replacing each variable in it, something like inlining.
Is there a way that I can optimize it without manipulating the string?
You must tokenize your input before starting this kind of process. (I can't recommend the famous Dragon Book highly enough; even the ancient edition has stood the test of time, and the updated 2006 edition looks great.) Compiling is the sort of job that's best split up into smaller phases: if your first phase performs lexical analysis into tokens, splitting lines into keywords, identifiers, constants, and so on, then it's much simpler to find the references to macros and look them up in a symbol table. (It's also relatively easier to use a tool like lex or flex, or one of their modern equivalents, to do this job for you than to attempt it from scratch.)
The 'clue' seems to be that code generation slows down badly as the number of macros grows. That sounds like the work done for each line is linear in the number of macros, which is certainly too much. I'm assuming this process occurs one line at a time (if your language allows that, it obviously has enormous value, since you don't need to treat the program as one huge string), and the pseudocode looks something like
for(each line in the program)
{
for(each macro definition)
{
test if the macro appears;
perform replacement if needed;
}
}
That clearly scales with the number of macro definitions.
With tokenization, it looks something like this:
for(each line in the program)
{
tokenize the line;
for(each token in the line)
{
switch(based on the token type)
{
case(an identifier)
lookup the identifier in the table of macro names;
perform replacement as necessary;
....
}
}
}
which scales mostly with the size of the program, not the number of definitions. The symbol table lookup can of course use a hash table or a similar structure instead of looping through all the definitions, so it no longer becomes the significant factor. That second step is something that, again, programs like yacc and bison (and their more modern variants) can happily generate code to do.
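Here is a minimal C++ sketch of the tokenized approach under simplifying assumptions. Macros are object-like (a name mapped to replacement tokens), the tokenizer is a toy, and replacements are not rescanned for further macros. `tokenize_line`, `expand_line`, and `MacroTable` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol table: macro name -> replacement tokens.
using MacroTable = std::unordered_map<std::string, std::vector<std::string>>;

// Toy tokenizer: runs of [A-Za-z0-9_] are one token, any other
// non-space character is a token by itself.
std::vector<std::string> tokenize_line(const std::string& line) {
    std::vector<std::string> tokens;
    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    std::size_t i = 0;
    while (i < line.size()) {
        if (std::isspace(static_cast<unsigned char>(line[i]))) { ++i; continue; }
        std::size_t start = i++;
        if (is_word(line[start]))
            while (i < line.size() && is_word(line[i])) ++i;
        tokens.push_back(line.substr(start, i - start));
    }
    return tokens;
}

// One average O(1) hash lookup per token: the cost grows with the length
// of the program, not with the number of macro definitions.
std::string expand_line(const std::string& line, const MacroTable& macros) {
    std::string out;
    auto emit = [&out](const std::string& tok) {
        if (!out.empty()) out += ' ';
        out += tok;
    };
    for (const std::string& tok : tokenize_line(line)) {
        auto it = macros.find(tok);
        if (it == macros.end()) {
            emit(tok);
        } else {
            for (const std::string& piece : it->second) emit(piece);
        }
    }
    return out;
}
```

With `SIZE` mapped to `{"10"}`, `expand_line("int a[SIZE];", table)` returns `int a [ 10 ] ;`.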
Afterthought: when parsing the macro definitions, you can store those as token streams as well, and mark the identifiers that are the 'placeholder' names for parameter replacement. When expanding a macro, switch to that token stream. (Again, this is something tools like flex can easily do.)
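Here is a sketch of that afterthought, again with hypothetical names. Each parameter reference in a macro body is resolved to an index once, when the definition is parsed. Each expansion then splices argument tokens in by index instead of searching text for parameter names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// One token of a stored macro body: either literal text or a parameter slot.
struct BodyToken {
    std::string text;  // token text (used as-is when param < 0)
    int param;         // index of the parameter this token stands for, or -1
};

struct Macro {
    std::vector<BodyToken> body;
};

// Done once, when the definition is parsed.
Macro define_macro(const std::vector<std::string>& params,
                   const std::vector<std::string>& body_tokens) {
    std::unordered_map<std::string, int> param_index;
    for (int i = 0; i < static_cast<int>(params.size()); ++i) param_index[params[i]] = i;
    Macro m;
    for (const std::string& t : body_tokens) {
        auto it = param_index.find(t);
        m.body.push_back({t, it == param_index.end() ? -1 : it->second});
    }
    return m;
}

// Done at each call site: copy literal tokens, splice in argument tokens by index.
std::vector<std::string> expand(const Macro& m,
                                const std::vector<std::vector<std::string>>& args) {
    std::vector<std::string> out;
    for (const BodyToken& t : m.body) {
        if (t.param < 0) {
            out.push_back(t.text);
        } else {
            const std::vector<std::string>& arg = args.at(t.param);
            out.insert(out.end(), arg.begin(), arg.end());
        }
    }
    return out;
}
```

For example, `define_macro({"a", "b"}, {"int", "c", "=", "a", "+", "b"})` followed by `expand(m, {{"x"}, {"y"}})` yields the tokens `int c = x + y`.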
I have an application which has its own grammar. It supports all the data types that a typical compiler supports (even macros). More precisely, it is a type of compiler that generates opcodes by taking a program (written in that grammar) as input.
For handling macros, it uses text replacement logic.
For example:
Macro Add (a:int, b:int)
int c = a + b
End Macro
// Program Sum
..
int x = 10, y = 10;
Add(x, y);
..
// End of the program
After replacement it will be
// Program Sum
..
int x = 10, y = 10;
int c = x + y
..
// End of program
This text replacement, i.e. replacing the macro call with the macro body, takes a very long time.
Is there a more efficient way to do it?
This is really hard to answer without knowing more about your preprocessing/parsing/compiling process. One idea would be to store the macro names in a symbol table. When parsing, check text tokens against that table first; if you find a match, write the replacement into a new string, run that through the parser, and then continue parsing the original text following the macro's closing parenthesis.
Depending on your opcode syntax, another idea might be this: when you encounter the macro definition while parsing, generate the opcodes, but put placeholders in place of the arguments. Then, when the parser encounters calls to the macro, generate the code for evaluating the arguments, and insert that code in place of the placeholders in the pre-generated macro code.