Is it possible to concatenate strings during preprocessing?
I found this example
#define H "Hello "
#define W "World!"
#define HW H W
printf(HW); // Prints "Hello World!"
However, it does not work for me: it prints only "Hello" when I use gcc -std=c99.
UPDATE: This example seems to work now. However, is this a normal feature of the C preprocessor?
Concatenation of adjacent string literals isn't a feature of the preprocessor; it is a feature of the core language (in both C and C++). You could write:
printf("Hello "
" world\n");
You can indeed concatenate tokens in the preprocessor, but be careful because it's tricky. The key is the ## operator. If you were to throw this at the top of your code:
#define myexample(x,y,z) int example_##x##_##y##_##z = x##y##z
Then, during preprocessing, any invocation of that macro, such as the following:
myexample(1,2,3);
will literally turn into:
int example_1_2_3 = 123;
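Below is a minimal C sketch of that expansion. It assumes the macro is used at file scope. get_example is a hypothetical accessor, added only so the pasted name can be checked. There must be no ## between z and =, because pasting 3 with = does not form a valid preprocessing token.

```c
#include <assert.h> /* assert() for checks that call get_example() */

/* No ## between z and '=': pasting 3 and '=' would not give a valid token. */
#define myexample(x, y, z) int example_##x##_##y##_##z = x##y##z

/* At file scope this expands to: int example_1_2_3 = 123; */
myexample(1, 2, 3);

/* Hypothetical accessor so the pasted identifier can be used by name. */
int get_example(void)
{
    return example_1_2_3;
}
```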
This gives you a lot of flexibility if you use it correctly, but it doesn't directly apply to what you are trying to do. With a little massaging, though, you could make it work.
One possible solution for your example might be:
#define H "Hello "
#define W "World!"
#define concat_and_print(a, b) std::cout << a << b << std::endl
and then do something like
concat_and_print(H,W);
From the GCC online docs:
The '##' preprocessing operator performs token pasting. When a macro is expanded, the two tokens on either side of each '##' operator are combined into a single token, which then replaces the '##' and the two original tokens in the macro expansion.
Consider a C program that interprets named commands. There probably needs to be a table of commands, perhaps an array of structures declared as follows:
struct command
{
char *name;
void (*function) (void);
};
struct command commands[] =
{
{ "quit", quit_command },
{ "help", help_command },
...
};
It would be cleaner not to have to give each command name twice, once in the string constant and once in the function name. A macro which takes the name of a command as an argument can make this unnecessary. The string constant can be created with stringification, and the function name by concatenating the argument with _command. Here is how it is done:
#define COMMAND(NAME) { #NAME, NAME ## _command }
struct command commands[] =
{
COMMAND (quit),
COMMAND (help),
...
};
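Here is a compilable C sketch of the same table, under a few stated assumptions. The handler bodies, last_run and run_command are hypothetical additions, and the elided entries are left out. name is declared const char * because it points at string literals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers: each one just records that it ran. */
static const char *last_run = NULL;
static void quit_command(void) { last_run = "quit"; }
static void help_command(void) { last_run = "help"; }

struct command
{
    const char *name;          /* const: it points at a string literal */
    void (*function)(void);
};

/* #NAME stringifies the argument; NAME ## _command pastes the suffix on. */
#define COMMAND(NAME) { #NAME, NAME ## _command }

static const struct command commands[] =
{
    COMMAND(quit),   /* { "quit", quit_command } */
    COMMAND(help),   /* { "help", help_command } */
};

/* Hypothetical dispatcher: runs the handler whose name matches.
   Returns 1 if a handler was found and 0 otherwise. */
static int run_command(const char *name)
{
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
    {
        if (strcmp(commands[i].name, name) == 0)
        {
            commands[i].function();
            return 1;
        }
    }
    return 0;
}
```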
I just thought I would add an answer that cites the source as to why this works.
The C99 standard §5.1.1.2 defines translation phases for C code. Subsection 6 states:
Adjacent string literal tokens are concatenated.
Similarly, the C++ standard (ISO 14882) defines the phases of translation in §2.1. Item 6 states:
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
This is why you can concatenate strings simply by placing them adjacent to one another:
printf("string"" one\n");
>> ./a.out
>> string one
The only preprocessing involved in the question is the #define directive, which substitutes the string literal ("Hello ") for the identifier (H).
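Below is a minimal C11 sketch of the two steps. It reuses H, W and HW from the question; greeting is a hypothetical name. The preprocessor only substitutes the macros, and the compiler then joins the two literals it is left with.

```c
#include <assert.h>
#include <string.h>

#define H "Hello "
#define W "World!"
#define HW H W   /* phase 4 leaves two tokens: "Hello " "World!" */

/* Phase 6 merges them into one literal, "Hello World!". */
static const char greeting[] = HW;

/* One literal means one terminating '\0': 12 characters + 1. */
static_assert(sizeof greeting == 13, "HW is a single 13-byte literal");
```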
Is it possible to concatenate quoted string literals outside of the language (C++, in this case)?
That is, can I define MY_MACRO(a,b,c) and use it thus:
MY_MACRO("one", "two", "three")
and have it expand to: "onetwothree"?
The use case is to apply an attribute and its message to, say, a function signature, like so:
MY_ATTRIBUTE_MACRO("this", "is", "the reason") int foo() { return 99; }
and it would result in:
[[nodiscard("thisisthe reason")]] int foo() { return 99; }
The language already does string concatenation!
This:
"hi" "James"
becomes just one string literal.
That means you do not need any preprocessor tricks for this at all.
You need only employ this in the output of your macro:
#define MY_ATTRIBUTE_MACRO(x,y,z) [[nodiscard(x y z)]]
Now this:
MY_ATTRIBUTE_MACRO("this", "is", "the reason") int foo() { return 99; }
is this:
[[nodiscard("this" "is" "the reason")]] int foo() { return 99; }
which is actually already what you wanted, because of the implicit string concatenation (which happens after macro expansion):
[[nodiscard("thisisthe reason")]] int foo() { return 99; }
Translation phase 4:
[lex.phases]/4: Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. If a character sequence that matches the syntax of a universal-character-name is produced by token concatenation, the behavior is undefined. A #include preprocessing directive causes the named header or source file to be processed from phase 1 through phase 4, recursively. All preprocessing directives are then deleted.
Translation phase 6:
[lex.phases]/6: Adjacent string literal tokens are concatenated.
I'm not sure what you mean by "outside the language" but, in C++, any string literals separated just by whitespace are implicitly concatenated into one. Thus, your MY_MACRO definition is actually very simple:
#include <iostream>
#define MY_MACRO(a, b, c) a b c
int main()
{
std::cout << MY_MACRO("one", "two", "three") << std::endl;
return 0;
}
The output from this short program is what you asked for: onetwothree.
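For comparison, here is a C11 sketch of the same macro; joined is a hypothetical name. It only works when every argument is a string literal. With a char * variable, the expansion would be expressions side by side, which does not compile.

```c
#include <assert.h>
#include <string.h>

/* Same definition as above: the arguments are placed side by side. */
#define MY_MACRO(a, b, c) a b c

/* Expands to "one" "two" "three", which phase 6 joins into "onetwothree". */
static const char joined[] = MY_MACRO("one", "two", "three");

/* 11 characters plus a single '\0'. */
static_assert(sizeof joined == 12, "one merged literal");
```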
Note: as a matter of interest, it is normally recommended to enclose macro arguments in parentheses in the definition, to avoid unwanted side effects when they are evaluated. In this case, however, such parentheses don't work. They break the implicit concatenation, because ("one") ("two") is parsed as an attempt to call a string as if it were a function:
#define MY_MACRO(a, b, c) (a) (b) (c) // Broken!
char* a="dsa" "qwe";
printf("%s", a);
output: dsaqwe
My question is: why does this work? If I put a space, or nothing at all, between two string literals, they are concatenated.
How does this work?
It's defined by the ISO C standard, adjacent string literals are combined into a single one.
The language is a little dry (it is a standard after all) but section 6.4.5 String literals of C11 states:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence.
This is also mentioned in 5.1.1.2 Translation phases, point 6 of the same standard, though a little more succinctly:
Adjacent string literal tokens are concatenated.
This basically means that "abc" "def" is no different to "abcdef".
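Here is a quick C11 check of that equivalence; split and whole are hypothetical names. Both arrays have the same size, which shows that the pieces do not each keep their own terminator.

```c
#include <assert.h>
#include <string.h>

static const char split[] = "abc" "def";
static const char whole[] = "abcdef";

/* Both are 7 bytes: six characters and a single terminating '\0'. */
static_assert(sizeof split == 7, "pieces do not keep their own terminators");
static_assert(sizeof split == sizeof whole, "same size as the single literal");
```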
It's often useful for making long strings while still having nice formatting, something like:
const char *myString = "This is a really long "
"string and I don't want "
"to make my lines in the "
"editor too long, because "
"I'm basically anal retentive :-)";
And to answer your unasked question, "What is this good for?"
For one thing, you can combine macro constants with string literals. You can write
#define FIRST "John"
#define LAST "Doe"
const char* name = FIRST " " LAST;
const char* salutation = "Dear " FIRST ",";
and then, if you need to change the name later, you only have to change it in one place.
Things like that.
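Here is a small C sketch of the FIRST/LAST example. The pointers are made const so they cannot be repointed by accident; that qualifier is an addition, not part of the answer.

```c
#include <assert.h>
#include <string.h>

#define FIRST "John"
#define LAST  "Doe"

/* After expansion these are runs of adjacent literals, which phase 6
   turns into single literals: "John Doe" and "Dear John,". */
static const char *const name       = FIRST " " LAST;
static const char *const salutation = "Dear " FIRST ",";
```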
You answered your own question.
If I give a space or nothing in between two string literals it concatenates the string literals.
That's one of the features of the C syntax.
ISO C standard §5.1.1.2 (translation phases 6 and 7) says:
Adjacent string literal tokens are concatenated.
White-space characters separating tokens are no longer significant.
I have a C++ library that uses the predefined macro __FUNCTION__, by way of crtdefs.h. The macro is documented here. Here is my usage:
my.cpp
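#include <crtdefs.h>
#include <cstdarg>   // va_list, va_start, va_end
#include <cstdio>    // _vsnwprintf_s (MSVC CRT)
#include <windows.h> // OutputDebugStringW
...
static void L(const wchar_t* format, ...); // declared before f(), which calls it
void f()
{
L(__FUNCTIONW__ L" : A diagnostic message");
}
static void L(const wchar_t* format, ...)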
{
const size_t BUFFERLENGTH = 1024;
wchar_t buf[BUFFERLENGTH] = { 0 };
va_list args;
va_start(args, format);
int count = _vsnwprintf_s(buf, BUFFERLENGTH, _TRUNCATE, format, args);
va_end(args);
if (count != 0)
{
OutputDebugStringW(buf); // buf holds wchar_t, so call the wide version explicitly
}
}
crtdefs.h
#define __FUNCTIONW__ _STR2WSTR(__FUNCTION__)
The library (which is compiled as a static library, if that matters) is consumed by another project in the same solution, a WPF app written in C#.
When I compile the lib, I get this error:
identifier "L__FUNCTION__" is undefined.
According to the docs, the macro isn't expanded if /P or /EP are passed to the compiler. I have verified that they are not. Are there other conditions where this macro is unavailable?
You list the error as this:
identifier "L__FUNCTION__" is undefined.
Note it's saying "L__FUNCTION__" is not defined, not "__FUNCTION__".
Don't use __FUNCTIONW__ in your code. MS didn't document that in the page you linked, they documented __FUNCTION__. And you don't need to widen __FUNCTION__.
ETA: I also note that you're not assigning that string to anything or printing it in any way in f().
Just use
L(__FUNCTION__ L" : A diagnostic message");
When adjacent string literals get combined, the result will be a wide string if any of the components were.
There's nothing immediately wrong with using L as the name of a function... it's rather meaningless however. Good variable and function identifiers should be descriptive in order to help the reader understand the code. But the compiler doesn't care.
Since your L function wraps a vsprintf-style function (_vsnwprintf_s), you may also use:
L(L"%hs : A diagnostic message", __func__);
since __func__ is standardized as a narrow string, the %hs format specifier is appropriate.
The rule is found in 2.14.5p13:
In translation phase 6 (2.2), adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally-supported with implementation-defined behavior.
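Here is a C sketch of the mixed-prefix rule, which C (since C99) shares with C++: a narrow literal next to a wide one produces a wide literal. The narrow "f" hypothetically stands in for what __FUNCTION__ yields inside a function named f under MSVC, where it expands to a literal. As the next answer explains, this does not work with GCC.

```c
#include <assert.h>
#include <wchar.h>

/* "f" has no prefix, so it takes the L prefix of its neighbour and the
   whole merged literal is wide: L"f : A diagnostic message". */
static const wchar_t message[] = "f" L" : A diagnostic message";

/* 24 wide characters plus one terminating L'\0'. */
static_assert(sizeof message / sizeof message[0] == 25, "single wide literal");
```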
I think the definition of __FUNCTIONW__ is incorrect. (I know you did not write it.)
From: http://gcc.gnu.org/onlinedocs/gcc/Function-Names.html
These identifiers are not preprocessor macros. In GCC 3.3 and earlier, in C only, __FUNCTION__ and __PRETTY_FUNCTION__ were treated as string literals; they could be used to initialize char arrays, and they could be concatenated with other string literals. GCC 3.4 and later treat them as variables, like __func__. In C++, __FUNCTION__ and __PRETTY_FUNCTION__ have always been variables.
At least in current GCC then you cannot prepend L to __FUNCTION__, because it is like trying to prepend L to a variable. There probably was a version of VC++ (like there was of GCC) where this would have worked, but you are not using that version.
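__func__ (and, in current GCC, __FUNCTION__) is a variable rather than a literal. It therefore has to be passed as an argument instead of being placed next to another literal. Here is a minimal C sketch of that, with describe as a hypothetical function name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* __func__ acts like a static const char array declared in each function,
   so it cannot be merged with an adjacent string literal.
   Formatting it at run time with %s works instead. */
static void describe(char *buf, size_t size)
{
    snprintf(buf, size, "%s : %s", __func__, "A diagnostic message");
}
```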
This is one usage I found in some open-source software, and I don't understand how it works.
When I output it to stdout, it was "version 0.8.0".
const char version[] = " version " "0" "." "8" "." "0";
It's called string concatenation -- when you put two (or more) quoted strings next to each other in the source code with nothing between them, the compiler puts them together into a single string. This is most often used for long strings -- anything more than one line long:
char whatever[] = "this is the first line of the string\n"
"this is the second line of the string\n"
"This is the third line of the string";
Before string concatenation was invented, you had to do that with a rather clumsy line continuation, putting a backslash at the end of each line (and making sure it was the end, because most compilers wouldn't treat it as line continuation if there was any whitespace after the backslash). There was also ugliness with it throwing off indentation, because any whitespace at the beginning of subsequent lines might be included in the string.
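Here is a C sketch contrasting the two styles; joined and spliced are hypothetical names. With a backslash-newline, the line break is removed in translation phase 2, but the indentation of the next line ends up inside the string. Adjacent literals avoid that.

```c
#include <assert.h>
#include <string.h>

/* Adjacent literals: the indentation between the pieces is not in the string. */
static const char joined[] = "one "
                             "two";

/* Backslash-newline: the line break is removed in phase 2, but the four
   spaces that indent the next line become part of the literal. */
static const char spliced[] = "one \
    two";
```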
This can cause a minor problem if you intended to put a comma between the strings, such as when initializing an array of pointers to char. If you miss a comma, the compiler won't warn you about it -- you'll just get one string that includes what was intended to be two separate ones.
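Here is a C11 sketch of that pitfall, using a hypothetical colors array whose second comma is missing.

```c
#include <assert.h>
#include <string.h>

/* Meant to hold three colors, but the comma after "green" is missing,
   so "green" "blue" is merged into the single element "greenblue". */
static const char *const colors[] = {
    "red",
    "green"
    "blue",
};

/* Two elements, not three, and the compiler accepted it. */
static_assert(sizeof colors / sizeof colors[0] == 2, "comma was missing");
```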
This is a basic feature of both C89 and C++98 called 'adjacent string concatenation' or thereabouts.
Basically, if two string literals are adjacent to each other with no punctuation in between, they are merged into a single string, as your output shows.
In the C++98 standard, section §2.1 'Phases of translation [lex.phases]' says:
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
This is after the preprocessor has completed.
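One common reason to see a string split up like the one in the question is that the pieces come from macros. The following C11 sketch is only a guess at such a setup, using hypothetical VERSION_* macros. STR turns a macro's value into a string literal, and the adjacent literals left after preprocessing are joined in phase 6.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical version macros. STR expands its argument first
   (through STR_) and only then stringifies it with #. */
#define VERSION_MAJOR 0
#define VERSION_MINOR 8
#define VERSION_PATCH 0
#define STR_(x) #x
#define STR(x)  STR_(x)

/* After phase 4: " version " "0" "." "8" "." "0", then joined in phase 6. */
static const char version[] =
    " version " STR(VERSION_MAJOR) "." STR(VERSION_MINOR) "." STR(VERSION_PATCH);

/* 14 characters plus one '\0'. */
static_assert(sizeof version == 15, "one merged literal");
```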
In the C99 standard, the corresponding section is §5.1.1.2 'Translation phases' and it says:
6 Adjacent string literal tokens are concatenated.
The wording would be very similar in any other C or C++ standard you can lay hands on (and I do recognize that both C++98 and C99 are superseded by C++11 and C11; I just don't have electronic copies of the final standards, yet).
The C++ standard specifies that string literals placed next to each other are concatenated.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string literals are concatenated."
const char version[] = " version " "0" "." "8" "." "0";
is same as:
const char version[] = " version 0.8.0";
The compiler concatenates the adjacent string literals, producing one larger string literal.
As a side note, const char* (which is in your title) is not the same as const char[] (which is in your posted code).
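Here is a C11 sketch of that difference; version_array and version_ptr are hypothetical names. The array is sized from the merged literal, while sizeof on the pointer only gives the pointer's size.

```c
#include <assert.h>
#include <string.h>

/* Array: sized from the merged literal, 14 characters + '\0'. */
static const char version_array[] = " version " "0" "." "8" "." "0";

/* Pointer: points at identical text, but sizeof only measures the pointer. */
static const char *const version_ptr = " version " "0" "." "8" "." "0";

static_assert(sizeof version_array == 15, "array holds the whole literal");
static_assert(sizeof version_ptr == sizeof(const char *), "pointer size only");
```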
The compiler automatically concatenates string literals written one after another (separated only by white space). It is as if you had written
const char version[] = " version 0.8.0";
EDIT: corrected pre-processor to compiler
Adjacent string literals are concatenated:
When specifying string literals, adjacent strings are concatenated.
Therefore, this declaration:
char szStr[] = "12" "34";
is identical to this declaration:
char szStr[] = "1234";
This concatenation of adjacent strings makes it easy to specify long strings across multiple lines:
cout << "Four score and seven years "
"ago, our forefathers brought forth "
"upon this continent a new nation.";
Simply putting strings one after the other concatenates them at compile time, so:
"Hello" ", " "World!" => "Hello, World!"
This is a strange usage of the feature, usually it is to allow #define strings to be used:
#define FOO "World!"
puts("Hello, " FOO);
Will compile to the same as:
puts("Hello, World!");
The preprocessor can be used to replace certain keywords with other words using #define. For example I could do #define name "George" and every time the preprocessor finds 'name' in the program it will replace it with "George".
However, this only seems to work with code. How could I do this with strings and text? For example if I print "Hello I am name" to the screen, I want 'name' to be replaced with "George" even though it is in a string and not code.
I do not want to manually search the string for keywords and then replace them, but instead want to use the preprocessor to just switch the words.
Is this possible? If so how?
I am using C++ but C solutions are also acceptable.
#define name "George"
printf("Hello I am " name "\n");
Adjacent string literals are concatenated in C and C++.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string literals are concatenated."
EDIT: as requested, added quotes from the C and C++ standards. Thanks to @MatteoItalia for the C++11 quote.
#define name "George"
printf("Hello I am %s\n", name);
Here name will be replaced by "George"
Your issue is that the preprocessor will (wisely) not replace tokens that are inside string literals.
So you must either use a function like printf or a variable rather than the preprocessor, or pull the token out of the string like so:
#include <iostream>
#define name "George"
int main(int argc, char** argv) {
std::cout << "Hello I am " << name << std::endl;
}
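Here is a C sketch showing both halves of that statement. It assumes the same name macro as in the question; inside and outside are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define name "George"

/* Inside the quotes these are just characters: the preprocessor does not
   look inside string literals, so nothing is replaced here. */
static const char inside[] = "Hello I am name";

/* Outside the quotes the identifier is replaced, leaving two adjacent
   literals that are joined in phase 6. */
static const char outside[] = "Hello I am " name;
```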