Using #define to a string in C++ - c++

While trying to upload an image with cURL, I am confused because of the code below.
#define UPLOAD_FILE_AS "testImage.jpg"
static const char buf_1 [] = "RNFR " UPLOAD_FILE_AS;
What I want to understand is,
exact type of defined UPLOAD_FILE_AS : array of char / string / or something else?
exact operation performed in second line : After the second line, buf_1 becomes "RNFR testImage.jpg". But the second line only has a space between "RNFR " and UPLOAD_FILE_AS. I've never heard that a space can replace the "+" operator or a merging function. How is this possible?

The macro definition is handled by the preprocessor: it is just a sequence of characters that happens to be enclosed in "". There is no type. The macro is expanded before the compiler (with its notion of types) starts the actual compilation.
C++ always concatenates adjacent string literals: "A" "B" is handled as "AB" during translation. There is no operator involved, not even an implicit one. This is often used to write very long string literals that span several lines of code.

A #define does not create a variable, so there is no concept of type. Before the source file is compiled, the preprocessor runs and literally replaces every instance of your macro, UPLOAD_FILE_AS, with its value ("testImage.jpg").
In other words, after the preprocessor stage, your code looks like this:
static const char buf_1 [] = "RNFR " "testImage.jpg";
And since adjacent string literals are concatenated automatically in C++, the two become one: "RNFR testImage.jpg". You can find a better explanation in the linked reference, mainly:
String literals placed side-by-side are concatenated at translation phase 6 (after the preprocessor). That is, "Hello," " world!" yields the (single) string "Hello, world!". If the two strings have the same encoding prefix (or neither has one), the resulting string will have the same encoding prefix (or no prefix).

exact type of defined UPLOAD_FILE_AS
There is no type. It is not a variable. It is a macro. Macros exist entirely outside of the type system.
The pre-processor replaces all instances of the macro with its definition. See the next paragraph for an example.
exact operation performed in second line
The operation is macro replacement. After the file is pre-processed, the second line becomes:
static const char buf_1 [] = "RNFR " "testImage.jpg";

Related

#define/macro to return a function pointer to a function given in the same file

Say I have code
void 1funct() {
(...)
}
void 2funct() {
(...)
}
etc., to
void nfunct() {
(...)
}
Is it possible to return the function pointer to the correct function given an n by
#define RET_FUNC_POINTER(n) (&nfunct); ?
Yes — macros can concatenate source code characters into a single token using ##:
#define RET_FUNC_POINTER(n) (&n ## funct)
(Tokens are the invisible "building blocks" of your source code, discrete units of text that the parser generates from your code while trying to understand your program. They are roughly analogous to "words" in English, but they do not need to be separated by spaces in C++ unless omitting the space would merge them into a single token: e.g. int main vs intmain, whereas int * and int* are equivalent. With ## we can take the two would-be tokens int and main and have the preprocessor join them into intmain. The only difference in your case is that one of the pieces is an argument to the macro. Notice that you don't need to join & and the n ## funct part, as the & is already a separate token and should remain that way.)
However, you may wish to consider a nice array of pointers instead.
If the n is known statically at compile-time (which it must be for your macro to work) then you don't really gain anything over just writing &1funct, &2funct etc other than obfuscating your code (which isn't a gain).
Also note that your function names cannot start with digits; you'll have to choose a different naming scheme.
Not sure why you want to do this (instead of just using a const array), but it's quite easy to do with a macro:
#define RET_FUNC_POINTER(n) (n##funct)

Can you use C/C++ preprocessor tokens in multiline string literals

Extending on this question and this question, is it possible to have a multiline string literal, using either the preprocessor method shown or C++ multiline string literals, that contains the value of a preprocessor symbol? For example:
#define SOME_CONSTANT 64
#define QUOTE(...) #__VA_ARGS__
const char * aString = QUOTE(
{
"key":"fred",
"value":"SOME_CONSTANT"
}
);
Ideally I want "SOME_CONSTANT" to be replaced with "64".
I have tried using all the tricks in my limited skill set including stringizing and have had no luck.
Any ideas?
You have two problems. The first is that preprocessor tokens inside quotes (i.e. string literals) aren't substituted. The second is that you must defer the actual stringification until all preprocessing tokens have been replaced. The stringification must be the very last macro that the preprocessor deals with.
Token substitution happens iteratively. The preprocessor deals with the substitution, and then goes back to see if there is anything left to substitute in the sequence it just replaced. We need to use that to our advantage. If we have a hypothetical TO_STRING macro, we need the very next iteration to substitute all preprocessing tokens, and only the one after that to produce a call to the "real" stringification. Fortunately, it's fairly simple to write:
#define TO_STRING(...) DEFER(TO_STRING_)(__VA_ARGS__)
#define DEFER(x) x
#define TO_STRING_(...) #__VA_ARGS__
#define SOME_CONSTANT 64
#define QUOTE(...) TO_STRING(__VA_ARGS__)
const char * aString = QUOTE({
"key":"fred",
"value": TO_STRING(SOME_CONSTANT)
});
We need the DEFER macro because the preprocessor won't substitute inside something that it recognizes as an argument to another macro. The trick here, is that the x in DEFER(TO_STRING_)(x) is not an argument to a macro. So it's substituted in the same go as DEFER(TO_STRING_). And what we get as a result is TO_STRING_(substituted_x). That becomes a macro invocation in the next iteration. So the preprocessor will perform the substitution dictated by TO_STRING_, on the previously substituted x.

L_ macro in glibc source code

I was reading through the source code of glibc and I found that it has two macros which have the same name
This one is on the line 105
#define L_(Str) L##Str
and this on the line 130
#define L_(Str) Str
What do these macros really mean? They are only used for comparing two characters.
For example, on line 494 you can see it used to compare the character values of *f and '$':
if(*f == L_('$')). If we wanted to compare the two characters, couldn't we have compared them directly instead of going through a macro? Also, what is the use case for the macro on line 105?
It prepends the L prefix to the macro argument (making a wchar_t literal, which uses a datatype large enough to represent every supported character code point instead of the usual 8-bit char) if you're compiling the wscanf version of the function (line 105). Otherwise it just passes the argument through as is (line 130).
## is the token-pasting operator of the C preprocessor; L##'$' eventually expands to L'$'.
To sum up: it is used to compile two mutually exclusive versions of the vscanf function, one operating on wchar_t and one on char.
Check out this answer: What exactly is the L prefix in C++?
Let's read the code. (I have no idea what it does, but I can read code)
First, why are there two defines as you point out? One of them is used when COMPILE_WSCANF is defined, the other is used otherwise. What is COMPILE_WSCANF? If we look further down the file, we can see that different functions are defined. When COMPILE_WSCANF is defined, the function we end up with (through various macros) is vfwscanf otherwise we get vfscanf. This is a pretty good indication that this file might be used to compile two different functions one for normal characters, one for wide characters. Most likely, the build system compiles the file twice with different defines. This is done so that we don't have to write the same file twice since both the normal and wide character functions will be pretty similar.
I'm pretty sure that means that this macro has something to do with wide characters. If we look at how it's used, it is used to wrap character constants in comparisons and such. When 'x' is a normal character constant, L'x' is a wide character constant (wchar_t type) representing the same character.
So the macro is used to wrap character constants inside the code so that we don't have to have #ifdef COMPILE_WSCANF.

Are repeated constant c_strings duplicated?

So let's say, for instance, that in my program I pass a string to a method.
someMethod("hello World");
On compilation, I'm assuming the literal "hello World" is recognized as constant without directly declaring it so.
If it is recognized as constant, does the compiler store duplicates at the same address?
More specifically, in C++11?
So, as a case scenario, let's say I populate a map from strings to objects.
map<std::string,Shader> list;
list["shaders/sprite.vs"] = Shader("shaders/sprite.vs");
... (Sometime later in another file)
//Some call that needs a shader, that I have stored in a map.
SomeGLFunction("shaders/sprite.vs");
Excuse the obvious need to use a variable to hold it.
Without the "/GF" compiler option to enable string pooling, will the compiler commonly take all three literals and store them separately?
From the C++ Standard (2.13.5 String literals)
16 Evaluating a string-literal results in a string literal object with
static storage duration, initialized from the given characters as
specified above. Whether all string literals are distinct (that is,
are stored in nonoverlapping objects) and whether successive
evaluations of a string-literal yield the same or a different object
is unspecified
So it is unspecified whether identical string literals are distinct objects or not. Usually it depends on compiler options.
If you have, for example, a call like this
someMethod("hello World");
in a loop, then in practice only one string literal object is used, so the function typically gets the same address of the first character of the string literal in each iteration of the loop.
However, if you write
if ( "hello World" == "hello World" )
{
//...
}
then the condition can yield either true or false depending on the corresponding compiler option.
Maybe. A compiler should do that. It doesn't have to.

Implementation of string literal concatenation in C and C++

AFAIK, this question applies equally to C and C++
Step 6 of the "translation phases" specified in the C standard (5.1.1.2 in the draft C99 standard) states that adjacent string literals have to be concatenated into a single literal. I.e.
printf("helloworld.c" ": %d: Hello "
"world\n", 10);
Is equivalent (syntactically) to:
printf("helloworld.c: %d: Hello world\n", 10);
However, the standard doesn't seem to specify which part of the compiler has to handle this - should it be the preprocessor (cpp) or the compiler itself. Some online research tells me that this function is generally expected to be performed by the preprocessor (source #1, source #2, and there are more), which makes sense.
However, running cpp in Linux shows that cpp doesn't do it:
eliben@eliben-desktop:~/test$ cat cpptest.c
int a = 5;
"string 1" "string 2"
"string 3"
eliben@eliben-desktop:~/test$ cpp cpptest.c
# 1 "cpptest.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "cpptest.c"
int a = 5;
"string 1" "string 2"
"string 3"
So, my question is: where should this feature of the language be handled, in the preprocessor or the compiler itself?
Perhaps there's no single good answer. Heuristic answers based on experience, known compilers, and general good engineering practice will be appreciated.
P.S. If you're wondering why I care about this... I'm trying to figure out whether my Python based C parser should handle string literal concatenation (which it doesn't do, at the moment), or leave it to cpp which it assumes runs before it.
The standard doesn't specify a preprocessor vs. a compiler, it just specifies the phases of translation you already noted. Traditionally, phases 1 through 4 were in the preprocessor, phases 5 through 7 in the compiler, and phase 8 in the linker -- but none of that is required by the standard.
Unless the preprocessor is specified to handle this, it's safe to assume it's the compiler's job.
Edit:
Your "I.e." link at the beginning of the post answers the question:
Adjacent string literals are concatenated at compile time; this allows long strings to be split over multiple lines, and also allows string literals resulting from C preprocessor defines and macros to be appended to strings at compile time...
In the ANSI C standard, this detail is covered in section 5.1.1.2, item (6):
5.1.1.2 Translation phases
...
4. Preprocessing directives are executed and macro invocations are expanded. ...
5. Each source character set member and escape sequence in character constants and string literals is converted to a member of the execution character set.
6. Adjacent character string literal tokens are concatenated and adjacent wide string literal tokens are concatenated.
The standard does not define that the implementation must use a pre-processor and compiler, per se.
Step 4 is clearly a preprocessor responsibility.
Step 5 requires that the "execution character set" be known. This information is also required by the compiler. It is easier to port the compiler to a new platform if the preprocessor does not contain platform dependencies, so the tendency is to implement step 5, and thus step 6, in the compiler.
I would handle it in the token-scanning part of the parser, so in the compiler. It seems more logical. The preprocessor does not have to know the "structure" of the language, and in fact it usually ignores it, which is why macros can generate uncompilable code. It handles nothing more than what it is entitled to handle by the directives specifically addressed to it (# ...) and their "consequences" (like those of a #define x h, which makes the preprocessor change every x into h).
There are tricky rules for how string literal concatenation interacts with escape sequences.
Suppose you have
const char x1[] = "a\15" "4";
const char y1[] = "a\154";
const char x2[] = "a\r4";
const char y2[] = "al";
then x1 and x2 must wind up equal according to strcmp, and the same for y1 and y2. (This is what Heath is getting at in quoting the translation steps - escape conversion happens before string constant concatenation.) There's also a requirement that if any of the string constants in a concatenation group has an L or U prefix, you get a wide or Unicode string. Put it all together and it winds up being significantly more convenient to do this work as part of the "compiler" rather than the "preprocessor."