How to stringize string with trailing backslash - c++

When I build my C++ project the compiler generates this equivalent macro:
#define SOLUTION_DIR "c:\dev\my_project\"
In an ordinary #define, the trailing backslash would escape the closing double quote and trigger a compiler error because of the unterminated string. The compiler, however, injects this definition itself and makes it available to the code literally, even though the string is invalid.
The usual way to expand macro values to C strings:
#define STRINGIZE( x ) #x
#define EXPAND( x ) STRINGIZE( x )
doesn't work in this case due to the unterminated string passed as argument.
std::string s = EXPAND( SOLUTION_DIR );
...
error: newline in constant
Is there a way to extract the string value of this macro and use it in my code equivalent to:
std::string str = R"(c:\dev\my_project\)";
where R is the raw string literal prefix described here https://en.cppreference.com/w/cpp/language/string_literal
Notes:
- I tried rewriting these macros using the R prefix to avoid escaping the final quote mark, but couldn't get to a functional version.
- I can tell the compiler to define the SOLUTION_DIR string without the surrounding quotes, but I can't avoid the trailing backslash. In this case, however, I get other warnings and errors due to the unknown escape sequences (\d) and the fact that the trailing backslash is taken to indicate that the macro continues on the next line.
Update:
Here's the context for those who think something is broken and needs to be fixed.
I use Visual Studio 2019 (VS). In the project properties "C++/Preprocessor/Preprocessor Definitions" one can define various macros in the format:
NAME1=VALUE1;NAME2=VALUE2;...
which are then made available at compile time as
#define NAME1 VALUE1
#define NAME2 VALUE2
VS generates a number of predefined macros (not C++ but build environment macros) for various directories and other values (debug/release, 32 or 64 bit etc). They take the form $(Name) and are set to some string value such as:
$(Configuration) Debug
$(SolutionDir) C:\dev\some_project\
They are used to create location independent project settings such as the temp or binary output directories, or set the correct environment for whatever version of the project is being built (for instance Debug/x64).
In my case I need to get a hold of the current solution path directly in my code, and using the $(SolutionDir) VS macro seemed the easiest way to do it.
So here's how I defined my SOLUTION_DIR macro in "Properties/Preprocessor/Preprocessor Definitions":
SOLUTION_DIR="$(SolutionDir)"
which translates into the compile time macro described initially:
#define SOLUTION_DIR "c:\dev\my_project\"
However, by default many macros that expand to paths, including $(SolutionDir), end with a trailing backslash that can't be removed, hence the "broken" macro above.
Generally an executable binary doesn't need to and should not know anything about its build directories, so the path related macros are not necessarily designed to be used to define C++ macros, and the trailing backslash is not an issue. But my project needs that information because it itself triggers other build actions that depend on the current environment.
So this is not a malfunction of any of the components, everything works as designed, it just happens that for my specific project it would be very useful to be able to do things this way, even if it's non-standard.

I was able to make this work by adding a trailing ".":
SOLUTION_DIR="$(SolutionDir)."
which results in the equivalent:
#define SOLUTION_DIR "C:\dev\my_project\."
which points to the same directory and now compiles with no errors.
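If the extra "." is unwanted at run time, it can be stripped again after the fact. This is only a minimal sketch, assuming the macro was defined with the workaround above; solution_dir_from is a hypothetical helper name, and the literal in the check uses doubled backslashes so that the sketch compiles on its own:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: removes the "." appended by the workaround,
// but only when it directly follows a path separator.
inline std::string solution_dir_from(std::string path) {
    const std::string::size_type n = path.size();
    if (n >= 2 && path[n - 1] == '.' &&
        (path[n - 2] == '\\' || path[n - 2] == '/')) {
        path.pop_back();  // drop the trailing '.'
    }
    return path;
}

// Usage check with a hand-written stand-in for the macro value.
inline void check_solution_dir() {
    assert(solution_dir_from("C:\\dev\\my_project\\.") == "C:\\dev\\my_project\\");
}
```

In a real project the argument would be SOLUTION_DIR itself rather than a hand-written literal.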

Related

L_ macro in glibc source code

I was reading through the source code of glibc and I found that it has two macros which have the same name
This one is on the line 105
#define L_(Str) L##Str
and this on the line 130
#define L_(Str) Str
What do these macros really mean? They are only used when comparing two characters.
For example, on line 494 it is used to compare the character values of *f and '$':
if(*f == L_('$')). If we wanted to compare the two characters, couldn't we have compared them directly instead of routing them through a macro? Also, what is the use case for the macro on line 105?
It prepends the L prefix to the macro argument, turning it into a wide (wchar_t) literal whose type is large enough to represent every supported character code instead of the usual 8-bit char, if you're compiling the wscanf version of the function (line 105). Otherwise it passes the argument through unchanged (line 130).
## is the token-pasting operator in the C preprocessor; L##'$' eventually expands to L'$'.
To sum up: it is used to compile two mutually exclusive versions of the vscanf function, one operating on wchar_t and one on char.
Check out this answer: What exactly is the L prefix in C++?
Let's read the code. (I have no idea what it does, but I can read code)
First, why are there two defines as you point out? One of them is used when COMPILE_WSCANF is defined, the other is used otherwise. What is COMPILE_WSCANF? If we look further down the file, we can see that different functions are defined. When COMPILE_WSCANF is defined, the function we end up with (through various macros) is vfwscanf otherwise we get vfscanf. This is a pretty good indication that this file might be used to compile two different functions one for normal characters, one for wide characters. Most likely, the build system compiles the file twice with different defines. This is done so that we don't have to write the same file twice since both the normal and wide character functions will be pretty similar.
I'm pretty sure that means that this macro has something to do with wide characters. If we look at how it's used, it is used to wrap character constants in comparisons and such. When 'x' is a normal character constant, L'x' is a wide character constant (wchar_t type) representing the same character.
So the macro is used to wrap character constants inside the code so that we don't have to have #ifdef COMPILE_WSCANF.
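The pattern described above can be sketched in a few lines. This is not the glibc code: COMPILE_WIDE, char_type and count_dollars are hypothetical names chosen for illustration, with COMPILE_WIDE standing in for COMPILE_WSCANF.

```cpp
#include <cassert>
#include <cstddef>

// COMPILE_WIDE is a hypothetical switch standing in for COMPILE_WSCANF.
#ifdef COMPILE_WIDE
# define L_(Str) L##Str      // L_('$') becomes L'$'
typedef wchar_t char_type;
#else
# define L_(Str) Str         // L_('$') stays '$'
typedef char char_type;
#endif

// One function body serves both builds: every literal wrapped in L_()
// has the same character type as the data it is compared against.
inline std::size_t count_dollars(const char_type* f) {
    assert(f != nullptr);
    std::size_t n = 0;
    for (; *f != L_('\0'); ++f) {
        if (*f == L_('$')) ++n;
    }
    return n;
}
```

Compiling the same file once with and once without -DCOMPILE_WIDE would give a wide and a narrow version of count_dollars without any #ifdef inside the function.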

define string at compiler options

Using Tornado 2.2.1 GNU
In the C/C++ compiler options I'm trying to define a string as follows:
-DHELLO="Hello" and it doesn't work (it also fails for -DHELLO=\"Hello\" and for -DHELLO=\\"Hello\\", which work on other platforms).
Defining a value with -DVALUE=12 works without issue.
Does anybody know the proper way to define a string in Tornado?
The problem with such a macro is that it normally isn't a string (in the C/C++ sense), just a preprocessor symbol. With numbers it does work, because a preprocessor number can be used in C/C++ as is, but if you want to convert a string symbol to a C/C++ string (besides adding the escaped quotes) you need to "stringize" it.
So, this should work (without extra escaped quotes):
#define STRINGIZE_IMPL(x) #x
#define STRINGIZE(x) STRINGIZE_IMPL(x)
std::string s = STRINGIZE(HELLO);
(note the double expansion to get the value of the macro stringized, i.e. "Hello", instead of the macro name itself, i.e. "HELLO")
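The difference between one and two levels of expansion can be checked with a small sketch. The #define of HELLO below only stands in for passing -DHELLO=Hello on the command line, and the helper names (STRINGIZE_IMPL, hello_value, hello_name) are chosen for this illustration:

```cpp
#include <cassert>
#include <string>

// Stands in for -DHELLO=Hello on the command line (no quotes needed).
#ifndef HELLO
#define HELLO Hello
#endif

#define STRINGIZE_IMPL(x) #x
#define STRINGIZE(x) STRINGIZE_IMPL(x)

// Two levels: HELLO is macro-expanded first, then stringized.
inline std::string hello_value() { return STRINGIZE(HELLO); }

// One level: the argument is stringized exactly as written.
inline std::string hello_name() { return STRINGIZE_IMPL(HELLO); }

inline void check_hello() {
    assert(hello_value() == "Hello");
    assert(hello_name() == "HELLO");
}
```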

Trying to understand the C preprocessor

Why do these blocks of code yield different results?
Some common code:
#define PART1PART2 works
#define STRINGAFY0(s) #s
#define STRINGAFY1(s) STRINGAFY0(s)
case 1:
#define GLUE(a,b,c) a##b##c
STRINGAFY1(GLUE(PART1,PART2,*))
//yields
"PART1PART2*"
case 2:
#define GLUE(a,b) a##b##*
STRINGAFY1(GLUE(PART1,PART2))
//yields
"works*"
case 3:
#define GLUE(a,b) a##b
STRINGAFY1(GLUE(PART1,PART2*))
//yields
"PART1PART2*"
I am using MSVC++ from VS.net 2005 sp1
Edit:
it is currently my belief that the preprocessor works like this when expanding macros:
Step 1:
- take the body
- remove any whitespace around ## operators
- parse the string; whenever an identifier is found that matches the name of a parameter:
  - if it is next to a ## operator, replace the identifier with the literal value of the parameter (i.e. the string passed in)
  - if it is NOT next to a ## operator, run this whole explanation process on the value of the parameter first, then replace the identifier with that result
  (ignoring the stringafy single '#' case atm)
- remove all ## operators
Step 2:
- take that resultant string and parse it for any macros
now, from that I believe that all 3 cases should produce the exact same resultant string:
PART1PART2*
and hence after step 2, should result in
works*
but at very least should result in the same thing.
Cases 1 and 2 have no defined behavior, since you are attempting to paste a * into one preprocessor token. Depending on the association rules of your preprocessor, this tries to glue together either the tokens PART1PART2 (or just PART2) and *. In your case this probably fails silently, which is one of the possible outcomes when behavior is undefined. The token PART1PART2 followed by * will then not be considered for macro expansion again. Stringification then produces the result you see.
My gcc behaves differently on your examples:
/usr/bin/gcc -O0 -g -std=c89 -pedantic -E test-prepro.c
test-prepro.c:16:1: error: pasting "PART1PART2" and "*" does not give a valid preprocessing token
"works*"
So to summarize, your case 1 has two problems:
- pasting two tokens that don't result in a valid preprocessor token
- the evaluation order of the ## operator
In case 3, your compiler is giving the wrong result. It should:
- evaluate the arguments to STRINGAFY1
- to do that, it has to expand GLUE
- GLUE results in PART1PART2*
- which must be expanded again
- the result is works*
- which is then passed to STRINGAFY1
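The expansion steps for case 3 can be reproduced with a short sketch, assuming a standard-conforming preprocessor. The helper functions are hypothetical, and because I would not rely on the exact spacing between tokens in a stringized result, the sketch strips spaces before comparing:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

#define PART1PART2 works
#define STRINGAFY0(s) #s
#define STRINGAFY1(s) STRINGAFY0(s)
#define GLUE(a,b) a##b

// Removes spaces so the comparison does not depend on token spacing
// (an assumption made for this sketch).
inline std::string without_spaces(std::string s) {
    s.erase(std::remove(s.begin(), s.end(), ' '), s.end());
    return s;
}

// GLUE pastes PART1 and PART2 into PART1PART2; the * stays a separate token.
// PART1PART2 is then rescanned and replaced by works before stringizing.
inline std::string case3() {
    return without_spaces(STRINGAFY1(GLUE(PART1,PART2*)));
}

// With a single level of stringizing the argument is not expanded at all.
inline std::string case3_unexpanded() {
    return without_spaces(STRINGAFY0(GLUE(PART1,PART2*)));
}

inline void check_case3() {
    assert(case3() == "works*");
}
```

Only the first token of the second argument takes part in the paste, so no invalid token is formed here, unlike in cases 1 and 2.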
It's doing exactly what you are telling it to do. The first and second take the symbol names passed in and paste them together into a new symbol. The third takes 2 symbols and pastes them, then you are placing the * in the string yourself (which will eventually evaluate into something else.)
What exactly is the question with the results? What did you expect to get? It all seems to be working as I would expect it to.
Then of course is the question of why are you playing with the dark arts of symbol munging like this anyways? :)

Is there a way to 'expand' the #define directive?

I have a lot of "stupid" #define in a project and I want to remove them. Unfortunately, I can't do a simple search and replace, since the #define is parameterized. For example:
#define FHEADGRP( x ) bool _process_grp##x( grp_id_t , unsigned char )
This is used to generate the headers of a couple of functions. I would like to do the same thing the preprocessor does: replace each use of the macro with its result (with the correct parameters inserted). I hope you understand what I want to do.
I found out that with Visual Studio, one can get the preprocessed intermediate files with the /P option. Unfortunately, this does not help me, since the file is "polluted" with thousands of other lines and with all #defines expanded. I do not want to do this, I just want to expand some of the macros and preferably do it in my IDE (which is Visual Studio). Is there any way how to achieve this?
You can normally get the output of the preprocessor with gcc -E (assuming you're using gcc of course, though other compilers tend to have the same feature).
Of course, processing that file to automatically expand the #define's into other text is not a trivial task. I'd probably write a shell script (or Perl since it's a lot better at massaging text in my opinion) to automate the task.
In Visual Studio, you can use /P to perform the same operation. This can be set in the IDE according to this page.
Yes, there is - since you're using Visual Studio.
The Visual Studio IDE has a powerful search & replace mechanism. You seem to assume it can only handle literal strings. It can do more. Hit Ctrl-Shift-H for a global search and replace. In the "Find options", select "Use: Wildcards".
Now replace FHEADGRP(*) by bool _process_grp\1( grp_id_t , unsigned char )
The wildcard is *, and \1 is the backreference.
[edit]
Macros work on the tokenized source, but Search&Replace works on characters. This can cause a slight problem. Consider the cases FHEADGRP(Foo) and FHEADGRP( Foo ). For a C macro, they're equivalent, but in the second case the backreference will expand to Foo - with spaces.
The workaround is to use regexes, in particular replace FHEADGRP\(:b*(.*):b*\) with bool _process_grp\0( grp_id_t , unsigned char ). I find that the VS2005 implementation is a bit buggy; for instance the simple ? expression fails to match a single space. But the example above should work.
Uh I would advise you to use sed, http://www.gnu.org/software/sed/, or another regex tool.
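The same substitution can also be sketched outside the IDE with std::regex. This is a rough illustration only, assuming the macro argument is a single identifier; expand_fheadgrp is a hypothetical helper, and the pattern works on raw characters, so it would also rewrite matches inside comments or string literals:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replaces every FHEADGRP( x ) with the text the macro would produce.
// Whitespace around the argument is skipped, so FHEADGRP(Foo) and
// FHEADGRP( Foo ) give the same result.
inline std::string expand_fheadgrp(const std::string& source) {
    static const std::regex call(R"(FHEADGRP\(\s*(\w+)\s*\))");
    return std::regex_replace(
        source, call, "bool _process_grp$1( grp_id_t , unsigned char )");
}

inline void check_expand() {
    assert(expand_fheadgrp("FHEADGRP( Foo );") ==
           "bool _process_grpFoo( grp_id_t , unsigned char );");
}
```

Run over each source file's contents, this expands only the one macro and leaves every other #define untouched, which is what /P and gcc -E cannot do.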

Should I use _T or _TEXT on C++ string literals?

For example:
// This will become either SomeMethodA or SomeMethodW,
// depending on whether _UNICODE is defined.
SomeMethod( _T( "My String Literal" ) );
// Becomes either AnotherMethodA or AnotherMethodW.
AnotherMethod( _TEXT( "My Text" ) );
I've seen both. _T seems to be for brevity and _TEXT for clarity. Is this merely a subjective programmer preference or is it more technical than that? For instance, if I use one over the other, will my code not compile against a particular system or some older version of a header file?
A simple grep of the SDK shows us that the answer is that it doesn't matter—they are the same. They both turn into __T(x).
C:\...\Visual Studio 8\VC>findstr /spin /c:"#define _T(" *.h
crt\src\tchar.h:2439:#define _T(x) __T(x)
include\tchar.h:2390:#define _T(x) __T(x)
C:\...\Visual Studio 8\VC>findstr /spin /c:"#define _TEXT(" *.h
crt\src\tchar.h:2440:#define _TEXT(x) __T(x)
include\tchar.h:2391:#define _TEXT(x) __T(x)
And for completeness:
C:\...\Visual Studio 8\VC>findstr /spin /c:"#define __T(" *.h
crt\src\tchar.h:210:#define __T(x) L ## x
crt\src\tchar.h:889:#define __T(x) x
include\tchar.h:210:#define __T(x) L ## x
include\tchar.h:858:#define __T(x) x
However, technically, for C++ you should be using TEXT() instead of _TEXT(), but it (eventually) expands to the same thing too.
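The layering shown by the findstr output can be imitated in a portable sketch. The MY_* names and MY_UNICODE are hypothetical stand-ins for _T, _TEXT, __T and _UNICODE; they are not the real tchar.h macros:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// MY_UNICODE and the MY_* names are hypothetical stand-ins for
// _UNICODE, __T, _T and _TEXT.
#ifdef MY_UNICODE
# define MY_T_IMPL(x) L ## x   // wide literal
typedef wchar_t my_tchar;
#else
# define MY_T_IMPL(x) x        // narrow literal, unchanged
typedef char my_tchar;
#endif
#define MY_T(x)    MY_T_IMPL(x)
#define MY_TEXT(x) MY_T_IMPL(x)

// Both spellings forward to the same macro, so the literals they
// produce have exactly the same type.
static_assert(std::is_same<decltype(MY_T("x")), decltype(MY_TEXT("x"))>::value,
              "MY_T and MY_TEXT should be interchangeable");

inline bool same_text() {
    return std::basic_string<my_tchar>(MY_T("My Text")) == MY_TEXT("My Text");
}

inline void check_same_text() {
    assert(same_text());
}
```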
Commit to Unicode and just use L"My String Literal".
From Raymond Chen:
TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE
The plain versions without the underscore affect the character set the Windows header files treat as default. So if you define UNICODE, then GetWindowText will map to GetWindowTextW instead of GetWindowTextA, for example. Similarly, the TEXT macro will map to L"..." instead of "...".
The versions with the underscore affect the character set the C runtime header files treat as default. So if you define _UNICODE, then _tcslen will map to wcslen instead of strlen, for example. Similarly, the _TEXT macro will map to L"..." instead of "...".
What about _T? Okay, I don't know about that one. Maybe it was just to save somebody some typing.
Short version: _T() is a lazy man's _TEXT()
Note: You need to be aware of what code-page your source code text editor is using when you write:
_TEXT("Some string containing Çontaining");
TEXT("€xtended characters.");
The bytes the compiler sees depend on the code page of your editor.
Here's an interesting read from a well-known and respected source.
Similarly, the _TEXT macro will map to L"..." instead of "...".
What about _T? Okay, I don't know about that one. Maybe it was just to save somebody some typing.
These macros are a holdover from the days when an application might have actually wanted to compile both a Unicode and an ANSI version.
There is no reason to do this today; this is all vestigial. Microsoft is stuck with supporting every possible configuration forever, but you aren't. If you are not compiling to both ANSI and Unicode (and no one is, let's be honest) just go with L"text".
And yes, in case it wasn't clear by now: _T == _TEXT
I've never seen anyone use _TEXT() instead of _T().
Neither. In my experience there are two basic types of string literals, those that are invariant, and those that need to be translated when your code is localized.
It's important to distinguish between the two as you write the code so you don't have to come back and figure out which is which later.
So I use _UT() for untranslatable strings, and ZZT() (or something else that is easy to search on) for strings that will need to be translated. Instances of _T() or _TEXT() in the code are evidence of string literals that have not yet been correctly categorized.
_UT and ZZT are both #defined to _TEXT
Use neither, and also please don't use the L"..." crap.
Use UTF-8 for all strings, and convert them just before passing them to Microsoft APIs.