I'm (trying) to design a domain-specific language (I called it "Fahrenheit") for designing citation styles.
A program written in Fahrenheit:
MUST have exactly one citation block
MAY have zero or more macro blocks.
Here's a simplified yet valid example:
macro m1
"Hello World!"
end
macro m2
"Hello World!"
end
citation
"Hello World!"
end
This grammar will recognise the above code as syntactically correct:
style = macro* citation
(* example of macro definition
macro hw
"Hello World!"
end
*)
macro = <'macro'> #'[a-z0-9]+' statement+ end
citation = <'citation'> statement+ end
statement = #'".*?"'
<end> = <'end'>
However, the ordering of "blocks" (e.g. macro or citation) shouldn't matter.
Question: How should I change my grammar so that it recognises the following program as syntactically correct?
macro m1
"Hello World!"
end
citation
"Hello World!"
end
macro m2
"Hello World!"
end
PS: I intend to add other optional blocks whose order is also irrelevant.
For the 0..n rules you can put them before or after the citation. E.g.
style = tools* citation tools*
tools = macro | foo | bar
...
I came across some syntax in a C++ project I'm working with and do not know what to make of it. The compiler does not throw any errors in relation to this:
lua_pushstring(L,"swig_runtime_data_type_pointer" SWIG_RUNTIME_VERSION SWIG_TYPE_TABLE_NAME);
Notice the spaces between [what I assume to be] the function parameters.
The function definition for lua_pushstring is
LUA_API const char *(lua_pushstring) (lua_State *L, const char *s);
SWIG_RUNTIME_VERSION is a #define equal to "4"
SWIG_TYPE_TABLE_NAME is defined in the following block:
#ifdef SWIG_TYPE_TABLE
# define SWIG_QUOTE_STRING(x) #x
# define SWIG_EXPAND_AND_QUOTE_STRING(x) SWIG_QUOTE_STRING(x)
# define SWIG_TYPE_TABLE_NAME SWIG_EXPAND_AND_QUOTE_STRING(SWIG_TYPE_TABLE)
#else
# define SWIG_TYPE_TABLE_NAME
#endif
Can anyone explain what is going on here?
For further reference, the code is used in the swig project on GitHub: luarun.swg:353 and luarun.swg:364.
Static string concatenation: "Hello " "World" is the same as "Hello World".
The string constants are concatenated.
The following code produces output equal to all three strings in the parameter list.
#include <iostream>
void f(const char* s) {
std::cerr << s << std::endl;
}
int main() {
f("sksksk" "jksjksj" "sjksjks");
}
C++ (and C) will automatically concatenate adjacent string literals. So
std::cout << "Hello " "World" << std::endl;
will output "Hello World". This only applies to literals though, not to variables:
std::string a = "Hello ", b = "World";
std::string c = a b; // error: use a + b
You can use std::string's operator+ for that purpose (or strcat, but avoid that if you can).
This feature is mainly useful when we have a really long string literal that doesn't fit on one line:
process_string("The quick brown fox jumps over "
"the lazy dog");
It can also be useful with preprocessing directives, as in your example.
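To connect this to the SWIG snippet in the question, here is a minimal sketch of how the macros expand. The value my_table for SWIG_TYPE_TABLE and the function name registry_key are assumptions made for illustration; only the macro definitions and the "4" come from the question.

```cpp
#include <string>

// From the question: SWIG_RUNTIME_VERSION is "4".
// Assumed for illustration: the table name my_table is hypothetical.
#define SWIG_RUNTIME_VERSION "4"
#define SWIG_TYPE_TABLE my_table

// Macro definitions as quoted in the question.
#ifdef SWIG_TYPE_TABLE
# define SWIG_QUOTE_STRING(x) #x
# define SWIG_EXPAND_AND_QUOTE_STRING(x) SWIG_QUOTE_STRING(x)
# define SWIG_TYPE_TABLE_NAME SWIG_EXPAND_AND_QUOTE_STRING(SWIG_TYPE_TABLE)
#else
# define SWIG_TYPE_TABLE_NAME
#endif

// After preprocessing, the expression is three adjacent string literals:
//   "swig_runtime_data_type_pointer" "4" "my_table"
// which the compiler joins into a single literal.
std::string registry_key() {
    return "swig_runtime_data_type_pointer" SWIG_RUNTIME_VERSION SWIG_TYPE_TABLE_NAME;
}
```

If SWIG_TYPE_TABLE is not defined, SWIG_TYPE_TABLE_NAME expands to nothing and only the first two literals are joined.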
#include <iostream>
#include <string>
int main() {
std::string str = "hello " "world" "!";
std::cout << str;
}
The following compiles, runs, and prints:
hello world!
It seems as though the string literals are being concatenated together, but interestingly this can not be done with operator +:
#include <iostream>
#include <string>
int main() {
std::string str = "hello " + "world";
std::cout << str;
}
This will fail to compile.
Why is this behavior in the language? My theory is that it allows strings to be constructed with multiple #include statements, because #include statements are required to be on their own line. Is this behavior simply possible due to the grammar of the language, or is it an exception that was added to help solve a problem?
Adjacent string literals are concatenated. We can see this in the draft C++ standard, section 2.2 Phases of translation, paragraph 6, which says:
Adjacent string literal tokens are concatenated
In your other case, there is no operator+ defined that takes two const char* operands.
As to why, this comes from C and we can go to the Rationale for International Standard—Programming Languages—C and it says in section 6.4.5 String literals:
A string can be continued across multiple lines by using the backslash–newline line continuation, but this requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §6.10.3), the C89 Committee introduced string literal concatenation. Two string literals in a row are pasted together, with no null character in the middle, to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash–newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation.
Without this feature, you would have to do this to continue a string literal over multiple lines:
std::string str = "hello \
world\
!";
which is pretty ugly.
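The layout problem the Rationale describes can be seen by comparing the two forms. This is only a sketch with hypothetical function names; the point is that with backslash-newline, any indentation on the continuation line ends up inside the string.

```cpp
#include <cstring>

// Backslash-newline: the next line is spliced in as-is, so its leading
// spaces become part of the literal.
const char* with_backslash() {
    return "hello \
    world!";
}

// Adjacent literals: whitespace between the literals belongs to neither.
const char* with_adjacent_literals() {
    return "hello "
           "world!";
}

// Number of extra characters the continuation form picked up.
std::size_t extra_characters() {
    return std::strlen(with_backslash()) - std::strlen(with_adjacent_literals());
}
```

Here extra_characters() is non-zero because the indentation before world! is kept in the first string, while the second form is exactly "hello world!".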
As @erenon said, the compiler will merge multiple string literals into one, which is especially helpful if you want to use multiple lines like so:
cout << "This is a very long string-literal, "
"which for readability in the code "
"is divided over multiple lines.";
However, when you try to concatenate string-literals together using operator+, the compiler will complain because there is no operator+ defined for two char const *'s. The operator is defined for the string class (which is totally different from C-strings), so it is legal to do this:
string str = string("Hello ") + "world";
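For comparison, here is a small sketch of the options mentioned above. The function names are made up for this example, and the s literal suffix assumes a C++14 (or later) compiler.

```cpp
#include <string>

// Adjacent literals: joined by the compiler into one literal.
std::string by_adjacency() {
    return "Hello " "world";
}

// operator+ works once at least one operand is a std::string.
std::string by_operator_plus() {
    return std::string("Hello ") + "world";
}

// C++14 and later: the s suffix makes the literal a std::string directly.
std::string by_literal_suffix() {
    using namespace std::string_literals;
    return "Hello "s + "world";
}
```

All three functions return the same text; only the first is resolved entirely at compile time.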
The compiler concatenates the string literals automatically into a single one.
When the compiler sees "hello " + "world", it looks for a global operator+ that takes two const char* operands. Since there is none by default, compilation fails.
The "hello " "world" "!" form is resolved by the compiler into a single string literal. This lets you write a concatenated string across multiple lines.
In the first example, the consecutive string literals are concatenated by magic, before compilation has properly started. The compiler sees a single literal, as if you'd written "hello world!".
In the second example, once compilation has begun, the literals become static arrays. You can't apply + to two arrays.
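Both points can be checked at compile time. This sketch assumes a C++11 compiler; the function name joined_size is made up for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// The three adjacent literals are already one literal when types are assigned:
// 12 characters plus the terminating null character.
static_assert(sizeof("hello " "world" "!") == 13,
              "adjacent literals form a single array");

// A string literal is an array of const char, not a std::string.
static_assert(std::is_same<decltype("hi"), const char (&)[3]>::value,
              "string literals are arrays of const char");

std::size_t joined_size() {
    return sizeof("hello " "world" "!");
}
```

In "hello " + "world" both arrays decay to const char*, and adding two pointers is not allowed, which is why the second example fails to compile.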
Why is this behavior in the language?
This is a legacy of C, which comes from a time when memory was a precious resource. It allows you to do quite a lot of string manipulation without requiring dynamic memory allocation (as more modern idioms like std::string often do); the price for that is some rather quirky semantics.
The preprocessor can be used to replace certain keywords with other words using #define. For example I could do #define name "George" and every time the preprocessor finds 'name' in the program it will replace it with "George".
However, this only seems to work with code. How could I do this with strings and text? For example if I print "Hello I am name" to the screen, I want 'name' to be replaced with "George" even though it is in a string and not code.
I do not want to manually search the string for keywords and then replace them, but instead want to use the preprocessor to just switch the words.
Is this possible? If so how?
I am using C++ but C solutions are also acceptable.
#define name "George"
printf("Hello I am " name "\n");
Adjacent string literals are concatenated in C and C++.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string literals are concatenated."
EDIT: as requested, added quotes from the C and C++ Standards. Thanks to @MatteoItalia for the C++11 quote.
#define name "George"
printf("Hello I am %s\n", name);
Here name will be replaced by "George"
Your issue is that the preprocessor will (wisely) not replace tokens that are inside string literals.
So you must either use a function like printf or a variable rather than the preprocessor, or pull the token out of the string like so:
#include <iostream>
#define name "George"
int main(int argc, char** argv) {
std::cout << "Hello I am " << name << std::endl;
}
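To make the difference concrete, this sketch contrasts a macro name written inside a literal with one placed next to it. The macro is spelled NAME here rather than name only to avoid clashing with ordinary identifiers, and the function names are hypothetical.

```cpp
#include <string>

#define NAME "George"

// Inside a string literal the preprocessor leaves the text alone.
std::string inside_literal() {
    return "Hello I am NAME";
}

// Outside the literal, NAME is replaced, and the adjacent literals are joined.
std::string next_to_literal() {
    return "Hello I am " NAME;
}
```

The first function returns the text unchanged, including the word NAME; the second returns "Hello I am George".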
is it possible to concatenate strings during preprocessing?
I found this example
#define H "Hello "
#define W "World!"
#define HW H W
printf(HW); // Prints "Hello World!"
However, it does not work for me: it prints "Hello" when I use gcc -std=c99.
UPD: This example seems to work now. However, is this a normal feature of the C preprocessor?
Concatenation of adjacent string literals isn't a feature of the preprocessor; it is a feature of the core languages (both C and C++). You could write:
printf("Hello "
" world\n");
You can indeed concatenate tokens in the preprocessor, but be careful because it's tricky. The key is the ## operator. If you were to throw this at the top of your code:
#define myexample(x,y,z) int example_##x##_##y##_##z = x##y##z
then basically, what this does, is that during preprocessing, it will take any call to that macro, such as the following:
myexample(1,2,3);
and it will literally turn into
int example_1_2_3 = 123;
This allows you a ton of flexibility while coding if you use it correctly, but it doesn't exactly apply how you are trying to use it. With a little massaging, you could get it to work though.
One possible solution for your example might be:
#define H "Hello "
#define W "World!"
#define concat_and_print(a, b) cout << a << b << endl
and then do something like
concat_and_print(H,W);
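As a sketch of the token pasting described above, the macro below builds both an identifier and a number from its arguments. The names MAKE_EXAMPLE and pasted_value are made up for this example.

```cpp
// example_ ## x ## _ ## y ## _ ## z  -> one identifier, e.g. example_1_2_3
// x ## y ## z                        -> one number token, e.g. 123
#define MAKE_EXAMPLE(x, y, z) int example_##x##_##y##_##z = x##y##z

int pasted_value() {
    MAKE_EXAMPLE(1, 2, 3);   // expands to: int example_1_2_3 = 123;
    return example_1_2_3;
}
```

Note that ## cannot be used to join two string literals, because the result would not be a single valid token; adjacent literals need no operator at all.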
From gcc online docs:
The '##' preprocessing operator performs token pasting. When a macro is expanded, the two tokens on either side of each '##' operator are combined into a single token, which then replaces the '##' and the two original tokens in the macro expansion.
Consider a C program that interprets named commands. There probably needs to be a table of commands, perhaps an array of structures declared as follows:
struct command
{
char *name;
void (*function) (void);
};
struct command commands[] =
{
{ "quit", quit_command },
{ "help", help_command },
...
};
It would be cleaner not to have to give each command name twice, once in the string constant and once in the function name. A macro which takes the name of a command as an argument can make this unnecessary. The string constant can be created with stringification, and the function name by concatenating the argument with _command. Here is how it is done:
#define COMMAND(NAME) { #NAME, NAME ## _command }
struct command commands[] =
{
COMMAND (quit),
COMMAND (help),
...
};
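A compilable version of the documentation's example might look like the sketch below. The bodies of quit_command and help_command, the last_called counter, and the helper run are assumptions added so the table can be exercised; the COMMAND macro itself is as quoted.

```cpp
#include <cstring>

struct command {
    const char *name;
    void (*function)(void);
};

// Hypothetical command bodies; they only record which one was called.
static int last_called = 0;
void quit_command(void) { last_called = 1; }
void help_command(void) { last_called = 2; }

// #NAME stringifies the argument; NAME ## _command pastes it onto the suffix.
#define COMMAND(NAME) { #NAME, NAME ## _command }

struct command commands[] = {
    COMMAND(quit),   // { "quit", quit_command }
    COMMAND(help),   // { "help", help_command }
};

// Look a command up by name and call it; returns false if it is unknown.
bool run(const char *name) {
    for (const command &c : commands) {
        if (std::strcmp(c.name, name) == 0) {
            c.function();
            return true;
        }
    }
    return false;
}
```

Calling run("help") finds the entry whose name was produced by stringification and invokes the function whose identifier was produced by token pasting.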
I just thought I would add an answer that cites the source as to why this works.
The C99 standard §5.1.1.2 defines translation phases for C code. Subsection 6 states:
Adjacent string literal tokens are concatenated.
Similarly, §2.1 of the C++ standard (ISO 14882) defines the phases of translation. Here, subsection 6 states:
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
This is why you can concatenate strings simply by placing them adjacent to one another:
printf("string"" one\n");
>> ./a.out
>> string one
The preprocessing part of the question is simply the usage of the #define preprocessing directive which does the substitution from identifier (H) to string ("Hello ").
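The two steps can be seen separately in a small sketch. The macros H, W and HW are those from the question; the function names greeting and greeting_size are hypothetical.

```cpp
#include <cstddef>

#define H "Hello "
#define W "World!"
#define HW H W   // expands to: "Hello " "World!"

// The preprocessor only substitutes; the compiler then joins the two literals.
const char* greeting() {
    return HW;
}

// 12 characters plus the terminating null character.
std::size_t greeting_size() {
    return sizeof(HW);
}
```

Because sizeof(HW) sees one array of 13 chars, the join has clearly happened before any run-time code executes.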
#define PR ( A, B ) cout << ( A ) << ( B ) << endl ;
- error -> A was not declared in scope
- error -> B was not declared in scope
- error -> expected "," before "cout"
I thought C++ was a space-free language, but when I write the code above I get these errors.
I keep wondering whether my console or a library is not working properly.
If I am not wrong, how can anyone say "C++ is a space-free language"?
There are numerous exceptions where whitespace matters; this is one of them. With the space after PR, how is the preprocessor supposed to know whether (A,B) is part of the macro expansion, or its arguments? It doesn't, and simply assumes that wherever it sees PR, it should substitute ( A, B ) cout << ( A ) << ( B ) << endl ;.
Another place where whitespace matters is in nested template arguments, e.g.:
std::vector<std::vector<int> >
That final space is mandatory in C++03; otherwise the compiler treats it as the >> operator. (This was fixed in C++11, known as C++0x during its development.)
Yet another example is:
a + +b;
The space in between the two + symbols is mandatory, for obvious reasons.
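Both cases can be seen in a short sketch (the function names are hypothetical):

```cpp
#include <vector>

// Before C++11 the space in "> >" was required; newer standards accept ">>" too.
std::vector<std::vector<int> > make_grid() {
    return std::vector<std::vector<int> >(2, std::vector<int>(3, 0));
}

// "a + +b" is a, plus, unary-plus b. Without the space, "a ++b" would be
// lexed as "a++ b", which does not compile.
int add_with_unary_plus(int a, int b) {
    return a + +b;
}
```

The lexer always takes the longest token it can, which is why the space is what separates two + tokens from a single ++.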
You can't have a space between the macro-function-name and the parenthesis starting the argument list.
#define PR(A, B) cout << ( A ) << ( B ) << endl
Whitespace in the form of the newline also matters, because a #define statement ends when the preprocessor hits the newline.
Note that it's usually a bad idea to put semicolons at the end of function-like macro definitions; it makes them look confusing when used without a semicolon below.
A #define is not C++; it's a preprocessor directive. The rules of C++ aren't the same as the rules of the preprocessor.
To define a function-like macro, you mustn't have a space between the name and the opening parenthesis.
#define PR(A, B) cout << ( A ) << ( B ) << endl;
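The difference can be made visible by stringifying what each form expands to. This is a sketch; STR, STR2, OBJECT_LIKE and FUNCTION_LIKE are names invented for the example.

```cpp
// Two-level stringification so the argument is expanded before being quoted.
#define STR2(x) #x
#define STR(x) STR2(x)

// Space before '(': an object-like macro whose replacement text
// begins with the tokens ( A , B ).
#define OBJECT_LIKE (A, B) A + B

// No space: a function-like macro with parameters A and B.
#define FUNCTION_LIKE(A, B) A + B

// Text that the bare name OBJECT_LIKE is replaced with.
const char* object_like_text()   { return STR(OBJECT_LIKE); }

// Text that the call FUNCTION_LIKE(1, 2) is replaced with.
const char* function_like_text() { return STR(FUNCTION_LIKE(1, 2)); }
```

The object-like text still starts with the parenthesised "parameter list", which is exactly what ends up in front of cout in the failing PR definition, while the function-like text has the arguments substituted in.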
You're asking for defense of a claim I've never heard anyone bother to voice...?
The preprocessor stage doesn't follow the same rules as the later lexing etc. stages. There are other quirks: the need for a space between > closing templates, newline-delimited comments, string literals can't embed actual newlines (as distinct from escape sequences for them), space inside character and string literals affects them....
Still, there's a lot of freedom to indent and line-delimit the code in different ways, unlike in say Python.
You can think of preprocessor directives as instructions to the preprocessor (part of the compiler) and not exactly part of the "C++ space". So the rules are indeed different, although many references are shared between the two "spaces".