Is C++ a space-free language?

#define PR ( A, B ) cout << ( A ) << ( B ) << endl ;
- error -> A was not declared in scope
- error -> B was not declared in scope
- error -> expected "," before "cout"
I thought C++ was a space-free language, but when I write the above code I get these errors.
I'm now wondering whether my console or the library is not working properly.
If I'm not wrong, how can anyone say "C++ is a space-free language"?

There are numerous exceptions where whitespace matters; this is one of them. With the space after PR, how is the preprocessor supposed to know whether (A,B) is part of the macro's replacement text or its parameter list? By rule it treats it as replacement text: PR becomes an object-like macro, and wherever it sees PR, it simply substitutes ( A, B ) cout << ( A ) << ( B ) << endl ;.
Another place where whitespace matters is in nested template arguments, e.g.:
std::vector<std::vector<int> >
That final space was mandatory before C++11, otherwise the compiler treated the two characters as the >> operator. C++11 fixed this: a >> that closes nested template argument lists is now parsed as two > tokens.
Yet another example is:
a + +b;
The space between the two + symbols is mandatory: without it, the lexer greedily reads ++ as a single token, so a++b lexes as a ++ b, which is a syntax error.

You can't have a space between the macro-function-name and the parenthesis starting the argument list.
#define PR(A, B) cout << ( A ) << ( B ) << endl
Whitespace in the form of the newline also matters, because a #define statement ends when the preprocessor hits the newline.
Note that it's usually a bad idea to put a semicolon at the end of a function-like macro definition: call sites then look confusing when written without a trailing semicolon, and writing one anyway produces an extra empty statement.

A #define is not C++ proper; it's a preprocessor directive. The rules of C++ aren't the same as the rules of the preprocessor.
To indicate a macro, you mustn't have a space between the name and the parenthesis.
#define PR(A, B) cout << ( A ) << ( B ) << endl;

You're asking for a defense of a claim I've never heard anyone bother to voice...?
The preprocessor stage doesn't follow the same rules as the later lexing and parsing stages. There are other quirks too: the need (before C++11) for a space between the > characters closing nested templates, // comments that end at the newline, string literals that can't embed actual newlines (as distinct from escape sequences for them, and outside C++11 raw string literals), whitespace inside character and string literals being part of their value...
Still, there's a lot of freedom to indent and line-break the code in different ways, unlike in, say, Python.

You can think of preprocessor directives as instructions to the preprocessor (a part of the compiler) rather than as part of the C++ language itself. So the rules are indeed different, although the two have a lot in common.

Related

Sign & Unsigned Char is not working in C++

In C++ Primer 5th Edition I saw this
when I tried to use it---
It didn't work: the program's output was a weird symbol for the unsigned one, while the signed one was totally blank. I also got some warnings when I compiled it. But C++ Primer and many websites say it should work, so I don't think they're giving wrong information. Did I do something wrong?
I am newbie btw :)
But C++ primer ... said it should work
No, it doesn't. The quote from C++ Primer doesn't use std::cout at all. The output that you see doesn't contradict what the book says.
So I don't think they give the wrong information
No.¹
did I do something wrong?
It seems that you've possibly misunderstood what the value of a character means, or possibly misunderstood how character streams work.
Character types are integer types (but not all integer types are character types). The values of unsigned char are 0..255 (on systems where a byte is 8 bits). Each² of those values represents some textual symbol. The mapping from a set of values to a set of symbols is called a "character set" or "character encoding".
std::cout is a character stream, and << is the stream insertion operator. When you insert a character into a stream, the behaviour is not to show its numerical value. Instead, the behaviour is to show the symbol that the value is mapped to³ in the character set that your system uses. In this case, it appears that the value 255 is mapped to whatever strange symbol you saw on the screen.
If you wish to print the numerical value of a character, what you can do is convert to a non-character integer type and insert that to the character stream:
int i = c;
std::cout << i;
¹ At least, there's no wrong information regarding your confusion. The quote is a bit inaccurate and outdated in the case of c2: before C++20, the value was "implementation-defined" rather than "undefined". Since C++20, the value is actually defined, and it is 0, which is the null terminator character that signifies the end of a string. If you try to print this character, you'll see no output.
² This was a bit of a lie for simplicity's sake. Some characters are not visible symbols: for example, there is the null terminator character as well as other control characters. The situation becomes even more complex with variable-width encodings such as UTF-8, the ubiquitous Unicode encoding, where a symbol may consist of a sequence of several char. In such an encoding, an individual char cannot necessarily be interpreted correctly without the other char that are part of the same sequence.
³ And this behaviour should feel natural once you grok the purpose of character types. Consider the following program:
unsigned char c = 'a';
std::cout << c;
It would be highly confusing if the output were a number that is the value of the character (such as 97, which may be the value of the symbol 'a' on your system) rather than the symbol 'a'.
For extra meditation, think about what this program might print (and feel free to try it out):
char c = 57;
std::cout << c << '\n';
int i = c;
std::cout << i << '\n';
c = '9';
std::cout << c << '\n';
i = c;
std::cout << i << '\n';
This is due to the behavior of the << operator for the char type and the character stream cout. Note that << performs formatted output, which means it does some implicit formatting.
We can say that the value of a variable is not the same as its representation in certain contexts. For example:
#include <iostream>
int main() {
bool t = true;
std::cout << t << std::endl; // Prints 1, not "true"
}
Think of it this way: why would we need char if it still behaved like a number when printed? Why not just use int or unsigned? In essence, we have different types precisely so that they can have different behaviors, which can be deduced from the types themselves.
So the underlying numeric value of a char is probably not what we're looking for when we print one.
Check this for example:
#include <iostream>
int main() {
unsigned char c = -1;
int i = c;
std::cout << i << std::endl; // Prints 255
}
If I recall correctly, you're getting close in the Primer to the topic of conversions between built-in types; things will become clearer once you get to know these rules better. Anyway, I'm sure you will benefit greatly from looking into this article, especially the "Printing chars as integers via type casting" part.

Varying case-insensitive string comparisons performance

So, my PhD project relies on a piece of software I've been building for nearly 3 years. It runs, it's stable (it doesn't crash or throw exceptions), and I'm playing with the release version of it. And I've come to realise that there is a huge performance hit, because I'm relying too much on boost::iequals.
I know there's a lot on SO about this; this is not a question on how to do it, but rather on why this is happening.
Consider the following:
#include <strings.h> // strcasecmp (POSIX)
#include <iostream>
#include <string>
#include <boost/algorithm/string.hpp>
void posix_str ( )
{
std::string s1 = "Alexander";
std::string s2 = "Pericles";
std::cout << "POSIX strcasecmp: " << strcasecmp( s1.c_str(), s2.c_str() ) << std::endl;
}
void boost_str ( )
{
std::string s1 = "Alexander";
std::string s2 = "Pericles";
std::cout << "boost::iequals: " << boost::iequals( s1, s2 ) << std::endl;
}
int main ( )
{
posix_str();
boost_str();
return 0;
}
I put this through valgrind and cachegrind, and to my surprise, boost is 4 times slower than the native POSIX or the std methods (which appear to use the same POSIX function). Four times is a lot, even considering that C++ offers a nice safety net. Why is that? I would really like other people to run this and explain to me what causes such a performance hit. Is it all the allocations (that's what the caller map seems to suggest)?
I'm not dissing Boost; I love it and use it everywhere and anywhere.
EDIT: This graph shows what I mean
boost::iequals is locale-aware. As you can see from its definition here, it takes an optional third parameter that defaults to a default-constructed std::locale, which represents the current global C++ locale, as set by std::locale::global.
This more or less means that the compiler has no way to know in advance which locale is going to be used, and that means that there will be an indirect call to a certain function to convert each character to lower-case in the current locale.
On the other hand, the documentation for strcasecmp states that:
In the POSIX locale, strcasecmp() and strncasecmp() shall behave as if the strings had been converted to lowercase and then a byte comparison performed. The results are unspecified in other locales.
That means that the locale is fixed, hence you can expect it to be heavily optimized.

Macro metaprogramming horror

I am trying to do something like:
custommacro x;
which would expand into:
declareSomething; int x; declareOtherthing;
Is this even possible?
I already tricked it once with operator= to behave like that, but it can't be done with declarations.
You can elide the parentheses as long as you are willing to accept two additions:
the whole code needs to be wrapped in a block macro
there needs to be something following the echo directive
e.g. thusly:
#define LPAREN (
#define echo ECHO_MACRO LPAREN
#define done )
#define ECHO_MACRO(X) std::cout << (X) << "\n"
#define DSL(X) X
...
DSL(
echo "Look ma, no brains!" done;
)
...
Reasons for this:
There is no way to make a function-like macro expand without parentheses. This is just a basic requirement of the macro language; if you want something else, investigate a different macro processor
Therefore, we need to insert the parentheses ourselves; in turn, we need something after the directive, like a done macro, that will expand to a form containing the necessary closing paren
Unfortunately, because the echo ... done form didn't look like a macro invocation to the preprocessor, it wasn't marked for expansion when the preprocessor entered it, and whether we put parens in or not is irrelevant. Just using echo ... done will therefore dump an ECHO_MACRO call in the text
Text is re-scanned, marked for expansion, and expanded again when it is the argument to a function-like macro, so wrapping the entire block with a block macro (here it's DSL) will cause the call to ECHO_MACRO to be expanded on this rescan pass (DSL doesn't do anything with the result: it exists just to force the rescan)
We need to hide the ( in the expansion of echo behind the simple macro LPAREN, because otherwise the unmatched parenthesis in the macro body will confuse the preprocessor
If you wanted to create an entire domain-specific language for such commands, you could also reduce the number of done commands by making the core commands even more unwieldy:
#include <iostream>
#define LPAREN (
#define begin NO_OP LPAREN 0
#define done );
#define echo ); ECHO_MACRO LPAREN
#define write ); WRITE_MACRO LPAREN
#define add ); ADD_MACRO LPAREN
#define sub ); SUB_MACRO LPAREN
#define NO_OP(X)
#define ECHO_MACRO(X) std::cout << (X) << "\n"
#define WRITE_MACRO(X) std::cout << (X)
#define ADD_MACRO(D, L, R) (D) = (L) + (R)
#define SUB_MACRO(D, L, R) (D) = (L) - (R)
#define DSL(X) DSL_2 X
#define DSL_2(X) X
int main(void) {
int a, b;
DSL((
begin
add a, 42, 47
sub b, 64, 50
write "a is: "
echo a
write "b is: "
echo b
done
))
return 0;
}
In this form, each command is pre-designed to close the preceding command, so that only the last one needs a done; you need a begin line so that there's an open command for the first real operation to close, otherwise the parens will mismatch.
Messing about like this was historically easier in C than in C++, as C99's preprocessor was more powerful (it supports __VA_ARGS__, which is pretty much essential for complicated macro metaprogramming); C++11 has since adopted variadic macros as well.
Oh yeah, and one other thing -
...please never do this in real code.
I understand what you're trying to do and it simply can't be done. A macro is only text replacement, it has no knowledge of what comes after it, so trying to do custommacro x will expand to whatever custommacro is, a space, and then x, which just won't work semantically.
Also, about your echo hack: this is actually very simple with the use of operators in C++:
#include <iostream>
#define echo std::cout <<
int main()
{
echo "Hello World!";
}
But you really shouldn't be writing code like this (that is, using macros and a pseudo-echo hack). You should write code that conforms to the syntax of the language and the semantics of what you're trying to do. If you want to write to standard output, use std::cout. Moreover, if you want to use echo, make a function called echo that invokes std::cout internally, but don't hack the features of the language to create your own.
You could use a for loop and the GNU C statement-expression extension.
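#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b) // extra level so __COUNTER__ expands before pasting
#define MY_MACRO\
FOR_MACRO(CONCAT(uniq_, __COUNTER__),{/*declareSomething*/ },{ /* declareOtherthing */ }) int
#define FOR_MACRO(NAME,FST_BLOCK,SND_BLOCK)\
for(int NAME = ({FST_BLOCK ;0;}); NAME<1 ; NAME++,(SND_BLOCK))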
It's "practically hygienic", though this means that whatever you do inside those code blocks won't escape the for-loop scope.

What is the semicolon in C++?

Roughly speaking in C++ there are:
operators (+, -, *, [], new, ...)
identifiers (names of classes, variables, functions,...)
const literals (10, 2.5, "100", ...)
some keywords (int, class, typename, mutable, ...)
brackets ({, }, <, >)
preprocessor (#, ## ...).
But what is the semicolon?
The semicolon is a punctuator, see 2.13 §1
The lexical representation of C++ programs includes a number of preprocessing tokens which are used in
the syntax of the preprocessor or are converted into tokens for operators and punctuators
It is part of the syntax and therein an element of several statements. In EBNF:
<do-statement>
::= 'do' <statement> 'while' '(' <expression> ')' ';'
<goto-statement>
::= 'goto' <label> ';'
<for-statement>
::= 'for' '(' <for-initialization> ';' <for-control> ';' <for-iteration> ')' <statement>
<expression-statement>
::= <expression> ';'
<return-statement>
::= 'return' <expression> ';'
This list is not complete. Please see my comment.
The semicolon is a terminal, a token that terminates something. What exactly it terminates depends on the context.
Semicolon denotes sequential composition. It is also used to delineate declarations.
Semicolon is a statement terminator.
The semicolon isn't given a specific name in the C++ standard. It's simply a character that's used in certain grammar productions (and it just happens to be at the end of them quite often, so it 'terminates' those grammatical constructs). For example, a semicolon character is at the end of the following parts of the C++ grammar (not necessarily a complete list):
an expression-statement
a do/while iteration-statement
the various jump-statements
the simple-declaration
Note that in an expression-statement, the expression is optional. That's why a 'run' of semicolons, ;;;;, is valid in many (but not all) places where a single one is.
';'s are often used to delimit one bit of C++ source code, indicating it's intentionally separate from the following code. To see how it's useful, let's imagine we didn't use it:
For example:
#include <iostream>
void f() { std::cout << "f()\n"; }
void g() { std::cout << "g()\n"; }
int main(int argc, char**)
{
std::cout << "message"
"\0\1\0\1\1"[argc] ? f() : g(); // final ';' needed to make this compile
// but imagine it's not there in this new
// semicolon-less C++ variant....
}
This (horrible) bit of code, called with no arguments such that argc is 1, prints:
ef()\n
Why not "messagef()\n"? That's what might be expected given first std::cout << "message", then "\0\1\0\1\1"[1] being '\1' - true in a boolean sense - suggests a call to f() printing f()\n?
Because... (drumroll please)... in C++ adjacent string literals are concatenated, so the program's parsed like this:
std::cout << "message\0\1\0\1\1"[argc] ? f() : g();
What this does is:
find the [argc/1] (second) character in "message\0\1\0\1\1", which is the first 'e'
send that 'e' to std::cout (printing it)
the ternary operator '?' triggers a conversion of std::cout to bool, which yields true (because the printing presumably worked), so f() is called...!
Given that string literal concatenation is incredibly useful for specifying long strings (and even shorter multi-line strings in a readable format), we certainly wouldn't want to assume that such strings shouldn't be concatenated. Consequently, if the semicolon's gone, the compiler must assume the concatenation is intended, even though visually the layout of the code above implies otherwise.
That's a convoluted example of how C++ code changes meaning with and without ';'s. I'm sure if I or other readers think on it for a few minutes we could come up with other, simpler examples.
Anyway, the ';' is necessary to inform the compiler that statement termination/separation is intended.
The semicolon lets the compiler know that it's reached the end of a command AFAIK.
The semicolon (;) is a terminator in C++. It tells the compiler that you're at the end of a statement.
If I recall correctly, Kernighan and Ritchie called it punctuation.
Technically, it's just a token (or terminal, in compiler-speak), which can occur in specific places in the grammar, with specific semantics in the language. The distinction between operators and other punctuation is somewhat artificial, but useful in the context of C or C++, since some tokens (the comma, = and :) can be either operators or punctuation, depending on context, e.g.:
f( a, b ); // comma is punctuation
f( (a, b) ); // comma is operator
a = b; // = is assignment operator
int a = b; // = is punctuation
x = c ? a : b; // colon is operator
label: // colon is punctuation
In the case of the first two, the distinction is important, since a user-defined overload will only affect the operator, not the punctuation.
It represents the end of a C++ statement.
For example,
int i=0;
i++;
In the above code there are two statements. The first declares the variable, and the second increments its value by one.

C/C++: Optimization of pointers to string constants

Have a look at this code:
#include <iostream>
using namespace std;
int main()
{
const char* str0 = "Watchmen";
const char* str1 = "Watchmen";
char* str2 = "Watchmen";
char* str3 = "Watchmen";
cerr << static_cast<void*>( const_cast<char*>( str0 ) ) << endl;
cerr << static_cast<void*>( const_cast<char*>( str1 ) ) << endl;
cerr << static_cast<void*>( str2 ) << endl;
cerr << static_cast<void*>( str3 ) << endl;
return 0;
}
Which produces an output like this:
0x443000
0x443000
0x443000
0x443000
This was on the g++ compiler running under Cygwin. The pointers all point to the same location even with no optimization turned on (-O0).
Does the compiler always optimize so much that it searches all the string constants to see if they are equal? Can this behaviour be relied on?
It can't be relied on, it is an optimization which is not a part of any standard.
I changed the corresponding lines of your code to:
const char* str0 = "Watchmen";
const char* str1 = "atchmen";
char* str2 = "tchmen";
char* str3 = "chmen";
The output for the -O0 optimization level is:
0x8048830
0x8048839
0x8048841
0x8048848
But for the -O1 it's:
0x80487c0
0x80487c1
0x80487c2
0x80487c3
As you can see, GCC (v4.1.2) reused the first string for all the subsequent substrings. It's the compiler's choice how to arrange string constants in memory.
It's an extremely easy optimization, probably so much so that most compiler writers don't even consider it much of an optimization at all. Setting the optimization flag to the lowest level doesn't mean "Be completely naive," after all.
Compilers will vary in how aggressive they are at merging duplicate string literals. They might limit themselves to a single subroutine: put those four declarations in different functions instead of a single function, and you might see different results. Others might do an entire compilation unit. Others might rely on the linker to do further merging among multiple compilation units.
You can't rely on this behavior, unless your particular compiler's documentation says you can. The language itself makes no demands in this regard. I'd be wary about relying on it in my own code, even if portability weren't a concern, because behavior is liable to change even between different versions of a single vendor's compiler.
You surely should not rely on that behavior, but most compilers will do this. Any literal value ("Hello", 42, etc.) will be stored once, and any pointers to it will naturally resolve to that single reference.
If you find that you need to rely on that, then be safe and recode as follows:
const char *watchmen = "Watchmen";
const char *foo = watchmen;
const char *bar = watchmen;
You shouldn't count on that of course. An optimizer might do something tricky on you, and it should be allowed to do so.
It is however very common. I remember back in '87 a classmate was using the DEC C compiler and had this weird bug where all his literal 3's got turned into 11's (numbers may have changed to protect the innocent). He even did a printf ("%d\n", 3) and it printed 11.
He called me over because it was so weird (why does that make people think of me?), and after about 30 minutes of head scratching we found the cause. It was a line roughly like this:
if (3 = x) break;
Note the single "=" character. Yes, that was a typo. The compiler had a wee bug and allowed this. The effect was to turn all his literal 3's in the entire program into whatever happened to be in x at the time.
Anyway, it's clear the C compiler was putting all literal 3's in the same place. If a C compiler back in the '80s was capable of doing this, it can't be too tough to do. I'd expect it to be very common.
I would not rely on the behavior, because I doubt the C or C++ standards make this behavior explicit, but it makes sense that the compiler does it. It also makes sense that it exhibits this behavior even in the absence of any optimization specified to the compiler; there is no trade-off in it.
All string literals in C or C++ (e.g. "string literal") are read-only, and thus constant. When you say:
char *s = "literal";
You are, in a sense, casting away the const-ness of the string (C++11 and later reject this conversion outright; C still allows it). Nevertheless, you can't do away with the read-only nature of the string: if you try to modify it, the result is undefined behaviour, which typically shows up as a crash at run time rather than an error at compile time. (Which is actually a good reason to use const char * when assigning string literals to a variable of yours.)
No, it can't be relied on, but storing read-only string constants in a pool is a pretty easy and effective optimization. It's just a matter of storing an alphabetical list of strings, and then outputting them into the object file at the end. Think of how many "\n" or "" constants are in an average code base.
If a compiler wanted to get extra fancy, it could reuse suffixes too: "\n" can be represented by pointing to the last character of "Hello\n". But that likely comes with very little benefit for a significant increase in complexity.
Anyway, I don't believe the standard says anything about where anything is stored really. This is going to be a very implementation-specific thing. If you put two of those declarations in a separate .cpp file, then things will likely change too (unless your compiler does significant linking work.)